Changes in version 0.7.0 (2026-01-10)

New features

- New function compare_databases(), which compares the structure of several extractions of a database: added/removed columns, number of patients, etc. (#26). See the examples for a demo.
- Cache system is supported in all reading functions read_all_csv(), read_all_sas(), and read_all_xpt() (#75).
- New features in edc_viewer():
  - Support for multiple instances on different ports with custom datasets (#100, #114). For instance, you can now run edc_viewer(data=lst(iris, mtcars), port=1212).
  - New button to browse all the column labels (#113).

Bug fixes & Improvements

- Fixed modifiers edc_clean_names(), edc_unify_subjid(), and edc_split_mixed() so they don't strip database attributes (like project name) (#111).
- Fixed edc_data_stop() so it works without a SUBJID and defaults to no issue number (#109).
- Fixed assert_no_duplicate() so it works in a table with both columns SUBJID and subjid (#105).
- Fixed bugs in edc_left_join() with case-sensitivity on SUBJID (#108, #117).
- Improved save_edc_data_warnings() with options to hide the resolved issues and to not include stops, and a better default path (#107, #110, #112).
- Improved reading functions so that all tables are sorted by SUBJID (#115).
- Improved reading functions so that each dataset has a label attribute, taken from FORMDESC or CRFNAME (#118).
- Improved edc_swimmerplot() by removing origin by default (#106).
- Improved edc_swimmerplot() by adding the arguments origin_fun, to summarise the origin at patient level, and data_list, to control the datasets.
- Improved edc_warn_extraction_date() with a strict unit "days".
- Improved save_plotly() with a glue syntax for parameter file.
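To give a rough idea of the structural differences that compare_databases() is described as reporting (added or removed columns, number of patients), here is a minimal base R sketch on two made-up extractions. It does not call the package and is not its implementation; the data frames and variable names are hypothetical.

```r
# Two hypothetical extractions of the same table (made-up data, not from EDCimport)
extraction_old <- data.frame(SUBJID = 1:3, AGE = c(50, 61, 47))
extraction_new <- data.frame(
  SUBJID = 1:4,
  AGE    = c(50, 61, 47, 58),
  SEX    = c("F", "M", "F", "M")
)

# Columns present in only one of the two extractions
added_cols   <- setdiff(names(extraction_new), names(extraction_old))
removed_cols <- setdiff(names(extraction_old), names(extraction_new))

# Number of distinct patients in each extraction
n_patients <- c(
  old = length(unique(extraction_old$SUBJID)),
  new = length(unique(extraction_new$SUBJID))
)

print(added_cols)
print(removed_cols)
print(n_patients)
```

For a real database, the package function and its documented examples remain the reference; this sketch only shows the kind of comparison involved.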
Changes in version 0.6.0 (2025-06-24)

Documentation

- New vignettes: vignette("reading"), vignette("postprocessing"), vignette("checking"), vignette("visualizing"), and vignette("utils").

New features

- New function edc_patient_gridplot(), which creates a ggplot matrix giving the presence of all patients in all datasets (#77).
- New functions edc_left_join(), edc_right_join(), and edc_full_join(), which perform joins that default to the subject ID as primary key (#82).
- New function edc_viewer(), which runs a shiny application for easily browsing your database (#83).
- New function set_project_name(), to set the project name when reading from a directory (#96).
- New function edc_find_value(), which searches the whole database for a value, as edc_find_column() searches for column names or labels.
- New function save_edc_data_warnings(), to save all the warnings triggered by edc_data_warn() into a .xlsx file for sharing.

Bug fixes & Improvements

- New argument unify(collapse_chr=TRUE), to collapse non-unique character values (#99).
- New argument lastnews_table(show_delta=TRUE), which computes the difference between the last prefer date and the actual last date (#81).
  - Other improvements: allow regex in except & prefer (with regex=TRUE), improved warning message, and allow saving warnings in a csv file (#78).
- New argument edc_data_warn(envir), the environment in which to evaluate the message.
- New argument edc_swimmerplot(include), to subset the swimmer plot on significant variables only.
- New argument subdirectories in all reading functions (read_trialmaster(), read_all_xpt(), read_all_sas(), and read_all_csv()), to control whether to read sub-directories. Note that until now, those subdirectories were read and could overwrite root files.
- Fixed labels sometimes being duplicated.
Internal improvements

- read_trialmaster() won't read from cache if the installed EDCimport version is different from the cache's.

Deprecations

- load_list(), renamed to load_database()
- find_keyword(), renamed to edc_find_column()

Breaking changes

I don't think enough people are using this for it to be necessary to go through the deprecation process.

- split_mixed_datasets() becomes edc_split_mixed()
- Removed export of internal functions: build_lookup(), extend_lookup(), get_key_cols(), get_subjid_cols(), get_crfname_cols(), get_meta_cols(), load_as_list(), save_list()

Changes in version 0.5.2 (2024-11-14)

- Fixed a bug in lastnews_table() when SUBJID is not numeric.
- Fixed a bug in read_all_sas() causing metadata (e.g. date_extraction) to be converted to dataframes.

Changes in version 0.5.1 (2024-10-31)

- Internal fix for CRAN check.

Changes in version 0.5.0 (2024-10-24)

New features

Read functions

- New function read_all_sas() to read a database of .sas7bdat files.
- New function read_all_csv() to read a database of .csv files.

Sanity checks alerts

- New functions edc_data_warn() and edc_data_stop(), to alert if data has inconsistencies (#29, #39, #43).

    ae %>% filter(grade<1 | grade>5) %>% edc_data_stop("AE of invalid grade")
    ae %>% filter(is.na(grade)) %>% edc_data_warn("Grade is missing", issue_n=13)
    #> Warning: Issue #13: Grade is missing (8 patients: #21, #28, #39, #95, #97, ...)

- New function edc_data_warnings(), to get a dataframe of all warnings thrown by edc_data_warn().
- New function edc_warn_extraction_date(), to alert if data is too old.

Miscellaneous utils

- New function select_distinct() to select all columns that have only one level for a given grouping scope (#57).
- New function edc_population_plot() to visualize which patient is in which analysis population (#56).
- New function edc_db_to_excel() to export the whole database to an Excel file, easier to browse than RStudio's table viewer (#55).
  Use edc_browse_excel() to browse the file without knowing its name.
- New function edc_inform_code() to show how much code your project contains (#49).
- New function search_for_newer_data() to search a path (e.g. Downloads) for a newer data archive (#46).
- New function edc_crf_plot() to show the current database completion status (#48).
- New function save_sessioninfo(), to save sessionInfo() into a text file (#42).
- New function fct_yesno(), to easily format Yes/No columns (#19, #23, #40).
- New function lastnews_table() to find the last date on which any information was entered for each patient (#37). Useful for survival analyses.
- New function edc_unify_subjid(), to have the same structure for subject IDs in all the datasets of the database (#30).
- New function save_plotly(), to save a plotly to an HTML file (#15).
- New experimental functions table_format(), get_common_cols() and get_meta_cols() that might become useful to find keys to pivot or summarise data.

Bug fixes & Improvements

- get_datasets() will now work even if a dataset is named after a base function (#67).
- read_trialmaster() will output a readable error when no password is entered although one is needed.
- read_trialmaster(split_mixed="TRUE") will work as intended.
- assert_no_duplicate() now has a by argument to check for duplicates within groups, for example by visit (#17).
- find_keyword() is more robust and informs on the proportion of missing values when possible.
- edc_lookup() will now retrieve the lookup table. Use build_lookup() to build one from a table list.
- extend_lookup() will not fail anymore when the database has a faulty table.

Deprecations

- get_key_cols() is replaced by get_subjid_cols() and get_crfname_cols().
- check_subjid() is replaced by edc_warn_patient_diffs(). It can take either a vector or a dataframe as input, and the message is more informative.
Changes in version 0.4.1 (2023-12-19)

Bug fixes & Improvements

- Changes in testing environment so that the package can be installed from CRAN despite firewall policies forbidding password-protected archive downloading.
- Fixed a bug where a corrupted XPT file could cause the whole import to fail.

Changes in version 0.4.0 (2023-12-11)

New features

- New function check_subjid() to check if a vector is not missing some patients (#8).

    options(edc_subjid_ref=enrolres$subjid)
    check_subjid(treatment$subjid)
    check_subjid(ae$subjid)

- New function assert_no_duplicate() to abort if a table has duplicates in a subject ID column (#9).

    tibble(subjid=c(1:10, 1)) %>% assert_no_duplicate() %>% nrow()
    #Error in `assert_no_duplicate()`:
    #! Duplicate on column "subjid" for value 1.

- New function manual_correction() to safely hard-code a correction while waiting for the TrialMaster database to be updated.
- New function edc_options() to manage EDCimport global parameterization.
- New argument edc_swimmerplot(id_lim) to subset the swimmer plot to some patients only.
- New option read_trialmaster(use_cache="write") to read from the zip again but still update the cache.
- You can now use the syntax read_trialmaster(split_mixed=c("col1", "col2")) to split only the datasets you need (#10).

Bug fixes & Improvements

- Reading with read_trialmaster() from cache will output an error if parameters (split_mixed, clean_names_fun) are different (#4).
- split_mixed_datasets() is now fully case-insensitive.
- Non-UTF8 characters in labels are now identified and corrected during reading (#5).

Minor breaking changes

- read_trialmaster(use_cache="write") is now the default. Reading from cache is not stable yet, so you should opt in rather than opt out.
- read_trialmaster(extend_lookup=TRUE) is now the default.
- Options edc_id, edc_crfname, and edc_verbose have been respectively renamed edc_cols_id, edc_cols_crfname, and edc_read_verbose for more clarity.
Changes in version 0.3.0 (2023-05-19)

New features

- New function edc_swimmerplot() to show a swimmer plot of all dates in the database and easily find outliers.
- New features in read_trialmaster():
  - clean_names_fun=some_fun will clean all names of all tables. For instance, clean_names_fun=janitor::clean_names will turn default SAS uppercase column names into valid R snake-case column names.
  - split_mixed=TRUE will split tables that contain both long and short data regarding patient ID into one long table and one short table. See ?split_mixed_datasets() for details.
  - extend_lookup=TRUE will improve the lookup table with additional information. See ?extend_lookup() for details.
  - key_columns=get_key_cols() is where you can change the default column names for patient ID and CRF name (used in other new features).
- Standalone functions extend_lookup() and split_mixed_datasets().
- New helper unify(), which turns a vector of duplicate values into a vector of length 1.

Bug fixes

- Reading errors are now handled by read_trialmaster() instead of failing. If one XPT file is corrupted, the resulting object will contain the error message instead of the dataset.
- find_keyword() is now robust to non-UTF8 characters in labels.
- Option edc_lookup is now set even when reading from cache.
- SAS formats containing a = now work as intended.

Changes in version 0.2.1 (2022-12-02)

- Import your data from TrialMaster using tm = read_trialmaster("path/to/archive.zip").
- Search for a keyword in any column name or label using find_keyword("date", data=tm$.lookup). You can also generate a lookup table for an arbitrary list of dataframes using build_lookup(my_data).
- Load the datasets to the global environment using load_list(tm) to avoid typing tm$ everywhere.
- Browse available global options using ?EDCimport_options.

Changes in version 0.1.0

- Draft version