                 Changes in version 0.7.0 (2026-01-10)                  

New features

  - New function compare_databases(), which compares the structure of
    several extractions of a database: added/removed columns, number of
    patients, etc. (#26). See the examples for a demo.
  - The cache system is now supported by all reading functions:
    read_all_csv(), read_all_sas(), and read_all_xpt() (#75).
  - New features in edc_viewer():
      - Support for multiple instances on different ports with custom
        datasets (#100, #114). For instance, you can now run
        edc_viewer(data=lst(iris, mtcars), port=1212).
      - New button to browse all the column labels (#113).

Bug fixes & Improvements

  - Fixed modifiers edc_clean_names(), edc_unify_subjid(), and
    edc_split_mixed() so they don't strip database attributes (like
    project name) (#111).
  - Fixed edc_data_stop() so it works without a SUBJID and defaults to
    no issue number (#109).
  - Fixed assert_no_duplicate() so it works in a table with both columns
    SUBJID and subjid (#105).
  - Fixed bugs in edc_left_join() with case-sensitivity on SUBJID (#108,
    #117).
  - Improved save_edc_data_warnings() with options to hide the resolved
    issues and to not include stops, and better default path (#107,
    #110, #112).
  - Improved reading functions so that all tables are sorted by SUBJID
    (#115).
  - Improved reading functions so that each dataset has a label
    attribute, taken from FORMDESC or CRFNAME (#118).
  - Improved edc_swimmerplot() by removing origin by default (#106).
  - Improved edc_swimmerplot() by adding the arguments origin_fun, to
    summarise the origin at patient level, and data_list, to control
    the datasets.
  - Improved edc_warn_extraction_date() with a strict unit "days".
  - Improved save_plotly() with a glue syntax for parameter file.

                 Changes in version 0.6.0 (2025-06-24)                  

Documentation

  - New vignettes: vignette("reading"), vignette("postprocessing"),
    vignette("checking"), vignette("visualizing"), and vignette("utils")

New features

  - New function edc_patient_gridplot(), which creates a ggplot matrix
    giving the presence of all patients in all datasets (#77)
  - New functions edc_left_join(), edc_right_join(), and
    edc_full_join(), which perform joins that default to the subject ID
    as primary key (#82)
  - New function edc_viewer(), which runs a shiny application for easily
    browsing your database (#83)
  - New function set_project_name(), to set the project name when
    reading from a directory (#96)
  - New function edc_find_value(), which searches the whole database for
    a value, just as edc_find_column() searches column names or labels.
  - New function save_edc_data_warnings(), to save all the warnings
    triggered by edc_data_warn() into a .xlsx file for sharing.

Bug fixes & Improvements

  - New argument unify(collapse_chr=TRUE), to collapse non-unique
    character values (#99)
  - New argument lastnews_table(show_delta=TRUE), which computes the
    difference between the last prefer date and the actual last date
    (#81)
      - Other improvements: allow regex in except & prefer (with
        regex=TRUE), improved warning message, and allow saving warnings
        in a csv file (#78)
  - New argument edc_data_warn(envir), the environment in which the
    message is evaluated.
  - New argument edc_swimmerplot(include), to subset the swimmer plot on
    significant variables only.
  - New argument subdirectories to all reading functions
    (read_trialmaster(), read_all_xpt(), read_all_sas(), and
    read_all_csv()), to control whether to read sub-directories. Note
    that until now, those subdirectories were read and could overwrite
    root files.
  - Fixed labels being sometimes duplicated.

Internal improvements

  - read_trialmaster() won't read from cache if the installed EDCimport
    version differs from the cache's
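
The version check described above can be sketched in base R. This is only
an illustration under stated assumptions, not EDCimport's actual code: the
helpers write_cache() and read_cache_if_fresh() and the cache layout are
hypothetical, and the pkg_version argument stands in for the installed
package version.

```r
# Hypothetical helpers, for illustration only (not part of EDCimport).
# The cache stores the data together with the version that wrote it.
write_cache <- function(data, path, pkg_version) {
  saveRDS(list(version = as.character(pkg_version), data = data), path)
  invisible(path)
}

# Returns the cached data, or NULL when the cache is missing or was
# written by a different version (the caller should then re-read the source).
read_cache_if_fresh <- function(path, pkg_version) {
  if (!file.exists(path)) return(NULL)
  cache <- readRDS(path)
  if (!identical(cache$version, as.character(pkg_version))) return(NULL)
  cache$data
}

cache_file <- tempfile(fileext = ".rds")
write_cache(list(ae = data.frame(subjid = 1:3)), cache_file, pkg_version = "0.5.2")

hit  <- read_cache_if_fresh(cache_file, pkg_version = "0.5.2")  # cached data
miss <- read_cache_if_fresh(cache_file, pkg_version = "0.6.0")  # NULL, versions differ
```

Returning NULL rather than raising an error lets the caller silently fall
back to reading the original archive.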

Deprecations

  - load_list(), renamed to load_database()
  - find_keyword(), renamed to edc_find_column()

Breaking changes

I don't think enough people are using these functions to make it
necessary to go through the deprecation process.

  - split_mixed_datasets() becomes edc_split_mixed()
  - Removed export of internal functions: build_lookup(),
    extend_lookup(), get_key_cols(), get_subjid_cols(),
    get_crfname_cols(), get_meta_cols(), load_as_list(), save_list()

                 Changes in version 0.5.2 (2024-11-14)                  

  - Fixed a bug in lastnews_table() when SUBJID is not numeric
  - Fixed a bug in read_all_sas() causing metadata (e.g.
    date_extraction) to be converted to dataframes

                 Changes in version 0.5.1 (2024-10-31)                  

  - Internal fix for CRAN check

                 Changes in version 0.5.0 (2024-10-24)                  

New features

Read functions

  - New function read_all_sas() to read a database of .sas7bdat files.
  - New function read_all_csv() to read a database of .csv files.

Sanity checks alerts

  - New functions edc_data_warn() and edc_data_stop(), to alert if data
    has inconsistencies (#29, #39, #43).
    
    ae %>% filter(grade<1 | grade>5) %>% edc_data_stop("AE of invalid grade")
    ae %>% filter(is.na(grade)) %>% edc_data_warn("Grade is missing", issue_n=13)
    #> Warning: Issue #13: Grade is missing (8 patients: #21, #28, #39, #95, #97, ...)

  - New function edc_data_warnings(), to get a dataframe of all warnings
    thrown by edc_data_warn().

  - New function edc_warn_extraction_date(), to alert if data is too
    old.

Miscellaneous utils

  - New function select_distinct() to select all columns that have only
    one level for a given grouping scope (#57).
  - New function edc_population_plot() to visualize which patient is in
    which analysis population (#56).
  - New function edc_db_to_excel() to export the whole database to an
    Excel file, easier to browse than RStudio's table viewer (#55). Use
    edc_browse_excel() to browse the file without knowing its name.
  - New function edc_inform_code() to show how much code your project
    contains (#49).
  - New function search_for_newer_data() to search a path (e.g.
    Downloads) for a newer data archive (#46).
  - New function edc_crf_plot() to show the current database completion
    status (#48).
  - New function save_sessioninfo(), to save sessionInfo() into a text
    file (#42).
  - New function fct_yesno(), to easily format Yes/No columns (#19, #23,
    #40).
  - New function lastnews_table() to find the last date on which any
    information was entered for each patient (#37). Useful for survival
    analyses.
  - New function edc_unify_subjid(), to have the same structure for
    subject IDs in all the datasets of the database (#30).
  - New function save_plotly(), to save a plotly to an HTML file (#15).
  - New experimental functions table_format(), get_common_cols() and
    get_meta_cols() that might become useful to find keys to pivot or
    summarise data.
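
The "last news" idea behind lastnews_table() can be illustrated with a
small base-R sketch. It is not the package's implementation: the datasets,
column names, and the origin column below are made up for the example.

```r
# Illustration only: a base-R sketch of the "last news" idea, not
# EDCimport's lastnews_table() implementation. Data and names are made up.
visits <- data.frame(
  subjid = c(1, 1, 2),
  visit_date = as.Date(c("2024-01-10", "2024-03-02", "2024-02-15"))
)
ae <- data.frame(
  subjid = c(1, 2, 2),
  ae_date = as.Date(c("2024-02-20", "2024-04-01", "2024-01-05"))
)

# Stack every date column into one long table: subjid, date, origin
long <- rbind(
  data.frame(subjid = visits$subjid, date = visits$visit_date,
             origin = "visits$visit_date"),
  data.frame(subjid = ae$subjid, date = ae$ae_date,
             origin = "ae$ae_date")
)

# Sort by patient then date, and keep the last row of each patient
long <- long[order(long$subjid, long$date), ]
last_news <- long[!duplicated(long$subjid, fromLast = TRUE), ]
rownames(last_news) <- NULL
```

Here last_news holds one row per patient, with the latest date found and
the dataset column it came from, which is the kind of information needed
to set a censoring date in a survival analysis.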

Bug fixes & Improvements

  - get_datasets() will now work even if a dataset is named after a base
    function (#67).
  - read_trialmaster() will output a readable error when no password is
    entered although one is needed.
  - read_trialmaster(split_mixed="TRUE") will work as intended.
  - assert_no_duplicate() now has a by argument to check for duplicates
    within groups, for example by visit (#17).
  - find_keyword() is more robust and reports the proportion of
    missing values when possible.
  - edc_lookup() will now retrieve the lookup table. Use build_lookup()
    to build one from a table list.
  - extend_lookup() will not fail anymore when the database has a faulty
    table.

Deprecations

  - get_key_cols() is replaced by get_subjid_cols() and
    get_crfname_cols().
  - check_subjid() is replaced by edc_warn_patient_diffs(). It can
    either take a vector or a dataframe as input, and the message is
    more informative.

                 Changes in version 0.4.1 (2023-12-19)                  

Bug fixes & Improvements

  - Changes in testing environment so that the package can be installed
    from CRAN despite firewall policies forbidding password-protected
    archive downloading.

  - Fixed a bug where a corrupted XPT file could cause the whole import
    to fail.

                 Changes in version 0.4.0 (2023-12-11)                  

New features

  - New function check_subjid() to check if a vector is not missing some
    patients (#8).

    options(edc_subjid_ref=enrolres$subjid)
    check_subjid(treatment$subjid)
    check_subjid(ae$subjid)

  - New function assert_no_duplicate() to abort if a table has
    duplicates in a subject ID column (#9).

    tibble(subjid=c(1:10, 1)) %>% assert_no_duplicate() %>% nrow()
    #> Error in `assert_no_duplicate()`:
    #> ! Duplicate on column "subjid" for value 1.

  - New function manual_correction() to safely hard-code a correction
    while waiting for the TrialMaster database to be updated.
  - New function edc_options() to manage EDCimport global
    parameterization.
  - New argument edc_swimmerplot(id_lim) to subset the swimmer plot to
    some patients only.
  - New option read_trialmaster(use_cache="write") to read from the zip
    again but still update the cache.
  - You can now use the syntax read_trialmaster(split_mixed=c("col1",
    "col2")) to split only the datasets you need to (#10).

Bug fixes & Improvements

  - Reading with read_trialmaster() from cache will output an error if
    parameters (split_mixed, clean_names_fun) are different (#4).
  - split_mixed_datasets() is now fully case-insensitive.
  - Non-UTF8 characters in labels are now identified and corrected
    during reading (#5).

Minor breaking changes

  - read_trialmaster(use_cache="write") is now the default. Reading from
    cache is not stable yet, so you should opt-in rather than opt-out.
  - read_trialmaster(extend_lookup=TRUE) is now the default.
  - Options edc_id, edc_crfname, and edc_verbose have been respectively
    renamed edc_cols_id, edc_cols_crfname, and edc_read_verbose for more
    clarity.

                 Changes in version 0.3.0 (2023-05-19)                  

New features

  - New function edc_swimmerplot() to show a swimmer plot of all dates
    in the database and easily find outliers.
  - New features in read_trialmaster():
      - clean_names_fun=some_fun will clean all names of all tables. For
        instance, clean_names_fun=janitor::clean_names() will turn
        default SAS uppercase column names into valid R snake-case
        column names.
      - split_mixed=TRUE will split tables that contain both long and
        short data regarding patient ID into one long table and one
        short table. See ?split_mixed_datasets() for details.
      - extend_lookup=TRUE will improve the lookup table with additional
        information. See ?extend_lookup() for details.
      - key_columns=get_key_cols() is where you can change the default
        column names for patient ID and CRF name (used in other new
        features).
  - Standalone functions extend_lookup() and split_mixed_datasets().
  - New helper unify(), which turns a vector of duplicate values into a
    vector of length 1.
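
To make the behaviour of unify() more concrete, here is a minimal base-R
sketch of the idea. The function unify_sketch() is a hypothetical
re-implementation written for this example; the real unify() may handle
labels, warnings, and edge cases differently.

```r
# Hypothetical re-implementation for illustration; EDCimport's unify()
# may handle labels, warnings and edge cases differently.
unify_sketch <- function(x) {
  values <- unique(x[!is.na(x)])
  if (length(values) > 1) {
    warning("Several distinct values found, returning the first one.")
  }
  if (length(values) == 0) return(x[1])  # all NA: keep a typed NA
  values[1]
}

# A value repeated on every row of a patient collapses to a single value
height <- unify_sketch(c(172, 172, NA, 172))
```

Warning instead of failing silently is a design choice of this sketch: it
flags columns that were assumed constant within a patient but are not.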

Bug fixes

  - Reading errors are now handled by read_trialmaster() instead of
    failing. If one XPT file is corrupted, the resulting object will
    contain the error message instead of the dataset.
  - find_keyword() is now robust to non-UTF8 characters in labels.
  - Option edc_lookup is now set even when reading from cache.
  - SAS formats containing a = now work as intended.

                 Changes in version 0.2.1 (2022-12-02)                  

  - Import your data from TrialMaster using tm =
    read_trialmaster("path/to/archive.zip").
  - Search for a keyword in any column name or label using
    find_keyword("date", data=tm$.lookup). You can also generate a
    lookup table for an arbitrary list of dataframes using
    build_lookup(my_data).
  - Load the datasets to the global environment using load_list(tm) to
    avoid typing tm$ everywhere.
  - Browse available global options using ?EDCimport_options.

                        Changes in version 0.1.0                        

  - Draft version