percent_pattern
When working with categorical
variables, crosstable()
allows a very flexible output
thanks to the percent_pattern
argument.
This vignette will review the many things you can do using
percent_pattern
.
First, let’s add some missing values to the mtcars2
dataset and tweak some options:
By default, crosstable()
will use
percent_pattern="{n} ({p_row})"
, so it outputs the size
n
along with the row’s percentage p_row
:
label | variable | Engine | ||
---|---|---|---|---|
straight | vshaped | NA | ||
Number of cylinders | 4 | 7 (88%) | 1 (12%) | 2 |
6 | 0 (0%) | 1 (100%) | 3 | |
8 | 0 (0%) | 11 (100%) | 2 | |
NA | 2 | 2 | 1 |
Here, we will see how we can tweak percent_pattern
in
order to display other figures.
NOTE: Missing values will always be described with
n
alone. If you want to describe them as non-missing
values, you will have to mutate them as one, most likely using
forcats::fct_explicit_na()
.
First, here is the list of all the internal variables you can use:
n
, n_row
, n_col
, and
n_tot
: respectively the size of the cell, the row, the
column, and the whole table.p_row
, p_col
, and p_tot
:
respectively the proportion relative to the row, the column, and the
whole table.p_tot_inf
, p_tot_sup
,
p_row_inf
, p_row_sup
, p_col_inf
,
p_col_sup
: the confidence interval (calculated using Wilson
score) for each of the proportions above.Should you ever need it, note that it is also possible to use any
external variable defined outside of crosstable()
.
Here is a simple example:
label | variable | Engine | ||
---|---|---|---|---|
straight | vshaped | NA | ||
Number of cylinders | 4 | N=7/8 -> p=88% | N=1/8 -> p=12% | 2 |
6 | N=0/1 -> p=0% | N=1/1 -> p=100% | 3 | |
8 | N=0/11 -> p=0% | N=11/11 -> p=100% | 2 | |
NA | 2 | 2 | 1 |
As you can see, these internal variables do not account for missing
values (except for n
, obviously).
This should make sense in most cases, but if it doesn’t, you can use the following variables to account for NA explicitly:
n_row_na
, n_col_na
,
n_tot_na
p_tot_na
, p_row_na
,
p_col_na
p_tot_na_inf
, p_tot_na_sup
,
p_row_na_inf
, p_row_na_sup
,
p_col_na_inf
, p_col_na_sup
(See the last section for an example)
Note that if you use showNA="no"
, there will be no
difference between the standard variables and the _na
variables.
As you may have noticed, totals are considered separately:
crosstable(mtcars3, cyl, by=vs, total=TRUE,
percent_pattern="N={n}, p={p_row} ({n}/{n_row})") %>%
as_flextable()
label | variable | Engine | Total | ||
---|---|---|---|---|---|
straight | vshaped | NA | |||
Number of cylinders | 4 | N=7, p=88% (7/8) | N=1, p=12% (1/8) | 2 | 10 (37%) |
6 | N=0, p=0% (0/1) | N=1, p=100% (1/1) | 3 | 4 (15%) | |
8 | N=0, p=0% (0/11) | N=11, p=100% (11/11) | 2 | 13 (48%) | |
NA | 2 | 2 | 1 | 5 | |
Total | 9 (38%) | 15 (62%) | 8 | 32 (100%) |
Indeed, you cannot have the same pattern for totals. For instance, the proportion relative to the row would not make sense in the context of the entire row itself.
To get control over the percent_pattern
in totals, you
have to pass a list with names body
,
total_row
, total_col
, and
total_all
:
pp = list(body="N={n}, p={p_tot} ({n}/{n_tot})",
total_row="N={n} p=({p_col})",
total_col="{n}", total_all="Total={n}")
crosstable(mtcars3, cyl, by=vs, total=TRUE,
percent_pattern=pp) %>%
as_flextable()
label | variable | Engine | Total | ||
---|---|---|---|---|---|
straight | vshaped | NA | |||
Number of cylinders | 4 | N=7, p=35% (7/20) | N=1, p=5% (1/20) | 2 | 10 |
6 | N=0, p=0% (0/20) | N=1, p=5% (1/20) | 3 | 4 | |
8 | N=0, p=0% (0/20) | N=11, p=55% (11/20) | 2 | 13 | |
NA | 2 | 2 | 1 | 5 | |
Total | N=9 p=(38%) | N=15 p=(62%) | 8 | Total=32 |
get_percent_pattern()
To easily get a percent_pattern
list, you can use the
get_percent_pattern()
helper:
get_percent_pattern("all")
#> $body
#> [1] "{n} ({p_tot} / {p_row} / {p_col})"
#>
#> $total_row
#> [1] "{n} ({p_col})"
#>
#> $total_col
#> [1] "{n} ({p_row})"
#>
#> $total_all
#> [1] "{n} ({p_tot})"
get_percent_pattern("col", na=TRUE)
#> $body
#> [1] "{n} ({p_col_na})"
#>
#> $total_row
#> [1] "{n} ({p_col_na})"
#>
#> $total_col
#> [1] "{n} ({p_row_na})"
#>
#> $total_all
#> [1] "{n} ({p_tot_na})"
You can also set the result to a variable and modify its members at
will. See ?get_percent_pattern
for more information.
Here is the ultimate example for percent_pattern
. Give a
close look to all possible values and you will surely find the one that
you need.
ULTIMATE_PATTERN=list(
body="N={n}
Cell: p = {p_tot} ({n}/{n_tot}) [{p_tot_inf}; {p_tot_sup}]
Col: p = {p_col} ({n}/{n_col}) [{p_col_inf}; {p_col_sup}]
Row: p = {p_row} ({n}/{n_row}) [{p_row_inf}; {p_row_sup}]
Cell (NA): p = {p_tot_na} ({n}/{n_tot_na}) [{p_tot_na_inf}; {p_tot_na_sup}]
Col (NA): p = {p_col_na} ({n}/{n_col_na}) [{p_col_na_inf}; {p_col_na_sup}]
Row (NA): p = {p_row_na} ({n}/{n_row_na}) [{p_row_na_inf}; {p_row_na_sup}]",
total_row="N={n}
Row: p = {p_row} ({n}/{n_row}) [{p_row_inf}; {p_row_sup}]
Row (NA): p = {p_row_na} ({n}/{n_row_na}) [{p_row_na_inf}; {p_row_na_sup}]",
total_col="N={n}
Col: p = {p_col} ({n}/{n_col}) [{p_col_inf}; {p_col_sup}]
Col (NA): p = {p_col_na} ({n}/{n_col_na}) [{p_col_na_inf}; {p_col_na_sup}]",
total_all="N={n}
P: {p_col} [{p_col_inf}; {p_col_sup}]
P (NA): {p_col} [{p_col_na_inf}; {p_col_na_sup}]"
)
crosstable(mtcars3, cyl, by=vs,
percent_digits=0, total=TRUE, showNA="always",
percent_pattern=ULTIMATE_PATTERN) %>%
as_flextable() %>%
flextable::theme_box()
label | variable | Engine | Total | ||
---|---|---|---|---|---|
straight | vshaped | NA | |||
Number of cylinders | 4 | N=7 | N=1 | 2 | N=10 |
6 | N=0 | N=1 | 3 | N=4 | |
8 | N=0 | N=11 | 2 | N=13 | |
NA | 2 | 2 | 1 | 5 | |
Total | N=9 | N=15 | 8 | N=32 |