After cleaning the data, there are three sections of the template that produce tabular, graphic, and cartographic (map) analysis outputs.
This first section produces descriptive analyses about the patients' demographics and attack rates.
The person analysis section begins with a couple of sentences that contain in-line code: code embedded in normal RMarkdown text (not within an R code chunk). The in-line code in the second sentence inserts the number of males and females by counting the observations with “Male” and “Female” in the variable sex. Because our dataset’s variable sex contains M and F instead, we must modify this in-line code (or modify our variable) so that the fmt_count() function searches for the correct terms.
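To see why the search terms matter, here is a minimal base R sketch of the counting idea. It does not use fmt_count() itself; the toy data frame linelist_toy and its values are invented for illustration.

```r
# Toy stand-in for linelist_cleaned (values are invented for illustration)
linelist_toy <- data.frame(sex = c("M", "F", "F", NA, "M", "F"))

# Count observations matching each code; na.rm = TRUE skips missing values
n_male   <- sum(linelist_toy$sex == "M", na.rm = TRUE)
n_female <- sum(linelist_toy$sex == "F", na.rm = TRUE)

# Searching for "Male" matches nothing because the data use "M"
n_wrong  <- sum(linelist_toy$sex == "Male", na.rm = TRUE)

cat("Males:", n_male, "Females:", n_female, "Matches for 'Male':", n_wrong, "\n")
```

A count of zero for a category you expect to be populated is usually a sign that the search term does not match how the variable is coded.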
The first demographic table presents patients by their age group (the table’s rows) and their relationship with the case definition (the table’s columns). This chunk uses piping to link six different functions and produce a table (%>% is the pipe operator; see the R Basics Advanced and Miscellaneous page):
The tab_linelist() function creates a frequency and percent table of age_group, with columns differentiated by case_def
The select() function removes an unnecessary column “variable” generated by tab_linelist()
The rename() function renames the default column “value” as “Age group”
The rename_redundant() function replaces any column name that has proportion with “%”
The augment_redundant() function replaces " n" at the end of any column name with " cases (n)"
The kable() function completes the table, in this case with one digit after the decimal point
# Describe observations by age_group and case_def
tab_linelist(linelist_cleaned, age_group, strata = case_def, col_total = TRUE, row_total = TRUE) %>%
  select(-variable) %>%
  rename("Age group" = value) %>%
  rename_redundant("%" = proportion) %>%
  augment_redundant(" cases (n)" = " n$") %>%
  kable(digits = 1)
|Age group||Confirmed cases (n)||%||Probable cases (n)||%||Suspected cases (n)||%||Missing cases (n)||%||Total|
You can change the argument strata = to refer to any number of variables (in this next example, sex). To show proportions of the total population of observations, add the argument prop_total = specified as TRUE, as below.
tab_linelist(linelist_cleaned, age_group, strata = sex, col_total = TRUE, row_total = TRUE, prop_total = TRUE) %>%
  select(-variable) %>%
  rename("Age group" = value) %>%
  # rename_redundant("%" = proportion) %>%
  augment_redundant(" cases (n)" = " n$") %>%
  kable(digits = 1)
|Age group||F cases (n)||F proportion||M cases (n)||M proportion||Total|
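The difference between proportions within each stratum and proportions of the total can be sketched in base R. This is only an illustration of the arithmetic, not of how tab_linelist() is implemented; the toy data are invented, and we assume that without prop_total = TRUE the percentages are calculated within each stratum column.

```r
# Toy data (invented) with an age group and a stratifying variable
toy <- data.frame(
  age_group = c("0-4", "5-14", "5-14", "15+", "15+", "15+"),
  sex       = c("F", "M", "F", "M", "M", "F")
)

# Cross-tabulate counts: rows are age groups, columns are strata
counts <- table(toy$age_group, toy$sex)

# Percentages within each stratum (each column sums to 100)
pct_within_stratum <- prop.table(counts, margin = 2) * 100

# Percentages of all observations (the whole table sums to 100)
pct_of_total <- prop.table(counts) * 100

print(addmargins(counts))          # counts with row and column totals
print(round(pct_within_stratum, 1))
print(round(pct_of_total, 1))
```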
You can also choose to exclude or edit the in-line code that gives a count of the missing cases by sex and age_group.
To print an age pyramid in your report, use the code below. A few things to note:
The variable supplied to split_by = should have two non-missing value options (e.g. Male or Female, Oui or Non, etc.). Three will create a messy output.
The legend can be moved by replacing legend.position = "bottom" with “top”, “left”, or “right”
You can make this a pyramid of months by supplying age_group_mon to the age_group = argument.
# plot age pyramid by sex
plot_age_pyramid(linelist_cleaned, age_group = "age_group", split_by = "sex") +
  labs(y = "Cases (n)", x = "Age group") +  # change axis labels (nb. x/y flip)
  theme(legend.position = "bottom",         # move legend to bottom
        legend.title = element_blank(),     # remove title
        text = element_text(size = 18)      # change text size
  )
To have an age pyramid of patients under 2 by month age groups, it is best to add a filter() step to the beginning of the code chunk, as below. This selects the linelist_cleaned observations that meet the specified criteria and passes that reduced dataset through the “pipes” to the next function (plot_age_pyramid()). If this filter() step is not added, you will see that the largest pyramid bars are of “missing”. These are the patients who are too old to have a months age group.
If you add this filtering step, you must also modify plot_age_pyramid() by removing its first argument, linelist_cleaned. The dataset is already given to the command in the filter() and is passed to plot_age_pyramid() via piping.
Note that the filter step does not drop any observations from the linelist_cleaned object itself. Because the filter is not being assigned (<-) to over-write linelist_cleaned, this filter is only temporarily applied for the purpose of producing the age pyramid.
# plot age pyramid by month groups, for observations under 2 years
filter(linelist_cleaned, age_years <= 2) %>%
  plot_age_pyramid(age_group = "age_group_mon", split_by = "sex") +
  # stack_by = "case_def") +
  labs(y = "Cases (n)", x = "Age group (months)") +  # change axis labels (nb. x/y flip)
  theme(legend.position = "bottom",                   # move legend to bottom
        legend.title = element_blank(),               # remove title
        text = element_text(size = 18)                # change text size
  )
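The point that filtering without assignment leaves the original object untouched can be checked with a small base R sketch. Here subset() stands in for filter(), and the toy data frame is invented for illustration.

```r
# Toy stand-in for linelist_cleaned (ages are invented)
toy <- data.frame(age_years = c(0.5, 1, 3, 10, 40))

# Filtering without assigning back to `toy` produces a separate, reduced result
young <- subset(toy, age_years <= 2)

# The original object still holds every observation
cat("Rows in original:", nrow(toy), "\n")
cat("Rows after filter:", nrow(young), "\n")

# Only an explicit assignment such as `toy <- subset(toy, age_years <= 2)`
# would over-write the original object.
```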
The text following the age pyramids uses in-line code to describe the distribution of outpatient and inpatient observations, and descriptive statistics of the length of stay for inpatients. The Am Timan dataset does not have length of stay, so it is best to delete the sentences related to obs_days for the final report.
This next code also uses the tab_linelist() function to create descriptive tables of all the variables that were included in the SYMPTOMS variable list.
SYMPTOMS (the value supplied to the second argument) is an object we defined in data cleaning that is a list of variables to tabulate. If this code produces an error about an “Unknown column”, ensure that the variables in the object SYMPTOMS are all present in your dataset (and spelled correctly).
In tab_linelist(), the argument keep = must represent the character value to be counted for the table. As these Am Timan variables are still in French, we change keep = "Yes" to keep = "Oui".
The mutate() function makes an aesthetic change: it replaces underscores in the variable names with spaces and applies sentence case.
# get counts and proportions for all variables named in SYMPTOMS
tab_linelist(linelist_cleaned, SYMPTOMS, keep = "Oui") %>%
  select(-value) %>%
  # fix the way symptom names are displayed
  mutate(variable = str_to_sentence(str_replace_all(variable, "_", " "))) %>%
  # rename accordingly
  rename_redundant("%" = proportion) %>%
  augment_redundant(" (n)" = " n$") %>%
  kable(digits = 1)
|Epigastric pain heartburn||469||59.2|
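The name clean-up performed inside mutate() can be approximated in base R, which may help when checking what a variable name will look like in the table. This is a sketch rather than the stringr implementation; the second variable name is hypothetical.

```r
# Variable names as they might appear in SYMPTOMS
# (the first matches the table row above; the second is hypothetical)
symptom_vars <- c("epigastric_pain_heartburn", "loss_of_appetite")

# Replace underscores with spaces and lower-case everything
spaced <- tolower(gsub("_", " ", symptom_vars, fixed = TRUE))

# Capitalise only the first letter to mimic sentence case
pretty_names <- paste0(toupper(substring(spaced, 1, 1)), substring(spaced, 2))

print(pretty_names)
```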
The code for the lab table is very similar, but has this difference:
transpose = "value" is set because the values in the lab data are important in and of themselves (e.g. IgM+/IgG-, etc.), i.e. the values should become column headers.
This table may be large and unwieldy at first, until you clean your data. The step in data cleaning where we converted 0, 1, “yes”, “pos”, “neg”, etc. to standardized “Positive” and “Negative” was crucial to making this table readable. Once knitted into a Word document, you will likely need to adjust the font size, column widths, etc.
# get counts and proportions for all variables named in LABS
tab_linelist(linelist_cleaned, LABS, transpose = "value") %>%
  # fix the way lab test names are displayed
  mutate(variable = str_to_sentence(str_replace_all(variable, "_", " "))) %>%
  # rename accordingly
  rename("Lab test" = variable) %>%
  rename_redundant("%" = proportion) %>%
  augment_redundant(" (n)" = " n$") %>%
  kable(digits = 1)
|Lab test||Negative (n)||%||Positive (n)||%||IgG-/IgM- (n)||%||IgG-/IgM+ (n)||%||IgG+/IgM- (n)||%||IgG+/IgM+ (n)||%||IgG±/IgM- (n)||%|
|Hep b rdt||222||91.7||20||8.3||-||-||-||-||-||-||-||-||-||-|
|Hep c rdt||239||99.2||2||0.8||-||-||-||-||-||-||-||-||-||-|
|Hep e rdt||149||59.8||100||40.2||-||-||-||-||-||-||-||-||-||-|
|Test hepatitis a||23||100.0||-||-||-||-||-||-||-||-||-||-||-||-|
|Test hepatitis b||22||95.7||1||4.3||-||-||-||-||-||-||-||-||-||-|
|Test hepatitis c||23||100.0||-||-||-||-||-||-||-||-||-||-||-||-|
|Test hepatitis e igm||21||33.9||41||66.1||-||-||-||-||-||-||-||-||-||-|
|Test hepatitis e genotype||-||-||2||100.0||-||-||-||-||-||-||-||-||-||-|
|Test hepatitis e virus||7||11.3||1||1.6||9||14.5||14||22.6||5||8.1||26||41.9||-||-|
|Malaria rdt at admission||160||63.5||92||36.5||-||-||-||-||-||-||-||-||-||-|
|Other arthropod transmitted virus||23||100.0||-||-||-||-||-||-||-||-||-||-||-||-|
The opening text of this chunk with in-line code must be edited to match our Am Timan data. The second in-line code references the variable exit_status; this must now reference the variable exit_status2. Also, we do not have Dead on Arrival recorded in our dataset, so that part of the sentence should be deleted.
Likewise, the code section on time-to-death does not apply to our dataset and should be deleted.
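Overall CFR is produced with the code below. Note the following:
The filter() step requires that your dataset has the variable patient_facility_type and that it has a value “Inpatient”; the deaths = argument refers to the variable DIED
rename() is used to change the column labels in the table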
# use arguments from above to produce overall CFR
overall_cfr <- linelist_cleaned %>%
  filter(patient_facility_type == "Inpatient") %>%
  case_fatality_rate_df(deaths = DIED, mergeCI = TRUE) %>%
  rename("Deaths" = deaths, "Cases" = population, "CFR (%)" = cfr, "95%CI" = ci)
knitr::kable(overall_cfr, digits = 1)  # print nicely with 1 decimal digit
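As a rough check on what the CFR table contains, the arithmetic can be sketched in base R. The counts here are invented, and the Wilson score interval from prop.test() is an assumption made for illustration; case_fatality_rate_df() may calculate its interval differently.

```r
# Invented counts for illustration only (not from the Am Timan data)
deaths <- 5
cases  <- 80

# CFR as a percentage of cases
cfr <- deaths / cases * 100

# Wilson score 95% interval (no continuity correction), as percentages
ci <- prop.test(deaths, cases, correct = FALSE)$conf.int * 100

# Merge the bounds into one string, similar in spirit to mergeCI = TRUE
ci_merged <- sprintf("(%.1f-%.1f)", ci[1], ci[2])

cat("CFR (%):", round(cfr, 1), " 95%CI:", ci_merged, "\n")
```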
The next code adds arguments to case_fatality_rate_df() such as group = sex (which stratifies the CFR by sex) and add_total = TRUE (which makes a total row across the sex groups).
linelist_cleaned %>%
  filter(patient_facility_type == "Inpatient") %>%
  mutate(sex = forcats::fct_explicit_na(sex, "-")) %>%
  case_fatality_rate_df(deaths = DIED, group = sex, mergeCI = TRUE, add_total = TRUE) %>%
  rename("Sex" = sex, "Deaths" = deaths, "Cases" = population, "CFR (%)" = cfr, "95%CI" = ci) %>%
  knitr::kable(digits = 1)
When creating a table of CFR by age groups, one additional step, using the function complete(), is required to ensure that all age_group levels are shown even if they have no observations.
linelist_cleaned %>%
  filter(patient_facility_type == "Inpatient") %>%
  case_fatality_rate_df(deaths = DIED, group = age_group, mergeCI = TRUE, add_total = TRUE) %>%
  tidyr::complete(age_group, fill = list(deaths = 0, population = 0, cfr = 0, ci = 0)) %>%  # Ensure all levels are represented
  rename("Age group" = age_group, "Deaths" = deaths, "Cases" = population, "CFR (%)" = cfr, "95%CI" = ci) %>%
  knitr::kable(digits = 1)
|Age group||Deaths||Cases||CFR (%)||95%CI|
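The purpose of the complete() step can be illustrated with a base R sketch: a summary table that lacks one age group is joined against the full set of levels, and the missing row is filled with zeros. The level names and counts are invented, and merge() here is a stand-in for tidyr::complete(), not its implementation.

```r
# All age group levels we want to see in the final table (invented)
all_levels <- data.frame(age_group = c("0-4", "5-14", "15+"))

# A summary table in which "5-14" is absent because it had no observations
cfr_tab <- data.frame(
  age_group  = c("0-4", "15+"),
  deaths     = c(1, 0),
  population = c(2, 1)
)

# Keep every level from all_levels; unmatched rows get NA
full_tab <- merge(all_levels, cfr_tab, by = "age_group", all.x = TRUE)

# Replace the NA counts with explicit zeros
full_tab$deaths[is.na(full_tab$deaths)] <- 0
full_tab$population[is.na(full_tab$population)] <- 0

print(full_tab)
```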
The commented code below examines CFR by case definition. Note that this is dependent upon our working
# Use if you have enough confirmed cases for comparative analysis
linelist_cleaned %>%
  filter(patient_facility_type == "Inpatient") %>%
  case_fatality_rate_df(deaths = DIED, group = case_def, mergeCI = TRUE, add_total = TRUE) %>%
  rename("Case definition" = case_def, "Deaths" = deaths, "Cases" = population, "CFR (%)" = cfr, "95%CI" = ci) %>%
  knitr::kable(digits = 1)
|Case definition||Deaths||Cases||CFR (%)||95%CI|
To use the attack rate section, we need to modify the first code chunk slightly. An object population is created from the sum of population counts in the population figures. Because we only imported region-based population counts, we must change this command to reflect that we do not have population_data_age, but rather population_data_region.
# OLD command from template
# population <- sum(population_data_age$population)
Running the correct command and printing the value of population, we see that the sum population across regions is estimated to be 62336.
# CORRECTED command for Am Timan exercise
population <- sum(population_data_region$population)
population
## [1] 62336
The first line of code below creates a multi-part object ar with the number of cases, population, attack rate per 10,000, and lower and upper confidence intervals (you can run just this line to verify yourself). The subsequent commands alter the aesthetics to produce a neat table with appropriate column names.
# calculate the attack rate information and store it in object "ar"
ar <- attack_rate(nrow(linelist_cleaned), population, multiplier = 10000)
# Create table from the information in the object "ar"
ar %>%
  merge_ci_df(e = 3) %>%  # merge the lower and upper CI into one column
  rename("Cases (n)" = cases, "Population" = population, "AR (per 10,000)" = ar, "95%CI" = ci) %>%
  select(-Population) %>%  # drop the population column as it is not changing
  knitr::kable(digits = 1, align = "r")
|Cases (n)||AR (per 10,000)||95%CI|
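The attack rate itself is simple arithmetic, which can be sketched in base R as a sanity check. The population is the regional total from above, but the case count is a hypothetical placeholder. The exact binomial interval from binom.test() is an assumption for illustration; attack_rate() may use a different method.

```r
# Total population across regions (from the text above)
population <- 62336

# Hypothetical case count; in the template this is nrow(linelist_cleaned)
cases <- 1000

# Attack rate per 10,000 population
multiplier <- 10000
ar_value <- cases / population * multiplier

# Exact binomial 95% interval, scaled to the same multiplier
ar_ci <- binom.test(cases, population)$conf.int * multiplier

cat("AR (per 10,000):", round(ar_value, 1),
    " 95%CI:", sprintf("(%.1f-%.1f)", ar_ci[1], ar_ci[2]), "\n")
```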
We are unable to calculate the attack rate by age group, because we do not have population counts for each age group. Comment out (#) this code.
Mortality attributable to AJS is also not appropriate for this example. Comment out (#) the 4 related code chunks.