Given a use pattern string with missing visits, make a naive imputation for each missing visit.

Usage

impute_missing_visits(
  use_pattern,
  method = c("locf", "locfD", "mode", "kNV"),
  missing_is = "o",
  mixed_is = "*",
  tiebreaker = "+",
  k = 1,
  knvWeights_num = c(o = NA, `+` = 1, `*` = 0.5, `-` = 0),
  quietly = FALSE
)

Arguments

use_pattern

A character string showing the daily, by visit, or weekly substance use pattern for a single subject

method

Which naive imputation method should be used? Currently supported options are "locf" (last observation carried forward), "locfD" (last observation carried forward until dropout), "mode" (most common non-missing value), and "kNV" (k nearest visits).

missing_is

Which single character is used to mark missing UDS in a use pattern string? Defaults to "o".

mixed_is

Which single character is used to mark mixed UDS (both positive and negative UDS for the visit block) in a use pattern string? Defaults to "*". When imputing by the mode, all mixed result UDS will be assigned the tiebreaker value in order to calculate the mode but will remain unchanged in the returned use pattern string.

tiebreaker

In the event of ties between two modes, should positive or negative UDS be the mode? Defaults to positive ("+").

k

The number of nearest visits to use in kNV imputation. This defaults to 1; we recommend that this parameter stays at 1 unless the use patterns in your data have extraordinarily few missing values.

knvWeights_num

A named vector matching the use pattern word "letters" to their numerical use values. The names of this vector should match the "letters" of the use pattern word exactly; use backticks to escape special characters. For example, if the study protocol counts a mixed result (one positive and one negative UDS in a single observation period [week]) as worth three "use days", then mixed results should have a weight of 3/7. Similarly, if a study protocol counts a missing visit as five "use days" out of a week, then "o" should have a weight of 5/7. The defaults for this function are to leave "o" as missing (NA), and give weights of 1, 1/2, and 0 for visits with "+", "*", and "-" UDS, respectively.

quietly

Should warning messages be muted? Defaults to FALSE.
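
As a hedged illustration of the knvWeights_num description above, the vector below encodes the hypothetical weekly protocol from that example: a mixed week counts as three use days out of seven and a missing week as five out of seven. The object name protocol_weights is an assumption made for this sketch, and only base R is used to build it.

```r
# Hypothetical weights for a protocol with weekly visit blocks; the values are
# taken from the worked example in the knvWeights_num description.
protocol_weights <- c(o = 5 / 7, `+` = 1, `*` = 3 / 7, `-` = 0)

# Backticks are needed because "+", "*", and "-" are not syntactic R names.
# The resulting names are the plain single characters of the use pattern.
names(protocol_weights)
```

A vector like this could then be supplied as knvWeights_num = protocol_weights together with method = "kNV".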

Value

A use pattern string the same length as use_pattern with missing values imputed according to the chosen imputation method.

Details

To replace the UDS for every missing visit with a single, pre-specified value (such as positive), use recode_missing_visits instead. Note also that the use pattern will often still contain missing values after imputation. This happens if all values are missing, if the first values of the use pattern are missing (when LOCF is used), if the first and/or last values of the use pattern are missing (when LOCF-D is used), or if there are back-to-back missing visits (when kNV with k = 1 is used). Because of this, you may need to call recode_missing_visits in a pipeline after this function to replace or remove the remaining non-imputable missing visits.
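
To make the point about leftover missing values concrete, here is a minimal base-R sketch of plain LOCF. The helper locf_sketch is hypothetical and is not the package's implementation; it only shows why leading missing visits have nothing to carry forward. The final gsub() call is a base-R stand-in for a follow-up recoding step, not a call to recode_missing_visits.

```r
# Hypothetical helper (not the package's code): carry the previous visit
# forward into each missing visit, one position at a time.
locf_sketch <- function(use_pattern, missing_is = "o") {
  visits <- strsplit(use_pattern, "")[[1]]
  for (i in seq_along(visits)[-1]) {
    # Only fill when the previous visit is itself non-missing
    if (visits[i] == missing_is && visits[i - 1] != missing_is) {
      visits[i] <- visits[i - 1]
    }
  }
  paste(visits, collapse = "")
}

# The two leading missing visits have no earlier observation to copy
partly_imputed <- locf_sketch("oo+o-oo")

# Stand-in for a second recoding pass: treat what is left as negative
fully_recoded <- gsub("o", "-", partly_imputed, fixed = TRUE)
```

Here partly_imputed keeps its two leading "o" characters, which is why a second pass over the string is still needed.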

If you are using the kNV imputation option, there are some caveats to consider. When rounding an imputed value results in a tie, the tie is broken by the order of the values in the knvWeights_num vector. For instance, consider a subject who had a negative UDS in one week, then a missing UDS for the next week, and then two UDS in the following week (of which one was positive and the other was negative). This is represented by the use pattern "-o*". The default behavior of the kNV method is to impute this to "-**" because the order of the knvWeights_num vector has "+", then "*", then "-" UDS values. In this order, a positive result trumps a mixed result, and a mixed result trumps a negative result. Similarly, the use pattern "+o*" will be imputed to "++*" by default.
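
The tie-breaking behavior described above can be sketched in base R. This is an illustration under stated assumptions, not the package's algorithm: the hypothetical helper knv1_sketch assumes that k = 1 means averaging the weights of the one visit on each side of a missing visit, then picking the symbol whose weight is closest to that average. It also assumes that missing visits at either end of the pattern are left alone.

```r
# Hypothetical helper (not the package's code): nearest-visit imputation with
# one neighbour on each side of a missing visit.
knv1_sketch <- function(use_pattern,
                        weights = c(o = NA, `+` = 1, `*` = 0.5, `-` = 0),
                        missing_is = "o") {
  visits <- strsplit(use_pattern, "")[[1]]
  candidates <- weights[!is.na(weights)]  # symbols that may be imputed
  out <- visits
  for (i in seq_along(visits)) {
    # Skip observed visits and (by assumption) missing visits at either end
    if (visits[i] != missing_is || i == 1 || i == length(visits)) next
    neighbours <- weights[visits[c(i - 1, i + 1)]]
    # A missing or unweighted neighbour means this visit cannot be imputed
    if (anyNA(neighbours)) next
    distance <- abs(candidates - mean(neighbours))
    # which.min() returns the first minimum, so ties follow the vector order
    out[i] <- names(candidates)[which.min(distance)]
  }
  paste(out, collapse = "")
}

knv1_sketch("-o*")   # average weight 0.25 ties "*" with "-"; "*" is listed first
knv1_sketch("+o*")   # average weight 0.75 ties "+" with "*"; "+" is listed first
knv1_sketch("-oo+")  # back-to-back missing visits stay missing
```

Reordering the weights vector would change which symbol wins a tie in this sketch, which mirrors the caveat about the order of knvWeights_num.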

Currently, we allow for many symbols in the use pattern "word", such as "_" for missing by study design, "o" for missing due to protocol non-compliance (the most common form of missing), "+" for positive, "-" for negative, and "*" for mixed positive and negative results. Mixed results usually come up when the visit represents multiple days and there are both positive and negative results in those days; for example, a subject is tested weekly, provides a positive test on Tuesday, and comes back to provide a negative test the following day.
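
Because a use pattern "word" is just a character string with one symbol per visit, it can be inspected with base R before choosing an imputation method. The short sketch below splits a pattern into visits and tallies the symbols; the object names are assumptions made for illustration.

```r
pattern_char <- "__++++*o-------+--+-o-o-o+o+oooooo"

# One element per visit
visits <- strsplit(pattern_char, "")[[1]]

# Tally of each symbol in the pattern
symbol_counts <- table(visits)

# Number and share of visits missing for protocol non-compliance ("o")
n_missing <- sum(visits == "o")
prop_missing <- mean(visits == "o")
```

A high share of "o" visits is a hint that some missing values are likely to remain after imputation.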

Examples

  pattern_char <- "__++++*o-------+--+-o-o-o+o+oooooo"
  impute_missing_visits(pattern_char)
#> [1] "__++++**-------+--+------+++++++++"
  impute_missing_visits(pattern_char, method = "locfD")
#> [1] "__++++**-------+--+------+++oooooo"
  impute_missing_visits(pattern_char, method = "mode")
#> [1] "__++++*--------+--+------+-+------"
  
  pattern2_char <- "ooooooooooo"
  impute_missing_visits(pattern2_char)
#> Warning: No non-missing visits. No imputation done.
#> [1] "ooooooooooo"