Given a use pattern string with missing visits, make naive imputations for each missing visit.
Arguments
- use_pattern
A character string showing the daily, by visit, or weekly substance use pattern for a single subject
- method
Which naive imputation method should be used? Currently supported options are "locf" (last observation carried forward), "locfD" (last observation carried forward until dropout), "mode" (most common non-missing value), and "kNV" (k nearest visits).
- missing_is
Which single character is used to mark missing UDS in a use pattern string? Defaults to "o".
- mixed_is
Which single character is used to mark mixed UDS (both positive and negative UDS for the visit block) in a use pattern string? Defaults to "*". When imputing by the mode, all mixed result UDS will be assigned the tiebreaker value in order to calculate the mode but will remain unchanged in the returned use pattern string.
- tiebreaker
In the event of ties between two modes, should positive or negative UDS be the mode? Defaults to positive ("+").
- k
The number of nearest visits to use in kNV imputation. This defaults to 1; we recommend that this parameter stays at 1 unless the use patterns in your data have extraordinarily few missing values.
- knvWeights_num
A named vector matching the use pattern word "letters" to their numerical use values. The names of this vector should match the "letters" of the use pattern word exactly; use backticks to escape special characters. For example, if the study protocol counts a mixed result (one positive and one negative UDS in a single observation period [week]) as worth three "use days", then mixed results should have a weight of 3/7. Similarly, a study protocol may count missing values as five "use days" out of a week, in which case missing results would have a weight of 5/7. The defaults for this function are to leave "o" as missing (NA), and give weights of 1, 1/2, and 0 for visits with "+", "*", and "-" UDS, respectively.
- quietly
Should warning messages be muted? Defaults to FALSE.
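The interaction between mixed_is and tiebreaker under mode imputation can be illustrated with a minimal base R sketch. The helper mode_sketch() below is hypothetical and is not the package's implementation; it assumes that only "+", "-", and the mixed symbol count toward the mode, and that every missing visit receives the winning value.

```r
# Hypothetical helper (not part of the package): impute missing visits
# with the most common non-missing value.
mode_sketch <- function(use_pattern,
                        missing_is = "o",
                        mixed_is = "*",
                        tiebreaker = "+") {
  visits <- strsplit(use_pattern, split = "")[[1]]

  # Assumption: only positive, negative, and mixed visits are counted
  observed <- visits[visits %in% c("+", "-", mixed_is)]
  # Mixed visits are counted as the tiebreaker value, but only for the tally
  observed[observed == mixed_is] <- tiebreaker

  n_pos <- sum(observed == "+")
  n_neg <- sum(observed == "-")
  if (n_pos + n_neg == 0) {
    return(use_pattern)  # nothing observed, so nothing to impute
  }

  mode_value <- if (n_pos > n_neg) {
    "+"
  } else if (n_neg > n_pos) {
    "-"
  } else {
    tiebreaker
  }

  # Mixed visits themselves are left untouched in the returned string
  visits[visits == missing_is] <- mode_value
  paste(visits, collapse = "")
}

mode_sketch("+*o-")  # the mixed visit is tallied as "+", so the mode is "+"
mode_sketch("+o-")   # one positive and one negative: the tiebreaker decides
```

In the first call the mixed visit stays "*" in the result even though it was counted as positive when finding the mode.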
Value
A use pattern string the same length as use_pattern with missing values imputed according to the chosen imputation method.
Details
If you would like to replace all UDS for missing visits with a single,
pre-specified value (such as positive), please use recode_missing_visits
instead. Note also that missing values may remain in the use pattern even
after imputation. This occurs if all the values are missing, if the first
values of the use pattern are missing (if LOCF is used), if the first and/or
last values of the use pattern are missing (if LOCF-D is used), or if there
are back-to-back missing visits (if kNV with k = 1 is used). Because of this,
you may need to call recode_missing_visits in a pipeline after this function
to replace or remove the remaining non-imputable missing visits.
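To see why leading missing visits survive LOCF, consider the minimal base R sketch below. The helper locf_sketch() is hypothetical and is not the package's implementation; it simply copies the previous symbol, whatever it is, onto each missing visit.

```r
# Hypothetical helper (not part of the package): carry the last observed
# symbol forward over each missing visit.
locf_sketch <- function(use_pattern, missing_is = "o") {
  visits <- strsplit(use_pattern, split = "")[[1]]
  for (i in seq_along(visits)) {
    # Impute only when a previous visit exists and is itself non-missing;
    # earlier imputations are reused, so runs of missing visits are filled
    if (i > 1 && visits[i] == missing_is && visits[i - 1] != missing_is) {
      visits[i] <- visits[i - 1]
    }
  }
  paste(visits, collapse = "")
}

locf_sketch("oo+o-oo")      # the two leading "o" have nothing to carry forward
locf_sketch("ooooooooooo")  # no observed visits at all, so nothing changes
```

The leading "o" symbols in the first call are exactly the kind of residual missing visits that a later recode step would need to handle.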
If you are using the kNV imputation option, there are some caveats to
consider. Due to rounding rules, any rounding ties are broken by the order
of the values in the knvWeights_num vector. For instance, consider a subject
who had a negative UDS in one week, then a missing UDS for the next week, and
then two UDS in the following week (of which one was positive and the other
was negative). This is represented by the use pattern "-o*". The default
behavior of the kNV method is to impute this to "-**" because the default
knvWeights_num vector lists "+", then "*", then "-" UDS values. In this
order, a positive result trumps a mixed result, and a mixed result trumps a
negative result. Similarly, the use pattern "+o*" will be imputed to "++*"
by default.
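The tie-breaking behavior described above can be reproduced with a minimal base R sketch for k = 1. The helper knv1_sketch() is hypothetical and is not the package's implementation; it assumes that a missing visit is imputed from the average weight of its two immediate neighbors, and that the symbol whose weight is closest to that average wins.

```r
# Hypothetical helper (not part of the package): impute a missing visit
# from its two immediate neighbours (k = 1 on each side).
knv1_sketch <- function(use_pattern,
                        missing_is = "o",
                        weights = c(`+` = 1, `*` = 0.5, `-` = 0)) {
  visits <- strsplit(use_pattern, split = "")[[1]]
  if (length(visits) < 3) {
    return(use_pattern)  # no visit has two neighbours
  }

  imputed <- visits
  for (i in 2:(length(visits) - 1)) {
    # Neighbours are read from the original pattern, so imputed values
    # never feed later imputations
    neighbours <- visits[c(i - 1, i + 1)]
    if (visits[i] == missing_is && all(neighbours %in% names(weights))) {
      target <- mean(weights[neighbours])
      # which.min() returns the first minimum, so a tie goes to the symbol
      # listed earliest in `weights`
      imputed[i] <- names(weights)[which.min(abs(weights - target))]
    }
  }
  paste(imputed, collapse = "")
}

knv1_sketch("-o*")   # average 1/4 is equally close to "*" and "-"
knv1_sketch("+o*")   # average 3/4 is equally close to "+" and "*"
knv1_sketch("-oo+")  # back-to-back missing visits are left alone
```

Under these assumptions, reordering the weights vector changes which symbol wins a tie, which is why the order of knvWeights_num matters.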
Currently, we allow for many symbols in the use pattern "word", such as "_" for missing by study design, "o" for missing due to protocol non-compliance (the most common form of missingness), "+" for positive, "-" for negative, and "*" for mixed positive and negative results. Mixed results usually arise when the visit represents multiple days and there are both positive and negative results in those days; for example, a subject who is tested weekly provides a positive test on Tuesday but comes back to provide a negative test the following day.
Examples
pattern_char <- "__++++*o-------+--+-o-o-o+o+oooooo"
impute_missing_visits(pattern_char)
#> [1] "__++++**-------+--+------+++++++++"
impute_missing_visits(pattern_char, method = "locfD")
#> [1] "__++++**-------+--+------+++oooooo"
impute_missing_visits(pattern_char, method = "mode")
#> [1] "__++++*--------+--+------+-+------"
pattern2_char <- "ooooooooooo"
impute_missing_visits(pattern2_char)
#> Warning: No non-missing visits. No imputation done.
#> [1] "ooooooooooo"