Getting Started
Prerequisites
- Python 3.12+
- R (required for Propensity Score Matching via MatchIt)
- Git
Installation
Running the Pipeline
Basic Usage
Examples
CLI Reference
| Argument | Default | Description |
|---|---|---|
| `--data` | (required) | Path to cleaned input dataset (CSV) |
| `-o` / `--outcome` | `all` | Outcome(s) to run |
| `-l` / `--loop` | (none) | Min and max seed for multi-seed runs |
| `-d` / `--dir` | `""` | Output directory for results |
| `--type` | (none) | Endpoint type for custom outcomes: `logical`, `integer`, `survival` |
| `--majority` | `1` | Value representing the majority group in the PSM column |
| `--split` | `RaceEth` | Column used to define majority/minority groups |
| `--match` | `age is_female` | Columns to match on during PSM |
| `--group_size` | `500` | Size of each PSM group |
| `--heldout_size` | `100` | Size of the held-out evaluation set |
| `--heldout_set_percent_majority` | `58` | Percent majority in the held-out set |
| `--data_only` | `False` | Skip model training, save preprocessed data only |
| `-p` / `--prof` | (none) | Profiling mode: `simple` or `complex` |
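To make the defaults in the table concrete, here is a minimal `argparse` sketch that mirrors them. It is an illustration only, not the project's actual parser: the function name `build_parser`, the argument types (for example, treating `--majority` and `--heldout_set_percent_majority` as integers), and the use of `nargs` for multi-value flags are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the CLI reference table."""
    parser = argparse.ArgumentParser(description="Sketch of the pipeline CLI")
    parser.add_argument("--data", required=True,
                        help="Path to cleaned input dataset (CSV)")
    # Assumption: several outcomes can be passed, separated by spaces.
    parser.add_argument("-o", "--outcome", nargs="+", default=["all"],
                        help="Outcome(s) to run")
    # Assumption: the seed range is given as two integers.
    parser.add_argument("-l", "--loop", nargs=2, type=int, default=None,
                        metavar=("MIN_SEED", "MAX_SEED"),
                        help="Min and max seed for multi-seed runs")
    parser.add_argument("-d", "--dir", default="",
                        help="Output directory for results")
    parser.add_argument("--type", default=None,
                        choices=["logical", "integer", "survival"],
                        help="Endpoint type for custom outcomes")
    # Assumption: the majority value is an integer code.
    parser.add_argument("--majority", type=int, default=1,
                        help="Value representing the majority group")
    parser.add_argument("--split", default="RaceEth",
                        help="Column used to define majority/minority groups")
    parser.add_argument("--match", nargs="+", default=["age", "is_female"],
                        help="Columns to match on during PSM")
    parser.add_argument("--group_size", type=int, default=500,
                        help="Size of each PSM group")
    parser.add_argument("--heldout_size", type=int, default=100,
                        help="Size of the held-out evaluation set")
    parser.add_argument("--heldout_set_percent_majority", type=int, default=58,
                        help="Percent majority in the held-out set")
    parser.add_argument("--data_only", action="store_true",
                        help="Skip model training, save preprocessed data only")
    parser.add_argument("-p", "--prof", default=None,
                        choices=["simple", "complex"],
                        help="Profiling mode")
    return parser


# Example: a multi-seed run over seeds 0 through 4 (file name is a placeholder).
example_args = build_parser().parse_args(["--data", "cleaned.csv", "-l", "0", "4"])
print(example_args.loop, example_args.group_size, example_args.match)
```

Only `--data` is required; every other flag falls back to the default listed in the table.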
Viewing Results
After a run, your output directory contains:
results/
├── logs/ # Execution logs per run
├── heldout_predictions/ # Held-out set predictions (CSV)
├── subset_predictions/ # Subset predictions by ratio (CSV)
├── heldout_evaluations/ # Metrics on the held-out set (CSV)
├── subset_evaluations/ # Metrics per demographic ratio (CSV)
└── experiments/ # Experiment metadata (JSON)
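As a quick sanity check after a run, the layout above can be verified with a few lines of Python. This is a sketch, not part of the pipeline: the helper name `missing_subdirs` is hypothetical, and it assumes the six subdirectories sit directly under the output directory you passed with `-d` / `--dir`.

```python
from pathlib import Path

# Subdirectory names taken from the layout listed above.
EXPECTED_SUBDIRS = [
    "logs",
    "heldout_predictions",
    "subset_predictions",
    "heldout_evaluations",
    "subset_evaluations",
    "experiments",
]


def missing_subdirs(output_dir):
    """Return the expected subdirectories that do not exist under output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

Calling `missing_subdirs("results")` returns an empty list when every expected folder is present.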
Reading the evaluation CSVs
Each row in an evaluation CSV corresponds to a different majority/minority demographic ratio. The rightmost columns are the evaluation metrics. Compare rows to see how model performance changes as the training cohort composition shifts.
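The row-by-row comparison can be sketched with the standard `csv` module. The column names `majority_percent` and `auc` and the numbers in the sample are placeholders invented for illustration, not real pipeline output; substitute the headers from your own evaluation files.

```python
import csv
import io


def load_evaluation_rows(csv_file):
    """Read an evaluation CSV from an open text file into a list of dicts."""
    return list(csv.DictReader(csv_file))


def metric_by_row(rows, label_column, metric_column):
    """Pair each row's ratio label with one metric, converted to float."""
    return [(row[label_column], float(row[metric_column])) for row in rows]


# Placeholder data standing in for a file under subset_evaluations/.
# Column names and values are hypothetical.
sample = io.StringIO("majority_percent,auc\n50,0.71\n75,0.74\n100,0.72\n")
rows = load_evaluation_rows(sample)
for label, value in metric_by_row(rows, "majority_percent", "auc"):
    print(f"{label}% majority: auc={value:.2f}")
```

For a real file, pass `open(path, newline="")` instead of the in-memory sample.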
R Package Installation
The first time you run PSM, you will be prompted to install R packages. Select 1 for both MatchIt and optmatch. This only happens once.