Multivariate and Propensity Score Matching Software for Causal Inference

Jasjeet S. Sekhon

This website distributes "Matching", an R package for estimating causal effects by multivariate and propensity score matching. The package provides functions for multivariate and propensity score matching and for finding optimal balance with a genetic search algorithm. It also provides a variety of univariate and multivariate tests to determine whether balance has been obtained. These tests can also be used to determine whether an experiment or quasi-experiment is balanced on baseline covariates.

For an introduction to the package with documentation and examples, please see "Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R." Journal of Statistical Software, 42(7): 1-52. 2011. The following two papers provide examples where GenMatch() is able to recover experimental benchmarks using observational data: "Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies" and "A New Non-Parametric Matching Method for Bias Adjustment with Applications to Economic Evaluations".

Match() is the fastest multivariate and propensity score matching function I know of. Maximum speed is achieved when one uses the replace=FALSE and/or ties=FALSE options (see the Match() help for details), but the most reliable estimates are obtained with the default settings: replace=TRUE and ties=TRUE. GenMatch() supports the use of multiple computers, CPUs or cores to perform parallel computations. Examples are provided for how to use multiple chips on the same computer and for how to use multiple computers to perform parallel calculations. A Change Log is available which tracks changes across versions.
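As a rough illustration of the trade-off described above, the following sketch runs Match() once with the default settings and once with ties=FALSE. It uses the lalonde data shipped with the package; the propensity score specification is an assumption made only for this example, not a recommended model.

```r
library(Matching)
data(lalonde)

# Outcome, treatment indicator, and an illustrative (assumed) propensity model
Y  <- lalonde$re78
Tr <- lalonde$treat
ps <- glm(treat ~ age + educ + black + hisp + married + nodegr + re74 + re75,
          family = binomial, data = lalonde)$fitted.values

# Default settings: matching with replacement, tied matches kept and weighted
m.default <- Match(Y = Y, Tr = Tr, X = ps, estimand = "ATT",
                   replace = TRUE, ties = TRUE)

# Faster setting: tied matches are broken at random instead of being kept
m.fast <- Match(Y = Y, Tr = Tr, X = ps, estimand = "ATT",
                replace = TRUE, ties = FALSE)

summary(m.default)
summary(m.fast)
```

On a dataset this small the speed difference is negligible; the options matter mainly for large datasets with many ties.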

The easiest way to install the latest version (4.8-2), if you have an active network connection, is to type in an R session:
> install.packages("Matching", dependencies=TRUE)

Also make sure that the latest version of rgenoud is installed:
> install.packages("rgenoud")

Alternatively, the package may be directly downloaded:
Source package: Matching_4.8-2.tar.gz
Windows binary package: Matching_4.8-2.zip
Mac OS X universal binary package: Matching_4.8-2.tgz
Also make sure to download and install the rgenoud package.
Other binary packages: http://www.cran.r-project.org/bin

The package includes the following main user-facing functions, two replication datasets and three demos:
GenMatch(): finds optimal balance using multivariate matching where a genetic search algorithm determines the weight each covariate is given. The user can choose which function of covariate balance to optimize from a list or provide one of her own.

Match(): performs multivariate and propensity score matching.

MatchBalance(): provides a variety of univariate and multivariate tests to determine if balance exists.

Matchby(): a wrapper for Match() which separates the matching problem into subgroups defined by a factor. For large datasets it is much faster than Match() itself.

qqstats()
ks.boot()
balanceUV()
Gerber, Green and Imai data
LaLonde data
AbadieImbens demo
DehejiaWahba demo
GerberGreenImai demo
Examples of how to use multiple chips on the same computer to perform parallel computations
Examples of how to use multiple computers to perform parallel calculations
General R Documentation
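
The main functions listed above are typically used together. The following is a minimal sketch of that workflow on the lalonde data, assuming an illustrative set of covariates; pop.size and max.generations are set far too low for real work and are chosen only so the example runs quickly.

```r
library(Matching)
data(lalonde)

Y  <- lalonde$re78
Tr <- lalonde$treat
# Covariates to match on (an illustrative subset, not a recommended specification)
X  <- cbind(lalonde$age, lalonde$educ, lalonde$black, lalonde$hisp,
            lalonde$married, lalonde$nodegr, lalonde$re74, lalonde$re75)

# Genetic search for the covariate weights; small search settings for speed only
gen <- GenMatch(Tr = Tr, X = X, BalanceMatrix = X, estimand = "ATT",
                M = 1, pop.size = 16, max.generations = 10,
                wait.generations = 1, print.level = 0)

# Use the weights found by GenMatch() in the final match
mout <- Match(Y = Y, Tr = Tr, X = X, estimand = "ATT", Weight.matrix = gen)
summary(mout)

# Report balance before and after matching
mb <- MatchBalance(treat ~ age + educ + black + hisp + married + nodegr +
                     re74 + re75,
                   data = lalonde, match.out = mout, nboots = 500)
```

The outcome Y is not passed to GenMatch(), so the weights are chosen from covariate balance alone.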

The package is under active development so please check back for updates. Please cite the software as follows:
Sekhon, Jasjeet S. 2011. "Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R." Journal of Statistical Software. 42(7): 1-52.

GenMatch() can make use of multiple chips on the same computer, or of multiple computers, to perform parallel computations. Examples are provided for both cases, and the Journal of Statistical Software article also discusses parallel computation.
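
A minimal sketch of the single-machine case, assuming two local worker processes started with the parallel package and passed to GenMatch() through its cluster argument. The worker count, the covariates and the small search settings are arbitrary choices for illustration.

```r
library(Matching)
library(parallel)
data(lalonde)

Tr <- lalonde$treat
X  <- cbind(lalonde$age, lalonde$educ, lalonde$re74, lalonde$re75)

# Start two local worker processes (the count is an arbitrary choice here)
cl <- makeCluster(2)

# GenMatch() spreads its work across the workers in the cluster
gen <- GenMatch(Tr = Tr, X = X, BalanceMatrix = X, estimand = "ATT",
                pop.size = 16, max.generations = 5, wait.generations = 1,
                cluster = cl, print.level = 0)

# Always release the workers when done
stopCluster(cl)

gen$Weight.matrix
```

For a problem this small the communication overhead will likely outweigh any gain; parallel runs pay off with larger datasets and population sizes.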

The following paper describes GenMatch() in detail and discusses its theoretical properties: "Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies." The paper presents Monte Carlo experiments which illustrate GenMatch's properties, and provides real data examples where GenMatch recovers the experimental benchmark. Also see the paper entitled "A New Non-Parametric Matching Method for Bias Adjustment with Applications to Economic Evaluations," where GenMatch is used to recover another experimental benchmark.

Also see my "Alternative Balance Metrics for Bias Reduction in Matching Methods for Causal Inference" paper which critically reviews various ways to measure balance. Cumulative probability distribution functions of standardized statistics are advocated as balance metrics. Formal hypothesis tests of balance should not be conducted, as is common in the matching literature, because no measure of balance is a monotonic function of bias and because balance should be optimized without limit. However, descriptive measures of discrepancy ignore information related to bias which is captured by probability distribution functions of standardized statistics.

The rbounds package by Luke Keele implements a number of Rosenbaum's methods of sensitivity analysis for matched data. One can conduct sensitivity analyses for matched data with binary, ordinal or continuous outcomes, and for matched data with multiple control units matched to each treated unit. The package is designed to work with the object returned by the Match() function.
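
A hedged sketch of passing a Match() object to rbounds. It assumes that rbounds is installed and that its psens() function accepts a Match() object as its first argument, which may depend on the rbounds version; the propensity score model and the Gamma values are illustrative assumptions only.

```r
library(Matching)
data(lalonde)

Y  <- lalonde$re78
Tr <- lalonde$treat
ps <- glm(treat ~ age + educ + black + hisp + married + nodegr + re74 + re75,
          family = binomial, data = lalonde)$fitted.values

# One-to-one matching without replacement gives matched pairs
mout <- Match(Y = Y, Tr = Tr, X = ps, estimand = "ATT", replace = FALSE)

# Rosenbaum bounds for the signed-rank test, run only if rbounds is installed.
# Assumption: psens() takes the Match() object directly.
if (requireNamespace("rbounds", quietly = TRUE)) {
  print(rbounds::psens(mout, Gamma = 2, GammaInc = 0.25))
}
```

In practice one would check balance on the matched data before running the sensitivity analysis; that step is skipped here for brevity.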

The Matching software was used to produce the following working paper: The Varying Role of Voter Information Across Democratic Societies. The robust propensity score methods discussed in the paper will be included in a future version.

The core matching estimator which is implemented is that of Alberto Abadie and Guido Imbens. This algorithm provides principled standard errors when matching is done with covariates or a known propensity score. Ties are handled in a deterministic and coherent fashion. For details see Large Sample Properties of Matching Estimators for Average Treatment.
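
To show where these standard errors appear in practice, the sketch below matches directly on covariates and reads the estimate and both standard errors from the returned object. The choice of covariates and the use of BiasAdjust=TRUE are assumptions made for illustration.

```r
library(Matching)
data(lalonde)

Y  <- lalonde$re78
Tr <- lalonde$treat
X  <- cbind(lalonde$age, lalonde$educ, lalonde$re74, lalonde$re75)

# Match directly on covariates, with regression bias adjustment
mout <- Match(Y = Y, Tr = Tr, X = X, estimand = "ATT", M = 1,
              BiasAdjust = TRUE)

mout$est          # point estimate of the treatment effect on the treated
mout$se           # Abadie-Imbens standard error
mout$se.standard  # the usual standard error on the matched data, for comparison
```

With the default ties=TRUE, tied control observations are all retained and weighted, so repeated runs on the same data give the same result.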

Significant performance enhancements were provided by Nate Begeman (Mac OS X Performance Group at Apple). "Matching" also relies on a modified version of the Scythe Statistical Library developed by Andrew Martin, Kevin Quinn and Daniel Pemstein. My modified version of the library is included in the "Matching" package.

For more details on matching and causal inference see Jonathan Wand's Reading List.
