MatchBalance {Matching} | R Documentation |

This function provides a variety of balance statistics useful for
determining whether balance exists in any unmatched dataset and
in matched datasets produced by the `Match` function. Matching is
performed by the `Match` function, and `MatchBalance` is used to
determine whether `Match` was successful in achieving balance on the
observed covariates.

MatchBalance(formul, data = NULL, match.out = NULL, ks = TRUE, nboots=500, weights=NULL, digits=5, paired=TRUE, print.level=1)

`formul` |
This formula does not estimate any model. The formula is
simply an efficient way to use the R modeling language to list the
variables we wish to obtain univariate balance statistics for. The
dependent variable in the formula is usually the treatment
indicator. One should include many functions of the observed
covariates. Generally, one should request balance statistics on
more higher-order terms and interactions than were used to conduct
the matching itself. |

`data` |
A data frame which contains all of the variables in the formula. If a data frame is not provided, the variables are obtained via lexical scoping. |

`match.out` |
The output object from the `Match`
function. If this output is included, `MatchBalance` will provide
balance statistics for both before and after matching. Otherwise
balance statistics will only be reported for the raw unmatched
data. |

`ks` |
A logical flag for whether the univariate bootstrap
Kolmogorov-Smirnov (KS) test should be calculated. If `ks` is
`TRUE`, the univariate KS test is calculated for all
non-dichotomous variables. The bootstrap KS test is consistent even
for non-continuous variables. See `ks.boot` for more
details. |

`weights` |
An optional vector of observation specific weights. |

`nboots` |
The number of bootstrap samples to be run. If zero, no
bootstraps are done. Bootstrapping is highly recommended because
the bootstrapped Kolmogorov-Smirnov test provides correct coverage
even when the distributions being compared are not continuous. At
least 500 `nboots` (preferably 1000) are recommended for
publication quality p-values. |

`digits` |
The number of significant digits that should be displayed. |

`paired` |
A flag for whether the paired `t.test` should be
used after matching. Regardless of the value of this option, an
unpaired `t.test` is done for the unmatched data because
it is assumed that the unmatched data were not generated by a paired
experiment. |

`print.level` |
The amount of printing to be done. If zero, there is no printing. If one, the results are summarized. If two, details of the computations are printed. |
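As the `formul` entry above notes, the formula only lists terms and no model is fitted. The following is a minimal base R sketch of how such a formula expands into the columns that would be checked. The data frame `df` and its values are hypothetical, and `model.matrix` is used here purely for illustration; it is not a claim about how `MatchBalance` parses the formula internally.

```r
# Hypothetical toy data; the column names and values are illustrative only.
df <- data.frame(
  treat = c(1, 0, 1, 0, 1, 0),
  age   = c(25, 30, 35, 40, 45, 50),
  educ  = c(10, 12, 12, 16, 11, 14)
)

# The formula lists the terms to check; no model is estimated.
# Higher-order terms and an interaction are requested alongside the main effects.
balance_formula <- treat ~ age + I(age^2) + educ + I(educ^2) + age:educ

# model.matrix() shows the columns that the right-hand side expands to.
terms_checked <- colnames(model.matrix(balance_formula, data = df))
print(terms_checked)
```

Each column other than the intercept corresponds to one term for which univariate balance statistics could be requested.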

This function can be used to determine if there is balance in the pre-
and/or post-matching datasets. Differences of means between treatment
and control groups are provided, as well as a variety of summary
statistics for the empirical CDF (eCDF) and empirical-QQ (eQQ) plot
between the two groups. The eCDF results are the standardized mean,
median and maximum differences in the empirical CDF. The eQQ results
are summaries of the raw differences in the empirical-QQ plot.
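The eCDF and eQQ summaries described above can be sketched in base R as follows. This is only an illustration of the idea, assuming absolute differences evaluated at the pooled data points (eCDF) and at matched quantiles (eQQ). The helper names `ecdf_summary` and `eqq_summary` are hypothetical, and the exact computation used by `qqstats` may differ.

```r
# Mean, median and maximum gap between the two empirical CDFs,
# evaluated at every distinct value in the pooled data.
ecdf_summary <- function(treated, control) {
  grid <- sort(unique(c(treated, control)))
  gaps <- abs(ecdf(treated)(grid) - ecdf(control)(grid))
  c(mean = mean(gaps), median = median(gaps), max = max(gaps))
}

# Mean, median and maximum gap between matched quantiles of the two groups,
# on the raw scale of the variable.
eqq_summary <- function(treated, control) {
  n <- min(length(treated), length(control))
  probs <- seq(0, 1, length.out = n)
  gaps <- abs(quantile(treated, probs, names = FALSE) -
              quantile(control, probs, names = FALSE))
  c(mean = mean(gaps), median = median(gaps), max = max(gaps))
}

# Hypothetical covariate values: the control group is shifted up by one unit.
treated <- c(1, 2, 3, 4)
control <- c(2, 3, 4, 5)
ecdf_result <- ecdf_summary(treated, control)
eqq_result  <- eqq_summary(treated, control)
print(ecdf_result)
print(eqq_result)
```

The eCDF gaps are on the probability scale and so are comparable across covariates, whereas the eQQ gaps are in the units of the covariate itself.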

Two univariate tests are also provided: the t-test and the bootstrap
Kolmogorov-Smirnov (KS) test. These tests should not be treated as
hypothesis tests in the usual fashion because we wish to maximize
balance without limit. The bootstrap KS test is highly
recommended (see the `ks` and `nboots` options) because the
bootstrap KS is consistent even for non-continuous distributions.
Before matching, the two-sample t-test is used; after matching, the
paired t-test is used unless `paired` is set to `FALSE`.
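To illustrate why bootstrapping helps with non-continuous data, here is a minimal base R sketch of a bootstrap KS test. It assumes resampling from the pooled sample under the null of equal distributions. The helpers `ks_stat` and `boot_ks_pvalue` are hypothetical names, and this is not the implementation in `ks.boot`.

```r
# Two-sample KS statistic: the largest gap between the two empirical CDFs.
ks_stat <- function(x, y) {
  grid <- sort(unique(c(x, y)))
  max(abs(ecdf(x)(grid) - ecdf(y)(grid)))
}

# Bootstrap p-value: the share of resampled statistics at least as large
# as the observed one, resampling with replacement from the pooled data.
boot_ks_pvalue <- function(treated, control, nboots = 500) {
  observed <- ks_stat(treated, control)
  pooled   <- c(treated, control)
  n_t      <- length(treated)
  n        <- length(pooled)
  boot_stats <- replicate(nboots, {
    resampled <- sample(pooled, size = n, replace = TRUE)
    ks_stat(resampled[seq_len(n_t)], resampled[(n_t + 1):n])
  })
  mean(boot_stats >= observed)
}

# Hypothetical discrete covariate with many ties, where the usual
# continuous-distribution KS p-value is not reliable.
set.seed(1)
treated <- c(1, 1, 2, 2, 3, 3, 4, 5)
control <- c(1, 2, 2, 3, 3, 3, 4, 4)
p <- boot_ks_pvalue(treated, control, nboots = 200)
print(p)
```

Because the reference distribution is built from the data themselves, ties in the pooled sample are reflected in the resampled statistics rather than assumed away.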

Two multivariate tests are provided: the KS and Chi-Square null deviance tests. The KS test is to be preferred over the Chi-Square test because the Chi-Square test is not testing the relevant hypothesis. The null hypothesis for the KS test is equal balance in the estimated probabilities between treated and control. The null hypothesis for the Chi-Square test, however, is that all of the parameters are insignificant, that is, a comparison of residual versus null deviance. If the covariates being considered are discrete, this KS test is asymptotically nonparametric as long as the logit model does not produce zero parameter estimates.

`NA`s are handled by the `na.action` option. However, it
is highly recommended that `NA`s not simply be deleted; instead,
one should check to make sure that missingness is balanced.
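One way to check that missingness is balanced is to turn it into an indicator variable and compare its rate across treatment groups. The sketch below uses base R only. The data frame `df`, the column `re74_missing`, and the values are hypothetical.

```r
# Hypothetical data with missing values in one covariate.
df <- data.frame(
  treat = c(1, 1, 1, 1, 0, 0, 0, 0),
  re74  = c(NA, 1200, NA, 800, 950, NA, 1100, 700)
)

# Indicator that equals 1 when the covariate is missing.
df$re74_missing <- as.numeric(is.na(df$re74))

# Share of missing values within each treatment group.
miss_rate <- tapply(df$re74_missing, df$treat, mean)
print(miss_rate)
```

An indicator of this kind could then be listed in `formul` like any other covariate, so that its balance is reported alongside the rest.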

`BeforeMatching` |
A list containing the before matching univariate
balance statistics. That is, a list containing the results of
the `balanceUV` function applied to all of the
covariates described in `formul`. Note that the univariate
test results for all of the variables in `formul` are printed
if `print.level > 0`. |

`AfterMatching` |
A list containing the after matching univariate
balance statistics. That is, a list containing the results of
the `balanceUV` function applied to all of the
covariates described in `formul`. Note that the univariate
test results for all of the variables in `formul` are printed
if `print.level > 0`. This object is `NULL` if no matched
dataset was provided. |

`BMsmallest.p.value` |
The smallest p-value found across all of the
before matching balance tests (including t-tests and KS-tests). |

`BMsmallestVarName` |
The name of the variable with the
`BMsmallest.p.value` (a vector in case of ties). |

`BMsmallestVarNumber` |
The number of the variable with the
`BMsmallest.p.value` (a vector in case of ties). |

`AMsmallest.p.value` |
The smallest p-value found across all of the
after matching balance tests (including t-tests and
KS-tests). |

`AMsmallestVarName` |
The name of the variable with the
`AMsmallest.p.value` (a vector in case of ties). |

`AMsmallestVarNumber` |
The number of the variable with the
`AMsmallest.p.value` (a vector in case of ties). |

Jasjeet S. Sekhon, UC Berkeley, sekhon@berkeley.edu, http://sekhon.berkeley.edu/.

Sekhon, Jasjeet S. 2011. "Multivariate and Propensity Score
Matching Software with Automated Balance Optimization."
*Journal of Statistical Software* 42(7): 1-52.
http://www.jstatsoft.org/v42/i07/

Diamond, Alexis and Jasjeet S. Sekhon. 2005. "Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies." Working Paper. http://sekhon.berkeley.edu/papers/GenMatch.pdf

Sekhon, Jasjeet Singh and Richard D. Grieve. 2011. "A Matching Method
For Improving Covariate Balance in Cost-Effectiveness Analyses."
*Health Economics*. forthcoming.

Abadie, Alberto. 2002. "Bootstrap Tests for Distributional Treatment
Effects in Instrumental Variable Models." *Journal of the
American Statistical Association*, 97:457 (March) 284-292.

Hall, Peter. 1992. *The Bootstrap and Edgeworth Expansion*. New
York: Springer-Verlag.

Wilcox, Rand R. 1997. *Introduction to Robust Estimation*. San
Diego, CA: Academic Press.

Conover, William J. 1971. *Practical Nonparametric Statistics*.
New York: John Wiley & Sons. Pages 295-301 (one-sample
"Kolmogorov" test), 309-314 (two-sample "Smirnov" test).

Shao, Jun and Dongsheng Tu. 1995. *The Jackknife and Bootstrap*.
New York: Springer-Verlag.

Also see `Match`, `GenMatch`, `balanceUV`, `qqstats`, `ks.boot`,
`GerberGreenImai`, `lalonde`

    #
    # Replication of Dehejia and Wahba psid3 model
    #
    # Dehejia, Rajeev and Sadek Wahba. 1999. "Causal Effects in
    # Non-Experimental Studies: Re-Evaluating the Evaluation of Training
    # Programs." Journal of the American Statistical Association 94 (448):
    # 1053-1062.
    #
    library(Matching)
    data(lalonde)

    #
    # Estimate the propensity model
    #
    glm1 <- glm(treat ~ age + I(age^2) + educ + I(educ^2) + black +
                  hisp + married + nodegr + re74 + I(re74^2) + re75 +
                  I(re75^2) + u74 + u75,
                family = binomial, data = lalonde)

    #
    # Save data objects
    #
    X  <- glm1$fitted
    Y  <- lalonde$re78
    Tr <- lalonde$treat

    #
    # One-to-one matching with replacement (the "M=1" option).
    # Estimating the treatment effect on the treated
    # (the "estimand" option, which defaults to "ATT").
    #
    rr <- Match(Y = Y, Tr = Tr, X = X, M = 1)

    # Let's summarize the output
    summary(rr)

    # Let's check the covariate balance.
    # 'nboots' is set to a small value in the interest of speed.
    # Please increase to at least 500 for publication quality p-values.
    mb <- MatchBalance(treat ~ age + I(age^2) + educ + I(educ^2) + black +
                         hisp + married + nodegr + re74 + I(re74^2) + re75 +
                         I(re75^2) + u74 + u75,
                       data = lalonde, match.out = rr, nboots = 10)