Propensity Score Matching (PSM) in Economics

Opinion · 2009-03-02 00:00

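Propensity Score Matching (PSM), which I tentatively translate into Chinese as "倾向评分匹配", is a non-experimental method that economists, particularly in development and labor economics, have recently begun to adopt. As the name suggests, it consists of two steps: propensity scoring and matching. For data in which treatment and comparison groups were not assigned experimentally, the method approximates an experiment, coming as close as possible to randomized subclassification in order to control for observable variables. Combined with the difference-in-differences (DID) method, it can further control for unobservable variables that do not change over time.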

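The attachment is the text of my presentation at the Cornell University seminar on nutrition economics and household surveys. It summarizes the method's assumptions, estimation steps, and limitations.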

Propensity Score Matching

Xi Chen[1]

The general idea is that those who have been affected by the policy or program are compared with people who are similar in as many respects as possible.

Covariate Matching and Propensity Score Matching (PSM)

PSM constructs a statistical comparison group based on a model of the estimated probability of participating in the treatment, $P(X) = \Pr(T = 1 \mid X)$. Rosenbaum and Rubin (1983) overcame the curse of dimensionality of covariate matching (i.e. matching treated and untreated observations directly on their observable characteristics) by showing that, under certain assumptions, matching on P(X) is as good as matching on X.
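To make P(X) concrete, here is a minimal sketch that fits a probit model of participation by plain gradient ascent on the log-likelihood and returns the estimated scores. It uses only the Python standard library; the function names, learning rate, and iteration count are illustrative assumptions, and in practice one would use a statistics package (in Stata, for example, -probit- or -pscore-).

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution: provides .cdf() and .pdf()


def fit_probit(X, T, lr=0.05, n_iter=2000):
    """Fit Pr(T=1|X) = Phi(b0 + x·b) by gradient ascent on the probit log-likelihood.

    X: list of covariate rows (lists of floats); T: list of 0/1 participation flags.
    Returns the coefficient list [b0, b1, ..., bk].
    """
    k = len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for x, t in zip(X, T):
            z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
            p = min(max(PHI.cdf(z), 1e-10), 1 - 1e-10)  # clip to avoid dividing by zero
            # derivative of one observation's log-likelihood with respect to z
            g = PHI.pdf(z) * (t - p) / (p * (1 - p))
            grad[0] += g
            for j, xi in enumerate(x):
                grad[j + 1] += g * xi
        # step along the average gradient
        beta = [b + lr * gj / len(X) for b, gj in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Estimated P(X) for each row of X under the fitted probit coefficients."""
    return [PHI.cdf(beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))) for x in X]
```

The probit log-likelihood is concave, so a small fixed step converges on well-behaved data; if treatment is perfectly separated by X, the coefficients diverge and no finite maximum exists.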

Two Assumptions of the PSM

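1. Conditional mean independence: after conditioning on the observables X, treatment group members would have the same expected outcome as control group members in the absence of the treatment, $E[Y_0 \mid T = 1, X] = E[Y_0 \mid T = 0, X]$, where $Y_0$ denotes the outcome without treatment.

2. Common support: for all X there is a positive probability of both participating and not participating, $0 < P(X) < 1$, so that a valid match on P(X) can be found for every unit with T = 1.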
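In practice the common support condition is often imposed by discarding units whose estimated scores fall outside the overlap of the two groups' score ranges. The sketch below uses the simple min/max rule (an assumption; trimming by density thresholds is another option); scores and treatment flags are plain Python lists, and the function names are hypothetical.

```python
def common_support(scores, treated):
    """Overlap [lo, hi] of the propensity-score ranges of the two groups (min/max rule)."""
    p_t = [p for p, t in zip(scores, treated) if t == 1]
    p_c = [p for p, t in zip(scores, treated) if t == 0]
    lo = max(min(p_t), min(p_c))
    hi = min(max(p_t), max(p_c))
    return lo, hi


def trim_to_support(scores, treated):
    """Indices of units whose score lies inside the common support."""
    lo, hi = common_support(scores, treated)
    return [i for i, p in enumerate(scores) if lo <= p <= hi]
```

For example, treated scores spanning 0.1 to 0.9 and comparison scores spanning 0.2 to 0.6 give a support of [0.2, 0.6], so the treated units at 0.1, 0.7, and 0.9 are dropped.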

Overall, the assumptions of PSM are quite strong, and conditional mean independence in particular is not testable. PSM requires good data on a complete set of X variables. PSM provides program impact estimates similar to those of experimental methods when a) the same data source is used for participants and non-participants; b) both groups have access to the same markets; and c) there are enough control variables to identify both the probit and the DID equation.

Estimation Steps

1. Use the observables to predict each unit's probability of program participation, $P_i$.

2. Match the treatment group with the comparison group by picking, for each participant, the "nearest non-participant", i.e. the one that minimizes $|\hat{P}(Z_i) - \hat{P}(Z_j)|$, as long as this distance does not exceed some reasonable bound. Here i denotes a participant and j a non-participant.[2] A balancing test follows to make sure that the average propensity score and the mean of each X variable are the "same" within quantiles of the propensity score distribution.[3]

3. (Optional) Run a regression of the outcome variable on a set of individual-level control variables, $X_i$, and the predicted probability of participation, $P_i$.
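Step 2 can be sketched as nearest-neighbor matching on the estimated propensity score, with replacement, within a caliper. The caliper of 0.05 is a hypothetical choice of the "reasonable bound", the scores are assumed to be already estimated, and the function name is illustrative.

```python
def nearest_neighbor_match(scores, treated, caliper=0.05):
    """Match each participant i to the non-participant j minimizing |p_i - p_j|.

    Matching is with replacement (a non-participant can serve several participants).
    Participants whose nearest neighbor is farther than `caliper` stay unmatched.
    Returns a dict {participant_index: non_participant_index}.
    """
    controls = [j for j, t in enumerate(treated) if t == 0]
    matches = {}
    for i, t in enumerate(treated):
        if t != 1 or not controls:
            continue
        j = min(controls, key=lambda c: abs(scores[i] - scores[c]))
        if abs(scores[i] - scores[j]) <= caliper:
            matches[i] = j
    return matches
```

With scores [0.30, 0.52, 0.90] for participants and [0.28, 0.50, 0.60] for non-participants, the first two participants are matched and the one at 0.90 is dropped because its closest comparator (0.60) lies outside the caliper.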

How to choose X

1. PSM estimators have lower bias when X includes variables that affect both program participation and the outcome.

2. The probit used in PSM is not a determinants model, so t-tests and the adjusted R² are not very informative and may be misleading.

3. There is limited guidance on how to select X variables using statistical tests. We must rely on economic models explaining realizations of the outcome and on models of program participation.

How to Measure Mean Treatment Effects

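Method 1: In Normal Form

$\tau_{ATT} = E_{P(X) \mid T = 1}\{ E[Y_1 \mid T = 1, P(X)] - E[Y_0 \mid T = 0, P(X)] \}$

where $Y_1$ is the outcome with the program and $Y_0$ the outcome without it.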

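Method 1': With a Weighting Scheme

$\tau_{ATT} = \frac{1}{N_T} \sum_{i=1}^{N_T} \left( Y_{1i} - \sum_{j=1}^{N_C} W_{ij} Y_{0j} \right)$

where $N_T$ is the number of program participants, $N_C$ is the number of non-participants, $Y_{1i}$ is the outcome of participant i, $Y_{0j}$ is the outcome of non-participant j, and the $W_{ij}$ are the weights, which sum to one for each i. Weighting schemes range from nearest-neighbor weights to non-parametric weights based on kernel functions of the differences in scores.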
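The weighted estimator can be sketched as follows, assuming Gaussian kernel weights (one of the non-parametric schemes mentioned above) with a hypothetical bandwidth of 0.06. Nearest-neighbor weights would instead set W_ij = 1 for the matched non-participant and 0 otherwise. Every participant is assumed to lie on the common support, so its kernel weights do not all underflow to zero; the function names are illustrative.

```python
import math


def kernel_weights(p_i, control_scores, bandwidth=0.06):
    """Gaussian-kernel weights W_ij of participant i over all non-participants,
    normalized so that they sum to one."""
    k = [math.exp(-0.5 * ((p_j - p_i) / bandwidth) ** 2) for p_j in control_scores]
    total = sum(k)
    return [v / total for v in k]


def att(y_treated, p_treated, y_control, p_control, bandwidth=0.06):
    """(1/N_T) * sum_i [Y_1i - sum_j W_ij * Y_0j] using Gaussian-kernel weights."""
    n_t = len(y_treated)
    total = 0.0
    for y_i, p_i in zip(y_treated, p_treated):
        w = kernel_weights(p_i, p_control, bandwidth)
        # weighted average of non-participant outcomes: the estimated counterfactual
        counterfactual = sum(w_j * y_j for w_j, y_j in zip(w, y_control))
        total += y_i - counterfactual
    return total / n_t
```

The bandwidth trades bias for variance: a small bandwidth approaches nearest-neighbor matching, while a large one averages over many comparators.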

Method 2: DID and PSM

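When data on outcomes are available before and after the program begins, the PSM estimator can be improved by subtracting off differences in pre-program outcomes between participants and matched non-participants. The DID PSM estimator removes residual bias due to unobservable, time-invariant differences between the treatment and comparison groups that are not controlled for by conditioning on pre-program variables:

$\tau_{DID} = \frac{1}{N_T} \sum_{i=1}^{N_T} \left[ (Y_{i,\text{after}} - Y_{i,\text{before}}) - \sum_{j=1}^{N_C} W_{ij} (Y_{j,\text{after}} - Y_{j,\text{before}}) \right]$

where i indexes participants and j non-participants, with $N_T$, $N_C$, and $W_{ij}$ as in Method 1'.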
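A minimal sketch of the DID PSM estimator, assuming one-to-one nearest-neighbor weights (W_ij = 1 for the matched non-participant j, 0 otherwise). `matches` is a hypothetical dict mapping each participant's index to its matched non-participant, for example the output of a nearest-neighbor matching step; outcome lists are indexed by unit.

```python
def did_att(matches, y_before, y_after):
    """Average over matched pairs i -> j of
    (Y_i,after - Y_i,before) - (Y_j,after - Y_j,before)."""
    diffs = []
    for i, j in matches.items():
        change_treated = y_after[i] - y_before[i]  # participant's change over time
        change_control = y_after[j] - y_before[j]  # matched non-participant's change
        diffs.append(change_treated - change_control)
    return sum(diffs) / len(diffs)
```

Because each unit's own pre-program outcome is subtracted, any fixed level gap between a participant and its match (a time-invariant unobservable) cancels out.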

Method 3: Cumulative Distribution Functions and Dominance
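The notes give no detail for this method. One common reading is to compare the empirical cumulative distribution functions (CDFs) of outcomes for participants and matched non-participants and check for first-order stochastic dominance: for a desirable outcome such as consumption, the participants' distribution dominates if its CDF lies on or below the comparison CDF everywhere. The sketch below is that interpretation, not a method confirmed by the source; the function names are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Empirical CDF: F(y) = share of observations <= y."""
    s = sorted(sample)
    n = len(s)
    return lambda y: bisect_right(s, y) / n


def first_order_dominates(y_a, y_b):
    """True if F_a(y) <= F_b(y) at every observed outcome value, i.e. the
    distribution of y_a first-order stochastically dominates that of y_b."""
    f_a, f_b = ecdf(y_a), ecdf(y_b)
    # empirical CDFs are step functions that only jump at sample points,
    # so checking the pooled sample values is sufficient
    grid = sorted(set(y_a) | set(y_b))
    return all(f_a(y) <= f_b(y) for y in grid)
```

Here `y_a` would hold participants' outcomes and `y_b` the outcomes of their matched comparators.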

How Does PSM Differ from Other Methods

1. In PSM it is the conditional probability P(Z) that is intended to be balanced between participants and matched comparators, whereas randomization ensures that the participant and comparison groups are identical in the distribution of all characteristics, whether observed or not. Hence there are always concerns about remaining selection bias in PSM estimates.

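2. OLS and PSM: OLS regression imposes a parametric (typically linear) relationship between the outcome and the covariates, while PSM compares the outcomes of matched units non-parametrically. PSM therefore allows estimation of mean impacts without arbitrary assumptions about functional forms and error distributions. This can also facilitate testing for the presence of potentially complex interaction effects.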

3. PSM confines attention to the region of common support, whereas the regression methods commonly found in the literature use the full sample. Impact estimates based on full (unmatched) samples have been found to be generally more biased, and less robust to misspecification of the regression function, than those based on matched samples.

4. The methods also differ in whether preference is given to control variables that one can argue are exogenous to outcomes.

5. It is unknown how much difference using PSM rather than OLS makes to mean-impact estimates.

Key empirical challenges

1. For this method to be credible we need the first stage regression to have good explanatory power so that we can have some confidence in the predicted probabilities and the matching. To achieve the best matches possible it is desirable to overparameterize the probit.

2. We also need data on some observables that do not enter the first-stage probit but feature only in the regression on the outcome variable. If both the probit and the final regression use all of the $X_i$, identification is achieved only off the non-linearity of the probit model.

An Example using PSM

Gilligan and Hoddinott (2007, AJAE) is an excellent example of propensity score matching in a development setting. Examining the impact of emergency food aid after the 2002 drought in rural Ethiopia, they find that receiving food aid through the Employment Generation Scheme food-for-work program had a significant effect on growth in total consumption and in food consumption, and significantly reduced food-for-work participants' exposure to famine risk. Recipients of traditional food aid ("Gratuitous Relief") also had improved food consumption, but experienced a negative effect on food security.

PSM Estimation Steps with Stata Software

1. Develop a model of program participation (dummy) as a function of variables correlated with the probability of participating and with the outcome of interest. These variables should be unaffected by participation. That is to say, they should be pre-program or plausibly exogenous to the participation decision.

2. Run -pscore- on the participation model to test the "balancing property" of the data. It first tests that treatment and comparison observations have identical mean propensity scores within blocks of the propensity score. Once it has identified blocks for which this holds, -pscore- tests the equality of the means of each right-hand-side variable within these blocks. If it rejects equality of means for any X variable, it reports that the balancing property does not hold, and the specification must be changed until it is satisfied. You may also want to test the balancing property again on the matched sample after matching; this can be done with -pstest-.

3. Run -psmatch2- to estimate the impact through matching, then bootstrap to estimate the standard error of the impact estimate.
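The balancing test in step 2 can be sketched in simplified form: split the sample into propensity-score blocks and, within each block, test whether a covariate's mean differs between treatment and comparison units. This is not -pscore-'s actual algorithm, which, as described in step 2, first finds blocks within which mean propensity scores are equal; the fixed cutoffs, the Welch t statistic, the 1.96 threshold, and the function names are assumptions.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic for the difference in means of a and b."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se  # assumes at least one group has nonzero variance


def balance_by_block(scores, treated, x, cutoffs=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0), crit=1.96):
    """Compare the mean of covariate x between treated and comparison units
    within each propensity-score block [lo, hi).

    Returns (lo, hi, t_statistic, balanced) for every block with at least two
    units in each group; balanced means |t| < crit.
    """
    results = []
    last = cutoffs[-1]
    for lo, hi in zip(cutoffs[:-1], cutoffs[1:]):
        # half-open blocks, except that the top block also includes p == last
        idx = [i for i, p in enumerate(scores) if lo <= p < hi or (hi == last and p == last)]
        xt = [x[i] for i in idx if treated[i] == 1]
        xc = [x[i] for i in idx if treated[i] == 0]
        if len(xt) < 2 or len(xc) < 2:
            continue  # too few units to run a test in this block
        t = welch_t(xt, xc)
        results.append((lo, hi, t, abs(t) < crit))
    return results
```

If any block reports `balanced == False`, the participation model would be respecified (for example by adding interactions or higher-order terms) and the test repeated, in the spirit of step 2.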

--------------------------------------------------------------------------------

[1] Xi Chen is a Ph.D. student in the Department of Applied Economics and Management at Cornell University.

[2] PSM does this by defining the region of common support, where the distributions of Pi for the treatment and comparison groups overlap.

[3] A balancing test checks for systematic differences in the covariates between the treatment and comparison groups constructed by PSM.

