Policy Evaluation by Double Machine Learning

Author

Dr. Roch Nianogo, Bowen Zhang, Dr. Hua Zhou

library(tidyverse)
library(tidycensus)
census_api_key("4cf445b70eabd0b297a45e7f62f52c27ba3b5cae",
               install = TRUE, overwrite = TRUE)
Sys.setenv("CENSUS_KEY" = "4cf445b70eabd0b297a45e7f62f52c27ba3b5cae")

library(censusapi)
library(data.table)
library(DoubleML)
library(gtsummary)
library(mlr3)
library(ranger)
library(tidymodels)

1 Roadmap

[Figure: Data Science Diagram]

2 Causal inference

  • You’ve done tremendous work carrying out an intervention program. You want to demonstrate to stakeholders (government, funding agencies, etc.) that your program “works.”

    • You compared the average Food Security Rasch Scale Score over the past 30 days (a higher score indicates lower food security) among program participants versus non-participants, and you found a significant difference in the averages 😄 But people criticize your analysis as biased 🙁 What could go wrong?

    • In another try, you ran a multiple linear regression that included program participation status and many socioeconomic variables (race, age, income, region, …), hoping to adjust for confounding. This time you found that program participation does not significantly improve the food security Rasch score 🙁

    • Causal inference methods can help more reliably estimate the “treatment effect.”

  • Main approaches for causal inference:

    • RCT (randomized controlled trials): gold standard, but impractical in most Community-Based Participatory Research (CBPR) with multi-level interventions.

    • Observational studies under unconfoundedness: matching, regression estimators, inverse-propensity-score weighting.

    • Observational studies without unconfoundedness: instrumental variable, difference-in-differences (DiD), synthetic control, regression discontinuity designs.

3 Partially linear regression and interactive regression model

3.1 Partially linear regression (PLR)

The partially linear regression (PLR) model was introduced by Robinson (1988):


\[ \begin{aligned} Y &= D \theta_0 + g_0(X) + U, &E[U | X, D] = 0, \\ D &= m_0(X) + V, &E[V | X] = 0. \end{aligned} \]

Here, \(Y\) is the outcome variable, \(D\) is the policy/treatment variable of interest, the vector

\[ X = (X_1, \ldots, X_p) \]

consists of other controls, and \(U\) and \(V\) are disturbance terms. The first equation is the main equation, and \(\theta_0\) is the main regression coefficient that we would like to infer. If \(D\) is exogenous conditional on the controls \(X\), \(\theta_0\) has the interpretation of the treatment effect parameter, or ‘lift’ parameter in business applications. The second equation keeps track of confounding, namely the dependence of the treatment variable on the controls.

The confounding factors \(X\) affect the policy variable \(D\) via the function \(m_0(X)\) and the outcome variable via the function \(g_0(X)\). These functions do not need to be linear! We can take advantage of the strong predictive performance of machine learning techniques to estimate them.

Naively plugging machine learning estimates into these two equations, however, can yield a badly biased estimate of \(\theta_0\): the regularization that makes ML predictions accurate also transmits bias into the estimated treatment effect. The sketch below illustrates this.

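To see this concretely, here is a minimal simulation sketch (illustrative only, not part of the WIC analysis below): data are generated from a PLR model with nonlinear \(g_0\) and \(m_0\), and a naive shortcut that residualizes only the outcome is compared with the orthogonalized residual-on-residual regression.

library(ranger)

# Simulate from a PLR model with nonlinear nuisance functions (illustrative).
set.seed(2024)
n <- 1000
theta0 <- 0.5
X <- matrix(rnorm(n * 5), n, 5)
m0 <- sin(X[, 1]) + 0.5 * X[, 2]      # treatment depends on controls
g0 <- exp(X[, 1] / 2) + X[, 3]^2      # nonlinear outcome function
D <- m0 + rnorm(n)
Y <- D * theta0 + g0 + rnorm(n)
df <- data.frame(Y, D, X)             # columns are named X1, ..., X5

# Random-forest nuisance estimates; $predictions are out-of-bag predictions,
# which already guard against overfitting on the training sample.
fit_l <- ranger(Y ~ X1 + X2 + X3 + X4 + X5, data = df)
fit_m <- ranger(D ~ X1 + X2 + X3 + X4 + X5, data = df)
W_hat <- Y - fit_l$predictions        # residualized outcome
V_hat <- D - fit_m$predictions        # residualized treatment

# Naive shortcut: regress the outcome residual on the raw treatment.
# Biased, because D still carries the confounding signal m0(X).
coef(lm(W_hat ~ D))["D"]

# Orthogonalized: residual-on-residual regression, as in Section 3.3 below.
# The estimate should be close to theta0 = 0.5.
coef(lm(W_hat ~ V_hat))["V_hat"]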

Remarks:

  1. The policy variable \(D\) can be either binary or continuous (dosage).

  2. The policy variable \(D\) can be multi-dimensional (multi-level intervention: food pantry, mobile clinics, referral system, education classes, ads).

3.2 Interactive regression model (IRM)

We consider estimation of average treatment effects when treatment effects are fully heterogeneous, i.e., the response curves under control and treatment can be different nonparametric functions, and the treatment variable is binary, \(D \in \{0,1\}\). We consider vectors \((Y,D,X)\) such that

\[ \begin{aligned} Y &= g_0(D,X) + U, &E[U | X, D] = 0, \\ D &= m_0(X) + V, &E[V | X] = 0. \end{aligned} \]

Since the effect of \(D\) is not required to be additively separable, this model is more general than the partially linear model for the case of binary \(D\). A common target parameter of interest in this model is the average treatment effect (ATE):

\[ \theta_0 = E[g_0(1,X) - g_0(0,X)]. \]

The confounding factors \(X\) affect the policy variable via the propensity score \(m_0(X)\) and the outcome variable via the function \(g_0(D, X)\). Both of these functions are unknown and potentially complex, and we can employ ML methods to learn them.

The general idea for identification of \(\theta_0\) using the IRM is similar. Once we are able to account for all confounding variables \(X\) in our analysis, we can consistently estimate the causal parameter \(\theta_0\). A difference from the PLR lies in the assumptions on the functional form of the main regression equation: whereas the PLR model assumes the effect of \(D\) on \(Y\) to be additively separable, the IRM comes with less restrictive assumptions.
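For reference, the Neyman orthogonal (doubly robust) score that identifies the ATE in the IRM is the augmented inverse-propensity-weighting score of Chernozhukov et al. (2018):

\[ \psi(Y, D, X; \theta, \eta) = g(1,X) - g(0,X) + \frac{D\,\{Y - g(1,X)\}}{m(X)} - \frac{(1-D)\,\{Y - g(0,X)\}}{1 - m(X)} - \theta, \]

where \(\eta = (g, m)\) collects the nuisance functions. The estimator \(\hat{\theta}_0\) solves the empirical moment condition \(\frac{1}{n}\sum_{i=1}^n \psi(Y_i, D_i, X_i; \hat{\theta}_0, \hat{\eta}) = 0\) with cross-fitted nuisance estimates plugged in; this is the score used by DoubleML when we select score = "ATE" below.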

3.3 Basic Idea behind Double Machine Learning for the PLR Model

The basic idea behind the double machine learning method is to use machine learning methods to estimate the nuisance functions \(m_0(X)\) and \(g_0(X)\) and then plug these estimates into a Neyman orthogonal score to construct a doubly robust estimator of the target parameter \(\theta_0\). The estimator remains consistent as long as the product of the estimation errors of \(\hat{m}_0\) and \(\hat{g}_0\) vanishes fast enough, even if each nuisance function is estimated at a slow rate.

Reference: Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1-C68.

3.3.1 Idea 1: Neyman orthogonal score

A Neyman orthogonal score is locally insensitive to the nuisance parameters: the derivative of its expectation with respect to the nuisance functions is zero at their true values. Neyman orthogonality is a key concept in the theory of efficient estimation of target parameters in semiparametric models.

The PLR model can be rewritten in the following residualized form:

\[ \begin{aligned} W &= V\theta_0 + \zeta, &E[\zeta | X, D] = 0, \\ W &= (Y - \ell_0(X)), &\ell_0(X) = E[Y | X], \\ V &= (D - m_0(X)), &m_0(X) = E[D | X]. \end{aligned} \]

The variables \(W\) and \(V\) represent the original variables after taking out, or partialling out, the effect of \(X\). Given identification, double machine learning for the PLR proceeds as follows:

  1. Estimate \(\ell_0\) and \(m_0\) by solving the two problems of predicting \(Y\) and \(D\) from \(X\), using any generic machine learning method, giving us the estimated residuals \[ \begin{aligned} \hat{W} &= Y - \hat{\ell}_0(X), \\ \hat{V} &= D - \hat{m}_0(X). \end{aligned} \]

    The residuals should be of a cross-fitted form to avoid overfitting; see the sample-splitting idea in Section 3.3.2.

  2. Estimate \(\theta_0\) by regressing the residuals \(\hat{W}\) on \(\hat{V}\). Use conventional inference for this regression estimator, ignoring the estimation error in \(\hat{W}\) and \(\hat{V}\).

3.3.2 Idea 2: Sample splitting

The key idea behind sample splitting is to estimate the nuisance functions on one part of the sample and the causal parameter on the other, so that overfitting in the nuisance estimates does not contaminate the final stage. Swapping the roles of the two parts and averaging the resulting estimates recovers full efficiency; this procedure is known as cross-fitting.
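Below is a minimal sketch that combines both ideas, partialling out (Idea 1) with 5-fold cross-fitting (Idea 2), for the PLR model. It reuses the simulated data frame df and the ranger learner from the sketch in Section 3.1; all names are illustrative.

# Cross-fitted double machine learning for the PLR model (illustrative sketch).
K <- 5
folds <- sample(rep(1:K, length.out = nrow(df)))
W_hat <- V_hat <- numeric(nrow(df))

for (k in 1:K) {
  train <- df[folds != k, ]   # nuisance functions are fitted on K - 1 folds ...
  test  <- df[folds == k, ]   # ... and evaluated on the held-out fold
  fit_l <- ranger(Y ~ X1 + X2 + X3 + X4 + X5, data = train)
  fit_m <- ranger(D ~ X1 + X2 + X3 + X4 + X5, data = train)
  W_hat[folds == k] <- test$Y - predict(fit_l, data = test)$predictions
  V_hat[folds == k] <- test$D - predict(fit_m, data = test)$predictions
}

# Final stage: residual-on-residual regression.
# Conventional OLS inference applies, ignoring the nuisance estimation error.
summary(lm(W_hat ~ 0 + V_hat))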

4 Example: causal effect of WIC on food security

4.1 Background

The Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) is a federally funded nutrition program that provides grants to states to support the distribution of supplemental foods, health care referrals, and nutrition education. It safeguards the health of low-income pregnant, breastfeeding, and non-breastfeeding postpartum women, of infants in low-income families, and of children under age 5 in low-income families who are found to be at nutritional risk.

In 2020, WIC served over 6.2 million participants per month, at an average monthly food cost (after rebates to WIC from manufacturers) of about $38 per person. Many households with low food security benefit from WIC. We would like to know the causal effect of WIC on food security status.

4.2 Data preparation

We use the 2020 Current Population Survey (CPS) Food Security Supplement (FSS) data to estimate the causal effect of WIC on food security status. We screen for eligible households (below 185 percent of the poverty threshold, with children under age 5 or women aged 15-45). In addition, we require that the household had at least one food-insecure event in the past 30 days, because we are interested in the causal effect of WIC on food security among households that have experienced food insecurity.

We include the following variables in our analysis:

  • HRFS30D4: Food Security Rasch Scale Score in the past 30 days (range 100 to 1400). A higher score indicates lower food security.

  • HESP8: WIC participation status (1: Yes, 2: No).

  • HRNUMHOU: Number of people in the household.

  • HRHTYPE: Household type.

  • GEREG: Geographic region.

  • PRCHLD: Presence of children in the household.

  • PRTAGE: Age of the reference person.

  • PEEDUCA: Education level.

  • PEMLR: Employment status.

  • RACE: recoded from PTDTRACE and PEHSPNON.

  • HEFAMINC: Family income (recoded to the midpoint of each income bracket).

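# Download the 2020 CPS Food Security Supplement (December) via the Census API.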
fss_20_data <- censusapi::getCensus(
  name = "cps/foodsec/dec",
  vintage = 2020,
  vars = c("HRHHID", "HRHHID2", "HRNUMHOU",
           "HRHTYPE", "GEREG", "HESP8",
           "PRTAGE", "PEEDUCA", "PRCHLD",
           "PEMLR", "PTDTRACE", "PEHSPNON",
           "HEFAMINC", "HRFS30D1", "HHSUPWGT",
           "PERRP", "HRFS30D4")
  ) |>
  as_tibble() |>
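  # Keep household reference persons (PERRP 40/41) with valid food security,
  # employment, and WIC participation responses.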
  filter(PERRP %in% c(40, 41),
         HRFS30D1 %in% c(1, 2, 3),
         PEMLR %in% c(1, 2, 3, 4, 5, 6, 7),
         HESP8 %in% c(1, 2),
         HRFS30D4 != -6) |>
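  # Recode raw CPS codes into labeled analysis variables.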
  mutate(HRFS30D1 = factor(HRFS30D1,
                           levels = c("1", "2", "3"),
                           labels = c("Food Security",
                                      "Low Food Security",
                                      "Very Low Food Security")),
         HESP8 = ifelse(HESP8 == 1, 1, 0) |> 
           factor(levels = c(0, 1), labels = c("NoWIC", "WIC")),
         HRNUMHOU = as.numeric(HRNUMHOU),
         HHSUPWGT = as.numeric(HHSUPWGT),
         HRHTYPE = cut(as.numeric(HRHTYPE), breaks = c(0, 2, 5, 8, 10),
                       labels = c("MarriedFamily", "UnmarriedFamily",
                                  "Individual", "GroupQuarters")) |> fct_drop(),
         GEREG = factor(GEREG, levels = c(1, 2, 3, 4),
                        labels = c("Northeast", "Midwest", "South", "West")),
         PEEDUCA = cut(as.numeric(PEEDUCA), breaks = c(30, 38, 42, 46),
                       labels = c("LessThanHighSchool",
                                  "HighSchoolOrAssociateDegree",
                                  "CollegeOrHigher")),
         PEMLR = cut(as.numeric(PEMLR), breaks = c(0, 2, 4, 7),
                     labels = c("Employed", "NotEmployed",
                                "NotInLaborForce")),
         PRTAGE = as.numeric(PRTAGE),
         PEHSPNON = factor(PEHSPNON, levels = c(2, 1),
                           labels = c("Non-Hispanic", "Hispanic")),
         PTDTRACE = fct_recode(PTDTRACE,
                          "White" = "1" ,
                          "Black" = "2",
                          "AIAN" = "3",
                          "Asian" = "4",
                          "HPI" = "5") |>
           fct_other(keep = c("White", "Black", "AIAN", "Asian", "HPI")),
         PRCHLD = ifelse(PRCHLD == 0, "NoChildren", "Children") |> 
           as_factor() |> relevel(ref = "NoChildren"),
         HEFAMINC = case_when(HEFAMINC == 1 ~ 2500,
                              HEFAMINC == 2 ~ 6250,
                              HEFAMINC == 3 ~ 8750,
                              HEFAMINC == 4 ~ 11250,
                              HEFAMINC == 5 ~ 13750,
                              HEFAMINC == 6 ~ 17500,
                              HEFAMINC == 7 ~ 22500,
                              HEFAMINC == 8 ~ 27500,
                              HEFAMINC == 9 ~ 32500,
                              HEFAMINC == 10 ~ 37500,
                              HEFAMINC == 11 ~ 45000,
                              HEFAMINC == 12 ~ 55000,
                              HEFAMINC == 13 ~ 67500,
                              HEFAMINC == 14 ~ 87500,
                              HEFAMINC == 15 ~ 125000,
                              HEFAMINC == 16 ~ 175000),
         HEFAMINC = as.numeric(HEFAMINC),
         HRFS30D4 = as.numeric(HRFS30D4),
  ) |>
  print()
# A tibble: 1,383 × 17
   HRHHID       HRHHID2 HRNUMHOU HRHTYPE GEREG HESP8 PRTAGE PEEDUCA PRCHLD PEMLR
   <chr>          <dbl>    <dbl> <fct>   <fct> <fct>  <dbl> <fct>   <fct>  <fct>
 1 00577061009…   10011        4 Unmarr… South NoWIC     23 HighSc… Child… NotE…
 2 01721051407…   12011        6 Unmarr… Nort… NoWIC     34 LessTh… Child… Empl…
 3 10901946300…   10011        6 Marrie… Nort… NoWIC     51 Colleg… Child… Empl…
 4 01535061200…   12011        3 Unmarr… Nort… NoWIC     49 HighSc… Child… Empl…
 5 01306948400…   10011        6 Marrie… Nort… WIC       42 HighSc… Child… NotI…
 6 50139717020…   12011        3 Marrie… South NoWIC     31 HighSc… Child… Empl…
 7 02130502431…   12011        4 Unmarr… South NoWIC     31 HighSc… Child… NotI…
 8 21060572121…   12011        3 Unmarr… South WIC       33 HighSc… Child… NotE…
 9 01011663046…   12011        5 Indivi… South NoWIC     43 HighSc… NoChi… Empl…
10 27000223810…   10012        4 Unmarr… South NoWIC     52 Colleg… Child… NotE…
# ℹ 1,373 more rows
# ℹ 7 more variables: PTDTRACE <fct>, PEHSPNON <fct>, HEFAMINC <dbl>,
#   HRFS30D1 <fct>, HHSUPWGT <dbl>, PERRP <chr>, HRFS30D4 <dbl>
fss_20_data |>
  select(HRFS30D4, HESP8, HRNUMHOU, HRHTYPE, GEREG, PRTAGE,
         PEEDUCA, PRCHLD, PEMLR, HEFAMINC, PEHSPNON) |>
  tbl_summary()
Characteristic N = 1,383¹
HRFS30D4 414 (172, 602)
HESP8
    NoWIC 1,231 (89%)
    WIC 152 (11%)
HRNUMHOU 3.00 (2.00, 5.00)
HRHTYPE
    MarriedFamily 532 (38%)
    UnmarriedFamily 617 (45%)
    Individual 234 (17%)
GEREG
    Northeast 175 (13%)
    Midwest 243 (18%)
    South 577 (42%)
    West 388 (28%)
PRTAGE 38 (31, 45)
PEEDUCA
    LessThanHighSchool 226 (16%)
    HighSchoolOrAssociateDegree 929 (67%)
    CollegeOrHigher 228 (16%)
PRCHLD
    NoChildren 547 (40%)
    Children 836 (60%)
PEMLR
    Employed 808 (58%)
    NotEmployed 164 (12%)
    NotInLaborForce 411 (30%)
HEFAMINC 32,500 (17,500, 55,000)
PEHSPNON
    Non-Hispanic 1,034 (75%)
    Hispanic 349 (25%)
¹ Median (IQR); n (%)

4.3 Two-sample t-test (simple linear regression)

lm(HRFS30D4 ~ factor(HESP8), data = fss_20_data) |>
  tbl_regression() |>
  bold_labels() |>
  bold_p(t = 0.05)
Characteristic Beta 95% CI¹ p-value
factor(HESP8)


    NoWIC
    WIC -62 -105, -20 0.004
¹ CI = Confidence Interval

4.4 Multiple linear regression

fit <- lm(HRFS30D4 ~ factor(HESP8) + HRNUMHOU + HRHTYPE + GEREG + PRTAGE +
            PEEDUCA + PRCHLD + PEMLR + HEFAMINC + PEHSPNON, data = fss_20_data)
fit

Call:
lm(formula = HRFS30D4 ~ factor(HESP8) + HRNUMHOU + HRHTYPE + 
    GEREG + PRTAGE + PEEDUCA + PRCHLD + PEMLR + HEFAMINC + PEHSPNON, 
    data = fss_20_data)

Coefficients:
                       (Intercept)                    factor(HESP8)WIC  
                         4.395e+02                          -3.224e+01  
                          HRNUMHOU              HRHTYPEUnmarriedFamily  
                        -7.631e+00                           3.096e+01  
                 HRHTYPEIndividual                        GEREGMidwest  
                         3.962e+01                           2.883e+01  
                        GEREGSouth                           GEREGWest  
                         1.263e+01                          -4.463e+00  
                            PRTAGE  PEEDUCAHighSchoolOrAssociateDegree  
                         1.173e+00                          -3.363e+01  
            PEEDUCACollegeOrHigher                      PRCHLDChildren  
                        -3.351e+01                          -5.966e+01  
                  PEMLRNotEmployed                PEMLRNotInLaborForce  
                         8.678e+01                           4.418e+01  
                          HEFAMINC                    PEHSPNONHispanic  
                        -3.088e-04                          -1.699e+01  
fit |>
  tbl_regression() |>
  bold_labels() |>
  bold_p(t = 0.05)
Characteristic Beta 95% CI¹ p-value
factor(HESP8)


    NoWIC
    WIC -32 -75, 11 0.14
HRNUMHOU -7.6 -18, 2.2 0.13
HRHTYPE


    MarriedFamily
    UnmarriedFamily 31 0.59, 61 0.046
    Individual 40 -13, 92 0.14
GEREG


    Northeast
    Midwest 29 -19, 77 0.2
    South 13 -29, 54 0.6
    West -4.5 -49, 40 0.8
PRTAGE 1.2 -0.13, 2.5 0.077
PEEDUCA


    LessThanHighSchool
    HighSchoolOrAssociateDegree -34 -72, 4.4 0.083
    CollegeOrHigher -34 -84, 17 0.2
PRCHLD


    NoChildren
    Children -60 -96, -24 0.001
PEMLR


    Employed
    NotEmployed 87 45, 128 <0.001
    NotInLaborForce 44 13, 75 0.005
HEFAMINC 0.00 0.00, 0.00 0.2
PEHSPNON


    Non-Hispanic
    Hispanic -17 -49, 15 0.3
¹ CI = Confidence Interval

4.5 DoubleML workflow

The Python and R packages DoubleML provide an implementation of the double/debiased machine learning framework of Chernozhukov et al. (2018). The R package is built on top of mlr3 and the mlr3 ecosystem (Lang et al., 2019).


  • Data-backend

    We initialize the data-backend and thereby declare the roles of the outcome, the treatment, and the confounding variables.

doubleML_data <- fss_20_data |>
  mutate(HESP8 = ifelse(HESP8 == "WIC", 1, 0)) |>
  data.table()

obj_dml_data <- DoubleMLData$new(
  doubleML_data,
  y_col = "HRFS30D4",
  d_cols = "HESP8",
  x_cols = c("HRNUMHOU", "HRHTYPE", "GEREG", "PRCHLD", "PRTAGE",
             "PEEDUCA", "PEMLR", "HEFAMINC", "PEHSPNON")
)

obj_dml_data
================= DoubleMLData Object ==================


------------------ Data summary      ------------------
Outcome variable: HRFS30D4
Treatment variable(s): HESP8
Covariates: HRNUMHOU, HRHTYPE, GEREG, PRCHLD, PRTAGE, PEEDUCA, PEMLR, HEFAMINC, PEHSPNON
Instrument(s): 
No. Observations: 1383

  • Choose causal model

    Several models are currently implemented in DoubleML; they differ in the underlying causal structure. In this example, we use the interactive regression model (IRM).

[Figure: causal models implemented in DoubleML]

  • ML Methods

    We specify the machine learning tools used to estimate the nuisance parts. In R, we can generally choose any learner from the mlr3 ecosystem. In this example, we use random forests for both the main equation and the confounding (propensity) equation.

set.seed(2024)

learner_g <- lrn("regr.ranger", num.trees = 500, min.node.size = 2,
                 max.depth = 5)
learner_classif_m <- lrn("classif.ranger", num.trees = 500,
                         min.node.size = 2, max.depth = 5)

  • DML specifications

    We initialize and parametrize the model object that will later be used to perform the estimation, specifying the resampling scheme, the DML algorithm, and the score function.

set.seed(2024)

doubleml_mod <- DoubleMLIRM$new(
  obj_dml_data,
  ml_g = learner_g, ml_m = learner_classif_m,
  score = "ATE", dml_procedure = "dml2",
  n_folds = 5, n_rep = 1
)

doubleml_mod
================= DoubleMLIRM Object ==================


------------------ Data summary      ------------------
Outcome variable: HRFS30D4
Treatment variable(s): HESP8
Covariates: HRNUMHOU, HRHTYPE, GEREG, PRCHLD, PRTAGE, PEEDUCA, PEMLR, HEFAMINC, PEHSPNON
Instrument(s): 
No. Observations: 1383

------------------ Score & algorithm ------------------
Score function: ATE
DML algorithm: dml2

------------------ Machine learner   ------------------
ml_g: regr.ranger
ml_m: classif.ranger

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1
Apply cross-fitting: TRUE

------------------ Fit summary       ------------------
 
  • Estimation
doubleml_mod$fit()
INFO  [18:11:42.070] [mlr3] Applying learner 'classif.ranger' on task 'nuis_m' (iter 1/5)
INFO  [18:11:42.177] [mlr3] Applying learner 'classif.ranger' on task 'nuis_m' (iter 2/5)
INFO  [18:11:42.272] [mlr3] Applying learner 'classif.ranger' on task 'nuis_m' (iter 3/5)
INFO  [18:11:42.364] [mlr3] Applying learner 'classif.ranger' on task 'nuis_m' (iter 4/5)
INFO  [18:11:42.457] [mlr3] Applying learner 'classif.ranger' on task 'nuis_m' (iter 5/5)
INFO  [18:11:42.694] [mlr3] Applying learner 'regr.ranger' on task 'nuis_g0' (iter 1/5)
INFO  [18:11:42.767] [mlr3] Applying learner 'regr.ranger' on task 'nuis_g0' (iter 2/5)
INFO  [18:11:42.839] [mlr3] Applying learner 'regr.ranger' on task 'nuis_g0' (iter 3/5)
INFO  [18:11:42.912] [mlr3] Applying learner 'regr.ranger' on task 'nuis_g0' (iter 4/5)
INFO  [18:11:42.984] [mlr3] Applying learner 'regr.ranger' on task 'nuis_g0' (iter 5/5)
INFO  [18:11:43.184] [mlr3] Applying learner 'regr.ranger' on task 'nuis_g1' (iter 1/5)
INFO  [18:11:43.210] [mlr3] Applying learner 'regr.ranger' on task 'nuis_g1' (iter 2/5)
INFO  [18:11:43.237] [mlr3] Applying learner 'regr.ranger' on task 'nuis_g1' (iter 3/5)
INFO  [18:11:43.263] [mlr3] Applying learner 'regr.ranger' on task 'nuis_g1' (iter 4/5)
INFO  [18:11:43.289] [mlr3] Applying learner 'regr.ranger' on task 'nuis_g1' (iter 5/5)
doubleml_mod$summary()
Estimates and significance testing of the effect of target variables
      Estimate. Std. Error t value Pr(>|t|)  
HESP8    -78.14      37.51  -2.083   0.0372 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
doubleml_mod$confint()
          2.5 %    97.5 %
HESP8 -151.6677 -4.621797
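The point estimate depends on the particular random sample split. As a robustness check, one can repeat the cross-fitting over several random partitions; DoubleML aggregates the estimates and standard errors across repetitions (by default via the median). A sketch with the same specification as above, changing only n_rep:

set.seed(2024)

# Same IRM specification, but with 10 repeated sample splits to reduce the
# sensitivity of the estimate to any single random partition.
doubleml_mod_rep <- DoubleMLIRM$new(
  obj_dml_data,
  ml_g = learner_g, ml_m = learner_classif_m,
  score = "ATE", dml_procedure = "dml2",
  n_folds = 5, n_rep = 10
)
doubleml_mod_rep$fit()
doubleml_mod_rep$summary()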
