Demystifying Causality: An Introduction in Causal Inference and Applications. Part 5.

IvanGor
6 min read · Aug 27, 2023

Ever wondered what binds the concept of rivers to the quality of schools? Or how the assignment of a judge relates to education? Or even, what’s the relation between the hometowns of university alumni and protests? Well, brace yourself as we go into the world of Instrumental Variables (IV)!

Before we delve deeper, consider this causal graph, in which Z → T → Y and U affects both T and Y.

Where:

T = treatment

Y = target variable

Z = candidate instrument

U = unobserved confounder

In this graph, the unobserved confounder U opens a backdoor path T ← U → Y, which prevents us from estimating the Average Treatment Effect (ATE) of T on Y directly. Because U is unobserved, backdoor adjustment is unavailable, and because there is no mediator between T and Y, frontdoor adjustment is unavailable too. But did you notice the variable Z? It might be our beacon of hope.

For Z to be an instrument, it must meet the following conditions:

  1. Z exerts a causal effect on T — termed relevance.
  2. The causal effect of Z on Y operates solely through T — known as exclusion restriction.
  3. Z and Y share no backdoor paths, ensuring unconfoundedness.

This third assumption can be relaxed to conditional unconfoundedness (unconfoundedness after conditioning on observed covariates), in which case Z is called a conditional instrument.

However, a word of caution! Instrumental variables enable only parametric identification: the causal effect is identified only under assumptions about the functional form, such as linearity.

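Assume a linear setting in which both T and Z are binary. The outcome is then expressed as:

$$Y := \delta T + \alpha_U U$$

where $\delta$ is the causal effect of T on Y and $\alpha_U$ is the effect of the unobserved confounder U.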

With a binary instrument, the Wald estimand is:

$$\delta = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[T \mid Z = 1] - E[T \mid Z = 0]}$$

Replacing the expectations with empirical means gives the Wald estimator. The rationale is simple: we want to recover $\delta$ (delta), the causal effect in our structural equation.

In the derivation, the difference E[U | Z = 1] − E[U | Z = 0] vanishes because U is independent of Z (by the definition of an instrument), so the confounder term drops out.
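The cancellation can be written out step by step. This is a sketch under the linear structural equation $Y := \delta T + \alpha_U U$; the symbol $\alpha_U$ for the confounder's coefficient is assumed notation, following Brady Neal's book.

```latex
\begin{aligned}
E[Y \mid Z=1] - E[Y \mid Z=0]
  &= E[\delta T + \alpha_U U \mid Z=1] - E[\delta T + \alpha_U U \mid Z=0] \\
  &= \delta \bigl(E[T \mid Z=1] - E[T \mid Z=0]\bigr)
     + \alpha_U \bigl(E[U \mid Z=1] - E[U \mid Z=0]\bigr) \\
  &= \delta \bigl(E[T \mid Z=1] - E[T \mid Z=0]\bigr)
     + \alpha_U \bigl(E[U] - E[U]\bigr) \\
  &= \delta \bigl(E[T \mid Z=1] - E[T \mid Z=0]\bigr)
\end{aligned}
```

The first line uses the exclusion restriction (Z enters Y only through T), and the third uses the independence of U and Z. Dividing both sides by E[T | Z = 1] − E[T | Z = 0], which is nonzero by relevance, gives the Wald ratio for $\delta$.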

If we move from a binary to a continuous setting, the Wald estimand becomes:

$$\delta = \frac{\mathrm{Cov}(Y, Z)}{\mathrm{Cov}(T, Z)}$$

For a deeper dive into the proof for this continuous setting, refer to Brady Neal’s book.
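Both forms of the estimator are short to compute. Below is a minimal Python sketch using NumPy; the function names `wald_binary` and `wald_continuous` are illustrative and do not come from the article.

```python
import numpy as np


def wald_binary(y, t, z):
    """Wald estimate for a binary instrument z coded as 0/1."""
    y, t, z = (np.asarray(a, dtype=float) for a in (y, t, z))
    # Difference in mean outcome between the two instrument groups
    numerator = y[z == 1].mean() - y[z == 0].mean()
    # Difference in mean treatment between the two instrument groups
    denominator = t[z == 1].mean() - t[z == 0].mean()
    return numerator / denominator


def wald_continuous(y, t, z):
    """Covariance-ratio form: Cov(Y, Z) / Cov(T, Z)."""
    y, t, z = (np.asarray(a, dtype=float) for a in (y, t, z))
    return np.cov(y, z)[0, 1] / np.cov(t, z)[0, 1]
```

For a binary instrument the two functions return the same value, because the covariance ratio reduces to the ratio of differences in means.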

A classic causal effect estimator is the 2-Stage Least Squares (2SLS) estimator, which coincides with the Wald estimator when there is a single instrument and no covariates. Let’s say our causal graph incorporates observable covariates, X, influencing Y. Here’s the two-step algorithm:

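  1. Estimate the first-stage regression of T on Z and X:

$$T = \alpha_0 + \alpha_1 Z + \alpha_2 X + \varepsilon$$

This provides the fitted values $\hat{T}$.

  2. Substitute $\hat{T}$ for T and estimate the second-stage regression:

$$Y = \beta_0 + \beta_1 \hat{T} + \beta_2 X + \eta$$

If all assumptions hold, the coefficient on $\hat{T}$ in the second regression is the causal effect we are after.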
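The two steps can be sketched with ordinary least squares in NumPy. This is a minimal illustration, not a full implementation: the function name `two_stage_least_squares` is hypothetical, and it returns only the point estimate. The standard errors reported by a plain second-stage regression are not valid for 2SLS, so a dedicated IV routine or the bootstrap is needed for inference.

```python
import numpy as np


def two_stage_least_squares(y, t, z, x=None):
    """Point estimate of the effect of t on y, using z as instrument(s).

    z may have one or several columns; x holds optional observed covariates.
    """
    y, t, z = (np.asarray(a, dtype=float) for a in (y, t, z))
    n = len(y)
    ones = np.ones((n, 1))
    x = np.empty((n, 0)) if x is None else np.asarray(x, dtype=float).reshape(n, -1)

    # Stage 1: regress T on an intercept, the instrument(s) and the covariates
    first_design = np.hstack([ones, z.reshape(n, -1), x])
    gamma, *_ = np.linalg.lstsq(first_design, t, rcond=None)
    t_hat = first_design @ gamma

    # Stage 2: regress Y on an intercept, the fitted treatment and the covariates
    second_design = np.hstack([ones, t_hat.reshape(n, 1), x])
    beta, *_ = np.linalg.lstsq(second_design, y, rcond=None)

    # beta[0] is the intercept, beta[1] the coefficient on T_hat
    return beta[1]
```

With a single instrument and no covariates, the value returned equals the covariance ratio Cov(Y, Z) / Cov(T, Z).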

Nevertheless, it’s worth noting that 2SLS relies on a strong linear association between the instrument and the treatment variable. This isn’t always the case, so the first stage often needs to be modified.

To understand the nuances and practical applications of the 2SLS and Wald estimators, let’s look at a data generating process.

The data generating process aligns with our previously considered causal graph. In this setup:

  • Z is our instrument.
  • U represents the confounder.
  • T is the treatment variable.
  • Y is our target outcome.

The true ATE here is 0.3 as can be seen from the equations.

The dataset’s pairplot is displayed here:

Results from a basic regression of Y on T are shown below:

The coefficient on T, which would equal the average treatment effect (ATE) under a correctly specified model with no unobserved confounding, is biased. The bias comes from the unobserved confounder U, which affects both T and Y.
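To see this bias in a reproducible way, here is a hypothetical linear data generating process with the same structure (Z → T → Y, with U affecting both T and Y) and the same true effect of 0.3. Every coefficient other than 0.3, the normal distributions, the sample size and the seed are assumptions made for illustration; they are not the article's exact equations.

```python
import numpy as np


def simulate_linear_iv(n=100_000, seed=0):
    """Hypothetical linear DGP with an unobserved confounder; true effect is 0.3."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                       # instrument
    u = rng.normal(size=n)                       # unobserved confounder
    t = 1.0 * z + 1.0 * u + rng.normal(size=n)   # treatment depends on Z and U
    y = 0.3 * t + 2.0 * u + rng.normal(size=n)   # outcome depends on T and U only
    return z, t, y


def ols_slope(y, t):
    """Slope from a simple regression of y on t (with intercept)."""
    return np.cov(y, t)[0, 1] / np.var(t, ddof=1)


z, t, y = simulate_linear_iv()
naive = ols_slope(y, t)                          # ignores U, so it is biased
iv = np.cov(y, z)[0, 1] / np.cov(t, z)[0, 1]     # covariance-ratio (Wald) estimate
print(f"naive OLS slope:    {naive:.3f}")
print(f"IV (Wald) estimate: {iv:.3f}")
```

Under these assumed coefficients the naive slope converges to 0.3 + 2/3 ≈ 0.97 rather than 0.3, because Cov(U, T) = 1 and Var(T) = 3, while the instrument-based estimate converges to the true 0.3.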

Wald estimator:

Bootstrapping the Wald estimator to obtain a confidence interval (CI) gives the following histogram of estimates:

Metrics derived:

  • Point Estimate: 0.34
  • Standard Deviation: 0.107
  • 90% CI: [0.128, 0.555]
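Metrics like these can be produced with a percentile bootstrap. The sketch below is one plausible way to do it, not necessarily the procedure used for the numbers above: the function names, the 1,000 resamples, and the choice to report the full-sample estimate as the point estimate are all assumptions.

```python
import numpy as np


def wald(y, t, z):
    """Covariance-ratio form of the Wald estimator."""
    return np.cov(y, z)[0, 1] / np.cov(t, z)[0, 1]


def bootstrap_wald(y, t, z, n_boot=1000, level=0.90, seed=0):
    """Return (point estimate, bootstrap std, percentile CI) for the Wald estimator."""
    rng = np.random.default_rng(seed)
    y, t, z = (np.asarray(a, dtype=float) for a in (y, t, z))
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        # Resample whole rows with replacement so (y, t, z) stay aligned
        idx = rng.integers(0, n, size=n)
        draws[b] = wald(y[idx], t[idx], z[idx])
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    return wald(y, t, z), draws.std(ddof=1), (lower, upper)
```

Plotting `draws` as a histogram would give the kind of picture described above; the function returns only the summary statistics to keep the sketch short.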

The point estimate of 0.34 is close to the true ATE of 0.3, and the 90% CI contains it, so the instrument removes most of the confounding bias seen in the simple regression.

2SLS estimator:

First step:

  • The coefficient on Z is clearly nonzero, and the F-statistic of 224 supports the relevance of the instrument.
  • Because we know the data generating process, we can assume that the remaining conditions, exclusion restriction and unconfoundedness, hold.
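The first-stage F-statistic can be computed directly from the sums of squares. The sketch below is an illustration with a hypothetical function name, `first_stage_f`; it assumes a first stage with an intercept and only the instrument(s) as regressors.

```python
import numpy as np


def first_stage_f(t, z):
    """F-statistic for the regression of t on an intercept and the columns of z."""
    t = np.asarray(t, dtype=float)
    z = np.asarray(z, dtype=float).reshape(len(t), -1)
    n, k = z.shape
    design = np.hstack([np.ones((n, 1)), z])
    coef, *_ = np.linalg.lstsq(design, t, rcond=None)
    resid = t - design @ coef
    ss_res = resid @ resid                     # unexplained variation
    ss_tot = ((t - t.mean()) ** 2).sum()       # total variation
    # Explained variation per regressor over residual variance
    return ((ss_tot - ss_res) / k) / (ss_res / (n - k - 1))
```

A common rule of thumb treats a first-stage F-statistic below about 10 as a sign of a weak instrument, so a value of 224 is comfortably above that threshold.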

Second step:

The point estimate of the ATE from 2SLS matches that of the Wald estimator. The standard deviation for Wald (0.107) is slightly smaller than that for 2SLS (0.124).

OK, so far so good. But what about a nonlinear model? Let’s create a new data generating process:

Now the true ATE is equal to 3.

Pairplots will look as follows:

In this new setup, the relation between Z and T is non-linear.

Simple regression:

The regression is upward biased: the coefficient on T is 4.4, while the true effect is 3.

For the Wald estimator, we get:

  • Point Estimate: 8.09
  • CI: [-392.997, 409.766]

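The interval is extremely wide and contains zero, so the estimate is not statistically significant. The Wald estimator breaks down here because it relies on a linear association between Z and T.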

Adding Z squared to the first stage of 2SLS recovers the treatment effect, although with a wide CI:

Second step:

The treatment effect (coefficient of T_hat) is now correctly estimated.
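The effect of adding Z squared can be reproduced on a hypothetical non-linear process. In the sketch below, T depends on Z only through Z squared and the true effect is 3; the remaining coefficients, the distributions, the sample size and the function name `two_sls` are assumptions for illustration, not the article's exact setup.

```python
import numpy as np


def two_sls(y, t, instruments):
    """2SLS point estimate; `instruments` holds one column per first-stage regressor."""
    n = len(y)
    # Stage 1: fit T on an intercept plus the instrument columns
    first = np.column_stack([np.ones(n), instruments])
    t_hat = first @ np.linalg.lstsq(first, t, rcond=None)[0]
    # Stage 2: regress Y on an intercept and the fitted treatment
    second = np.column_stack([np.ones(n), t_hat])
    return np.linalg.lstsq(second, y, rcond=None)[0][1]


rng = np.random.default_rng(0)
n = 50_000
z = rng.normal(size=n)                        # instrument
u = rng.normal(size=n)                        # unobserved confounder
t = z**2 + u + rng.normal(size=n)             # treatment is quadratic in Z
y = 3.0 * t + 2.0 * u + rng.normal(size=n)    # true effect of T on Y is 3

linear_only = two_sls(y, t, z)                            # first stage uses Z only
with_square = two_sls(y, t, np.column_stack([z, z**2]))   # first stage uses Z and Z^2
print(f"first stage with Z only:    {linear_only:.3f}")
print(f"first stage with Z and Z^2: {with_square:.3f}")
```

In this assumed process Cov(T, Z) is zero in the population, since Z is symmetric around zero, so a first stage that is linear in Z carries almost no information and its estimate is unstable. Including Z squared restores a strong first stage.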

We explored the mechanics and implications of the Wald estimator and 2SLS for instrumental variables. Both are powerful tools, but their performance depends on how linear the relationship between the instrument and the treatment is. In non-linear settings, 2SLS remains reliable provided the first stage is specified correctly, whereas the plain Wald estimator can become extremely noisy. Understanding and correctly applying these methods can significantly improve the robustness of causal inference.

Stay tuned for our forthcoming issues, where we’ll pivot to the fascinating domain of uplift models. These models offer a unique lens to view and analyze the incremental effects of treatments or interventions, a cornerstone in personalized marketing and tailored interventions. Dive in with us to unravel their nuances and potentials!

Written by IvanGor

Senior Data Scientist @Careem
