Demystifying Causality: An Introduction to Causal Inference and Applications. Part 2.


I appreciate you taking the time to read the first installment of my blog series. If you haven’t had the opportunity to read it yet, you can find it at this link. Through the previous post, I aimed to shed light on the significance of causality and the intricacies involved in grasping causal relationships, as opposed to mere statistical estimates and associations.

Comprehending causality is vital across various domains, such as economics, medicine, and social sciences, to name a few. Causal inference, the process of determining how one event leads to another, plays a pivotal role in scientific research. What sets causal relationships apart is the need to not only understand what occurred but also why it took place.

On the other hand, statistical estimates and associations hinge on correlations between variables without necessarily establishing causation. A correlation merely suggests a potential relationship between two variables, without asserting that one variable directly influences the other.

Hence, it’s crucial to possess a robust understanding of causality when undertaking research or evaluating data, particularly in fields with high stakes like medicine or public policy. With that in mind, I hope you found the first part of this series enlightening, and I’m eager to delve further into causality in the forthcoming posts. Like the previous post, this one draws from Brady Neal’s book “Introduction to Causal Inference”. Let’s dive in!

We’ll begin with some theoretical concepts, followed by practical examples and illustrations.

Let’s start by examining the following Directed Acyclic Graph (DAG):

[Figure: Example DAG]

In causal graphs, we use specific terminology. We refer to A, B, C, and so on as nodes, while the connections (arrows) between nodes are called edges. When there is a sequence of edges connecting two nodes (e.g., A and D), we say that a path exists between A and D. A directed path is a path in which all the edges point in the same direction. An edge points from the parent to the child. Along a directed path, all nodes that come after a given node are its descendants, and all nodes that come before it are its ancestors. For instance, in the image above, A, B, and C are ancestors of D, and C is a parent of D; B, C, and D are descendants of A, and B is a child of A. The direction of an edge in a causal graph represents the causal effect of the parent on the child.
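If you want to play with these definitions in code, here is a minimal sketch using networkx. Since we only need a graph consistent with the description above, let's assume for illustration that the example DAG is the chain A→B→C→D:

import networkx as nx

# A hypothetical DAG consistent with the description above: A -> B -> C -> D
G = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "D")])

print(nx.ancestors(G, "D"))       # {'A', 'B', 'C'}
print(nx.descendants(G, "A"))     # {'B', 'C', 'D'}
print(list(G.predecessors("D")))  # ['C']  (the parent of D)
print(list(G.successors("A")))    # ['B']  (the child of A)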

Causal graphs essentially originate from probabilistic graphical models, so let's begin with them. Suppose we have a data distribution P(x_1, x_2, …, x_n). Without any restrictions coming from a graphical model, we can factorize it using the chain rule of probability as:

P(x_1, x_2, …, x_n) = P(x_1) P(x_2 | x_1) P(x_3 | x_1, x_2) … P(x_n | x_1, …, x_{n-1})

Even for binary variables, modeling this joint distribution requires 2^n - 1 parameters. Naturally, we would like to reduce this number. To do so, we can utilize the knowledge of the data distribution's Probabilistic Graphical Model (PGM) via the Local Markov Assumption (LMA): a node X, given its parents in the DAG, is independent of all its non-descendants. For our example graph with nodes A, B, C, D, and E, instead of performing the full chain-rule calculation

P(A, B, C, D, E) = P(A) P(B | A) P(C | A, B) P(D | A, B, C) P(E | A, B, C, D),

we can simplify the calculation by conditioning each node only on its parents in the DAG:

P(x_1, …, x_n) = ∏_i P(x_i | pa_i),

where pa_i denotes the set of parents of x_i. This is exactly where the saving in parameters comes from.
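To make the saving concrete, here is a tiny sketch that counts parameters for binary variables. The parent sets below are only an illustrative guess at the example DAG (a chain A→B→C→D plus a separate node E):

# Illustrative parent sets for five binary variables
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": []}

n = len(parents)
full_joint = 2 ** n - 1  # unrestricted joint distribution over 5 binary variables
factorized = sum(2 ** len(pa) for pa in parents.values())  # one conditional table per node

print(full_joint, factorized)  # 31 vs. 8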
In general, Probabilistic Graphical Models (PGMs) provide us with information on the independence between variables (nodes). Without any additional assumptions, we can confidently state that there is no association (or dependency) between certain nodes (for example, A and E). However, this does not inform us about the dependency or independence between A and B. To address this, we can introduce the minimality assumption, which states that if there is a link between two nodes, then there is a dependency between these variables. Formally:

  • Given its parents in the DAG, a node X is independent of all its non-descendants.
  • Adjacent nodes in the DAG are dependent.

Minimality tells us that if we have a specific graph, such as the two-node subgraph C→D from the picture above, we can factorize it only as P(C,D) = P(C)P(D|C) and not as P(C,D) = P(C)P(D). Every edge carries a dependency with it.

Up to this point, we have discussed dependencies in terms of associations between variables. Now let’s move on to causal dependencies. First, let’s define what a cause is:

Variable X is a cause of variable Y if Y can change in response to changes in X.

In a causal graph, every parent is a direct cause of all its children.

This concept is known as the strict causal edges assumption (CEA). The combination of the Local Markov Assumption (LMA) and CEA allows us to consider DAGs as a description of both causal and associational dependencies of a process.

It’s worth mentioning that causal effects follow the direction of edges, while associations can flow both ways. Thus, we can view our graph as undirected in terms of association. When we have A→B as a causal effect, we have A↔B in terms of association. From this, we can deduce that causal dependency is a subset of association.

Now, let’s focus on the flow of causation and association in DAGs. We will examine this using three fundamental building blocks of our DAGs: chain, fork, and immorality.

A chain is represented as follows:

[Figure: Chain X1→X2→X3]

We observe that the causal effect follows the path X1→X2→X3. Simultaneously, we can see an associational link between X1 and X3. Here, we have an association as X1 causes X2, which in turn causes X3. We can identify a causal path, making it evident why there is an association between them. Another example is a fork:

[Figure: Fork X1←X2→X3]

In this case, we also observe an associational link between X1 and X3, even though there is no valid causal path between them. X2 is a common cause for both X1 and X3. What we can do with both of these graphs is to block the associational path by conditioning on X2. Graphically, it can be represented as follows.

[Figure: Blocked association in the chain after conditioning on X2]
[Figure: Blocked association in the fork after conditioning on X2]

Next, let's demonstrate that conditioning on X2 in these graphs blocks the association, or in other words, makes X1 and X3 independent. For the chain, the factorization is P(x1, x2, x3) = P(x1)P(x2|x1)P(x3|x2), so

P(x1, x3 | x2) = P(x1, x2, x3) / P(x2) = P(x1)P(x2|x1)P(x3|x2) / P(x2) = P(x1|x2)P(x3|x2),

where the last step uses Bayes' rule: P(x1)P(x2|x1)/P(x2) = P(x1|x2). The joint of X1 and X3 given X2 factorizes into the product of their conditionals, which is exactly conditional independence.

For forks, the proof is similar.
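To see this numerically, here is a small simulated chain (a toy data-generating process chosen just for illustration; a fork behaves the same way): X1 and X3 are strongly correlated overall, but inside narrow slices of X2 the correlation essentially disappears.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x1 = rng.normal(size=5_000)
x2 = 2 * x1 + rng.normal(size=5_000)  # X1 -> X2
x3 = 3 * x2 + rng.normal(size=5_000)  # X2 -> X3

chain = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
print(chain["x1"].corr(chain["x3"]))  # strong marginal correlation

# "Condition on X2" by looking within narrow slices of its value
chain["x2_bin"] = pd.qcut(chain["x2"], q=20, labels=False)
for _, g in list(chain.groupby("x2_bin"))[:3]:
    print(g["x1"].corr(g["x3"]))  # roughly zero inside each slice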

Another type of structure we consider is immorality, which has the following form:

[Figure: Immorality X1→X2←X3]

In this structure, X2 is called a collider. Interestingly, the collider blocks the association between X1 and X3 as long as it is not conditioned on. This is easy to show: the factorization of an immorality is P(x1, x2, x3) = P(x1)P(x3)P(x2|x1, x3), and marginalizing out x2 gives

P(x1, x3) = Σ_{x2} P(x1)P(x3)P(x2|x1, x3) = P(x1)P(x3) Σ_{x2} P(x2|x1, x3) = P(x1)P(x3),

so X1 and X3 are independent when X2 is left uncontrolled.
As soon as we control for X2, we observe an association between X1 and X3. Let’s look at this in practice.

import numpy as np

np.random.seed(42)  # for reproducibility
X1 = np.random.normal(0, 1, size=200)  # independent of X2 by construction
X2 = np.random.normal(0, 1, size=200)
X3 = X1 + X2  # collider: X1 -> X3 <- X2

Let’s examine the scatterplot of X1 against X2 and calculate their correlation:
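Here is a minimal sketch of that check, continuing from the arrays generated above (the plotting choices are mine and can of course be varied):

import matplotlib.pyplot as plt

plt.scatter(X1, X2, alpha=0.5)
plt.xlabel("X1")
plt.ylabel("X2")
plt.show()

print(np.corrcoef(X1, X2)[0, 1])  # close to zero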

The scatterplot shows no visible association between X1 and X2, and their correlation is close to zero, as expected since they were generated independently. Now, let's try conditioning on X3. First, let's cut X3 into quartiles and then examine the correlation between X1 and X2 within each of the resulting groups:
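One way to implement this grouping is with pandas (a sketch; the exact cut points or number of bins can differ):

import pandas as pd

df = pd.DataFrame({"X1": X1, "X2": X2, "X3": X3})
df["X3_quartile"] = pd.qcut(df["X3"], q=4, labels=False)

# correlation between X1 and X2 inside each quartile of the collider X3
for q, g in df.groupby("X3_quartile"):
    print(q, g["X1"].corr(g["X2"]))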

We can see a rather obvious negative correlation between X1 and X2, even though they are independent by design! As a result, we can expect the same outcome in the case of regression analysis. Let’s create two regression specifications:
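A sketch of the two specifications with statsmodels, reusing the DataFrame built above:

import statsmodels.formula.api as smf

# X2 ~ X1: the coefficient on X1 should be insignificant
print(smf.ols("X2 ~ X1", data=df).fit().summary())

# X2 ~ X1 + X3: conditioning on the collider X3 induces a negative coefficient on X1
print(smf.ols("X2 ~ X1 + X3", data=df).fit().summary())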

[Regression output: X2 ~ X1]
[Regression output: X2 ~ X1 + X3]

As anticipated (given that we know the true model behind X3, which in practice we usually don't), the first regression shows no significant dependence of X2 on X1. However, once we control for X3, a significant negative dependence emerges. One might argue that in this scenario all three variables really are related, but imagine including such a collider as a feature in your machine learning model and then trying to interpret the relationship between X1 and X2: the induced association could easily lead you to incorrect conclusions.

D-separation

The last concept we will discuss here is d-separation:

Two sets of nodes X and Y are d-separated by a set of nodes Z if all paths between any nodes in X and any nodes in Y are blocked by Z.
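As a quick sanity check, networkx (2.8 and later) ships a d-separation test; in the newest releases the function is named is_d_separator, while older versions expose it as d_separated. A sketch on the chain and immorality structures from above:

import networkx as nx

chain = nx.DiGraph([("X1", "X2"), ("X2", "X3")])        # X1 -> X2 -> X3
print(nx.d_separated(chain, {"X1"}, {"X3"}, set()))     # False: association flows through the chain
print(nx.d_separated(chain, {"X1"}, {"X3"}, {"X2"}))    # True: conditioning on X2 blocks the path

collider = nx.DiGraph([("X1", "X2"), ("X3", "X2")])     # X1 -> X2 <- X3
print(nx.d_separated(collider, {"X1"}, {"X3"}, set()))  # True: an unconditioned collider blocks the path
print(nx.d_separated(collider, {"X1"}, {"X3"}, {"X2"})) # False: conditioning on the collider opens it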

If two sets of nodes are d-separated by a third set of nodes, it gives us a desirable property:

Global Markov Assumption:

Given that P is Markov with respect to G (i.e., P satisfies the local Markov assumption with respect to G), if X and Y are d-separated in G conditioned on Z, then X and Y are independent in P conditioned on Z. We can write this as follows:

X ⫫_G Y | Z  ⟹  X ⫫_P Y | Z
The final point I want to underline once again is the difference between types of association. Causation is a subset of association; in other words, we can divide association into causal association and confounding association. Confounding association is what distinguishes association from causation. Additionally, association is symmetric, while causation is asymmetric: "X is associated with Y" is equivalent to "Y is associated with X", but we cannot swap X and Y in the phrase "X causes Y". As an example, let's visualize the two types of association:

[Figure: Types of association: the causal path from treatment T to outcome Y, and the confounding path through their common cause X]

But how can we isolate the causal association? By using the d-separation property: in the graph above, if we block the confounding association by controlling for X, we are left with only the causal association. In practical terms, we can run the following regression:

Y = β_0 + β_1·T + β_2·X + ε

The coefficient on T (β_1) then gives us the causal effect of T on Y, provided the linear specification is reasonable and X is the only confounder.
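To make this concrete, here is a small simulated example (the data-generating process is a toy choice for illustration): X confounds both T and Y, and the true causal effect of T on Y is set to 2.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 1_000
X = rng.normal(size=n)                      # confounder
T = 0.8 * X + rng.normal(size=n)            # treatment, partly driven by X
Y = 2.0 * T + 1.5 * X + rng.normal(size=n)  # true causal effect of T on Y is 2

data = pd.DataFrame({"X": X, "T": T, "Y": Y})
print(smf.ols("Y ~ T", data=data).fit().params["T"])      # biased upward by the confounding path
print(smf.ols("Y ~ T + X", data=data).fit().params["T"])  # close to 2 once X is controlled for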

In conclusion, understanding causal graphs and their properties is essential for identifying and isolating causal relationships in complex systems. By exploring concepts such as directed acyclic graphs (DAGs), probabilistic graphical models (PGMs), d-separation, and the Global Markov Assumption, we can more effectively analyze dependencies and associations between variables. Using these techniques, we can distinguish between causal and confounding associations and identify the causal effects of variables on one another.

As demonstrated, causal graphs can provide valuable insights and offer a robust framework for examining the relationships between variables in a given system. By leveraging these concepts and techniques, researchers and practitioners can make more informed decisions and develop better models for their analyses.

Thank you for reading this blog post. I hope you found it informative and helpful in understanding the intricacies of causal graphs. Please leave your comments, and stay tuned for more exciting posts in this series!
