do-calculus: Nonlinear Function
Created: August 02, 2021
Modified: August 06, 2021

do-calculus

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
  • A causal graph contains variables connected by directed edges, indicating causal effects. It is inherently a model, indicating assumptions about the existence and direction of causal relationships. If we also assume specific conditional distributions (and/or deterministic functions) for the edges, then we have a structural equation model.
    • A structural equation model is usually specified in terms of functions that take random inputs, e.g., $c = f(a, b, \varepsilon_c)$. This allows us to talk about counterfactuals: questions of the form 'given that $(X, Y)$ happened, what would have happened if $(X', Y)$ had happened instead?'
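A toy sketch of this idea, with made-up structural equations (the linear form $c = a + 2b + \varepsilon_c$ and all names below are invented for illustration, not from these notes): sampling from an SCM means drawing the exogenous noises and pushing them through the functions, and a counterfactual reuses the same noise draws with a changed input.

```python
import random

# Toy SCM with invented structural equations:
#   a = eps_a,  b = eps_b,  c = f(a, b, eps_c) = a + 2*b + eps_c
def scm(eps_a, eps_b, eps_c):
    a = eps_a
    b = eps_b
    c = a + 2 * b + eps_c
    return a, b, c

rng = random.Random(0)
eps = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
a, b, c = scm(*eps)

# Counterfactual: keep the same exogenous noise (the same "unit"),
# but ask what c would have been had a been 1.0 instead.
_, _, c_cf = scm(1.0, eps[1], eps[2])
print(c, c_cf)  # in this linear model, c_cf - c == 1.0 - eps[0]
```

Holding the noise fixed is what distinguishes a counterfactual from a fresh interventional sample: we are asking about the same realization of the world, not a new draw.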
  • Interpreted as a directed graphical model, the causal graph defines a joint distribution on latent and observed variables (say these are $w, x, y, z$); in particular, it defines a marginal joint distribution on the observed variables ($x, y, z$, say). This is the distribution from which we derive conditionals like $p(y | x)$.
  • We model an intervention on the causal graph by setting the value of the corresponding node and deleting all incoming edges. This gives a new joint distribution, and a new marginal joint distribution on observables. Associations derived from this joint distribution have the form $p(y | do(x=k))$, or e.g. we might have quantities like $p(y | w, do(x=k))$.
    • In general, given a graph $G$, let $G_{\overline{X}}$ denote the mutilated graph in which we've deleted all incoming edges to $X$.
    • Also let $G_{\underline{X}}$ denote the graph in which we've deleted all edges out of $X$. Note that this is a valid operation on the graph itself, even though we can't in general derive a joint distribution corresponding to the new graph.
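These two surgeries are easy to make concrete. A minimal sketch (helper names invented for illustration), storing a DAG as a dict mapping each node to its set of parents:

```python
# Minimal sketch of the two graph surgeries on a DAG {node: set of parents}.
def remove_incoming(g, x):
    """G_{x-bar}: delete all edges into x (x loses its parents)."""
    out = {n: set(ps) for n, ps in g.items()}
    out[x] = set()
    return out

def remove_outgoing(g, x):
    """G_{x-underbar}: delete all edges out of x (x removed from every parent set)."""
    return {n: set(ps) - {x} for n, ps in g.items()}

# Example: U -> X, X -> Z, Z -> Y, U -> Y (the graph from the smoking
# example later in these notes).
g = {"U": set(), "X": {"U"}, "Z": {"X"}, "Y": {"Z", "U"}}
print(remove_incoming(g, "X"))  # X loses its parent U
print(remove_outgoing(g, "X"))  # Z loses its parent X
```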
  • Note that the intervention joint distribution might have different conditional independence relationships than the original joint.
  • Of course, we don't know the intervention joint. Our goal is to connect the quantity we care about under that joint, $p(y | do(x=k))$, with some quantity that we can estimate from observations of the original joint. For example, if our causal assumption is that $X \to Y$ (and there are no other variables), then $p(y | do(x=k)) = p(y | x=k)$. On the other hand, if we assume $Y \to X$, then we have $p(y | do(x=k)) = p(y)$, because the intervention breaks the dependence.
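We can check both two-node cases with exact arithmetic on a tiny discrete model (all numbers below are invented for illustration):

```python
# Hypothetical binary example (probability tables invented).
p_x = [0.3, 0.7]                      # p(x)
p_y_given_x = [[0.9, 0.1],            # p(y | x=0)
               [0.2, 0.8]]            # p(y | x=1)

# Case 1: causal graph X -> Y. Intervening just fixes X's value, so
# p(y | do(x=1)) = p(y | x=1).
p_y_do_x1 = p_y_given_x[1]

# Case 2: causal graph Y -> X, with the *same* joint p(x, y).
# do(x=k) deletes the edge Y -> X, so Y keeps its marginal:
# p(y | do(x=k)) = p(y) = sum_x p(x) p(y | x), for any k.
p_y = [sum(p_x[x] * p_y_given_x[x][y] for x in range(2)) for y in range(2)]

print(p_y_do_x1)  # [0.2, 0.8]
print(p_y)        # ~[0.41, 0.59]
```

The same observational joint thus supports two different interventional answers, depending on the assumed edge direction; the causal graph is doing real work here.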
  • In general, do-calculus is the set of rules by which we rewrite quantities that include an intervention (like $p(y | do(x=k))$) in terms of observable quantities (like $p(y)$ or $p(y | x=k)$). The three rules are:
    1. Ignoring observations: $P(y | do(x), w, z) = P(y | do(x), w)$ if $y \perp z \mid x, w$ in the mutilated graph $G_{\overline{x}}$.
      • This just says that the standard rules of conditioning apply in the mutilated graph.
    2. Action / observation exchange (aka the backdoor criterion): $P(y | do(x), do(z), w) = P(y | do(x), z, w)$ if $y \perp z \mid x, w$ in the graph $G_{\overline{x}, \underline{z}}$, where we've removed the edges into $x$ and the edges out of $z$.
      • In words: it doesn't matter whether you intervene or condition on $z$, if the only dependence between $y$ and $z$ is via causal chains from $z$ to $y$ (i.e., if there is no 'back door' latent variable that affects both $y$ and $z$).
    3. Ignoring actions / interventions: $P(y | do(x), do(z), w) = P(y | do(x), w)$ if $y \perp z \mid x, w$ in the graph $G_{\overline{x}, \overline{z(w)}}$, where $z(w)$ denotes the set of nodes in $z$ that are not ancestors of any node in $w$ (in $G_{\overline{x}}$).
      • In words: intervening on $z$ has no effect on $y$, if $w$ is known/fixed and the only dependence between $y$ and $z$ was through paths that are blocked by $w$.
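All three rules reduce to a d-separation check in a surgically modified graph. Here's a minimal sketch of such a check (my own illustration, not from these notes), using the moralized-ancestral-graph criterion: restrict to the ancestors of the queried nodes, moralize, delete the conditioning set, and test reachability.

```python
def d_separated(parents, xs, ys, zs):
    """True iff node sets xs and ys are d-separated given zs in the DAG
    described by parents: {node: set of parent nodes}."""
    # 1. Restrict to the ancestral set of xs | ys | zs.
    anc, stack = set(), list(xs | ys | zs)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents[n])
    # 2. Moralize: connect each node to its parents, and marry co-parents.
    adj = {n: set() for n in anc}
    for n in anc:
        for p in parents[n]:
            adj[n].add(p)
            adj[p].add(n)
        for p in parents[n]:
            for q in parents[n]:
                if p != q:
                    adj[p].add(q)
    # 3. Drop the conditioning set, then test reachability from xs to ys.
    seen, stack = set(), [n for n in xs if n not in zs]
    while stack:
        n = stack.pop()
        if n in seen or n in zs:
            continue
        seen.add(n)
        stack.extend(adj[n])
    return not (seen & ys)

# Front-door graph: U -> X, X -> Z, Z -> Y, U -> Y.
g = {"U": set(), "X": {"U"}, "Z": {"X"}, "Y": {"Z", "U"}}
# Rule 2 condition for p(z | do(x)) = p(z | x): Z _|_ X in the graph
# with the edges out of X deleted.
g_under_x = {n: ps - {"X"} for n, ps in g.items()}
print(d_separated(g_under_x, {"Z"}, {"X"}, set()))  # True: rule 2 applies
print(d_separated(g, {"Z"}, {"X"}, set()))          # False: X -> Z edge
```

The usage at the bottom checks exactly the independence invoked in the smoking example below.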
  • These three rules are in fact known to be complete for identifying causal effects: every interventional distribution that is identifiable at all can be derived from them (Huang & Valtorta 2006; Shpitser & Pearl 2006).
  • Example: smoking and lung cancer. Let $X$ indicate smoking, $Z$ indicate tar buildup in the lungs, and $Y$ indicate lung cancer, with $X \to Z \to Y$. Suppose that these are observed, but we also hypothesize a hidden confounder $U$ (perhaps genetic or societal factors) with $U \to X$ and $U \to Y$. [figure: this graph, alongside the mutilated graph $G_{\overline{X}}$]
    • We are interested in the effect $p(y | do(x))$. In the mutilated graph $G_{\overline{X}}$ on the right, we can write this conditional as $p(y | do(x)) = \sum_{z, u} p(y | z, u, do(x)) \, p(Z=z | do(x)) \, p(u)$. Note that we need to keep the $do(x)$ conditioning, even where we could otherwise drop conditioning on $x$, in order to indicate that we are still in the mutilated graph.
    • By the backdoor criterion, we have $p(Z=z | do(x)) = p(Z=z | x)$.
    • By rule 3, we also have $p(y | z, u, do(x)) = p(y | z, u)$.
    • In order to get an estimate in terms of real-world quantities, we'd need to sum out the $u$. To do this, we need: $\sum_u p(y | z, u) \, p(u) = \sum_{u, x} p(y | z, u, x) \, p(u, x) \, p(z | x) / p(z | x) = \sum_{u, x} p(y | z, u, x) \, p(u, x, z) / p(z | x) = \sum_{u, x} p(y | z, u, x) \, p(u, x | z) \, p(z) / p(z | x) = \sum_{x} p(y | z, x) \, p(x | z) \, p(z) \, p(x) / p(z, x) = \sum_{x} p(y | z, x) \, p(x)$. (The first step uses $p(y | z, u) = p(y | z, u, x)$, since $y \perp x \mid z, u$ in this graph, plus $p(u) = \sum_x p(u, x)$; the second uses $p(z | x) = p(z | u, x)$, since $z \perp u \mid x$. This is not the shortest derivation, but it works.)
    • Therefore we have $p(y | do(x)) = \sum_z p(z | x) \sum_{x'} p(y | z, x') \, p(x')$. This formula turns out to be known as the front-door adjustment.
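As a sanity check, we can verify the front-door formula numerically on a small made-up binary model of this graph (all probability tables invented): compute the true $p(y=1 | do(x))$ from the truncated factorization (which uses the hidden $U$), then recompute it from purely observational marginals via the formula.

```python
from itertools import product

# Invented binary front-door model: U -> X, X -> Z, Z -> Y, U -> Y.
p_u   = [0.4, 0.6]                     # p(u)
p_x_u = [[0.8, 0.2], [0.3, 0.7]]       # p_x_u[u][x] = p(x | u)
p_z_x = [[0.9, 0.1], [0.25, 0.75]]     # p_z_x[x][z] = p(z | x)
p_y_zu = [[[0.7, 0.3], [0.4, 0.6]],    # p_y_zu[z][u][y] = p(y | z, u)
          [[0.2, 0.8], [0.1, 0.9]]]

# Observational joint from the graph's factorization.
joint = {(u, x, z, y): p_u[u] * p_x_u[u][x] * p_z_x[x][z] * p_y_zu[z][u][y]
         for u, x, z, y in product(range(2), repeat=4)}

def marg(names):
    """Marginal of the joint over the named subset of (u, x, z, y)."""
    idx = {"u": 0, "x": 1, "z": 2, "y": 3}
    out = {}
    for key, v in joint.items():
        k = tuple(key[idx[n]] for n in names)
        out[k] = out.get(k, 0.0) + v
    return out

def p_y1_do(k):
    """Ground truth p(y=1 | do(x=k)): truncated factorization, p(x|u) dropped."""
    return sum(p_u[u] * p_z_x[k][z] * p_y_zu[z][u][1]
               for u, z in product(range(2), repeat=2))

def front_door(k):
    """Front-door estimate, using only observational marginals (no U)."""
    p_x, p_xz, p_xzy = marg("x"), marg("xz"), marg("xzy")
    return sum((p_xz[(k, z)] / p_x[(k,)]) *               # p(z | x=k)
               sum((p_xzy[(xp, z, 1)] / p_xz[(xp, z)])    # p(y=1 | z, x')
                   * p_x[(xp,)]                           # p(x')
                   for xp in range(2))
               for z in range(2))

for k in range(2):
    print(k, p_y1_do(k), front_door(k))  # the two columns agree
```

The point of the check: `front_door` never touches `p_u` or `p_y_zu` directly, only marginals of the observed $(x, z, y)$, yet it reproduces the interventional quantity exactly.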
    • TODO finish this. Sources: Introduction to Causal Calculus (ubc.ca); Lies, Damned Lies, and Causal Inference | Keyon Vafa.