R-squared and treatment effect sizes - Erik Ø. Sørensen

I wrote this for another purpose, and thought that I might as well make it public.

For a treatment-control contrast, let’s examine the regression model $y_{i} = α + β x_{i} + ϵ_{i},$ with $\mathrm{var}(ϵ_{i}) = σ^{2}$ and $x_{i}$ being a 0/1 indicator variable for treatment (1) vs control (0). Assume that the proportion of treated units is $p$, so that $\mathrm{var}(x_{i}) = p (1 - p)$. Because treatment is randomized, the OLS estimate of $β$ is consistent, so $\mathrm{var}(\hat{y})$ converges to $β^{2} p (1 - p)$ and $\mathrm{var}(y)$ to $β^{2} p (1 - p) + σ^{2}$. The limit of $R^{2} = \mathrm{var}(\hat{y}) / \mathrm{var}(y)$ is therefore
$\lim_{n \to \infty} R^{2} = \frac{β^{2} p (1 - p)}{β^{2} p (1 - p) + σ^{2}} .$
Expressing the treatment effect in standardized form (Glass's $Δ$), we can write $Δ = β / σ$, and dividing numerator and denominator by $σ^{2}$ gives
$\lim_{n \to \infty} R^{2} = \frac{Δ^{2} p (1 - p)}{Δ^{2} p (1 - p) + 1} .$
If we also assume that the treatment and control arms are equally large ($p = 1 / 2$, which gives us the largest possible $R^{2}$ given the treatment effect), we get $p (1 - p) = 1 / 4$ and
$\lim_{n \to \infty} R^{2} = \frac{Δ^{2}}{Δ^{2} + 4} .$

We can draw this for different values of $Δ$ :
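
A minimal sketch of that curve, tabulated rather than plotted, using only the limit formula derived above. The function name `limit_r2` and the grid of $Δ$ values are illustrative choices made here, not part of the original post.

```python
def limit_r2(delta, p=0.5):
    """Large-sample R^2 for a 0/1 treatment indicator.

    delta: standardized effect beta / sigma.
    p: share of treated units (0.5 means equally large arms).
    """
    explained = delta ** 2 * p * (1 - p)
    return explained / (explained + 1)


# Illustrative grid of effect sizes (an assumption, not taken from the post).
deltas = [0.1, 0.2, 0.5, 0.8, 1.0, 2.0]

print("Delta   limit R^2")
for d in deltas:
    print(f"{d:5.1f}   {limit_r2(d):.4f}")
```

With equally large arms, the formula implies that $Δ$ has to reach 2 before half of the variance is explained.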

We see that even for quite respectable treatment effect sizes, the share of explained variance is small. For a $0.2 σ$ effect ($Δ = 0.2$), $R^{2} \approx 0.04 / 4.04 \approx 0.01$ .
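
As a rough check, the limit can be compared with a simulated experiment. The sketch below is only an illustration under stated assumptions: normal errors, $α = 0$, $σ = 1$ (so $β = Δ$), and an arbitrary sample size and seed. The helper name `simulated_r2` is made up here.

```python
import random


def simulated_r2(delta, n=200_000, p=0.5, seed=0):
    """R^2 from regressing y on a randomized 0/1 treatment indicator.

    Assumes alpha = 0, sigma = 1 and normal errors, so beta equals delta.
    """
    rng = random.Random(seed)
    x = [1.0 if rng.random() < p else 0.0 for _ in range(n)]
    y = [delta * xi + rng.gauss(0.0, 1.0) for xi in x]

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    # With an intercept and a single regressor, the OLS R^2 equals
    # the squared sample correlation between x and y.
    return sxy ** 2 / (sxx * syy)


delta = 0.2
print(f"simulated R^2: {simulated_r2(delta):.4f}")
print(f"limit formula: {delta ** 2 / (delta ** 2 + 4):.4f}")
```

The two printed numbers should be close but not identical, since the simulated value still carries sampling noise at any finite $n$.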