R-squared and treatment effect sizes

I wrote this for another purpose and thought I might as well make it public.

For a treatment-control contrast, let's examine the regression model $y_i = \alpha + \beta x_i + \epsilon_i$, with $\mathrm{var}(\epsilon_i) = \sigma^2$ and $x_i$ being a 0/1 indicator variable for treatment (1) vs control (0). Assume that the proportion of treated units is $p$, so that $\mathrm{var}(x_i) = p(1-p)$. Now, since the OLS estimate is consistent (randomization), the limit of $R^2 = \mathrm{var}(\hat{y})/\mathrm{var}(y)$ can be calculated to be $\lim_{n\to\infty} R^2 = \frac{\beta^2 p(1-p)}{\beta^2 p(1-p) + \sigma^2}$. Expressing the treatment effect in standardized form (Glass $\Delta$), we can write $\Delta = \beta/\sigma$, and then we have $\lim_{n\to\infty} R^2 = \frac{\Delta^2 p(1-p)}{\Delta^2 p(1-p) + 1}$. If we also assume that the treatment and control arms are equally large ($p = 1/2$, which gives us the largest possible $R^2$ given the treatment effect), we get $p(1-p) = 1/4$ and

$$\lim_{n\to\infty} R^2 = \frac{\Delta^2}{\Delta^2 + 4}.$$
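As a rough sanity check on this limit, here is a minimal simulation sketch using only the Python standard library. The function names `limit_r2` and `simulated_r2` are made up for this illustration, and the simulation assumes $\alpha = 0$, $\sigma = 1$ and normally distributed errors, none of which the derivation itself requires.

```python
import random


def limit_r2(delta, p=0.5):
    """Large-sample R^2 for a 0/1 treatment indicator with standardized effect delta."""
    explained = delta**2 * p * (1 - p)
    return explained / (explained + 1)


def simulated_r2(delta, p=0.5, n=200_000, seed=0):
    """R^2 of the OLS fit of y on x in one simulated trial (assumes alpha=0, sigma=1)."""
    rng = random.Random(seed)
    x = [1 if rng.random() < p else 0 for _ in range(n)]
    y = [delta * xi + rng.gauss(0, 1) for xi in x]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    # With a single regressor, R^2 equals the squared sample correlation.
    return sxy**2 / (sxx * syy)


if __name__ == "__main__":
    for delta in (0.2, 0.5, 1.0):
        print(delta, round(limit_r2(delta), 4), round(simulated_r2(delta), 4))
```

With a sample this large, the simulated value should land close to the formula, though the exact digits depend on the seed.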

We can draw this for different values of Δ:

We see that even for treatment effect sizes that are quite respectable, the amount of explained variance is quite limited. For a $0.2\sigma$ effect, $R^2 \approx 0.01$.
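To put numbers on this, the limiting formula can be tabulated for a few effect sizes. This is only an illustrative sketch: the helper name `limit_r2` and the particular grid of $\Delta$ values are my own choices, not part of the derivation.

```python
def limit_r2(delta, p=0.5):
    """Limiting R^2 for standardized effect delta and treated proportion p."""
    explained = delta**2 * p * (1 - p)
    return explained / (explained + 1)


if __name__ == "__main__":
    print("delta  R^2 (p=0.5)  R^2 (p=0.2)")
    for delta in (0.1, 0.2, 0.5, 0.8, 1.0, 2.0):
        # Equal arms (p=0.5) give the largest R^2; an unbalanced design gives less.
        print(f"{delta:5.1f}  {limit_r2(delta):11.4f}  {limit_r2(delta, p=0.2):11.4f}")
```

For $\Delta = 0.2$ with equal arms this gives $0.04/4.04 \approx 0.0099$, and it takes $\Delta = 2$ before the treatment indicator explains half of the variance.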