Regression models for the relative risk

Klaus Kähler Holst

Aug 17, 2019 4 min read

$ \newcommand{\pr}{\mathbb{P}}\newcommand{\E}{\mathbb{E}} $ Relative risks (and risk differences) are collapsible and generally considered easier to interpret than odds-ratios. In a recent publication Richardson et al (JASA, 2017) proposed a new regression model for a binary exposure which solves the computational problems that are associated with using for example binomial regression with a log-link function (or identify link for the risk difference) to obtain such parameter estimates.

Let $Y$ be the binary response, $A$ binary exposure, and $V$ a vector of covariates, then the target parameter is

\begin{align*} &\mathrm{RR}(v) = \frac{\pr(Y=1\mid A=1, V=v)}{\pr(Y=1\mid A=0, V=v)}. \end{align*}

Let $p_a(V) = \pr(Y \mid A=a, V), a\in\{0,1\}$, the idea is then to posit a linear model for \[ \theta(v) = \log \big(RR(v)\big) \] and a nuisance model for the odds-product \[ \phi(v) = \log\left(\frac{p_{0}(v)p_{1}(v)}{(1-p_{0}(v))(1-p_{1}(v))}\right) \] noting that these two parameters are variation independent which can be from the below L’Abbé plot. Similarly, a model can be constructed for the risk-difference on the following scale \[\theta(v) = \mathrm{arctanh} \big(RD(v)\big).\]

p0 <- seq(0,1,length.out=100)
p1 <- function(p0,op) 1/(1+(op*(1-p0)/p0)^-1)
plot(0, type="n", xlim=c(0,1), ylim=c(0,1),
   xlab="P(Y=1|A=0)", ylab="P(Y=1|A=1)", main="Constant odds product")
for (op in exp(seq(-6,6,by=.25))) lines(p0,p1(p0,op), col="lightblue")
logrr <- 0.25; abline(a=0, b=exp(logrr), col=1, lty=2) # Add line \(\log(\mathrm{RR})=0.25\)

The model can be specified in R with lava using the following syntax

library("lava")
m <- lvm(a ~ x+z,
       logrr ~ 1,
       nuisance ~ x + z)
m <- binomial.rr(m, y ~ a | logrr | nuisance)
## Alternative syntax:
## binomial.rr(m, response="y", exposure="a", target.model="logrr", nuisance.model="nuisance")
m


Binomial regression (exposure | relative-risk | odds-product) :

   y ~ a | logrr | nuisance

Latent Variable Model

  a ~ x+z          binomial(logit)
  logrr ~ 1        deterministic
  nuisance ~ x+z   deterministic

Exogenous variables:
  x        gaussian
  z        gaussian

We can then simulate from the model

d <- lava::sim(m, n=5e4, seed=1, p=c(logrr=1))
head(d)


  a          x          z logrr    nuisance y
1 1 -0.6264538  0.5258908     1 -0.10056299 1
2 0  0.1836433 -0.4875444     1 -0.30390103 0
3 0 -0.8356286  1.1382508     1  0.30262217 1
4 1  1.5952808  1.2151344     1  2.81041518 1
5 0  0.3295078 -0.4248307     1 -0.09532293 0
6 0 -0.8204684 -1.4508403     1 -2.27130867 0

A double-robust estimator can be constructed by considering both a model for the exposure as a function of the auxillary covariates as well as a model for the nuisance parameter. This estimator (and other double-robust estimators) can be applied using a newly developed library targeted in R (and python)

library(targeted)
riskreg(y ~ a | 1 | x+z | x+z, data=d, type="rr")

riskreg(formula = y ~ a | 1 | x + z | x + z, data = d, type = "rr")

            Estimate Std.Err   2.5% 97.5% P-value
(Intercept)    1.005 0.01327 0.9787 1.031       0

It is also possible to examine the interaction effect between the exposure and a covariate:

riskreg(y ~ a | x | x+z | x+z, data=d, type="rr")

riskreg(formula = y ~ a | x | x
z | x
z, data = d, type = "rr")

            Estimate Std.Err     2.5%   97.5% P-value
(Intercept) 1.000487 0.01323  0.97456 1.02642  0.0000
x           0.001409 0.01527 -0.02853 0.03134  0.9265

The double-robustness is here illustrated with a misspecified nuissance model (odds-product)

summary(riskreg(y ~ a | 1 | 1 | x+z, data=d, type="rr"))

riskreg(formula = y ~ a | 1 | 1 | x+z, data = d, type = "rr")

Relative risk model
  Response:  y
  Exposure:  a

             Estimate Std.Err     2.5%   97.5%   P-value
log(RR):
 (Intercept) 1.005273 0.01439  0.97707 1.03348 0.000e+00
log(OP):
 (Intercept) 0.142626 0.02187  0.09976 0.18549 6.978e-11
logit(Pr):
 (Intercept) 0.001136 0.01094 -0.02030 0.02257 9.173e-01
 x           1.004365 0.01366  0.97758 1.03115 0.000e+00
 z           1.004225 0.01372  0.97734 1.03111 0.000e+00

or a misspecified model for the exposure

summary(riskreg(y ~ a | 1 | x+z | 1, data=d, type="rr"))

riskreg(formula = y ~ a | 1 | x+z | 1, data = d, type = "rr")

Relative risk model
  Response:  y
  Exposure:  a

              Estimate  Std.Err     2.5%   97.5% P-value
log(RR):
 (Intercept)  1.004493 0.013201  0.97862 1.03037  0.0000
log(OP):
 (Intercept)  0.003391 0.021968 -0.03967 0.04645  0.8773
 x            1.003925 0.023428  0.95801 1.04984  0.0000
 z            1.007712 0.023480  0.96169 1.05373  0.0000
logit(Pr):
 (Intercept) -0.002000 0.008944 -0.01953 0.01553  0.8231