Regression models for the relative risk

\(\newcommand{\pr}{\mathbb{P}}\newcommand{\E}{\mathbb{E}}\) Relative risks (and risk differences) are collapsible

\begin{align*} a \end{align*}

and generally considered easier to interpret than odds-ratios. In a recent publication Richardson et al (JASA, 2017) proposed a new regression model for a binary exposure which solves the computational problems that are associated with using for example binomial regression with a log-link function (or identify link for the risk difference) to obtain such parameter estimates.

Let \(Y\) be the binary response, \(A\) binary exposure, and \(V\) a vector of covariates, then the target parameter is

\begin{align*} &\mathrm{RR}(v) = \frac{\pr(Y=1\mid A=1, V=v)}{\pr(Y=1\mid A=0, V=v)}. \end{align*}

Let \(p_a(V) = \pr(Y \mid A=a, V), a\in{0,1}\), then the idea is to posit a linear model for \[ \theta(v) = \log \big(RR(v)\big) \] and a nuisance model for the odds-product \[ \phi(v) = \log\left(\frac{p_{0}(v)p_{1}(v)}{(1-p_{0}(v))(1-p_{1}(v))}\right) \] noting that these two parameters are variation independent. Similarly, a model can be constructed for the risk-difference on the following scale \[\theta(v) = \mathrm{arctanh} \big(RD(v)\big).\]

p0 <- seq(0,1,length.out=100)
p1 <- function(p0,op) 1/(1+(op*(1-p0)/p0)^-1)
plot(0, type="n", xlim=c(0,1), ylim=c(0,1),
   xlab="P(Y=1|A=0)", ylab="P(Y=1|A=1)", main="Constant odds product")
for (op in exp(seq(-6,6,by=.25))) lines(p0,p1(p0,op), col="lightblue")
logrr <- 0.25; abline(a=0, b=exp(logrr), col=1, lty=2) # Add line \(\log(\mathrm{RR})=0.25\)

The model can be specified in lava using the following syntax

library("lava")
m <- lvm(a ~ x+z,
	 logrr ~ 1,
	 nuisance ~ x + z)
m <- binomial.rr(m, y ~ a | logrr | nuisance)
## m <- binomial.rr(m, response="y",
## 		    exposure="a",
## 		    target.model="logrr",
## 		    nuisance.model="nuisance")
m
Error in binomial.rrw(x, response, exposure, target.model, nuisance.model,  :
  argument "exposure" is missing, with no default
Latent Variable Model

  a ~ x+z          gaussian
  logrr ~ 1        gaussian
  nuisance ~ x+z   gaussian

Exogenous variables:
  x        gaussian
  z        gaussian

We can then simulate from the model

d <- sim(m, n=5e4, seed=1, p=c(logrr=1))

A double-robust estimator can be using the target library

library(target)
riskreg(y ~ a | 1 | x+z | x+z, data=d, type="rr")

Relative risk model
  Response:  y
  Exposure:  a

             Estimate Std.Err     2.5%   97.5% P-value
log(RR):
 (Intercept) 1.004669 0.01327  0.97867 1.03067  0.0000
log(OP):
 (Intercept) 0.003391 0.02197 -0.03967 0.04645  0.8773
 x           1.003925 0.02343  0.95801 1.04984  0.0000
 z           1.007712 0.02348  0.96169 1.05373  0.0000
logit(Pr):
 (Intercept) 0.001136 0.01094 -0.02030 0.02257  0.9173
 x           1.004365 0.01366  0.97758 1.03115  0.0000
 z           1.004225 0.01372  0.97734 1.03111  0.0000
Avatar
Klaus Kähler Holst
Principal Scientist

My research interests include distributed robotics, mobile computing and programmable matter.