Regression models for the relative risk

\(\newcommand{\pr}{\mathbb{P}}\newcommand{\E}{\mathbb{E}}\) Relative risks (and risk differences) are collapsible and generally considered easier to interpret than odds-ratios. In a recent publication Richardson et al (JASA, 2017) proposed a new regression model for a binary exposure which solves the computational problems that are associated with using for example binomial regression with a log-link function (or identify link for the risk difference) to obtain such parameter estimates.

Let \(Y\) be the binary response, \(A\) binary exposure, and \(V\) a vector of covariates, then the target parameter is

\begin{align*} &\mathrm{RR}(v) = \frac{\pr(Y=1\mid A=1, V=v)}{\pr(Y=1\mid A=0, V=v)}. \end{align*}

Let \(p_a(V) = \pr(Y \mid A=a, V), a\in\{0,1\}\), then the idea is to posit a linear model for \[ \theta(v) = \log \big(RR(v)\big) \] and a nuisance model for the odds-product \[ \phi(v) = \log\left(\frac{p_{0}(v)p_{1}(v)}{(1-p_{0}(v))(1-p_{1}(v))}\right) \] noting that these two parameters are variation independent. Similarly, a model can be constructed for the risk-difference on the following scale \[\theta(v) = \mathrm{arctanh} \big(RD(v)\big).\]

p0 <- seq(0,1,length.out=100)
p1 <- function(p0,op) 1/(1+(op*(1-p0)/p0)^-1)
plot(0, type="n", xlim=c(0,1), ylim=c(0,1),
   xlab="P(Y=1|A=0)", ylab="P(Y=1|A=1)", main="Constant odds product")
for (op in exp(seq(-6,6,by=.25))) lines(p0,p1(p0,op), col="lightblue")
logrr <- 0.25; abline(a=0, b=exp(logrr), col=1, lty=2) # Add line \(\log(\mathrm{RR})=0.25\)

The model can be specified in lava using the following syntax

library("lava")
m <- lvm(a ~ x+z,
	 logrr ~ 1,
	 nuisance ~ x + z)
m <- binomial.rr(m, y ~ a | logrr | nuisance)
## m <- binomial.rr(m, response="y",
## 		    exposure="a",
## 		    target.model="logrr",
## 		    nuisance.model="nuisance")
m
Error in binomial.rrw(x, response, exposure, target.model, nuisance.model,  :
  argument "exposure" is missing, with no default
Latent Variable Model

  a ~ x+z          gaussian
  logrr ~ 1        gaussian
  nuisance ~ x+z   gaussian

Exogenous variables:
  x        gaussian
  z        gaussian

We can then simulate from the model

d <- sim(m, n=5e4, seed=1, p=c(logrr=1))

A double-robust estimator can be using the target library

library(target)
riskreg(y ~ a | 1 | x+z | x+z, data=d, type="rr")

Relative risk model
  Response:  y
  Exposure:  a

             Estimate Std.Err     2.5%   97.5% P-value
log(RR):
 (Intercept) 1.004669 0.01327  0.97867 1.03067  0.0000
log(OP):
 (Intercept) 0.003391 0.02197 -0.03967 0.04645  0.8773
 x           1.003925 0.02343  0.95801 1.04984  0.0000
 z           1.007712 0.02348  0.96169 1.05373  0.0000
logit(Pr):
 (Intercept) 0.001136 0.01094 -0.02030 0.02257  0.9173
 x           1.004365 0.01366  0.97758 1.03115  0.0000
 z           1.004225 0.01372  0.97734 1.03111  0.0000
Avatar
Klaus Kähler Holst
Principal Scientist

My primary research interests are general aspects of computational statistics, measurement error models, and causal inference with applications in logistics.

Next