Example: A flexible parametric proportional hazards model
We will use the German Breast Cancer Study Group dataset (see ?rstanarm-datasets for details and references). In brief, the data consist of N = 686 patients with primary node positive breast cancer recruited between 1984 and 1989. The primary response is time to recurrence or death. Median follow-up time was 1084 days. Overall, there were 299 (44%) events and the remaining 387 (56%) individuals were right censored. Our analysis here focuses on a 3-category baseline covariate for cancer prognosis (good/medium/poor).
First, let us load the data and fit the proportional hazards
model
mod1 <- stan_surv(formula = Surv(recyrs, status) ~ group,
                  data    = bcancer,
                  chains  = CHAINS,
                  cores   = CORES,
                  seed    = SEED,
                  iter    = ITER)
The model here is estimated using the default cubic M-splines (with 5
degrees of freedom) for modelling the baseline hazard. Since there are
no time-varying effects in the model (i.e. we did not wrap any
covariates in the tve()
function) there is a closed form
expression for the cumulative hazard and survival function and so the
model is relatively fast to fit. Specifically, the model takes ~3.5 sec
for each MCMC chain based on the default 2000 (1000 warm up, 1000
sampling) MCMC iterations.
We can easily obtain the estimated hazard ratios for the 3-category group covariate using the generic print method for stansurv objects.
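A minimal call would look something like the following (a sketch; the digits argument is optional, and the printed output is not reproduced here)
print(mod1, digits = 3)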
From this output we see that individuals in the Poor or Medium prognosis groups have much higher rates of death relative to the Good prognosis group (as we might expect!). The hazard of death in the Poor prognosis group is approximately 5.0-fold higher than the hazard of death in the Good prognosis group. Similarly, the hazard of death in the Medium prognosis group is approximately 2.3-fold higher than the hazard of death in the Good prognosis group.
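If we would rather summarise the effects directly as hazard ratios, one option is to exponentiate the posterior draws of the coefficients and then summarise them. The sketch below makes two assumptions not shown in this vignette: that as.matrix() can extract the posterior draws from the stansurv object, and that the coefficients for the group covariate have names beginning with "group"; adjust as needed for your fit.
# a rough sketch: posterior median hazard ratios from the log hazard ratio draws
# (assumes as.matrix() works for the fitted object and that the group
#  coefficients have names beginning with "group")
draws   <- as.matrix(mod1)                                # posterior draws
hr_cols <- grep("^group", colnames(draws), value = TRUE)  # group coefficient columns
apply(exp(draws[, hr_cols, drop = FALSE]), 2, median)     # posterior median hazard ratios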
It may also be of interest to compare the different types of baseline hazard we could potentially use. Here, we will fit a series of models, each with a different baseline hazard specification
mod1_exp <- update(mod1, basehaz = "exp")
mod1_weibull <- update(mod1, basehaz = "weibull")
mod1_gompertz <- update(mod1, basehaz = "gompertz")
mod1_bspline <- update(mod1, basehaz = "bs")
mod1_mspline1 <- update(mod1, basehaz = "ms")
mod1_mspline2 <- update(mod1, basehaz = "ms", basehaz_ops = list(df = 10))
and then plot the baseline hazards with 95% posterior uncertainty limits using the generic plot method for stansurv objects (note that the default plot for stansurv objects is the estimated baseline hazard). We will write a little helper function to adjust the y-axis limits, add a title, and centre the title on each plot, as follows
library(ggplot2)
plotfun <- function(model, title) {
  plot(model, plotfun = "basehaz") +                # plot baseline hazard
    coord_cartesian(ylim = c(0, 0.4)) +             # adjust y-axis limits
    labs(title = title) +                           # add plot title
    theme(plot.title = element_text(hjust = 0.5))   # centre plot title
}
p_exp <- plotfun(mod1_exp, title = "Exponential")
p_weibull <- plotfun(mod1_weibull, title = "Weibull")
p_gompertz <- plotfun(mod1_gompertz, title = "Gompertz")
p_bspline <- plotfun(mod1_bspline, title = "B-splines with df = 5")
p_mspline1 <- plotfun(mod1_mspline1, title = "M-splines with df = 5")
p_mspline2 <- plotfun(mod1_mspline2, title = "M-splines with df = 10")
bayesplot::bayesplot_grid(p_exp,
                          p_weibull,
                          p_gompertz,
                          p_bspline,
                          p_mspline1,
                          p_mspline2,
                          grid_args = list(ncol = 3))
We can also compare the fit of these models using the
loo
method for stansurv
objects
loo_compare(loo(mod1_exp),
            loo(mod1_weibull),
            loo(mod1_gompertz),
            loo(mod1_bspline),
            loo(mod1_mspline1),
            loo(mod1_mspline2))
where we see that models with a flexible parametric (spline-based) baseline hazard fit the data best, followed by the standard parametric (Weibull, Gompertz, exponential) models. Roughly speaking, the B-spline
and M-spline models seem to fit the data equally well since the
differences in elpd
or looic
between the
models are very small relative to their standard errors. Moreover,
increasing the degrees of freedom for the M-splines from 5 to 10 doesn’t
seem to improve the fit (that is, the default degrees of freedom
df = 5
seems to provide sufficient flexibility to model the
baseline hazard).
After fitting the survival model, we often want to estimate the predicted survival function for individuals with different covariate patterns. Here, let us estimate the predicted survival function between 0 and 5 years for an individual in each of the prognostic groups. To do this, we can use the posterior_survfit method for stansurv objects, and its associated plot method. First let us construct the prediction (covariate) data
nd <- data.frame(group = c("Good", "Medium", "Poor"))
head(nd)
and then we will generate the posterior predictions
ps <- posterior_survfit(mod1, newdata = nd, times = 0, extrapolate = TRUE,
                        control = list(edist = 5))
head(ps)
Here we note that the id variable in the data frame of posterior predictions identifies which row of newdata the predictions correspond to. For demonstration purposes we have also shown a couple of other arguments in the posterior_survfit call, namely
- the times = 0 argument says that we want to predict at time = 0 (i.e. baseline) for each individual in the newdata (this is the default anyway)
- the extrapolate = TRUE argument says that we want to extrapolate forward from time 0 (this is also the default)
- the control = list(edist = 5) argument controls the extrapolation; it says to extrapolate the survival function forward from time 0 for a distance of 5 time units (the default would have been to extrapolate as far as the largest event or censoring time in the estimation dataset, which is 7.28 years in the bcancer data).
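Since times = 0 and extrapolate = TRUE are the defaults, a call that relies on them would give the same predictions; a minimal sketch (ps2 is a hypothetical object name) is
# equivalent to the call above, relying on the defaults for times and extrapolate
ps2 <- posterior_survfit(mod1, newdata = nd, control = list(edist = 5))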
Let us now plot the survival predictions. We will relabel the
id
variable with meaningful labels identifying the
covariate profile of each new individual in our prediction data
panel_labels <- c('1' = "Good", '2' = "Medium", '3' = "Poor")
plot(ps) +
  ggplot2::facet_wrap(~ id, labeller = ggplot2::labeller(id = panel_labels))
We can see from the plot that predicted survival is worst for patients with a Poor prognosis, and best for patients with a Good prognosis, as we would expect based on our previous model estimates.
Alternatively, if we wanted to obtain and plot the predicted
hazard function for each individual in our new data (instead of
their survival function), then we just need to specify
type = "haz"
in our posterior_survfit
call
(the default is type = "surv"
), as follows
ph <- posterior_survfit(mod1, newdata = nd, type = "haz")
plot(ph) +
  ggplot2::facet_wrap(~ id, labeller = ggplot2::labeller(id = panel_labels))
We can quite clearly see the assumption of proportional hazards in the plot. We can also see that the hazard is highest in the Poor prognosis group (i.e. worst survival) and lowest in the Good prognosis group (i.e. best survival). This corresponds to what we saw in the plot of the survival functions previously.
Example: Non-proportional hazards modelled using B-splines
To demonstrate the implementation of time-varying effects in stan_surv we will use a simulated dataset, generated using the simsurv package (Brilleman, 2018).
We will simulate a dataset with N = 200 individuals, with event times generated under the following Weibull hazard function
  hi(t) = γ λ t^(γ − 1) exp(β(t) Xi)
with scale parameter λ = 0.1, shape parameter γ = 1.5, binary baseline covariate Xi ∼ Bern(0.5), and time-varying log hazard ratio β(t) = −0.5 + 0.2t. We will enforce administrative censoring at 5 years if an individual’s simulated event time is >5 years.
# load package
library(simsurv)
# set seed for reproducibility
set.seed(999111)
# simulate covariate data
covs <- data.frame(id  = 1:200,
                   trt = rbinom(200, 1L, 0.5))
# simulate event times
dat <- simsurv(lambdas = 0.1,
               gammas  = 1.5,
               betas   = c(trt = -0.5),
               tde     = c(trt = 0.2),
               x       = covs,
               maxt    = 5)
# merge covariate data and event times
dat <- merge(dat, covs)
# examine first few rows of data
head(dat)
Now that we have our simulated dataset, let us fit a model with a time-varying hazard ratio for trt
mod2 <- stan_surv(formula = Surv(eventtime, status) ~ tve(trt),
                  data    = dat,
                  chains  = CHAINS,
                  cores   = CORES,
                  seed    = SEED,
                  iter    = ITER)
The tve function is used in the model formula to state that we want a time-varying effect (i.e. a time-varying coefficient) to be estimated for the variable trt. By default, a cubic B-spline basis with 3 degrees of freedom (i.e. two boundary knots placed at the limits of the range of event times, but no internal knots) is used for modelling the time-varying log hazard ratio. If we want to change the degree, knot locations, or degrees of freedom for the B-spline function, we can specify additional arguments to the tve function.
For example, to model the time-varying log hazard ratio using
quadratic B-splines with 4 degrees of freedom (i.e. two boundary knots
placed at the limits of the range of event times, as well as two
internal knots placed – by default – at the 33.3rd and 66.6th
percentiles of the distribution of uncensored event times) we could
specify the model formula as
Surv(eventtime, status) ~ tve(trt, df = 4, degree = 2)
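For example, this alternative specification could be fitted by updating mod2; a minimal sketch (the hypothetical mod2_quad object is not used in the remainder of this example, which continues with mod2) is
# refit the model using quadratic B-splines with 4 degrees of freedom for the
# time-varying effect (a sketch; not used further below)
mod2_quad <- update(mod2,
                    formula. = Surv(eventtime, status) ~ tve(trt, df = 4, degree = 2))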
Let us now plot the estimated time-varying hazard ratio from the
fitted model. We can do this using the generic plot
method
for stansurv
objects, for which we can specify the
plotfun = "tve"
argument. (Note that in this case, there is
only one covariate in the model with a time-varying effect, but if there
were others, we could specify which covariate(s) we want to plot the
time-varying effect for by specifying the pars
argument to
the plot
call).
plot(mod2, plotfun = "tve")
From the plot, we can see how the hazard ratio (i.e. the effect of
treatment on the hazard of the event) changes as a function of time. The
treatment appears to be protective during the first few years following
baseline (i.e. HR < 1), and then the treatment appears to become
harmful after about 4 years post-baseline. Thankfully, this is a
reflection of the model we simulated under!
The plot shows a large amount of uncertainty around the estimated
time-varying hazard ratio. This is to be expected, since we only
simulated a dataset of 200 individuals of which only around 70%
experienced the event before being censored at 5 years. So, there is
very little data (i.e. very few events) with which to reliably estimate
the time-varying hazard ratio. We can also see this reflected in the
differences between our data generating model and the estimates from our
fitted model. In our data generating model, the time-varying hazard
ratio equals 1 (i.e. the log hazard ratio equals 0) at 2.5 years, but in
our fitted model the median estimate for our time-varying hazard ratio
equals 1 at around 3 years. This is a reflection of the large amount of
sampling error, due to our simulated dataset being so small.
Example: Non-proportional hazards modelled using a piecewise constant function
In the previous example we showed how non-proportional hazards can be modelled by using a smooth B-spline function for the time-varying log hazard ratio. This is the default approach when the tve function is used to estimate a time-varying effect for a covariate in the model formula. However, another approach for modelling a time-varying log hazard ratio is to use a piecewise constant function. If we want to use a piecewise constant function for the time-varying log hazard ratio (instead of the smooth B-spline function) then we just have to specify additional arguments to the tve function (as shown in the example below, where we set degree = 0).
We will again simulate some survival data using the simsurv package to show how a piecewise constant hazard ratio can be estimated using stan_surv.
Similar to the previous example, we will simulate a dataset with N = 500 individuals, with event times generated under a Weibull hazard function with scale parameter λ = 0.1, shape parameter γ = 1.5, and binary baseline covariate Xi ∼ Bern(0.5). However, in this example our time-varying log hazard ratio will be defined as β(t) = −0.5 + 0.7 × I(t > 2.5), where I(X) is the indicator function taking the value 1 if X is true and 0 otherwise. This corresponds to a piecewise constant log hazard ratio with just two “pieces” or time intervals. The first time interval is [0, 2.5], during which the true hazard ratio is exp(−0.5) = 0.61. The second time interval is (2.5, ∞), during which the true hazard ratio is exp(−0.5 + 0.7) = 1.22. Our example uses only two time intervals for simplicity, but in general we could easily have considered more (although it would have required a couple of additional lines of code to simulate the data). We will again enforce administrative censoring at 5 years if an individual’s simulated event time is >5 years.
# load package
library(simsurv)
# set seed for reproducibility
set.seed(888222)
# simulate covariate data
covs <- data.frame(id  = 1:500,
                   trt = rbinom(500, 1L, 0.5))
# simulate event times
dat <- simsurv(lambdas = 0.1,
               gammas  = 1.5,
               betas   = c(trt = -0.5),
               tde     = c(trt = 0.7),
               tdefun  = function(t) (t > 2.5),
               x       = covs,
               maxt    = 5)
# merge covariate data and event times
dat <- merge(dat, covs)
# examine first few rows of data
head(dat)
We now estimate a model with a piecewise constant time-varying effect
for the covariate trt
as
mod3 <- stan_surv(formula = Surv(eventtime, status) ~
                    tve(trt, degree = 0, knots = 2.5),
                  data    = dat,
                  chains  = CHAINS,
                  cores   = CORES,
                  seed    = SEED,
                  iter    = ITER)
This time we specify some additional arguments to the tve function, so that our time-varying effect corresponds to the true data generating model used to simulate our event times. Specifically, degree = 0 says that we want the time-varying effect (i.e. the time-varying log hazard ratio) to be estimated using a piecewise constant function, and knots = 2.5 says that we want a single internal knot placed at time t = 2.5.
We can again use the generic plot
function with argument
plotfun = "tve"
to examine our estimated hazard ratio for
treatment
plot(mod3, plotfun = "tve")
Here we see that the estimated hazard ratio reasonably reflects our
true data generating model (i.e. a hazard ratio of ≈ 0.6 during the first time interval and a
hazard ratio of ≈ 1.2 during the
second time interval) although there is a slight discrepancy due to the
sampling variation in the simulated event times.
Example: Hierarchical survival models
To demonstrate the estimation of a hierarchical model for survival data in stan_surv we will use the frail dataset (see help("rstanarm-datasets") for a description). The frail dataset contains simulated event times for 200 patients clustered within 20 hospital sites (10 patients per hospital site). The event times are simulated from a parametric proportional hazards model under the following assumptions: (i) a constant (i.e. exponential) baseline hazard rate of 0.1; (ii) a fixed treatment effect with log hazard ratio of 0.3; and (iii) a site-specific random intercept (specified on the log hazard scale) drawn from a N(0,1) distribution.
Let’s look at the first few rows of the data:
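A call along these lines (a sketch; it assumes rstanarm, and hence the frail data frame, has been loaded) would show them
head(frail)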
To fit a hierarchical model for clustered survival data we use a formula syntax similar to what is used in the lme4 R package (Bates et al., 2015). Let’s consider the following model (which aligns with the model used to generate the simulated data):
mod_randint <- stan_surv(
  formula = Surv(eventtime, status) ~ trt + (1 | site),
  data    = frail,
  basehaz = "exp",
  chains  = CHAINS,
  cores   = CORES,
  seed    = SEED,
  iter    = ITER)
The model contains a baseline covariate for treatment (0 or 1) as well as a site-specific intercept to allow for correlation in the event times for patients from the same site. We’ve called the model object mod_randint to denote the fact that it includes a site-specific (random) intercept. Let’s examine the parameter estimates from the model:
print(mod_randint, digits = 2)
We see that the estimated log hazard ratio for treatment (β̂_trt = 0.46) is a bit larger than the “true” log hazard ratio used in the data generating model (β_trt = 0.3). The estimated baseline hazard rate is exp(−2.3716) = 0.093, which is pretty close to the baseline hazard rate used in the data generating model (0.1). Of course, the differences between the estimated parameters and the true parameters from the data generating model are attributable to sampling noise.
If this were a real analysis, we might wonder whether the site-specific intercepts are necessary! Well, we can assess that by fitting an alternative model that does not include the site-specific intercepts and comparing it to the model we just estimated. We will compare the two models using the loo function. We first need to fit the model without the site-specific intercepts. To do this, we will just use the generic update method for stansurv objects, since all we are changing is the model formula:
mod_fixed <- update(mod_randint, formula. = Surv(eventtime, status) ~ trt)
Let’s calculate the loo
for both these models and
compare them:
loo_fixed <- loo(mod_fixed)
loo_randint <- loo(mod_randint)
loo_compare(loo_fixed, loo_randint)
We see strong evidence in favour of the model with the site-specific
intercepts!
But let’s not quite finish there. What if we want to generalise the random effects structure further? For instance, is the site-specific intercept enough? Perhaps we should consider estimating both a site-specific intercept and a site-specific treatment effect. We have minimal data to estimate such a model (recall that there are only 20 sites and 10 patients per site) but for the sake of demonstration we will forge on nonetheless. Let’s fit a model with both a site-specific intercept and a site-specific coefficient for the covariate trt (i.e. treatment):
mod_randtrt <- update(mod_randint, formula. =
                        Surv(eventtime, status) ~ trt + (trt | site))
print(mod_randtrt, digits = 2)
We see that we have an estimated standard deviation for the site-specific intercepts and for the site-specific coefficients for trt, as well as the estimated correlation between those site-specific parameters.
Let’s now compare all three of these models based on loo:
loo_randtrt <- loo(mod_randtrt)
loo_compare(loo_fixed, loo_randint, loo_randtrt)
It appears that the model with just a site-specific intercept is the
best fitting model. It is much better than the model without a
site-specific intercept, and slightly better than the model with both a
site-specific intercept and a site-specific treatment effect. In other
words, including a site-specific intercept appears important, but
including a site-specific treatment effect is not. This conclusion is
reassuring, because it aligns with the data generating model we used to
simulate the data!