---
title: "Writing Stan programs for use with the loo package"
author: "Aki Vehtari and Jonah Gabry"
date: "`r Sys.Date()`"
output:
  html_vignette:
    toc: yes
params:
  EVAL: !r identical(Sys.getenv("NOT_CRAN"), "true")
---
<!--
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{Writing Stan programs for use with the loo package}
-->

```{r settings, child="children/SETTINGS-knitr.txt"}
```

```{r, child="children/SEE-ONLINE.txt", eval = if (isTRUE(exists("params"))) !params$EVAL else TRUE}
```

# Introduction

This vignette demonstrates how to write a Stan program that computes and stores
the pointwise log-likelihood required for using the __loo__ package. The other
vignettes included with the package demonstrate additional functionality.

Some sections from this vignette are excerpted from our papers

* Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. _Statistics and Computing_. 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4. Links: [published](https://link.springer.com/article/10.1007/s11222-016-9696-4) | [arXiv preprint](https://arxiv.org/abs/1507.04544).

* Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).
Pareto smoothed importance sampling.  *Journal of Machine Learning Research*,
25(72):1-58. [PDF](https://jmlr.org/papers/v25/19-556.html)

which provide important background for understanding the methods implemented in
the package.


# Example: Well water in Bangladesh

This example comes from a survey of residents from a small area in Bangladesh
that was affected by arsenic in drinking water. Respondents with elevated
arsenic levels in their wells were asked if they were interested in getting
water from a neighbor's well, and a series of logistic regressions were fit to
predict this binary response given various information about the households
(Gelman and Hill, 2007). Here we fit a model for the well-switching response
given two predictors: the arsenic level of the water in the resident's home, and
the distance of the house from the nearest safe well.

The sample size in this example is $N=3020$, which is not huge but is large
enough that it is important to have a computational method for LOO that is fast
for each data point. On the plus side, with such a large dataset, the influence
of any given observation is small, and so the computations should be stable.

## Coding the Stan model

Here is the Stan code for fitting the logistic regression model, which 
we save in a file called `logistic.stan`:

```
// Note: some syntax used in this program requires RStan >= 2.26 (or CmdStanR)
// To use an older version of RStan change the line declaring `y` to:
//    int<lower=0,upper=1> y[N];
data {
  int<lower=0> N;                   // number of data points
  int<lower=0> P;                   // number of predictors (including intercept)
  matrix[N,P] X;                    // predictors (including 1s for intercept)
  array[N] int<lower=0,upper=1> y;  // binary outcome
}
parameters {
  vector[P] beta;
}
model {
  beta ~ normal(0, 1);
  y ~ bernoulli_logit(X * beta);
}
generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = bernoulli_logit_lpmf(y[n] | X[n] * beta);
  }
}
```

We have defined the log likelihood as a vector named `log_lik` in the generated 
quantities block so that the individual terms will be saved by Stan. After 
running Stan, `log_lik` can be extracted (using the `extract_log_lik` function
provided in the **loo** package) as an $S \times N$ matrix, where $S$ is the 
number of simulations (posterior draws) and $N$ is the number of data points.

## Fitting the model with RStan

Next we fit the model in Stan using the **rstan** package:

```{r, eval=FALSE}
library("rstan")

# Prepare data 
url <- "http://stat.columbia.edu/~gelman/arm/examples/arsenic/wells.dat"
wells <- read.table(url)
wells$dist100 <- with(wells, dist / 100)
X <- model.matrix(~ dist100 + arsenic, wells)
standata <- list(y = wells$switch, X = X, N = nrow(X), P = ncol(X))

# Fit model
fit_1 <- stan("logistic.stan", data = standata)
print(fit_1, pars = "beta")
```

```
         mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
beta[1]  0.00       0 0.08 -0.16 -0.05  0.00  0.05  0.15  1964    1
beta[2] -0.89       0 0.10 -1.09 -0.96 -0.89 -0.82 -0.68  2048    1
beta[3]  0.46       0 0.04  0.38  0.43  0.46  0.49  0.54  2198    1
```

## Computing approximate leave-one-out cross-validation using PSIS-LOO

We can then use the **loo** package to compute the efficient PSIS-LOO
approximation to exact LOO-CV:

```{r, eval=FALSE}
library("loo")

# Extract pointwise log-likelihood
# using merge_chains=FALSE returns an array, which is easier to 
# use with relative_eff()
log_lik_1 <- extract_log_lik(fit_1, merge_chains = FALSE)

# as of loo v2.0.0 we can optionally provide relative effective sample sizes
# when calling loo, which allows for better estimates of the PSIS effective
# sample sizes and Monte Carlo error
r_eff <- relative_eff(exp(log_lik_1), cores = 2) 

# preferably use more than 2 cores (as many cores as possible)
# will use value of 'mc.cores' option if cores is not specified
loo_1 <- loo(log_lik_1, r_eff = r_eff, cores = 2)
print(loo_1)
```

```
Computed from 4000 by 3020 log-likelihood matrix

         Estimate   SE
elpd_loo  -1968.5 15.6
p_loo         3.2  0.1
looic      3937.0 31.2
------
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.3]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

The printed output from the `loo` function shows the estimates
$\widehat{\mbox{elpd}}_{\rm loo}$ (expected log predictive density),
$\widehat{p}_{\rm loo}$ (effective number of parameters), and ${\rm looic} =-2\,
\widehat{\mbox{elpd}}_{\rm loo}$ (the LOO information criterion).

The line at the bottom of the printed output provides information about the
reliability of the LOO approximation (the interpretation of the $k$ parameter is
explained in `help('pareto-k-diagnostic')` and in greater detail
in Vehtari, Simpson, Gelman, Yao, and Gabry (2019)). In this case the message tells us that
all of the estimates for $k$ are fine.

## Comparing models

To compare this model to an alternative model for the same data we can use the 
`loo_compare` function in the **loo** package. First we'll fit a second model to the
well-switching data, using `log(arsenic)` instead of `arsenic` as a predictor:

```{r, eval=FALSE}
standata$X[, "arsenic"] <- log(standata$X[, "arsenic"])
fit_2 <- stan(fit = fit_1, data = standata) 

log_lik_2 <- extract_log_lik(fit_2, merge_chains = FALSE)
r_eff_2 <- relative_eff(exp(log_lik_2))
loo_2 <- loo(log_lik_2, r_eff = r_eff_2, cores = 2)
print(loo_2)
```

```
Computed from 4000 by 3020 log-likelihood matrix

         Estimate   SE
elpd_loo  -1952.3 16.2
p_loo         3.1  0.1
looic      3904.6 32.4
------
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.2]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

We can now compare the models on LOO using the `loo_compare` function:

```{r, eval=FALSE}
# Compare
comp <- loo_compare(loo_1, loo_2)
```

This new object, `comp`, contains the estimated difference of expected 
leave-one-out prediction errors between the two models, along with the standard 
error:

```{r, eval=FALSE}
print(comp) # can set simplify=FALSE for more detailed print output
```
```
       elpd_diff se_diff
model2   0.0       0.0  
model1 -16.3       4.4  
```

The first column shows the difference in ELPD relative to the model with 
the largest ELPD. In this case, the difference in `elpd` and its scale relative 
to the approximate standard error of the difference) indicates a 
preference for the second model (`model2`).


# References

Gelman, A., and Hill, J. (2007).  *Data Analysis Using Regression and Multilevel Hierarchical Models.*  Cambridge University Press.

Stan Development Team (2017). _The Stan C++ Library, Version 2.17.0._   https://mc-stan.org/

Stan Development Team (2018) _RStan: the R interface to Stan, Version 2.17.3._   https://mc-stan.org/

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model
evaluation using leave-one-out cross-validation and WAIC. _Statistics and
Computing_. 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4.
[online](https://link.springer.com/article/10.1007/s11222-016-9696-4), 
[arXiv preprint arXiv:1507.04544](https://arxiv.org/abs/1507.04544).

Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).
Pareto smoothed importance sampling.  *Journal of Machine Learning Research*,
25(72):1-58. [PDF](https://jmlr.org/papers/v25/19-556.html)