--- title: "Writing Stan programs for use with the loo package" author: "Aki Vehtari and Jonah Gabry" date: "`r Sys.Date()`" output: html_vignette: toc: yes params: EVAL: !r identical(Sys.getenv("NOT_CRAN"), "true") --- ```{r settings, child="children/SETTINGS-knitr.txt"} ``` ```{r, child="children/SEE-ONLINE.txt", eval = if (isTRUE(exists("params"))) !params$EVAL else TRUE} ``` # Introduction This vignette demonstrates how to write a Stan program that computes and stores the pointwise log-likelihood required for using the __loo__ package. The other vignettes included with the package demonstrate additional functionality. Some sections from this vignette are excerpted from our papers * Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. _Statistics and Computing_. 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4. Links: [published](https://link.springer.com/article/10.1007/s11222-016-9696-4) | [arXiv preprint](https://arxiv.org/abs/1507.04544). * Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. *Journal of Machine Learning Research*, 25(72):1-58. [PDF](https://jmlr.org/papers/v25/19-556.html) which provide important background for understanding the methods implemented in the package. # Example: Well water in Bangladesh This example comes from a survey of residents from a small area in Bangladesh that was affected by arsenic in drinking water. Respondents with elevated arsenic levels in their wells were asked if they were interested in getting water from a neighbor's well, and a series of logistic regressions were fit to predict this binary response given various information about the households (Gelman and Hill, 2007). Here we fit a model for the well-switching response given two predictors: the arsenic level of the water in the resident's home, and the distance of the house from the nearest safe well. The sample size in this example is $N=3020$, which is not huge but is large enough that it is important to have a computational method for LOO that is fast for each data point. On the plus side, with such a large dataset, the influence of any given observation is small, and so the computations should be stable. ## Coding the Stan model Here is the Stan code for fitting the logistic regression model, which we save in a file called `logistic.stan`: ``` // Note: some syntax used in this program requires RStan >= 2.26 (or CmdStanR) // To use an older version of RStan change the line declaring `y` to: // int y[N]; data { int N; // number of data points int P; // number of predictors (including intercept) matrix[N,P] X; // predictors (including 1s for intercept) array[N] int y; // binary outcome } parameters { vector[P] beta; } model { beta ~ normal(0, 1); y ~ bernoulli_logit(X * beta); } generated quantities { vector[N] log_lik; for (n in 1:N) { log_lik[n] = bernoulli_logit_lpmf(y[n] | X[n] * beta); } } ``` We have defined the log likelihood as a vector named `log_lik` in the generated quantities block so that the individual terms will be saved by Stan. After running Stan, `log_lik` can be extracted (using the `extract_log_lik` function provided in the **loo** package) as an $S \times N$ matrix, where $S$ is the number of simulations (posterior draws) and $N$ is the number of data points. 
## Computing approximate leave-one-out cross-validation using PSIS-LOO

We can then use the **loo** package to compute the efficient PSIS-LOO approximation to exact LOO-CV:

```{r, eval=FALSE}
library("loo")

# Extract pointwise log-likelihood
# using merge_chains=FALSE returns an array, which is easier to
# use with relative_eff()
log_lik_1 <- extract_log_lik(fit_1, merge_chains = FALSE)

# as of loo v2.0.0 we can optionally provide relative effective sample sizes
# when calling loo, which allows for better estimates of the PSIS effective
# sample sizes and Monte Carlo error
r_eff <- relative_eff(exp(log_lik_1), cores = 2)

# preferably use more than 2 cores (as many cores as possible)
# will use value of 'mc.cores' option if cores is not specified
loo_1 <- loo(log_lik_1, r_eff = r_eff, cores = 2)
print(loo_1)
```

```
Computed from 4000 by 3020 log-likelihood matrix

         Estimate   SE
elpd_loo  -1968.5 15.6
p_loo         3.2  0.1
looic      3937.0 31.2
------
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.3]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

The printed output from the `loo` function shows the estimates $\widehat{\mbox{elpd}}_{\rm loo}$ (expected log predictive density), $\widehat{p}_{\rm loo}$ (effective number of parameters), and ${\rm looic} = -2\, \widehat{\mbox{elpd}}_{\rm loo}$ (the LOO information criterion).

The line at the bottom of the printed output provides information about the reliability of the LOO approximation (the interpretation of the $k$ parameter is explained in `help('pareto-k-diagnostic')` and in greater detail in Vehtari, Simpson, Gelman, Yao, and Gabry (2024)). In this case the message tells us that all of the estimates for $k$ are fine.
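Nothing more needs to be done here, but when the message flags problematic $k$ estimates, the **loo** package provides helper functions for locating the offending observations. A brief sketch (the 0.7 threshold matches the one in the printed message above):

```{r, eval=FALSE}
plot(loo_1)                           # plot the Pareto k estimate for each observation
pareto_k_table(loo_1)                 # counts of k estimates by diagnostic range
pareto_k_ids(loo_1, threshold = 0.7)  # indices of observations with k > 0.7
```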
## Comparing models

To compare this model to an alternative model for the same data we can use the `loo_compare` function in the **loo** package. First we'll fit a second model to the well-switching data, using `log(arsenic)` instead of `arsenic` as a predictor:

```{r, eval=FALSE}
standata$X[, "arsenic"] <- log(standata$X[, "arsenic"])
fit_2 <- stan(fit = fit_1, data = standata)
log_lik_2 <- extract_log_lik(fit_2, merge_chains = FALSE)
r_eff_2 <- relative_eff(exp(log_lik_2))
loo_2 <- loo(log_lik_2, r_eff = r_eff_2, cores = 2)
print(loo_2)
```

```
Computed from 4000 by 3020 log-likelihood matrix

         Estimate   SE
elpd_loo  -1952.3 16.2
p_loo         3.1  0.1
looic      3904.6 32.4
------
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.2]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

We can now compare the models on LOO using the `loo_compare` function:

```{r, eval=FALSE}
# Compare
comp <- loo_compare(loo_1, loo_2)
```

This new object, `comp`, contains the estimated difference of expected leave-one-out prediction errors between the two models, along with the standard error:

```{r, eval=FALSE}
print(comp) # can set simplify=FALSE for more detailed print output
```

```
       elpd_diff se_diff
model2   0.0       0.0
model1 -16.3       4.4
```

The first column shows the difference in ELPD relative to the model with the largest ELPD. In this case, the difference in `elpd` and its magnitude relative to the approximate standard error of the difference indicate a preference for the second model (`model2`).

# References

Gelman, A., and Hill, J. (2007). *Data Analysis Using Regression and Multilevel/Hierarchical Models.* Cambridge University Press.

Stan Development Team (2017). _The Stan C++ Library, Version 2.17.0._ https://mc-stan.org/

Stan Development Team (2018). _RStan: the R interface to Stan, Version 2.17.3._ https://mc-stan.org/

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. _Statistics and Computing_. 27(5), 1413--1432. doi:10.1007/s11222-016-9696-4. [online](https://link.springer.com/article/10.1007/s11222-016-9696-4), [arXiv preprint arXiv:1507.04544](https://arxiv.org/abs/1507.04544).

Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. _Journal of Machine Learning Research_, 25(72):1--58. [PDF](https://jmlr.org/papers/v25/19-556.html)