--- title: "Simulation Based Calibration" author: "Stan Development Team" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Simulation Based Calibration} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r setup, include = FALSE} library(rstan) knitr::opts_chunk$set(eval = .Platform$OS.type != "windows") ``` Here is a Stan program for a beta-binomial model ```{stan output.var="beta_binomial", eval = FALSE} data { int N; real a; real b; } transformed data { // these adhere to the conventions above real pi_ = beta_rng(a, b); int y = binomial_rng(N, pi_); } parameters { real pi; } model { target += beta_lpdf(pi | a, b); target += binomial_lpmf(y | N, pi); } generated quantities { // these adhere to the conventions above int y_ = y; vector[1] pars_; int ranks_[1] = {pi > pi_}; vector[N] log_lik; pars_[1] = pi_; for (n in 1:y) log_lik[n] = bernoulli_lpmf(1 | pi); for (n in (y + 1):N) log_lik[n] = bernoulli_lpmf(0 | pi); } ``` Notice that it adheres to the following conventions: * Realizations of the unknown parameters are drawn in the `transformed data` block are postfixed with an underscore, such as `pi_`. These are considered the "true" parameters being estimated by the corresponding symbol declared in the `parameters` block, which have the same names except for the trailing underscore, such as `pi`. * The realizations of the unknown parameters are then conditioned on when drawing from the prior predictive distribution in `transformed data` block, which in this case is `int y = binomial_rng(N, pi_);`. To avoid confusion, `y` does not have a training underscore. * The realizations of the unknown parameters are copied into a `vector` in the `generated quantities` block named `pars_` * The realizations from the prior predictive distribution are copied into an object (of the same type) in the `generated quantities` block named `y_. This is optional. * The `generated quantities` block contains an integer array named `ranks_` whose only values are zero or one, depending on whether the realization of a parameter from the posterior distribution exceeds the corresponding "true" realization, which in this case is `ranks_[1] = {pi > pi_};`. These are not actually "ranks" but can be used afterwards to reconstruct (thinned) ranks. * The `generated quantities` block contains a vector named `log_lik` whose values are the contribution to the log-likelihood by each observation. In this case, the "observations" are the implicit successes and failures that are aggregated into a binomial likelihood. This is optional but facilitates calculating the Pareto k shape parameter estimates that indicate whether the posterior distribution is sensitive to particular observations. Assuming the above is compile to a code `stanmodel` named `beta_binomial`, we can then call the `sbc` function ```{r, eval = FALSE} output <- sbc(beta_binomial, data = list(N = 10, a = 1, b = 1), M = 500, refresh = 0) ``` ```{r, include = FALSE} # This fakes what would happen if we actually took the time to run Stan. N <- 10 M <- 500 pars_ <- rbeta(M, 1, 1) y_ <- matrix(rbinom(pars_, size = N, prob = pars_), ncol = M) post_ <- matrix(rbeta(M * 1000L, 1 + y_, 1 + N - y_), ncol = M) ranks_ <- lapply(1:M, FUN = function(m) { matrix(post_[ , m] > pars_[m], ncol = 1, dimnames = list(NULL, "pi")) }) log_lik <- t(sapply(1:M, FUN = function(m) { c(dbinom(rep(1, y_[m]), size = 1, prob = pars_[m], log = TRUE), dbinom(rep(1, N - y_[m]), size = 1, prob = pars_[m], log = TRUE)) })) sampler_params <- array(0, dim = c(1000, 6, M)) colnames(sampler_params) <- c("accept_stat__", "stepsize__", "treedepth__", "n_leapfrog__", "divergent__", "energy__") output <- list(ranks = ranks_, Y = y_, pars = pars_, log_lik = log_lik, sampler_params = sampler_params) class(output) <- "sbc" ``` At which point, we can then call ```{r} print(output) plot(output, bins = 10) # it is best to specify the bins argument yourself ``` # References Talts, S., Betancourt, M., Simpson, D., Vehtari, A., and Gelman, A. (2018). Validating Bayesian Inference Algorithms with Simulation-Based Calibration. [arXiv preprint arXiv:1804.06788](https://arxiv.org/abs/1804.06788)