Title:  Efficient LeaveOneOut CrossValidation and WAIC for Bayesian Models 

Description:  Efficient approximate leaveoneout crossvalidation (LOO) for Bayesian models fit using Markov chain Monte Carlo, as described in Vehtari, Gelman, and Gabry (2017) <doi:10.1007/s1122201696964>. The approximation uses Pareto smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. As a byproduct of the calculations, we also obtain approximate standard errors for estimated predictive errors and for the comparison of predictive errors between models. The package also provides methods for using stacking and other model weighting techniques to average Bayesian predictive distributions. 
Authors:  Aki Vehtari [aut], Jonah Gabry [cre, aut], Måns Magnusson [aut], Yuling Yao [aut], PaulChristian Bürkner [aut], Topi Paananen [aut], Andrew Gelman [aut], Ben Goodrich [ctb], Juho Piironen [ctb], Bruno Nicenboim [ctb], Leevi Lindgren [ctb] 
Maintainer:  Jonah Gabry <[email protected]> 
License:  GPL (>=3) 
Version:  2.8.0.9000 
Built:  20240912 19:19:50 UTC 
Source:  https://github.com/standev/loo 
Stan Development Team
This package implements the methods described in Vehtari, Gelman, and
Gabry (2017), Vehtari, Simpson, Gelman, Yao, and Gabry (2024), and
Yao et al. (2018). To get started see the loo package
vignettes, the
loo()
function for efficient approximate leaveoneout
crossvalidation (LOOCV), the psis()
function for the Pareto
smoothed importance sampling (PSIS) algorithm, or
loo_model_weights()
for an implementation of Bayesian stacking of
predictive distributions from multiple models.
Leaveoneout crossvalidation (LOOCV) and the widely applicable information criterion (WAIC) are methods for estimating pointwise outofsample prediction accuracy from a fitted Bayesian model using the loglikelihood evaluated at the posterior simulations of the parameter values. LOOCV and WAIC have various advantages over simpler estimates of predictive error such as AIC and DIC but are less used in practice because they involve additional computational steps. This package implements the fast and stable computations for approximate LOOCV laid out in Vehtari, Gelman, and Gabry (2017). From existing posterior simulation draws, we compute LOOCV using Pareto smoothed importance sampling (PSIS; Vehtari, Simpson, Gelman, Yao, and Gabry, 2024), a new procedure for stabilizing and diagnosing importance weights. As a byproduct of our calculations, we also obtain approximate standard errors for estimated predictive errors and for comparing of predictive errors between two models.
We recommend PSISLOOCV instead of WAIC, because PSIS provides useful diagnostics and effective sample size and Monte Carlo standard error estimates.
Maintainer: Jonah Gabry [email protected]
Authors:
Aki Vehtari [email protected]
Måns Magnusson
Yuling Yao
PaulChristian Bürkner
Topi Paananen
Andrew Gelman
Other contributors:
Ben Goodrich [contributor]
Juho Piironen [contributor]
Bruno Nicenboim [contributor]
Leevi Lindgren [contributor]
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leaveoneout crossvalidation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s1122201696964 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):158. PDF
Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018) Using stacking to average Bayesian predictive distributions. Bayesian Analysis, advance publication, doi:10.1214/17BA1091. (online).
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2019). LeaveOneOut CrossValidation for Large Data. In Thirtysixth International Conference on Machine Learning, PMLR 97:42444253.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2020). LeaveOneOut CrossValidation for Model Comparison in Large Data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 108:341351.
Epifani, I., MacEachern, S. N., and Peruggia, M. (2008). Casedeletion importance sampling estimators: Central limit theorems and related results. Electronic Journal of Statistics 2, 774806.
Gelfand, A. E. (1996). Model determination using samplingbased methods. In Markov Chain Monte Carlo in Practice, ed. W. R. Gilks, S. Richardson, D. J. Spiegelhalter, 145162. London: Chapman and Hall.
Gelfand, A. E., Dey, D. K., and Chang, H. (1992). Model determination using predictive distributions with implementation via samplingbased methods. In Bayesian Statistics 4, ed. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, 147167. Oxford University Press.
Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing 24, 9971016.
Ionides, E. L. (2008). Truncated importance sampling. Journal of Computational and Graphical Statistics 17, 295311.
Koopman, S. J., Shephard, N., and Creal, D. (2009). Testing the assumptions behind importance sampling. Journal of Econometrics 149, 211.
Peruggia, M. (1997). On the variability of casedeletion importance sampling weights in the Bayesian linear model. Journal of the American Statistical Association 92, 199207.
Stan Development Team (2017). The Stan C++ Library, Version 2.17.0. https://mcstan.org.
Stan Development Team (2018). RStan: the R interface to Stan, Version 2.17.3. https://mcstan.org.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely application information criterion in singular learning theory. Journal of Machine Learning Research 11, 35713594.
Zhang, J., and Stephens, M. A. (2009). A new and efficient estimation method for the generalized Pareto distribution. Technometrics 51, 316325.
Useful links:
Report bugs at https://github.com/standev/loo/issues
Pareto smoothed importance sampling (PSIS) using approximate posteriors
ap_psis(log_ratios, log_p, log_g, ...) ## S3 method for class 'array' ap_psis(log_ratios, log_p, log_g, ..., cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' ap_psis(log_ratios, log_p, log_g, ..., cores = getOption("mc.cores", 1)) ## Default S3 method: ap_psis(log_ratios, log_p, log_g, ...)
ap_psis(log_ratios, log_p, log_g, ...) ## S3 method for class 'array' ap_psis(log_ratios, log_p, log_g, ..., cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' ap_psis(log_ratios, log_p, log_g, ..., cores = getOption("mc.cores", 1)) ## Default S3 method: ap_psis(log_ratios, log_p, log_g, ...)
log_ratios 
The loglikelihood ratios (ie log_liks) 
log_p 
The logposterior (target) evaluated at S samples from the proposal distribution (g). A vector of length S. 
log_g 
The logdensity (proposal) evaluated at S samples from the proposal distribution (g). A vector of length S. 
... 
Currently not in use. 
cores 
The number of cores to use for parallelization. This defaults to
the option

ap_psis(array)
: An $I$
by $C$
by $N$
array, where $I$
is the number of MCMC iterations per chain, $C$
is the number of
chains, and $N$
is the number of data points.
ap_psis(matrix)
: An $S$
by $N$
matrix, where $S$
is the size
of the posterior sample (with all chains merged) and $N$
is the number
of data points.
ap_psis(default)
: A vector of length $S$
(posterior sample size).
This function is deprecated. Please use the new loo_compare()
function
instead.
compare(..., x = list())
compare(..., x = list())
... 

x 
A list of at least two objects returned by 
When comparing two fitted models, we can estimate the difference in their
expected predictive accuracy by the difference in elpd_loo
or
elpd_waic
(or multiplied by 2, if desired, to be on the
deviance scale).
When that difference, elpd_diff
, is positive then the expected
predictive accuracy for the second model is higher. A negative
elpd_diff
favors the first model.
When using compare()
with more than two models, the values in the
elpd_diff
and se_diff
columns of the returned matrix are
computed by making pairwise comparisons between each model and the model
with the best ELPD (i.e., the model in the first row).
Although the elpd_diff
column is equal to the difference in
elpd_loo
, do not expect the se_diff
column to be equal to the
the difference in se_elpd_loo
.
To compute the standard error of the difference in ELPD we use a paired estimate to take advantage of the fact that the same set of N data points was used to fit both models. These calculations should be most useful when N is large, because then nonnormality of the distribution is not such an issue when estimating the uncertainty in these sums. These standard errors, for all their flaws, should give a better sense of uncertainty than what is obtained using the current standard approach of comparing differences of deviances to a Chisquared distribution, a practice derived for Gaussian linear models or asymptotically, and which only applies to nested models in any case.
A vector or matrix with class 'compare.loo'
that has its own
print method. If exactly two objects are provided in ...
or
x
, then the difference in expected predictive accuracy and the
standard error of the difference are returned. If more than two objects are
provided then a matrix of summary information is returned (see Details).
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leaveoneout crossvalidation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s1122201696964 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):158. PDF
## Not run: loo1 < loo(log_lik1) loo2 < loo(log_lik2) print(compare(loo1, loo2), digits = 3) print(compare(x = list(loo1, loo2))) waic1 < waic(log_lik1) waic2 < waic(log_lik2) compare(waic1, waic2) ## End(Not run)
## Not run: loo1 < loo(log_lik1) loo2 < loo(log_lik2) print(compare(loo1, loo2), digits = 3) print(compare(x = list(loo1, loo2))) waic1 < waic(log_lik1) waic2 < waic(log_lik2) compare(waic1, waic2) ## End(Not run)
The crps()
and scrps()
functions and their loo_*()
counterparts can be
used to compute the continuously ranked probability score (CRPS) and scaled
CRPS (SCRPS) (see Bolin and Wallin, 2022). CRPS is a proper scoring rule, and
strictly proper when the first moment of the predictive distribution is
finite. Both can be expressed in terms of samples form the predictive
distribution. See e.g. Gneiting and Raftery (2007) for a comprehensive
discussion on CRPS.
crps(x, ...) scrps(x, ...) loo_crps(x, ...) loo_scrps(x, ...) ## S3 method for class 'matrix' crps(x, x2, y, ..., permutations = 1) ## S3 method for class 'numeric' crps(x, x2, y, ..., permutations = 1) ## S3 method for class 'matrix' loo_crps( x, x2, y, log_lik, ..., permutations = 1, r_eff = 1, cores = getOption("mc.cores", 1) ) ## S3 method for class 'matrix' scrps(x, x2, y, ..., permutations = 1) ## S3 method for class 'numeric' scrps(x, x2, y, ..., permutations = 1) ## S3 method for class 'matrix' loo_scrps( x, x2, y, log_lik, ..., permutations = 1, r_eff = 1, cores = getOption("mc.cores", 1) )
crps(x, ...) scrps(x, ...) loo_crps(x, ...) loo_scrps(x, ...) ## S3 method for class 'matrix' crps(x, x2, y, ..., permutations = 1) ## S3 method for class 'numeric' crps(x, x2, y, ..., permutations = 1) ## S3 method for class 'matrix' loo_crps( x, x2, y, log_lik, ..., permutations = 1, r_eff = 1, cores = getOption("mc.cores", 1) ) ## S3 method for class 'matrix' scrps(x, x2, y, ..., permutations = 1) ## S3 method for class 'numeric' scrps(x, x2, y, ..., permutations = 1) ## S3 method for class 'matrix' loo_scrps( x, x2, y, log_lik, ..., permutations = 1, r_eff = 1, cores = getOption("mc.cores", 1) )
x 
A 
... 
Passed on to 
x2 
Independent draws from the same distribution as draws in 
y 
A vector of observations or a single value. 
permutations 
An integer, with default value of 1, specifying how many
times the expected value of X  X' ( 
log_lik 
A loglikelihood matrix the same size as 
r_eff 
An optional vector of relative effective sample size estimates
containing one element per observation. See 
cores 
The number of cores to use for parallelization of 
To compute (S)CRPS, the user needs to provide two sets of draws, x
and
x2
, from the predictive distribution. This is due to the fact that formulas
used to compute CRPS involve an expectation of the absolute difference of x
and x2
, both having the same distribution. See the permutations
argument,
as well as Gneiting and Raftery (2007) for details.
A list containing two elements: estimates
and pointwise
.
The former reports estimator and standard error and latter the pointwise
values.
Bolin, D., & Wallin, J. (2022). Local scale invariance and robustness of proper scoring rules. arXiv. doi:10.48550/arXiv.1912.05642
Gneiting, T., & Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477), 359–378.
## Not run: # An example using rstanarm library(rstanarm) data("kidiq") fit < stan_glm(kid_score ~ mom_hs + mom_iq, data = kidiq) ypred1 < posterior_predict(fit) ypred2 < posterior_predict(fit) crps(ypred1, ypred2, y = fit$y) loo_crps(ypred1, ypred2, y = fit$y, log_lik = log_lik(fit)) ## End(Not run)
## Not run: # An example using rstanarm library(rstanarm) data("kidiq") fit < stan_glm(kid_score ~ mom_hs + mom_iq, data = kidiq) ypred1 < posterior_predict(fit) ypred2 < posterior_predict(fit) crps(ypred1, ypred2, y = fit$y) loo_crps(ypred1, ypred2, y = fit$y, log_lik = log_lik(fit)) ## End(Not run)
The E_loo()
function computes weighted expectations (means, variances,
quantiles) using the importance weights obtained from the
PSIS smoothing procedure. The expectations estimated by the
E_loo()
function assume that the PSIS approximation is working well.
A small Pareto k estimate is necessary,
but not sufficient, for E_loo()
to give reliable estimates. Additional
diagnostic checks for gauging the reliability of the estimates are in
development and will be added in a future release.
E_loo(x, psis_object, ...) ## Default S3 method: E_loo( x, psis_object, ..., type = c("mean", "variance", "sd", "quantile"), probs = NULL, log_ratios = NULL ) ## S3 method for class 'matrix' E_loo( x, psis_object, ..., type = c("mean", "variance", "sd", "quantile"), probs = NULL, log_ratios = NULL )
E_loo(x, psis_object, ...) ## Default S3 method: E_loo( x, psis_object, ..., type = c("mean", "variance", "sd", "quantile"), probs = NULL, log_ratios = NULL ) ## S3 method for class 'matrix' E_loo( x, psis_object, ..., type = c("mean", "variance", "sd", "quantile"), probs = NULL, log_ratios = NULL )
x 
A numeric vector or matrix. 
psis_object 
An object returned by 
... 
Arguments passed to individual methods. 
type 
The type of expectation to compute. The options are

probs 
For computing quantiles, a vector of probabilities. 
log_ratios 
Optionally, a vector or matrix (the same dimensions as 
A named list with the following components:
value
The result of the computation.
For the matrix method, value
is a vector with ncol(x)
elements, with one exception: when type="quantile"
and
multiple values are specified in probs
the value
component of
the returned object is a length(probs)
by ncol(x)
matrix.
For the default/vector method the value
component is scalar, with
one exception: when type="quantile"
and multiple values
are specified in probs
the value
component is a vector with
length(probs)
elements.
pareto_k
Functionspecific diagnostic.
For the matrix method it will be a vector of length ncol(x)
containing estimates of the shape parameter $k$
of the
generalized Pareto distribution. For the default/vector method,
the estimate is a scalar. If log_ratios
is not specified when
calling E_loo()
, the smoothed logweights are used to estimate
Paretok's, which may produce optimistic estimates.
For type="mean"
, type="var"
, and type="sd"
, the returned Paretok is
usually the maximum of the Paretok's for the left and right tail of $hr$
and the right tail of $r$
, where $r$
is the importance ratio and
$h=x$
for type="mean"
and $h=x^2$
for type="var"
and type="sd"
.
If $h$
is binary, constant, or not finite, or if type="quantile"
, the
returned Paretok is the Paretok for the right tail of $r$
.
if (requireNamespace("rstanarm", quietly = TRUE)) { # Use rstanarm package to quickly fit a model and get both a loglikelihood # matrix and draws from the posterior predictive distribution library("rstanarm") # data from help("lm") ctl < c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt < c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) d < data.frame( weight = c(ctl, trt), group = gl(2, 10, 20, labels = c("Ctl","Trt")) ) fit < stan_glm(weight ~ group, data = d, refresh = 0) yrep < posterior_predict(fit) dim(yrep) log_ratios < 1 * log_lik(fit) dim(log_ratios) r_eff < relative_eff(exp(log_ratios), chain_id = rep(1:4, each = 1000)) psis_object < psis(log_ratios, r_eff = r_eff, cores = 2) E_loo(yrep, psis_object, type = "mean") E_loo(yrep, psis_object, type = "var") E_loo(yrep, psis_object, type = "sd") E_loo(yrep, psis_object, type = "quantile", probs = 0.5) # median E_loo(yrep, psis_object, type = "quantile", probs = c(0.1, 0.9)) # We can get more accurate Pareto k diagnostic if we also provide # the log_ratios argument E_loo(yrep, psis_object, type = "mean", log_ratios = log_ratios) }
if (requireNamespace("rstanarm", quietly = TRUE)) { # Use rstanarm package to quickly fit a model and get both a loglikelihood # matrix and draws from the posterior predictive distribution library("rstanarm") # data from help("lm") ctl < c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt < c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) d < data.frame( weight = c(ctl, trt), group = gl(2, 10, 20, labels = c("Ctl","Trt")) ) fit < stan_glm(weight ~ group, data = d, refresh = 0) yrep < posterior_predict(fit) dim(yrep) log_ratios < 1 * log_lik(fit) dim(log_ratios) r_eff < relative_eff(exp(log_ratios), chain_id = rep(1:4, each = 1000)) psis_object < psis(log_ratios, r_eff = r_eff, cores = 2) E_loo(yrep, psis_object, type = "mean") E_loo(yrep, psis_object, type = "var") E_loo(yrep, psis_object, type = "sd") E_loo(yrep, psis_object, type = "quantile", probs = 0.5) # median E_loo(yrep, psis_object, type = "quantile", probs = c(0.1, 0.9)) # We can get more accurate Pareto k diagnostic if we also provide # the log_ratios argument E_loo(yrep, psis_object, type = "mean", log_ratios = log_ratios) }
The elpd()
methods for arrays and matrices can compute the expected log
pointwise predictive density for a new dataset or the log pointwise
predictive density of the observed data (an overestimate of the elpd).
elpd(x, ...) ## S3 method for class 'array' elpd(x, ...) ## S3 method for class 'matrix' elpd(x, ...)
elpd(x, ...) ## S3 method for class 'array' elpd(x, ...) ## S3 method for class 'matrix' elpd(x, ...)
x 
A loglikelihood array or matrix. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs for each method. 
... 
Currently ignored. 
The elpd()
function is an S3 generic and methods are provided for
3D pointwise loglikelihood arrays and matrices.
elpd(array)
: An $I$
by $C$
by $N$
array, where $I$
is the number of MCMC iterations per chain, $C$
is the number of
chains, and $N$
is the number of data points.
elpd(matrix)
: An $S$
by $N$
matrix, where $S$
is the size
of the posterior sample (with all chains merged) and $N$
is the number
of data points.
The vignette Holdout validation and Kfold crossvalidation of Stan
programs with the loo package for demonstrations of using the elpd()
methods.
# Calculate the lpd of the observed data LLarr < example_loglik_array() elpd(LLarr)
# Calculate the lpd of the observed data LLarr < example_loglik_array() elpd(LLarr)
Example pointwise loglikelihood objects to use in demonstrations and tests. See the Value and Examples sections below.
example_loglik_array() example_loglik_matrix()
example_loglik_array() example_loglik_matrix()
example_loglik_array()
returns a 500 (draws) x 2 (chains) x 32
(observations) pointwise loglikelihood array.
example_loglik_matrix()
returns the same pointwise loglikelihood values
as example_loglik_array()
but reshaped into a 1000 (draws*chains) x 32
(observations) matrix.
LLarr < example_loglik_array() (dim_arr < dim(LLarr)) LLmat < example_loglik_matrix() (dim_mat < dim(LLmat)) all.equal(dim_mat[1], dim_arr[1] * dim_arr[2]) all.equal(dim_mat[2], dim_arr[3]) all.equal(LLarr[, 1, ], LLmat[1:500, ]) all.equal(LLarr[, 2, ], LLmat[501:1000, ])
LLarr < example_loglik_array() (dim_arr < dim(LLarr)) LLmat < example_loglik_matrix() (dim_mat < dim(LLmat)) all.equal(dim_mat[1], dim_arr[1] * dim_arr[2]) all.equal(dim_mat[2], dim_arr[3]) all.equal(LLarr[, 1, ], LLmat[1:500, ]) all.equal(LLarr[, 2, ], LLmat[501:1000, ])
Convenience function for extracting the pointwise loglikelihood
matrix or array from a stanfit
object from the rstan package.
Note: recent versions of rstan now include a loo()
method for
stanfit
objects that handles this internally.
extract_log_lik(stanfit, parameter_name = "log_lik", merge_chains = TRUE)
extract_log_lik(stanfit, parameter_name = "log_lik", merge_chains = TRUE)
stanfit 
A 
parameter_name 
A character string naming the parameter (or generated quantity) in the Stan model corresponding to the loglikelihood. 
merge_chains 
If 
Stan does not automatically compute and store the loglikelihood. It is up to the user to incorporate it into the Stan program if it is to be extracted after fitting the model. In a Stan model, the pointwise log likelihood can be coded as a vector in the transformed parameters block (and then summed up in the model block) or it can be coded entirely in the generated quantities block. We recommend using the generated quantities block so that the computations are carried out only once per iteration rather than once per HMC leapfrog step.
For example, the following is the generated quantities
block for
computing and saving the loglikelihood for a linear regression model with
N
data points, outcome y
, predictor matrix X
,
coefficients beta
, and standard deviation sigma
:
vector[N] log_lik;
for (n in 1:N) log_lik[n] = normal_lpdf(y[n]  X[n, ] * beta, sigma);
If merge_chains=TRUE
, an $S$
by $N$
matrix of
(postwarmup) extracted draws, where $S$
is the size of the posterior
sample and $N$
is the number of data points. If
merge_chains=FALSE
, an $I$
by $C$
by $N$
array, where
$I \times C = S$
.
Stan Development Team (2017). The Stan C++ Library, Version 2.16.0. https://mcstan.org/
Stan Development Team (2017). RStan: the R interface to Stan, Version 2.16.1. https://mcstan.org/
Given a sample $x$
, Estimate the parameters $k$
and $\sigma$
of
the generalized Pareto distribution (GPD), assuming the location parameter is
0. By default the fit uses a prior for $k$
, which will stabilize
estimates for very small sample sizes (and low effective sample sizes in the
case of MCMC samples). The weakly informative prior is a Gaussian prior
centered at 0.5.
gpdfit(x, wip = TRUE, min_grid_pts = 30, sort_x = TRUE)
gpdfit(x, wip = TRUE, min_grid_pts = 30, sort_x = TRUE)
x 
A numeric vector. The sample from which to estimate the parameters. 
wip 
Logical indicating whether to adjust 
min_grid_pts 
The minimum number of grid points used in the fitting
algorithm. The actual number used is 
sort_x 
If 
Here the parameter $k$
is the negative of $k$
in Zhang &
Stephens (2009).
A named list with components k
and sigma
.
Zhang, J., and Stephens, M. A. (2009). A new and efficient estimation method for the generalized Pareto distribution. Technometrics 51, 316325.
A parent class for different importance sampling methods.
importance_sampling(log_ratios, method, ...) ## S3 method for class 'array' importance_sampling( log_ratios, method, ..., r_eff = 1, cores = getOption("mc.cores", 1) ) ## S3 method for class 'matrix' importance_sampling( log_ratios, method, ..., r_eff = 1, cores = getOption("mc.cores", 1) ) ## Default S3 method: importance_sampling(log_ratios, method, ..., r_eff = 1)
importance_sampling(log_ratios, method, ...) ## S3 method for class 'array' importance_sampling( log_ratios, method, ..., r_eff = 1, cores = getOption("mc.cores", 1) ) ## S3 method for class 'matrix' importance_sampling( log_ratios, method, ..., r_eff = 1, cores = getOption("mc.cores", 1) ) ## Default S3 method: importance_sampling(log_ratios, method, ..., r_eff = 1)
log_ratios 
An array, matrix, or vector of importance ratios on the log scale (for PSISLOO these are negative loglikelihood values). See the Methods (by class) section below for a detailed description of how to specify the inputs for each method. 
method 
The importance sampling method to use. The following methods are implemented: 
... 
Arguments passed on to the various methods. 
r_eff 
Vector of relative effective sample size estimates containing
one element per observation. The values provided should be the relative
effective sample sizes of 
cores 
The number of cores to use for parallelization. This defaults to
the option

For developers of Bayesian modeling packages, loo includes
a generic function kfold()
so that methods may be defined for Kfold
CV without name conflicts between packages. See, for example, the
kfold()
methods in the rstanarm and brms packages.
The Value section below describes the objects that kfold()
methods should return in order to be compatible with
loo_compare()
and the loo package print methods.
kfold(x, ...) is.kfold(x)
kfold(x, ...) is.kfold(x)
x 
A fitted model object. 
... 
Arguments to pass to specific methods. 
For developers defining a kfold()
method for a class
"foo"
, the kfold.foo()
function should return a list with class
c("kfold", "loo")
with at least the following named elements:
"estimates"
: A 1x2
matrix containing the ELPD estimate and its
standard error. The matrix must have row name "elpd_kfold
" and column
names "Estimate"
and "SE"
.
"pointwise"
: A Nx1
matrix with column name "elpd_kfold"
containing
the pointwise contributions for each data point.
It is important for the object to have at least these classes and
components so that it is compatible with other functions like
loo_compare()
and print()
methods.
These functions can be used to generate indexes for use with Kfold crossvalidation. See the Details section for explanations.
kfold_split_random(K = 10, N = NULL) kfold_split_stratified(K = 10, x = NULL) kfold_split_grouped(K = 10, x = NULL)
kfold_split_random(K = 10, N = NULL) kfold_split_stratified(K = 10, x = NULL) kfold_split_grouped(K = 10, x = NULL)
K 
The number of folds to use. 
N 
The number of observations in the data. 
x 
A discrete variable of length 
kfold_split_random()
splits the data into K
groups
of equal size (or roughly equal size).
For a categorical variable x
kfold_split_stratified()
splits the observations into K
groups ensuring that relative
category frequencies are approximately preserved.
For a grouping variable x
, kfold_split_grouped()
places
all observations in x
from the same group/level together in
the same fold. The selection of which groups/levels go into which
fold (relevant when when there are more groups than folds) is
randomized.
An integer vector of length N
where each element is an index in 1:K
.
ids < kfold_split_random(K = 5, N = 20) print(ids) table(ids) x < sample(c(0, 1), size = 200, replace = TRUE, prob = c(0.05, 0.95)) table(x) ids < kfold_split_stratified(K = 5, x = x) print(ids) table(ids, x) grp < gl(n = 50, k = 15, labels = state.name) length(grp) head(table(grp)) ids_10 < kfold_split_grouped(K = 10, x = grp) (tab_10 < table(grp, ids_10)) colSums(tab_10) ids_9 < kfold_split_grouped(K = 9, x = grp) (tab_9 < table(grp, ids_9)) colSums(tab_9)
ids < kfold_split_random(K = 5, N = 20) print(ids) table(ids) x < sample(c(0, 1), size = 200, replace = TRUE, prob = c(0.05, 0.95)) table(x) ids < kfold_split_stratified(K = 5, x = x) print(ids) table(ids, x) grp < gl(n = 50, k = 15, labels = state.name) length(grp) head(table(grp)) ids_10 < kfold_split_grouped(K = 10, x = grp) (tab_10 < table(grp, ids_10)) colSums(tab_10) ids_9 < kfold_split_grouped(K = 9, x = grp) (tab_9 < table(grp, ids_9)) colSums(tab_9)
The loo()
methods for arrays, matrices, and functions compute PSISLOO
CV, efficient approximate leaveoneout (LOO) crossvalidation for Bayesian
models using Pareto smoothed importance sampling (PSIS). This is
an implementation of the methods described in Vehtari, Gelman, and Gabry
(2017) and Vehtari, Simpson, Gelman, Yao, and Gabry (2024).
The loo_i()
function enables testing loglikelihood
functions for use with the loo.function()
method.
loo(x, ...) ## S3 method for class 'array' loo( x, ..., r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis") ) ## S3 method for class 'matrix' loo( x, ..., r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis") ) ## S3 method for class ''function'' loo( x, ..., data = NULL, draws = NULL, r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis") ) loo_i(i, llfun, ..., data = NULL, draws = NULL, r_eff = 1, is_method = "psis") is.loo(x) is.psis_loo(x)
loo(x, ...) ## S3 method for class 'array' loo( x, ..., r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis") ) ## S3 method for class 'matrix' loo( x, ..., r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis") ) ## S3 method for class ''function'' loo( x, ..., data = NULL, draws = NULL, r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis") ) loo_i(i, llfun, ..., data = NULL, draws = NULL, r_eff = 1, is_method = "psis") is.loo(x) is.psis_loo(x)
x 
A loglikelihood array, matrix, or function. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs for each method. 
r_eff 
Vector of relative effective sample size estimates for the
likelihood ( 
save_psis 
Should the 
cores 
The number of cores to use for parallelization. This defaults to
the option

is_method 
The importance sampling method to use. The following methods are implemented: 
data , draws , ...

For the 
i 
For 
llfun 
For 
The loo()
function is an S3 generic and methods are provided for
3D pointwise loglikelihood arrays, pointwise loglikelihood matrices, and
loglikelihood functions. The array and matrix methods are the most
convenient, but for models fit to very large datasets the loo.function()
method is more memory efficient and may be preferable.
The loo()
methods return a named list with class
c("psis_loo", "loo")
and components:
estimates
A matrix with two columns (Estimate
, SE
) and three rows (elpd_loo
,
p_loo
, looic
). This contains point estimates and standard errors of the
expected log pointwise predictive density (elpd_loo
), the
effective number of parameters (p_loo
) and the LOO
information criterion looic
(which is just 2 * elpd_loo
, i.e.,
converted to deviance scale).
pointwise
A matrix with five columns (and number of rows equal to the number of
observations) containing the pointwise contributions of the measures
(elpd_loo
, mcse_elpd_loo
, p_loo
, looic
, influence_pareto_k
).
in addition to the three measures in estimates
, we also report
pointwise values of the Monte Carlo standard error of elpd_loo
(mcse_elpd_loo
), and statistics describing the influence of
each observation on the posterior distribution (influence_pareto_k
).
These are the estimates of the shape parameter $k$
of the
generalized Pareto fit to the importance ratios for each leaveoneout
distribution (see the paretokdiagnostic page for details).
diagnostics
A named list containing two vectors:
pareto_k
: Importance sampling reliability diagnostics. By default,
these are equal to the influence_pareto_k
in pointwise
.
Some algorithms can improve importance sampling reliability and
modify these diagnostics. See the paretokdiagnostic page for details.
n_eff
: PSIS effective sample size estimates.
psis_object
This component will be NULL
unless the save_psis
argument is set to
TRUE
when calling loo()
. In that case psis_object
will be the object
of class "psis"
that is created when the loo()
function calls psis()
internally to do the PSIS procedure.
The loo_i()
function returns a named list with components
pointwise
and diagnostics
. These components have the same
structure as the pointwise
and diagnostics
components of the
object returned by loo()
except they contain results for only a single
observation.
loo(array)
: An $I$
by $C$
by $N$
array, where $I$
is the number of MCMC iterations per chain, $C$
is the number of
chains, and $N$
is the number of data points.
loo(matrix)
: An $S$
by $N$
matrix, where $S$
is the size
of the posterior sample (with all chains merged) and $N$
is the number
of data points.
loo(`function`)
: A function f()
that takes arguments data_i
and draws
and returns a
vector containing the loglikelihood for a single observation i
evaluated
at each posterior draw. The function should be written such that, for each
observation i
in 1:N
, evaluating
f(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length S
(size of posterior sample). The
loglikelihood function can also have additional arguments but data_i
and
draws
are required.
If using the function method then the arguments data
and draws
must also
be specified in the call to loo()
:
data
: A data frame or matrix containing the data (e.g.
observed outcome and predictors) needed to compute the pointwise
loglikelihood. For each observation i
, the i
th row of
data
will be passed to the data_i
argument of the
loglikelihood function.
draws
: An object containing the posterior draws for any
parameters needed to compute the pointwise loglikelihood. Unlike
data
, which is indexed by observation, for each observation the
entire object draws
will be passed to the draws
argument of
the loglikelihood function.
The ...
can be used if your loglikelihood function takes additional
arguments. These arguments are used like the draws
argument in that they
are recycled for each observation.
loo()
methods in a packagePackage developers can define
loo()
methods for fitted models objects. See the example loo.stanfit()
method in the Examples section below for an example of defining a
method that calls loo.array()
. The loo.stanreg()
method in the
rstanarm package is an example of defining a method that calls
loo.function()
.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leaveoneout crossvalidation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s1122201696964 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):158. PDF
The loo package vignettes for demonstrations.
The FAQ page on the loo website for answers to frequently asked questions.
psis()
for the underlying Pareto Smoothed Importance Sampling (PSIS)
procedure used in the LOOCV approximation.
paretokdiagnostic for convenience functions for looking at diagnostics.
loo_compare()
for model comparison.
### Array and matrix methods (using example objects included with loo package) # Array method LLarr < example_loglik_array() rel_n_eff < relative_eff(exp(LLarr)) loo(LLarr, r_eff = rel_n_eff, cores = 2) # Matrix method LLmat < example_loglik_matrix() rel_n_eff < relative_eff(exp(LLmat), chain_id = rep(1:2, each = 500)) loo(LLmat, r_eff = rel_n_eff, cores = 2) ### Using loglikelihood function instead of array or matrix set.seed(124) # Simulate data and draw from posterior N < 50; K < 10; S < 100; a0 < 3; b0 < 2 p < rbeta(1, a0, b0) y < rbinom(N, size = K, prob = p) a < a0 + sum(y); b < b0 + N * K  sum(y) fake_posterior < as.matrix(rbeta(S, a, b)) dim(fake_posterior) # S x 1 fake_data < data.frame(y,K) dim(fake_data) # N x 2 llfun < function(data_i, draws) { # each time called internally within loo the arguments will be equal to: # data_i: ith row of fake_data (fake_data[i,, drop=FALSE]) # draws: entire fake_posterior matrix dbinom(data_i$y, size = data_i$K, prob = draws, log = TRUE) } # Use the loo_i function to check that llfun works on a single observation # before running on all obs. For example, using the 3rd obs in the data: loo_3 < loo_i(i = 3, llfun = llfun, data = fake_data, draws = fake_posterior) print(loo_3$pointwise[, "elpd_loo"]) # Use loo.function method (default r_eff=1 is used as this posterior not obtained via MCMC) loo_with_fn < loo(llfun, draws = fake_posterior, data = fake_data) # If we look at the elpd_loo contribution from the 3rd obs it should be the # same as what we got above with the loo_i function and i=3: print(loo_with_fn$pointwise[3, "elpd_loo"]) print(loo_3$pointwise[, "elpd_loo"]) # Check that the loo.matrix method gives same answer as loo.function method log_lik_matrix < sapply(1:N, function(i) { llfun(data_i = fake_data[i,, drop=FALSE], draws = fake_posterior) }) loo_with_mat < loo(log_lik_matrix) all.equal(loo_with_mat$estimates, loo_with_fn$estimates) # should be TRUE! ## Not run: ### For package developers: defining loo methods # An example of a possible loo method for 'stanfit' objects (rstan package). # A similar method is included in the rstan package. # In order for users to be able to call loo(stanfit) instead of # loo.stanfit(stanfit) the NAMESPACE needs to be handled appropriately # (roxygen2 and devtools packages are good for that). # loo.stanfit < function(x, pars = "log_lik", ..., save_psis = FALSE, cores = getOption("mc.cores", 1)) { stopifnot(length(pars) == 1L) LLarray < loo::extract_log_lik(stanfit = x, parameter_name = pars, merge_chains = FALSE) r_eff < loo::relative_eff(x = exp(LLarray), cores = cores) loo::loo.array(LLarray, r_eff = r_eff, cores = cores, save_psis = save_psis) } ## End(Not run)
### Array and matrix methods (using example objects included with loo package) # Array method LLarr < example_loglik_array() rel_n_eff < relative_eff(exp(LLarr)) loo(LLarr, r_eff = rel_n_eff, cores = 2) # Matrix method LLmat < example_loglik_matrix() rel_n_eff < relative_eff(exp(LLmat), chain_id = rep(1:2, each = 500)) loo(LLmat, r_eff = rel_n_eff, cores = 2) ### Using loglikelihood function instead of array or matrix set.seed(124) # Simulate data and draw from posterior N < 50; K < 10; S < 100; a0 < 3; b0 < 2 p < rbeta(1, a0, b0) y < rbinom(N, size = K, prob = p) a < a0 + sum(y); b < b0 + N * K  sum(y) fake_posterior < as.matrix(rbeta(S, a, b)) dim(fake_posterior) # S x 1 fake_data < data.frame(y,K) dim(fake_data) # N x 2 llfun < function(data_i, draws) { # each time called internally within loo the arguments will be equal to: # data_i: ith row of fake_data (fake_data[i,, drop=FALSE]) # draws: entire fake_posterior matrix dbinom(data_i$y, size = data_i$K, prob = draws, log = TRUE) } # Use the loo_i function to check that llfun works on a single observation # before running on all obs. For example, using the 3rd obs in the data: loo_3 < loo_i(i = 3, llfun = llfun, data = fake_data, draws = fake_posterior) print(loo_3$pointwise[, "elpd_loo"]) # Use loo.function method (default r_eff=1 is used as this posterior not obtained via MCMC) loo_with_fn < loo(llfun, draws = fake_posterior, data = fake_data) # If we look at the elpd_loo contribution from the 3rd obs it should be the # same as what we got above with the loo_i function and i=3: print(loo_with_fn$pointwise[3, "elpd_loo"]) print(loo_3$pointwise[, "elpd_loo"]) # Check that the loo.matrix method gives same answer as loo.function method log_lik_matrix < sapply(1:N, function(i) { llfun(data_i = fake_data[i,, drop=FALSE], draws = fake_posterior) }) loo_with_mat < loo(log_lik_matrix) all.equal(loo_with_mat$estimates, loo_with_fn$estimates) # should be TRUE! ## Not run: ### For package developers: defining loo methods # An example of a possible loo method for 'stanfit' objects (rstan package). # A similar method is included in the rstan package. # In order for users to be able to call loo(stanfit) instead of # loo.stanfit(stanfit) the NAMESPACE needs to be handled appropriately # (roxygen2 and devtools packages are good for that). # loo.stanfit < function(x, pars = "log_lik", ..., save_psis = FALSE, cores = getOption("mc.cores", 1)) { stopifnot(length(pars) == 1L) LLarray < loo::extract_log_lik(stanfit = x, parameter_name = pars, merge_chains = FALSE) r_eff < loo::relative_eff(x = exp(LLarray), cores = cores) loo::loo.array(LLarray, r_eff = r_eff, cores = cores, save_psis = save_psis) } ## End(Not run)
Efficient approximate leaveoneout crossvalidation (LOO) for posterior approximations
loo_approximate_posterior(x, log_p, log_g, ...) ## S3 method for class 'array' loo_approximate_posterior( x, log_p, log_g, ..., save_psis = FALSE, cores = getOption("mc.cores", 1) ) ## S3 method for class 'matrix' loo_approximate_posterior( x, log_p, log_g, ..., save_psis = FALSE, cores = getOption("mc.cores", 1) ) ## S3 method for class ''function'' loo_approximate_posterior( x, ..., data = NULL, draws = NULL, log_p = NULL, log_g = NULL, save_psis = FALSE, cores = getOption("mc.cores", 1) )
loo_approximate_posterior(x, log_p, log_g, ...) ## S3 method for class 'array' loo_approximate_posterior( x, log_p, log_g, ..., save_psis = FALSE, cores = getOption("mc.cores", 1) ) ## S3 method for class 'matrix' loo_approximate_posterior( x, log_p, log_g, ..., save_psis = FALSE, cores = getOption("mc.cores", 1) ) ## S3 method for class ''function'' loo_approximate_posterior( x, ..., data = NULL, draws = NULL, log_p = NULL, log_g = NULL, save_psis = FALSE, cores = getOption("mc.cores", 1) )
x 
A loglikelihood array, matrix, or function. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs for each method. 
log_p 
The logposterior (target) evaluated at S samples from the proposal distribution (g). A vector of length S. 
log_g 
The logdensity (proposal) evaluated at S samples from the proposal distribution (g). A vector of length S. 
save_psis 
Should the 
cores 
The number of cores to use for parallelization. This defaults to
the option

data , draws , ...

For the 
The loo_approximate_posterior()
function is an S3 generic and
methods are provided for 3D pointwise loglikelihood arrays, pointwise
loglikelihood matrices, and loglikelihood functions. The implementation
works for posterior approximations where it is possible to compute the log
density for the posterior approximation.
The loo_approximate_posterior()
methods return a named list with
class c("psis_loo_ap", "psis_loo", "loo")
. It has the same structure
as the objects returned by loo()
but with the additional slot:
posterior_approximation
A list with two vectors, log_p
and log_g
of the same length
containing the posterior density and the approximation density
for the individual draws.
loo_approximate_posterior(array)
: An $I$
by $C$
by $N$
array, where $I$
is the number of MCMC iterations per chain, $C$
is the number of
chains, and $N$
is the number of data points.
loo_approximate_posterior(matrix)
: An $S$
by $N$
matrix, where $S$
is the size
of the posterior sample (with all chains merged) and $N$
is the number
of data points.
loo_approximate_posterior(`function`)
: A function f()
that takes arguments data_i
and draws
and returns a
vector containing the loglikelihood for a single observation i
evaluated
at each posterior draw. The function should be written such that, for each
observation i
in 1:N
, evaluating
f(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length S
(size of posterior sample). The
loglikelihood function can also have additional arguments but data_i
and
draws
are required.
If using the function method then the arguments data
and draws
must also
be specified in the call to loo()
:
data
: A data frame or matrix containing the data (e.g.
observed outcome and predictors) needed to compute the pointwise
loglikelihood. For each observation i
, the i
th row of
data
will be passed to the data_i
argument of the
loglikelihood function.
draws
: An object containing the posterior draws for any
parameters needed to compute the pointwise loglikelihood. Unlike
data
, which is indexed by observation, for each observation the
entire object draws
will be passed to the draws
argument of
the loglikelihood function.
The ...
can be used if your loglikelihood function takes additional
arguments. These arguments are used like the draws
argument in that they
are recycled for each observation.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2019). LeaveOneOut CrossValidation for Large Data. In Thirtysixth International Conference on Machine Learning, PMLR 97:42444253.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2020). LeaveOneOut CrossValidation for Model Comparison in Large Data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 108:341351.
Compare fitted models based on ELPD.
By default the print method shows only the most important information. Use
print(..., simplify=FALSE)
to print a more detailed summary.
loo_compare(x, ...) ## Default S3 method: loo_compare(x, ...) ## S3 method for class 'compare.loo' print(x, ..., digits = 1, simplify = TRUE) ## S3 method for class 'compare.loo_ss' print(x, ..., digits = 1, simplify = TRUE)
loo_compare(x, ...) ## Default S3 method: loo_compare(x, ...) ## S3 method for class 'compare.loo' print(x, ..., digits = 1, simplify = TRUE) ## S3 method for class 'compare.loo_ss' print(x, ..., digits = 1, simplify = TRUE)
x 
An object of class 
... 
Additional objects of class 
digits 
For the print method only, the number of digits to use when printing. 
simplify 
For the print method only, should only the essential columns of the summary matrix be printed? The entire matrix is always returned, but by default only the most important columns are printed. 
When comparing two fitted models, we can estimate the difference in their
expected predictive accuracy by the difference in
elpd_loo
or elpd_waic
(or multiplied by $2$
, if
desired, to be on the deviance scale).
When using loo_compare()
, the returned matrix will have one row per model
and several columns of estimates. The values in the
elpd_diff
and se_diff
columns of the
returned matrix are computed by making pairwise comparisons between each
model and the model with the largest ELPD (the model in the first row). For
this reason the elpd_diff
column will always have the value 0
in the
first row (i.e., the difference between the preferred model and itself) and
negative values in subsequent rows for the remaining models.
To compute the standard error of the difference in ELPD —
which should not be expected to equal the difference of the standard errors
— we use a paired estimate to take advantage of the fact that the same
set of $N$
data points was used to fit both models. These calculations
should be most useful when $N$
is large, because then nonnormality of
the distribution is not such an issue when estimating the uncertainty in
these sums. These standard errors, for all their flaws, should give a
better sense of uncertainty than what is obtained using the current
standard approach of comparing differences of deviances to a Chisquared
distribution, a practice derived for Gaussian linear models or
asymptotically, and which only applies to nested models in any case.
Sivula et al. (2022) discuss the conditions when the normal
approximation used for SE and se_diff
is good.
If more than $11$
models are compared, we internally recompute the model
differences using the median model by ELPD as the baseline model. We then
estimate whether the differences in predictive performance are potentially
due to chance as described by McLatchie and Vehtari (2023). This will flag
a warning if it is deemed that there is a risk of overfitting due to the
selection process. In that case users are recommended to avoid model
selection based on LOOCV, and instead to favor model averaging/stacking or
projection predictive inference.
A matrix with class "compare.loo"
that has its own
print method. See the Details section.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leaveoneout crossvalidation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s1122201696964 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):158. PDF
Sivula, T, Magnusson, M., Matamoros A. A., and Vehtari, A. (2022). Uncertainty in Bayesian leaveoneout crossvalidation based model comparison. preprint arXiv:2008.10296v3..
McLatchie, Y., and Vehtari, A. (2023). Efficient estimation and correction of selectioninduced bias with order statistics. preprint arXiv:2309.03742
The FAQ page on the loo website for answers to frequently asked questions.
# very artificial example, just for demonstration! LL < example_loglik_array() loo1 < loo(LL) # should be worst model when compared loo2 < loo(LL + 1) # should be second best model when compared loo3 < loo(LL + 2) # should be best model when compared comp < loo_compare(loo1, loo2, loo3) print(comp, digits = 2) # show more details with simplify=FALSE # (will be the same for all models in this artificial example) print(comp, simplify = FALSE, digits = 3) # can use a list of objects with custom names # will use apple, banana, and cherry, as the names in the output loo_compare(list("apple" = loo1, "banana" = loo2, "cherry" = loo3)) ## Not run: # works for waic (and kfold) too loo_compare(waic(LL), waic(LL  10)) ## End(Not run)
# very artificial example, just for demonstration! LL < example_loglik_array() loo1 < loo(LL) # should be worst model when compared loo2 < loo(LL + 1) # should be second best model when compared loo3 < loo(LL + 2) # should be best model when compared comp < loo_compare(loo1, loo2, loo3) print(comp, digits = 2) # show more details with simplify=FALSE # (will be the same for all models in this artificial example) print(comp, simplify = FALSE, digits = 3) # can use a list of objects with custom names # will use apple, banana, and cherry, as the names in the output loo_compare(list("apple" = loo1, "banana" = loo2, "cherry" = loo3)) ## Not run: # works for waic (and kfold) too loo_compare(waic(LL), waic(LL  10)) ## End(Not run)
Model averaging via stacking of predictive distributions, pseudoBMA weighting or pseudoBMA+ weighting with the Bayesian bootstrap. See Yao et al. (2018), Vehtari, Gelman, and Gabry (2017), and Vehtari, Simpson, Gelman, Yao, and Gabry (2024) for background.
loo_model_weights(x, ...) ## Default S3 method: loo_model_weights( x, ..., method = c("stacking", "pseudobma"), optim_method = "BFGS", optim_control = list(), BB = TRUE, BB_n = 1000, alpha = 1, r_eff_list = NULL, cores = getOption("mc.cores", 1) ) stacking_weights(lpd_point, optim_method = "BFGS", optim_control = list()) pseudobma_weights(lpd_point, BB = TRUE, BB_n = 1000, alpha = 1)
loo_model_weights(x, ...) ## Default S3 method: loo_model_weights( x, ..., method = c("stacking", "pseudobma"), optim_method = "BFGS", optim_control = list(), BB = TRUE, BB_n = 1000, alpha = 1, r_eff_list = NULL, cores = getOption("mc.cores", 1) ) stacking_weights(lpd_point, optim_method = "BFGS", optim_control = list()) pseudobma_weights(lpd_point, BB = TRUE, BB_n = 1000, alpha = 1)
x 
A list of 
... 
Unused, except for the generic to pass arguments to individual methods. 
method 
Either 
optim_method 
If 
optim_control 
If 
BB 
Logical used when 
BB_n 
For pseudoBMA+ weighting only, the number of samples to use for
the Bayesian bootstrap. The default is 
alpha 
Positive scalar shape parameter in the Dirichlet distribution
used for the Bayesian bootstrap. The default is 
r_eff_list 
Optionally, a list of relative effective sample size
estimates for the likelihood 
cores 
The number of cores to use for parallelization. This defaults to
the option

lpd_point 
If calling 
loo_model_weights()
is a wrapper around the stacking_weights()
and
pseudobma_weights()
functions that implements stacking, pseudoBMA, and
pseudoBMA+ weighting for combining multiple predictive distributions. We can
use approximate or exact leaveoneout crossvalidation (LOOCV) or Kfold CV
to estimate the expected log predictive density (ELPD).
The stacking method (method="stacking"
), which is the default for
loo_model_weights()
, combines all models by maximizing the leaveoneout
predictive density of the combination distribution. That is, it finds the
optimal linear combining weights for maximizing the leaveoneout log score.
The pseudoBMA method (method="pseudobma"
) finds the relative weights
proportional to the ELPD of each model. However, when
method="pseudobma"
, the default is to also use the Bayesian bootstrap
(BB=TRUE
), which corresponds to the pseudoBMA+ method. The Bayesian
bootstrap takes into account the uncertainty of finite data points and
regularizes the weights away from the extremes of 0 and 1.
In general, we recommend stacking for averaging predictive distributions, while pseudoBMA+ can serve as a computationally easier alternative.
A numeric vector containing one weight for each model.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leaveoneout crossvalidation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s1122201696964 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):158. PDF
Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018) Using stacking to average Bayesian predictive distributions. Bayesian Analysis, advance publication, doi:10.1214/17BA1091. (online).
The loo package vignettes, particularly Bayesian Stacking and PseudoBMA weights using the loo package.
loo()
for details on leaveoneout ELPD estimation.
constrOptim()
for the choice of optimization methods and controlparameters.
relative_eff()
for computing r_eff
.
## Not run: ### Demonstrating usage after fitting models with RStan library(rstan) # generate fake data from N(0,1). N < 100 y < rnorm(N, 0, 1) # Suppose we have three models: N(1, sigma), N(0.5, sigma) and N(0.6,sigma). stan_code < " data { int N; vector[N] y; real mu_fixed; } parameters { real<lower=0> sigma; } model { sigma ~ exponential(1); y ~ normal(mu_fixed, sigma); } generated quantities { vector[N] log_lik; for (n in 1:N) log_lik[n] = normal_lpdf(y[n] mu_fixed, sigma); }" mod < stan_model(model_code = stan_code) fit1 < sampling(mod, data=list(N=N, y=y, mu_fixed=1)) fit2 < sampling(mod, data=list(N=N, y=y, mu_fixed=0.5)) fit3 < sampling(mod, data=list(N=N, y=y, mu_fixed=0.6)) model_list < list(fit1, fit2, fit3) log_lik_list < lapply(model_list, extract_log_lik) # optional but recommended r_eff_list < lapply(model_list, function(x) { ll_array < extract_log_lik(x, merge_chains = FALSE) relative_eff(exp(ll_array)) }) # stacking method: wts1 < loo_model_weights( log_lik_list, method = "stacking", r_eff_list = r_eff_list, optim_control = list(reltol=1e10) ) print(wts1) # can also pass a list of psis_loo objects to avoid recomputing loo loo_list < lapply(1:length(log_lik_list), function(j) { loo(log_lik_list[[j]], r_eff = r_eff_list[[j]]) }) wts2 < loo_model_weights( loo_list, method = "stacking", optim_control = list(reltol=1e10) ) all.equal(wts1, wts2) # can provide names to be used in the results loo_model_weights(setNames(loo_list, c("A", "B", "C"))) # pseudoBMA+ method: set.seed(1414) loo_model_weights(loo_list, method = "pseudobma") # pseudoBMA method (set BB = FALSE): loo_model_weights(loo_list, method = "pseudobma", BB = FALSE) # calling stacking_weights or pseudobma_weights directly lpd1 < loo(log_lik_list[[1]], r_eff = r_eff_list[[1]])$pointwise[,1] lpd2 < loo(log_lik_list[[2]], r_eff = r_eff_list[[2]])$pointwise[,1] lpd3 < loo(log_lik_list[[3]], r_eff = r_eff_list[[3]])$pointwise[,1] stacking_weights(cbind(lpd1, lpd2, lpd3)) pseudobma_weights(cbind(lpd1, lpd2, lpd3)) pseudobma_weights(cbind(lpd1, lpd2, lpd3), BB = FALSE) ## End(Not run)
## Not run: ### Demonstrating usage after fitting models with RStan library(rstan) # generate fake data from N(0,1). N < 100 y < rnorm(N, 0, 1) # Suppose we have three models: N(1, sigma), N(0.5, sigma) and N(0.6,sigma). stan_code < " data { int N; vector[N] y; real mu_fixed; } parameters { real<lower=0> sigma; } model { sigma ~ exponential(1); y ~ normal(mu_fixed, sigma); } generated quantities { vector[N] log_lik; for (n in 1:N) log_lik[n] = normal_lpdf(y[n] mu_fixed, sigma); }" mod < stan_model(model_code = stan_code) fit1 < sampling(mod, data=list(N=N, y=y, mu_fixed=1)) fit2 < sampling(mod, data=list(N=N, y=y, mu_fixed=0.5)) fit3 < sampling(mod, data=list(N=N, y=y, mu_fixed=0.6)) model_list < list(fit1, fit2, fit3) log_lik_list < lapply(model_list, extract_log_lik) # optional but recommended r_eff_list < lapply(model_list, function(x) { ll_array < extract_log_lik(x, merge_chains = FALSE) relative_eff(exp(ll_array)) }) # stacking method: wts1 < loo_model_weights( log_lik_list, method = "stacking", r_eff_list = r_eff_list, optim_control = list(reltol=1e10) ) print(wts1) # can also pass a list of psis_loo objects to avoid recomputing loo loo_list < lapply(1:length(log_lik_list), function(j) { loo(log_lik_list[[j]], r_eff = r_eff_list[[j]]) }) wts2 < loo_model_weights( loo_list, method = "stacking", optim_control = list(reltol=1e10) ) all.equal(wts1, wts2) # can provide names to be used in the results loo_model_weights(setNames(loo_list, c("A", "B", "C"))) # pseudoBMA+ method: set.seed(1414) loo_model_weights(loo_list, method = "pseudobma") # pseudoBMA method (set BB = FALSE): loo_model_weights(loo_list, method = "pseudobma", BB = FALSE) # calling stacking_weights or pseudobma_weights directly lpd1 < loo(log_lik_list[[1]], r_eff = r_eff_list[[1]])$pointwise[,1] lpd2 < loo(log_lik_list[[2]], r_eff = r_eff_list[[2]])$pointwise[,1] lpd3 < loo(log_lik_list[[3]], r_eff = r_eff_list[[3]])$pointwise[,1] stacking_weights(cbind(lpd1, lpd2, lpd3)) pseudobma_weights(cbind(lpd1, lpd2, lpd3)) pseudobma_weights(cbind(lpd1, lpd2, lpd3), BB = FALSE) ## End(Not run)
Moment matching algorithm for updating a loo object when Pareto k estimates are large.
loo_moment_match(x, ...) ## Default S3 method: loo_moment_match( x, loo, post_draws, log_lik_i, unconstrain_pars, log_prob_upars, log_lik_i_upars, max_iters = 30L, k_threshold = NULL, split = TRUE, cov = TRUE, cores = getOption("mc.cores", 1), ... )
loo_moment_match(x, ...) ## Default S3 method: loo_moment_match( x, loo, post_draws, log_lik_i, unconstrain_pars, log_prob_upars, log_lik_i_upars, max_iters = 30L, k_threshold = NULL, split = TRUE, cov = TRUE, cores = getOption("mc.cores", 1), ... )
x 
A fitted model object. 
... 
Further arguments passed to the custom functions documented above. 
loo 
A loo object to be modified. 
post_draws 
A function the takes 
log_lik_i 
A function that takes 
unconstrain_pars 
A function that takes arguments 
log_prob_upars 
A function that takes arguments 
log_lik_i_upars 
A function that takes arguments 
max_iters 
Maximum number of moment matching iterations. Usually this
does not need to be modified. If the maximum number of iterations is
reached, there will be a warning, and increasing 
k_threshold 
Threshold value for Pareto k values above which the moment
matching algorithm is used. The default value is 
split 
Logical; Indicate whether to do the split transformation or not at the end of moment matching for each LOO fold. 
cov 
Logical; Indicate whether to match the covariance matrix of the
samples or not. If 
cores 
The number of cores to use for parallelization. This defaults to
the option

The loo_moment_match()
function is an S3 generic and we provide a
default method that takes as arguments userspecified functions
post_draws
, log_lik_i
, unconstrain_pars
, log_prob_upars
, and
log_lik_i_upars
. All of these functions should take ...
. as an argument
in addition to those specified for each function.
The loo_moment_match()
methods return an updated loo
object. The
structure of the updated loo
object is similar, but the method also
stores the original Pareto k diagnostic values in the diagnostics field.
loo_moment_match(default)
: A default method that takes as arguments a
userspecified model object x
, a loo
object and userspecified
functions post_draws
, log_lik_i
, unconstrain_pars
, log_prob_upars
,
and log_lik_i_upars
.
Paananen, T., Piironen, J., Buerkner, P.C., Vehtari, A. (2021). Implicitly adaptive importance sampling. Statistics and Computing, 31, 16. doi:10.1007/s11222020099822. arXiv preprint arXiv:1906.08850.
loo()
, loo_moment_match_split()
# See the vignette for loo_moment_match()
# See the vignette for loo_moment_match()
A function that computes the split moment matching importance sampling loo. Takes in the moment matching total transformation, transforms only half of the draws, and computes a single elpd using multiple importance sampling.
loo_moment_match_split( x, upars, cov, total_shift, total_scaling, total_mapping, i, log_prob_upars, log_lik_i_upars, r_eff_i, cores, is_method, ... )
loo_moment_match_split( x, upars, cov, total_shift, total_scaling, total_mapping, i, log_prob_upars, log_lik_i_upars, r_eff_i, cores, is_method, ... )
x 
A fitted model object. 
upars 
A matrix containing the model parameters in unconstrained space where they can have any real value. 
cov 
Logical; Indicate whether to match the covariance matrix of the
samples or not. If 
total_shift 
A vector representing the total shift made by the moment matching algorithm. 
total_scaling 
A vector representing the total scaling of marginal variance made by the moment matching algorithm. 
total_mapping 
A vector representing the total covariance transformation made by the moment matching algorithm. 
i 
Observation index. 
log_prob_upars 
A function that takes arguments 
log_lik_i_upars 
A function that takes arguments 
r_eff_i 
MCMC relative effective sample size of the 
cores 
The number of cores to use for parallelization. This defaults to
the option

is_method 
The importance sampling method to use. The following methods are implemented: 
... 
Further arguments passed to the custom functions documented above. 
A list containing the updated logimportance weights and loglikelihood values. Also returns the updated MCMC effective sample size and the integrandspecific logimportance weights.
Paananen, T., Piironen, J., Buerkner, P.C., Vehtari, A. (2021). Implicitly adaptive importance sampling. Statistics and Computing, 31, 16. doi:10.1007/s11222020099822. arXiv preprint arXiv:1906.08850.
The loo_predictive_metric()
function computes estimates of leaveoneout
predictive metrics given a set of predictions and observations. Currently
supported metrics are mean absolute error, mean squared error and root mean
squared error for continuous predictions and accuracy and balanced accuracy
for binary classification. Predictions are passed on to the E_loo()
function, so this function assumes that the PSIS approximation is working
well.
loo_predictive_metric(x, ...) ## S3 method for class 'matrix' loo_predictive_metric( x, y, log_lik, ..., metric = c("mae", "rmse", "mse", "acc", "balanced_acc"), r_eff = 1, cores = getOption("mc.cores", 1) )
loo_predictive_metric(x, ...) ## S3 method for class 'matrix' loo_predictive_metric( x, y, log_lik, ..., metric = c("mae", "rmse", "mse", "acc", "balanced_acc"), r_eff = 1, cores = getOption("mc.cores", 1) )
x 
A numeric matrix of predictions. 
... 
Additional arguments passed on to 
y 
A numeric vector of observations. Length should be equal to the
number of rows in 
log_lik 
A matrix of pointwise loglikelihoods. Should be of same
dimension as 
metric 
The type of predictive metric to be used. Currently
supported options are

r_eff 
A Vector of relative effective sample size estimates containing
one element per observation. See 
cores 
The number of cores to use for parallelization of 
A list with the following components:
estimate
Estimate of the given metric.
se
Standard error of the estimate.
if (requireNamespace("rstanarm", quietly = TRUE)) { # Use rstanarm package to quickly fit a model and get both a loglikelihood # matrix and draws from the posterior predictive distribution library("rstanarm") # data from help("lm") ctl < c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt < c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) d < data.frame( weight = c(ctl, trt), group = gl(2, 10, 20, labels = c("Ctl","Trt")) ) fit < stan_glm(weight ~ group, data = d, refresh = 0) ll < log_lik(fit) r_eff < relative_eff(exp(ll), chain_id = rep(1:4, each = 1000)) mu_pred < posterior_epred(fit) # Leaveoneout mean absolute error of predictions mae < loo_predictive_metric(x = mu_pred, y = d$weight, log_lik = ll, pred_error = 'mae', r_eff = r_eff) # Leaveoneout 90%quantile of mean absolute error mae_90q < loo_predictive_metric(x = mu_pred, y = d$weight, log_lik = ll, pred_error = 'mae', r_eff = r_eff, type = 'quantile', probs = 0.9) }
if (requireNamespace("rstanarm", quietly = TRUE)) { # Use rstanarm package to quickly fit a model and get both a loglikelihood # matrix and draws from the posterior predictive distribution library("rstanarm") # data from help("lm") ctl < c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt < c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) d < data.frame( weight = c(ctl, trt), group = gl(2, 10, 20, labels = c("Ctl","Trt")) ) fit < stan_glm(weight ~ group, data = d, refresh = 0) ll < log_lik(fit) r_eff < relative_eff(exp(ll), chain_id = rep(1:4, each = 1000)) mu_pred < posterior_epred(fit) # Leaveoneout mean absolute error of predictions mae < loo_predictive_metric(x = mu_pred, y = d$weight, log_lik = ll, pred_error = 'mae', r_eff = r_eff) # Leaveoneout 90%quantile of mean absolute error mae_90q < loo_predictive_metric(x = mu_pred, y = d$weight, log_lik = ll, pred_error = 'mae', r_eff = r_eff, type = 'quantile', probs = 0.9) }
Efficient approximate leaveoneout crossvalidation (LOO) using subsampling, so that less costly and more approximate computation is made for all LOOfold, and more costly and accurate computations are made only for m<N LOOfolds.
loo_subsample(x, ...) ## S3 method for class ''function'' loo_subsample( x, ..., data = NULL, draws = NULL, observations = 400, log_p = NULL, log_g = NULL, r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), loo_approximation = "plpd", loo_approximation_draws = NULL, estimator = "diff_srs", llgrad = NULL, llhess = NULL )
loo_subsample(x, ...) ## S3 method for class ''function'' loo_subsample( x, ..., data = NULL, draws = NULL, observations = 400, log_p = NULL, log_g = NULL, r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), loo_approximation = "plpd", loo_approximation_draws = NULL, estimator = "diff_srs", llgrad = NULL, llhess = NULL )
x 
A function. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs. 
data , draws , ...

For 
observations 
The subsample observations to use. The argument can take four (4) types of arguments:

log_p , log_g

Should be supplied only if approximate posterior draws are
used. The default ( 
r_eff 
Vector of relative effective sample size estimates for the
likelihood ( 
save_psis 
Should the 
cores 
The number of cores to use for parallelization. This defaults to
the option

loo_approximation 
What type of approximation of the loo_i's should be used?
The default is
As point estimates of 
loo_approximation_draws 
The number of posterior draws used when
integrating over the posterior. This is used if 
estimator 
How should

llgrad 
The gradient of the loglikelihood. This
is only used when 
llhess 
The Hessian of the loglikelihood. This is only used
with 
The loo_subsample()
function is an S3 generic and a methods is
currently provided for loglikelihood functions. The implementation works
for both MCMC and for posterior approximations where it is possible to
compute the log density for the approximation.
loo_subsample()
returns a named list with class c("psis_loo_ss", "psis_loo", "loo")
. This has the same structure as objects returned by
loo()
but with the additional slot:
loo_subsampling
: A list with two vectors, log_p
and log_g
, of the
same length containing the posterior density and the approximation density
for the individual draws.
loo_subsample(`function`)
: A function f()
that takes arguments data_i
and draws
and returns a
vector containing the loglikelihood for a single observation i
evaluated
at each posterior draw. The function should be written such that, for each
observation i
in 1:N
, evaluating
f(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length S
(size of posterior sample). The
loglikelihood function can also have additional arguments but data_i
and
draws
are required.
If using the function method then the arguments data
and draws
must also
be specified in the call to loo()
:
data
: A data frame or matrix containing the data (e.g.
observed outcome and predictors) needed to compute the pointwise
loglikelihood. For each observation i
, the i
th row of
data
will be passed to the data_i
argument of the
loglikelihood function.
draws
: An object containing the posterior draws for any
parameters needed to compute the pointwise loglikelihood. Unlike
data
, which is indexed by observation, for each observation the
entire object draws
will be passed to the draws
argument of
the loglikelihood function.
The ...
can be used if your loglikelihood function takes additional
arguments. These arguments are used like the draws
argument in that they
are recycled for each observation.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2019). LeaveOneOut CrossValidation for Large Data. In Thirtysixth International Conference on Machine Learning, PMLR 97:42444253.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2020). LeaveOneOut CrossValidation for Model Comparison in Large Data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 108:341351.
Small datasets for use in loo examples and vignettes. The Kline
and milk
datasets are also included in the rethinking package
(McElreath, 2016a), but we include them here as rethinking is not
on CRAN.
Currently the data sets included are:
Kline
:
Small dataset from Kline and Boyd (2010) on tool complexity and demography
in Oceanic islands societies. This data is discussed in detail in
McElreath (2016a,2016b). (Link to variable descriptions)
milk
:
Small dataset from Hinde and Milligan (2011) on primate milk
composition.This data is discussed in detail in McElreath (2016a,2016b).
(Link to variable descriptions)
voice
:
Voice rehabilitation data from Tsanas et al. (2014).
Hinde and Milligan. 2011. Evolutionary Anthropology 20:923.
Kline, M.A. and R. Boyd. 2010. Proc R Soc B 277:25592564.
McElreath, R. (2016a). rethinking: Statistical Rethinking book package. R package version 1.59.
McElreath, R. (2016b). Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman & Hall/CRC.
A. Tsanas, M.A. Little, C. Fox, L.O. Ramig: Objective automatic assessment of rehabilitative speech treatment in Parkinson's disease, IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 22, pp. 181190, January 2014
str(Kline) str(milk)
str(Kline) str(milk)
The pages provides definitions to key terms. Also see the FAQ page on the loo website for answers to frequently asked questions.
Note: VGG2017 refers to Vehtari, Gelman, and Gabry (2017). See References, below.
elpd_loo
The ELPD is the theoretical expected log pointwise predictive density for a new
dataset (Eq 1 in VGG2017), which can be estimated, e.g., using
crossvalidation. elpd_loo
is the Bayesian LOO estimate of the
expected log pointwise predictive density (Eq 4 in VGG2017) and
is a sum of N individual pointwise log predictive densities. Probability
densities can be smaller or larger than 1, and thus log predictive densities
can be negative or positive. For simplicity the ELPD acronym is used also for
expected log pointwise predictive probabilities for discrete models.
Probabilities are always equal or less than 1, and thus log predictive
probabilities are 0 or negative.
elpd_loo
As elpd_loo
is defined as the sum of N independent components (Eq 4 in
VGG2017), we can compute the standard error by using the standard deviation
of the N components and multiplying by sqrt(N)
(Eq 23 in VGG2017).
This standard error is a coarse description of our uncertainty about the
predictive performance for unknown future data. When N is small or there is
severe model misspecification, the current SE estimate is overoptimistic and
the actual SE can even be twice as large. Even for moderate N, when the SE
estimate is an accurate estimate for the scale, it ignores the skewness. When
making model comparisons, the SE of the componentwise (pairwise) differences
should be used instead (see the se_diff
section below and Eq 24 in
VGG2017). Sivula et al. (2022) discuss the conditions when the normal
approximation used for SE and se_diff
is good.
The Monte Carlo standard error is the estimate for the computational accuracy
of MCMC and importance sampling used to compute elpd_loo
. Usually this
is negligible compared to the standard describing the uncertainty due to
finite number of observations (Eq 23 in VGG2017).
p_loo
(effective number of parameters)p_loo
is the difference between elpd_loo
and the noncrossvalidated
log posterior predictive density. It describes how much more difficult it
is to predict future data than the observed data. Asymptotically under
certain regularity conditions, p_loo
can be interpreted as the
effective number of parameters. In well behaving cases p_loo < N
and
p_loo < p
, where p
is the total number of parameters in the
model. p_loo > N
or p_loo > p
indicates that the model has very
weak predictive capability and may indicate a severe model misspecification.
See below for more on interpreting p_loo
when there are warnings
about high Pareto k diagnostic values.
The Pareto $k$
estimate is a diagnostic for Pareto smoothed importance
sampling (PSIS), which is used to compute components of elpd_loo
. In
importancesampling LOO the full posterior distribution is used as the
proposal distribution. The Pareto k diagnostic estimates how far an
individual leaveoneout distribution is from the full distribution. If
leaving out an observation changes the posterior too much then importance
sampling is not able to give a reliable estimate. Pareto smoothing stabilizes
importance sampling and guarantees a finite variance estimate at the
cost of some bias.
The diagnostic threshold for Pareto $k$
depends on sample size
$S$
(sample size dependent threshold was introduced by Vehtari
et al., 2024, and before that fixed thresholds of 0.5 and 0.7 were
recommended). For simplicity, loo
package uses the nominal sample
size $S$
when computing the sample size specific
threshold. This provides an optimistic threshold if the effective
sample size is less than 2200, but even then if ESS/S > 1/2 the difference
is usually negligible. Thinning of MCMC draws can be used to improve
the ratio ESS/S.
If $k < \min(1  1 / \log_{10}(S), 0.7)$
, where $S$
is the
sample size, the PSIS estimate and the corresponding Monte
Carlo standard error estimate are reliable.
If $1  1 / \log_{10}(S) <= k < 0.7$
, the PSIS estimate and the
corresponding Monte Carlo standard error estimate are not
reliable, but increasing the (effective) sample size $S$
above
2200 may help (this will increase the sample size specific
threshold $(1  1 / \log_{10}(2200) > 0.7$
and then the bias specific
threshold 0.7 dominates).
If $0.7 <= k < 1$
, the PSIS estimate and the corresponding Monte
Carlo standard error have large bias and are not reliable. Increasing
the sample size may reduce the variability in the $k$
estimate, which
may also result in a lower $k$
estimate.
If $k \geq 1$
, the target distribution is estimated to
have nonfinite mean. The PSIS estimate and the corresponding Monte
Carlo standard error are not well defined. Increasing the sample size
may reduce the variability in $k$
estimate, which may also result in
a lower $k$
estimate.
Pareto $k$
is also useful as a measure of influence of an
observation. Highly influential observations have high $k$
values. Very high $k$
values often indicate model
misspecification, outliers or mistakes in data processing. See
Section 6 of Gabry et al. (2019) for an example.
p_loo
when Pareto k
is largeIf $k > 0.7$
then we can also look at
the p_loo
estimate for some additional information about the problem:
If p_loo << p
(the total number of parameters in the model),
then the model is likely to be misspecified. Posterior predictive checks
(PPCs) are then likely to also detect the problem. Try using an overdispersed
model, or add more structural information (nonlinearity, mixture model,
etc.).
If p_loo < p
and the number of parameters p
is relatively
large compared to the number of observations (e.g., p>N/5
), it is
likely that the model is so flexible or the population prior so weak that it’s
difficult to predict the left out observation (even for the true model).
This happens, for example, in the simulated 8 schools (in VGG2017), random
effect models with a few observations per random effect, and Gaussian
processes and spatial models with short correlation lengths.
If p_loo > p
, then the model is likely to be badly misspecified.
If the number of parameters p<<N
, then PPCs are also likely to detect the
problem. See the case study at
https://avehtari.github.io/modelselection/roaches.html for an example.
If p
is relatively large compared to the number of
observations, say p>N/5
(more accurately we should count number of
observations influencing each parameter as in hierarchical models some groups
may have few observations and other groups many), it is possible that PPCs won't
detect the problem.
elpd_diff
is the difference in elpd_loo
for two models. If more
than two models are compared, the difference is computed relative to the
model with highest elpd_loo
.
The standard error of componentwise differences of elpd_loo (Eq 24 in VGG2017) between two models. This SE is smaller than the SE for individual models due to correlation (i.e., if some observations are easier and some more difficult to predict for all models).
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leaveoneout crossvalidation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s1122201696964 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):158. PDF
Sivula, T, Magnusson, M., Matamoros A. A., and Vehtari, A. (2022). Uncertainty in Bayesian leaveoneout crossvalidation based model comparison. preprint arXiv:2008.10296v3..
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389402. doi:10.1111/rssa.12378 (journal version, preprint arXiv:1709.01449, code on GitHub)
psis_loo_ss
object.The number of observations in a psis_loo_ss
object.
## S3 method for class 'psis_loo_ss' nobs(object, ...)
## S3 method for class 'psis_loo_ss' nobs(object, ...)
object 
a 
... 
Currently unused. 
Get observation indices used in subsampling
obs_idx(x, rep = TRUE)
obs_idx(x, rep = TRUE)
x 
A 
rep 
If sampling with replacement is used, an observation can have
multiple samples and these are then repeated in the returned object if

An integer vector.
Print a diagnostic table summarizing the estimated Pareto shape parameters
and PSIS effective sample sizes, find the indexes of observations for which
the estimated Pareto shape parameter $k$
is larger than some
threshold
value, or plot observation indexes vs. diagnostic estimates.
The Details section below provides a brief overview of the
diagnostics, but we recommend consulting Vehtari, Gelman, and Gabry (2017)
and Vehtari, Simpson, Gelman, Yao, and Gabry (2024) for full details.
pareto_k_table(x) pareto_k_ids(x, threshold = NULL) pareto_k_values(x) pareto_k_influence_values(x) psis_n_eff_values(x) mcse_loo(x, threshold = NULL) ## S3 method for class 'psis_loo' plot( x, diagnostic = c("k", "ESS", "n_eff"), ..., label_points = FALSE, main = "PSIS diagnostic plot" ) ## S3 method for class 'psis' plot( x, diagnostic = c("k", "ESS", "n_eff"), ..., label_points = FALSE, main = "PSIS diagnostic plot" )
pareto_k_table(x) pareto_k_ids(x, threshold = NULL) pareto_k_values(x) pareto_k_influence_values(x) psis_n_eff_values(x) mcse_loo(x, threshold = NULL) ## S3 method for class 'psis_loo' plot( x, diagnostic = c("k", "ESS", "n_eff"), ..., label_points = FALSE, main = "PSIS diagnostic plot" ) ## S3 method for class 'psis' plot( x, diagnostic = c("k", "ESS", "n_eff"), ..., label_points = FALSE, main = "PSIS diagnostic plot" )
x 

threshold 
For 
diagnostic 
For the 
label_points , ...

For the 
main 
For the 
The reliability and approximate convergence rate of the PSISbased
estimates can be assessed using the estimates for the shape
parameter $k$
of the generalized Pareto distribution. The
diagnostic threshold for Pareto $k$
depends on sample size
$S$
(sample size dependent threshold was introduced by Vehtari
et al. (2024), and before that fixed thresholds of 0.5 and 0.7 were
recommended). For simplicity, loo
package uses the nominal sample
size $S$
when computing the sample size specific
threshold. This provides an optimistic threshold if the effective
sample size is less than 2200, but if MCMCESS > S/2 the difference
is usually negligible. Thinning of MCMC draws can be used to
improve the ratio ESS/S.
If $k < min(1  1 / log10(S), 0.7)$
, where $S$
is the
sample size, the PSIS estimate and the corresponding Monte Carlo
standard error estimate are reliable.
If $1  1 / log10(S) <= k < 0.7$
, the PSIS estimate and the
corresponding Monte Carlo standard error estimate are not
reliable, but increasing the (effective) sample size $S$
above
2200 may help (this will increase the sample size specific
threshold $(11/log10(2200)>0.7$
and then the bias specific
threshold 0.7 dominates).
If $0.7 <= k < 1$
, the PSIS estimate and the corresponding Monte
Carlo standard error have large bias and are not reliable. Increasing
the sample size may reduce the variability in $k$
estimate, which
may result in lower $k$
estimate, too.
If $k \geq 1$
, the target distribution is estimated to
have a nonfinite mean. The PSIS estimate and the corresponding Monte
Carlo standard error are not well defined. Increasing the sample size
may reduce the variability in the $k$
estimate, which
may also result in a lower $k$
estimate.
$k$
exceeds the diagnostic threshold? Importance sampling is likely to
work less well if the marginal posterior $p(\theta^s  y)$
and
LOO posterior $p(\theta^s  y_{i})$
are very different, which
is more likely to happen with a nonrobust model and highly
influential observations. If the estimated tail shape parameter
$k$
exceeds the diagnostic threshold, the user should be
warned. (Note: If $k$
is greater than the diagnostic threshold
then WAIC is also likely to fail, but WAIC lacks as accurate
diagnostic.) When using PSIS in the context of approximate LOOCV,
we recommend one of the following actions:
With some additional computations, it is possible to transform
the MCMC draws from the posterior distribution to obtain more
reliable importance sampling estimates. This results in a smaller
shape parameter $k$
. See loo_moment_match()
and the
vignette Avoiding model refits in leaveoneout crossvalidation
with moment matching for an example of this.
Sampling from a leaveoneout mixture distribution (see the
vignette Mixture IS leaveoneout crossvalidation for
highdimensional Bayesian models), directly from $p(\theta^s
 y_{i})$
for the problematic observations $i$
, or using
$K$
fold crossvalidation (see the vignette Holdout
validation and Kfold crossvalidation of Stan programs with the
loo package) will generally be more stable.
Using a model that is more robust to anomalous observations will generally make approximate LOOCV more stable.
The estimated shape parameter
$k$
for each observation can be used as a measure of the observation's
influence on posterior distribution of the model. These can be obtained with
pareto_k_influence_values()
.
In the case that we
obtain the samples from the proposal distribution via MCMC the loo
package also computes estimates for the Monte Carlo error and the effective
sample size for importance sampling, which are more accurate for PSIS than
for IS and TIS (see Vehtari et al (2024) for details). However, the PSIS
effective sample size estimate will be
overoptimistic when the estimate of $k$
is greater than
$min(11/log10(S), 0.7)$
, where $S$
is the sample size.
pareto_k_table()
returns an object of class
"pareto_k_table"
, which is a matrix with columns "Count"
,
"Proportion"
, and "Min. n_eff"
, and has its own print method.
pareto_k_ids()
returns an integer vector indicating which
observations have Pareto $k$
estimates above threshold
.
pareto_k_values()
returns a vector of the estimated Pareto
$k$
parameters. These represent the reliability of sampling.
pareto_k_influence_values()
returns a vector of the estimated Pareto
$k$
parameters. These represent influence of the observations on the
model posterior distribution.
psis_n_eff_values()
returns a vector of the estimated PSIS
effective sample sizes.
mcse_loo()
returns the Monte Carlo standard error (MCSE)
estimate for PSISLOO. MCSE will be NA if any Pareto $k$
values are
above threshold
.
The plot()
method is called for its side effect and does not
return anything. If x
is the result of a call to loo()
or psis()
then plot(x, diagnostic)
produces a plot of
the estimates of the Pareto shape parameters (diagnostic = "k"
) or
estimates of the PSIS effective sample sizes (diagnostic = "ESS"
).
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leaveoneout crossvalidation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s1122201696964 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):158. PDF
psis()
for the implementation of the PSIS algorithm.
The FAQ page on the loo website for answers to frequently asked questions.
Convenience function for extracting pointwise estimates
pointwise(x, estimate, ...) ## S3 method for class 'loo' pointwise(x, estimate, ...)
pointwise(x, estimate, ...) ## S3 method for class 'loo' pointwise(x, estimate, ...)
x 
A 
estimate 
Which pointwise estimate to return. By default all are
returned. The objects returned by the different functions ( 
... 
Currently ignored. 
A vector of length equal to the number of observations.
x < loo(example_loglik_array()) pointwise(x, "elpd_loo")
x < loo(example_loglik_array()) pointwise(x, "elpd_loo")
Print methods
## S3 method for class 'loo' print(x, digits = 1, ...) ## S3 method for class 'waic' print(x, digits = 1, ...) ## S3 method for class 'psis_loo' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'importance_sampling_loo' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'psis_loo_ap' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'psis' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'importance_sampling' print(x, digits = 1, plot_k = FALSE, ...)
## S3 method for class 'loo' print(x, digits = 1, ...) ## S3 method for class 'waic' print(x, digits = 1, ...) ## S3 method for class 'psis_loo' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'importance_sampling_loo' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'psis_loo_ap' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'psis' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'importance_sampling' print(x, digits = 1, plot_k = FALSE, ...)
x 

digits 
An integer passed to 
... 
Arguments passed to 
plot_k 
Logical. If 
x
, invisibly.
Implementation of Pareto smoothed importance sampling (PSIS), a method for stabilizing importance ratios. The version of PSIS implemented here corresponds to the algorithm presented in Vehtari, Simpson, Gelman, Yao, and Gabry (2024). For PSIS diagnostics see the paretokdiagnostic page.
psis(log_ratios, ...) ## S3 method for class 'array' psis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' psis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## Default S3 method: psis(log_ratios, ..., r_eff = 1) is.psis(x) is.sis(x) is.tis(x)
psis(log_ratios, ...) ## S3 method for class 'array' psis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' psis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## Default S3 method: psis(log_ratios, ..., r_eff = 1) is.psis(x) is.sis(x) is.tis(x)
log_ratios 
An array, matrix, or vector of importance ratios on the log scale (for PSISLOO these are negative loglikelihood values). See the Methods (by class) section below for a detailed description of how to specify the inputs for each method. 
... 
Arguments passed on to the various methods. 
r_eff 
Vector of relative effective sample size estimates containing
one element per observation. The values provided should be the relative
effective sample sizes of 
cores 
The number of cores to use for parallelization. This defaults to
the option

x 
For 
The psis()
methods return an object of class "psis"
,
which is a named list with the following components:
log_weights
Vector or matrix of smoothed (and truncated) but unnormalized log
weights. To get normalized weights use the
weights()
method provided for objects of
class "psis"
.
diagnostics
A named list containing two vectors:
pareto_k
: Estimates of the shape parameter $k$
of the
generalized Pareto distribution. See the paretokdiagnostic
page for details.
n_eff
: PSIS effective sample size estimates.
Objects of class "psis"
also have the following attributes:
norm_const_log
Vector of precomputed values of colLogSumExps(log_weights)
that are
used internally by the weights
method to normalize the log weights.
tail_len
Vector of tail lengths used for fitting the generalized Pareto distribution.
r_eff
If specified, the user's r_eff
argument.
dims
Integer vector of length 2 containing S
(posterior sample size)
and N
(number of observations).
method
Method used for importance sampling, here psis
.
psis(array)
: An $I$
by $C$
by $N$
array, where $I$
is the number of MCMC iterations per chain, $C$
is the number of
chains, and $N$
is the number of data points.
psis(matrix)
: An $S$
by $N$
matrix, where $S$
is the size
of the posterior sample (with all chains merged) and $N$
is the number
of data points.
psis(default)
: A vector of length $S$
(posterior sample size).
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leaveoneout crossvalidation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s1122201696964 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):158. PDF
loo()
for approximate LOOCV using PSIS.
paretokdiagnostic for PSIS diagnostics.
The loo package vignettes for demonstrations.
The FAQ page on the loo website for answers to frequently asked questions.
log_ratios < 1 * example_loglik_array() r_eff < relative_eff(exp(log_ratios)) psis_result < psis(log_ratios, r_eff = r_eff) str(psis_result) plot(psis_result) # extract smoothed weights lw < weights(psis_result) # default args are log=TRUE, normalize=TRUE ulw < weights(psis_result, normalize=FALSE) # unnormalized logweights w < weights(psis_result, log=FALSE) # normalized weights (not logweights) uw < weights(psis_result, log=FALSE, normalize = FALSE) # unnormalized weights
log_ratios < 1 * example_loglik_array() r_eff < relative_eff(exp(log_ratios)) psis_result < psis(log_ratios, r_eff = r_eff) str(psis_result) plot(psis_result) # extract smoothed weights lw < weights(psis_result) # default args are log=TRUE, normalize=TRUE ulw < weights(psis_result, normalize=FALSE) # unnormalized logweights w < weights(psis_result, log=FALSE) # normalized weights (not logweights) uw < weights(psis_result, log=FALSE, normalize = FALSE) # unnormalized weights
As of version 2.0.0
this function is deprecated. Please use the
psis()
function for the new PSIS algorithm.
psislw( lw, wcp = 0.2, wtrunc = 3/4, cores = getOption("mc.cores", 1), llfun = NULL, llargs = NULL, ... )
psislw( lw, wcp = 0.2, wtrunc = 3/4, cores = getOption("mc.cores", 1), llfun = NULL, llargs = NULL, ... )
lw 
A matrix or vector of log weights. For computing LOO, 
wcp 
The proportion of importance weights to use for the generalized
Pareto fit. The 
wtrunc 
For truncating very large weights to 
cores 
The number of cores to use for parallelization. This defaults to
the option 
llfun , llargs

See 
... 
Ignored when 
A named list with components lw_smooth
(modified log weights) and
pareto_k
(estimated generalized Pareto shape parameter(s) k).
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leaveoneout crossvalidation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s1122201696964 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):158. PDF
paretokdiagnostic for PSIS diagnostics.
relative_eff()
computes the the MCMC effective sample size divided by
the total sample size.
relative_eff(x, ...) ## Default S3 method: relative_eff(x, chain_id, ...) ## S3 method for class 'matrix' relative_eff(x, chain_id, ..., cores = getOption("mc.cores", 1)) ## S3 method for class 'array' relative_eff(x, ..., cores = getOption("mc.cores", 1)) ## S3 method for class ''function'' relative_eff( x, chain_id, ..., cores = getOption("mc.cores", 1), data = NULL, draws = NULL ) ## S3 method for class 'importance_sampling' relative_eff(x, ...)
relative_eff(x, ...) ## Default S3 method: relative_eff(x, chain_id, ...) ## S3 method for class 'matrix' relative_eff(x, chain_id, ..., cores = getOption("mc.cores", 1)) ## S3 method for class 'array' relative_eff(x, ..., cores = getOption("mc.cores", 1)) ## S3 method for class ''function'' relative_eff( x, chain_id, ..., cores = getOption("mc.cores", 1), data = NULL, draws = NULL ) ## S3 method for class 'importance_sampling' relative_eff(x, ...)
x 
A vector, matrix, 3D array, or function. See the Methods (by
class) section below for details on specifying 
chain_id 
A vector of length 
cores 
The number of cores to use for parallelization. 
data , draws , ...

Same as for the 
A vector of relative effective sample sizes.
relative_eff(default)
: A vector of length $S$
(posterior sample size).
relative_eff(matrix)
: An $S$
by $N$
matrix, where $S$
is the size
of the posterior sample (with all chains merged) and $N$
is the number
of data points.
relative_eff(array)
: An $I$
by $C$
by $N$
array, where $I$
is the number of MCMC iterations per chain, $C$
is the number of
chains, and $N$
is the number of data points.
relative_eff(`function`)
: A function f()
that takes arguments data_i
and draws
and returns a
vector containing the loglikelihood for a single observation i
evaluated
at each posterior draw. The function should be written such that, for each
observation i
in 1:N
, evaluating
f(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length S
(size of posterior sample). The
loglikelihood function can also have additional arguments but data_i
and
draws
are required.
If using the function method then the arguments data
and draws
must also
be specified in the call to loo()
:
data
: A data frame or matrix containing the data (e.g.
observed outcome and predictors) needed to compute the pointwise
loglikelihood. For each observation i
, the i
th row of
data
will be passed to the data_i
argument of the
loglikelihood function.
draws
: An object containing the posterior draws for any
parameters needed to compute the pointwise loglikelihood. Unlike
data
, which is indexed by observation, for each observation the
entire object draws
will be passed to the draws
argument of
the loglikelihood function.
The ...
can be used if your loglikelihood function takes additional
arguments. These arguments are used like the draws
argument in that they
are recycled for each observation.
relative_eff(importance_sampling)
: If x
is an object of class "psis"
, relative_eff()
simply returns
the r_eff
attribute of x
.
LLarr < example_loglik_array() LLmat < example_loglik_matrix() dim(LLarr) dim(LLmat) rel_n_eff_1 < relative_eff(exp(LLarr)) rel_n_eff_2 < relative_eff(exp(LLmat), chain_id = rep(1:2, each = 500)) all.equal(rel_n_eff_1, rel_n_eff_2)
LLarr < example_loglik_array() LLmat < example_loglik_matrix() dim(LLarr) dim(LLmat) rel_n_eff_1 < relative_eff(exp(LLarr)) rel_n_eff_2 < relative_eff(exp(LLmat), chain_id = rep(1:2, each = 500)) all.equal(rel_n_eff_1, rel_n_eff_2)
Implementation of standard importance sampling (SIS).
sis(log_ratios, ...) ## S3 method for class 'array' sis(log_ratios, ..., r_eff = NULL, cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' sis(log_ratios, ..., r_eff = NULL, cores = getOption("mc.cores", 1)) ## Default S3 method: sis(log_ratios, ..., r_eff = NULL)
sis(log_ratios, ...) ## S3 method for class 'array' sis(log_ratios, ..., r_eff = NULL, cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' sis(log_ratios, ..., r_eff = NULL, cores = getOption("mc.cores", 1)) ## Default S3 method: sis(log_ratios, ..., r_eff = NULL)
log_ratios 
An array, matrix, or vector of importance ratios on the log scale (for Importance sampling LOO, these are negative loglikelihood values). See the Methods (by class) section below for a detailed description of how to specify the inputs for each method. 
... 
Arguments passed on to the various methods. 
r_eff 
Vector of relative effective sample size estimates containing
one element per observation. The values provided should be the relative
effective sample sizes of 
cores 
The number of cores to use for parallelization. This defaults to
the option

The sis()
methods return an object of class "sis"
,
which is a named list with the following components:
log_weights
Vector or matrix of smoothed but unnormalized log
weights. To get normalized weights use the
weights()
method provided for objects of
class sis
.
diagnostics
A named list containing one vector:
pareto_k
: Not used in sis
, all set to 0.
n_eff
: effective sample size estimates.
Objects of class "sis"
also have the following attributes:
norm_const_log
Vector of precomputed values of colLogSumExps(log_weights)
that are
used internally by the weights
method to normalize the log weights.
r_eff
If specified, the user's r_eff
argument.
tail_len
Not used for sis
.
dims
Integer vector of length 2 containing S
(posterior sample size)
and N
(number of observations).
method
Method used for importance sampling, here sis
.
sis(array)
: An $I$
by $C$
by $N$
array, where $I$
is the number of MCMC iterations per chain, $C$
is the number of
chains, and $N$
is the number of data points.
sis(matrix)
: An $S$
by $N$
matrix, where $S$
is the size
of the posterior sample (with all chains merged) and $N$
is the number
of data points.
sis(default)
: A vector of length $S$
(posterior sample size).
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leaveoneout crossvalidation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s1122201696964 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):158. PDF
psis()
for approximate LOOCV using PSIS.
loo()
for approximate LOOCV.
paretokdiagnostic for PSIS diagnostics.
log_ratios < 1 * example_loglik_array() r_eff < relative_eff(exp(log_ratios)) sis_result < sis(log_ratios, r_eff = r_eff) str(sis_result) # extract smoothed weights lw < weights(sis_result) # default args are log=TRUE, normalize=TRUE ulw < weights(sis_result, normalize=FALSE) # unnormalized logweights w < weights(sis_result, log=FALSE) # normalized weights (not logweights) uw < weights(sis_result, log=FALSE, normalize = FALSE) # unnormalized weights
log_ratios < 1 * example_loglik_array() r_eff < relative_eff(exp(log_ratios)) sis_result < sis(log_ratios, r_eff = r_eff) str(sis_result) # extract smoothed weights lw < weights(sis_result) # default args are log=TRUE, normalize=TRUE ulw < weights(sis_result, normalize=FALSE) # unnormalized logweights w < weights(sis_result, log=FALSE) # normalized weights (not logweights) uw < weights(sis_result, log=FALSE, normalize = FALSE) # unnormalized weights
Implementation of truncated (selfnormalized) importance sampling (TIS), truncated at S^(1/2) as recommended by Ionides (2008).
tis(log_ratios, ...) ## S3 method for class 'array' tis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' tis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## Default S3 method: tis(log_ratios, ..., r_eff = 1)
tis(log_ratios, ...) ## S3 method for class 'array' tis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' tis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## Default S3 method: tis(log_ratios, ..., r_eff = 1)
log_ratios 
An array, matrix, or vector of importance ratios on the log scale (for Importance sampling LOO, these are negative loglikelihood values). See the Methods (by class) section below for a detailed description of how to specify the inputs for each method. 
... 
Arguments passed on to the various methods. 
r_eff 
Vector of relative effective sample size estimates containing
one element per observation. The values provided should be the relative
effective sample sizes of 
cores 
The number of cores to use for parallelization. This defaults to
the option

The tis()
methods return an object of class "tis"
,
which is a named list with the following components:
log_weights
Vector or matrix of smoothed (and truncated) but unnormalized log
weights. To get normalized weights use the
weights()
method provided for objects of
class tis
.
diagnostics
A named list containing one vector:
pareto_k
: Not used in tis
, all set to 0.
n_eff
: Effective sample size estimates.
Objects of class "tis"
also have the following attributes:
norm_const_log
Vector of precomputed values of colLogSumExps(log_weights)
that are
used internally by the weights()
method to normalize the log weights.
r_eff
If specified, the user's r_eff
argument.
tail_len
Not used for tis
.
dims
Integer vector of length 2 containing S
(posterior sample size)
and N
(number of observations).
method
Method used for importance sampling, here tis
.
tis(array)
: An $I$
by $C$
by $N$
array, where $I$
is the number of MCMC iterations per chain, $C$
is the number of
chains, and $N$
is the number of data points.
tis(matrix)
: An $S$
by $N$
matrix, where $S$
is the size
of the posterior sample (with all chains merged) and $N$
is the number
of data points.
tis(default)
: A vector of length $S$
(posterior sample size).
Ionides, Edward L. (2008). Truncated importance sampling. Journal of Computational and Graphical Statistics 17(2): 295–311.
psis()
for approximate LOOCV using PSIS.
loo()
for approximate LOOCV.
paretokdiagnostic for PSIS diagnostics.
log_ratios < 1 * example_loglik_array() r_eff < relative_eff(exp(log_ratios)) tis_result < tis(log_ratios, r_eff = r_eff) str(tis_result) # extract smoothed weights lw < weights(tis_result) # default args are log=TRUE, normalize=TRUE ulw < weights(tis_result, normalize=FALSE) # unnormalized logweights w < weights(tis_result, log=FALSE) # normalized weights (not logweights) uw < weights(tis_result, log=FALSE, normalize = FALSE) # unnormalized weights
log_ratios < 1 * example_loglik_array() r_eff < relative_eff(exp(log_ratios)) tis_result < tis(log_ratios, r_eff = r_eff) str(tis_result) # extract smoothed weights lw < weights(tis_result) # default args are log=TRUE, normalize=TRUE ulw < weights(tis_result, normalize=FALSE) # unnormalized logweights w < weights(tis_result, log=FALSE) # normalized weights (not logweights) uw < weights(tis_result, log=FALSE, normalize = FALSE) # unnormalized weights
psis_loo_ss
objectsUpdate psis_loo_ss
objects
## S3 method for class 'psis_loo_ss' update( object, ..., data = NULL, draws = NULL, observations = NULL, r_eff = 1, cores = getOption("mc.cores", 1), loo_approximation = NULL, loo_approximation_draws = NULL, llgrad = NULL, llhess = NULL )
## S3 method for class 'psis_loo_ss' update( object, ..., data = NULL, draws = NULL, observations = NULL, r_eff = 1, cores = getOption("mc.cores", 1), loo_approximation = NULL, loo_approximation_draws = NULL, llgrad = NULL, llhess = NULL )
object 
A 
... 
Currently not used. 
data , draws


observations 
The subsample observations to use. The argument can take four (4) types of arguments:

r_eff 
Vector of relative effective sample size estimates for the
likelihood ( 
cores 
The number of cores to use for parallelization. This defaults to
the option

loo_approximation 
What type of approximation of the loo_i's should be used?
The default is
As point estimates of 
loo_approximation_draws 
The number of posterior draws used when
integrating over the posterior. This is used if 
llgrad 
The gradient of the loglikelihood. This
is only used when 
llhess 
The Hessian of the loglikelihood. This is only used
with 
If observations
is updated then if a vector of indices or a psis_loo_ss
object is supplied the updated object will have exactly the observations
indicated by the vector or psis_loo_ss
object. If a single integer is
supplied, new observations will be sampled to reach the supplied sample size.
A psis_loo_ss
object.
The waic()
methods can be used to compute WAIC from the pointwise
loglikelihood. However, we recommend LOOCV using PSIS (as implemented by
the loo()
function) because PSIS provides useful diagnostics as well as
effective sample size and Monte Carlo estimates.
waic(x, ...) ## S3 method for class 'array' waic(x, ...) ## S3 method for class 'matrix' waic(x, ...) ## S3 method for class ''function'' waic(x, ..., data = NULL, draws = NULL) is.waic(x)
waic(x, ...) ## S3 method for class 'array' waic(x, ...) ## S3 method for class 'matrix' waic(x, ...) ## S3 method for class ''function'' waic(x, ..., data = NULL, draws = NULL) is.waic(x)
x 
A loglikelihood array, matrix, or function. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs for each method. 
draws , data , ...

For the function method only. See the Methods (by class) section below for details on these arguments. 
A named list (of class c("waic", "loo")
) with components:
estimates
A matrix with two columns ("Estimate"
, "SE"
) and three
rows ("elpd_waic"
, "p_waic"
, "waic"
). This contains
point estimates and standard errors of the expected log pointwise predictive
density (elpd_waic
), the effective number of parameters
(p_waic
) and the information criterion waic
(which is just
2 * elpd_waic
, i.e., converted to deviance scale).
pointwise
A matrix with three columns (and number of rows equal to the number of
observations) containing the pointwise contributions of each of the above
measures (elpd_waic
, p_waic
, waic
).
waic(array)
: An $I$
by $C$
by $N$
array, where $I$
is the number of MCMC iterations per chain, $C$
is the number of
chains, and $N$
is the number of data points.
waic(matrix)
: An $S$
by $N$
matrix, where $S$
is the size
of the posterior sample (with all chains merged) and $N$
is the number
of data points.
waic(`function`)
: A function f()
that takes arguments data_i
and draws
and returns a
vector containing the loglikelihood for a single observation i
evaluated
at each posterior draw. The function should be written such that, for each
observation i
in 1:N
, evaluating
f(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length S
(size of posterior sample). The
loglikelihood function can also have additional arguments but data_i
and
draws
are required.
If using the function method then the arguments data
and draws
must also
be specified in the call to loo()
:
data
: A data frame or matrix containing the data (e.g.
observed outcome and predictors) needed to compute the pointwise
loglikelihood. For each observation i
, the i
th row of
data
will be passed to the data_i
argument of the
loglikelihood function.
draws
: An object containing the posterior draws for any
parameters needed to compute the pointwise loglikelihood. Unlike
data
, which is indexed by observation, for each observation the
entire object draws
will be passed to the draws
argument of
the loglikelihood function.
The ...
can be used if your loglikelihood function takes additional
arguments. These arguments are used like the draws
argument in that they
are recycled for each observation.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely application information criterion in singular learning theory. Journal of Machine Learning Research 11, 35713594.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leaveoneout crossvalidation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s1122201696964 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):158. PDF
The loo package vignettes and
Vehtari, Gelman, and Gabry (2017) and Vehtari, Simpson, Gelman, Yao,
and Gabry (2024) for more details on why we prefer loo()
to waic()
.
loo_compare()
for comparing models on approximate LOOCV or WAIC.
### Array and matrix methods LLarr < example_loglik_array() dim(LLarr) LLmat < example_loglik_matrix() dim(LLmat) waic_arr < waic(LLarr) waic_mat < waic(LLmat) identical(waic_arr, waic_mat) ## Not run: log_lik1 < extract_log_lik(stanfit1) log_lik2 < extract_log_lik(stanfit2) (waic1 < waic(log_lik1)) (waic2 < waic(log_lik2)) print(compare(waic1, waic2), digits = 2) ## End(Not run)
### Array and matrix methods LLarr < example_loglik_array() dim(LLarr) LLmat < example_loglik_matrix() dim(LLmat) waic_arr < waic(LLarr) waic_mat < waic(LLmat) identical(waic_arr, waic_mat) ## Not run: log_lik1 < extract_log_lik(stanfit1) log_lik2 < extract_log_lik(stanfit2) (waic1 < waic(log_lik1)) (waic2 < waic(log_lik2)) print(compare(waic1, waic2), digits = 2) ## End(Not run)
Extract importance sampling weights
## S3 method for class 'importance_sampling' weights(object, ..., log = TRUE, normalize = TRUE)
## S3 method for class 'importance_sampling' weights(object, ..., log = TRUE, normalize = TRUE)
object 

... 
Ignored. 
log 
Should the weights be returned on the log scale? Defaults to

normalize 
Should the weights be normalized? Defaults to 
The weights()
method returns an object with the same dimensions as
the log_weights
component of object
. The normalize
and log
arguments control whether the returned weights are normalized and whether
or not to return them on the log scale.
# See the examples at help("psis")
# See the examples at help("psis")