Title: | Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models |
---|---|
Description: | Efficient approximate leave-one-out cross-validation (LOO) for Bayesian models fit using Markov chain Monte Carlo, as described in Vehtari, Gelman, and Gabry (2017) <doi:10.1007/s11222-016-9696-4>. The approximation uses Pareto smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. As a byproduct of the calculations, we also obtain approximate standard errors for estimated predictive errors and for the comparison of predictive errors between models. The package also provides methods for using stacking and other model weighting techniques to average Bayesian predictive distributions. |
Authors: | Aki Vehtari [aut], Jonah Gabry [cre, aut], Måns Magnusson [aut], Yuling Yao [aut], Paul-Christian Bürkner [aut], Topi Paananen [aut], Andrew Gelman [aut], Ben Goodrich [ctb], Juho Piironen [ctb], Bruno Nicenboim [ctb], Leevi Lindgren [ctb] |
Maintainer: | Jonah Gabry <[email protected]> |
License: | GPL (>=3) |
Version: | 2.8.0.9000 |
Built: | 2024-09-12 19:19:50 UTC |
Source: | https://github.com/stan-dev/loo |
Stan Development Team
This package implements the methods described in Vehtari, Gelman, and
Gabry (2017), Vehtari, Simpson, Gelman, Yao, and Gabry (2024), and
Yao et al. (2018). To get started see the loo package
vignettes, the
loo()
function for efficient approximate leave-one-out
cross-validation (LOO-CV), the psis()
function for the Pareto
smoothed importance sampling (PSIS) algorithm, or
loo_model_weights()
for an implementation of Bayesian stacking of
predictive distributions from multiple models.
Leave-one-out cross-validation (LOO-CV) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values. LOO-CV and WAIC have various advantages over simpler estimates of predictive error such as AIC and DIC but are less used in practice because they involve additional computational steps. This package implements the fast and stable computations for approximate LOO-CV laid out in Vehtari, Gelman, and Gabry (2017). From existing posterior simulation draws, we compute LOO-CV using Pareto smoothed importance sampling (PSIS; Vehtari, Simpson, Gelman, Yao, and Gabry, 2024), a new procedure for stabilizing and diagnosing importance weights. As a byproduct of our calculations, we also obtain approximate standard errors for estimated predictive errors and for comparing of predictive errors between two models.
We recommend PSIS-LOO-CV instead of WAIC, because PSIS provides useful diagnostics and effective sample size and Monte Carlo standard error estimates.
Maintainer: Jonah Gabry [email protected]
Authors:
Aki Vehtari [email protected]
Måns Magnusson
Yuling Yao
Paul-Christian Bürkner
Topi Paananen
Andrew Gelman
Other contributors:
Ben Goodrich [contributor]
Juho Piironen [contributor]
Bruno Nicenboim [contributor]
Leevi Lindgren [contributor]
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018) Using stacking to average Bayesian predictive distributions. Bayesian Analysis, advance publication, doi:10.1214/17-BA1091. (online).
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2019). Leave-One-Out Cross-Validation for Large Data. In Thirty-sixth International Conference on Machine Learning, PMLR 97:4244-4253.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2020). Leave-One-Out Cross-Validation for Model Comparison in Large Data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 108:341-351.
Epifani, I., MacEachern, S. N., and Peruggia, M. (2008). Case-deletion importance sampling estimators: Central limit theorems and related results. Electronic Journal of Statistics 2, 774-806.
Gelfand, A. E. (1996). Model determination using sampling-based methods. In Markov Chain Monte Carlo in Practice, ed. W. R. Gilks, S. Richardson, D. J. Spiegelhalter, 145-162. London: Chapman and Hall.
Gelfand, A. E., Dey, D. K., and Chang, H. (1992). Model determination using predictive distributions with implementation via sampling-based methods. In Bayesian Statistics 4, ed. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, 147-167. Oxford University Press.
Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing 24, 997-1016.
Ionides, E. L. (2008). Truncated importance sampling. Journal of Computational and Graphical Statistics 17, 295-311.
Koopman, S. J., Shephard, N., and Creal, D. (2009). Testing the assumptions behind importance sampling. Journal of Econometrics 149, 2-11.
Peruggia, M. (1997). On the variability of case-deletion importance sampling weights in the Bayesian linear model. Journal of the American Statistical Association 92, 199-207.
Stan Development Team (2017). The Stan C++ Library, Version 2.17.0. https://mc-stan.org.
Stan Development Team (2018). RStan: the R interface to Stan, Version 2.17.3. https://mc-stan.org.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely application information criterion in singular learning theory. Journal of Machine Learning Research 11, 3571-3594.
Zhang, J., and Stephens, M. A. (2009). A new and efficient estimation method for the generalized Pareto distribution. Technometrics 51, 316-325.
Useful links:
Report bugs at https://github.com/stan-dev/loo/issues
Pareto smoothed importance sampling (PSIS) using approximate posteriors
ap_psis(log_ratios, log_p, log_g, ...) ## S3 method for class 'array' ap_psis(log_ratios, log_p, log_g, ..., cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' ap_psis(log_ratios, log_p, log_g, ..., cores = getOption("mc.cores", 1)) ## Default S3 method: ap_psis(log_ratios, log_p, log_g, ...)
ap_psis(log_ratios, log_p, log_g, ...) ## S3 method for class 'array' ap_psis(log_ratios, log_p, log_g, ..., cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' ap_psis(log_ratios, log_p, log_g, ..., cores = getOption("mc.cores", 1)) ## Default S3 method: ap_psis(log_ratios, log_p, log_g, ...)
log_ratios |
The log-likelihood ratios (ie -log_liks) |
log_p |
The log-posterior (target) evaluated at S samples from the proposal distribution (g). A vector of length S. |
log_g |
The log-density (proposal) evaluated at S samples from the proposal distribution (g). A vector of length S. |
... |
Currently not in use. |
cores |
The number of cores to use for parallelization. This defaults to
the option
|
ap_psis(array)
: An by
by
array, where
is the number of MCMC iterations per chain,
is the number of
chains, and
is the number of data points.
ap_psis(matrix)
: An by
matrix, where
is the size
of the posterior sample (with all chains merged) and
is the number
of data points.
ap_psis(default)
: A vector of length (posterior sample size).
This function is deprecated. Please use the new loo_compare()
function
instead.
compare(..., x = list())
compare(..., x = list())
... |
|
x |
A list of at least two objects returned by |
When comparing two fitted models, we can estimate the difference in their
expected predictive accuracy by the difference in elpd_loo
or
elpd_waic
(or multiplied by -2, if desired, to be on the
deviance scale).
When that difference, elpd_diff
, is positive then the expected
predictive accuracy for the second model is higher. A negative
elpd_diff
favors the first model.
When using compare()
with more than two models, the values in the
elpd_diff
and se_diff
columns of the returned matrix are
computed by making pairwise comparisons between each model and the model
with the best ELPD (i.e., the model in the first row).
Although the elpd_diff
column is equal to the difference in
elpd_loo
, do not expect the se_diff
column to be equal to the
the difference in se_elpd_loo
.
To compute the standard error of the difference in ELPD we use a paired estimate to take advantage of the fact that the same set of N data points was used to fit both models. These calculations should be most useful when N is large, because then non-normality of the distribution is not such an issue when estimating the uncertainty in these sums. These standard errors, for all their flaws, should give a better sense of uncertainty than what is obtained using the current standard approach of comparing differences of deviances to a Chi-squared distribution, a practice derived for Gaussian linear models or asymptotically, and which only applies to nested models in any case.
A vector or matrix with class 'compare.loo'
that has its own
print method. If exactly two objects are provided in ...
or
x
, then the difference in expected predictive accuracy and the
standard error of the difference are returned. If more than two objects are
provided then a matrix of summary information is returned (see Details).
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
## Not run: loo1 <- loo(log_lik1) loo2 <- loo(log_lik2) print(compare(loo1, loo2), digits = 3) print(compare(x = list(loo1, loo2))) waic1 <- waic(log_lik1) waic2 <- waic(log_lik2) compare(waic1, waic2) ## End(Not run)
## Not run: loo1 <- loo(log_lik1) loo2 <- loo(log_lik2) print(compare(loo1, loo2), digits = 3) print(compare(x = list(loo1, loo2))) waic1 <- waic(log_lik1) waic2 <- waic(log_lik2) compare(waic1, waic2) ## End(Not run)
The crps()
and scrps()
functions and their loo_*()
counterparts can be
used to compute the continuously ranked probability score (CRPS) and scaled
CRPS (SCRPS) (see Bolin and Wallin, 2022). CRPS is a proper scoring rule, and
strictly proper when the first moment of the predictive distribution is
finite. Both can be expressed in terms of samples form the predictive
distribution. See e.g. Gneiting and Raftery (2007) for a comprehensive
discussion on CRPS.
crps(x, ...) scrps(x, ...) loo_crps(x, ...) loo_scrps(x, ...) ## S3 method for class 'matrix' crps(x, x2, y, ..., permutations = 1) ## S3 method for class 'numeric' crps(x, x2, y, ..., permutations = 1) ## S3 method for class 'matrix' loo_crps( x, x2, y, log_lik, ..., permutations = 1, r_eff = 1, cores = getOption("mc.cores", 1) ) ## S3 method for class 'matrix' scrps(x, x2, y, ..., permutations = 1) ## S3 method for class 'numeric' scrps(x, x2, y, ..., permutations = 1) ## S3 method for class 'matrix' loo_scrps( x, x2, y, log_lik, ..., permutations = 1, r_eff = 1, cores = getOption("mc.cores", 1) )
crps(x, ...) scrps(x, ...) loo_crps(x, ...) loo_scrps(x, ...) ## S3 method for class 'matrix' crps(x, x2, y, ..., permutations = 1) ## S3 method for class 'numeric' crps(x, x2, y, ..., permutations = 1) ## S3 method for class 'matrix' loo_crps( x, x2, y, log_lik, ..., permutations = 1, r_eff = 1, cores = getOption("mc.cores", 1) ) ## S3 method for class 'matrix' scrps(x, x2, y, ..., permutations = 1) ## S3 method for class 'numeric' scrps(x, x2, y, ..., permutations = 1) ## S3 method for class 'matrix' loo_scrps( x, x2, y, log_lik, ..., permutations = 1, r_eff = 1, cores = getOption("mc.cores", 1) )
x |
A |
... |
Passed on to |
x2 |
Independent draws from the same distribution as draws in |
y |
A vector of observations or a single value. |
permutations |
An integer, with default value of 1, specifying how many
times the expected value of |X - X'| ( |
log_lik |
A log-likelihood matrix the same size as |
r_eff |
An optional vector of relative effective sample size estimates
containing one element per observation. See |
cores |
The number of cores to use for parallelization of |
To compute (S)CRPS, the user needs to provide two sets of draws, x
and
x2
, from the predictive distribution. This is due to the fact that formulas
used to compute CRPS involve an expectation of the absolute difference of x
and x2
, both having the same distribution. See the permutations
argument,
as well as Gneiting and Raftery (2007) for details.
A list containing two elements: estimates
and pointwise
.
The former reports estimator and standard error and latter the pointwise
values.
Bolin, D., & Wallin, J. (2022). Local scale invariance and robustness of proper scoring rules. arXiv. doi:10.48550/arXiv.1912.05642
Gneiting, T., & Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477), 359–378.
## Not run: # An example using rstanarm library(rstanarm) data("kidiq") fit <- stan_glm(kid_score ~ mom_hs + mom_iq, data = kidiq) ypred1 <- posterior_predict(fit) ypred2 <- posterior_predict(fit) crps(ypred1, ypred2, y = fit$y) loo_crps(ypred1, ypred2, y = fit$y, log_lik = log_lik(fit)) ## End(Not run)
## Not run: # An example using rstanarm library(rstanarm) data("kidiq") fit <- stan_glm(kid_score ~ mom_hs + mom_iq, data = kidiq) ypred1 <- posterior_predict(fit) ypred2 <- posterior_predict(fit) crps(ypred1, ypred2, y = fit$y) loo_crps(ypred1, ypred2, y = fit$y, log_lik = log_lik(fit)) ## End(Not run)
The E_loo()
function computes weighted expectations (means, variances,
quantiles) using the importance weights obtained from the
PSIS smoothing procedure. The expectations estimated by the
E_loo()
function assume that the PSIS approximation is working well.
A small Pareto k estimate is necessary,
but not sufficient, for E_loo()
to give reliable estimates. Additional
diagnostic checks for gauging the reliability of the estimates are in
development and will be added in a future release.
E_loo(x, psis_object, ...) ## Default S3 method: E_loo( x, psis_object, ..., type = c("mean", "variance", "sd", "quantile"), probs = NULL, log_ratios = NULL ) ## S3 method for class 'matrix' E_loo( x, psis_object, ..., type = c("mean", "variance", "sd", "quantile"), probs = NULL, log_ratios = NULL )
E_loo(x, psis_object, ...) ## Default S3 method: E_loo( x, psis_object, ..., type = c("mean", "variance", "sd", "quantile"), probs = NULL, log_ratios = NULL ) ## S3 method for class 'matrix' E_loo( x, psis_object, ..., type = c("mean", "variance", "sd", "quantile"), probs = NULL, log_ratios = NULL )
x |
A numeric vector or matrix. |
psis_object |
An object returned by |
... |
Arguments passed to individual methods. |
type |
The type of expectation to compute. The options are
|
probs |
For computing quantiles, a vector of probabilities. |
log_ratios |
Optionally, a vector or matrix (the same dimensions as |
A named list with the following components:
value
The result of the computation.
For the matrix method, value
is a vector with ncol(x)
elements, with one exception: when type="quantile"
and
multiple values are specified in probs
the value
component of
the returned object is a length(probs)
by ncol(x)
matrix.
For the default/vector method the value
component is scalar, with
one exception: when type="quantile"
and multiple values
are specified in probs
the value
component is a vector with
length(probs)
elements.
pareto_k
Function-specific diagnostic.
For the matrix method it will be a vector of length ncol(x)
containing estimates of the shape parameter of the
generalized Pareto distribution. For the default/vector method,
the estimate is a scalar. If
log_ratios
is not specified when
calling E_loo()
, the smoothed log-weights are used to estimate
Pareto-k's, which may produce optimistic estimates.
For type="mean"
, type="var"
, and type="sd"
, the returned Pareto-k is
usually the maximum of the Pareto-k's for the left and right tail of
and the right tail of
, where
is the importance ratio and
for
type="mean"
and for
type="var"
and type="sd"
.
If is binary, constant, or not finite, or if
type="quantile"
, the
returned Pareto-k is the Pareto-k for the right tail of .
if (requireNamespace("rstanarm", quietly = TRUE)) { # Use rstanarm package to quickly fit a model and get both a log-likelihood # matrix and draws from the posterior predictive distribution library("rstanarm") # data from help("lm") ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) d <- data.frame( weight = c(ctl, trt), group = gl(2, 10, 20, labels = c("Ctl","Trt")) ) fit <- stan_glm(weight ~ group, data = d, refresh = 0) yrep <- posterior_predict(fit) dim(yrep) log_ratios <- -1 * log_lik(fit) dim(log_ratios) r_eff <- relative_eff(exp(-log_ratios), chain_id = rep(1:4, each = 1000)) psis_object <- psis(log_ratios, r_eff = r_eff, cores = 2) E_loo(yrep, psis_object, type = "mean") E_loo(yrep, psis_object, type = "var") E_loo(yrep, psis_object, type = "sd") E_loo(yrep, psis_object, type = "quantile", probs = 0.5) # median E_loo(yrep, psis_object, type = "quantile", probs = c(0.1, 0.9)) # We can get more accurate Pareto k diagnostic if we also provide # the log_ratios argument E_loo(yrep, psis_object, type = "mean", log_ratios = log_ratios) }
if (requireNamespace("rstanarm", quietly = TRUE)) { # Use rstanarm package to quickly fit a model and get both a log-likelihood # matrix and draws from the posterior predictive distribution library("rstanarm") # data from help("lm") ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) d <- data.frame( weight = c(ctl, trt), group = gl(2, 10, 20, labels = c("Ctl","Trt")) ) fit <- stan_glm(weight ~ group, data = d, refresh = 0) yrep <- posterior_predict(fit) dim(yrep) log_ratios <- -1 * log_lik(fit) dim(log_ratios) r_eff <- relative_eff(exp(-log_ratios), chain_id = rep(1:4, each = 1000)) psis_object <- psis(log_ratios, r_eff = r_eff, cores = 2) E_loo(yrep, psis_object, type = "mean") E_loo(yrep, psis_object, type = "var") E_loo(yrep, psis_object, type = "sd") E_loo(yrep, psis_object, type = "quantile", probs = 0.5) # median E_loo(yrep, psis_object, type = "quantile", probs = c(0.1, 0.9)) # We can get more accurate Pareto k diagnostic if we also provide # the log_ratios argument E_loo(yrep, psis_object, type = "mean", log_ratios = log_ratios) }
The elpd()
methods for arrays and matrices can compute the expected log
pointwise predictive density for a new dataset or the log pointwise
predictive density of the observed data (an overestimate of the elpd).
elpd(x, ...) ## S3 method for class 'array' elpd(x, ...) ## S3 method for class 'matrix' elpd(x, ...)
elpd(x, ...) ## S3 method for class 'array' elpd(x, ...) ## S3 method for class 'matrix' elpd(x, ...)
x |
A log-likelihood array or matrix. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs for each method. |
... |
Currently ignored. |
The elpd()
function is an S3 generic and methods are provided for
3-D pointwise log-likelihood arrays and matrices.
elpd(array)
: An by
by
array, where
is the number of MCMC iterations per chain,
is the number of
chains, and
is the number of data points.
elpd(matrix)
: An by
matrix, where
is the size
of the posterior sample (with all chains merged) and
is the number
of data points.
The vignette Holdout validation and K-fold cross-validation of Stan
programs with the loo package for demonstrations of using the elpd()
methods.
# Calculate the lpd of the observed data LLarr <- example_loglik_array() elpd(LLarr)
# Calculate the lpd of the observed data LLarr <- example_loglik_array() elpd(LLarr)
Example pointwise log-likelihood objects to use in demonstrations and tests. See the Value and Examples sections below.
example_loglik_array() example_loglik_matrix()
example_loglik_array() example_loglik_matrix()
example_loglik_array()
returns a 500 (draws) x 2 (chains) x 32
(observations) pointwise log-likelihood array.
example_loglik_matrix()
returns the same pointwise log-likelihood values
as example_loglik_array()
but reshaped into a 1000 (draws*chains) x 32
(observations) matrix.
LLarr <- example_loglik_array() (dim_arr <- dim(LLarr)) LLmat <- example_loglik_matrix() (dim_mat <- dim(LLmat)) all.equal(dim_mat[1], dim_arr[1] * dim_arr[2]) all.equal(dim_mat[2], dim_arr[3]) all.equal(LLarr[, 1, ], LLmat[1:500, ]) all.equal(LLarr[, 2, ], LLmat[501:1000, ])
LLarr <- example_loglik_array() (dim_arr <- dim(LLarr)) LLmat <- example_loglik_matrix() (dim_mat <- dim(LLmat)) all.equal(dim_mat[1], dim_arr[1] * dim_arr[2]) all.equal(dim_mat[2], dim_arr[3]) all.equal(LLarr[, 1, ], LLmat[1:500, ]) all.equal(LLarr[, 2, ], LLmat[501:1000, ])
Convenience function for extracting the pointwise log-likelihood
matrix or array from a stanfit
object from the rstan package.
Note: recent versions of rstan now include a loo()
method for
stanfit
objects that handles this internally.
extract_log_lik(stanfit, parameter_name = "log_lik", merge_chains = TRUE)
extract_log_lik(stanfit, parameter_name = "log_lik", merge_chains = TRUE)
stanfit |
A |
parameter_name |
A character string naming the parameter (or generated quantity) in the Stan model corresponding to the log-likelihood. |
merge_chains |
If |
Stan does not automatically compute and store the log-likelihood. It is up to the user to incorporate it into the Stan program if it is to be extracted after fitting the model. In a Stan model, the pointwise log likelihood can be coded as a vector in the transformed parameters block (and then summed up in the model block) or it can be coded entirely in the generated quantities block. We recommend using the generated quantities block so that the computations are carried out only once per iteration rather than once per HMC leapfrog step.
For example, the following is the generated quantities
block for
computing and saving the log-likelihood for a linear regression model with
N
data points, outcome y
, predictor matrix X
,
coefficients beta
, and standard deviation sigma
:
vector[N] log_lik;
for (n in 1:N) log_lik[n] = normal_lpdf(y[n] | X[n, ] * beta, sigma);
If merge_chains=TRUE
, an by
matrix of
(post-warmup) extracted draws, where
is the size of the posterior
sample and
is the number of data points. If
merge_chains=FALSE
, an by
by
array, where
.
Stan Development Team (2017). The Stan C++ Library, Version 2.16.0. https://mc-stan.org/
Stan Development Team (2017). RStan: the R interface to Stan, Version 2.16.1. https://mc-stan.org/
Given a sample , Estimate the parameters
and
of
the generalized Pareto distribution (GPD), assuming the location parameter is
0. By default the fit uses a prior for
, which will stabilize
estimates for very small sample sizes (and low effective sample sizes in the
case of MCMC samples). The weakly informative prior is a Gaussian prior
centered at 0.5.
gpdfit(x, wip = TRUE, min_grid_pts = 30, sort_x = TRUE)
gpdfit(x, wip = TRUE, min_grid_pts = 30, sort_x = TRUE)
x |
A numeric vector. The sample from which to estimate the parameters. |
wip |
Logical indicating whether to adjust |
min_grid_pts |
The minimum number of grid points used in the fitting
algorithm. The actual number used is |
sort_x |
If |
Here the parameter is the negative of
in Zhang &
Stephens (2009).
A named list with components k
and sigma
.
Zhang, J., and Stephens, M. A. (2009). A new and efficient estimation method for the generalized Pareto distribution. Technometrics 51, 316-325.
A parent class for different importance sampling methods.
importance_sampling(log_ratios, method, ...) ## S3 method for class 'array' importance_sampling( log_ratios, method, ..., r_eff = 1, cores = getOption("mc.cores", 1) ) ## S3 method for class 'matrix' importance_sampling( log_ratios, method, ..., r_eff = 1, cores = getOption("mc.cores", 1) ) ## Default S3 method: importance_sampling(log_ratios, method, ..., r_eff = 1)
importance_sampling(log_ratios, method, ...) ## S3 method for class 'array' importance_sampling( log_ratios, method, ..., r_eff = 1, cores = getOption("mc.cores", 1) ) ## S3 method for class 'matrix' importance_sampling( log_ratios, method, ..., r_eff = 1, cores = getOption("mc.cores", 1) ) ## Default S3 method: importance_sampling(log_ratios, method, ..., r_eff = 1)
log_ratios |
An array, matrix, or vector of importance ratios on the log scale (for PSIS-LOO these are negative log-likelihood values). See the Methods (by class) section below for a detailed description of how to specify the inputs for each method. |
method |
The importance sampling method to use. The following methods are implemented: |
... |
Arguments passed on to the various methods. |
r_eff |
Vector of relative effective sample size estimates containing
one element per observation. The values provided should be the relative
effective sample sizes of |
cores |
The number of cores to use for parallelization. This defaults to
the option
|
For developers of Bayesian modeling packages, loo includes
a generic function kfold()
so that methods may be defined for K-fold
CV without name conflicts between packages. See, for example, the
kfold()
methods in the rstanarm and brms packages.
The Value section below describes the objects that kfold()
methods should return in order to be compatible with
loo_compare()
and the loo package print methods.
kfold(x, ...) is.kfold(x)
kfold(x, ...) is.kfold(x)
x |
A fitted model object. |
... |
Arguments to pass to specific methods. |
For developers defining a kfold()
method for a class
"foo"
, the kfold.foo()
function should return a list with class
c("kfold", "loo")
with at least the following named elements:
"estimates"
: A 1x2
matrix containing the ELPD estimate and its
standard error. The matrix must have row name "elpd_kfold
" and column
names "Estimate"
and "SE"
.
"pointwise"
: A Nx1
matrix with column name "elpd_kfold"
containing
the pointwise contributions for each data point.
It is important for the object to have at least these classes and
components so that it is compatible with other functions like
loo_compare()
and print()
methods.
These functions can be used to generate indexes for use with K-fold cross-validation. See the Details section for explanations.
kfold_split_random(K = 10, N = NULL) kfold_split_stratified(K = 10, x = NULL) kfold_split_grouped(K = 10, x = NULL)
kfold_split_random(K = 10, N = NULL) kfold_split_stratified(K = 10, x = NULL) kfold_split_grouped(K = 10, x = NULL)
K |
The number of folds to use. |
N |
The number of observations in the data. |
x |
A discrete variable of length |
kfold_split_random()
splits the data into K
groups
of equal size (or roughly equal size).
For a categorical variable x
kfold_split_stratified()
splits the observations into K
groups ensuring that relative
category frequencies are approximately preserved.
For a grouping variable x
, kfold_split_grouped()
places
all observations in x
from the same group/level together in
the same fold. The selection of which groups/levels go into which
fold (relevant when when there are more groups than folds) is
randomized.
An integer vector of length N
where each element is an index in 1:K
.
ids <- kfold_split_random(K = 5, N = 20) print(ids) table(ids) x <- sample(c(0, 1), size = 200, replace = TRUE, prob = c(0.05, 0.95)) table(x) ids <- kfold_split_stratified(K = 5, x = x) print(ids) table(ids, x) grp <- gl(n = 50, k = 15, labels = state.name) length(grp) head(table(grp)) ids_10 <- kfold_split_grouped(K = 10, x = grp) (tab_10 <- table(grp, ids_10)) colSums(tab_10) ids_9 <- kfold_split_grouped(K = 9, x = grp) (tab_9 <- table(grp, ids_9)) colSums(tab_9)
ids <- kfold_split_random(K = 5, N = 20) print(ids) table(ids) x <- sample(c(0, 1), size = 200, replace = TRUE, prob = c(0.05, 0.95)) table(x) ids <- kfold_split_stratified(K = 5, x = x) print(ids) table(ids, x) grp <- gl(n = 50, k = 15, labels = state.name) length(grp) head(table(grp)) ids_10 <- kfold_split_grouped(K = 10, x = grp) (tab_10 <- table(grp, ids_10)) colSums(tab_10) ids_9 <- kfold_split_grouped(K = 9, x = grp) (tab_9 <- table(grp, ids_9)) colSums(tab_9)
The loo()
methods for arrays, matrices, and functions compute PSIS-LOO
CV, efficient approximate leave-one-out (LOO) cross-validation for Bayesian
models using Pareto smoothed importance sampling (PSIS). This is
an implementation of the methods described in Vehtari, Gelman, and Gabry
(2017) and Vehtari, Simpson, Gelman, Yao, and Gabry (2024).
The loo_i()
function enables testing log-likelihood
functions for use with the loo.function()
method.
loo(x, ...) ## S3 method for class 'array' loo( x, ..., r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis") ) ## S3 method for class 'matrix' loo( x, ..., r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis") ) ## S3 method for class ''function'' loo( x, ..., data = NULL, draws = NULL, r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis") ) loo_i(i, llfun, ..., data = NULL, draws = NULL, r_eff = 1, is_method = "psis") is.loo(x) is.psis_loo(x)
loo(x, ...) ## S3 method for class 'array' loo( x, ..., r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis") ) ## S3 method for class 'matrix' loo( x, ..., r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis") ) ## S3 method for class ''function'' loo( x, ..., data = NULL, draws = NULL, r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis") ) loo_i(i, llfun, ..., data = NULL, draws = NULL, r_eff = 1, is_method = "psis") is.loo(x) is.psis_loo(x)
x |
A log-likelihood array, matrix, or function. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs for each method. |
r_eff |
Vector of relative effective sample size estimates for the
likelihood ( |
save_psis |
Should the |
cores |
The number of cores to use for parallelization. This defaults to
the option
|
is_method |
The importance sampling method to use. The following methods are implemented: |
data , draws , ...
|
For the |
i |
For |
llfun |
For |
The loo()
function is an S3 generic and methods are provided for
3-D pointwise log-likelihood arrays, pointwise log-likelihood matrices, and
log-likelihood functions. The array and matrix methods are the most
convenient, but for models fit to very large datasets the loo.function()
method is more memory efficient and may be preferable.
The loo()
methods return a named list with class
c("psis_loo", "loo")
and components:
estimates
A matrix with two columns (Estimate
, SE
) and three rows (elpd_loo
,
p_loo
, looic
). This contains point estimates and standard errors of the
expected log pointwise predictive density (elpd_loo
), the
effective number of parameters (p_loo
) and the LOO
information criterion looic
(which is just -2 * elpd_loo
, i.e.,
converted to deviance scale).
pointwise
A matrix with five columns (and number of rows equal to the number of
observations) containing the pointwise contributions of the measures
(elpd_loo
, mcse_elpd_loo
, p_loo
, looic
, influence_pareto_k
).
in addition to the three measures in estimates
, we also report
pointwise values of the Monte Carlo standard error of elpd_loo
(mcse_elpd_loo
), and statistics describing the influence of
each observation on the posterior distribution (influence_pareto_k
).
These are the estimates of the shape parameter of the
generalized Pareto fit to the importance ratios for each leave-one-out
distribution (see the pareto-k-diagnostic page for details).
diagnostics
A named list containing two vectors:
pareto_k
: Importance sampling reliability diagnostics. By default,
these are equal to the influence_pareto_k
in pointwise
.
Some algorithms can improve importance sampling reliability and
modify these diagnostics. See the pareto-k-diagnostic page for details.
n_eff
: PSIS effective sample size estimates.
psis_object
This component will be NULL
unless the save_psis
argument is set to
TRUE
when calling loo()
. In that case psis_object
will be the object
of class "psis"
that is created when the loo()
function calls psis()
internally to do the PSIS procedure.
The loo_i()
function returns a named list with components
pointwise
and diagnostics
. These components have the same
structure as the pointwise
and diagnostics
components of the
object returned by loo()
except they contain results for only a single
observation.
loo(array)
: An by
by
array, where
is the number of MCMC iterations per chain,
is the number of
chains, and
is the number of data points.
loo(matrix)
: An by
matrix, where
is the size
of the posterior sample (with all chains merged) and
is the number
of data points.
loo(`function`)
: A function f()
that takes arguments data_i
and draws
and returns a
vector containing the log-likelihood for a single observation i
evaluated
at each posterior draw. The function should be written such that, for each
observation i
in 1:N
, evaluating
f(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length S
(size of posterior sample). The
log-likelihood function can also have additional arguments but data_i
and
draws
are required.
If using the function method then the arguments data
and draws
must also
be specified in the call to loo()
:
data
: A data frame or matrix containing the data (e.g.
observed outcome and predictors) needed to compute the pointwise
log-likelihood. For each observation i
, the i
th row of
data
will be passed to the data_i
argument of the
log-likelihood function.
draws
: An object containing the posterior draws for any
parameters needed to compute the pointwise log-likelihood. Unlike
data
, which is indexed by observation, for each observation the
entire object draws
will be passed to the draws
argument of
the log-likelihood function.
The ...
can be used if your log-likelihood function takes additional
arguments. These arguments are used like the draws
argument in that they
are recycled for each observation.
loo()
methods in a packagePackage developers can define
loo()
methods for fitted models objects. See the example loo.stanfit()
method in the Examples section below for an example of defining a
method that calls loo.array()
. The loo.stanreg()
method in the
rstanarm package is an example of defining a method that calls
loo.function()
.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
The loo package vignettes for demonstrations.
The FAQ page on the loo website for answers to frequently asked questions.
psis()
for the underlying Pareto Smoothed Importance Sampling (PSIS)
procedure used in the LOO-CV approximation.
pareto-k-diagnostic for convenience functions for looking at diagnostics.
loo_compare()
for model comparison.
### Array and matrix methods (using example objects included with loo package) # Array method LLarr <- example_loglik_array() rel_n_eff <- relative_eff(exp(LLarr)) loo(LLarr, r_eff = rel_n_eff, cores = 2) # Matrix method LLmat <- example_loglik_matrix() rel_n_eff <- relative_eff(exp(LLmat), chain_id = rep(1:2, each = 500)) loo(LLmat, r_eff = rel_n_eff, cores = 2) ### Using log-likelihood function instead of array or matrix set.seed(124) # Simulate data and draw from posterior N <- 50; K <- 10; S <- 100; a0 <- 3; b0 <- 2 p <- rbeta(1, a0, b0) y <- rbinom(N, size = K, prob = p) a <- a0 + sum(y); b <- b0 + N * K - sum(y) fake_posterior <- as.matrix(rbeta(S, a, b)) dim(fake_posterior) # S x 1 fake_data <- data.frame(y,K) dim(fake_data) # N x 2 llfun <- function(data_i, draws) { # each time called internally within loo the arguments will be equal to: # data_i: ith row of fake_data (fake_data[i,, drop=FALSE]) # draws: entire fake_posterior matrix dbinom(data_i$y, size = data_i$K, prob = draws, log = TRUE) } # Use the loo_i function to check that llfun works on a single observation # before running on all obs. For example, using the 3rd obs in the data: loo_3 <- loo_i(i = 3, llfun = llfun, data = fake_data, draws = fake_posterior) print(loo_3$pointwise[, "elpd_loo"]) # Use loo.function method (default r_eff=1 is used as this posterior not obtained via MCMC) loo_with_fn <- loo(llfun, draws = fake_posterior, data = fake_data) # If we look at the elpd_loo contribution from the 3rd obs it should be the # same as what we got above with the loo_i function and i=3: print(loo_with_fn$pointwise[3, "elpd_loo"]) print(loo_3$pointwise[, "elpd_loo"]) # Check that the loo.matrix method gives same answer as loo.function method log_lik_matrix <- sapply(1:N, function(i) { llfun(data_i = fake_data[i,, drop=FALSE], draws = fake_posterior) }) loo_with_mat <- loo(log_lik_matrix) all.equal(loo_with_mat$estimates, loo_with_fn$estimates) # should be TRUE! ## Not run: ### For package developers: defining loo methods # An example of a possible loo method for 'stanfit' objects (rstan package). # A similar method is included in the rstan package. # In order for users to be able to call loo(stanfit) instead of # loo.stanfit(stanfit) the NAMESPACE needs to be handled appropriately # (roxygen2 and devtools packages are good for that). # loo.stanfit <- function(x, pars = "log_lik", ..., save_psis = FALSE, cores = getOption("mc.cores", 1)) { stopifnot(length(pars) == 1L) LLarray <- loo::extract_log_lik(stanfit = x, parameter_name = pars, merge_chains = FALSE) r_eff <- loo::relative_eff(x = exp(LLarray), cores = cores) loo::loo.array(LLarray, r_eff = r_eff, cores = cores, save_psis = save_psis) } ## End(Not run)
### Array and matrix methods (using example objects included with loo package) # Array method LLarr <- example_loglik_array() rel_n_eff <- relative_eff(exp(LLarr)) loo(LLarr, r_eff = rel_n_eff, cores = 2) # Matrix method LLmat <- example_loglik_matrix() rel_n_eff <- relative_eff(exp(LLmat), chain_id = rep(1:2, each = 500)) loo(LLmat, r_eff = rel_n_eff, cores = 2) ### Using log-likelihood function instead of array or matrix set.seed(124) # Simulate data and draw from posterior N <- 50; K <- 10; S <- 100; a0 <- 3; b0 <- 2 p <- rbeta(1, a0, b0) y <- rbinom(N, size = K, prob = p) a <- a0 + sum(y); b <- b0 + N * K - sum(y) fake_posterior <- as.matrix(rbeta(S, a, b)) dim(fake_posterior) # S x 1 fake_data <- data.frame(y,K) dim(fake_data) # N x 2 llfun <- function(data_i, draws) { # each time called internally within loo the arguments will be equal to: # data_i: ith row of fake_data (fake_data[i,, drop=FALSE]) # draws: entire fake_posterior matrix dbinom(data_i$y, size = data_i$K, prob = draws, log = TRUE) } # Use the loo_i function to check that llfun works on a single observation # before running on all obs. For example, using the 3rd obs in the data: loo_3 <- loo_i(i = 3, llfun = llfun, data = fake_data, draws = fake_posterior) print(loo_3$pointwise[, "elpd_loo"]) # Use loo.function method (default r_eff=1 is used as this posterior not obtained via MCMC) loo_with_fn <- loo(llfun, draws = fake_posterior, data = fake_data) # If we look at the elpd_loo contribution from the 3rd obs it should be the # same as what we got above with the loo_i function and i=3: print(loo_with_fn$pointwise[3, "elpd_loo"]) print(loo_3$pointwise[, "elpd_loo"]) # Check that the loo.matrix method gives same answer as loo.function method log_lik_matrix <- sapply(1:N, function(i) { llfun(data_i = fake_data[i,, drop=FALSE], draws = fake_posterior) }) loo_with_mat <- loo(log_lik_matrix) all.equal(loo_with_mat$estimates, loo_with_fn$estimates) # should be TRUE! ## Not run: ### For package developers: defining loo methods # An example of a possible loo method for 'stanfit' objects (rstan package). # A similar method is included in the rstan package. # In order for users to be able to call loo(stanfit) instead of # loo.stanfit(stanfit) the NAMESPACE needs to be handled appropriately # (roxygen2 and devtools packages are good for that). # loo.stanfit <- function(x, pars = "log_lik", ..., save_psis = FALSE, cores = getOption("mc.cores", 1)) { stopifnot(length(pars) == 1L) LLarray <- loo::extract_log_lik(stanfit = x, parameter_name = pars, merge_chains = FALSE) r_eff <- loo::relative_eff(x = exp(LLarray), cores = cores) loo::loo.array(LLarray, r_eff = r_eff, cores = cores, save_psis = save_psis) } ## End(Not run)
Efficient approximate leave-one-out cross-validation (LOO) for posterior approximations
loo_approximate_posterior(x, log_p, log_g, ...) ## S3 method for class 'array' loo_approximate_posterior( x, log_p, log_g, ..., save_psis = FALSE, cores = getOption("mc.cores", 1) ) ## S3 method for class 'matrix' loo_approximate_posterior( x, log_p, log_g, ..., save_psis = FALSE, cores = getOption("mc.cores", 1) ) ## S3 method for class ''function'' loo_approximate_posterior( x, ..., data = NULL, draws = NULL, log_p = NULL, log_g = NULL, save_psis = FALSE, cores = getOption("mc.cores", 1) )
loo_approximate_posterior(x, log_p, log_g, ...) ## S3 method for class 'array' loo_approximate_posterior( x, log_p, log_g, ..., save_psis = FALSE, cores = getOption("mc.cores", 1) ) ## S3 method for class 'matrix' loo_approximate_posterior( x, log_p, log_g, ..., save_psis = FALSE, cores = getOption("mc.cores", 1) ) ## S3 method for class ''function'' loo_approximate_posterior( x, ..., data = NULL, draws = NULL, log_p = NULL, log_g = NULL, save_psis = FALSE, cores = getOption("mc.cores", 1) )
x |
A log-likelihood array, matrix, or function. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs for each method. |
log_p |
The log-posterior (target) evaluated at S samples from the proposal distribution (g). A vector of length S. |
log_g |
The log-density (proposal) evaluated at S samples from the proposal distribution (g). A vector of length S. |
save_psis |
Should the |
cores |
The number of cores to use for parallelization. This defaults to
the option
|
data , draws , ...
|
For the |
The loo_approximate_posterior()
function is an S3 generic and
methods are provided for 3-D pointwise log-likelihood arrays, pointwise
log-likelihood matrices, and log-likelihood functions. The implementation
works for posterior approximations where it is possible to compute the log
density for the posterior approximation.
The loo_approximate_posterior()
methods return a named list with
class c("psis_loo_ap", "psis_loo", "loo")
. It has the same structure
as the objects returned by loo()
but with the additional slot:
posterior_approximation
A list with two vectors, log_p
and log_g
of the same length
containing the posterior density and the approximation density
for the individual draws.
loo_approximate_posterior(array)
: An by
by
array, where
is the number of MCMC iterations per chain,
is the number of
chains, and
is the number of data points.
loo_approximate_posterior(matrix)
: An by
matrix, where
is the size
of the posterior sample (with all chains merged) and
is the number
of data points.
loo_approximate_posterior(`function`)
: A function f()
that takes arguments data_i
and draws
and returns a
vector containing the log-likelihood for a single observation i
evaluated
at each posterior draw. The function should be written such that, for each
observation i
in 1:N
, evaluating
f(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length S
(size of posterior sample). The
log-likelihood function can also have additional arguments but data_i
and
draws
are required.
If using the function method then the arguments data
and draws
must also
be specified in the call to loo()
:
data
: A data frame or matrix containing the data (e.g.
observed outcome and predictors) needed to compute the pointwise
log-likelihood. For each observation i
, the i
th row of
data
will be passed to the data_i
argument of the
log-likelihood function.
draws
: An object containing the posterior draws for any
parameters needed to compute the pointwise log-likelihood. Unlike
data
, which is indexed by observation, for each observation the
entire object draws
will be passed to the draws
argument of
the log-likelihood function.
The ...
can be used if your log-likelihood function takes additional
arguments. These arguments are used like the draws
argument in that they
are recycled for each observation.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2019). Leave-One-Out Cross-Validation for Large Data. In Thirty-sixth International Conference on Machine Learning, PMLR 97:4244-4253.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2020). Leave-One-Out Cross-Validation for Model Comparison in Large Data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 108:341-351.
Compare fitted models based on ELPD.
By default the print method shows only the most important information. Use
print(..., simplify=FALSE)
to print a more detailed summary.
loo_compare(x, ...) ## Default S3 method: loo_compare(x, ...) ## S3 method for class 'compare.loo' print(x, ..., digits = 1, simplify = TRUE) ## S3 method for class 'compare.loo_ss' print(x, ..., digits = 1, simplify = TRUE)
loo_compare(x, ...) ## Default S3 method: loo_compare(x, ...) ## S3 method for class 'compare.loo' print(x, ..., digits = 1, simplify = TRUE) ## S3 method for class 'compare.loo_ss' print(x, ..., digits = 1, simplify = TRUE)
x |
An object of class |
... |
Additional objects of class |
digits |
For the print method only, the number of digits to use when printing. |
simplify |
For the print method only, should only the essential columns of the summary matrix be printed? The entire matrix is always returned, but by default only the most important columns are printed. |
When comparing two fitted models, we can estimate the difference in their
expected predictive accuracy by the difference in
elpd_loo
or elpd_waic
(or multiplied by , if
desired, to be on the deviance scale).
When using loo_compare()
, the returned matrix will have one row per model
and several columns of estimates. The values in the
elpd_diff
and se_diff
columns of the
returned matrix are computed by making pairwise comparisons between each
model and the model with the largest ELPD (the model in the first row). For
this reason the elpd_diff
column will always have the value 0
in the
first row (i.e., the difference between the preferred model and itself) and
negative values in subsequent rows for the remaining models.
To compute the standard error of the difference in ELPD —
which should not be expected to equal the difference of the standard errors
— we use a paired estimate to take advantage of the fact that the same
set of data points was used to fit both models. These calculations
should be most useful when
is large, because then non-normality of
the distribution is not such an issue when estimating the uncertainty in
these sums. These standard errors, for all their flaws, should give a
better sense of uncertainty than what is obtained using the current
standard approach of comparing differences of deviances to a Chi-squared
distribution, a practice derived for Gaussian linear models or
asymptotically, and which only applies to nested models in any case.
Sivula et al. (2022) discuss the conditions when the normal
approximation used for SE and
se_diff
is good.
If more than models are compared, we internally recompute the model
differences using the median model by ELPD as the baseline model. We then
estimate whether the differences in predictive performance are potentially
due to chance as described by McLatchie and Vehtari (2023). This will flag
a warning if it is deemed that there is a risk of over-fitting due to the
selection process. In that case users are recommended to avoid model
selection based on LOO-CV, and instead to favor model averaging/stacking or
projection predictive inference.
A matrix with class "compare.loo"
that has its own
print method. See the Details section.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
Sivula, T, Magnusson, M., Matamoros A. A., and Vehtari, A. (2022). Uncertainty in Bayesian leave-one-out cross-validation based model comparison. preprint arXiv:2008.10296v3..
McLatchie, Y., and Vehtari, A. (2023). Efficient estimation and correction of selection-induced bias with order statistics. preprint arXiv:2309.03742
The FAQ page on the loo website for answers to frequently asked questions.
# very artificial example, just for demonstration! LL <- example_loglik_array() loo1 <- loo(LL) # should be worst model when compared loo2 <- loo(LL + 1) # should be second best model when compared loo3 <- loo(LL + 2) # should be best model when compared comp <- loo_compare(loo1, loo2, loo3) print(comp, digits = 2) # show more details with simplify=FALSE # (will be the same for all models in this artificial example) print(comp, simplify = FALSE, digits = 3) # can use a list of objects with custom names # will use apple, banana, and cherry, as the names in the output loo_compare(list("apple" = loo1, "banana" = loo2, "cherry" = loo3)) ## Not run: # works for waic (and kfold) too loo_compare(waic(LL), waic(LL - 10)) ## End(Not run)
# very artificial example, just for demonstration! LL <- example_loglik_array() loo1 <- loo(LL) # should be worst model when compared loo2 <- loo(LL + 1) # should be second best model when compared loo3 <- loo(LL + 2) # should be best model when compared comp <- loo_compare(loo1, loo2, loo3) print(comp, digits = 2) # show more details with simplify=FALSE # (will be the same for all models in this artificial example) print(comp, simplify = FALSE, digits = 3) # can use a list of objects with custom names # will use apple, banana, and cherry, as the names in the output loo_compare(list("apple" = loo1, "banana" = loo2, "cherry" = loo3)) ## Not run: # works for waic (and kfold) too loo_compare(waic(LL), waic(LL - 10)) ## End(Not run)
Model averaging via stacking of predictive distributions, pseudo-BMA weighting or pseudo-BMA+ weighting with the Bayesian bootstrap. See Yao et al. (2018), Vehtari, Gelman, and Gabry (2017), and Vehtari, Simpson, Gelman, Yao, and Gabry (2024) for background.
loo_model_weights(x, ...) ## Default S3 method: loo_model_weights( x, ..., method = c("stacking", "pseudobma"), optim_method = "BFGS", optim_control = list(), BB = TRUE, BB_n = 1000, alpha = 1, r_eff_list = NULL, cores = getOption("mc.cores", 1) ) stacking_weights(lpd_point, optim_method = "BFGS", optim_control = list()) pseudobma_weights(lpd_point, BB = TRUE, BB_n = 1000, alpha = 1)
loo_model_weights(x, ...) ## Default S3 method: loo_model_weights( x, ..., method = c("stacking", "pseudobma"), optim_method = "BFGS", optim_control = list(), BB = TRUE, BB_n = 1000, alpha = 1, r_eff_list = NULL, cores = getOption("mc.cores", 1) ) stacking_weights(lpd_point, optim_method = "BFGS", optim_control = list()) pseudobma_weights(lpd_point, BB = TRUE, BB_n = 1000, alpha = 1)
x |
A list of |
... |
Unused, except for the generic to pass arguments to individual methods. |
method |
Either |
optim_method |
If |
optim_control |
If |
BB |
Logical used when |
BB_n |
For pseudo-BMA+ weighting only, the number of samples to use for
the Bayesian bootstrap. The default is |
alpha |
Positive scalar shape parameter in the Dirichlet distribution
used for the Bayesian bootstrap. The default is |
r_eff_list |
Optionally, a list of relative effective sample size
estimates for the likelihood |
cores |
The number of cores to use for parallelization. This defaults to
the option
|
lpd_point |
If calling |
loo_model_weights()
is a wrapper around the stacking_weights()
and
pseudobma_weights()
functions that implements stacking, pseudo-BMA, and
pseudo-BMA+ weighting for combining multiple predictive distributions. We can
use approximate or exact leave-one-out cross-validation (LOO-CV) or K-fold CV
to estimate the expected log predictive density (ELPD).
The stacking method (method="stacking"
), which is the default for
loo_model_weights()
, combines all models by maximizing the leave-one-out
predictive density of the combination distribution. That is, it finds the
optimal linear combining weights for maximizing the leave-one-out log score.
The pseudo-BMA method (method="pseudobma"
) finds the relative weights
proportional to the ELPD of each model. However, when
method="pseudobma"
, the default is to also use the Bayesian bootstrap
(BB=TRUE
), which corresponds to the pseudo-BMA+ method. The Bayesian
bootstrap takes into account the uncertainty of finite data points and
regularizes the weights away from the extremes of 0 and 1.
In general, we recommend stacking for averaging predictive distributions, while pseudo-BMA+ can serve as a computationally easier alternative.
A numeric vector containing one weight for each model.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018) Using stacking to average Bayesian predictive distributions. Bayesian Analysis, advance publication, doi:10.1214/17-BA1091. (online).
The loo package vignettes, particularly Bayesian Stacking and Pseudo-BMA weights using the loo package.
loo()
for details on leave-one-out ELPD estimation.
constrOptim()
for the choice of optimization methods and control-parameters.
relative_eff()
for computing r_eff
.
## Not run: ### Demonstrating usage after fitting models with RStan library(rstan) # generate fake data from N(0,1). N <- 100 y <- rnorm(N, 0, 1) # Suppose we have three models: N(-1, sigma), N(0.5, sigma) and N(0.6,sigma). stan_code <- " data { int N; vector[N] y; real mu_fixed; } parameters { real<lower=0> sigma; } model { sigma ~ exponential(1); y ~ normal(mu_fixed, sigma); } generated quantities { vector[N] log_lik; for (n in 1:N) log_lik[n] = normal_lpdf(y[n]| mu_fixed, sigma); }" mod <- stan_model(model_code = stan_code) fit1 <- sampling(mod, data=list(N=N, y=y, mu_fixed=-1)) fit2 <- sampling(mod, data=list(N=N, y=y, mu_fixed=0.5)) fit3 <- sampling(mod, data=list(N=N, y=y, mu_fixed=0.6)) model_list <- list(fit1, fit2, fit3) log_lik_list <- lapply(model_list, extract_log_lik) # optional but recommended r_eff_list <- lapply(model_list, function(x) { ll_array <- extract_log_lik(x, merge_chains = FALSE) relative_eff(exp(ll_array)) }) # stacking method: wts1 <- loo_model_weights( log_lik_list, method = "stacking", r_eff_list = r_eff_list, optim_control = list(reltol=1e-10) ) print(wts1) # can also pass a list of psis_loo objects to avoid recomputing loo loo_list <- lapply(1:length(log_lik_list), function(j) { loo(log_lik_list[[j]], r_eff = r_eff_list[[j]]) }) wts2 <- loo_model_weights( loo_list, method = "stacking", optim_control = list(reltol=1e-10) ) all.equal(wts1, wts2) # can provide names to be used in the results loo_model_weights(setNames(loo_list, c("A", "B", "C"))) # pseudo-BMA+ method: set.seed(1414) loo_model_weights(loo_list, method = "pseudobma") # pseudo-BMA method (set BB = FALSE): loo_model_weights(loo_list, method = "pseudobma", BB = FALSE) # calling stacking_weights or pseudobma_weights directly lpd1 <- loo(log_lik_list[[1]], r_eff = r_eff_list[[1]])$pointwise[,1] lpd2 <- loo(log_lik_list[[2]], r_eff = r_eff_list[[2]])$pointwise[,1] lpd3 <- loo(log_lik_list[[3]], r_eff = r_eff_list[[3]])$pointwise[,1] stacking_weights(cbind(lpd1, lpd2, lpd3)) pseudobma_weights(cbind(lpd1, lpd2, lpd3)) pseudobma_weights(cbind(lpd1, lpd2, lpd3), BB = FALSE) ## End(Not run)
## Not run: ### Demonstrating usage after fitting models with RStan library(rstan) # generate fake data from N(0,1). N <- 100 y <- rnorm(N, 0, 1) # Suppose we have three models: N(-1, sigma), N(0.5, sigma) and N(0.6,sigma). stan_code <- " data { int N; vector[N] y; real mu_fixed; } parameters { real<lower=0> sigma; } model { sigma ~ exponential(1); y ~ normal(mu_fixed, sigma); } generated quantities { vector[N] log_lik; for (n in 1:N) log_lik[n] = normal_lpdf(y[n]| mu_fixed, sigma); }" mod <- stan_model(model_code = stan_code) fit1 <- sampling(mod, data=list(N=N, y=y, mu_fixed=-1)) fit2 <- sampling(mod, data=list(N=N, y=y, mu_fixed=0.5)) fit3 <- sampling(mod, data=list(N=N, y=y, mu_fixed=0.6)) model_list <- list(fit1, fit2, fit3) log_lik_list <- lapply(model_list, extract_log_lik) # optional but recommended r_eff_list <- lapply(model_list, function(x) { ll_array <- extract_log_lik(x, merge_chains = FALSE) relative_eff(exp(ll_array)) }) # stacking method: wts1 <- loo_model_weights( log_lik_list, method = "stacking", r_eff_list = r_eff_list, optim_control = list(reltol=1e-10) ) print(wts1) # can also pass a list of psis_loo objects to avoid recomputing loo loo_list <- lapply(1:length(log_lik_list), function(j) { loo(log_lik_list[[j]], r_eff = r_eff_list[[j]]) }) wts2 <- loo_model_weights( loo_list, method = "stacking", optim_control = list(reltol=1e-10) ) all.equal(wts1, wts2) # can provide names to be used in the results loo_model_weights(setNames(loo_list, c("A", "B", "C"))) # pseudo-BMA+ method: set.seed(1414) loo_model_weights(loo_list, method = "pseudobma") # pseudo-BMA method (set BB = FALSE): loo_model_weights(loo_list, method = "pseudobma", BB = FALSE) # calling stacking_weights or pseudobma_weights directly lpd1 <- loo(log_lik_list[[1]], r_eff = r_eff_list[[1]])$pointwise[,1] lpd2 <- loo(log_lik_list[[2]], r_eff = r_eff_list[[2]])$pointwise[,1] lpd3 <- loo(log_lik_list[[3]], r_eff = r_eff_list[[3]])$pointwise[,1] stacking_weights(cbind(lpd1, lpd2, lpd3)) pseudobma_weights(cbind(lpd1, lpd2, lpd3)) pseudobma_weights(cbind(lpd1, lpd2, lpd3), BB = FALSE) ## End(Not run)
Moment matching algorithm for updating a loo object when Pareto k estimates are large.
loo_moment_match(x, ...) ## Default S3 method: loo_moment_match( x, loo, post_draws, log_lik_i, unconstrain_pars, log_prob_upars, log_lik_i_upars, max_iters = 30L, k_threshold = NULL, split = TRUE, cov = TRUE, cores = getOption("mc.cores", 1), ... )
loo_moment_match(x, ...) ## Default S3 method: loo_moment_match( x, loo, post_draws, log_lik_i, unconstrain_pars, log_prob_upars, log_lik_i_upars, max_iters = 30L, k_threshold = NULL, split = TRUE, cov = TRUE, cores = getOption("mc.cores", 1), ... )
x |
A fitted model object. |
... |
Further arguments passed to the custom functions documented above. |
loo |
A loo object to be modified. |
post_draws |
A function the takes |
log_lik_i |
A function that takes |
unconstrain_pars |
A function that takes arguments |
log_prob_upars |
A function that takes arguments |
log_lik_i_upars |
A function that takes arguments |
max_iters |
Maximum number of moment matching iterations. Usually this
does not need to be modified. If the maximum number of iterations is
reached, there will be a warning, and increasing |
k_threshold |
Threshold value for Pareto k values above which the moment
matching algorithm is used. The default value is |
split |
Logical; Indicate whether to do the split transformation or not at the end of moment matching for each LOO fold. |
cov |
Logical; Indicate whether to match the covariance matrix of the
samples or not. If |
cores |
The number of cores to use for parallelization. This defaults to
the option
|
The loo_moment_match()
function is an S3 generic and we provide a
default method that takes as arguments user-specified functions
post_draws
, log_lik_i
, unconstrain_pars
, log_prob_upars
, and
log_lik_i_upars
. All of these functions should take ...
. as an argument
in addition to those specified for each function.
The loo_moment_match()
methods return an updated loo
object. The
structure of the updated loo
object is similar, but the method also
stores the original Pareto k diagnostic values in the diagnostics field.
loo_moment_match(default)
: A default method that takes as arguments a
user-specified model object x
, a loo
object and user-specified
functions post_draws
, log_lik_i
, unconstrain_pars
, log_prob_upars
,
and log_lik_i_upars
.
Paananen, T., Piironen, J., Buerkner, P.-C., Vehtari, A. (2021). Implicitly adaptive importance sampling. Statistics and Computing, 31, 16. doi:10.1007/s11222-020-09982-2. arXiv preprint arXiv:1906.08850.
loo()
, loo_moment_match_split()
# See the vignette for loo_moment_match()
# See the vignette for loo_moment_match()
A function that computes the split moment matching importance sampling loo. Takes in the moment matching total transformation, transforms only half of the draws, and computes a single elpd using multiple importance sampling.
loo_moment_match_split( x, upars, cov, total_shift, total_scaling, total_mapping, i, log_prob_upars, log_lik_i_upars, r_eff_i, cores, is_method, ... )
loo_moment_match_split( x, upars, cov, total_shift, total_scaling, total_mapping, i, log_prob_upars, log_lik_i_upars, r_eff_i, cores, is_method, ... )
x |
A fitted model object. |
upars |
A matrix containing the model parameters in unconstrained space where they can have any real value. |
cov |
Logical; Indicate whether to match the covariance matrix of the
samples or not. If |
total_shift |
A vector representing the total shift made by the moment matching algorithm. |
total_scaling |
A vector representing the total scaling of marginal variance made by the moment matching algorithm. |
total_mapping |
A vector representing the total covariance transformation made by the moment matching algorithm. |
i |
Observation index. |
log_prob_upars |
A function that takes arguments |
log_lik_i_upars |
A function that takes arguments |
r_eff_i |
MCMC relative effective sample size of the |
cores |
The number of cores to use for parallelization. This defaults to
the option
|
is_method |
The importance sampling method to use. The following methods are implemented: |
... |
Further arguments passed to the custom functions documented above. |
A list containing the updated log-importance weights and log-likelihood values. Also returns the updated MCMC effective sample size and the integrand-specific log-importance weights.
Paananen, T., Piironen, J., Buerkner, P.-C., Vehtari, A. (2021). Implicitly adaptive importance sampling. Statistics and Computing, 31, 16. doi:10.1007/s11222-020-09982-2. arXiv preprint arXiv:1906.08850.
The loo_predictive_metric()
function computes estimates of leave-one-out
predictive metrics given a set of predictions and observations. Currently
supported metrics are mean absolute error, mean squared error and root mean
squared error for continuous predictions and accuracy and balanced accuracy
for binary classification. Predictions are passed on to the E_loo()
function, so this function assumes that the PSIS approximation is working
well.
loo_predictive_metric(x, ...) ## S3 method for class 'matrix' loo_predictive_metric( x, y, log_lik, ..., metric = c("mae", "rmse", "mse", "acc", "balanced_acc"), r_eff = 1, cores = getOption("mc.cores", 1) )
loo_predictive_metric(x, ...) ## S3 method for class 'matrix' loo_predictive_metric( x, y, log_lik, ..., metric = c("mae", "rmse", "mse", "acc", "balanced_acc"), r_eff = 1, cores = getOption("mc.cores", 1) )
x |
A numeric matrix of predictions. |
... |
Additional arguments passed on to |
y |
A numeric vector of observations. Length should be equal to the
number of rows in |
log_lik |
A matrix of pointwise log-likelihoods. Should be of same
dimension as |
metric |
The type of predictive metric to be used. Currently
supported options are
|
r_eff |
A Vector of relative effective sample size estimates containing
one element per observation. See |
cores |
The number of cores to use for parallelization of |
A list with the following components:
estimate
Estimate of the given metric.
se
Standard error of the estimate.
if (requireNamespace("rstanarm", quietly = TRUE)) { # Use rstanarm package to quickly fit a model and get both a log-likelihood # matrix and draws from the posterior predictive distribution library("rstanarm") # data from help("lm") ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) d <- data.frame( weight = c(ctl, trt), group = gl(2, 10, 20, labels = c("Ctl","Trt")) ) fit <- stan_glm(weight ~ group, data = d, refresh = 0) ll <- log_lik(fit) r_eff <- relative_eff(exp(-ll), chain_id = rep(1:4, each = 1000)) mu_pred <- posterior_epred(fit) # Leave-one-out mean absolute error of predictions mae <- loo_predictive_metric(x = mu_pred, y = d$weight, log_lik = ll, pred_error = 'mae', r_eff = r_eff) # Leave-one-out 90%-quantile of mean absolute error mae_90q <- loo_predictive_metric(x = mu_pred, y = d$weight, log_lik = ll, pred_error = 'mae', r_eff = r_eff, type = 'quantile', probs = 0.9) }
if (requireNamespace("rstanarm", quietly = TRUE)) { # Use rstanarm package to quickly fit a model and get both a log-likelihood # matrix and draws from the posterior predictive distribution library("rstanarm") # data from help("lm") ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) d <- data.frame( weight = c(ctl, trt), group = gl(2, 10, 20, labels = c("Ctl","Trt")) ) fit <- stan_glm(weight ~ group, data = d, refresh = 0) ll <- log_lik(fit) r_eff <- relative_eff(exp(-ll), chain_id = rep(1:4, each = 1000)) mu_pred <- posterior_epred(fit) # Leave-one-out mean absolute error of predictions mae <- loo_predictive_metric(x = mu_pred, y = d$weight, log_lik = ll, pred_error = 'mae', r_eff = r_eff) # Leave-one-out 90%-quantile of mean absolute error mae_90q <- loo_predictive_metric(x = mu_pred, y = d$weight, log_lik = ll, pred_error = 'mae', r_eff = r_eff, type = 'quantile', probs = 0.9) }
Efficient approximate leave-one-out cross-validation (LOO) using subsampling, so that less costly and more approximate computation is made for all LOO-fold, and more costly and accurate computations are made only for m<N LOO-folds.
loo_subsample(x, ...) ## S3 method for class ''function'' loo_subsample( x, ..., data = NULL, draws = NULL, observations = 400, log_p = NULL, log_g = NULL, r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), loo_approximation = "plpd", loo_approximation_draws = NULL, estimator = "diff_srs", llgrad = NULL, llhess = NULL )
loo_subsample(x, ...) ## S3 method for class ''function'' loo_subsample( x, ..., data = NULL, draws = NULL, observations = 400, log_p = NULL, log_g = NULL, r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), loo_approximation = "plpd", loo_approximation_draws = NULL, estimator = "diff_srs", llgrad = NULL, llhess = NULL )
x |
A function. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs. |
data , draws , ...
|
For |
observations |
The subsample observations to use. The argument can take four (4) types of arguments:
|
log_p , log_g
|
Should be supplied only if approximate posterior draws are
used. The default ( |
r_eff |
Vector of relative effective sample size estimates for the
likelihood ( |
save_psis |
Should the |
cores |
The number of cores to use for parallelization. This defaults to
the option
|
loo_approximation |
What type of approximation of the loo_i's should be used?
The default is
As point estimates of |
loo_approximation_draws |
The number of posterior draws used when
integrating over the posterior. This is used if |
estimator |
How should
|
llgrad |
The gradient of the log-likelihood. This
is only used when |
llhess |
The Hessian of the log-likelihood. This is only used
with |
The loo_subsample()
function is an S3 generic and a methods is
currently provided for log-likelihood functions. The implementation works
for both MCMC and for posterior approximations where it is possible to
compute the log density for the approximation.
loo_subsample()
returns a named list with class c("psis_loo_ss", "psis_loo", "loo")
. This has the same structure as objects returned by
loo()
but with the additional slot:
loo_subsampling
: A list with two vectors, log_p
and log_g
, of the
same length containing the posterior density and the approximation density
for the individual draws.
loo_subsample(`function`)
: A function f()
that takes arguments data_i
and draws
and returns a
vector containing the log-likelihood for a single observation i
evaluated
at each posterior draw. The function should be written such that, for each
observation i
in 1:N
, evaluating
f(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length S
(size of posterior sample). The
log-likelihood function can also have additional arguments but data_i
and
draws
are required.
If using the function method then the arguments data
and draws
must also
be specified in the call to loo()
:
data
: A data frame or matrix containing the data (e.g.
observed outcome and predictors) needed to compute the pointwise
log-likelihood. For each observation i
, the i
th row of
data
will be passed to the data_i
argument of the
log-likelihood function.
draws
: An object containing the posterior draws for any
parameters needed to compute the pointwise log-likelihood. Unlike
data
, which is indexed by observation, for each observation the
entire object draws
will be passed to the draws
argument of
the log-likelihood function.
The ...
can be used if your log-likelihood function takes additional
arguments. These arguments are used like the draws
argument in that they
are recycled for each observation.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2019). Leave-One-Out Cross-Validation for Large Data. In Thirty-sixth International Conference on Machine Learning, PMLR 97:4244-4253.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2020). Leave-One-Out Cross-Validation for Model Comparison in Large Data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 108:341-351.
Small datasets for use in loo examples and vignettes. The Kline
and milk
datasets are also included in the rethinking package
(McElreath, 2016a), but we include them here as rethinking is not
on CRAN.
Currently the data sets included are:
Kline
:
Small dataset from Kline and Boyd (2010) on tool complexity and demography
in Oceanic islands societies. This data is discussed in detail in
McElreath (2016a,2016b). (Link to variable descriptions)
milk
:
Small dataset from Hinde and Milligan (2011) on primate milk
composition.This data is discussed in detail in McElreath (2016a,2016b).
(Link to variable descriptions)
voice
:
Voice rehabilitation data from Tsanas et al. (2014).
Hinde and Milligan. 2011. Evolutionary Anthropology 20:9-23.
Kline, M.A. and R. Boyd. 2010. Proc R Soc B 277:2559-2564.
McElreath, R. (2016a). rethinking: Statistical Rethinking book package. R package version 1.59.
McElreath, R. (2016b). Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman & Hall/CRC.
A. Tsanas, M.A. Little, C. Fox, L.O. Ramig: Objective automatic assessment of rehabilitative speech treatment in Parkinson's disease, IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 22, pp. 181-190, January 2014
str(Kline) str(milk)
str(Kline) str(milk)
The pages provides definitions to key terms. Also see the FAQ page on the loo website for answers to frequently asked questions.
Note: VGG2017 refers to Vehtari, Gelman, and Gabry (2017). See References, below.
elpd_loo
The ELPD is the theoretical expected log pointwise predictive density for a new
dataset (Eq 1 in VGG2017), which can be estimated, e.g., using
cross-validation. elpd_loo
is the Bayesian LOO estimate of the
expected log pointwise predictive density (Eq 4 in VGG2017) and
is a sum of N individual pointwise log predictive densities. Probability
densities can be smaller or larger than 1, and thus log predictive densities
can be negative or positive. For simplicity the ELPD acronym is used also for
expected log pointwise predictive probabilities for discrete models.
Probabilities are always equal or less than 1, and thus log predictive
probabilities are 0 or negative.
elpd_loo
As elpd_loo
is defined as the sum of N independent components (Eq 4 in
VGG2017), we can compute the standard error by using the standard deviation
of the N components and multiplying by sqrt(N)
(Eq 23 in VGG2017).
This standard error is a coarse description of our uncertainty about the
predictive performance for unknown future data. When N is small or there is
severe model misspecification, the current SE estimate is overoptimistic and
the actual SE can even be twice as large. Even for moderate N, when the SE
estimate is an accurate estimate for the scale, it ignores the skewness. When
making model comparisons, the SE of the component-wise (pairwise) differences
should be used instead (see the se_diff
section below and Eq 24 in
VGG2017). Sivula et al. (2022) discuss the conditions when the normal
approximation used for SE and se_diff
is good.
The Monte Carlo standard error is the estimate for the computational accuracy
of MCMC and importance sampling used to compute elpd_loo
. Usually this
is negligible compared to the standard describing the uncertainty due to
finite number of observations (Eq 23 in VGG2017).
p_loo
(effective number of parameters)p_loo
is the difference between elpd_loo
and the non-cross-validated
log posterior predictive density. It describes how much more difficult it
is to predict future data than the observed data. Asymptotically under
certain regularity conditions, p_loo
can be interpreted as the
effective number of parameters. In well behaving cases p_loo < N
and
p_loo < p
, where p
is the total number of parameters in the
model. p_loo > N
or p_loo > p
indicates that the model has very
weak predictive capability and may indicate a severe model misspecification.
See below for more on interpreting p_loo
when there are warnings
about high Pareto k diagnostic values.
The Pareto estimate is a diagnostic for Pareto smoothed importance
sampling (PSIS), which is used to compute components of
elpd_loo
. In
importance-sampling LOO the full posterior distribution is used as the
proposal distribution. The Pareto k diagnostic estimates how far an
individual leave-one-out distribution is from the full distribution. If
leaving out an observation changes the posterior too much then importance
sampling is not able to give a reliable estimate. Pareto smoothing stabilizes
importance sampling and guarantees a finite variance estimate at the
cost of some bias.
The diagnostic threshold for Pareto depends on sample size
(sample size dependent threshold was introduced by Vehtari
et al., 2024, and before that fixed thresholds of 0.5 and 0.7 were
recommended). For simplicity,
loo
package uses the nominal sample
size when computing the sample size specific
threshold. This provides an optimistic threshold if the effective
sample size is less than 2200, but even then if ESS/S > 1/2 the difference
is usually negligible. Thinning of MCMC draws can be used to improve
the ratio ESS/S.
If , where
is the
sample size, the PSIS estimate and the corresponding Monte
Carlo standard error estimate are reliable.
If , the PSIS estimate and the
corresponding Monte Carlo standard error estimate are not
reliable, but increasing the (effective) sample size
above
2200 may help (this will increase the sample size specific
threshold
and then the bias specific
threshold 0.7 dominates).
If , the PSIS estimate and the corresponding Monte
Carlo standard error have large bias and are not reliable. Increasing
the sample size may reduce the variability in the
estimate, which
may also result in a lower
estimate.
If , the target distribution is estimated to
have non-finite mean. The PSIS estimate and the corresponding Monte
Carlo standard error are not well defined. Increasing the sample size
may reduce the variability in
estimate, which may also result in
a lower
estimate.
Pareto is also useful as a measure of influence of an
observation. Highly influential observations have high
values. Very high
values often indicate model
misspecification, outliers or mistakes in data processing. See
Section 6 of Gabry et al. (2019) for an example.
p_loo
when Pareto k
is largeIf then we can also look at
the
p_loo
estimate for some additional information about the problem:
If p_loo << p
(the total number of parameters in the model),
then the model is likely to be misspecified. Posterior predictive checks
(PPCs) are then likely to also detect the problem. Try using an overdispersed
model, or add more structural information (nonlinearity, mixture model,
etc.).
If p_loo < p
and the number of parameters p
is relatively
large compared to the number of observations (e.g., p>N/5
), it is
likely that the model is so flexible or the population prior so weak that it’s
difficult to predict the left out observation (even for the true model).
This happens, for example, in the simulated 8 schools (in VGG2017), random
effect models with a few observations per random effect, and Gaussian
processes and spatial models with short correlation lengths.
If p_loo > p
, then the model is likely to be badly misspecified.
If the number of parameters p<<N
, then PPCs are also likely to detect the
problem. See the case study at
https://avehtari.github.io/modelselection/roaches.html for an example.
If p
is relatively large compared to the number of
observations, say p>N/5
(more accurately we should count number of
observations influencing each parameter as in hierarchical models some groups
may have few observations and other groups many), it is possible that PPCs won't
detect the problem.
elpd_diff
is the difference in elpd_loo
for two models. If more
than two models are compared, the difference is computed relative to the
model with highest elpd_loo
.
The standard error of component-wise differences of elpd_loo (Eq 24 in VGG2017) between two models. This SE is smaller than the SE for individual models due to correlation (i.e., if some observations are easier and some more difficult to predict for all models).
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
Sivula, T, Magnusson, M., Matamoros A. A., and Vehtari, A. (2022). Uncertainty in Bayesian leave-one-out cross-validation based model comparison. preprint arXiv:2008.10296v3..
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378 (journal version, preprint arXiv:1709.01449, code on GitHub)
psis_loo_ss
object.The number of observations in a psis_loo_ss
object.
## S3 method for class 'psis_loo_ss' nobs(object, ...)
## S3 method for class 'psis_loo_ss' nobs(object, ...)
object |
a |
... |
Currently unused. |
Get observation indices used in subsampling
obs_idx(x, rep = TRUE)
obs_idx(x, rep = TRUE)
x |
A |
rep |
If sampling with replacement is used, an observation can have
multiple samples and these are then repeated in the returned object if
|
An integer vector.
Print a diagnostic table summarizing the estimated Pareto shape parameters
and PSIS effective sample sizes, find the indexes of observations for which
the estimated Pareto shape parameter is larger than some
threshold
value, or plot observation indexes vs. diagnostic estimates.
The Details section below provides a brief overview of the
diagnostics, but we recommend consulting Vehtari, Gelman, and Gabry (2017)
and Vehtari, Simpson, Gelman, Yao, and Gabry (2024) for full details.
pareto_k_table(x) pareto_k_ids(x, threshold = NULL) pareto_k_values(x) pareto_k_influence_values(x) psis_n_eff_values(x) mcse_loo(x, threshold = NULL) ## S3 method for class 'psis_loo' plot( x, diagnostic = c("k", "ESS", "n_eff"), ..., label_points = FALSE, main = "PSIS diagnostic plot" ) ## S3 method for class 'psis' plot( x, diagnostic = c("k", "ESS", "n_eff"), ..., label_points = FALSE, main = "PSIS diagnostic plot" )
pareto_k_table(x) pareto_k_ids(x, threshold = NULL) pareto_k_values(x) pareto_k_influence_values(x) psis_n_eff_values(x) mcse_loo(x, threshold = NULL) ## S3 method for class 'psis_loo' plot( x, diagnostic = c("k", "ESS", "n_eff"), ..., label_points = FALSE, main = "PSIS diagnostic plot" ) ## S3 method for class 'psis' plot( x, diagnostic = c("k", "ESS", "n_eff"), ..., label_points = FALSE, main = "PSIS diagnostic plot" )
x |
|
threshold |
For |
diagnostic |
For the |
label_points , ...
|
For the |
main |
For the |
The reliability and approximate convergence rate of the PSIS-based
estimates can be assessed using the estimates for the shape
parameter of the generalized Pareto distribution. The
diagnostic threshold for Pareto
depends on sample size
(sample size dependent threshold was introduced by Vehtari
et al. (2024), and before that fixed thresholds of 0.5 and 0.7 were
recommended). For simplicity,
loo
package uses the nominal sample
size when computing the sample size specific
threshold. This provides an optimistic threshold if the effective
sample size is less than 2200, but if MCMC-ESS > S/2 the difference
is usually negligible. Thinning of MCMC draws can be used to
improve the ratio ESS/S.
If , where
is the
sample size, the PSIS estimate and the corresponding Monte Carlo
standard error estimate are reliable.
If , the PSIS estimate and the
corresponding Monte Carlo standard error estimate are not
reliable, but increasing the (effective) sample size
above
2200 may help (this will increase the sample size specific
threshold
and then the bias specific
threshold 0.7 dominates).
If , the PSIS estimate and the corresponding Monte
Carlo standard error have large bias and are not reliable. Increasing
the sample size may reduce the variability in
estimate, which
may result in lower
estimate, too.
If , the target distribution is estimated to
have a non-finite mean. The PSIS estimate and the corresponding Monte
Carlo standard error are not well defined. Increasing the sample size
may reduce the variability in the
estimate, which
may also result in a lower
estimate.
exceeds the diagnostic threshold? Importance sampling is likely to
work less well if the marginal posterior and
LOO posterior
are very different, which
is more likely to happen with a non-robust model and highly
influential observations. If the estimated tail shape parameter
exceeds the diagnostic threshold, the user should be
warned. (Note: If
is greater than the diagnostic threshold
then WAIC is also likely to fail, but WAIC lacks as accurate
diagnostic.) When using PSIS in the context of approximate LOO-CV,
we recommend one of the following actions:
With some additional computations, it is possible to transform
the MCMC draws from the posterior distribution to obtain more
reliable importance sampling estimates. This results in a smaller
shape parameter . See
loo_moment_match()
and the
vignette Avoiding model refits in leave-one-out cross-validation
with moment matching for an example of this.
Sampling from a leave-one-out mixture distribution (see the
vignette Mixture IS leave-one-out cross-validation for
high-dimensional Bayesian models), directly from for the problematic observations
, or using
-fold cross-validation (see the vignette Holdout
validation and K-fold cross-validation of Stan programs with the
loo package) will generally be more stable.
Using a model that is more robust to anomalous observations will generally make approximate LOO-CV more stable.
The estimated shape parameter
for each observation can be used as a measure of the observation's
influence on posterior distribution of the model. These can be obtained with
pareto_k_influence_values()
.
In the case that we
obtain the samples from the proposal distribution via MCMC the loo
package also computes estimates for the Monte Carlo error and the effective
sample size for importance sampling, which are more accurate for PSIS than
for IS and TIS (see Vehtari et al (2024) for details). However, the PSIS
effective sample size estimate will be
over-optimistic when the estimate of is greater than
, where
is the sample size.
pareto_k_table()
returns an object of class
"pareto_k_table"
, which is a matrix with columns "Count"
,
"Proportion"
, and "Min. n_eff"
, and has its own print method.
pareto_k_ids()
returns an integer vector indicating which
observations have Pareto estimates above
threshold
.
pareto_k_values()
returns a vector of the estimated Pareto
parameters. These represent the reliability of sampling.
pareto_k_influence_values()
returns a vector of the estimated Pareto
parameters. These represent influence of the observations on the
model posterior distribution.
psis_n_eff_values()
returns a vector of the estimated PSIS
effective sample sizes.
mcse_loo()
returns the Monte Carlo standard error (MCSE)
estimate for PSIS-LOO. MCSE will be NA if any Pareto values are
above
threshold
.
The plot()
method is called for its side effect and does not
return anything. If x
is the result of a call to loo()
or psis()
then plot(x, diagnostic)
produces a plot of
the estimates of the Pareto shape parameters (diagnostic = "k"
) or
estimates of the PSIS effective sample sizes (diagnostic = "ESS"
).
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
psis()
for the implementation of the PSIS algorithm.
The FAQ page on the loo website for answers to frequently asked questions.
Convenience function for extracting pointwise estimates
pointwise(x, estimate, ...) ## S3 method for class 'loo' pointwise(x, estimate, ...)
pointwise(x, estimate, ...) ## S3 method for class 'loo' pointwise(x, estimate, ...)
x |
A |
estimate |
Which pointwise estimate to return. By default all are
returned. The objects returned by the different functions ( |
... |
Currently ignored. |
A vector of length equal to the number of observations.
x <- loo(example_loglik_array()) pointwise(x, "elpd_loo")
x <- loo(example_loglik_array()) pointwise(x, "elpd_loo")
Print methods
## S3 method for class 'loo' print(x, digits = 1, ...) ## S3 method for class 'waic' print(x, digits = 1, ...) ## S3 method for class 'psis_loo' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'importance_sampling_loo' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'psis_loo_ap' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'psis' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'importance_sampling' print(x, digits = 1, plot_k = FALSE, ...)
## S3 method for class 'loo' print(x, digits = 1, ...) ## S3 method for class 'waic' print(x, digits = 1, ...) ## S3 method for class 'psis_loo' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'importance_sampling_loo' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'psis_loo_ap' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'psis' print(x, digits = 1, plot_k = FALSE, ...) ## S3 method for class 'importance_sampling' print(x, digits = 1, plot_k = FALSE, ...)
x |
|
digits |
An integer passed to |
... |
Arguments passed to |
plot_k |
Logical. If |
x
, invisibly.
Implementation of Pareto smoothed importance sampling (PSIS), a method for stabilizing importance ratios. The version of PSIS implemented here corresponds to the algorithm presented in Vehtari, Simpson, Gelman, Yao, and Gabry (2024). For PSIS diagnostics see the pareto-k-diagnostic page.
psis(log_ratios, ...) ## S3 method for class 'array' psis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' psis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## Default S3 method: psis(log_ratios, ..., r_eff = 1) is.psis(x) is.sis(x) is.tis(x)
psis(log_ratios, ...) ## S3 method for class 'array' psis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' psis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## Default S3 method: psis(log_ratios, ..., r_eff = 1) is.psis(x) is.sis(x) is.tis(x)
log_ratios |
An array, matrix, or vector of importance ratios on the log scale (for PSIS-LOO these are negative log-likelihood values). See the Methods (by class) section below for a detailed description of how to specify the inputs for each method. |
... |
Arguments passed on to the various methods. |
r_eff |
Vector of relative effective sample size estimates containing
one element per observation. The values provided should be the relative
effective sample sizes of |
cores |
The number of cores to use for parallelization. This defaults to
the option
|
x |
For |
The psis()
methods return an object of class "psis"
,
which is a named list with the following components:
log_weights
Vector or matrix of smoothed (and truncated) but unnormalized log
weights. To get normalized weights use the
weights()
method provided for objects of
class "psis"
.
diagnostics
A named list containing two vectors:
pareto_k
: Estimates of the shape parameter of the
generalized Pareto distribution. See the pareto-k-diagnostic
page for details.
n_eff
: PSIS effective sample size estimates.
Objects of class "psis"
also have the following attributes:
norm_const_log
Vector of precomputed values of colLogSumExps(log_weights)
that are
used internally by the weights
method to normalize the log weights.
tail_len
Vector of tail lengths used for fitting the generalized Pareto distribution.
r_eff
If specified, the user's r_eff
argument.
dims
Integer vector of length 2 containing S
(posterior sample size)
and N
(number of observations).
method
Method used for importance sampling, here psis
.
psis(array)
: An by
by
array, where
is the number of MCMC iterations per chain,
is the number of
chains, and
is the number of data points.
psis(matrix)
: An by
matrix, where
is the size
of the posterior sample (with all chains merged) and
is the number
of data points.
psis(default)
: A vector of length (posterior sample size).
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
loo()
for approximate LOO-CV using PSIS.
pareto-k-diagnostic for PSIS diagnostics.
The loo package vignettes for demonstrations.
The FAQ page on the loo website for answers to frequently asked questions.
log_ratios <- -1 * example_loglik_array() r_eff <- relative_eff(exp(-log_ratios)) psis_result <- psis(log_ratios, r_eff = r_eff) str(psis_result) plot(psis_result) # extract smoothed weights lw <- weights(psis_result) # default args are log=TRUE, normalize=TRUE ulw <- weights(psis_result, normalize=FALSE) # unnormalized log-weights w <- weights(psis_result, log=FALSE) # normalized weights (not log-weights) uw <- weights(psis_result, log=FALSE, normalize = FALSE) # unnormalized weights
log_ratios <- -1 * example_loglik_array() r_eff <- relative_eff(exp(-log_ratios)) psis_result <- psis(log_ratios, r_eff = r_eff) str(psis_result) plot(psis_result) # extract smoothed weights lw <- weights(psis_result) # default args are log=TRUE, normalize=TRUE ulw <- weights(psis_result, normalize=FALSE) # unnormalized log-weights w <- weights(psis_result, log=FALSE) # normalized weights (not log-weights) uw <- weights(psis_result, log=FALSE, normalize = FALSE) # unnormalized weights
As of version 2.0.0
this function is deprecated. Please use the
psis()
function for the new PSIS algorithm.
psislw( lw, wcp = 0.2, wtrunc = 3/4, cores = getOption("mc.cores", 1), llfun = NULL, llargs = NULL, ... )
psislw( lw, wcp = 0.2, wtrunc = 3/4, cores = getOption("mc.cores", 1), llfun = NULL, llargs = NULL, ... )
lw |
A matrix or vector of log weights. For computing LOO, |
wcp |
The proportion of importance weights to use for the generalized
Pareto fit. The |
wtrunc |
For truncating very large weights to |
cores |
The number of cores to use for parallelization. This defaults to
the option |
llfun , llargs
|
See |
... |
Ignored when |
A named list with components lw_smooth
(modified log weights) and
pareto_k
(estimated generalized Pareto shape parameter(s) k).
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
pareto-k-diagnostic for PSIS diagnostics.
relative_eff()
computes the the MCMC effective sample size divided by
the total sample size.
relative_eff(x, ...) ## Default S3 method: relative_eff(x, chain_id, ...) ## S3 method for class 'matrix' relative_eff(x, chain_id, ..., cores = getOption("mc.cores", 1)) ## S3 method for class 'array' relative_eff(x, ..., cores = getOption("mc.cores", 1)) ## S3 method for class ''function'' relative_eff( x, chain_id, ..., cores = getOption("mc.cores", 1), data = NULL, draws = NULL ) ## S3 method for class 'importance_sampling' relative_eff(x, ...)
relative_eff(x, ...) ## Default S3 method: relative_eff(x, chain_id, ...) ## S3 method for class 'matrix' relative_eff(x, chain_id, ..., cores = getOption("mc.cores", 1)) ## S3 method for class 'array' relative_eff(x, ..., cores = getOption("mc.cores", 1)) ## S3 method for class ''function'' relative_eff( x, chain_id, ..., cores = getOption("mc.cores", 1), data = NULL, draws = NULL ) ## S3 method for class 'importance_sampling' relative_eff(x, ...)
x |
A vector, matrix, 3-D array, or function. See the Methods (by
class) section below for details on specifying |
chain_id |
A vector of length |
cores |
The number of cores to use for parallelization. |
data , draws , ...
|
Same as for the |
A vector of relative effective sample sizes.
relative_eff(default)
: A vector of length (posterior sample size).
relative_eff(matrix)
: An by
matrix, where
is the size
of the posterior sample (with all chains merged) and
is the number
of data points.
relative_eff(array)
: An by
by
array, where
is the number of MCMC iterations per chain,
is the number of
chains, and
is the number of data points.
relative_eff(`function`)
: A function f()
that takes arguments data_i
and draws
and returns a
vector containing the log-likelihood for a single observation i
evaluated
at each posterior draw. The function should be written such that, for each
observation i
in 1:N
, evaluating
f(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length S
(size of posterior sample). The
log-likelihood function can also have additional arguments but data_i
and
draws
are required.
If using the function method then the arguments data
and draws
must also
be specified in the call to loo()
:
data
: A data frame or matrix containing the data (e.g.
observed outcome and predictors) needed to compute the pointwise
log-likelihood. For each observation i
, the i
th row of
data
will be passed to the data_i
argument of the
log-likelihood function.
draws
: An object containing the posterior draws for any
parameters needed to compute the pointwise log-likelihood. Unlike
data
, which is indexed by observation, for each observation the
entire object draws
will be passed to the draws
argument of
the log-likelihood function.
The ...
can be used if your log-likelihood function takes additional
arguments. These arguments are used like the draws
argument in that they
are recycled for each observation.
relative_eff(importance_sampling)
: If x
is an object of class "psis"
, relative_eff()
simply returns
the r_eff
attribute of x
.
LLarr <- example_loglik_array() LLmat <- example_loglik_matrix() dim(LLarr) dim(LLmat) rel_n_eff_1 <- relative_eff(exp(LLarr)) rel_n_eff_2 <- relative_eff(exp(LLmat), chain_id = rep(1:2, each = 500)) all.equal(rel_n_eff_1, rel_n_eff_2)
LLarr <- example_loglik_array() LLmat <- example_loglik_matrix() dim(LLarr) dim(LLmat) rel_n_eff_1 <- relative_eff(exp(LLarr)) rel_n_eff_2 <- relative_eff(exp(LLmat), chain_id = rep(1:2, each = 500)) all.equal(rel_n_eff_1, rel_n_eff_2)
Implementation of standard importance sampling (SIS).
sis(log_ratios, ...) ## S3 method for class 'array' sis(log_ratios, ..., r_eff = NULL, cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' sis(log_ratios, ..., r_eff = NULL, cores = getOption("mc.cores", 1)) ## Default S3 method: sis(log_ratios, ..., r_eff = NULL)
sis(log_ratios, ...) ## S3 method for class 'array' sis(log_ratios, ..., r_eff = NULL, cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' sis(log_ratios, ..., r_eff = NULL, cores = getOption("mc.cores", 1)) ## Default S3 method: sis(log_ratios, ..., r_eff = NULL)
log_ratios |
An array, matrix, or vector of importance ratios on the log scale (for Importance sampling LOO, these are negative log-likelihood values). See the Methods (by class) section below for a detailed description of how to specify the inputs for each method. |
... |
Arguments passed on to the various methods. |
r_eff |
Vector of relative effective sample size estimates containing
one element per observation. The values provided should be the relative
effective sample sizes of |
cores |
The number of cores to use for parallelization. This defaults to
the option
|
The sis()
methods return an object of class "sis"
,
which is a named list with the following components:
log_weights
Vector or matrix of smoothed but unnormalized log
weights. To get normalized weights use the
weights()
method provided for objects of
class sis
.
diagnostics
A named list containing one vector:
pareto_k
: Not used in sis
, all set to 0.
n_eff
: effective sample size estimates.
Objects of class "sis"
also have the following attributes:
norm_const_log
Vector of precomputed values of colLogSumExps(log_weights)
that are
used internally by the weights
method to normalize the log weights.
r_eff
If specified, the user's r_eff
argument.
tail_len
Not used for sis
.
dims
Integer vector of length 2 containing S
(posterior sample size)
and N
(number of observations).
method
Method used for importance sampling, here sis
.
sis(array)
: An by
by
array, where
is the number of MCMC iterations per chain,
is the number of
chains, and
is the number of data points.
sis(matrix)
: An by
matrix, where
is the size
of the posterior sample (with all chains merged) and
is the number
of data points.
sis(default)
: A vector of length (posterior sample size).
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
psis()
for approximate LOO-CV using PSIS.
loo()
for approximate LOO-CV.
pareto-k-diagnostic for PSIS diagnostics.
log_ratios <- -1 * example_loglik_array() r_eff <- relative_eff(exp(-log_ratios)) sis_result <- sis(log_ratios, r_eff = r_eff) str(sis_result) # extract smoothed weights lw <- weights(sis_result) # default args are log=TRUE, normalize=TRUE ulw <- weights(sis_result, normalize=FALSE) # unnormalized log-weights w <- weights(sis_result, log=FALSE) # normalized weights (not log-weights) uw <- weights(sis_result, log=FALSE, normalize = FALSE) # unnormalized weights
log_ratios <- -1 * example_loglik_array() r_eff <- relative_eff(exp(-log_ratios)) sis_result <- sis(log_ratios, r_eff = r_eff) str(sis_result) # extract smoothed weights lw <- weights(sis_result) # default args are log=TRUE, normalize=TRUE ulw <- weights(sis_result, normalize=FALSE) # unnormalized log-weights w <- weights(sis_result, log=FALSE) # normalized weights (not log-weights) uw <- weights(sis_result, log=FALSE, normalize = FALSE) # unnormalized weights
Implementation of truncated (self-normalized) importance sampling (TIS), truncated at S^(1/2) as recommended by Ionides (2008).
tis(log_ratios, ...) ## S3 method for class 'array' tis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' tis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## Default S3 method: tis(log_ratios, ..., r_eff = 1)
tis(log_ratios, ...) ## S3 method for class 'array' tis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## S3 method for class 'matrix' tis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1)) ## Default S3 method: tis(log_ratios, ..., r_eff = 1)
log_ratios |
An array, matrix, or vector of importance ratios on the log scale (for Importance sampling LOO, these are negative log-likelihood values). See the Methods (by class) section below for a detailed description of how to specify the inputs for each method. |
... |
Arguments passed on to the various methods. |
r_eff |
Vector of relative effective sample size estimates containing
one element per observation. The values provided should be the relative
effective sample sizes of |
cores |
The number of cores to use for parallelization. This defaults to
the option
|
The tis()
methods return an object of class "tis"
,
which is a named list with the following components:
log_weights
Vector or matrix of smoothed (and truncated) but unnormalized log
weights. To get normalized weights use the
weights()
method provided for objects of
class tis
.
diagnostics
A named list containing one vector:
pareto_k
: Not used in tis
, all set to 0.
n_eff
: Effective sample size estimates.
Objects of class "tis"
also have the following attributes:
norm_const_log
Vector of precomputed values of colLogSumExps(log_weights)
that are
used internally by the weights()
method to normalize the log weights.
r_eff
If specified, the user's r_eff
argument.
tail_len
Not used for tis
.
dims
Integer vector of length 2 containing S
(posterior sample size)
and N
(number of observations).
method
Method used for importance sampling, here tis
.
tis(array)
: An by
by
array, where
is the number of MCMC iterations per chain,
is the number of
chains, and
is the number of data points.
tis(matrix)
: An by
matrix, where
is the size
of the posterior sample (with all chains merged) and
is the number
of data points.
tis(default)
: A vector of length (posterior sample size).
Ionides, Edward L. (2008). Truncated importance sampling. Journal of Computational and Graphical Statistics 17(2): 295–311.
psis()
for approximate LOO-CV using PSIS.
loo()
for approximate LOO-CV.
pareto-k-diagnostic for PSIS diagnostics.
log_ratios <- -1 * example_loglik_array() r_eff <- relative_eff(exp(-log_ratios)) tis_result <- tis(log_ratios, r_eff = r_eff) str(tis_result) # extract smoothed weights lw <- weights(tis_result) # default args are log=TRUE, normalize=TRUE ulw <- weights(tis_result, normalize=FALSE) # unnormalized log-weights w <- weights(tis_result, log=FALSE) # normalized weights (not log-weights) uw <- weights(tis_result, log=FALSE, normalize = FALSE) # unnormalized weights
log_ratios <- -1 * example_loglik_array() r_eff <- relative_eff(exp(-log_ratios)) tis_result <- tis(log_ratios, r_eff = r_eff) str(tis_result) # extract smoothed weights lw <- weights(tis_result) # default args are log=TRUE, normalize=TRUE ulw <- weights(tis_result, normalize=FALSE) # unnormalized log-weights w <- weights(tis_result, log=FALSE) # normalized weights (not log-weights) uw <- weights(tis_result, log=FALSE, normalize = FALSE) # unnormalized weights
psis_loo_ss
objectsUpdate psis_loo_ss
objects
## S3 method for class 'psis_loo_ss' update( object, ..., data = NULL, draws = NULL, observations = NULL, r_eff = 1, cores = getOption("mc.cores", 1), loo_approximation = NULL, loo_approximation_draws = NULL, llgrad = NULL, llhess = NULL )
## S3 method for class 'psis_loo_ss' update( object, ..., data = NULL, draws = NULL, observations = NULL, r_eff = 1, cores = getOption("mc.cores", 1), loo_approximation = NULL, loo_approximation_draws = NULL, llgrad = NULL, llhess = NULL )
object |
A |
... |
Currently not used. |
data , draws
|
|
observations |
The subsample observations to use. The argument can take four (4) types of arguments:
|
r_eff |
Vector of relative effective sample size estimates for the
likelihood ( |
cores |
The number of cores to use for parallelization. This defaults to
the option
|
loo_approximation |
What type of approximation of the loo_i's should be used?
The default is
As point estimates of |
loo_approximation_draws |
The number of posterior draws used when
integrating over the posterior. This is used if |
llgrad |
The gradient of the log-likelihood. This
is only used when |
llhess |
The Hessian of the log-likelihood. This is only used
with |
If observations
is updated then if a vector of indices or a psis_loo_ss
object is supplied the updated object will have exactly the observations
indicated by the vector or psis_loo_ss
object. If a single integer is
supplied, new observations will be sampled to reach the supplied sample size.
A psis_loo_ss
object.
The waic()
methods can be used to compute WAIC from the pointwise
log-likelihood. However, we recommend LOO-CV using PSIS (as implemented by
the loo()
function) because PSIS provides useful diagnostics as well as
effective sample size and Monte Carlo estimates.
waic(x, ...) ## S3 method for class 'array' waic(x, ...) ## S3 method for class 'matrix' waic(x, ...) ## S3 method for class ''function'' waic(x, ..., data = NULL, draws = NULL) is.waic(x)
waic(x, ...) ## S3 method for class 'array' waic(x, ...) ## S3 method for class 'matrix' waic(x, ...) ## S3 method for class ''function'' waic(x, ..., data = NULL, draws = NULL) is.waic(x)
x |
A log-likelihood array, matrix, or function. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs for each method. |
draws , data , ...
|
For the function method only. See the Methods (by class) section below for details on these arguments. |
A named list (of class c("waic", "loo")
) with components:
estimates
A matrix with two columns ("Estimate"
, "SE"
) and three
rows ("elpd_waic"
, "p_waic"
, "waic"
). This contains
point estimates and standard errors of the expected log pointwise predictive
density (elpd_waic
), the effective number of parameters
(p_waic
) and the information criterion waic
(which is just
-2 * elpd_waic
, i.e., converted to deviance scale).
pointwise
A matrix with three columns (and number of rows equal to the number of
observations) containing the pointwise contributions of each of the above
measures (elpd_waic
, p_waic
, waic
).
waic(array)
: An by
by
array, where
is the number of MCMC iterations per chain,
is the number of
chains, and
is the number of data points.
waic(matrix)
: An by
matrix, where
is the size
of the posterior sample (with all chains merged) and
is the number
of data points.
waic(`function`)
: A function f()
that takes arguments data_i
and draws
and returns a
vector containing the log-likelihood for a single observation i
evaluated
at each posterior draw. The function should be written such that, for each
observation i
in 1:N
, evaluating
f(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length S
(size of posterior sample). The
log-likelihood function can also have additional arguments but data_i
and
draws
are required.
If using the function method then the arguments data
and draws
must also
be specified in the call to loo()
:
data
: A data frame or matrix containing the data (e.g.
observed outcome and predictors) needed to compute the pointwise
log-likelihood. For each observation i
, the i
th row of
data
will be passed to the data_i
argument of the
log-likelihood function.
draws
: An object containing the posterior draws for any
parameters needed to compute the pointwise log-likelihood. Unlike
data
, which is indexed by observation, for each observation the
entire object draws
will be passed to the draws
argument of
the log-likelihood function.
The ...
can be used if your log-likelihood function takes additional
arguments. These arguments are used like the draws
argument in that they
are recycled for each observation.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely application information criterion in singular learning theory. Journal of Machine Learning Research 11, 3571-3594.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
The loo package vignettes and
Vehtari, Gelman, and Gabry (2017) and Vehtari, Simpson, Gelman, Yao,
and Gabry (2024) for more details on why we prefer loo()
to waic()
.
loo_compare()
for comparing models on approximate LOO-CV or WAIC.
### Array and matrix methods LLarr <- example_loglik_array() dim(LLarr) LLmat <- example_loglik_matrix() dim(LLmat) waic_arr <- waic(LLarr) waic_mat <- waic(LLmat) identical(waic_arr, waic_mat) ## Not run: log_lik1 <- extract_log_lik(stanfit1) log_lik2 <- extract_log_lik(stanfit2) (waic1 <- waic(log_lik1)) (waic2 <- waic(log_lik2)) print(compare(waic1, waic2), digits = 2) ## End(Not run)
### Array and matrix methods LLarr <- example_loglik_array() dim(LLarr) LLmat <- example_loglik_matrix() dim(LLmat) waic_arr <- waic(LLarr) waic_mat <- waic(LLmat) identical(waic_arr, waic_mat) ## Not run: log_lik1 <- extract_log_lik(stanfit1) log_lik2 <- extract_log_lik(stanfit2) (waic1 <- waic(log_lik1)) (waic2 <- waic(log_lik2)) print(compare(waic1, waic2), digits = 2) ## End(Not run)
Extract importance sampling weights
## S3 method for class 'importance_sampling' weights(object, ..., log = TRUE, normalize = TRUE)
## S3 method for class 'importance_sampling' weights(object, ..., log = TRUE, normalize = TRUE)
object |
|
... |
Ignored. |
log |
Should the weights be returned on the log scale? Defaults to
|
normalize |
Should the weights be normalized? Defaults to |
The weights()
method returns an object with the same dimensions as
the log_weights
component of object
. The normalize
and log
arguments control whether the returned weights are normalized and whether
or not to return them on the log scale.
# See the examples at help("psis")
# See the examples at help("psis")