Title: | Plotting for Bayesian Models |
---|---|
Description: | Plotting functions for posterior analysis, MCMC diagnostics, prior and posterior predictive checks, and other visualizations to support the applied Bayesian workflow advocated in Gabry, Simpson, Vehtari, Betancourt, and Gelman (2019) <doi:10.1111/rssa.12378>. The package is designed not only to provide convenient functionality for users, but also a common set of functions that can be easily used by developers working on a variety of R packages for Bayesian modeling, particularly (but not exclusively) packages interfacing with 'Stan'. |
Authors: | Jonah Gabry [aut, cre], Tristan Mahr [aut], Paul-Christian Bürkner [ctb], Martin Modrák [ctb], Malcolm Barrett [ctb], Frank Weber [ctb], Eduardo Coronado Sroka [ctb], Teemu Sailynoja [ctb], Aki Vehtari [ctb] |
Maintainer: | Jonah Gabry <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.11.1.9000 |
Built: | 2024-09-01 06:35:17 UTC |
Source: | https://github.com/stan-dev/bayesplot |
Stan Development Team
The bayesplot package provides a variety of ggplot2-based plotting functions for use after fitting Bayesian models (typically, though not exclusively, via Markov chain Monte Carlo). The package is designed not only to provide convenient functionality for users, but also a common set of functions that can be easily used by developers working on a variety of packages for Bayesian modeling, particularly (but not necessarily) packages powered by RStan (the R interface to Stan). Examples of packages that will soon (or already are) using bayesplot are rstan itself, as well as the rstan-dependent rstanarm and brms packages for applied regression modeling.
The plotting functions in bayesplot are organized into several modules:
MCMC: Visualizations of Markov chain Monte Carlo (MCMC) simulations generated by any MCMC algorithm as well as diagnostics. There are also additional functions specifically for use with models fit using the No-U-Turn Sampler (NUTS).
PPC: Graphical (posterior or prior) predictive checks (PPCs).
PPD: Plots of (posterior or prior) predictive distributions without comparisons to observed data.
Online documentation and vignettes: Visit the bayesplot website at https://mc-stan.org/bayesplot/
Bug reports and feature requests: If you would like to request a new feature or if you have noticed a bug that needs to be fixed please let us know at the bayesplot issue tracker at https://github.com/stan-dev/bayesplot/issues/
General questions and help: To ask a question about bayesplot on the Stan Forums forum please visit https://discourse.mc-stan.org.
Maintainer: Jonah Gabry [email protected]
Authors:
Tristan Mahr
Other contributors:
Paul-Christian Bürkner [contributor]
Martin Modrák [contributor]
Malcolm Barrett [contributor]
Frank Weber [contributor]
Eduardo Coronado Sroka [contributor]
Teemu Sailynoja [contributor]
Aki Vehtari [contributor]
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
theme_default()
for the default ggplot theme used by
bayesplot and bayesplot_theme_set()
to change it.
bayesplot-colors to set or view the color scheme used for plotting.
ggplot2::ggsave()
for saving plots.
# A few quick examples (all of the functions have many examples # on their individual help pages) # MCMC plots x <- example_mcmc_draws(params = 5) mcmc_intervals(x, prob = 0.5) mcmc_intervals(x, regex_pars = "beta") color_scheme_set("purple") mcmc_areas(x, regex_pars = "beta", prob = 0.8) color_scheme_set("mix-blue-red") mcmc_trace(x, pars = c("alpha", "sigma"), facet_args = list(nrow = 2)) color_scheme_set("brightblue") mcmc_scatter(x, pars = c("beta[1]", "sigma"), transformations = list(sigma = "log")) # Graphical PPCs y <- example_y_data() yrep <- example_yrep_draws() ppc_dens_overlay(y, yrep[1:50, ]) color_scheme_set("pink") ppc_stat(y, yrep, stat = "median") + grid_lines() ppc_hist(y, yrep[1:8, ]) # Same plots but without y (using ppd_ instead of ppc_) bayesplot_theme_set(ggplot2::theme_gray()) ypred <- yrep ppd_dens_overlay(ypred[1:50, ]) ppd_stat(ypred, stat = "median") + grid_lines() ppd_hist(ypred[1:8, ])
# A few quick examples (all of the functions have many examples # on their individual help pages) # MCMC plots x <- example_mcmc_draws(params = 5) mcmc_intervals(x, prob = 0.5) mcmc_intervals(x, regex_pars = "beta") color_scheme_set("purple") mcmc_areas(x, regex_pars = "beta", prob = 0.8) color_scheme_set("mix-blue-red") mcmc_trace(x, pars = c("alpha", "sigma"), facet_args = list(nrow = 2)) color_scheme_set("brightblue") mcmc_scatter(x, pars = c("beta[1]", "sigma"), transformations = list(sigma = "log")) # Graphical PPCs y <- example_y_data() yrep <- example_yrep_draws() ppc_dens_overlay(y, yrep[1:50, ]) color_scheme_set("pink") ppc_stat(y, yrep, stat = "median") + grid_lines() ppc_hist(y, yrep[1:8, ]) # Same plots but without y (using ppd_ instead of ppc_) bayesplot_theme_set(ggplot2::theme_gray()) ypred <- yrep ppd_dens_overlay(ypred[1:50, ]) ppd_stat(ypred, stat = "median") + grid_lines() ppd_hist(ypred[1:8, ])
Get or view the names of available plotting or data functions
available_ppc(pattern = NULL, fixed = FALSE, invert = FALSE, plots_only = TRUE) available_ppd(pattern = NULL, fixed = FALSE, invert = FALSE, plots_only = TRUE) available_mcmc( pattern = NULL, fixed = FALSE, invert = FALSE, plots_only = TRUE )
available_ppc(pattern = NULL, fixed = FALSE, invert = FALSE, plots_only = TRUE) available_ppd(pattern = NULL, fixed = FALSE, invert = FALSE, plots_only = TRUE) available_mcmc( pattern = NULL, fixed = FALSE, invert = FALSE, plots_only = TRUE )
pattern , fixed , invert
|
Passed to |
plots_only |
If |
A possibly empty character vector of function names with several
additional attributes (for use by a custom print method). If pattern
is missing then the returned object contains the names of all available
plotting functions in the MCMC, PPC, or PPD module, depending on
which function is called. If pattern
is specified then a subset of
function names is returned.
available_mcmc() available_mcmc("nuts") available_mcmc("rhat|neff") available_ppc() available_ppc("grouped") available_ppc("grouped", invert = TRUE) available_ppd() available_ppd("grouped") # can also see which functions that return data are available available_ppc(plots_only = FALSE) # only show the _data functions available_ppc("_data", plots_only = FALSE) available_ppd("_data", plots_only = FALSE) available_mcmc("_data", plots_only = FALSE)
available_mcmc() available_mcmc("nuts") available_mcmc("rhat|neff") available_ppc() available_ppc("grouped") available_ppc("grouped", invert = TRUE) available_ppd() available_ppd("grouped") # can also see which functions that return data are available available_ppc(plots_only = FALSE) # only show the _data functions available_ppc("_data", plots_only = FALSE) available_ppd("_data", plots_only = FALSE) available_mcmc("_data", plots_only = FALSE)
The bayesplot_grid
function makes it simple to juxtapose plots using
common and/or
axes.
bayesplot_grid( ..., plots = list(), xlim = NULL, ylim = NULL, grid_args = list(), titles = character(), subtitles = character(), legends = TRUE, save_gg_objects = TRUE )
bayesplot_grid( ..., plots = list(), xlim = NULL, ylim = NULL, grid_args = list(), titles = character(), subtitles = character(), legends = TRUE, save_gg_objects = TRUE )
... |
One or more ggplot objects. |
plots |
A list of ggplot objects. Can be used as an alternative to
specifying plot objects via |
xlim , ylim
|
Optionally, numeric vectors of length 2 specifying lower and upper limits for the axes that will be shared across all plots. |
grid_args |
An optional named list of arguments to pass to
|
titles , subtitles
|
Optional character vectors of plot titles and
subtitles. If specified, |
legends |
If any of the plots have legends should they be displayed?
Defaults to |
save_gg_objects |
If |
An object of class "bayesplot_grid"
(essentially a gtable object
from gridExtra::arrangeGrob()
), which has a plot
method.
y <- example_y_data() yrep <- example_yrep_draws() stats <- c("sd", "median", "max", "min") color_scheme_set("pink") bayesplot_grid( plots = lapply(stats, function(s) ppc_stat(y, yrep, stat = s)), titles = stats, legends = FALSE, grid_args = list(ncol = 1) ) ## Not run: library(rstanarm) mtcars$log_mpg <- log(mtcars$mpg) fit1 <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0) fit2 <- stan_glm(log_mpg ~ wt, data = mtcars, refresh = 0) y <- mtcars$mpg yrep1 <- posterior_predict(fit1, draws = 50) yrep2 <- posterior_predict(fit2, fun = exp, draws = 50) color_scheme_set("blue") ppc1 <- ppc_dens_overlay(y, yrep1) ppc1 ppc1 + yaxis_text() color_scheme_set("red") ppc2 <- ppc_dens_overlay(y, yrep2) bayesplot_grid(ppc1, ppc2) # make sure the plots use the same limits for the axes bayesplot_grid(ppc1, ppc2, xlim = c(-5, 60), ylim = c(0, 0.2)) # remove the legends and add text bayesplot_grid(ppc1, ppc2, xlim = c(-5, 60), ylim = c(0, 0.2), legends = FALSE, subtitles = rep("Predicted MPG", 2)) ## End(Not run)
y <- example_y_data() yrep <- example_yrep_draws() stats <- c("sd", "median", "max", "min") color_scheme_set("pink") bayesplot_grid( plots = lapply(stats, function(s) ppc_stat(y, yrep, stat = s)), titles = stats, legends = FALSE, grid_args = list(ncol = 1) ) ## Not run: library(rstanarm) mtcars$log_mpg <- log(mtcars$mpg) fit1 <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0) fit2 <- stan_glm(log_mpg ~ wt, data = mtcars, refresh = 0) y <- mtcars$mpg yrep1 <- posterior_predict(fit1, draws = 50) yrep2 <- posterior_predict(fit2, fun = exp, draws = 50) color_scheme_set("blue") ppc1 <- ppc_dens_overlay(y, yrep1) ppc1 ppc1 + yaxis_text() color_scheme_set("red") ppc2 <- ppc_dens_overlay(y, yrep2) bayesplot_grid(ppc1, ppc2) # make sure the plots use the same limits for the axes bayesplot_grid(ppc1, ppc2, xlim = c(-5, 60), ylim = c(0, 0.2)) # remove the legends and add text bayesplot_grid(ppc1, ppc2, xlim = c(-5, 60), ylim = c(0, 0.2), legends = FALSE, subtitles = rep("Predicted MPG", 2)) ## End(Not run)
These functions are the bayesplot equivalent to
ggplot2's ggplot2::theme_set()
and friends. They set, get, and update
the active theme but only apply them to bayesplots
. The current/active
theme is automatically applied to every bayesplot
you draw.
Use bayesplot_theme_get()
to get the current bayesplot theme and
bayesplot_theme_set()
to set a new theme. bayesplot_theme_update()
and
bayesplot_theme_replace()
are shorthands for changing individual elements.
bayesplot_theme_get() bayesplot_theme_set(new = theme_default()) bayesplot_theme_update(...) bayesplot_theme_replace(...)
bayesplot_theme_get() bayesplot_theme_set(new = theme_default()) bayesplot_theme_update(...) bayesplot_theme_replace(...)
new |
The new theme (list of theme elements) to use. This is analogous
to the |
... |
A named list of theme settings. |
bayesplot_theme_set()
and friends only apply to bayesplots
.
However, ggplot2::theme_set()
can also be used to change the
bayesplot theme. Currently, setting a theme with ggplot2::theme_set()
(other than the ggplot2 default ggplot2::theme_grey()
) will override
the bayesplot theme.
bayesplot_theme_get()
returns the current theme. The other three
functions (set, update, replace) invisibly return the previous theme
so it can be saved and easily restored later. This is the same behavior as
the ggplot2 versions of these functions.
theme_default()
for the default bayesplot theme.
bayesplot-helpers for a variety of convenience functions, many of which provide shortcuts for tweaking theme elements after creating a plot.
bayesplot-colors to set or view the color scheme used for plotting.
library(ggplot2) # plot using the current value of bayesplot_theme_get() # (the default is bayesplot::theme_default()) x <- example_mcmc_draws() mcmc_hist(x) # change the bayesplot theme to theme_minimal and save the old theme old <- bayesplot_theme_set(theme_minimal()) mcmc_hist(x) # change back to the previous theme bayesplot_theme_set(old) mcmc_hist(x) # change the default font size and family for bayesplots bayesplot_theme_update(text = element_text(size = 16, family = "sans")) mcmc_hist(x) # change back to the default bayesplot_theme_set() # same as bayesplot_theme_set(theme_default()) mcmc_hist(x) # updating theme elements color_scheme_set("brightblue") bayesplot_theme_set(theme_dark()) mcmc_hist(x) bayesplot_theme_update(panel.background = element_rect(fill = "black")) mcmc_hist(x) # to get the same plot without updating the theme we could also have # used the bayeplot convenience function panel_bg() bayesplot_theme_set(theme_dark()) mcmc_hist(x) + panel_bg(fill = "black") # reset bayesplot_theme_set()
library(ggplot2) # plot using the current value of bayesplot_theme_get() # (the default is bayesplot::theme_default()) x <- example_mcmc_draws() mcmc_hist(x) # change the bayesplot theme to theme_minimal and save the old theme old <- bayesplot_theme_set(theme_minimal()) mcmc_hist(x) # change back to the previous theme bayesplot_theme_set(old) mcmc_hist(x) # change the default font size and family for bayesplots bayesplot_theme_update(text = element_text(size = 16, family = "sans")) mcmc_hist(x) # change back to the default bayesplot_theme_set() # same as bayesplot_theme_set(theme_default()) mcmc_hist(x) # updating theme elements color_scheme_set("brightblue") bayesplot_theme_set(theme_dark()) mcmc_hist(x) bayesplot_theme_update(panel.background = element_rect(fill = "black")) mcmc_hist(x) # to get the same plot without updating the theme we could also have # used the bayeplot convenience function panel_bg() bayesplot_theme_set(theme_dark()) mcmc_hist(x) + panel_bg(fill = "black") # reset bayesplot_theme_set()
Set, get, or view color schemes. Choose from a preset scheme or create a custom scheme. See the Available color schemes section below for a list of available scheme names. The Custom color schemes section describes how to specify a custom scheme.
color_scheme_set(scheme = "blue") color_scheme_get(scheme = NULL, i = NULL) color_scheme_view(scheme = NULL)
color_scheme_set(scheme = "blue") color_scheme_get(scheme = NULL, i = NULL) color_scheme_view(scheme = NULL)
scheme |
For For For See the Available color schemes section below for a list of available scheme names. The Custom color schemes section describes how to specify a custom scheme. |
i |
For |
color_scheme_set()
has the side effect of setting the color scheme
used for plotting. It also returns (invisibly) a list of
the hexadecimal color values used in scheme
.
color_scheme_get()
returns a list of the hexadecimal color
values (without changing the current scheme). If the scheme
argument
is not specified the returned values correspond to the current color
scheme. If the optional argument i
is specified then the returned
list only contains length(i)
elements.
color_scheme_view()
returns a ggplot object if only a single scheme is
specified and a gtable object if multiple schemes names are specified.
Currently, the available preset color schemes are:
"blue"
, "brightblue"
"gray"
, "darkgray"
"green"
"pink"
"purple"
"red"
"teal"
"yellow"
"viridis"
, "viridisA"
,
"viridisB"
, "viridisC"
, "viridisD"
, "viridisE"
"mix-x-y"
, replacing x
and y
with any two of
the scheme names listed above (e.g. "mix-teal-pink", "mix-blue-red",
etc.). The order of x
and y
matters, i.e., the color schemes
"mix-blue-red"
and "mix-red-blue"
are not identical. There is no
guarantee that every possible mixed scheme will look good with every
possible plot.
"brewer-x"
, replacing x
with the name of a palette available from
RColorBrewer::brewer.pal()
(e.g., brewer-PuBuGn
).
If you have a suggestion for a new color scheme please let us know via the bayesplot issue tracker.
A bayesplot color scheme consists of six colors. To specify a custom color scheme simply pass a character vector containing either the names of six colors or six hexadecimal color values (or a mix of names and hex values). The colors should be in order from lightest to darkest. See the end of the Examples section for a demonstration.
theme_default()
for the default ggplot theme used by
bayesplot and bayesplot_theme_set()
to change it.
color_scheme_set("blue") color_scheme_view() color_scheme_get() color_scheme_get(i = c(3, 5)) # 3rd and 5th colors only color_scheme_get("brightblue") color_scheme_view("brightblue") # compare multiple schemes color_scheme_view(c("pink", "gray", "teal")) color_scheme_view(c("viridis", "viridisA", "viridisB", "viridisC")) color_scheme_set("pink") x <- example_mcmc_draws() mcmc_intervals(x) color_scheme_set("teal") color_scheme_view() mcmc_intervals(x) color_scheme_set("red") mcmc_areas(x, regex_pars = "beta") color_scheme_set("purple") color_scheme_view() y <- example_y_data() yrep <- example_yrep_draws() ppc_stat(y, yrep, stat = "mean") + legend_none() ############################ ### Mixing color schemes ### ############################ color_scheme_set("mix-teal-pink") ppc_stat(y, yrep, stat = "sd") + legend_none() mcmc_areas(x, regex_pars = "beta") ########################## ### ColorBrewer scheme ### ########################## color_scheme_set("brewer-Spectral") color_scheme_view() mcmc_trace(x, pars = "sigma") ########################### ### Custom color scheme ### ########################### orange_scheme <- c("#ffebcc", "#ffcc80", "#ffad33", "#e68a00", "#995c00", "#663d00") color_scheme_set(orange_scheme) color_scheme_view() mcmc_areas(x, regex_pars = "alpha") mcmc_dens_overlay(x) ppc_stat(y, yrep, stat = "var") + legend_none()
color_scheme_set("blue") color_scheme_view() color_scheme_get() color_scheme_get(i = c(3, 5)) # 3rd and 5th colors only color_scheme_get("brightblue") color_scheme_view("brightblue") # compare multiple schemes color_scheme_view(c("pink", "gray", "teal")) color_scheme_view(c("viridis", "viridisA", "viridisB", "viridisC")) color_scheme_set("pink") x <- example_mcmc_draws() mcmc_intervals(x) color_scheme_set("teal") color_scheme_view() mcmc_intervals(x) color_scheme_set("red") mcmc_areas(x, regex_pars = "beta") color_scheme_set("purple") color_scheme_view() y <- example_y_data() yrep <- example_yrep_draws() ppc_stat(y, yrep, stat = "mean") + legend_none() ############################ ### Mixing color schemes ### ############################ color_scheme_set("mix-teal-pink") ppc_stat(y, yrep, stat = "sd") + legend_none() mcmc_areas(x, regex_pars = "beta") ########################## ### ColorBrewer scheme ### ########################## color_scheme_set("brewer-Spectral") color_scheme_view() mcmc_trace(x, pars = "sigma") ########################### ### Custom color scheme ### ########################### orange_scheme <- c("#ffebcc", "#ffcc80", "#ffad33", "#e68a00", "#995c00", "#663d00") color_scheme_set(orange_scheme) color_scheme_view() mcmc_areas(x, regex_pars = "alpha") mcmc_dens_overlay(x) ppc_stat(y, yrep, stat = "var") + legend_none()
Generics and methods for extracting quantities needed for plotting from various types of model objects. Currently methods are provided for stanfit (rstan), CmdStanMCMC (cmdstanr), and stanreg (rstanarm) objects, but adding new methods should be relatively straightforward.
log_posterior(object, ...) nuts_params(object, ...) rhat(object, ...) neff_ratio(object, ...) ## S3 method for class 'stanfit' log_posterior(object, inc_warmup = FALSE, ...) ## S3 method for class 'stanreg' log_posterior(object, inc_warmup = FALSE, ...) ## S3 method for class 'CmdStanMCMC' log_posterior(object, inc_warmup = FALSE, ...) ## S3 method for class 'stanfit' nuts_params(object, pars = NULL, inc_warmup = FALSE, ...) ## S3 method for class 'stanreg' nuts_params(object, pars = NULL, inc_warmup = FALSE, ...) ## S3 method for class 'list' nuts_params(object, pars = NULL, ...) ## S3 method for class 'CmdStanMCMC' nuts_params(object, pars = NULL, ...) ## S3 method for class 'stanfit' rhat(object, pars = NULL, ...) ## S3 method for class 'stanreg' rhat(object, pars = NULL, regex_pars = NULL, ...) ## S3 method for class 'CmdStanMCMC' rhat(object, pars = NULL, ...) ## S3 method for class 'stanfit' neff_ratio(object, pars = NULL, ...) ## S3 method for class 'stanreg' neff_ratio(object, pars = NULL, regex_pars = NULL, ...) ## S3 method for class 'CmdStanMCMC' neff_ratio(object, pars = NULL, ...)
log_posterior(object, ...) nuts_params(object, ...) rhat(object, ...) neff_ratio(object, ...) ## S3 method for class 'stanfit' log_posterior(object, inc_warmup = FALSE, ...) ## S3 method for class 'stanreg' log_posterior(object, inc_warmup = FALSE, ...) ## S3 method for class 'CmdStanMCMC' log_posterior(object, inc_warmup = FALSE, ...) ## S3 method for class 'stanfit' nuts_params(object, pars = NULL, inc_warmup = FALSE, ...) ## S3 method for class 'stanreg' nuts_params(object, pars = NULL, inc_warmup = FALSE, ...) ## S3 method for class 'list' nuts_params(object, pars = NULL, ...) ## S3 method for class 'CmdStanMCMC' nuts_params(object, pars = NULL, ...) ## S3 method for class 'stanfit' rhat(object, pars = NULL, ...) ## S3 method for class 'stanreg' rhat(object, pars = NULL, regex_pars = NULL, ...) ## S3 method for class 'CmdStanMCMC' rhat(object, pars = NULL, ...) ## S3 method for class 'stanfit' neff_ratio(object, pars = NULL, ...) ## S3 method for class 'stanreg' neff_ratio(object, pars = NULL, regex_pars = NULL, ...) ## S3 method for class 'CmdStanMCMC' neff_ratio(object, pars = NULL, ...)
object |
The object to use. |
... |
Arguments passed to individual methods. |
inc_warmup |
A logical scalar (defaulting to |
pars |
An optional character vector of parameter names. For
|
regex_pars |
An optional regular expression to use for
parameter selection. Can be specified instead of |
log_posterior()
log_posterior()
methods return a molten data frame (see reshape2::melt()
).
The data frame should have columns "Iteration"
(integer), "Chain"
(integer), and "Value"
(numeric). See Examples, below.
nuts_params()
nuts_params()
methods return a molten data frame (see reshape2::melt()
).
The data frame should have columns "Parameter"
(factor), "Iteration"
(integer), "Chain"
(integer), and "Value"
(numeric). See Examples, below.
rhat()
, neff_ratio()
Methods return (named) vectors.
## Not run: library(rstanarm) fit <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0) np <- nuts_params(fit) head(np) tail(np) lp <- log_posterior(fit) head(lp) tail(lp) ## End(Not run)
## Not run: library(rstanarm) fit <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0) np <- nuts_params(fit) head(np) tail(np) lp <- log_posterior(fit) head(lp) tail(lp) ## End(Not run)
Convenience functions for adding to (and changing details of) ggplot objects (many of the objects returned by bayesplot functions). See the Examples section, below.
vline_at(v, fun, ..., na.rm = TRUE) hline_at(v, fun, ..., na.rm = TRUE) vline_0(..., na.rm = TRUE) hline_0(..., na.rm = TRUE) abline_01(..., na.rm = TRUE) lbub(p, med = TRUE) legend_move(position = "right") legend_none() legend_text(...) xaxis_title(on = TRUE, ...) xaxis_text(on = TRUE, ...) xaxis_ticks(on = TRUE, ...) yaxis_title(on = TRUE, ...) yaxis_text(on = TRUE, ...) yaxis_ticks(on = TRUE, ...) facet_text(on = TRUE, ...) facet_bg(on = TRUE, ...) panel_bg(on = TRUE, ...) plot_bg(on = TRUE, ...) grid_lines(color = "gray50", size = 0.2) overlay_function(...)
vline_at(v, fun, ..., na.rm = TRUE) hline_at(v, fun, ..., na.rm = TRUE) vline_0(..., na.rm = TRUE) hline_0(..., na.rm = TRUE) abline_01(..., na.rm = TRUE) lbub(p, med = TRUE) legend_move(position = "right") legend_none() legend_text(...) xaxis_title(on = TRUE, ...) xaxis_text(on = TRUE, ...) xaxis_ticks(on = TRUE, ...) yaxis_title(on = TRUE, ...) yaxis_text(on = TRUE, ...) yaxis_ticks(on = TRUE, ...) facet_text(on = TRUE, ...) facet_bg(on = TRUE, ...) panel_bg(on = TRUE, ...) plot_bg(on = TRUE, ...) grid_lines(color = "gray50", size = 0.2) overlay_function(...)
v |
Either a numeric vector specifying the value(s) at which to
draw the vertical or horizontal line(s), or an object of any type to use as
the first argument to |
fun |
A function, or the name of a function, that returns a numeric vector. |
... |
For the various For functions ending in For functions ending in For For |
na.rm |
A logical scalar passed to the appropriate geom (e.g.
|
p |
The probability mass (in |
med |
Should the median also be included in addition to the lower and upper bounds of the interval? |
position |
The position of the legend. Either a numeric vector (of
length 2) giving the relative coordinates (between 0 and 1) for the legend,
or a string among |
on |
For functions modifying ggplot theme elements,
set |
color , size
|
Passed to |
vline_at()
and hline_at()
return an object created by either
ggplot2::geom_vline()
or ggplot2::geom_hline()
that can be added to a
ggplot object to draw a vertical or horizontal line (at one or several
values). If fun
is missing then the lines are drawn at the values in v
.
If fun
is specified then the lines are drawn at the values returned by fun(v)
.
vline_0()
and hline_0()
are wrappers for vline_at()
and hline_at()
with v = 0
and fun
missing.
abline_01()
is a wrapper for ggplot2::geom_abline()
with the intercept
set to 0
and the slope set to 1
.
lbub()
returns a function that takes a single argument x
and returns
the lower and upper bounds (lb
, ub
) of the 100*p
\
of x
, as well as the median (if med=TRUE
).
facet_text()
returns ggplot2 theme objects that can be added to an
existing plot (ggplot object) to format the text in facet strips.
facet_bg()
can be added to a plot to change the background of the facet strips.
legend_move()
and legend_none()
return a ggplot2 theme object that can
be added to an existing plot (ggplot object) in order to change the
position of the legend or remove it.
legend_text()
works much like facet_text()
but for the legend.
-axis and
-axis featuresxaxis_title()
and yaxis_title()
return a ggplot2 theme object
that can be added to an existing plot (ggplot object) in order to toggle or
format the titles displayed on the x
or y
axis. (To change
the titles themselves use ggplot2::labs()
.)
xaxis_text()
and yaxis_text()
return a ggplot2 theme object
that can be added to an existing plot (ggplot object) in order to toggle or
format the text displayed on the x
or y
axis (e.g. tick
labels).
xaxis_ticks()
and yaxis_ticks()
return a ggplot2 theme object
that can be added to an existing plot (ggplot object) to change the
appearance of the axis tick marks.
plot_bg()
returns a ggplot2 theme object that can be added to an
existing plot (ggplot object) to format the background of the entire plot.
panel_bg()
returns a ggplot2 theme object that can be added to an
existing plot (ggplot object) to format the background of the just the
plotting area.
grid_lines()
returns a ggplot2 theme object that can be added to
an existing plot (ggplot object) to add grid lines to the plot background.
overlay_function()
is a simple wrapper for ggplot2::stat_function()
but
with the inherit.aes
argument fixed to FALSE
. Fixing inherit.aes=FALSE
will avoid potential errors due to the ggplot2::aes()
thetic mapping used by
certain bayesplot plotting functions.
A ggplot2 layer or ggplot2::theme()
object that can be
added to existing ggplot objects, like those created by many of the
bayesplot plotting functions. See the Details section.
theme_default()
for the default ggplot theme used by
bayesplot.
color_scheme_set("gray") x <- example_mcmc_draws(chains = 1) dim(x) colnames(x) ################################### ### vertical & horizontal lines ### ################################### (p <- mcmc_intervals(x, regex_pars = "beta")) # vertical line at zero (with some optional styling) p + vline_0() p + vline_0(linewidth = 0.25, color = "darkgray", linetype = 2) # vertical line(s) at specified values v <- c(-0.5, 0, 0.5) p + vline_at(v, linetype = 3, linewidth = 0.25) my_lines <- vline_at(v, alpha = 0.25, linewidth = 0.75 * c(1, 2, 1), color = c("maroon", "skyblue", "violet")) p + my_lines # add vertical line(s) at computed values # (three ways of getting lines at column means) color_scheme_set("brightblue") p <- mcmc_intervals(x, regex_pars = "beta") p + vline_at(x[, 3:4], colMeans) p + vline_at(x[, 3:4], "colMeans", color = "darkgray", lty = 2, linewidth = 0.25) p + vline_at(x[, 3:4], function(a) apply(a, 2, mean), color = "orange", linewidth = 2, alpha = 0.1) # using the lbub function to get interval lower and upper bounds (lb, ub) color_scheme_set("pink") parsed <- ggplot2::label_parsed p2 <- mcmc_hist(x, pars = "beta[1]", binwidth = 1/20, facet_args = list(labeller = parsed)) (p2 <- p2 + facet_text(size = 16)) b1 <- x[, "beta[1]"] p2 + vline_at(b1, fun = lbub(0.8), color = "gray20", linewidth = 2 * c(1,.5,1), alpha = 0.75) p2 + vline_at(b1, lbub(0.8, med = FALSE), color = "gray20", linewidth = 2, alpha = 0.75) ########################## ### format axis titles ### ########################## color_scheme_set("green") y <- example_y_data() yrep <- example_yrep_draws() (p3 <- ppc_stat(y, yrep, stat = "median", binwidth = 1/4)) # turn off the legend, turn on x-axis title p3 + legend_none() + xaxis_title(size = 13, family = "sans") + ggplot2::xlab(expression(italic(T(y)) == median(italic(y)))) ################################ ### format axis & facet text ### ################################ color_scheme_set("gray") p4 <- mcmc_trace(example_mcmc_draws(), pars = c("alpha", "sigma")) myfacets <- facet_bg(fill = "gray30", color = NA) + facet_text(face = "bold", color = "skyblue", size = 14) p4 + myfacets ########################## ### control tick marks ### ########################## p4 + myfacets + yaxis_text(FALSE) + yaxis_ticks(FALSE) + xaxis_ticks(linewidth = 1, color = "skyblue") ############################## ### change plot background ### ############################## color_scheme_set("blue") # add grid lines ppc_stat(y, yrep) + grid_lines() # panel_bg vs plot_bg ppc_scatter_avg(y, yrep) + panel_bg(fill = "gray90") ppc_scatter_avg(y, yrep) + plot_bg(fill = "gray90") color_scheme_set("yellow") p5 <- ppc_scatter_avg(y, yrep, alpha = 1) p5 + panel_bg(fill = "gray20") + grid_lines(color = "white") color_scheme_set("purple") ppc_dens_overlay(y, yrep[1:30, ]) + legend_text(size = 14) + legend_move(c(0.75, 0.5)) + plot_bg(fill = "gray90") + panel_bg(color = "black", fill = "gray99", linewidth = 3) ############################################### ### superimpose a function on existing plot ### ############################################### # compare posterior of beta[1] to Gaussian with same posterior mean # and sd as beta[1] x <- example_mcmc_draws(chains = 4) dim(x) purple_gaussian <- overlay_function( fun = dnorm, args = list(mean(x[,, "beta[1]"]), sd(x[,, "beta[1]"])), color = "purple", linewidth = 2 ) color_scheme_set("gray") mcmc_hist(x, pars = "beta[1]", freq = FALSE) + purple_gaussian mcmc_dens(x, pars = "beta[1]") + purple_gaussian
color_scheme_set("gray") x <- example_mcmc_draws(chains = 1) dim(x) colnames(x) ################################### ### vertical & horizontal lines ### ################################### (p <- mcmc_intervals(x, regex_pars = "beta")) # vertical line at zero (with some optional styling) p + vline_0() p + vline_0(linewidth = 0.25, color = "darkgray", linetype = 2) # vertical line(s) at specified values v <- c(-0.5, 0, 0.5) p + vline_at(v, linetype = 3, linewidth = 0.25) my_lines <- vline_at(v, alpha = 0.25, linewidth = 0.75 * c(1, 2, 1), color = c("maroon", "skyblue", "violet")) p + my_lines # add vertical line(s) at computed values # (three ways of getting lines at column means) color_scheme_set("brightblue") p <- mcmc_intervals(x, regex_pars = "beta") p + vline_at(x[, 3:4], colMeans) p + vline_at(x[, 3:4], "colMeans", color = "darkgray", lty = 2, linewidth = 0.25) p + vline_at(x[, 3:4], function(a) apply(a, 2, mean), color = "orange", linewidth = 2, alpha = 0.1) # using the lbub function to get interval lower and upper bounds (lb, ub) color_scheme_set("pink") parsed <- ggplot2::label_parsed p2 <- mcmc_hist(x, pars = "beta[1]", binwidth = 1/20, facet_args = list(labeller = parsed)) (p2 <- p2 + facet_text(size = 16)) b1 <- x[, "beta[1]"] p2 + vline_at(b1, fun = lbub(0.8), color = "gray20", linewidth = 2 * c(1,.5,1), alpha = 0.75) p2 + vline_at(b1, lbub(0.8, med = FALSE), color = "gray20", linewidth = 2, alpha = 0.75) ########################## ### format axis titles ### ########################## color_scheme_set("green") y <- example_y_data() yrep <- example_yrep_draws() (p3 <- ppc_stat(y, yrep, stat = "median", binwidth = 1/4)) # turn off the legend, turn on x-axis title p3 + legend_none() + xaxis_title(size = 13, family = "sans") + ggplot2::xlab(expression(italic(T(y)) == median(italic(y)))) ################################ ### format axis & facet text ### ################################ color_scheme_set("gray") p4 <- mcmc_trace(example_mcmc_draws(), pars = c("alpha", "sigma")) myfacets <- facet_bg(fill = "gray30", color = NA) + facet_text(face = "bold", color = "skyblue", size = 14) p4 + myfacets ########################## ### control tick marks ### ########################## p4 + myfacets + yaxis_text(FALSE) + yaxis_ticks(FALSE) + xaxis_ticks(linewidth = 1, color = "skyblue") ############################## ### change plot background ### ############################## color_scheme_set("blue") # add grid lines ppc_stat(y, yrep) + grid_lines() # panel_bg vs plot_bg ppc_scatter_avg(y, yrep) + panel_bg(fill = "gray90") ppc_scatter_avg(y, yrep) + plot_bg(fill = "gray90") color_scheme_set("yellow") p5 <- ppc_scatter_avg(y, yrep, alpha = 1) p5 + panel_bg(fill = "gray20") + grid_lines(color = "white") color_scheme_set("purple") ppc_dens_overlay(y, yrep[1:30, ]) + legend_text(size = 14) + legend_move(c(0.75, 0.5)) + plot_bg(fill = "gray90") + panel_bg(color = "black", fill = "gray99", linewidth = 3) ############################################### ### superimpose a function on existing plot ### ############################################### # compare posterior of beta[1] to Gaussian with same posterior mean # and sd as beta[1] x <- example_mcmc_draws(chains = 4) dim(x) purple_gaussian <- overlay_function( fun = dnorm, args = list(mean(x[,, "beta[1]"]), sd(x[,, "beta[1]"])), color = "purple", linewidth = 2 ) color_scheme_set("gray") mcmc_hist(x, pars = "beta[1]", freq = FALSE) + purple_gaussian mcmc_dens(x, pars = "beta[1]") + purple_gaussian
Combination plots
mcmc_combo(x, combo = c("dens", "trace"), ..., widths = NULL, gg_theme = NULL)
mcmc_combo(x, combo = c("dens", "trace"), ..., widths = NULL, gg_theme = NULL)
x |
An object containing MCMC draws:
|
combo |
A character vector with at least two elements. Each element of
|
... |
Arguments passed to the plotting functions named in |
widths |
A numeric vector the same length as |
gg_theme |
Unlike most of the other bayesplot functions,
|
A gtable object (the result of calling
gridExtra::arrangeGrob()
) with length(combo)
columns and
a row for each parameter.
Other MCMC:
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-scatterplots
,
MCMC-traces
# some parameter draws to use for demonstration x <- example_mcmc_draws() dim(x) dimnames(x) mcmc_combo(x, pars = c("alpha", "sigma")) mcmc_combo(x, pars = c("alpha", "sigma"), widths = c(1, 2)) # change second plot, show log(sigma) instead of sigma, # and remove the legends color_scheme_set("mix-blue-red") mcmc_combo( x, combo = c("dens_overlay", "trace"), pars = c("alpha", "sigma"), transformations = list(sigma = "log"), gg_theme = legend_none() ) # same thing but this time also change the entire ggplot theme mcmc_combo( x, combo = c("dens_overlay", "trace"), pars = c("alpha", "sigma"), transformations = list(sigma = "log"), gg_theme = ggplot2::theme_gray() + legend_none() )
# some parameter draws to use for demonstration x <- example_mcmc_draws() dim(x) dimnames(x) mcmc_combo(x, pars = c("alpha", "sigma")) mcmc_combo(x, pars = c("alpha", "sigma"), widths = c(1, 2)) # change second plot, show log(sigma) instead of sigma, # and remove the legends color_scheme_set("mix-blue-red") mcmc_combo( x, combo = c("dens_overlay", "trace"), pars = c("alpha", "sigma"), transformations = list(sigma = "log"), gg_theme = legend_none() ) # same thing but this time also change the entire ggplot theme mcmc_combo( x, combo = c("dens_overlay", "trace"), pars = c("alpha", "sigma"), transformations = list(sigma = "log"), gg_theme = ggplot2::theme_gray() + legend_none() )
Plots of Rhat statistics, ratios of effective sample size to total sample size, and autocorrelation of MCMC draws. See the Plot Descriptions section, below, for details. For models fit using the No-U-Turn-Sampler, see also MCMC-nuts for additional MCMC diagnostic plots.
mcmc_rhat(rhat, ..., size = NULL) mcmc_rhat_hist(rhat, ..., binwidth = NULL, bins = NULL, breaks = NULL) mcmc_rhat_data(rhat, ...) mcmc_neff(ratio, ..., size = NULL) mcmc_neff_hist(ratio, ..., binwidth = NULL, bins = NULL, breaks = NULL) mcmc_neff_data(ratio, ...) mcmc_acf( x, pars = character(), regex_pars = character(), ..., facet_args = list(), lags = 20, size = NULL ) mcmc_acf_bar( x, pars = character(), regex_pars = character(), ..., facet_args = list(), lags = 20 )
mcmc_rhat(rhat, ..., size = NULL) mcmc_rhat_hist(rhat, ..., binwidth = NULL, bins = NULL, breaks = NULL) mcmc_rhat_data(rhat, ...) mcmc_neff(ratio, ..., size = NULL) mcmc_neff_hist(ratio, ..., binwidth = NULL, bins = NULL, breaks = NULL) mcmc_neff_data(ratio, ...) mcmc_acf( x, pars = character(), regex_pars = character(), ..., facet_args = list(), lags = 20, size = NULL ) mcmc_acf_bar( x, pars = character(), regex_pars = character(), ..., facet_args = list(), lags = 20 )
rhat |
A vector of R-hat estimates. |
... |
Currently ignored. |
size |
Optional values to override |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
ratio |
A vector of ratios of effective sample size estimates to
total sample size. See |
x |
An object containing MCMC draws:
|
pars |
An optional character vector of parameter names. If neither
|
regex_pars |
An optional regular expression to use for
parameter selection. Can be specified instead of |
facet_args |
A named list of arguments (other than |
lags |
The number of lags to show in the autocorrelation plot. |
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
mcmc_rhat()
, mcmc_rhat_hist()
Rhat values as either points or a histogram. Values are colored using different shades (lighter is better). The chosen thresholds are somewhat arbitrary, but can be useful guidelines in practice.
light: below 1.05 (good)
mid: between 1.05 and 1.1 (ok)
dark: above 1.1 (too high)
mcmc_neff()
, mcmc_neff_hist()
Ratios of effective sample size to total sample size as either points or a histogram. Values are colored using different shades (lighter is better). The chosen thresholds are somewhat arbitrary, but can be useful guidelines in practice.
light: between 0.5 and 1 (high)
mid: between 0.1 and 0.5 (good)
dark: below 0.1 (low)
mcmc_acf()
, mcmc_acf_bar()
Grid of autocorrelation plots by chain and parameter. The lags
argument
gives the maximum number of lags at which to calculate the autocorrelation
function. mcmc_acf()
is a line plot whereas mcmc_acf_bar()
is a
barplot.
Stan Development Team. Stan Modeling Language Users Guide and Reference Manual. https://mc-stan.org/users/documentation/
Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science. 7(4), 457–472.
The Visual MCMC Diagnostics vignette.
MCMC-nuts for additional MCMC diagnostic plots for models fit using the No-U-Turn-Sampler.
Other MCMC:
MCMC-combos
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-scatterplots
,
MCMC-traces
# autocorrelation x <- example_mcmc_draws() dim(x) dimnames(x) color_scheme_set("green") mcmc_acf(x, pars = c("alpha", "beta[1]")) color_scheme_set("pink") (p <- mcmc_acf_bar(x, pars = c("alpha", "beta[1]"))) # add horiztonal dashed line at 0.5 p + hline_at(0.5, linetype = 2, size = 0.15, color = "gray") # fake rhat values to use for demonstration rhat <- c(runif(100, 1, 1.15)) mcmc_rhat_hist(rhat) mcmc_rhat(rhat) # lollipops color_scheme_set("purple") mcmc_rhat(rhat[1:10], size = 5) color_scheme_set("blue") mcmc_rhat(runif(1000, 1, 1.07)) mcmc_rhat(runif(1000, 1, 1.3)) + legend_move("top") # add legend above plot # fake neff ratio values to use for demonstration ratio <- c(runif(100, 0, 1)) mcmc_neff_hist(ratio) mcmc_neff(ratio) ## Not run: # Example using rstanarm model (requires rstanarm package) library(rstanarm) # intentionally use small 'iter' so there are some # problems with rhat and neff for demonstration fit <- stan_glm(mpg ~ ., data = mtcars, iter = 50, refresh = 0) rhats <- rhat(fit) ratios <- neff_ratio(fit) mcmc_rhat(rhats) mcmc_neff(ratios, size = 3) # there's a small enough number of parameters in the # model that we can display their names on the y-axis mcmc_neff(ratios) + yaxis_text(hjust = 1) # can also look at autocorrelation draws <- as.array(fit) mcmc_acf(draws, pars = c("wt", "cyl"), lags = 10) # increase number of iterations and plots look much better fit2 <- update(fit, iter = 500) mcmc_rhat(rhat(fit2)) mcmc_neff(neff_ratio(fit2)) mcmc_acf(as.array(fit2), pars = c("wt", "cyl"), lags = 10) ## End(Not run)
# autocorrelation x <- example_mcmc_draws() dim(x) dimnames(x) color_scheme_set("green") mcmc_acf(x, pars = c("alpha", "beta[1]")) color_scheme_set("pink") (p <- mcmc_acf_bar(x, pars = c("alpha", "beta[1]"))) # add horiztonal dashed line at 0.5 p + hline_at(0.5, linetype = 2, size = 0.15, color = "gray") # fake rhat values to use for demonstration rhat <- c(runif(100, 1, 1.15)) mcmc_rhat_hist(rhat) mcmc_rhat(rhat) # lollipops color_scheme_set("purple") mcmc_rhat(rhat[1:10], size = 5) color_scheme_set("blue") mcmc_rhat(runif(1000, 1, 1.07)) mcmc_rhat(runif(1000, 1, 1.3)) + legend_move("top") # add legend above plot # fake neff ratio values to use for demonstration ratio <- c(runif(100, 0, 1)) mcmc_neff_hist(ratio) mcmc_neff(ratio) ## Not run: # Example using rstanarm model (requires rstanarm package) library(rstanarm) # intentionally use small 'iter' so there are some # problems with rhat and neff for demonstration fit <- stan_glm(mpg ~ ., data = mtcars, iter = 50, refresh = 0) rhats <- rhat(fit) ratios <- neff_ratio(fit) mcmc_rhat(rhats) mcmc_neff(ratios, size = 3) # there's a small enough number of parameters in the # model that we can display their names on the y-axis mcmc_neff(ratios) + yaxis_text(hjust = 1) # can also look at autocorrelation draws <- as.array(fit) mcmc_acf(draws, pars = c("wt", "cyl"), lags = 10) # increase number of iterations and plots look much better fit2 <- update(fit, iter = 500) mcmc_rhat(rhat(fit2)) mcmc_neff(neff_ratio(fit2)) mcmc_acf(as.array(fit2), pars = c("wt", "cyl"), lags = 10) ## End(Not run)
Various types of histograms and kernel density plots of MCMC draws. See the Plot Descriptions section, below, for details.
mcmc_hist( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE, alpha = 1 ) mcmc_dens( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), trim = FALSE, bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL, alpha = 1 ) mcmc_hist_by_chain( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), binwidth = NULL, bins = NULL, freq = TRUE, alpha = 1 ) mcmc_dens_overlay( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), color_chains = TRUE, trim = FALSE, bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL ) mcmc_dens_chains( x, pars = character(), regex_pars = character(), transformations = list(), ..., color_chains = TRUE, bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL ) mcmc_dens_chains_data( x, pars = character(), regex_pars = character(), transformations = list(), ..., bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL ) mcmc_violin( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), probs = c(0.1, 0.5, 0.9) )
mcmc_hist( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE, alpha = 1 ) mcmc_dens( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), trim = FALSE, bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL, alpha = 1 ) mcmc_hist_by_chain( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), binwidth = NULL, bins = NULL, freq = TRUE, alpha = 1 ) mcmc_dens_overlay( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), color_chains = TRUE, trim = FALSE, bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL ) mcmc_dens_chains( x, pars = character(), regex_pars = character(), transformations = list(), ..., color_chains = TRUE, bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL ) mcmc_dens_chains_data( x, pars = character(), regex_pars = character(), transformations = list(), ..., bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL ) mcmc_violin( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), probs = c(0.1, 0.5, 0.9) )
x |
An object containing MCMC draws:
|
pars |
An optional character vector of parameter names. If neither
|
regex_pars |
An optional regular expression to use for
parameter selection. Can be specified instead of |
transformations |
Optionally, transformations to apply to parameters
before plotting. If Note: due to partial argument matching |
... |
Currently ignored. |
facet_args |
A named list of arguments (other than |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
freq |
For histograms, |
alpha |
Passed to the geom to control the transparency. |
trim |
A logical scalar passed to |
bw , adjust , kernel , n_dens
|
Optional arguments passed to
|
color_chains |
Option for whether to separately color chains. |
probs |
A numeric vector passed to |
A ggplot object that can be further customized using the ggplot2 package.
mcmc_hist()
Histograms of posterior draws with all chains merged.
mcmc_dens()
Kernel density plots of posterior draws with all chains merged.
mcmc_hist_by_chain()
Histograms of posterior draws with chains separated via faceting.
mcmc_dens_overlay()
Kernel density plots of posterior draws with chains separated but overlaid on a single plot.
mcmc_violin()
The density estimate of each chain is plotted as a violin with horizontal lines at notable quantiles.
mcmc_dens_chains()
Ridgeline kernel density plots of posterior draws with chains separated
but overlaid on a single plot. In mcmc_dens_overlay()
parameters
appear in separate facets; in mcmc_dens_chains()
they appear in the
same panel and can overlap vertically.
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-scatterplots
,
MCMC-traces
set.seed(9262017) # some parameter draws to use for demonstration x <- example_mcmc_draws() dim(x) dimnames(x) ################## ### Histograms ### ################## # histograms of all parameters color_scheme_set("brightblue") mcmc_hist(x) # histograms of some parameters color_scheme_set("pink") mcmc_hist(x, pars = c("alpha", "beta[2]")) mcmc_hist(x, pars = "sigma", regex_pars = "beta") # example of using 'transformations' argument to plot log(sigma), # and parsing facet labels (e.g. to get greek letters for parameters) mcmc_hist(x, transformations = list(sigma = "log"), facet_args = list(labeller = ggplot2::label_parsed)) + facet_text(size = 15) # instead of list(sigma = "log"), you could specify the transformation as # list(sigma = log) or list(sigma = function(x) log(x)), but then the # label for the transformed sigma is 't(sigma)' instead of 'log(sigma)' mcmc_hist(x, transformations = list(sigma = log)) # separate histograms by chain color_scheme_set("pink") mcmc_hist_by_chain(x, regex_pars = "beta") ################# ### Densities ### ################# mcmc_dens(x, pars = c("sigma", "beta[2]"), facet_args = list(nrow = 2)) # separate and overlay chains color_scheme_set("mix-teal-pink") mcmc_dens_overlay(x, pars = c("sigma", "beta[2]"), facet_args = list(nrow = 2)) + facet_text(size = 14) x2 <- example_mcmc_draws(params = 6) mcmc_dens_chains(x2, pars = c("beta[1]", "beta[2]", "beta[3]")) # separate chains as violin plots color_scheme_set("green") mcmc_violin(x) + panel_bg(color = "gray20", size = 2, fill = "gray30")
set.seed(9262017) # some parameter draws to use for demonstration x <- example_mcmc_draws() dim(x) dimnames(x) ################## ### Histograms ### ################## # histograms of all parameters color_scheme_set("brightblue") mcmc_hist(x) # histograms of some parameters color_scheme_set("pink") mcmc_hist(x, pars = c("alpha", "beta[2]")) mcmc_hist(x, pars = "sigma", regex_pars = "beta") # example of using 'transformations' argument to plot log(sigma), # and parsing facet labels (e.g. to get greek letters for parameters) mcmc_hist(x, transformations = list(sigma = "log"), facet_args = list(labeller = ggplot2::label_parsed)) + facet_text(size = 15) # instead of list(sigma = "log"), you could specify the transformation as # list(sigma = log) or list(sigma = function(x) log(x)), but then the # label for the transformed sigma is 't(sigma)' instead of 'log(sigma)' mcmc_hist(x, transformations = list(sigma = log)) # separate histograms by chain color_scheme_set("pink") mcmc_hist_by_chain(x, regex_pars = "beta") ################# ### Densities ### ################# mcmc_dens(x, pars = c("sigma", "beta[2]"), facet_args = list(nrow = 2)) # separate and overlay chains color_scheme_set("mix-teal-pink") mcmc_dens_overlay(x, pars = c("sigma", "beta[2]"), facet_args = list(nrow = 2)) + facet_text(size = 14) x2 <- example_mcmc_draws(params = 6) mcmc_dens_chains(x2, pars = c("beta[1]", "beta[2]", "beta[3]")) # separate chains as violin plots color_scheme_set("green") mcmc_violin(x) + panel_bg(color = "gray20", size = 2, fill = "gray30")
Plot central (quantile-based) posterior interval estimates from MCMC draws. See the Plot Descriptions section, below, for details.
mcmc_intervals( x, pars = character(), regex_pars = character(), transformations = list(), ..., prob = 0.5, prob_outer = 0.9, point_est = c("median", "mean", "none"), outer_size = 0.5, inner_size = 2, point_size = 4, rhat = numeric() ) mcmc_areas( x, pars = character(), regex_pars = character(), transformations = list(), ..., area_method = c("equal area", "equal height", "scaled height"), prob = 0.5, prob_outer = 1, point_est = c("median", "mean", "none"), rhat = numeric(), border_size = NULL, bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL ) mcmc_areas_ridges( x, pars = character(), regex_pars = character(), transformations = list(), ..., prob_outer = 1, prob = 1, border_size = NULL, bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL ) mcmc_intervals_data( x, pars = character(), regex_pars = character(), transformations = list(), ..., prob = 0.5, prob_outer = 0.9, point_est = c("median", "mean", "none"), rhat = numeric() ) mcmc_areas_data( x, pars = character(), regex_pars = character(), transformations = list(), ..., prob = 0.5, prob_outer = 1, point_est = c("median", "mean", "none"), rhat = numeric(), bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL ) mcmc_areas_ridges_data( x, pars = character(), regex_pars = character(), transformations = list(), ..., prob_outer = 1, prob = 1, bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL )
mcmc_intervals( x, pars = character(), regex_pars = character(), transformations = list(), ..., prob = 0.5, prob_outer = 0.9, point_est = c("median", "mean", "none"), outer_size = 0.5, inner_size = 2, point_size = 4, rhat = numeric() ) mcmc_areas( x, pars = character(), regex_pars = character(), transformations = list(), ..., area_method = c("equal area", "equal height", "scaled height"), prob = 0.5, prob_outer = 1, point_est = c("median", "mean", "none"), rhat = numeric(), border_size = NULL, bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL ) mcmc_areas_ridges( x, pars = character(), regex_pars = character(), transformations = list(), ..., prob_outer = 1, prob = 1, border_size = NULL, bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL ) mcmc_intervals_data( x, pars = character(), regex_pars = character(), transformations = list(), ..., prob = 0.5, prob_outer = 0.9, point_est = c("median", "mean", "none"), rhat = numeric() ) mcmc_areas_data( x, pars = character(), regex_pars = character(), transformations = list(), ..., prob = 0.5, prob_outer = 1, point_est = c("median", "mean", "none"), rhat = numeric(), bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL ) mcmc_areas_ridges_data( x, pars = character(), regex_pars = character(), transformations = list(), ..., prob_outer = 1, prob = 1, bw = NULL, adjust = NULL, kernel = NULL, n_dens = NULL )
x |
An object containing MCMC draws:
|
pars |
An optional character vector of parameter names. If neither
|
regex_pars |
An optional regular expression to use for
parameter selection. Can be specified instead of |
transformations |
Optionally, transformations to apply to parameters
before plotting. If Note: due to partial argument matching |
... |
Currently unused. |
prob |
The probability mass to include in the inner interval (for
|
prob_outer |
The probability mass to include in the outer interval. The
default is |
point_est |
The point estimate to show. Either |
inner_size , outer_size
|
For |
point_size |
For |
rhat |
An optional numeric vector of R-hat estimates, with one element
per parameter included in |
area_method |
How to constrain the areas in |
border_size |
For |
bw , adjust , kernel , n_dens
|
Optional arguments passed to
|
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
mcmc_intervals()
Plots of uncertainty intervals computed from posterior draws with all chains merged.
mcmc_areas()
Density plots computed from posterior draws with all chains merged, with uncertainty intervals shown as shaded areas under the curves.
mcmc_areas_ridges()
Density plot, as in mcmc_areas()
, but drawn with overlapping
ridgelines. This plot provides a compact display of (hierarchically)
related distributions.
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-nuts
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-scatterplots
,
MCMC-traces
set.seed(9262017) # load ggplot2 to use its functions to modify our plots library(ggplot2) # some parameter draws to use for demonstration x <- example_mcmc_draws(params = 6) dim(x) dimnames(x) color_scheme_set("brightblue") mcmc_intervals(x) mcmc_intervals(x, pars = c("beta[1]", "beta[2]")) mcmc_areas(x, regex_pars = "beta\\[[1-3]\\]", prob = 0.8) + labs( title = "Posterior distributions", subtitle = "with medians and 80% intervals" ) color_scheme_set("red") p <- mcmc_areas( x, pars = c("alpha", "beta[4]"), prob = 2/3, prob_outer = 0.9, point_est = "mean", border_size = 1.5 # make the ridgelines fatter ) plot(p) # control spacing at top and bottom of plot # see ?ggplot2::expansion p + scale_y_discrete( limits = c("beta[4]", "alpha"), expand = expansion(add = c(1, 2)) ) p + scale_y_discrete( limits = c("beta[4]", "alpha"), expand = expansion(add = c(.1, .3)) ) # relabel parameters p + scale_y_discrete( labels = c("alpha" = "param label 1", "beta[4]" = "param label 2") ) # relabel parameters and define the order p + scale_y_discrete( labels = c("alpha" = "param label 1", "beta[4]" = "param label 2"), limits = c("beta[4]", "alpha") ) # color by rhat value color_scheme_set("blue") fake_rhat_values <- c(1, 1.07, 1.3, 1.01, 1.15, 1.005) mcmc_intervals(x, rhat = fake_rhat_values) # get the dataframe that is used in the plotting functions mcmc_intervals_data(x) mcmc_intervals_data(x, rhat = fake_rhat_values) mcmc_areas_data(x, pars = "alpha") color_scheme_set("gray") p <- mcmc_areas(x, pars = c("alpha", "beta[4]"), rhat = c(1, 1.1)) p + legend_move("bottom") p + legend_move("none") # or p + legend_none() # Different area calculations b3 <- c("beta[1]", "beta[2]", "beta[3]") mcmc_areas(x, pars = b3, area_method = "equal area") + labs( title = "Curves have same area", subtitle = "A wide, uncertain interval is spread thin when areas are equal" ) mcmc_areas(x, pars = b3, area_method = "equal height") + labs( title = "Curves have same maximum height", subtitle = "Local curvature is clearer but more uncertain curves use more area" ) mcmc_areas(x, pars = b3, area_method = "scaled height") + labs( title = "Same maximum heights but heights scaled by square-root", subtitle = "Compromise: Local curvature is accentuated and less area is used" ) # apply transformations mcmc_intervals( x, pars = c("beta[2]", "sigma"), transformations = list("sigma" = "log", "beta[2]" = function(x) x + 3) ) # apply same transformation to all selected parameters mcmc_intervals(x, regex_pars = "beta", transformations = "exp") ## Not run: # example using fitted model from rstanarm package library(rstanarm) fit <- stan_glm( mpg ~ 0 + wt + factor(cyl), data = mtcars, iter = 500, refresh = 0 ) x <- as.matrix(fit) color_scheme_set("teal") mcmc_intervals(x, point_est = "mean", prob = 0.8, prob_outer = 0.95) mcmc_areas(x, regex_pars = "cyl", bw = "SJ", rhat = rhat(fit, regex_pars = "cyl")) ## End(Not run) ## Not run: # Example of hierarchically related parameters # plotted with ridgelines m <- shinystan::eight_schools@posterior_sample mcmc_areas_ridges(m, pars = "mu", regex_pars = "theta", border_size = 0.75) + ggtitle("Treatment effect on eight schools (Rubin, 1981)") ## End(Not run)
set.seed(9262017) # load ggplot2 to use its functions to modify our plots library(ggplot2) # some parameter draws to use for demonstration x <- example_mcmc_draws(params = 6) dim(x) dimnames(x) color_scheme_set("brightblue") mcmc_intervals(x) mcmc_intervals(x, pars = c("beta[1]", "beta[2]")) mcmc_areas(x, regex_pars = "beta\\[[1-3]\\]", prob = 0.8) + labs( title = "Posterior distributions", subtitle = "with medians and 80% intervals" ) color_scheme_set("red") p <- mcmc_areas( x, pars = c("alpha", "beta[4]"), prob = 2/3, prob_outer = 0.9, point_est = "mean", border_size = 1.5 # make the ridgelines fatter ) plot(p) # control spacing at top and bottom of plot # see ?ggplot2::expansion p + scale_y_discrete( limits = c("beta[4]", "alpha"), expand = expansion(add = c(1, 2)) ) p + scale_y_discrete( limits = c("beta[4]", "alpha"), expand = expansion(add = c(.1, .3)) ) # relabel parameters p + scale_y_discrete( labels = c("alpha" = "param label 1", "beta[4]" = "param label 2") ) # relabel parameters and define the order p + scale_y_discrete( labels = c("alpha" = "param label 1", "beta[4]" = "param label 2"), limits = c("beta[4]", "alpha") ) # color by rhat value color_scheme_set("blue") fake_rhat_values <- c(1, 1.07, 1.3, 1.01, 1.15, 1.005) mcmc_intervals(x, rhat = fake_rhat_values) # get the dataframe that is used in the plotting functions mcmc_intervals_data(x) mcmc_intervals_data(x, rhat = fake_rhat_values) mcmc_areas_data(x, pars = "alpha") color_scheme_set("gray") p <- mcmc_areas(x, pars = c("alpha", "beta[4]"), rhat = c(1, 1.1)) p + legend_move("bottom") p + legend_move("none") # or p + legend_none() # Different area calculations b3 <- c("beta[1]", "beta[2]", "beta[3]") mcmc_areas(x, pars = b3, area_method = "equal area") + labs( title = "Curves have same area", subtitle = "A wide, uncertain interval is spread thin when areas are equal" ) mcmc_areas(x, pars = b3, area_method = "equal height") + labs( title = "Curves have same maximum height", subtitle = "Local curvature is clearer but more uncertain curves use more area" ) mcmc_areas(x, pars = b3, area_method = "scaled height") + labs( title = "Same maximum heights but heights scaled by square-root", subtitle = "Compromise: Local curvature is accentuated and less area is used" ) # apply transformations mcmc_intervals( x, pars = c("beta[2]", "sigma"), transformations = list("sigma" = "log", "beta[2]" = function(x) x + 3) ) # apply same transformation to all selected parameters mcmc_intervals(x, regex_pars = "beta", transformations = "exp") ## Not run: # example using fitted model from rstanarm package library(rstanarm) fit <- stan_glm( mpg ~ 0 + wt + factor(cyl), data = mtcars, iter = 500, refresh = 0 ) x <- as.matrix(fit) color_scheme_set("teal") mcmc_intervals(x, point_est = "mean", prob = 0.8, prob_outer = 0.95) mcmc_areas(x, regex_pars = "cyl", bw = "SJ", rhat = rhat(fit, regex_pars = "cyl")) ## End(Not run) ## Not run: # Example of hierarchically related parameters # plotted with ridgelines m <- shinystan::eight_schools@posterior_sample mcmc_areas_ridges(m, pars = "mu", regex_pars = "theta", border_size = 0.75) + ggtitle("Treatment effect on eight schools (Rubin, 1981)") ## End(Not run)
Diagnostic plots for the No-U-Turn-Sampler (NUTS), the default MCMC algorithm used by Stan. See the Plot Descriptions section, below.
mcmc_nuts_acceptance( x, lp, chain = NULL, ..., binwidth = NULL, bins = NULL, breaks = NULL ) mcmc_nuts_divergence(x, lp, chain = NULL, ...) mcmc_nuts_stepsize(x, lp, chain = NULL, ...) mcmc_nuts_treedepth(x, lp, chain = NULL, ...) mcmc_nuts_energy( x, ..., binwidth = NULL, bins = NULL, breaks = NULL, alpha = 0.5, merge_chains = FALSE )
mcmc_nuts_acceptance( x, lp, chain = NULL, ..., binwidth = NULL, bins = NULL, breaks = NULL ) mcmc_nuts_divergence(x, lp, chain = NULL, ...) mcmc_nuts_stepsize(x, lp, chain = NULL, ...) mcmc_nuts_treedepth(x, lp, chain = NULL, ...) mcmc_nuts_energy( x, ..., binwidth = NULL, bins = NULL, breaks = NULL, alpha = 0.5, merge_chains = FALSE )
x |
A molten data frame of NUTS sampler parameters, either created by
|
lp |
A molten data frame of draws of the log-posterior or, more
commonly, of a quantity equal to the log-posterior up to a constant.
|
chain |
A positive integer for selecting a particular chain. The default
( |
... |
Currently ignored. |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
alpha |
For |
merge_chains |
For |
A gtable object (the result of calling
gridExtra::arrangeGrob()
) created from several ggplot objects,
except for mcmc_nuts_energy()
, which returns a ggplot object.
For more details see Stan Development Team (2016) and Betancourt (2017).
accept_stat__
: the average acceptance probabilities of all
possible samples in the proposed tree.
divergent__
: the number of leapfrog transitions with diverging
error. Because NUTS terminates at the first divergence this will be either
0 or 1 for each iteration.
stepsize__
: the step size used by NUTS in its Hamiltonian
simulation.
treedepth__
: the depth of tree used by NUTS, which is the log
(base 2) of the number of leapfrog steps taken during the Hamiltonian
simulation.
energy__
: the value of the Hamiltonian (up to an additive
constant) at each iteration.
mcmc_nuts_acceptance()
Three plots:
Histogram of accept_stat__
with vertical lines indicating the
mean (solid line) and median (dashed line).
Histogram of lp__
with vertical
lines indicating the mean (solid line) and median (dashed line).
Scatterplot of accept_stat__
vs lp__
.
mcmc_nuts_divergence()
Two plots:
Violin plots of lp__|divergent__=1
and lp__|divergent__=0
.
Violin plots of accept_stat__|divergent__=1
and
accept_stat__|divergent__=0
.
mcmc_nuts_stepsize()
Two plots:
Violin plots of lp__
by chain ordered by stepsize__
value.
Violin plots of accept_stat__
by chain ordered by stepsize__
value.
mcmc_nuts_treedepth()
Three plots:
Violin plots of lp__
by value of treedepth__
.
Violin plots of accept_stat__
by value of treedepth__
.
Histogram of treedepth__
.
mcmc_nuts_energy()
Overlaid histograms showing energy__
vs the change in
energy__
. See Betancourt (2016) for details.
Betancourt, M. (2017). A conceptual introduction to Hamiltonian Monte Carlo. https://arxiv.org/abs/1701.02434
Betancourt, M. and Girolami, M. (2013). Hamiltonian Monte Carlo for hierarchical models. https://arxiv.org/abs/1312.0906
Hoffman, M. D. and Gelman, A. (2014). The No-U-Turn Sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research. 15:1593–1623.
Stan Development Team. Stan Modeling Language Users Guide and Reference Manual. https://mc-stan.org/users/documentation/
The Visual MCMC Diagnostics vignette.
Several other plotting functions are not NUTS-specific but take optional extra arguments if the model was fit using NUTS:
mcmc_trace()
: show divergences as tick marks below the
trace plot.
mcmc_parcoord()
: change the color/size/transparency of lines
corresponding to divergences.
mcmc_scatter()
: change the color/size/shape of points
corresponding to divergences.
mcmc_pairs()
: change the color/size/shape of points
corresponding divergences and/or max treedepth saturation.
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-scatterplots
,
MCMC-traces
## Not run: library(ggplot2) library(rstanarm) fit <- stan_glm(mpg ~ wt + am, data = mtcars, iter = 1000, refresh = 0) np <- nuts_params(fit) lp <- log_posterior(fit) color_scheme_set("brightblue") mcmc_nuts_acceptance(np, lp) mcmc_nuts_acceptance(np, lp, chain = 2) mcmc_nuts_divergence(np, lp) mcmc_nuts_stepsize(np, lp) mcmc_nuts_treedepth(np, lp) color_scheme_set("red") mcmc_nuts_energy(np) mcmc_nuts_energy(np, merge_chains = TRUE, binwidth = .15) mcmc_nuts_energy(np) + facet_wrap(vars(Chain), nrow = 1) + coord_fixed(ratio = 150) + ggtitle("NUTS Energy Diagnostic") ## End(Not run)
## Not run: library(ggplot2) library(rstanarm) fit <- stan_glm(mpg ~ wt + am, data = mtcars, iter = 1000, refresh = 0) np <- nuts_params(fit) lp <- log_posterior(fit) color_scheme_set("brightblue") mcmc_nuts_acceptance(np, lp) mcmc_nuts_acceptance(np, lp, chain = 2) mcmc_nuts_divergence(np, lp) mcmc_nuts_stepsize(np, lp) mcmc_nuts_treedepth(np, lp) color_scheme_set("red") mcmc_nuts_energy(np) mcmc_nuts_energy(np, merge_chains = TRUE, binwidth = .15) mcmc_nuts_energy(np) + facet_wrap(vars(Chain), nrow = 1) + coord_fixed(ratio = 150) + ggtitle("NUTS Energy Diagnostic") ## End(Not run)
The bayesplot MCMC module provides various plotting functions for creating graphical displays of Markov chain Monte Carlo (MCMC) simulations. The MCMC plotting functions section, below, provides links to the documentation for various categories of MCMC plots. Currently the MCMC plotting functions accept posterior draws provided in one of the following formats:
3-D array: An array with dimensions Iteration, Chain, Parameter
in
that order.
list: A list of matrices, where each matrix corresponds to a Markov chain. All of the matrices should have the same number of iterations (rows) and parameters (columns), and parameters should have the same names and be in the same order.
matrix (2-D array): A matrix with one column per parameter. If using matrix there should only be a single Markov chain or all chains should already be merged (stacked).
data frame: There are two types of data frames allowed. Either a data
frame with one column per parameter (if only a single chain or all chains
have already been merged), or a data frame with one column per parameter plus
an additional column "Chain"
that contains the chain number (an integer)
corresponding to each row in the data frame.
draws: Any of the draws
formats supported by the
posterior package.
Note: typically the user should not include warmup iterations in the object passed to bayesplot plotting functions, although for certain plots (e.g. trace plots) it can occasionally be useful to include the warmup iterations for diagnostic purposes.
Posterior distributions: Histograms and kernel density plots of parameter draws, optionally showing each Markov chain separately.
Uncertainty intervals: Uncertainty intervals computed from parameter draws.
Trace plots: Times series of parameter draws, optionally including HMC/NUTS diagnostic information.
Scatterplots: Scatterplots, heatmaps, and pairs plots of parameter draws, optionally including HMC/NUTS diagnostic information.
Parallel coordinates plots: Parallel coordinates plot of MCMC draws (one dimension per parameter), optionally including HMC/NUTS diagnostic information.
Combos: Combination plots (e.g. trace plot + histogram).
General MCMC diagnostics: MCMC diagnostic plots including R-hat, effective sample size, autocorrelation. NUTS diagnostics: Special diagnostic plots for the No-U-Turn Sampler.
Comparisons to "true" values: Plots comparing MCMC estimates to "true" parameter values (e.g., values used to simulate data).
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-scatterplots
,
MCMC-traces
Parallel coordinates plot of MCMC draws (one dimension per parameter). See the Plot Descriptions section below for details, and see Gabry et al. (2019) for more background and a real example.
mcmc_parcoord( x, pars = character(), regex_pars = character(), transformations = list(), ..., size = 0.2, alpha = 0.3, np = NULL, np_style = parcoord_style_np() ) mcmc_parcoord_data( x, pars = character(), regex_pars = character(), transformations = list(), np = NULL ) parcoord_style_np(div_color = "red", div_size = 0.2, div_alpha = 0.2)
mcmc_parcoord( x, pars = character(), regex_pars = character(), transformations = list(), ..., size = 0.2, alpha = 0.3, np = NULL, np_style = parcoord_style_np() ) mcmc_parcoord_data( x, pars = character(), regex_pars = character(), transformations = list(), np = NULL ) parcoord_style_np(div_color = "red", div_size = 0.2, div_alpha = 0.2)
x |
An object containing MCMC draws:
|
pars |
An optional character vector of parameter names. If neither
|
regex_pars |
An optional regular expression to use for
parameter selection. Can be specified instead of |
transformations |
Optionally, transformations to apply to parameters
before plotting. If Note: due to partial argument matching |
... |
Currently ignored. |
size , alpha
|
Arguments passed on to |
np |
For models fit using NUTS (more generally,
any symplectic integrator),
an optional data frame providing NUTS diagnostic information. The data
frame should be the object returned by |
np_style |
A call to the |
div_color , div_size , div_alpha
|
Optional arguments to the
|
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
mcmc_parcoord()
Parallel coordinates plot of MCMC draws. There is one dimension per parameter along the horizontal axis and each set of connected line segments represents a single MCMC draw (i.e., a vector of length equal to the number of parameters).
The parallel coordinates plot is most useful if the optional HMC/NUTS
diagnostic information is provided via the np
argument. In that
case divergences are highlighted in the plot. The appearance of the
divergences can be customized using the np_style
argument and the
parcoord_style_np
helper function. This version of the plot is the
same as the parallel coordinates plot described in Gabry et al. (2019).
When the plotted model parameters are on very different scales the
transformations
argument can be useful. For example, to standardize
all variables before plotting you could use function (x - mean(x))/sd(x)
when specifying the transformations
argument to
mcmc_parcoord
. See the Examples section for how to do this.
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Hartikainen, A. (2017, Aug 23). Concentration of divergences (Msg 21). Message posted to The Stan Forums: https://discourse.mc-stan.org/t/concentration-of-divergences/1590/21.
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-overview
,
MCMC-recover
,
MCMC-scatterplots
,
MCMC-traces
color_scheme_set("pink") x <- example_mcmc_draws(params = 5) mcmc_parcoord(x) mcmc_parcoord(x, regex_pars = "beta") ## Not run: # Example using a Stan demo model library(rstan) fit <- stan_demo("eight_schools") draws <- as.array(fit, pars = c("mu", "tau", "theta", "lp__")) np <- nuts_params(fit) str(np) levels(np$Parameter) color_scheme_set("brightblue") mcmc_parcoord(draws, alpha = 0.05) mcmc_parcoord(draws, np = np) # customize appearance of divergences color_scheme_set("darkgray") div_style <- parcoord_style_np(div_color = "green", div_size = 0.05, div_alpha = 0.4) mcmc_parcoord(draws, size = 0.25, alpha = 0.1, np = np, np_style = div_style) # to use a transformation (e.g., standardizing all the variables can be helpful) # specify the 'transformations' argument (though partial argument name # matching means we can just use 'trans' or 'transform') mcmc_parcoord( draws, transform = function(x) {(x - mean(x)) / sd(x)}, size = 0.25, alpha = 0.1, np = np, np_style = div_style ) # mcmc_parcoord_data returns just the data in a conventient form for plotting d <- mcmc_parcoord_data(x, np = np) head(d) tail(d) ## End(Not run)
color_scheme_set("pink") x <- example_mcmc_draws(params = 5) mcmc_parcoord(x) mcmc_parcoord(x, regex_pars = "beta") ## Not run: # Example using a Stan demo model library(rstan) fit <- stan_demo("eight_schools") draws <- as.array(fit, pars = c("mu", "tau", "theta", "lp__")) np <- nuts_params(fit) str(np) levels(np$Parameter) color_scheme_set("brightblue") mcmc_parcoord(draws, alpha = 0.05) mcmc_parcoord(draws, np = np) # customize appearance of divergences color_scheme_set("darkgray") div_style <- parcoord_style_np(div_color = "green", div_size = 0.05, div_alpha = 0.4) mcmc_parcoord(draws, size = 0.25, alpha = 0.1, np = np, np_style = div_style) # to use a transformation (e.g., standardizing all the variables can be helpful) # specify the 'transformations' argument (though partial argument name # matching means we can just use 'trans' or 'transform') mcmc_parcoord( draws, transform = function(x) {(x - mean(x)) / sd(x)}, size = 0.25, alpha = 0.1, np = np, np_style = div_style ) # mcmc_parcoord_data returns just the data in a conventient form for plotting d <- mcmc_parcoord_data(x, np = np) head(d) tail(d) ## End(Not run)
Plots comparing MCMC estimates to "true" parameter values. Before fitting a model to real data it is useful to simulate data according to the model using known (fixed) parameter values and to check that these "true" parameter values are (approximately) recovered by fitting the model to the simulated data. See the Plot Descriptions section, below, for details on the available plots.
mcmc_recover_intervals( x, true, batch = rep(1, length(true)), ..., facet_args = list(), prob = 0.5, prob_outer = 0.9, point_est = c("median", "mean", "none"), size = 4, alpha = 1 ) mcmc_recover_scatter( x, true, batch = rep(1, length(true)), ..., facet_args = list(), point_est = c("median", "mean"), size = 3, alpha = 1 ) mcmc_recover_hist( x, true, ..., facet_args = list(), binwidth = NULL, bins = NULL, breaks = NULL )
mcmc_recover_intervals( x, true, batch = rep(1, length(true)), ..., facet_args = list(), prob = 0.5, prob_outer = 0.9, point_est = c("median", "mean", "none"), size = 4, alpha = 1 ) mcmc_recover_scatter( x, true, batch = rep(1, length(true)), ..., facet_args = list(), point_est = c("median", "mean"), size = 3, alpha = 1 ) mcmc_recover_hist( x, true, ..., facet_args = list(), binwidth = NULL, bins = NULL, breaks = NULL )
x |
An object containing MCMC draws:
|
true |
A numeric vector of "true" values of the parameters in |
batch |
Optionally, a vector-like object (numeric, character, integer,
factor) used to split the parameters into batches. If |
... |
Currently unused. |
facet_args |
A named list of arguments (other than |
prob |
The probability mass to include in the inner interval. The
default is |
prob_outer |
The probability mass to include in the outer interval. The
default is |
point_est |
The point estimate to show. Either |
size , alpha
|
Passed to |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
A ggplot object that can be further customized using the ggplot2 package.
mcmc_recover_intervals()
Central intervals and point estimates computed from MCMC draws, with "true" values plotted using a different shape.
mcmc_recover_scatter()
Scatterplot of posterior means (or medians) against "true" values.
mcmc_recover_hist()
Histograms of the draws for each parameter with the "true" value overlaid as a vertical line.
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-scatterplots
,
MCMC-traces
## Not run: library(rstanarm) alpha <- 1; beta <- rnorm(10, 0, 3); sigma <- 2 X <- matrix(rnorm(1000), 100, 10) y <- rnorm(100, mean = c(alpha + X %*% beta), sd = sigma) fit <- stan_glm(y ~ ., data = data.frame(y, X), refresh = 0) draws <- as.matrix(fit) print(colnames(draws)) true <- c(alpha, beta, sigma) mcmc_recover_intervals(draws, true) # put the coefficients on X into the same batch mcmc_recover_intervals(draws, true, batch = c(1, rep(2, 10), 1)) # equivalent mcmc_recover_intervals(draws, true, batch = grepl("X", colnames(draws))) # same but facets stacked vertically mcmc_recover_intervals(draws, true, batch = grepl("X", colnames(draws)), facet_args = list(ncol = 1), size = 3) # each parameter in its own facet mcmc_recover_intervals(draws, true, batch = 1:ncol(draws)) # same but in a different order mcmc_recover_intervals(draws, true, batch = c(1, 3, 4, 2, 5:12)) # present as bias by centering with true values mcmc_recover_intervals(sweep(draws, 2, true), rep(0, ncol(draws))) + hline_0() # scatterplot of posterior means vs true values mcmc_recover_scatter(draws, true, point_est = "mean") # histograms of parameter draws with true value added as vertical line color_scheme_set("brightblue") mcmc_recover_hist(draws[, 1:4], true[1:4]) ## End(Not run)
## Not run: library(rstanarm) alpha <- 1; beta <- rnorm(10, 0, 3); sigma <- 2 X <- matrix(rnorm(1000), 100, 10) y <- rnorm(100, mean = c(alpha + X %*% beta), sd = sigma) fit <- stan_glm(y ~ ., data = data.frame(y, X), refresh = 0) draws <- as.matrix(fit) print(colnames(draws)) true <- c(alpha, beta, sigma) mcmc_recover_intervals(draws, true) # put the coefficients on X into the same batch mcmc_recover_intervals(draws, true, batch = c(1, rep(2, 10), 1)) # equivalent mcmc_recover_intervals(draws, true, batch = grepl("X", colnames(draws))) # same but facets stacked vertically mcmc_recover_intervals(draws, true, batch = grepl("X", colnames(draws)), facet_args = list(ncol = 1), size = 3) # each parameter in its own facet mcmc_recover_intervals(draws, true, batch = 1:ncol(draws)) # same but in a different order mcmc_recover_intervals(draws, true, batch = c(1, 3, 4, 2, 5:12)) # present as bias by centering with true values mcmc_recover_intervals(sweep(draws, 2, true), rep(0, ncol(draws))) + hline_0() # scatterplot of posterior means vs true values mcmc_recover_scatter(draws, true, point_est = "mean") # histograms of parameter draws with true value added as vertical line color_scheme_set("brightblue") mcmc_recover_hist(draws[, 1:4], true[1:4]) ## End(Not run)
Scatterplots, hexagonal heatmaps, and pairs plots from MCMC draws. See the Plot Descriptions section, below, for details.
mcmc_scatter( x, pars = character(), regex_pars = character(), transformations = list(), ..., size = 2.5, alpha = 0.8, np = NULL, np_style = scatter_style_np() ) mcmc_hex( x, pars = character(), regex_pars = character(), transformations = list(), ..., bins = 30, binwidth = NULL ) mcmc_pairs( x, pars = character(), regex_pars = character(), transformations = list(), ..., diag_fun = c("hist", "dens"), off_diag_fun = c("scatter", "hex"), diag_args = list(), off_diag_args = list(), condition = pairs_condition(), lp = NULL, np = NULL, np_style = pairs_style_np(), max_treedepth = NULL, grid_args = list(), save_gg_objects = TRUE ) scatter_style_np( div_color = "red", div_shape = 16, div_size = 2.5, div_alpha = 1 ) pairs_style_np( div_color = "red", div_shape = 4, div_size = 1, div_alpha = 1, td_color = "yellow2", td_shape = 3, td_size = 1, td_alpha = 1 ) pairs_condition(chains = NULL, draws = NULL, nuts = NULL)
mcmc_scatter( x, pars = character(), regex_pars = character(), transformations = list(), ..., size = 2.5, alpha = 0.8, np = NULL, np_style = scatter_style_np() ) mcmc_hex( x, pars = character(), regex_pars = character(), transformations = list(), ..., bins = 30, binwidth = NULL ) mcmc_pairs( x, pars = character(), regex_pars = character(), transformations = list(), ..., diag_fun = c("hist", "dens"), off_diag_fun = c("scatter", "hex"), diag_args = list(), off_diag_args = list(), condition = pairs_condition(), lp = NULL, np = NULL, np_style = pairs_style_np(), max_treedepth = NULL, grid_args = list(), save_gg_objects = TRUE ) scatter_style_np( div_color = "red", div_shape = 16, div_size = 2.5, div_alpha = 1 ) pairs_style_np( div_color = "red", div_shape = 4, div_size = 1, div_alpha = 1, td_color = "yellow2", td_shape = 3, td_size = 1, td_alpha = 1 ) pairs_condition(chains = NULL, draws = NULL, nuts = NULL)
x |
An object containing MCMC draws:
|
pars |
An optional character vector of parameter names. If neither
|
regex_pars |
An optional regular expression to use for
parameter selection. Can be specified instead of |
transformations |
Optionally, transformations to apply to parameters
before plotting. If Note: due to partial argument matching |
... |
Currently ignored. |
size , alpha
|
For |
np |
Optionally, a data frame of NUTS sampler parameters, either created
by |
np_style |
If |
bins , binwidth
|
For |
diag_fun , off_diag_fun
|
For |
diag_args , off_diag_args
|
For |
condition |
For |
lp |
For |
max_treedepth |
For |
grid_args , save_gg_objects
|
For |
div_color , div_shape , div_size , div_alpha , td_color , td_shape , td_size , td_alpha
|
Optional arguments to the |
chains , draws , nuts
|
Optional arguments to the
|
mcmc_scatter()
and mcmc_hex()
return a ggplot object that
can be further customized using the ggplot2 package.
mcmc_pairs()
returns many ggplot objects organized into a grid via
bayesplot_grid()
.
mcmc_scatter()
Bivariate scatterplot of posterior draws. If using a very large number of
posterior draws then mcmc_hex()
may be preferable to avoid
overplotting. For models fit using NUTS the np
,
and np_style
arguments can be used to add additional information in
the plot (in this case the approximate location of divergences).
For more on why the scatter plot with divergences is a useful
diagnostic tool see Gabry et al. (2019).
mcmc_hex()
Hexagonal heatmap of 2-D bin counts. This plot is useful in cases where
the posterior sample size is large enough that mcmc_scatter()
suffers
from overplotting.
mcmc_pairs()
A square plot matrix with univariate marginal distributions along the diagonal (as histograms or kernel density plots) and bivariate distributions off the diagonal (as scatterplots or hex heatmaps).
For the off-diagonal plots, the default is to split the chains so that
(roughly) half are displayed above the diagonal and half are below (all
chains are always merged together for the plots along the diagonal). Other
possibilities are available by setting the condition
argument.
Additionally, extra diagnostic information for models fit using
NUTS can be added to the pairs plot using the lp
,
np
, and np_style
arguments. If np
is specified (and
condition
is not "divergent__"
), then points (red, by
default) will be superimposed onto the off-diagonal plots indicating which
(if any) iterations encountered a divergent transition. Also, if both
np
and max_treedepth
are specified then points (yellow, by
default) will be superimposed to indicate a transition that hit the
maximum treedepth rather than terminated its evolution normally. The
np_style
argument can be used with the pairs_style_np()
convenience function to change the appearance of these overlaid points.
See the Examples section.
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-traces
library("ggplot2") # some parameter draws to use for demonstration x <- example_mcmc_draws(params = 6) dimnames(x) # scatterplot of alpha vs log(sigma) color_scheme_set("teal") (p <- mcmc_scatter(x, pars = c("alpha", "sigma"), transform = list(sigma = "log"))) p + labs( title = "Insert your own headline-grabbing title", subtitle = "with a provocative subtitle", caption = "and a controversial caption", x = expression(alpha), y = expression(log(sigma)) ) # add ellipse p + stat_ellipse(level = 0.9, color = "gray20", size = 1) # add contour color_scheme_set("red") p2 <- mcmc_scatter(x, pars = c("alpha", "sigma"), size = 3.5, alpha = 0.25) p2 + stat_density_2d(color = "black", size = .5) # can also add lines/smooths color_scheme_set("pink") (p3 <- mcmc_scatter(x, pars = c("alpha", "beta[3]"), alpha = 0.25, size = 3)) p3 + geom_smooth(method = "lm", se = FALSE, color = "gray20", size = .75, linetype = 2) if (requireNamespace("hexbin", quietly = TRUE)) { # hexagonal heatmap color_scheme_set("brightblue") (p <- mcmc_hex(x, pars = c("sigma", "alpha"), transform = list(sigma = "log"))) p + plot_bg(fill = "gray95") p + plot_bg(fill = "gray95") + panel_bg(fill = "gray70") } color_scheme_set("purple") # pairs plots # default of condition=NULL implies splitting chains between upper and lower panels mcmc_pairs(x, pars = "alpha", regex_pars = "beta\\[[1,4]\\]", off_diag_args = list(size = 1, alpha = 0.5)) # change to density plots instead of histograms and hex plots instead of # scatterplots mcmc_pairs(x, pars = "alpha", regex_pars = "beta\\[[1,4]\\]", diag_fun = "dens", off_diag_fun = "hex") # plot chain 1 above diagonal and chains 2, 3, and 4 below color_scheme_set("brightblue") mcmc_pairs(x, pars = "alpha", regex_pars = "beta\\[[1,4]\\]", diag_fun = "dens", off_diag_fun = "hex", condition = pairs_condition(chains = list(1, 2:4))) ## Not run: ### Adding NUTS diagnostics to scatterplots and pairs plots # examples using rstanarm package library(rstanarm) # for demonstration purposes, intentionally fit a model that # will (almost certainly) have some divergences fit <- stan_glm( mpg ~ ., data = mtcars, iter = 1000, refresh = 0, # this combo of prior and adapt_delta should lead to some divergences prior = hs(), adapt_delta = 0.9 ) posterior <- as.array(fit) np <- nuts_params(fit) # mcmc_scatter with divergences highlighted color_scheme_set("brightblue") mcmc_scatter(posterior, pars = c("wt", "sigma"), np = np) color_scheme_set("darkgray") div_style <- scatter_style_np(div_color = "green", div_shape = 4, div_size = 4) mcmc_scatter(posterior, pars = c("sigma", "(Intercept)"), np = np, np_style = div_style) # split the draws according to above/below median accept_stat__ # and show approximate location of divergences (red points) color_scheme_set("brightblue") mcmc_pairs( posterior, pars = c("wt", "cyl", "sigma"), off_diag_args = list(size = 1, alpha = 1/3), condition = pairs_condition(nuts = "accept_stat__"), np = np ) # more customizations: # - transform sigma to log(sigma) # - median log-posterior as 'condition' # - hex instead of scatter for off-diagonal plots # - show points where max treedepth hit in blue color_scheme_set("darkgray") mcmc_pairs( posterior, pars = c("wt", "cyl", "sigma"), transform = list(sigma = "log"), off_diag_fun = "hex", condition = pairs_condition(nuts = "lp__"), lp = log_posterior(fit), np = np, np_style = pairs_style_np(div_color = "firebrick", td_color = "blue", td_size = 2), # for demonstration purposes, set max_treedepth to a value that will # result in at least a few max treedepth warnings max_treedepth = with(np, -1 + max(Value[Parameter == "treedepth__"])) ) ## End(Not run)
library("ggplot2") # some parameter draws to use for demonstration x <- example_mcmc_draws(params = 6) dimnames(x) # scatterplot of alpha vs log(sigma) color_scheme_set("teal") (p <- mcmc_scatter(x, pars = c("alpha", "sigma"), transform = list(sigma = "log"))) p + labs( title = "Insert your own headline-grabbing title", subtitle = "with a provocative subtitle", caption = "and a controversial caption", x = expression(alpha), y = expression(log(sigma)) ) # add ellipse p + stat_ellipse(level = 0.9, color = "gray20", size = 1) # add contour color_scheme_set("red") p2 <- mcmc_scatter(x, pars = c("alpha", "sigma"), size = 3.5, alpha = 0.25) p2 + stat_density_2d(color = "black", size = .5) # can also add lines/smooths color_scheme_set("pink") (p3 <- mcmc_scatter(x, pars = c("alpha", "beta[3]"), alpha = 0.25, size = 3)) p3 + geom_smooth(method = "lm", se = FALSE, color = "gray20", size = .75, linetype = 2) if (requireNamespace("hexbin", quietly = TRUE)) { # hexagonal heatmap color_scheme_set("brightblue") (p <- mcmc_hex(x, pars = c("sigma", "alpha"), transform = list(sigma = "log"))) p + plot_bg(fill = "gray95") p + plot_bg(fill = "gray95") + panel_bg(fill = "gray70") } color_scheme_set("purple") # pairs plots # default of condition=NULL implies splitting chains between upper and lower panels mcmc_pairs(x, pars = "alpha", regex_pars = "beta\\[[1,4]\\]", off_diag_args = list(size = 1, alpha = 0.5)) # change to density plots instead of histograms and hex plots instead of # scatterplots mcmc_pairs(x, pars = "alpha", regex_pars = "beta\\[[1,4]\\]", diag_fun = "dens", off_diag_fun = "hex") # plot chain 1 above diagonal and chains 2, 3, and 4 below color_scheme_set("brightblue") mcmc_pairs(x, pars = "alpha", regex_pars = "beta\\[[1,4]\\]", diag_fun = "dens", off_diag_fun = "hex", condition = pairs_condition(chains = list(1, 2:4))) ## Not run: ### Adding NUTS diagnostics to scatterplots and pairs plots # examples using rstanarm package library(rstanarm) # for demonstration purposes, intentionally fit a model that # will (almost certainly) have some divergences fit <- stan_glm( mpg ~ ., data = mtcars, iter = 1000, refresh = 0, # this combo of prior and adapt_delta should lead to some divergences prior = hs(), adapt_delta = 0.9 ) posterior <- as.array(fit) np <- nuts_params(fit) # mcmc_scatter with divergences highlighted color_scheme_set("brightblue") mcmc_scatter(posterior, pars = c("wt", "sigma"), np = np) color_scheme_set("darkgray") div_style <- scatter_style_np(div_color = "green", div_shape = 4, div_size = 4) mcmc_scatter(posterior, pars = c("sigma", "(Intercept)"), np = np, np_style = div_style) # split the draws according to above/below median accept_stat__ # and show approximate location of divergences (red points) color_scheme_set("brightblue") mcmc_pairs( posterior, pars = c("wt", "cyl", "sigma"), off_diag_args = list(size = 1, alpha = 1/3), condition = pairs_condition(nuts = "accept_stat__"), np = np ) # more customizations: # - transform sigma to log(sigma) # - median log-posterior as 'condition' # - hex instead of scatter for off-diagonal plots # - show points where max treedepth hit in blue color_scheme_set("darkgray") mcmc_pairs( posterior, pars = c("wt", "cyl", "sigma"), transform = list(sigma = "log"), off_diag_fun = "hex", condition = pairs_condition(nuts = "lp__"), lp = log_posterior(fit), np = np, np_style = pairs_style_np(div_color = "firebrick", td_color = "blue", td_size = 2), # for demonstration purposes, set max_treedepth to a value that will # result in at least a few max treedepth warnings max_treedepth = with(np, -1 + max(Value[Parameter == "treedepth__"])) ) ## End(Not run)
Trace and rank plots of MCMC draws. See the Plot Descriptions section, below, for details.
mcmc_trace( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), n_warmup = 0, iter1 = 0, window = NULL, size = NULL, np = NULL, np_style = trace_style_np(), divergences = NULL ) mcmc_trace_highlight( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), n_warmup = 0, window = NULL, size = NULL, alpha = 0.2, highlight = 1 ) trace_style_np(div_color = "red", div_size = 0.25, div_alpha = 1) mcmc_rank_overlay( x, pars = character(), regex_pars = character(), transformations = list(), facet_args = list(), ..., n_bins = 20, ref_line = FALSE ) mcmc_rank_hist( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), n_bins = 20, ref_line = FALSE ) mcmc_rank_ecdf( x, pars = character(), regex_pars = character(), transformations = list(), ..., K = NULL, facet_args = list(), prob = 0.99, plot_diff = FALSE, interpolate_adj = NULL ) mcmc_trace_data( x, pars = character(), regex_pars = character(), transformations = list(), ..., highlight = NULL, n_warmup = 0, iter1 = 0 )
mcmc_trace( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), n_warmup = 0, iter1 = 0, window = NULL, size = NULL, np = NULL, np_style = trace_style_np(), divergences = NULL ) mcmc_trace_highlight( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), n_warmup = 0, window = NULL, size = NULL, alpha = 0.2, highlight = 1 ) trace_style_np(div_color = "red", div_size = 0.25, div_alpha = 1) mcmc_rank_overlay( x, pars = character(), regex_pars = character(), transformations = list(), facet_args = list(), ..., n_bins = 20, ref_line = FALSE ) mcmc_rank_hist( x, pars = character(), regex_pars = character(), transformations = list(), ..., facet_args = list(), n_bins = 20, ref_line = FALSE ) mcmc_rank_ecdf( x, pars = character(), regex_pars = character(), transformations = list(), ..., K = NULL, facet_args = list(), prob = 0.99, plot_diff = FALSE, interpolate_adj = NULL ) mcmc_trace_data( x, pars = character(), regex_pars = character(), transformations = list(), ..., highlight = NULL, n_warmup = 0, iter1 = 0 )
x |
An object containing MCMC draws:
|
pars |
An optional character vector of parameter names. If neither
|
regex_pars |
An optional regular expression to use for
parameter selection. Can be specified instead of |
transformations |
Optionally, transformations to apply to parameters
before plotting. If Note: due to partial argument matching |
... |
Currently ignored. |
facet_args |
A named list of arguments (other than |
n_warmup |
An integer; the number of warmup iterations included in
|
iter1 |
An integer; the iteration number of the first included draw
(default is |
window |
An integer vector of length two specifying the limits of a range of iterations to display. |
size |
An optional value to override the default line size
for |
np |
For models fit using NUTS (more generally, any
symplectic integrator),
an optional data frame providing NUTS diagnostic information. The data
frame should be the object returned by |
np_style |
A call to the |
divergences |
Deprecated. Use the |
alpha |
For |
highlight |
For |
div_color , div_size , div_alpha
|
Optional arguments to the
|
n_bins |
For the rank plots, the number of bins to use for the histogram
of rank-normalized MCMC samples. Defaults to |
ref_line |
For the rank plots, whether to draw a horizontal line at the
average number of ranks per bin. Defaults to |
K |
An optional integer defining the number of equally spaced evaluation
points for the PIT-ECDF. Reducing K when using |
prob |
For |
plot_diff |
For |
interpolate_adj |
A boolean defining if the simultaneous confidence
bands should be interpolated based on precomputed values rather than
computed exactly. Computing the bands may be computationally intensive and
the approximation gives a fast method for assessing the ECDF trajectory.
The default is to use interpolation if |
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
mcmc_trace_data()
returns the data for the trace and rank plots
in the same data frame.
mcmc_trace()
Standard trace plots of MCMC draws. For models fit using NUTS,
the np
argument can be used to also show divergences on the trace plot.
mcmc_trace_highlight()
Traces are plotted using points rather than lines and the opacity of all
chains but one (specified by the highlight
argument) is reduced.
mcmc_rank_hist()
Whereas traditional trace plots visualize how the chains mix over the course of sampling, rank histograms visualize how the values from the chains mix together in terms of ranking. An ideal plot would show the rankings mixing or overlapping in a uniform distribution. See Vehtari et al. (2019) for details.
mcmc_rank_overlay()
Ranks from mcmc_rank_hist()
are plotted using overlaid lines in a
single panel.
mcmc_rank_ecdf()
The ECDFs of the ranks from mcmc_rank_hist()
are plotted with the
simultaneous confidence bands with a coverage determined by prob
, that
is, bands that completely cover all of the rank ECDFs with the probability
prob
. If plot_diff = TRUE
, the difference between the observed rank
ECDFs and the theoretical expectation for samples originating from the
same distribution is drawn. See Säilynoja et al. (2021) for details.
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., Bürkner, P. (2019). Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC. arXiv preprint.
Säilynoja, T., Bürkner, P., Vehtari, A. (2021). Graphical Test for Discrete Uniformity and its Applications in Goodness of Fit Evaluation and Multiple Sample Comparison arXiv preprint.
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-scatterplots
# some parameter draws to use for demonstration x <- example_mcmc_draws(chains = 4, params = 6) dim(x) dimnames(x) # trace plots of the betas color_scheme_set("viridis") mcmc_trace(x, regex_pars = "beta") color_scheme_set("viridisA") mcmc_trace(x, regex_pars = "beta") color_scheme_set("viridisC") mcmc_trace(x, regex_pars = "beta") # mix color schemes color_scheme_set("mix-blue-red") mcmc_trace(x, regex_pars = "beta") # use traditional ggplot discrete color scale mcmc_trace(x, pars = c("alpha", "sigma")) + ggplot2::scale_color_discrete() # zoom in on a window of iterations, increase line size, # add tick marks, move legend to the top, add gray background color_scheme_set("viridisA") mcmc_trace(x[,, 1:4], window = c(100, 130), size = 1) + panel_bg(fill = "gray90", color = NA) + legend_move("top") # Rank-normalized histogram plots. Instead of showing how chains mix over # time, look at how the ranking of MCMC samples mixed between chains. color_scheme_set("viridisE") mcmc_rank_hist(x, "alpha") mcmc_rank_hist(x, pars = c("alpha", "sigma"), ref_line = TRUE) mcmc_rank_overlay(x, "alpha") # ECDF and ECDF difference plots of the ranking of MCMC samples between chains. # Provide 99% simultaneous confidence intervals for the chains sampling from # the same distribution. mcmc_rank_ecdf(x, prob = 0.99) mcmc_rank_ecdf(x, prob = 0.99, plot_diff = TRUE) ## Not run: # parse facet label text color_scheme_set("purple") p <- mcmc_trace( x, regex_pars = "beta\\[[1,3]\\]", facet_args = list(labeller = ggplot2::label_parsed) ) p + facet_text(size = 15) # mark first 100 draws as warmup mcmc_trace(x, n_warmup = 100) # plot as points, highlighting chain 2 color_scheme_set("brightblue") mcmc_trace_highlight(x, pars = "sigma", highlight = 2, size = 2) # for models fit using HMC/NUTS divergences can be displayed in the trace plot library("rstanarm") fit <- stan_glm(mpg ~ ., data = mtcars, refresh = 0, # next line to keep example fast and also ensure we get some divergences prior = hs(), iter = 400, adapt_delta = 0.8) # extract draws using as.array (instead of as.matrix) to keep # chains separate for trace plot posterior <- as.array(fit) # for stanfit and stanreg objects use nuts_params() to get the divergences mcmc_trace(posterior, pars = "sigma", np = nuts_params(fit)) color_scheme_set("viridis") mcmc_trace( posterior, pars = c("wt", "sigma"), size = 0.5, facet_args = list(nrow = 2), np = nuts_params(fit), np_style = trace_style_np(div_color = "black", div_size = 0.5) ) ## End(Not run)
# some parameter draws to use for demonstration x <- example_mcmc_draws(chains = 4, params = 6) dim(x) dimnames(x) # trace plots of the betas color_scheme_set("viridis") mcmc_trace(x, regex_pars = "beta") color_scheme_set("viridisA") mcmc_trace(x, regex_pars = "beta") color_scheme_set("viridisC") mcmc_trace(x, regex_pars = "beta") # mix color schemes color_scheme_set("mix-blue-red") mcmc_trace(x, regex_pars = "beta") # use traditional ggplot discrete color scale mcmc_trace(x, pars = c("alpha", "sigma")) + ggplot2::scale_color_discrete() # zoom in on a window of iterations, increase line size, # add tick marks, move legend to the top, add gray background color_scheme_set("viridisA") mcmc_trace(x[,, 1:4], window = c(100, 130), size = 1) + panel_bg(fill = "gray90", color = NA) + legend_move("top") # Rank-normalized histogram plots. Instead of showing how chains mix over # time, look at how the ranking of MCMC samples mixed between chains. color_scheme_set("viridisE") mcmc_rank_hist(x, "alpha") mcmc_rank_hist(x, pars = c("alpha", "sigma"), ref_line = TRUE) mcmc_rank_overlay(x, "alpha") # ECDF and ECDF difference plots of the ranking of MCMC samples between chains. # Provide 99% simultaneous confidence intervals for the chains sampling from # the same distribution. mcmc_rank_ecdf(x, prob = 0.99) mcmc_rank_ecdf(x, prob = 0.99, plot_diff = TRUE) ## Not run: # parse facet label text color_scheme_set("purple") p <- mcmc_trace( x, regex_pars = "beta\\[[1,3]\\]", facet_args = list(labeller = ggplot2::label_parsed) ) p + facet_text(size = 15) # mark first 100 draws as warmup mcmc_trace(x, n_warmup = 100) # plot as points, highlighting chain 2 color_scheme_set("brightblue") mcmc_trace_highlight(x, pars = "sigma", highlight = 2, size = 2) # for models fit using HMC/NUTS divergences can be displayed in the trace plot library("rstanarm") fit <- stan_glm(mpg ~ ., data = mtcars, refresh = 0, # next line to keep example fast and also ensure we get some divergences prior = hs(), iter = 400, adapt_delta = 0.8) # extract draws using as.array (instead of as.matrix) to keep # chains separate for trace plot posterior <- as.array(fit) # for stanfit and stanreg objects use nuts_params() to get the divergences mcmc_trace(posterior, pars = "sigma", np = nuts_params(fit)) color_scheme_set("viridis") mcmc_trace( posterior, pars = c("wt", "sigma"), size = 0.5, facet_args = list(nrow = 2), np = nuts_params(fit), np_style = trace_style_np(div_color = "black", div_size = 0.5) ) ## End(Not run)
S3 generic with simple default method. The intent is to provide a generic so
authors of other R packages who wish to provide interfaces to the functions
in bayesplot will be encouraged to include pp_check()
methods in their
package, preserving the same naming conventions for posterior (and prior)
predictive checking across many R packages for Bayesian inference. This is
for the convenience of both users and developers. See the Details and
Examples sections, below, and the package vignettes for examples of
defining pp_check()
methods.
pp_check(object, ...) ## Default S3 method: pp_check(object, yrep, fun, ...)
pp_check(object, ...) ## Default S3 method: pp_check(object, yrep, fun, ...)
object |
Typically a fitted model object. The default method, however,
takes |
... |
For the generic, arguments passed to individual methods. For the
default method, these are additional arguments to pass to |
yrep |
For the default method, a |
fun |
For the default method, the plotting function to call. Can be any
of the PPC functions. The |
A package that creates fitted model objects of class "foo"
can include a method pp_check.foo()
that prepares the appropriate
inputs (y
, yrep
, etc.) for the bayesplot functions. The
pp_check.foo()
method may, for example, let the user choose between
various plots, calling the functions from bayesplot internally as
needed. See Examples, below, and the package vignettes.
The exact form of the value returned by pp_check()
may vary by
the class of object
, but for consistency we encourage authors of
methods to return the ggplot object created by one of bayesplot's
plotting functions. The default method returns the object returned by fun
.
# default method y <- example_y_data() yrep <- example_yrep_draws() pp_check(y, yrep[1:50,], ppc_dens_overlay) g <- example_group_data() pp_check(y, yrep, fun = "stat_grouped", group = g, stat = "median") # defining a method x <- list(y = rnorm(50), yrep = matrix(rnorm(5000), nrow = 100, ncol = 50)) class(x) <- "foo" pp_check.foo <- function(object, ..., type = c("multiple", "overlaid")) { y <- object[["y"]] yrep <- object[["yrep"]] switch(match.arg(type), multiple = ppc_hist(y, yrep[1:min(8, nrow(yrep)),, drop = FALSE]), overlaid = ppc_dens_overlay(y, yrep)) } pp_check(x) pp_check(x, type = "overlaid")
# default method y <- example_y_data() yrep <- example_yrep_draws() pp_check(y, yrep[1:50,], ppc_dens_overlay) g <- example_group_data() pp_check(y, yrep, fun = "stat_grouped", group = g, stat = "median") # defining a method x <- list(y = rnorm(50), yrep = matrix(rnorm(5000), nrow = 100, ncol = 50)) class(x) <- "foo" pp_check.foo <- function(object, ..., type = c("multiple", "overlaid")) { y <- object[["y"]] yrep <- object[["yrep"]] switch(match.arg(type), multiple = ppc_hist(y, yrep[1:min(8, nrow(yrep)),, drop = FALSE]), overlaid = ppc_dens_overlay(y, yrep)) } pp_check(x) pp_check(x, type = "overlaid")
Compare the empirical distribution of censored data y
to the
distributions of simulated/replicated data yrep
from the posterior
predictive distribution. See the Plot Descriptions section, below, for
details.
Although some of the other bayesplot plots can be used with censored
data, ppc_km_overlay()
is currently the only plotting function designed
specifically for censored data. We encourage you to suggest or contribute
additional plots at
github.com/stan-dev/bayesplot.
ppc_km_overlay(y, yrep, ..., status_y, size = 0.25, alpha = 0.7) ppc_km_overlay_grouped(y, yrep, group, ..., status_y, size = 0.25, alpha = 0.7)
ppc_km_overlay(y, yrep, ..., status_y, size = 0.25, alpha = 0.7) ppc_km_overlay_grouped(y, yrep, group, ..., status_y, size = 0.25, alpha = 0.7)
y |
A vector of observations. See Details. |
yrep |
An |
... |
Currently only used internally. |
status_y |
The status indicator for the observations from |
size , alpha
|
Passed to the appropriate geom to control the appearance of
the |
group |
A grouping variable of the same length as |
A ggplot object that can be further customized using the ggplot2 package.
ppc_km_overlay()
Empirical CCDF estimates of each dataset (row) in yrep
are overlaid,
with the Kaplan-Meier estimate (Kaplan and Meier, 1958) for y
itself on
top (and in a darker shade). This is a PPC suitable for right-censored
y
. Note that the replicated data from yrep
is assumed to be
uncensored.
ppc_km_overlay_grouped()
The same as ppc_km_overlay()
, but with separate facets by group
.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 53(282), 457–481. doi:10.1080/01621459.1958.10501452.
Other PPCs:
PPC-discrete
,
PPC-distributions
,
PPC-errors
,
PPC-intervals
,
PPC-loo
,
PPC-overview
,
PPC-scatterplots
,
PPC-test-statistics
color_scheme_set("brightblue") y <- example_y_data() # For illustrative purposes, (right-)censor values y > 110: status_y <- as.numeric(y <= 110) y <- pmin(y, 110) # In reality, the replicated data (yrep) would be obtained from a # model which takes the censoring of y properly into account. Here, # for illustrative purposes, we simply use example_yrep_draws(): yrep <- example_yrep_draws() dim(yrep) ppc_km_overlay(y, yrep[1:25, ], status_y = status_y) # With separate facets by group: group <- example_group_data() ppc_km_overlay_grouped(y, yrep[1:25, ], group = group, status_y = status_y)
color_scheme_set("brightblue") y <- example_y_data() # For illustrative purposes, (right-)censor values y > 110: status_y <- as.numeric(y <= 110) y <- pmin(y, 110) # In reality, the replicated data (yrep) would be obtained from a # model which takes the censoring of y properly into account. Here, # for illustrative purposes, we simply use example_yrep_draws(): yrep <- example_yrep_draws() dim(yrep) ppc_km_overlay(y, yrep[1:25, ], status_y = status_y) # With separate facets by group: group <- example_group_data() ppc_km_overlay_grouped(y, yrep[1:25, ], group = group, status_y = status_y)
Many of the PPC functions in bayesplot can
be used with discrete data. The small subset of these functions that can
only be used if y
and yrep
are discrete are documented
on this page. Currently these include rootograms for count outcomes and bar
plots for ordinal, categorical, and multinomial outcomes. See the
Plot Descriptions section below.
ppc_bars( y, yrep, ..., prob = 0.9, width = 0.9, size = 1, fatten = 2.5, linewidth = 1, freq = TRUE ) ppc_bars_grouped( y, yrep, group, ..., facet_args = list(), prob = 0.9, width = 0.9, size = 1, fatten = 2.5, linewidth = 1, freq = TRUE ) ppc_rootogram( y, yrep, style = c("standing", "hanging", "suspended"), ..., prob = 0.9, size = 1 ) ppc_bars_data(y, yrep, group = NULL, prob = 0.9, freq = TRUE)
ppc_bars( y, yrep, ..., prob = 0.9, width = 0.9, size = 1, fatten = 2.5, linewidth = 1, freq = TRUE ) ppc_bars_grouped( y, yrep, group, ..., facet_args = list(), prob = 0.9, width = 0.9, size = 1, fatten = 2.5, linewidth = 1, freq = TRUE ) ppc_rootogram( y, yrep, style = c("standing", "hanging", "suspended"), ..., prob = 0.9, size = 1 ) ppc_bars_data(y, yrep, group = NULL, prob = 0.9, freq = TRUE)
y |
A vector of observations. See Details. |
yrep |
An |
... |
Currently unused. |
prob |
A value between |
width |
For bar plots only, passed to |
size , fatten , linewidth
|
For bar plots, |
freq |
For bar plots only, if |
group |
A grouping variable of the same length as |
facet_args |
An optional list of arguments (other than |
style |
For |
For all of these plots y
and yrep
must be integers, although
they need not be integers in the strict sense of R's
integer type. For rootogram plots y
and yrep
must also
be non-negative.
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
ppc_bars()
Bar plot of y
with yrep
medians and uncertainty intervals
superimposed on the bars.
ppc_bars_grouped()
Same as ppc_bars()
but a separate plot (facet) is generated for each
level of a grouping variable.
ppc_rootogram()
Rootograms allow for diagnosing problems in count data models such as
overdispersion or excess zeros. They consist of a histogram of y
with the
expected counts based on yrep
overlaid as a line along with uncertainty
intervals. The y-axis represents the square roots of the counts to
approximately adjust for scale differences and thus ease comparison between
observed and expected counts. Using the style
argument, the histogram
style can be adjusted to focus on different aspects of the data:
Standing: basic histogram of observed counts with curve showing expected counts.
Hanging: observed counts counts hanging from the curve representing expected counts.
Suspended: histogram of the differences between expected and observed counts.
All of the rootograms are plotted on the square root scale. See Kleiber and Zeileis (2016) for advice on interpreting rootograms and selecting among the different styles.
Kleiber, C. and Zeileis, A. (2016). Visualizing count data regressions using rootograms. The American Statistician. 70(3): 296–303. https://arxiv.org/abs/1605.01311.
Other PPCs:
PPC-censoring
,
PPC-distributions
,
PPC-errors
,
PPC-intervals
,
PPC-loo
,
PPC-overview
,
PPC-scatterplots
,
PPC-test-statistics
set.seed(9222017) # bar plots f <- function(N) { sample(1:4, size = N, replace = TRUE, prob = c(0.25, 0.4, 0.1, 0.25)) } y <- f(100) yrep <- t(replicate(500, f(100))) dim(yrep) group <- gl(2, 50, length = 100, labels = c("GroupA", "GroupB")) color_scheme_set("mix-pink-blue") ppc_bars(y, yrep) # split by group, change interval width, and display proportion # instead of count on y-axis color_scheme_set("mix-blue-pink") ppc_bars_grouped(y, yrep, group, prob = 0.5, freq = FALSE) ## Not run: # example for ordinal regression using rstanarm library(rstanarm) fit <- stan_polr( tobgp ~ agegp, data = esoph, method = "probit", prior = R2(0.2, "mean"), init_r = 0.1, seed = 12345, # cores = 4, refresh = 0 ) # coded as character, so convert to integer yrep_char <- posterior_predict(fit) print(yrep_char[1, 1:4]) yrep_int <- sapply(data.frame(yrep_char, stringsAsFactors = TRUE), as.integer) y_int <- as.integer(esoph$tobgp) ppc_bars(y_int, yrep_int) ppc_bars_grouped( y = y_int, yrep = yrep_int, group = esoph$agegp, freq=FALSE, prob = 0.5, fatten = 1, size = 1.5 ) ## End(Not run) # rootograms for counts y <- rpois(100, 20) yrep <- matrix(rpois(10000, 20), ncol = 100) color_scheme_set("brightblue") ppc_rootogram(y, yrep) ppc_rootogram(y, yrep, prob = 0) ppc_rootogram(y, yrep, style = "hanging", prob = 0.8) ppc_rootogram(y, yrep, style = "suspended")
set.seed(9222017) # bar plots f <- function(N) { sample(1:4, size = N, replace = TRUE, prob = c(0.25, 0.4, 0.1, 0.25)) } y <- f(100) yrep <- t(replicate(500, f(100))) dim(yrep) group <- gl(2, 50, length = 100, labels = c("GroupA", "GroupB")) color_scheme_set("mix-pink-blue") ppc_bars(y, yrep) # split by group, change interval width, and display proportion # instead of count on y-axis color_scheme_set("mix-blue-pink") ppc_bars_grouped(y, yrep, group, prob = 0.5, freq = FALSE) ## Not run: # example for ordinal regression using rstanarm library(rstanarm) fit <- stan_polr( tobgp ~ agegp, data = esoph, method = "probit", prior = R2(0.2, "mean"), init_r = 0.1, seed = 12345, # cores = 4, refresh = 0 ) # coded as character, so convert to integer yrep_char <- posterior_predict(fit) print(yrep_char[1, 1:4]) yrep_int <- sapply(data.frame(yrep_char, stringsAsFactors = TRUE), as.integer) y_int <- as.integer(esoph$tobgp) ppc_bars(y_int, yrep_int) ppc_bars_grouped( y = y_int, yrep = yrep_int, group = esoph$agegp, freq=FALSE, prob = 0.5, fatten = 1, size = 1.5 ) ## End(Not run) # rootograms for counts y <- rpois(100, 20) yrep <- matrix(rpois(10000, 20), ncol = 100) color_scheme_set("brightblue") ppc_rootogram(y, yrep) ppc_rootogram(y, yrep, prob = 0) ppc_rootogram(y, yrep, style = "hanging", prob = 0.8) ppc_rootogram(y, yrep, style = "suspended")
Compare the empirical distribution of the data y
to the distributions of
simulated/replicated data yrep
from the posterior predictive distribution.
See the Plot Descriptions section, below, for details.
ppc_data(y, yrep, group = NULL) ppc_dens_overlay( y, yrep, ..., size = 0.25, alpha = 0.7, trim = FALSE, bw = "nrd0", adjust = 1, kernel = "gaussian", n_dens = 1024 ) ppc_dens_overlay_grouped( y, yrep, group, ..., size = 0.25, alpha = 0.7, trim = FALSE, bw = "nrd0", adjust = 1, kernel = "gaussian", n_dens = 1024 ) ppc_ecdf_overlay( y, yrep, ..., discrete = FALSE, pad = TRUE, size = 0.25, alpha = 0.7 ) ppc_ecdf_overlay_grouped( y, yrep, group, ..., discrete = FALSE, pad = TRUE, size = 0.25, alpha = 0.7 ) ppc_dens(y, yrep, ..., trim = FALSE, size = 0.5, alpha = 1) ppc_hist( y, yrep, ..., binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE ) ppc_freqpoly( y, yrep, ..., binwidth = NULL, bins = NULL, freq = TRUE, size = 0.5, alpha = 1 ) ppc_freqpoly_grouped( y, yrep, group, ..., binwidth = NULL, bins = NULL, freq = TRUE, size = 0.5, alpha = 1 ) ppc_boxplot(y, yrep, ..., notch = TRUE, size = 0.5, alpha = 1) ppc_violin_grouped( y, yrep, group, ..., probs = c(0.1, 0.5, 0.9), size = 1, alpha = 1, y_draw = c("violin", "points", "both"), y_size = 1, y_alpha = 1, y_jitter = 0.1 ) ppc_pit_ecdf( y, yrep, ..., pit = NULL, K = NULL, prob = 0.99, plot_diff = FALSE, interpolate_adj = NULL ) ppc_pit_ecdf_grouped( y, yrep, group, ..., K = NULL, pit = NULL, prob = 0.99, plot_diff = FALSE, interpolate_adj = NULL )
ppc_data(y, yrep, group = NULL) ppc_dens_overlay( y, yrep, ..., size = 0.25, alpha = 0.7, trim = FALSE, bw = "nrd0", adjust = 1, kernel = "gaussian", n_dens = 1024 ) ppc_dens_overlay_grouped( y, yrep, group, ..., size = 0.25, alpha = 0.7, trim = FALSE, bw = "nrd0", adjust = 1, kernel = "gaussian", n_dens = 1024 ) ppc_ecdf_overlay( y, yrep, ..., discrete = FALSE, pad = TRUE, size = 0.25, alpha = 0.7 ) ppc_ecdf_overlay_grouped( y, yrep, group, ..., discrete = FALSE, pad = TRUE, size = 0.25, alpha = 0.7 ) ppc_dens(y, yrep, ..., trim = FALSE, size = 0.5, alpha = 1) ppc_hist( y, yrep, ..., binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE ) ppc_freqpoly( y, yrep, ..., binwidth = NULL, bins = NULL, freq = TRUE, size = 0.5, alpha = 1 ) ppc_freqpoly_grouped( y, yrep, group, ..., binwidth = NULL, bins = NULL, freq = TRUE, size = 0.5, alpha = 1 ) ppc_boxplot(y, yrep, ..., notch = TRUE, size = 0.5, alpha = 1) ppc_violin_grouped( y, yrep, group, ..., probs = c(0.1, 0.5, 0.9), size = 1, alpha = 1, y_draw = c("violin", "points", "both"), y_size = 1, y_alpha = 1, y_jitter = 0.1 ) ppc_pit_ecdf( y, yrep, ..., pit = NULL, K = NULL, prob = 0.99, plot_diff = FALSE, interpolate_adj = NULL ) ppc_pit_ecdf_grouped( y, yrep, group, ..., K = NULL, pit = NULL, prob = 0.99, plot_diff = FALSE, interpolate_adj = NULL )
y |
A vector of observations. See Details. |
yrep |
An |
group |
A grouping variable of the same length as |
... |
Currently unused. |
size , alpha
|
Passed to the appropriate geom to control the appearance of the predictive distributions. |
trim |
A logical scalar passed to |
bw , adjust , kernel , n_dens
|
Optional arguments passed to
|
discrete |
For |
pad |
A logical scalar passed to |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
freq |
For histograms, |
notch |
For the box plot, a logical scalar passed to
|
probs |
A numeric vector passed to |
y_draw |
For |
y_jitter , y_size , y_alpha
|
For |
pit |
An optional vector of probability integral transformed values for
which the ECDF is to be drawn. If NULL, PIT values are computed to |
K |
An optional integer defining the number of equally spaced evaluation
points for the PIT-ECDF. Reducing K when using |
prob |
The desired simultaneous coverage level of the bands around the ECDF. A value in (0,1). |
plot_diff |
A boolean defining whether to plot the difference between
the observed PIT- ECDF and the theoretical expectation for uniform PIT
values rather than plotting the regular ECDF. The default is |
interpolate_adj |
A boolean defining if the simultaneous confidence
bands should be interpolated based on precomputed values rather than
computed exactly. Computing the bands may be computationally intensive and
the approximation gives a fast method for assessing the ECDF trajectory.
The default is to use interpolation if |
For Binomial data, the plots may be more useful if the input contains the "success" proportions (not discrete "success" or "failure" counts).
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
ppc_hist(), ppc_freqpoly(), ppc_dens(), ppc_boxplot()
A separate histogram, shaded frequency polygon, smoothed kernel density
estimate, or box and whiskers plot is displayed for y
and each
dataset (row) in yrep
. For these plots yrep
should therefore
contain only a small number of rows. See the Examples section.
ppc_freqpoly_grouped()
A separate frequency polygon is plotted for each level of a grouping
variable for y
and each dataset (row) in yrep
. For this plot
yrep
should therefore contain only a small number of rows. See the
Examples section.
ppc_ecdf_overlay(), ppc_dens_overlay(), ppc_ecdf_overlay_grouped(), ppc_dens_overlay_grouped()
Kernel density or empirical CDF estimates of each dataset (row) in
yrep
are overlaid, with the distribution of y
itself on top
(and in a darker shade). When using ppc_ecdf_overlay()
with discrete
data, set the discrete
argument to TRUE
for better results.
For an example of ppc_dens_overlay()
also see Gabry et al. (2019).
ppc_violin_grouped()
The density estimate of yrep
within each level of a grouping
variable is plotted as a violin with horizontal lines at notable
quantiles. y
is overlaid on the plot either as a violin, points, or
both, depending on the y_draw
argument.
ppc_pit_ecdf()
, ppc_pit_ecdf_grouped()
The PIT-ECDF of the empirical PIT values of y
computed with respect to
the corresponding yrep
values. 100 * prob
% central simultaneous
confidence intervals are provided to asses if y
and yrep
originate
from the same distribution. The PIT values can also be provided directly
as pit
.
See Säilynoja et al. (2021) for more details.
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Säilynoja, T., Bürkner, P., Vehtari, A. (2021). Graphical Test for Discrete Uniformity and its Applications in Goodness of Fit Evaluation and Multiple Sample Comparison arXiv preprint.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
Other PPCs:
PPC-censoring
,
PPC-discrete
,
PPC-errors
,
PPC-intervals
,
PPC-loo
,
PPC-overview
,
PPC-scatterplots
,
PPC-test-statistics
color_scheme_set("brightblue") y <- example_y_data() yrep <- example_yrep_draws() group <- example_group_data() dim(yrep) ppc_dens_overlay(y, yrep[1:25, ]) # ppc_ecdf_overlay with continuous data (set discrete=TRUE if discrete data) ppc_ecdf_overlay(y, yrep[sample(nrow(yrep), 25), ]) # PIT-ECDF and PIT-ECDF difference plot of the PIT values of y compared to # yrep with 99% simultaneous confidence bands. ppc_pit_ecdf(y, yrep, prob = 0.99, plot_diff = FALSE) ppc_pit_ecdf(y, yrep, prob = 0.99, plot_diff = TRUE) # for ppc_hist,dens,freqpoly,boxplot definitely use a subset yrep rows so # only a few (instead of nrow(yrep)) histograms are plotted ppc_hist(y, yrep[1:8, ]) color_scheme_set("red") ppc_boxplot(y, yrep[1:8, ]) # wizard hat plot color_scheme_set("blue") ppc_dens(y, yrep[200:202, ]) # frequency polygons ppc_freqpoly(y, yrep[1:3, ], alpha = 0.1, size = 1, binwidth = 5) ppc_freqpoly_grouped(y, yrep[1:3, ], group) + yaxis_text() # if groups are different sizes then the 'freq' argument can be useful ppc_freqpoly_grouped(y, yrep[1:3, ], group, freq = FALSE) + yaxis_text() # density and distribution overlays by group ppc_dens_overlay_grouped(y, yrep[1:25, ], group = group) ppc_ecdf_overlay_grouped(y, yrep[1:25, ], group = group) # PIT-ECDF plots of the PIT values by group # with 99% simultaneous confidence bands. ppc_pit_ecdf_grouped(y, yrep, group=group, prob=0.99) # don't need to only use small number of rows for ppc_violin_grouped # (as it pools yrep draws within groups) color_scheme_set("gray") ppc_violin_grouped(y, yrep, group, size = 1.5) ppc_violin_grouped(y, yrep, group, alpha = 0) # change how y is drawn ppc_violin_grouped(y, yrep, group, alpha = 0, y_draw = "points", y_size = 1.5) ppc_violin_grouped(y, yrep, group, alpha = 0, y_draw = "both", y_size = 1.5, y_alpha = 0.5, y_jitter = 0.33 )
color_scheme_set("brightblue") y <- example_y_data() yrep <- example_yrep_draws() group <- example_group_data() dim(yrep) ppc_dens_overlay(y, yrep[1:25, ]) # ppc_ecdf_overlay with continuous data (set discrete=TRUE if discrete data) ppc_ecdf_overlay(y, yrep[sample(nrow(yrep), 25), ]) # PIT-ECDF and PIT-ECDF difference plot of the PIT values of y compared to # yrep with 99% simultaneous confidence bands. ppc_pit_ecdf(y, yrep, prob = 0.99, plot_diff = FALSE) ppc_pit_ecdf(y, yrep, prob = 0.99, plot_diff = TRUE) # for ppc_hist,dens,freqpoly,boxplot definitely use a subset yrep rows so # only a few (instead of nrow(yrep)) histograms are plotted ppc_hist(y, yrep[1:8, ]) color_scheme_set("red") ppc_boxplot(y, yrep[1:8, ]) # wizard hat plot color_scheme_set("blue") ppc_dens(y, yrep[200:202, ]) # frequency polygons ppc_freqpoly(y, yrep[1:3, ], alpha = 0.1, size = 1, binwidth = 5) ppc_freqpoly_grouped(y, yrep[1:3, ], group) + yaxis_text() # if groups are different sizes then the 'freq' argument can be useful ppc_freqpoly_grouped(y, yrep[1:3, ], group, freq = FALSE) + yaxis_text() # density and distribution overlays by group ppc_dens_overlay_grouped(y, yrep[1:25, ], group = group) ppc_ecdf_overlay_grouped(y, yrep[1:25, ], group = group) # PIT-ECDF plots of the PIT values by group # with 99% simultaneous confidence bands. ppc_pit_ecdf_grouped(y, yrep, group=group, prob=0.99) # don't need to only use small number of rows for ppc_violin_grouped # (as it pools yrep draws within groups) color_scheme_set("gray") ppc_violin_grouped(y, yrep, group, size = 1.5) ppc_violin_grouped(y, yrep, group, alpha = 0) # change how y is drawn ppc_violin_grouped(y, yrep, group, alpha = 0, y_draw = "points", y_size = 1.5) ppc_violin_grouped(y, yrep, group, alpha = 0, y_draw = "both", y_size = 1.5, y_alpha = 0.5, y_jitter = 0.33 )
Various plots of predictive errors y - yrep
. See the
Details and Plot Descriptions sections, below.
ppc_error_hist( y, yrep, ..., facet_args = list(), binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE ) ppc_error_hist_grouped( y, yrep, group, ..., facet_args = list(), binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE ) ppc_error_scatter(y, yrep, ..., facet_args = list(), size = 2.5, alpha = 0.8) ppc_error_scatter_avg(y, yrep, ..., size = 2.5, alpha = 0.8) ppc_error_scatter_avg_grouped( y, yrep, group, ..., facet_args = list(), size = 2.5, alpha = 0.8 ) ppc_error_scatter_avg_vs_x(y, yrep, x, ..., size = 2.5, alpha = 0.8) ppc_error_binned( y, yrep, ..., facet_args = list(), bins = NULL, size = 1, alpha = 0.25 ) ppc_error_data(y, yrep, group = NULL)
ppc_error_hist( y, yrep, ..., facet_args = list(), binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE ) ppc_error_hist_grouped( y, yrep, group, ..., facet_args = list(), binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE ) ppc_error_scatter(y, yrep, ..., facet_args = list(), size = 2.5, alpha = 0.8) ppc_error_scatter_avg(y, yrep, ..., size = 2.5, alpha = 0.8) ppc_error_scatter_avg_grouped( y, yrep, group, ..., facet_args = list(), size = 2.5, alpha = 0.8 ) ppc_error_scatter_avg_vs_x(y, yrep, x, ..., size = 2.5, alpha = 0.8) ppc_error_binned( y, yrep, ..., facet_args = list(), bins = NULL, size = 1, alpha = 0.25 ) ppc_error_data(y, yrep, group = NULL)
y |
A vector of observations. See Details. |
yrep |
An |
... |
Currently unused. |
facet_args |
A named list of arguments (other than |
binwidth |
Passed to |
bins |
For |
breaks |
Passed to |
freq |
For histograms, |
group |
A grouping variable of the same length as |
size , alpha
|
For scatterplots, arguments passed to
|
x |
A numeric vector the same length as |
All of these functions (aside from the *_scatter_avg
functions)
compute and plot predictive errors for each row of the matrix yrep
, so
it is usually a good idea for yrep
to contain only a small number of
draws (rows). See Examples, below.
For binomial and Bernoulli data the ppc_error_binned()
function can be used
to generate binned error plots. Bernoulli data can be input as a vector of 0s
and 1s, whereas for binomial data y
and yrep
should contain "success"
proportions (not counts). See the Examples section, below.
A ggplot object that can be further customized using the ggplot2 package.
ppc_error_hist()
A separate histogram is plotted for the predictive errors computed from
y
and each dataset (row) in yrep
. For this plot yrep
should have
only a small number of rows.
ppc_error_hist_grouped()
Like ppc_error_hist()
, except errors are computed within levels of a
grouping variable. The number of histograms is therefore equal to the
product of the number of rows in yrep
and the number of groups
(unique values of group
).
ppc_error_scatter()
A separate scatterplot is displayed for y
vs. the predictive errors
computed from y
and each dataset (row) in yrep
. For this plot yrep
should have only a small number of rows.
ppc_error_scatter_avg()
A single scatterplot of y
vs. the average of the errors computed from
y
and each dataset (row) in yrep
. For each individual data point
y[n]
the average error is the average of the errors for y[n]
computed
over the the draws from the posterior predictive distribution.
ppc_error_scatter_avg_vs_x()
Same as ppc_error_scatter_avg()
, except the average is plotted on the
y-axis and a predictor variable x
is plotted on the x-axis.
ppc_error_binned()
Intended for use with binomial data. A separate binned error plot (similar
to arm::binnedplot()
) is generated for each dataset (row) in yrep
. For
this plot y
and yrep
should contain proportions rather than counts,
and yrep
should have only a small number of rows.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
Other PPCs:
PPC-censoring
,
PPC-discrete
,
PPC-distributions
,
PPC-intervals
,
PPC-loo
,
PPC-overview
,
PPC-scatterplots
,
PPC-test-statistics
y <- example_y_data() yrep <- example_yrep_draws() ppc_error_hist(y, yrep[1:3, ]) # errors within groups group <- example_group_data() (p1 <- ppc_error_hist_grouped(y, yrep[1:3, ], group)) p1 + yaxis_text() # defaults to showing counts on y-axis table(group) # more obs in GroupB, can set freq=FALSE to show density on y-axis (p2 <- ppc_error_hist_grouped(y, yrep[1:3, ], group, freq = FALSE)) p2 + yaxis_text() # scatterplots ppc_error_scatter(y, yrep[10:14, ]) ppc_error_scatter_avg(y, yrep) x <- example_x_data() ppc_error_scatter_avg_vs_x(y, yrep, x) ## Not run: # binned error plot with binomial model from rstanarm library(rstanarm) example("example_model", package = "rstanarm") formula(example_model) # get observed proportion of "successes" y <- example_model$y # matrix of "success" and "failure" counts trials <- rowSums(y) y_prop <- y[, 1] / trials # proportions # get predicted success proportions yrep <- posterior_predict(example_model) yrep_prop <- sweep(yrep, 2, trials, "/") ppc_error_binned(y_prop, yrep_prop[1:6, ]) ## End(Not run)
y <- example_y_data() yrep <- example_yrep_draws() ppc_error_hist(y, yrep[1:3, ]) # errors within groups group <- example_group_data() (p1 <- ppc_error_hist_grouped(y, yrep[1:3, ], group)) p1 + yaxis_text() # defaults to showing counts on y-axis table(group) # more obs in GroupB, can set freq=FALSE to show density on y-axis (p2 <- ppc_error_hist_grouped(y, yrep[1:3, ], group, freq = FALSE)) p2 + yaxis_text() # scatterplots ppc_error_scatter(y, yrep[10:14, ]) ppc_error_scatter_avg(y, yrep) x <- example_x_data() ppc_error_scatter_avg_vs_x(y, yrep, x) ## Not run: # binned error plot with binomial model from rstanarm library(rstanarm) example("example_model", package = "rstanarm") formula(example_model) # get observed proportion of "successes" y <- example_model$y # matrix of "success" and "failure" counts trials <- rowSums(y) y_prop <- y[, 1] / trials # proportions # get predicted success proportions yrep <- posterior_predict(example_model) yrep_prop <- sweep(yrep, 2, trials, "/") ppc_error_binned(y_prop, yrep_prop[1:6, ]) ## End(Not run)
Medians and central interval estimates of yrep
with y
overlaid.
See the Plot Descriptions section, below.
ppc_intervals( y, yrep, x = NULL, ..., prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 1, fatten = 2.5, linewidth = 1 ) ppc_intervals_grouped( y, yrep, x = NULL, group, ..., facet_args = list(), prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 1, fatten = 2.5, linewidth = 1 ) ppc_ribbon( y, yrep, x = NULL, ..., prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 0.25, y_draw = c("line", "points", "both") ) ppc_ribbon_grouped( y, yrep, x = NULL, group, ..., facet_args = list(), prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 0.25, y_draw = c("line", "points", "both") ) ppc_intervals_data( y, yrep, x = NULL, group = NULL, ..., prob = 0.5, prob_outer = 0.9 ) ppc_ribbon_data( y, yrep, x = NULL, group = NULL, ..., prob = 0.5, prob_outer = 0.9 )
ppc_intervals( y, yrep, x = NULL, ..., prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 1, fatten = 2.5, linewidth = 1 ) ppc_intervals_grouped( y, yrep, x = NULL, group, ..., facet_args = list(), prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 1, fatten = 2.5, linewidth = 1 ) ppc_ribbon( y, yrep, x = NULL, ..., prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 0.25, y_draw = c("line", "points", "both") ) ppc_ribbon_grouped( y, yrep, x = NULL, group, ..., facet_args = list(), prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 0.25, y_draw = c("line", "points", "both") ) ppc_intervals_data( y, yrep, x = NULL, group = NULL, ..., prob = 0.5, prob_outer = 0.9 ) ppc_ribbon_data( y, yrep, x = NULL, group = NULL, ..., prob = 0.5, prob_outer = 0.9 )
y |
A vector of observations. See Details. |
yrep |
An |
x |
A numeric vector to use as the x-axis
variable. For example, |
... |
Currently unused. |
prob , prob_outer
|
Values between |
alpha , size , fatten , linewidth
|
Arguments passed to geoms. For ribbon
plots |
group |
A grouping variable of the same length as |
facet_args |
A named list of arguments (other than |
y_draw |
For ribbon plots only, a string specifying how to draw |
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
ppc_intervals(), ppc_ribbon()
100*prob
% central intervals for yrep
at each x
value. ppc_intervals()
plots intervals as vertical bars with points
indicating yrep
medians and darker points indicating observed
y
values. ppc_ribbon()
plots a ribbon of connected intervals
with a line through the median of yrep
and a darker line connecting
observed y
values. In both cases an optional x
variable can
also be specified for the x-axis variable.
Depending on the number of observations and the variability in the
predictions at different values of x
, one of these plots may be easier
to read than the other.
ppc_intervals_grouped(), ppc_ribbon_grouped()
Same as ppc_intervals()
and ppc_ribbon()
, respectively, but a
separate plot (facet) is generated for each level of a grouping variable.
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
Other PPCs:
PPC-censoring
,
PPC-discrete
,
PPC-distributions
,
PPC-errors
,
PPC-loo
,
PPC-overview
,
PPC-scatterplots
,
PPC-test-statistics
y <- rnorm(50) yrep <- matrix(rnorm(5000, 0, 2), ncol = 50) color_scheme_set("brightblue") ppc_intervals(y, yrep) ppc_ribbon(y, yrep) ppc_ribbon(y, yrep, y_draw = "points") ## Not run: ppc_ribbon(y, yrep, y_draw = "both") ## End(Not run) ppc_intervals(y, yrep, size = 1.5, fatten = 0) # remove the yrep point estimates color_scheme_set("teal") year <- 1950:1999 ppc_intervals(y, yrep, x = year, fatten = 1) + ggplot2::xlab("Year") ppc_ribbon(y, yrep, x = year) + ggplot2::xlab("Year") color_scheme_set("pink") year <- rep(2000:2009, each = 5) group <- gl(5, 1, length = 50, labels = LETTERS[1:5]) ppc_ribbon_grouped(y, yrep, x = year, group, y_draw = "both") + ggplot2::scale_x_continuous(breaks = pretty) ppc_ribbon_grouped(y, yrep, x = year, group, facet_args = list(scales = "fixed")) + xaxis_text(FALSE) + xaxis_ticks(FALSE) + panel_bg(fill = "gray20") # get the data frames used to make the ggplots ppc_dat <- ppc_intervals_data(y, yrep, x = year, prob = 0.5) ppc_group_dat <- ppc_intervals_data(y, yrep, x = year, group = group, prob = 0.5) ## Not run: library("rstanarm") fit <- stan_glmer(mpg ~ wt + (1|cyl), data = mtcars, refresh = 0) yrep <- posterior_predict(fit) color_scheme_set("purple") ppc_intervals(y = mtcars$mpg, yrep = yrep, x = mtcars$wt, prob = 0.8) + panel_bg(fill="gray90", color = NA) + grid_lines(color = "white") ppc_ribbon(y = mtcars$mpg, yrep = yrep, x = mtcars$wt, prob = 0.6, prob_outer = 0.8) ppc_ribbon_grouped(y = mtcars$mpg, yrep = yrep, x = mtcars$wt, group = mtcars$cyl) color_scheme_set("gray") ppc_intervals(mtcars$mpg, yrep, prob = 0.5) + ggplot2::scale_x_continuous( labels = rownames(mtcars), breaks = 1:nrow(mtcars) ) + xaxis_text(angle = -70, vjust = 1, hjust = 0) + xaxis_title(FALSE) ## End(Not run)
y <- rnorm(50) yrep <- matrix(rnorm(5000, 0, 2), ncol = 50) color_scheme_set("brightblue") ppc_intervals(y, yrep) ppc_ribbon(y, yrep) ppc_ribbon(y, yrep, y_draw = "points") ## Not run: ppc_ribbon(y, yrep, y_draw = "both") ## End(Not run) ppc_intervals(y, yrep, size = 1.5, fatten = 0) # remove the yrep point estimates color_scheme_set("teal") year <- 1950:1999 ppc_intervals(y, yrep, x = year, fatten = 1) + ggplot2::xlab("Year") ppc_ribbon(y, yrep, x = year) + ggplot2::xlab("Year") color_scheme_set("pink") year <- rep(2000:2009, each = 5) group <- gl(5, 1, length = 50, labels = LETTERS[1:5]) ppc_ribbon_grouped(y, yrep, x = year, group, y_draw = "both") + ggplot2::scale_x_continuous(breaks = pretty) ppc_ribbon_grouped(y, yrep, x = year, group, facet_args = list(scales = "fixed")) + xaxis_text(FALSE) + xaxis_ticks(FALSE) + panel_bg(fill = "gray20") # get the data frames used to make the ggplots ppc_dat <- ppc_intervals_data(y, yrep, x = year, prob = 0.5) ppc_group_dat <- ppc_intervals_data(y, yrep, x = year, group = group, prob = 0.5) ## Not run: library("rstanarm") fit <- stan_glmer(mpg ~ wt + (1|cyl), data = mtcars, refresh = 0) yrep <- posterior_predict(fit) color_scheme_set("purple") ppc_intervals(y = mtcars$mpg, yrep = yrep, x = mtcars$wt, prob = 0.8) + panel_bg(fill="gray90", color = NA) + grid_lines(color = "white") ppc_ribbon(y = mtcars$mpg, yrep = yrep, x = mtcars$wt, prob = 0.6, prob_outer = 0.8) ppc_ribbon_grouped(y = mtcars$mpg, yrep = yrep, x = mtcars$wt, group = mtcars$cyl) color_scheme_set("gray") ppc_intervals(mtcars$mpg, yrep, prob = 0.5) + ggplot2::scale_x_continuous( labels = rownames(mtcars), breaks = 1:nrow(mtcars) ) + xaxis_text(angle = -70, vjust = 1, hjust = 0) + xaxis_title(FALSE) ## End(Not run)
Leave-One-Out (LOO) predictive checks. See the Plot Descriptions section, below, and Gabry et al. (2019) for details.
ppc_loo_pit_overlay( y, yrep, lw = NULL, ..., psis_object = NULL, pit = NULL, samples = 100, size = 0.25, alpha = 0.7, boundary_correction = TRUE, grid_len = 512, bw = "nrd0", trim = FALSE, adjust = 1, kernel = "gaussian", n_dens = 1024 ) ppc_loo_pit_data( y, yrep, lw = NULL, ..., psis_object = NULL, pit = NULL, samples = 100, bw = "nrd0", boundary_correction = TRUE, grid_len = 512 ) ppc_loo_pit_qq( y, yrep, lw = NULL, ..., psis_object = NULL, pit = NULL, compare = c("uniform", "normal"), size = 2, alpha = 1 ) ppc_loo_pit( y, yrep, lw, pit = NULL, compare = c("uniform", "normal"), ..., size = 2, alpha = 1 ) ppc_loo_intervals( y, yrep, psis_object, ..., subset = NULL, intervals = NULL, prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 1, fatten = 2.5, linewidth = 1, order = c("index", "median") ) ppc_loo_ribbon( y, yrep, psis_object, ..., subset = NULL, intervals = NULL, prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 0.25 )
ppc_loo_pit_overlay( y, yrep, lw = NULL, ..., psis_object = NULL, pit = NULL, samples = 100, size = 0.25, alpha = 0.7, boundary_correction = TRUE, grid_len = 512, bw = "nrd0", trim = FALSE, adjust = 1, kernel = "gaussian", n_dens = 1024 ) ppc_loo_pit_data( y, yrep, lw = NULL, ..., psis_object = NULL, pit = NULL, samples = 100, bw = "nrd0", boundary_correction = TRUE, grid_len = 512 ) ppc_loo_pit_qq( y, yrep, lw = NULL, ..., psis_object = NULL, pit = NULL, compare = c("uniform", "normal"), size = 2, alpha = 1 ) ppc_loo_pit( y, yrep, lw, pit = NULL, compare = c("uniform", "normal"), ..., size = 2, alpha = 1 ) ppc_loo_intervals( y, yrep, psis_object, ..., subset = NULL, intervals = NULL, prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 1, fatten = 2.5, linewidth = 1, order = c("index", "median") ) ppc_loo_ribbon( y, yrep, psis_object, ..., subset = NULL, intervals = NULL, prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 0.25 )
y |
A vector of observations. See Details. |
yrep |
An |
lw |
A matrix of (smoothed) log weights with the same dimensions as
|
... |
Currently unused. |
psis_object |
If using loo version |
pit |
For |
samples |
For |
alpha , size , fatten , linewidth
|
Arguments passed to code geoms to control plot
aesthetics. For |
boundary_correction |
For |
grid_len |
For |
bw , adjust , kernel , n_dens
|
Optional arguments passed to
|
trim |
Passed to |
compare |
For |
subset |
For |
intervals |
For |
prob , prob_outer
|
Values between |
order |
For |
A ggplot object that can be further customized using the ggplot2 package.
ppc_loo_pit_overlay()
, ppc_loo_pit_qq()
The calibration of marginal predictions can be assessed using probability integral transformation (PIT) checks. LOO improves the check by avoiding the double use of data. See the section on marginal predictive checks in Gelman et al. (2013, p. 152–153) and section 5 of Gabry et al. (2019) for an example of using bayesplot for these checks.
The LOO PIT values are asymptotically uniform (for continuous data) if the
model is calibrated. The ppc_loo_pit_overlay()
function creates a plot
comparing the density of the LOO PITs (thick line) to the density estimates
of many simulated data sets from the standard uniform distribution (thin
lines). See Gabry et al. (2019) for an example of interpreting the shape of
the miscalibration that can be observed in these plots.
The ppc_loo_pit_qq()
function provides an alternative visualization of
the miscalibration with a quantile-quantile (Q-Q) plot comparing the LOO
PITs to the standard uniform distribution. Comparing to the uniform is not
good for extreme probabilities close to 0 and 1, so it can sometimes be
useful to set the compare
argument to "normal"
, which will
produce a Q-Q plot comparing standard normal quantiles calculated from the
PIT values to the theoretical standard normal quantiles. This can help see
the (mis)calibration better for the extreme values. However, in most cases
we have found that the overlaid density plot (ppc_loo_pit_overlay()
)
function will provide a clearer picture of calibration problems than the
Q-Q plot.
ppc_loo_intervals()
, ppc_loo_ribbon()
Similar to ppc_intervals()
and ppc_ribbon()
but the intervals are for
the LOO predictive distribution.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (p. 152–153)
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4. arXiv preprint: https://arxiv.org/abs/1507.04544
Boneva, L. I., Kendall, D., & Stefanov, I. (1971). Spline transformations: Three new diagnostic aids for the statistical data-analyst. J. R. Stat. Soc. B (Methodological), 33(1), 1-71. https://www.jstor.org/stable/2986005.
Other PPCs:
PPC-censoring
,
PPC-discrete
,
PPC-distributions
,
PPC-errors
,
PPC-intervals
,
PPC-overview
,
PPC-scatterplots
,
PPC-test-statistics
## Not run: library(rstanarm) library(loo) head(radon) fit <- stan_lmer( log_radon ~ floor + log_uranium + floor:log_uranium + (1 + floor | county), data = radon, iter = 100, chains = 2, cores = 2 ) y <- radon$log_radon yrep <- posterior_predict(fit) loo1 <- loo(fit, save_psis = TRUE, cores = 4) psis1 <- loo1$psis_object lw <- weights(psis1) # normalized log weights # marginal predictive check using LOO probability integral transform color_scheme_set("orange") ppc_loo_pit_overlay(y, yrep, lw = lw) ppc_loo_pit_qq(y, yrep, lw = lw) ppc_loo_pit_qq(y, yrep, lw = lw, compare = "normal") # can use the psis object instead of lw ppc_loo_pit_qq(y, yrep, psis_object = psis1) # loo predictive intervals vs observations keep_obs <- 1:50 ppc_loo_intervals(y, yrep, psis_object = psis1, subset = keep_obs) color_scheme_set("gray") ppc_loo_intervals(y, yrep, psis_object = psis1, subset = keep_obs, order = "median") ## End(Not run)
## Not run: library(rstanarm) library(loo) head(radon) fit <- stan_lmer( log_radon ~ floor + log_uranium + floor:log_uranium + (1 + floor | county), data = radon, iter = 100, chains = 2, cores = 2 ) y <- radon$log_radon yrep <- posterior_predict(fit) loo1 <- loo(fit, save_psis = TRUE, cores = 4) psis1 <- loo1$psis_object lw <- weights(psis1) # normalized log weights # marginal predictive check using LOO probability integral transform color_scheme_set("orange") ppc_loo_pit_overlay(y, yrep, lw = lw) ppc_loo_pit_qq(y, yrep, lw = lw) ppc_loo_pit_qq(y, yrep, lw = lw, compare = "normal") # can use the psis object instead of lw ppc_loo_pit_qq(y, yrep, psis_object = psis1) # loo predictive intervals vs observations keep_obs <- 1:50 ppc_loo_intervals(y, yrep, psis_object = psis1, subset = keep_obs) color_scheme_set("gray") ppc_loo_intervals(y, yrep, psis_object = psis1, subset = keep_obs, order = "median") ## End(Not run)
The bayesplot PPC module provides various plotting functions for creating graphical displays comparing observed data to simulated data from the posterior (or prior) predictive distribution. See the sections below for a brief discussion of the ideas behind posterior predictive checking, an overview of the available PPC plots, and tips on providing an interface to bayesplot from another package.
For plots of posterior (or prior) predictive distributions that do not include observed data see PPD-overview instead.
The idea behind posterior predictive checking is simple: if a model is a good fit then we should be able to use it to generate data that looks a lot like the data we observed.
To generate the data used for posterior predictive checks we simulate from
the posterior predictive distribution. The posterior predictive
distribution is the distribution of the outcome variable implied by a model
after using the observed data (a vector of outcome values), and
typically predictors
, to update our beliefs about the unknown
parameters
in the model. For each draw of the parameters
from the posterior distribution
we generate an entire vector of outcomes. The result is
an
matrix of simulations, where
is the the
size of the posterior sample (number of draws from the posterior
distribution) and
is the number of data points in
. That is,
each row of the matrix is an individual "replicated" dataset of
observations.
When simulating from the posterior predictive distribution we can use either
the same values of the predictors that we used when fitting the model
or new observations of those predictors. When we use the same values of
we denote the resulting simulations by
as they
can be thought of as replications of the outcome
rather than
predictions for future observations. This corresponds to the notation from
Gelman et. al. (2013) and is the notation used throughout the documentation
for this package.
Using the datasets drawn from the posterior predictive
distribution, the functions in the bayesplot package produce various
graphical displays comparing the observed data
to the replications.
For a more thorough discussion of posterior predictive checking see
Chapter 6 of Gelman et. al. (2013).
To use bayesplot for prior predictive checks you can simply use draws
from the prior predictive distribution instead of the posterior predictive
distribution. See Gabry et al. (2019) for more on prior predictive checking
and when it is reasonable to compare the prior predictive distribution to the
observed data. If you want to avoid using the observed data for prior
predictive checks then you can use the bayesplot PPD plots instead,
which do not take a y
argument, or you can use the PPC plots but provide
plausible or implausible y
values that you want to compare to the prior
predictive realizations.
The plotting functions for prior and
posterior predictive checking all have the prefix ppc_
and all require
the arguments y
, a vector of observations, and yrep
, a matrix of
replications (in-sample predictions). The plots are organized into several
categories, each with its own documentation:
PPC-distributions: Histograms, kernel density estimates, boxplots, and
other plots comparing the empirical distribution of data y
to the
distributions of individual simulated datasets (rows) in yrep
.
PPC-test-statistics: The distribution of a statistic, or a pair of
statistics, over the simulated datasets (rows) in yrep
compared to value of
the statistic(s) computed from y
.
PPC-intervals: Interval estimates of yrep
with y
overlaid. The x-axis variable can be optionally specified by the user
(e.g. to plot against a predictor variable or over time).
PPC-errors: Plots of predictive errors (y - yrep
) computed from y
and
each of the simulated datasets (rows) in yrep
. For binomial models binned
error plots are also available.
PPC-scatterplots: Scatterplots (and similar visualizations) of the data
y
vs. individual simulated datasets (rows) in yrep
, or vs. the average
value of the distributions of each data point (columns) in yrep
.
PPC-discrete: PPC functions that can only be used if y
and yrep
are
discrete. For example, rootograms for count outcomes and bar plots for
ordinal, categorical, and multinomial outcomes.
PPC-loo: PPC functions for predictive checks based on (approximate) leave-one-out (LOO) cross-validation. '
PPC-censoring: PPC functions comparing the empirical
distribution of censored data y
to the distributions of individual
simulated datasets (rows) in yrep
.
In addition to the various plotting functions, the bayesplot package
provides the S3 generic pp_check()
. Authors of R packages for
Bayesian inference are encouraged to define pp_check()
methods for the
fitted model objects created by their packages. See the package vignettes for
more details and a simple example, and see the rstanarm and brms
packages for full examples of pp_check()
methods.
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
Other PPCs:
PPC-censoring
,
PPC-discrete
,
PPC-distributions
,
PPC-errors
,
PPC-intervals
,
PPC-loo
,
PPC-scatterplots
,
PPC-test-statistics
Scatterplots of the observed data y
vs. simulated/replicated data
yrep
from the posterior predictive distribution. See the
Plot Descriptions and Details sections, below.
ppc_scatter( y, yrep, ..., facet_args = list(), size = 2.5, alpha = 0.8, ref_line = TRUE ) ppc_scatter_avg(y, yrep, ..., size = 2.5, alpha = 0.8, ref_line = TRUE) ppc_scatter_avg_grouped( y, yrep, group, ..., facet_args = list(), size = 2.5, alpha = 0.8, ref_line = TRUE ) ppc_scatter_data(y, yrep) ppc_scatter_avg_data(y, yrep, group = NULL)
ppc_scatter( y, yrep, ..., facet_args = list(), size = 2.5, alpha = 0.8, ref_line = TRUE ) ppc_scatter_avg(y, yrep, ..., size = 2.5, alpha = 0.8, ref_line = TRUE) ppc_scatter_avg_grouped( y, yrep, group, ..., facet_args = list(), size = 2.5, alpha = 0.8, ref_line = TRUE ) ppc_scatter_data(y, yrep) ppc_scatter_avg_data(y, yrep, group = NULL)
y |
A vector of observations. See Details. |
yrep |
An |
... |
Currently unused. |
facet_args |
A named list of arguments (other than |
size , alpha
|
Arguments passed to |
ref_line |
If |
group |
A grouping variable of the same length as |
For Binomial data, the plots may be more useful if the input contains the "success" proportions (not discrete "success" or "failure" counts).
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
ppc_scatter()
For each dataset (row) in yrep
a scatterplot is generated showing y
against that row of yrep
. For this plot yrep
should only contain a
small number of rows.
ppc_scatter_avg()
A single scatterplot of y
against the average values of yrep
, i.e.,
the points (x,y) = (mean(yrep[, n]), y[n])
, where each yrep[, n]
is
a vector of length equal to the number of posterior draws. Unlike
for ppc_scatter()
, for ppc_scatter_avg()
yrep
should contain many
draws (rows).
ppc_scatter_avg_grouped()
The same as ppc_scatter_avg()
, but a separate plot is generated for
each level of a grouping variable.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
Other PPCs:
PPC-censoring
,
PPC-discrete
,
PPC-distributions
,
PPC-errors
,
PPC-intervals
,
PPC-loo
,
PPC-overview
,
PPC-test-statistics
y <- example_y_data() yrep <- example_yrep_draws() p1 <- ppc_scatter_avg(y, yrep) p1 # don't draw line x=y ppc_scatter_avg(y, yrep, ref_line = FALSE) p2 <- ppc_scatter(y, yrep[20:23, ], alpha = 0.5, size = 1.5) p2 # give x and y axes the same limits lims <- ggplot2::lims(x = c(0, 160), y = c(0, 160)) p1 + lims p2 + lims # for ppc_scatter_avg_grouped the default is to allow the facets # to have different x and y axes group <- example_group_data() ppc_scatter_avg_grouped(y, yrep, group) # let x-axis vary but force y-axis to be the same ppc_scatter_avg_grouped(y, yrep, group, facet_args = list(scales = "free_x"))
y <- example_y_data() yrep <- example_yrep_draws() p1 <- ppc_scatter_avg(y, yrep) p1 # don't draw line x=y ppc_scatter_avg(y, yrep, ref_line = FALSE) p2 <- ppc_scatter(y, yrep[20:23, ], alpha = 0.5, size = 1.5) p2 # give x and y axes the same limits lims <- ggplot2::lims(x = c(0, 160), y = c(0, 160)) p1 + lims p2 + lims # for ppc_scatter_avg_grouped the default is to allow the facets # to have different x and y axes group <- example_group_data() ppc_scatter_avg_grouped(y, yrep, group) # let x-axis vary but force y-axis to be the same ppc_scatter_avg_grouped(y, yrep, group, facet_args = list(scales = "free_x"))
The distribution of a (test) statistic T(yrep)
, or a pair of
(test) statistics, over the simulated datasets in yrep
, compared to the
observed value T(y)
computed from the data y
. See the
Plot Descriptions and Details sections, below, as
well as Gabry et al. (2019).
NOTE: Although the default test statistic is the mean, this is unlikely to detect anything interesting in most cases. In general we recommend using some other test statistic as discussed in Section 5 of Gabry et al. (2019).
ppc_stat( y, yrep, stat = "mean", ..., binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE ) ppc_stat_grouped( y, yrep, group, stat = "mean", ..., facet_args = list(), binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE ) ppc_stat_freqpoly( y, yrep, stat = "mean", ..., facet_args = list(), binwidth = NULL, bins = NULL, freq = TRUE ) ppc_stat_freqpoly_grouped( y, yrep, group, stat = "mean", ..., facet_args = list(), binwidth = NULL, bins = NULL, freq = TRUE ) ppc_stat_2d(y, yrep, stat = c("mean", "sd"), ..., size = 2.5, alpha = 0.7) ppc_stat_data(y, yrep, group = NULL, stat)
ppc_stat( y, yrep, stat = "mean", ..., binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE ) ppc_stat_grouped( y, yrep, group, stat = "mean", ..., facet_args = list(), binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE ) ppc_stat_freqpoly( y, yrep, stat = "mean", ..., facet_args = list(), binwidth = NULL, bins = NULL, freq = TRUE ) ppc_stat_freqpoly_grouped( y, yrep, group, stat = "mean", ..., facet_args = list(), binwidth = NULL, bins = NULL, freq = TRUE ) ppc_stat_2d(y, yrep, stat = c("mean", "sd"), ..., size = 2.5, alpha = 0.7) ppc_stat_data(y, yrep, group = NULL, stat)
y |
A vector of observations. See Details. |
yrep |
An |
stat |
A single function or a string naming a function, except for the 2D plot which requires a vector of exactly two names or functions. In all cases the function(s) should take a vector input and return a scalar statistic. If specified as a string (or strings) then the legend will display the function name(s). If specified as a function (or functions) then generic naming is used in the legend. |
... |
Currently unused. |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
freq |
For histograms, |
group |
A grouping variable of the same length as |
facet_args |
A named list of arguments (other than |
size , alpha
|
For the 2D plot only, arguments passed to
|
For Binomial data, the plots may be more useful if the input contains the "success" proportions (not discrete "success" or "failure" counts).
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
ppc_stat()
, ppc_stat_freqpoly()
A histogram or frequency polygon of the distribution of a statistic
computed by applying stat
to each dataset (row) in yrep
. The value of
the statistic in the observed data, stat(y)
, is overlaid as a vertical
line. More details and example usage of ppc_stat()
can be found in Gabry
et al. (2019).
ppc_stat_grouped()
,ppc_stat_freqpoly_grouped()
The same as ppc_stat()
and ppc_stat_freqpoly()
, but a separate plot is
generated for each level of a grouping variable. More details and example
usage of ppc_stat_grouped()
can be found in Gabry et al. (2019).
ppc_stat_2d()
A scatterplot showing the joint distribution of two statistics
computed over the datasets (rows) in yrep
. The value of the
statistics in the observed data is overlaid as large point.
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
Other PPCs:
PPC-censoring
,
PPC-discrete
,
PPC-distributions
,
PPC-errors
,
PPC-intervals
,
PPC-loo
,
PPC-overview
,
PPC-scatterplots
y <- example_y_data() yrep <- example_yrep_draws() ppc_stat(y, yrep, stat = "median") ppc_stat(y, yrep, stat = "sd") + legend_none() # use your own function for the 'stat' argument color_scheme_set("brightblue") q25 <- function(y) quantile(y, 0.25) ppc_stat(y, yrep, stat = "q25") # legend includes function name # can define the function in the 'stat' argument instead of # using its name but then the legend doesn't include the function name ppc_stat(y, yrep, stat = function(y) quantile(y, 0.25)) # plots by group color_scheme_set("teal") group <- example_group_data() ppc_stat_grouped(y, yrep, group, stat = "median") ppc_stat_grouped(y, yrep, group, stat = "mad") + yaxis_text() # force y-axes to have same scales, allow x axis to vary ppc_stat_grouped(y, yrep, group, facet_args = list(scales = "free_x")) + yaxis_text() # the freqpoly plots use frequency polygons instead of histograms ppc_stat_freqpoly(y, yrep, stat = "median") ppc_stat_freqpoly_grouped(y, yrep, group, stat = "median", facet_args = list(nrow = 2)) # ppc_stat_2d allows 2 statistics and makes a scatterplot bayesplot_theme_set(ggplot2::theme_linedraw()) color_scheme_set("viridisE") ppc_stat_2d(y, yrep, stat = c("mean", "sd")) bayesplot_theme_set(ggplot2::theme_grey()) color_scheme_set("brewer-Paired") ppc_stat_2d(y, yrep, stat = c("median", "mad")) # reset aesthetics color_scheme_set() bayesplot_theme_set()
y <- example_y_data() yrep <- example_yrep_draws() ppc_stat(y, yrep, stat = "median") ppc_stat(y, yrep, stat = "sd") + legend_none() # use your own function for the 'stat' argument color_scheme_set("brightblue") q25 <- function(y) quantile(y, 0.25) ppc_stat(y, yrep, stat = "q25") # legend includes function name # can define the function in the 'stat' argument instead of # using its name but then the legend doesn't include the function name ppc_stat(y, yrep, stat = function(y) quantile(y, 0.25)) # plots by group color_scheme_set("teal") group <- example_group_data() ppc_stat_grouped(y, yrep, group, stat = "median") ppc_stat_grouped(y, yrep, group, stat = "mad") + yaxis_text() # force y-axes to have same scales, allow x axis to vary ppc_stat_grouped(y, yrep, group, facet_args = list(scales = "free_x")) + yaxis_text() # the freqpoly plots use frequency polygons instead of histograms ppc_stat_freqpoly(y, yrep, stat = "median") ppc_stat_freqpoly_grouped(y, yrep, group, stat = "median", facet_args = list(nrow = 2)) # ppc_stat_2d allows 2 statistics and makes a scatterplot bayesplot_theme_set(ggplot2::theme_linedraw()) color_scheme_set("viridisE") ppc_stat_2d(y, yrep, stat = c("mean", "sd")) bayesplot_theme_set(ggplot2::theme_grey()) color_scheme_set("brewer-Paired") ppc_stat_2d(y, yrep, stat = c("median", "mad")) # reset aesthetics color_scheme_set() bayesplot_theme_set()
Plot posterior or prior predictive distributions. Each of these functions
makes the same plot as the corresponding ppc_
function
but without plotting any observed data y
. The Plot Descriptions section
at PPC-distributions has details on the individual plots.
ppd_data(ypred, group = NULL) ppd_dens_overlay( ypred, ..., size = 0.25, alpha = 0.7, trim = FALSE, bw = "nrd0", adjust = 1, kernel = "gaussian", n_dens = 1024 ) ppd_ecdf_overlay( ypred, ..., discrete = FALSE, pad = TRUE, size = 0.25, alpha = 0.7 ) ppd_dens(ypred, ..., trim = FALSE, size = 0.5, alpha = 1) ppd_hist(ypred, ..., binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE) ppd_freqpoly( ypred, ..., binwidth = NULL, bins = NULL, freq = TRUE, size = 0.5, alpha = 1 ) ppd_freqpoly_grouped( ypred, group, ..., binwidth = NULL, bins = NULL, freq = TRUE, size = 0.5, alpha = 1 ) ppd_boxplot(ypred, ..., notch = TRUE, size = 0.5, alpha = 1)
ppd_data(ypred, group = NULL) ppd_dens_overlay( ypred, ..., size = 0.25, alpha = 0.7, trim = FALSE, bw = "nrd0", adjust = 1, kernel = "gaussian", n_dens = 1024 ) ppd_ecdf_overlay( ypred, ..., discrete = FALSE, pad = TRUE, size = 0.25, alpha = 0.7 ) ppd_dens(ypred, ..., trim = FALSE, size = 0.5, alpha = 1) ppd_hist(ypred, ..., binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE) ppd_freqpoly( ypred, ..., binwidth = NULL, bins = NULL, freq = TRUE, size = 0.5, alpha = 1 ) ppd_freqpoly_grouped( ypred, group, ..., binwidth = NULL, bins = NULL, freq = TRUE, size = 0.5, alpha = 1 ) ppd_boxplot(ypred, ..., notch = TRUE, size = 0.5, alpha = 1)
ypred |
An |
group |
A grouping variable of the same length as |
... |
Currently unused. |
size , alpha
|
Passed to the appropriate geom to control the appearance of the predictive distributions. |
trim |
A logical scalar passed to |
bw , adjust , kernel , n_dens
|
Optional arguments passed to
|
discrete |
For |
pad |
A logical scalar passed to |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
freq |
For histograms, |
notch |
For the box plot, a logical scalar passed to
|
For Binomial data, the plots may be more useful if the input contains the "success" proportions (not discrete "success" or "failure" counts).
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
Other PPDs:
PPD-intervals
,
PPD-overview
,
PPD-test-statistics
# difference between ppd_dens_overlay() and ppc_dens_overlay() color_scheme_set("brightblue") preds <- example_yrep_draws() ppd_dens_overlay(ypred = preds[1:50, ]) ppc_dens_overlay(y = example_y_data(), yrep = preds[1:50, ])
# difference between ppd_dens_overlay() and ppc_dens_overlay() color_scheme_set("brightblue") preds <- example_yrep_draws() ppd_dens_overlay(ypred = preds[1:50, ]) ppc_dens_overlay(y = example_y_data(), yrep = preds[1:50, ])
Medians and central interval estimates of posterior or prior predictive
distributions. Each of these functions makes the same plot as the
corresponding ppc_
function but without plotting any
observed data y
. The Plot Descriptions section at PPC-intervals has
details on the individual plots.
ppd_intervals( ypred, x = NULL, ..., prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 1, fatten = 2.5, linewidth = 1 ) ppd_intervals_grouped( ypred, x = NULL, group, ..., facet_args = list(), prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 1, fatten = 2.5, linewidth = 1 ) ppd_ribbon( ypred, x = NULL, ..., prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 0.25 ) ppd_ribbon_grouped( ypred, x = NULL, group, ..., facet_args = list(), prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 0.25 ) ppd_intervals_data( ypred, x = NULL, group = NULL, ..., prob = 0.5, prob_outer = 0.9 ) ppd_ribbon_data( ypred, x = NULL, group = NULL, ..., prob = 0.5, prob_outer = 0.9 )
ppd_intervals( ypred, x = NULL, ..., prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 1, fatten = 2.5, linewidth = 1 ) ppd_intervals_grouped( ypred, x = NULL, group, ..., facet_args = list(), prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 1, fatten = 2.5, linewidth = 1 ) ppd_ribbon( ypred, x = NULL, ..., prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 0.25 ) ppd_ribbon_grouped( ypred, x = NULL, group, ..., facet_args = list(), prob = 0.5, prob_outer = 0.9, alpha = 0.33, size = 0.25 ) ppd_intervals_data( ypred, x = NULL, group = NULL, ..., prob = 0.5, prob_outer = 0.9 ) ppd_ribbon_data( ypred, x = NULL, group = NULL, ..., prob = 0.5, prob_outer = 0.9 )
ypred |
An |
x |
A numeric vector to use as the x-axis
variable. For example, |
... |
Currently unused. |
prob , prob_outer
|
Values between |
alpha , size , fatten , linewidth
|
Arguments passed to geoms. For ribbon
plots |
group |
A grouping variable of the same length as |
facet_args |
A named list of arguments (other than |
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Other PPDs:
PPD-distributions
,
PPD-overview
,
PPD-test-statistics
color_scheme_set("brightblue") ypred <- example_yrep_draws() x <- example_x_data() group <- example_group_data() ppd_intervals(ypred[, 1:50]) ppd_intervals(ypred[, 1:50], fatten = 0) ppd_intervals(ypred[, 1:50], fatten = 0, linewidth = 2) ppd_intervals(ypred[, 1:50], prob_outer = 0.75, fatten = 0, linewidth = 2) # put a predictor variable on the x-axis ppd_intervals(ypred[, 1:100], x = x[1:100], fatten = 1) + ggplot2::labs(y = "Prediction", x = "Some variable of interest") # with a grouping variable too ppd_intervals_grouped( ypred = ypred[, 1:100], x = x[1:100], group = group[1:100], size = 2, fatten = 0, facet_args = list(nrow = 2) ) # even reducing size, ppd_intervals is too cluttered when there are many # observations included (ppd_ribbon is better) ppd_intervals(ypred, size = 0.5, fatten = 0.1, linewidth = 0.5) ppd_ribbon(ypred) ppd_ribbon(ypred, size = 0) # remove line showing median prediction
color_scheme_set("brightblue") ypred <- example_yrep_draws() x <- example_x_data() group <- example_group_data() ppd_intervals(ypred[, 1:50]) ppd_intervals(ypred[, 1:50], fatten = 0) ppd_intervals(ypred[, 1:50], fatten = 0, linewidth = 2) ppd_intervals(ypred[, 1:50], prob_outer = 0.75, fatten = 0, linewidth = 2) # put a predictor variable on the x-axis ppd_intervals(ypred[, 1:100], x = x[1:100], fatten = 1) + ggplot2::labs(y = "Prediction", x = "Some variable of interest") # with a grouping variable too ppd_intervals_grouped( ypred = ypred[, 1:100], x = x[1:100], group = group[1:100], size = 2, fatten = 0, facet_args = list(nrow = 2) ) # even reducing size, ppd_intervals is too cluttered when there are many # observations included (ppd_ribbon is better) ppd_intervals(ypred, size = 0.5, fatten = 0.1, linewidth = 0.5) ppd_ribbon(ypred) ppd_ribbon(ypred, size = 0) # remove line showing median prediction
The bayesplot PPD module provides various plotting functions for creating graphical displays of simulated data from the posterior or prior predictive distribution. These plots are essentially the same as the corresponding PPC plots but without showing any observed data. Because these are not "checks" compared to data we use PPD (for prior/posterior predictive distribution) instead of PPC (for prior/posterior predictive check).
The functions for plotting prior and
posterior predictive distributions without observed data each have the
prefix ppd_
and all have a required argument ypred
(a matrix of
predictions). The plots are organized into several categories, each with
its own documentation:
PPD-distributions: Histograms, kernel density estimates, boxplots, and
other plots of multiple simulated datasets (rows) in ypred
. These are the
same as the plots in PPC-distributions but without including any
comparison to y
.
PPD-intervals: Interval estimates for each predicted observations
(columns) in ypred
. The x-axis variable can be optionally specified by
the user (e.g. to plot against against a predictor variable or over
time).These are the same as the plots in PPC-intervals but without
including any comparison to y
.
PPD-test-statistics: The distribution of a statistic, or a pair of
statistics, over the simulated datasets (rows) in ypred
. These are the
same as the plots in PPC-test-statistics but without including any
comparison to y
.
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Other PPDs:
PPD-distributions
,
PPD-intervals
,
PPD-test-statistics
The distribution of a (test) statistic T(ypred)
, or a pair of (test)
statistics, over the simulations from the posterior or prior predictive
distribution. Each of these functions makes the same plot as the
corresponding ppc_
function but without comparing to
any observed data y
. The Plot Descriptions section at
PPC-test-statistics has details on the individual plots.
ppd_stat( ypred, stat = "mean", ..., binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE ) ppd_stat_grouped( ypred, group, stat = "mean", ..., facet_args = list(), binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE ) ppd_stat_freqpoly( ypred, stat = "mean", ..., facet_args = list(), binwidth = NULL, bins = NULL, freq = TRUE ) ppd_stat_freqpoly_grouped( ypred, group, stat = "mean", ..., facet_args = list(), binwidth = NULL, bins = NULL, freq = TRUE ) ppd_stat_2d(ypred, stat = c("mean", "sd"), ..., size = 2.5, alpha = 0.7) ppd_stat_data(ypred, group = NULL, stat)
ppd_stat( ypred, stat = "mean", ..., binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE ) ppd_stat_grouped( ypred, group, stat = "mean", ..., facet_args = list(), binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE ) ppd_stat_freqpoly( ypred, stat = "mean", ..., facet_args = list(), binwidth = NULL, bins = NULL, freq = TRUE ) ppd_stat_freqpoly_grouped( ypred, group, stat = "mean", ..., facet_args = list(), binwidth = NULL, bins = NULL, freq = TRUE ) ppd_stat_2d(ypred, stat = c("mean", "sd"), ..., size = 2.5, alpha = 0.7) ppd_stat_data(ypred, group = NULL, stat)
ypred |
An |
stat |
A single function or a string naming a function, except for the 2D plot which requires a vector of exactly two names or functions. In all cases the function(s) should take a vector input and return a scalar statistic. If specified as a string (or strings) then the legend will display the function name(s). If specified as a function (or functions) then generic naming is used in the legend. |
... |
Currently unused. |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
freq |
For histograms, |
group |
A grouping variable of the same length as |
facet_args |
A named list of arguments (other than |
size , alpha
|
For the 2D plot only, arguments passed to
|
For Binomial data, the plots may be more useful if the input contains the "success" proportions (not discrete "success" or "failure" counts).
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Other PPDs:
PPD-distributions
,
PPD-intervals
,
PPD-overview
yrep <- example_yrep_draws() ppd_stat(yrep) ppd_stat(yrep, stat = "sd") + legend_none() # use your own function for the 'stat' argument color_scheme_set("brightblue") q25 <- function(y) quantile(y, 0.25) ppd_stat(yrep, stat = "q25") # legend includes function name
yrep <- example_yrep_draws() ppd_stat(yrep) ppd_stat(yrep, stat = "sd") + legend_none() # use your own function for the 'stat' argument color_scheme_set("brightblue") q25 <- function(y) quantile(y, 0.25) ppd_stat(yrep, stat = "q25") # legend includes function name
The theme_default()
function returns the default ggplot
theme used by the bayesplot plotting functions. See
bayesplot_theme_set()
for details on setting and updating the plotting
theme.
theme_default( base_size = getOption("bayesplot.base_size", 12), base_family = getOption("bayesplot.base_family", "serif") )
theme_default( base_size = getOption("bayesplot.base_size", 12), base_family = getOption("bayesplot.base_family", "serif") )
base_size , base_family
|
Base font size and family (passed to
|
A ggplot theme object.
bayesplot_theme_set()
to change the ggplot theme.
bayesplot-colors to set or view the color scheme used for plotting.
bayesplot-helpers for a variety of convenience functions, many of which provide shortcuts for tweaking theme elements after creating a plot.
class(theme_default()) bayesplot_theme_set() # defaults to setting theme_default() x <- example_mcmc_draws() mcmc_hist(x) # change the default font size and family for bayesplots bayesplot_theme_set(theme_default(base_size = 8, base_family = "sans")) mcmc_hist(x) mcmc_areas(x, regex_pars = "beta") # change back bayesplot_theme_set() mcmc_areas(x, regex_pars = "beta")
class(theme_default()) bayesplot_theme_set() # defaults to setting theme_default() x <- example_mcmc_draws() mcmc_hist(x) # change the default font size and family for bayesplots bayesplot_theme_set(theme_default(base_size = 8, base_family = "sans")) mcmc_hist(x) mcmc_areas(x, regex_pars = "beta") # change back bayesplot_theme_set() mcmc_areas(x, regex_pars = "beta")
Parameter selection in the style of dplyr and other tidyverse packages.
param_range(prefix, range, vars = NULL) param_glue(pattern, ..., vars = NULL)
param_range(prefix, range, vars = NULL) param_glue(pattern, ..., vars = NULL)
prefix , range
|
For param_range("beta", c(1,2,8)) would select parameters named |
vars |
|