| Title: | Resampling-Based Predictive Validation for Generalized Linear and Generalized Additive Models |
|---|---|
| Description: | Provides resampling-based predictive validation for generalized linear and generalized additive models (with or without random effects) fitted using packages such as glmmTMB, mgcv, stats, lme4, and MASS. Predictive performance is assessed using either repeated random holdout (Monte Carlo cross-validation) or bootstrap resampling with out-of-bag evaluation. In each replicate, models are refit to a training dataset and evaluated on separate testing data, generating sampling distributions of in-sample and out-of-sample performance statistics. For continuous or integer response models, supported metrics include relative root mean squared error (RRMSE), relative mean absolute error (RMAE), relative median absolute error (RMedAE), and relative bias (RBIAS). For binary response models, supported metrics include AUC, Brier score, and log loss. All predictive metrics are based on population-level predictions, meaning random effects are excluded when present. Optional residual diagnostics can also be performed using the DHARMa package. |
| Authors: | Colin Shea [aut, cre] (ORCID: <https://orcid.org/0000-0002-2906-250X>) |
| Maintainer: | Colin Shea <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.7 |
| Built: | 2026-05-19 19:06:23 UTC |
| Source: | https://github.com/colinpshea/GLAMMGoF |
Assess in- and out-of-sample predictive performance of generalized linear and generalized additive models with continuous or integer response variables and with or without random effects and zero-inflation, using either repeated random holdout (Monte Carlo cross-validation) or bootstrap resampling with out-of-bag evaluation. Four performance statistics are reported: relative root mean squared error (RRMSE), relative Mean Absolute Error (RMAE), relative median absolute error (RMedAE), and relative bias (RBIAS). Note that all performance measures are based on population-level predictions (i.e., random effects are ignored). For models with a zero-inflation component, predictions account for zero-inflation (e.g., for glmmTMB, the predicted value represents the product of the mean_count and (1 - prob_zero)).
All three accuracy statistics (RRMSE, RMAE, and RMedAE) express prediction error as a percentage of the mean observed value, which makes them interpretable on a common scale regardless of the units or magnitude of the response. To recover the corresponding raw error in the original units of the response, multiply any of these values by the observed mean and divide by 100. For example, an RRMSE of 18% with a sample mean of 50 implies a root mean squared error of 9 in the original units.
RRMSE (Relative Root Mean Squared Error): The square root of average squared prediction errors, expressed as a percentage of the mean observed value. Because errors are squared before averaging, RRMSE penalizes large individual errors more heavily than small ones, making it sensitive to cases where the model produces occasional extreme mispredictions.
RRMSE = sqrt(mean((predicted - observed)^2))/mean(observed)*100
RMAE (Relative Mean Absolute Error): The average absolute prediction error expressed as a percentage of the mean observed value. RMAE treats all errors proportionally to their size without the extra weight that squaring applies, and is often the most intuitive summary of typical prediction accuracy.
RMAE = mean(abs((predicted - observed)))/mean(observed)*100
RMedAE (Relative Median Absolute Error): The median absolute prediction error expressed as a percentage of the mean observed value. Because it is based on the median rather than the mean of the error distribution, RMedAE is the most robust of the three to outlying predictions and gives the best picture of accuracy for a typical observation when the error distribution is skewed.
RMedAE = median(abs((predicted - observed)))/mean(observed)*100
Taken together, these three statistics tell a more complete story than any one alone. If RRMSE is notably larger than RMAE, that signals a handful of high-leverage mispredictions are inflating the squared-error average, which is a pattern RMedAE will often fail to reflect if the bulk of predictions are accurate. Conversely, close agreement among all three suggests errors are roughly symmetric and there are no extreme outliers driving the summary.
RBIAS (Relative Bias): The mean signed prediction error expressed as a percentage of the mean observed value, where positive values indicate systematic over-prediction and negative values indicate systematic under-prediction. RBIAS is independent of the accuracy metrics above: a model can be nearly unbiased on average yet still produce large errors, or it can be highly biased while still ranking observations correctly. Reporting RBIAS alongside RRMSE and RMAE therefore distinguishes random prediction noise from systematic directional error.
RBIAS = mean((predicted - observed))/mean(observed)*100
bias_precision( nReps = 100, testModel = NULL, testData = NULL, propTrain = 0.8, DHARMaPlot = TRUE, testZI = TRUE, DHARMaReps = 1000, seed = NULL, method = c("holdout", "bootstrap") )bias_precision( nReps = 100, testModel = NULL, testData = NULL, propTrain = 0.8, DHARMaPlot = TRUE, testZI = TRUE, DHARMaReps = 1000, seed = NULL, method = c("holdout", "bootstrap") )
nReps |
Desired number of bootstrap or Monte Carlo replicates. The default value is 100, but this number should be at least 1000 in practice. |
testModel |
A regression model fit to testData in glmmTMB (with or without random effects), glmer/glmer.nb/lmer (with random effects), glm/glm.nb/lm (without random effects), or gam (with or without random effects). The response variable can be continuous or an integer, and possible statistical distributions include Poisson, negative binomial, gamma, tweedie, and gaussian. |
testData |
A data frame with a continuous or integer response variable and continuous and/or categorical predictors. |
propTrain |
The proportion of |
DHARMaPlot |
Do you want to return a goodness-of-fit plot from the simulateResiduals() function of the DHARMa package? The default is TRUE. |
testZI |
Logical. If |
DHARMaReps |
You can also specify DHARMaReps if you want something other than the default of 1000 simulation replicates. |
seed |
Optional integer seed for reproducibility. If NULL (the default), no seed is set and results will differ across runs. |
method |
The resampling method to use. The default, |
This function returns either three, four, or five objects depending on the values of DHARMaPlot and testZI: a data frame with all bootstrapping or Monte Carlo resampling results (i.e., all nReps values for each performance statistic), a data frame with a summary (mean and 95% confidence intervals) of all replicates for each performance statistic, and a histogram of values for each performance statistic. If DHARMaPlot = TRUE, a fourth object is also returned: a goodness-of-fit plot based on scaled residuals from simulateResiduals(). If testZI = TRUE and DHARMaPlot = TRUE, a fifth object dharmaZI is also returned containing the result of testZeroInflation(). In the histogram, a blue dotted vertical line indicates the mean across replicates.
This package contains an example data set for fitting a negative binomial or Poisson regression called countData. Six example negative binomial regression model objects are also included: countModel_GLM is a GLM with no random effects; countModel_GLMM is a GLMM with one random effect; countModel_GLMM2 is a GLMM with two random effects; countModel_GAM is a GAM with no random effects; countModel_GAMM is a GAMM with one random effect; and countModel_GAMM2 is a GAMM with two random effects. GLMs and GLMMs were fitted using glmmTMB, whereas GAMs and GAMMs were fitted using mgcv:
countModel_GLM <- glmmTMB(y ~ Season + Temp, family = nbinom2, data = countData)
countModel_GLMM <- glmmTMB(y ~ Season + Temp + (1|Site), family = nbinom2, data = countData)
countModel_GLMM2 <- glmmTMB(y ~ Season + Temp + (1|Site) + (1|Year), family = nbinom2, data = countData)
countModel_GAM <- gam(y ~ Season + s(Temp), family = nb, data = countData)
countModel_GAMM <- gam(y ~ Season + s(Temp) + s(Site, bs = "re"), family = nb, data = countData)
countModel_GAMM2 <- gam(y ~ Season + s(Temp) + s(Site, bs = "re") + s(Year, bs = "re"), family = nb, data = countData)
Bootstrapping or Monte Carlo resampling of the performance statistics requires specifying the data and model being tested, the desired number of replicates (the default is 100 but should be at least 1000 in practice), the proportion of data used for training when method = "holdout" (the default is 0.8), whether to use DHARMa residual diagnostics (the default is TRUE), whether to use DHARMa to test for zero-inflation (the default is TRUE), the number of DHARMa simulation replicates (the default is 1000), and an optional integer seed for reproducibility, and the resampling method holdout or bootstrap:
bias_precision(nReps = 100, testModel = countModel_GLMM, testData = countData, propTrain = 0.8, DHARMaPlot = TRUE, testZI = TRUE, DHARMaReps = 1000, seed = 123, method = "holdout")
This function does not currently support binomial models with cbind() or proportion responses, and for binary 0/1 responses, use brier_auc(). This function also supports models with spatial random effects (e.g, in glmmTMB), but it is much slower than for more conventional GLM(M)s and GAM(M)s.
Hyndman, R.J. and Koehler, A.B. (2006) Another look at measures of forecast accuracy. International Journal of Forecasting, 22, 679–688.
Assess in- and out-of-sample predictive performance of generalized linear and generalized additive models with binary response variables and with or without random effects, using either repeated random holdout (Monte Carlo cross-validation) or bootstrap resampling with out-of-bag evaluation. Three performance statistics are reported: Brier scores (see the rms package documentation for details), which range from 0 to 1 with values closer to 0 indicating a better-predicting model and where sqrt(Brier score) is the average difference between the predicted probability and the observed value (0 or 1); AUC, an aggregated metric that evaluates how well a model classifies positive and negative outcomes at all possible probability cutoffs, ranging from 0 to 1 with values closer to 1 indicating a better classifier and where an AUC of 0.5 suggests performance no better than random guessing; and log loss (cross-entropy), which penalizes confident wrong predictions logarithmically and is particularly sensitive to miscalibration at the extremes of the predicted probability distribution, with lower values indicating better performance and no upper bound. Note that all performance measures are based on population-level predictions (i.e., random effects are ignored when present).
The three binary metrics each capture a different aspect of predictive performance and are most informative when interpreted together.
AUC (Area Under the ROC Curve): The probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case, ranging from 0 (no discrimination) to 1 (perfect discrimination), and where an AUC of 0.5 indicates that a model predicts no better than random chance (and AUC <0.5 is worse than random). AUC is purely a measure of discrimination (the model's ability to rank cases by their probability of the outcome) and is insensitive to how well the predicted probabilities are calibrated in absolute terms.
Brier Score: The mean squared difference between predicted probabilities and observed binary outcomes (0/1), ranging from 0 (perfect) to 1 (worst). The Brier score jointly rewards good discrimination and good calibration, penalizing confident but wrong predictions more heavily than uncertain ones. A model can have high AUC but a poor Brier score if its predicted probabilities are systematically too high or too low even while correctly ordering cases.
Log Loss (Binary Cross-Entropy): The negative mean log-likelihood of the observed outcomes under the predicted probabilities, where lower values indicate better performance. Like the Brier score, log loss rewards calibration, but its logarithmic scale means it penalizes extreme, confident mispredictions far more aggressively. Log loss is therefore the most sensitive of the three to cases where the model assigns near-zero probability to an outcome that actually occurs.
Reporting all three together is recommended: AUC reflects whether the model correctly separates cases from non-cases; Brier score provides an overall summary of probability accuracy on a squared-error scale; and log loss highlights whether the model is making any dangerously overconfident errors that the Brier score might underweight.
brier_auc( nReps = 100, testModel = NULL, testData = NULL, propTrain = 0.8, DHARMaPlot = TRUE, DHARMaReps = 1000, seed = NULL, method = c("holdout", "bootstrap") )brier_auc( nReps = 100, testModel = NULL, testData = NULL, propTrain = 0.8, DHARMaPlot = TRUE, DHARMaReps = 1000, seed = NULL, method = c("holdout", "bootstrap") )
nReps |
Desired number of bootstrap or Monte Carlo replicates. The default value is 100, but this number should be at least 1000 in practice. |
testModel |
A logistic regression model fitted to testData using glmmTMB (with or without random effects), glmer (with random effects), glm (without random effects), or gam (with or without random effects). |
testData |
A data frame with a binary response variable and continuous and/or categorical predictor variables. |
propTrain |
The proportion of |
DHARMaPlot |
Do you want to return a goodness-of-fit plot from the simulateResiduals() function of the DHARMa package? The default is TRUE. |
DHARMaReps |
If DHARMaPlot is TRUE, you can also specify DHARMaReps if you want something other than the default of 1000 simulation replicates. |
seed |
Optional integer seed for reproducibility. If |
method |
The resampling method to use. The default, |
This function returns four objects: a data frame with all of the bootstrapping or Monte Carlo resampling results (i.e., all nReps values for each performance statistic), a data frame with a summary (mean and 95% confidence intervals) of all replicates for each performance statistic, a histogram of values for each performance statistic, and a goodness-of-fit plot based on scaled residuals from the simulateResiduals() function of the DHARMa package. If DHARMaPlot = FALSE, then simulateResiduals() is not used to assess the model residuals and only three of the four objects are returned. This function also returns the null model (i.e., intercept-only) log loss, Brier score, and AUC statistic for reference (i.e., the best predictive performance achievable without any predictors), which allows for comparison to the summarized log loss, Brier, and AUC metrics. The null log loss is computed as -(p_bar * log(p_bar) + (1 - p_bar) * log(1 - p_bar)), where p_bar is the observed event rate. The null Brier score is computed as p_bar * (1 - p_bar). The null AUC is 0.5, the expected value for a model with no discriminatory ability, equivalent to random classification. The null log loss and Brier score are based on the arithmetic mean of the response, which approximates but is not identical to plogis(intercept) from a fitted intercept-only logistic model; the difference is negligible in practice and avoids the overhead of model fitting. In the histogram, a blue dotted vertical line indicates the mean of each metric across replicates, and a red dashed vertical line indicates the null model baseline for each metric: null log loss, null Brier score, and AUC = 0.5.
This package contains an example data set to fit a logistic regression called logitData. Six example logistic regression model objects are also included: logitModel_GLM is a GLM with no random effects; logitModel_GLMM is a GLMM with one random effect; logitModel_GLMM2 is a GLMM with two random effects; logitModel_GAM is a GAM with no random effects; logitModel_GAMM is a GAMM with one random effect; and logitModel_GAMM2 is a GAMM with two random effects. GLMs and GLMMs were fitted using glmmTMB, whereas GAMs and GAMMs were fitted using mgcv:
logitModel_GLM <- glmmTMB(y ~ Season + Temp, family = binomial, data = logitData)
logitModel_GLMM <- glmmTMB(y ~ Season + Temp + (1|Site), family = binomial, data = logitData)
logitModel_GLMM2 <- glmmTMB(y ~ Season + Temp + (1|Site) + (1|Year), family = binomial, data = logitData)
logitModel_GAM <- gam(y ~ Season + s(Temp), family = binomial, data = logitData)
logitModel_GAMM <- gam(y ~ Season + s(Temp) + s(Site, bs = "re"), family = binomial, data = logitData)
logitModel_GAMM2 <- gam(y ~ Season + s(Temp) + s(Site, bs = "re") + s(Year, bs = "re"), family = binomial, data = logitData)
Bootstrapping or Monte Carlo resampling of the performance statistics requires specifying the data and model being tested, the desired number of replicates (the default is 100 but should be at least 1000 in practice), the resampling method holdout or bootstrap, the proportion of data used for training when method = "holdout" (the default is 0.8), whether to use DHARMa residual diagnostics (the default is TRUE), the number of DHARMa simulation replicates (the default is 1000), and an optional integer seed for reproducibility:
brier_auc(nReps = 100, testModel = logitModel_GLMM, testData = logitData, propTrain = 0.8, DHARMaPlot = TRUE, DHARMaReps = 1000, seed = 123, method = "holdout")
This function only supports binary 0/1 responses and does not currently support binomial models with cbind() or proportion responses. This function also supports models with spatial random effects (e.g, in glmmTMB), but it is much slower than for more conventional GLM(M)s and GAM(M)s.
A simulated dataset with an integer count response and predictor variables for demonstrating the bias_precision function.
countDatacountData
A data frame with 1000 rows and 5 variables:
Integer count response variable
Factor with 4 levels: Autumn, Spring, Summer, Winter
Continuous temperature predictor
Factor with 10 levels representing sampling sites
Factor with 8 levels representing sampling years (2015-2022)
A mgcv GAM fit to countData with fixed effects only. Formula: y ~ Season + s(Temp)
countModel_GAMcountModel_GAM
A gam model object
A mgcv GAM fit to countData with one random effect. Formula: y ~ Season + s(Temp) + s(Site, bs = "re")
countModel_GAMMcountModel_GAMM
A gam model object
A mgcv GAM fit to countData with two random effects. Formula: y ~ Season + s(Temp) + s(Site, bs = "re") + s(Year, bs = "re")
countModel_GAMM2countModel_GAMM2
A gam model object
A glmmTMB model fit to countData with fixed effects only. Formula: y ~ Season + Temp
countModel_GLMcountModel_GLM
A glmmTMB model object
A glmmTMB model fit to countData with one random effect. Formula: y ~ Season + Temp + (1 | Site)
countModel_GLMMcountModel_GLMM
A glmmTMB model object
A glmmTMB model fit to countData with two random effects. Formula: y ~ Season + Temp + (1 | Site) + (1 | Year)
countModel_GLMM2countModel_GLMM2
A glmmTMB model object
A simulated dataset with a binary response and predictor variables for demonstrating the brier_auc function.
logitDatalogitData
A data frame with 1000 rows and 5 variables:
Binary response variable (0 or 1)
Factor with 4 levels: Autumn, Spring, Summer, Winter
Continuous temperature predictor
Factor with 10 levels representing sampling sites
Factor with 8 levels representing sampling years (2015-2022)
A mgcv GAM fit to logitData with fixed effects only. Formula: y ~ Season + s(Temp)
logitModel_GAMlogitModel_GAM
A gam model object
A mgcv GAM fit to logitData with one random effect. Formula: y ~ Season + s(Temp) + s(Site, bs = "re")
logitModel_GAMMlogitModel_GAMM
A gam model object
A mgcv GAM fit to logitData with two random effects. Formula: y ~ Season + s(Temp) + s(Site, bs = "re") + s(Year, bs = "re")
logitModel_GAMM2logitModel_GAMM2
A gam model object
A glmmTMB model fit to logitData with fixed effects only. Formula: y ~ Season + Temp
logitModel_GLMlogitModel_GLM
A glmmTMB model object
A glmmTMB model fit to logitData with one random effect. Formula: y ~ Season + Temp + (1 | Site)
logitModel_GLMMlogitModel_GLMM
A glmmTMB model object
A glmmTMB model fit to logitData with two random effects. Formula: y ~ Season + Temp + (1 | Site) + (1 | Year)
logitModel_GLMM2logitModel_GLMM2
A glmmTMB model object