Changelog • yardstick

yardstick (development version)

yardstick 1.4.0

CRAN release: 2026-04-07

Breaking Changes

The global option yardstick.event_first (deprecated in 0.0.7) now throws an error. Use the event_level argument of individual metric functions instead. (#632)
conf_mat() now throws an error if anything is passed to ... (deprecated in 1.0.0). This argument has had no effect since case weight support was added. (#632)
roc_auc(), roc_aunp(), roc_aunu(), and roc_curve() now throw an error if a non-empty list is passed to options (deprecated in 1.0.0). Use the pROC package directly if you need these features. (#632)

Deprecations

dots_to_estimate(), metric_summarizer(), and metric_vec_template() (soft-deprecated in 1.2.0) now warn for all users. See the yardstick 1.2.0 release notes for recommended replacements. (#632)

New Metrics

gini_coef() computes the normalized Gini coefficient for regression, which measures ranking ability based on the Lorenz curve. (#147)
mse() computes the mean squared error. (#560)
rmse_relative() computes the relative root mean squared error, normalizing RMSE by the range of the true values. (#527)
fall_out() and miss_rate() compute the false positive rate and false negative rate respectively. (#336)
markedness() computes the markedness metric (PPV + NPV - 1), the predictive power analog of j_index(). (#27)
roc_dist() computes the Euclidean distance from the (sensitivity, specificity) point to the ideal point (1, 1) in ROC space. (#148)
sedi() computes the Symmetric Extremal Dependence Index, a prevalence-independent skill metric for binary classification that remains reliable at extreme class imbalance. (#630)
ranked_prob_score() computes the ranked probability score for ordinal classification data. (#524)
weighted_interval_score() is a new quantile metric. (#569)

Improvements

Added checks to all metrics for the na_rm argument. (#349)
Added improved argument checking for metrics with additional arguments. (#519)
Added documentation pages for each metric type (e.g., ?class-metrics, ?numeric-metrics) listing all available metrics with their direction and range. (#547, #540)
All class metrics and probability metrics now include mathematical formulas in their documentation. (#605)
All metrics now document their valid range of output values. (#572)
Documentation pages for metrics with alternate argument values now emphasize using metric_tweak() when building metric sets. (#626)
Survival metrics performance has been improved. (#576)
brier_class() has gained the event_level argument. (#515)
get_metrics() has been added to return a metric_set() containing all metrics of a specified type. (#534)
metric_set() now provides a more informative error message when estimate is not explicitly named for class/prob or survival metric sets. (#504)
roc_curve() has gained a thresholds argument for specifying custom thresholds at which the curve is evaluated. (#488)

Bug Fixes

brier_class() no longer returns NaN with extreme value case weights. (#614)
classification_cost() documentation now correctly refers to the cost column of the costs data frame. (#343)
mpe() documentation now includes the formula and clarifies the interpretation of positive and negative values. (#345)
poisson_log_loss() now handles 0-valued estimates without returning Inf or NaN. (#513)
Fixed a bug where ranked probability metrics didn’t work in combination with other classification metrics in metric_set(). (#539)
Fixed documentation to show equations correctly. (#541)

Developer

Added infrastructure for quantile metrics. (#569)
Added infrastructure for survival metrics on the linear predictor. (#551)
new_metric() and related functions gain an optional range argument to store the valid output range of a metric. (#572)
Removed crayon as a suggested package. (#574)

yardstick 1.3.2

CRAN release: 2025-01-22

All messages, warnings and errors has been translated to use {cli} package (#517, #522).

yardstick 1.3.1

CRAN release: 2024-03-21

Bug Fixes

Bug was fixed in roc_curve_survival() where wrong weights were used. (#495, @asb2111).
Output of roc_curve_survival() now returns columns in same order as roc_curve(). (#498)

yardstick 1.3.0

CRAN release: 2024-01-19

New Metrics

The Brier score for survival data was added with brier_survival().
The Integrated Brier score for survival data was added with brier_survival_integrated().
The Concordance index for survival data was added with concordance_survival().
Time-Dependent ROC curves estimation for right-censored data can now be calculated with roc_curve_survival().
Time-Dependent ROC AUC estimation for right-censored data can now be calculated with roc_auc_survival().

Improvements

demographic_parity(), equalized_odds(), and equal_opportunity() are new metrics for measuring model fairness. Each is implemented with the new_groupwise_metric() constructor, a general interface for defining group-aware metrics that allows for quickly and flexibly defining fairness metrics with the problem context in mind.
metric_set() can now be used with a combination of dynamic and static survival metrics.
Added a print method for metrics and metric sets (#435).
All warnings and errors have been updated to use the cli package for increased clarity and consistency. (#456, #457, #458)
brier_survival_integrated() now throws an error if input data only includes 1 evalution time point. (#460)
Clarifying documentation about how event_level always default to "first. (#432)

Bug Fixes

Metrics now throw more informative error if estimate argument is wrongly used. (#443)

Breaking Changes

Curve metrics now throw an informative error instead of returning NA when missing values are found and na_rm = FALSE. (#344)

yardstick 1.2.0

CRAN release: 2023-04-21

New Metrics

The Brier score for classification was added with brier_class() (#139).

Improvements

The global option, yardstick.event_first, has been hard deprecated in favor of using explicit argument, event_level. Setting this option will now produce an warning, but won’t have any effect. (#173)
Removed start-up message about event_level argument.
yardstick metric functions now use a pure tidyselect interface for truth, estimate, and the ... of class probability metrics (#322).
Changed the default aspect ratio for PR curves to be 1.0.
Error messages now show what user-facing function was called (#348).
classification and probability metrics now fully support class_pred objects from {probably} package (#341).

Bug Fixes

Using metric_set() on a metric created with metric_tweak() will no longer produces an error, and will favor arguments set with metric_tweak() (#351).
Metric summarizers no longer error if column names in data conflict with argument names (#382).
conf_mat() no longer throw errors listed as internal (#327).

Developer

metric_vec_template() is being soft deprecated in favor of a more manual and flexible metric creation approach. yardstick_remove_missing() and yardstick_any_missing() have been added for treatment of missing values. check_class_metric(), check_numeric_metric(), and check_prob_metric() have been added to perform standardized input checking for classification, regression and class probability metrics respectively. These changes mean that it is the developer’s responsibility to perform validation of truth and estimate input. (#337).
metric_summarizer() is being soft deprecated in favor of the more specific newly added class_metric_summarizer(), numeric_metric_summarizer(), prob_metric_summarizer(), and curve_metric_summarizer() (#322).
dots_to_estimate() is being soft deprecated along with metric_summarizer(). dots_to_estimate() is not needed with prob_metric_summarizer(), and curve_metric_summarizer() (#329).

yardstick 1.1.0

CRAN release: 2022-09-07

Emil Hvitfeldt is now the maintainer (#315).
Improved on the chained error thrown by metric_set() when one of the metric computations fails (#313).

yardstick 1.0.0

CRAN release: 2022-06-06

All yardstick metrics now support case weights through the new case_weights argument. This also includes metric-adjacent functions like roc_curve(), pr_curve(), conf_mat(), and metric_set().
The options argument of roc_curve(), roc_auc(), roc_aunp(), roc_aunu(), and metrics() that was passed along to the pROC package is now deprecated and no longer has any affect. This is a result of changing to an ROC curve implementation that supports case weights, but does not support any of the previous options. If you need these options, we suggest wrapping pROC yourself in a custom metric (#296).
conf_mat() now ignores any inputs passed through ... and warns if you try to do such a thing. Previously, those were passed on to base::table(), but with the addition of case weight support, table() is no longer used (#295).
Fixed a small mistake in ccc() where the unbiased covariance wasn’t being used when bias = FALSE.
j_index() now throws a more correct warning if 0 is in the denominator when computing sens() internally. Additionally, in the multiclass case it now removes the levels where this occurs from the multiclass weighted average computation, which is consistent with how other metrics were updated to handle this in #118 (#265).
Improved on some possible ambiguity in the documentation of the data argument for all metrics (#255).
purrr has been removed from Suggests.
The pROC package has been removed as a dependency (#300).
Moved the Custom Metrics vignette to tidymodels.org (#236).

yardstick 0.0.9

CRAN release: 2021-11-22

New metric poisson_log_loss() was added (#146).
sensitivity() and specificity() now work correctly with the tune and workflowsets packages (#232).
roc_curve() now throws a more informative error if truth doesn’t have any control or event observations.
dplyr 1.0.0 is now required. This allowed us to remove multiple usages of dplyr::do() in favor of dplyr::summarise(), which can now return packed data frame columns and multiple rows per group.
Removed internal hardcoding of "dplyr_error" to avoid issues with an upcoming dplyr 1.0.8 release (#244).
Updated test suite to testthat 3e (#243).
Internal upkeep has been done to move from rlang::warn(.subclass = ) to rlang::warn(class = ), since the .subclass argument has been deprecated (#225).

yardstick 0.0.8

CRAN release: 2021-03-28

New metric_tweak() for adjusting the default values of optional arguments in an existing yardstick metric. This is useful to quickly adjust the defaults of a metric that will be included in a metric_set(), especially if that metric set is going to be used for tuning with the tune package (#206, #182).
New classification_cost() metric for computing the cost of a poor class probability prediction using user-defined costs (#3).
New msd() for computing the mean signed deviation (also called mean signed difference, or mean signed error) (#183, @datenzauberai).
class_pred objects from the probably package are now supported, and are automatically converted to factors before computing any metric. Note that this means that any equivocal values are materialized as NA (#198).
The kap() metric has a new weighting argument to apply linear or quadratic weightings before computing the kappa value (#2, #125, @jonthegeek).
When sens() is undefined when computing ppv(), npv(), j_index(), or bal_accuracy(), a sensitivity warning is now correctly thrown, rather than a recall warning (#101).
The autoplot() method for gain curves now plots the curve line on top of the shaded polygon, resulting in a sharper look for the line itself (#192, @eddjberry).
The autoplot() methods for conf_mat now respect user-defined dimension names added through conf_mat(dnn = ) or from converting a table with dimension names to a conf_mat (#191).
Added an as_tibble() method for metric_set objects. Printing a metric_set now uses this to print out a tibble rather than a data frame (#186).
Re-licensed package from GPL-2 to MIT. See consent from copyright holders here (#204).

yardstick 0.0.7

CRAN release: 2020-07-13

The global option, yardstick.event_first, has been deprecated in favor of the new explicit argument, event_level. All metric functions that previously supported changing the “event” level have gained this new argument. The global option was a historical design decision that can be classified as a case of a hidden argument. Existing code that relied on this global option will continue to work in this version of yardstick, however you will now get a once-per-session warning that requests that you update to instead use the explicit event_level argument. The global option will be completely removed in a future version. To update, follow the guide below (#163).
```
`options(yardstick.event_first = TRUE)`  -> `event_level = "first"` (the default)
`options(yardstick.event_first = FALSE)` -> `event_level = "second"`
```
The roc_auc() Hand-Till multiclass estimator will now ignore levels in truth that occur zero times in the actual data. With other methods of multiclass averaging, this usually returns an NA, however, ignoring levels in this manner is more consistent with implementations in the HandTill2001 and pROC packages (#123).
roc_auc() and roc_curve() now set direction = "<" when computing the ROC curve using pROC::roc(). Results were being computed incorrectly with direction = "auto" when most probability values were predicting the wrong class (#123).
mn_log_loss() now respects the (deprecated) global option yardstick.event_first. However, you should instead change the relevant event level through the event_level argument.
metric_set() now strips the package name when auto-labeling functions (@rorynolan, #151).
There are three new helper functions for more easily creating custom metric functions: new_class_metric(), new_prob_metric(), and new_numeric_metric().
Rcpp has been removed as a direct dependency.

yardstick 0.0.6

CRAN release: 2020-03-17

roc_auc() now warns when there are no events or controls in the provided truth column, and returns NA (@dpastling, #132).
Adds sensitivity() and specificity() as aliases for sens() and spec() respectively, avoids conflict with other packages e.g. readr::spec().
roc_aunu() and roc_aunp() are two new ROC AUC metrics for multiclass classifiers. These measure the AUC of each class against the rest, roc_aunu() using the uniform class distribution (#69) and roc_aunp() using the a priori class distribution (#70).

yardstick 0.0.5

CRAN release: 2020-01-23

Other improvements

The autoplot() heat map for confusion matrices now places the predicted values on the x axis and the truth values on the y axis to be more consistent with the confusion matrix print() method.
The autoplot() mosaic plot for confusion matrices had the x and y axis labels backwards. This has been corrected.

yardstick 0.0.4

CRAN release: 2019-08-26

New metrics and functionality

iic() is a new numeric metric for computing the index of ideality of correlation. It can be seen as a potential alternative to the traditional correlation coefficient, and has been used in QSAR models (@jyuu, #115).
average_precision() is a new probability metric that can be used as an alternative to pr_auc(). It has the benefit of avoiding any issues of ambiguity in the case where recall == 0 and the current number of false positives is 0.

Other improvements

metric_set() output now includes a metrics attribute which contains a list of the original metric functions used to generate the metric set.
Each metric function now has a direction attribute attached to it, specifying whether to minimize or maximize the metric.
Classification metrics that can potentially have a 0 value denominator now throw an informative warning when this case occurs. These include recall(), precision(), sens(), and spec() (#98).
The autoplot() method for pr_curve() has been improved to always set the axis limits to c(0, 1).
All valid arguments to pROC::roc() are now utilized, including those passed on to pROC::auc().
Documentation for class probability metrics has been improved with more informative examples (@rudeboybert, #100).

Bug fixes

mn_log_loss() now uses the min/max rule before computing the log of the estimated probabilities to avoid problematic undefined log values (#103).
pr_curve() now places a 1 as the first precision value, rather than NA. While NA is technically correct as precision is undefined here, 1 is practically more correct because it generates a correct PR Curve graph and, more importantly, allows pr_auc() to compute the correct AUC.
pr_curve() could generate the wrong results in the somewhat rare case when two class probability estimates were the same, but had different truth values.
pr_curve() (and subsequently pr_auc()) now generates the correct curve when there are duplicate class probability values (reported by @dariyasydykova, #93).
Binary mcc() now avoids integer overflow when the confusion matrix elements are large (#108).

yardstick 0.0.3

CRAN release: 2019-03-08

New metrics and functionality

mase() is a numeric metric for the mean absolute scaled error. It is generally useful when forecasting with time series (@alexhallam, #68).
huber_loss() is a numeric metric that is less sensitive to outliers than rmse(), but is more sensitive than mae() for small errors (@blairj09, #71).
huber_loss_pseudo() is a smoothed form of huber_loss() (@blairj09, #71).
smape() is a numeric metric that is based on percentage errors (@riazhedayati, #67).
conf_mat objects now have two ggplot2::autoplot() methods for easy visualization of the confusion matrix as either a heat map or a mosaic plot (@EmilHvitfeldt, #10).

Other improvements

metric_set() now returns a classed function. If numeric metrics are used, a "numeric_metric_set" function is returned. If class or probability metrics are used, a "class_prob_metric_set" is returned.

Bug fixes

Tests related to the fixed R 3.6 sample() function have been fixed.
f_meas() propagates NA values from precision() and recall() correctly (#77).
All "micro" estimators now propagate NA values through correctly.
roc_auc(estimator = "hand_till") now correctly computes the metric when the column names of the probability matrix are not the exact same as the levels of truth. Note that the computation still assumes that the order of the supplied probability matrix columns still matches the order of levels(truth), like other multiclass metrics (#86).

yardstick 0.0.2

CRAN release: 2018-11-05

Breaking changes

A desire to standardize the yardstick API is what drove these breaking changes. The output of each metric is now in line with tidy principles, returning a tibble rather than a single numeric. Additionally, all metrics now have a standard argument list so you should be able to switch between metrics and combine them together effortlessly.

All metrics now return a tibble rather than a single numeric value. This format allows metrics to work with grouped data frames (for resamples). It also allows you to bundle multiple metrics together with a new function, metric_set().
For all class probability metrics, now only 1 column can be passed to ... when a binary implementation is used. Those metrics will no longer select only the first column when multiple columns are supplied, and will instead throw an error.
The summary() method for conf_mat objects now returns a tibble to be consistent with the change to the metric functions.
For naming consistency, mnLogLoss() was renamed to mn_log_loss()
mn_log_loss() now returns the negative log loss for the multinomial distribution.
The argument na.rm has been changed to na_rm in all metrics to align with the tidymodels model implementation principles.

Core features

Each metric now has a vector interface to go alongside the data frame interface. All vector functions end in _vec(). The vector interface accepts vector/matrix inputs and returns a single numeric value.
Multiclass support has been added for each classification metric. The support varies from one metric to the next, but generally macro and micro averaging is available for all metrics, with some metrics having specialized multiclass implementations (for example, roc_auc() supports the multiclass generalization presented in a paper by Hand and Till). For more information, see vignette("multiclass", "yardstick").
All metrics now work with grouped data frames. This produces a tibble with as many rows as there are groups, and is useful when used alongside resampling techniques.

New metrics and functionality

mape() calculates the mean absolute percent error.
kap() is a metric similar to accuracy() that calculates Cohen’s kappa.
detection_prevalence() calculates the number of predicted positive events relative to the total number of predictions.
bal_accuracy() calculates balanced accuracy as the average of sensitivity and specificity.
roc_curve() calculates receiver operator curves and returns the results as a tibble.
pr_curve() calculates precision recall curves.
gain_curve() and lift_curve() calculate the information used in gain and lift curves.
gain_capture() is a measure of performance similar in spirit to AUC but applied to a gain curve.
pr_curve(), roc_curve(), gain_curve() and lift_curve() all have ggplot2::autoplot() methods for easy visualization.
metric_set() constructs functions that calculate multiple metrics at once.

Other improvements

The infrastructure for creating metrics has been exposed to allow users to extend yardstick to work with their own metrics. You might want to do this if you want your metrics to work with grouped data frames out of the box, or if you want the standardization and error checking that yardstick already provides. See vignette("custom-metrics", "yardstick") for a few examples.
A vignette describing the three classes of metrics used in yardstick has been added. It also includes a list of every metric available, grouped by class. See vignette("metric-types", "yardstick").
The error messages in yardstick should now be much more informative, with better feedback about the types of input that each metric can use and about what kinds of metrics can be used together (i.e. in metric_set()).
There is now a grouped_df method for conf_mat() that returns a tibble with a list column of conf_mat objects.
Each metric now has its own help page. This allows us to better document the nuances of each metric without cluttering the help pages of other metrics.

Dependencies

broom has been removed from Depends, and is replaced by generics in Suggests.
tidyr and ggplot2 have been moved to Suggests.
MLmetrics has been removed as a dependency.

yardstick 0.0.1

CRAN release: 2017-11-12

First CRAN release