classification_cost() calculates the cost of a poor prediction based on
user-defined costs. The costs are multiplied by the estimated class
probabilities and the mean cost is returned.
classification_cost(data, ...)

# S3 method for data.frame
classification_cost(
  data,
  truth,
  ...,
  costs = NULL,
  na_rm = TRUE,
  event_level = yardstick_event_level()
)

classification_cost_vec(
  truth,
  estimate,
  costs = NULL,
  na_rm = TRUE,
  event_level = yardstick_event_level(),
  ...
)
data  A data.frame containing the columns specified by the truth and
... arguments.

...  A set of unquoted column names or one or more dplyr selector
functions to choose which variables contain the class probabilities. If
truth is binary, only one column should be selected, and it should
correspond to the event_level. Otherwise, there should be as many columns
as factor levels of truth, in the same order as those levels.

truth  The column identifier for the true class results
(that is a factor). This should be an unquoted column name.

costs  A data frame with columns "truth", "estimate", and "cost".
"truth" and "estimate" should be character columns containing unique
combinations of the levels of the truth factor. "cost" should be a
numeric column of costs to apply to each combination. It is often the
case that when truth == estimate, the cost is zero (no penalty for a
correct prediction). If any combinations of the levels of truth and
estimate are missing from costs, their costs are assumed to be zero. If
NULL, equal costs of 1 are used for all pairs of levels that differ, and
a cost of zero is used for correct predictions.

na_rm  A logical value indicating whether NA values should be stripped
before the computation proceeds.

event_level  A single string. Either "first" or "second" to specify
which level of truth to consider as the "event".

estimate  If truth is binary, a numeric vector of class probabilities
corresponding to the "relevant" class. Otherwise, a matrix with as many
columns as factor levels of truth, in the same order as those levels.
For classification_cost(), a tibble with columns .metric, .estimator,
and .estimate, and 1 row of values. For grouped data frames, the number
of rows returned will be the same as the number of groups.

For classification_cost_vec(), a single numeric value (or NA).
As an example, suppose that there are three classes: "A", "B", and "C".
Suppose there is a truly "A" observation with class probabilities
A = 0.3 / B = 0.3 / C = 0.4. Suppose that, when the true result is class
"A", the costs for each predicted class are A = 0 / B = 5 / C = 10,
penalizing the probability of incorrectly predicting "C" more than
predicting "B". The cost for this prediction would be
0.3 * 0 + 0.3 * 5 + 0.4 * 10 = 5.5. This calculation is done for each
sample and the individual costs are averaged.
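The per-sample calculation and the averaging step can be sketched directly. The snippet below is written in Python rather than R, purely to show the arithmetic; the function and variable names are illustrative and are not part of yardstick. Missing (truth, estimate) pairs default to a cost of zero, matching the behavior of the costs argument.

```python
def sample_cost(true_class, probs, costs):
    """Expected cost of one observation: each class probability is
    multiplied by the cost of predicting that class when true_class
    is the truth. Missing (truth, estimate) pairs cost zero."""
    return sum(p * costs.get((true_class, est), 0.0)
               for est, p in probs.items())

def mean_classification_cost(truths, prob_rows, costs):
    """Mean of the per-observation expected costs."""
    per_sample = [sample_cost(t, p, costs)
                  for t, p in zip(truths, prob_rows)]
    return sum(per_sample) / len(per_sample)

# The worked example from the text: a truly "A" observation with
# probabilities A = 0.3 / B = 0.3 / C = 0.4 and costs 0 / 5 / 10.
costs = {("A", "A"): 0.0, ("A", "B"): 5.0, ("A", "C"): 10.0}
cost = sample_cost("A", {"A": 0.3, "B": 0.3, "C": 0.4}, costs)
# 0.3 * 0 + 0.3 * 5 + 0.4 * 10 = 5.5
```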
Other class probability metrics: average_precision(), gain_capture(),
mn_log_loss(), pr_auc(), roc_auc(), roc_aunp(), roc_aunu()
Author: Max Kuhn
library(dplyr)

# Two class example
data(two_class_example)

# Assuming `Class1` is our "event", this penalizes false positives heavily
costs1 <- tribble(
  ~truth,   ~estimate, ~cost,
  "Class1", "Class2",  1,
  "Class2", "Class1",  2
)

# Assuming `Class1` is our "event", this penalizes false negatives heavily
costs2 <- tribble(
  ~truth,   ~estimate, ~cost,
  "Class1", "Class2",  2,
  "Class2", "Class1",  1
)

classification_cost(two_class_example, truth, Class1, costs = costs1)
#> # A tibble: 1 x 3
#>   .metric             .estimator .estimate
#>   <chr>               <chr>          <dbl>
#> 1 classification_cost binary         0.288

classification_cost(two_class_example, truth, Class1, costs = costs2)
#> # A tibble: 1 x 3
#>   .metric             .estimator .estimate
#>   <chr>               <chr>          <dbl>
#> 1 classification_cost binary         0.260

# Multiclass
data(hpc_cv)

# Define cost matrix from Kuhn and Johnson (2013)
hpc_costs <- tribble(
  ~estimate, ~truth, ~cost,
  "VF",      "VF",    0,
  "VF",      "F",     1,
  "VF",      "M",     5,
  "VF",      "L",    10,
  "F",       "VF",    1,
  "F",       "F",     0,
  "F",       "M",     5,
  "F",       "L",     5,
  "M",       "VF",    1,
  "M",       "F",     1,
  "M",       "M",     0,
  "M",       "L",     1,
  "L",       "VF",    1,
  "L",       "F",     1,
  "L",       "M",     1,
  "L",       "L",     0
)

# You can use the col1:colN tidyselect syntax
hpc_cv %>%
  filter(Resample == "Fold01") %>%
  classification_cost(obs, VF:L, costs = hpc_costs)
#> # A tibble: 1 x 3
#>   .metric             .estimator .estimate
#>   <chr>               <chr>          <dbl>
#> 1 classification_cost multiclass     0.779

# Groups are respected
hpc_cv %>%
  group_by(Resample) %>%
  classification_cost(obs, VF:L, costs = hpc_costs)
#> # A tibble: 10 x 4
#>    Resample .metric             .estimator .estimate
#>    <chr>    <chr>               <chr>          <dbl>
#>  1 Fold01   classification_cost multiclass     0.779
#>  2 Fold02   classification_cost multiclass     0.735
#>  3 Fold03   classification_cost multiclass     0.654
#>  4 Fold04   classification_cost multiclass     0.754
#>  5 Fold05   classification_cost multiclass     0.777
#>  6 Fold06   classification_cost multiclass     0.737
#>  7 Fold07   classification_cost multiclass     0.743
#>  8 Fold08   classification_cost multiclass     0.749
#>  9 Fold09   classification_cost multiclass     0.760
#> 10 Fold10   classification_cost multiclass     0.771