Calculates a cross-tabulation of observed and predicted classes.
conf_mat(data, ...)

# S3 method for data.frame
conf_mat(data, truth, estimate, dnn = c("Prediction", "Truth"), ...)

# S3 method for conf_mat
tidy(x, ...)

autoplot.conf_mat(object, type = "mosaic", ...)
Argument | Description |
---|---|
data | A data frame or a table(). |
... | Options to pass to table(). |
truth | The column identifier for the true class results (that is a factor). |
estimate | The column identifier for the predicted class results (that is also a factor). |
dnn | A character vector of dimnames for the table. |
x | A conf_mat object. |
object | The conf_mat object returned from conf_mat(). |
type | Type of plot desired, must be "mosaic" or "heatmap". |
conf_mat() produces an object with class conf_mat. This contains the table and other objects. tidy.conf_mat() generates a tibble with columns name (the cell identifier) and value (the cell count).
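
As a minimal sketch of the return value described above, the following builds a confusion matrix from a small made-up data frame and inspects it. The data frame `toy` and its column names are hypothetical and exist only for illustration; the calls themselves (`conf_mat()`, `tidy()`, and the `table` element) are the ones this page describes.

```r
library(yardstick)

# Hypothetical toy data: two factor columns that share the same levels
toy <- data.frame(
  truth = factor(c("a", "a", "b", "b", "b"), levels = c("a", "b")),
  pred  = factor(c("a", "b", "b", "b", "a"), levels = c("a", "b"))
)

cm_toy <- conf_mat(toy, truth, pred)

class(cm_toy)            # includes "conf_mat"
cm_toy$table             # the underlying cross-tabulation
tidied <- tidy(cm_toy)   # one row per cell, with columns name and value
tidied
```

With two classes there are four cells, so the tidied tibble should have four rows whose values sum to the number of observations.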
When used on a grouped data frame, conf_mat() returns a tibble containing columns for the groups along with conf_mat, a list-column where each element is a conf_mat object.
For conf_mat objects, a broom tidy() method has been created that collapses the cell counts by cell into a data frame for easy manipulation.
There is also a summary() method that computes various classification metrics at once. See summary.conf_mat().
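
A short sketch of the summary() method, assuming the `hpc_cv` data set used in the examples on this page. The object name `cm_all` is introduced here only for illustration.

```r
library(yardstick)
data("hpc_cv")

# Confusion matrix over all rows of hpc_cv (all folds pooled together)
cm_all <- conf_mat(hpc_cv, obs, pred)

# One row per metric, computed from the same confusion matrix
metrics <- summary(cm_all)
metrics
```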
There is a ggplot2::autoplot() method for quickly visualizing the matrix. Both heatmap and mosaic types are implemented.
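
For the heatmap variant, a minimal sketch might look like the following. It again assumes the `hpc_cv` data set; `cm_all` and `p` are illustrative names.

```r
library(yardstick)
library(ggplot2)
data("hpc_cv")

cm_all <- conf_mat(hpc_cv, obs, pred)

# type = "heatmap" draws the cell counts as a tiled grid instead of a mosaic
p <- autoplot(cm_all, type = "heatmap")
p
```

Because the result is a ggplot object, it can be modified further with the usual ggplot2 layers and themes.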
The function requires that the factors have exactly the same levels.
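
One way to satisfy this requirement, sketched in base R with made-up vectors, is to rebuild the predicted factor using the levels of the truth factor before computing the confusion matrix. The names `truth`, `pred_raw`, and `pred` are hypothetical.

```r
# Hypothetical vectors: the predictions only ever contain one of the classes
truth    <- factor(c("yes", "no", "yes"), levels = c("yes", "no"))
pred_raw <- factor(c("yes", "yes", "yes"))

identical(levels(truth), levels(pred_raw))  # FALSE: "no" is missing

# Re-create the factor with the full, shared set of levels
pred <- factor(pred_raw, levels = levels(truth))

identical(levels(truth), levels(pred))      # TRUE

# The cross-tabulation now has a row for every class, including empty ones
tab <- table(Prediction = pred, Truth = truth)
tab
```

After this step, the two columns can be placed in a data frame and passed to conf_mat().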
See summary.conf_mat() for computing a large number of metrics from one confusion matrix.
library(yardstick)
library(dplyr)
data("hpc_cv")

# The confusion matrix from a single assessment set (i.e. fold)
cm <- hpc_cv %>%
  filter(Resample == "Fold01") %>%
  conf_mat(obs, pred)
cm
#>           Truth
#> Prediction  VF   F   M   L
#>         VF 166  33   8   1
#>         F   11  71  24   7
#>         M    0   3   5   3
#>         L    0   1   4  10

# Now compute the average confusion matrix across all folds in
# terms of the proportion of the data contained in each cell.
# First get the raw cell counts per fold using the `tidy` method
library(purrr)
library(tidyr)

cells_per_resample <- hpc_cv %>%
  group_by(Resample) %>%
  conf_mat(obs, pred) %>%
  mutate(tidied = map(conf_mat, tidy)) %>%
  unnest(tidied)

# Get the totals per resample
counts_per_resample <- hpc_cv %>%
  group_by(Resample) %>%
  summarize(total = n()) %>%
  left_join(cells_per_resample, by = "Resample") %>%
  # Compute the proportions
  mutate(prop = value/total) %>%
  group_by(name) %>%
  # Average
  summarize(prop = mean(prop))

counts_per_resample
#> # A tibble: 16 x 2
#>    name         prop
#>    <chr>       <dbl>
#>  1 cell_1_1 0.467
#>  2 cell_1_2 0.107
#>  3 cell_1_3 0.0185
#>  4 cell_1_4 0.00259
#>  5 cell_2_1 0.0407
#>  6 cell_2_2 0.187
#>  7 cell_2_3 0.0632
#>  8 cell_2_4 0.0173
#>  9 cell_3_1 0.00173
#> 10 cell_3_2 0.00692
#> 11 cell_3_3 0.0228
#> 12 cell_3_4 0.00807
#> 13 cell_4_1 0.000575
#> 14 cell_4_2 0.0104
#> 15 cell_4_3 0.0144
#> 16 cell_4_4 0.0320

# Now reshape these into a matrix
mean_cmat <- matrix(counts_per_resample$prop, byrow = TRUE, ncol = 4)
rownames(mean_cmat) <- levels(hpc_cv$obs)
colnames(mean_cmat) <- levels(hpc_cv$obs)

round(mean_cmat, 3)
#>       VF     F     M     L
#> VF 0.467 0.107 0.018 0.003
#> F  0.041 0.187 0.063 0.017
#> M  0.002 0.007 0.023 0.008
#> L  0.001 0.010 0.014 0.032

# The confusion matrix can quickly be visualized using autoplot()
library(ggplot2)

autoplot(cm, type = "mosaic")