Matthews correlation coefficient

## Usage

```
mcc(data, ...)
# S3 method for data.frame
mcc(data, truth, estimate, na_rm = TRUE, case_weights = NULL, ...)
mcc_vec(truth, estimate, na_rm = TRUE, case_weights = NULL, ...)
```

## Arguments

- data
Either a `data.frame` containing the columns specified by the `truth` and `estimate` arguments, or a `table`/`matrix` where the true class results should be in the columns of the table.

- ...
Not currently used.

- truth
The column identifier for the true class results (that is a `factor`). This should be an unquoted column name, although this argument is passed by expression and supports quasiquotation (you can unquote column names). For `_vec()` functions, a `factor` vector.

- estimate
The column identifier for the predicted class results (that is also a `factor`). As with `truth`, this can be specified in different ways, but the primary method is to use an unquoted variable name. For `_vec()` functions, a `factor` vector.

- na_rm
A `logical` value indicating whether `NA` values should be stripped before the computation proceeds.

- case_weights
The optional column identifier for case weights. This should be an unquoted column name that evaluates to a numeric column in `data`. For `_vec()` functions, a numeric vector.
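
The three input forms described above can be sketched as follows. This is a minimal illustration, assuming a yardstick version that supports `case_weights` (as in the usage shown earlier); the vectors `truth`, `estimate`, and `w` are made-up data, not part of the package.

```r
library(yardstick)

# Made-up factor vectors; both share the same levels, as the metric expects
truth    <- factor(c("yes", "yes", "no", "no", "yes", "no"),  levels = c("yes", "no"))
estimate <- factor(c("yes", "no",  "no", "no", "yes", "yes"), levels = c("yes", "no"))

# Vector interface: returns a single numeric value
mcc_vec(truth, estimate)

# Table interface: predictions in the rows, true classes in the columns
cm <- table(Prediction = estimate, Truth = truth)
mcc(cm)

# Case weights for the vector interface are a plain numeric vector
w <- c(1, 2, 1, 1, 2, 1)
mcc_vec(truth, estimate, case_weights = w)
```

The data frame method takes the same information as unquoted column names, for example `mcc(df, truth, estimate, case_weights = w)` for a hypothetical data frame `df` holding these three columns.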

## Value

A `tibble` with columns `.metric`, `.estimator`, and `.estimate` and 1 row of values.

For grouped data frames, the number of rows returned will be the same as the number of groups.

For `mcc_vec()`, a single `numeric` value (or `NA`).

## Relevant Level

There is no common convention on which factor level should
automatically be considered the "event" or "positive" result
when computing binary classification metrics. In `yardstick`, the default
is to use the *first* level. To alter this, change the argument
`event_level` to `"second"` to consider the *last* level of the factor the
level of interest. For multiclass extensions involving one-vs-all
comparisons (such as macro averaging), this option is ignored and
the "one" level is always the relevant result.

## Multiclass

`mcc()` has a known multiclass generalization, which is computed
automatically if a factor with more than 2 levels is provided. Because
of this, no averaging methods are provided.
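
To make the generalization concrete, here is a base R sketch of the multiclass coefficient computed from a confusion matrix. It assumes the generalization is the commonly used R_K form, (c·s − Σ p_k t_k) / sqrt((s² − Σ p_k²)(s² − Σ t_k²)), where c is the number of correct predictions, s the total count, and p_k and t_k the predicted and true counts for class k. `mcc_from_confusion()` is a hypothetical helper written for illustration, not a yardstick function, and returning `NA` for a zero denominator is a choice made for this sketch rather than documented package behavior.

```r
# Hypothetical helper (not part of yardstick): MCC from a square confusion
# matrix with predictions in the rows and true classes in the columns.
mcc_from_confusion <- function(cm) {
  cm <- as.matrix(cm)
  storage.mode(cm) <- "double"  # avoid integer overflow in the products below
  stopifnot(nrow(cm) == ncol(cm))

  n            <- sum(cm)        # total number of observations
  correct      <- sum(diag(cm))  # correctly classified observations
  pred_totals  <- rowSums(cm)    # number of predictions per class
  truth_totals <- colSums(cm)    # number of true cases per class

  numerator   <- correct * n - sum(pred_totals * truth_totals)
  denominator <- sqrt(n^2 - sum(pred_totals^2)) * sqrt(n^2 - sum(truth_totals^2))

  # Assumption of this sketch: report NA when the coefficient is undefined
  if (denominator == 0) {
    return(NA_real_)
  }
  numerator / denominator
}

# Two classes: TP = 6, FP = 2 (first row), FN = 1, TN = 3 (second row)
cm2 <- matrix(c(6, 1, 2, 3), nrow = 2)
mcc_from_confusion(cm2)

# Three classes with no errors give a coefficient of 1
cm3 <- diag(c(4, 5, 6))
mcc_from_confusion(cm3)
```

With two classes the expression reduces to the familiar binary form (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which is why a single formula covers both cases and no averaging step is needed.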

## References

Jurman, G., Riccadonna, S., Furlanello, C. (2012). "A Comparison of MCC and CEN Error
Measures in Multi-Class Prediction". *PLOS ONE*. Vol 7, Iss 8, e41882.

## Examples

```
library(yardstick)
library(dplyr)
data("two_class_example")
data("hpc_cv")

# Two class
mcc(two_class_example, truth, predicted)
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 mcc     binary         0.677

# Multiclass
# mcc() has a natural multiclass extension
hpc_cv %>%
  filter(Resample == "Fold01") %>%
  mcc(obs, pred)
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 mcc     multiclass     0.542

# Groups are respected
hpc_cv %>%
  group_by(Resample) %>%
  mcc(obs, pred)
#> # A tibble: 10 × 4
#>    Resample .metric .estimator .estimate
#>    <chr>    <chr>   <chr>          <dbl>
#>  1 Fold01   mcc     multiclass     0.542
#>  2 Fold02   mcc     multiclass     0.521
#>  3 Fold03   mcc     multiclass     0.602
#>  4 Fold04   mcc     multiclass     0.519
#>  5 Fold05   mcc     multiclass     0.520
#>  6 Fold06   mcc     multiclass     0.494
#>  7 Fold07   mcc     multiclass     0.461
#>  8 Fold08   mcc     multiclass     0.538
#>  9 Fold09   mcc     multiclass     0.459
#> 10 Fold10   mcc     multiclass     0.498
```