Calculate the coefficient of determination using correlation. For the
traditional measure of R squared, see `rsq_trad()`.

    rsq(data, ...)

    # S3 method for data.frame
    rsq(data, truth, estimate, na_rm = TRUE, ...)

    rsq_vec(truth, estimate, na_rm = TRUE, ...)

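Argument | Description |
---|---|
data | A `data.frame` containing the `truth` and `estimate` columns. |
... | Not currently used. |
truth | The column identifier for the true results (that is `numeric`). |
estimate | The column identifier for the predicted results (that is also `numeric`). |
na_rm | A `logical` value indicating whether `NA` values should be stripped before the computation proceeds. |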

A `tibble` with columns `.metric`, `.estimator`, and `.estimate` and 1 row of values.

For grouped data frames, the number of rows returned will be the same as the number of groups.

For `rsq_vec()`, a single `numeric` value (or `NA`).

The two estimates for the coefficient of determination, `rsq()` and
`rsq_trad()`, differ in their formulas. The former guarantees a value on
[0, 1], while the latter can generate inaccurate values, including negative
ones, when the model is non-informative (see the examples). Both are
measures of consistency/correlation and not of accuracy.
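To make the difference concrete, here is a minimal base-R sketch of the two formulas. The helper names `rsq_cor()` and `rsq_sse()` are hypothetical and exist only for this illustration; the sum-of-squares form is assumed to be the usual textbook definition behind the traditional measure, and neither helper is yardstick's own implementation.

```r
# Hypothetical helpers for illustration only (not part of yardstick).

# Correlation-based R^2: the squared Pearson correlation.
rsq_cor <- function(truth, estimate) {
  cor(truth, estimate)^2
}

# Traditional R^2 (assumed textbook form): 1 - SS_residual / SS_total.
rsq_sse <- function(truth, estimate) {
  ss_res <- sum((truth - estimate)^2)
  ss_tot <- sum((truth - mean(truth))^2)
  1 - ss_res / ss_tot
}

truth    <- c(1, 2, 3, 4, 5)
estimate <- c(5, 4, 3, 2, 1)  # perfectly anti-correlated predictions

rsq_cor(truth, estimate)  # 1: stays within [0, 1] despite poor accuracy
rsq_sse(truth, estimate)  # -3: falls below zero
```

The predictions here are as far from the truth as a linear relationship allows, yet the correlation-based value is 1. This is why the measure should be read as consistency rather than accuracy.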

`rsq()` is simply the squared correlation between `truth` and `estimate`.
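Written out, and assuming the correlation is the ordinary Pearson correlation, the quantity is as follows. The symbols are introduced here for illustration: \(y_i\) stands for the `truth` values, \(\hat{y}_i\) for the `estimate` values, and bars denote sample means.

```latex
\[
R^2 \;=\; \operatorname{cor}(y, \hat{y})^2
\;=\;
\frac{\left[\sum_{i=1}^{n} (y_i - \bar{y})\,(\hat{y}_i - \bar{\hat{y}})\right]^2}
     {\sum_{i=1}^{n} (y_i - \bar{y})^2 \;\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}
\]
```

Both sums in the denominator are sums of squared deviations, so the ratio is undefined whenever either vector has zero variance.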

Because `rsq()` internally computes a correlation, if either `truth` or
`estimate` is constant it can result in a divide by zero error. In these
cases, a warning is thrown and `NA` is returned. This can occur when a model
predicts a single value for all samples. For example, a regularized model
that eliminates all predictors except for the intercept would do this.
Another example would be a CART model that contains no splits.
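The same failure mode can be reproduced with base R alone. The sketch below uses `stats::cor()` as a stand-in for the internal correlation step, which is an assumption for illustration; the warning text that yardstick emits is its own.

```r
truth    <- c(1, 2, 3, 4)
estimate <- rep(2.5, 4)  # e.g. an intercept-only model predicting the mean

# A constant vector has zero standard deviation, so the
# denominator of the correlation is zero.
sd(estimate)

# Base R's cor() warns that the standard deviation is zero and returns NA;
# the warning is suppressed here so that only the value is inspected.
r <- suppressWarnings(cor(truth, estimate))
is.na(r^2)
```

Checking `sd(estimate) == 0` before computing the metric is one way to detect such a model ahead of time.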

Kvalseth. Cautionary note about \(R^2\). American Statistician (1985) vol. 39 (4) pp. 279-285.

Other numeric metrics: `ccc()`, `huber_loss_pseudo()`, `huber_loss()`, `iic()`, `mae()`, `mape()`, `mase()`, `mpe()`, `rmse()`, `rpd()`, `rpiq()`, `rsq_trad()`, `smape()`

Max Kuhn

    # Supply truth and predictions as bare column names
    rsq(solubility_test, solubility, prediction)
    #> # A tibble: 1 x 3
    #>   .metric .estimator .estimate
    #>   <chr>   <chr>          <dbl>
    #> 1 rsq     standard       0.879

    library(dplyr)

    set.seed(1234)
    size <- 100
    times <- 10

    # create 10 resamples
    solubility_resampled <- bind_rows(
      replicate(
        n = times,
        expr = sample_n(solubility_test, size, replace = TRUE),
        simplify = FALSE
      ),
      .id = "resample"
    )

    # Compute the metric by group
    metric_results <- solubility_resampled %>%
      group_by(resample) %>%
      rsq(solubility, prediction)

    metric_results
    #> # A tibble: 10 x 4
    #>    resample .metric .estimator .estimate
    #>    <chr>    <chr>   <chr>          <dbl>
    #>  1 1        rsq     standard       0.874
    #>  2 10       rsq     standard       0.879
    #>  3 2        rsq     standard       0.891
    #>  4 3        rsq     standard       0.916
    #>  5 4        rsq     standard       0.892
    #>  6 5        rsq     standard       0.858
    #>  7 6        rsq     standard       0.873
    #>  8 7        rsq     standard       0.852
    #>  9 8        rsq     standard       0.915
    #> 10 9        rsq     standard       0.884

    # Resampled mean estimate
    metric_results %>%
      summarise(avg_estimate = mean(.estimate))
    #> # A tibble: 1 x 1
    #>   avg_estimate
    #>          <dbl>
    #> 1        0.883

    # With uninformative data, the traditional version of R^2 can return
    # negative values.
    set.seed(2291)
    solubility_test$randomized <- sample(solubility_test$prediction)

    rsq(solubility_test, solubility, randomized)
    #> # A tibble: 1 x 3
    #>   .metric .estimator .estimate
    #>   <chr>   <chr>          <dbl>
    #> 1 rsq     standard     0.00199

    rsq_trad(solubility_test, solubility, randomized)
    #> # A tibble: 1 x 3
    #>   .metric  .estimator .estimate
    #>   <chr>    <chr>          <dbl>
    #> 1 rsq_trad standard       -1.01

    # A constant `truth` or `estimate` vector results in a warning from
    # a divide by zero error in the correlation calculation.
    # `NA` will be returned in these cases.
    truth <- c(1, 2)
    estimate <- c(1, 1)

    rsq_vec(truth, estimate)
    #> Warning: A correlation computation is required, but `estimate` is constant and has 0 standard deviation, resulting in a divide by 0 error. `NA` will be returned.
    #> [1] NA