Calculate the coefficient of determination using correlation. For the
traditional measure of R squared, see `rsq_trad()`

.

## Usage

```
rsq(data, ...)
# S3 method for data.frame
rsq(data, truth, estimate, na_rm = TRUE, case_weights = NULL, ...)
rsq_vec(truth, estimate, na_rm = TRUE, case_weights = NULL, ...)
```

## Arguments

- data
A

`data.frame`

containing the columns specified by the`truth`

and`estimate`

arguments.- ...
Not currently used.

- truth
The column identifier for the true results (that is

`numeric`

). This should be an unquoted column name although this argument is passed by expression and supports quasiquotation (you can unquote column names). For`_vec()`

functions, a`numeric`

vector.- estimate
The column identifier for the predicted results (that is also

`numeric`

). As with`truth`

this can be specified different ways but the primary method is to use an unquoted variable name. For`_vec()`

functions, a`numeric`

vector.- na_rm
A

`logical`

value indicating whether`NA`

values should be stripped before the computation proceeds.- case_weights
The optional column identifier for case weights. This should be an unquoted column name that evaluates to a numeric column in

`data`

. For`_vec()`

functions, a numeric vector,`hardhat::importance_weights()`

, or`hardhat::frequency_weights()`

.

## Value

A `tibble`

with columns `.metric`

, `.estimator`

,
and `.estimate`

and 1 row of values.

For grouped data frames, the number of rows returned will be the same as the number of groups.

For `rsq_vec()`

, a single `numeric`

value (or `NA`

).

## Details

The two estimates for the
coefficient of determination, `rsq()`

and `rsq_trad()`

, differ by
their formula. The former guarantees a value on (0, 1) while the
latter can generate inaccurate values when the model is
non-informative (see the examples). Both are measures of
consistency/correlation and not of accuracy.

`rsq()`

is simply the squared correlation between `truth`

and `estimate`

.

Because `rsq()`

internally computes a correlation, if either `truth`

or
`estimate`

are constant it can result in a divide by zero error. In these
cases, a warning is thrown and `NA`

is returned. This can occur when a model
predicts a single value for all samples. For example, a regularized model
that eliminates all predictors except for the intercept would do this.
Another example would be a CART model that contains no splits.

## References

Kvalseth. Cautionary note about \(R^2\). American Statistician (1985) vol. 39 (4) pp. 279-285.

## Examples

```
# Supply truth and predictions as bare column names
rsq(solubility_test, solubility, prediction)
#> # A tibble: 1 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 rsq standard 0.879
library(dplyr)
set.seed(1234)
size <- 100
times <- 10
# create 10 resamples
solubility_resampled <- bind_rows(
replicate(
n = times,
expr = sample_n(solubility_test, size, replace = TRUE),
simplify = FALSE
),
.id = "resample"
)
# Compute the metric by group
metric_results <- solubility_resampled %>%
group_by(resample) %>%
rsq(solubility, prediction)
metric_results
#> # A tibble: 10 × 4
#> resample .metric .estimator .estimate
#> <chr> <chr> <chr> <dbl>
#> 1 1 rsq standard 0.874
#> 2 10 rsq standard 0.879
#> 3 2 rsq standard 0.891
#> 4 3 rsq standard 0.916
#> 5 4 rsq standard 0.892
#> 6 5 rsq standard 0.858
#> 7 6 rsq standard 0.873
#> 8 7 rsq standard 0.852
#> 9 8 rsq standard 0.915
#> 10 9 rsq standard 0.884
# Resampled mean estimate
metric_results %>%
summarise(avg_estimate = mean(.estimate))
#> # A tibble: 1 × 1
#> avg_estimate
#> <dbl>
#> 1 0.883
# With uninformitive data, the traditional version of R^2 can return
# negative values.
set.seed(2291)
solubility_test$randomized <- sample(solubility_test$prediction)
rsq(solubility_test, solubility, randomized)
#> # A tibble: 1 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 rsq standard 0.00199
rsq_trad(solubility_test, solubility, randomized)
#> # A tibble: 1 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 rsq_trad standard -1.01
# A constant `truth` or `estimate` vector results in a warning from
# a divide by zero error in the correlation calculation.
# `NA` will be returned in these cases.
truth <- c(1, 2)
estimate <- c(1, 1)
rsq_vec(truth, estimate)
#> Warning: A correlation computation is required, but `estimate` is constant and has
#> 0 standard deviation, resulting in a divide by 0 error. `NA` will be
#> returned.
#> [1] NA
```