Calculate the mean percentage error. This metric is in relative
units. It can be used as a measure of the estimate's bias.
Note that if any truth values are 0, a value of:
-Inf (estimate > 0), Inf (estimate < 0), or NaN (estimate == 0)
is returned for mpe().
Usage
mpe(data, ...)
# S3 method for class 'data.frame'
mpe(data, truth, estimate, na_rm = TRUE, case_weights = NULL, ...)
mpe_vec(truth, estimate, na_rm = TRUE, case_weights = NULL, ...)Arguments
- data
- A - data.framecontaining the columns specified by the- truthand- estimatearguments.
- ...
- Not currently used. 
- truth
- The column identifier for the true results (that is - numeric). This should be an unquoted column name although this argument is passed by expression and supports quasiquotation (you can unquote column names). For- _vec()functions, a- numericvector.
- estimate
- The column identifier for the predicted results (that is also - numeric). As with- truththis can be specified different ways but the primary method is to use an unquoted variable name. For- _vec()functions, a- numericvector.
- na_rm
- A - logicalvalue indicating whether- NAvalues should be stripped before the computation proceeds.
- case_weights
- The optional column identifier for case weights. This should be an unquoted column name that evaluates to a numeric column in - data. For- _vec()functions, a numeric vector,- hardhat::importance_weights(), or- hardhat::frequency_weights().
Value
A tibble with columns .metric, .estimator,
and .estimate and 1 row of values.
For grouped data frames, the number of rows returned will be the same as the number of groups.
For mpe_vec(), a single numeric value (or NA).
See also
Other numeric metrics:
ccc(),
huber_loss(),
huber_loss_pseudo(),
iic(),
mae(),
mape(),
mase(),
msd(),
poisson_log_loss(),
rmse(),
rpd(),
rpiq(),
rsq(),
rsq_trad(),
smape()
Other accuracy metrics:
ccc(),
huber_loss(),
huber_loss_pseudo(),
iic(),
mae(),
mape(),
mase(),
msd(),
poisson_log_loss(),
rmse(),
smape()
Examples
# `solubility_test$solubility` has zero values with corresponding
# `$prediction` values that are negative. By definition, this causes `Inf`
# to be returned from `mpe()`.
solubility_test[solubility_test$solubility == 0, ]
#>     solubility prediction
#> 17           0 -0.1532030
#> 220          0 -0.3876578
mpe(solubility_test, solubility, prediction)
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 mpe     standard         Inf
# We'll remove the zero values for demonstration
solubility_test <- solubility_test[solubility_test$solubility != 0, ]
# Supply truth and predictions as bare column names
mpe(solubility_test, solubility, prediction)
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 mpe     standard        16.1
library(dplyr)
set.seed(1234)
size <- 100
times <- 10
# create 10 resamples
solubility_resampled <- bind_rows(
  replicate(
    n = times,
    expr = sample_n(solubility_test, size, replace = TRUE),
    simplify = FALSE
  ),
  .id = "resample"
)
# Compute the metric by group
metric_results <- solubility_resampled %>%
  group_by(resample) %>%
  mpe(solubility, prediction)
metric_results
#> # A tibble: 10 × 4
#>    resample .metric .estimator .estimate
#>    <chr>    <chr>   <chr>          <dbl>
#>  1 1        mpe     standard     -56.2  
#>  2 10       mpe     standard      50.4  
#>  3 2        mpe     standard     -27.9  
#>  4 3        mpe     standard       0.470
#>  5 4        mpe     standard      -0.836
#>  6 5        mpe     standard     -35.3  
#>  7 6        mpe     standard       7.51 
#>  8 7        mpe     standard     -34.5  
#>  9 8        mpe     standard       7.87 
#> 10 9        mpe     standard      14.7  
# Resampled mean estimate
metric_results %>%
  summarise(avg_estimate = mean(.estimate))
#> # A tibble: 1 × 1
#>   avg_estimate
#>          <dbl>
#> 1        -7.38
