These functions calculate the `ppv()` (positive predictive value) of a
measurement system compared to a reference result (the "truth" or gold standard).
Highly related functions are `spec()`, `sens()`, and `npv()`.

    ppv(data, ...)

    # S3 method for data.frame
    ppv(
      data,
      truth,
      estimate,
      prevalence = NULL,
      estimator = NULL,
      na_rm = TRUE,
      ...
    )

    ppv_vec(
      truth,
      estimate,
      prevalence = NULL,
      estimator = NULL,
      na_rm = TRUE,
      ...
    )

| Argument | Description |
|---|---|
| `data` | Either a |
| `...` | Not currently used. |
| `truth` | The column identifier for the true class results (that is a |
| `estimate` | The column identifier for the predicted class results (that is also |
| `prevalence` | A numeric value for the rate of the "positive" class of the data. |
| `estimator` | One of: |
| `na_rm` | A |

A `tibble` with columns `.metric`, `.estimator`, and `.estimate` and 1 row of values.

For grouped data frames, the number of rows returned will be the same as the number of groups.

For `ppv_vec()`, a single `numeric` value (or `NA`).

The positive predictive value (`ppv()`) is defined as the proportion of
predicted positives that are actually positive, while the
negative predictive value (`npv()`) is defined as the proportion of
predicted negatives that are actually negative.
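The definitions above can be illustrated with a minimal base R sketch. The labels and vectors here are hypothetical (they are not package data), and the calculation uses the plain definition rather than calling `yardstick`.

```r
# Hypothetical reference labels and predictions (illustration only).
truth <- factor(
  c("pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"),
  levels = c("pos", "neg")
)
estimate <- factor(
  c("pos", "pos", "neg", "neg", "neg", "pos", "neg", "pos"),
  levels = c("pos", "neg")
)

# Rows are predictions, columns are the reference.
tab <- table(Predicted = estimate, Reference = truth)

# PPV: share of predicted positives that are truly positive.
ppv_direct <- as.numeric(tab["pos", "pos"] / sum(tab["pos", ]))

# NPV: share of predicted negatives that are truly negative.
npv_direct <- as.numeric(tab["neg", "neg"] / sum(tab["neg", ]))
```

In this toy data, three of the four predicted positives and three of the four predicted negatives agree with the reference, so both values are 0.75.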

There is no common convention on which factor level should
automatically be considered the "event" or "positive" result.
In `yardstick`, the default is to use the *first* level. This behavior is
controlled by a global option called `yardstick.event_first`, which is
set to `TRUE` when the package is loaded. It can be changed
to `FALSE` if the *last* level of the factor is considered the
level of interest by running: `options(yardstick.event_first = FALSE)`.
For multiclass extensions involving one-vs-all
comparisons (such as macro averaging), this option is ignored and
the "one" level is always the relevant result.
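Because the event is picked by level order, it can help to inspect the levels before computing the metric. The sketch below uses a hypothetical factor and base R only. Reordering the levels with `relevel()` is shown as one possible alternative to changing the global option; it is an assumption of this example, not something the documentation prescribes, and it would need to be applied to both the truth and the estimate columns.

```r
# Hypothetical outcome factor; the level order decides which level is "first".
outcome <- factor(c("no", "yes", "yes", "no"), levels = c("no", "yes"))

# Under the default described above, the first level would be the event.
first_level <- levels(outcome)[1]

# Possible alternative to the global option: move the level of interest
# to the front so that it becomes the first level.
outcome_yes_first <- relevel(outcome, ref = "yes")
levels(outcome_yes_first)
```

The underlying values are unchanged by `relevel()`; only the order of the levels differs.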

Macro, micro, and macro-weighted averaging are available for this metric.
The default is to select macro averaging if a `truth` factor with more
than 2 levels is provided. Otherwise, a standard binary calculation is done.
See `vignette("multiclass", "yardstick")` for more information.
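To make the difference between macro and macro-weighted averaging concrete, here is a rough base R sketch on a hypothetical 3-class confusion matrix. It assumes that the per-class value is the one-vs-all PPV and that the macro-weighted version weights each class by its frequency in the reference; it illustrates the idea and is not a copy of the package internals.

```r
# Hypothetical confusion matrix: rows are predictions, columns are the reference.
cm <- matrix(
  c(18, 1, 1,
     2, 6, 2,
     0, 3, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(Predicted = c("a", "b", "c"), Reference = c("a", "b", "c"))
)

# One-vs-all PPV per class: correct predictions of a class divided by
# all predictions of that class.
ppv_by_class <- diag(cm) / rowSums(cm)

# Macro average: every class counts equally.
ppv_macro <- mean(ppv_by_class)

# Macro-weighted average (assumed weighting): classes are weighted by
# their share of the reference labels.
class_weights <- colSums(cm) / sum(cm)
ppv_macro_weighted <- sum(ppv_by_class * class_weights)
```

Here the per-class values are 0.9, 0.6, and 0.7. The macro average is about 0.733, while the macro-weighted average is 0.775 because class `a` makes up half of the reference labels.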

Suppose a 2x2 table with notation:

| Predicted | Reference: Positive | Reference: Negative |
|---|---|---|
| Positive | A | B |
| Negative | C | D |

The formulas used here are:

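$$\text{Sensitivity} = \frac{A}{A+C}$$

$$\text{Specificity} = \frac{D}{B+D}$$

$$\text{Prevalence} = \frac{A+C}{A+B+C+D}$$

$$\text{PPV} = \frac{\text{Sensitivity} \times \text{Prevalence}}{(\text{Sensitivity} \times \text{Prevalence}) + ((1-\text{Specificity}) \times (1-\text{Prevalence}))}$$

$$\text{NPV} = \frac{\text{Specificity} \times (1-\text{Prevalence})}{((1-\text{Sensitivity}) \times \text{Prevalence}) + (\text{Specificity} \times (1-\text{Prevalence}))}$$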

See the references for discussions of the statistics.
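As a rough check on the PPV formula, the sketch below evaluates it in base R for hypothetical cell counts. The helper `ppv_from_counts()` is an illustrative name introduced here and is not part of `yardstick`; the `prevalence` argument mimics the idea of overriding the sample prevalence.

```r
# Hypothetical cell counts in the A/B/C/D notation of the 2x2 table.
A <- 40  # predicted positive, reference positive
B <- 10  # predicted positive, reference negative
C <- 20  # predicted negative, reference positive
D <- 30  # predicted negative, reference negative

# Illustrative helper (not a yardstick function) implementing the PPV formula.
ppv_from_counts <- function(A, B, C, D, prevalence = NULL) {
  sens <- A / (A + C)
  spec <- D / (B + D)
  # Fall back to the sample prevalence when none is supplied.
  if (is.null(prevalence)) {
    prevalence <- (A + C) / (A + B + C + D)
  }
  (sens * prevalence) /
    ((sens * prevalence) + ((1 - spec) * (1 - prevalence)))
}

# With the sample prevalence, the formula reduces to A / (A + B).
ppv_sample <- ppv_from_counts(A, B, C, D)

# With an assumed lower prevalence, the same sensitivity and specificity
# give a smaller PPV.
ppv_low_prev <- ppv_from_counts(A, B, C, D, prevalence = 0.10)
```

With these counts, the sample-prevalence result is 0.8, which matches `A / (A + B)`, and the result for a prevalence of 0.10 is 8/35 (about 0.229).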

Altman, D.G., Bland, J.M. (1994) "Diagnostic tests 2: predictive values," *British Medical Journal*, vol 309, 102.

Other class metrics:
`accuracy()`, `bal_accuracy()`, `detection_prevalence()`, `f_meas()`, `j_index()`, `kap()`, `mcc()`, `npv()`, `precision()`, `recall()`, `sens()`, `spec()`

    library(dplyr)
    data("two_class_example")
    data("hpc_cv")

    # Two class
    ppv(two_class_example, truth, predicted)
    #> # A tibble: 1 x 3
    #>   .metric .estimator .estimate
    #>   <chr>   <chr>          <dbl>
    #> 1 ppv     binary         0.819

    # Multiclass
    hpc_cv %>%
      filter(Resample == "Fold01") %>%
      ppv(obs, pred)
    #> # A tibble: 1 x 3
    #>   .metric .estimator .estimate
    #>   <chr>   <chr>          <dbl>
    #> 1 ppv     macro          0.637

    # Groups are respected
    hpc_cv %>%
      group_by(Resample) %>%
      ppv(obs, pred)
    #> # A tibble: 10 x 4
    #>    Resample .metric .estimator .estimate
    #>    <chr>    <chr>   <chr>          <dbl>
    #>  1 Fold01   ppv     macro          0.637
    #>  2 Fold02   ppv     macro          0.603
    #>  3 Fold03   ppv     macro          0.706
    #>  4 Fold04   ppv     macro          0.658
    #>  5 Fold05   ppv     macro          0.651
    #>  6 Fold06   ppv     macro          0.626
    #>  7 Fold07   ppv     macro          0.562
    #>  8 Fold08   ppv     macro          0.652
    #>  9 Fold09   ppv     macro          0.605
    #> 10 Fold10   ppv     macro          0.625

    # Weighted macro averaging
    hpc_cv %>%
      group_by(Resample) %>%
      ppv(obs, pred, estimator = "macro_weighted")
    #> # A tibble: 10 x 4
    #>    Resample .metric .estimator     .estimate
    #>    <chr>    <chr>   <chr>              <dbl>
    #>  1 Fold01   ppv     macro_weighted     0.697
    #>  2 Fold02   ppv     macro_weighted     0.690
    #>  3 Fold03   ppv     macro_weighted     0.752
    #>  4 Fold04   ppv     macro_weighted     0.690
    #>  5 Fold05   ppv     macro_weighted     0.705
    #>  6 Fold06   ppv     macro_weighted     0.682
    #>  7 Fold07   ppv     macro_weighted     0.649
    #>  8 Fold08   ppv     macro_weighted     0.702
    #>  9 Fold09   ppv     macro_weighted     0.661
    #> 10 Fold10   ppv     macro_weighted     0.683

    # Vector version
    ppv_vec(two_class_example$truth, two_class_example$predicted)
    #> [1] 0.8194946

    # Making Class2 the "relevant" level
    options(yardstick.event_first = FALSE)
    ppv_vec(two_class_example$truth, two_class_example$predicted)
    #> [1] 0.8609865
    options(yardstick.event_first = TRUE)

    # But what if we think that Class 1 only occurs 40% of the time?
    ppv(two_class_example, truth, predicted, prevalence = 0.40)
    #> # A tibble: 1 x 3
    #>   .metric .estimator .estimate
    #>   <chr>   <chr>          <dbl>
    #> 1 ppv     binary         0.740