| Title: | Composite Scoring via Principal Component Analysis of Ridit Scores |
|---|---|
| Description: | Implements 'PRIDIT' (Principal Component Analysis applied to 'RIDITs'), an unsupervised, nonparametric method for aggregating ordinal, categorical, and continuous indicators into a single interpretable composite score. Originally proposed by Brockett et al. (2002) <doi:10.1111/1539-6975.00027> for insurance fraud detection and extended to hospital quality measurement by Lieberthal (2008) <doi:10.1111/j.1475-6773.2007.00821.x> and Lieberthal and Comer (2013) <doi:10.1111/rmir.12009>. The package provides: (1) low-level functions ridit(), PRIDITweight(), and PRIDITscore(); (2) a unified pridit() entry point returning a classed object with print, summary, 'autoplot', and 'coef' methods; (3) pridit_boot() for bootstrap confidence intervals on scores and weights; (4) a step_pridit() recipe step for out-of-sample scoring within the 'tidymodels' framework; and (5) pridit_longitudinal() for panel data, computing cross-period stability of scores and weights. |
| Authors: | Robert D. Lieberthal [aut, cre] |
| Maintainer: | Robert D. Lieberthal <[email protected]> |
| License: | Apache License (>= 2) |
| Version: | 1.1.1 |
| Built: | 2026-06-04 17:06:05 UTC |
| Source: | https://github.com/rlieberthal/pridit |
Produces a two-panel ggplot2 figure: a bar chart of the top indicator weights by magnitude (left) and a histogram of the PRIDIT score distribution (right).
## S3 method for class 'pridit'
autoplot(object, top_n = 20L, ...)

| Argument | Description |
|---|---|
| object | A "pridit" object. |
| top_n | Integer. Number of top-weighted indicators to display. Default 20. |
| ... | Ignored. |
A ggplot object (invisibly).
Produces a point-and-range plot for indicator weight CIs and, if available, a ranked-score plot with error ribbons.
## S3 method for class 'pridit_boot'
autoplot(object, top_n = 20L, ...)

| Argument | Description |
|---|---|
| object | A "pridit_boot" object. |
| top_n | Integer. Number of weights to display (by absolute estimate). Default 20. |
| ... | Ignored. |
A ggplot object (invisibly).
Produces two panels: (left) a heatmap of cross-period Spearman score correlations and (right) a line plot of per-indicator weight trajectories across periods.
## S3 method for class 'pridit_longitudinal'
autoplot(object, top_n = 10L, ...)

| Argument | Description |
|---|---|
| object | A "pridit_longitudinal" object. |
| top_n | Integer. Number of indicators to show in the weight trajectory panel (by mean absolute weight across periods). Default 10. |
| ... | Ignored. |
A ggplot object (invisibly).
Extract PRIDIT weights
## S3 method for class 'pridit'
coef(object, ...)

| Argument | Description |
|---|---|
| object | A "pridit" object. |
| ... | Ignored. |
Named numeric vector of PRIDIT weights.
The pridit package provides functions for implementing the PRIDIT (Principal Component Analysis applied to RIDITs) scoring system.
A single entry-point that runs the full PRIDIT pipeline—ridit scoring,
weight estimation, and composite scoring—and returns a classed object with
print, summary, autoplot, and coef methods.
pridit(data, sign_correction = TRUE)

| Argument | Description |
|---|---|
| data | A data frame. The first column is treated as the observation identifier; all remaining columns must be numeric indicators. |
| sign_correction | Logical (default TRUE). |
PRIDIT (Principal Component Analysis applied to RIDITs) was introduced by Brockett et al. (2002) for insurance fraud detection and applied to hospital quality measurement by Lieberthal (2008). Its key properties are:
No parametric assumptions about the data-generating process.
No prior knowledge of indicator direction is required; weight signs are determined entirely by the data.
Each indicator weight is interpretable as its contribution to the dominant latent factor.
An object of class "pridit", a list with components:
| Component | Description |
|---|---|
| scores | Data frame with columns id and PRIDITscore, sorted descending. |
| weights | Named numeric vector of PRIDIT weights. |
| eigenvalue | Largest eigenvalue of the ridit cross-product matrix (used for score normalisation). |
| eigenvalue_ratio | Ratio of the first to the second eigenvalue; large values support the single-factor interpretation. |
| n | Number of observations. |
| p | Number of indicators. |
| call | Matched call. |
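The eigenvalue_ratio diagnostic can be illustrated in base R without the package. The sketch below is not the package's implementation: ridit_sketch() and eig_ratio() are hypothetical helpers, and the ridit transform is assumed to be the share of observations strictly below a value minus the share strictly above it. It compares simulated indicators driven by one latent factor with pure noise.

```r
# Hypothetical helper (not the package's ridit()): for each value of x,
# share of observations strictly below minus share strictly above.
ridit_sketch <- function(x) {
  vapply(x, function(v) mean(x < v) - mean(x > v), numeric(1))
}

# Hypothetical helper: ratio of the two largest eigenvalues of the
# ridit cross-product matrix t(R) %*% R.
eig_ratio <- function(df) {
  R <- sapply(df, ridit_sketch)
  ev <- eigen(crossprod(R), symmetric = TRUE, only.values = TRUE)$values
  ev[1] / ev[2]
}

set.seed(1)
n <- 200
z <- rnorm(n)  # simulated latent factor

# Three indicators that all track the same latent factor
one_factor <- data.frame(
  x1 = z + 0.3 * rnorm(n),
  x2 = z + 0.3 * rnorm(n),
  x3 = z + 0.3 * rnorm(n)
)

# Three unrelated indicators
noise <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))

ratio_factor <- eig_ratio(one_factor)
ratio_noise  <- eig_ratio(noise)
c(one_factor = ratio_factor, noise = ratio_noise)
```

Under these assumptions the one-factor data should give a much larger ratio than the noise data, which is the pattern the eigenvalue_ratio component is meant to flag.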
Brockett, P. L., Derrig, R. A., Golden, L. L., Levine, A., & Alpert, M. (2002). Fraud classification using principal component analysis of RIDITs. Journal of Risk and Insurance, 69(3), 341–371.
Lieberthal, R. D. (2008). Hospital quality: A PRIDIT approach. Health Services Research, 43(3), 988–1005.
Lieberthal, R. D., & Comer, D. M. (2013). What are the characteristics that explain hospital quality? A longitudinal PRIDIT approach. Risk Management and Insurance Review, 17(1), 17–35.
pridit_boot, pridit_longitudinal,
step_pridit
dat <- data.frame(
  id = letters[1:10],
  x1 = runif(10),
  x2 = runif(10),
  x3 = runif(10)
)
fit <- pridit(dat)
fit
summary(fit)
Resamples observations with replacement B times, refitting the full
PRIDIT pipeline on each resample. Returns percentile confidence intervals
for every indicator weight and, optionally, for every observation's score.
pridit_boot(fit, data, B = 500L, conf_level = 0.95, scores = TRUE, seed = NULL)

| Argument | Description |
|---|---|
| fit | A "pridit" object. |
| data | The same data frame that was passed to pridit(). |
| B | Integer. Number of bootstrap replicates. Default 500. |
| conf_level | Numeric in (0, 1). Coverage probability. Default 0.95. |
| scores | Logical. If TRUE (the default), confidence intervals are also computed for every observation's score. |
| seed | Optional integer random seed for reproducibility. |
Because PCA sign is arbitrary, each bootstrap replicate's weight vector is aligned to the original fit before aggregation: if the Pearson correlation between the replicate weights and the original weights is negative, the replicate is sign-flipped.
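The alignment rule can be sketched in base R. The replicate weight vectors below are simulated stand-ins rather than real refits, w_orig is a made-up weight vector, and align() is a hypothetical helper, so this shows only the sign-flip and percentile-interval logic, not the package's code.

```r
set.seed(42)

# Hypothetical weights from an original fit (illustrative values only)
w_orig <- c(x1 = 0.6, x2 = 0.5, x3 = -0.2)

# Stand-in for B bootstrap refits: noisy copies of w_orig whose overall
# sign is flipped at random, mimicking the sign ambiguity of PCA.
B <- 200
reps <- t(replicate(B, {
  s <- sample(c(-1, 1), 1)
  s * (w_orig + rnorm(3, sd = 0.05))
}))

# Flip a replicate when it is negatively correlated with the reference
align <- function(w_rep, w_ref) {
  if (cor(w_rep, w_ref) < 0) -w_rep else w_rep
}
aligned <- t(apply(reps, 1, align, w_ref = w_orig))

# Percentile confidence intervals for each weight (95% coverage)
ci <- apply(aligned, 2, quantile, probs = c(0.025, 0.975))
ci
```

Without the alignment step, roughly half of the replicates would carry the opposite sign and the percentile intervals would straddle zero for every indicator.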
An object of class "pridit_boot", a list with components:
| Component | Description |
|---|---|
| weights_ci | Data frame with columns indicator, estimate, lower, upper. |
| scores_ci | Data frame with columns id, estimate, lower, upper (or NULL if scores = FALSE). |
| B | Number of replicates used. |
| conf_level | Coverage probability. |
| call | Matched call. |
dat <- data.frame(
  id = letters[1:30],
  x1 = runif(30),
  x2 = runif(30),
  x3 = runif(30)
)
fit <- pridit(dat)
boot <- pridit_boot(fit, dat, B = 100, seed = 42)
boot
Fits a separate PRIDIT model for each time period in a panel data set and summarises the stability of scores and weights across periods. The analysis follows Lieberthal & Comer (2013), who demonstrated that PRIDIT weights computed on one year's Hospital Compare data predict out-of-period outcomes in the following year, with cross-year weight correlations exceeding 0.99.
pridit_longitudinal(
  data,
  id_col,
  time_col,
  indicator_cols = NULL,
  sign_correction = TRUE
)

| Argument | Description |
|---|---|
| data | A data frame in long format containing columns identified by |
| id_col | Character. Name of the observation identifier column. |
| time_col | Character. Name of the time-period column. Periods are processed in the order returned by |
| indicator_cols | Character vector of indicator column names to include. If |
| sign_correction | Logical. Passed to pridit(). |
Because the PCA sign is arbitrary, each period's weight vector is aligned to the first period before computing cross-period correlations: if the Pearson correlation between a period's weights and the first period's weights is negative, that period's weight vector is sign-flipped.
Cross-period score correlations are computed only for the balanced panel (observations present in all periods).
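The balanced-panel restriction can be sketched in base R. The scores below are made-up illustrative numbers and the object names are hypothetical; the sketch only shows how observations missing from any period drop out before the Spearman correlations are taken.

```r
# Made-up scores in long format; observation "d" is missing in 2021
scores_long <- data.frame(
  id     = c("a", "b", "c", "d",  "a", "b", "c",  "a", "b", "c", "d"),
  period = c(rep(2020, 4), rep(2021, 3), rep(2022, 4)),
  score  = c(0.5, 0.1, -0.2, -0.4,  0.4, 0.2, -0.6,  0.0, 0.3, -0.1, -0.2)
)

# Balanced panel: ids that appear in every period
ids_by_period <- split(scores_long$id, scores_long$period)
balanced_ids  <- Reduce(intersect, ids_by_period)

# Wide format, one score column per period, balanced ids only
bal  <- scores_long[scores_long$id %in% balanced_ids, ]
wide <- reshape(bal, idvar = "id", timevar = "period", direction = "wide")

# Spearman rank correlations between period score columns
score_cors <- cor(wide[-1], method = "spearman")
score_cors
```

Here "d" is excluded because it has no 2021 score, so the correlations are computed on three observations.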
An object of class "pridit_longitudinal", a list with:
| Component | Description |
|---|---|
| fits | Named list of "pridit" objects, one per period. |
| weight_cors | Symmetric matrix of Pearson correlations between period weight vectors. |
| score_cors | Symmetric matrix of Spearman rank correlations between period scores on the balanced panel. |
| scores_wide | Data frame of scores in wide format (one column per period) for the balanced panel. |
| weights_long | Data frame of weights in long format with columns period, indicator, weight. |
| periods | Sorted vector of period labels. |
| n_balanced | Number of observations in the balanced panel. |
| call | Matched call. |
Lieberthal, R. D., & Comer, D. M. (2013). What are the characteristics that explain hospital quality? A longitudinal PRIDIT approach. Risk Management and Insurance Review, 17(1), 17–35.
pridit, autoplot.pridit_longitudinal
set.seed(1)
dat <- data.frame(
  id = rep(letters[1:20], times = 3),
  year = rep(2020:2022, each = 20),
  x1 = runif(60),
  x2 = runif(60),
  x3 = runif(60)
)
fit_long <- pridit_longitudinal(dat, id_col = "id", time_col = "year")
fit_long
Applies a vector of PRIDIT weights to a ridit-scored data frame and returns
a composite score for each observation. The score is
normalised by the largest eigenvalue so that the mean score is zero by
construction.
PRIDITscore(ridit_data, id_vector, weight_vec)

| Argument | Description |
|---|---|
| ridit_data | A data frame returned by ridit(). |
| id_vector | A vector of observation identifiers (same length and order as the rows of ridit_data). |
| weight_vec | A named numeric vector of PRIDIT weights returned by PRIDITweight(). |
A data frame with columns id and PRIDITscore.
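A base-R sketch of the scoring step follows. It is not the package's code: ridit_sketch() is a hypothetical helper, the first eigenvector of the ridit cross-product matrix stands in for the PRIDIT weights, and the score is assumed to be the weighted sum of ridits divided by the largest eigenvalue.

```r
# Hypothetical helper (not the package's ridit()): share strictly below
# minus share strictly above, for each value of x.
ridit_sketch <- function(x) {
  vapply(x, function(v) mean(x < v) - mean(x > v), numeric(1))
}

dat <- data.frame(
  id = c("A", "B", "C", "D", "E"),
  x1 = c(0.90, 0.85, 0.89, 1.00, 0.89),
  x2 = c(0.99, 0.92, 0.90, 1.00, 0.93)
)

R <- sapply(dat[-1], ridit_sketch)   # n x p matrix of ridit scores

# ASSUMPTION: first eigenvector as stand-in weights, largest eigenvalue
# of t(R) %*% R as the normalising constant.
eig    <- eigen(crossprod(R), symmetric = TRUE)
w      <- eig$vectors[, 1]
lambda <- eig$values[1]

out <- data.frame(
  id = dat$id,
  PRIDITscore = as.vector(R %*% w) / lambda
)
out
mean(out$PRIDITscore)   # zero up to floating-point error
```

In this sketch the zero mean follows from each ridit column summing to zero, so any weighted sum of the columns also has mean zero, whatever the weights and scaling constant.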
dat <- data.frame(
  id = c("A", "B", "C", "D", "E"),
  x1 = c(0.90, 0.85, 0.89, 1.00, 0.89),
  x2 = c(0.99, 0.92, 0.90, 1.00, 0.93)
)
rs <- ridit(dat)
wts <- PRIDITweight(rs)
PRIDITscore(rs, dat$id, wts)
Computes the PRIDIT weight vector from a ridit-scored data frame. Weights
are the loadings of the first principal component of the ridit matrix,
scaled by the column norms of that matrix. The sign of the weight vector
is arbitrary (a property of PCA); pass the result to pridit
rather than using this function directly if automatic sign correction is
desired.
PRIDITweight(ridit_data)

| Argument | Description |
|---|---|
| ridit_data | A data frame returned by ridit(). |
A named numeric vector of PRIDIT weights, one per indicator column.
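The sign ambiguity mentioned above can be checked directly in base R. The sketch below is one possible reading of the weight computation, not the package's code: ridit_sketch() is a hypothetical helper, and "scaled by the column norms" is assumed to mean dividing the first eigenvector by the column norms of the ridit matrix.

```r
# Hypothetical helper (not the package's ridit()): share strictly below
# minus share strictly above, for each value of x.
ridit_sketch <- function(x) {
  vapply(x, function(v) mean(x < v) - mean(x > v), numeric(1))
}

dat <- data.frame(
  id = c("A", "B", "C", "D", "E"),
  x1 = c(0.90, 0.85, 0.89, 1.00, 0.89),
  x2 = c(0.99, 0.92, 0.90, 1.00, 0.93)
)

R         <- sapply(dat[-1], ridit_sketch)
col_norms <- sqrt(colSums(R^2))
R_unit    <- sweep(R, 2, col_norms, "/")   # columns scaled to unit length

M       <- crossprod(R_unit)
eig     <- eigen(M, symmetric = TRUE)
v1      <- eig$vectors[, 1]
lambda1 <- eig$values[1]

# Both v1 and -v1 solve M v = lambda1 v, so the sign is not identified
resid_pos <- max(abs(M %*% v1 - lambda1 * v1))
resid_neg <- max(abs(M %*% (-v1) - lambda1 * (-v1)))

# ASSUMPTION: weights are the first eigenvector divided by column norms
w <- setNames(v1 / col_norms, colnames(R))
w
```

Because either sign is a valid solution, a convention such as the sign_correction option of pridit() is needed before weights can be compared across fits.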
dat <- data.frame(
  id = c("A", "B", "C", "D", "E"),
  x1 = c(0.90, 0.85, 0.89, 1.00, 0.89),
  x2 = c(0.99, 0.92, 0.90, 1.00, 0.93)
)
rs <- ridit(dat)
PRIDITweight(rs)
Transforms a data frame of numeric indicators into ridit scores on the
interval $[-1, 1]$ using the empirical cumulative distribution of each
column across the reference population. A score of zero indicates a value
exactly at the median; positive scores indicate above-median values.
ridit(data)

| Argument | Description |
|---|---|
| data | A data frame whose first column is an ID and whose remaining columns are numeric indicators. |
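The ridit score for observation $i$ on indicator $j$ is
$$B_{ij} = \hat{F}_j(x_{ij} - \varepsilon) - \left[1 - \hat{F}_j(x_{ij})\right],$$
where $\hat{F}_j$ is the empirical CDF of column $j$ and $\varepsilon$
is a small constant that makes the lower CDF strictly left-continuous.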
This formulation is robust to ties and requires no parametric assumptions.
Categorical indicators should be expanded into binary dummy columns before
calling ridit(); each dummy then receives its own ridit transformation
and PRIDIT weight, with sign determined by the data rather than by the
analyst.
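The transformation can be sketched for a single column in base R. This is an illustration under one assumption, not the package's code: the score is taken to be the share of observations strictly below a value minus the share strictly above it, which treats tied values identically. ridit_sketch() is a hypothetical helper.

```r
# Hypothetical helper (not the package's ridit()): for each value of x,
# share of observations strictly below minus share strictly above.
ridit_sketch <- function(x) {
  vapply(x, function(v) mean(x < v) - mean(x > v), numeric(1))
}

x1 <- c(0.90, 0.85, 0.89, 1.00, 0.89)   # note the tie at 0.89
b1 <- ridit_sketch(x1)
b1        # approximately 0.4 -0.8 -0.2 0.8 -0.2
sum(b1)   # zero up to floating-point error
```

The two tied observations at 0.89 receive the same score, and the scores sum to zero because every pair of distinct values contributes one "below" and one "above".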
A data frame of the same shape as data with numeric columns
replaced by their ridit scores. The ID column is preserved as-is.
Bross, I. D. J. (1958). How to use ridit analysis. Biometrics, 14(1), 18–38.
Brockett, P. L., Derrig, R. A., Golden, L. L., Levine, A., & Alpert, M. (2002). Fraud classification using principal component analysis of RIDITs. Journal of Risk and Insurance, 69(3), 341–371.
dat <- data.frame(
  id = c("A", "B", "C", "D", "E"),
  x1 = c(0.90, 0.85, 0.89, 1.00, 0.89),
  x2 = c(0.99, 0.92, 0.90, 1.00, 0.93)
)
ridit(dat)
Creates a recipes preprocessing step that fits a PRIDIT model on the
training data and appends a single composite score column to any data set
passed to bake(). This enables genuine out-of-sample scoring:
the empirical CDFs used for ridit transformation and the PCA weights are
estimated on the training fold only and then applied to the test fold without
re-fitting.
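The out-of-sample idea can be sketched for one indicator with base R's ecdf(). This is not the step's actual implementation: ridit_new() is a hypothetical helper, the ridit formula (lower CDF minus upper tail) is an assumed reading, and eps is an assumed small constant that must be smaller than the gaps between distinct training values.

```r
# Training values for one indicator; the empirical CDF is estimated once
train <- c(0.90, 0.85, 0.89, 1.00, 0.89)
Fhat  <- ecdf(train)   # stored object, analogous to what prep() would keep

# Hypothetical helper: apply the stored CDF to new values without refitting.
# ASSUMPTION: score = F(x - eps) - (1 - F(x)).
ridit_new <- function(x, Fhat, eps = 1e-10) {
  Fhat(x - eps) - (1 - Fhat(x))
}

# New observations, including values outside the training range
new_x <- c(0.80, 0.895, 1.20)
new_b <- ridit_new(new_x, Fhat)
new_b   # approximately -1.0 0.2 1.0
```

Values below the training minimum map to -1 and values above the training maximum map to 1, so new data never change the reference distribution.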
step_pridit(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  score_name = "PRIDIT_score",
  sign_correction = TRUE,
  ecdfs = NULL,
  weights = NULL,
  max_eigval = NULL,
  col_norms = NULL,
  skip = FALSE,
  id = recipes::rand_id("pridit")
)

| Argument | Description |
|---|---|
| recipe | A recipe object. |
| ... | One or more selector expressions passed to |
| role | For the new score column: passed to |
| trained | Logical. Set automatically by |
| score_name | Character. Name of the new score column. Default "PRIDIT_score". |
| sign_correction | Logical. Passed to |
| ecdfs | Internal. Stored empirical CDFs from training. |
| weights | Internal. Stored PRIDIT weight vector from training. |
| max_eigval | Internal. Stored largest eigenvalue from training. |
| col_norms | Internal. Stored column norms from training. |
| skip | Logical. If |
| id | Character. Unique step identifier. |
All selected columns must be numeric. The step does not remove the original
columns; use step_rm() afterwards if a clean feature set is required.
An updated recipe.
## Not run:
library(recipes)
dat <- data.frame(
  id = letters[1:50],
  x1 = runif(50),
  x2 = runif(50),
  x3 = runif(50)
)
rec <- recipe(~ ., data = dat) |>
  update_role(id, new_role = "id") |>
  step_pridit(x1, x2, x3)
prepped <- prep(rec, training = dat)
bake(prepped, new_data = dat)
## End(Not run)
A sample dataset containing health quality metrics for 5 healthcare providers, used to demonstrate the PRIDIT scoring methodology.
test
A data frame with 5 rows and 4 variables:
Character. Unique identifier for each healthcare provider (A through E)
Numeric. Smoking cessation counseling rate (0.85-1.0)
Numeric. ACE inhibitor prescription rate (0.90-1.0)
Numeric. Proper antibiotic usage rate (0.98-1.0)
Synthetic data created for package examples
data(test)
head(test)

# Calculate PRIDIT scores
ridit_scores <- ridit(test)
weights <- PRIDITweight(ridit_scores)
final_scores <- PRIDITscore(ridit_scores, test$ID, weights)