Principles of RefineR

refineR is among the newest of the current generation of algorithms for calculating indirect reference intervals. The approach is more objective than Bhattacharya or Hoffmann analysis because it does not require the analyst to manually select the portion of the data to analyse (i.e. the linear portion of the Bhattagram or Hoffmann plot).

Another advantage is that the method does not require the distribution of results from healthy individuals to be Gaussian, or to be transformed to a Gaussian distribution before analysis. Instead, the algorithm performs a Box-Cox transformation itself where necessary. It is designed to find the lambda (the Box-Cox transformation parameter), mean and standard deviation (on the transformed scale) that best fit the results from healthy individuals. To do this, it uses complex calculations over multiple steps, which are described in detail in the original publication.¹ An overview is provided here.
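For reference, the standard Box-Cox transformation is y = (x^λ - 1)/λ for λ ≠ 0 and y = ln x for λ = 0. Once lambda, mean and standard deviation are known, the reference limits follow by taking the mean ± 1.96 SD on the transformed scale and back-transforming. The Python sketch below illustrates this under stated assumptions: it uses the standard one-parameter Box-Cox form (refineR, an R package, may use a variant of it), and the function names and parameter values are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def box_cox(x: float, lam: float) -> float:
    """Standard Box-Cox transform of a positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


def inverse_box_cox(y: float, lam: float) -> float:
    """Map a value on the transformed scale back to the original scale."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1
    if base <= 0:
        raise ValueError("value lies outside the range of the transformation")
    return base ** (1 / lam)


def reference_limits(lam: float, mu: float, sigma: float, coverage: float = 0.95):
    """Central reference interval implied by (lambda, mu, sigma).

    mu and sigma describe the healthy distribution on the transformed scale,
    where it is assumed to be Gaussian.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95 %
    lower = inverse_box_cox(mu - z * sigma, lam)
    upper = inverse_box_cox(mu + z * sigma, lam)
    return lower, upper


# Hypothetical parameter values (lambda = 0 corresponds to a log transform)
lower, upper = reference_limits(lam=0.0, mu=1.5, sigma=0.25)
print(f"reference interval: {lower:.2f} to {upper:.2f}")
```

Because the back-transformation is monotonic, the 2.5th and 97.5th percentiles on the transformed scale map directly onto the 2.5th and 97.5th percentiles on the original measurement scale.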

Data pre-processing

The algorithm excludes outliers itself, which saves the analyst a pre-processing step and standardises analyses. Outliers are defined as values more than three standard deviations from the mean after the algorithm has performed the Box-Cox transformation.
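A minimal sketch of the outlier rule described above, assuming that lambda is already known and that a plain (non-robust) mean and SD are computed on the transformed values in a single pass. refineR's actual pre-processing may differ in these details; `exclude_outliers` is a hypothetical helper written for illustration.

```python
import math
from statistics import mean, stdev


def box_cox(x: float, lam: float) -> float:
    """Standard Box-Cox transform of a positive value."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def exclude_outliers(values, lam, k=3.0):
    """Drop values lying more than k SDs from the mean on the Box-Cox scale.

    Single pass with the plain sample mean and SD; lam is assumed to be known.
    """
    transformed = [box_cox(v, lam) for v in values]
    m, s = mean(transformed), stdev(transformed)
    return [v for v, t in zip(values, transformed) if abs(t - m) <= k * s]


# Hypothetical data: 20 plausible results plus one extreme value
results = [4.0 + 0.1 * i for i in range(20)] + [500.0]
cleaned = exclude_outliers(results, lam=0.0)
print(len(results), "->", len(cleaned))
```

Applying the cut-off after transformation matters: on a skewed original scale, a symmetric ±3 SD rule would tend to remove legitimate values from the long upper tail.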

Determination of parameters

To find the lambda, mean and standard deviation that best describe the distribution of results from healthy individuals, the algorithm scores many possible combinations of these values using ‘regularised maximum likelihood’. The scoring favours combinations that are likely to have produced the observed data, particularly in the region where most observations occur (i.e. around the histogram peak). The algorithm selects the combination of parameters with the best score and uses it to calculate the reference interval.
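To illustrate the idea of scoring parameter combinations, the sketch below runs a simple grid search over lambda, mean and SD and keeps the combination with the highest log-likelihood. It is a deliberately simplified stand-in, not refineR's method: it scores all values with plain (unregularised) maximum likelihood rather than refineR's regularised score focused on the histogram peak, and the grids, helper names and example data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def box_cox(x: float, lam: float) -> float:
    """Standard Box-Cox transform of a positive value."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def log_likelihood(y, sum_log_x, lam, mu, sigma):
    """Log-likelihood of the original values if their transforms y are N(mu, sigma).

    The (lam - 1) * sum(log x) term is the Jacobian of the transformation; it
    makes scores for different lambdas comparable on the original scale.
    """
    n = len(y)
    sq = sum((v - mu) ** 2 for v in y)
    return (-n * math.log(sigma) - 0.5 * n * math.log(2 * math.pi)
            - sq / (2 * sigma ** 2) + (lam - 1) * sum_log_x)


def grid_search(values, lambdas):
    """Score every (lambda, mu, sigma) combination on a grid; return the best one."""
    sum_log_x = sum(math.log(x) for x in values)
    best = None
    for lam in lambdas:
        y = [box_cox(x, lam) for x in values]
        m, s = fmean(y), pstdev(y)
        # Hypothetical candidate grids centred on this lambda's sample mean and SD
        mus = [m + s * k / 10 for k in range(-5, 6)]
        sigmas = [s * f / 10 for f in range(7, 14)]
        for mu in mus:
            for sigma in sigmas:
                score = log_likelihood(y, sum_log_x, lam, mu, sigma)
                if best is None or score > best[0]:
                    best = (score, lam, mu, sigma)
    return best[1], best[2], best[3]


# Hypothetical example data: exact quantiles of a lognormal distribution
# (mean 1.5 and SD 0.25 on the log scale)
n = 1000
dist = NormalDist(mu=1.5, sigma=0.25)
data = [math.exp(dist.inv_cdf((i + 0.5) / n)) for i in range(n)]

lam, mu, sigma = grid_search(data, lambdas=[-0.5, -0.25, 0.0, 0.25, 0.5, 1.0])
print(f"lambda={lam}, mu={mu:.3f}, sigma={sigma:.3f}")
```

Because the example data are exact lognormal quantiles, the search should select lambda = 0 (a log transformation) with a mean close to 1.5. Real routine data also contain results from diseased individuals, which is why refineR restricts attention to the peak region and regularises the score.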

Calculation of confidence intervals

A further advantage of refineR is that it can calculate confidence intervals for its estimated reference limits. This is done by bootstrapping, which involves repeatedly creating ‘new’ data sets from the original data. Each ‘new’ data set is created by sampling n times, at random and with replacement, from the original data set (where n is the number of original data points). The new data set is therefore the same size as the original, but some data points appear multiple times and others not at all. Each new data set is analysed by refineR, and its lower and upper reference limits are calculated. The process is repeated the number of times specified by the analyst (usually around 200), and the confidence intervals are derived from the spread of the reference limits across these repeated analyses. Because the algorithm must be re-run many times, calculating confidence intervals greatly increases refineR’s computation time.
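The bootstrap procedure can be sketched as follows. As a hypothetical stand-in for a full refineR run on each resampled data set, the sketch estimates the limits as simple nonparametric 2.5th and 97.5th percentiles. It then takes each 95% confidence interval as the 2.5th to 97.5th percentiles of the bootstrap estimates (the percentile method); refineR's own confidence interval calculation may differ in detail, and the example data are made up.

```python
import random
from statistics import quantiles


def percentile_limits(values):
    """Stand-in for a refineR run: nonparametric 2.5th and 97.5th percentiles."""
    cuts = quantiles(values, n=40)  # 39 cut points at 2.5 %, 5 %, ..., 97.5 %
    return cuts[0], cuts[-1]


def bootstrap_limits(values, n_boot=200, seed=1):
    """Re-estimate the reference limits on n_boot resampled data sets."""
    rng = random.Random(seed)
    n = len(values)
    lowers, uppers = [], []
    for _ in range(n_boot):
        # Sample n values with replacement: same size as the original data,
        # with some points repeated and others left out
        resample = rng.choices(values, k=n)
        lo, hi = percentile_limits(resample)
        lowers.append(lo)
        uppers.append(hi)
    return lowers, uppers


def confidence_interval(estimates):
    """95 % percentile confidence interval from the bootstrap estimates."""
    cuts = quantiles(estimates, n=40)
    return cuts[0], cuts[-1]


# Hypothetical data, for illustration only
rng = random.Random(0)
data = [rng.lognormvariate(1.5, 0.25) for _ in range(500)]

lowers, uppers = bootstrap_limits(data, n_boot=200)
print("lower limit 95 % CI:", confidence_interval(lowers))
print("upper limit 95 % CI:", confidence_interval(uppers))
```

The loop makes the cost explicit: every bootstrap replicate repeats the full estimation, so 200 replicates take roughly 200 times as long as a single analysis.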

Performing refineR analysis

If you have R installed on your computer and are confident using it, it is best to use the refineR package.² If you don’t have R installed or aren’t confident using it, the analysis can be done online, without any R knowledge, using the RefineR App. Instructions for using the RefineR App are available.

References

1. Ammer T, Schützenmeister A, Prokosch HU, et al. refineR: a novel algorithm for reference interval estimation from real-world data. Sci Rep 2021; 11: 16023.
2. Ammer T, Rank C, Schuetzenmeister A. refineR: reference interval estimation using real-world data. R package version 1.5.1; 2022. Available at: https://CRAN.R-project.org/package=refineR.