R Data Analysis: Understanding the Net Reclassification Index (NRI) and the Integrated Discrimination Improvement (IDI) Index

The most common way to evaluate a classification prediction model is the ROC curve and its AUC. Given two models, for example, we normally compare their AUCs to decide which performs better. But if the research question is “Does introducing a new predictor variable into the original model improve it?“, the ROC curve often proves inadequate: adding a single variable usually changes the AUC very little, and a change in AUC is difficult to interpret.

When evaluating the improvement of predictive performance of a predictive model after incorporating a new marker, the improvement of C-Statistic/AUC is always small, therefore the new marker sometimes fails to significantly improve C-Statistic/AUC.

In this situation, you need the two indicators we are going to talk about today: the integrated discrimination improvement index (IDI) and the net reclassification index (NRI).

Two new metrics, the integrated discrimination improvement (IDI) and net reclassification improvement (NRI), have been rapidly adopted to quantify the added value of a biomarker to an existing test.

Net Reclassification Index NRI

Whether a new indicator or a new model improves classification ultimately shows up in how many people are classified correctly. From this perspective, we can compare how many research subjects each of two models (or two indicators) classifies correctly and draw conclusions from that.

That is to say, the old model classifies research subjects into patients and non-patients, and the new model does the same. Comparing the two classifications, you will find that some subjects who were misclassified by the old model are correctly classified by the new model, while some subjects who were correctly classified by the old model are misclassified by the new model. The classification of some subjects therefore changes between the old and new models, and we use this reclassification to calculate the net reclassification index NRI.

To better understand this change let us look at the following table:

In table 3, c1 is the number of subjects in the disease (event) group that the original model classified incorrectly and the new model classifies correctly, and b1 is the number that the original model classified correctly and the new model classifies incorrectly. So (c1 - b1)/N1 is the net increase in the proportion of correctly reclassified subjects in the disease group, where N1 is the size of that group.

Similarly, the net increase in the proportion of correctly reclassified subjects in the non-disease group (table 4) is (b2 - c2)/N2.

NRIevents is the net proportion of patients with events reassigned to a higher risk category, and NRInonevents is the net proportion of patients without events reassigned to a lower risk category.

So NRI = (c1 - b1)/N1 + (b2 - c2)/N2

Because NRI represents the net increase in the proportion of subjects that are correctly reclassified, NRI>0 is a positive improvement: the predictive ability of the new model is better than that of the old model. NRI<0 is a negative improvement: the predictive ability of the new model is worse. If NRI=0, the new model is considered to bring no improvement.
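The formula above can be checked by hand on a small example. The sketch below uses ten invented subjects (the vectors outcome, old_class and new_class are hypothetical, not taken from any real data set) and assumes that b2 counts non-event subjects moved from positive to negative and c2 those moved from negative to positive, which is how the formula reads.

```r
# Hypothetical data: true status (1 = event, 0 = non-event) and the class
# assigned by each model (1 = predicted patient, 0 = predicted non-patient).
outcome   <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
old_class <- c(0, 0, 1, 1, 1, 1, 1, 0, 0, 0)
new_class <- c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1)

events    <- outcome == 1
nonevents <- outcome == 0

# Event group: moving up (0 -> 1) is a correction, moving down (1 -> 0) is an error
c1 <- sum(events & old_class == 0 & new_class == 1)
b1 <- sum(events & old_class == 1 & new_class == 0)

# Non-event group: moving down (1 -> 0) is a correction, moving up (0 -> 1) is an error
b2 <- sum(nonevents & old_class == 1 & new_class == 0)
c2 <- sum(nonevents & old_class == 0 & new_class == 1)

nri_events    <- (c1 - b1) / sum(events)      # (2 - 1) / 5
nri_nonevents <- (b2 - c2) / sum(nonevents)   # (1 - 1) / 5
nri <- nri_events + nri_nonevents
print(nri)  # 0.2
```

Here the new model gains one net correct classification among the five events and none among the five non-events, so the whole NRI of 0.2 comes from the event group.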

Integrated Discrimination Improvement Index IDI

We have just introduced NRI, which evaluates a new model by the net increase in the proportion of correctly classified subjects compared with the old model. Another way of thinking is to judge the model by how much the predicted probabilities themselves improve.

That is to say, in the disease group, the probability of positive prediction by the model should be as large as possible, and in the non-disease group, the probability of positive prediction by the model should be as small as possible. An evaluation index can still be obtained through the difference in prediction probability of the model. If the new model is better than the original model: in the positive group, the probability of predicting positive is greater than that of the old model; in the negative group, the probability of predicting positive is smaller than that of the old model. Then it can be said that the new model is better than the old model.

This index is the IDI:

IDI = (Pnew,events - Pold,events) - (Pnew,non-events - Pold,non-events)

Here each P is the mean predicted probability of a positive outcome in the stated group: for example, Pnew,events is the mean for the new model in the disease group, and Pold,non-events is the mean for the old model in the non-disease group.

That is to say, IDI equals the change in predicted positive probability from the old to the new model in the disease group, minus the corresponding change in the non-disease group. The second term is subtracted because the predicted positive probability in the non-disease group should be as small as possible. The larger the IDI, the better the new model predicts compared with the old model. If IDI>0, it is a positive improvement: the predictive ability of the new model has improved. If IDI<0, it is a negative improvement: the predictive ability of the new model has declined. If IDI=0, the new model is considered to bring no improvement.
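As a minimal sketch of this calculation, the code below computes IDI from predicted probabilities for eight invented subjects. The vectors outcome, p_old and p_new are hypothetical, and each P in the formula is taken to be the mean predicted probability within the group.

```r
# Hypothetical data: true status and predicted probability of a positive
# outcome under the old and the new model. All values are invented.
outcome <- c(1, 1, 1, 1, 0, 0, 0, 0)
p_old   <- c(0.60, 0.40, 0.70, 0.50, 0.30, 0.40, 0.20, 0.50)
p_new   <- c(0.70, 0.60, 0.80, 0.50, 0.20, 0.30, 0.20, 0.50)

events    <- outcome == 1
nonevents <- outcome == 0

# Change in mean predicted probability within each group (new minus old)
change_events    <- mean(p_new[events])    - mean(p_old[events])     # 0.65 - 0.55
change_nonevents <- mean(p_new[nonevents]) - mean(p_old[nonevents])  # 0.30 - 0.35

# A rise among events and a fall among non-events both increase the IDI
idi <- change_events - change_nonevents
print(idi)  # 0.15
```

The new model raises the mean predicted probability among events by 0.10 and lowers it among non-events by 0.05, so both groups contribute to the IDI of 0.15.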

Hands-on practice

In R, we can use the reclassification function from the PredictABEL package to obtain NRI and IDI easily. This function accepts 5 parameters, described as follows:

The first parameter, data, is the original data set. The cOutcome parameter is the position of the outcome column in that data set; for example, if the outcome variable is the second column, cOutcome is set to 2. Next come predrisk1 and predrisk2, the predicted risk values from the old model and the new model, in that order. The last parameter, cutoff, gives the risk cutoff points used to classify subjects.

For example, I now have a data set as follows

The outcome is in the second column of the data set. I want to compare the classification performance of two models, model 1 with only age and sex and model 2 with age, sex and education, using a predicted risk of 0.5 as the threshold between the two classes. I can fit model1 and model2 and compare them with the following code:

library(PredictABEL)  # provides predRisk() and reclassification()
# Old model: age and sex; new model: age, sex and education
model1 <- glm(`outcome(AMD)` ~ Age + Sex, family = binomial("logit"), data = Data)
model2 <- glm(`outcome(AMD)` ~ Age + Sex + Education, family = binomial("logit"), data = Data)
# Predicted risks from each fitted model
predRisk1 <- predRisk(model1)
predRisk2 <- predRisk(model2)
cOutcome <- 2            # the outcome is the second column of Data
cutoff <- c(0, 0.5, 1)   # a single threshold at 0.5 gives two risk categories
reclassification(data = Data, cOutcome = cOutcome,
                 predrisk1 = predRisk1, predrisk2 = predRisk2, cutoff = cutoff)

The output after running the code is as follows:

It can be seen that the NRI (Categorical) is 0 when the risk cutoff value is 0.5, indicating that adding education to the model does not improve classification at this cutoff. The results also give the point estimates, p values and confidence intervals of NRI (Continuous) and IDI, all of which can be reported in a paper.
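The continuous NRI in the output needs no cutoff: any increase in predicted risk counts as a move up and any decrease as a move down. The sketch below assumes the usual category-free definition and uses invented vectors (outcome, p_old and p_new are hypothetical); it is meant to show where such a number comes from, not to reproduce the package's exact computation.

```r
# Hypothetical data: true status and predicted risks from the old and new model
outcome <- c(1, 1, 1, 1, 0, 0, 0, 0)
p_old   <- c(0.60, 0.40, 0.70, 0.50, 0.30, 0.40, 0.20, 0.50)
p_new   <- c(0.70, 0.60, 0.80, 0.50, 0.20, 0.30, 0.20, 0.50)

up   <- p_new > p_old   # predicted risk increased under the new model
down <- p_new < p_old   # predicted risk decreased under the new model

events    <- outcome == 1
nonevents <- outcome == 0

# Events should move up; non-events should move down
nri_events    <- mean(up[events])      - mean(down[events])      # 0.75 - 0
nri_nonevents <- mean(down[nonevents]) - mean(up[nonevents])     # 0.50 - 0

nri_continuous <- nri_events + nri_nonevents
print(nri_continuous)  # 1.25
```

Because each component lies between -1 and 1, the continuous NRI ranges from -2 to 2, so it should not be read as a proportion.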

This concludes the introduction to NRI and IDI. In the future, when comparing two disease models or the diagnostic performance of two indicators, you can report NRI and IDI alongside the traditional ROC curve and its AUC, which shows the improvement of the model more comprehensively and from more than one angle.

When comparing the diagnostic power of two markers or comparing two predictive models, we could use not only AUC and the C-statistic but also NRI and IDI, which together give a comprehensive perspective on how much the predictive performance improves.

We cannot calculate NRI or IDI for a single predictive model. IDI and NRI are calculated from the comparison of two models, so one model on its own does not have an IDI or NRI.

Recommended literature:
https://cdn.amegroups.cn/journals/amepc/files/journals/16/articles/29812/public/29812-PB1-1696-R4.pdf