Executive summary
This report assesses how well MARA version 5 (v5) XPLN models predict annual healthcare spending and the occurrence of acute medical events (emergency department visits and hospitalizations) for people with commercial or Medicare insurance. Our findings are based on the healthcare data of more than 17 million individuals from 2022–2023 who were excluded from the MARA model development. Highlights from the report include:
- Predictive accuracy improved from MARA v4 to v5. The R-squared of the commercial Cx Concurrent model increased from 47% to 70%, and that of the Medicare Cx Concurrent model increased from 50% to 61%.
- MARA v5 prospective risk models performed well in predicting next year’s spending. The R-squared of the Cx Prospective model was 37% for commercial and 32% for Medicare, while the mean absolute percentage error (MAPE) was 87% for commercial and 79% for Medicare.
- MARA v5 CxXPLN models help identify individuals at risk of future hospitalizations and emergency department visits. Among Medicare individuals with the highest 1% prospective inpatient risk scores, 60% had a hospitalization in 2023 compared to the overall rate of 21%. Among commercial individuals with the highest 1% prospective emergency department risk scores, 44% had an emergency department visit compared to the overall rate of 11%.
Importance of using MARA for risk adjustment
Milliman Advanced Risk Adjusters® (MARA®) represents a cutting-edge solution in population health management that supports the predictive analytics of more than 300 healthcare organizations. In today’s dynamic healthcare environment, MARA is an essential tool for organizations seeking to understand and adapt to rapidly changing healthcare trends. The release of MARA v5 XPLN (pronounced “explain”) models marks a significant milestone in Milliman’s predictive modeling capabilities and demonstrates improved prediction accuracy compared to MARA version 4 (v4). These improvements in accuracy unlock opportunities to optimize case management protocols, refine risk-based reimbursement methodologies, strengthen provider performance analytics, and enhance the rigor of program evaluations. The increased precision of MARA v5 allows payers to more accurately underwrite risk and set premiums and positions healthcare organizations to make more informed, data-driven decisions that can improve patient outcomes and achieve operational efficiency.
The enhanced performance of the suite of MARA v5 models is the product of a multi-year commitment to research and development. MARA v5 adds more than 500 new, more granular condition categories for more comprehensive risk stratification, refines disease-to-condition mappings for International Classification of Diseases (ICD) codes that identify multiple conditions, and updates pregnancy algorithms to better identify high-risk cases. These enhancements increase the precision of risk predictions across diverse populations, and the models are updated regularly to reflect current trends in treatment patterns and changes in the relative costs of healthcare services.
Understanding the types of MARA XPLN risk adjustment models
MARA v5 was developed and tested using administrative healthcare data representing more than 32 million unique lives enrolled in commercial insurance or a Medicare Advantage plan with Part D prescription drug coverage (MAPD) in 2022–2023. Each MARA model predicts an individual’s annual healthcare spending across different service categories (inpatient, outpatient, emergency department, physician, pharmacy, Medicare Part B, and other), and these predictions are aggregated to predict total annual healthcare spending (allowed costs).
All MARA XPLN models offer concurrent and prospective risk scores. Concurrent models estimate an individual’s healthcare spending during the year in which their health status was assessed. While concurrent models generate predictions using an individual’s health status based on diagnosis and pharmacy records, they do not incorporate measures of utilization. Thus, concurrent risk scores provide a spending benchmark for an individual given their health status.
Among the MARA concurrent models, those using the most comprehensive source of input data (Cx) are typically more accurate compared to pharmacy-only (Rx) and medical-only (Dx) models. While the Rx model incorporates demographic and pharmacy claims history into the risk score calculation and the Dx model uses demographic and medical claims history, the Cx model uses demographic, pharmacy claims, medical claims, and office-administered drugs to predict healthcare spending. Accurate concurrent risk scores are important for activities such as administering retrospective provider risk sharing arrangements, creating utilization benchmarks, and evaluating the effects of care management activities.
MARA prospective risk models use the same demographic, pharmacy, and/or medical claims data from the assessment year as concurrent models but predict healthcare spending the following year. Compared with concurrent models, prospective models generally place more weight on chronic conditions that are expected to persist in the subsequent year and less weight on episodic conditions that are likely to have resolved in the current year. All MARA models exclude prior healthcare costs when calculating prospective risk scores to ensure that they reflect underlying morbidity and healthcare needs rather than past spending patterns, which may instead reflect inefficient disease management or socioeconomic factors.
Figure 1: Descriptions of concurrent and prospective risk score models
Concurrent: Uses medical, pharmacy, and demographic data from the 12-month assessment period to predict total spending during that same 12-month period. The risk score represents the predicted spending amount for individuals with a given clinical profile.
Prospective: Uses medical, pharmacy, and demographic data from the 12-month assessment period to predict total spending during the following 12-month period. The risk score represents the predicted spending amount for the subsequent year based on an individual's clinical profile.
Evaluating the accuracy of MARA risk adjustment models
In a study conducted by the Society of Actuaries (SOA) nearly a decade ago, MARA’s v3.6 diagnosis and pharmacy risk adjustment model demonstrated the best performance in accurately predicting healthcare costs on a variety of performance metrics compared to five competing risk score products.1 This paper evaluates the MARA v5.0.1 XPLN models using the same core performance metrics as the SOA study (R-squared and MAPE) and extends the evaluation by assessing the accuracy of emergency department and hospitalization predictions in the v5 models across prospective risk score stratifications.
The performance metrics of the MARA v4.21 and v5.0.1 XPLN models were calculated using a dataset of more than 17 million individuals from 2022–2023 who were enrolled in commercial insurance or an MAPD plan and had been excluded from training the v5 models.
Truncating individual healthcare costs is a standard practice when assessing risk adjustment models. In our evaluations, annual costs in the sample were truncated at $250,000 to prevent high-cost cases from skewing the R-squared and MAPE statistics and to provide a more representative assessment of the model's predictive performance across the majority of individuals. Predicted spending amounts in the testing dataset were truncated at the same $250,000 threshold.
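The truncation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to produce the report's results: the function name `truncate_costs` and the sample dollar amounts are hypothetical, and only the $250,000 threshold comes from the text above.

```python
import numpy as np

CAP = 250_000  # truncation threshold stated in the report

def truncate_costs(costs, cap=CAP):
    """Cap each annual cost at `cap`; values at or below the cap are unchanged."""
    return np.minimum(np.asarray(costs, dtype=float), cap)

# Hypothetical actual and predicted annual costs for three individuals.
actual = [1_200.0, 48_000.0, 910_000.0]
predicted = [2_000.0, 35_000.0, 400_000.0]

# Both sides are capped, mirroring the description in the text.
actual_t = truncate_costs(actual)
predicted_t = truncate_costs(predicted)
```

Capping both the actual and the predicted amounts keeps a single very high-cost individual from dominating squared-error statistics.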
Performance metrics
- R-squared: Measures the percent of variation in allowed costs that is explained by the risk score. An R-squared of 100% indicates perfect prediction accuracy, while an R-squared of 0% indicates there is no meaningful connection between risk scores and actual healthcare spending. R-squared reflects the explanatory power of a risk adjustment model and is comparable across populations. However, it is sensitive to large prediction errors since it is calculated using the squared differences between actual and predicted values.
- Mean absolute percentage error (MAPE): Measures the mean of the absolute prediction errors relative to the mean actual cost of the sample. A MAPE statistic of 0% implies that the risk scores always perfectly predict healthcare spending, while a MAPE statistic of 75% implies the mean absolute prediction error was 75% of the mean actual cost. MAPE is useful for evaluating the magnitude of the model’s prediction errors but is sensitive to differences in annual spending amounts between populations. As a result, the same absolute prediction error will yield a lower (better) MAPE for a high-cost population like Medicare than for a lower-cost population like commercial.
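$$\mathrm{MAPE} = \frac{\frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert}{\frac{1}{n}\sum_{i=1}^{n} y_i}$$
where $n$ is the number of individuals in the sample, $y_i$ is the actual cost of individual $i$, and $\hat{y}_i$ is the predicted cost of individual $i$.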
- Prospective risk score stratification: We stratify the commercial and MAPD populations in the test dataset using prospective inpatient and emergency department risk scores to evaluate how likely high-risk individuals were to have the predicted type of utilization occur in the subsequent year compared to those in lower risk stratifications.
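The first two metrics above can be sketched as follows. This is an illustrative sketch only: the function names and sample values are hypothetical, MAPE follows the definition in the list (mean absolute error divided by mean actual cost), and R-squared is assumed here to be one minus the ratio of the residual sum of squares to the total sum of squares, which may differ in detail from the calculation behind the reported figures.

```python
import numpy as np

def r_squared(actual, predicted):
    """Assumed form: 1 - SS_residual / SS_total."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)      # squared prediction errors
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # variation around the mean
    return float(1.0 - ss_res / ss_tot)

def mape(actual, predicted):
    """Mean absolute error divided by the mean actual cost of the sample."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)) / actual.mean())

# Hypothetical annual costs for four individuals.
actual = [0.0, 1_000.0, 4_000.0, 15_000.0]
predicted = [500.0, 1_500.0, 3_000.0, 12_000.0]

r2 = r_squared(actual, predicted)
err = mape(actual, predicted)  # mean abs error 1,250 / mean cost 5,000 = 0.25
```

Because MAPE divides by the mean actual cost rather than each individual's own cost, individuals with zero spending do not cause a division by zero.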
How well MARA predicts healthcare spending and acute medical events
Figure 2 presents the R-squared values of MARA v4 and v5 XPLN models on the same testing dataset. All models demonstrated improvements in accuracy between v4 and v5, with concurrent models showing some of the largest improvements.
While all prospective MARA XPLN risk score models experienced an increase in accuracy between v4 and v5, the gains in accuracy made by the Cx models were some of the largest. The R-squared of the MARA Cx v5 prospective model was 37% for commercial and 32% for Medicare. Enhanced prospective risk score accuracy can lead to better medical expense forecasting, more accurate underwriting, and more effective care management referrals.
Figure 2: Changes in R-squared between MARA version 4 and 5
Note: Higher R-squared values indicate greater model accuracy. Dx models use age, gender, and medical claims information.
Rx models use age, gender, and pharmacy information. Cx models use age, gender, medical claims, and pharmacy information.
Source: Model testing dataset (2022-2023).
Figure 3 presents the MAPE statistics of the MARA v4 and v5 models evaluated on the testing dataset. As with R-squared, models that use both pharmacy and medical claims data to predict spending were more accurate than models that use only one type of input data.
Figure 3: Mean absolute percentage error (MAPE) of MARA XPLN models by segment and version
| MARA model version | Commercial | | | | | | Medicare | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Concurrent | | | Prospective | | | Concurrent | | | Prospective | | |
| | Cx | Dx | Rx | Cx | Dx | Rx | Cx | Dx | Rx | Cx | Dx | Rx |
| v4 | 60% | 72% | 87% | 91% | 99% | 99% | 53% | 62% | 79% | 82% | 87% | 86% |
| v5 | 52% | 67% | 80% | 87% | 97% | 96% | 49% | 61% | 74% | 79% | 87% | 85% |
Note: Lower MAPE values indicate greater accuracy. Dx models use age, gender, and medical claims information.
Rx models use age, gender, and pharmacy information. Cx models use age, gender, medical claims, and pharmacy information.
Source: Model testing dataset (2022-2023).
The R-squared values in Figure 2 indicate MARA v5 predicts healthcare costs most accurately for individuals enrolled in a commercial plan, whereas the MAPE results in Figure 3 indicate greater accuracy for those in an MAPD plan. These differences reflect how the two metrics treat prediction errors: R-squared magnifies the impact of large errors by squaring them, while MAPE uses their absolute value and expresses it relative to the population's mean cost, which is higher for Medicare.
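A small numeric example may help show why squaring changes the picture. The sketch below uses entirely hypothetical errors (99 small misses and one large miss) and compares how much of the total error the single large miss accounts for under each treatment; it is not drawn from the testing dataset.

```python
# Hypothetical prediction errors in dollars: 99 small misses and one large miss.
errors = [100.0] * 99 + [10_000.0]
largest = max(abs(e) for e in errors)

# Share of the total absolute error contributed by the largest miss.
abs_total = sum(abs(e) for e in errors)
abs_share = largest / abs_total

# Share of the total squared error contributed by the same miss.
sq_total = sum(e ** 2 for e in errors)
sq_share = largest ** 2 / sq_total

print(f"share of absolute error: {abs_share:.1%}")
print(f"share of squared error:  {sq_share:.1%}")
```

In this example the one large miss makes up about half of the absolute error but about 99% of the squared error, so a metric built on squared errors is driven far more by a handful of large misses.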
Figure 4 highlights the accuracy of the predictions produced by MARA v5 for inpatient admissions and emergency department visits. We used MARA’s CxXPLN prospective risk adjustment models to predict future inpatient and emergency department expenses in 2023 for the commercial and Medicare populations in the testing dataset using their 2022 medical claims and pharmacy information.
The columns in the graphs denote cohorts that an insurer or healthcare provider might use to stratify their population based on risk of emergency department or inpatient spending. Those assigned to the “Bottom 50%” cohort have the lowest levels of predicted spending for a particular type of care, while those assigned to the “Top 1%” represent the 1% of the sample with the highest predicted risk. Using the Top 1% prospective risk scores to inform care management referrals can help ensure that staff efforts focus on those most likely to experience an event.
Figure 4: Inpatient and emergency department utilization by prospective risk score stratifications
Source: Model testing dataset (2022-2023)
In the testing dataset, 34% of commercial members and 60% of Medicare members assigned to the highest risk inpatient score category had an inpatient admission compared to an overall rate of 3% in commercial and 21% in Medicare. Similarly, 44% of commercial members and 62% of Medicare members assigned to the highest risk category for emergency department scores had an emergency department visit compared to the overall rate of 11% in commercial and 13% in Medicare. This represents an improvement compared to MARA v4, where the Top 1% had an inpatient rate of 27% in commercial and 59% in Medicare and an emergency department visit rate of 43% in commercial and 58% in Medicare.
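The stratified comparison can be sketched as below. This is a minimal sketch on synthetic data, assuming individuals are ranked by prospective risk score and the top share is taken by rank; the helper names, the 1,000-person sample, and the event pattern are all hypothetical, with the event counts chosen only so the example rates resemble the Medicare inpatient figures quoted above.

```python
import numpy as np

def top_share_mask(scores, share):
    """Boolean mask selecting the `share` of individuals with the highest scores."""
    scores = np.asarray(scores, dtype=float)
    k = max(1, int(round(len(scores) * share)))
    order = np.argsort(-scores, kind="stable")  # highest score first
    mask = np.zeros(len(scores), dtype=bool)
    mask[order[:k]] = True
    return mask

def event_rate(had_event, mask):
    """Fraction of individuals in `mask` who had the event."""
    return float(np.asarray(had_event, dtype=bool)[mask].mean())

# Synthetic sample: 1,000 individuals with distinct risk scores 1..1000.
scores = np.arange(1, 1001, dtype=float)
had_event = np.zeros(1000, dtype=bool)
had_event[:50] = True                        # 50 events among the lowest scores
had_event[836:990] = True                    # 154 events among higher scores
had_event[990:] = [True] * 6 + [False] * 4   # 6 events among the 10 highest scores

top1 = top_share_mask(scores, 0.01)
bottom50 = ~top_share_mask(scores, 0.50)

top1_rate = event_rate(had_event, top1)          # 6 / 10
bottom50_rate = event_rate(had_event, bottom50)  # 50 / 500
overall_rate = float(had_event.mean())           # 210 / 1000
ratio = top1_rate / overall_rate                 # Top 1% rate relative to overall
```

Applying the same ratio to the rates reported above, the Top 1% Medicare inpatient cohort was hospitalized at roughly 2.9 times the overall rate (60% versus 21%), and the Top 1% commercial inpatient cohort at roughly 11 times the overall rate (34% versus 3%).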
MARA XPLN models offer additional outputs that can further stratify the highest-risk population and prioritize candidates most appropriate for intervention. These include diagnosis-based indicators of complex medical conditions requiring coordination between multiple healthcare providers (such as transplants and cancer) and diagnosis of social risk factors like homelessness or food inadequacy. MARA’s Rising Risk™ models can complement this approach by identifying individuals whose costs are likely to increase in the future relative to their costs in the current year.
Conclusion
Predictive accuracy is an essential feature of risk adjustment and an area where MARA continues to innovate. Our advancements can help organizations unlock the potential of their data to better understand their population, improve patient outcomes, and achieve operational efficiency. To learn more about Milliman Advanced Risk Adjusters (MARA), visit milliman.com/MARA or send us an email.
1 Society of Actuaries. (October 2016). Accuracy of Claims-Based Risk Scoring Models, pp.18–19. Retrieved October 22, 2022, from https://www.soa.org/4937b5/globalassets/assets/files/research/research-2016-accuracy-claims-based-risk-scoring-models.pdf.