Introduction
Welcome back to the Data Reuse Digest - and the latest edition of our series on computational research for type II diabetes. We are exploring how different kinds of computational studies can drive clinical progress. By sharing success stories in one disease area, we aim to inspire the use of these approaches in other diseases as well.
What Biological Patterns Predict Disease Progression?
How can computational research help people who have already developed diabetes? Machine learning methods can identify biological patterns that predict when patients are likely to develop diabetes complications – conditions like kidney disease that mark a substantial worsening of the illness and a higher mortality risk. These complications mean more symptoms to deal with, more time spent managing illness, and higher medical bills.
If patients and their doctors know that complications are likely, doctors can treat the evolving illness preemptively with an updated treatment regimen. And when medical clinics know which groups of patients are at greater risk of developing complications, they can monitor these patients more frequently, manage medical resources more effectively, and keep the patient population healthier overall.
Biological patterns that warn of worsening diabetes can come from many places. They may come from hospital records of clinical data: blood pressure, fasting glucose, blood and urine biomarkers, and other measurements. The history of medical prescriptions and hospitalizations can also provide a trail of clues that hints at future developments. Nowadays, research scientists and their clinical counterparts can also use more minute measures of cellular function like gene expression or metabolite production to understand patient health.
As they search for markers of worsening disease, researchers are mining data from different sources. In one recent study, researchers in Kazakhstan gathered clinical data from the country’s National Electronic Health System. The available health data included various demographic details, age at diagnosis, and comorbidities (other conditions that the patients have in addition to diabetes). The researchers applied nine different machine learning classification algorithms to predict the one-year mortality of patients with diabetes, ultimately settling on gradient boosting, the algorithm that performed the task best.
Featured Study: Predicting 1-Year Mortality of Patients with Diabetes mellitus in Kazakhstan Based on Administrative Health Data Using Machine Learning (Alimbayev et al., May 2023, Scientific Reports)
In addition to the gradient boosting algorithm, the researchers applied a technique called SHAP (SHapley Additive exPlanations) – a sort of decoder that explains the outputs of a machine learning model. SHAP shows which items in the hospital record are most useful for predicting mortality, and in which ‘direction’ each one pushes the prediction. For example, SHAP analysis indicated that age at diagnosis was the best predictor of 1-year mortality, and that the older a patient is at diagnosis, the higher their mortality risk (had the analysis instead shown higher risk for patients diagnosed at younger ages, it would be the same predictor acting in the opposite direction). The researchers also found men to be at higher risk of mortality than women (though studies in some other countries have found the opposite, so this sex effect may be shaped by sociological factors).
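For readers who like to see the mechanics, here is a minimal sketch of the idea behind SHAP: a Shapley value is a feature's average contribution to a prediction, taken over every order in which the features could be 'switched on' from a baseline patient. Everything in it is an assumption made for illustration. The risk formula, the feature names, and the two patients are invented, not taken from the study, and the SHAP software estimates these values with much faster methods than the brute-force loop shown here.

```python
from itertools import combinations
from math import factorial


def toy_risk_model(patient):
    """Hypothetical risk score, invented for illustration (not the study's model)."""
    score = 0.02 * (patient["age_at_diagnosis"] - 40)
    score += 0.10 * patient["is_male"]
    score += 0.08 * patient["has_hypertension"]
    # A small interaction term, so the attribution is not trivially additive.
    score += 0.05 * patient["is_male"] * patient["has_hypertension"]
    return score


def shapley_values(model, patient, baseline):
    """Exact Shapley values: features outside a coalition keep their baseline value."""
    features = list(patient)
    n = len(features)
    values = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    f: (patient[f] if f in coalition else baseline[f])
                    for f in features
                }
                with_feature = dict(without)
                with_feature[feature] = patient[feature]
                total += weight * (model(with_feature) - model(without))
        values[feature] = total
    return values


# Hypothetical reference patient and patient of interest.
baseline = {"age_at_diagnosis": 50, "is_male": 0, "has_hypertension": 0}
patient = {"age_at_diagnosis": 70, "is_male": 1, "has_hypertension": 1}

contributions = shapley_values(toy_risk_model, patient, baseline)
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

The sign of each value is the 'direction' mentioned above: a positive value means the feature pushed this patient's predicted risk above the baseline. The values also add up to the gap between the patient's score and the baseline score, which is what makes the explanation 'additive'.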
In another example of predicting disease progression, researchers in Japan applied a method called ‘association rules’ to predict worsening kidney function. An association rule is an ‘if X, then Y’ pattern (for example, ‘if a patient has this finding, kidney function tends to worsen’), scored by how often it holds in the data. This is a good example of biological research drawing from other fields – the association rules approach is apparently common in market analysis. The researchers drew on clinical data for two large cohorts of Japanese people with diabetes. Like the SHAP approach used in the previous study, the association rules approach produces results that are very easy to interpret. For one, the researchers found that HbA1c levels were a good predictor of worsening kidney function. HbA1c is a proxy for blood sugar. As it travels through the blood, the hemoglobin protein tends to become weighed down with attached sugar molecules. The higher the blood sugar, the more this occurs. HbA1c, or hemoglobin A1c, refers to the sugar-bound form of hemoglobin.
Featured Study: A Comprehensive Risk Factor Analysis Using Association Rules in People with Diabetic Kidney Disease (Toyama et al., July 2023, Scientific Reports)
Aside from HbA1c, which is already commonly monitored by clinicians, urine protein levels were also good predictors of worsening kidney function – and may be a more specific measure that doctors can use to detect emerging issues in the kidneys.
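To make the association rules idea more concrete, here is a small sketch of how a rule such as 'high urine protein, then kidney decline' can be scored. The ten patient records are invented for illustration and are not data from the study, and the finding names are hypothetical labels. The three scores used here (support, confidence, and lift) are the standard ones for association rules, though the study may have reported its results differently.

```python
from itertools import combinations

# Hypothetical patient records: each record is the set of findings observed.
records = [
    {"high_hba1c", "high_urine_protein", "kidney_decline"},
    {"high_hba1c", "kidney_decline"},
    {"high_urine_protein", "kidney_decline"},
    {"high_hba1c", "high_urine_protein", "kidney_decline"},
    {"high_hba1c"},
    {"high_blood_pressure"},
    {"high_blood_pressure", "high_urine_protein"},
    set(),
    {"high_hba1c", "high_blood_pressure"},
    set(),
]


def support(itemset, records):
    """Fraction of records that contain every item in the itemset."""
    return sum(itemset <= record for record in records) / len(records)


def rule_metrics(antecedent, consequent, records):
    """Score the rule 'antecedent -> consequent'."""
    s_antecedent = support(antecedent, records)
    s_consequent = support(consequent, records)
    s_both = support(antecedent | consequent, records)
    # Confidence: how often the outcome occurs when the antecedent is present.
    confidence = s_both / s_antecedent if s_antecedent else 0.0
    # Lift: confidence relative to the outcome's overall frequency.
    lift = confidence / s_consequent if s_consequent else 0.0
    return {"support": s_both, "confidence": confidence, "lift": lift}


outcome = frozenset({"kidney_decline"})
findings = sorted(set().union(*records) - outcome)

rules = []
for size in (1, 2):
    for combo in combinations(findings, size):
        antecedent = frozenset(combo)
        metrics = rule_metrics(antecedent, outcome, records)
        if metrics["support"] > 0:  # Skip rules that never occur in the data.
            rules.append((antecedent, metrics))

rules.sort(key=lambda rule: -rule[1]["lift"])
for antecedent, metrics in rules:
    print(
        f"{' + '.join(sorted(antecedent))} -> kidney_decline: "
        f"confidence={metrics['confidence']:.2f}, lift={metrics['lift']:.2f}"
    )
```

A lift above 1 means the outcome is more common among patients with the finding than in the group as a whole. Each rule reads as a plain 'if this, then that' statement, which is why results from this approach are easy to interpret.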
An ongoing clinical trial is putting this idea to the test. The UPRIGHT-HTM clinical trial, which started in 2020, is combining telehealth monitoring and urinary peptide profiling (the measurement of proteins in the urine) to monitor and advise patients with type II diabetes and/or high blood pressure. The trial has involved about 200 patients thus far, all 55-75 years old, from many different countries across Europe, Africa, and South America (Belgium, Denmark, Germany, Greece, Nigeria, Poland, Slovenia, South Africa, and Uruguay). None of these patients had existing kidney disease.
The trial is structured so that some of the patients receive telehealth monitoring, while others receive telehealth monitoring AND urinary profiling. The idea is to see if collecting urinary protein data will provide an extra dose of encouragement for patients and their medical providers to take more active preventive measures and reduce the risk of kidney disease. Preventive measures could involve lifestyle changes or starting new medications.
As the trial is still in progress, there’s not yet a clear answer as to whether monitoring urinary proteins will make a substantial difference for patients. But the recent progress report (June 2023) does suggest that the study is going well despite COVID-19 related disruptions. The combination of home monitoring and urinary protein measurements has been feasible even in countries like Nigeria where medical resources are relatively limited. The study is currently recruiting additional participants – more are being enrolled from Europe, Africa, and now China as well.
Whatever the ultimate results, this trial is an excellent example of how basic computational research can be translated into clinical progress. You can use computational methods to identify risk factors for disease progression, develop methods to monitor these risk factors, stack this new monitoring approach on top of existing medical monitoring protocols, and see if the new monitoring method helps delay or even prevent the worsening of disease. This same general approach applies to other diseases as well.
Research Trends
The point of this section is to provide big-picture context: how are the featured studies shared in this edition representative of broader trends in computational research? This section will sometimes cite information from past editions, additional research articles, and mainstream news stories.
🏆 [Research Goal] This article highlights two practical reasons for trying to predict the progression of disease. The first is to help individual patients maintain their quality of life and prevent their condition from worsening. The second involves the insurance system. Delaying disease progression and keeping healthcare costs down for individual patients lessens the overall healthcare burden and helps to keep insurance costs down (which further improves a patient’s quality of life and ability to pay for treatments – a virtuous cycle).
📈 [Ongoing Development] As noted in the previous edition of this newsletter, efforts to predict disease rely on large stores of biological data. The more extensive and diverse these data stores are, the better the chance that researchers will identify strong predictors of disease onset and progression. Thus, disease prediction research depends on the efforts of clinicians (to collect data), hospital administrators (to enforce policies of data collection), technical staff (to maintain and enable access to electronic health records), and others.
⚙️ [Technical Note] In the Kazakhstan study, the researchers tested out nine different machine learning models before settling on one that worked best. This seems to be a common theme – when applying machine learning models to a specific research problem, researchers often try out different models and fine-tune their parameters to find an ideal model and setting. As we encounter more featured studies in the newsletter, we’ll get a better sense together of what the ‘best practices’ are for picking and fine-tuning models.
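As a rough sketch of what 'trying out different models' looks like in practice, the example below scores several candidate models on the same data with cross-validation and keeps the best scorer. All of it is assumed for illustration: the patient data are synthetic, the three deliberately simple classifiers stand in for the nine algorithms the researchers tested (gradient boosting is not implemented here), and plain accuracy is used as the score, which may not be the metric the study relied on.

```python
import random
from collections import Counter


def make_synthetic_patients(n, seed=0):
    """Invented data: (age_at_diagnosis, comorbidity_count) -> died_within_year."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(30, 90)
        comorbidities = rng.randint(0, 5)
        risk = 0.01 * (age - 30) + 0.08 * comorbidities
        label = 1 if rng.random() < min(risk, 0.95) else 0
        rows.append(((age, comorbidities), label))
    return rows


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


class MajorityClass:
    """Baseline: always predict the most common label in the training data."""
    def fit(self, rows):
        self.label = Counter(y for _, y in rows).most_common(1)[0][0]

    def predict(self, x):
        return self.label


class NearestCentroid:
    """Predict the class whose average patient is closest (features not rescaled)."""
    def fit(self, rows):
        grouped = {}
        for x, y in rows:
            grouped.setdefault(y, []).append(x)
        self.centroids = {
            y: tuple(sum(col) / len(col) for col in zip(*xs))
            for y, xs in grouped.items()
        }

    def predict(self, x):
        return min(self.centroids, key=lambda y: squared_distance(x, self.centroids[y]))


class KNearestNeighbors:
    """Predict by majority vote among the k most similar training patients."""
    def __init__(self, k=5):
        self.k = k

    def fit(self, rows):
        self.rows = rows

    def predict(self, x):
        nearest = sorted(self.rows, key=lambda row: squared_distance(x, row[0]))[: self.k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]


def cross_val_accuracy(make_model, rows, folds=5):
    """Average accuracy over folds; each fold is held out once for testing."""
    scores = []
    for i in range(folds):
        test = rows[i::folds]
        train = [row for j, row in enumerate(rows) if j % folds != i]
        model = make_model()
        model.fit(train)
        correct = sum(model.predict(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


rows = make_synthetic_patients(200)
candidates = {
    "majority_class": MajorityClass,
    "nearest_centroid": NearestCentroid,
    "5_nearest_neighbors": lambda: KNearestNeighbors(k=5),
}
scores = {name: cross_val_accuracy(factory, rows) for name, factory in candidates.items()}
best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("selected:", best_name)
```

The important design choice is that every candidate is judged on patients it did not see during training. Fine-tuning works the same way: each parameter setting (such as the value of k) is treated as one more candidate in the comparison.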
📈 [Ongoing Development] The Kazakhstan study hints at a broader trend in biomedical research. Researchers are interested in biological variability at the level of individuals, which allows them to identify factors that heighten or lessen the risk of disease onset, complications, or treatment failure. They are also interested in variability at the level of national populations. Biological factors like genetic variability can cause differences in health outcomes between nations (for example, if people of a specific ethnic background are at higher risk for a specific condition due to their genetic profile). A lot of ongoing research aims to identify genetic variants that are unique to different ethnic groups.
Non-biological factors like the state of the public health infrastructure or societal attitudes about disease can also contribute to differences in outcomes. In the Kazakhstan study, the observation that men with diabetes have a greater mortality risk in one country while women have a greater risk in another is a clue that some non-biological factor is interacting with the sex effect. In other words, if it were just differences in biology between the sexes that affected disease risk, you should see the same pattern of higher mortality risk for one sex over the other in every country studied (i.e., men at higher risk in all countries). But this is not the case. It is possible, as the researchers note in their study, that men in certain countries are much less likely to go to the doctor and get medical attention than women. This could raise the mortality rate for men over women, even if women appear to be at a higher risk of diabetes complications for biological reasons. Efforts to identify and address non-biological factors, in addition to the biological factors that contribute to disease complications, may improve population health outcomes significantly.
⚙️ [Technical Note] There has been recent focus in both the AI literature and more mainstream news outlets on designing and implementing more ‘explainable’ AI models. The basic idea behind explainable AI is that the inner mathematical workings of AI tools – why they are making the decisions that they are making – should be transparent. What are the underlying factors that the AI tool is using to guide its decisions, and which does it weight most heavily? Making AI more explainable can involve the use of a tool like SHAP (discussed in the Kazakhstan study) to help interpret results – or it could mean using a more intuitive model to begin with (the ‘association rules’ approach in the Japanese study is an example of this).
It's especially important to understand the AI decision-making process when AI models go wrong. Consider a hypothetical AI tool that identifies biological patterns predicting whether patients are at risk for a deadly disease that is treatable if caught early. After several years of using this tool, a retrospective analysis finds that it performed poorly at detecting the disease early in patients of a particular ethnic group. If the AI tool were transparent, researchers would be able to see why the model is underperforming for this group. Perhaps the disease is generally less prevalent in this ethnic group, the model takes this into account, and it tends to miss the relatively rare cases where a member of the group does develop the disease. Once researchers know that this bias exists, they can adjust the model to improve its predictive power.
(This example is based roughly on this study: Examining the Potential Impact of Race Multiplier Utilization in Estimated Glomerular Filtration Rate Calculation on African-American Care Outcomes)
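The retrospective analysis in this hypothetical can be sketched in a few lines: among patients who really did develop the disease, what fraction did the tool flag early, broken down by group? The audit records, the group names, and the 0.2 margin used to flag a group are all invented for illustration and do not come from the study cited above.

```python
# Hypothetical audit records: (group, actually_had_disease, tool_flagged_early).
audit_records = (
    [("group_a", True, True)] * 4
    + [("group_a", True, False)] * 1
    + [("group_a", False, False)] * 2
    + [("group_a", False, True)] * 1
    + [("group_b", True, True)] * 1
    + [("group_b", True, False)] * 3
    + [("group_b", False, False)] * 3
)

# Assumed margin: flag a group whose recall trails the overall recall by more than this.
MARGIN = 0.2


def recall(records):
    """Share of true disease cases that the tool flagged early."""
    flags = [flagged for _, had_disease, flagged in records if had_disease]
    return sum(flags) / len(flags) if flags else float("nan")


overall_recall = recall(audit_records)
groups = sorted({group for group, _, _ in audit_records})
recall_by_group = {
    group: recall([record for record in audit_records if record[0] == group])
    for group in groups
}
underserved = sorted(
    group for group, value in recall_by_group.items()
    if value < overall_recall - MARGIN
)

print(f"overall recall: {overall_recall:.2f}")
for group, value in recall_by_group.items():
    print(f"{group}: recall={value:.2f}")
print("groups needing review:", underserved)
```

A check like this only shows that the tool is missing cases in one group. Explaining why it misses them is where a transparent model, or a tool like SHAP, comes in.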
📈 [Ongoing Development] The clinical trial featured at the end of this edition is a great example of how the application of predictive models can drive clinical progress. The basic pattern seems to be this: researchers use predictive models to determine which biological factors (e.g., urine proteins) are most strongly associated with disease onset and progression. If they identify a factor that predicts disease progression relatively effectively, and this factor is not currently monitored by clinicians (or not widely monitored), researchers can partner with clinicians to launch a new clinical trial like the one featured here. The goal of such a trial would be to see whether patients assessed with the new monitoring protocol are better able to preserve their health than those who are not.