Introduction
Thanks for tuning in to The Data Reuse Digest! In writing this newsletter, the goal is to uncover all the different ways that published scientific data can be used to drive research forward - ultimately with an eye on translational developments (new drugs, new clinical guidelines, new technologies, etc.).
We try to keep the writing plain and simple so that the newsletter can be useful to researchers of any field (not just bioinformatics) and members of the general public as well.
For researchers, especially younger researchers, the idea of this newsletter is to show what successful, publishable work in the field looks like right now. At the same time, it also aims to encourage new kinds of research projects that push the field in new directions, towards new translational goals. For non-researchers, the idea is to pull back the curtain to show what research work actually looks like, and why it matters for society at large.
Bioinformatics Research Roadmap
Here’s how bioinformatics research can improve our understanding of biology and spur on the development of new drugs and other medical technologies…
Research News
(A) New Algorithms
Developing Computational Tools to See Biological Data in New Ways
The human body is home to trillions of microorganisms - and researchers have come up with a new, creative, and cost-effective way to profile them. It involves re-analyzing existing data in a new way. The Pathonoia algorithm was designed to recognize and identify bacterial and viral sequences in human RNA-sequencing data. These non-human sequences are typically discarded as contaminants. Researchers may use the algorithm to re-examine gene expression profiles from people with disease (vs. healthy individuals) and try to detect microbial signatures that are related to disease symptoms and severity [Liebhoff et al., 2023] 🇩🇪🇺🇸
Multiple sclerosis is a debilitating autoimmune disease where progressive nerve damage leads to pain, fatigue, vision loss, and a host of other symptoms. This nerve damage occurs due to loss of the protective myelin sheath that surrounds nerve cells. Myelin is like the insulating material that encapsulates the electric wires in power lines. Researchers are working on a novel class of drugs called ‘remyelination therapies’ that restore damaged myelin - but in order to test potential drug compounds, new experimental and computational tools are needed. A research team in Switzerland has developed an experimental system that can test whole libraries of remyelination compounds on neural cells. They also designed a computer vision algorithm to ‘see’ and quantify myelin formation accurately in these cells [Seiler et al., 2023] 🇨🇭🇺🇸
A key translational goal of bioinformatics research is the production of algorithms that can predict patient prognosis based on a range of biomarkers. In this study, researchers used a random forest model (a machine learning method) that draws on everything from demographic information (age, sex, ethnicity, etc.) to lung function and blood work data to predict the prognosis of patients with hypertension. Physicians can use this algorithm to better understand and treat individual patients [Kheyfets et al., 2023] 🇺🇸🇬🇧
Understanding how individual cells metabolize nutrients and produce energy to fuel their activities could unlock new insights about human disease. A team of researchers has developed computational methods, applied to single-cell RNA sequencing data, that allow them to map out the metabolic networks active in individual cells. They applied these methods to detect metabolic differences between cancer and normal cells - and then used them to distinguish the metabolic profiles of cells in 19 different human organs. The metabolic networks that the researchers uncovered have been made publicly available to explore on a website called Metabolic Atlas [Gustafsson et al., 2023] 🇸🇪🇩🇰
When someone is infected by SARS-CoV-2, we often assume that they have acquired a single strain of the virus. There are cases, however, of individuals being infected with multiple strains at once. Researchers have developed a new algorithm that can identify these co-infection cases. They applied the algorithm to a database of SARS-CoV-2 genomic sequence data and found that 2% of the cases in the database were likely to be co-infections. Other researchers can build on this study to identify the clinical risks of co-infection. The algorithm developed here may be applied to other kinds of viral infections as well [Goya et al., 2023] 🇦🇷
(B) New Databases
Building databases to store and share biological knowledge
SalivaDB is a new database built to keep track of salivary biomarkers (genes, proteins, metabolites, micro-organisms, miRNA) with data for over 200 different diseases. This new database, which draws from information in published papers as well as existing databases, will support better patient diagnosis [Arora et al., 2023] 🇮🇳
Researchers in the UK have developed a database that maps out hundreds of new variants in a specific gene associated with the blood disorder hemophilia. This database can be used as a tool to better assess patient prognosis, and even develop new hemophilia drugs that are effective for patients with different variants [Xu et al., 2023] 🇬🇧
The Human Protein Atlas has recently been updated to a new version. The database holds a vast array of information about human proteins - their location in cells and tissues, whether or not they are secreted, and how they contribute to different diseases. The latest release integrates new datasets (specifically, single-cell transcriptomic datasets) and adds a new ‘Blood Protein’ section to the database, among other improvements [Digre & Lindskog] 🇸🇪
A new database called CCS-ATAC brings together biological data related to a special region of the heart that coordinates the heartbeat - the cardiac conduction system (CCS). CCS-ATAC maps out gene regulatory elements in the CCS. These are sections of DNA that control when genes are expressed (transcription factors bind to these regulatory elements to turn genes on or off). The database also includes information on genetic variants that are thought to impact heart rhythms, such as those found commonly in people with arrhythmia [Bhattacharyya et al., 2023] 🇺🇸
A new compendium brings together thousands of published gene expression profiles for Pseudomonas aeruginosa. This resource has a range of useful applications for fellow researchers who study the common human pathogen. For example, if a researcher is studying the gene expression of Pseudomonas under specific conditions (like low oxygen availability or antibiotic treatment), they can use the compendium to identify other conditions that cause the bacteria to respond in a similar way. Resources like this one allow researchers to build more effectively on the foundations of prior work - reusing published data to enhance their own research [Doing et al., 2023] 🇺🇸
(C) Beyond the Bench
Using data gathered outside the laboratory to help diagnose, treat and prevent disease
Scientists in the UK have piloted a wearable device (the ChroniSense Polso) to track vital signs in hospitalized patients. In a study with 132 participants, the device effectively measured blood pressure and heart rate (in other words, the device measurements tracked well with measurements from more traditional medical equipment) - though the researchers note that temperature, oxygen saturation, and respiration rate measurements need to be fine-tuned. Devices like this one could help make up for short staffing in hospitals and nursing homes, and even recognize worsening symptoms earlier so that clinicians can intervene more quickly [Van Velthoven et al., 2023] 🇬🇧
In light of the recent COVID-19 pandemic, scientists are searching for ways to track and predict infectious disease outbreaks before they take off. This team of researchers used both weather data and data from the social media site Twitter to develop a deep learning model that can predict outbreaks of influenza [Athanasiou et al., 2023] 🇬🇷
Speaking of COVID-19, another group of researchers used social media data to track COVID-19 symptoms in the population. Drawing on 400 million tweets over 2+ years, the researchers tracked the prevalence of many milder symptoms that aren’t often reported or emphasized in clinical data (because patients who are hospitalized tend to have much more severe symptoms - and physicians are focused on recording and treating these symptoms). This data offers a fuller picture of what mild COVID-19 looks like from a medical standpoint [Wu et al., 2023] 🇨🇳🇺🇸
Researchers in South Korea have developed an algorithm that can ‘listen in’ and detect signs of breathing irregularity in people with obstructive sleep apnea. The model was trained with ‘noisy’ data - including sounds from people with sleep apnea, but also 22,500 other home noises (everything from alarm clocks to purring cats and barking dogs). This approach to training should allow the model to track sleep apnea more accurately and avoid mistaking unrelated noises for sleep apnea events [Le et al., 2023] 🇰🇷
Patients with multiple medical conditions are at high risk of unplanned hospital admissions, which can place a high burden on the healthcare system. Drawing on a mix of personal data (clinical data, measures of social support, etc.), researchers used machine learning techniques (K-means clustering and other models) to stratify these patients into different risk groups. The model and its predictions can help clinicians and hospital staff predict and prepare for readmission so that hospital resources can be better allocated and patients can be cared for more effectively [González-Colom et al., 2023] 🇪🇸
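For readers curious about what K-means clustering actually does, here is a minimal Python sketch on made-up data. It is not the study's pipeline: the `kmeans` function, the six example patients, and the two features (number of chronic conditions, admissions in the past year) are all hypothetical and chosen only for illustration.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain K-means on a list of numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k randomly chosen data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(v) / len(members) for v in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Nothing moved, so the clustering has settled.
        centroids = new_centroids
    return centroids, labels


# Hypothetical patients: (chronic conditions, admissions in the past year)
patients = [(1, 0), (2, 0), (1, 1), (5, 3), (6, 4), (5, 4)]
centroids, labels = kmeans(patients, k=2)
print(labels)
```

Patients that share a label fall in the same group. In a real study the groups would be built from many more features and then checked against actual outcomes before anyone treated them as risk categories.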
(D) Bioinformatics Analysis
Bioinformatics tools applied to answer key biological questions
Researchers want to identify the genes that drive glioma - a dangerous form of brain cancer with a poor prognosis. In this study, two genes of interest (GRIN1 and ATP1A3) were identified by analyzing public data from glioma patients. Specifically, patients with lower expression of these genes had a poorer prognosis. These genes could be evaluated further as therapeutic targets [Ji et al., 2023] 🇨🇳
In addition to the genes expressed in cancer cells, another factor that can drive cancer progression is the behavior of immune cells surrounding the tumor. Researchers analyzed image data: breast cancer slides stained for specific proteins that tend to crop up on the surface of immune cells called macrophages. This allowed them to see what kinds of macrophages were present in the different tumor samples and how their presence correlated with cancer severity. For example, in patients with luminal B breast cancer, they found that the number of macrophages expressing the CD68 protein on their cell surface is correlated with tumor grade (essentially, how abnormal the tumor cells look - a sign of their aggressiveness) [Zwager et al., 2023] 🇳🇱🇨🇳
Machine learning can be used to determine which combinations of drugs are most effective for patients. Researchers in Austria analyzed 61 different drug cocktails given to patients after orthopedic surgery (the study surveyed data from 750 patients in total). They found that cocktails containing metamizole and paracetamol - alongside one or more of several other analgesics (hydromorphone, diclofenac, diclofenac-orphenadrine) - were most effective at decreasing pain [Fritsch et al., 2023] 🇦🇹
Researchers have identified a protein (CDK1) that could be a viable drug target for cancer patients. However, there aren’t yet any clinically proven drugs that target this protein. This study employed a string of computational techniques to generate new chemical compounds that are likely to target CDK1 (based on its protein structure) and then ran simulations to predict which molecules will target CDK1 best. In the end, the research team put forward two compounds that they believe are worthy of further testing [He et al., 2023] 🇨🇳
Recent news stories have reported on a troubling rise in colon cancer cases among young people. To better understand the disease and counteract this trend, researchers are exploring how specific features of the tumor environment like hypoxia (low oxygen levels) impact patient prognosis. Drawing on public datasets, researchers found that the expression levels of a group of six hypoxia-related genes can effectively predict colorectal cancer prognosis. These genes may prove useful not only as prognostic biomarkers but also as drug targets [Qiao et al., 2023] 🇨🇳
Research Community
This month’s new studies involved research in 14 countries around the world.
Spread the Word!
Thanks for reading! If you want to help us in our mission to show how researchers can make the most of public data, please share this newsletter with any colleagues who would be interested.