Using data to make better medical decisions

The Data Science Fellowship equips health care researchers and professionals to leverage data to improve medical knowledge and care.

Long before the COVID-19 pandemic, when billions of vaccine doses were administered globally, the health care industry was generating enormous amounts of data. That volume of data is only increasing, with RBC Capital Markets reporting that 30% of data worldwide is generated by the health care industry. That growth outpaces other industries, including manufacturing, financial services and entertainment.

At the Healthcare Information and Management Systems Society 2023 Global Conference, population health vendor Arcadia reported that a single hospital can generate as much as 50 petabytes of data each year. One petabyte is the equivalent of 500 billion pages of standard printed text.

All of that raises important questions: Are copious amounts of data too much of a good thing? How can the data be used to improve patient care? Who analyzes the data and uses it to improve the health care system?

This is where two very distinct professions – doctors and data scientists – intersect. A doctor is trained in medicine and patient care but typically lacks the skills a data scientist possesses. A data scientist is adept at collecting, analyzing and interpreting data to drive informed decision making, but does not have the medical acumen and skills required to treat patients.

Bridging the gap between these professions can maximize the benefits of the enormous amounts of health care data that are now available. Ajay Perumbeti, MD, a clinical informatics fellow at the University of Arizona College of Medicine – Phoenix, sees value and potential in using data to improve patient care. Trained as a pediatric hematologist oncologist, Perumbeti has since shifted his career focus to analyzing and building tools for clinical and bioinformatic data sets.

“I was struck by the amount of data being collected,” Perumbeti said. “And I had some intuition that maybe it would be useful to take this data to help us make good medical decisions. I think that computational approaches are complementary to a physician’s experiences. How we put those together is critical for improving health care.”

In Spring 2023, Perumbeti and Chidiebere “Peter” Echieh, MD, FWACS, were among an interdisciplinary group of participants in the Data Science Fellows program, part of a UArizona Health Sciences strategic initiative to increase the use of data science and analytics in health care. Like Perumbeti, Echieh decided he could help more people by conducting scientific research, rather than seeing and treating one patient at a time as a clinician.

“There are many steps before a discovery can become something useful. Many discoveries never make it into a doctor’s hands,” Echieh said. “For example, imagine all the information that has to transfer from discovering a biological target associated with a disease, to a biochemist finding a chemical that works on the target, to the pharmacologist who describes how it works on the body, to safety testing, and then eventually all the different phases of a clinical trial. There are so many points in this series of transfers where a discovery might not make it through to the patient.”

Through the Data Science Fellows program, fellows develop and exchange the data science expertise needed to answer challenging research questions in health sciences. Data Science Fellows also receive intensive training and mentoring in open science, which emphasizes the reproducibility of data, and in computational infrastructure, which makes it possible to automate processes and reduce the manual labor involved in data science research.

Carlos Lizárraga, PhD, MSc, a computation and data science educator at the Data Science Institute in the UArizona Office of Research, Innovation and Impact, explained that data scientists search for information that can be extracted from datasets to analyze relationships between variables and identify general behavior or specific patterns. The types of datasets can be quite broad. In health sciences, they may include text and numeric datasets, such as DNA sequences, sentences or vital signs, and imaging datasets, such as X-rays or scans. Computational tools such as algorithms are then used to create forecasting models. Ultimately, clinicians can use this information to estimate the probability that a person will develop a certain disease.

These computational approaches to medicine can help eliminate or correct bias based on experiential knowledge. This is one reason that Perumbeti, who is board certified in pediatrics, pediatric hematology oncology and transfusion medicine, began to think about his patients in a broader context.

“I had a lot of patients with sickle cell disease. I could get a sense of how they were doing individually, but I did not have a good sense of how they were doing as a whole,” Perumbeti said. “I wasn’t able to compare them to patients in other clinics or across the country or from year to year. I started thinking about how we could look at patients with a particular condition from a population health perspective.”

Understanding how to leverage data to improve the efficiency of decision making is an emerging challenge for health care leaders. Perumbeti and Echieh are among those who already see the advantage of using data science tools and methods to advance science.

“Bad information is just as bad as no information,” Perumbeti said. “Good science that is reproducible from a computational medicine perspective is critical to get where we need to go.”
