Applying data science to battle childhood cancer
Acute myeloid leukaemia in children has a poor prognosis, and its treatment options have gone unchanged for decades. One collaboration is using data analytics to bring a fresh approach to tackling the disease.
Acute myeloid leukaemia (AML) kills hundreds of children a year. It's the type of cancer that causes the most deaths in children under two, and in teenagers. It has a poor prognosis, and its treatments can be severely toxic.
Research initiative Target Paediatric AML (tpAML) was set up to change the way that the disease is diagnosed, monitored and treated, through greater use of personalised medicine. Rather than the current one-size-fits-all approach for many diseases, personalised medicine aims to tailor an individual's treatment by looking at their unique circumstances, needs, health, and genetics.
AML is caused by many different types of genetic mutation, alone and together. Those differences can affect how the cancer should be treated and its prognosis. To understand better how to find, track and treat the condition, tpAML researchers began building the largest dataset ever compiled around the disease. By sequencing the genomes of over 2,000 people, both alive and deceased, who had the disease, tpAML's researchers hoped to find previously unknown links between certain mutations and how a cancer could be tackled.
Genomic data is notoriously sizeable, and tpAML's sequencing had generated over a petabyte of it. As well as difficulties thrown up by the sheer bulk of data to be analysed, tpAML's data was also hugely complex: each patient's data had 48,000 linked RNA transcripts to analyse.
Earlier this year, Joe Depa, a father who had lost a daughter to the disease and was working with tpAML, joined with his coworkers at Accenture to work on a project to build a system that could analyse the imposing dataset.
Linking up with tpAML's affiliated data scientists and computational working group, Depa along with data-scientist and genomic-expert colleagues hoped to help turn the data into information that researchers and clinicians could use in the fight against paediatric AML, by allowing them to correlate what was happening at a genetic level with outcomes in the disease.
In order to turn the raw data into something that could generate insights into paediatric AML, Accenture staff created a tool that ingested the raw clinical and genomic data and cleaned it up, so analytics tools could process it more effectively. Using Alteryx and Python, the data was merged into a single file, and any incomplete or duplicate data removed. Python was used to profile the data and develop statistical summaries for the analysis – which could be used to flag genes that could be of interest to researchers, Depa says. The harmonised DataFrame was exported as a flat file for more analysis.
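The article does not publish the team's code, so the following is only a minimal sketch of the steps described above: merge clinical and genomic records, drop incomplete or duplicate rows, compute statistical summaries, flag genes that might be of interest, and export a flat file. It uses the Python standard library rather than Alteryx, and every field name, gene name, value and threshold in it is a hypothetical placeholder, not tpAML data.

```python
import csv
import io
import statistics

# Hypothetical toy records; field and gene names are illustrative only.
clinical = [
    {"patient_id": "P1", "outcome": "relapse"},
    {"patient_id": "P2", "outcome": "remission"},
    {"patient_id": "P3", "outcome": "remission"},
    {"patient_id": "P3", "outcome": "remission"},  # duplicate row
    {"patient_id": "P4", "outcome": ""},           # incomplete row
]
expression = [
    {"patient_id": "P1", "GENE_A": 9.0, "GENE_B": 2.1},
    {"patient_id": "P2", "GENE_A": 1.0, "GENE_B": 2.0},
    {"patient_id": "P3", "GENE_A": 1.2, "GENE_B": 1.9},
    {"patient_id": "P4", "GENE_A": 1.1, "GENE_B": 2.2},
]


def harmonise(clinical_rows, expression_rows):
    """Join both sources on patient_id, skipping incomplete and duplicate rows."""
    expr_by_id = {row["patient_id"]: row for row in expression_rows}
    merged, seen = [], set()
    for row in clinical_rows:
        pid = row["patient_id"]
        if pid in seen or pid not in expr_by_id:
            continue
        combined = {**row, **expr_by_id[pid]}
        if any(value in ("", None) for value in combined.values()):
            continue  # drop rows with missing fields
        seen.add(pid)
        merged.append(combined)
    return merged


def profile(rows, genes):
    """Per-gene mean and sample standard deviation across patients."""
    summary = {}
    for gene in genes:
        values = [row[gene] for row in rows]
        summary[gene] = {
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),
        }
    return summary


def flag_genes(summary, min_stdev):
    """Flag genes whose spread across patients meets an assumed threshold."""
    return sorted(g for g, s in summary.items() if s["stdev"] >= min_stdev)


def to_flat_file(rows):
    """Serialise the harmonised rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


merged = harmonise(clinical, expression)
summary = profile(merged, ["GENE_A", "GENE_B"])
flagged = flag_genes(summary, min_stdev=1.0)  # threshold is an assumption
flat = to_flat_file(merged)
```

Flagging by spread alone is a deliberately crude stand-in; which statistics the team actually used to surface genes of interest is not stated in the article.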
"The whole idea was 'let's reduce the time for data preparation', which is a consistent issue in any area around data, but particularly in the clinical space. There's been a tonne of work already put into play for this, and now we hope we've got it in a position where hopefully the doctors can spend more time analysing the data versus having to clean up the data," says Depa, managing director at Accenture Applied Intelligence.
Built using R, the code base that was created for the project is open source, allowing researchers and doctors with similar challenges, but working on different conditions, to reuse the group's work for their own research. While users may need a degree of technical expertise to properly manipulate the information at present, the group is working on a UI that should make it as accessible as possible for those who don't have a similar background.
"We wanted to make sure that at the end of this analysis, any doctor in the world can access this data, leverage this data and perform their analysis on it to hopefully drive to more precision-type medicine," says Depa.
But clinical researchers and doctors aren't always gifted data scientists, so the group has been working on ways to visualise the information, using Unity. The tools they've created allow researchers to manipulate the data in 3D, and zoom in and out on anomalies in the data to find data points that may be worthy of further exploration. One enterprising researcher has even been able to explore those datasets in virtual reality using an Oculus.
Historically, paediatric and adult AML were treated as largely the same disease. However, according to Dr Soheil Meshinchi, professor in the Fred Hutchinson Cancer Research Center's clinical research division and lead for tpAML's computational working group, the disease stems from different causes in the two groups. In adults, it arises from changes to the smallest links in the DNA chain, known as single base pairs, while in children it's driven by alterations to larger chunks of their chromosomes.
The tpAML dataset has allowed researchers to find previously unknown alterations that cause the disease in children. "We've used the data that tpAML generated to probably make the most robust diagnostic platform that there is. We've identified genetic alterations which was not possible by conventional methods," says Meshinchi.
Once those mutations are found, the data analysis platform can begin identifying drugs that could potentially target them. Protocols for how to treat paediatric AML have remained largely unchanged for decades and new, more individualised treatment options are sorely needed.
"We've tried it for 40 years of treating all AML the same and hoping for the best. That hasn't worked – you really need to take a step back and to treat each subset more appropriately based on the target that's expressed," says Meshinchi.
The data could help by identifying drugs that have already been developed to treat other conditions but may have a role in fighting paediatric AML, and by showing the pharmaceutical companies that make those drugs that there is hard evidence to justify starting the expensive and risky process of getting them approved for the disease.
Using the analytics platform to find drugs that can be repurposed in this way, rather than created from scratch, could cut by years the time it takes for a new paediatric AML treatment to be approved. One drug identified as a result has already been tested in clinical trials.
The results generated by the team's work have begun to have an impact for paediatric AML patients. When the data was used to identify a subset of children with the disease who had a particular genetic marker and were considered particularly high risk, the treatment pathway for those children was altered.
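As a rough illustration of how a marker-defined subset might be compared against other patients, the sketch below groups records by whether a marker is present and computes the relapse rate in each group. The records, the `marker` and `relapsed` fields, and the numbers are all made up for the example; the article does not say which marker or which statistical method the researchers used.

```python
from collections import defaultdict

# Toy, made-up records for illustration only; not tpAML data.
patients = [
    {"id": "P1", "marker": True, "relapsed": True},
    {"id": "P2", "marker": True, "relapsed": True},
    {"id": "P3", "marker": True, "relapsed": False},
    {"id": "P4", "marker": False, "relapsed": False},
    {"id": "P5", "marker": False, "relapsed": True},
    {"id": "P6", "marker": False, "relapsed": False},
    {"id": "P7", "marker": False, "relapsed": False},
]


def relapse_rate_by_marker(rows):
    """Return the fraction of patients who relapsed, keyed by marker status."""
    counts = defaultdict(lambda: [0, 0])  # marker -> [relapses, total]
    for row in rows:
        counts[row["marker"]][0] += int(row["relapsed"])
        counts[row["marker"]][1] += 1
    return {marker: relapses / total for marker, (relapses, total) in counts.items()}


rates = relapse_rate_by_marker(patients)
```

A comparison like this would only be a starting point; with real patient data, survival analysis and significance testing would be needed before a difference between groups could inform treatment.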
"This data will not only have an impact ongoing but is already having an impact right now," says Julie Guillot, co-founder of tpAML.
"One cure for leukaemia or one cure for AML is very much unlikely. But we are searching for tailored treatments for specific groups of kids… when [Meshinchi] and his peers are able to find that Achilles heel for a specific cluster of patients, the results are dramatic. These kids go from a very low percentage of cure to, for example, a group that went to 95%. This approach can actually work."
Author: Jo Best
Source: ZDNet