New Tool to Quickly Cull Biobank Data, Accelerate Research of Genetic Diseases

A couple of computer scientists are getting ready to take the DNA testing technique that made Ancestry.com family trees possible to a whole new level, which may give doctors a powerful new tool against hereditary diseases such as breast cancer, diabetes and autism.

The National Human Genome Research Institute of the National Institutes of Health has awarded a $2.3 million grant to Shaojie Zhang, an associate professor of computer science from the University of Central Florida, and Degui Zhi, an associate professor of biomedical informatics from the University of Texas Health Science Center at Houston, to further develop their new genome informatics tool nicknamed RaPID for its speed and to build additional tools to address related population-genetics questions.

While finding out your genetic makeup is pretty cool, the NIH is interested in the new technique because it could quickly track down genetic disease markers across thousands, perhaps millions, of people. Access to that kind of data analysis could provide critical clues in developing treatments or cures. It’s an example of the power of big data.

“This method would allow us to study larger cohorts, say half a million individuals, in a few minutes,” Zhang said. “Before, the process was cumbersome and messy. Our approach is 100 times faster and much more accurate than current methods.”

RaPID is the first computationally feasible method for inferring identity-by-descent (IBD) segments, stretches of DNA that individuals inherited from a common ancestor, in biobank-scale cohorts. The core algorithm runs in linear time, so the analysis time grows only in proportion to the amount of data. The tool allows researchers to identify shared DNA segments and reconstruct genetic history.

Last month, a suspect in California’s 40-year-old Golden State Killer cold case was identified using IBD segmentation. It took current technology hours to find a genetic match. It would take the RaPID technique seconds, Zhi said.

There are already huge banks of DNA samples in the United States and abroad, such as TOPMed and the UK Biobank. The material is collected from volunteers or those involved in studies who provide blood, tissue and other samples for research purposes. The banks have grown since the onset of precision medicine.

“With this grant, our team is developing algorithms to identify shared DNA segments within large cohorts,” Zhi stated. “We aim to advance genetic research by building new informatics tools that reveal detailed genetic relationships between humans.”

Zhang presented the team’s preliminary work at the 21st annual International Conference on Research in Computational Molecular Biology last year. The conference is one of the largest for experts in the field.

Zhang joined UCF in 2007. He holds multiple degrees, including a doctorate in computer science from the University of California at San Diego. Other UCF team members are computer science doctoral students Ardalan Naseri and Erwin Holzhauser.