Two leading secondary analysis tools, GATK and DRAGEN, are being combined to create a cutting-edge open-source software suite for methods including small variant (SNV) and large variant (CNV/SV) detection, according to Illumina and the Broad Institute of MIT and Harvard. The co-developed secondary analysis algorithms and software will provide a standardized means of processing high-throughput sequencing data and performing variant discovery analysis.
GATK is an industry leader for identifying SNPs and indels in germline DNA and RNA sequencing data. The Illumina DRAGEN Bio-IT Platform, meanwhile, delivers rapid secondary analysis for germline and somatic SNV, SV, and CNV(A) calling, as well as methylation, RNA, and repeat expansion workflows. (DRAGEN stands for Dynamic Read Analysis for Genomics.) DRAGEN pipelines are hardware-accelerated using reconfigurable field-programmable gate array (FPGA) technology. They can be deployed on-premises via a local server and in the cloud through Illumina’s BaseSpace Sequence Hub.
“Illumina’s goal is to deliver industry leading technologies to our customers, whether that means creating tools ourselves, bringing new technologies and teams in-house or partnering to enhance our offerings,” said Susan Tousi, senior vice president of product development at Illumina. “This is why we were so thrilled to acquire Edico Genome and DRAGEN last year and in this spirit that we are partnering with the Broad with the goal to deliver best-in-class open-source software for commonly used methods. By creating a suite of algorithms combining the best of DRAGEN and GATK, we believe we can fuel the clinical adoption of sequencing by decreasing the cost and time of analysis.”
As the speed of sequencing has increased and the cost has dropped, secondary analysis has become one of the key competitive issues in genomics. Open-source tools such as GATK are very popular, but commercial vendors also claim that their products offer advantages such as ease of use and more support.
GATK and DRAGEN were primarily designed to process exomes and whole genomes generated with Illumina sequencing technology. Over time, their scope expanded to include somatic short variant calling and to tackle copy number variation (CNV) and structural variation (SV). In addition to variant callers, GATK also includes utilities for related tasks such as processing and quality control of high-throughput sequencing data.
“This approach is a positive step for the community and will add to the available choices in data analysis to increase the quality and lower the cost of the current set of analysis methods,” said Ewan Birney, Director, Global Alliance for Genomics and Health (GA4GH) and EMBL’s European Bioinformatics Institute. “The scientific community is well-positioned to benefit from this collaboration toward gold standard analysis methods and file formats which we believe will further enable inter-institution interoperability, research, and insights to maximize the impact of genomics in healthcare.”
The co-developed, open-source secondary analysis software will be distributed through the Broad Institute’s usual community support channels, such as GitHub. Illumina intends to develop proprietary, hardware-accelerated versions of the co-developed software on the Illumina DRAGEN Bio-IT Platform. These accelerated versions will be complemented by the wide range of pipelines currently available on the DRAGEN Bio-IT Platform. The Broad Institute and Illumina teams will validate that the results of the hardware-accelerated versions are functionally equivalent to those of the co-developed open-source software, in order to ensure interoperability of data for downstream analyses.