Genome Analysis Toolkit and Apache Spark

Users of the latest release of the Genome Analysis Toolkit, an open-source framework for analyzing high-throughput DNA sequencing data, can now choose Apache Spark for data processing.

Ever since the Human Genome Project produced the first draft sequence of the human genome in 2000, the cost of sequencing has dropped exponentially, from around US$100 million per genome then to around US$1,000 today. Over the same period, we have seen massive growth in the storage and processing capabilities of big data technologies like Apache Hadoop. It’s very fitting, then, to use tools from the Hadoop ecosystem for genomics, which is why Cloudera, in cooperation with the Broad Institute and other industry partners, is pleased to announce the alpha release of the Genome Analysis Toolkit (GATK) version 4 running on Apache Spark.

Details: http://blog.cloudera.com/blog/2016/04/genome-analysis-toolkit-now-using-apache-spark-for-data-processing/
