Annual Report 2019
Section of Genome Analysis Platform
Yuichi Shiraishi, Ai Okada, Kenichi Chiba, Naoko Iida, Daichi Narushima, Eisaku Furukawa, Jou Nishino
Introduction
The Section of Genome Analysis Platform in the Center for Cancer Genomics and Advanced Therapeutics (C-CAT) focuses on developing and evaluating (1) methodologies for detecting various types of somatic variants (SNVs, indels, structural variants, and so on) in cancer genomes, (2) normalization of cancer genome data formats, (3) workflows for cancer genome analysis, and (4) a platform for analyzing and sharing cancer genome data using cloud computing. These products will be used in projects that collect and manage cancer genome data. We are also preparing a platform for whole-genome sequencing analysis in clinical settings.
Research activities
1. Development of genome analysis pipelines and frameworks
We developed bioinformatics tools for genome sequencing and transcriptome sequencing to analyze somatic mutations, germline mutations, copy number variants, structural variants, haplotypes, and mRNA. We also developed analysis frameworks optimized for the available CPU and memory resources.
- Development of an analysis platform for germline mutations on Amazon Web Services.
- Development of an automated job submission system using the AWS Batch service.
- Development of analysis tools for long-read nanopore sequencing using GPUs.
- Development of an automated curation system for structural variants.
2. Development of monitoring-operation systems
- Construction of a monitoring-operation system for tasks. The system identifies tasks that are still present but inactive and deletes them to reduce cost.
- Development of a GUI for the monitoring system. The GUI shows the number and performance of tasks and their execution results in one place, which allows us to detect abnormalities early.
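The idea of finding tasks that are present but inactive can be illustrated with a minimal sketch. It is not the section's actual implementation: the TaskRecord fields, the six-hour idle threshold, and the task names and costs are all hypothetical, and no cloud API is called.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TaskRecord:
    """Hypothetical task record; the fields are assumptions for illustration."""
    task_id: str
    last_activity: datetime
    hourly_cost: float


def find_idle_tasks(tasks, now, max_idle=timedelta(hours=6)):
    """Return the tasks whose last activity is older than max_idle."""
    return [t for t in tasks if now - t.last_activity > max_idle]


def hourly_savings(idle_tasks):
    """Sum the hourly cost that deleting the idle tasks would free."""
    return sum(t.hourly_cost for t in idle_tasks)


# Made-up example data: one active task and two that have gone quiet.
now = datetime(2019, 12, 1, 12, 0)
tasks = [
    TaskRecord("align-001", now - timedelta(minutes=20), 0.50),
    TaskRecord("call-002", now - timedelta(hours=9), 1.25),
    TaskRecord("sv-003", now - timedelta(days=2), 0.75),
]
idle = find_idle_tasks(tasks, now)
print([t.task_id for t in idle])
print(hourly_savings(idle))
```

In a real deployment the records would come from the scheduler or cloud provider, and the deletion step would likely be gated by a review or a dry-run mode, since the idle threshold is a judgment call.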
3. Large-scale genome analysis
Using the genome analysis platform we developed, we analyze about 250,000 data sets in the Sequence Read Archive, a public repository for sequencing data.
Education
We supported many researchers who use our analysis pipelines by answering their questions about bioinformatics. We employed a postdoctoral fellow and supported his research.
Future prospects
By utilizing current and novel sequencing technologies, this section will investigate and develop novel bioinformatics methods and computing systems for cancer genomics and clinical sequencing. We will also contribute to training personnel who can analyze large-scale cancer genomics data. Our analysis will provide a knowledge discovery framework for advancing precision medicine.