Annual Report 2022
Section of Genome Analysis Platform
Yuichi Shiraishi, Naoko Iida, Ai Okada, Kenichi Chiba
Introduction
The Section of Genome Analysis Platform in the Center for Cancer Genomics and Advanced Therapeutics (C-CAT) focuses on developing and evaluating (1) methodologies for detecting various types of somatic variants (SNVs, indels, structural variations, etc.) in cancer genomes and (2) platforms for analysis and data sharing utilizing cloud computing environments.
The Team and What We Do
Our laboratory aims to support cancer researchers by developing foundational methods for information analysis that can contribute to novel discoveries, while also deepening our own biological and medical understanding through large-scale analyses.
Research Activities
1) Construction of the C-CAT Utilization System
In C-CAT, the results of gene panel testing and the corresponding medical information for each patient are collected and stored with the patient's consent. Cases for which the patient has also consented to data utilization are made available for research purposes. To share these cases with researchers, our laboratory has developed, in collaboration with Hitachi, Ltd., a virtual desktop environment (C-CAT CALICO) in which data users can conduct their own analyses.
The registered case data include sequence data and mutation data, so designing and developing a secure usage environment is crucial. We designed the system in accordance with the "Security Guidelines for Medical Information Systems" set forth by the Ministry of Health, Labour and Welfare. This fiscal year, we established a new cluster computing environment on AWS as part of the C-CAT Utilization System. With this cluster in place, users of the virtual desktop can now run large-scale analyses and computations for which computational resources were previously insufficient.
2) Development of the Genome Analysis Pipeline G-CAT PostProcess
To generate the sequence data and mutation data used in the C-CAT Utilization System, we previously developed a genome analysis pipeline, G-CAT Workflow (https://github.com/ncc-gap/GCATWorkflow). G-CAT Workflow is designed for grid engine computing environments and runs the various analysis jobs in sequence while respecting the dependencies among them. These jobs include somatic mutation analysis (mutations acquired in cancer cells), germline mutation analysis of cancer genes, copy number analysis, structural variation analysis, haplotype analysis, and transcriptome analysis.
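Dependency-aware job ordering of this kind can be illustrated with a short sketch. The job names and dependency graph below are hypothetical and are not taken from the G-CAT Workflow configuration; the sketch only shows the general idea of deriving an execution order in which every job runs after its prerequisites, using the Python standard library.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job names and dependencies, for illustration only.
# Each key is a job; its value is the set of jobs that must finish first.
JOB_DEPENDENCIES = {
    "align": set(),
    "somatic_mutation": {"align"},
    "germline_mutation": {"align"},
    "copy_number": {"align"},
    "structural_variation": {"align"},
    "summary": {"somatic_mutation", "copy_number", "structural_variation"},
}


def execution_order(dependencies):
    """Return job names ordered so that every job follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(JOB_DEPENDENCIES)
print(order)
```

In this sketch, "align" always comes first and "summary" always follows the three jobs it depends on; the relative order of independent jobs is not fixed, which is what allows a scheduler to run them in parallel.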
The newly developed G-CAT PostProcess is a genome analysis post-processing pipeline that annotates mutation data generated by the G-CAT Workflow and applies various false-positive filtering methods. The main functions of G-CAT PostProcess are as follows:
- It applies various filters to the numerous mutation candidates produced by the analysis of whole-genome sequencing data and identifies high-priority mutation candidates.
- Similarly, it detects germline structural variations among the many candidates and reports the genes located between the breakpoints of each structural variation, together with the SV type (deletion, amplification, inversion, etc.).
- It performs copy number analysis on whole-genome sequencing data.
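The first two functions can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the record fields, the threshold values, and the gene names are hypothetical and do not reflect the actual filters or annotation sources used by G-CAT PostProcess. The gene lookup also assumes that both breakpoints lie on the same chromosome.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    chrom: str
    pos: int
    depth: int      # total read depth at the site
    alt_reads: int  # reads supporting the variant allele

    @property
    def vaf(self) -> float:
        """Variant allele frequency; 0.0 when there is no coverage."""
        return self.alt_reads / self.depth if self.depth else 0.0


# Hypothetical thresholds, chosen only for this example.
MIN_DEPTH = 10
MIN_ALT_READS = 4
MIN_VAF = 0.05


def passes_filters(c: Candidate) -> bool:
    """Keep candidates with enough coverage and enough variant support."""
    return c.depth >= MIN_DEPTH and c.alt_reads >= MIN_ALT_READS and c.vaf >= MIN_VAF


@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def genes_between_breakpoints(chrom, bp1, bp2, genes):
    """Names of genes overlapping the interval between two breakpoints."""
    lo, hi = sorted((bp1, bp2))
    return [g.name for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]


candidates = [
    Candidate("chr1", 1000, depth=40, alt_reads=12),
    Candidate("chr1", 2000, depth=8, alt_reads=5),    # fails the depth filter
    Candidate("chr2", 500, depth=100, alt_reads=3),   # fails the alt-read filter
]
kept = [c for c in candidates if passes_filters(c)]

genes = [
    Gene("GENE_A", "chr1", 100, 200),  # hypothetical gene names
    Gene("GENE_B", "chr1", 500, 900),
    Gene("GENE_C", "chr2", 150, 250),
]
affected = genes_between_breakpoints("chr1", 850, 150, genes)
```

A production pipeline would typically add many more filters (for example, comparison against matched normal samples) and use an indexed gene annotation rather than a linear scan; the sketch only shows the shape of the computation.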
Education
We supported many researchers using our analysis pipeline by answering their bioinformatics questions. We hired postdocs and supported their research.
Future Prospects
Using the latest sequencing technologies, we will develop new bioinformatics methods and computing systems for cancer genomics and clinical sequencing.
List of papers published in 2022
Journal
1. Kohno T, Kato M, Kohsaka S, Sudo T, Tamai I, Shiraishi Y, Okuma Y, Ogasawara D, Suzuki T, Yoshida T, Mano H. C-CAT: The National Datacenter for Cancer Genomic Medicine in Japan. Cancer Discovery, 12:2509-2515, 2022