Annual Report 2021

Genome Analysis Platform Section

Yuichi Shiraishi, Ai Okada, Kenichi Chiba, Naoko Iida

Introduction

 The Genome Analysis Platform Section of the Center for Cancer Genomics and Advanced Therapeutics (C-CAT) develops and evaluates (1) methods for detecting various types of somatic variants (SNVs, indels, structural variations, and others) in cancer genomes, (2) normalization of cancer genome data formats, (3) workflows for cancer genome analysis, and (4) a platform for analyzing and sharing cancer genome data using cloud computing. These products are intended for use in projects that collect and manage cancer genome data. We are also preparing a platform for whole genome sequencing analysis in clinical settings.

Research activities

(1) Construction of a utilization system

 In C-CAT, the results of gene panel tests and the corresponding medical information of each patient are collected and stored with the individual’s consent. To make these data available for use, we are developing a cloud-based search portal (a portal site for browsing and searching case information) and a utilization cloud (a virtual desktop for performing analysis). Since the registered case data include sequence and mutation data, we are designing and developing a secure usage environment.

(2) Development of genome analysis pipelines and frameworks using a cloud system

 To produce the sequence and mutation data used in the C-CAT utilization system, we constructed a genome analysis framework that includes alignment software (bwa, GATK), mutation-calling software for somatic analysis (GenomonMutationCall, GenomonSV, mutectcaller, GRIDSS, manta), and mutation-calling software for germline analysis (haplotypecaller, GRIDSS, manta). For GenomonMutationCall and GenomonSV, which we developed ourselves, we improved the realignment process. This year, we developed and optimized the following software.

  • Software to detect SNVs with low allele frequencies.
  • Software to measure the execution time and memory usage of tools that calculate Microsatellite Instability (MSI).
  • Scripts to calculate Tumor Mutation Burden (TMB).
  • Software called GCATCopyNumber to analyze the copy number of whole genome sequencing data.
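
 As an illustration of the TMB item above, the following is a minimal sketch and not the scripts developed by the section. It assumes the common definition of TMB as the number of somatic mutations per megabase of sequenced region; the function name and parameters are hypothetical.

```python
def tumor_mutation_burden(num_somatic_mutations: int, covered_bases: int) -> float:
    """Return somatic mutations per megabase (Mb) of covered region.

    Assumption: TMB = mutation count / region size in Mb. Which variants
    are counted (for example, only non-synonymous ones) depends on the
    assay and is outside the scope of this sketch.
    """
    if num_somatic_mutations < 0:
        raise ValueError("num_somatic_mutations must be non-negative")
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    megabases = covered_bases / 1_000_000
    return num_somatic_mutations / megabases


if __name__ == "__main__":
    # Hypothetical panel: 30 somatic mutations over a 1.5 Mb target region.
    print(tumor_mutation_burden(30, 1_500_000))
```

 The region size should be the number of bases in which variants could actually be called, since counting uncovered bases would lower the estimate.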

(3) Analysis of whole genome sequencing data of 1111 samples of the Hereditary Tumor Project

(4) Creation of tools to evaluate the analysis results of whole genome sequencing data

 To evaluate the software developed for analyzing whole genome sequencing data, we created a tool that assesses the analysis results. Using this tool, we confirmed that the detection of SNVs, indels, and SVs performed well in both sensitivity and specificity.
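
 The kind of comparison described above can be sketched as follows. This is an illustrative example, not the section's evaluation tool: it assumes variants are matched exactly on (chromosome, position, ref, alt) against a truth set, and that specificity is computed over a known number of assessed sites with at most one variant per site. All names are hypothetical.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    sensitivity: float  # TP / (TP + FN)
    precision: float    # TP / (TP + FP)
    specificity: float  # TN / (TN + FP)


def evaluate_calls(called: Set[Variant], truth: Set[Variant],
                   num_assessed_sites: int) -> Metrics:
    """Compare called variants with a truth set by exact matching."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    # Assumption: every assessed site without a true variant is a negative.
    negatives = num_assessed_sites - len(truth)
    if tp + fn == 0 or tp + fp == 0 or negatives <= 0:
        raise ValueError("metrics are undefined for these inputs")
    tn = negatives - fp
    return Metrics(tp / (tp + fn), tp / (tp + fp), tn / negatives)


if __name__ == "__main__":
    truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
             ("chr2", 50, "C", "T"), ("chr2", 75, "T", "G")}
    called = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
              ("chr2", 50, "C", "T"), ("chr3", 10, "G", "A")}
    print(evaluate_calls(called, truth, num_assessed_sites=104))
```

 Exact matching is the simplest choice; indels and SVs usually need a tolerance on position or breakpoint, which this sketch does not attempt.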

Education

 We supported the many researchers using our analysis pipeline by answering their bioinformatics questions. We hired postdocs and supported their research.

Future Prospects

 Using the latest sequencing technologies, we will develop new bioinformatics methods and computer systems for cancer genomics and clinical sequencing.