
Annual Report 2024

Section of Genome Analysis Platform

Yuichi Shiraishi, Ai Okada, Kenichi Chiba

Introduction

The Section of Genome Analysis Platform in the Center for Cancer Genomics and Advanced Therapeutics (C-CAT) focuses on developing and evaluating (1) methods for detecting the various types of somatic variants in cancer genomes (SNVs, indels, structural variations, etc.) and (2) cloud-based platforms for data analysis and data sharing.

The Team and What We Do

Our laboratory aims to support cancer researchers by developing foundational bioinformatics analysis methods that can lead to new discoveries. We also pursue biological and medical insights of our own through large-scale analyses.

Research Activities

1) Development of the C-CAT Utilization System

C-CAT collects and stores the results of gene panel testing and the corresponding clinical information for each patient, with individual consent. Data from patients who have also consented to secondary use are made available for research. To share these cases with researchers, we developed C-CAT CALICO in collaboration with Hitachi, Ltd. C-CAT CALICO is a virtual desktop environment in which data users can conduct their own analyses.

The registered case data include sequence data and mutation data. To provide a secure usage environment, the system was designed in accordance with the "Guidelines for the Security Management of Medical Information Systems" issued by the Ministry of Health, Labour and Welfare.

2) Development of the Genome Analysis Pipeline G-CAT PostProcess

To generate the sequence data and mutation data used in the C-CAT utilization system, we developed the genome analysis pipeline G-CAT Workflow (https://github.com/ncc-gap/GCATWorkflow). G-CAT Workflow is designed to run in Grid Engine computing environments. It executes analysis jobs in an order that respects the dependencies between them. These jobs include somatic mutation analysis (to detect mutations acquired by cancer cells), copy number analysis, structural variation analysis, germline mutation analysis of cancer-related genes, and transcriptome analysis.
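Dependency-aware execution of this kind amounts to ordering a directed acyclic graph of jobs. Grid Engine itself lets a submitted job wait for others through the `-hold_jid` option of `qsub`. The minimal sketch below uses Python's standard `graphlib` to group jobs into "waves": every job in a wave depends only on jobs from earlier waves, so the jobs within one wave could run in parallel. The job names and dependency edges are hypothetical illustrations and do not represent G-CAT Workflow's actual configuration.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each job maps to the set of jobs it must wait for.
# Names and edges are illustrative only, not G-CAT Workflow's real job definitions.
JOBS = {
    "align_tumor": set(),
    "align_normal": set(),
    "align_rna": set(),
    "somatic_mutation": {"align_tumor", "align_normal"},
    "copy_number": {"align_tumor", "align_normal"},
    "structural_variation": {"align_tumor", "align_normal"},
    "germline_mutation": {"align_normal"},
    "transcriptome": {"align_rna"},
}


def schedule_waves(jobs):
    """Group jobs into waves so that each job depends only on jobs in earlier waves."""
    sorter = TopologicalSorter(jobs)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are all finished
        waves.append(ready)
        sorter.done(*ready)  # mark them finished, which may unlock later jobs
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(schedule_waves(JOBS), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

In a real Grid Engine deployment, each job would be submitted with holds on its predecessors rather than run wave by wave. The wave view is shown here only because it makes the dependency structure easy to inspect.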

G-CAT PostProcess is a post-processing pipeline that annotates the variant data generated by G-CAT Workflow and applies a series of filters to eliminate false positives. Because annotation databases and variant detection software evolve continuously, we update the pipeline regularly to keep it current.
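The annotate-then-filter pattern can be sketched as follows. Variants are first enriched from a lookup table and then passed through an ordered chain of filters, and each rejected variant records the first filter it failed. All field names, the population-frequency lookup and the thresholds are hypothetical. This report does not list the actual filters, databases or cut-offs used in G-CAT PostProcess.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int                             # hypothetical: total tumor read depth
    tumor_vaf: float                       # hypothetical: variant allele fraction in tumor
    normal_vaf: float                      # hypothetical: variant allele fraction in matched normal
    population_af: Optional[float] = None  # filled in by annotate()


def annotate(variants, population_db):
    """Attach a population allele frequency from a hypothetical (chrom, pos, ref, alt) lookup."""
    return [
        replace(v, population_af=population_db.get((v.chrom, v.pos, v.ref, v.alt)))
        for v in variants
    ]


# Ordered filter chain; each predicate returns True to keep the variant.
# Thresholds are illustrative placeholders, not G-CAT PostProcess settings.
FILTERS: List[Tuple[str, Callable[[Variant], bool]]] = [
    ("min_depth", lambda v: v.depth >= 20),
    ("min_tumor_vaf", lambda v: v.tumor_vaf >= 0.05),
    ("max_normal_vaf", lambda v: v.normal_vaf <= 0.02),
    ("not_common_polymorphism", lambda v: v.population_af is None or v.population_af < 0.01),
]


def apply_filters(variants):
    """Split variants into kept ones and (variant, first failed filter) pairs."""
    kept, rejected = [], []
    for v in variants:
        failed = next((name for name, keep in FILTERS if not keep(v)), None)
        if failed is None:
            kept.append(v)
        else:
            rejected.append((v, failed))
    return kept, rejected


if __name__ == "__main__":
    db = {("chr1", 200, "A", "G"): 0.35}  # made-up entry standing in for a common polymorphism
    calls = [
        Variant("chr1", 100, "C", "T", depth=150, tumor_vaf=0.32, normal_vaf=0.0),
        Variant("chr1", 200, "A", "G", depth=80, tumor_vaf=0.48, normal_vaf=0.01),
        Variant("chr1", 300, "G", "A", depth=10, tumor_vaf=0.20, normal_vaf=0.0),
    ]
    kept, rejected = apply_filters(annotate(calls, db))
    for v, reason in rejected:
        print(f"{v.chrom}:{v.pos} rejected by {reason}")
```

Recording the first failing filter for each rejected call makes it easy to audit which filter removes how many candidates. That kind of audit is useful whenever a filter or annotation database is updated.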

In this fiscal year, the following components of G-CAT PostProcess were updated:

- The tools used to filter false-positive variant candidates from the results of somatic mutation analysis were revised.

- EBFilter, an analysis tool employed in somatic mutation analysis, was updated, leading to a substantial reduction in processing time.

- Annotation databases utilized by G-CAT PostProcess were updated.

Education

We supported many researchers using our analysis pipelines by answering their bioinformatics questions. We also hired postdoctoral researchers and supported their research.

Future Prospects

Leveraging the latest sequencing technologies, we will develop new bioinformatics methods and computing systems for cancer genomics and clinical sequencing.