Annual Report 2024
Division of Genome Analysis Platform Development
Yuichi Shiraishi, Yoshitaka Sakamoto, Raúl Nicolás Mateos, Hajime Suzuki
Introduction
We focus on developing algorithms, pipelines, and tools for analyzing cancer genome and transcriptome data generated by short-read and long-read sequencing technologies. As sequencing technologies advance rapidly and new methodologies and software continue to appear, we are building analytical platforms and making them faster and more accurate, so that they can help elucidate the causes, progression, and states of cancer.
Research Activities
1. Cancer genome analysis in centromeric regions
In the genomes of patients with brain tumors or myelodysplastic syndromes, recurrent translocations have been observed whose breakpoints are predicted to lie within centromeric regions. To characterize these events, we have developed and refined methods for accurately analyzing the translocation breakpoints using both short-read and long-read sequencing data from patients.
2. Development of a novel cancer genome analysis pipeline
Using long-read sequencing data obtained from the normal tissues of cancer patients, we are constructing personalized reference genomes and developing pipelines for cancer genome analysis based on these references.
3. Development of a classifier to predict aberrant activation of the KEAP1-NRF2 pathway from abnormal splice junctions
By extracting abnormal splice junctions from large-scale public transcriptome datasets and integrating them with mutational information of genes related to the KEAP1-NRF2 pathway, we trained and evaluated a classifier to predict aberrant activation of this pathway.
4. Development of a novel algorithm for identifying overlaps between sequencing reads
In de novo assembly, a method for reconstructing genome sequences from sequencing data, identifying overlaps between sequence reads is a critical step. We are developing an algorithm capable of accurately detecting overlaps even in highly repetitive regions such as centromeres.
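For readers unfamiliar with the overlap step, the following is a minimal sketch of a generic seed-and-verify approach, not the algorithm under development in this division. It indexes k-mers, uses the last k-mer of each read to pick candidate partners, and then verifies an exact suffix-prefix match. The function names, the parameters `k` and `min_len`, and the toy reads are all hypothetical, and exact matching ignores the sequencing errors present in real data.

```python
from collections import defaultdict


def suffix_prefix_overlap(a, b, min_len):
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, considering only lengths >= min_len; return 0 if none exists."""
    max_len = min(len(a), len(b))
    for length in range(max_len, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def find_overlaps(reads, k=4, min_len=4):
    """Illustrative seed-and-verify overlap search (exact matches only).

    Any overlap of length >= k implies that the last k-mer of the left read
    occurs in the right read, so that k-mer is used as the seed.
    """
    if min_len < k:
        raise ValueError("min_len must be at least k for the seed to be valid")

    # Index every k-mer to the set of reads that contain it.
    index = defaultdict(set)
    for read_id, seq in enumerate(reads):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(read_id)

    overlaps = {}
    for a_id, a in enumerate(reads):
        if len(a) < k:
            continue
        # Candidate partners are the reads sharing the final k-mer of `a`.
        for b_id in index[a[-k:]]:
            if b_id == a_id:
                continue
            length = suffix_prefix_overlap(a, reads[b_id], min_len)
            if length:
                overlaps[(a_id, b_id)] = length
    return overlaps


# Toy reads (hypothetical): read 0 overlaps read 1, which overlaps read 2.
toy_reads = ["ACGTTGCA", "TTGCAGGT", "AGGTCCAA"]
toy_overlaps = find_overlaps(toy_reads)
```

In a highly repetitive region, a seed k-mer occurs in many unrelated reads, so the candidate set grows and several suffix-prefix matches can look equally plausible. A naive scheme like this one does not resolve that ambiguity.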
5. Estimation of the age and propagation routes of founder mutations in cancer genomes
Using sequencing data from patients who carry the same founder mutation at the same locus of a given cancer-related gene, we are developing a method that infers how the mutation has propagated, based on the extent of single nucleotide polymorphisms shared among the patients, and estimates when the mutation originally arose.
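To give a sense of how shared variation around a mutation can carry timing information, the sketch below uses a textbook simplification rather than the method being developed here. It assumes that each carrier chromosome descends independently from the founder (a star-shaped genealogy) and that the ancestral haplotype retained on one side of the mutation has a genetic length, in Morgans, that is exponentially distributed with rate equal to the number of generations elapsed. Under those assumptions the maximum-likelihood estimate of the age is the number of observed arms divided by their total length in Morgans. The function name and the input values are hypothetical.

```python
def estimate_age_generations(arm_lengths_cm):
    """Estimate founder-mutation age in generations (illustrative only).

    arm_lengths_cm: genetic lengths, in centimorgans, of the ancestral
    haplotype retained on one side of the mutation, one value per observed
    arm. Assumes independent lineages and exponentially distributed lengths.
    """
    n = len(arm_lengths_cm)
    total_cm = sum(arm_lengths_cm)
    if n == 0 or total_cm <= 0:
        raise ValueError("need at least one arm with positive length")
    # MLE of an exponential rate is n / sum(lengths in Morgans);
    # 1 Morgan = 100 cM, hence the factor of 100.
    return 100.0 * n / total_cm


# Hypothetical shared-haplotype arm lengths in cM.
example_age = estimate_age_generations([1.0, 2.0, 1.0, 4.0])
```

Shorter shared haplotypes imply an older mutation. Real carriers are usually related through a non-star genealogy and haplotype boundaries are only known to the resolution of the available markers, so estimates from this simple formula should be treated as rough.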
6. Development of a screening platform for splice-site creating mutations using large-scale public transcriptome data
We developed a novel computational method that efficiently identifies splice-site creating mutations from sequencing data. Such mutations are increasingly recognized as causal variants and potential therapeutic targets. By reanalyzing more than 300,000 public transcriptome datasets, we successfully identified splice-site creating mutations.
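As a simple illustration of what "splice-site creating" means at the sequence level, the sketch below checks whether a single-base substitution introduces a canonical GT (donor) or AG (acceptor) dinucleotide that was absent from the reference. This is not the screening method described above, which works from transcriptome data. The function name, the forward-strand-only convention, and the toy sequence are assumptions made for illustration, and a new dinucleotide alone does not imply that the site is actually used in splicing.

```python
# Canonical intronic dinucleotides on the transcribed strand.
CANONICAL_MOTIFS = {"GT": "donor", "AG": "acceptor"}


def created_splice_motifs(ref_seq, pos, alt):
    """Return canonical dinucleotides created by substituting `alt` at the
    0-based position `pos` of `ref_seq` (forward strand only).

    Each result is a tuple (start_index, dinucleotide, site_type).
    """
    if not 0 <= pos < len(ref_seq):
        raise ValueError("pos is outside the reference sequence")
    if ref_seq[pos] == alt:
        raise ValueError("alt must differ from the reference base")

    mutated = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    created = []
    # Only the two dinucleotides that include `pos` can change.
    for start in (pos - 1, pos):
        if start < 0 or start + 2 > len(ref_seq):
            continue
        before = ref_seq[start:start + 2]
        after = mutated[start:start + 2]
        if after != before and after in CANONICAL_MOTIFS:
            created.append((start, after, CANONICAL_MOTIFS[after]))
    return created


# Toy example (hypothetical sequence): C>T at index 3 turns "GC" into "GT".
example_hits = created_splice_motifs("AAGCTAA", 3, "T")
```

In practice, a candidate of this kind would need supporting evidence such as reads spanning a novel splice junction adjacent to the variant.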
Education
We responded carefully and in detail to questions from researchers using the genome and transcriptome analysis pipeline we developed, thereby supporting many of their studies. We also assisted with genome and transcriptome analysis in collaborations within the institute and with external organizations.
Future Prospects
We aim to construct diverse workflows for cancer genome and transcriptome analysis to deepen our understanding of cancer and contribute to its treatment. We will apply these workflows to large-scale datasets to extract new knowledge from them.
