Development of a machine learning model to predict splenic hilar lymph node metastasis in upper gastric cancer: Development of a clinical decision support system based on Bayesian approaches
Highlights
- In upper gastric cancer, the spleen may be removed because of possible metastasis to the splenic hilar lymph nodes. However, splenectomy carries a high complication rate and metastasis may not actually be present, so an appropriate decision-making method based on reliable prediction of splenic hilar lymph node metastasis is needed.
- Machine learning models based on the conventional frequency-based (frequentist) approach provide only a single point estimate and cannot convey the uncertainty of a prediction, which made them unsuitable for clinical use. Focusing on that uncertainty, we used a Bayesian approach to develop a prediction model that supports decision-making, and we succeeded, for the first time in the world, in visualizing the individual posterior probability distribution of splenic hilar lymph node metastasis in upper gastric cancer from clinicopathological characteristics.
- The Bayes-SHLNM model showed excellent performance with an ROC-AUC of 0.83.
- When tumors were divided into two categories by the presence or absence of greater curvature invasion, which is the recommended indication for splenic hilar lymph node dissection, the Bayes-SHLNM model correctly predicted 99% of cases without greater curvature invasion to be negative.
- The Bayes-SHLNM model is expected to provide an effective personalized index for evaluating and discussing the pros and cons of splenectomy, an invasive treatment, by weighing the benefit of the procedure against the harm of the complications it may cause.
Summary
A joint research group from the National Cancer Center Research Institute, the National Cancer Center Hospital, the RIKEN Center for Advanced Intelligence Project, and Nagoya University has developed a model called “Bayes-SHLNM” that uses Bayesian logistic regression (Note 1), a type of machine learning, to predict splenic hilar lymph node metastasis (Note 2) in upper gastric cancer. Machine learning models based on the frequency-based approach (Note 3) cannot quantify the uncertainty of their predictions and therefore fall short of what clinical practice requires; even high-performing models have failed to change the clinical decision-making process. The Bayes-SHLNM model uses a Bayesian approach (Note 4) to support complex clinical decisions that call for actionable insights with uncertainty taken into account. By visualizing posterior probability distributions, the model makes the uncertainty and the range of possible outcomes explicit, and it can support clinical decisions in high-risk, uncertain situations (Figure 1).
The results of this research were published in the online edition of the international academic journal npj Digital Medicine (dated February 11, 2025).
Figure 1: Comparison of the frequency-based model and the Bayesian model (Bayes-SHLNM) for predicting the probability of splenic hilar lymph node metastasis in upper gastric cancer
This figure contrasts how a frequency-based model and a Bayesian logistic regression model predict the probability of splenic hilar lymph node metastasis in patients undergoing total gastrectomy with splenectomy. The frequency-based model (upper pathway) generates a single point estimate of the probability of metastasis (e.g., 72%) from clinical, tumor, lymph node location, and pathology information. The Bayesian model (lower pathway) uses the same data sources but outputs a posterior probability distribution, giving a more comprehensive view of the uncertainty in the prediction. The posterior probability distribution makes it possible to visualize the range of possible outcomes and the degree of uncertainty, which is very important for making informed clinical decisions in high-risk, high-uncertainty situations.
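The contrast in Figure 1 can be sketched in a few lines of code. The following is a minimal illustration only, not the Bayes-SHLNM model: it fits a one-feature Bayesian logistic regression to synthetic data with a simple random-walk Metropolis sampler. The single feature, the Gaussian prior with standard deviation 2.5, the step size, and all data values are assumptions made for this sketch.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_posterior(beta, xs, ys, prior_sd=2.5):
    # Log of (Gaussian prior x Bernoulli likelihood), up to a constant.
    # The prior is an assumption of this sketch, not the authors' choice.
    lp = -sum(b * b for b in beta) / (2.0 * prior_sd ** 2)
    for x, y in zip(xs, ys):
        p = sigmoid(beta[0] + beta[1] * x)
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        lp += math.log(p) if y == 1 else math.log(1.0 - p)
    return lp


def metropolis(xs, ys, n_iter=4000, burn_in=1000, step=0.3, seed=0):
    # Random-walk Metropolis sampler over (intercept, slope).
    rng = random.Random(seed)
    beta = [0.0, 0.0]
    current = log_posterior(beta, xs, ys)
    draws = []
    for i in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, xs, ys)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        if i >= burn_in:
            draws.append(tuple(beta))
    return draws


# Synthetic data standing in for one clinicopathological feature.
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(80)]
ys = [1 if data_rng.random() < sigmoid(-1.0 + 1.5 * x) else 0 for x in xs]

draws = metropolis(xs, ys)

# Posterior distribution of the predicted probability for one new case.
x_new = 0.8
probs = sorted(sigmoid(b0 + b1 * x_new) for b0, b1 in draws)
mean_p = sum(probs) / len(probs)
lo, hi = probs[int(0.025 * len(probs))], probs[int(0.975 * len(probs))]
print(f"posterior mean {mean_p:.2f}, 95% credible interval {lo:.2f}-{hi:.2f}")
```

A point-estimate model would report a single number for the new case. Here the full list `probs` is available, so the width of the interval can be shown alongside the mean.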
Background
Gastric cancer is the fifth most common cancer in the world and the fourth most common cause of cancer-related death. Because excessive resection leads to high morbidity and a shorter life expectancy, the principle of radical treatment for gastric cancer is surgery with appropriate regional lymph node dissection. The splenic hilar lymph nodes are regional lymph nodes for cancer in the upper third of the stomach, and metastases have been found there in 2.8-27.9% of cases. Splenectomy with splenic hilar lymph node dissection is currently widely performed in Japan, but splenectomy has several drawbacks. In particular, randomized controlled trials (Note 5) and retrospective studies have reported a high incidence of postoperative complications (approximately 20-30%), which in certain cases has been shown to offset the survival benefit. There is therefore a growing need for tools that help select the appropriate surgical approach according to the oncological status of the cancer. Various machine learning models have been published to predict lymph node metastasis in gastric cancer, with the main focus being on predicting the prognosis of advanced gastric cancer following endoscopic resection of early-stage gastric cancer. However, no model has been developed with the aim of changing the surgical plan or the extent of lymph node dissection.
In this study, we aimed to develop a model useful for decision-making, based on a Bayesian approach that focuses on the uncertainty of predictions, and to visualize the individual posterior probability of splenic hilar lymph node metastasis in advanced gastric cancer from clinicopathological features.
Research Results
- Performance comparison of Bayesian models and frequency-based logistic machine learning models
In this study, we constructed a model using data from patients with primary gastric cancer who had undergone gastrectomy with lymphadenectomy, specifically total gastrectomy with splenectomy (n = 593). We then evaluated the performance of four Bayesian models and a frequency-based logistic regression (FLR) model using five-fold cross-validation (5fCV) (Note 6). Of the four Bayesian models, the Bayes-SHLNM model showed excellent performance in terms of the receiver operating characteristic area under the curve (ROC-AUC) (Note 7), at 0.83, and its results were almost the same as those of the FLR model (Figure 2). These results show that Bayes-SHLNM is a robust alternative to the FLR model. When the tumors were divided into two categories by the presence or absence of greater curvature invasion (GCI) (Note 8), which is the recommended indication for splenic hilar lymph node dissection, both models predicted positive results in almost 20% of cases in both categories, while cases without GCI were correctly predicted to be negative in 99% of cases (Table 1). Although the results of Bayes-SHLNM and FLR were similar, Bayes-SHLNM showed slightly better positive and negative results in cases without GCI.
Figure 2: Comparison of the performance and 95% confidence intervals of four Bayesian models and one frequency-based model based on the average results of the five-fold cross-validation method.
The ROC curve of each model plots the true positive rate (sensitivity) against the false positive rate. The Bayes-SHLNM model achieved the highest ROC-AUC, 0.83 (95% confidence interval: 0.74-0.91). The shaded area around each curve indicates the 95% confidence interval for each model.
Table 1: Prediction results of the Bayes-SHLNM model in advanced gastric cancer, by presence or absence of greater curvature invasion
AGC: advanced gastric cancer
- Bayes-SHLNM model for displaying posterior probability distributions for individual patients
Figure 3 shows representative cases of the posterior probability distribution of splenic hilar lymph node metastasis inferred with the Bayes-SHLNM model. Figure 3a shows a case of advanced gastric cancer without involvement of the greater curvature. The Japanese Gastric Cancer Association (JGCA) guidelines (6th edition) strongly recommend that splenectomy or splenic hilar lymph node dissection not be performed for tumors that do not involve the greater curvature. However, the Bayes-SHLNM model provides an opportunity to reconsider whether the patient is at risk of splenic hilar lymph node metastasis. Figure 3b shows a case of advanced gastric cancer with involvement of the greater curvature, for which the Bayes-SHLNM model predicted a low probability of splenic hilar lymph node metastasis. For tumors that invade the greater curvature, the JGCA guidelines weakly recommend splenectomy with splenic hilar lymph node dissection, but the Bayes-SHLNM model provides an opportunity to reconsider whether to perform the procedure. In this way, the Bayes-SHLNM model can help reach a consensus on whether a patient should undergo splenectomy.
Figure 3: Examples of posterior probability distributions of splenic hilar lymph node metastasis from the Bayes-SHLNM model in cases of advanced gastric cancer
(a) Posterior probability distribution for a case of advanced gastric cancer without involvement of the greater curvature. The Bayes-SHLNM model predicted a mean probability of splenic hilar lymph node metastasis of 0.502 (metastasis present). (b) Posterior probability distribution for a case of advanced gastric cancer with involvement of the greater curvature. The Bayes-SHLNM model predicted a mean probability of splenic hilar lymph node metastasis of 0.023 (no metastasis).
Prospects
In this study, we developed a Bayesian model, Bayes-SHLNM, to predict splenic hilar lymph node metastasis in upper gastric cancer using data from 593 patients who underwent total gastrectomy with splenectomy. To the best of our knowledge, this is the first study to report a machine learning model that uses a Bayesian approach to predict splenic hilar lymph node metastasis. The results suggest that the Bayes-SHLNM model may be useful for clinical decision-making on whether to perform splenic hilar lymph node dissection in upper gastric cancer. In future work, we plan to conduct both prospective studies and simulation-based analyses to evaluate the accuracy of the posterior probability distribution and to address potential model misspecification. In these analyses, we will examine the calibration of the Bayesian approach under various scenarios, including misspecified model settings. Furthermore, we believe that validation on external data sets will provide deeper insight into the robustness of the uncertainty estimates in different situations. These efforts are essential to ensuring the reliability and practicality of Bayesian approaches in clinical settings. We expect that they will allow the contribution of Bayesian approaches to clinical decision-making to be properly evaluated, and that this offers a promising outlook for precision medicine.
Publication
Journal
npj Digital Medicine
Title
Establishment of a machine learning model for predicting splenic hilar lymph node metastasis
Authors
Kenichi Ishizu, Satoshi Takahashi (* Corresponding Author), Nobuji Kouno, Ken Takasawa, Katsuji Takeda, Kota Matsui, Masashi Nishino, Tsutomu Hayashi, Yukinori Yamagata, Shigeyuki Matsui, Takaki Yoshikawa, Ryuji Hamamoto (* Corresponding Author)
DOI
10.1038/s41746-025-01480-x
Date
February 11, 2025 (online pre-release)
URL
https://www.nature.com/articles/s41746-025-01480-x
Funding
- Cabinet Office BRIDGE (programs for bridging the gap between R&D and the ideal society (Society 5.0) and generating economic and social value) (Principal Investigator: Ryuji Hamamoto)
- MEXT subsidy for the Advanced Integrated Intelligence Platform
Presenters
National Cancer Center
Research Institute Division of Medical AI Research and Development: Kenichi Ishizu (first author), Nobuji Kouno, Ryuji Hamamoto (corresponding author)
Department of Gastric Surgery: Kenichi Ishizu (concurrently), Masashi Nishino, Tsutomu Hayashi, Yukinori Yamagata, Takaki Yoshikawa
RIKEN Center for Advanced Intelligence Project
Cancer Translational Research Team: Satoshi Takahashi (corresponding author), Ken Takasawa, Katsuji Takeda, Ryuji Hamamoto (concurrently)
Nagoya University Graduate School of Medicine
Department of Biostatistics: Kota Matsui, Shigeyuki Matsui
Glossary
Note 1 Bayesian logistic regression
Bayesian logistic regression is a method that applies the framework of Bayesian statistics to logistic regression. While conventional logistic regression takes a frequency-based approach, Bayesian logistic regression sets a prior probability distribution for the parameters and then calculates their posterior probability distribution from the data.
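In symbols, the idea can be written as follows. This is a generic sketch rather than the exact specification of Bayes-SHLNM: the covariate vector x_i, the binary outcome y_i (metastasis present or absent), and the prior p(β) are notation introduced here, and the prior is left unspecified.

```latex
\begin{aligned}
y_i \mid \beta &\sim \mathrm{Bernoulli}\bigl(\sigma(x_i^\top \beta)\bigr),
  \qquad \sigma(z) = \frac{1}{1 + e^{-z}},\\
\beta &\sim p(\beta),\\
p(\beta \mid y_{1:n}) &\propto p(\beta)\,\prod_{i=1}^{n}
  \sigma(x_i^\top \beta)^{y_i}\,\bigl(1 - \sigma(x_i^\top \beta)\bigr)^{1 - y_i}.
\end{aligned}
```

Because β has a distribution rather than a single value, the predicted probability σ(x^T β) for a new patient with covariates x also has a distribution.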
Note 2 Splenic hilum lymph node metastasis
Splenic hilum lymph node metastasis refers to the spread of cancer cells to the lymph nodes in the splenic hilum. The splenic hilum is the area where the splenic artery and vein, which carry blood to and from the spleen, converge, and it contains many lymph nodes. One route of lymph flow in gastric cancer runs from the greater curvature of the stomach to the splenic hilum, so lymph node metastasis in the splenic hilum is likely to occur in cancer arising in the upper part of the stomach (cardia and corpus). In advanced gastric cancer, the splenic hilum lymph nodes are a common site of metastasis, so D2 dissection (extensive lymph node resection) may be required.
Note 3 Frequency-based approach
The frequency-based (frequentist) approach is a method of statistical inference that defines probability as “long-term frequency”, that is, the rate at which a specific event occurs when data are observed repeatedly. Parameters are treated as fixed values, and in hypothesis testing a null hypothesis (H0) is set and a decision is made whether to reject it. By convention, if p < 0.05 (5%), the null hypothesis is rejected and the alternative hypothesis is regarded as supported.
Note 4 Bayesian approach
The Bayesian approach is a method of reasoning that integrates data and prior information, viewing probability as a “degree of uncertainty”. Inference is carried out by combining prior probability distributions and likelihoods, and then using Bayes' theorem to obtain posterior probability distributions. Parameters are handled using probability distributions (prior distribution → posterior distribution), and posterior probabilities are used for hypothesis testing. Whereas estimation using the frequency-based approach gives a single estimated value, the Bayesian approach gives an estimate as a “probability distribution”, and can also take uncertainty into account.
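The prior-to-posterior update described above can be illustrated with the simplest conjugate case, a Beta prior on a single event probability updated by binomial data. This is a toy sketch and not part of the Bayes-SHLNM model; the flat Beta(1, 1) prior and the counts (3 positive cases out of 20) are hypothetical numbers chosen only for illustration.

```python
import random


def beta_binomial_update(prior_a, prior_b, positives, total):
    # Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior.
    return prior_a + positives, prior_b + (total - positives)


# Hypothetical example: flat prior, then 3 positive cases out of 20 observed.
a, b = beta_binomial_update(1.0, 1.0, positives=3, total=20)
posterior_mean = a / (a + b)

# Approximate a 95% credible interval by sampling from the posterior.
rng = random.Random(0)
samples = sorted(rng.betavariate(a, b) for _ in range(20000))
lower = samples[int(0.025 * len(samples))]
upper = samples[int(0.975 * len(samples))]

print(f"posterior: Beta({a:g}, {b:g}), mean {posterior_mean:.3f}")
print(f"95% credible interval: {lower:.3f}-{upper:.3f}")
```

The posterior mean plays the role of a point estimate, while the credible interval expresses how much uncertainty remains after seeing the data.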
Note 5 Randomized controlled trial (RCT)
A randomized controlled trial (RCT) is a clinical trial in which subjects are randomly assigned to groups in order to evaluate the efficacy and safety of a new treatment or medicine. RCTs are considered the most reliable type of evidence in medical research and the gold standard for rigorously evaluating causal relationships.
Note 6 5-fold cross-validation (5fCV)
5-fold cross-validation (5fCV) is a method of evaluating model performance in which the data set is divided into five equal subsets (folds), and each fold is used once as test data while the model is trained on the other four. It is used to guard against over-fitting and to obtain more stable evaluations. It is a special case of general cross-validation (CV), and five folds is one of the commonly used standard settings.
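The splitting step can be sketched as follows. The helper name `k_fold_indices` and the plain random shuffle are assumptions of this sketch; the study may have split its data differently (for example, with stratification by outcome). The sample size of 593 is taken from the study cohort.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    # Shuffle the case indices, deal them into k folds, and yield
    # (training indices, test indices) with each fold as test set once.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


for fold, (train, test) in enumerate(k_fold_indices(593), start=1):
    print(f"fold {fold}: {len(train)} training cases, {len(test)} test cases")
```

A model would be fitted on each training set and scored on the matching test set, and the five scores would then be averaged.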
Note 7 Receiver Operating Characteristic Area Under the Curve (ROC-AUC)
The receiver operating characteristic area under the curve (ROC-AUC) is one of the metrics used to evaluate the performance of a classification model. It measures the model's discriminative ability as the area under the ROC curve. The ROC (receiver operating characteristic) curve is a plot of the true positive rate (TPR) against the false positive rate (FPR) at different threshold values, and the AUC (area under the curve) is the area under that curve (values range from 0.0 to 1.0). A larger AUC value means that the model is better at correctly distinguishing the classes.
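As a small worked example, the AUC can be computed directly from its equivalent pairwise definition: the proportion of (positive case, negative case) pairs in which the positive case receives the higher score, with ties counted as one half. The function name `roc_auc` and the seven labels and scores below are made up for illustration.

```python
def roc_auc(labels, scores):
    # AUC as the fraction of positive/negative pairs ranked correctly
    # (ties count as half). Equivalent to the area under the ROC curve.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: 3 cases with metastasis (1) and 4 without (0).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
print(roc_auc(labels, scores))  # 9 of 12 pairs ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.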
Note 8 Greater curvature invasion (GCI)
Greater curvature invasion (GCI) refers to the state in which gastric cancer has spread to the greater curvature side of the stomach. The stomach has two curved borders, the greater curvature and the lesser curvature, and when cancer invades the greater curvature side, there is a high risk of it spreading beyond the stomach wall to surrounding tissues and lymph nodes.
Inquiries
On Research
Ryuji Hamamoto, Chief, Division of Medical AI Research and Development, National Cancer Center Research Institute
E-mail: rhamamot●ncc.go.jp
Takaki Yoshikawa, Chief, Department of Gastric Surgery, National Cancer Center Hospital
E-mail: tayoshik●ncc.go.jp
Satoshi Takahashi, Senior Research Scientist, Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project
E-mail: satoshi.takahashi.fy●riken.jp
Kota Matsui, Lecturer, Department of Biostatistics, Nagoya University Graduate School of Medicine
E-mail: matsui.k●med.nagoya-u.ac.jp
From Media
Office of Public Relations, Strategic Planning Bureau, National Cancer Center
E-mail: ncc-admin●ncc.go.jp
RIKEN, Public Relations Office, Media Relations
Phone: +81-50-3495-0247
E-mail: ex-press●ml.riken.jp
Nagoya University, General Affairs Section, School of Medicine / Graduate School of Medicine
Phone: +81-52-744-2228
E-mail: iga-sous●t.mail.nagoya-u.ac.jp