Deep Learning Model Predicts 5-Year Mortality in NSCLC

A study in JMIR Medical Informatics, led by Jong Hyuk Lee (Asan Medical Center, University of Ulsan) and colleagues, develops and internally validates deep learning models to predict 5-year mortality in non-small-cell lung cancer (NSCLC) using the Korea Central Cancer Registry (KCCR). The authors identified 3,144 patients diagnosed in 2014-2017 with complete clinical, pulmonary-function, histological, genomic and staging data, split 70/15/15 into training, validation and test sets. Five model families were tuned with Hyperband across ten predefined feature groups, with area under the ROC curve (AUC) as the primary metric and accuracy, F1, precision and recall also reported. The team used group-wise permutation importance, compared importance rankings with the Friedman test, and benchmarked against a Cox proportional hazards baseline. It is a single-registry, internally validated study; external validation, calibration and released performance figures would be needed before any clinical use.
What happened
A study in JMIR Medical Informatics, led by Jong Hyuk Lee (Asan Medical Center, University of Ulsan) with Ho Cheol Kim, Kyu-Won Jung and Chang Min Choi, develops and internally validates deep learning models to predict 5-year mortality in non-small-cell lung cancer (NSCLC) using the Korea Central Cancer Registry (KCCR). The cohort comprised 3,144 patients diagnosed in 2014-2017 with complete clinical, pulmonary-function, histological, genomic and staging data, split 70/15/15 into training, validation and test sets.
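The 70/15/15 partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_70_15_15`, the seed, the rounding rule and the use of a plain (unstratified) shuffle are all assumptions, since the materials reviewed here do not say how the split was drawn.

```python
import random

def split_70_15_15(patient_ids, seed=0):
    """Shuffle IDs and cut them into train/validation/test (70/15/15).

    Hypothetical helper: the rounding rule and the unstratified shuffle
    are assumptions made for illustration only.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n = len(ids)
    n_train = round(0.70 * n)
    n_val = round(0.15 * n)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Placeholder integer IDs standing in for the 3,144 registry patients.
train_ids, val_ids, test_ids = split_70_15_15(range(3144), seed=42)
print(len(train_ids), len(val_ids), len(test_ids))
```

With 3,144 IDs and this rounding the sketch yields sets of 2,201, 472 and 471; the study's exact set sizes are not given in the materials reviewed here.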
Methods
Five model families were tuned with Hyperband across ten predefined feature groups. Models were evaluated primarily by area under the ROC curve (AUC), with accuracy, F1, precision and recall also reported. The authors computed group-wise permutation importance to quantify feature-group contributions and used the Friedman test to compare importance rankings across models, with a Cox proportional hazards model as a classical survival-analysis baseline. Specific AUC values were not surfaced in the materials reviewed here.
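To make group-wise permutation importance concrete, the sketch below shuffles all columns of a feature group together and records the resulting drop in AUC. It is a toy illustration under stated assumptions, not the study's implementation: the data are synthetic, the group names `informative` and `noise` are hypothetical, and `predict` is a stand-in scoring function rather than a trained deep model.

```python
import random

def auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def group_permutation_importance(predict, X, y, groups, n_repeats=10, seed=0):
    """Mean AUC drop when all columns of a group are permuted jointly."""
    rng = random.Random(seed)
    baseline = auc(y, [predict(row) for row in X])
    importance = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            perm = list(range(len(X)))
            rng.shuffle(perm)
            X_perm = [list(row) for row in X]
            # Use the same row permutation for every column in the group,
            # so within-group correlations are preserved.
            for i, j in enumerate(perm):
                for c in cols:
                    X_perm[i][c] = X[j][c]
            permuted = auc(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importance[name] = sum(drops) / n_repeats
    return baseline, importance

# Synthetic data: columns 0 and 1 determine the label, column 2 is ignored.
data_rng = random.Random(0)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [1 if row[0] + row[1] > 1.0 else 0 for row in X]

def predict(row):
    # Stand-in for a trained model's risk score (hypothetical).
    return row[0] + row[1]

groups = {"informative": [0, 1], "noise": [2]}
baseline_auc, importance = group_permutation_importance(predict, X, y, groups)
print(baseline_auc, importance)
```

Permuting a group as a unit, rather than one column at a time, attributes importance to the whole block of related inputs, which matches the feature-group framing described above.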
Why it matters
Registry-scale, multimodal prognostic modeling that includes genomic and pulmonary-function inputs is a meaningful direction for clinical risk stratification, since richer feature sets can improve discrimination. The methodological choices - automated tuning plus permutation-based group importance and rank tests - support reproducible cross-model comparison. That said, this is a single national registry with internal validation only.
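As a rough illustration of the rank test mentioned above, the sketch below computes the Friedman chi-square statistic for a small table of importance scores. The layout (rows as models, columns as feature groups) and every number in `toy_importance` are hypothetical and chosen only for the example; the statistic is computed without a tie correction, and in practice a library routine such as `scipy.stats.friedmanchisquare` would normally be used instead.

```python
def average_ranks(values):
    """Rank values from 1..k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def friedman_statistic(table):
    """Friedman chi-square for n blocks (rows) and k treatments (columns).

    Q = 12 / (n k (k + 1)) * sum_j R_j^2 - 3 n (k + 1), where R_j is the
    sum of within-row ranks for column j. No tie correction is applied.
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Hypothetical importance scores: 5 models (rows) x 3 feature groups (columns).
toy_importance = [
    [0.02, 0.05, 0.11],
    [0.01, 0.04, 0.09],
    [0.03, 0.06, 0.12],
    [0.02, 0.07, 0.10],
    [0.01, 0.03, 0.08],
]
q_stat = friedman_statistic(toy_importance)
print(q_stat)
```

Because every toy model ranks the three groups in the same order, the statistic reaches its maximum of n(k - 1) = 10 for this table; less consistent rankings would give a smaller value.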
What to watch
Five things would determine whether the approach can move toward clinical utility: external validation on independent cohorts; calibration analysis; subgroup performance by stage, histology or genomic subtype; decision thresholds and explainability outputs suitable for clinicians; and any release of model artifacts or code.
Key Points
- A peer-reviewed JMIR Medical Informatics study trains deep models on 3,144 KCCR NSCLC patients (2014-2017) using multimodal data - clinical, pulmonary-function, histological, genomic and staging.
- Methodology: 70/15/15 split, Hyperband tuning across ten feature groups, AUC as the primary metric, with group-wise permutation importance and a Cox proportional hazards baseline.
- For clinical-ML practitioners: it is a single-registry, internally validated model; external validation, calibration and published performance figures are still needed for clinical utility.
Scoring Rationale
This is a now peer-reviewed (JMIR Medical Informatics), registry-scale multimodal prognostic model for 5-year NSCLC mortality (3,144 KCCR patients), with sound methodology (Hyperband tuning, permutation importance, Cox proportional hazards baseline). It is solid, interesting clinical-ML research but narrow in scope - single national registry, internal validation only, and no external validation or released AUC figures - so it sits mid-range. Score adjusted from 6.6 to 6.0; the published version was added alongside the preprint.
