Abstract Myeloid malignancies exhibit considerable heterogeneity with overlapping clinical and genetic features among subtypes. We present a data-driven approach that integrates mutational features and clinical covariates at diagnosis within networks of their probabilistic relationships, enabling the discovery of patient subgroups. A key strength is its ability to include presumed causal directions in the edges linking clinical and mutational features, and to account for them aptly in the clustering. In a cohort of 1323 patients, we identify subgroups that outperform established risk classifications in prognostic accuracy. Our approach generalises well to unseen cohorts, with classification based on our subgroups similarly offering advantages in predicting prognosis.
Our findings suggest that mutational patterns are often shared across myeloid malignancies, with distinct subtypes potentially representing evolutionary stages en route to leukemia. With pancancer TCGA data, we observe that our modelling framework extends naturally to other cancer types while still offering improvements in subgroup discovery.