Sebcentrism

Weekly reads 15/06/26

Sebastiaan Vanuytven — Sun, 21 Jun 2026 11:55:48 GMT

This week’s reads focused on papers that push the boundaries of spatial and single-cell biology, revealing how context, whether spatial, genetic, or microbial, shapes cellular behavior and disease. Cellina introduces tissue graph counterfactuals, using supervised disentanglement to predict how cells would behave when their microenvironment changes, while Decima decodes the sequence determinants of gene expression across 22 million single-cell profiles, linking DNA to disease-specific regulation. With respect to cancer, a spatial atlas of muscle-invasive bladder cancer reveals a luminal-to-basal differentiation axis that shapes the immune architecture and exposes therapeutic vulnerabilities, and DeLeakage identifies a widespread transcript leakage issue in spatial transcriptomics and corrects it, restoring accurate interpretation of the results. At the same time, Cheruiyot et al. show how inflammatory cytokines such as IFNγ render tumors dependent on previously unknown factors, and Dohlman et al. confirm what many have suspected over the years: only orodigestive cancers contain microbiomes that are detectable and site-specific, and that are linked to genomic instability.

Preprints/articles that I managed to read this week

Querying Counterfactuals on Tissue Graphs with Supervised Disentanglement

Moeed et al. arXiv (2026). https://arxiv.org/abs/2606.08493

Overview of the Cellina method (Figure 1 from Moeed et al.)

The paper in one sentence

This work introduces Cellina, a supervised disentanglement framework that formalizes tissue graph counterfactuals as spatial interventions (edge and node perturbations) and enables accurate prediction of how cells would respond to altered neighborhood contexts in tissues.

Summary

Spatial transcriptomics has revealed that a cell’s gene expression is strongly influenced by its local microenvironment, yet most computational models still treat cells as independent entities. This paper addresses this gap by formalizing tissue graph counterfactuals, a class of spatial interventions that either rewire connections between cells (edge perturbations) or modify the expression of neighboring cells (node perturbations). The authors propose Cellina, a graph variational autoencoder (VAE) that decomposes each cell’s gene expression into two latent components: an intrinsic representation (encoding cell identity) and an extrinsic (spatial) representation (encoding microenvironmental influence). Supervised disentanglement, guided by cell-type and spatial-domain labels, ensures that these components remain biologically interpretable. Across benchmarks spanning 2.5 million spatially resolved cells from colorectal cancer and mouse brain datasets, Cellina outperforms both spatially informed and non-spatial baselines in counterfactual prediction, disentanglement quality, and scalability. Additionally, Cellina identifies biologically distinct cancer subdomains and enables targeted pathway-specific neighbor perturbations, demonstrating its utility for both discovery and simulation.

Personal highlights

Formalization of tissue graph counterfactuals: The paper provides a unified definition of spatial interventions as either edge perturbations (rewiring neighborhood topology) or node perturbations (altering neighbor gene expression), establishing a common framework for studying neighborhood-driven cell responses.
Supervised disentanglement: Cellina uses a dual-encoder graph VAE with biological supervision (cell-type and spatial-domain labels) to explicitly separate intrinsic cell identity from extrinsic spatial context, improving interpretability and predictive performance.
Scalable and robust performance: Across large-scale datasets, Cellina consistently outperforms competitors on key metrics (e.g., Pearson r, Signed Precision, RMSE) for both edge and node perturbation tasks, demonstrating generalizability across tissues and species.
Biological discovery without supervision: The spatial latent representation in Cellina reveals distinct cancer subdomains with interpretable signaling programs (e.g., TGFβ-dominant vs. NFκB/MAPK-dominant), aligning with known biological mechanisms and published findings.
Pathway-targeted in silico perturbations: Cellina enables simulations of pathway-specific neighbor perturbations, recreating observed subdomain effects (e.g., up-regulation of FN1 and MMP3 in cancer-associated fibroblasts) and linking them to established biological pathways.
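
To make the two intervention types concrete, here is a minimal sketch of what an edge perturbation and a node perturbation look like on a toy tissue graph. This is not Cellina's model: the cell names, expression values, and the neighbor-mean summary are all hypothetical stand-ins chosen for illustration, and a real method would feed the perturbed graph into a trained predictor rather than just averaging neighbors.

```python
from copy import deepcopy

# Toy tissue graph: hypothetical cells with two genes each (not data from the paper).
expression = {
    "tumor_1": [5.0, 0.0],
    "caf_1": [1.0, 4.0],
    "tcell_1": [0.0, 2.0],
    "tcell_2": [0.0, 6.0],
}
# Undirected adjacency: who is a spatial neighbor of whom.
neighbors = {
    "tumor_1": {"caf_1", "tcell_1"},
    "caf_1": {"tumor_1"},
    "tcell_1": {"tumor_1"},
    "tcell_2": set(),
}


def neighborhood_mean(cell, expression, neighbors):
    """Average expression of a cell's neighbors (a crude proxy for spatial context)."""
    nbrs = sorted(neighbors[cell])
    n_genes = len(expression[cell])
    if not nbrs:
        return [0.0] * n_genes
    return [sum(expression[n][g] for n in nbrs) / len(nbrs) for g in range(n_genes)]


def edge_perturbation(neighbors, cell, remove, add):
    """Rewire topology: replace one neighbor of `cell` with another cell."""
    new = deepcopy(neighbors)
    new[cell].discard(remove)
    new[remove].discard(cell)
    new[cell].add(add)
    new[add].add(cell)
    return new


def node_perturbation(expression, cell, gene_index, value):
    """Change one gene in one cell while leaving the topology untouched."""
    new = deepcopy(expression)
    new[cell][gene_index] = value
    return new


# Factual context of the tumor cell.
factual = neighborhood_mean("tumor_1", expression, neighbors)

# Edge counterfactual: swap the neighboring T cell for a different one.
rewired = edge_perturbation(neighbors, "tumor_1", remove="tcell_1", add="tcell_2")
edge_cf = neighborhood_mean("tumor_1", expression, rewired)

# Node counterfactual: silence gene 1 in the neighboring fibroblast.
silenced = node_perturbation(expression, "caf_1", gene_index=1, value=0.0)
node_cf = neighborhood_mean("tumor_1", silenced, neighbors)

print(factual, edge_cf, node_cf)
```

In both cases the tumor cell itself is untouched; only what surrounds it changes, which is the kind of question ("what would this cell do in a different neighborhood?") that the paper formalizes.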

Why should we care?

This work pushes the boundaries of computational spatial biology by providing a framework to simulate how cells would behave in altered microenvironments, during disease progression, and under therapeutic interventions. Traditional single-cell models often ignore the spatial context that shapes cellular behavior, but Cellina explicitly accounts for it, offering a more realistic representation of biological systems. However, the approach has limitations: it relies on computational simulations rather than direct experimental validation, and its effectiveness depends on the quality of spatial annotations and assumptions about cell segmentation and neighborhood definitions.

Decoding sequence determinants of gene expression in diverse cellular and disease states

Lal et al. Nature Methods (2026). 10.1038/s41592-026-03102-0

The paper in one sentence

Decima, a deep learning model trained on over 22 million single-cell RNA-seq profiles, predicts cell type- and disease-specific gene expression from DNA sequence, enabling the interpretation of regulatory mechanisms and the effects of noncoding variants at unprecedented resolution.

Summary

This study introduces Decima, a sequence-to-function model that leverages single-cell and single-nucleus RNA-seq data from 22 million cells across 201 cell types, 271 tissues, and 82 diseases. Decima predicts gene expression from 524-kb DNA sequences surrounding gene transcription start sites, achieving high accuracy (mean Pearson correlation of 0.80 for held-out genes across pseudobulks). The model identifies cell type-specific regulatory elements, transcription factor motifs, and sequence determinants of disease-associated expression changes. It also predicts the impact of noncoding variants (e.g., eQTLs and GWAS hits) at cell type resolution, outperforming bulk-trained models like Borzoi. Finally, Decima demonstrates proof-of-concept potential for designing synthetic regulatory elements with cell type- and disease-specific activity.

Personal highlights

Scalable single-cell resolution: Decima is trained on pseudobulk aggregated sc/snRNA-seq data from 22M cells, enabling predictions across 201 cell types, 271 tissues, and 82 diseases, exceeding the scope of previous bulk-trained models.
Accurate cell type-specific predictions: The model achieves a mean Pearson correlation of 0.80 for predicting expression of held-out genes across pseudobulks and 0.58 for predicting expression of the same gene across different cell types.
Interpretability of regulatory mechanisms: Using input × gradient attribution, Decima highlights cell type-specific regulatory elements (e.g., promoters, enhancers) and transcription factor motifs (e.g., C/EBP, RXR, TWIST1) driving differential expression, even for distal elements (>100 kb from TSS).
Variant effect prediction at cell type resolution: Decima prioritizes cell type-specific eQTLs and GWAS variants, correctly predicting the direction of effect for 87% of high-confidence sc-eQTLs and linking variants to relevant cell types (e.g., immune cells for autoimmune diseases, hepatocytes for triglyceride-associated variants).
Disease and design applications: The model reveals sequence determinants of disease-specific cell states (e.g., inflammatory signatures in Crohn’s disease fibroblasts) and demonstrates potential for in silico design of synthetic regulatory elements with cell type- and disease-biased activity.
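
The two Pearson numbers above measure different things, and a small sketch may help show why the second is harder. Assuming one reads them as (a) correlation across genes within each pseudobulk and (b) correlation across pseudobulks for each gene, a model that mostly captures each gene's average level scores well on (a) but poorly on (b). The matrices below are made-up toy values, not Decima outputs.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# Rows are genes, columns are pseudobulks (toy values for illustration only).
observed = [
    [1.0, 2.0, 3.0],
    [5.0, 6.0, 4.0],
    [9.0, 8.0, 10.0],
]
# Predictions get each gene's overall level right but its cell-type pattern
# only sometimes.
predicted = [
    [2.0, 1.5, 2.5],
    [5.0, 5.5, 4.5],
    [9.0, 9.5, 8.5],
]

n_genes = len(observed)
n_bulks = len(observed[0])

# (a) For each pseudobulk, correlate across genes, then average.
across_genes = sum(
    pearson([observed[g][b] for g in range(n_genes)],
            [predicted[g][b] for g in range(n_genes)])
    for b in range(n_bulks)
) / n_bulks

# (b) For each gene, correlate across pseudobulks, then average.
across_bulks = sum(
    pearson(observed[g], predicted[g]) for g in range(n_genes)
) / n_genes

print(f"across genes: {across_genes:.2f}, across pseudobulks: {across_bulks:.2f}")
```

Metric (a) is dominated by the large differences between genes, whereas metric (b) only rewards getting cell type-specific variation right, which is why it is the more demanding of the two.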

Why should we care?

This work connects DNA sequence to functional outcomes in specific cell types and disease contexts. While bulk models like Borzoi have advanced our understanding of gene regulation, they lack the resolution to dissect mechanisms in heterogeneous tissues or pathological states. Decima’s ability to predict expression from sequence at cell type resolution opens applications in precision medicine, disease mechanism discovery, and synthetic biology. It is a major step forward, but its reliance on pseudobulk aggregation means it may miss nuanced single-cell heterogeneity. The model’s performance also depends on the quality and diversity of training data, and its predictions are limited to the cell types, tissues, and diseases represented in the atlas. Additionally, distal enhancer-gene interactions remain challenging, and experimental validation is still required for applications like regulatory element design.

A Spatial Atlas of Muscle-Invasive Bladder Cancer Reveals Lineage-Specific Vulnerabilities and Immune Architecture

Yu et al., Cancer Discovery (2026). DOI: 10.1158/2159-8290.CD-26-0099

The paper in one sentence

This study constructs a spatially resolved atlas of muscle-invasive bladder cancer (MIBC), uncovering a continuous luminal-to-basal differentiation axis that shapes tumor architecture, immune organization, and lineage-specific therapeutic vulnerabilities.

Summary

Using spatial transcriptomics on 22 pre-treatment MIBC tumors—integrated with matched bulk RNA-seq, whole-exome sequencing, and single-cell RNA-seq—the authors map a spatially organized luminal-to-basal axis within individual tumors. Luminal tumor cores, enriched for FGFR3 and NECTIN4, are immune-excluded and associated with proteotoxic stress responses, while basal-like states at invasive margins exhibit EGFR signaling, epithelial-mesenchymal transition (EMT), genomic instability, and dense immune infiltration (including tertiary lymphoid structures, TLSs). The study validates these findings across >3,000 tumors, demonstrating conserved FGFR3-EGFR lineage exclusivity and associating luminal states with vulnerability to NECTIN4-targeted therapies (e.g., enfortumab vedotin) and basal states with chemotherapy sensitivity. Functional experiments further show that FGFR3 and EGFR signaling reciprocally regulate lineage identity, and that TLS maturity and proximity to basal regions correlate with response to neoadjuvant chemotherapy.

Personal highlights

Spatially organized lineage plasticity: MIBC tumors harbor a continuous luminal-to-basal differentiation axis, with luminal states concentrated in tumor cores and basal-like states at invasive margins, revealing intratumoral heterogeneity that bulk profiling cannot capture.
Opposing RTK programs define lineage identity: FGFR3 and EGFR mark opposing poles of the differentiation spectrum, with mutual exclusivity across tumors and conserved associations with luminal and basal programs, respectively.
Immune architecture linked to lineage states: Basal-like regions co-localize with immune-rich, immunosuppressive niches (including TLSs), whereas luminal regions are immune-excluded, providing a spatial explanation for divergent responses to immunotherapy and chemotherapy.
NECTIN4 as a luminal-specific target: NECTIN4 is spatially restricted to luminal cores, predicting sensitivity to antibody-drug conjugates like enfortumab vedotin, and its overexpression induces a luminal-like, immune-quiescent phenotype in vitro.
TLS heterogeneity and therapeutic relevance: Peritumoral TLSs in chemotherapy responders exhibit immune-active states (e.g., interferon signaling), while those in non-responders show immunosuppressive features (e.g., Treg exhaustion markers), suggesting TLS maturation as a biomarker for treatment response.

Why should we care?

This work challenges the oversimplified view of bladder cancer as a binary disease (luminal vs. basal) by demonstrating that individual tumors contain a structured, radial gradient of cell states, with distinct biological and clinical implications. The spatial separation of luminal (differentiated, immune-cold) and basal (plastic, immune-hot) regions explains why patients with the same bulk subtype may respond differently to therapies: luminal cores may evade immune-based treatments but remain vulnerable to NECTIN4-targeted drugs, while basal margins, rich in immune cells, may be more susceptible to chemotherapy or immunotherapy. Critically, the study also highlights limitations of static spatial profiling: the snapshot nature of the data cannot capture temporal dynamics (e.g., how lineages evolve during therapy), and the 2D sections may not fully represent 3D tumor architecture. Nonetheless, the identification of FGFR3/EGFR as lineage gatekeepers and TLS maturity as a response predictor offers actionable insights for spatially informed precision oncology.

Inflammatory cytokines induce new cancer dependencies

Cheruiyot et al., Nature Genetics (2026). 10.1038/s41588-026-02614-x

The paper in one sentence

Inflammatory cytokines like interferon-γ (IFNγ) and interferon-β (IFNβ) induce tumor-intrinsic genetic vulnerabilities, revealing the GPI transamidase complex and the lipid phosphatase FITM2 as critical dependencies that sensitize cancer cells to immune checkpoint blockade (ICB) and cytokine-mediated stress.

Summary

This study uses genome-scale CRISPR loss-of-function screens in eight syngeneic mouse cancer models (melanoma, pancreatic, renal, lung, and colorectal) to map genetic dependencies induced by inflammatory cytokines (IFNβ, IFNγ, TNF). The authors identify context-specific vulnerabilities: the GPI transamidase complex (Gpaa1, Pigk, Pigu, Pigt, Pigs) as a dependency for resistance to type I/II IFNs, and FITM2, a regulator of ER lipid homeostasis, as a selective dependency for IFNγ. Loss of these genes sensitizes tumors to cytokines in vitro and enhances responses to ICB in vivo. Mechanistically, FITM2 deficiency triggers ER and oxidative stress in response to IFNγ, leading to a paraptosis-like cell death mediated by interferon-inducible GTPases (IRGs and GBPs). The GPI transamidase complex, on the other hand, restrains IFN sensitivity via BST2/tetherin, a viral restriction factor. While the study provides a robust preclinical framework, clinical validation in human ICB-treated cohorts remains limited due to low mutation frequencies and lack of statistical power.

Personal highlights

Cytokine-specific dependencies mapped at scale: Genome-wide CRISPR screens across eight mouse tumor models exposed to IFNβ, IFNγ, or TNF reveal shared and model-specific genetic vulnerabilities, including canonical IFN signaling components (e.g., Socs1, Ptpn2, Usp18) and novel targets like Gpaa1 and Fitm2.
GPI transamidase complex as an IFN resistance mechanism: The GPI transamidase complex (required for GPI-anchor protein biosynthesis) suppresses tumor sensitivity to both type I (IFNβ) and type II (IFNγ) interferons via BST2/tetherin, a previously unrecognized link between IFN sensing and GPI-anchored proteins.
FITM2 loss drives IFNγ-induced ER and oxidative stress: FITM2-deficient tumors accumulate ER stress (UPR activation, BiP upregulation) and mitochondrial oxidative stress (glutathione metabolism, SOD1 dependency), culminating in a paraptosis-like cell death characterized by cytoplasmic vacuolization and caspase-independent lysis.
Interferon-inducible GTPases (IIGTPases) as executioners: IFNγ induces IRGs (Irgm1/Irgm2) and GBPs (Gbp6/7/8) in FITM2-null cells, which are essential for triggering ER stress and oxidative damage, revealing a host-defense-like mechanism repurposed for tumor suppression.
Therapeutic potential for ICB sensitization: Targeting Fitm2 or GPI transamidase genes (Pigk, Gpaa1) enhances tumor regression in immunocompetent mice treated with anti-PD-1, but not in immunodeficient (NSG) mice, underscoring the immune-dependent nature of these dependencies.

Why should we care?

The key takeaway of this work is that cancer cells are not static targets: their survival relies on adaptive pathways activated by the immune microenvironment. By exploiting these context-specific weaknesses (e.g., with FITM2 inhibitors or drugs targeting GPI anchor biosynthesis), we might overcome resistance to immunotherapies like checkpoint blockade. However, the study’s preclinical nature and the lack of strong clinical correlations in human datasets temper enthusiasm, suggesting that while these pathways are promising, their therapeutic translation will require further validation. Critically, the work also highlights a double-edged sword: IFNs can both stimulate antitumor immunity (via antigen presentation) and promote tumor adaptation (via dependencies like FITM2 or GPI transamidase). This duality underscores the need for precision strategies that tip the balance toward immune-mediated tumor elimination rather than resistance.

Biodiversity and biogeography of the multi-kingdom cancer microbiome

Dohlman et al., Cell (2026). 10.1016/j.cell.2026.04.015

The paper in one sentence

A rigorous pan-cancer analysis of 16,639 tumor genomes reveals that only orodigestive cancers harbor detectable, site-specific multi-kingdom microbiomes, which correlate with tumor mutation burden.

Summary

This study addresses long-standing controversies about the presence and distribution of microbes in human tumors by developing PathSeq-T2T, a robust host-subtraction and decontamination pipeline that leverages the complete T2T-CHM13 human reference genome. Applied to 16,639 high-depth tumor whole genomes from the UK 100,000 Genomes Project, the pipeline effectively removes human sequences and environmental contaminants. After decontamination, microbial signatures in most solid tumors were indistinguishable from background, resolving prior conflicting reports. However, orodigestive tumors (oropharyngeal, esophageal, gastric, and colorectal) consistently retained microbial signals, harboring polymicrobial, multi-kingdom communities, including bacteria, fungi, viruses, archaea, and even the protozoan parasite Trichomonas, that varied by tumor site, subtype, and genomic context. Notably, microbial load correlated with tumor mutation burden (TMB), particularly in hypermutated (MSI/POLE) subtypes, suggesting a link between microbial colonization and tumor genomic instability.

Personal highlights

Robust decontamination pipeline: PathSeq-T2T sets a new standard for low-biomass microbiome detection by combining T2T-CHM13 host subtraction, multi-classifier validation (Kraken2, MetaPhlAn4, Sylph), and a pan-cancer equiprevalence (PCE) score to distinguish true microbial signals from widespread contamination.
Most cancers lack a tumor microbiome: After rigorous decontamination, microbial signals in most solid tumors (e.g., brain, breast, lung) were indistinguishable from background, challenging prior claims of ubiquitous microbial colonization in cancer.
Orodigestive cancers are microbial hotspots: Colorectal, oropharyngeal, esophageal, and gastric tumors consistently harbored site-specific, polymicrobial communities, with microbial composition mirroring the biogeography of healthy tissues (e.g., Bacteroides in colorectal, Prevotella in oropharyngeal).
Multi-kingdom communities: Beyond bacteria, these tumors hosted fungi (Candida, Saccharomyces), archaea (Methanobrevibacter smithii), viruses (HPV, EBV), and, in rare cases, protozoa (Trichomonas), expanding the known diversity of tumor-associated microbes.
Link to tumor genetics: Microbial load correlated with tumor mutation burden (TMB), with hypermutated (MSI/POLE) subtypes showing 3.9–6.5-fold higher microbial density, suggesting a potential interplay between microbial colonization and genomic instability.

Why should we care?

This work provides a critical, methodologically rigorous resolution to the heated debate about the cancer microbiome. By addressing contamination, a major confounder in prior studies, it demonstrates that tumor-associated microbiomes are not universal but are largely restricted to cancers arising at mucosal barrier sites (e.g., the digestive and upper respiratory tracts), which are already colonized by microbes under normal conditions. The finding that microbial abundance scales with tumor mutation burden hints at a bidirectional relationship: while microbes may contribute to genomic instability (e.g., via genotoxins or immune modulation), hypermutated tumors might also create a more permissive niche for microbial growth.

Correcting spatial transcriptomics data affected by a prevalent transcript leakage problem across platforms, species, and tissues

Shi et al. bioRxiv (2026). 10.64898/2026.06.13.732076

The paper in one sentence

This study identifies and addresses a widespread transcript leakage problem in spatial transcriptomics (ST) data, where transcripts diffuse from their cell of origin to neighboring cells, by introducing DeLeakage, a reference-free Bayesian method that restores accurate gene expression and improves downstream analyses.

Summary

Spatial transcriptomics (ST) has revolutionized the study of tissue organization by mapping gene expression in its spatial context. However, Shi et al. reveal a systematic issue: transcripts often leak from their originating cells into nearby cells, leading to false detection of cell-type-specific markers in unexpected cell types and distorting spatial gene expression patterns. This problem is not platform- or tissue-specific: it affects imaging-based (e.g., MERFISH, Xenium) and sequencing-based (e.g., Pixel-seq) ST data across mouse brain, human heart, and other tissues. The authors propose DeLeakage, a Bayesian hierarchical model that decomposes observed transcript counts into endogenous expression and leaked transcripts, accounting for gene-specific diffusion properties and spatial neighborhood effects. Unlike existing deconvolution methods (e.g., SPLIT, SpotClean), DeLeakage does not rely on external references and models leakage as a distance-dependent diffusion process. The method is theoretically identifiable (proven in the paper) and computationally efficient, with both CPU and GPU implementations. Validation on simulated and real ST datasets shows that DeLeakage effectively removes leakage artifacts, improves cell-type annotation, and reduces false spatial expression signals.

Personal highlights

Transcript leakage is pervasive: Across multiple ST platforms (MERFISH, Xenium, Pixel-seq), tissues (mouse brain, human heart), and species, transcripts of cell-type-specific markers (e.g., Slc17a7 for excitatory neurons) are frequently detected in unrelated cell types, with spatial patterns suggesting diffusion from neighboring cells.
DeLeakage: A gene-specific, reference-free solution: The method models leakage as a distance-dependent diffusion process with gene-specific contamination parameters, allowing it to distinguish endogenous expression from leaked transcripts without requiring reference data (e.g., scRNA-seq).
Theoretical rigor: The authors prove model identifiability under realistic conditions, addressing a key limitation of prior deconvolution methods (e.g., non-identifiability in NMF-based approaches).
Outperforms existing tools: In benchmarks against SPLIT (reference-based deconvolution) and SpotClean (spot-swapping correction), DeLeakage more accurately restores true expression levels, reduces co-detection of mutually exclusive markers, and improves cell-type clustering (e.g., 71.7% increase in Adjusted Rand Index vs. 14.2% for SPLIT).
Scalable and practical: The GPU-accelerated implementation processes large ST datasets efficiently (e.g., 130K cells in ~1 hour), with lower memory usage than alternatives like SPLIT.
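
To build intuition for what "distance-dependent leakage" does to the data, here is a toy forward simulation. It is only a sketch under strong assumptions of my own: the exponential kernel, the `leak_rate` and `length_scale` parameters, and the cell positions are hypothetical, the source cell is not depleted of the transcripts it leaks, and nothing here reproduces DeLeakage's actual Bayesian model or its inference step.

```python
from math import exp

# Hypothetical cell centroids along a line, in micrometres.
positions = [0.0, 10.0, 50.0]
# True (endogenous) expression of one marker gene: only cell 0 expresses it.
endogenous = [100.0, 0.0, 0.0]


def observed_counts(positions, endogenous, leak_rate, length_scale):
    """Observed signal = own expression + distance-weighted leakage from other cells.

    leak_rate    : gene-specific fraction of a neighbor's signal that can leak (assumed)
    length_scale : distance over which leakage decays by a factor of e (assumed)
    """
    observed = []
    for i, xi in enumerate(positions):
        incoming = sum(
            leak_rate * exp(-abs(xi - xj) / length_scale) * endogenous[j]
            for j, xj in enumerate(positions)
            if j != i
        )
        observed.append(endogenous[i] + incoming)
    return observed


observed = observed_counts(positions, endogenous, leak_rate=0.2, length_scale=10.0)
print([round(v, 3) for v in observed])
```

The nearby non-expressing cell picks up a clearly non-zero marker signal while the distant one barely does, which mirrors the spatial pattern the authors use as evidence of leakage. Correcting for it means inverting a model of this kind, which is the hard part the paper addresses.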

Why should we care?

This work highlights a prevalent but underappreciated issue in ST and provides a robust, theoretically grounded solution. For researchers using ST, it’s a call to re-evaluate past data and consider leakage correction as a standard practice.

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 8/06/26

Sebastiaan Vanuytven — Sun, 14 Jun 2026 11:38:15 GMT

This week’s reads illuminate the hidden architectures of early malignancy, the epigenetic chaos unleashed by chromosomal loss, and the cutting-edge tools reshaping how we see and interpret cellular life. In precursors of squamous cell carcinoma, a distinct population of malignant keratinocytes maintained by a ΔNp63/PITX1 axis demonstrates that pre-invasive tumors have already acquired genomic and transcriptional features of aggressiveness, whereas the invasion machinery does not develop until later stages. At the same time, loss of the Y chromosome in lung adenocarcinoma reshapes the epigenetic landscape in a way that gives rise to EMT, lineage plasticity, and increased metastatic potential, showing that chromosomal loss can be just as transformative as an oncogenic mutation. On the technology side, MetaboRamics uses 16-plex Raman imaging for live-cell spatial metabolomics, revealing the metabolic rewiring dynamics of EMT and stress responses, while CytoSignal identifies ligand-receptor interactions at the cellular level and confirms its predictions through proximity ligation assays. Finally, in the field of data and models, a bold study overturns the "bigger is better" belief for single-cell foundation models, indicating that performance plateaus with as little as 1-10% of current datasets, which may mean that quality, rather than quantity, is the key to fully realizing their potential.

Preprints/articles that I managed to read this week

p63 and PITX1 sustain a pre-invasive malignant keratinocyte population in squamous cell carcinoma precursors

Staeger et al. bioRxiv (2026). 10.64898/2026.05.21.725073

The paper in one sentence

A discrete population of malignant keratinocytes (ASK) in actinic keratosis harbors UV-associated mutations and copy number alterations, is sustained by a ΔNp63/PITX1 regulatory axis that blocks differentiation, and shares core oncogenic programs with invasive squamous cell carcinoma while lacking invasion effectors.

Summary

Actinic keratosis (AK) is a common precursor to cutaneous squamous cell carcinoma (cSCC), but the cellular identity and molecular programs of early malignant keratinocytes have been poorly defined. The authors performed CITE-seq (simultaneous transcriptome and surface proteome) on patient-matched AK, UV‑exposed normal skin, and non‑UV‑exposed normal skin (n=5 patients, 12 biopsies) and spatial whole‑transcriptome profiling in an independent cohort (n=4). They identify AK‑specific keratinocytes (ASK), a discrete population enriched in dysplastic basal epidermis. ASK show genomic hallmarks of malignancy: dominant UV‑associated SBS7b mutational signature, high mutational burden (median 20.75 muts/Mb), recurrent copy number alterations (9p loss harboring CDKN2A, 8q gain harboring MYC), and TP53 overexpression with reduced p53 pathway activity. Transcriptomically, ASK occupy a basal‑like, undifferentiated state, characterized by upregulation of ΔNp63 and PITX1, downregulation of Notch/HES1 signaling, and activation of glycolytic metabolism (SLC2A1/GLUT1). Comparison with published cSCC data reveals that ASK share core oncogenic programs with invasive tumor‑specific keratinocytes (TSK) – including IGFBP6, IGFBP2, ITGA6 – but lack invasion‑associated effectors (MMP1, MMP10, PTHLH). IGFBP6 is validated as a pro‑proliferative factor in the AK‑derived PM1 cell line (knockout reduces proliferation, recombinant protein increases it). The AK microenvironment shows expansion of inflammatory basal keratinocytes, barrier disruption, and early immunosuppressive T cell remodeling (CD8⁺ exhausted T cells, Tregs, Th17‑skewed populations). The study proposes that ASK represent the earliest malignant keratinocyte population, sustained by ΔNp63/PITX1, and that core dependencies established at this pre‑invasive stage may be retained in invasive tumors, offering potential targets for prevention or treatment.

Personal highlights

Genomically defined malignant population in a pre‑invasive lesion: ASK are enriched in dysplastic epidermis and exhibit definitive malignant hallmarks: UV‑dominant SBS7b mutational signature (27.1% contribution), high mutational load (significantly higher than other keratinocyte clusters), recurrent 9p loss (CDKN2A) and 8q gain (MYC), and TP53 overexpression with reduced p53 signaling – placing them on the trajectory from normal skin to invasive cSCC.
ΔNp63/PITX1 regulatory module maintains an undifferentiated state: ASK show selective regulon activity for TP63 and PITX1, with preferential expression of the ΔNp63 isoform (which maintains epithelial stemness). This axis attenuates Notch/HES1‑driven differentiation (lowest NOTCH hallmark score) and activates glycolytic metabolism (SLC2A1 upregulation, glycolysis pathway enrichment), consistent with p63‑driven metabolic reprogramming in squamous cancers.
ASK share core oncogenic programs with invasive cSCC but lack invasion effectors: Overlap analysis between ASK and published tumor‑specific keratinocyte (TSK) signatures reveals nine shared genes (IGFBP6, IGFBP2, ITGA6, PKM, LAMA3, etc.) representing early dependencies. Crucially, canonical TSK invasion markers (MMP1, MMP10, PTHLH) are not elevated in ASK, distinguishing early malignant programs from later invasive effectors.
IGFBP6 as a pro‑proliferative effector in pre‑malignant keratinocytes: IGFBP6 is among the most upregulated ASK markers, confirmed at protein level in AK tissue. Knockout of IGFBP6 in the AK‑derived PM1 cell line reduces proliferation, while recombinant IGFBP6 increases it, nominating IGFBP6 as a candidate driver of clonal expansion in early squamous carcinogenesis.
Early immune remodeling in the AK microenvironment: AK lesions show expansion of an inflammatory basal keratinocyte state (IL20, CCL2, CXCL2), depletion of terminally differentiated granular cells, and increased T cell infiltration with exhausted CD8⁺ T cells (exclusive to AK), Tregs, and Th17‑skewed populations with oligoclonal TCR expansions, indicating that immunosuppressive circuits are already established at the pre‑invasive stage.

Why should we care?

This study offers a valuable resource and a clear conceptual framework: early malignant cells in squamous carcinogenesis are already transcriptionally distinct, sustained by a ΔNp63/PITX1 circuit, and share core dependencies with invasive tumors. This suggests that targeting these early programs might prevent progression rather than merely treating established cancers. But the path from this descriptive atlas to clinical application is long, and the work should be viewed as hypothesis‑generating. Rigorous functional studies in animal models, larger validation cohorts, and prospective clinical trials will be required before any of these candidates can be considered for prevention or therapy.

Loss of the Y chromosome drives epigenetic and transcriptomic plasticity in lung adenocarcinoma

Schlüter K. et al., bioRxiv (2026). DOI: 10.64898/2026.06.01.729186

The paper in one sentence

This study demonstrates that loss of the Y chromosome (LOY) in lung adenocarcinoma (LUAD) drives epigenetic reprogramming, lineage plasticity, and metastatic potential by triggering epithelial-to-mesenchymal transition (EMT) and increasing cellular adaptability to stress.

Summary

Using multi-omic profiling (whole-genome sequencing, single-cell RNA-seq, proteomics, and epigenetic assays) of primary LUAD samples and isogenic A549 cell models, the authors show that LOY is prevalent in malignant cells and associated with poor clinical outcomes. Mechanistically, LOY causes haploinsufficiency of Y-linked dosage-sensitive regulators, leading to DNA hypomethylation at EMT gene promoters (e.g., THY1, LOX) and H3K4me3 enrichment, which collectively destabilize the chromatin landscape. This epigenetic remodeling induces EMT, inflammatory signaling, and stemness features, while increasing cell-to-cell heterogeneity and lineage plasticity. Functionally, LOY does not confer a proliferative advantage under basal conditions but enhances clonogenic potential, resilience to metabolic stress (glucose/glutamine deprivation), and resistance to genotoxic stress (ionizing radiation). In vivo, LOY cells exhibit a selective advantage during tumor engraftment and metastatic dissemination, with LOY clones disproportionately contributing to metastasis in xenograft models. Clinically, LOY correlates with aggressive disease, increased metastasis, and shorter overall survival in male LUAD patients.

Personal highlights

LOY as a driver of EMT and plasticity: LOY triggers EMT programs (e.g., THY1, LOX upregulation) and increases lineage plasticity, enabling rapid adaptation to stress and metabolic challenges.
Epigenetic remodeling: LOY induces focal DNA hypomethylation and H3K4me3 enrichment at EMT gene promoters, destabilizing chromatin and amplifying transcriptional heterogeneity.
Functional resilience: LOY cells show enhanced clonogenic capacity and resistance to metabolic (glucose/glutamine deprivation) and genotoxic stress (radiation), without baseline proliferative advantages.
Metastatic advantage: In vivo models reveal that LOY cells are selectively enriched in metastases, with LOY clones exhibiting a 21% higher metastatic capacity than ROY clones.
Clinical relevance: LOY is associated with poor survival in male LUAD patients, with low Y-linked gene expression predicting shorter overall survival (median 50 vs. 122 months).

Why should we care?

This work reframes loss of the Y chromosome (LOY), previously dismissed as a neutral byproduct of genomic instability, as a key driver of tumor evolution in lung adenocarcinoma. The main takeaway is that the Y chromosome acts as an epigenetic "gatekeeper": its loss destabilizes cellular identity, enabling tumors to adapt, evade treatment, and metastasize more effectively. This challenges the traditional view of cancer progression as driven solely by oncogenic mutations, and instead highlights how chromosomal loss can rewrite the epigenetic landscape to fuel aggression.

MetaboRamics: Highly multiplexed metabolic imaging by stimulated Raman for spatial metabolomics in live cells

Chadha R.S. et al., bioRxiv (2026). DOI: 10.64898/2026.05.21.727012

The paper in one sentence

This study introduces MetaboRamics, a 16-plex stimulated Raman scattering (SRS) microscopy platform that enables live-cell spatial metabolomics, revealing dynamic metabolic rewiring during processes like epithelial-to-mesenchymal transition (EMT) and cellular stress responses.

Summary

The authors developed MetaboRamics, a super-multiplexed SRS imaging platform that combines nine bioorthogonal Raman probes (targeting glucose uptake, lipid synthesis, choline metabolism, DNA synthesis, amino acid incorporation, and organelle markers) with five label-free channels (proteins, lipids, unsaturated lipids, saturated triglycerides) and two autofluorescence channels (NADH, FAD) for redox state readouts. This 16-plex system allows simultaneous, non-destructive, and spatially resolved tracking of multiple metabolic pathways in live cells. The platform was validated by profiling EMT in A549 cells, revealing a global attenuation of metabolic activity in mesenchymal cells, including reduced glucose-derived biomass, lipid turnover, protein synthesis, and altered redox balance. Additionally, MetaboRamics was used to phenotype cellular responses to nine metabolic stressors (e.g., fructose overload, palmitic acid, serum deprivation, inflammation, and pharmacological perturbations), uncovering pathway-specific adaptations and subcellular metabolic heterogeneity. For example, inflammatory stimuli and EMT both induced global metabolic downregulation, while pharmacological inhibitors (e.g., CHX, MG132) revealed distinct mechanisms of action on protein synthesis and glucose metabolism.

Personal highlights

16-plex live-cell metabolic imaging: Integration of nine bioorthogonal probes (e.g., EdU for DNA synthesis, d₇-glucose for lipid/protein synthesis, d₈-arachidonic acid for PUFA uptake) with five label-free channels and TPEF redox imaging, enabling unprecedented multiplexing for spatial metabolomics.
Metabolic rewiring in EMT: Mesenchymal cells exhibit ~30% reduction in glucose-derived biomass, lipid turnover, and protein synthesis, with decreased mitochondrial activity and NADH levels, indicating a globally attenuated metabolic state.
High-throughput stress phenotyping: Systematic profiling of nine metabolic stressors (e.g., fructose, palmitic acid, inflammation, drug treatments) revealed distinct and sometimes counterintuitive metabolic responses, such as increased PUFA uptake under palmitic acid stress (a potential lipotoxicity rescue mechanism).
Subcellular resolution: Spatial segmentation (cytoplasm, nucleus, nucleoli) uncovered compartment-specific metabolic changes, e.g., nucleoli-specific protein synthesis attenuation under inflammatory stress.
Technical robustness: Minimal photodamage, high signal-to-noise ratios, and ~25-minute acquisition time for 16-plex imaging, with validation via single-channel controls and orthogonal methods (e.g., TPEF, spontaneous Raman).

Why should we care?

MetaboRamics addresses an important gap in live-cell metabolomics: the ability to simultaneously visualize multiple metabolic pathways in real time, at subcellular resolution, and without destroying the sample. For non-scientists, the takeaway is that this platform reveals how cells dynamically rewire their metabolism, not just globally, but in specific organelles and under diverse stressors, providing a window into the hidden metabolic heterogeneity that drives disease progression, drug resistance, and cellular adaptation. While MetaboRamics is a major leap forward, it has limitations. The spectral overlap of Raman tags restricts the palette to 16 channels, and laser tuning speed limits throughput. Sensitivity is also constrained to the low μM–mM range, missing many low-abundance metabolites. Future improvements, such as faster lasers, epr-SRS for enhanced sensitivity, or AI-assisted unmixing, could expand its capabilities. Additionally, the biocompatibility and potential perturbation of bioorthogonal probes (despite optimization) warrant further validation in primary cells and in vivo.

Many Needles in a Haystack: Active Hit Discovery for Perturbation Experiments

Rubbi et al. arxiv (2026). https://arxiv.org/abs/2605.10196

The paper in one sentence

This work introduces Probability-of-Hit, an acquisition function for active learning that directly maximizes the discovery of threshold-exceeding perturbations in high-throughput experiments, improving hit recovery by up to 6.4% over baselines on real biological datasets.

Summary

High-throughput gene perturbation technologies (e.g., CRISPR screens) enable parallel testing of thousands of genetic interventions, but experimental budgets remain severely limited. The core challenge is hit discovery: identifying as many perturbations as possible whose phenotypic effect exceeds a predefined threshold, rather than locating a single global optimum. Traditional approaches are misaligned with this goal. Pure exploration wastes budget on low-value regions, while optimization methods over-exploit dominant modes and neglect disconnected high-response areas, performing poorly in the multimodal, heterogeneous landscapes typical of biological systems. The authors formalize hit discovery as a closed-loop experimental design problem and propose Probability-of-Hit (PoH), an acquisition function that ranks candidates by their posterior probability of exceeding the hit threshold. This directly targets the discovery objective, balancing exploitation of known high-response regions with exploration of promising but uncertain candidates. They prove asymptotic optimality of PoH under standard assumptions (posterior concentration, margin conditions) and demonstrate its effectiveness across synthetic benchmarks and five real-world CRISPR screening datasets (e.g., Schmidt IL-2, Zhuang NK). Empirically, PoH consistently outperforms baselines, recovering more hits under fixed budgets. For example, on the Schmidt IL-2 dataset, PoH achieves a 6.4% improvement in cumulative hit ratio over the next best method.

Personal highlights

Formalization of hit discovery as a distinct objective: The paper clarifies that hit discovery (maximizing the number of threshold-exceeding perturbations) differs fundamentally from optimization or pure exploration, requiring tailored acquisition strategies for multimodal biological landscapes.
Probability-of-Hit acquisition function: A novel, principled approach that ranks candidates by their posterior probability of being a hit, p(f(g) > τ), directly aligning acquisition with the discovery goal rather than surrogate objectives like uncertainty reduction or global optimization.
Theoretical guarantees: Proof of asymptotic optimality for PoH, showing it recovers at least Tb/2 − o(Tb) hits with high probability under standard assumptions, where T is the number of rounds and b is the batch size.
Empirical validation on real biological data: Outperforms state-of-the-art baselines across five large-scale CRISPR screening datasets (e.g., Schmidt IL-2, Zhuang NK), with gains of up to 6.4% in cumulative hit ratio, demonstrating robustness in complex, high-dimensional settings.
Batch-aware design for high-throughput experiments: The algorithm supports parallel evaluation of multiple perturbations per cycle, a critical feature for modern pooled screens, and reveals that increasing batch size yields larger gains than refining hit thresholds.
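
To make the ranking rule p(f(g) > τ) concrete, here is a minimal sketch of one acquisition round. It assumes that each candidate perturbation has an independent Gaussian posterior (mean and standard deviation from whatever surrogate model is in use) and that the batch is simply the top-b candidates by that probability. The function names and the toy numbers are hypothetical, and the paper's actual implementation may differ.

```python
import math
import numpy as np

def probability_of_hit(mu, sigma, tau):
    """P(f(g) > tau) per candidate, assuming a Gaussian posterior N(mu, sigma^2)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)  # assumed strictly positive
    z = (mu - tau) / sigma
    # Standard normal CDF evaluated at z, written with the error function.
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])

def select_batch(mu, sigma, tau, batch_size):
    """Indices of the batch_size candidates with the highest probability of hit."""
    poh = probability_of_hit(mu, sigma, tau)
    order = np.argsort(-poh, kind="stable")
    return order[:batch_size], poh

# Toy posterior for four candidate perturbations (illustrative values only).
mu = [0.2, 1.5, 0.9, 1.1]
sigma = [0.1, 0.3, 1.0, 0.05]
batch, poh = select_batch(mu, sigma, tau=1.0, batch_size=2)
print(batch, np.round(poh, 3))
```

In this toy case the candidate with mean 1.1 and a very tight posterior ranks above the one with mean 1.5 and a wider posterior, which is the behaviour that separates hit discovery from pure optimization: the score rewards confidence of clearing the threshold, not the size of the predicted effect.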

Evaluating the role of pretraining dataset size and diversity on single-cell foundation model performance

DenAdel et al. Nature Methods (2026). 10.1038/s41592-026-03120-y

The paper in one sentence

This large-scale study reveals that single-cell foundation models (scFMs) plateau in performance with only a small fraction (1–10%) of current pretraining datasets, challenging the assumption that “bigger data always yields better models” in single-cell biology.

Summary

The success of transformer-based foundation models in NLP and computer vision has inspired analogous efforts in single-cell biology, where models like scBERT, Geneformer, and scGPT are trained on tens of millions of cells. However, the scaling laws that govern performance in these domains, where larger datasets consistently improve model capabilities, remain unproven for single-cell transcriptomics. This study addresses this gap by pretraining 400 models on subsets of the 22.2-million-cell scTab corpus and evaluating them across 6,400 experiments spanning zero-shot and fine-tuned tasks, including cell-type classification, batch integration, and perturbation response prediction. The authors tested five model architectures (PCA, scVI, SSL, Geneformer, SCimilarity) and three downsampling schemes (random, cell-type re-weighted, geometric sketching) to assess the impact of dataset size (1–100%) and diversity on performance. Surprisingly, they found that most models saturated at 1–10% of the full dataset (as little as ~200,000 cells), with no clear performance gains from larger or more diverse pretraining corpora. Even spiking in perturbation data (e.g., from Perturb-seq) failed to improve downstream task performance. While larger models (e.g., higher parameter counts) tended to perform better in absolute terms, they still plateaued with small datasets, and simple baselines (e.g., PCA, scVI) often outperformed transformer-based models like Geneformer. These results suggest that current scFMs are not data-limited and that scaling datasets further may not yield meaningful improvements.

Personal highlights

Performance saturation at small dataset sizes: Across all tasks (cell-type classification, batch integration, perturbation response prediction), models typically reached 95% of peak performance at 1–10% of the 22.2M-cell corpus, with 1% (~200K cells) often sufficient for near-optimal results.
No evidence of LLM-like scaling laws: Unlike large language models, single-cell foundation models do not show consistent performance improvements with increased pretraining data size, indicating that blindly scaling datasets may not be an effective strategy.
Dataset diversity does not improve performance: Neither cell-type re-weighting (balancing cell-type proportions) nor geometric sketching (sampling uniformly across transcriptional space) outperformed random downsampling, suggesting current diversity strategies are ineffective for scFMs.
Simple baselines outperform transformers: Pretrained PCA and scVI often matched or exceeded the performance of more complex models (e.g., Geneformer, SSL) on classification and batch integration tasks, raising questions about the need for transformer architectures in single-cell applications.
Model size helps, but with diminishing returns: Larger models (e.g., higher parameter counts) improved absolute performance but still saturated with small datasets, and gains diminished with each successive increase in scale.
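
For readers unfamiliar with the downsampling schemes, this is a minimal sketch of what cell-type re-weighted downsampling can look like. It assumes that each cell is drawn without replacement with probability inversely proportional to the size of its cell type. The function name, the toy labels, and the exact weighting are my assumptions, not the authors' code.

```python
import numpy as np

def reweighted_downsample(cell_types, fraction, seed=0):
    """Sample a fraction of cells, up-weighting cells from rare cell types."""
    cell_types = np.asarray(cell_types)
    rng = np.random.default_rng(seed)
    _, inverse, counts = np.unique(cell_types, return_inverse=True, return_counts=True)
    # Each cell's weight is 1 / (size of its type), so every type carries equal total weight.
    weights = 1.0 / counts[inverse]
    p = weights / weights.sum()
    n = max(1, int(round(fraction * len(cell_types))))
    return rng.choice(len(cell_types), size=n, replace=False, p=p)

# Toy corpus dominated by one cell type (illustrative labels only).
cell_types = np.array(["T cell"] * 900 + ["B cell"] * 90 + ["mast cell"] * 10)
idx = reweighted_downsample(cell_types, fraction=0.03)
print(dict(zip(*np.unique(cell_types[idx], return_counts=True))))
```

Random downsampling would keep the 90/9/1 proportions on average, whereas this scheme pulls the subset toward balanced proportions. That is the kind of "diversity" the study found did not help.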

Why should we care?

This paper provides an urgently needed dose of reality regarding scFMs: more data does not necessarily yield a more accurate model. While scaling laws in NLP can accurately predict the improvement from more data, scFMs seem to converge very early, needing just a small fraction (1–10%) of existing training sets to achieve nearly optimal accuracy. For those outside the field, the point to emphasize is that the common “the more data, the better the model” approach does not always apply; in single-cell research, high-quality data, proper alignment of tasks, and efficient modeling can play much more important roles. In practical terms, the results give both researchers and funders food for thought. Instead of racing to build ever-larger models, the field should prioritize:

Data curation: Focusing on high-quality, task-relevant datasets rather than indiscriminate scaling.
Task alignment: Ensuring pretraining objectives (e.g., masked gene prediction) match downstream applications (e.g., cell-type annotation).
Model efficiency: Exploring architectures beyond transformers, as simpler models (PCA, scVI) often perform just as well.

Nonetheless, the study has several key limitations. Only five architectures were tested, and only a subset of tasks (classification, integration, perturbation prediction) was evaluated, so use cases where scFMs could shine may have been missed. Furthermore, the fact that spiking in perturbation data did not help suggests that current pretraining techniques are misaligned with the hardest problems in biology. The study also lacks experiments on compute-optimal scaling (the balance between model parameters, training data, and compute), which is known to have driven breakthrough advances in NLP. Finally, one should note that simple representations such as PCA and HVGs perform better than advanced models, which points to an intrinsic weakness of the “cell as a sentence of genes” approach.

Depth normalization for single-cell genomics count data

Booeshaghi et al. bioRxiv (2026). 10.1101/2022.05.06.490859

The paper in one sentence

This work introduces PFlogPF (proportional fitting-log-proportional fitting), a mathematically principled normalization method that uniquely satisfies variance stabilization, depth normalization, and monotonicity—three critical but often conflicting desiderata for single-cell RNA-seq count data.

Summary

Normalization of single-cell RNA-seq counts is a foundational step that shapes all downstream analyses, yet existing methods often fail to simultaneously achieve variance stabilization (for PCA/clustering), depth normalization (to remove sequencing depth bias), and monotonicity (to preserve within-cell gene rankings). The authors prove that PFlogPF, a two-step proportional fitting approach bracketing a log transformation, is the only feature-relabeling-equivariant method that satisfies all three properties. PFlogPF is mathematically equivalent to the shifted centered log-ratio (CLR) transform, a compositional data method developed over 40 years ago but underutilized in single-cell genomics. The study benchmarks 8 normalization methods (including sctransform, log1pPF, CPM, and Seurat/Scanpy defaults) across 526 datasets (437 passing QC), demonstrating that PFlogPF consistently outperforms alternatives. It eliminates residual depth structure while preserving variance stabilization and monotonicity, unlike log1pPF (which reintroduces depth bias after logging) or sctransform (which scrambles gene rankings). Critically, PFlogPF shows superior robustness to downsampling, recovering 36.8/50 nearest neighbors on average in k-NN graphs after depth reduction, compared to just 5.8/50 for other methods. It also exhibits greater stability to feature-panel choice, with higher Jaccard overlap (0.756 vs. 0.677 for log1pPF) when switching between gene sets. The authors further reveal that Seurat’s “CLR” implementation does not perform true CLR, highlighting a widespread mislabeling in popular workflows.

Personal highlights

Theoretical optimality: PFlogPF is the only normalization method that simultaneously achieves variance stabilization, depth normalization, and monotonicity while being invariant to feature relabeling.
Equivalence to shifted CLR: PFlogPF is mathematically identical to the shifted centered log-ratio transform, linking single-cell genomics to decades of compositional data theory and resolving a long-standing gap in the field.
Unmatched benchmark performance: Across 526 datasets, PFlogPF outperforms 7 alternatives (including sctransform, log1pPF, CPM) on all three key criteria, with no residual depth correlation and perfect rank preservation within cells.
Robustness to downsampling: PFlogPF preserves local neighborhood structure in k-NN graphs even after artificial depth reduction, recovering 36.8/50 neighbors vs. 5.8/50 for other methods, a critical advantage for real-world data with variable sequencing depth.
Stability to feature selection: Unlike log1pPF, PFlogPF’s induced geometry is insensitive to gene panel choices, reducing avoidable variation in large-scale analyses and foundation model training where feature sets may vary.
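
To make the "PF, log, PF" recipe concrete, here is a minimal NumPy sketch of how I read it. Three things are my assumptions rather than the paper's: proportional fitting rescales every cell to the mean depth across cells, the log step is log1p, and all cells have non-zero depth. The function names are hypothetical.

```python
import numpy as np

def proportional_fitting(X):
    """Rescale each cell (row) so that every row sums to the mean row sum."""
    depth = X.sum(axis=1, keepdims=True)  # assumes no zero-depth cells
    return X * (depth.mean() / depth)

def pflogpf(counts):
    """Proportional fitting, then log1p, then proportional fitting again."""
    X = np.asarray(counts, dtype=float)
    return proportional_fitting(np.log1p(proportional_fitting(X)))

# Toy cells-by-genes count matrix with very different depths (illustrative only).
counts = np.array([[10, 0, 5, 85],
                   [1, 0, 1, 8],
                   [30, 20, 0, 50]])
normed = pflogpf(counts)
print(normed.sum(axis=1))  # identical totals for every cell after the second PF
```

The second proportional fitting step is what equalizes the per-cell totals after the logarithm. Because every step is a per-cell monotone transform, the within-cell gene ranking is left untouched.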

Why should we care?

This paper addresses a critical but overlooked problem: normalization is the invisible foundation of single-cell analysis, and flaws here cascade into erroneous biological conclusions. Many researchers treat normalization as a “black box” preprocessing step, but the authors show that common defaults (e.g., log1pPF, sctransform) fail to fully remove depth bias or preserve gene rankings, leading to spurious differential expression calls, distorted cell-cell distances, and misinterpreted markers. The key insight is that a second proportional fitting step after the logarithm (PFlogPF) elegantly solves these issues. For non-specialists, the takeaway is that not all normalization methods are equal: PFlogPF provides a principled, one-size-fits-most solution that addresses the core pitfalls of current approaches. Its equivalence to the shifted CLR transform also connects single-cell genomics to compositional data analysis, a mature statistical field with rigorous theoretical guarantees.

CytoSignal detects locations and dynamics of ligand–receptor signaling at cellular resolution from spatial transcriptomic data

Liu et al., Nature Genetics (2026). 10.1038/s41588-026-02624-9

Overview of the CytoSignal approach (Figure 1 from Liu et al.)

The paper in one sentence

CytoSignal is a computational framework that infers the locations, mechanisms, and temporal dynamics of ligand-receptor signaling interactions at single-cell resolution from spatial transcriptomic data, validated experimentally using proximity ligation assays.

Summary

This work introduces CytoSignal and VeloCytoSignal, two methods designed to analyze spatial transcriptomic datasets (e.g., Slide-seq, Stereo-seq, Visium HD) to detect and quantify cell-cell communication at cellular resolution. CytoSignal calculates a ligand-receptor (LR) signaling score (LRscore) for each spatial position, distinguishing between contact-dependent (requiring direct cell-cell contact) and diffusion-dependent (mediated by soluble ligands) interactions. It also identifies spatial gradients in signaling strength, signaling-associated genes, and differential signaling across conditions (e.g., age or disease). VeloCytoSignal extends this by predicting temporal dynamics of signaling activity using RNA velocity, enabling the inference of whether signaling is increasing or decreasing at each location. The authors validate their approach using proximity ligation assay (PLA), a gold-standard method for detecting protein-protein interactions in situ, demonstrating that CytoSignal’s predictions align more accurately with physical LR interactions than existing computational methods. They apply their tools to embryonic mouse brain and whole-embryo datasets, revealing biologically meaningful signaling patterns (e.g., Sema3a-PlexinA4 in neuronal migration, Dll1-Notch1 in choroid plexus development, and Fgf8-Fgfr1 in neural patterning). Additionally, they use CytoSignal to identify age-associated signaling changes in a mouse model of Parkinson’s disease, highlighting immune-related interactions like Spp1-Cd44.

Personal highlights

Cellular-resolution signaling inference: CytoSignal quantifies LR signaling activity at individual spatial positions, overcoming the limitations of previous methods that aggregate signals at the cluster or tissue level.
Mechanistic distinction: The method explicitly differentiates between contact-dependent (e.g., Efnb1-Epha4) and diffusion-dependent (e.g., Sema3a-PlexinA4-Nrp1) interactions, reflecting their distinct biological constraints.
Spatial gradient detection: CytoSignal identifies continuous gradients in signaling strength (e.g., Sema3a-PlexinA4 peaks in the subventricular zone of the mouse cortex), providing insights into how signaling varies across tissue layers.
Temporal dynamics with VeloCytoSignal: By integrating RNA velocity, VeloCytoSignal predicts whether signaling interactions are increasing or decreasing over time, validated using time-series Stereo-seq data (e.g., Alb-FcRn in liver and Wnt5a-Antxr1 in jaw/tooth development).
Experimental validation: PLA experiments confirm that CytoSignal’s predictions of LR interactions (e.g., Dll1-Notch1, Fgf8-Fgfr1) localize to the same tissue regions as physical protein-protein interactions, outperforming prior computational approaches in benchmarking tests.
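
As a rough illustration of what a per-location LR score can look like for a diffusion-dependent interaction, here is a sketch under my own assumptions: the ligand available at a position is a Gaussian-kernel weighted average of ligand expression at nearby positions, and the score is that amount multiplied by the local receptor expression. The function name, kernel, and parameters are hypothetical and should not be read as CytoSignal's actual API or scoring formula.

```python
import numpy as np
from scipy.spatial import cKDTree

def lr_score_diffusion(coords, ligand, receptor, sigma, radius):
    """Kernel-weighted neighborhood ligand times local receptor, per position."""
    coords = np.asarray(coords, dtype=float)
    ligand = np.asarray(ligand, dtype=float)
    receptor = np.asarray(receptor, dtype=float)
    tree = cKDTree(coords)
    scores = np.zeros(len(coords))
    # Each position's neighbor list includes the position itself (distance 0).
    for i, nbrs in enumerate(tree.query_ball_point(coords, r=radius)):
        d = np.linalg.norm(coords[nbrs] - coords[i], axis=1)
        w = np.exp(-d ** 2 / (2 * sigma ** 2))
        ligand_in = np.sum(w * ligand[nbrs]) / w.sum()
        scores[i] = ligand_in * receptor[i]
    return scores

# Three positions on a line: a sender, a nearby receiver, a distant receiver.
coords = [[0, 0], [1, 0], [10, 0]]
ligand = [5, 0, 0]
receptor = [0, 2, 2]
scores = lr_score_diffusion(coords, ligand, receptor, sigma=1.0, radius=3.0)
print(scores)
```

Only the receiver within reach of the sender gets a non-zero score. A contact-dependent variant would replace the kernel with direct spatial neighbors only.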

Why should we care?

CytoSignal provides a scalable, statistically rigorous, and biologically interpretable way to map ligand-receptor interactions in tissue, distinguishing between signals that require direct contact (like a handshake) and those that can act over short distances (like a whispered message). This is not just a technical advance: it enables researchers to study how signaling drives development, disease, and tissue homeostasis in unprecedented detail. Critically, the authors validate their method experimentally using PLA, a technique that directly visualizes protein interactions in tissue. This is a major step forward, as many computational methods in this field lack ground-truth validation. The finding that CytoSignal outperforms existing tools in predicting physical interactions suggests it could become a standard for spatial cell-cell communication analysis. The temporal component (VeloCytoSignal) adds another layer, revealing when these signals are ramping up or down, insights that could be vital for understanding dynamic processes like embryogenesis or tumour progression.

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 1/06/26

Sebastiaan Vanuytven — Sun, 07 Jun 2026 15:06:27 GMT

This week’s reads showcase how rapidly spatial biology is evolving, from new ways to quantify tissue architecture and cell-cell communication to technologies that bring perturbation screening into spatial context. Several studies move past the traditional neighborhood-focused approach by proposing new frameworks, such as COSTE and CellSTIC, that incorporate multiscale structure and hierarchical communication programs within the tissue context. New technologies such as PerturbSpace show how gene perturbations can be connected directly to spatial tissue phenotypes, and GEARS addresses one of the field’s long-standing problems by reconstructing tissue structure from single-cell information alone. Beyond technology, one of this week’s papers also highlights the biological significance of tissue structure, since different cancer-associated fibroblast subtypes seem to play an important role in organizing either permissive or immunosuppressive tumor environments. Finally, a ground-breaking phase 3 clinical trial in pancreatic cancer reports the success of daraxonrasib, a RAS(ON) inhibitor.

Preprints/articles that I managed to read this week

Cophenetic Spatial Topology Embedding reveals multiscale tissue architecture in spatial omics

Long et al. bioRxiv (2026). 10.64898/2026.05.26.727847

The paper in one sentence

COSTE uses directed nearest-neighbor distance profiles and hierarchical clustering to generate a sample-normalized Spatial Separation Score (SSS) that quantifies multiscale spatial relationships between cell types or transcripts without requiring a predefined neighborhood radius.

Summary

Spatial omics technologies generate rich maps of cell types and transcripts in tissues, but most analysis tools focus on local neighborhoods (e.g., cells within a fixed radius or k-nearest neighbors), which can miss longer-range and hierarchical tissue architecture. The authors introduce COSTE, a framework that computes directed inter-type distances: for each “searcher” cell type, the average distance to the nearest cell of every other “findee” type is calculated, producing an asymmetric distance matrix. Hierarchical clustering of these profiles yields a dendrogram, and cophenetic distances are normalized within each sample to produce a Spatial Separation Score (SSS) from 0 to 1 (lower = more similar spatial profiles). COSTE is segmentation‑free and can be applied to both cell-type labels and individual transcripts. Benchmarking on synthetic modular and nested patterns shows that COSTE captures nested hierarchical geometries more directly than local neighborhood enrichment methods (Squidpy, Giotto, ANE), which require parameter tuning and do not consistently recover layered structures. On a neonatal mouse pup Xenium dataset (~1.3 million cells, 44 types), COSTE reveals a structure‑within‑structure hierarchy (e.g., retina layers forming a tight cluster; two fibroblast subtypes segregating into distinct spatial domains). In a pulmonary fibrosis cohort, SSS between AT2 and capillary cells (defined as a TRU Remodeling Score) increases with disease severity, reflecting progressive alveolar‑capillary uncoupling. COSTE also recapitulates known lymph node compartments (B cell cortex, T cell paracortex, medulla). Transcript‑level analysis in a systemic sclerosis lung sample identifies a pleural‑enriched gene module (including IL10, CCL4, WT1) without cell segmentation. Finally, on TNBC imaging mass cytometry data, COSTE associates response to chemo‑immunotherapy with spatial separation patterns between antigen‑presenting cells and CD8+ T cells. 
The method is computationally efficient (avoids permutations) and scales to million‑cell datasets.

Personal highlights

Hierarchical spatial representation without predefined radius: COSTE builds a dendrogram from directed nearest‑neighbor distance profiles, capturing both local proximity and longer‑range nesting (e.g., cell types that form concentric layers). This contrasts with local neighborhood enrichment methods that require a user‑set radius or k and often miss nested geometry.
Segmentation‑free and transcript‑level analysis: COSTE can operate directly on single‑transcript coordinates, enabling spatial gene module discovery without cell segmentation. Applied to a fibrotic lung sample, it nominated a pleural‑enriched module containing the cytokine and chemokine genes IL10 and CCL4 and the mesothelial marker WT1, all co‑localized in a thickened pleural region.
Quantitative architectural remodeling score (SSS): The sample‑normalized Spatial Separation Score provides a relative metric of spatial coupling. In pulmonary fibrosis, the SSS between AT2 and capillary cells (TRU Remodeling Score) shows a monotonic increase from healthy to severely affected lungs, and correlates with histological features (remnant alveoli vs. honeycombing).
Cross‑platform and cross‑scale utility: Validated on Xenium (mouse pup, human lung fibrosis, lymph node) and imaging mass cytometry (TNBC cohort), COSTE summarizes spatial organization from cell‑type communities down to individual transcript co‑localization.
Computational efficiency: Avoids costly permutation tests used by Squidpy and Giotto. On the neonatal mouse pup dataset (~1.3M cells), COSTE runs orders of magnitude faster with lower peak memory than permutation‑based methods under the tested configurations.
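
The COSTE pipeline described above (directed nearest-neighbor profiles, hierarchical clustering, normalized cophenetic distances) can be sketched in a few lines with SciPy. This is my own reading, not the authors' code: the average-linkage choice, the Euclidean comparison of profiles, and normalizing by the maximum cophenetic distance within the sample are all assumptions, and the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial import cKDTree
from scipy.spatial.distance import squareform

def directed_distance_profiles(coords, labels):
    """D[i, j] = mean distance from each 'searcher' cell of type i to its nearest 'findee' of type j."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    types = sorted(set(labels.tolist()))
    trees = {t: cKDTree(coords[labels == t]) for t in types}
    D = np.zeros((len(types), len(types)))
    for i, searcher in enumerate(types):
        pts = coords[labels == searcher]
        for j, findee in enumerate(types):
            if i != j:
                d, _ = trees[findee].query(pts, k=1)
                D[i, j] = d.mean()
    return types, D

def spatial_separation_scores(D, method="average"):
    """Cluster the rows of D, then scale cophenetic distances to [0, 1] within the sample."""
    Z = linkage(D, method=method)    # rows are treated as profile vectors
    coph = squareform(cophenet(Z))   # square matrix of cophenetic distances
    return coph / coph.max()

# Toy tissue: types A and B intermingle, type C sits far away (simulated coordinates).
rng = np.random.default_rng(0)
coords = np.vstack([rng.normal(0, 1, size=(50, 2)),
                    rng.normal(0, 1, size=(50, 2)),
                    rng.normal(10, 1, size=(50, 2))])
labels = ["A"] * 50 + ["B"] * 50 + ["C"] * 50
types, D = directed_distance_profiles(coords, labels)
sss = spatial_separation_scores(D)
print(types, np.round(sss, 2))
```

No neighborhood radius or k appears anywhere in the sketch, and there is no permutation step, which matches the two selling points highlighted above.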

Why should we care?

The main takeaway is that COSTE offers a complementary lens to local neighborhood analysis, one that emphasizes global, multiscale arrangements. It is a useful hypothesis‑generating tool for spatial transcriptomics, especially when tissue architecture is suspected to be layered or nested. But it is not a replacement for traditional methods, and any biological interpretations derived from SSS should be validated with orthogonal approaches (e.g., spatial mapping of key cell‑type pairs, functional perturbation). Users should be cautious about over‑interpreting cross‑sample comparisons and should always inspect the raw spatial data to confirm that low SSS truly reflects biological co‑localization rather than density artifacts.

Decoding Hierarchical Cell-Cell Communication in Spatial Multi-Omics with CellSTIC

Wang et al. bioRxiv (2026). 10.64898/2026.05.27.728114

The paper in one sentence

CellSTIC is a deep learning framework that integrates spatial multi-omics data (RNA, protein, chromatin) with a hierarchical semantic tree of ligand-receptor interactions to infer cell-cell communication as structured, multiscale programs that are traceable from broad functional modules down to individual molecular pairs.

Summary

Cell-cell communication (CCC) is typically inferred from spatial transcriptomics as lists of ligand-receptor (LR) pairs, which are difficult to interpret and compare across tissues or conditions. CellSTIC addresses this by treating CCC as an organized feature of tissue architecture rather than isolated interactions. The framework has four main components: (1) a Multimodal Evidence Graph Constructor that builds spatially constrained communication graphs integrating local neighborhoods and global clustering from multiple modalities (RNA, protein, chromatin accessibility); (2) a Multimodal Evidence Integrator that learns fused latent representations while preserving spatial domain structure; (3) a Ligand-Receptor Semantics Tree Builder that organizes LR pairs into hierarchical functional modules using three optional strategies (balanced, LLM-guided, or biology-informed); and (4) a hierarchical CCC predictor trained with self-supervised edge masking and reconstruction. Benchmarking on simulated multimodal data with ground-truth CCC shows CellSTIC outperforms COMMOT, CellNEST, Scriabin, and DcjComm in edge prediction (AUROC ~0.92 vs baselines <0.85) and spatial domain identification (ARI, NMI scores higher than MEFISTO, MultiVI, PRAGA, SpatialGlue). On a human lymph node dataset, CellSTIC recovers known B cell cortex, T cell paracortex, and medulla regions, and reveals that Helper/Regulatory T cells communicate with different immune compartments via distinct branches of the LR tree (chemokine vs immunomodulatory programs). On a mouse brain 5M dataset, it resolves 15 spatial domains and identifies region-specific signaling axes (CD200-CD200R4, PENK-OPRK1) with cell-type-resolved sender-receiver patterns, including boundary-enriched communication. 
On mouse embryonic development (E9.5–E16.5), it distinguishes organ-specific communication trajectories (NTS-SORT1 in brain vs F2-F2R in liver) and captures network topology shifts (decreased global efficiency, increased modularity). On axolotl telencephalon development vs regeneration, it shows that WNT7B-FZD5 signaling during regeneration is not a replay of development but a distinct process with rapid redeployment, fluctuating connectivity, and delayed architectural stabilization.

Personal highlights

Hierarchical LR tree for multiscale communication: Instead of flat LR lists, CellSTIC organizes interactions into a semantic tree (root → pathways → specific pairs) using three construction strategies. This allows communication to be analyzed at coarse (e.g., “chemokine signaling”) and fine (e.g., “CCL19-CCR7”) levels simultaneously, with traceable links between modules and molecular evidence.
Multimodal integration with spatial constraints: Integrates RNA, protein (ADT), and chromatin accessibility (ATAC) within a unified graph that encodes spatial distance, directionality, and cross-modality similarities. Outperforms SpatialGlue and other multimodal integration methods in spatial domain recovery (higher ARI, NMI, V-measure) on synthetic and real tissues.
State-of-the-art CCC prediction on simulated benchmarks: Achieves AUROC of 0.92 ± 0.03 across eight simulation replicates, compared to next-best baseline (COMMOT) at ~0.85. Improvements are consistent across LR pairs and not driven by a small subset, with reduced cross-replicate variance.
Distinguishes development from regeneration in axolotl brain: Shows that WNT7B-FZD5 communication during telencephalon regeneration follows a different trajectory (early peak at R5, fluctuating edge numbers, transient modularity changes) compared to development (monotonic increase, stable modularity after E54), indicating that regeneration is not a simple replay of developmental programs.
Reveals branch-restricted communication programs in lymph node: Post hoc attribution shows that Helper/Regulatory T cells use different branches of the LR tree to communicate with B lineage (broad, spanning chemokine and immunomodulatory branches) versus NK/ILC or erythroid/megakaryocyte compartments (narrow, confined to specific submodules), a level of functional specificity not visible from flat LR analysis.
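
To make the "root → pathways → specific pairs" idea concrete, here is a minimal sketch of how leaf-level LR scores could be rolled up a tree and then traced back down. It is not CellSTIC's implementation: the module names, the assignment of pairs to modules, the scores, and the choice of a plain sum as the aggregation rule are all assumptions made for illustration.

```python
# Hypothetical three-level tree: root -> functional module -> LR pair.
# Module names and pair-to-module assignments are illustrative only.
LR_TREE = {
    "chemokine signaling": ["CCL19-CCR7", "CCL21-CCR7"],
    "immunomodulatory": ["CD200-CD200R4"],
}


def roll_up(pair_scores, tree):
    """Sum leaf-level LR scores into module-level scores and a root score.

    Summation is an assumed aggregation rule, chosen for simplicity.
    """
    module_scores = {
        module: sum(pair_scores.get(pair, 0.0) for pair in pairs)
        for module, pairs in tree.items()
    }
    root_score = sum(module_scores.values())
    return module_scores, root_score


def trace(module, pair_scores, tree):
    """Return the LR pairs behind a module score, strongest first."""
    pairs = [(pair, pair_scores.get(pair, 0.0)) for pair in tree[module]]
    return sorted(pairs, key=lambda item: item[1], reverse=True)


# Made-up scores for a single sender -> receiver edge.
edge_scores = {"CCL19-CCR7": 0.5, "CCL21-CCR7": 0.25, "CD200-CD200R4": 0.125}
modules, root = roll_up(edge_scores, LR_TREE)
top_chemokine_pairs = trace("chemokine signaling", edge_scores, LR_TREE)
```

The point of the structure is that a coarse statement ("this edge is mostly chemokine signaling") stays linked to the molecular evidence that produced it, which a flat LR list cannot offer.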

Daraxonrasib or Chemotherapy in Previously Treated Metastatic Pancreatic Cancer

O'Reilly et al. New England Journal of Medicine (2026). 10.1056/NEJMoa2605555

The paper in one sentence

In a phase 3 trial of patients with previously treated metastatic pancreatic cancer, the oral RAS(ON) multiselective inhibitor daraxonrasib doubled median overall survival compared to standard chemotherapy (13.2 vs 6.6 months), with a hazard ratio of 0.40.

Summary

Pancreatic ductal adenocarcinoma (PDAC) is among the deadliest cancers, with most patients presenting with metastatic disease and median overall survival under 1 year. For patients who progress after first-line therapy, second-line chemotherapy options (gemcitabine plus nab-paclitaxel, liposomal irinotecan-based regimens, or FOLFOX) produce response rates below 15% and median progression-free survival of 3–4 months. More than 90% of PDAC tumors harbor oncogenic RAS mutations, predominantly at codon 12 (G12), which drive aberrant signaling through the active GTP-bound state RAS(ON). Daraxonrasib is an oral, potent inhibitor that targets the active state of both mutant and wild-type RAS across KRAS, NRAS, and HRAS, including G12, G13, and Q61 variants. The RASolute 302 trial was an international, open-label, phase 3 trial randomizing 500 patients with previously treated mPDAC 1:1 to daraxonrasib (300 mg once daily) or investigator’s choice of four standard chemotherapy regimens. The dual primary endpoints were overall survival and progression-free survival (by blinded independent central review) in the RAS G12 mutation population (91.8% of patients). Key secondary endpoints included overall survival and progression-free survival in the overall population (which also included patients with G13/Q61 mutations or no RAS mutation identified), objective response, and patient-reported quality of life. In the RAS G12 population, median overall survival was 13.2 months with daraxonrasib vs 6.6 months with chemotherapy (HR 0.40; 95% CI, 0.30–0.54; P<0.001). Median progression-free survival was 7.3 vs 3.5 months (HR 0.45; 95% CI, 0.34–0.59; P<0.001). Objective response rate was 33.2% vs 11.8%. In the overall population, median overall survival was 13.2 vs 6.7 months (HR 0.40; 95% CI, 0.30–0.53), and progression-free survival was 7.2 vs 3.6 months (HR 0.49). 
Patient-reported outcomes (pain deterioration, global health status deterioration) were significantly delayed with daraxonrasib (hazard ratios ~0.60, P<0.001). Adverse events of grade ≥3 occurred in 61.8% of daraxonrasib patients vs 69.6% of chemotherapy patients. Treatment-related adverse events leading to discontinuation were rare with daraxonrasib (1.2% vs 11.2%). Daraxonrasib-associated toxicities were predominantly low-grade rash (86.3% overall, 13.7% grade ≥3), diarrhea, stomatitis, and nausea, whereas chemotherapy caused more hematologic toxicity and peripheral neuropathy.

Personal highlights

Unprecedented survival benefit in second-line pancreatic cancer: Median overall survival of 13.2 months with daraxonrasib vs 6.6 months with chemotherapy (HR 0.40) doubles the expected survival in a disease where historical benchmarks in this setting are 6–7 months.
High response rate with durable disease control: Objective response rate of 33% (vs 12% with chemotherapy) and median progression-free survival of 7.3 months (vs 3.5 months) indicate meaningful tumor shrinkage and extended time without progression.
Favorable tolerability profile enabling longer treatment: Despite median treatment duration of 6.2 months vs ≤3.2 months for chemotherapy, daraxonrasib had lower rates of grade ≥3 adverse events (61.8% vs 69.6%), fewer treatment discontinuations (1.2% vs 11.2%), and fewer dose reductions (36.1% vs 57.5%). Adverse events were primarily manageable rash and gastrointestinal symptoms.
Patient-reported benefits in pain and quality of life: Time to deterioration of pain (9.0 vs 3.7 months) and global health status (5.6 vs 2.4 months) both significantly favored daraxonrasib, demonstrating that survival extension is accompanied by preserved or improved symptom control.
Consistent benefit across most subgroups: The hazard ratio for overall survival favored daraxonrasib across ECOG performance status, presence of liver metastases, prior chemotherapy type, and RAS mutation subtypes, though the small number of non-G12 RAS patients (8% of the cohort) precludes definitive conclusions in those subgroups.

Why should we care?

The main takeaway is that daraxonrasib offers a genuine leap forward in pancreatic cancer treatment, but with caveats: the benefit appears largest in the predominant RAS G12-mutant population, the drug has a distinct toxicity profile requiring proactive dermatologic and gastrointestinal management, and the open-label design means some optimism bias cannot be excluded. The results are sufficiently robust that daraxonrasib is likely to become a new standard of care, but confirmatory real-world evidence and cost-effectiveness analyses will determine its ultimate role. The trial also validates the broader therapeutic strategy of directly targeting the active (GTP-bound) state of RAS, a molecular target long considered "undruggable", which may have implications for other RAS-driven cancers beyond pancreatic cancer.

Spatially resolved, multimodal in vivo Perturb-seq using antibody-based cell hashing

Nevue et al. bioRxiv (2026). 10.64898/2026.05.25.727765

The paper in one sentence

PerturbSpace combines antibody-based spatial barcoding of tissue sections with single-cell multiomics and CRISPR perturbations to map, at ~80 µm resolution, how genetic perturbations affect tissue architecture and mediate non‑cell‑autonomous effects in vivo.

Summary

In vivo Perturb-seq has enabled large-scale mapping of gene function in physiological contexts, but conventional single-cell readouts require tissue dissociation, destroying spatial information and precluding analysis of how perturbations influence tissue architecture or exert paracrine effects on neighboring cells. The authors develop PerturbSpace, a method that integrates spatial hashing with high‑throughput single‑cell workflows. Tissue sections are placed onto high‑density microwell arrays pre‑filled with oligonucleotide‑conjugated antibodies against ubiquitously expressed surface markers (MHC‑I and CD45). Each well contains a unique spatial barcode that is transferred to cells via antibody binding. After dissociation, spatially barcoded cells are FACS‑enriched and processed through a modified 10x Genomics 3’ workflow to simultaneously recover transcriptomes, spatial coordinates, surface proteomes (CITE‑seq), CRISPR sgRNAs, and expressed clonal barcodes. The authors apply PerturbSpace in two settings. First, to study regenerative hematopoiesis in the spleen, they transplant Cas9+ hematopoietic progenitors transduced with a lentiviral library targeting 40 transcriptional regulators (plus clonal barcodes) into irradiated mice. After 14 days, they profile splenic colony‑forming units (CFU‑S). They identify 19,174 colonies, classify them into eight composition types, and show that perturbations such as Cebpa loss shift colonies toward an erythroid‑only composition, while Rcor1 or Gltscr1 loss increases colony size without altering proliferation, an effect associated with upregulation of cell‑adhesion programs. Second, in the liver, they overexpress IFNγ (or a control peptide) in transplanted immune cells and measure paracrine effects on neighboring host cells. IFNγ‑expressing neighborhoods show strong upregulation of interferon‑response signatures (e.g., Gbp6, Stat1) and downregulation of TNF signaling in bystander cells. 
The method is compatible with orthogonal modalities (CITE‑seq, clonal lineage tracing) and achieves >90% spatial mapping efficiency. Resolution is limited to 80 µm (microwell spacing), sufficient for colony and neighborhood analysis but not for single‑cell morphology or immune synapses.

Personal highlights

Spatial hashing with universal surface markers: Uses anti‑MHC‑I and anti‑CD45 antibodies conjugated to oligonucleotides, enabling spatial barcoding of all nucleated mammalian cells without requiring cell‑type‑specific labels or genetic reporters.
Multimodal compatibility within a standard 10x workflow: Simultaneously recovers transcriptomes, spatial coordinates, surface proteins (119 markers), CRISPR sgRNAs, and clonal barcodes from the same single‑cell suspension.
Discovery of perturbation effects on tissue architecture: Identifies that Rcor1 knockout increases monocytic colony size without increasing proliferation, and that this is associated with upregulation of cell‑cell and cell‑matrix adhesion genes.
Direct measurement of non‑cell‑autonomous effects: Uses IFNγ overexpression in liver to show that bystander cells in IFNγ‑positive neighborhoods (but not control neighborhoods) upregulate interferon‑response signatures and downregulate TNF signaling, demonstrating PerturbSpace’s ability to map paracrine signaling in situ.
Scalable and cost‑effective: FACS enrichment of spatially barcoded cells reduces sequencing costs, and the method uses commercially available 10x kits and custom microwell arrays that can be fabricated at scale.
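
A rough sketch of what the spatial hashing readout amounts to computationally: each cell carries a well identity, the well gives a position on an 80 µm grid, and "bystanders" are non-source cells that share a neighborhood with a source cell. Everything beyond the 80 µm spacing is an assumption here: the row-major well numbering, the field names, the labels, and the simplification that a neighborhood is a single well are hypothetical and not taken from the preprint.

```python
WELL_PITCH_UM = 80.0  # microwell spacing quoted in the summary


def well_to_xy(well_index, n_cols, pitch_um=WELL_PITCH_UM):
    """Convert a row-major well index into (x, y) coordinates in µm.

    Row-major numbering is an assumed layout, not the published one.
    """
    row, col = divmod(well_index, n_cols)
    return (col * pitch_um, row * pitch_um)


def bystanders(cells, source_label):
    """IDs of cells that share a well with a source cell, sources excluded.

    Treating one well as one neighborhood is a simplifying assumption.
    """
    source_wells = {c["well"] for c in cells if c["label"] == source_label}
    return [
        c["id"]
        for c in cells
        if c["well"] in source_wells and c["label"] != source_label
    ]


# Toy data: well 0 holds an IFNγ-expressing cell, well 5 holds a control cell.
cells = [
    {"id": "c1", "well": 0, "label": "IFNg"},
    {"id": "c2", "well": 0, "label": "host"},
    {"id": "c3", "well": 5, "label": "host"},
    {"id": "c4", "well": 5, "label": "control"},
]
ifng_bystanders = bystanders(cells, "IFNg")
well5_position = well_to_xy(5, n_cols=4)
```

Comparing expression in `ifng_bystanders` against bystanders of control cells is the kind of contrast the liver experiment relies on.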

Why should we care?

PerturbSpace is a technical advance that adds spatial context to high‑throughput in vivo perturbation screens. It is best suited for questions about tissue‑level phenotypes (colony formation, regional composition, paracrine signaling) where 80 µm resolution suffices. It is not a replacement for imaging‑based spatial methods (MERFISH, Xenium) that achieve single‑cell or subcellular resolution. The most valuable contribution may be the demonstration that spatial hashing with universal antibodies can be integrated into standard single‑cell workflows, potentially lowering the barrier for many labs to adopt spatial functional genomics. However, the need for custom arrays and the resolution trade‑off mean that PerturbSpace will likely remain a specialized tool for the near future, not a routine method.

Spatially organized cancer-associated fibroblast subtypes partition cutaneous carcinomas into immune-active and contracted, immune-repressed niches

Aschenbrenner et al. bioRxiv (2026). 10.64898/2026.06.01.729186

The paper in one sentence

Using high-plex imaging mass cytometry, this study identifies four spatially organized cancer-associated fibroblast (CAF) subtypes in basal and squamous cell carcinomas, showing that myoCAFs (contractile, αSMA⁺) define immune-poor, aggressive tumor niches, whereas iCAFs (inflammatory) associate with immune-active, checkpoint-high microenvironments.

Summary

Cutaneous basal cell carcinoma (BCC) and squamous cell carcinoma (SCC) differ markedly in invasiveness and metastatic risk, but the stromal mechanisms underlying these differences remain incompletely understood. The authors performed 33-plex imaging mass cytometry (IMC) on 28 regions of interest from 17 human tumors (4 nodular BCC, 5 infiltrative/sclerosing BCC, 3 well-differentiated SCC, 5 poorly differentiated SCC), generating a spatially resolved single-cell atlas of >739,000 cells. They identified four fibroblast populations: reticular fibroblasts (retFIBs; healthy dermal fibroblasts), immunomodulatory CAFs (iCAFs; MMP1⁺/IDO1⁺), matrix CAFs (mCAFs; COL11A1⁺), and myofibroblast-like CAFs (myoCAFs; αSMA⁺/TAGLN⁺). Infiltrative BCC and poorly differentiated SCC showed increased stromal area, extracellular matrix deposition, and a shift toward myoCAF-dominated stroma compared with their less aggressive counterparts. Spatial neighborhood analysis revealed that iCAFs localize to immune-rich, inflamed niches with elevated T cell activation and checkpoint markers, whereas myoCAFs occupy fibroblast-dense, immune-poor regions with globally reduced immune activation. mCAFs preferentially associate with stromal immune compartmentalization, limiting immune cell entry into tumor nests. At the invasive front, iCAF density correlated with antigen-experienced T cell states, while myoCAF density correlated with immune exclusion. In vitro, patient-derived CAFs from aggressive tumors showed enhanced collagen-gel contraction, and MCAM⁺ (myoCAF-like) cells were enriched in contractile cultures. Finally, stromal nuclear YAP/TAZ was elevated in aggressive tumor subtypes, and single-cell transcriptomic reanalysis revealed that mechanotransduction-associated programs are enriched in RGS5⁺/myoCAF-like cells, whereas classical YAP/TAZ transcriptional signatures are not uniformly increased across CAF subsets. 
The study proposes that CAF composition—particularly the balance between iCAFs and myoCAFs—stratifies immune topology and may influence therapeutic responsiveness.

Personal highlights

Four spatially distinct CAF subtypes in skin cancer: IMC resolves retFIBs (normal dermal fibroblasts), iCAFs (MMP1⁺/IDO1⁺, inflammatory), mCAFs (COL11A1⁺, matrix-producing), and myoCAFs (αSMA⁺/TAGLN⁺, contractile). Protein-level definitions differ from transcript-based CAF classifications, with ACTA2 RNA poorly correlating with αSMA protein in situ.
Aggressive tumor subtypes show myoCAF enrichment: Infiltrative/sclerosing BCC and poorly differentiated SCC have larger stromal areas, increased ECM deposition, and higher myoCAF densities compared with nodular BCC and well-differentiated SCC, suggesting a shift from matrix-producing to contractile stromal programs.
CAF subtypes define distinct immune niches: iCAFs correlate positively with multiple immune lineages and localize to lymphocyte-rich, inflamed neighborhoods with elevated activation and checkpoint markers (PD-1, LAG-3, Granzyme B). By contrast, myoCAFs occupy fibroblast-dense, immune-poor regions and are negatively correlated with immune cell densities, indicating that CAF identity, not fibroblast abundance alone, determines local immune contexture.
Spatial organization differs from compositional changes: Subtype differences in cell-cell neighborhood enrichment do not simply mirror cell abundance changes. For example, myoCAFs are more abundant in infiltrative (INF) BCC, yet many myoCAF-associated heterotypic neighborhoods are not stronger in INF BCC, indicating spatial rewiring beyond compositional shifts.
Functional validation links myoCAFs to contractility and mechanotransduction: Patient-derived CAFs from aggressive tumors show enhanced collagen-gel contraction, and MCAM⁺ (myoCAF-like) cells correlate with contractile capacity. Stromal nuclear YAP/TAZ is elevated in aggressive subtypes, and single-cell transcriptomics shows that mechanotransduction-input programs are enriched in RGS5⁺/myoCAF-like cells, whereas ECM/CAF matrix programs are enriched in mCAFs.

Why should we care?

The main takeaway of this work is that the "stroma" is not a uniform support structure but a heterogeneous ecosystem where different fibroblast populations create either immune-permissive or immune-exclusionary niches. This framework could guide future biomarker development and combination therapies (e.g., targeting myoCAFs to relieve immune exclusion). But the work is hypothesis-generating, not clinically actionable. Confirmation in larger, prospective cohorts with outcome data and functional perturbation experiments (e.g., CAF subset depletion in mouse models) is required before CAF subtype composition can be used to stratify patients.

Multiplexed perturbation enables scalable pooled screens

Oberlin et al. Nature Methods (2026). 10.1038/s41592-026-03095-w

The paper in one sentence

Delivering multiple CRISPR guide RNAs per cell (multiplicity of infection, MOI, of 2.5–10) can maintain or improve pooled CRISPR interference screen performance while reducing required cell numbers by up to 10-fold, enabling genome-wide screens with as few as half a million sorted cells.

Summary

Pooled CRISPR screens typically require large cell numbers (50–100 million) to maintain adequate sgRNA representation (200–500× coverage), which is prohibitive for primary cells, in vivo models, or resource-limited settings. The authors investigate whether co‑delivering multiple sgRNAs per cell via high MOI lentiviral transduction can compress screens while preserving performance. Using K‑562 CRISPRi cells (dCas9-Zim3), they first show that infection efficiency plateaus at ~30 copies per cell and that mean fluorescence intensity (MFI) of a reporter correlates linearly with copy number, providing a simple proxy for MOI. They demonstrate simultaneous knockdown of up to five surface markers at MOI 5, with ~75% of cells carrying ≥5 sgRNAs repressing all five markers. Using a compact 2,000‑sgRNA library targeting epigenetic regulators, they benchmark essential gene and drug‑resistance screens across MOIs (0.3–30) and cell numbers. In the constant cell number condition (250× coverage), MOI 2.5–10 yields higher area under the curve (AUC) for essential gene identification than MOI 0.3. In the constant sgRNA condition (reducing cells proportionally to MOI), MOI 2.5–5 compensates for 2.5–5‑fold fewer cells, maintaining performance that drops sharply in low‑MOI controls. For drug‑tolerance (imatinib resistance), moderate MOI (2.5–5) improves hit detection, with true‑positive rates exceeding 0.75. Finally, they apply optimized conditions (MOI 5) to a genome‑wide CRISPRi screen for regulators of ICAM‑1 (CD54) using as few as 0.5 million sorted cells (25× coverage), identifying known and novel regulators (e.g., TRAF6, AMBRA1, PTPN1/2). Accuracy and true‑positive rates at MOI 5 with 25× coverage match or exceed those of standard low‑MOI screens with 250× coverage.

Personal highlights

Simple MOI quantification via fluorescence: MFI of a reporter (eGFP, mOrange2, mBFP2) correlates linearly with sgRNA copy number (validated by digital PCR) up to the integration plateau (~30 copies). A low‑MOI sample (single insertion) serves as a reference to extrapolate MOI, eliminating the need for specialized equipment.
Moderate MOI (2.5–10) improves essential gene detection: In constant cell number screens (250× coverage), MOI 2.5–10 achieves higher ROC‑AUC for identifying cell‑essential genes than standard MOI 0.3, with true‑positive rates increasing by up to 11%. Higher MOI (20–30) causes performance decline, likely due to sgRNA collisions or toxicity.
Multiplexing compensates for reduced cell numbers: Screens at MOI 2.5–5 can reduce cell numbers by 2.5–5‑fold (e.g., from 250× to 50× coverage) without loss of performance, whereas low‑MOI controls show progressive AUC decline. This is most pronounced for low‑abundance sgRNAs, indicating that multiplexing buffers against bottleneck losses.
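
The arithmetic behind these highlights is simple enough to sketch. The snippet below extrapolates MOI from reporter fluorescence against a single-insertion reference and shows why MOI 5 at 50× cell coverage carries the same number of sgRNA integrations per guide as one guide per cell at 250×. The function names, the background-subtraction step, and the MFI values are assumptions for illustration; only the 2,000-sgRNA library size and the coverage figures come from the summary above.

```python
def estimate_moi(mfi_sample, mfi_single_copy, mfi_background=0.0):
    """Extrapolate mean sgRNA copies per cell from reporter MFI.

    Relies on the linear MFI-to-copy-number relation described above, so it
    should only be trusted below the integration plateau (~30 copies).
    Background subtraction is an added assumption.
    """
    return (mfi_sample - mfi_background) / (mfi_single_copy - mfi_background)


def cells_needed(library_size, cell_coverage):
    """Cells required for a given number of cells per library element."""
    return library_size * cell_coverage


def integrations_per_guide(n_cells, moi, library_size):
    """Expected sgRNA integrations per library element at a given MOI."""
    return n_cells * moi / library_size


LIBRARY_SIZE = 2000  # compact library size from the summary

# Made-up MFI readings: sample, single-copy reference, untransduced background.
moi = estimate_moi(mfi_sample=5200.0, mfi_single_copy=1100.0, mfi_background=100.0)

cells_250x = cells_needed(LIBRARY_SIZE, 250)  # standard coverage
cells_50x = cells_needed(LIBRARY_SIZE, 50)    # 5-fold fewer cells
per_guide_at_moi5 = integrations_per_guide(cells_50x, 5, LIBRARY_SIZE)
```

This ignores guide-guide interactions and uneven library representation, which is exactly where the authors' advice about pilot testing comes in.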

Why should we care?

The key takeaway is that multiplexing sgRNAs is a simple, cost‑effective strategy to compress CRISPR screens without sacrificing data quality. But it is not a magic bullet: careful titration, validation of infection efficiency, and awareness of potential combinatorial effects are essential. The method is best suited for genome‑wide screens in robust cell lines where the goal is to identify strong hits (e.g., essential genes or drug‑resistance drivers), not for subtle or highly context‑dependent phenotypes where single‑guide precision is critical. As the authors note, pilot testing is strongly advised. Nevertheless, the framework is a valuable addition to the CRISPR toolbox, particularly for labs with limited cell numbers or budgets.

Geometry-First Generative Spatial Single-Cell Reconstruction

Azim et al. arXiv (2026). 10.1145/3770855.3818141

The paper in one sentence

GEARS reconstructs continuous 2D spatial coordinates for dissociated single cells by learning to generate intrinsic tissue geometry from expression alone, using spatial transcriptomics as pose-invariant geometric supervision without requiring cell-type labels, histological images, or cell-to-spot assignment.

Summary

Single-cell RNA sequencing (scRNA-seq) provides deep transcriptomic profiles but destroys spatial context, while spatial transcriptomics (ST) preserves tissue structure at lower resolution with fewer spots. Most existing integration methods either deconvolve spot mixtures or map single cells onto the measured ST lattice, tying reconstructions to a fixed grid and slide-specific coordinate system—a limitation that becomes severe when scRNA-seq and ST come from unpaired samples (different individuals or tissue sections). The authors propose GEARS, a geometry-first framework that treats spatial reconstruction as generating a continuous intrinsic geometry for dissociated cells, guided by ST but not constrained to its absolute coordinates. GEARS first trains a domain-invariant encoder (combining VICReg and adversarial domain alignment) to align ST spot profiles and scRNA-seq profiles in a shared embedding space. Then, from ST slides, it samples many overlapping local minisets and trains a permutation-equivariant set model (Set Transformer) with an EDM-preconditioned residual diffusion refiner to predict local geometries under pose-invariant supervision derived from Gram matrices (which encode intrinsic distances, invariant to rotation/reflection). At inference, GEARS encodes all scRNA-seq cells, samples overlapping patches, generates local geometries, converts them to pairwise distance measurements, stitches distances via reliability-weighted aggregation, and solves a global distance-geometry problem to obtain canonical 2D coordinates. Extensive benchmarking on a seqFISH mouse embryo atlas (where single-cell ground-truth coordinates exist) and a human squamous cell carcinoma dataset (cross-slide generalization) shows that GEARS improves global distance preservation, local neighborhood fidelity, and spatial distribution alignment over nine baselines, including Tangram, STEM, scSpace, and CytoSPACE. 
Ablation studies confirm that residual diffusion refinement substantially corrects global scale and distribution mismatches. GEARS also recovers unsupervised spatial domains in hSCC that align with cell-type annotations and validates that predicted pDC locations correspond to elevated expression of pDC markers (BST2, NRP1) on the reference ST slide.

Personal highlights

Geometry‑first, not spot‑lattice‑first: Unlike methods that force scRNA-seq cells onto measured ST spot coordinates, GEARS reconstructs an intrinsic continuous geometry for cells, using ST only as geometric supervision. This decouples reconstruction from slide‑specific coordinate frames and supports cross‑section generalization.
Pose‑invariant supervision via Gram matrices: Targets are derived from Gram matrices (VVᵀ) of centered local spot coordinates, which are invariant to rotation and reflection. This eliminates the need to align absolute orientations between different tissue sections.
Permutation‑equivariant generator with residual diffusion refinement: A Set Transformer backbone ensures that reordering the input cells only reorders the predicted coordinates accordingly, without changing them. An EDM‑preconditioned diffusion model refines coarse generator proposals by denoising residuals, substantially improving global distance calibration (Stress‑1 drops by 34% on hSCC, SWD by 46%).
Patchwise distance‑first inference for large datasets: Instead of predicting coordinates for all cells at once, GEARS samples overlapping patches, generates local geometries, extracts pairwise distances, and stitches them via a reliability‑weighted median. This scales to large scRNA-seq cohorts while maintaining geometric fidelity across a wide range of patch sizes.
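
The pose-invariance argument can be checked numerically in a few lines. The sketch below builds the Gram matrix of centered 2D coordinates, applies a reflection, rotation, and shift, and confirms the Gram matrix is unchanged; it also recovers squared pairwise distances from Gram entries via d²(i, j) = G[i][i] + G[j][j] − 2·G[i][j], which is what makes a distance-first stitching step possible. The toy coordinates and helper names are assumptions, and this is not the GEARS code.

```python
import math


def center(points):
    """Subtract the centroid so the result is translation-invariant."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]


def gram(points):
    """Gram matrix of centered coordinates: G[i][j] = v_i . v_j."""
    v = center(points)
    return [[a[0] * b[0] + a[1] * b[1] for b in v] for a in v]


def reflect_rotate_shift(points, angle, shift):
    """Apply a reflection (x -> -x), then a rotation, then a translation."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y in points:
        x = -x
        out.append((c * x - s * y + shift[0], s * x + c * y + shift[1]))
    return out


def sq_dist_from_gram(g, i, j):
    """Squared distance between points i and j, read off the Gram matrix."""
    return g[i][i] + g[j][j] - 2.0 * g[i][j]


# Toy "spots" and the same spots in a different pose.
spots = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (5.0, 5.0)]
moved = reflect_rotate_shift(spots, angle=1.1, shift=(10.0, -7.0))

g_ref, g_moved = gram(spots), gram(moved)
max_abs_diff = max(
    abs(a - b) for row_a, row_b in zip(g_ref, g_moved) for a, b in zip(row_a, row_b)
)
d2_spot1_spot2 = sq_dist_from_gram(g_ref, 1, 2)  # spots (3, 0) and (0, 4)
```

Because the supervision target never sees absolute coordinates, two slides cut at different orientations yield the same training signal for the same local arrangement of spots.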

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 25/05/26

Sebastiaan Vanuytven — Sun, 31 May 2026 09:29:41 GMT

This week’s reads focus on how increasingly large-scale datasets and sophisticated computational models are transforming our ability to study biology across space, time, and entire organisms. MouseMapper integrates tissue clearing, light-sheet microscopy, and foundation-model-based deep learning to map cellular changes across whole mouse bodies and uncovers surprising obesity-linked degeneration of facial sensory nerves. In parallel, novel spatial and multimodal approaches such as ALARMIST and MultiTME go beyond static snapshots by reconstructing multicellular communication programs and by combining unpaired spatial and single-cell datasets into cohesive representations of tissue biology. Other studies address the challenge of faithfully modelling complex biological systems. Humanised patient-derived xenografts retain surprisingly tumour-specific immune ecosystems regardless of the immune donor, opening new avenues to study immune evasion and therapeutic response. Meanwhile, a time-resolved multiomic atlas of breast cancer reveals that tumours and immune cells run on misaligned circadian schedules, creating temporal windows for potential immune escape. Finally, two papers challenge conventional wisdom on biological robustness and sensing. Cancer cells appear to employ layered “recursive” robustness mechanisms, using alternative splicing and paralog compensation to buffer against deleterious mutations, and homing pigeons may use iron-loaded liver macrophages rather than specialised sensory organs to navigate via Earth’s magnetic field under overcast skies.

Preprints/articles that I managed to read this week

A deep-learning framework reveals whole-body perturbations at cell level

Kaltenecker et al. Nature (2026). 10.1038/s41586-026-10535-2

The paper in one sentence

A deep-learning pipeline (MouseMapper) segments nerves, immune cells, and 31 organs across whole cleared mouse bodies, revealing obesity-induced structural degeneration of facial sensory nerves and systemic immune cell redistribution, with molecular signatures conserved in humans.

Summary

Studying systemic diseases such as obesity requires mapping perturbations across multiple organs simultaneously, but tools for whole-body cellular analysis have been lacking. The authors developed MouseMapper, an ensemble of foundation-model-based deep-learning algorithms that segment peripheral nerves, detect immune cells (Cd68+ macrophages), and map 31 organs and tissues across entire cleared mouse bodies imaged by light-sheet fluorescence microscopy. Using transgenic reporter mice (Uchl1-eGFP for nerves, Cd68-eGFP for macrophages) fed a high-fat diet for 16–18 weeks, they quantified obesity-induced changes. Unexpectedly, they identified structural degeneration of the infraorbital branch of the trigeminal nerve, with reductions in nerve endings, edges, and vertices of ~58–61% that correlated with functional sensory deficits in whisker stimulation tests. Spatial proteomics of the trigeminal ganglion revealed downregulation of axon-guidance and cytoskeletal pathways (e.g., SERPINA family proteins) and upregulation of complement and inflammation pathways. These molecular changes were recapitulated in post-mortem trigeminal ganglia from obese humans (BMI >30). Additionally, MouseMapper generated whole-body inflammation maps, showing tissue-specific shifts from small to large Cd68+ macrophage clusters in visceral adipose tissue, liver, and other organs, indicating heightened inflammatory states. The framework generalizes across imaging resolutions and antibody-labeling strategies without retraining.

Personal highlights

Whole-body segmentation at cellular resolution: MouseMapper segments peripheral nerves over centimetres, detects individual immune cells, and automatically delineates 31 organs/tissues, far exceeding previous methods (e.g., AIMOS segmented only 6 organs), enabling unbiased system-wide screening.
Obesity-induced degeneration of facial sensory nerve: Quantitative nerve graph analysis reveals that high-fat diet reduces infraorbital nerve endings by 60.7%, edges by 57.8%, and vertices by 57.6% without changing main nerve trunk thickness, pointing to impaired axonal branching rather than general degeneration.
Functional correlate of structural nerve changes: Obese mice exhibit significantly reduced whisker stimulation responses (mean score ~2.5 vs. ~5.5 in lean mice), linking structural trigeminal nerve pathology to sensory dysfunction.
Conserved proteomic signatures in mice and humans: Spatial proteomics of trigeminal ganglia identifies downregulation of SERPINA1/3 (anti-inflammatory protease inhibitors) and dysregulation of axon guidance, actin cytoskeleton, and complement pathways, changes mirrored in post-mortem human trigeminal ganglia from obese individuals.
Tissue-specific inflammation mapping: The framework quantifies shifts in macrophage cluster size distributions across 12 tissues; visceral adipose tissue and liver show a transition from small to medium/large clusters, while subcutaneous fat shows increased large clusters, revealing spatially resolved inflammatory remodeling.
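
For readers unfamiliar with nerve graph metrics, the sketch below shows how endings, edges, and vertices can be counted once a segmented nerve is represented as a graph, and how a percent reduction between conditions is computed. The toy edge lists are invented, and defining an ending as a vertex with exactly one connection is an assumption rather than the paper's stated definition.

```python
from collections import Counter


def graph_metrics(edges):
    """Count vertices, edges, and endings (degree-1 vertices) of a nerve graph.

    With this definition the trunk origin also counts as an ending.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {
        "vertices": len(degree),
        "edges": len(edges),
        "endings": sum(1 for d in degree.values() if d == 1),
    }


def percent_reduction(control, treated):
    """Reduction of `treated` relative to `control`, in percent."""
    return 100.0 * (control - treated) / control


# Invented toy graphs: a well-branched nerve and a pruned one.
lean_edges = [
    ("t0", "t1"), ("t1", "a"), ("t1", "b"),
    ("a", "a1"), ("a", "a2"), ("b", "b1"), ("b", "b2"),
]
obese_edges = [("t0", "t1"), ("t1", "a"), ("a", "a1")]

lean = graph_metrics(lean_edges)
obese = graph_metrics(obese_edges)
ending_loss = percent_reduction(lean["endings"], obese["endings"])
```

In the toy example the trunk edge `t0`–`t1` survives in both graphs while the terminal branches disappear, mirroring the reported pattern of lost branching with unchanged trunk thickness.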

Why should we care?

This study demonstrates the power of combining tissue clearing, light-sheet microscopy, and deep learning to map systemic disease at single-cell resolution across an entire mammalian body. The discovery that obesity damages a specific facial nerve (infraorbital branch of the trigeminal nerve) and impairs whisker sensation in mice, with analogous proteomic changes in obese humans, suggests that obesity may have previously unrecognized effects on sensory function. However, several caveats warrant attention. First, the Uchl1-eGFP reporter line labels only a subset of peripheral nerves; the observed structural changes may not represent all nerve types. Second, the functional whisker test measures gross sensory response but does not isolate mechanosensory versus pain pathways. Third, while the proteomic overlap between mouse and human trigeminal ganglia is intriguing, the human samples were from elderly donors (mean age ~85 years), and the observed changes could reflect age-related neuropathy confounded with obesity. Fourth, the computational pipeline requires massive data storage (up to 50 TB per mouse at 4× resolution) and substantial GPU resources, limiting accessibility for many labs.

Humanized patient-derived xenografts preserve tumour-specific immune microenvironments

Stueckmann et al. bioRxiv (2026). 10.64898/2026.05.15.724697

The paper in one sentence

Patient-derived xenografts grown in mice with a reconstituted human immune system (huNOG-EXL) retain key features of the original tumour’s immune cell composition in a tumour-intrinsic, donor-independent manner, though myeloid representation remains incomplete in some models.

Summary

Preclinical cancer models that faithfully reproduce the human tumour immune microenvironment are essential for studying immune evasion and testing immunotherapies. Humanized immune system (HIS) mice, immunodeficient mice engrafted with human CD34+ hematopoietic stem cells, can support patient-derived xenografts (PDXs), but whether these models recapitulate the immune composition of the parental patient tumour has not been systematically evaluated. The authors generated 82 huNOG-EXL mice (expressing human IL-3 and GM-CSF to support myeloid differentiation) using eight different cord blood HSC donors and implanted them with 15 primary tumours (ovarian, head and neck, renal) previously established as PDXs in NSG mice. Using high-dimensional CyTOF and imaging mass cytometry, they profiled immune populations in PDX tumours, spleen, bone marrow, and matched primary tumour samples. Key findings: (1) Immune composition within PDX tumours is primarily driven by the engrafted tumour, not the HSC donor; (2) PDX tumours derived from the same primary sample cluster together across different donor backgrounds; (3) CD8+ T cell and macrophage phenotypic states are tumour-specific and reproducible; (4) Some distinctive features of primary tumours, such as γδ T cell enrichment in an ovarian cancer model and NK cell infiltration in a renal cancer model, are preserved in matched PDX tumours; (5) However, overall immune infiltration tends to be lower in PDX tumours than in primary tumours, and monocyte/macrophage representation is incomplete in certain models (e.g., HNSCC-5, OV-6). The authors also show that a patient-derived cell line xenograft (CDX) from a renal tumour generated an immune composition similar to its matched PDX. The study supports huNOG-EXL PDX models as a platform for studying tumour-intrinsic determinants of immune infiltration, while highlighting remaining limitations in myeloid cell fidelity.

Personal highlights

Tumour-intrinsic immune composition across HSC donors: PDX tumours derived from the same primary sample show highly similar immune profiles (median Pearson correlation 0.909) regardless of which human stem cell donor was used, whereas tumours from different primaries are much less similar (0.537). This establishes that the engrafted tumour, not donor variation, is the dominant determinant.
Preservation of rare immune features: An ovarian primary tumour (OV-2) with >20% γδ T cells gave rise to six PDX tumours across two HSC donors that all retained this enrichment. Similarly, a renal primary (RCC-3) with abundant NK cells produced eight PDX tumours, six of which showed NK cell infiltrates validated by imaging mass cytometry.
Reproducible T cell and macrophage states: CD8+ T cell exhaustion marker expression (PD-1, TIM-3) and macrophage subtype markers (FOLR2, TREM2, PD-L1) were consistent across PDX replicates of the same tumour, indicating that not just cell abundance but also functional states are tumour-driven.
Cell line xenografts resemble matched PDXs: A patient-derived cell line from the RCC-2 tumour (RCC-2-CL) generated immune infiltrates (monocytes/macrophages, CD4+ T cells) highly similar to those in five RCC-2 PDX tumours, suggesting that established cell lines may be usable for certain immune profiling studies.
Systemic immune composition reflects both HSC donor and tumour: In bone marrow and spleen, samples from mice with the same HSC donor were more similar than those from different donors. However, blood chimerism did not correlate with subsequent tumour immune infiltration, reinforcing that intratumoural immune composition is tumour-determined.
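
The within-primary versus between-primary comparison from the first highlight can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the composition vectors below are invented, the five cell-type fractions are hypothetical, and only the tumour names are borrowed from the text.

```python
import math
from itertools import combinations
from statistics import median

# Hypothetical immune-composition vectors (fractions of five cell types per PDX tumour).
# Keys are (primary tumour, HSC donor); every number is invented for illustration.
profiles = {
    ("OV-2", "donor_A"): [0.30, 0.25, 0.20, 0.15, 0.10],
    ("OV-2", "donor_B"): [0.28, 0.27, 0.18, 0.17, 0.10],
    ("RCC-3", "donor_A"): [0.10, 0.15, 0.20, 0.25, 0.30],
    ("RCC-3", "donor_B"): [0.12, 0.13, 0.22, 0.23, 0.30],
}

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def median_pairwise_correlation(profiles, same_primary):
    """Median Pearson r over tumour pairs that do (or do not) share a primary."""
    values = []
    for (key_a, vec_a), (key_b, vec_b) in combinations(profiles.items(), 2):
        if (key_a[0] == key_b[0]) == same_primary:
            values.append(pearson(vec_a, vec_b))
    return median(values)

within = median_pairwise_correlation(profiles, same_primary=True)
between = median_pairwise_correlation(profiles, same_primary=False)
print(f"within-primary median r = {within:.3f}")
print(f"between-primary median r = {between:.3f}")
```

A within-primary median well above the between-primary median is the pattern the paper reports (0.909 versus 0.537); the toy numbers here only reproduce the direction of that effect.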

Why should we care?

The development of immunotherapies, drugs that help the immune system kill cancer, has been hampered by a lack of animal models that accurately reflect how human tumours interact with a human immune system. Traditional mouse models either lack a human immune system (so they cannot test human-specific therapies) or use human immune cells but lose the complex, patient-specific features of real tumours. This study systematically tests whether a popular humanized mouse model (huNOG-EXL with PDXs) reproduces the immune cell types found in the original patient tumours. The encouraging finding is that several tumour-specific immune patterns—including unusual enrichments of γδ T cells or NK cells—were preserved across multiple mice, even when the human stem cell donor varied. This suggests that the model can capture biology driven by the tumour itself rather than by individual immune system variation.

Decoding multicellular communication motifs from Spatial Transcriptomics with ALARMIST

Fan et al. bioRxiv (2026). 10.64898/2026.05.21.726986

The paper in one sentence

ALARMIST uses Bayesian Poisson tensor factorization to discover recurring multicellular communication motifs, coordinated sets of cell types and ligand-receptor interactions, from spatial transcriptomics data, revealing higher-order signaling programs missed by pairwise LRI analysis.

Summary

Cell-cell communication in tissues involves multiple cell types sending and receiving multiple signals simultaneously, yet existing computational methods analyze ligand-receptor interactions (LRIs) in isolation, losing the coordinated, higher-order structure of tissue microenvironments. The authors introduce ALARMIST, a probabilistic framework that decomposes a patch-by-LRI count matrix (constructed from spatial neighborhoods) into latent communication motifs. Each motif captures a recurrent pattern: which cell types signal to which others, which LRIs mediate those signals, and how these patterns are spatially organized. The framework uses Bayesian Poisson tensor factorization (BPTF) to handle extreme sparsity, projects motif activities to single-cell resolution, and links motif activation to downstream gene expression changes via Poisson GLMs (excluding ligand/receptor genes to avoid circularity). Benchmarking on semi-synthetic data (generated from scRNA-seq reference with ground-truth motifs) shows ALARMIST recovers cell-type interaction networks with ~0.8 cosine similarity, substantially outperforming COMPOTES, Tensor-cell2cell, and NICHES (<0.6). It also achieves higher F1 scores for motif-associated gene recovery and better spatial reconstruction (Lee’s L) than baseline methods. Cross-platform validation on matched Xenium 5K and CosMx 6K sections (COAD, OV, HCC) shows motif LRI compositions and spatial niches are concordant, with discordance attributable to platform differences in cell-type-specific LRI detection rather than fundamental method instability. Applied to lung adenocarcinoma (LUAD) progression (AIS to invasive), ALARMIST identifies a "healthy vasculature" motif (fibroblast-epithelial support signals, HGF→MET, FGF7→FGFR2) and a "tumor vasculature" motif (VEGFA→KDR, DLL4→NOTCH1). 
pDCs in healthy vasculature regions upregulate IRF7, TLR7/9, and type I interferon response genes, while tumor vasculature pDCs show downregulation, implicating pDCs as drivers of early inflammatory niches. In glioblastoma (GBM) low-grade to high-grade transformation, ALARMIST identifies an mGAM (malignancy-associated glioma macrophage)-centered hub-and-spoke motif, with GRN→SORT1 signaling from mGAMs to MES-like tumor cells and reciprocal ANXA1→FPR1 signaling.

Personal highlights

Higher-order communication motifs, not pairwise LRIs: ALARMIST jointly models all LRI co-occurrence within spatial patches using Bayesian Poisson factorization, capturing recurring multicellular signaling programs (e.g., “healthy vasculature” vs “tumor vasculature”) rather than ranking individual interactions, addressing a fundamental limitation of existing methods.
Superior performance on synthetic benchmarks: With ground-truth motifs embedded in scRNA-seq reference data (10 motifs, 16 cell types), ALARMIST recovers cell-type interaction networks with median cosine similarity ~0.8 versus <0.6 for NICHES, <0.5 for Tensor-cell2cell, and <0.4 for COMPOTES. It also achieves higher F1 scores for motif-associated gene recovery and better spatial reconstruction (Lee’s L) across varying cell densities.
Cross-platform concordance across Xenium and CosMx: On matched consecutive sections (COAD, OV, HCC), ALARMIST motifs show cosine similarities ~0.7 between platforms, and spatial niche proportions correlate strongly (Pearson r up to ~0.85). Discordance arises from cell-type-level LRI detection differences, not algorithmic instability, providing practical guidance for multi-platform studies.
pDCs as drivers of early lung cancer inflammation: In AIS-to-LUAD progression, ALARMIST reveals that “healthy vasculature” motif-active pDCs upregulate IRF7, TLR7/9, and type I interferon response genes, while T cells and macrophages show elevated IFN-γ response and glycolysis. Tumor vasculature motif-active pDCs show downregulated IRF7 and METTL3, with suppressed effector functions. This suggests chronic interferon stimulation may enable immune escape even before tumor invasion, a hypothesis the authors link to ICB resistance mechanisms.
mGAM-centered motif and GRN-SORT1 axis in glioma transformation: In low-grade glioma with high-grade regions, ALARMIST identifies a macrophage-centered hub-and-spoke motif where mGAMs (malignancy-associated glioma macrophages) engage MES-like tumor cells via GRN→SORT1 (macrophage→tumor) and receive ANXA1→FPR1 signals (tumor→macrophage). A 20-gene signature from motif-active mGAMs significantly stratifies TCGA LGG patient survival (log-rank p < 0.05), linking a spatial communication program to clinical outcomes.
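
To make the patch-by-LRI idea concrete, here is a rough sketch under loud assumptions. It bins invented interaction events into 50 µm square patches and factorizes the resulting count matrix with plain multiplicative-update NMF under a Poisson (KL) objective. That is a non-Bayesian, matrix-only stand-in for the Bayesian Poisson tensor factorization the paper uses, and the way events are counted per patch is my assumption, not the authors' definition.

```python
import math
import random
from collections import Counter

# Hypothetical toy events: (x_um, y_um, ligand->receptor pair). Coordinates are invented.
events = [
    (10, 12, "HGF->MET"), (30, 40, "HGF->MET"), (20, 22, "FGF7->FGFR2"),
    (60, 15, "VEGFA->KDR"), (75, 30, "DLL4->NOTCH1"), (90, 45, "VEGFA->KDR"),
    (15, 70, "HGF->MET"), (35, 95, "FGF7->FGFR2"),
    (80, 60, "VEGFA->KDR"), (65, 85, "DLL4->NOTCH1"),
]
PATCH_UM = 50  # patch edge length in micrometres

def patch_by_lri_counts(events, patch_um):
    """Bin events into square patches and count each LRI per patch."""
    counts = Counter()
    for x, y, lri in events:
        counts[((x // patch_um, y // patch_um), lri)] += 1
    patches = sorted({patch for patch, _ in counts})
    lris = sorted({lri for _, lri in counts})
    matrix = [[counts[(patch, lri)] for lri in lris] for patch in patches]
    return patches, lris, matrix

def kl_divergence(V, W, H, eps=1e-9):
    """Generalized KL divergence; the negative Poisson log-likelihood up to a constant."""
    total = 0.0
    for i, row in enumerate(V):
        for j, v in enumerate(row):
            wh = sum(W[i][k] * H[k][j] for k in range(len(H))) + eps
            total += (v * math.log(v / wh) if v > 0 else 0.0) - v + wh
    return total

def poisson_nmf(V, n_motifs, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative updates for V ~ Poisson(W @ H), with W and H non-negative."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(n_motifs)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(n_motifs)]
    history = [kl_divergence(V, W, H)]

    def ratio():
        # Element-wise V / (W @ H) using the current W and H.
        return [[V[i][j] / (sum(W[i][k] * H[k][j] for k in range(n_motifs)) + eps)
                 for j in range(m)] for i in range(n)]

    for _ in range(n_iter):
        R = ratio()
        W = [[W[i][k] * sum(H[k][j] * R[i][j] for j in range(m)) / (sum(H[k]) + eps)
              for k in range(n_motifs)] for i in range(n)]
        R = ratio()
        H = [[H[k][j] * sum(W[i][k] * R[i][j] for i in range(n))
              / (sum(W[i][k] for i in range(n)) + eps)
              for j in range(m)] for k in range(n_motifs)]
        history.append(kl_divergence(V, W, H))
    return W, H, history

patches, lris, V = patch_by_lri_counts(events, PATCH_UM)
W, H, history = poisson_nmf(V, n_motifs=2)
for k, loadings in enumerate(H):
    top_lri = max(zip(loadings, lris))[1]
    print(f"motif {k}: top LRI = {top_lri}")
```

Rows of `W` play the role of per-patch motif activities and rows of `H` the LRI composition of each motif. The real method additionally resolves sender and receiver cell types, which this two-dimensional sketch leaves out.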

Why should we care?

ALARMIST addresses a genuine gap: tumors are not just collections of pairwise cell interactions but complex networks where multiple cell types coordinate through multiple signals. By extracting "motifs", repeating patterns of multicellular signaling, the method distills spatial transcriptomics data into interpretable programs that can be mapped across disease stages. This is conceptually powerful. However, several limitations warrant caution. First, ALARMIST assumes mRNA expression correlates with functional protein signaling, ignoring post-transcriptional regulation, secretion dynamics, and receptor internalization. Second, the patch size (default 50 µm) is arbitrary and user-tuned; too large mixes distinct microenvironments, too small yields sparse counts. The authors acknowledge this but provide no data-driven default selection. Third, the binary classification of cells into "motif-active" via GMM thresholding may oversimplify continuous signaling gradients; the authors retain continuous loadings for impact analysis but threshold for spatial maps. Fourth, the cross-platform validation shows non-trivial discordance (e.g., 5–8 CosMx motifs poorly aligned per cancer type), attributed to cell-type LRI detection differences. This also implies that motif discovery is sensitive to platform-specific biases in gene detection and cell typing, so users switching platforms may obtain different biological conclusions. Fifth, the biological findings (pDCs driving early inflammation, and GRN-SORT1 signaling in glioma) are intriguing but entirely hypothesis-generating; no experimental validation (e.g., pDC depletion, GRN neutralization) is provided. The survival signature, although statistically significant when tested in independent TCGA data, was derived from the same data used to identify the motif.

Cycle-consistent deep generative modeling unifies cellular states across unpaired spatial and single-cell modalities

Zhang et al. bioRxiv (2026). 10.64898/2026.05.25.727736

The paper in one sentence

MultiTME uses cycle-consistent deep generative modeling to integrate unpaired spatial and single-cell datasets (e.g., Xenium, CODEX, scRNA-seq) into a shared latent space, enabling cross-modal cell typing, transcriptome panel completion, and correction of platform-specific technical biases without requiring paired measurements.

Summary

The authors present MultiTME, a variational autoencoder framework that learns a shared latent representation across unpaired modalities. Key innovations include: (1) modality-specific projection layers that map heterogeneous feature spaces (e.g., 50 proteins vs 5,000 genes) to a common intermediate dimension; (2) a shared encoder with no modality identifiers, forcing the latent space to capture biological state rather than platform artifacts; (3) cycle consistency losses that enforce bidirectional translation between modalities at both latent and observation levels, aligning distributions without paired data; (4) a spatial regularizer that encourages cells of the same type in local tissue neighborhoods to have similar latent representations; and (5) optional semi-supervision using marker genes or expert annotations. Benchmarking on a human tonsil dataset (scRNA-seq + CODEX) shows MultiTME achieves 94.7% cell typing accuracy, outperforming MaxFuse (89.1%), Celesta (70.9%), and Astir (73.7%). On colorectal cancer data (scRNA-seq + Xenium), MultiTME imputes held-out genes with median Pearson correlation ~0.5, significantly better than ENVI, Harmony, and StabMap (<0.3). The imputed transcriptome reveals a spatially organized proliferative–invasive tumor axis not visible from Xenium alone. On serial sections of Visium HD (spatially resolved bulk transcriptomics) and CODEX (single-cell proteomics), MultiTME assigns whole-transcriptome profiles to individual CODEX cells, achieving per-gene correlations ~0.7 with ground-truth Visium HD, compared to ~0.1 for iStar. Finally, across five cancer types, MultiTME translates CosMx measurements to match Xenium, correcting platform-specific background biases and improving cross-platform concordance (R² increases from ~0.42 to near 1.0 after translation).

Personal highlights

Cycle consistency for unpaired multimodal integration: Enforces bidirectional translation between modalities without requiring paired cells or shared features. This allows MultiTME to align dissociated scRNA-seq with spatially resolved CODEX or Xenium data, where no cell-level correspondence exists.
Spatially regularized latent space: Uses k-nearest neighbors in physical tissue coordinates weighted by cell-type probabilities to encourage that neighboring cells of the same type have similar latent representations. This preserves spatial organization in the integrated embedding, a feature absent from most single-cell integration methods.
State-of-the-art cross-modal cell typing: On tonsil scRNA-seq + CODEX, MultiTME achieves 94.7% accuracy in transferring scRNA-seq annotations to spatial proteomic cells, substantially outperforming MaxFuse (89.1%) and proteomic-only classifiers (70–74%). The confusion matrix shows reduced misclassification among closely related lymphocyte subsets (CD4 vs CD8 T cells, germinal-center vs proliferating B cells).
Whole-transcriptome panel completion and spatial super-resolution: MultiTME imputes missing genes in Xenium panels (median per-gene Pearson r ~0.5) and assigns full transcriptomes to CODEX cells from adjacent Visium HD sections, recovering spatial expression patterns of EPCAM and MUC2 at single-cell resolution, sharper than Visium HD bins and far better than the iStar baseline.
Platform bias correction across CosMx and Xenium: On five cancer types, MultiTME translation from CosMx to Xenium dramatically improves cross-platform gene expression concordance (slope from 0.42 to near 1.0, R² increases substantially). The model generalizes to held-out fields of view and, to a lesser extent, to unseen cancer types, suggesting it learns transferable technical correction.
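
The cycle-consistency idea from the first highlight can be shown with a toy penalty. This is a deliberately simplified sketch: the two "translators" are hypothetical 2×2 linear maps rather than the neural networks MultiTME uses, and all numbers are invented. The point is only that each cell is compared with its own round trip, so no pairing between modalities is needed.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def squared_error(u, v):
    """Sum of squared differences between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def cycle_loss(a_to_b, b_to_a, cells_a, cells_b):
    """Mean round-trip error for A -> B -> A plus B -> A -> B."""
    loss_a = sum(
        squared_error(matvec(b_to_a, matvec(a_to_b, x)), x) for x in cells_a
    ) / len(cells_a)
    loss_b = sum(
        squared_error(matvec(a_to_b, matvec(b_to_a, y)), y) for y in cells_b
    ) / len(cells_b)
    return loss_a + loss_b

# Hypothetical translators between two 2-feature modalities.
a_to_b = [[2.0, 0.0], [0.0, 0.5]]
good_b_to_a = [[0.5, 0.0], [0.0, 2.0]]  # exact inverse of a_to_b
bad_b_to_a = [[1.0, 0.0], [0.0, 1.0]]   # ignores the rescaling

# Unpaired cells: the two lists have different lengths and no shared identity.
cells_a = [[1.0, 2.0], [3.0, 1.0]]
cells_b = [[2.0, 1.0], [0.5, 4.0], [1.0, 1.0]]

consistent = cycle_loss(a_to_b, good_b_to_a, cells_a, cells_b)
inconsistent = cycle_loss(a_to_b, bad_b_to_a, cells_a, cells_b)
print(consistent, inconsistent)
```

A zero cycle loss only says the two maps invert each other. It does not by itself say the shared space is biologically meaningful, which is why the model pairs it with the other constraints listed in the summary.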

Why should we care?

MultiTME addresses a fundamental problem: biological measurements are always incomplete and biased by the technology used. A cell’s transcriptome measured by Xenium differs systematically from the same cell measured by CosMx; a protein panel captures only a fraction of functional state; scRNA-seq loses all spatial context. MultiTME’s core assumption, that there exists a shared underlying biological state that generates all these modality-specific observations, is reasonable but unprovable. The cycle consistency mechanism is elegant: if translating from modality A to B and back returns the original, then the translation must preserve biologically relevant information. However, this does not guarantee that the latent space captures true biology rather than a technically convenient but biologically arbitrary alignment. Several limitations temper enthusiasm. First, the model requires either expert annotations or high-confidence marker-based pseudo-labels to anchor cell types. In datasets where marker specificity is poor or cell states are not well captured by known markers, the semi-supervised regularization may propagate errors. Second, the spatial regularizer assumes that consecutive tissue sections preserve local cell-type composition, which holds approximately but ignores tissue deformation, cell migration, and differences in sectioning plane. Third, the panel completion results, while statistically significant, show median correlations around 0.5, meaningful for pathway-level analyses but insufficient for confident single-gene conclusions, especially for low-abundance transcripts. The authors demonstrate that aggregated pathway scores are better preserved than individual genes, which is the appropriate use case. Fourth, the platform bias correction is impressive, but the leave-one-disease-out generalization drops noticeably, indicating that some platform-specific biases are tissue-dependent; a universal “CosMx-to-Xenium” transformer does not yet exist.

Circadian misalignment underlies immune escape in breast cancer

Liang et al. bioRxiv (2026). 10.64898/2026.05.26.726543

The paper in one sentence

Time-resolved single-nucleus multiomic mapping of the breast cancer tumor microenvironment reveals that cancer epithelial cells and immune cells are rhythmic but desynchronized across the circadian cycle, creating temporal gaps in antigen presentation, T cell activation, and checkpoint signaling that promote immune evasion.

Summary

Circadian rhythms regulate cellular processes, but how they are coordinated across cell types within the tumor microenvironment (TME) has remained largely unexplored. The authors performed snRNA-seq and snATAC-seq on 4T1 mouse triple-negative breast cancer (TNBC) tumors collected at four circadian time points (CT4, CT10, CT16, CT22), generating a time-resolved multiomic atlas of 101,321 nuclei spanning 14 cell types. They identified 6,224 circadian genes across the TME, organized into 20 functional modules covering cell cycle, immune activation, metabolism, and extracellular matrix remodeling. Cancer epithelial cells exhibited a striking temporal switch: a proliferative state (high cell cycle, DNA repair) peaked around the early daytime (CT0–3), while an inflammatory state (high antigen presentation, interferon response) peaked at night. In contrast, CD4⁺ and CD8⁺ T cells and macrophages showed peak activation and effector programs during the nighttime (murine active phase). This created three axes of circadian misalignment: (1) tumor proliferation decoupled from immune activation; (2) antigen presentation (peaking at night) out of phase with T cell recognition (peaking during the day); and (3) PD-1 expression in T cells (peak at night) asynchronous with PD-L1 expression in cancer cells and macrophages (peak during the day). Within T cells, activation and exhaustion programs overlapped temporally, potentially promoting dysfunction. Using CYCLOPS 2.0 to infer circadian phase from human TNBC scRNA-seq data, the authors found conserved phase relationships and similar misalignment patterns, with immune activation peaking during the inferred daytime (active phase for diurnal humans). The study proposes that circadian desynchronization across TME compartments is a previously underappreciated mechanism of tumor immune evasion and suggests that timing of immunotherapy may influence efficacy.

Personal highlights

Comprehensive circadian atlas of the TME: Time-resolved single-nucleus multiomics across 14 cell types reveals 6,224 rhythmic genes organized into 20 functional modules, including cell-type-specific and shared circadian programs in cancer cells, immune populations, and stromal cells.
Cancer epithelial cells oscillate between proliferative and inflammatory states: A time-of-day–dependent shift from a proliferative state (peaking at early daytime, CT0–3) to an inflammatory state (peaking at night, CT20–24), characterized by differential expression of cell cycle vs. antigen presentation and interferon-response genes. Both state composition and intrinsic subtype-specific rhythms contribute to this oscillation.
Three axes of circadian misalignment: (i) Tumor proliferation peaks opposite to T cell/macrophage activation; (ii) Antigen presentation (cancer and myeloid cells) peaks at night, while T cell recognition peaks during the day; (iii) PD-1 expression in T cells peaks at night, whereas PD-L1 in cancer/myeloid cells peaks during the day, creating prolonged checkpoint signaling across the cycle.
Temporal overlap of T cell activation and exhaustion: In CD4⁺ T cells, activation/costimulation genes peak alongside exhaustion markers (Ctla4, Tox) during the daytime, suggesting that circadian-driven activation may simultaneously engage inhibitory circuits, constraining effective antitumor responses.
Conserved circadian architecture in human TNBC: Using CYCLOPS 2.0 to infer phase from single-cell data, the authors demonstrate preserved phase relationships among immune populations and similar misalignment between antigen presentation and T cell recognition, supporting translational relevance for chronotherapy.
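
To get a feel for what "out of phase" means with only four sampling times, here is a small cosinor sketch. It is an assumption-laden illustration rather than the authors' rhythmicity analysis: the two expression profiles are invented, the gene labels are hypothetical, and the closed-form fit relies on the samples being equally spaced over one 24-hour period (which CT4, CT10, CT16, CT22 are).

```python
import math

PERIOD_H = 24.0
TIMES = [4, 10, 16, 22]  # circadian time points sampled in the study

def cosinor_fit(values, times=TIMES, period=PERIOD_H):
    """Fit y = mesor + amplitude * cos(w * (t - peak_time)).

    Closed form that is valid only for equally spaced samples covering one period.
    """
    n = len(values)
    w = 2 * math.pi / period
    mesor = sum(values) / n
    a = 2 / n * sum(y * math.cos(w * t) for y, t in zip(values, times))
    b = 2 / n * sum(y * math.sin(w * t) for y, t in zip(values, times))
    amplitude = math.hypot(a, b)
    peak_time = (math.atan2(b, a) / w) % period
    return mesor, amplitude, peak_time

def phase_gap(peak_1, peak_2, period=PERIOD_H):
    """Shortest circular distance between two peak times, in hours."""
    diff = abs(peak_1 - peak_2) % period
    return min(diff, period - diff)

# Hypothetical mean expression at CT4, CT10, CT16, CT22 (invented values).
night_peaking_gene = [5.0, 3.0, 5.0, 7.0]
day_peaking_gene = [5.0, 7.0, 5.0, 3.0]

_, amp_night, peak_night = cosinor_fit(night_peaking_gene)
_, amp_day, peak_day = cosinor_fit(day_peaking_gene)
print(f"peaks at CT{peak_night:.1f} and CT{peak_day:.1f}, "
      f"gap {phase_gap(peak_night, peak_day):.1f} h")
```

With four time points and three fitted parameters there is only one residual degree of freedom, so peak estimates from real data at this sampling density carry wide uncertainty.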

Why should we care?

The main takeaway of this study is that the time of day matters for tumor–immune interactions, and this has potential implications for when cancer immunotherapies are administered. But the field is still early: robust prospective clinical trials testing timed immunotherapy are needed before any practice change. The paper is a strong hypothesis-generating resource, not a therapeutic guideline. It also highlights a broader principle: biological systems are not static snapshots; time is a dimension that must be integrated into our understanding of disease mechanisms and treatment design.

Recursive mutational robustness in cancer through intra- and inter-genic compensation

Dandage et al. bioRxiv (2026). 10.64898/2026.05.26.727768

The paper in one sentence

Cancer cells tolerate deleterious mutations by upregulating alternatively spliced isoforms that skip the mutated region, a form of intra-genic compensation that often works together with paralog-mediated inter-genic compensation in a recursive manner, driven by nonsense-mediated decay and transcriptional adaptation.

Summary

Cancer cells harbor hundreds to thousands of somatic mutations, yet most are tolerated without catastrophic fitness loss. Known mechanisms include paralog buffering (inter-genic compensation), but many essential genes lack paralogs. This study explores whether alternative splicing provides intra-genic functional redundancy, where one isoform can compensate for another that carries a deleterious mutation. Using pan-cancer genomics and transcriptomics data (30 tumor types from TCGA, cancer cell lines from CCLE), the authors systematically identified cases where mutations occur in exons that can be skipped in alternative isoforms. They found that mutation-skipping isoforms are frequently expressed (median ~33% of isoforms per gene) and often upregulated in samples carrying the mutation compared to matched controls. They developed a "mutational robustness score" combining upregulation magnitude and compensation extent. Strong intra-genic robustness was associated with higher perturbation tolerance (higher deleterious mutation frequency) and was more context-specific than inter-genic robustness. Notably, genes exhibiting intra-genic robustness also showed stronger inter-genic compensation through paralogs, a "recursive" architecture. Mechanistically, they implicate nonsense-mediated mRNA decay (NMD) of mutation-bearing isoforms triggering self-transcriptional adaptation (self-TA), leading to upregulation of the gene's pre-mRNA and consequently relative increase of mutation-skipping isoforms. The study also shows that mutation-skipping isoform upregulation correlates with differential isoform usage in protein interaction partners, suggesting restorative rewiring of protein complexes. Tumor suppressor genes (TSGs) showed significantly lower robustness than non-TSGs, and high robustness in non-TSGs was associated with worse patient survival across most cancer types, indicating that cancer cells exploit this buffering to preserve pro-tumorigenic functions.

Personal highlights

Widespread expression of mutation-skipping isoforms: Across 30 cancer types, ~33% of a gene’s isoforms that skip deleterious mutations are expressed, comparable to the overall fraction of expressed isoforms (~39%). Exon skipping is the predominant splicing event enabling mutation avoidance.
Compensatory upregulation of skipping isoforms: In tumors carrying mutations that perturb specific isoforms, the mutation-skipping isoforms are frequently upregulated (162 significant events at FDR<0.1, far outnumbering downregulation). This trend holds in CRISPR perturbation data, where perturbation-skipping isoforms increase expression after sgRNA targeting.
Recursive robustness between isoform and paralog levels: Genes with strong intra-genic robustness (score >0.5) show significantly higher inter-genic robustness scores (p=3e-5), and vice versa. This suggests layered compensatory architectures where alternative splicing and gene duplication provide nested buffering against mutational insults.
Self-transcriptional adaptation as mechanism: Mutations that generate premature termination codons (frameshift, stop-gain) are most associated with intra-genic robustness. NMD of these isoforms correlates with increased pre-mRNA abundance (rs=0.29, p<1e-10), supporting a model where degradation intermediates trigger transcriptional upregulation of the same gene, selectively enriching mutation-skipping isoforms.
Tumor suppressors evade robustness, non-TSGs exploit it: TSGs have significantly lower intra-genic and inter-genic robustness scores compared to non-TSGs, suggesting their inactivation requires not just mutation but also circumvention of compensatory mechanisms. Conversely, high robustness in non-TSGs correlates with worse survival in most cancer types (hazard ratio >1 in 16/20 cancer types), indicating cancer cells depend on protecting non-essential but pro-growth functions.
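
The notion of a "mutation-skipping isoform" is easy to state in code. The sketch below is a minimal illustration under assumptions: the isoform names, exon coordinates, mutation position, and expression values are all hypothetical, and the simple fraction computed at the end is not the paper's mutational robustness score, whose exact formula is not given here.

```python
# Hypothetical isoform models: exons as (start, end), end-inclusive coordinates.
isoforms = {
    "iso_full": [(100, 200), (300, 400), (500, 600)],
    "iso_skip_2": [(100, 200), (500, 600)],
    "iso_short": [(100, 200), (300, 350)],
}

def covers(exons, position):
    """True if any exon of the isoform contains the position."""
    return any(start <= position <= end for start, end in exons)

def split_isoforms(isoforms, mutation_pos):
    """Separate isoforms that carry the mutated position from those that skip it."""
    affected = sorted(name for name, exons in isoforms.items() if covers(exons, mutation_pos))
    skipping = sorted(name for name, exons in isoforms.items() if not covers(exons, mutation_pos))
    return affected, skipping

def skipping_fraction(expression, skipping):
    """Share of total gene expression contributed by mutation-skipping isoforms."""
    total = sum(expression.values())
    return sum(expression[name] for name in skipping) / total if total else 0.0

affected, skipping = split_isoforms(isoforms, mutation_pos=380)

# Invented isoform expression in a wild-type and a mutated sample.
wild_type = {"iso_full": 80, "iso_skip_2": 15, "iso_short": 5}
mutated = {"iso_full": 40, "iso_skip_2": 50, "iso_short": 10}

print(affected, skipping)
print(skipping_fraction(wild_type, skipping), skipping_fraction(mutated, skipping))
```

A higher skipping fraction in the mutated sample is the kind of shift the paper interprets as compensatory upregulation. Note that a fraction can rise simply because the mutated isoform is degraded, which is one reason the authors also look at pre-mRNA abundance.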

Why should we care?

The main takeaway is that cancer's mutational tolerance is not merely passive; it involves active transcriptional adaptation through alternative splicing and paralog upregulation. This opens potential therapeutic angles: if cancer cells depend on specific mutation-skipping isoforms for survival, those isoforms could be targeted (e.g., with antisense oligonucleotides) to create synthetic lethality. However, the translational gap is large; the study is hypothesis-generating and would require extensive functional validation before any clinical application. As a conceptual advance, it reframes how we think about genetic robustness: not just as duplication-driven but as a layered, recursive property of gene architecture. But the mechanistic details remain speculative, and the clinical relevance awaits prospective testing.

Homing pigeon navigation relies on superparamagnetic macrophages under overcast conditions

Lisowski et al. Science (2026). 10.1126/science.ady2486

The paper in one sentence

Superparamagnetic iron-accumulating macrophages in the pigeon liver are required for magnetic orientation when solar cues are unavailable, suggesting a novel organ-level mechanism for magnetoreception.

Summary

For decades, the mechanisms by which birds sense Earth’s magnetic field have remained controversial, with competing hypotheses implicating cryptochromes in the retina, magnetite particles in the beak, or ion-channel perturbations in the vestibular system. This study reports a fourth, unexpected mechanism: macrophages in the pigeon liver accumulate ferric iron (Fe³⁺) within ferritin protein nanocages, rendering them superparamagnetic—a property previously described in mammalian splenic red pulp macrophages. Using vibrating sample magnetometry, the authors show that pigeon liver (and to a lesser extent spleen) exhibits a magnetic blocking temperature (T_B) of 12 K and hysteresis loops at low temperatures, consistent with superparamagnetic ferritin nanoparticles, whereas muscle, beak, and eye lack such signals. Histological staining (Prussian blue) reveals iron-positive cells exclusively in liver and spleen, colocalizing with MHC II⁺ macrophages. Magnetic column separation and single-cell RNA sequencing confirm these cells express macrophage signature genes (including Spi-c, involved in erythrocyte clearance) and phagocytose dextran. Clodronate liposome treatment eliminates magnetic liver cells, reduces Spi-c expression, and depletes MHC II⁺ macrophages, without affecting heterophils (avian neutrophils). Electron microscopy shows that iron-laden macrophages reside within 2 μm of unmyelinated nerve fibers in the hepatic portal triad, suggesting potential neuro-immune signaling. Crucially, the authors performed a behavioral experiment: 34 homing pigeons trained over a 19 km route were randomly assigned to clodronate or control liposomes. Under completely overcast conditions (no sun or polarized light cues), all 16 control pigeons homed within 70 minutes, whereas none of the 18 clodronate-treated birds returned that day, showing random spatial orientation. 
When cloud cover cleared and the sun became visible, clodronate-treated pigeons homed normally, indicating intact flight ability and reliance on solar cues when available. The authors propose that superparamagnetic hepatic macrophages collectively sense the geomagnetic field and transmit directional information to the brain via afferent vagal or sympathetic innervation.

Personal highlights

Superparamagnetic macrophages in the liver: Vibrating sample magnetometry identifies a magnetic blocking temperature (T_B = 12 K) and hysteresis loops in pigeon liver and spleen, but not in muscle, beak, or eye. Prussian blue staining confirms ferric iron accumulation exclusively in these organs, localized to cells with macrophage morphology and MHC II expression.
Macrophage depletion abolishes magnetic orientation: Under overcast skies (no solar cues), 0/18 clodronate-treated pigeons homed vs 16/16 controls. The same depleted birds homed normally when the sun emerged, proving the effect is specific to magnetic orientation, not general flight impairment.
Macrophage–nerve proximity: Electron microscopy reveals iron-positive macrophages within 2 μm of unmyelinated nerve bundles in the liver portal triad, providing an anatomical substrate for signal transmission to the brain via the autonomic nervous system.
Mechanistic link to iron metabolism: Pigeon liver macrophages express Spi-c and other genes involved in erythrocyte clearance and ferritin storage, similar to red pulp macrophages in mammals. The superparamagnetic property arises naturally from the ferritin nanocages that sequester iron from hemoglobin degradation.
A fourth magnetoreception mechanism: Distinct from cryptochrome-based (light-dependent), beak magnetite, and vestibular ion-channel hypotheses, this liver macrophage-based mechanism operates under overcast conditions and may generalize to other animals (e.g., bats, sharks) that navigate without visual cues.
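As a rough sanity check on the homing result in the second highlight (16/16 controls versus 0/18 clodronate-treated birds), a two-sided Fisher's exact test can be computed by hand. This is a quick illustrative sketch, not a statistic taken from the paper, and the function name is just a placeholder.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell,
        # with all row and column totals held fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )

# Rows: control, clodronate; columns: homed, did not home (counts from the text)
p_value = fisher_exact_two_sided(16, 0, 0, 18)
print(f"p = {p_value:.2e}")
```

With a perfectly separated table like this one, the only arrangement as extreme as the observed data is the observed table itself, so the p-value reduces to one over the number of ways to choose 16 homing birds out of 34.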

Why should we care?

This paper challenges a long-standing orthodoxy in sensory biology: that magnetic field detection resides in specialized cells within the head (eye, beak, inner ear). Instead, it implicates a peripheral organ, the liver, and a cell type better known for immune defense than for sensing physical forces. The finding that iron-accumulating macrophages, which exist in many vertebrates, can become superparamagnetic and potentially influence navigation is conceptually striking. The behavioral experiment is clean: clodronate treatment removes the cells, and pigeons lose their ability to orient under overcast conditions but not under sunny skies, ruling out non-specific toxicity.

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 18/05/26

Sebastiaan Vanuytven — Sun, 24 May 2026 11:24:41 GMT

This week’s reads focus on a common challenge in modern biology: how to disentangle signal from context, whether that means reconstructing the embryonic origin of a childhood cancer, separating true cellular expression from spatial contamination, or identifying the stromal programs that suppress anti-tumor immunity. One study uses naturally occurring developmental mutations to trace a rare sarcoma through monozygotic twins, showing that the tumor arose in one fetus and metastasized in utero to the other. Several computational advances are pushing spatial biology towards tissue-scale resolution and reliability: HESTIA and Cellist address the growing challenge of analyzing and segmenting million-cell spatial datasets, SpatialArtifacts detects damaged regions that resemble biological signal, and DeSpotX mathematically addresses contamination in single-cell spatial transcriptomics via identifiable generative modeling. On the tumor microenvironment side, LRRC15-positive fibroblasts emerge as central conductors of immune suppression in lung cancer, and a bispecific antibody that targets these pathogenic fibroblasts while avoiding the toxicity of systemic TGFβ blockade is used to counter them.

Preprints/articles that I managed to read this week

Embryonic origin of cancer in newborn twins

Walkowiak et al. bioRxiv (2026). 10.64898/2026.05.10.722519

The paper in one sentence

Whole‑genome sequencing of 23 normal tissues, 10 tumour samples, and 11 placental samples from newborn monozygotic twins with a rare sarcoma reveals that the tumour arose in one twin and spread in utero to the other, while early embryonic lineages contributed asymmetrically to the placenta and each twin.

Summary

The authors investigated the origin of an MN1::ZNF341‑rearranged undifferentiated sarcoma in newborn monozygotic twin girls (twin A with disseminated lesions, twin B with brain and skin lesions). They performed whole‑genome sequencing (WGS) on six normal samples per twin (various organs), 11 bulk placental samples, 10 tumour samples (eight from twin A, two from twin B), and 12 laser‑capture microdissected (LCM) trophoblast samples. After filtering out germline variants and sequencing artefacts, they identified 254 early embryonic somatic mutations (mosaic variants arising during development) by requiring: ≥300‑600 reads spanning the position across normal samples, VAF ≥0.1 in at least one sample, no significant strand bias, and high mapping quality. These mutations were used as lineage tracers to reconstruct twinning phylogeny. They also called clonal tumour mutations and copy number alterations (CNAs; loss of 1q and 18p) using ASCAT. Three mutation groups emerged: (A) mutations present in both twins (shared lineage), (B) twin‑A‑specific mutations (VAF ~0.5 in twin A, absent in twin B), and (C) twin‑B‑specific mutations. Unexpectedly, placental samples were dominated by twin B lineages (not equal contribution). The twinning phylogeny showed asymmetric fates: one lineage gave rise almost exclusively to twin A, another to both twins and placenta, and a third to twin B and placenta. For the tumour, all tumour samples (including those from twin B) carried twin‑A‑specific mutations but not twin‑B‑specific mutations, proving a single origin in twin A followed by in utero metastasis to twin B. Subclonal tumour analysis (13 mutation clusters based on VAFs across samples) revealed parallel evolution after the MN1::ZNF341 fusion, with different clones seeding different metastatic sites. Finally, using VAFs of twin‑specific mutations, they estimated substantial cell transfer (twin‑to‑twin transfusion) in both directions.

Personal highlights

Developmental somatic mutations as natural lineage tracers: instead of using experimental barcodes, the authors leveraged spontaneously occurring somatic mutations acquired during the first cell divisions of the embryo. By sequencing multiple normal tissues from both twins and the placenta, they identified 254 early mutations whose VAF patterns distinguish three embryonic lineages with asymmetric contributions to each twin and the placenta.
Stringent filtering to distinguish early embryonic from germline and artefactual variants: somatic variants were called with CaVEMan and filtered by depth (300‑600 reads), VAF ≥0.1, absence of strand bias, and manual JBrowse inspection. Germline variants were removed using a binomial test (VAF ≈0.5 expected for heterozygous germline variants). Variants in copy‑number‑altered regions were excluded. This rigorous pipeline is essential for reliable lineage tracing in a single family without technical replicates.
Estimating tumour cell infiltration in normal tissues using phased CNAs: tumours had loss of 1q and 18p. By phasing heterozygous SNPs in these regions (assigning alleles to the lost vs retained chromosome based on VAF >0.75 or <0.25 in a pure tumour sample), they calculated the fraction of tumour cells in each normal sample from the median VAF deviation from 0.5. This provided purity estimates consistent with direct counting of clonal tumour mutations, enabling correction for cross‑sample contamination.
Subclonal tumour phylogeny from mutation cluster VAFs: clonal tumour mutations (present in all samples) were distinguished from subclonal ones by clustering VAFs across eight tumour samples. Thirteen clusters emerged, revealing that the MN1::ZNF341 fusion was an early (truncal) event, followed by parallel evolution: e.g., cluster 11 mutations absent from the right parietal tumour, cluster 12 unique to that tumour, cluster 13 unique to liver metastasis. This demonstrates within‑patient metastatic heterogeneity.
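The phased-CNA purity estimate in the third highlight can be worked through with a little algebra. What follows is a minimal sketch, assuming a simple one-copy loss in tumour cells (one allele retained, one lost) and diploid normal cells; the exact estimator used by the authors is not spelled out here, and the function names are hypothetical. If a fraction p of the cells in a sample are tumour cells, the retained allele contributes one copy per cell out of (2 − p) copies on average, so its expected VAF is 1/(2 − p), which rearranges to p = 2 − 1/VAF.

```python
from statistics import median

def retained_allele_vaf(tumour_fraction):
    """Expected VAF of the retained allele when tumour cells carry one copy
    of the region and normal cells carry two (assumed one-copy loss)."""
    return 1.0 / (2.0 - tumour_fraction)

def tumour_fraction_from_vafs(retained_vafs):
    """Invert the relation above using the median VAF of phased SNPs."""
    v = median(retained_vafs)
    p = 2.0 - 1.0 / v
    return min(max(p, 0.0), 1.0)  # clamp noise-driven values into [0, 1]

# Hypothetical retained-allele VAFs from a "normal" sample with some tumour cells
example_vafs = [0.52, 0.55, 0.53, 0.54, 0.51]
estimate = tumour_fraction_from_vafs(example_vafs)
print(round(estimate, 3))
```

A retained-allele VAF of exactly 0.5 maps to zero tumour cells and a VAF of 1.0 maps to a pure tumour sample, which is why the deviation from 0.5 carries the purity information.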

Targeting LRRC15 in cancer-associated fibroblasts modifies the extracellular matrix and enhances tumor immune responses to suppress lung cancer progression

Qi et al. Cancer Research (2026). 86(10):2377–2392

The paper in one sentence

A specific type of fibroblast in lung tumors, marked by the protein LRRC15, promotes cancer growth by remodeling the surrounding matrix and polarizing immunosuppressive macrophages, and targeting LRRC15 with a novel bispecific antibody slows tumor progression in mice.

Summary

This study investigates LRRC15-positive cancer-associated fibroblasts (CAFs), a tumor-specific cell population enriched in lung cancer. Using single-cell transcriptomics of human and mouse samples, the authors show that LRRC15+ CAFs are associated with poor patient survival. Mechanistically, LRRC15 in CAFs drives the production of extracellular matrix components, particularly collagen I, which in turn promotes the polarization of CD206+ "M2-like" macrophages. These macrophages suppress CD8+ T-cell activity, creating an immunosuppressive environment that favors tumor growth. Genetic deletion of LRRC15 in CAFs reduces collagen deposition, decreases M2 macrophage polarization, restores CD8+ T-cell cytotoxicity, and slows tumor progression in multiple mouse models—effects that are dependent on an intact immune system. Finally, the authors develop a bispecific antibody that simultaneously targets LRRC15 and neutralizes TGFβ (the cytokine that induces LRRC15 expression). This antibody reduces LRRC15 expression in CAFs, limits tumor growth, and avoids the systemic toxicity associated with broad TGFβ inhibition.

Personal highlights

LRRC15+ CAFs are tumor-enriched and prognostic: in lung cancer patients, LRRC15+ CAFs constitute ~40% of all fibroblasts within tumors but are nearly absent in adjacent normal tissue, and their signature correlates with worse survival.
LRRC15 drives macrophage polarization via ECM remodeling: LRRC15 deficiency in CAFs reduces collagen I production, and this diminished extracellular matrix directly limits the polarization of macrophages toward an immunosuppressive CD206+ phenotype.
Immune-dependent tumor suppression: genetic deletion of LRRC15 in CAFs slows lung tumor growth in immunocompetent mice but has no effect in immunodeficient NSG mice or in direct co-culture with cancer cells, confirming that the effect is mediated by the immune system.
Macrophages as critical mediators: depleting macrophages abolishes the tumor-suppressive effect of LRRC15 deletion, placing macrophages downstream of LRRC15+ CAFs in the immunosuppressive cascade.
A bispecific antibody with improved safety: An LRRC15-TGFβ trap antibody preferentially accumulates in LRRC15+ CAFs within tumors, reduces LRRC15 expression and ECM density, suppresses tumor growth in mice, and avoids the splenomegaly seen with systemic TGFβ blockade.

Why should we care?

Tumors are not just masses of cancer cells; they are complex ecosystems. Fibroblasts, a type of connective tissue cell, can be “corrupted” by tumors to become allies that shield the cancer from the immune system. The researchers identified a specific protein called LRRC15 on these corrupted fibroblasts that acts like a master switch. When they blocked LRRC15 in mice, the fibroblasts stopped building dense barriers and no longer instructed immune cells called macrophages to suppress the body’s cancer-killing T-cells. The broader takeaway is that instead of trying to kill cancer cells directly, we might be able to reprogram the tumor’s supportive environment. The bispecific antibody developed here is particularly clever: it targets LRRC15 to deliver a TGFβ-blocking “payload” only to the problematic fibroblasts, avoiding the serious side effects that occur when TGFβ is blocked throughout the body. While this work is still in preclinical mouse models, it highlights a promising strategy for making lung tumors more vulnerable to the immune system, an approach that could eventually be combined with existing immunotherapies.

HESTIA: Scalable Multimodal Integration of Histology and High-Resolution Spatial Transcriptomics for Robust Spatial Domain Identification

Zhong et al. bioRxiv (2026). 10.64898/2026.05.14.723098v1

The paper in one sentence

HESTIA is a computationally efficient algorithm that integrates tissue histology images with high-resolution spatial gene expression data to map tissue structures at single-cell scale, overcoming the memory failures and data sparsity that plague existing methods.

Summary

Modern spatial transcriptomics technologies (e.g., Stereo-seq, Visium HD) can now map gene expression across entire tissue sections at subcellular resolution, generating datasets with millions of data points. However, existing analysis tools were not designed for this scale: they run out of memory or produce noisy results due to extreme sparsity (most genes are undetected in any given spot). The authors present HESTIA, a multimodal algorithm that addresses both challenges. HESTIA uses a hierarchical vision transformer to extract features from H&E-stained histology images, and a novel dual-autoencoder system that simultaneously processes high-resolution and spatially aggregated low-resolution transcriptomic data. A cross-resolution consistency constraint stabilizes sparse signals. The fused representation is then used for spatial domain identification (clustering). Benchmarking on a mouse brain dataset (Stereo-seq, >818,000 bins at highest resolution) shows that HESTIA is the only method among nine that can process the full dataset without memory failure. It achieves superior clustering accuracy and spatial continuity compared to eight competing algorithms. Applied to a human lung adenosquamous carcinoma sample (>2 million bins) and two colorectal cancer Visium HD datasets, HESTIA identifies clinically relevant intratumoral heterogeneity, including an immune-active B-cell niche in lung cancer and tumor subdomains associated with REG family genes or SPP1+ macrophages in colorectal cancer. Ablation studies confirm that the dual-resolution design provides greater benefits for sparser, lower-quality data.

Personal highlights

Unmatched scalability: HESTIA is the only multimodal method capable of processing a full-slice Stereo-seq dataset with >818,000 bins (or a 2-million-bin human lung cancer sample) on a single GPU with <120 GB RAM, while eight competing algorithms fail due to out-of-memory errors at much lower resolutions.
Dual-autoencoder with cross-resolution consistency: by learning from both high-resolution and spatially aggregated low-resolution transcriptomic data simultaneously, HESTIA stabilizes sparse molecular signals and improves clustering accuracy—a benefit that is most pronounced for lower-quality or sparser datasets.
Superior spatial domain identification in mouse brain: at both bin20 (grid) and single-cell (cellbin) resolution, HESTIA accurately delineates fine anatomical structures (e.g., CA1/CA3 stratum pyramidale, dentate gyrus, corpus callosum) with higher adjusted Rand index and spatial continuity than SpaGCN, MUSE, StereoMM, ConGR, and other competitors.
Clinically relevant discoveries in human cancer: in a large lung adenosquamous carcinoma sample, HESTIA identified an immune-active niche within the squamous region enriched for B-cell immunity genes (CCL19, IGLC2, IGLC3, MZB1). In colorectal cancer, it resolved intratumoral heterogeneity: one sample showed REG family gene-expressing tumor subdomains (linked to invasion and poor differentiation), while another revealed tumor subdomains co-localized with pro-tumorigenic SPP1+ macrophages.
Robust to transcriptomic sparsity: ablation studies show that the dual-resolution design yields greater performance gains when sequencing depth is low, demonstrating that HESTIA effectively mitigates the gene dropout problem inherent to high-resolution spatial platforms.
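To see why a spatially aggregated view helps with sparsity, here is a toy sketch of sum-pooling a fine grid of bins into coarser ones. It only illustrates the general idea of a low-resolution companion view; the grid, the pooling factor, and the function names are assumptions for illustration, not HESTIA's actual preprocessing.

```python
def pool_bins(grid, factor):
    """Sum counts over non-overlapping factor x factor blocks of a 2D grid."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows, factor):
        pooled_row = []
        for c in range(0, cols, factor):
            block_sum = sum(
                grid[i][j]
                for i in range(r, min(r + factor, rows))
                for j in range(c, min(c + factor, cols))
            )
            pooled_row.append(block_sum)
        pooled.append(pooled_row)
    return pooled

def zero_fraction(grid):
    """Fraction of bins in which the gene was not detected at all."""
    values = [v for row in grid for v in row]
    return sum(v == 0 for v in values) / len(values)

# Toy counts for one gene on a 4 x 4 grid of high-resolution bins
high_res = [
    [0, 1, 0, 0],
    [0, 0, 0, 2],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
low_res = pool_bins(high_res, 2)
print(low_res, zero_fraction(high_res), zero_fraction(low_res))
```

Pooling keeps the total count but leaves far fewer empty bins, at the cost of spatial detail. That trade-off is presumably why the method learns from both resolutions at once rather than from the coarse view alone.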

Why should we care?

Spatial transcriptomics allows scientists to create detailed molecular maps of tissues showing exactly which genes are active in each cell and where. The catch is that the newest platforms generate datasets so large and so sparse that existing tools run out of memory or return noisy results. HESTIA solves this problem by being smart about memory usage and by compensating for the inevitable gaps in the data (because no technology can measure every gene in every cell). With HESTIA, researchers can now analyze entire cancer biopsies at single-cell resolution to identify subtle but clinically important features, such as an immune hotspot that might predict immunotherapy response, or a tumor boundary that is actively invading healthy tissue.

Cellist: Accurate, Scalable and Cross-Platform Cell Identification for High-Resolution Spatial Transcriptomics

Sun et al. Nature Genetics (2026). 10.1038/s41588-026-02610-1

The paper in one sentence

Cellist is a computational method that integrates tissue images with gene expression data to accurately assign transcripts to individual cells across diverse spatial transcriptomics platforms, overcoming the memory and accuracy limitations of existing tools.

Summary

High-resolution spatial transcriptomics technologies can now map gene expression at subcellular resolution, but identifying which transcripts belong to which cell remains a major bottleneck. Existing methods are either platform-specific, computationally too slow for large datasets, or fail to preserve the biological integrity of gene expression within cells. The authors introduce Cellist, a multimodal approach that combines nuclear staining images with spatial gene expression to segment cells. Cellist first identifies nuclei from images (using Watershed or Cellpose), then uses a probabilistic model that balances expression similarity and physical distance to assign surrounding transcripts to the correct cell. Benchmarking on nine datasets across five platforms (Stereo-seq, Seq-Scope, seqFISH+, STARmap, and 10x Xenium) shows that Cellist consistently achieves higher within-cell expression consistency, better cell-type annotation accuracy, and superior computational efficiency compared to existing methods (SCS, StereoCell, Baysor, UCS). In an application to post-neoadjuvant immunotherapy NSCLC samples, Cellist-enabled segmentation revealed spatially distinct tumor clones with different stemness signatures and identified macrophage subtypes (CXCL9, SPP1, TREM2) with unique spatial distributions at the tumor-stroma boundary, offering insights into therapy response.

Personal highlights

Cross-platform versatility: Cellist works on both barcoding-based platforms (Stereo-seq, Seq-Scope) and imaging-based platforms (seqFISH+, STARmap, Xenium), unlike most existing methods that are designed for only one technology family.
Superior within-cell expression purity: using novel metrics (random correlation, directional split correlation, and variance-based purity score), Cellist consistently outperforms competing methods in preserving transcriptomic coherence within segmented cells, meaning cleaner, less contaminated single-cell profiles.
Scalable to massive datasets: Cellist processes a full Stereo-seq mouse brain dataset (~140,000 cells) and a human NSCLC dataset (>2 million bins) on a single GPU with moderate memory, while methods like SCS fail on large samples due to memory constraints.
Improved cell-type annotation: in mouse brain, Cellist-segmented cells showed higher correlation with matched scRNA-seq reference data and yielded more specific marker gene expression (higher log fold-changes) compared to other segmentation methods.
Biological discovery in NSCLC: applied to post-immunotherapy lung cancer samples, Cellist identified two tumor clones with distinct copy number alterations, one with higher cancer stem cell signatures. It also revealed spatial co-localization of CXCL9 and SPP1 macrophages at the tumor boundary, with opposing roles in T-cell recruitment versus exclusion.

Why should we care?

Spatial transcriptomics allows scientists to see which genes are active in exactly which cells within a tissue, like a molecular Google Maps of a tumor or brain. However, the raw data are messy: transcripts (gene readouts) are scattered around, and the boundaries around each cell are never known perfectly. If you assign a transcript to the wrong cell, your conclusions about what that cell is doing will be wrong. Cellist is like a smart sorting algorithm that uses both the tissue image (where the nucleus is) and the gene expression patterns to decide which transcripts belong to which cell. It works across different technologies and scales to millions of cells without crashing.

SpatialArtifacts: a computational framework for tissue artifact detection in spatial transcriptomics data

He et al. bioRxiv (2026). 10.64898/2026.05.15.725260

The paper in one sentence

SpatialArtifacts is a computational method that uses mathematical morphology operations to detect and classify spatially contiguous tissue artifacts (such as dry spots and edge damage) in spatial transcriptomics data, enabling precise removal of technical noise while preserving biologically meaningful low-expression regions.

Summary

Spatial transcriptomics allows researchers to map gene expression across tissue sections, but technical artifacts from sample preparation (tissue lifting, folding, uneven reagent coverage) create regions of artificially low RNA capture. Existing quality control methods either remove spots based on fixed global thresholds (which mistakenly discard biologically valid low-expression areas like brain white matter) or use local neighborhood statistics that miss large, irregularly shaped edge artifacts. The authors present SpatialArtifacts, a framework that first identifies outlier spots using median absolute deviation (MAD) thresholds, then applies morphological image processing operations (3×3 fill, 5×5 outline, star-pattern connectivity) to connect these outliers into coherent patches that match the irregular geometry of real tissue damage. The method classifies artifacts into four categories (large/small edge, large/small interior) and provides spot-level coordinates for targeted removal. Validation across human hippocampus, dorsolateral prefrontal cortex (DLPFC), and colorectal cancer datasets on both Visium and VisiumHD platforms shows that SpatialArtifacts removes 2–3% of spots compared to 13–22% removed by BLADE or 23% by global thresholds, while preserving known anatomical structures. Benchmarking against SpotSweeper and BLADE reveals complementary strengths: SpotSweeper excels at isolated low-quality spots, BLADE detects slide-level edge effects but lacks precision, and SpatialArtifacts fills the gap for spatially coherent regional artifacts.

Personal highlights

Morphological operations adapted from computer vision: SpatialArtifacts applies focal kernels (3×3 fill, 5×5 outline, star-pattern) to connect outlier spots into irregularly shaped artifact regions, mimicking how pathologists identify tissue damage from images.
Preserves biologically meaningful low-expression regions: unlike global UMI thresholds that remove 23% of spots (including healthy white matter and mucosal crypts), SpatialArtifacts removes only 1.9–3.4% of spots, correctly retaining areas with naturally low transcription.
Hierarchical classification of artifact types: artifacts are categorized as large/small and edge/interior, enabling flexible filtering strategies (e.g., automatically remove edge artifacts but flag large interior artifacts for manual review).
Cross-platform compatibility: works on both standard Visium (hexagonal grid, ~5,000 spots) and VisiumHD (square grid, >130,000 bins) with resolution-aware parameter scaling, maintaining equivalent physical coverage across platforms.
Independent validation with expert annotations: in human DLPFC data, the 87 spots previously labeled as “Unannotated” by domain experts were entirely identified as artifacts by SpatialArtifacts, demonstrating automated recovery of manual quality control decisions.
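The two-stage idea (flag outlier spots, then group them into spatial patches and classify each patch) can be sketched in a few lines. This is a simplified stand-in and not the SpatialArtifacts implementation: it uses a MAD cutoff followed by plain 8-connected component labelling instead of the 3×3 fill, 5×5 outline, and star-pattern kernels. It also treats the border of a square toy grid as the "edge", and the 3-MAD cutoff and size threshold are hypothetical values.

```python
from statistics import median

def mad_low_outliers(grid, n_mads=3.0):
    """Flag spots whose value falls more than n_mads MADs below the median."""
    values = [v for row in grid for v in row]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    cutoff = med - n_mads * mad
    return [[v < cutoff for v in row] for row in grid]

def label_patches(flags, large_size=4):
    """Group flagged spots into 8-connected patches and classify each one."""
    rows, cols = len(flags), len(flags[0])
    seen = [[False] * cols for _ in range(rows)]
    patches = []
    for r in range(rows):
        for c in range(cols):
            if not flags[r][c] or seen[r][c]:
                continue
            stack, spots = [(r, c)], []
            seen[r][c] = True
            while stack:  # depth-first flood fill over flagged neighbours
                i, j = stack.pop()
                spots.append((i, j))
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        inside = 0 <= ni < rows and 0 <= nj < cols
                        if inside and flags[ni][nj] and not seen[ni][nj]:
                            seen[ni][nj] = True
                            stack.append((ni, nj))
            on_edge = any(i in (0, rows - 1) or j in (0, cols - 1) for i, j in spots)
            size = "large" if len(spots) >= large_size else "small"
            where = "edge" if on_edge else "interior"
            patches.append({"spots": sorted(spots), "type": f"{size} {where}"})
    return patches

# Toy UMI counts: a damaged corner plus one isolated low spot
umi = [
    [10, 10, 500, 520, 510, 495],
    [12, 11, 505, 515, 490, 500],
    [498, 502, 510, 505, 500, 515],
    [505, 495, 500, 9, 510, 505],
    [500, 510, 495, 505, 500, 490],
    [515, 505, 500, 510, 495, 500],
]
flags = mad_low_outliers(umi)
patches = label_patches(flags)
for patch in patches:
    print(patch["type"], patch["spots"])
```

Because the cutoff is relative to the spread of the data rather than a fixed global UMI threshold, spots that are merely on the low side of normal are not flagged. The patch labels then allow different handling for edge and interior damage.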

DeSpotX: Identifiability-Based Decontamination for Spatial Transcriptomics

Wang and Gentles, bioRxiv (2026). 10.64898/2026.05.12.724704

The paper in one sentence

A deep generative model that uses anchor genes (genes absent in a given cell cluster) to uniquely separate native expression from spatially structured ambient contamination in single-cell-resolution spatial transcriptomics data.

Summary

In single-cell-resolution spatial transcriptomics (platforms such as Xenium, MERFISH, CosMx, and Stereo-seq), 20–40% of transcripts are assigned to neighboring cells due to ambient diffusion, segmentation errors, or tissue overlap. This contamination compromises cell-type annotation, spatial expression patterns, and cell-cell communication networks. Existing decontamination methods face three fundamental challenges: (i) the native and contamination components cannot be uniquely separated from observed counts (non-identifiability), (ii) contamination is spatially local but most methods use a single global ambient profile, and (iii) low-expression genes are vulnerable to over-correction. The authors introduce DeSpotX, a deep generative model that addresses each challenge. For identifiability, it defines anchor genes, genes not natively expressed in a given cell cluster, inferred automatically from per-cluster expression rates, and proves mathematically that these constraints restore a unique decomposition. For spatial structure, it estimates contamination locally using a cluster-masked, distance-weighted average over cross-cluster spatial neighbors, excluding same-cluster cells to avoid signal dilution. For signal preservation, a learned diffusion prior regularizes latent expression states, preventing over-correction of low-but-real biological signal. Benchmarking on spike-in simulations across five datasets spanning four platforms shows DeSpotX achieves AUROC >0.94 on every dataset, outperforming SoupX, DecontX, ResolVI, and SpaceBender by 0.02–0.12, with the lowest per-cell and global calibration errors. On real tissues, decontaminated counts produce cleaner cluster separation in UMAP embeddings, tighter marker-gene localization to canonical cell types, and increased spatial autocorrelation (Moran’s I) for biologically relevant genes. 
The method is robust to inaccuracies in anchor masks and cell-cluster labels, and runs in 16–21 minutes on a million-cell dataset—substantially faster than competing deep-learning methods.

Personal highlights

Identifiability via anchor genes: the first method to formally prove non-identifiability of the native–contamination decomposition and restore identifiability using anchor genes, automatically identified from per-cluster expression rates, providing a provable guarantee rather than heuristic regularization.
Spatially local, cluster-isolated contamination estimation: estimates contamination from cross-cluster spatial neighbors only, using distance-weighted averaging and a graph attention network encoder. The cluster mask excludes same-cluster neighbors, preventing dilution of native signal and enabling the method to recover the cross-cluster contamination fraction that drives downstream artifacts.
Diffusion prior preserves low-expression signal: a denoising diffusion prior learned jointly with the model regularizes latent expression states, preventing over-correction of genuinely low but biologically important signal.
Superior benchmark performance across platforms: on spike-in simulations spanning Xenium, MERFISH, CosMx, and Stereo-seq, DeSpotX achieves AUROC >0.94 on every dataset, with gains of 0.02–0.12 over the best baseline (ResolVI), and the lowest per-cell calibration error, indicating that accurate per-cell contamination estimates drive global calibration rather than error cancellation.
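The cluster-masked, distance-weighted neighbour average from the second highlight can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the Gaussian distance kernel, the radius, the bandwidth, and all names are hypothetical choices for illustration. The real model does far more than this; the graph attention encoder, the anchor-gene constraints, and the diffusion prior are all left out.

```python
from math import dist, exp

def local_ambient_profile(idx, coords, clusters, counts, radius=30.0, bandwidth=10.0):
    """Distance-weighted mean expression of nearby cells from OTHER clusters."""
    n_genes = len(counts[idx])
    weighted = [0.0] * n_genes
    total_weight = 0.0
    for j, (xy, cluster) in enumerate(zip(coords, clusters)):
        if j == idx or cluster == clusters[idx]:
            continue  # cluster mask: skip the cell itself and same-cluster cells
        d = dist(coords[idx], xy)
        if d > radius:
            continue  # contamination is treated as spatially local
        w = exp(-(d ** 2) / (2 * bandwidth ** 2))  # assumed Gaussian kernel
        total_weight += w
        for g in range(n_genes):
            weighted[g] += w * counts[j][g]
    if total_weight == 0.0:
        return [0.0] * n_genes  # no cross-cluster neighbours in range
    return [v / total_weight for v in weighted]

# Toy data: two tumour cells (T) and two B cells (B), two genes per cell
coords = [(0.0, 0.0), (5.0, 0.0), (0.0, 8.0), (100.0, 100.0)]
clusters = ["T", "T", "B", "B"]
counts = [[9, 0], [8, 1], [0, 6], [0, 50]]

profile = local_ambient_profile(0, coords, clusters, counts)
print(profile)
```

For cell 0, the same-cluster neighbour is masked out and the distant B cell falls outside the radius, so the local contamination profile is driven entirely by the one nearby B cell. A single global ambient profile would instead blend in every cell in the section.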

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 11/05/26

Sebastiaan Vanuytven — Sun, 17 May 2026 13:37:24 GMT

This week’s reads explore how cellular states, spatial organization, and hidden layers of biology shape disease progression across cancer, aging, and tissue biology. In colorectal cancer, the potential to metastasise emerges not from additional driver mutations, but from a reversible MAPK-high/WNT-low chromatin state that can be altered with KRAS inhibition. Multiple new methods push spatial transcriptomics past static neighbourhood maps: InterScale separates local from tissue-wide interaction programs, while CellNeighborEX v2 detects context-specific communication signals directly from Visium data without relying on ligand–receptor databases. In other work, we see the TransCODE Consortium’s systematic effort to catalog non-canonical microproteins and “peptideins,” evidence that hematopoietic stem-cell dormancy and not apoptosis is the key safeguard limiting mutational accumulation during aging, and a UK Biobank analysis linking both short and long sleep duration to accelerated biological aging across multiple organs. Finally, BARseq3 demonstrates how transcriptomics, translatomics, and cellular lineage barcodes can now be integrated in the same tissue section.

Preprints/articles that I managed to read this week

A high-MAPK, low-WNT cell state drives metastatic dissemination in colorectal cancer

Heinlein et al. Nature Cancer (2026). 10.1038/s43018-026-01155-w

The paper in one sentence

Serial orthotopic passaging of CRISPR-engineered mouse colon organoids selects for a highly metastatic MAPK-high, WNT-low transcriptional state driven by copy number gains in MAPK pathway genes and chromatin remodeling at AP-1 motifs, which is reversible by KRASG12D inhibition.

Summary

The authors generated an immunocompetent mouse model of metastatic colorectal cancer by sequentially introducing mutations in Apc, KrasG12D, Trp53, and Smad4 (AKPS) into small intestinal organoids. Parental organoids formed primary tumors but rarely metastasized in C57BL/6N mice. To enhance metastatic competence, they performed five rounds of serial orthotopic passaging: isolating liver metastases, expanding cells in vitro, and re-injecting into new mice. After five passages (P5), the resulting line (m484) showed high metastatic frequency to liver and lung. Whole-exome sequencing revealed no new driver mutations but stepwise increases in copy number alterations, particularly amplifications on chromosomes 6, 15, and 17 encompassing MAPK pathway genes (Kras, Braf, Raf1, Mapk11–14). Bulk RNA-seq on sorted EpCAM+ tumor epithelial cells showed elevated MAPK target genes (e.g., Spry4, Dusp4) and suppression of WNT targets (e.g., Lgr5, Smoc2). ATAC-seq on sorted tumor cells revealed increased chromatin accessibility at AP-1 motifs (FRA1, FOS, JUNB) and decreased accessibility at TCF/LEF motifs in metastatic cells. Integrating ATAC-seq and RNA-seq with BETA (Binding and Expression Target Analysis) identified direct target genes, including Emp1 (a metastasis marker) controlled by AP-1. Treatment with the KRASG12D inhibitor MRTX1133 in vivo reversed the MAPK-high/WNT-low state, reduced Emp1 expression, and suppressed liver and lung metastases. Human CRC patient data (AVANT and CALGB cohorts) showed that a high-MAPK/low-WNT gene signature is associated with shorter overall survival.

Personal highlights

Serial orthotopic passaging in immunocompetent mice: rather than using immunodeficient hosts, the authors performed five rounds of injecting organoids into the colon of C57BL/6N mice, isolating liver metastases, and re-expanding them. This selected for metastatic competence without introducing new driver mutations, yielding a reproducible model (P5 line m484) that retains immune system interactions.
Copy number gains as drivers of MAPK pathway activation: WES revealed that metastatic P5 organoids acquired amplifications of chromosomes 6, 15, and 17, leading to increased copy numbers of Kras, Braf, Raf1, and multiple Mapk genes. These amplifications correlated with increased mRNA and protein-level MAPK activity, despite no additional point mutations in the pathway.
ATAC-seq reveals AP-1 motif opening and TCF/LEF closure: compared to non-metastatic P1 tumors, P5 tumor epithelial cells showed increased chromatin accessibility at AP-1 transcription factor binding sites (FRA1, FOS, JUNB) and reduced accessibility at WNT-associated TCF/LEF motifs, establishing a chromatin landscape permissive for MAPK-driven gene expression.
BETA integration of ATAC-seq and RNA-seq identifies direct regulatory targets: using Binding and Expression Target Analysis (BETA), the authors linked AP-1 motifs to upregulated genes in P5 cells (e.g., Emp1) and TCF/LEF motifs to downregulated genes in P1 cells (e.g., Smoc2, Nkd1). The Emp1 locus showed increased accessibility at predicted AP-1 binding sites, validated by ENCODE ChIP-seq data showing BATF, JUNB, and FOS binding at the human EMP1 promoter.
KRASG12D inhibition reverses the metastatic transcriptional state: treatment with MRTX1133 (30 mg/kg twice daily) in mice with established P5 tumors reduced MAPK target gene expression (Dusp4, Emp1), increased WNT target expression (Smoc2), and shifted the transcriptome toward the non-metastatic P1 state (PCA, PC1 dimension). Lung metastases showed greater sensitivity than liver metastases, suggesting tissue-specific modulation of pathway activity.

Why should we care?

This study provides a methodologically careful demonstration that metastatic competence in CRC can arise without new driver mutations, instead through copy number gains and chromatin remodeling that tip the balance between MAPK and WNT signaling. For cancer biologists, the serial orthotopic passaging approach in immunocompetent mice offers a tractable system to study metastasis while preserving immune interactions, unlike tail-vein or intrasplenic injection models that bypass dissemination. Importantly, the authors show that KRASG12D inhibition reverses the pro-metastatic chromatin state, but also note that WNT reactivation occurs as a form of adaptive resistance, explaining why KRAS inhibitors have limited efficacy in CRC and suggesting that combination with WNT pathway inhibitors may be needed. The main limitations: the model uses small intestinal rather than colonic organoids, and the human survival analysis is retrospective and based on gene signatures rather than direct measurement of the described cell state.

InterScale reveals multi-scale cellular interaction programs in spatial transcriptomics

Drummer et al. bioRxiv (2026). 10.64898/2026.05.07.723456

The paper in one sentence

InterScale integrates a graph convolutional network (local) and a transformer encoder (global) to jointly model short‑range and tissue‑scale cellular interactions from spatial transcriptomics data, with separate linear decoders and attention‑based interpretation to identify scale‑specific gene programs and directional communication.

Summary

InterScale is a modular framework for spatial transcriptomics that explicitly separates local neighborhood information from global tissue context. The input is a gene expression matrix and a spatial adjacency graph (e.g., radius‑based or hexagonal grid). A local component (default: two‑layer GCN) aggregates information from k‑hop neighbors to produce a local embedding Hlocal. This embedding, together with a CLS token, is passed to a transformer encoder whose attention mask is set to the inverse of the adjacency matrix, allowing attention between non‑neighboring cells to capture long‑range interactions, yielding a global embedding Hglobal. Two separate linear decoders reconstruct the masked gene expression from Hlocal and Hglobal, respectively. Training uses a self‑supervised masked node prediction objective (scaled cosine error or Gaussian negative log‑likelihood). After training, InterScale provides three levels of interpretation: (1) tissue level: CLS token attention reveals which cell types contribute to condition prediction; (2) cell level: net attention flow (A − Aᵀ) and gradient‑based relevance aggregation produce directional sender‑receiver maps; (3) gene level: standardized decoder loadings (weights scaled by embedding and gene standard deviations) identify genes that are preferentially reconstructed by the local vs. global decoder, which are then linked to biological pathways via enrichment analysis. The authors validate on a SHH‑induced neural organoid dataset (local genes: neuronal differentiation; global genes: progenitor regulators) and a type‑1 diabetes pancreas CosMx dataset (local: oxidative stress, global: PI3K‑AKT signaling). Benchmarking against GCN‑only, transformer‑only, and competing methods (AMICI, Steamboat) shows that InterScale improves condition classification and reduces sensitivity to graph radius selection.
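
To make the inverse adjacency mask concrete, here is a minimal pure-Python sketch of attention restricted to non-neighboring cells. The function names, the toy chain graph, and the choice to keep self-attention allowed (so that no row ends up fully masked) are assumptions for illustration, not details taken from the InterScale code.

```python
import math


def inverse_adjacency_mask(n_cells, edges):
    """Return allowed[i][j] = True where cell i may attend to cell j.

    Direct spatial neighbors are masked out; everything else is allowed.
    Keeping the diagonal allowed is an assumption of this sketch.
    """
    neighbors = [[False] * n_cells for _ in range(n_cells)]
    for i, j in edges:  # undirected spatial graph
        neighbors[i][j] = True
        neighbors[j][i] = True
    return [[not neighbors[i][j] for j in range(n_cells)] for i in range(n_cells)]


def masked_softmax(scores, allowed):
    """Row-wise softmax over attention scores, with masked entries set to 0."""
    weights = []
    for row, mask in zip(scores, allowed):
        peak = max(s for s, m in zip(row, mask) if m)  # for numerical stability
        exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(row, mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Toy example: four cells in a chain 0-1-2-3 with uniform raw scores.
mask = inverse_adjacency_mask(4, [(0, 1), (1, 2), (2, 3)])
attention = masked_softmax([[0.0] * 4 for _ in range(4)], mask)
```

In this toy graph, cell 1 can only attend to itself and to cell 3, because cells 0 and 2 are its direct neighbors and are therefore left to the local GCN component.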

Personal highlights

Explicit separation of local and global embeddings via dual decoders: instead of merging multi‑scale signals into a single latent space, InterScale trains two linear decoders, one on Hlocal (GCN output) and one on Hglobal (transformer output). The reconstruction loss enforces that different spatial scales explain different parts of the gene expression variance, enabling downstream attribution of genes to local vs. global programs.
Inverse adjacency mask for global attention: unlike standard graph transformers that use full attention, InterScale masks out edges that exist in the spatial graph (i.e., it forces attention only between cells that are not direct neighbors). This design choice ensures that the transformer cannot simply recapitulate local information and must learn genuinely long‑range dependencies.
Standardized gene loadings from linear decoders: by rescaling decoder weights Wfe by σ(He)/σ(Xf), the method produces interpretable coefficients (change in gene expression in standard deviations per one‑SD change in latent dimension). This allows ranking of genes by their contribution to local vs. global embeddings without requiring cell‑type annotations or ligand‑receptor priors.
Net attention flow and gradient‑based relevance aggregation: raw attention scores are known to be unreliable as explanations. InterScale uses self‑attention relevance propagation (integrating attention maps with gradients) to collapse multi‑head attention into a single matrix, then computes net flow A − Aᵀ and normalizes by window‑wise maximum absolute flow. Directional sender‑receiver summaries (dot plots) are derived by averaging net flow across cell types, with dot size representing consistency (reciprocal standard deviation).
Modular architecture with replaceable components: the local component can be swapped for other GNNs (GIN), expression embeddings (scVI), or precomputed spatial domains (CellCharter, BANKSY). The global transformer can be replaced by sparse or linear attention variants. This design allows the framework to adapt to different data regimes (e.g., very large datasets) without retraining the entire pipeline.
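
Two of the interpretation steps above reduce to simple arithmetic, sketched here in pure Python. The function names and toy inputs are hypothetical, the use of population standard deviations is an assumption, and the sign convention (row i attends to column j) is chosen for illustration only; the sketch also skips the gradient-based relevance step that produces the aggregated matrix A.

```python
import statistics


def standardized_loadings(W, H, X):
    """Rescale decoder weights W[f][e] by sd(H[:, e]) / sd(X[:, f]).

    W: genes x latent dims, H: cells x latent dims, X: cells x genes.
    Assumes every gene and latent dimension has non-zero variance.
    """
    sd_h = [statistics.pstdev(col) for col in zip(*H)]
    sd_x = [statistics.pstdev(col) for col in zip(*X)]
    return [
        [W[f][e] * sd_h[e] / sd_x[f] for e in range(len(sd_h))]
        for f in range(len(sd_x))
    ]


def net_attention_flow(A):
    """Compute A - A^T and scale by the largest absolute entry in the window."""
    n = len(A)
    flow = [[A[i][j] - A[j][i] for j in range(n)] for i in range(n)]
    peak = max(abs(v) for row in flow for v in row)
    if peak == 0:
        return flow  # perfectly symmetric attention: no net direction
    return [[v / peak for v in row] for row in flow]


# Toy example: one gene, one latent dimension, two cells.
loadings = standardized_loadings(W=[[3.0]], H=[[0.0], [2.0]], X=[[0.0], [4.0]])
flow = net_attention_flow([[0.0, 0.8], [0.2, 0.0]])
```

The net flow matrix is antisymmetric by construction, so each cell pair gets a single signed value rather than two separate attention scores.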

Why should we care?

InterScale addresses a fundamental limitation of existing methods: most tools either look only at immediate neighbors (GNNs, niche models) or treat all cells as equally connected (standard transformers), but rarely separate these scales in a way that is both trainable and interpretable. The dual‑decoder architecture with inverse adjacency masking is a clean, practical solution to force the model to learn multi‑scale representations rather than collapsing to a single dominant scale. The standardized gene loading approach provides a hyperparameter‑free, cell‑type‑agnostic way to identify which genes are driven by local vs. global signaling, something that typically requires manual annotation or prior knowledge. The attention flow analysis, while still correlative, offers a more stable alternative to raw attention scores by focusing on directionality. The main limitations are: (1) sliding windows break interactions across window boundaries; (2) the method does not infer causal directionality (e.g., A → B via C); and (3) scale interpretation is platform‑dependent (what is “global” in Visium may differ from CosMx).

Identifying context-specific cell-cell interaction genes without ligand-receptor databases from spatial transcriptomics

Kim et al. bioRxiv (2026). 10.64898/2026.05.08.723913

The paper in one sentence

CellNeighborEX v2 detects genes upregulated by cell-cell interactions from low‑resolution Visium data by comparing observed expression to scRNA‑seq‑derived expectations, then uses a hybrid statistical test and regression to infer context‑specific interaction genes and their source‑neighbor cell‑type pairs without relying on predefined ligand‑receptor databases.

Summary

CellNeighborEX v2 is a computational framework designed for low‑resolution spatial transcriptomics (e.g., 10x Visium) where each spot captures multiple cells. The method uses matched scRNA‑seq reference data to estimate “expected” expression per spot via cell2location deconvolution, then computes residuals (observed – expected). Positive residuals indicate potential cell‑cell interaction (CCI)‑driven upregulation. To identify context‑specific CCI genes (e.g., by spatial region or disease condition), the framework applies a hybrid statistical test: a permutation test across contexts (to control false positives) and a chi‑squared test within each context (to capture localized signals), combined via a Cauchy combination with empirically optimized weights (0.9 for chi‑squared, 0.1 for permutation). Detected genes are further processed with a two‑step regression: first a ridge‑regularized negative binomial model to estimate contributions of candidate source‑neighbor cell‑type pairs, then a linear model to isolate individual pair effects. The method infers directional interactions (source cell type expresses the CCI gene, neighbor modulates it) and can capture paracrine, contact‑dependent, and ECM‑mediated communication. Validation includes synthetic Visium data, pseudo‑Visium data aggregated from high‑resolution Slide‑seq, and real paired Visium/CosMx/Visium HD datasets from ovarian cancer, colorectal cancer, and mouse lymph node. Benchmarking against Niche‑DE and ligand‑receptor‑based methods (CellChat) shows improved precision and recall, particularly for non‑database genes.

Personal highlights

Database‑free detection via residual modeling: instead of querying known ligand‑receptor pairs, CellNeighborEX v2 compares observed Visium expression to a null expectation derived from scRNA‑seq references and deconvolution. Genes with consistently higher expression in tissue than predicted from cell‑type composition alone are candidate CCI genes, covering canonical signaling, contact, and ECM‑mediated interactions without prior pathway knowledge.
Hybrid statistical test for context specificity: a permutation test (shuffling residuals across spots, 1,000 iterations) evaluates whether a gene’s upregulation is specific to a given spatial region or condition, while a chi‑squared goodness‑of‑fit test identifies localized deviations within a context. Cauchy combination (9:1 weight favoring chi‑squared) balances sensitivity and false‑positive control, as benchmarked on synthetic data.
Two‑step regression to infer directional cell‑type pairs: for each CCI gene, the method first uses a ridge‑regularized non‑negative negative binomial model to assess all candidate source‑neighbor pairs (from correlation‑filtered cell types). A second linear model isolates each pair’s contribution while adjusting for confounding interactions, producing Wald test p‑values for ranking.
Recovery of fine‑grained interactions from aggregated low‑resolution data: on pseudo‑Visium data (60 μm bins from 10 μm Slide‑seq), CellNeighborEX v2 recovered 92 of 102 (90%) previously validated contact‑dependent genes from mouse hippocampus and 33 of 34 from mouse liver cancer, demonstrating that interaction signals survive spatial downsampling and can be extracted from standard Visium.
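
The Cauchy combination step can be sketched in a few lines. This uses the standard Cauchy combination formula (each p-value is mapped through a tangent transform, weighted, summed, and mapped back) with the 0.9/0.1 weights quoted above. The function name and the example p-values are hypothetical, and the actual pipeline may differ in details such as how p-values of exactly 0 or 1 are handled.

```python
import math


def cauchy_combine(pvalues, weights):
    """Combine p-values with the Cauchy combination test.

    T = sum_i w_i * tan((0.5 - p_i) * pi), with weights normalized to sum to 1,
    and the combined p-value is 0.5 - atan(T) / pi.
    Inputs are assumed to lie strictly between 0 and 1.
    """
    total = sum(weights)
    t = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for p, w in zip(pvalues, weights)
    )
    return 0.5 - math.atan(t) / math.pi


# Hypothetical gene: strong chi-squared signal, uninformative permutation test.
p_chi2, p_perm = 0.01, 0.5
p_combined = cauchy_combine([p_chi2, p_perm], weights=[0.9, 0.1])
```

With a 9:1 weighting, the combined p-value stays close to the chi-squared p-value unless the permutation test is itself very small or very large, which matches the stated aim of favoring sensitivity to localized signals.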

Expanding the human proteome with microproteins and peptides from non‑canonical ORFs

Deutsch et al. Nature (2026). 10.1038/s41586-026-10459-x

The paper in one sentence

The TransCODE Consortium integrated 3.5 billion non‑HLA and 240 million HLA mass spectra, ribosome profiling, CRISPR screens, and a new evolutionary constraint metric (ORBL) to detect and classify 7,264 non‑canonical open reading frames (ncORFs) into a tiered system, introducing “peptidein” as a formal annotation category for translated products with indeterminate functional status.

Summary

This large‑scale collaborative effort set out to determine which of 7,264 GENCODE‑annotated ncORFs (including upstream, downstream, internal, and lncRNA‑derived ORFs) produce detectable microproteins. The authors built two PeptideAtlas resources: a non‑HLA build of 295 ProteomeXchange datasets (3.5 billion MS/MS spectra, mostly tryptic) and an HLA build (240 million spectra, no‑enzyme search). Stringent HUPO‑HPP criteria (≥2 unique peptides ≥9 aa, ≥18 aa coverage) and a decoy‑estimated protein‑level FDR <0.1% were applied. Only ~2.5% of ncORFs (183 out of 7,264) were detected in non‑HLA data, while 24.6% (1,785) were detected in HLA data – almost exclusively HLA‑I, with strong binding prediction concordance (NetMHCpan). To address the lack of amino‑acid conservation typical of ncORFs, the authors developed ORBL (ORF Relative Branch Length): a phylogenetic metric measuring conservation of start codon, stop codon, and reading frame across 116 placental mammals (or primates). ORBLv is the branch length fraction of species with a conserved ORF; ORBLq is the quantile of that score among size‑ and biotype‑matched untranslated ORFs, providing a measure of ORF‑level constraint independent of amino acid sequence. Using this, 30.4% of ncORFs showed high constraint (ORBLq >0.9). A tier classification system (1A: ≥2 non‑HLA peptides; 1B: ≥2 HLA peptides; 2A/2B: 1 peptide; 3: only HLA; 4: only Ribo‑seq; 5: in silico) was applied, with manual spectral and Ribo‑seq inspection. Only 15 ncORFs met tier 1A criteria for potential protein‑coding status; most were reclassified as “peptidein” – a new term for confidently detected translation products lacking sufficient evidence for a conventional protein‑coding gene. Functional CRISPR‑Cas9 screening across 8 cell lines targeting >2,000 ncORFs, combined with meta‑analysis of 25 screens, identified 51 pan‑essential ncORFs, including c10riboseqorf92 in the OLMA/LINC lncRNA, whose coding sequence rescued the knockout phenotype. 
The paper also includes targeted PRM validation, multi‑protease digestion experiments, and structural predictions (AlphaFold3, ESMFold), concluding with seven community consensus points on annotation guidelines.

Personal highlights

Two massive PeptideAtlas builds with stringent FDR control: the non‑HLA build (3.5 billion spectra, 1,172 experiments) and HLA build (240 million spectra, 592 experiments) used protein‑level FDR <0.1% and HUPO‑HPP criteria (≥2 unique peptides ≥9 aa, ≥18 aa coverage). Manual inspection of all ncORF PSMs (859 HLA spectra, 183 non‑HLA spectra) validated 88.7% of multi‑study HLA hits, but only 30/42 (71%) non‑HLA ncORFs with two peptides passed manual check.
ORBL: evolutionary constraint on ORFness, not amino acid sequence: unlike PhyloCSF (which scores amino‑acid conservation), ORBL quantifies conservation of start codon, stop codon, and reading frame across whole‑genome alignments. ORBLq normalizes against matched untranslated ORFs (same biotype, similar length), revealing that 30.4% of ncORFs (including 45.8% of uORFs) exhibit significant constraint, whereas only 2% have positive PhyloCSF scores. Detected HLA peptides were significantly enriched in high‑ORBLq ncORFs (P = 1.4×10⁻¹²).
Tier classification system for ncORF evidence: provisional tiers combine Ribo‑seq (all ncORFs have it by design), non‑HLA MS, and HLA MS. Final tiers after manual inspection: 15 tier 1A (meet HUPO‑HPP criteria in non‑HLA MS), 601 tier 1B (≥2 HLA peptides), 39 tier 2A (1 non‑HLA peptide), 1,059 tier 2B (1 HLA peptide). Only three tier 1A ncORFs were ultimately annotated as protein‑coding by GENCODE; the rest became “peptidein”.
Peptidein, a new annotation category for uncertain functional products: to resolve the paradox of confidently detected translation products that lack evidence for a conventional protein‑coding gene (e.g., only detected in cancer/immortalized cells, too short for HUPO‑HPP criteria, no known function), the consortium introduced “peptidein”. This formal category sits between “not detected” and “protein‑coding gene”. 121 initial peptidein annotations are provided, with full guidelines forthcoming.
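
Two of the rules above are easy to express as code: the peptide evidence threshold (≥2 unique peptides of ≥9 aa covering ≥18 aa) and the idea of ORBLq as a quantile against a matched background. The sketch below is a simplified reading of both. The function names, the coordinate convention, and the tie handling in the quantile are assumptions, and it does not compute ORBLv itself, which requires a phylogenetic tree and genome alignments.

```python
from bisect import bisect_right


def meets_peptide_criteria(peptides, min_len=9, min_peptides=2, min_coverage=18):
    """Check the peptide evidence threshold for one ORF.

    peptides: (start, end) positions on the ORF's amino-acid sequence,
    0-based and half-open, assumed to be already filtered to peptides
    that map uniquely to this ORF.
    """
    long_enough = {(s, e) for s, e in peptides if e - s >= min_len}
    covered = set()
    for s, e in long_enough:
        covered.update(range(s, e))  # union of covered residues
    return len(long_enough) >= min_peptides and len(covered) >= min_coverage


def empirical_quantile(score, background):
    """Fraction of matched background scores that are <= the given score."""
    ranked = sorted(background)
    return bisect_right(ranked, score) / len(ranked)


# Hypothetical ORF with two adjacent 9-aa peptides and an ORBLv-like score of 0.95.
passes = meets_peptide_criteria([(0, 9), (9, 18)])
background = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
q = empirical_quantile(0.95, background)
```

Two overlapping peptides such as (0, 9) and (3, 12) would fail here because together they cover only 12 residues, which illustrates why short ncORFs struggle to reach tier 1A.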

Why should we care?

This paper is a landmark collaborative effort that finally brings methodological rigor to the “dark proteome” of non‑canonical ORFs. The main limitations are that the analysis is manual and labor‑intensive (not scalable for most labs), the peptidein concept may be interpreted as a “lesser” category rather than a legitimate status, and the field still lacks consensus on whether HLA‑presented peptides alone constitute proof of a stable protein. Nevertheless, the paper sets a new baseline for how to integrate MS, ribosome profiling, evolution, and functional genomics to systematically evaluate thousands of candidate ORFs.

Dormancy, not apoptosis, restricts hematopoietic stem cell mutagenesis during aging

Fotopoulou et al. bioRxiv (2026). 10.64898/2026.05.09.724021

The paper in one sentence

Using clonal in vitro expansion of single LT‑HSCs followed by whole‑genome sequencing (≥30X coverage, VAF cutoff 0.3), genetic ablation of the intrinsic apoptosis pathway (Bak⁻/⁻ BaxΔ/Δ), a doxycycline‑chase H2B‑GFP label‑retention model, and NanoSeq on pooled HSCs, the authors show that apoptosis does not limit mutation accumulation during aging, whereas dormancy reduces the mutation rate ~3‑fold, and sterile inflammation (poly(I:C)) accelerates aging‑associated mutagenesis.

Summary

The study investigates how hematopoietic stem cells (HSCs) restrict mutation accumulation during normal aging in mice. The authors first established a sensitive pipeline: single LT‑HSCs were sorted, expanded in vitro to generate clonal colonies, and subjected to whole‑genome sequencing (WGS) with paired tail germline controls. To ensure accurate mutation calling, they performed a benchmarking experiment by downsampling a deeply sequenced clone (89X) to 10–80X coverage in triplicate, identifying that 30X coverage captures ~87% of confident SNVs and ~73% of confident indels, which was set as the minimum threshold. Variants acquired during in vitro expansion were filtered out using a VAF cutoff <0.3. Using this method, they confirmed an age‑associated increase in SNVs (44 mutations per genome per year) with mutational signatures (HSPC, SBS1, SBS5, SBS18) similar to human aged HSCs. To test the role of intrinsic apoptosis, they used Scl‑Cre‑ERT2 Bak⁻/⁻ Baxᶠˡ/ᶠˡ mice (tamoxifen‑induced deletion of Bax in HSCs already lacking Bak). At 8 months of age, LT‑HSCs from Bak‑BaxΔ/Δ mice showed no significant difference in SNV, indel, or SV burden compared to wild‑type controls, and mutational signatures were largely unchanged, indicating that apoptosis is not a major restriction mechanism during physiologic aging. To test the role of dormancy, they used Scl‑tTA H2B‑GFP mice with an 18‑month doxycycline chase. Dormant (label‑retaining) LT‑HSCs had significantly fewer SNVs than active (non‑retaining) cells from the same aged mice, with a ~3‑fold slower accumulation rate. Dormant cells also had lower SBS1 (cell‑division‑associated) and SBS18 (ROS‑associated) signature burdens. Finally, to test whether inflammation accelerates mutagenesis, they treated young mice with three rounds of poly(I:C) (TLR3 agonist). Because inflammatory exposure reduces colony formation efficiency, they used NanoSeq directly on pooled ~1500 LT‑HSCs per mouse, avoiding in vitro expansion bias. 
Poly(I:C)‑treated mice showed ~25% higher SNV burden than PBS controls, with an aging‑like mutational spectrum (enriched for HSPC signature).

Personal highlights

Benchmarking of coverage for single‑HSC mutation detection: by downsampling an 89X‑sequenced clone to lower depths (10–80X) in triplicate, the authors established that 30X coverage captures ~87% of confident SNVs and ~73% of confident indels. This empirical threshold is more rigorous than typical coverage used for clonal barcoding studies.
Clonal expansion plus VAF filtering to exclude in vitro artifacts: single LT‑HSCs were expanded in vitro to generate sufficient DNA for WGS. Variants with VAF <0.3 were filtered out, as they arise from divisions during culture (diluted from the original 0.5 VAF). This ensures that called mutations reflect in vivo aging, not culture‑induced errors.
Genetic ablation of intrinsic apoptosis (Bak‑Bax double knockout): using Scl‑Cre‑ERT2‑driven deletion of floxed Bax in a Bak⁻/⁻ background, the authors disabled the mitochondrial apoptosis pathway specifically in HSCs. This is a clean genetic intervention, and the absence of increased mutation burden challenges the long‑held dogma that apoptosis is a key gatekeeper against mutagenesis in stem cells.
Label‑retention model (H2B‑GFP) to isolate dormant vs. active HSCs from the same aged mice: an 18‑month doxycycline chase allowed prospective sorting of label‑retaining (dormant) and non‑retaining (active) LT‑HSCs from identical 22‑month‑old donors. This design controls for chronological age and environment, directly attributing lower mutation burden to the dormant state rather than inter‑individual variation.
NanoSeq for mutation burden in inflamed HSCs without clonal expansion: Poly(I:C) treatment impairs colony formation, making the standard clonal expansion method biased (only proliferating clones would be sequenced). NanoSeq, a duplex‑sequencing‑based method that generates consensus from both DNA strands, allowed direct enumeration of mutations in pooled ~1500 LT‑HSCs per mouse, circumventing expansion bias and confirming inflammation‑accelerated mutagenesis.
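
The logic behind the 0.3 VAF cutoff can be illustrated with a small sketch. It assumes heterozygous variants on diploid autosomes and perfectly symmetric clone growth, so that a variant present in the founding cell sits at VAF 0.5 and a variant arising at the n-th division in culture sits at 0.5 / 2^n. The function names and the variant record layout are hypothetical.

```python
def expected_vaf(division):
    """Expected VAF of a heterozygous variant arising at a given culture division.

    division = 0 means the variant was already present in the sorted founder cell.
    """
    return 0.5 / (2 ** division)


def keep_in_vivo(variants, min_vaf=0.3):
    """Keep variants whose observed VAF is at or above the cutoff."""
    return [
        v for v in variants
        if v["depth"] > 0 and v["alt_reads"] / v["depth"] >= min_vaf
    ]


# Hypothetical calls from one clone at 30X: one clonal, one acquired in culture.
calls = [
    {"id": "clonal", "alt_reads": 15, "depth": 30},
    {"id": "culture", "alt_reads": 7, "depth": 30},
]
kept = keep_in_vivo(calls)
```

Under these assumptions even a variant arising at the very first division in culture has an expected VAF of 0.25, which falls below the cutoff, while sampling noise at 30X explains why the threshold is set at 0.3 rather than closer to 0.5.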

Why should we care?

The work convincingly demonstrates that dormancy, not apoptosis, is the dominant protective mechanism against age‑related mutagenesis in HSCs. However, mouse HSCs may differ from human HSCs in apoptosis dependency, and the inflammatory model uses a strong artificial agonist (poly(I:C)) rather than chronic infection.

Sleep chart of biological ageing clocks in middle and late life

The MULTI Consortium et al. Nature (2026). 10.1038/s41586-026-10524-5

The paper in one sentence

Using generalized additive models (GAMs) on ~500,000 UK Biobank participants, the study quantifies U‑shaped associations between self‑reported sleep duration and 23 multi‑organ, multi‑omics biological age gaps (BAGs), then integrates GWAS, genetic correlation, survival analysis, structural equation mediation, and Mendelian randomization to dissect genetic and environmental contributions, disease risks, and causal directionality.

Summary

This large‑scale epidemiological study investigates whether sleep duration has a nonlinear (U‑shaped) relationship with biological ageing across multiple organ systems and molecular layers. The authors curated 23 previously developed BAGs: 7 from MRI (brain, heart, liver, pancreas, spleen, adipose, kidney), 11 from plasma proteomics (ProtBAGs), and 5 from plasma metabolomics (MetBAGs). Sleep duration (field 1160, self‑reported hours per 24h) was restricted to 4–10h to avoid sparse extremes. For each BAG, they fitted a generalized additive model (GAM) with cubic regression splines (mgcv package), adjusting for age, sex, BMI, blood pressure, assessment centre, and disease status. The effective degrees of freedom (e.d.f.) of the smooth term quantified nonlinearity; the sample‑specific BAG minimum (optimal sleep duration) was derived from the spline curve. Nine BAGs showed significant U‑shaped associations (Bonferroni‑corrected P < 0.05/23), with optimal sleep ranging 6.4–7.8h depending on organ and sex. To test whether the observed associations are genetically driven, they performed GWAS (REGENIE) comparing short (<6h) vs normal (6–8h) and long (>8h) vs normal sleep in >300k individuals, identifying distinct genomic loci. Genetic correlations (LDSC) between sleep traits and 527 disease endpoints from FinnGen/PGC revealed broad systemic correlations for short sleep, but brain‑focused correlations for long sleep. Survival analyses (Cox proportional hazards) linked both short and long sleep to increased all‑cause mortality and incident disease endpoints. Mediation analysis using structural equation modelling (SEM; lavaan) tested whether MRI‑derived BAGs mediate the effect of sleep on two late‑life depression (LLD) subtypes. Short sleep showed direct effects on LLD, while long sleep acted predominantly through brain and adipose BAGs (62% mediated). 
Finally, two‑sample Mendelian randomization (five estimators: IVW, Egger, weighted median, simple mode, weighted mode) assessed reverse causality from 525 disease endpoints to sleep traits, finding no widespread causal effect (though pleiotropy sensitivity remained). Replication attempts in two smaller cohorts (BLSA, MESA) showed similar U‑shaped patterns but did not reach statistical significance.

Personal highlights

Generalized additive models with cubic splines to detect U‑shaped associations without prior assumption: the authors used GAM with cubic regression splines to flexibly model sleep duration vs. BAG, allowing the data to determine nonlinearity. The effective degrees of freedom (e.d.f.) of the smooth term quantifies curve complexity; e.d.f. >1 indicates nonlinearity. Optimal sleep duration was derived from the spline curve’s minimum, not from arbitrary cutoffs.
23 multi‑organ, multi‑omics biological age gaps (BAGs) developed via nested cross‑validation: BAGs were trained on pathology‑free controls using repeated holdout cross‑validation (50 repetitions, 80/20 split) with multiple algorithms (LASSO, support vector regressor, elastic net, neural network). Age bias correction was applied. This framework limits overfitting and provides an interpretable “age gap” (biological age minus chronological age) for each organ/omics layer.
Binary GWAS for short and long sleep duration vs. normal sleep: rather than treating sleep as a continuous trait (which would obscure nonlinearity), the authors performed case‑control GWAS (REGENIE) comparing short (<6h) vs normal (6–8h) and long (>8h) vs normal. This design respects the U‑shaped relationship and identified distinct genetic architectures: short sleep associated with brain‑tissue enrichment (MAGMA), long sleep with multiple loci but less tissue specificity.
Structural equation mediation with organ‑specific BAGs as mediators: using the temporal ordering (sleep measured at baseline, MRI at follow‑up), the authors tested whether MRI‑derived BAGs mediate the sleep → late‑life depression pathway. The model included direct path (sleep → LLD) and indirect path (sleep → BAG → LLD), adjusted for covariates. This revealed that long sleep’s association with depression is largely indirect (e.g., brain BAG mediated 62% of the effect), whereas short sleep shows stronger direct effects.
Two‑sample Mendelian randomization with pleiotropy sensitivity analyses: to test whether disease causes sleep disturbances (reverse causality), they performed MR using 525 disease GWAS (FinnGen, PGC) as exposures and binary sleep traits as outcomes. Five estimators (IVW, Egger, weighted median, simple mode, weighted mode) were used, with heterogeneity tests (Cochran’s Q), MR‑Egger intercept for directional pleiotropy, MR‑PRESSO global test, and leave‑one‑SNP analysis. Most analyses did not support widespread causal effects of disease on sleep, though some pleiotropy biases were noted (e.g., depression to long sleep showed inconsistent estimates across estimators).
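
Of the five MR estimators listed, the inverse-variance weighted (IVW) one is the simplest to write down. The sketch below uses the standard fixed-effect IVW formula on per-SNP summary statistics. The function name and the toy numbers are hypothetical, and it covers none of the sensitivity analyses (Egger intercept, MR-PRESSO, leave-one-out) that the paper relies on.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    beta_exposure: per-SNP effects on the exposure (here, a disease endpoint)
    beta_outcome:  per-SNP effects on the outcome (here, a binary sleep trait)
    se_outcome:    per-SNP standard errors of the outcome effects
    """
    num = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Toy instruments whose per-SNP ratio estimates all equal 0.5.
estimate, std_err = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.4],
    beta_outcome=[0.05, 0.1, 0.2],
    se_outcome=[0.01, 0.02, 0.01],
)
```

Because IVW is a weighted average of per-SNP ratio estimates, it is only unbiased when pleiotropy is balanced, which is why disagreement between estimators (as noted for depression and long sleep) is treated as a warning sign.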

BARseq3: a modular system for integrating spatial multi-omics and cellular barcoding in single cells

Qi et al. bioRxiv (2026). 10.64898/2026.05.13.724900

The paper in one sentence

BARseq3 decouples barcode sequencing from spatial gene detection using independent rolling-circle amplification (RCA) libraries and sequential Illumina sequencing‑by‑synthesis, enabling modular combination of transcriptomics, translatomics, and cellular barcoding (e.g., viral lineage or connectivity barcodes) in fixed tissue sections at subcellular resolution.

Summary

BARseq3 is a modular in situ sequencing platform that separates the detection of cellular barcodes (e.g., random 30‑mer viral barcodes for neuronal tracing) from the measurement of other molecular modalities. The workflow consists of three independent modules: (1) Gene Module: hybridization of padlock probes (SNAIL for total mRNA or TRI for ribosome‑bound mRNA) followed by ligation, RCA, and crosslinking to the tissue; (2) Barcode Module: reverse transcription of barcode RNA, gap‑filling padlock hybridization, ligation, RCA, and crosslinking; (3) Sequencing Module: sequential Illumina sequencing‑by‑synthesis using orthogonal primers, first for gene IDs (encoded in padlock probes) and then for unknown barcodes. The two modules are experimentally independent, allowing any combination (gene only, barcode only, or both). The authors benchmark BARseq3 against BARseq2 (previous coupled method) using barcoded Sindbis virus in mouse motor cortex, showing significantly more gene amplicons per barcoded cell (p < 0.0001) with fewer probes per gene (4 vs 12) and lower probe concentrations. Specificity was validated using Pcp2 (Purkinje cell‑specific) and Malat1 (nuclear) probes, with BARseq3 achieving higher on‑target/off‑target ratios that increase with probe concentration. Simultaneous detection of transcriptome (SNAIL), translatome (TRI), and barcodes in the same cells is demonstrated, with no translatome signal when the ribosome‑binding splint probe is omitted. As a standalone spatial transcriptomics assay (1,745 genes), BARseq3 yields ~92 UMIs and ~71 genes per cell in mouse cerebellum, reproduces known cell types and marker expression, and shows high reproducibility across serial sections (Pearson r = 0.98). The method works on fresh‑frozen and PFA‑fixed tissue (with pepsin pretreatment) and across multiple species (zebra finch, frog, octopus).

Personal highlights

Decoupled barcode and gene libraries via independent RCA and orthogonal sequencing primers: unlike BARseq2 where barcode and gene readout are coupled, BARseq3 physically separates the two libraries. Gene Module amplicons are sequenced first using a gene‑specific sequencing primer; then the primer is stripped and a barcode‑specific primer is hybridized for subsequent cycles. This allows each module to be optimized independently and enables plug‑and‑play exchange of gene detection chemistries without redesigning barcode capture.
Modular architecture supporting multiple spatial omics in the same cell: the Gene Module can accommodate any hybridization‑based assay. The authors demonstrate parallel detection of total mRNA (SNAIL probes, adapted from STARmap) and translating mRNA (TRI probes, adapted from RIBOmap) alongside barcodes. SNAIL probes use a split‑padlock design; TRI probes add a ribosome‑binding splint probe that only circularizes when the mRNA is bound to a ribosome. Control experiments confirm that translatome signal requires the splint probe.
Improved sensitivity and specificity over BARseq2: with only 4 SNAIL probes per gene (vs 12 padlocks in BARseq2) and lower concentrations, BARseq3 produces ~2‑fold more gene amplicons per barcoded cell (mean ~38 vs ~18). On‑target signal for Pcp2 and Malat1 is concentration‑dependent and significantly higher than BARseq2, while off‑target signal remains low and not significantly different. This improvement likely comes from more efficient RCA and crosslinking chemistry.
High‑throughput barcode sequencing with Illumina chemistry: the Sequencing Module uses standard Illumina incorporation and cleavage reagents (from MiSeq kits) and four‑color imaging. Barcode signal‑to‑noise remains high across 10 sequencing cycles (Fig. 2D), with clear base calling. This enables de novo sequencing of unknown barcodes (e.g., random 30‑mers from MAPseq viruses) without prior sequence knowledge, unlike hybridization‑based barcode detection methods that require known barcodes.
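
As a rough illustration of de novo barcode reading from four-color imaging, the sketch below calls one base per sequencing cycle by taking the brightest channel and emits "N" when no channel clearly dominates. This is a deliberately naive scheme, not the BARseq3 base caller: the channel-to-base mapping, the purity threshold, and the function name are all assumptions, and real pipelines also correct for channel crosstalk and phasing.

```python
# Hypothetical mapping from imaging channel index to base.
CHANNEL_TO_BASE = ("G", "T", "A", "C")


def call_barcode(intensities, min_purity=0.5):
    """Call one base per cycle for a single amplicon.

    intensities: one 4-tuple of channel intensities per sequencing cycle.
    A cycle is called "N" if the brightest channel holds less than
    min_purity of the summed signal.
    """
    bases = []
    for cycle in intensities:
        total = sum(cycle)
        best = max(range(4), key=lambda c: cycle[c])
        if total <= 0 or cycle[best] / total < min_purity:
            bases.append("N")
        else:
            bases.append(CHANNEL_TO_BASE[best])
    return "".join(bases)


# Three cycles: two clean calls and one ambiguous cycle.
barcode = call_barcode([(10, 1, 1, 1), (1, 9, 0, 0), (3, 3, 3, 3)])
```

No barcode whitelist is involved at any point, which is the practical difference from hybridization-based detection: the sequence is read out base by base rather than matched against known probes.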

Why should we care?

By separating barcode readout from gene readout into independent RCA libraries and orthogonal sequencing primers, the method avoids the complexity of designing padlock probes that simultaneously capture both features. This modularity means that any existing RCA‑based spatial assay (STARmap, RIBOmap, TEMPOmap) can be combined with any barcoding approach (viral barcodes for connectomics, genomic lineage barcodes, CRISPR sgRNA barcodes for perturbation screens) by simply running the two modules sequentially on the same section. For users, the key practical takeaway is that BARseq3 achieves higher sensitivity with fewer probes per gene than its predecessor, making probe design and synthesis more affordable. The demonstration of simultaneous transcriptomics + translatomics + barcodes in the same cells is technically impressive, though the biological utility of adding translatomics to spatial mapping is still emerging. The main limitations are the need for probe concentration optimization (not a one‑size‑fits‑all protocol) and the fact that barcode sequencing efficiency depends on barcode abundance (Sindbis virus gives high expression; low‑abundance barcodes may be harder to detect).

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 04/05/26

Sebastiaan Vanuytven — Sun, 10 May 2026 08:09:19 GMT

This marks the first week of my attempt to publicly summarize the papers I’ve read. Typically, I aim to read at least one paper per day during my commute. However, this week, I’m taking the weekend to recharge after submitting a paper. This week’s reads focus on understanding and manipulating the spatial and ecological organization of tissues, tumors, and cellular communication. While INSPIRE integrates millions of spatially resolved cells across technologies and developmental stages, Phoenix predicts spatial transcriptomes virtually from conventional H&E-stained slides. In parallel, Renoir, scRICH, CASEI, and Spatial EcoTyper provide increasingly advanced frameworks for interpreting cell-cell communication, spatial organization, and multicellular ecosystems in diseased tissue. From a translational perspective, a phase 1b trial in pancreatic cancer suggests that netrin1 inhibition can counteract EMT-mediated chemoresistance and enable conversion surgery in patients with locally advanced disease. Taken together, the computational papers argue that cellular phenotypes alone are not sufficient to explain pathophysiological processes, and that spatial relationships between cells, interactions at edges, and communication heterogeneity within the same cell type deserve attention.

Preprints/articles that I managed to read this week

Netrin1 blockade alleviates resistance to chemotherapy in pancreatic cancer

Roth et al. Nature (2026). 10.1038/s41586-026-10436-4

The paper in one sentence

In a phase 1b trial, adding the anti‑netrin1 antibody NP137 to mFOLFIRINOX chemotherapy in locally advanced pancreatic cancer was safe, showed promising activity (median PFS 10.85 months, 23% conversion surgery rate), and downregulated EMT pathways, with high expression of the netrin1 receptor neogenin predicting better outcomes.

Summary

Pancreatic ductal adenocarcinoma (PDAC) is highly aggressive, and locally advanced disease (LAPC), tumours that are unresectable due to vascular involvement but without distant metastases, has limited treatment options. Chemoresistance is often driven by epithelial‑mesenchymal transition (EMT). Netrin1, a developmental cue re‑expressed in many cancers, promotes EMT, and the monoclonal antibody NP137 inhibits netrin1. This single‑arm phase 1b trial (Lap‑NET1) enrolled 43 patients with LAPC who received NP137 plus mFOLFIRINOX every 2 weeks for up to 12 cycles. NP137 was well tolerated, with no unexpected toxicity beyond mFOLFIRINOX alone. Objective response rate was 29% (all partial responses), median progression‑free survival (PFS) was 10.85 months, and median overall survival (OS) was 16.43 months. Notably, 23% of patients underwent conversion surgery with R0 resection, a substantially higher rate than historical benchmarks (5‑18%). Laser‑capture microdissection and RNA‑seq of pre‑ and post‑treatment tumour samples showed significant downregulation of the EMT pathway after treatment – the expected mechanistic effect of netrin1 blockade. In a separate dataset, mFOLFIRINOX alone tended to increase the same EMT signature. High expression of the netrin1 receptor neogenin (NEO1) in pre‑treatment biopsies was associated with markedly better outcomes: median PFS 15.6 months (vs 10.2 months for low neogenin) and median OS not reached (vs 12.7 months). Immunohistochemistry confirmed that neogenin protein levels correlated with mRNA and predicted longer PFS/OS. Preclinical experiments showed that NP137 inhibits migration of pancreatic cancer cells in a neogenin‑dependent manner via the MAPK‑ERK pathway.

Personal highlights

First clinical evaluation of netrin1 blockade in pancreatic cancer: the anti‑netrin1 antibody NP137 was combined with mFOLFIRINOX in a phase 1b trial focused on locally advanced PDAC. The combination was well tolerated, with grade ≥3 NP137‑related adverse events in only 12% of patients and no new safety signals.
Promising activity with high conversion surgery rate: median PFS (10.85 months) and OS (16.43 months) compare favourably with benchmark studies such as NEOPAN (PFS ~6‑10 months). The 23% R0 resection rate is substantially higher than the 6‑18% reported with FOLFIRINOX alone in similar populations, suggesting that NP137 may increase the likelihood of curative‑intent surgery.
Mechanistic validation of EMT reversal in patient tumours: laser‑capture microdissection of pre‑ and post‑treatment biopsies allowed transcriptomic analysis specifically of tumour cells. EMT pathway was significantly downregulated after NP137 + mFOLFIRINOX, whereas in a separate dataset mFOLFIRINOX alone tended to upregulate EMT activity.
Neogenin as a candidate predictive biomarker: high expression of the netrin1 receptor neogenin (NEO1) in pre‑treatment tumours correlated with significantly longer PFS (15.6 vs 10.2 months) and OS (NR vs 12.7 months) in the NP137‑treated cohort, while in patients treated with mFOLFIRINOX alone high neogenin trended towards worse outcomes. IHC confirmed the association, providing a potential stratification tool for future trials.
Preclinical evidence for neogenin‑dependence: in Panc02 mouse pancreatic cancer cells, silencing neogenin abolished the anti‑migratory effect of NP137, and the MAPK‑ERK pathway was identified as a downstream mediator.

Why should we care?

Locally advanced pancreatic cancer remains a devastating diagnosis with very few effective options. This study provides early‑phase evidence that targeting netrin1, a developmental protein hijacked by tumours to drive EMT, can be safely added to intensive chemotherapy and may meaningfully improve outcomes, including a higher chance of conversion to surgical resection. The identification of neogenin as a potential biomarker offers a path toward patient selection, which is crucial for a targeted therapy. However, important caveats must be acknowledged: the trial is single‑arm, not randomised; the sample size is modest (n=43); the PFS/OS comparisons with historical controls are indirect; and the biomarker analysis is exploratory. The encouraging results are hypothesis‑generating, not practice‑changing. A randomised phase 2 trial is needed to confirm efficacy and validate neogenin as a predictive biomarker.

Robust identification of cell-cell communication heterogeneity in single cells

Bocci et al. bioRxiv (2026). https://www.biorxiv.org/content/10.64898/2026.04.29.721691v1

The paper in one sentence

scRICH is a computational framework that integrates single‑cell transcriptomics, RNA velocity, and gene regulatory networks to identify heterogeneous cell‑cell communication behaviors within the same cell type, construct multi‑scale signaling models, and link distinct communication pathways.

Summary

Most existing cell‑cell communication (CCC) inference methods either operate at the cell‑type level (averaging over potentially heterogeneous behaviors) or at the single‑cell level (suffering from data sparsity). scRICH bridges this gap by first identifying pathway‑specific “communication modes” (e.g., sender, receiver, inactive) within each cell type using unsupervised clustering on ligand, receptor, and downstream target expression. Cells are then grouped into “meta‑cells” (cell type × CCC mode), enabling robust network construction. scRICH optionally incorporates RNA velocity of downstream targets to confirm pathway activation, and uses mutual information and GRN inference (via spliceJAC) to uncover relationships between different CCC pathways. The framework accepts either standard scRNA‑seq or unspliced/spliced counts (for velocity), works with both diffusion‑based and Michaelis‑Menten interaction models, and extends to spatial transcriptomics. Benchmarking against CellChat, CellPhoneDB, and LIANA+ shows strong concordance when aggregated to cell‑type level, while scRICH uniquely quantifies within‑type heterogeneity via a heterogeneity score and permutation‑based robustness testing.

Personal highlights

Within‑type heterogeneity detection: scRICH identifies that cells of the same annotated type can have distinct CCC roles (sender, receiver, inactive) for a given pathway, quantified by a heterogeneity score. This avoids information loss from cell‑type averaging.
Meta‑cell resolution: by defining meta‑cells as the Cartesian product of cell types and CCC modes, scRICH constructs communication networks at an intermediate scale that balances single‑cell detail and statistical robustness.
Integration of RNA velocity as a downstream response filter: instead of relying solely on ligand/receptor expression, scRICH can condition CCC on the activation of pathway‑specific targets, quantified by RNA velocity (unspliced/spliced ratio), reducing false positives from unproductive signalling.
Multi‑pathway GRN linking: using mutual information and spliceJAC, scRICH infers gene regulatory networks connecting two CCC pathways, identifying intermediate transcription factors and enabling prediction of pathway crosstalk.
Spatial transcriptomics compatibility: the same meta‑cell framework can be applied to spatial data, allowing mapping of CCC mode distributions across tissue regions and quantification of local spatial heterogeneity.
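The meta-cell idea from the highlights above can be sketched in a few lines. The label format and the heterogeneity score used here (normalized Shannon entropy over the three modes) are my own assumptions for illustration; the preprint's exact definition may differ.

```python
import math
from collections import Counter

def meta_cell_labels(cell_types, ccc_modes):
    """Pair each cell's type with its CCC mode to form a meta-cell label."""
    return [f"{ct}|{mode}" for ct, mode in zip(cell_types, ccc_modes)]

def heterogeneity_score(modes, n_possible=3):
    """Normalized entropy of CCC modes within one cell type.

    0 means every cell shares one mode; 1 means the modes
    (sender, receiver, inactive) are evenly represented.
    This formula is an illustrative assumption, not the scRICH definition.
    """
    counts = Counter(modes)
    n = len(modes)
    entropy = sum((c / n) * math.log(n / c) for c in counts.values())
    return entropy / math.log(n_possible)

# Toy annotation for one pathway: six cells, two cell types.
cell_types = ["Fibroblast"] * 4 + ["T cell"] * 2
modes = ["sender", "sender", "receiver", "inactive", "receiver", "receiver"]

labels = meta_cell_labels(cell_types, modes)
by_type = {}
for ct, mode in zip(cell_types, modes):
    by_type.setdefault(ct, []).append(mode)
scores = {ct: heterogeneity_score(m) for ct, m in by_type.items()}

print(sorted(set(labels)))
print(scores)
```

In this toy case the fibroblasts split into three meta-cells while the T cells stay as one, which is exactly the information that cell-type averaging would hide.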

Why should we care?

scRICH addresses a recognised limitation: treating all cells of a given type as identical in CCC flattens biological complexity. Its novelty lies in systematically identifying functional subpopulations (sender vs. receiver) within cell types without requiring prior knowledge. The integration of RNA velocity as a “response verification” step is a principled way to filter spurious ligand‑receptor pairs. However, the method inherits limitations: it depends on curated pathway databases (ligands, receptors, targets), which may be incomplete or context‑specific; the RNA velocity model (scVelo vs. UniTVelo) affects results; and the meta‑cell approach increases the number of groups, which may reduce statistical power for rare cell types.

Charting spatial ligand-target activity using Renoir

Rao et al. Nature Communications (2026). 10.1038/s41467-026-72388-7

The paper in one sentence

Renoir is a computational framework that quantifies spatially resolved ligand‑target activity scores from spatial transcriptomics data by integrating cell type deconvolution, receptor expression, and mutual information, enabling the inference of communication domains and pathway activity maps.

Summary

Most existing cell‑cell communication inference methods focus on ligand‑receptor interactions without assessing whether those interactions actually affect downstream target gene expression, and they rarely account for the spatial context of cell type arrangement. Renoir addresses these gaps by computing a “neighborhood activity score” for each curated ligand‑target pair at each spatial location (spot or cell). For low‑resolution spatial data (e.g., 10x Visium), Renoir first deconvolves cell type abundances using a matched scRNA‑seq reference. The score integrates: (1) the abundance of cell types expressing the ligand and the target (weighted by cell type‑specific gene expression), (2) the presence of a suitable receptor for the ligand in the cell type harbouring the target gene, (3) cell type‑specific gene entropy, and (4) mutual information between ligand and target expression from the scRNA‑seq data. For single‑cell resolution data (e.g., Xenium, CosMx), cell type annotations are used directly. The method then clusters spots based on the ligand‑target activity vectors to infer “spatial communication domains”, identifies domain‑specific active ligand‑target pairs, ranks ligands by activity, and maps pathway‑level activity. Renoir was benchmarked on simulated datasets derived from human intestine, triple‑negative breast cancer, and brain cortex, outperforming COMMOT, stLearn, and SpatialDM in spatial activity accuracy, precision‑recall, and alignment with ground truth annotations. Applications to mouse brain, TNBC, fetal liver, and hepatocellular carcinoma datasets demonstrate its versatility across platforms and resolutions.

Personal highlights

Ligand‑target activity, not just ligand‑receptor binding: unlike most methods that stop at receptor expression, Renoir requires that the target gene be expressed in a cell type that also expresses a cognate receptor for the ligand, and it quantifies the strength of the ligand‑target relationship using mutual information. This reduces false positives from unproductive signalling.
Spatially aware neighbourhood scoring: for each spot, Renoir aggregates contributions from its spatial neighbours, capturing local co‑localisation of ligand‑expressing and target‑expressing cells. The score is computed as a product of an expression correspondence score (cell type‑weighted) and an inherent similarity measure (mutual information‑based).
Communication domains and pathway mapping: by clustering spots on the ligand‑target activity vectors, Renoir identifies spatially coherent regions with distinct signalling behaviours. It can also map the spatial activity of any gene set (e.g., Hallmark pathways) by aggregating the activity scores of ligand‑target pairs belonging to that set.
Benchmarked across tissue types and platforms: Renoir was rigorously tested on simulated data from three distinct tissue contexts (intestine, breast cancer, brain) and showed consistent improvement over state‑of‑the‑art methods in spatial accuracy, precision‑recall, and domain recovery (ARI/NMI). It also demonstrated robustness to downsampling (60% UMI retention) and moderate cell‑type annotation noise (30% perturbation).
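As a rough illustration of the mutual-information ingredient described above, the sketch below estimates mutual information between a ligand and a target from binarized expression, then multiplies it by a correspondence term. The binarization, the toy vectors, and the `correspondence` value are assumptions for illustration; Renoir's actual estimator and weighting are described in the paper.

```python
import math
from collections import Counter

def binarize(expr, threshold=0.0):
    """Mark each cell as expressing (1) or not expressing (0) a gene."""
    return [int(v > threshold) for v in expr]

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

# Toy scRNA-seq vectors over eight cells.
ligand = [0, 0, 3, 5, 0, 2, 0, 4]
coupled_target = [0, 0, 1, 2, 0, 1, 0, 3]    # on whenever the ligand is on
unrelated_target = [1, 0, 1, 0, 1, 1, 0, 0]  # independent of the ligand

mi_coupled = mutual_information(binarize(ligand), binarize(coupled_target))
mi_unrelated = mutual_information(binarize(ligand), binarize(unrelated_target))

# Hypothetical expression correspondence score for one spot's neighbourhood.
correspondence = 0.6
activity = correspondence * mi_coupled
print(round(mi_coupled, 3), round(mi_unrelated, 3), round(activity, 3))
```

The product form means a ligand-target pair only scores highly when the cells are co-localised and the two genes are statistically coupled in the reference data.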

Pan-cancer virtual spatial transcriptomics from routine histology with Phoenix

Tran et al. bioRxiv (2026). 10.64898/2026.04.25.720812

The paper in one sentence

Phoenix is a generative flow-matching model that predicts spatial single-cell gene expression from routine H&E-stained histology images, enabling virtual spatial transcriptomics at population scale without experimental spatial profiling.

Summary

Spatial transcriptomics technologies remain too expensive and slow for large-scale clinical or population-level studies. The authors address this by training Phoenix, a 1.2 billion-parameter conditional flow matching model, on 22.2 million cell-image and cell-expression pairs from 79 Xenium slides and 924 Xenium cores (The Nest dataset). Phoenix projects histology image patches into a latent space, then generates spatially resolved transcriptomic profiles conditioned on image features, optionally with neural compression via an MLP-Mixer autoencoder. The model was trained on the JURECA supercomputer for over 10,000 GPU hours. Critically, Phoenix generalizes zero-shot to unseen donors, organs (including gastric tissue absent from training), and cohorts across three continents, outperforming existing methods (BLEEP, DeepSpot, GHIST, SpatialEx) by 35–173% in Spearman correlation. The authors demonstrate applications from breast cancer subtyping to treatment response modeling across 9,544 TCGA patients, but the core contribution is a scalable computational framework that infers spatial expression directly from archival pathology slides without fine-tuning.

Personal highlights

22.2 million training pairs across 16 organ systems: the Nest dataset combines 79 Xenium slides and 924 Xenium cores from FFPE tissues, explicitly excluding lower-quality Visium data due to batch effects and morphological distortions.
Latent flow matching with transformer backbone: Phoenix uses conditional flow matching (CFM) with a modified transformer (8 blocks, embedding dimension 512, 8 attention heads) to generate latent expression vectors conditioned on pathology foundation model embeddings (H-Optimus, UNI2-h, or Virchow2).
Zero-shot generalization across organs and cohorts: trained on public TENX data (32 samples, 8.2M cells) and evaluated on five external cohorts (CHUV, LMU, NCBI, SNUH, UKER) spanning unseen gastric and head/neck tissues without donor or sample overlap.
Multi-resolution inference (single-cell, 55μm, 100μm): Phoenix operates at three spatial scales; the two pseudo-spot resolutions (hexagonal tiling with 6% overlap) enable faster inference when single-cell detail is unnecessary.
Limitation acknowledged: platform-specific and panel-restricted: Phoenix is trained and validated exclusively on Xenium data; the authors report that training on Xenium 5K Prime data fails to generalize (Fig. S7), likely due to lower sensitivity rather than model design, and cross-platform transferability remains untested.
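For readers unfamiliar with flow matching, here is a minimal sketch of the idea using the common straight-line probability path. Whether Phoenix uses exactly this path is an assumption on my side, and the conditioning on image embeddings is left out; the function names are hypothetical.

```python
import random

def cfm_training_example(x0, x1, t):
    """Return (x_t, target_velocity) for the straight-line path.

    x0: noise sample, x1: data sample (e.g. a latent expression vector),
    t: time in [0, 1]. A network v(x_t, t, condition) would be regressed
    onto target_velocity with a squared-error loss.
    """
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity

def euler_sample(x0, velocity_fn, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
data = [0.2, 1.5, 0.0, 3.1]  # made-up latent expression vector

x_mid, target = cfm_training_example(noise, data, t=0.5)

# Stand-in for a trained network: here we simply reuse the true velocity.
generated = euler_sample(noise, lambda x, t: target)
print([round(g, 3) for g in generated])
```

With a perfect velocity field the noise sample is transported onto the data sample; in practice the learned network only approximates this field, conditioned on the histology features.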

Why should we care?

Spatial transcriptomics costs thousands of dollars per sample and takes days to run, making it impractical for population-scale studies or routine clinical use. Phoenix demonstrates that a well-trained generative model can predict spatial expression from standard H&E slides, already collected in every pathology lab, with sufficient accuracy for biological discovery and biomarker screening. The broader takeaway is not that experiments are obsolete, but that computational inference can now screen thousands of archived specimens to prioritize which ones warrant expensive spatial profiling.

Decoding condition-specific cellular crosstalk in spatial omics via bilinear edge classification

Karin et al. bioRxiv (2026). 10.64898/2026.05.03.722470

The paper in one sentence

CASEI learns a bilinear interaction term between gene expression profiles of neighboring cells to classify which edges in a spatial proximity graph are condition-associated, then filters by training dynamics confidence to reveal condition-specific cellular rewiring.

Summary

Many methods compare cell states between conditions (e.g., healthy vs. diseased), but spatial reorganization can occur without changes in individual cell state distributions. CASEI addresses this by shifting the unit of analysis from nodes (cells) to edges (cell-cell pairs). It constructs a k-nearest neighbor spatial graph from coordinates, then trains a low-rank bilinear classifier to distinguish edges from different conditions (e.g., healthy vs. fibrotic) based on the paired expression profiles. The bilinear form $x_i^\top W_c W_c^\top x_j$ explicitly models multiplicative gene-gene interactions across neighboring cells. To avoid overconfident single-epoch predictions, CASEI averages softmax probabilities across all training epochs (training dynamics confidence) and retains only the top 5% of edges per condition. The resulting condition-adjusted graph enables downstream neighborhood enrichment, gene pair extraction, and spectral decomposition into interpretable interaction programs. On synthetic perturbations (area mixing, cell type enrichment, cell type swapping, and interaction enrichment) that preserve global cell state distributions, CASEI outperforms node-level MLP, graph neural network (GCN, GAT, GraphSAGE), and edge-level MLP baselines (AUROC improvements across all four cases). The authors demonstrate applications to human atherosclerosis, liver fibrosis, and the aging mouse brain, but the core contribution is a framework for detecting condition-specific spatial reorganization from spatial transcriptomics data without requiring changes in cell type abundance or expression.
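The low-rank bilinear edge score is easy to write down. The sketch below uses plain Python lists and a made-up 3-gene, rank-2 matrix; the function names and numbers are hypothetical and only illustrate the algebra, not CASEI's training code.

```python
def project(W, x):
    """Compute W^T x for a genes-by-rank matrix W given as a list of rows."""
    rank = len(W[0])
    return [sum(W[g][r] * x[g] for g in range(len(x))) for r in range(rank)]

def bilinear_score(W, x_i, x_j):
    """Low-rank bilinear edge score x_i^T W W^T x_j."""
    z_i, z_j = project(W, x_i), project(W, x_j)
    return sum(a * b for a, b in zip(z_i, z_j))

# Toy setup: 3 genes, rank 2, and two neighboring cells.
W = [[1, 0],
     [0, 1],
     [1, 1]]
x_i = [1, 2, 0]
x_j = [0, 1, 3]

score = bilinear_score(W, x_i, x_j)
print(score)
```

Projecting both cells first keeps the cost linear in the number of genes, and the score is symmetric in the two cells because W W^T is symmetric, so the direction of an edge does not matter.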

Personal highlights

Edge-level classification with a bilinear form: instead of classifying individual cells, CASEI classifies edges between neighboring cells using $x_i^\top W_c W_c^\top x_j$, a low-rank bilinear interaction term that captures multiplicative gene-gene co-variation across spatial neighbors, something additive or concatenative MLP architectures cannot easily represent without exponentially more parameters.
Training dynamics confidence filtering: averaging predicted probabilities across all epochs (rather than using final model outputs) and retaining only top 5% of edges per condition produces a condition-adjusted graph that removes transient or noisy predictions, following the same principle as Annotatability but applied to edges instead of cells.
Synthetic perturbations that preserve cell states: the benchmark includes four scenarios (area mixing, cell type enrichment, cell type swapping, interaction enrichment) where the distribution of individual cell expression profiles remains unchanged, meaning node-level methods (MELD, HiDDEN, Annotatability) cannot detect the spatial reorganization by design. CASEI detects these shifts while all baselines perform near random.
Condition-adjusted graphs for downstream analysis: filtered edge sets enable both neighborhood enrichment (which cell type pairs are condition-specifically connected) and gene-gene interaction extraction via $\Delta = W_{c_1} W_{c_1}^\top - W_{c_2} W_{c_2}^\top$, with spectral decomposition yielding orthogonal gene programs associated with the condition.
Limitation (co-occurrence ≠ direct signaling): the authors explicitly note that CASEI identifies spatial co-organization, not necessarily direct ligand-receptor communication. Cells may co-localize due to shared microenvironment preferences or structural constraints. Additionally, segmentation artifacts (incorrectly assigned transcripts between neighboring cells) could inflate interaction signals.
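The training dynamics confidence filter from the highlights above can be sketched as follows. This handles a single condition and uses invented probabilities; the helper names are hypothetical, and the real method applies the 5% cut per condition.

```python
def training_dynamics_confidence(prob_history):
    """Average each edge's predicted probability over all training epochs.

    prob_history: list of epochs, each a list with one probability per edge
    (the probability assigned to the edge's own condition).
    """
    n_epochs = len(prob_history)
    n_edges = len(prob_history[0])
    return [sum(epoch[e] for epoch in prob_history) / n_epochs
            for e in range(n_edges)]

def top_fraction(confidence, fraction=0.05):
    """Indices of the highest-confidence edges (at least one edge is kept)."""
    keep = max(1, int(len(confidence) * fraction))
    ranked = sorted(range(len(confidence)),
                    key=lambda e: confidence[e], reverse=True)
    return sorted(ranked[:keep])

# 20 edges over 3 epochs, all at 0.5 except two edges of interest.
history = [[0.5] * 20 for _ in range(3)]
for epoch, (late, steady) in enumerate([(0.10, 0.90), (0.20, 0.92), (0.99, 0.95)]):
    history[epoch][0] = late    # confident only in the final epoch
    history[epoch][7] = steady  # confident throughout training

confidence = training_dynamics_confidence(history)
kept = top_fraction(confidence)
final_epoch_best = max(range(20), key=lambda e: history[-1][e])
print(kept, final_epoch_best)
```

Edge 0 would win if only the final epoch were used, but averaging over epochs favors edge 7, which the model classified confidently throughout training.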

Interpretable, flexible and spatially aware integration of multiple spatial transcriptomics datasets from diverse sources

Zhao et al. Nature Genetics (2026). 10.1038/s41588-026-02579-x

The paper in one sentence

INSPIRE integrates multiple spatial transcriptomics datasets from different technologies, conditions, and developmental stages using a GNN-based encoder with adversarial learning, then decomposes the harmonized representations into interpretable spatial factors and gene programs via integrated non-negative matrix factorization.

Summary

Spatial transcriptomics datasets vary widely in resolution, gene coverage, sample preparation, and biological context, making joint analysis challenging. The authors propose INSPIRE, a deep learning framework that takes gene expression and spatial coordinates from multiple tissue sections as input. A shared GNN-based encoder (graph attention or lightweight graph convolutional) maps each cell/spot into a latent space while preserving spatial dependencies. To remove unwanted technical variation across sections, INSPIRE uses a tailored adversarial learning strategy with pairwise discriminators that encourage mixing between adjacent sections. Critically, the discriminators are only active for shared biological structures, allowing section-unique signals to remain unaligned. From the integrated latent space, INSPIRE performs a joint NMF across all sections: non-negative spatial factors (per cell/spot weights) and a shared gene loading matrix, plus section- and gene-specific effects to absorb residual confounders. The training objective combines integration loss, NMF reconstruction loss (Poisson likelihood), an auto-encoder regularizer to preserve biological information, and a geometry regularizer to maintain pairwise similarity structures. After training, INSPIRE outputs aligned latent representations, spatial factors (e.g., layer-specific enrichments), and interpretable gene programs. The method is demonstrated on human DLPFC (12 sections), mouse brain sagittal/coronal integration with partial overlap, cross-technology (Slide-seqV2 + MERFISH, seqFISH + Stereo-seq), mouse skin wound healing, human breast cancer Xenium (>280k cells), mouse organogenesis (8 time points, >500k spots), and 3D reconstruction from adjacent sections. Benchmark comparisons include SpiceMix, NSFH, PRECAST, MEFISTO, GraphST, Harmony, Seurat, and others.

Personal highlights

Adversarial alignment with adaptive discriminators: INSPIRE uses pairwise discriminators between consecutive sections, but their activity is data-driven, active for shared structures (e.g., isocortex) to promote mixing, inactive for unique structures (e.g., cerebellum) to preserve section-specific biology. This avoids forcing alignment where none exists.
Integrated NMF with residual modeling: after alignment, INSPIRE jointly factorizes all sections into shared spatial factors $\beta_s$ and a shared gene loading matrix $\mu$, plus section- and gene-specific offsets $\gamma_s$ to absorb unwanted technical variation. The non-negativity and sum-to-one constraints on both $\beta$ and $\mu$ make the factors interpretable as soft spatial domains and gene programs.
Spatial awareness via GNNs with two scalability modes: the encoder supports graph attention layers for moderate-sized datasets (adaptive neighborhood weighting) or lightweight graph convolutions for atlas-scale data (precomputed multi-hop features enable minibatch training). The latter scaled to 5 million cells across 10 synthetic sections.
Cross-technology integration without retraining: INSPIRE aligns data from different platforms (e.g., Slide-seqV2 with 23k genes and MERFISH with 1.1k genes) by taking the intersection of shared genes. It then imputes missing gene expression from the higher-coverage technology into the lower-coverage one, enabling downstream analyses (e.g., differential expression in gut tube) that neither dataset alone could support.
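To give a feel for the Poisson NMF reconstruction term, here is a minimal sketch for a single spot. How the offsets and sequencing depth enter the rate is an assumption of mine (multiplicative, via `exp(gamma)` and a `library_size` factor), as are all names and numbers; the paper specifies the actual parameterization.

```python
import math

def normalize(row):
    """Scale a non-negative vector so that it sums to one."""
    total = sum(row)
    return [v / total for v in row]

def poisson_nll(counts, beta, mu, gamma, library_size):
    """Poisson negative log-likelihood of one spot (constant log y! dropped).

    Assumed rate: library_size * exp(gamma_g) * sum_k beta_k * mu[k][g].
    beta: spatial factor weights for the spot (sum to one).
    mu: one sum-to-one gene loading vector per factor.
    gamma: gene-specific offsets for the spot's section.
    """
    nll = 0.0
    for g, y in enumerate(counts):
        mix = sum(b * mu_k[g] for b, mu_k in zip(beta, mu))
        rate = library_size * math.exp(gamma[g]) * mix
        nll += rate - y * math.log(rate)
    return nll

# Two gene programs over three genes, and one spot with 100 counts.
mu = [normalize([8, 1, 1]), normalize([1, 1, 8])]
counts = [80, 10, 10]
gamma = [0.0, 0.0, 0.0]

loss_matching = poisson_nll(counts, [1.0, 0.0], mu, gamma, library_size=100)
loss_mismatch = poisson_nll(counts, [0.0, 1.0], mu, gamma, library_size=100)
print(round(loss_matching, 2), round(loss_mismatch, 2))
```

The spot is explained far better by the first program, so the loss is lower when its factor weight sits on that program; this is what makes the learned weights readable as soft spatial domains.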

Why should we care?

Spatial transcriptomics is generating a flood of heterogeneous data—different technologies, resolutions, species, and conditions—but most analysis tools are designed for single or replicate sections. INSPIRE provides a unified framework that simultaneously addresses three common pain points: (1) removing batch effects without over-aligning unique biology, (2) producing interpretable outputs (spatial factors + gene programs) rather than black-box latent embeddings, and (3) scaling to millions of cells. The adversarial design with discriminators that turn off for unique structures is a practical solution to a real problem (e.g., aligning sagittal and coronal brain sections where the cerebellum exists in only one view).

Non-invasive profiling of the tumour microenvironment with spatial ecotypes

Zhang et al. Nature (2026). 10.1038/s41586-026-10452-4

The paper in one sentence

Spatial EcoTyper integrates single-cell spatial transcriptomics across cancer types to discover conserved multicellular “spatial ecotypes” (SEs), and Liquid EcoTyper uses a binary neural network to infer these SEs non-invasively from plasma cfDNA methylation profiles.

Summary

The authors present two interconnected machine-learning frameworks. First, Spatial EcoTyper takes single-cell-resolution spatial transcriptomics data (MERSCOPE, Xenium, Visium HD) from multiple tumour samples. For each sample, it defines “spatial neighbourhoods” (SNs), grid cells of 50 μm radius, and aggregates expression per cell type within each SN to create snGEPs (spatial neighbourhood gene expression profiles). These snGEPs are then converted into cell-type-specific similarity matrices, which are fused across cell types using similarity network fusion (SNF). The fused matrix is clustered (Louvain) to obtain sample-level spatial clusters. Across samples, the process is repeated: cell-type-specific cluster GEPs (ccGEPs) are concatenated, fused again via SNF, and clustered by NMF to yield nine conserved SEs (SE1–SE9) that recur across carcinomas and melanomas. SEs are validated on held-out ST platforms and scRNA-seq data.
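The neighbourhood aggregation step can be sketched as follows. I use square tiles with a hypothetical `tile_size` for simplicity, which is an assumption; the exact neighbourhood geometry and the per-type cap on cells follow the paper, not this toy version.

```python
from collections import defaultdict

def neighbourhood_profiles(cells, tile_size=100.0):
    """Average expression per (tile, cell type).

    cells: iterable of (x_um, y_um, cell_type, expression_vector).
    Returns {(tile_x, tile_y): {cell_type: mean_expression_vector}}.
    """
    groups = defaultdict(list)
    for x, y, cell_type, expr in cells:
        tile = (int(x // tile_size), int(y // tile_size))
        groups[(tile, cell_type)].append(expr)

    profiles = defaultdict(dict)
    for (tile, cell_type), exprs in groups.items():
        n = len(exprs)
        profiles[tile][cell_type] = [sum(col) / n for col in zip(*exprs)]
    return dict(profiles)

# Four toy cells with two genes each; coordinates in micrometres.
cells = [
    (10, 20, "T cell", [2.0, 0.0]),
    (60, 80, "T cell", [4.0, 2.0]),
    (30, 30, "Fibroblast", [0.0, 6.0]),
    (150, 20, "T cell", [1.0, 1.0]),
]
profiles = neighbourhood_profiles(cells)
print(profiles)
```

Keeping one profile per cell type inside each neighbourhood is what later allows cell-type-specific similarity matrices to be built over the same set of neighbourhoods and fused.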

Second, Liquid EcoTyper is trained to predict SE levels from CpG methylation data (e.g., 450K arrays or EM-seq). It uses a binary neural network that learns ~400 interpretable CpG sets (each set a group of CpGs, analogous to a gene set) from simulated cfDNA mixtures (tumour methylation + healthy plasma background). The model outputs relative abundances for SEs, non-SE tumour content, and a healthy background class. Training uses TCGA melanoma samples (n=461) with paired RNA-seq (for SE ground truth) and methylation. Validation uses real plasma cfDNA from 23 melanoma patients with matched tumour ST or EM-seq, and the model is then applied to 78 pretreatment plasma samples from ICI-treated melanoma patients.

Personal highlights

Spatial neighbourhoods as “spatial meta-cells”: aggregating up to k cells of a given cell type within a 50 μm radius reduces technical drop-out and cell-type abundance bias, allowing Spatial EcoTyper to focus on cell-state variation rather than local cell-type composition changes.
Similarity network fusion across co-registered cell types: by fusing cell-type-specific similarity matrices (SNs and later clusters) that share the same spatial coordinate ordering, the method integrates transcriptional covariance across lymphoid, myeloid, and stromal lineages to discover multicellular ecotypes without manual annotation.
Cross-sample conservation via two-stage fusion: first, SNs are clustered per sample; then, cluster-level GEPs (ccGEPs) are concatenated across samples and fused again. This identifies ecotypes that generalize across cancer types and platforms, as validated on independent Xenium Prime and Visium HD data.
Liquid EcoTyper’s binary CpG set network: a binary neural network (with binarized weights during forward pass) learns CpG sets as features—not individual CpGs—making predictions robust to methylation dropout and providing interpretable links back to SE consensus genes (e.g., promoter and gene-body CpGs of SE7 markers are enriched in the learnt SE7 CpG sets).
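A toy version of the CpG set idea: real-valued latent weights are binarized in the forward pass, so each hidden unit reads out the mean methylation of a set of CpGs. The thresholding rule, the averaging, and all numbers are assumptions for illustration, not the Liquid EcoTyper architecture.

```python
def binarize(weights, threshold=0.0):
    """Forward-pass binarization: latent real weights become 0/1 memberships."""
    return [[1 if w > threshold else 0 for w in row] for row in weights]

def cpg_set_features(methylation, latent_weights):
    """Mean methylation over the CpGs selected by each binarized weight row."""
    features = []
    for members in binarize(latent_weights):
        size = sum(members)
        total = sum(m * b for m, b in zip(members, methylation))
        features.append(total / size if size else 0.0)
    return features

# Four CpGs (beta values) and two latent weight rows, i.e. two CpG sets.
methylation = [0.9, 0.8, 0.1, 0.2]
latent_weights = [
    [0.7, 0.2, -0.4, -0.1],   # selects CpGs 0 and 1
    [-0.3, -0.6, 0.5, 0.9],   # selects CpGs 2 and 3
]
features = cpg_set_features(methylation, latent_weights)
print([round(f, 2) for f in features])
```

Because each feature pools several CpGs, a missing or noisy measurement at one site shifts the feature less than it would for a single-CpG feature, and the membership lists can be inspected directly.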

Why should we care?

This study bridges spatial transcriptomics and liquid biopsy by showing that multicellular spatial organization leaves a detectable epigenetic fingerprint in circulating cell‑free DNA. For computational biologists, the key technical advance is the two‑stage SNF‑based fusion of co‑registered spatial neighbourhoods, which enables discovery of conserved multicellular patterns without relying on predefined cell states or manual region annotations. The binary CpG set network offers a transparent, regularized alternative to black‑box deep learning for methylation‑based deconvolution. For clinical researchers, the work demonstrates that a simple blood draw could, in principle, profile TME spatial ecotypes and stratify immunotherapy response, though the authors are careful to note that these results are exploratory and require prospective validation.

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 27/04/26

Sebastiaan Vanuytven — Sun, 03 May 2026 10:22:11 GMT

This week’s reads highlight how much biology still sits just beyond the reach of our standard tools and how new methods are starting to uncover it. Reference-free single-cell analysis identifies novel transcriptional variance, including novel protein families even in non-model organisms, whereas innovations in spatial transcriptomics address longstanding issues such as RNA mobility and leakage. Simultaneously, novel computational models offer new insights into the study of spatial organization, ranging from the coordinated expression of genes within different cell types to subcellular localization of RNAs. From the biology perspective, one of the most interesting papers focuses on the mechanics of the heart as an effective but underestimated anti-cancer mechanism, whereas another describes a rationally designed combination therapy for diffuse midline glioma that exploits co-existing tumour states to achieve higher efficacy than monotherapies.

Preprints/articles that I managed to read this week

Reference-free discovery with barcoded single-cell sequencing

Dehghannasiri et al. Nature Biotechnology (2026). 10.1038/s41587-026-03084-6

The paper in one sentence

sc‑SPLASH is a reference‑free, statistics‑first pipeline for droplet‑based single‑cell and spatial transcriptomics that discovers regulated sequence variation (including novel secreted repeat proteins missing from reference genomes) without alignment, while its BKC module preprocesses 10x data ~50× faster than UMI‑tools.

Summary

Most scRNA‑seq analyses rely on alignment to a reference genome, which biases discovery toward known genes and fails in non‑model organisms with incomplete references. The authors adapt the SPLASH framework (a k‑mer‑based statistical test for sample‑dependent sequence diversity) to barcoded 10x data. First, they develop BKC (barcoded‑read k‑mer counter), a C++ tool that extracts trusted cell barcodes, performs UMI deduplication, and counts anchor‑target k‑mer pairs – ~50× faster than UMI‑tools. Second, they build contingency tables per anchor across cells and compute a closed‑form P‑value for anchor‑target distribution heterogeneity. In human Tabula Sapiens data, sc‑SPLASH identifies cell‑type‑specific alternative splicing (e.g., RPS24, MYL6) and, after integration with IgBLAST, detects 60,697 productive V(D)J sequences across 16 tissues. On Visium spatial data, it finds a tumor‑associated double mutation in MT‑ND4 in squamous carcinoma and distinguishes keratin paralogs KRT16/KRT17. In electric eel, it detects RPS24 exon 6 inclusion in electrocytes vs. exclusion in stroma, a pattern that is evolutionarily conserved with humans. Crucially, in the freshwater sponge Spongilla lacustris (no complete reference), sc‑SPLASH identifies a highly diverse “granny” anchor (667 targets, entropy 6.2) absent from NCBI. Follow‑up PacBio sequencing reveals a family of five secreted repeat proteins (Granrep1‑5) expressed in granulocytes and amebocytes, immune‑responsive to LPS/cGAMP, and highly polymorphic (2‑6 alleles per gene). Similarly, in tunicate Ciona robusta, sc‑SPLASH finds a YYD repeat anchor with dozens of targets, identifying two genes composed almost entirely of 24‑bp repeats, expressed in circulating hemocytes and peaking during metamorphosis. These discoveries showcase sc‑SPLASH’s power to reveal hidden transcriptomic complexity in any organism, without a reference.
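To make the anchor‑target idea concrete, here is a minimal sketch of how per‑anchor contingency tables and a target‑diversity entropy could be computed from barcoded reads. It is a toy under stated assumptions, not the BKC or sc‑SPLASH implementation: the function names, the k‑mer length `k`, the `gap` parameter and the example reads are all hypothetical, and the closed‑form P‑value from the paper is not reproduced.

```python
from collections import Counter, defaultdict
from math import log2


def anchor_target_pairs(read, k=4, gap=0):
    """Yield (anchor, target) k-mer pairs; the target starts `gap` bases after the anchor ends."""
    span = 2 * k + gap
    for i in range(len(read) - span + 1):
        yield read[i:i + k], read[i + k + gap:i + span]


def contingency_tables(reads_by_cell, k=4, gap=0):
    """Return anchor -> cell -> Counter(target -> count)."""
    tables = defaultdict(lambda: defaultdict(Counter))
    for cell, reads in reads_by_cell.items():
        for read in reads:
            for anchor, target in anchor_target_pairs(read, k, gap):
                tables[anchor][cell][target] += 1
    return tables


def target_entropy(table):
    """Shannon entropy (bits) of one anchor's target distribution, pooled over cells."""
    pooled = Counter()
    for counts in table.values():
        pooled.update(counts)
    total = sum(pooled.values())
    return -sum(c / total * log2(c / total) for c in pooled.values())


# Hypothetical reads: the same anchor is followed by a different target in each cell.
reads_by_cell = {
    "cell_A": ["ACGTTTTT", "ACGTTTTT"],
    "cell_B": ["ACGTGGGG", "ACGTGGGG"],
}
tables = contingency_tables(reads_by_cell)
acgt = tables["ACGT"]
entropy_bits = target_entropy(acgt)
print(dict(acgt["cell_A"]), dict(acgt["cell_B"]), entropy_bits)
```

An anchor whose targets split cleanly by cell, as in this toy, is the kind of sample‑dependent sequence diversity the statistical test is designed to flag.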

Personal highlights

Ultra‑fast, reference‑free preprocessing: BKC performs cell barcode filtering, UMI deduplication, and k‑mer counting in C++ with parallelization, running ~50× faster than UMI‑tools (165 s vs. 9,272 s on a 10x dataset) and using less memory than Cell Ranger or STARsolo. This makes large‑scale reference‑free analysis practical.
Discovery of novel secreted repeat protein families in non‑model organisms: in sponge, sc‑SPLASH identifies the “granny” anchor with 667 distinct targets, leading to the characterisation of five Granrep genes – entirely absent from the reference genome, encoding secreted proteins with imperfect 30‑bp repeats, a signal peptide, and a lysine‑rich region. These are expressed in granulocytes (immune cells) and upregulated by LPS/cGAMP, suggesting an immune function.
Cell‑type‑specific alternative splicing and V(D)J detection without alignment: sc‑SPLASH detects RPS24 alternative splicing (inclusion/exclusion of microexons) across human tissues and in electric eel electrocytes vs. stroma, and integrates with IgBLAST to assemble 60,697 in‑frame V(D)J sequences from plasma and B cells – all without relying on a pre‑aligned reference for the discovery step.
Spatial transcriptomics applications that aligners miss: on Visium data from squamous cell carcinoma, sc‑SPLASH identifies a MT‑ND4 double mutation (CC→TT) enriched in the carcinoma region, and distinguishes KRT16 vs. KRT17 paralog expression patterns. In human fetal intestine, it detects RPS24 exon 5 inclusion in epithelium vs. exclusion in stroma – a 3‑nt microexon that standard pipelines often overlook.
Robust to batch effects and scalable: because the statistical test conditions on observing the anchor, sc‑SPLASH is naturally robust to technical variation. Across donors, the overlap of significant anchor clusters in the same tissue is significantly higher than expected by chance (binomial test, P < 2.2×10⁻¹⁶), confirming biological reproducibility.

Why should we care?

For researchers working on non‑model organisms, organisms with poor or incomplete genome assemblies, or any system where reference bias is a concern, sc‑SPLASH offers a genuine alternative to alignment‑dependent workflows. It does not require a reference to discover regulated sequence variation – it works directly from raw reads. The discovery of the Granrep and YYD repeat protein families, completely missed by standard pipelines and absent from reference genomes, is a powerful proof‑of‑principle that sc‑SPLASH can uncover biology that would otherwise remain invisible. That said, the method is a discovery engine, not a fully automated annotator: the novel genes required substantial follow‑up with long‑read sequencing and manual assembly to characterise. Also, while sc‑SPLASH is computationally efficient, post‑processing (e.g., extender alignment, Pfam search) still benefits from a reference. The tool is best seen as an unbiased hypothesis generator for sequence variation (splicing, mutations, paralog usage, repetitive elements, novel genes) that can be applied to any barcoded single‑cell or spatial dataset, including clinical samples and environmental species.

SpaceBender: Denoising spatial transcriptomics data to enhance biological signals

Chen et al. bioRxiv (2026). 10.64898/2026.04.20.719715

The paper in one sentence

SpaceBender adapts a deep generative model (originally for single‑cell ambient RNA removal) to spatial transcriptomics by incorporating spatially local ambient RNA profiles, outperforming existing denoising methods on simulations and chimeric tissues, and revealing hidden biological structures such as light‑zone vs. dark‑zone follicular regions in human lymph node.

Summary

Spatial transcriptomics (ST) data suffer from RNA diffusion – transcripts physically move from their cell of origin to neighbouring spots, blurring biological signals. Existing denoising methods (SpotClean, SpaDiff) either do not fully exploit spatial context or are based on different noise models. SpaceBender builds on the CellBender framework, adding two key spatial adaptations: (1) leveraging automated tissue detection to define empty spots (background) as negative controls, and (2) estimating ambient RNA profiles from local spatial neighbourhoods rather than globally. In simulated ST data (with transcript positions perturbed by 100–1000% of spot radius), SpaceBender achieved lower root‑mean‑squared error and Jensen‑Shannon divergence than SpotClean and SpaDiff. On mouse‑human chimeric Visium data (where ground‑truth species mixing is known), SpaceBender gave higher adjusted mutual information and adjusted Rand index, indicating that denoised clusters better separate human and mouse spots. In a human lymph node Visium dataset, SpaceBender split a single follicle cluster into two biologically meaningful subclusters – the light zone (LZ) and dark zone (DZ) – with enriched pathway scores (proliferation, DNA repair) that were far more significant after denoising (e.g., proliferation p‑value from 4.29×10⁻⁵ to 4.11×10⁻¹⁶). In a melanoma Visium dataset with a known B2M‑loss subclone, SpaceBender improved separation of the subclone from other tumour spots (higher silhouette score) and increased the number of differentially expressed genes from 2 to 75 (FDR<0.05). Finally, SpaceBender extended to subcellular resolution (MERFISH, CosMx, Xenium), reducing off‑target marker expression (e.g., CD79B in non‑B cells) and decreasing apparent doublet counts (CD3D⁺CD79B⁺ cells) significantly (Fisher’s exact test p‑value 2.2×10⁻¹⁶). The method is open‑source and parameter‑robust.
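The second adaptation, estimating ambient RNA locally rather than globally, can be illustrated with a small sketch. This is only an assumption about what a "local ambient profile" might look like (an average of normalised expression over empty spots within a radius); it is not SpaceBender's generative model, and the function name, the `radius` parameter and the gene labels are hypothetical.

```python
from math import hypot


def local_ambient_profile(spot_xy, empty_spots, radius):
    """Average the normalised expression of non-zero empty spots within `radius` of a tissue spot."""
    nearby = [counts for (x, y), counts in empty_spots
              if sum(counts.values()) > 0
              and hypot(x - spot_xy[0], y - spot_xy[1]) <= radius]
    if not nearby:
        return None  # a caller could fall back to a global ambient profile here
    profile = {}
    for counts in nearby:
        total = sum(counts.values())
        for gene, c in counts.items():
            profile[gene] = profile.get(gene, 0.0) + c / total / len(nearby)
    return profile


# Hypothetical empty (background) spots: (x, y) position and raw counts per gene.
empty_spots = [
    ((0.0, 0.0), {"GENE_A": 8, "GENE_B": 2}),
    ((1.0, 0.0), {"GENE_A": 6, "GENE_B": 4}),
    ((10.0, 0.0), {"GENE_A": 1, "GENE_B": 9}),  # far away, different ambient mix
]
left = local_ambient_profile((0.5, 0.5), empty_spots, radius=2.0)
right = local_ambient_profile((9.5, 0.5), empty_spots, radius=2.0)
print(left, right)
```

The two tissue spots end up with very different background estimates, which is the diffusion gradient a single global profile would average away.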

Personal highlights

Spatially aware ambient RNA modeling improves denoising: unlike single‑cell methods that assume uniform background, SpaceBender computes local ambient RNA profiles per spatial neighbourhood, capturing diffusion gradients across tissue regions. This is implemented by defining empty “background” spots (using automated tissue detection) and modelling their gene expression as a spatially varying prior.
Consistently outperforms existing methods on benchmarks: on simulated data with escalating noise (100–1000% spot radius), SpaceBender achieved RMSE ≈1.88 vs. 2.47 (SpotClean) and 2.92 (SpaDiff). On mouse‑human chimeric tissues, SpaceBender gave the highest adjusted mutual information (0.11 vs. -0.04 for SpotClean, -1.27 for SpaDiff), demonstrating that denoised clusters better match true species identity.
Extends to subcellular resolution data (MERFISH, CosMx, Xenium): SpaceBender reduced off‑target expression of cell‑type markers (e.g., B‑cell marker CD79B in non‑B cells) and significantly decreased doublet‑like co‑expression of CD3D (T cells) and CD79B (B cells) in the MERFISH tonsil dataset (Fisher’s exact p‑value 2.2×10⁻¹⁶). Similar improvements were seen in CosMx NSCLC and Xenium melanoma data.
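For readers less familiar with the two simulation metrics quoted above, here is a minimal sketch of how RMSE and Jensen‑Shannon divergence can be computed between a ground‑truth and a denoised expression vector. The count vectors are made up for illustration, and the base‑2 logarithm (which bounds the divergence at 1) is my choice, not necessarily the one used in the preprint.

```python
from math import log2, sqrt


def rmse(truth, estimate):
    """Root-mean-squared error between two equally long count vectors."""
    return sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))


def normalise(counts):
    total = sum(counts)
    return [c / total for c in counts]


def jensen_shannon_divergence(counts_p, counts_q):
    """Jensen-Shannon divergence (base 2) between two count vectors."""
    p, q = normalise(counts_p), normalise(counts_q)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing to the KL sum.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical counts for three genes in one spot.
truth = [10, 0, 5]
noisy = [7, 3, 5]      # transcripts have leaked between genes' spots
denoised = [9, 1, 5]   # closer to the truth after correction

print(rmse(truth, noisy), rmse(truth, denoised))
print(jensen_shannon_divergence(truth, noisy),
      jensen_shannon_divergence(truth, denoised))
```

Lower values on both metrics mean the denoised profile sits closer to the simulated ground truth.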

Mechanical load inhibits cancer growth in mouse and human hearts

Ciucci et al. Science (2026). 10.1126/science.ads9412

The paper in one sentence

Mechanical forces from heartbeats suppress cancer cell proliferation by activating Nesprin‑2‑mediated mechanotransduction, which reduces histone H3K9 trimethylation and decompacts chromatin at growth‑regulatory loci, explaining why the heart is remarkably resistant to both primary and metastatic cancers.

Summary

The heart is rarely affected by cancer, a puzzling fact given its high blood flow and constant perfusion, which should favour metastasis. The authors hypothesised that the same mechanical forces that stop cardiomyocyte proliferation after birth might also inhibit cancer cells. Using a heterotopic heart transplantation model in mice (where a donor heart is surgically connected to neck vessels, restoring blood flow but removing left‑ventricular load), they found that lung cancer cells injected into unloaded hearts grew dramatically larger tumours than those in normally loaded hearts – not due to better initial engraftment, but due to increased proliferation. Engineered heart tissues (EHTs) with adjustable mechanical load confirmed the effect: unloading promoted cancer cell growth, while overloading suppressed it. In human cardiac metastases from three different primary tumours (lung, colon, melanoma), spatial transcriptomics revealed a common transcriptional signature in cardiac lesions, with strong up‑regulation of histone demethylases and reduced H3K9me3 and chromatin compaction compared to matched extracardiac metastases. Mechanistically, mechanical load acts through Nesprin‑2, a linker of nucleoskeleton and cytoskeleton (LINC) complex protein. Silencing Nesprin‑2 in cancer cells restored their ability to proliferate in loaded hearts and EHTs, increased H3K9me3 and chromatin compaction, and abolished the growth‑suppressive effect of mechanical load. The study links physical forces to epigenetic regulation of cancer cell proliferation, identifying a previously unrecognised tumour‑suppressive mechanism unique to the heart.

Personal highlights

The heart actively suppresses cancer growth via mechanical load: in a genetically engineered mouse model (K‑RasG12D; p53‑/‑), tumours developed in liver, skeletal muscle and other organs – but never in the heart, despite comparable oncogene activation. Heterotopic transplantation showed that unloading the heart dramatically increased the growth of injected lung cancer cells, proving that mechanical load (not just blood flow or immune surveillance) is the key protective factor.
Cardiac metastases share a conserved transcriptional signature, independent of primary tumour type: spatial transcriptomics of human cardiac metastases (from lung, colon and melanoma) revealed that cancer cells in the heart up‑regulate histone demethylases (e.g., KDM4C, KDM4D) and have reduced H3K9me3 and less compact chromatin compared to matched extracardiac lesions. This signature was not seen in primary tumours or other metastases, suggesting that the heart mechanically reprograms cancer cells.
Nesprin‑2 is the essential mechanosensor: silencing of Nesprin‑2 (but not other LINC complex proteins) in lung, colon and melanoma cells completely reversed the growth‑suppressive effect of mechanical load. Nesprin‑2‑silenced cancer cells grew as large tumours in loaded hearts, with increased H3K9me3 and chromatin compaction, demonstrating that Nesprin‑2 transmits mechanical forces into epigenetic changes that inhibit proliferation.
Chromatin accessibility and H3K9me3 are dynamically regulated by load: ATAC‑seq and ChIP‑seq on cancer cells harvested from loaded vs. unloaded hearts showed that mechanical load increases chromatin accessibility at loci involved in cell‑cycle arrest, mechanosensing and calcium homeostasis, while reducing H3K9me3 at those same regulatory regions. The effects were recapitulated in EHTs and were dependent on Nesprin‑2.
Potential therapeutic implications: although the heart’s high mechanical load is unique, the study raises the possibility that artificial mechanical stimulation (e.g., via external devices) might be explored to suppress cancer growth in other tissues. More immediately, the work explains why cardiac metastases are rare and small, and it identifies the Nesprin‑2–H3K9me3 axis as a plausible target for preventing or treating cardiac metastases.

Why should we care?

This paper elegantly solves a long‑standing medical curiosity: why does the heart, a highly vascularised, constantly perfused organ, almost never get cancer? The answer is not that cancer cells cannot reach the heart, but that the relentless mechanical beating creates a hostile environment that stops them from proliferating. The discovery of Nesprin‑2 as the key force sensor that translates physical strain into an epigenetic brake on cell division is satisfying at a basic science level and opens new questions about how other tissues with distinct mechanical properties (e.g., skeletal muscle, bone, blood vessels) might also suppress or promote cancer. However, the translational relevance is limited: you cannot easily “mechanically load” a metastatic deposit in the liver or lung without causing damage. The study also does not test whether existing heart failure patients with reduced cardiac output (lower load) have a higher incidence of cardiac metastases.

Systematic design of combination therapy by targeting master regulators of coexisting diffuse midline glioma cell states

Calvo Fernández et al. Nature Genetics (2026). 10.1038/s41588-026-02550-w

The paper in one sentence

A network-based framework that infers master regulator proteins from single‑cell RNA‑seq data identifies seven coexisting cell states in diffuse midline glioma (DMG) and predicts clinically actionable drug combinations that target complementary states, with avapritinib plus ruxolitinib more than tripling median survival in mice.

Summary

Diffuse midline glioma (DMG) is a universally fatal pediatric brain tumour driven by non‑actionable histone mutations and characterised by extensive intratumoural heterogeneity. The authors developed a generalisable, mutation‑agnostic strategy to design combination therapies targeting coexisting cell states. Using single‑cell RNA‑seq from 14 DMG patients and protein activity inference (metaVIPER), they resolved seven malignant cell states spanning oligodendrocyte precursor cell (OPC)-like, oligodendrocyte (OC)-like and astrocyte (AC)-like programmes, each with distinct master regulator (MR) proteins. Pooled CRISPR‑Cas9 screens validated that these MRs represent functional dependencies, with FOXM1 being a conserved essential gene. They then profiled 372 clinically relevant drugs (FDA‑approved or late‑stage) in two DMG cell lines using PLATE‑seq, generating transcriptional perturbation signatures that reveal DMG‑specific mechanisms of action. The OncoTarget (targeting individual MRs) and OncoTreat (inverting the activity of the top 50 MRs) algorithms then nominated drugs predicted to selectively deplete each cell state. In a subcutaneous xenograft model that recapitulates all seven human cell states, 8 out of 9 predicted drugs selectively depleted their target states (e.g., avapritinib depleted OPC states; ruxolitinib depleted AC states). In an orthotopic syngeneic model, OPC‑targeting monotherapies (avapritinib, trametinib, dinaciclib) modestly improved survival, but AC‑targeting drugs had little effect alone. However, combinations targeting complementary OPC and AC states significantly outperformed monotherapies: avapritinib + ruxolitinib extended median survival to 83 days vs. 25 days (vehicle) and 53.5 days (avapritinib alone). The synergy was not cell‑autonomous (in vitro additive) but reflected co‑depletion of distinct cell states in vivo. The framework also predicted drug efficacy in three patient biopsies, with predictions being twice as likely to be effective. The study establishes a tumour‑agnostic, mechanism‑based pipeline for rational combination therapy design in heterogeneous cancers.

Personal highlights

Seven conserved DMG cell states resolved by protein activity, not just gene expression: using metaVIPER to infer regulatory protein activity from single‑cell RNA‑seq, the authors identified seven recurrent malignant states (OPC, OPCC, OPCQ, OC, OPC/OC, AC, OPC/AC) across 14 patients, each driven by distinct master regulators. This goes beyond conventional transcriptomic clustering and captures functional regulatory programmes.
CRISPR screens validate master regulators as essential dependencies: pooled knockout screens targeting all transcription factors in three genetically distinct DMG cell lines showed that VIPER‑inferred tumourigenic MRs are significantly enriched in essential genes. FOXM1 emerged as the most conserved dependency across states, and several other MRs (e.g., DLX1, SOX10) are known H3K27me3 targets that become de‑repressed by H3K27M mutations.
Large‑scale drug perturbation profiling defines DMG‑specific mechanisms of action: PLATE‑seq transcriptomic profiling of 372 oncology compounds in two high‑fidelity DMG cell lines generated proteome‑wide activity signatures. Drugs with unrelated primary targets often converged on similar DMG‑specific MoA profiles, enabling the OncoTreat algorithm to predict which drugs would invert the activity of cell‑state‑specific tumourigenic MRs.
In vivo validation: 8/9 drugs selectively deplete predicted states, but monotherapies targeting minority states fail: in a DIPG17 subcutaneous xenograft that preserved all seven human cell states, five OPC‑targeting drugs (avapritinib, trametinib, dinaciclib, etc.) specifically depleted OPC(/OPCC) states, while three of four AC‑targeting drugs (ruxolitinib, venetoclax, larotrectinib) depleted AC(/OPC/AC) states. However, in an orthotopic pontine model, only OPC‑targeting monotherapies modestly improved survival; AC‑targeting drugs alone had no benefit – consistent with AC states being a minority population.
Combinations targeting complementary cell states dramatically extend survival where monotherapies fail: Avapritinib + ruxolitinib extended median survival to 83 days vs. 25 days (vehicle) and 53.5 days (avapritinib alone); trametinib + ruxolitinib (45.5 vs. 28 days) and dinaciclib + ruxolitinib (48 vs. 30 days) also significantly outperformed monotherapies. Bliss independence assays in OPC‑dominant cell lines showed additive, not synergistic, effects – proving that in vivo synergy arises from co‑depletion of distinct cell states, not from cell‑autonomous drug interaction.
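Since the additive‑versus‑synergistic distinction rests on Bliss independence, a short sketch of the standard formula may help: two drugs acting independently with fractional effects E_A and E_B are expected to give a combined effect of E_A + E_B − E_A·E_B. The effect values below are invented for illustration and are not taken from the paper.

```python
def bliss_expected(effect_a, effect_b):
    """Expected fractional effect (0 to 1) of two drugs that act independently."""
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a, effect_b, observed_ab):
    """Observed minus expected: > 0 suggests synergy, ~0 additivity, < 0 antagonism."""
    return observed_ab - bliss_expected(effect_a, effect_b)


# Hypothetical fractional kill for each drug alone and for the combination.
expected = bliss_expected(0.40, 0.30)
additive_excess = bliss_excess(0.40, 0.30, observed_ab=0.58)
synergy_excess = bliss_excess(0.40, 0.30, observed_ab=0.80)
print(expected, additive_excess, synergy_excess)
```

An excess near zero in a single cell line, as reported for the OPC‑dominant lines, is what one would expect if the in vivo benefit comes from hitting different cell populations rather than from the drugs potentiating each other within one cell.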

Why should we care?

This work provides a blueprint for moving beyond empirical, cell‑line‑based drug synergy testing to a mechanism‑driven, cell‑state‑resolved combination strategy. The key insight is that in a heterogeneous tumour, effective combination therapy does not require two drugs that kill the same cell better together; it requires two drugs that kill different cell populations that coexist within the same tumour. For DMG, a disease with <10% two‑year survival and no effective medical therapy, the clinically actionable combinations identified (avapritinib + ruxolitinib, trametinib + ruxolitinib, avapritinib + larotrectinib) are all FDA‑approved or late‑stage compounds, positioning them for rapid clinical translation. The framework itself is tumour‑agnostic and mutation‑agnostic: it requires only single‑cell or bulk RNA‑seq from patient tumours and a pre‑computed library of drug perturbation profiles. This could be applied to any heterogeneous cancer where coexisting cell states drive therapeutic resistance. Limitations include the reliance on in vivo models that may not fully capture human tumour microenvironment complexity, and the fact that the survival benefits, while impressive, are still modest (83 days median – a ~3‑fold extension but not a cure).

CoPro: Dissecting the coordinated progression of cell states in spatial transcriptomics

Miao et al. bioRxiv (2026). 10.64898/2026.04.17.719309

The paper in one sentence

CoPro is a computational framework that uses spatial kernel‑restricted canonical correlation analysis to detect multiple, overlapping, continuous gene expression gradients that progress in a coordinated manner across different cell types in spatial transcriptomics data.

Summary

Spatial transcriptomics allows us to see where genes are expressed, but most analysis methods discretise tissues into distinct “neighbourhoods” or recover only a single dominant gradient. CoPro takes a different approach: it models tissue organisation as a superposition of continuous axes of coordinated variation across cell types. The core idea is a spatial kernel‑restricted CCA (skrCCA): for two or more cell types, CoPro finds linear combinations of genes (cell‑type‑specific “progression scores”) that maximise their correlation after weighting by spatial proximity (a Gaussian kernel). This captures how the molecular states of neighbouring cells change together along a shared spatial axis. CoPro can operate in unsupervised mode (discovering axes de novo) or supervised mode (using a known trajectory in one cell type to find coupled programs in others). It can also transfer learned gene weights to new samples, enabling cross‑sample comparison without spatial registration. Through simulations and four real datasets (colon injury, brain striatum, aging liver, kidney), the authors show that CoPro resolves orthogonal gradients (e.g., crypt morphology vs. inflammation in injured colon; dorsal‑ventral vs. medial‑lateral in brain striatum), recovers known zonation in liver and kidney from histology‑imputed data, and quantifies the breakdown of tissue organisation during aging.
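The quantity at the heart of skrCCA, a correlation between two cell types' progression scores weighted by spatial proximity, can be sketched as follows. This only evaluates such an objective for given scores; it does not solve for the gene weights, and the normalisation, the function names and the toy coordinates are my assumptions rather than CoPro's actual formulation.

```python
from math import exp, sqrt


def gaussian_kernel(coords_a, coords_b, sigma):
    """K[i][j] is large when cell i (type A) and cell j (type B) are close in space."""
    return [[exp(-((xa - xb) ** 2 + (ya - yb) ** 2) / (2 * sigma ** 2))
             for xb, yb in coords_b]
            for xa, ya in coords_a]


def centre(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def spatial_coupling(scores_a, scores_b, kernel):
    """Kernel-weighted cross-product of centred scores, scaled by the score norms."""
    a, b = centre(scores_a), centre(scores_b)
    cross = sum(kernel[i][j] * a[i] * b[j]
                for i in range(len(a)) for j in range(len(b)))
    return cross / (sqrt(sum(v * v for v in a)) * sqrt(sum(v * v for v in b)))


# Two hypothetical cell types interleaved along the x axis.
coords_a = [(float(x), 0.0) for x in range(4)]
coords_b = [(float(x), 0.5) for x in range(4)]
K = gaussian_kernel(coords_a, coords_b, sigma=1.0)

coupled = spatial_coupling([0, 1, 2, 3], [0, 1, 2, 3], K)  # scores rise together
opposed = spatial_coupling([0, 1, 2, 3], [3, 2, 1, 0], K)  # scores run in opposite directions
print(coupled, opposed)
```

In the method itself, the scores would be linear combinations of genes, and the optimisation searches for the gene weights that make this kind of coupling as large as possible.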

Personal highlights

Spatial kernel‑restricted CCA captures cross‑type coordination at single‑cell resolution: unlike methods that bin cells into grids or discrete neighbourhoods, CoPro operates directly on pairwise spatial distances via a Gaussian kernel. This preserves rare cell types and avoids arbitrary discretisation, while the kernel bandwidth is automatically selected from the data.
Decomposes multiple overlapping spatial gradients: many tissues contain several biological processes superimposed in the same space (e.g., a differentiation gradient plus a patchy inflammatory response). CoPro iteratively finds orthogonal axes of coordinated progression, separating these processes into distinct, interpretable components – a capability lacking in most existing spatial methods.
Supervised mode for hypothesis‑driven discovery: given a known or inferred spatial trend in one cell type (e.g., tubular epithelial ordering along the corticomedullary axis), CoPro identifies gene programs in other cell types (e.g., vascular endothelium) that co‑vary with it. This makes the framework useful for targeted biological questions.
Cross‑sample axis transfer without spatial registration: by fixing gene weights learned from a reference sample, CoPro projects new samples onto the same biological axis (e.g., a “disease progression” score). This enables direct comparison of cell states across samples or conditions without aligning tissue morphology.

Resolving sensitivity, specificity and signal contamination in Xenium spatial transcriptomics

Bilous et al. Nature Methods (2026). 10.1038/s41592-026-03089-8

The paper in one sentence

Analysis of 41 breast and lung tumour sections reveals that Xenium spatial transcriptomics data suffer from substantial transcript spillover between neighbouring cells, and the authors introduce SPLIT, a reference‑based computational method that decomposes mixed signals to improve cell‑type purity and reveal biologically relevant signatures such as T‑cell exhaustion.

Summary

This study provides one of the largest Xenium datasets to date (41 sections from 27 donors, both breast and lung cancer) and systematically evaluates key performance characteristics: sensitivity (transcript detection), specificity (spillover contamination), panel design (targeted vs. 5K Prime), and segmentation strategies. The authors show that targeted panels (e.g., Lung panel) have higher per‑gene sensitivity than the broader 5K panel, despite detecting fewer total genes. They demonstrate that transcript spillover – where transcripts from one cell are incorrectly assigned to a neighbour – is pervasive, particularly affecting low‑RNA cells like T cells, and correlates strongly with local abundance of the contaminating cell type (e.g., malignant cells). Using RCTD doublet mode, they quantify contamination as a secondary cell‑type weight. They then introduce SPLIT (Spatial Purification of Layered Intracellular Transcripts), which uses the RCTD weights and reference profiles to decompose each cell’s expression into primary and secondary components, effectively removing contaminating signal. SPLIT outperforms other correction methods (ResolVI, ovrlpy) in preserving gene detection, improving cell‑type separation, and recovering biological signals – notably, after SPLIT correction, T cells near malignant cells show clear exhaustion signatures (HAVCR2, CTLA4, PDCD1, LAG3, CXCL13) that were obscured by spillover. SPLIT is deconvolution‑agnostic (works with any reference‑based method) and can be combined with alternative segmentation algorithms (e.g., ProSeg) for further gains.
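As a rough illustration of reference‑based decomposition, the sketch below splits each gene's counts between a primary and a secondary cell type in proportion to what each weighted reference profile would contribute. This is my own simplification for intuition, not the published SPLIT formula; the function name, the weights and the reference values are hypothetical.

```python
def purify(counts, ref_primary, ref_secondary, w_primary, w_secondary):
    """Keep, per gene, the share of counts attributable to the primary cell type."""
    purified = {}
    for gene, n in counts.items():
        p = w_primary * ref_primary.get(gene, 0.0)
        s = w_secondary * ref_secondary.get(gene, 0.0)
        # If neither reference explains the gene, leave its counts untouched.
        fraction = p / (p + s) if (p + s) > 0 else 1.0
        purified[gene] = n * fraction
    return purified


# Hypothetical T cell sitting next to malignant cells.
counts = {"CD3D": 10, "EPCAM": 5, "ACTB": 8}
ref_t_cell = {"CD3D": 0.9, "EPCAM": 0.0, "ACTB": 0.5}
ref_malignant = {"CD3D": 0.0, "EPCAM": 0.8, "ACTB": 0.5}

purified = purify(counts, ref_t_cell, ref_malignant,
                  w_primary=0.75, w_secondary=0.25)
print(purified)
```

The T‑cell marker is kept, the epithelial marker is removed, and a shared housekeeping gene is scaled by the primary weight, which is the behaviour one would want from a spillover correction that avoids over‑correcting.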

Personal highlights

Transcript spillover is widespread and quantifiable: using RCTD’s doublet mode, the authors show that a cell’s secondary contamination weight correlates strongly with the local abundance of the contaminating cell type (e.g., malignant cells). This spillover disproportionately affects low‑RNA‑content cells (e.g., T cells) and can lead to misannotation (e.g., a CD8+ T cell called as malignant).
Targeted panels outperform the 5K panel in per‑gene sensitivity: while the 5K panel detects more total transcripts, targeted panels show higher sensitivity per gene, better cell‑type separation, and fewer QC failures. About 60% of 5K cells fail QC due to low transcript counts, a crucial trade‑off for users choosing panels.
SPLIT improves signal purity without over‑correction: unlike methods that reduce total gene counts or distort expression, SPLIT uses a simple, interpretable scaling factor based on reference profiles. It retains more cells and genes while significantly reducing contamination, as measured by cosine similarity to matched snRNA‑seq reference profiles and by removal of malignant marker genes from T cells.

Why should we care?

For researchers using imaging‑based spatial transcriptomics, this work provides essential guidance: transcript spillover is real, it affects downstream biological conclusions (e.g., cell‑cell communication, exhaustion), and it can be corrected. The comparison of targeted vs. 5K panels gives practical advice: if sensitivity for specific genes matters, targeted panels are better; if you need broad discovery, accept lower per‑gene sensitivity. SPLIT is a practical, open‑source tool that integrates with existing annotation pipelines (RCTD) and works with any segmentation. It does not require raw transcript coordinates or complicated spatial models, making it easy to adopt. However, SPLIT depends on a good reference dataset; missing cell types can lead to artefacts. Also, the validation is limited to two cancer types, and the IHC validation only worked on five samples.

SubCellSpace: Automated characterization of subcellular mRNA localization patterns in spatial transcriptomics

Wouters et al. bioRxiv (2026). 10.64898/2026.04.28.720613

The paper in one sentence

SubCellSpace is a convolutional variational autoencoder that learns a general, interpretable latent space of subcellular mRNA localization patterns from imaging‑based spatial transcriptomics data, enabling automated detection of non‑randomly localised transcripts, pattern classification, and unsupervised exploration of colocalisation and cellular heterogeneity.

Summary

Until recently, studying subcellular RNA localisation at scale was impossible. Imaging‑based spatial transcriptomics (MERFISH, Xenium) now provides single‑molecule resolution, but computational tools for automated pattern discovery are lacking. Many existing methods rely on hand‑crafted features (distance to nucleus), assume unrealistically high transcript counts, or cannot handle the heterogeneity of real data. SubCellSpace takes a different approach: it converts each cell‑gene observation into a 100×100 pixel image (Gaussian‑blurred transcript positions plus a nuclear mask) and trains a convolutional variational autoencoder (CVAE) on a large simulated dataset of nine pattern types (random, intranuclear, extranuclear, perinuclear, cell‑edge, pericellular, nuclear‑edge, protrusion, foci). The encoder compresses each image into a 15‑dimensional latent space that separates patterns by type while being robust to cell shape and orientation. A classifier trained on this latent space assigns a pattern‑probability score to each observation. To determine whether a gene is significantly localised across a cell population, SubCellSpace compares the distribution of these scores (over all cells expressing that gene) to a null distribution generated by shuffling transcript positions within each cell, using a Kolmogorov‑Smirnov test and Earth Mover’s Distance as an effect size. The method was validated on a novel Xenium dataset of HEK293T cells targeting 220 genes with known subcellular compartment assignments from APEX‑seq (precision 0.99, recall 0.30 at stringent threshold). Applied to mouse small‑intestine MERFISH data, SubCellSpace correctly identified 13 of 19 known apical‑basal polarised genes (F1 0.79) and, remarkably, used the latent space to infer the orientation (left/right) of enterocytes from the localisation pattern of Apob alone.
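The observed‑versus‑shuffled comparison can be sketched with a toy simulation. Everything here is a stand‑in: cells are unit discs, the per‑cell score is simply the fraction of transcripts inside a hypothetical nucleus (the real method uses classifier probabilities from the latent space), and the null redraws positions uniformly within the cell as an approximation of shuffling. The two statistics are hand‑rolled so the example needs only the standard library.

```python
import random


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    points = sorted(set(xs) | set(ys))

    def ecdf(sample, t):
        return sum(v <= t for v in sample) / len(sample)

    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in points)


def earth_movers_distance(xs, ys):
    """1-D Earth Mover's Distance for equal-sized samples."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def sample_in_disc(rng, radius):
    """Uniform point inside a disc of the given radius (rejection sampling)."""
    while True:
        x, y = rng.uniform(-radius, radius), rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            return x, y


def nuclear_fraction(points, nucleus_radius):
    inside = sum(x * x + y * y <= nucleus_radius ** 2 for x, y in points)
    return inside / len(points)


rng = random.Random(0)
observed, null = [], []
for _ in range(50):  # 50 hypothetical cells, 30 transcripts each
    # Localised gene: about 80% of transcripts are placed inside the nucleus.
    pts = [sample_in_disc(rng, 0.4 if rng.random() < 0.8 else 1.0)
           for _ in range(30)]
    observed.append(nuclear_fraction(pts, 0.4))
    # Null: the same number of transcripts redrawn uniformly within the cell.
    shuffled = [sample_in_disc(rng, 1.0) for _ in pts]
    null.append(nuclear_fraction(shuffled, 0.4))

ks = ks_statistic(observed, null)
emd = earth_movers_distance(observed, null)
print(ks, emd)
```

For a strongly localised gene the two score distributions barely overlap, so both the test statistic and the effect size are large; the effect‑size threshold is then what separates "statistically detectable" from "worth reporting".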

Personal highlights

Learns a general, interpretable latent space from simulated patterns: The CVAE is trained on 9 pattern classes simulated across 317 cell shapes, with Gaussian blur to handle sparse transcripts (10–100 spots per cell). The resulting 15‑dimensional embedding separates pattern types (silhouette 0.263), is robust to cell identity and rotation, and generalises to unseen patterns (e.g., protrusion maps to a distinct cluster) and real data without retraining.
Automated pipeline for pattern detection and quantification: SubCellSpace includes an end‑to‑end processing pipeline that (re)segments cells, generates per‑cell‑gene images, and computes embeddings. A random forest classifier then produces a pattern‑probability per observation. The per‑gene test distribution is compared to a spatially shuffled null using a Kolmogorov‑Smirnov test, with Earth Mover’s Distance as an effect size (thresholds 0.03 lenient, 0.06 stringent). This controls false discovery rate (precision 0.99 at stringent threshold).
Validated on novel APEX‑seq‑guided Xenium dataset: The authors generated a bespoke Xenium dataset of HEK293T cells targeting 220 genes (including 170 with known compartment assignments from APEX‑seq and 50 controls). SubCellSpace achieved F1 0.46 (stringent) with precision 0.99, correctly separating nucleus‑associated from cytosolic/ER‑membrane patterns. This is the first publicly available benchmarking resource for subcellular localisation in imaging‑based ST.
Unsupervised exploration reveals colocalisation and cellular orientation: Beyond supervised classification, the latent space enables unsupervised tasks. Genes with similar patterns (e.g., colocalising transcripts) cluster together, and the embedding captures subtle variations such as the left‑right orientation of polarised enterocytes. Using only the apical marker Apob, SubCellSpace could infer the apical/basal direction of other genes, recovering the known polarity of 83% of polarised genes.
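To make the per-gene test more concrete, here is a minimal sketch of comparing observed pattern-probability scores against a shuffled null with a two-sample Kolmogorov-Smirnov statistic and a 1-D Earth Mover's Distance. Everything below is illustrative: the scores are simulated, the function names are mine, the p-value computation is left out, and how the paper combines the KS test with the EMD thresholds into a final call is my assumption. In practice SciPy's `ks_2samp` and `wasserstein_distance` are the tested implementations to reach for.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def earth_movers_distance(a, b):
    """1-D Earth Mover's Distance: area between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    points = sorted(set(a_sorted + b_sorted))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_a = bisect_right(a_sorted, left) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, left) / len(b_sorted)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


rng = random.Random(0)
# Hypothetical pattern-probability scores for one gene across 200 cells,
# and for the same cells after shuffling transcript positions.
observed = [min(1.0, max(0.0, rng.gauss(0.60, 0.10))) for _ in range(200)]
shuffled_null = [min(1.0, max(0.0, rng.gauss(0.30, 0.10))) for _ in range(200)]

ks = ks_statistic(observed, shuffled_null)
emd = earth_movers_distance(observed, shuffled_null)
STRINGENT_EMD = 0.06  # effect-size threshold quoted in the highlights above
print(f"KS={ks:.2f}, EMD={emd:.2f}, passes stringent={emd >= STRINGENT_EMD}")
```

The EMD is useful as an effect size because, with enough cells, a KS test will flag even tiny shifts as significant, while the EMD still reports how far the score distribution actually moved.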

Why should we care?

Subcellular mRNA localisation is a critical but understudied layer of gene regulation, and systematic discovery has been limited by the lack of scalable, automated methods. SubCellSpace provides an interpretable approach that turns the output of MERFISH/Xenium-based spatial transcriptomics into a statistical classification of localisation patterns. Its main strength is that it learns on simulated data and generalises to experimental data without further training. The newly created APEX-seq-guided Xenium dataset will serve as a useful benchmark for future method development. The approach has drawbacks: it is not effective at classifying “foci” localisation patterns because of the Gaussian blur, the latent space is not disentangled (orientation and localisation pattern type share the same dimensions), and eight to ten cells per gene are required for reliable detection of localisation patterns. Despite these limitations, SubCellSpace is an excellent starting point for genome-wide investigation of mRNA localisation.

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 20/4/26

Sebastiaan Vanuytven — Sun, 26 Apr 2026 10:00:31 GMT

This week’s reads highlight how deeply cell fate, microenvironments, and genetic programs are intertwined, from early tumor initiation to immune escape and even synthetic DNA design. Lineage tracing now reveals that the ability of breast cancer cells to metastasize is pre-programmed into certain clones, while early-stage lung cancers create their own fibrotic, immunosuppressive microenvironment through an elegantly simple AREG-EGFR pathway. Meanwhile, on the temporal and spatial analysis front, a study sheds light on how rapidly tumor cells can inhibit immune activity: CD8+ T cells become functionally exhausted within 24 hours of infiltrating prostate tumors. Technological advances in this realm are ever-evolving as well: SpaMosaic resolves a longstanding issue in integrating multiple spatial omics data types, and a thorough benchmarking study finally provides clarity in detecting copy number variations from single-cell RNA sequencing data. Outside of cancer research, a groundbreaking study provides experimental proof that autoimmunity can develop from the collective activity of hundreds of immune cells, each harboring different mutations. Lastly, generative AI moves from the transcriptome to the genome, with DNA-CRAFT synthesizing regulatory elements unique to each cell type.

Preprints/articles that I managed to read this week

Mapping breast cancer lineage in radiation and immunotherapy using the REMAP mouse

Marshall et al. bioRxiv (2026). 10.64898/2026.04.15.718689

The paper in one sentence

REMAP, an inducible CRISPR-based lineage tracing mouse model of spontaneous HR+ breast cancer, reveals that metastatic potential is encoded in heritable clonal EMT programs, while cancer-associated fibroblasts remain plastic, and that radiation plus anti-PD1 reduces tumor burden and clonally expands T cells.

Summary

The authors developed REMAP (Recording Evolution in Mammary tumors via Active PyMT) by combining MARC1 homing CRISPR barcodes, inducible Cas9, and the MMTV-PyMT spontaneous breast cancer model. Female mice received doxycycline from embryonic stages to induce lineage recording across all cell types. Tumors developed spontaneously, and mice were either untreated or treated with 5 Gy radiation to one tumor followed by three doses of anti-PD1. Treatment reduced cumulative tumor burden, including non-irradiated tumors (abscopal effect). Single-cell RNA-seq of 189,468 cells from biopsies and endpoints revealed TME remodeling. Using shared hgRNA edits between primary tumors and metastatic sites (lung/lymph nodes), they identified 10 metastatic clones. These clones showed elevated EMT signatures as a heritable clonal state, while non-metastatic clones had higher proliferation and OXPHOS. In contrast, individual CAF clones spanned multiple transcriptional subtypes, indicating stromal plasticity. Radiation reduced T cell clonal diversity (consistent with clonal expansion), while anti-PD1 alone had less effect. Limitations include small sample size (n=3 per group), variable editing efficiency, low recovery of barcodes in scRNA-seq, and inability to fully separate radiation from immunotherapy effects.

Personal highlights

Inducible lineage tracing in spontaneous breast cancer: REMAP uses doxycycline-inducible Cas9 to edit MARC1 homing guide RNAs during development, enabling clonal tracking across all cell types, including tumor, immune, and stromal compartments, in an immunocompetent, autochthonous HR+ breast cancer model.
Metastatic potential is a heritable clonal state: clones whose barcodes were detected in both primary tumors and metastatic sites (lung/lymph nodes) showed elevated EMT signatures at the clonal level compared to non-metastatic clones. This suggests that metastatic capacity is an intrinsic, lineage-linked program rather than solely induced by the metastatic niche.
Stromal plasticity contrasts with tumor lineage restriction: individual CAF clones were found across multiple CAF subtypes (vascular, matrix, etc.), indicating that CAF identity is highly plastic and not strictly determined by clonal origin. This differs markedly from tumor cells, where clonal states were more stable.
Radiation drives T cell clonal expansion and reduces diversity: analysis of TCR sequences revealed lower clonal diversity in irradiated tumors, with expansion of specific T cell clones. Anti-PD1 alone did not strongly affect diversity, highlighting radiation’s role in priming adaptive immune responses in this HR+ model.
Abscopal effect of combined therapy: treatment with radiation (to a single tumor) plus anti-PD1 reduced not only the irradiated tumor but also non-irradiated distant tumors, suggesting systemic immune activation, an important observation for HR+ breast cancer, which is typically considered “immune cold.”

Why should we care?

This study addresses a clinically relevant question: how to make hormone receptor-positive breast cancer, typically less responsive to immunotherapy, more susceptible to immune checkpoint blockade. The REMAP mouse provides a technically sophisticated tool for linking clonal history to cell state in a spontaneous, immune-intact model. The finding that metastatic clones carry a heritable EMT signature supports the idea that some clones are "born to metastasize," while CAF plasticity suggests that targeting the stroma might be more challenging due to state flexibility. The abscopal effect and T cell clonal expansion after radiation are encouraging for clinical strategies combining radiotherapy with immunotherapy. However, the study has substantial limitations: only three mice per group, low and variable lineage barcode recovery (especially from scRNA-seq), and an experimental design that cannot separate radiation effects from anti-PD1 effects.

Benchmarking scRNA-seq copy number inference: A comprehensive evaluation and practitioner’s guide

Chang et al. bioRxiv (2026). https://www.biorxiv.org/content/10.64898/2026.04.12.718050v1.full

The paper in one sentence

A systematic benchmark of 12 CNV inference methods across 28 real and synthetic datasets (>100,000 cells) finds that allele-aware Numbat excels when SNP coverage is high (≥5×), while CopyKAT offers the best all-around performance when only processed expression matrices are available.

Summary

Copy number variation (CNV) inference from single-cell RNA-seq data is critical for identifying malignant cells and reconstructing tumor subclonal architecture, but the performance of existing tools varies widely and previous benchmarks have been limited in scope. The authors evaluated 12 methods (8 expression-centric, 4 allele-aware) across 28 datasets spanning primary tumors, metastases, cell lines, and patient-derived organoids, including 20 with matched scDNA-seq ground truth. They assessed four dimensions: malignant cell classification accuracy, CNV inference at cell/subclone/bulk levels, scalability (runtime/memory), and robustness to sequencing depth, intronic read inclusion, reference panel choice, and CNV-state cutoffs. Key findings: Numbat (allele-aware) achieves the highest accuracy when SNP-level depth is ≥5× but degrades significantly below that threshold. Among expression-centric tools, CopyKAT, Clonalscope, inferCNV, and SCEVAN performed reliably, with CopyKAT offering the best balance of accuracy, interpretability, and computational efficiency. SCEVAN excelled at deletion detection, while Numbat and Clonalscope were more sensitive to amplifications. Scalability varied dramatically: Numbat and Clonalscope scaled near-linearly, while SCEVAN scaled quadratically; Chloris, inferCNV, and CaSpER failed on 20k-cell datasets. The authors provide a tree-based decision framework to guide tool selection based on data availability and analytical goals.

Personal highlights

Largest benchmark to date: the study covers 12 methods, 28 real datasets (>100,000 cells), and extensive synthetic simulations, substantially larger than previous benchmarks (which typically evaluated 5-6 methods on fewer datasets). This scale enables more confident comparisons.
Allele-aware methods are powerful but condition-dependent: Numbat consistently outperformed expression-centric tools when SNP coverage was high (≥5×), with normalized F1 scores significantly higher. However, its performance collapsed below that threshold, making CopyKAT the safer default when raw BAM files are unavailable or coverage is low.
Tool-specific strengths matter: SCEVAN was uniquely sensitive for detecting deletions, while Numbat and Clonalscope excelled at amplifications. For subclonal reconstruction, CopyKAT, sciCNV, Clonalscope, and Numbat performed best at the cell level, but subclone-level performance was universally lower, suggesting that clustering CNV profiles into subclones remains challenging.
Scalability is a practical differentiator: For large-scale atlas projects (>10,000 cells), Numbat and Clonalscope scaled efficiently (runtime slopes ~1.0-1.3), while SCEVAN scaled quadratically (slope 2.04). Chloris, inferCNV, and CaSpER failed to complete on 20k-cell datasets within 24 hours, making them impractical for large cohorts.
Practical robustness insights: Including intronic reads (Cell Ranger v7+) had negligible effects on most tools but significantly degraded Clonalscope performance (median F1 drop 0.16). CopyKAT and Clonalscope were robust to CNV-state cutoff choices, while inferCNV was sensitive. Reference panel choice (patient-matched vs. external) did not substantially affect top methods, good news for datasets lacking matched normals.
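The runtime "slopes" quoted above read to me as scaling exponents, i.e. the slope of a straight-line fit of log(runtime) against log(number of cells); that reading is my assumption rather than something I verified in the methods. Under it, a slope near 1 means runtime grows linearly with cell number, and a slope of 2.04 means doubling the cells multiplies runtime by about four. A minimal sketch with made-up runtimes:

```python
import math


def loglog_slope(n_cells, runtimes):
    """Least-squares slope of log(runtime) against log(n_cells)."""
    xs = [math.log(n) for n in n_cells]
    ys = [math.log(t) for t in runtimes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical runtimes (minutes), not values from the benchmark.
sizes = [1_000, 2_000, 5_000, 10_000, 20_000]
linear_tool = [0.002 * n for n in sizes]
quadratic_tool = [1e-6 * n ** 2 for n in sizes]

print(round(loglog_slope(sizes, linear_tool), 2))     # 1.0
print(round(loglog_slope(sizes, quadratic_tool), 2))  # 2.0
```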

Why should we care?

For cancer researchers analyzing scRNA-seq data, the choice of CNV inference tool is often arbitrary, yet it profoundly affects conclusions about malignant cell identity, subclonal architecture, and tumor evolution. This benchmark provides evidence-based, actionable guidance rather than anecdotal recommendations. The key takeaway: if you have raw BAM files and good SNP coverage, use Numbat; if you only have a gene expression matrix (the common scenario due to data sharing restrictions), CopyKAT is the safest bet. The paper also highlights underappreciated practical issues: intronic reads hurting Clonalscope, SCEVAN's quadratic scaling, and the difficulty of subclonal reconstruction even with good CNV calls. Limitations include the reliance on scDNA ground truth from different cells (not the same exact cells), the focus on droplet-based platforms, and the fact that no simulation perfectly captures real tumor complexity.

Polyclonal selection of immune checkpoint mutations in thyroid autoimmunity

Nicola et al. Nature (2026). https://www.nature.com/articles/s41586-026-10493-9

The paper in one sentence

Using ultra-accurate duplex sequencing (NanoSeq), the authors discover that patients with autoimmune thyroid disease harbor hundreds of independent B cell clones with convergent inactivating mutations in immune checkpoint genes TNFRSF14 (HVEM) and CD274 (PD-L1), providing the first direct evidence for a long-standing hypothesis that somatic mutations enable self-reactive lymphocytes to escape tolerance.

Summary

For over 70 years, immunologists have speculated that somatic mutations might allow self-reactive lymphocytes to bypass tolerance checkpoints and cause autoimmune disease, but technical limitations prevented direct testing. The authors applied whole-exome and targeted NanoSeq, a single-molecule duplex sequencing method, to thyroid biopsies from 14 patients with Hashimoto thyroiditis or Graves disease. They found exceptionally strong positive selection for truncating mutations in TNFRSF14 (dN/dS = 141) and CD274 (dN/dS = 37), along with mutations in other immune-regulatory genes (TNFAIP3, TET2, CCR6, CBL, etc.). Remarkably, each patient harbored tens to hundreds of independent mutant B cell clones, with most clones representing <1% of cells (median VAF ~0.2%). Cumulatively, more than 50% of B cells in some patients carried TNFRSF14 mutations. Laser capture microdissection, spatial transcriptomics, and single-nucleus DNA sequencing (112 nuclei from donor H1) confirmed that mutations occur in B cells, revealed widespread biallelic loss of TNFRSF14 via copy-neutral LOH, and identified clones with 4-6 driver mutations. Control samples (healthy blood, spleen, lymph nodes, tonsillitis, goiter) showed no selection on TNFRSF14 or CD274. Recombinant antibodies from mutant B cell clones bound thyroid peroxidase (TPO) or thyroglobulin (TG), confirming self-reactivity. The authors propose a “polyclonal cascade” model where somatic mutations in tolerance checkpoints allow many independent self-reactive B cell clones to escape suppression and collectively drive autoimmunity.

Personal highlights

Convergent immune checkpoint mutations at unprecedented scale: across 14 AITD patients, TNFRSF14 and CD274 showed the strongest positive selection signals ever reported in non-cancer human tissues, with dN/dS ratios of 141 and 37 respectively. Each patient harbored dozens to hundreds of independent truncating mutations, including start codon losses and cysteine-disrupting missense mutations affecting receptor-ligand binding.
Polyclonal landscape invisible to conventional sequencing: most mutant clones were extremely small (median VAF ~0.2%, maximum ~2.4%), explaining why standard DNA sequencing missed them. Cumulatively, however, mutant B cell fractions reached >50% in some patients (e.g., 62% of B cells in donor H1 carried TNFRSF14 mutations), representing a substantial burden of somatic evolution.
Biallelic loss and multi-hit clones: single-nucleus sequencing revealed widespread copy-neutral loss of heterozygosity (CN-LOH) at TNFRSF14, with 30 of 41 TNFRSF14-mutant B cells having lost the second allele. Multiple cells carried 4-6 driver mutations (e.g., TNFRSF14 + CD274 + TET2 + BRAF + PIK3CD), and a phylogenetic clade showed stepwise acquisition of TET2 mutation before V(D)J recombination, followed by independent TNFRSF14 and CD274 losses, a striking example of multistep somatic evolution in autoimmunity.
Highly specific to autoimmune thyroid disease: analysis of six control datasets (including sorted memory B cells from healthy donors, spleen, lymph nodes, tonsillitis, and goiter) showed no significant selection on TNFRSF14 or CD274 (q > 0.1). This indicates the phenomenon is not a general feature of B cell aging but is specifically associated with autoimmune pathology.
Self-reactivity of mutant clones confirmed: recombinant antibodies generated from reconstructed BCR sequences of TNFRSF14/CD274-mutant B cells bound thyroid peroxidase (TPO) or thyroglobulin (TG) by ELISA, and several antibodies stained normal thyroid tissue by immunohistochemistry. This directly links driver-mutant clones to autoantigen recognition.

Why should we care?

This study provides the first direct molecular evidence for a hypothesis first proposed by Burnet in the 1950s: that somatic mutations in lymphocytes can break immune tolerance and cause autoimmune disease. The discovery of hundreds of independent B cell clones with inactivating mutations in immune checkpoints (HVEM, PD-L1) in autoimmune thyroid tissue, but not in controls, is striking. It suggests a "polyclonal cascade" model of autoimmunity that is fundamentally different from cancer's monoclonal origin: many cooperating clones, each with its own driver mutations, may collectively mount an autoimmune attack. This could explain several clinical mysteries: why autoantibodies appear years before symptoms (clones accumulating slowly), why deep B cell depletion therapies work (removing the mutant clone pool), and why autoimmune diseases often worsen over time (progressive acquisition of additional drivers). However, the study has important limitations: causality is not proven (mutations could be a consequence rather than a cause of inflammation), the findings are limited to thyroid tissue (though the authors suggest they may extend to other autoimmune diseases), and the sample sizes are modest.

Early fibrotic niches establish tumour-permissive microenvironments

Cardoso et al. Nature (2026). 10.1038/s41586-026-10399-6

The paper in one sentence

Oncogenic KRAS-mutant lung AT2 cells reprogram into Areg-secreting DATP-like states that activate adjacent fibroblasts via EGFR, triggering a sequential fibrotic and immune niche that sustains tumour growth, a conserved and targetable axis in early lung adenocarcinoma.

Summary

Using a KrasG12D mouse model (Red2Kras) with lineage tracing, the authors mapped the earliest events in lung tumour initiation. Within weeks of oncogene activation, mutant AT2 cells transition into a regenerative “damage-associated transient progenitor” (DATP)-like state that secretes high levels of amphiregulin (Areg). Areg activates EGFR on adjacent alveolar fibroblasts, driving their reprogramming into fibrotic fibroblasts (Pdgfrβ⁺Runx1⁺Tnc⁺). These fibrotic fibroblasts then remodel the extracellular matrix and, via tenascin-C (Tnc) signalling through TLR4 on resident alveolar macrophages, induce macrophage expansion and an immunosuppressive phenotype (reduced MHC-II, increased Msr1). This coordinated niche sustains mutant epithelial plasticity (Sox9⁺ DATP states) and promotes tumour growth. Depleting fibroblasts, inhibiting EGFR (gefitinib), or deleting Areg in AT2 cells reverses niche remodelling and reduces tumour burden. The circuit is conserved in early-stage human KRAS-mutant lung adenocarcinoma, where AREG⁺ KRT8⁺SOX9⁺ cells associate with fibrotic ACTA2⁺RUNX1⁺ fibroblasts. An inducible human KRASG12D AT2 organoid system recapitulates the fibrotic response, blocked by gefitinib. The study identifies a reversible, targetable vulnerability before tumours become treatment-resistant.

Personal highlights

Areg-EGFR axis as the master switch for early niche formation: mutant AT2 cells upregulate Areg within days of oncogenic activation, and genetic or pharmacological blockade of this axis prevents fibrotic fibroblast reprogramming, macrophage remodelling, and tumour growth.
Sequential stromal-immune crosstalk: fibrotic fibroblasts (Pdgfrβ⁺Runx1⁺Tnc⁺) appear first, directly contacting nascent tumours. They then remodel resident alveolar macrophages via a Tnc-TLR4 axis, inducing an Msr1⁺, MHC-II⁻ immunosuppressive phenotype. Macrophage depletion (clodronate) impairs tumour growth and recruitment of neutrophils/γδ T cells, demonstrating their functional role.
Reversible and targetable early plasticity: KrasG12D inhibition (MRTX1133) or EGFR blockade (gefitinib) reduces DATP-like and CD177⁺ reprogrammed epithelial states, restores AT2:AT1 balance, and reverses fibrotic and immune remodelling, showing that early tumour-niche circuits are not fixed but depend on continuous signalling.
Conserved in human early-stage LUAD: Single-cell analysis of KRAS-mutant human lung adenocarcinomas reveals KRT8⁺SOX9⁺ AREG⁺ transitional cells and fibrotic fibroblasts (ACTA2⁺RUNX1⁺CTHRC1⁺) spatially associated. An inducible human KRASG12D AT2 organoid system recapitulates EGFR-dependent fibroblast fibrotic reprogramming, validating translational relevance.
Inflammatory fibroblasts emerge later, not at tumour onset: Unlike injury repair where inflammatory fibroblasts precede fibrotic ones, in oncogenesis Lcn2⁺ inflammatory fibroblasts appear only after macrophage remodelling (4 weeks), localise to tumour peripheries, and lack Tnc expression, suggesting a distinct spatiotemporal hierarchy with fibrotic cues dominating the tumour core.

Why should we care?

This study shifts the focus from late-stage, treatment-resistant tumours to the earliest, reversible steps of lung cancer formation. It shows that before a tumour is even detectable, mutant cells co-opt regenerative programmes to build their own supportive niche: a fibrotic, immunosuppressed microenvironment that is entirely dependent on a simple signalling axis (Areg-EGFR). The key finding is that this niche is not an inevitable consequence of mutation but a dynamic, targetable circuit. Blocking EGFR or deleting Areg reverses the niche and reduces tumour burden, suggesting a potential preventive or interception strategy for high-risk individuals.

Spatiotemporal profiling reveals the role of inflammatory niche in driving prostate cancer

Nazir et al. bioRxiv (2026). 10.64898/2026.04.19.719485

The paper in one sentence

Using spatial transcriptomics and time-resolved single-cell sequencing (Zman-seq), this study shows that CD8 T cells become dysfunctional within 24 hours of infiltrating prostate tumors due to an inflammatory niche of Ccl2⁺Jak2⁺ cancer-associated fibroblasts and pro-inflammatory macrophages, and that combining PD-1 blockade with JAK1/2 inhibition remodels this niche and reduces tumor growth.

Summary

Prostate cancer is notoriously resistant to immune checkpoint blockade (ICB), but the mechanisms remain poorly understood. The authors developed a novel immunocompetent mouse model by orthotopically transplanting Rb1⁻/⁻ Trp53⁻/⁻ Pten⁻/⁻ (TKO) organoids derived from NINJA mice, which allows inducible neoantigen expression (GP33/GP66) to recruit CD8⁺ and CD4⁺ T cells. Tumors were castration-resistant, metastatic, and maintained high AR expression. Using high-resolution spatial transcriptomics (Xenium, 5,100 genes) on wild-type, 6-week, and 12-week tumors, as well as lung metastases, they mapped the progressive remodeling of the tumor microenvironment (TME). Normal Pi16⁺ fibroblasts were replaced by inflammatory Ccl2⁺Jak2⁺ cancer-associated fibroblasts (CAFs), and anti-inflammatory macrophages (Mrc1⁺Cd163⁺) were replaced by pro-inflammatory (Spp1⁺Fn1⁺) macrophages and neutrophils. Dysfunctional CD8 T cells accumulated at tumor borders and cores. To determine the kinetics of immune cell reprogramming, they applied Zman-seq, labeling blood CD45⁺ cells at -36h, -24h, and -12h before sacrifice. This revealed that CD8 T cells acquire a dysfunctional signature as early as 24 hours after tumor infiltration. Ligand-receptor analysis pointed to interactions between inflammatory CAFs, pro-inflammatory macrophages, and CD8 T cells via Cxcl16-Cxcr6, PD-L1-PD-1, and CD80-CTLA4. Based on the prominent JAK-STAT signaling in malignant cells, CAFs, and macrophages, they tested anti-PD1 plus the JAK1/2 inhibitor ruxolitinib. The combination significantly reduced tumor growth and weight, while monotherapies had modest or no effect. Spatial transcriptomics showed that combination therapy reduced malignant epithelial cells and pro-inflammatory macrophages, increased normal (Pi16⁺) fibroblasts, and increased the density of dysfunctional CD8 T cells in the tumor core and memory CD8 T cells at the tumor interface—suggesting a ‘normalized’ niche.

Personal highlights

A new immunocompetent prostate cancer model with neoantigens: by combining TKO organoids with the NINJA system (inducible GP33/GP66 neoantigens), the authors created a model that overcomes the poor immunogenicity of traditional prostate cancer models, enabling study of tumor-specific T cell responses. The tumors are castration-resistant, metastatic, and maintain AR expression.
Temporal mapping of immune cell dysfunction via Zman-seq: this is the first in vivo time-resolved analysis of immune cell reprogramming in prostate cancer. CD8 T cells showed increased dysfunction scores and decreased naive/memory scores within 24 hours of tumor entry, with upregulation of Pdcd1, Havcr2, Ifng, and Tnf. The rapid onset highlights how quickly the TME suppresses anti-tumor immunity.
Inflammatory CAFs and macrophages form a tolerogenic niche: spatial transcriptomics revealed that Ccl2⁺Jak2⁺ inflammatory CAFs and Spp1⁺ pro-inflammatory macrophages dominate the TME by 6 weeks, replacing normal fibroblasts and anti-inflammatory macrophages. These cells are positioned near dysfunctional CD8 T cells and engage in immunosuppressive ligand-receptor interactions (PD-L1-PD-1, CD80-CTLA4, Cxcl16-Cxcr6).
JAK inhibition synergizes with PD-1 blockade: While anti-PD1 alone was ineffective (as expected for prostate cancer), combination with ruxolitinib significantly reduced tumor volume and weight. The combination reshaped the TME: reduced malignant cells and pro-inflammatory macrophages, increased normal (Pi16⁺) fibroblasts, and altered CD8 T cell localization—with memory T cells enriched at the tumor interface and terminally dysfunctional T cells in the core.
Cross-species conservation of CAF subtypes and T cell states: The mouse model’s fibroblast heterogeneity (inflammatory CAFs, Pi16⁺, Col23a1⁺) and CD8 T cell states (dysfunctional, effector, memory) matched those observed in scRNA-seq data from 25 human prostate cancer patients, supporting the model’s translational relevance.

Why should we care?

This study addresses a significant clinical problem: why prostate cancer is "cold" and resistant to immunotherapy. By revealing that CD8 T cells become dysfunctional within 24 hours, driven by an inflammatory fibroblast-macrophage niche, it provides a clear mechanistic rationale for combination therapy. The finding that JAK inhibition (ruxolitinib, already approved for myelofibrosis) can sensitize prostate tumors to PD-1 blockade is promising, though the results are still preclinical and in a model with engineered neoantigens. The use of Zman-seq to time-resolve immune cell reprogramming is technically impressive and could be applied to other tumor types. However, the study has limitations: small numbers (n=3 per group for many spatial analyses), the model uses neoantigens that may not reflect the full complexity of human prostate cancer heterogeneity, and the JAKi combination effects on tumor weight, while significant, are modest. The increase in terminally dysfunctional CD8 T cells in the core (rather than effector cells) is also somewhat counterintuitive, though the authors interpret this within the "proliferative burst" model.

SpaMosaic: Mosaic Integration of Spatial Multi-Omics Data

Yan, X. et al. Nature Genetics (2026). https://doi.org/10.1038/s41588-026-02573-3

The paper in one sentence

The authors present SpaMosaic, a method that integrates spatially resolved multi-omics data from multiple tissue sections with different modality compositions (e.g., RNA-only, protein-only, or paired sections) using contrastive learning and graph neural networks to create a unified, spatially aware latent space.

Summary

Spatial omics technologies (e.g., spatial transcriptomics, epigenomics, proteomics) are powerful but often measure only one or a few modalities per section, and different sections may use different technologies or cover different tissue regions. SpaMosaic solves the “mosaic integration” problem: combining multiple tissue sections with different modality compositions (e.g., one section has RNA+ATAC, another has RNA+protein, a third has only ATAC) into a unified, spatially aware latent space. The method constructs spot‑spot graphs that capture both spatial proximity and cross‑section feature similarity, then uses weighted light graph convolutional networks (WLGCNs) and contrastive learning to align embeddings from different modalities while preserving spatial structure. SpaMosaic outperformed existing mosaic integration methods (Cobolt, scMoMaT, StabMap, MIDAS, CLUE) on simulated and real datasets spanning mouse brain development, human lymph node/tonsil, and mouse embryo. Key capabilities include: (1) recovering spatial domains that are obscured in single‑modality analyses, (2) imputing missing modalities (e.g., histone marks from RNA+ATAC data) with correlations matching biological expectations, and (3) integrating drastically different platforms (e.g., Misar‑seq vs. Stereo‑seq with 30‑fold resolution differences). SpaMosaic is scalable (>800,000 spots per section) and open‑source.

Personal highlights

First dedicated method for spatial mosaic integration: unlike existing tools that assume all modalities are measured in all samples or that ignore spatial context, SpaMosaic explicitly handles partially overlapping modalities across sections while leveraging spatial coordinates to improve alignment and domain detection.
Contrastive learning + graph neural networks: SpaMosaic uses positive pairs (different modalities from the same spot) and negative pairs (modalities from different spots) to learn a modality‑agnostic latent space, while WLGCNs encode spatial proximity and cross‑section mutual nearest neighbors. This combination outperformed ablation variants lacking spatial information.
Accurate imputation of missing modalities: in a postnatal mouse brain dataset with RNA, ATAC, and three histone modifications (H3K27me3, H3K4me3, H3K27ac) measured across different sections, SpaMosaic imputed missing epigenomic profiles. The imputed H3K4me3 and H3K27ac gene activity scores correlated better with RNA expression than measured ATAC did and recovered known oligodendrocyte differentiation genes (Sox2, Sox10, Olig1/2) in the corpus callosum.
Cross‑platform, cross‑stage integration that works: SpaMosaic integrated six embryonic mouse brain sections from two technologies (Misar‑seq at 50 µm resolution, Stereo‑seq at 25 µm) across developmental stages E12.5–E18.5. It resolved fine anatomical structures (DPall, subpallium, hypothalamus, midbrain, hindbrain) that competing methods (CLUE, Cobolt, MIDAS, StabMap, scMoMaT, MultiVI, UINMF) failed to recover consistently.
Scalable and versatile: SpaMosaic processed a single section with >800,000 spots and integrated over 100 sections. It worked on diverse data types (RNA, ATAC, histone modifications, protein) and handled extreme resolution imbalances (e.g., Visium HD with 99,000 spots vs. standard Visium with 2,700 spots), outperforming alternatives in spatial continuity (CHAOS/PAS) while maintaining reasonable batch mixing (iLISI).
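For readers less familiar with contrastive learning, here is a minimal sketch of the general idea behind the positive/negative pairs mentioned above: embeddings of the same spot in two modalities should score higher with each other than with embeddings of other spots. This is a generic one-directional InfoNCE loss written in plain Python with toy 2-D embeddings. It is not SpaMosaic's actual objective, it leaves out the graph convolution step entirely, and the temperature value and all names are my own choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(rna_emb, atac_emb, temperature=0.1):
    """Mean InfoNCE loss; row i of each list is the same spot in two modalities."""
    losses = []
    for i, anchor in enumerate(rna_emb):
        logits = [cosine(anchor, other) / temperature for other in atac_emb]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        # Positive pair is the same spot (index i); all other spots are negatives.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings for three spots (hypothetical values).
rna = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
atac_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
atac_scrambled = [atac_aligned[1], atac_aligned[2], atac_aligned[0]]

print(info_nce(rna, atac_aligned) < info_nce(rna, atac_scrambled))  # True
```

The loss is close to zero when each spot's two modalities already agree, and large when the pairing is scrambled, which is the signal that pulls matching modalities together during training.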

Why should we care?

Spatial omics is producing a flood of heterogeneous data: different labs use different technologies (Visium, Stereo‑seq, DBiT‑seq, Mux‑seq), measure different molecular layers (RNA, chromatin accessibility, histone marks, proteins), and profile different tissue sections. Researchers who want to build a comprehensive spatial atlas face the "mosaic" problem: how to combine these patchy, non‑overlapping datasets without losing spatial biology. SpaMosaic provides a practical, well‑benchmarked solution that not only aligns the data but also imputes missing measurements, enabling cross‑modal analyses that were previously impossible. For example, by integrating RNA and ATAC with histone modification data measured on separate sections, the authors discovered that imputed H3K4me3 and H3K27ac signals were more informative than measured ATAC for identifying oligodendrocyte genes. Limitations: performance depends on the quality of cross‑section mutual nearest neighbor matching, and the current implementation requires user specification of contrastive learning hyperparameters.

Conditional Monte Carlo tree diffusion for designing cell-type-specific and biologically faithful regulatory DNA

Awasthi et al. arXiv. https://doi.org/10.48550/arXiv.2604.20488

The paper in one sentence

DNA‑CRAFT combines a discrete diffusion model trained on 3.2 million natural regulatory DNA elements with a Monte Carlo tree search guided by a specificity reward to generate synthetic enhancers and promoters that achieve high predicted activity in desired cell types while suppressing activity in undesired ones, all while preserving the natural “regulatory grammar” of the genome.

Summary

Designing synthetic DNA elements that drive gene expression only in target cells (e.g., specific neurons for Parkinson’s therapy) while remaining silent elsewhere is a central challenge in gene therapy and synthetic biology. The authors introduce DNA‑CRAFT, a two‑stage framework. First, they train a masked diffusion language model (DiMamba backbone) on the ENCODE registry of ~3.2 million human and mouse cis‑regulatory elements, conditioned on five classes (enhancer, promoter, CTCF, poised, open chromatin) using classifier‑free guidance. This generative model learns to produce sequences that statistically resemble natural regulatory DNA (3‑mer correlation 0.97, motif correlation 0.97). Second, they adapt Monte Carlo tree guidance (MCTG) with a MinGap reward that explicitly maximises the difference between mean predicted activity in desired cell types and maximum activity in undesired cell types. This tree search biases the diffusion sampling toward sequences with high specificity. Benchmarked on designing enhancers for three human cell lines (HepG2, K562, SK‑N‑SH) and for T‑cell‑specific chromatin accessibility across eight immune cell types, DNA‑CRAFT consistently outperforms existing methods (Ledidi, CG, SMC, TDS, DRAKES, Ctrl‑DNA). It achieves positive MinGap scores (others often negative) while maintaining high biological fidelity (motif correlation >0.88, 3‑mer correlation >0.96). Ablations confirm that both class conditioning (enhancer > promoter) and the specificity reward are essential.
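
The MinGap reward is simple enough to write down. Below is a minimal sketch, assuming the predicted activities are available as a dictionary per candidate sequence; the function name and the activity values are hypothetical and only illustrate the definition given above (mean over desired cell types minus maximum over undesired cell types).

```python
from statistics import mean

def min_gap(predicted_activity, desired, undesired):
    """Mean predicted activity in desired cell types minus the max in undesired ones."""
    on_target = mean(predicted_activity[c] for c in desired)
    off_target = max(predicted_activity[c] for c in undesired)
    return on_target - off_target

# Hypothetical predicted activities for two candidate K562 enhancers.
specific = {"K562": 6.0, "HepG2": 1.5, "SK-N-SH": 2.0}
promiscuous = {"K562": 6.0, "HepG2": 7.0, "SK-N-SH": 1.0}

print(min_gap(specific, ["K562"], ["HepG2", "SK-N-SH"]))     # 4.0
print(min_gap(promiscuous, ["K562"], ["HepG2", "SK-N-SH"]))  # -1.0
```

Both candidates have the same on-target activity, yet only the first gets a positive score. Taking the maximum over undesired cell types means a single leaky cell type is enough to sink a design, which is what makes this a specificity reward rather than an activity reward.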

Personal highlights

Generative model learns natural regulatory grammar without supervision: the discrete diffusion model (DiMamba) trained on ENCODE cCREs captures 3‑mer distributions (Pearson r=0.97) and global transcription factor motif frequencies (Spearman r=0.97) from natural DNA. Conditional sampling with classifier‑free guidance produces distinct motif signatures for enhancers, promoters, and CTCF sites that align with biological knowledge.
Monte Carlo tree guidance with a specificity reward: unlike prior inference‑time alignment methods that optimise a single scalar reward (e.g., activity in one cell type), DNA‑CRAFT uses a MinGap score – mean activity in desired cells minus max activity in undesired cells – and integrates it into a tree search that explores branching denoising steps. This explicitly encourages differential activity, not just absolute activity.
State‑of‑the‑art on cell‑line‑specific enhancer design: for HepG2, K562, and SK‑N‑SH, DNA‑CRAFT achieves the highest MinGap scores (e.g., 9.07 for K562 vs. 7.66 for DRAKES and 0.20 for Ctrl‑DNA). It maintains motif correlations (0.93, 0.92, 0.88) and 3‑mer correlations (>0.96), while gradient‑based (Ledidi) and RL‑based (Ctrl‑DNA) methods sacrifice biological fidelity for modest specificity gains.
First to achieve positive T‑cell specificity across eight immune cell types: when tasked with designing sequences active in CD8⁺ and CD4⁺ T cells but silent in B cells, NK cells, macrophages, mast cells, and naive T cells, DNA‑CRAFT is the only method with a positive MinGap score (0.123). Inference‑time alignment methods (SMC, CG, TDS) all produced negative scores, indicating they inadvertently activated undesired cell types.
Inference‑time guidance avoids costly retraining: the base diffusion model is trained once on ENCODE. New design objectives (different cell types, different specificity criteria) only require running MCTG without model fine‑tuning – a practical advantage over methods like DRAKES (diffusion fine‑tuning) or Ctrl‑DNA (RL fine‑tuning) that need per‑task retraining.

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 13/04/26

Sebastiaan Vanuytven — Sun, 19 Apr 2026 08:46:14 GMT

This week’s reads revolve around a common theme: how cellular identity, history, and context shape outcomes, from clonal evolution in cancer to therapeutic response and even organelle-level repair. New technologies push the boundaries of what can be measured, with GIFT providing high-throughput mutation profiling from fixed single-cell transcriptomes, and STAPLE automating spatial analysis workflows with an AI layer to interpret findings. Meanwhile, several reports question our assumptions about resistance and plasticity. Cancer cells can be primed for resistance to multiple treatments through rare “generalist” states, and large-scale perturbation screens show that diverse genetic routes converge on the same dedifferentiated, AP-1/TEAD-driven phenotype. In colorectal cancer, metastatic ability develops much earlier than previously believed, driven by oncofetal programs induced by the microenvironment rather than by mutations. Bridging research and clinical practice, the DRUP trial is a reminder of the potential of genomics-guided off-label treatment: the overall benefit is modest, but a small group of patients shows remarkable, life-changing responses. Lastly, cell-type-targeted mitochondrial transplantation emerges as a novel therapeutic approach.

Preprints/articles that I managed to read this week

Scalable genotyping in fixed transcriptomes resolves clonal heterogeneity via single-cell sequencing

Blattman et al. bioRxiv (2026). 10.64898/2026.04.11.717967

The paper in one sentence

GIFT (Genotyping in Fixed Transcriptomes) enables simultaneous detection of hundreds of somatic mutations and whole-transcriptome profiles from the same single cell by using a gap-filling polymerase reaction between adjacent probes, working in fixed and even FFPE tissues.

Summary

The authors address a major limitation of existing single-cell genotyping methods: most can only detect 1-3 mutations per cell, are biased toward transcript ends, and cannot handle formalin-fixed paraffin-embedded (FFPE) samples. GIFT integrates into the 10x Genomics Flex workflow by adding a gap-filling step: two probes hybridize flanking a mutation site, a strand displacement-deficient polymerase (Sulfolobus DNA polymerase IV) fills the gap and converts the 5'OH end to a ligation-competent state, enabling capture of the native sequence. The method achieves >99% genotyping accuracy in cell line mixes and scales to 611 targets per cell—a ~100-fold increase over RT-based approaches like GoT. Key technical optimizations include betaine to improve yield, a decrosslinking step (80°C) to rescue FFPE samples (10-fold increase in gene expression), and a probabilistic framework to distinguish true mutations from PCR template switching and sequencing errors. Applied to 712,664 CD34+ cells from 35 myeloproliferative neoplasm (MPN) patients, GIFT reveals JAK2 V617F dosage-dependent interferon responses in monocytes and resolves clonal phylogenies in a patient progressing to AML, including discovery of a low-frequency NRAS mutation that later expanded. The method also works on spatial transcriptomics (Visium HD).

Personal highlights

Unprecedented multiplexing for RNA-based genotyping: GIFT captures up to 611 targeted loci per cell, compared to 1-3 for GoT-like methods, enabling clonal lineage tracing with dozens of mutations per patient rather than a handful.
FFPE compatibility through decrosslinking: A high-temperature decrosslinking step (80°C) increases gene expression recovery ~10-fold in archival FFPE glioblastoma tissue, making previously inaccessible clinical samples amenable to single-cell mutation-transcriptome linkage.
Higher accuracy than dual-probe alternatives: in head-to-head comparison, GIFT achieved >99% correct calls for 90% of variants versus 90.5% for dual-probe genotyping (which uses separate wildtype and mutant probes), due to direct reading of native sequence rather than allele-specific hybridization.
Genotype-aware modeling reveals JAK2 dosage effects: Using MrVI to separate mutation effects from patient variation, GIFT shows that JAK2 V617F drives interferon-γ response in monocytes and HSCs, with a gradient from wildtype to heterozygous to homozygous, observable only with sufficient per-cell genotyping depth.
Discovery of subclonal variants missed by bulk sequencing: in an MPN-AML progression patient, GIFT identified a low-frequency NRAS mutation not detected in whole-blood bulk sequencing, which later expanded at transformation, demonstrating the power of single-cell clonal tracing.
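
To give a feel for what probabilistic genotyping from per-cell read counts involves, here is a toy sketch. It is not the GIFT pipeline: the binomial model, the single error rate, and all names are my own assumptions, and it ignores template switching, allelic dropout, and expression bias, which the real framework has to handle.

```python
from math import comb, log

def genotype_log_likelihoods(alt_umis, total_umis, error_rate=0.01):
    """Binomial log-likelihood of the observed mutant UMI count under three genotypes."""
    # Assumed expected mutant-allele fraction per genotype (hypothetical values).
    expected_alt_fraction = {
        "wildtype": error_rate,          # mutant UMIs arise only from errors
        "heterozygous": 0.5,
        "homozygous": 1.0 - error_rate,  # wildtype UMIs arise only from errors
    }
    ref_umis = total_umis - alt_umis
    return {
        genotype: log(comb(total_umis, alt_umis)) + alt_umis * log(p) + ref_umis * log(1.0 - p)
        for genotype, p in expected_alt_fraction.items()
    }

def call_genotype(alt_umis, total_umis, error_rate=0.01):
    """Return the genotype with the highest likelihood."""
    likelihoods = genotype_log_likelihoods(alt_umis, total_umis, error_rate)
    return max(likelihoods, key=likelihoods.get)

print(call_genotype(0, 10))   # wildtype
print(call_genotype(5, 10))   # heterozygous
print(call_genotype(10, 10))  # homozygous
```

Even this toy version shows why per-cell depth matters for the JAK2 dosage analysis: with only one or two UMIs covering the site, the three likelihoods are hard to separate, so heterozygous and homozygous cells cannot be told apart reliably.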

Why should we care?

GIFT removes a long-standing barrier: asking which mutations a cell carries while measuring its full transcriptome, especially from FFPE biopsies. Prior methods forced trade-offs between scale (1-3 mutations), sample type (fresh only), or accuracy. GIFT shifts that trade-off dramatically (hundreds of mutations from FFPE at >99% accuracy), though with the caveat that detection still depends on target gene expression and gap length (optimal ≤5 bp). The MPN cohort analysis demonstrates real-world utility: resolving why JAK2-mutant cells behave differently across patients, tracking subclones before clinical progression, and linking copy-number alterations to transcriptional programs. For method developers, the gap-filling chemistry (using a strand displacement-deficient polymerase) is a clever enzymatic workaround that could be adapted to other platforms. That said, the method requires custom probe design per mutation, the probabilistic genotyping pipeline is non-trivial, and sensitivity for low-expression transcripts remains limited.

STAPLE: automating spatial transcriptomics analysis and AI interpretation

Lvovs et al. bioRxiv (2026). 10.64898/2026.03.30.715127

The paper in one sentence

STAPLE is a modular, Nextflow-based pipeline that automates end-to-end spatial transcriptomics analysis, from cell typing to ligand-receptor inference, and adds an AI reporting layer that uses large language models to summarize findings and suggest biological interpretations.

Summary

Spatial transcriptomics analysis typically requires stitching together multiple tools (e.g., RCTD for cell typing, Squidpy for spatial statistics, CellChat for ligand-receptor inference) with manual data wrangling, making analyses slow, hard to reproduce, and difficult to scale across samples. STAPLE systematizes this process into a single Nextflow workflow with five phases: data ingest, preprocessing, cell type annotation (supporting reference-based RCTD or reference-free BayesTME/CoGAPS), spatial and ligand-receptor analysis (Squidpy, SpaceMarkers), and reporting. Key design choices include using AnnData as a unified data format, propagating sample metadata throughout, and aggregating results into a MultiQC report. The novel addition is an AI-enabled reporting layer: the MultiQC report can be fed directly to an LLM (e.g., GPT via Copilot) with a simple prompt, and the model will identify cross-sample patterns, link ligand-receptor pairs to literature, and even suggest candidate therapeutic perturbations. The authors demonstrate STAPLE on a pancreatic ductal adenocarcinoma (PDAC) Visium HD dataset and a 38-sample nucleus accumbens (NAC) dataset. The pipeline is containerized, nf-core compliant, and available on GitHub.

Personal highlights

End-to-end automation of spatial transcriptomics: STAPLE replaces fragmented, manual analysis workflows with a single Nextflow command that handles cell typing, spatial statistics, ligand-receptor inference, and cross-sample aggregation, significantly improving reproducibility and scalability.
Modular and tool-agnostic design: users can swap cell typing methods (RCTD, BayesTME, CoGAPS or bring their own) and ligand-receptor tools (Squidpy, SpaceMarkers) via configuration parameters, without rewriting pipeline code.
Seamless cross-sample comparison with metadata propagation: Any additional columns in the sample sheet (e.g., treatment response, diagnosis) are carried through the pipeline and automatically used to contrast ligand-receptor interactions or spatial autocorrelation across groups.
AI-powered interpretation layer: the MultiQC report is structured with clear table headers and descriptions, allowing an LLM (tested with GPT via Microsoft Copilot) to directly query results, link findings to literature, and propose follow-up experiments, though this relies on the quality of the prompt and the LLM’s domain knowledge.

Pre-existing cell states predict resistance to multiple treatments

Schaff et al. Cell Genomics (2026). 10.1016/j.xgen.2026.101191

The paper in one sentence

Using multi-treatment clonal tracing and single-cell RNA sequencing, this study shows that rare melanoma clones can develop resistance to multiple diverse treatments through pre-existing gene expression states, with high CD44 expression marking cells broadly resistant to BRAF inhibition, MEK inhibition, and hypoxic stress.

Summary

The authors address a clinically relevant but understudied question: can the same cancer cells become resistant to multiple different treatments, or does each drug select for distinct subpopulations? They barcoded melanoma cells and exposed them in parallel to six treatments with distinct mechanisms: targeted inhibitors (dabrafenib, trametinib), metabolic stressors (CoCl₂ mimicking hypoxia, acidic pH), and chemotherapeutics (cisplatin, doxorubicin). By tracking clonal abundance before and after treatment, they found that while some clones were “specialists” (resistant to one drug), rare “generalist” clones were overrepresented among the top 10% resistant clones across all six conditions. Using scRNA-seq before treatment, they identified CD44 as a marker of multi-treatment resistance: CD44-high cells sorted from naive populations showed significantly greater resistance to dabrafenib, trametinib, and CoCl₂. Mechanistically, CD44-high cells had elevated lysosomal activity, suggesting a potential drug sequestration mechanism. Finally, by mapping clonal transcriptional states from untreated cells to their resistant progeny, they show that different initial states (e.g., differentiated vs. mesenchymal) lead to divergent resistance programs even within the same treatment condition.

Personal highlights

Multi-treatment clonal tracing at scale: the authors simultaneously tracked resistance to six different treatments from the same barcoded pool of ~200,000 clones, revealing that clonal resistance is heritable (high concordance between replicates) and that ~20-40% of top resistant clones are treatment-specific, while rare clones (~11) appear in the top 10% for all six conditions.
CD44 as a pre-existing marker of multi-treatment resistance: unbiased comparison of gene expression signatures predictive of resistance across treatments identified CD44 (along with FN1) as associated with resistance to dabrafenib, trametinib, and CoCl₂. FACS-sorted CD44-high cells indeed showed significantly greater resistance to these three treatments, while CD44-low cells were more resistant to doxorubicin.
Lysosomal activity as a potential mechanism: CD44-high cells exhibited elevated lysosomal activity even before treatment, which was further enhanced upon drug exposure, suggesting enhanced drug sequestration and degradation as a possible basis for multi-treatment resistance, though CD44 itself may be a marker rather than a driver (pharmacological inhibition did not consistently sensitize cells).
Divergent resistance trajectories from distinct initial states: Using consensus NMF, the authors identified four universal gene programs in untreated cells. Differentiated clones (melanocytic markers) and mesenchymal clones (RTK-high, CD44-high) followed distinct paths to dabrafenib resistance, with mesenchymal clones expanding 33-fold during treatment compared to only 1.6-fold for differentiated clones.
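
The specialist versus generalist distinction boils down to set operations on the top-ranked clones per treatment. The sketch below assumes a per-clone fold change (abundance after versus before treatment) is already computed; the clone names, values, and function names are invented for illustration and are not the authors' analysis code.

```python
from math import ceil

def top_resistant(fold_change, fraction=0.10):
    """Return the clones in the top `fraction` by fold change for one treatment."""
    n_top = max(1, ceil(len(fold_change) * fraction))
    ranked = sorted(fold_change, key=fold_change.get, reverse=True)
    return set(ranked[:n_top])

def classify_clones(fold_by_treatment, fraction=0.10):
    """Generalists are top-ranked under every treatment; specialists under exactly one."""
    top_sets = {t: top_resistant(fc, fraction) for t, fc in fold_by_treatment.items()}
    generalists = set.intersection(*top_sets.values())
    specialists = {}
    for treatment, top in top_sets.items():
        others = set().union(*(s for t, s in top_sets.items() if t != treatment))
        specialists[treatment] = top - others
    return generalists, specialists

# Hypothetical fold changes for ten barcoded clones under three treatments.
clones = [f"c{i}" for i in range(10)]
baseline = {c: 1.0 for c in clones}
fold_by_treatment = {
    "dabrafenib": {**baseline, "c0": 30.0, "c1": 20.0},
    "trametinib": {**baseline, "c0": 25.0, "c2": 15.0},
    "CoCl2": {**baseline, "c0": 10.0, "c3": 12.0},
}

# With ten toy clones, the top 20% corresponds to two clones per treatment.
generalists, specialists = classify_clones(fold_by_treatment, fraction=0.2)
print(generalists)  # {'c0'}
print(specialists)  # {'dabrafenib': {'c1'}, 'trametinib': {'c2'}, 'CoCl2': {'c3'}}
```

With real data, whether the generalist set is larger than expected would still need a comparison against chance overlap, for example by shuffling clone labels within each treatment.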

Why should we care?

In the clinic, patients rarely receive just one drug; they get sequential or combination therapies. Yet most resistance studies focus on single agents, implicitly assuming that resistance to different drugs arises from independent subclones. This paper challenges that assumption by showing that a small subset of cells can be "generalists", pre-programmed to survive multiple unrelated stresses. The identification of CD44 as a surface marker for multi-treatment resistance is practically valuable: it suggests that CD44-high cells could be prospectively isolated and studied, or potentially targeted. That said, the study has important limitations: it uses a single melanoma cell line in vitro, so the relevance to patient tumors, with their complex microenvironments, immune interactions, and genetic heterogeneity, remains unclear. CD44 inhibition did not consistently sensitize cells, indicating it is a marker rather than a driver. And the "generalist" clones are rare, so their clinical impact may be limited unless they expand under selective pressure.

Mapping convergent regulators of melanoma drug resistance by PerturbFate

Xu et al. Nature (2026). 10.1038/s41586-026-10367-0

The paper in one sentence

PerturbFate, a scalable single-cell multi-omic CRISPR screening platform, reveals that diverse genetic perturbations conferring drug resistance in melanoma converge on a shared dedifferentiated cell state driven by cooperative AP-1 and TEAD transcription factors, with MED12 uniquely suppressing SOX10 activity and VEGFc acting as a common downstream effector.

Summary

The authors developed PerturbFate, a combinatorial-indexing-based single-cell platform that simultaneously profiles chromatin accessibility, nascent RNA, steady-state RNA, and sgRNA identities from the same cell at <$0.01 per cell. They applied it to BRAF(V600E) melanoma cells (A375) targeting >140 genes previously linked to vemurafenib resistance. By integrating multimodal data, they reconstructed cell-state trajectories and identified a shared dedifferentiated state associated with drug resistance, marked by convergent activation of AP-1 (e.g., FOSL1, JUN) and TEAD family transcription factors. Perturbations of Mediator complex components revealed module-specific mechanisms: kinase module (MED12, MED13, CCNC) activated YAP signaling and inflammatory programs, while tail module (MED15, MED24) helped maintain YAP-dependent transcription under drug pressure. MED12 knockdown uniquely impaired SOX10 activity, driving resistance across conditions. Across diverse resistant perturbations, KLF5, SMAD3, RREB1 and FOSL1 emerged as broadly activated regulons; combinatorial targeting of these TFs reduced the growth advantage of resistant clones. VEGFc was identified as a convergent downstream effector co-regulated by YAP and the Mediator complex. Validation in patient tumors and TCGA data supported the clinical relevance of these regulons.

Personal highlights

Multimodal single-cell CRISPR screening at scale: PerturbFate combines combinatorial indexing with metabolic labeling of nascent RNA, enabling joint profiling of chromatin accessibility, nascent/steady-state transcriptomes, and sgRNAs in >300,000 cells across 119 perturbations. The platform costs <$0.01 per cell, making large-scale screens feasible.
Convergent dedifferentiated state across diverse perturbations: despite targeting functionally disparate genes (chromatin regulators, signaling adaptors, ubiquitination factors), 46 of 52 perturbations that altered cell-state positioning shifted cells toward a common undifferentiated state, driven by cooperative AP-1 and TEAD transcription factor activities.
Module-specific Mediator complex functions in resistance: perturbations of the Mediator complex revealed that kinase module components (MED12, MED13, CCNC) suppress dedifferentiation via YAP signaling, whereas tail module components (MED15, MED24) maintain resistance under drug pressure without strongly altering basal chromatin states. MED12 knockdown uniquely impaired SOX10 activity, mimicking SOX10 loss-of-function.
Convergent regulatory hubs identify combinatorial vulnerabilities: across resistant perturbations, KLF5, SMAD3, RREB1 and FOSL1 regulons were most broadly activated. Combined inhibition of these TFs (via genetic or pharmacological means) reduced the growth advantage of resistant clones by a mean 3.1-fold, suggesting that co-targeting convergent nodes may overcome multi-perturbation resistance.
VEGFc as a shared downstream effector: VEGFc was consistently activated in dedifferentiated states across perturbations and conditions, co-regulated by YAP and core Mediator, and induced upon SOX10 knockdown. VEGFc knockdown suppressed resistance conferred by MED12, MED19, MED15 and MED24 perturbations, identifying it as a potential therapeutic target.

Why should we care?

Drug resistance in cancer has mostly been studied by interrogating one gene or pathway at a time. What sets this paper apart is the systematic perturbation of more than 100 resistance genes and the subsequent examination of whether these genes share common transcriptional programs that underlie the emergence of resistance. Indeed, it appears that many roads can lead to the same resistance state. This observation is both promising, because it points to novel therapeutic targets (such as combined inhibition of the convergent regulators KLF5, SMAD3, RREB1 and FOSL1), and concerning, because tumors have multiple genetic strategies available to acquire resistance. From a technical standpoint, the PerturbFate framework is a significant advance in itself. It makes multi-omic CRISPR screens feasible at scale, and measuring nascent RNA adds time-resolved information for understanding regulatory dynamics. However, the study was conducted exclusively in vitro using only one melanoma cell line (A375).

Emergence of oncofetal plasticity is ubiquitous in early colorectal cancers

Buissant des Amorie et al. Nature (2026). 10.1038/s41586-026-10344-7

The paper in one sentence

Metastasis-associated oncofetal cell states appear in virtually all early-stage colorectal cancers at the moment of invasive front formation, driven not by new genetic mutations but by nearby submucosal fibroblasts that transition into cancer-associated fibroblasts and signal through TGFβ and prostaglandins.

Summary

This study challenges the conventional view that metastatic competence arises late in colorectal cancer (CRC) progression. Using spatial transcriptomics (GeoMx, CosMx), single-cell RNA-seq, and multiregional organoid biobanking from 16 early-stage (T1) CRCs, the authors show that oncofetal cell states, marked by LAMC2 and the High Relapse Cell (HRC) signature, are already present at the invasive front of most T1 tumors, regardless of whether they eventually metastasize. Whole-genome sequencing of paired organoids from tumor core and invasive front revealed no additional driver mutations in invasive cells, ruling out genetic selection. Instead, the cause lies in the microenvironment: submucosal trophocytes (normal fibroblasts) transition into trophocyte-like cancer-associated fibroblasts (CAFs) immediately after malignant transformation. These CAFs induce oncofetal plasticity in adjacent tumor cells via TGFβ and prostaglandin E2/D2 signaling. Functional validation using CRISPR-engineered EMP1-mNeon reporter organoids confirmed that TGFβ plus prostaglandins robustly induce the oncofetal state. Notably, while oncofetal cells are necessary for metastasis, their presence alone is insufficient—metastatic T1 tumors showed downregulation of immune-related programs, suggesting immune evasion as a critical additional bottleneck.

Personal highlights

Oncofetal plasticity emerges at the earliest invasive stage: in 232 T1 CRCs (the earliest stage where tumor cells penetrate the submucosa), the HRC/LAMC2 oncofetal signature was present in invasive fronts of nearly all tumors, regardless of metastatic status. This overturns the assumption that metastatic programs are late-acquired.
No genetic drivers of invasive phenotypes: whole-genome sequencing of paired organoids from tumor core and invasive front showed identical driver mutations (APC, TP53, KRAS) and copy number profiles. Growth factor dependency assays confirmed functional similarity, proving that oncofetal plasticity is extrinsically induced, not genetically selected.
Trophocyte-like CAFs originate from normal submucosal fibroblasts: spatial single-cell transcriptomics capturing the moment of malignant transformation (intramucosal carcinoma → T1 sm1 → T1 sm3) revealed that trophocyte-like CAFs emerge from tissue-resident submucosal trophocytes, not from other fibroblast populations. These CAFs colocalize with oncofetal cells at the invasive front.
TGFβ and prostaglandins drive oncofetal state induction: organoid-fibroblast cocultures showed that 3D-cultured fibroblasts (which phenocopy invasive front CAFs) strongly induce oncofetal programs. A CRISPR-based EMP1-mNeon reporter screen identified TGFβ1/3 and PGE2/PGD2 as the most potent inducers, with combination treatment yielding the highest EMP1+ cell fractions.
Oncofetal plasticity is necessary but not sufficient for metastasis: while oncofetal cells were ubiquitous, only a subset of T1 CRCs metastasized. Comparison of metastatic versus non-metastatic tumors revealed downregulation of immune-related programs (e.g., CD8 T cell signatures) in metastatic invasive fronts, indicating that immune evasion is a co-requisite bottleneck.

Why should we care?

This work reframes how we think about metastasis initiation in colorectal cancer. Rather than being a late event driven by accumulating mutations, metastatic competence, specifically the oncofetal plasticity that enables dissemination, is acquired almost immediately after the tumor breaches the muscularis mucosa. The good news: this plasticity is induced by the local microenvironment (submucosal fibroblasts) rather than hard-to-target genetic changes, and the key signals (TGFβ and prostaglandins) are pharmacologically tractable. The cautionary note: because these oncofetal cells are present in virtually all early tumors, their presence alone cannot predict metastasis; additional immune evasion is required. This explains why most T1 CRCs do not metastasize despite harboring plastic cells. A limitation: the study focuses on the earliest invasive stage; whether the same mechanisms persist in advanced CRC remains to be determined, and the clinical utility of oncofetal markers for risk stratification is limited by their near-universal presence.

Prospective evaluation of genomics-guided off-label treatment

Verkerk et al. Nature (2026). 10.1038/s41586-026-10405-x

The paper in one sentence

A large prospective Dutch trial (DRUP) providing off-label targeted and immunotherapies to 1,610 advanced cancer patients with no standard options found modest overall benefit (34.9% clinical benefit rate, 15.7% objective response rate) but identified exceptional responders (7%) and generated evidence that supported national reimbursement for one indication.

Summary

The Drug Rediscovery Protocol (DRUP) is an investigator-initiated, multicentre platform trial launched in the Netherlands in 2016. It offers patients with advanced solid tumours who have exhausted all standard treatments access to approved targeted or immunotherapies outside their registered indications, matched to molecular alterations. Between July 2016 and May 2024, 1,610 patients started treatment with 37 different off-label drugs across 103 tumour types and 75 molecular targets. Of 1,363 evaluable patients, 39% had rare cancers. The clinical benefit rate (confirmed response or stable disease ≥16 weeks) was 34.9%, with an objective response rate of 15.7% and a median progression-free survival of 3.4 months. Grade ≥3 treatment-related adverse events occurred in 28.4% of patients, including 11 grade 5 events. Fourteen of 15 completed stage 2 cohorts met protocol-defined success criteria, but only one (nivolumab in MSI-H tumours) progressed to stage 3 and achieved national reimbursement. Tumour type significantly influenced outcomes in 4 of 17 drug-target subgroups, arguing against purely tumour-agnostic approaches. The authors conclude that off-label precision medicines should only be used within structured frameworks that systematically capture outcomes.

Personal highlights

Large-scale real-world evidence across rare cancers: the study includes 1,610 heavily pretreated patients, 39% with rare cancers (incidence <6/100,000/year). The clinical benefit rate for rare cancers was comparable to common cancers, demonstrating the value of precision approaches for orphan diseases.
Modest overall activity but meaningful exceptional responders: the overall objective response rate (15.7%) and median PFS (3.4 months) are modest. However, 7.0% of patients were exceptional responders (complete response or progression-free ≥2 years), including durable responses in BRAF V600E-mutant brain tumours, MSI-H cancers, and MET-altered lung cancers, showing that substantial benefit is possible for molecularly defined subsets.
Tissue context still matters for many targets: while tumour-agnostic approvals are increasingly common, DRUP found that tumour type significantly affected clinical benefit in 4 of 17 drug-target subgroups (e.g., MET amplifications responded better in lung cancer than other cancers). This suggests that ignoring histology entirely may be premature.
Substantial toxicity highlights risks of unstructured off-label use: over a quarter of patients (28.4%) experienced grade ≥3 treatment-related adverse events, with 11 treatment-related deaths. This underscores that off-label use without systematic monitoring exposes vulnerable patients to significant harm without the benefit of generating learnable evidence.
Translation from positive signals to practice remains challenging: although 14 of 15 completed stage 2 cohorts met the protocol’s success criteria (≥5 of 24 patients with clinical benefit), only one progressed to stage 3 and achieved reimbursement. Barriers included modest clinical impact (short-term stable disease was overvalued by the trial’s criteria), patent expiry, and the rarity of some indications.
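
To get some intuition for the stage 2 success criterion (≥5 of 24 patients with clinical benefit), the sketch below computes how likely a cohort is to pass for a given true clinical benefit rate. The binomial model with independent patients is my own simplification, not the trial's statistical design, and the example rates are only illustrative (0.349 is the overall clinical benefit rate reported above).

```python
from math import comb

def prob_cohort_success(true_rate, n_patients=24, min_benefit=5):
    """P(at least `min_benefit` of `n_patients` show clinical benefit) under a binomial model."""
    return sum(
        comb(n_patients, k) * true_rate**k * (1.0 - true_rate) ** (n_patients - k)
        for k in range(min_benefit, n_patients + 1)
    )

for rate in (0.10, 0.20, 0.349):
    print(f"true clinical benefit rate {rate:.3f}: P(cohort passes) = {prob_cohort_success(rate):.3f}")
```

Under these assumptions, the chance of passing rises quickly with the true benefit rate. Since clinical benefit includes stable disease at 16 weeks, a cohort can clear the bar without many objective responses, which fits the observation that passing stage 2 did not guarantee progression to stage 3.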

Why should we care?

Off-label prescribing of cancer drugs is widespread (a recent US analysis found that 18.6% of patients receive at least one off-label therapy), but outcomes are rarely captured systematically. This practice creates risks: patients may experience serious toxicity without benefit, healthcare systems bear high costs, and the medical community learns nothing. DRUP provides a blueprint for responsible off-label use: prospective, data-generating, and ethically sound. The sobering takeaway is that even with strong biological rationale, most off-label drug-target combinations produce only modest benefit. However, for a small subset of patients (7% exceptional responders), the impact is transformative. The trial also reveals implementation hurdles: many promising signals never translate to practice because of patent expiry or because trial endpoints (stable disease at 16 weeks) do not align with regulatory standards (durable responses, meaningful PFS extension).

Cell-type-targeted mitochondrial transplantation rescues cell degeneration

Ayupov et al. Nature (2026). 10.1038/s41586-026-10391-0

The paper in one sentence

MitoCatch, a platform that uses protein binders (nanobodies, DARPins, or antibodies) to deliver healthy mitochondria to specific cell types, rescues neuronal degeneration in a model of Leber’s hereditary optic neuropathy and after optic nerve crush in mice.

Summary

Mitochondrial dysfunction underlies many untreatable diseases, but delivering healthy mitochondria to affected cells has been inefficient and non-targeted. The authors developed MitoCatch, a system with three complementary strategies: displaying binders on the target cell surface (MitoCatch-C), on the mitochondrial surface (MitoCatch-M), or using bispecific binders linking both (MitoCatch-Bi). Using anti-GFP nanobodies as a model, they show that donor mitochondria are internalized, escape endosomes, undergo fusion and fission with native mitochondria, and remain motile. The outer membrane of internalized mitochondria is exposed to the cytosol, as demonstrated by a destabilized nanobody stabilization assay. They then targeted mitochondria to multiple human cell types (neurons, endothelial, cardiac, T cells) and to retinal organoids, using binders against CD71, CD73, CD31, CD4, and CD8. Higher binder affinity improved targeting efficiency. In a patient-derived iPS cell model of Leber’s hereditary optic neuropathy (LHON; mt11778G>A mutation), transplantation of healthy mitochondria increased oxygen consumption, upregulated nuclear-encoded oxidative phosphorylation genes, and improved cell survival under glycolysis inhibition. In mice, targeted delivery of mitochondria to retinal ganglion cells (via an anti-GFP nanobody displayed on PV+ cells) increased neuronal survival by ~47% and preserved light-evoked responses after optic nerve crush. No immune response against displayed nanobodies or mitochondrial proteins was detected.

Personal highlights

Three modular targeting strategies: MitoCatch offers three flexible approaches (cell-surface displayed binders, mitochondrion-surface displayed binders, or bispecific linkers), allowing adaptation to different cell types and experimental contexts without re-engineering the core delivery mechanism.
Functional integration of transplanted mitochondria: donor mitochondria fuse with native networks, move along neurites at speeds comparable to endogenous mitochondria, and are exposed to the cytosol. This suggests they can participate in cellular metabolism rather than being degraded.
Affinity tuning improves delivery efficiency: raising binder affinity enabled effective targeting at lower mitochondrial doses, a critical parameter for potential therapeutic translation.
Rescue of LHON patient neurons in vitro: transplanting healthy mitochondria into iHNeurons derived from a homoplasmic LHON patient (mt11778G>A) increased basal, ATP-linked, and maximal respiration, upregulated nuclear-encoded mitochondrial genes, and improved survival under glycolytic stress (galactose medium) by ~24% compared to non-transplanted controls.
In vivo neuroprotection after optic nerve crush: targeted delivery of mitochondria to PV+ retinal ganglion cells in mice increased cell survival by 47% (2.5 µg dose) and preserved light-evoked calcium responses, while reducing axonal beading. mtDNA-depleted mitochondria had no effect, confirming the specificity of healthy mitochondrial function.

Why should we care?

Mitochondrial diseases, ranging from optic neuropathies (LHON) to Leigh syndrome, Parkinson's, and heart failure, currently lack effective treatments. The idea of transplanting healthy mitochondria has been around for decades, but two barriers have limited clinical translation: poor efficiency and the inability to target specific diseased cell types. MitoCatch addresses both by borrowing principles from viral targeting (surface binders) and showing that engineered nanobodies or antibodies can guide mitochondria to the right cells. The proof-of-concept in a patient-derived LHON model and in a mouse optic nerve crush model is encouraging. However, the study is still preclinical: the LHON rescue was in cultured neurons, not in patients; the in vivo work used a mechanical injury model rather than a genetic mitochondrial disease; and long-term efficacy, safety, and potential immunogenicity remain to be tested (although no immune response was detected in the short term).

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 06/04/26

Sebastiaan Vanuytven — Sun, 12 Apr 2026 15:52:06 GMT

This week’s reads highlight how timing, context, and scale shape biological systems: from early embryonic quality control to cancer adaptation and next-generation computational frameworks. In development, mosaic gastruloids show that cell competition is an all-or-nothing event: depending on a threshold in p53 levels, cells compete only during a limited period at the onset of gastrulation. At the other end of the spectrum, cancer cells appear to be much more adaptive than previously expected: they are able to “learn” from previous experiences during therapy through AP-1-induced epigenetic memory. Novel spatial frameworks such as STDrug demonstrate once again that cell behavior should not be studied without considering its context. On the technology side, scale is emerging as a constraint. Annbatch addresses it by removing data-loading bottlenecks for terabyte-sized datasets, while CoLa-VAE approaches the problem by treating cells as nodes in communication networks rather than as independent entities. Deep learning approaches are also becoming increasingly popular in regulatory biology; for example, large-scale multiomic atlases provide insight into the “syntax” of gene regulation in human development.

Preprints/articles that I managed to read this week

Mosaic Gastruloids Reveal a Temporal Restriction for Developmental Cell Competition

Frenster, J. D. et al. Nature Cell Biology (2026). https://doi.org/10.1038/s41556-026-01923-x

The paper in one sentence

Using mosaic mouse gastruloids, the authors show that p53-deficient cells act as supercompetitors, eliminating wild-type neighbours through mitochondrial apoptosis, but only during a narrow developmental window at the onset of gastrulation when acute relative p53 protein levels determine competitive outcomes.

Summary

Cell competition is a quality control mechanism that eliminates less fit cells during early embryogenesis, but studying it has been difficult in mammalian systems. The authors turned to mouse gastruloids as a controllable, scalable model. They created mosaic gastruloids by mixing wild-type (WT) cells with very small numbers of p53 knockout (p53KO) cells (as few as 2 out of ~150 cells). Remarkably, p53KO cells acted as supercompetitors, dramatically impairing the growth of neighbouring WT cells through mitochondrial apoptosis (cleaved caspase-3), without affecting cell cycle progression. WT cells neighbouring p53KO cells showed elevated p53 protein levels, and this was causal: transient p53 degradation using an auxin-inducible degron system during a specific 48–72 hour window was sufficient to confer supercompetitor status. Crucially, cell competition only occurred when both populations were within a narrow developmental window—the transition from primed pluripotency to early gastrulation (gastruloid days 2–4, equivalent to mouse embryonic days E5.5–E7.5). Heterochronic mixing experiments showed that cells outside this window could neither initiate nor respond to competition. Wnt and BMP signalling protected against competition, while modulation of Nodal or ERK had no effect. The transcription factors Brachyury and Eomesodermin were required for the competitive response, linking competition to the gastrulation gene regulatory network.

Personal highlights

Gastruloids as a powerful 3D competition model: unlike 2D cultures where competition only occurred at confluence, gastruloids enabled genuine developmental competition. As few as two p53KO cells (1.3% of starting population) measurably impaired WT growth, demonstrating an astonishingly potent effect size.
Competition restricted to a narrow developmental window: using heterochronic mosaics, reaggregating cells from gastruloids of different ages, the authors showed that both winner and loser cells must reside within a specific window (48–96 hours in gastruloids, equivalent to E5.5–E7.5 in mice). Outside this window, p53KO cells no longer competed. This reveals a stage-gated fitness checkpoint at the onset of gastrulation.
Mitochondrial apoptosis, not proliferation, drives loser elimination: p53KO cells did not proliferate faster; instead, they induced p53 stabilization and mitochondrial apoptosis (Bcl2-sensitive, Bax/Bak-dependent) in neighbouring WT cells. Overexpression of Bcl2 in WT cells rescued them from competition, confirming the intrinsic apoptosis pathway as the executioner.
Wnt and BMP signalling protect from competition: agonism of Wnt (ChIR) or BMP4 reduced competition, while inhibition increased it. By contrast, Nodal/Activin and ERK/FGF modulation had no effect. This suggests that posteriorizing signals may suppress the competitive checkpoint, possibly linking it to replication stress and error-prone cell populations.
Acute p53 protein levels determine winner/loser status: using an auxin-inducible p53-degron system, the authors showed that transient p53 degradation for just 24 hours (48–72 h or 72–96 h) was sufficient to convert WT cells into supercompetitors. p53 levels in neighbouring WT cells rose reciprocally. This provides direct causal evidence that relative p53 protein dynamics at gastrulation onset dictate competitive outcomes.

Why should we care?

Cell competition is an important quality control mechanism that sculpts early embryos. However, investigating it in mammals has been limited by the difficulty of accessing the gastrula stage. Gastruloids are an excellent system for studying cell competition because they are scalable, easy to manipulate and image, and reproduce the main aspects of gastrulation in both timing and location. In this study, the authors demonstrate that gastruloids can be used to elucidate the molecular mechanisms of cell competition, including how cells compare fitness, how fitter cells eliminate weaker ones, and why competition is temporally restricted. The results have broad implications, not only for studying cell fate during development but also for other fields where p53-dependent cell competition may be involved, such as cancer, where p53 loss leads to a growth advantage, or stem cell therapy, where differences in fitness between cells can deplete the required cell population.

Annbatch: Unlocking Terabyte-Scale Training of Biological Data in AnnData

Gold, I. et al. bioRxiv (2026). https://doi.org/10.64898/2026.03.24.713961

The paper in one sentence

The authors introduce Annbatch, a high-performance mini-batch loader for the AnnData format that pre-shuffles disk-backed datasets and reads contiguous chunks, achieving up to 40× faster data loading and reducing training times from days to hours for terabyte-scale biological data.

Summary

Training deep learning models on large biological datasets is often limited not by model complexity but by inefficient data loading. Conventional approaches that randomly access individual samples from disk are slow, while existing solutions either require file format conversion (breaking compatibility with established tools) or fail to saturate modern GPUs. Annbatch solves this by providing an end-to-end data loading framework fully integrated with the scverse ecosystem (AnnData, Scanpy, etc.). It consists of two components: (1) a pre-shuffler that reorganizes on-disk AnnData files by writing out randomly ordered contiguous chunks without loading the entire dataset into memory, and (2) a high-performance data loader that reads these pre-shuffled chunks sequentially, leveraging the Zarr storage backend and techniques like direct I/O, pinned memory, and GPU acceleration.

Personal highlights

End-to-end solution within scverse: unlike existing high-performance loaders that require conversion to custom formats (breaking compatibility with Scanpy, Squidpy, and other scverse tools), Annbatch reads and writes standard AnnData files. This means users can train models at scale and then immediately validate outputs using the rich scverse ecosystem.
Pre-shuffling enables contiguous disk access: the key innovation is pre-shuffling the on-disk data into randomly ordered contiguous chunks. This allows the loader to fetch large sequential blocks rather than performing slow random access for individual samples, dramatically improving I/O throughput while preserving batch randomness.
Zarr backend with Rust acceleration: Annbatch leverages the Zarr storage format and a new Rust-based bridge (zarrs-python) to achieve higher performance than HDF5-based h5ad files. Pre-shuffling overhead dropped from ~12 hours to ~5.5 hours, and throughput increased from ~20,000 to ~54,000 samples/sec.
Works across modalities: Beyond single-cell RNA-seq, Annbatch accelerates data loading for single-cell microscopy images (22× speedup over random indexing) and whole-genome sequencing data (41× for rare variants, 23× for common variants), demonstrating its generality across biological data types.
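
The pre-shuffling idea described above can be illustrated with a toy sketch in plain Python. This is not Annbatch's API: the function names (preshuffle, iter_minibatches), the parameters, and the list of index chunks standing in for an on-disk store are all assumptions made for illustration. The extra in-memory shuffle across several fetched chunks is likewise my own guess at how batch randomness could be preserved, not something taken from the paper.

```python
import random


def preshuffle(n_samples, chunk_size, seed=0):
    """Return chunks of sample indices in a globally shuffled order.

    Toy stand-in for rewriting the dataset on disk once: after this step,
    each chunk is contiguous in the *new* on-disk order.
    """
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)  # one global shuffle, done once up front
    return [order[i:i + chunk_size] for i in range(0, n_samples, chunk_size)]


def iter_minibatches(chunks, batch_size, chunks_per_fetch=2, seed=0):
    """Yield mini-batches while only ever reading whole chunks in sequence.

    Several chunks are fetched together and shuffled in memory, so batches
    mix samples across chunks while "disk" reads stay contiguous.
    """
    rng = random.Random(seed)
    for start in range(0, len(chunks), chunks_per_fetch):
        fetched = chunks[start:start + chunks_per_fetch]
        buffer = [idx for chunk in fetched for idx in chunk]
        rng.shuffle(buffer)  # cheap in-memory shuffle
        for j in range(0, len(buffer), batch_size):
            yield buffer[j:j + batch_size]


# Hypothetical sizes: 1000 samples, chunks of 100, batches of 50.
chunks = preshuffle(n_samples=1000, chunk_size=100)
batches = list(iter_minibatches(chunks, batch_size=50))
seen = sorted(idx for batch in batches for idx in batch)
```

Every sample is visited exactly once per epoch, but the expensive random access happens only once, at pre-shuffle time.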

STDrug: A Spatial Transcriptomics Framework for Personalized Drug Repurposing

Yang et al. bioRxiv (2026). 10.64898/2026.04.03.715101

The paper in one sentence

STDrug is a computational framework that leverages spatial transcriptomics data, which preserve the spatial organization of tissues, to prioritize patient-specific repurposed drugs by identifying disease-reversed gene signatures within matched tumor and normal spatial domains.

Summary

Current drug repurposing methods often rely on bulk or single-cell RNA-seq, which lose the spatial context of how cells interact within their microenvironment. STDrug addresses this gap by integrating spatial transcriptomics from paired tumor and adjacent normal tissues. The framework first uses graph convolutional networks and coherent point drift to align and identify paired spatial domains between diseased and healthy samples. Then, it calculates a comprehensive drug score by combining: (1) reversal of disease-associated gene expression, (2) spatial domain–domain interactions, (3) drug efficacy, and (4) side-effect profiles. A machine learning model, guided by GPT-4o-derived gene–disease associations, weights key reversible genes. Tested on hepatocellular carcinoma (HCC) and prostate cancer (PCa) datasets, STDrug significantly outperformed single-cell-based methods (ASGARD, Beyondcell), achieving AUCs of 0.81–0.82. Real-world electronic health record analysis (MarketScan, n=264M) showed that top-ranked drugs (e.g., atorvastatin, digoxin, niacin for HCC; digoxin, colchicine, sirolimus for PCa) were associated with delayed cancer onset. In vitro validation on five PCa cell lines confirmed cytotoxicity of bortezomib and vorinostat at clinically relevant concentrations.

Personal highlights

Spatial context matters for drug repurposing: unlike single-cell approaches, STDrug explicitly models paired spatial domains between tumor and normal tissue, capturing microenvironment-dependent gene expression changes that inform drug reversal scores.
Superior benchmark performance: STDrug achieved significantly higher AUCs (0.81 for HCC, 0.82 for PCa) compared to ASGARD (0.61, 0.57) and Beyondcell (0.59, 0.48) when validated against clinical trial and literature-derived reference sets.
Multi-modal drug scoring: the framework integrates transcriptomic reversibility, spatial domain–domain interactions, drug efficacy (GDSC), and toxicity (SIDER) into a single patient-level score, enabling more balanced therapeutic prioritization.
Real-world clinical validation: using a large claims database (MarketScan), top STDrug-predicted compounds (e.g., atorvastatin, digoxin, niacin, sirolimus) were associated with significantly delayed cancer onset in propensity-matched analyses, supporting their protective effects.
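
To make the idea of a combined drug score more concrete, here is a minimal sketch in plain Python. It is not the STDrug implementation: the sign-based reversal measure, the linear combination, the component weights, and the gene and drug values are all hypothetical, chosen only to show how reversal, domain interaction, efficacy, and side effects could be folded into one number.

```python
def reversal_score(disease_logfc, drug_logfc, gene_weights=None):
    """Weighted fraction of disease genes that the drug pushes the other way.

    Both inputs map gene -> log fold change. gene_weights (optional) maps
    gene -> importance; genes without a weight default to 1.0.
    """
    total = 0.0
    reversed_weight = 0.0
    for gene, disease_change in disease_logfc.items():
        if gene not in drug_logfc or disease_change == 0:
            continue  # no information for this gene
        weight = 1.0 if gene_weights is None else gene_weights.get(gene, 1.0)
        total += weight
        if disease_change * drug_logfc[gene] < 0:  # opposite signs
            reversed_weight += weight
    return reversed_weight / total if total else 0.0


def drug_score(reversal, interaction, efficacy, side_effect,
               weights=(0.4, 0.2, 0.2, 0.2)):
    """Combine four components in [0, 1]; side effects count against the drug.

    The weights are hypothetical placeholders, not values from the paper.
    """
    w_rev, w_int, w_eff, w_tox = weights
    return (w_rev * reversal + w_int * interaction
            + w_eff * efficacy + w_tox * (1.0 - side_effect))


# Made-up signature: the drug reverses genes A and B but not C.
disease = {"A": 2.0, "B": -1.5, "C": 1.0}
drug = {"A": -1.0, "B": 0.5, "C": 0.8}
rev = reversal_score(disease, drug)
score = drug_score(rev, interaction=0.5, efficacy=0.5, side_effect=0.5)
```

In this toy example two of the three disease genes are reversed, so the reversal component is 2/3 before the other terms are added.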

Why should we care?

Most drug repurposing methods today treat cells as if they exist in a vacuum, ignoring the fact that a tumor's behavior, and its response to drugs, is deeply shaped by its surrounding tissue architecture. STDrug offers a practical, computationally grounded step toward incorporating this spatial reality into therapeutic discovery. The approach currently requires paired tumor-normal samples, which are not always available, and its predictive accuracy, while promising, still depends on the quality of pharmacogenomics perturbation databases and LLM-derived gene weightings. The real-world and in vitro validations are encouraging, but prospective clinical trials would be needed before any of these predictions could guide treatment.

AP-1 mediates cellular adaptation and memory formation

Li et al. Nature Communications (2026). 10.1038/s41467-026-70862-w

The paper in one sentence

Cancer cells exposed to targeted therapy can form and retain “memories” of their pre-treatment gene expression state through the transcription factor AP-1, enabling them to adapt and become resistant to higher drug doses in a process that resembles cellular learning.

Summary

Most cells respond to stress by executing hard-wired genetic programs, but whether they can learn from experience, encoding a memory of a transient state and acting upon it later, has remained unclear. Using a melanoma cell line (WM989) and MAPK pathway inhibitors (trametinib, vemurafenib), Li and colleagues show that cells exposed to a low drug dose adapt over 10–14 days to survive a subsequent higher dose, an effect that cannot be explained by simple selection of pre-existing resistant variants. This adaptive behavior depends on the transcription factor AP-1: inhibiting AP-1 abolishes the dose-escalation survival advantage. The authors then demonstrate genuine memory formation: if they transiently induce “passenger” genes (e.g., FKBP5, TGM2) with dexamethasone and add trametinib during that induction window, those genes remain highly expressed weeks later, even after the dexamethasone signal is removed. Using a dual-color AP-1 reporter (EGFP and mCherry driven by AP-1 binding sites), they show that the initial, stochastic imbalance in reporter expression in a single cell is “remembered” in its entire resistant colony, evidence for cis-encoded epigenetic memory (regulation by association, not just by DNA sequence). Mechanistically, CBP/p300 (both its histone acetyltransferase and bromodomain activities) is required for memory formation and maintenance, suggesting a “read-write” mechanism for propagating activating chromatin states. The work establishes that cancer cells can learn from therapy-induced stress, with AP-1 as a key mediator of this non-genetic adaptive memory.

Personal highlights

Cellular adaptation beyond selection: dose-escalation experiments show that cells pre-exposed to low-dose trametinib are far more likely to survive a subsequent high dose than cells never exposed, indicating active adaptation rather than pure selection of rare resistant clones.
AP-1 is necessary for adaptive memory: pharmacological inhibition of AP-1 (JNK-IN-8 or T5224) during the low-dose “training” period completely blocks the survival benefit upon dose escalation, positioning AP-1 as a central regulator of this learning-like behavior.
Memory of “passenger” gene expression: transient induction of dexamethasone-responsive genes (unrelated to therapy resistance) becomes permanently encoded if trametinib is added during that induction. Those genes remain elevated for weeks without the original stimulus, proving that memory is not restricted to pre-existing resistance programs.
Cis‑encoded epigenetic memory demonstrated by dual reporter: in cells with two AP-1 reporters (same promoter, different fluorophores), the initial stochastic imbalance in red vs. green expression at the time of drug addition biases the entire resistant colony’s color. This shows that memory is stored at the gene locus itself (in cis), not via global transcription factor feedback loops.
CBP/p300 as a “read-write” machine for activating memory: inhibiting either the acetyltransferase domain (A-485) or the bromodomain (SGC-CBP30) of CBP/p300 blocks memory formation or erases existing memory, supporting a model where acetylation marks are propagated through cell divisions.

Why should we care?

This work challenges the textbook view that gene regulation is entirely hard-wired by evolutionarily conserved programs. It shows that individual cancer cells can learn from a stressful experience (low-dose therapy) and retain that memory to cope with a future, more severe challenge (high-dose therapy). While this is a form of non-neural “cellular learning,” it also represents a troubling mechanism of acquired drug resistance, one that is not genetic and therefore invisible to standard DNA sequencing. The good news is that this memory can be disrupted pharmacologically (by inhibiting AP-1 or CBP/p300), raising the possibility of “erasing” resistance memories to resensitize tumors. Beyond cancer, similar AP-1-dependent memories have been observed in inflammation, wound healing, and aging, suggesting a general principle. That said, the work is primarily in a single melanoma cell line, and whether these memories operate in patients or across diverse cancer types remains to be tested.

Multiomics and deep learning dissect regulatory syntax in human development

Liu et al. Nature (2026). 10.1038/s41586-026-10326-9

The paper in one sentence

Deep learning models trained on a multi-organ single-cell atlas of chromatin accessibility across 12 human fetal organs reveal a lexicon of 508 regulatory sequence motifs, including “hard” and “soft” syntactic rules governing how transcription factors cooperate to control gene expression during development.

Summary

The authors generated the Human Development Multiomic Atlas (HDMA), profiling both chromatin accessibility and gene expression from 817,740 fetal cells across 12 organs between 10-23 weeks post-conception. This resource spans 203 cell types and over 1 million candidate cis-regulatory elements. Using deep convolutional neural networks (ChromBPNet) trained to predict accessibility from local DNA sequence in each cell type, they identified 508 de novo motifs that influence chromatin accessibility. Notably, they discovered 67 composite motif pairs exhibiting synergistic effects, with 48 showing “hard” syntax (requiring precise spacing and orientation, typically under 20 bp) consistent with DNA-mediated cooperativity, and 27 showing “soft” syntax (flexible arrangements up to 150 bp) consistent with nucleosome-mediated cooperativity. A small subset of motifs (15) were predicted to reduce accessibility rather than promote it, including ZEB/SNAIL, HIC, and BCL11A motifs. The authors validated their enhancer predictions using VISTA transgenic mouse assays, corrected several mis-annotated enhancers, and demonstrated that disease-associated genetic variants often fall in fetal-specific regulatory elements, with predicted effects on accessibility concordant with eQTL data.

Personal highlights

Multi-organ single-cell atlas of development: the HDMA resource provides matched chromatin accessibility and gene expression across 12 fetal organs from the same cells, substantially improving coverage and quality compared to previous single-omics atlases.
Deep learning reveals a comprehensive motif lexicon: ChromBPNet models trained on 189 cell types identified 508 de novo motifs predictive of accessibility, including both positive (97%) and negative (3%) regulators, with ubiquitous promoter-dominant motifs (NRF1, NFY, YY1/2) distinguished from tissue-specific distal motifs.
Hard and soft regulatory syntax: systematic in silico marginalization of 138 composite motifs identified 67 synergistic motif pairs, with 48 exhibiting “hard” syntax (precise spacing/orientation constraints, e.g., E-box+homeodomain at 5 bp head-to-tail) and 27 exhibiting “soft” syntax (flexible arrangements up to 150 bp), providing in vivo evidence for distinct modes of transcription factor cooperativity.
Ubiquitous negative regulatory motifs: 15 motifs, including ZEB/SNAIL and HIC, were consistently predicted to reduce chromatin accessibility despite being widely distributed in accessible regions, and variants disrupting these motifs were significantly enriched for upregulating eQTLs, suggesting they represent a broader repertoire of repressive regulatory signals.
Disease variants in fetal regulatory elements: analysis of fine-mapped GWAS variants revealed that many adult-onset disease-associated variants are enriched in fetal-specific accessible regions, including an asthma variant in fetal lung macrophages and a coronary artery disease variant in fetal muscle endothelial cells, suggesting developmental contexts may influence disease predisposition.

Why should we care?

This work advances our understanding of how DNA sequence encodes gene regulatory information during human development, a problem often called the "cis-regulatory code." By combining large-scale single-cell profiling with interpretable deep learning, the authors show that transcription factors don't just bind DNA independently; they follow specific grammatical rules, with some requiring rigid spacing like letters in a word ("hard syntax") while others tolerate flexibility like conversational grammar ("soft syntax"). The discovery that certain sequence motifs consistently reduce accessibility, and that these are enriched in variants that increase gene expression, challenges the assumption that open chromatin primarily reflects activating regulation. While experimental validation of predicted motif syntax rules remains limited, and the VISTA enhancer data represents a small curated set, this work provides a foundational framework and resource for interpreting non-coding genetic variation and understanding how sequence controls cell-type-specific gene regulation.

CoLa-VAE: Cell-Cell Communication-aware Variational Autoencoder with Dynamic Graph Laplacian Constraints

Chen et al. bioRxiv (2026). https://www.biorxiv.org/content/10.64898/2026.03.28.715052v1

The paper in one sentence

CoLa-VAE extends variational autoencoders for single-cell RNA-seq by explicitly incorporating cell-cell communication signals as a graph Laplacian constraint, producing latent representations that capture both intrinsic transcriptional programs and extrinsic signaling topology.

Summary

The authors address a fundamental limitation of current single-cell representation learning methods (e.g., scVI, Seurat): they model each cell’s expression as a function of its own latent state, ignoring intercellular signaling. CoLa-VAE uses a VAE with a partitioned latent space, one subspace constrained by a standard Gaussian prior (z_Normal) and another constrained by a dynamic graph Laplacian (z_CCC) derived from ligand-receptor interactions. Crucially, the communication graph is recomputed periodically during training using the decoder’s denoised expression matrix, creating a positive feedback loop that improves both communication inference and representation quality. The framework is modular, supporting four different CCC scoring methods (CellChat, CellPhoneDB, iTalk, CytoTalk). Benchmarking on PBMC datasets shows CoLa-VAE excels at structural clustering metrics (Silhouette, Dunn, Calinski-Harabasz) and denoising fidelity (global and local geometry preservation), though its Adjusted Rand Index is only comparable to or slightly below Seurat. In a real-world snRNA-seq dataset from human ventral midbrain, CoLa-VAE corrected misannotated cells (e.g., reclassifying microglia as oligodendrocytes) and spontaneously isolated doublets as satellite clusters. The method also extends to spatial transcriptomics (human DLPFC), where spatial constraints improve imputation of laminar gene expression patterns.
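
A graph Laplacian constraint of this kind is usually a smoothness penalty: cells that are strongly connected in the communication graph are pulled together in latent space. The sketch below shows the standard form tr(Zᵀ L Z), which equals 0.5 · Σ_ij W_ij ‖z_i − z_j‖² for a symmetric weight matrix W. The exact loss used in CoLa-VAE is not given here, so treat this as an assumption; the function names and toy matrices are hypothetical.

```python
def laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def laplacian_penalty(Z, W):
    """Compute tr(Z^T L Z) = 0.5 * sum_ij W_ij * ||z_i - z_j||^2.

    Z is a list of latent vectors (one per cell); W holds pairwise
    communication strengths (symmetric, non-negative).
    """
    L = laplacian(W)
    n, d = len(Z), len(Z[0])
    return sum(Z[i][k] * L[i][j] * Z[j][k]
               for i in range(n) for j in range(n) for k in range(d))


# Toy graph: cells 0 and 1 communicate, cell 2 is isolated.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]

Z_close = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]  # communicating cells nearby
Z_far = [[0.0, 0.0], [3.0, 4.0], [5.0, 5.0]]    # communicating cells far apart

penalty_close = laplacian_penalty(Z_close, W)
penalty_far = laplacian_penalty(Z_far, W)
```

The penalty is small when communicating cells sit close together and grows with their squared distance, while the isolated cell contributes nothing regardless of where it lies.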

Personal highlights

Communication-aware latent space disentanglement: CoLa-VAE explicitly separates latent dimensions into a CCC-constrained subspace (capturing signaling topology) and a normal subspace (capturing intrinsic variation), enabling the model to distinguish extrinsic signaling effects from intrinsic transcriptional heterogeneity.
Method-agnostic and robust integration: The framework supports four distinct CCC inference modules (CellChat, CellPhoneDB, iTalk, CytoTalk) and consistently outperforms baselines on structural clustering metrics, demonstrating that communication constraints improve cluster compactness and separation regardless of the specific scoring formula.
Superior denoising with preserved local and global geometry: Compared to scVI, CoLa-VAE reconstructs expression matrices that better retain both global data structure (Mantel test) and local neighborhoods (kNN overlap), while maintaining comparable marker gene recovery, suggesting the graph Laplacian prevents over-smoothing.
Extension to spatial transcriptomics: By incorporating physical distance constraints into the graph Laplacian, CoLa-VAE imputes sparse spatial expression patterns (e.g., MBP in white matter, PCP4 in cortical layer 5) that are nearly invisible in raw data, though it tends to group layers 2-6 into broader functional zones rather than sharp bands.

Targeted sequencing of mutations via RNA-templated gap filling of oligonucleotides for single-cell RNA-seq

Saurty-Seerunghen et al. bioRxiv (2026). 10.64898/2026.04.10.717677

The paper in one sentence

A new enzymatic method using Bst polymerase’s reverse transcriptase and nick-translation activities enables targeted detection of somatic mutations from single-cell RNA-seq data without requiring DNA isolation, integrating seamlessly with the 10x Genomics Flex workflow.

Summary

Current approaches for linking mutations to cell identity in single-cell transcriptomics either require separate DNA genotyping (e.g., GoT) or rely on direct RNA capture with limited sensitivity. The authors develop an RNA-templated gap-filling strategy that leverages the dual activity of BstFL polymerase: first extending a probe across a mutation-containing gap via reverse transcription, then nicking the 5’OH terminus of the opposing probe to enable ligation by SplintR ligase. This converts initially ligation-incompetent probes into amplifiable products that are captured alongside standard gene expression probes in the 10x Flex system. The method was validated on pooled breast and prostate cancer cell lines (MCF-7, SK-BR-3, LnCAP) targeting 64 loci. Of 38 targets with sufficient cell coverage, mutant calls showed 80-100% specificity to the expected cell line. Target mRNA abundance was the primary determinant of detection efficiency (Pearson R=0.37, p<0.01), while gap length (5-12 bp) and mutation position within the gap had no significant effect. Probes with GC content outside the 44-72% range showed reduced sensitivity. The approach preserves transcriptomic integrity, enabling simultaneous gene expression and variant profiling without separate DNA workflows.

Personal highlights

Enzymatic gap-filling bypasses ligation limitations: unlike padlock probes or direct ligation methods that require pre-existing 5’ phosphates, the BstFL polymerase both fills the sequence gap and converts the 5’OH terminus to a ligation-competent state via nick translation, enabling detection of mutations in regions where traditional ligation would fail.
Integration with commercial single-cell platforms: the method is designed for the 10x Genomics Flex system, using the same probe capture mechanism as gene expression probes. This allows parallel library preparation for transcriptome and genotyping without modifying the core workflow.
Modest but predictable detection efficiency: genotyping rates varied widely across 50 targets (0-50% of cells with the mutation). Target expression level is significantly correlated with this variation (R=0.37, which corresponds to only about 14% of the variance), and the remaining scatter suggests that other factors (potentially probe design, secondary structure, or allele-specific expression) remain unaccounted for.
Sequence constraints are manageable: gap length (5-12 bp) and mutation position within the gap showed no correlation with efficiency. GC content of the flanking probes matters only at extremes (<44% or >72% GC), where sensitivity drops significantly. This suggests the assay works across diverse sequence contexts with reasonable robustness.
Potential for multi-locus somatic genotyping: in a proof-of-principle experiment with three cell lines, the method correctly assigned mutant status to expected samples for most targets (80-100% specificity), though 11 of 61 targeted loci yielded fewer than 3 genotyped cells, highlighting current limitations for low-expressed genes.
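
As a rough illustration of the GC-content constraint mentioned above, here is a minimal sketch of a pre-screening check for probe arms. The function names and the example sequences are hypothetical and not taken from the paper; only the 44-72% window comes from the reported results.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G/C bases in a DNA sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def in_gc_window(seq: str, low: float = 0.44, high: float = 0.72) -> bool:
    """True if GC content falls inside the 44-72% window reported in the summary."""
    return low <= gc_fraction(seq) <= high


# Hypothetical probe arms, for illustration only (not real assay probes).
probes = {
    "probe_ok": "ACGTGCCATGGCATCG",       # 62.5% GC
    "probe_at_rich": "ATATTTAAATATTAGA",  # 6.25% GC
    "probe_gc_rich": "GCGCGGCCGCGGGCCC",  # 100% GC
}
flags = {name: in_gc_window(seq) for name, seq in probes.items()}
print(flags)
```

A filter like this would only flag candidates at the extremes; it says nothing about expression level, which the paper identifies as the main driver of detection efficiency.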

Why should we care?

This paper addresses a practical bottleneck in cancer biology and developmental genetics: how to know which mutations are present in which single cells while simultaneously measuring their transcriptomes. Current solutions either sacrifice throughput, require separate DNA extraction, or fail at low-expressed loci. The RNA-templated gap-filling method offers an elegant biochemical workaround: a single enzyme both synthesizes across the gap and "unlocks" the probe for ligation, and the assay fits within a widely adopted commercial platform. That said, the detection efficiency is modest (often below 50% of expected mutant cells) and heavily dependent on target expression, meaning rare transcripts or low-coverage cells will yield sparse genotyping data.

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 30/3/26

Sebastiaan Vanuytven — Sun, 05 Apr 2026 14:22:48 GMT

This week’s articles highlight how quickly the field is advancing in measuring, modeling, and modifying biological systems, from reviving dead bacterial cells with transplanted genomes to profiling four layers of gene regulation within a single cell. Experimentally, CHARM provides a truly multi-layered view of gene regulation by jointly measuring three-dimensional (3D) genome structure, histone marks, chromatin accessibility, and transcription, while TotalX addresses a key limitation of droplet-based single-cell sequencing by adding noncoding RNAs to the analysis. New tools are also bridging technologies and scales: LazySlide links conventional histopathology to transcriptomics using a scverse-friendly approach, while CREsted turns scATAC-seq data into both decoded enhancer logic and cell-type-specific enhancer designs. Conceptually, “zombie” bacteria have been brought back to life through whole genome transplantation, blurring the boundary between life and death, while patient stratification turns out to be achievable with simple cell-type proportions, challenging common assumptions in single-cell genomics.

Preprints/articles that I managed to read this week

Selection-Free Whole Genome Transplantation Revives Dead Microbes

Seidel, Z. P., et al. bioRxiv (2026). https://doi.org/10.64898/2026.03.13.711674

The paper in one sentence

The authors developed a method to chemically inactivate the genome of recipient bacterial cells, enabling selection-free whole genome transplantation that can revive “dead” cells by installing a new donor genome, overcoming a key barrier to extending this technology beyond a narrow group of bacteria.

Summary

Whole Genome Transplantation (WGT) is a powerful synthetic biology technique where a donor genome is installed into a recipient cell, effectively reprogramming it. However, WGT has only worked reliably within a specific group of mycoplasma bacteria because most other bacteria possess active homologous recombination systems that generate false positives, where only a small piece of the donor genome recombines into the recipient rather than a full genome replacement. To solve this, the authors created what they term “zombie cells”: recipient bacteria killed by chemically crosslinking their native DNA with mitomycin C (MMC), but whose cellular machinery for transcription and translation remains intact. When a donor genome is transplanted into these genomically inactive cells, the donor genome can be expressed and the cell is “revived”, a living bacterial cell constructed from non-living parts. The authors optimized MMC treatment to kill recipient cells with high efficiency (∼10⁶-fold reduction in viability) while preserving their capacity to accept and express donor genomes. This selection-free approach eliminated the need for antibiotic resistance markers, as only cells that successfully received the donor genome could grow. The method works with donor genomes prepared from both bacterial and yeast sources. Using an alternative DNA crosslinker (psoralen with UVA activation) also supported transplantation, suggesting the approach is modular. The efficiency of selection-free WGT was dramatically improved: without MMC, one transplant was obtained per ∼150 million recipient cells; with MMC, one transplant per ∼288 recipient cells, a ∼500,000-fold increase in efficiency relative to viable cells. The authors confirmed successful transplants through growth assays, colony color (blue/white screening), and multiplex PCR, verifying that blue colonies contained the full donor genome.
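
The fold-change quoted above can be sanity-checked with simple arithmetic. This is only a back-of-the-envelope sketch using the two rounded figures from the summary; the variable names are mine, and the paper's own normalization to viable cells may differ in detail.

```python
# Rounded figures quoted in the summary above.
cells_per_transplant_without_mmc = 150_000_000  # ~1 transplant per 150 million recipient cells
cells_per_transplant_with_mmc = 288             # ~1 transplant per 288 recipient cells

# Ratio of the two rates gives the approximate efficiency gain.
fold_increase = cells_per_transplant_without_mmc / cells_per_transplant_with_mmc
print(f"{fold_increase:,.0f}-fold")  # prints "520,833-fold"
```

The result, roughly 5.2 × 10⁵, is consistent with the ∼500,000-fold improvement reported.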

Personal highlights

Zombie cells as a general chassis: by killing the recipient’s genome while preserving cellular function, the authors created a “dead” cell that can be revived by a donor genome. This reframes the distinction between life and death as a matter of genome replication rather than cellular viability, opening new conceptual and experimental possibilities.
Overcoming the homologous recombination barrier: the key technical advance is eliminating false positives caused by homologous recombination. Previous attempts to extend WGT to other bacteria consistently failed because antibiotic resistance markers from donor genomes would recombine into recipient genomes without full replacement. Chemical genome inactivation prevents this entirely.
Selection-free transplantation: by removing the need for antibiotic selection, the method can work with donor genomes lacking resistance markers. This simplifies the workflow and removes constraints on donor genome design.
Dramatic efficiency improvement: the selection-free protocol yields one transplant per ∼288 recipient cells with MMC treatment, compared to one per ∼150 million cells without it, a ∼500,000-fold increase in efficiency when normalized to viable cells. This makes the process far more practical.

Why should we care?

This work represents a significant step toward making whole genome transplantation a general platform for synthetic biology, not a technique limited to a small group of mycoplasma. By solving the false-positive problem that has stymied broader application for over a decade, the authors have opened the door to installing synthetic genomes into a much wider range of bacterial species. The ability to revive "dead" cells with a new genome also challenges our intuitive understanding of what it means for a cell to be alive, suggesting that cellular life can be separated into two components: the information (the genome) and the machinery that executes it.

LazySlide: Accessible and Interoperable Whole-Slide Image Analysis

Zheng, Y. et al. Nature Methods (2026). https://doi.org/10.1038/s41592-026-03044-7

The paper in one sentence

The authors introduce LazySlide, an open-source Python package built on the scverse ecosystem that enables efficient whole-slide image analysis, feature extraction, natural language querying, and multimodal integration with transcriptomics data using a familiar, low-code API.

Summary

Whole-slide images (WSIs) are foundational for histopathology and tissue biology, but computational analysis remains challenging due to fragmented data structures, platform-specific constraints, and poor integration with modern omics workflows. Existing tools like QuPath, CLAM, and TIAToolbox offer partial solutions but often lack interoperability with the single-cell and spatial omics frameworks widely used in biomedical research. To address this, the authors developed LazySlide, a general-purpose Python framework built on the scverse ecosystem (which includes AnnData, Scanpy, SpatialData, and Squidpy). LazySlide introduces a custom data structure, WSIData, that inherits from SpatialData but is optimized for proprietary WSI formats (e.g., SVS, NDPI) without requiring costly duplication or conversion to OME-TIFF. The framework supports tissue segmentation, tiling, quality control, feature extraction using vision foundation models (ResNet, ViT, UNI, Virchow, GigaPath), and feature aggregation to slide-level representations. Beyond standard image analysis, LazySlide leverages vision-language foundation models (PLIP, CONCH, PRISM, TITAN) to enable natural language queries of tissue images, zero-shot classification, slide captioning, and text-guided segmentation. It also provides seamless multimodal integration with RNA-seq data through the RNALinker class, allowing users to correlate morphological features with gene expression. The authors demonstrate LazySlide on GTEx artery slides (healthy vs. calcified), showing that text-based calcification scores distinguish groups, and that integrating image and transcriptomic embeddings identifies calcification-related pathways (e.g., IL-18 signaling) missed by RNA-only analysis. Zero-shot organ classification across nine tissue types achieves high accuracy without task-specific training. 
Benchmarks show LazySlide requires fewer lines of code, lower token count, and simpler API than existing tools, with faster tissue segmentation and competitive or better classification performance compared to QuPath.

Personal highlights

scverse-native interoperability: LazySlide is built directly on the scverse ecosystem (SpatialData, AnnData, Scanpy, Squidpy). This means researchers already familiar with single-cell and spatial omics workflows can apply the same data structures and analysis patterns to whole-slide images, enabling seamless integration of histopathology with transcriptomics.
WSIData: efficient, no-duplication access: unlike existing SpatialData extensions that convert WSIs to OME-TIFF (incurring 5–10× disk overhead), WSIData provides direct, lazy access to proprietary WSI formats while maintaining compatibility with scverse tools. This avoids data duplication and reduces storage costs.
Natural language querying of tissue images: using vision-language models (PLIP, CONCH), users can search for histological patterns using plain English prompts (e.g., “calcification”, “lymphocyte”). The framework computes text-to-image similarity maps, enabling zero-shot region-of-interest detection without training data.
Multimodal integration with RNA-seq: the RNALinker class links morphological features from WSIs with paired transcriptomic profiles. By anchoring on image-derived scores (e.g., calcification score), users can identify genes whose expression correlates with specific histological patterns, bridging tissue morphology and molecular mechanisms.
Zero-shot classification and segmentation: LazySlide supports organ classification, disease scoring, and text-guided tissue segmentation without task-specific training. For example, it correctly classified nine human organ types using only organ names as prompts, and can generate segmentation masks by thresholding text-image similarity and refining with SAM2.
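
To make the idea behind text-to-image similarity and zero-shot classification more concrete, here is a minimal sketch of the general technique: each tile and each text prompt is embedded into a shared vector space, and the prompt with the highest cosine similarity wins. This does not use the LazySlide API; the function names and the three-dimensional toy embeddings are made up for illustration, whereas real vision-language models produce much higher-dimensional vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot(tile_embedding, prompt_embeddings):
    """Score a tile against every prompt and return the best label plus all scores."""
    scores = {
        label: cosine(tile_embedding, emb)
        for label, emb in prompt_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy embeddings (hypothetical values, standing in for model outputs).
prompts = {
    "calcification": [0.9, 0.1, 0.0],
    "lymphocyte": [0.0, 0.2, 0.9],
}
tile = [0.8, 0.3, 0.1]

label, scores = zero_shot(tile, prompts)
print(label, {k: round(v, 3) for k, v in scores.items()})
```

Computing the same score for every tile on a slide gives a similarity map, which could then be thresholded to propose regions of interest.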

Why should we care?

Computational pathology has long operated in a silo, separated from the rich single-cell and spatial omics ecosystems that dominate modern biomedical research. LazySlide breaks down this barrier by providing a unified, scverse-compatible framework for whole-slide image analysis. This means that researchers can now process WSIs, extract morphological features, and integrate them with transcriptomics data using the same tools and workflows they already use for single-cell analysis.

Scalable Single-Cell Total RNA Sequencing Unifies Coding and Noncoding Transcriptomics

Isakova, A. et al. Nature Biotechnology (2026). https://doi.org/10.1038/s41587-026-03068-6

The paper in one sentence

The authors developed TotalX, a modified droplet-based single-cell RNA sequencing method that captures both polyadenylated and non-polyadenylated transcripts, including miRNAs, tRNAs, lncRNAs, and histone RNAs, by adding enzymatic polyadenylation and custom template-switching to the standard 10x Genomics workflow.

Summary

Conventional single-cell RNA-seq methods rely on poly(A) capture, which systematically excludes noncoding RNAs such as microRNAs (miRNAs), transfer RNAs (tRNAs), long noncoding RNAs (lncRNAs), and histone mRNAs. This creates a blind spot in most single-cell atlases, despite the critical regulatory roles of these transcripts. Specialized total RNA protocols exist but typically require custom equipment, extensive sample processing, or bespoke pipelines, limiting their scalability. TotalX addresses this by adapting the widely adopted 10x Genomics Chromium 3′ platform with a minimal set of modifications. The key steps include: (1) enzymatic polyadenylation of total RNA in the reverse transcription mix, (2) use of a custom template-switching oligo (dUTSO) that is later digested with uracil-DNA glycosylase, (3) Cas9-mediated ribosomal RNA depletion (DASH) at the pre-amplified cDNA stage, and (4) optional size selection for miRNA-enriched fragments (18–50 bp). All modifications are compatible with standard microfluidics hardware and software, and the resulting libraries can be processed with a modified Cell Ranger pipeline. Benchmarking against VASA-seq (a high-performance but equipment-intensive method) showed comparable gene detection per UMI and per cell. TotalX detected a broad diversity of noncoding RNAs, including miRNAs, tRNAs, snoRNAs, snRNAs, and histone RNAs, at scale (over 11,000 cells per run). The optional miRNA enrichment step improved detection of mature miRNAs (e.g., MIR17, MIR221, MIR222) but reduced the proportion of mappable reads by ∼30 percentage points (from ∼75% to ∼45%).

Personal highlights

Droplet-platform compatible without specialized hardware: TotalX builds directly on the 10x Genomics Chromium 3′ workflow, the most widely used single-cell platform. This means labs already equipped with standard microfluidics can adopt total RNA profiling without purchasing custom instruments or learning entirely new protocols.
Captures a broad spectrum of noncoding biotypes: beyond miRNAs, TotalX detects tRNAs, lncRNAs, snoRNAs, snRNAs, scaRNAs, and histone mRNAs—many of which are completely invisible to poly(A)-based methods. This provides a far more complete molecular phenotype of individual cells.
Optional miRNA enrichment with trade-offs: the miRNA(+) modification adds a gel-purification step for small fragments (18–50 bp), which dramatically improves detection of mature miRNAs (e.g., MIR17, MIR221) but reduces the proportion of reads mapping to the genome (from ∼75% to ∼45%). This trade-off can be tuned depending on experimental priorities.
Reveals tRNA-codon usage coordination across cell types: TotalX captures tRNA expression with sufficient quantitative accuracy to show a strong correlation (Pearson r = 0.79) between tRNA supply and amino acid demand based on codon usage of expressed genes across PBMC cell types, a finding that connects noncoding RNA profiling to translational biology.
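
The tRNA supply versus amino acid demand comparison can be sketched roughly as follows. This is a toy illustration under my own assumptions, not the authors' pipeline: demand for each amino acid is taken as the expression-weighted sum of its codon counts across genes, and it is then correlated with tRNA abundance. All gene names, counts, and abundances are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def amino_acid_demand(expression, codon_counts):
    """Sum codon counts per amino acid, weighted by each gene's expression."""
    demand = {}
    for gene, level in expression.items():
        for amino_acid, count in codon_counts[gene].items():
            demand[amino_acid] = demand.get(amino_acid, 0.0) + level * count
    return demand


# Invented toy inputs for one cell type.
expression = {"geneA": 10.0, "geneB": 2.0}
codon_counts = {
    "geneA": {"Ala": 5, "Leu": 8, "Trp": 1},
    "geneB": {"Ala": 2, "Leu": 3, "Trp": 4},
}
trna_supply = {"Ala": 60.0, "Leu": 90.0, "Trp": 15.0}

demand = amino_acid_demand(expression, codon_counts)
amino_acids = sorted(demand)
r = pearson([demand[a] for a in amino_acids], [trna_supply[a] for a in amino_acids])
print(demand, round(r, 3))
```

In the real analysis this comparison would be repeated per cell type, which is where single-cell tRNA quantification becomes necessary.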

Why should we care?

Most single-cell atlases to date have been built on poly(A) capture, meaning they systematically exclude a large and functionally important fraction of the transcriptome. This blind spot has likely obscured regulatory mechanisms in development, immunity, and disease. TotalX removes that blind spot while preserving the scale, accessibility, and interoperability of the dominant droplet-based platform.

CREsted: Modeling Genomic and Synthetic Cell-Type-Specific Enhancers Across Tissues and Species

Kempynck, N. et al. Nature Methods (2026). https://doi.org/10.1038/s41592-026-03057-2

The paper in one sentence

The authors present CREsted, a Python package that streamlines training, interpreting, and designing cell-type-specific enhancer models from single-cell ATAC-seq data, and use it to compare mesenchymal-like cancer states across melanoma and glioblastoma, revealing shared and distinct regulatory codes.

Summary

Understanding how enhancers drive cell-type-specific gene expression remains a major challenge. CREsted is a software package that takes scATAC-seq data, trains sequence-based deep learning models to predict chromatin accessibility across cell types, interprets the learned enhancer codes (identifying transcription factor binding sites), and designs synthetic enhancers with desired cell-type specificity. The authors demonstrate CREsted on a mouse motor cortex dataset, achieving high predictive accuracy (Pearson r = 0.82) and recovering known cell-type-specific transcription factor motifs. They then train a human PBMC model (DeepPBMC), validating predicted TFBSs against ChIP-seq data and correctly explaining the dense IFNB1 enhanceosome. Crucially, they apply CREsted to compare mesenchymal-like (MES) cancer cell states across melanoma and glioblastoma (GBM) using cell lines. The model (DeepCCL) groups MES-like states across cancer types, revealing shared enhancer codes (AP-1, TEAD, RUNX, NFI, ATF/CREB) but also differences—TEAD motifs are specific to cell lines, whereas SOX and RFX motifs appear only in patient biopsy samples. Using a topic model on glioma biopsies (DeepGlioma), they show that the MES-like enhancer program in cell lines is only partially recapitulated in tumors, highlighting limitations of cell line models. Finally, they train a zebrafish developmental model (DeepZebrafish) and successfully design synthetic enhancers specific to cardiac muscle, skeletal muscle, and endothelial cells, validated in vivo.

Personal highlights

End-to-end enhancer modeling from scATAC-seq: CREsted provides a unified workflow from data preprocessing (peak normalization, topic modeling) to model training (regression or classification, with fine-tuning on cell-type-specific peaks) to interpretation (gradient-based contribution scores, TF-MoDISco motif discovery) and synthetic design (in silico evolution or motif embedding). This fills a gap left by more generic frameworks like gReLU.
Outperforms large pretrained models on cell-type-specific tasks: on mouse cortex, a CREsted model trained from scratch (DeepBICCN2) matched or exceeded fine-tuned Borzoi and HyenaDNA/Nucleotide Transformer models, while being far more parameter-efficient. Base Borzoi struggled to distinguish fine neuronal subtypes, highlighting the value of task-specific training.
Cross-cancer comparison of mesenchymal-like states: CREsted revealed that MES-like enhancer programs in melanoma and GBM cell lines share core regulators (AP-1, TEAD, RUNX, NFI, ATF/CREB). However, comparison with patient glioma biopsies showed that TEAD motifs are overrepresented in cell lines but not in tumors, whereas SOX and RFX motifs are specific to biopsies. This suggests that cell line models capture only part of the in vivo enhancer logic.
Interpretable enhancer code discovery: in PBMCs, CREsted correctly identified all validated TFBSs in the CD79A B-cell enhancer, the TCRα enhancer, and even the densely packed IFNB1 enhanceosome (recovering 3 of 4 IRF sites, ATF-2/c-Jun, and NF-κB). Compared to classical motif enrichment tools (pycisTarget, pyChromVAR), CREsted achieved higher precision for identifying functional binding instances.
In vivo validated synthetic enhancer design in zebrafish: using a developmental scATAC-seq atlas (639 cell type–timepoint combinations), CREsted designed enhancers specific to cardiac muscle, skeletal muscle, and endothelial cells. All three cardiac and three skeletal muscle enhancers showed correct cell-type specificity in vivo; two of three endothelial enhancers were also specific. Dual-specificity enhancers (active in both cardiac and skeletal muscle) were more challenging, with lower success rates, but still achievable.
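
The in silico evolution strategy mentioned above can be sketched as a greedy search over single-nucleotide substitutions. This is a minimal sketch, not the CREsted implementation: in practice the score would come from a trained sequence model, whereas here `toy_score` is a hypothetical stand-in that rewards an AP-1-like motif and penalizes an off-target motif.

```python
def toy_score(seq, target_motif="TGACTCA", off_target_motif="GATA"):
    """Hypothetical stand-in for a model prediction: reward target motifs, penalize off-target ones."""
    return seq.count(target_motif) - seq.count(off_target_motif)


def evolve(seq, score_fn, rounds=10):
    """Greedy in silico evolution: each round applies the best single-base substitution, if any improves the score."""
    seq = list(seq)
    best = score_fn("".join(seq))
    for _ in range(rounds):
        best_move = None
        for i in range(len(seq)):
            original = seq[i]
            for base in "ACGT":
                if base == original:
                    continue
                seq[i] = base
                score = score_fn("".join(seq))
                if score > best:
                    best, best_move = score, (i, base)
            seq[i] = original  # restore before testing the next position
        if best_move is None:
            break  # no single substitution improves the score
        position, base = best_move
        seq[position] = base
    return "".join(seq), best


start = "AAGATAAATGACTCTAAA"  # invented starting sequence
designed, final_score = evolve(start, toy_score)
print(toy_score(start), "->", final_score, designed)
```

With a real model, the same loop would typically start from random or genomic sequences and optimize for high predicted accessibility in one cell type and low accessibility in the others.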

Why should we care?

Enhancers are the primary drivers of cell identity, yet decoding their sequence grammar has been notoriously difficult due to the combinatorial and degenerate nature of transcription factor binding sites. CREsted makes state-of-the-art enhancer modeling accessible to experimental labs, providing a scverse-compatible Python package that handles large scATAC-seq atlases.

CHARM: Single-Cell Four-Omics Sequencing Reveals the Layered Regulatory Genome

Chen, Y. et al. Nature (2026). https://doi.org/10.1038/s41586-026-10322-z

The paper in one sentence

The authors developed CHARM, a single-cell method that simultaneously profiles genome conformation, histone modifications (H3K27me3 or H3K27ac), chromatin accessibility, and gene expression in the same cell, enabling integrated analysis of how these regulatory layers coordinate to define cell identity.

Summary

Gene expression is regulated by multiple layers of epigenomic information—chromatin accessibility, histone modifications, and 3D genome architecture—but existing single-cell methods capture at most two or three of these modalities in the same cell, requiring computational integration that can introduce batch effects. The authors developed CHARM (chromatin conformation, histone modification, accessibility, and RNA expression multi-omics), a plate-based method building on their previous HiRES technology. CHARM adds Tn5-based tagging of accessible chromatin and specific histone modifications (using CUT&Tag) to the same nucleus before Hi-C proximity ligation and RNA capture. All materials are co-amplified in a single tube, with sequencing reads separated in silico using unique identifiers.

Applying CHARM to mouse embryonic stem cells (805 cells passing QC) and mouse cortex (4,265 cells with H3K27ac), they achieved high data quality comparable to or better than existing single-modality methods. In mESCs, they reconstructed single-cell 3D genome structures at 5-kb resolution, resolved distinct cell-cycle dynamics for accessibility (which follows replication timing) versus H3K27me3 (which is re-established independently, aided by 3D spatial exposure), and identified discrete 3D clusters of accessible chromatin that resemble super-enhancer hubs and are enriched for cell-type-specific genes. In mouse cortex, integrating all four modalities improved gene expression prediction over single modalities, with H3K27ac contributing the most. Using Shapley analysis, they identified over 7,000 cell-type-specific enhancer–promoter linkages, including a novel distal enhancer for Gad2 in Ndnf/Lamp5 inhibitory neurons and a Satb2 enhancer linked to human intelligence-associated SNPs.

Personal highlights

Four modalities, one cell, one tube: CHARM uniquely combines Hi-C (3D genome), CUT&Tag (histone modifications), ATAC-like accessibility, and RNA-seq in a single workflow without physical separation of molecular components. This preserves native regulatory relationships and avoids cross-experiment batch effects that plague computational integration.
Resolving epigenetic dynamics across the cell cycle: using single-cell CHARM data, the authors ordered mESCs along a pseudo-cell-cycle trajectory. Chromatin accessibility recovery closely followed DNA replication timing (early-replicating domains recover earlier). By contrast, H3K27me3 restoration was less dependent on replication timing but was predicted by 3D spatial proximity to H3K27me3-rich regions, supporting a model where 3D genome architecture contributes to epigenetic memory.
3D clusters of accessible chromatin as regulatory hubs: reconstructed single-cell 3D genome structures revealed that accessible chromatin forms discrete spatial clusters (median 380 per cell, radius ~110 nm), consistent with super-resolution imaging. These clusters were enriched for super-enhancers, Mediator, and BRD4, and genes inside clusters showed higher expression and more coordinated co-expression than those outside, supporting the enhancer-promoter hub model.
Integrative gene expression prediction outperforms single modalities: a non-negative regression model using accessibility, H3K27ac, and 3D interactions together achieved the highest prediction accuracy. H3K27ac alone outperformed accessibility or 3D contacts alone, but combining all three modalities yielded the best performance, highlighting the complementary information in each layer.
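
For readers unfamiliar with non-negative regression, here is a minimal sketch of the general technique using coordinate descent, where each weight is updated in turn and clamped at zero. It is not the authors' model: the three feature columns loosely stand in for accessibility, H3K27ac, and 3D contact signals, and all numbers are invented.

```python
def nnls_coordinate_descent(X, y, n_iter=200):
    """Least squares with weights constrained to be >= 0, via cyclic coordinate descent."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    for _ in range(n_iter):
        for j in range(n_features):
            numerator, denominator = 0.0, 0.0
            for i in range(n_samples):
                # Prediction from all features except j.
                others = sum(X[i][k] * weights[k] for k in range(n_features) if k != j)
                numerator += X[i][j] * (y[i] - others)
                denominator += X[i][j] ** 2
            # Unconstrained optimum for weight j, clamped to be non-negative.
            weights[j] = max(0.0, numerator / denominator) if denominator > 0 else 0.0
    return weights


# Toy design matrix: rows are genes, columns are three regulatory signals (invented values).
X = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
]
true_weights = [1.0, 2.0, 0.5]
y = [sum(x_ij * w for x_ij, w in zip(row, true_weights)) for row in X]

fitted = nnls_coordinate_descent(X, y)
print([round(w, 4) for w in fitted])
```

The non-negativity constraint keeps the fitted weights interpretable as contributions, since no regulatory layer can receive a negative coefficient to offset another.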

Why should we care?

Understanding how cells regulate gene expression requires integrating multiple layers of epigenomic information, but until now, no single-cell method could measure four key modalities simultaneously in the same cell. CHARM fills this gap, providing a direct view of how 3D genome architecture, histone modifications, accessibility, and transcription converge within individual cells.

Cell Type Composition Drives Patient Stratification in Single-Cell RNA-seq Cohorts

Halter, C. et al. bioRxiv (2026). https://doi.org/10.64898/2026.03.27.714811

The paper in one sentence

A systematic benchmarking of 11 single-cell RNA-seq cohorts shows that simple centered log-ratio transformed cell-type proportions consistently outperform state-of-the-art sample representation methods for unsupervised patient stratification, while being orders of magnitude faster and directly interpretable.

Summary

Large single-cell RNA-seq cohorts offer the opportunity to discover clinically meaningful patient subgroups, but the computational challenge of transforming single-cell data into sample-level representations has led to the development of increasingly complex methods (MOFA+, scPoli, MrVI, GloScope, etc.). This study systematically benchmarked seven state-of-the-art sample representation methods alongside simple baselines across 11 cohorts (697 samples) covering diverse biological conditions. Surprisingly, the simplest approach, centered log-ratio (CLR)-transformed cell-type proportions, termed ECODA (Exploratory COmpositional Data Analysis), achieved the highest performance across all datasets and evaluation metrics. It was also the most interpretable, directly revealing which cell types drove patient separation (e.g., specific T-cell subsets in cytomegalovirus infection, alveolar type II cells in pulmonary fibrosis). ECODA was orders of magnitude faster (seconds vs. hours) and showed remarkable robustness to batch effects and to the choice of cell-type annotation strategy (expert, unsupervised clustering, or automated tools).

Personal highlights

Simple baseline outperforms complex methods: across 11 datasets, CLR-transformed cell-type proportions (ECODA) achieved the highest average separation scores, surpassing deep learning models (MrVI, scPoli), optimal transport methods (PILOT), and factor decomposition approaches (MOFA+, scITD). The improvement was consistent across three complementary metrics (ANOSIM, Modularity, ARI).
Proper compositional handling matters: raw cell counts performed poorly, and even simple frequency-based representations were significantly outperformed by log-ratio transformations. Centered log-ratio (CLR) proved both effective and practical, avoiding the need to select a reference cell type as in additive log-ratio.
Interpretability as a key advantage: ECODA does not produce black-box embeddings; the loadings in principal component analysis directly indicate which cell types drive patient separation. In the Adams pulmonary fibrosis dataset, just two cell types—alveolar type II (ATII) and peribronchial vascular endothelial cells—separated healthy, IPF, and COPD samples. In the Kfoury prostate cancer metastasis dataset, immature B cells and tumor inflammatory monocytes separated benign, distal, involved, and tumor sites.
Robust to batch effects and annotation strategy: ECODA embeddings showed stronger biological signal and weaker technical batch clustering compared to pseudobulk gene expression. Performance was consistent across expert-annotated, unsupervised Leiden clustering, and automated annotations (HiTME, scATOMIC), provided sufficient granularity was used.
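
The centered log-ratio transform at the heart of this approach is short enough to sketch directly. This is a minimal illustration, not the ECODA code: the pseudocount used to handle zero counts is my own assumption, and the cell-type counts are invented.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's cell-type counts.

    The pseudocount (an assumption here) avoids log(0) for absent cell types.
    """
    adjusted = [c + pseudocount for c in counts]
    total = sum(adjusted)
    log_props = [math.log(a / total) for a in adjusted]
    mean_log = sum(log_props) / len(log_props)
    # Subtracting the mean log-proportion centers the values around zero.
    return [lp - mean_log for lp in log_props]


# Invented counts for four cell types in one sample.
sample_counts = [120, 30, 0, 50]
transformed = clr(sample_counts)
print([round(v, 3) for v in transformed])
```

Stacking one such vector per sample gives a samples-by-cell-types matrix that can go straight into PCA, where the loadings indicate which cell types drive the separation.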

Why should we care?

This study delivers a humbling but important message: for the increasingly common task of patient stratification from scRNA-seq cohorts, the most sophisticated deep learning methods do not necessarily outperform a simple, interpretable baseline. The finding that inter-sample biological variation is often dominated by shifts in cell-type abundance, rather than subtle transcriptional changes within cell types, has practical implications. It suggests that researchers can quickly gain meaningful insights by first analyzing cell-type composition, without needing to run computationally intensive models.

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 23/3/26

Sebastiaan Vanuytven — Sun, 29 Mar 2026 17:51:19 GMT

This week’s reads highlight how biology is increasingly being understood and manipulated across scales, from cellular memory and ageing to atlas-level integration and new AI-driven representations. On the computational front, Harmony2 is a tool for integrating 100 million cells while explicitly addressing overintegration, and Tripso is a method for rethinking cell identity as a composition of interpretable gene programs, not just embeddings. On the atlas front, a massive cross-species resource for pancreatic cancer in mice and humans reveals how therapy reshapes the state of tumors and which mouse models actually model late-stage disease. Other papers reveal new layers of biological memory and context, such as how aging reprograms tumors to favor metastasis through the integrated stress response, how chronic inflammation leaves epigenetic scars that drive the potential for tumors to form, and how TimeVault lets cells literally store their memory for later. Finally, the tumor microenvironment is revealed to be a highly organized system of spatial and cellular interactions, where macrophage niches and chemokines determine whether immunity helps or hinders tumor growth.

Preprints/articles that I managed to read this week

Harmony2: Scaling Single-Cell Data Integration to 100 Million Cells Without Sacrificing Biology

Patikas, N. et al. bioRxiv (2026). https://doi.org/10.64898/2026.03.16.711825

The paper in one sentence

The authors present Harmony2, a redesigned version of the widely used integration algorithm that scales to over 100 million cells and 1,000 batches while introducing algorithmic improvements that prevent overintegration, the erroneous merging of biologically distinct cell types.

Summary

As single-cell RNA-seq atlases grow to encompass hundreds of millions of cells from thousands of donors and studies, computational integration methods face two major challenges: scaling efficiently without specialized hardware, and balancing the removal of technical batch effects with the preservation of genuine biological differences. Overintegration, where methods incorrectly merge distinct cell types, has become a particular concern in heterogeneous datasets where cell populations do not overlap across batches. The authors introduce Harmony2, a major update to the widely used Harmony algorithm. They implemented several key optimizations: a sparse matrix backend that avoids redundant computation, closed-form inversion for arrowhead-structured regression problems (which scales linearly rather than cubically with batch number), and automatic batch pruning that excludes batches with insufficient representation in each cluster. More importantly, they introduced a dynamic lambda estimation that adapts the regression penalty based on cluster-batch composition, preventing overintegration of rare or non-overlapping populations. Using the Tahoe-100M dataset (~100 million cells across 1,135 batches), they show that Harmony2 integrates 1 million cells from 800 batches in under 1 minute on a CPU, a 203-fold speedup over the original version, with linear scaling in both cells and batches. In a controlled stress test using an inflamed joint atlas where two sample groups had no overlapping cell types, Harmony2 achieved batch mixing comparable to more aggressive methods but preserved lineage structure almost perfectly, whereas other methods either failed to integrate or merged distinct cell types like T and B cells. 
Finally, they applied Harmony2 to the Human Lung Cell Atlas (2.2 million cells) and identified rare epithelial cell populations (ionocytes, tuft cells, neuroendocrine cells) without requiring supervised approaches, including a previously unannotated neuroendocrine-like cell population enriched in tumor samples.

Personal highlights

Linear scaling with both cells and batches: Harmony2 integrates 1 million cells from 800 batches in under 1 minute with just 2.1 GB memory, a 203-fold speedup and 12.5-fold memory reduction compared to Harmony1. Critically, runtime and memory scale linearly with both the number of cells and the number of batches, whereas the original version scaled quadratically.
Arrowhead optimization for single-covariate integration: for the common case of integrating over a single batch variable (e.g., sample or dataset), Harmony2 uses a closed-form inversion for arrowhead-structured matrices, reducing computational complexity from O(B³) to O(B). This makes large-scale integration with many batches computationally tractable.
Dynamic lambda prevents overintegration: the new dynamic lambda estimation adapts the ridge regression penalty based on the expected number of cells from each batch in a cluster. This prevents the algorithm from shrinking coefficients for rare but genuine cell populations, avoiding the overintegration seen in other methods.
Controlled stress test for unbiased evaluation: the authors designed a rigorous benchmark by splitting a large joint atlas into two groups with non-overlapping cell types. This allowed them to measure overintegration directly. Methods like Seurat-RPCA and LIGER-QN achieved high batch mixing but collapsed distinct lineages, while Harmony2 matched their mixing without sacrificing purity.
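
The arrowhead optimization mentioned in the highlights is easy to illustrate. Below is a minimal sketch, assuming the regression matrix has one dense first row and column (an intercept term) plus a diagonal block with one entry per batch. The function names and toy numbers are hypothetical, and this is a generic textbook arrowhead solve, not Harmony2's actual code.

```python
def solve_arrowhead(a, b, d, y0, y):
    """Solve [[a, b^T], [b, diag(d)]] @ [x0, x] = [y0, y] in O(B) time.

    a  : scalar top-left entry
    b  : list of B border entries (first row/column)
    d  : list of B diagonal entries (must be non-zero)
    y0 : first entry of the right-hand side
    y  : remaining B entries of the right-hand side
    """
    # Schur complement of the diagonal block; a single scalar.
    s = a - sum(bi * bi / di for bi, di in zip(b, d))
    # Solve for the dense (intercept) coordinate first.
    x0 = (y0 - sum(bi * yi / di for bi, yi, di in zip(b, y, d))) / s
    # Back-substitute: each batch coordinate is an independent scalar solve.
    x = [(yi - bi * x0) / di for bi, yi, di in zip(b, y, d)]
    return x0, x


def arrowhead_matvec(a, b, d, x0, x):
    """Multiply the arrowhead matrix by [x0, x] without forming it densely."""
    top = a * x0 + sum(bi * xi for bi, xi in zip(b, x))
    rest = [bi * x0 + di * xi for bi, di, xi in zip(b, d, x)]
    return top, rest


# Toy example (all numbers are made up for illustration).
n = [30.0, 50.0, 20.0]        # hypothetical cells per batch in one cluster
lam = 1.0                     # hypothetical ridge penalty
a = sum(n)                    # unpenalized intercept entry
d = [ni + lam for ni in n]    # penalized diagonal entries
y0, y = 10.0, [2.0, 5.0, 3.0]

x0, x = solve_arrowhead(a, n, d, y0, y)
top, rest = arrowhead_matvec(a, n, d, x0, x)  # should reproduce y0 and y
```

Every step is a single pass over the B batches, which is where the O(B) cost comes from; a general dense solver would need O(B³) operations for the same system.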

Why should we care?

Single-cell atlases are growing faster than our ability to analyze them, and the public domain now contains over 100 million profiles. Harmony2 makes it practical for any researcher with standard computing resources to integrate these massive datasets, reducing the need for specialized hardware or cloud computing. More importantly, it addresses a subtle but critical failure mode in integration: the accidental merging of distinct cell types. This is especially important for studying rare populations or disease-specific states that may be present in only a subset of samples.

Cross-Species Single-Cell Atlases Chart Progression, Therapy-Driven Remodelling, and Immune Evasion in Pancreatic Cancer

Lucarelli, D. et al. bioRxiv (2026). https://doi.org/10.64898/2026.03.19.712924

The paper in one sentence

The authors generated integrated single-cell atlases of human and mouse pancreatic cancer comprising over 1.6 million cells, revealing ten malignant states, rare immune populations, radiotherapy-driven remodelling, and a framework to benchmark which mouse models best recapitulate advanced human disease.

Summary

Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest cancers, yet most single-cell datasets come from early-stage, treatment-naïve resectable tumours, leaving advanced and treated disease, the clinical reality for most patients, severely underrepresented. To address this, the authors constructed comprehensive integrated atlases of human and mouse PDAC, harmonizing 16 human studies (257 donors, >1 million cells) and a diverse set of mouse models (101 tumours, >600,000 cells) using scANVI and a tailored feature selection strategy. The human atlas resolves over 60 distinct cell states across malignant, stromal, immune, and endothelial compartments. Within the malignant compartment, they identified ten transcriptionally distinct programs extending beyond the classical-basal dichotomy, including epithelial, EMT, mesenchymal, hypoxic, and highly invasive states. They also uncovered rare immune populations consistently present across datasets, including CD4⁺CD8⁺ double-positive T cells and CD3⁺ macrophages, which were spatially localized to tertiary lymphoid-like structures. Leveraging clinical metadata, they found that radiotherapy exposure is associated with a distinct “EMT-persistent” malignant state and a remodelled microenvironment characterized by expanded tumour-associated endothelium, depletion of intratumoral T cells, and heightened laminin-CD44 signalling, features linked to poor prognosis in independent cohorts. To bridge preclinical and human biology, they built a matched mouse atlas using expressed barcodes to unambiguously distinguish malignant from stromal cells. Cross-species comparison revealed that orthotopic syngeneic allografts more faithfully recapitulate the cellular diversity and EMT-enriched states of advanced human PDAC, whereas autochthonous genetically engineered models (e.g., KPC) predominantly model earlier, epithelial-dominant disease. Core malignant programs, however, are highly conserved across species.

Personal highlights

Unified human PDAC atlas across 16 studies: by integrating 11 core datasets with 5 additional studies using scANVI and a biologically curated feature set, the authors created a stable reference of >1 million cells spanning early to advanced and treated disease. A hierarchical annotation framework (4 levels, >60 states) provides a common language for the field.
Rare immune populations consistently detected: the atlas revealed CD4⁺CD8⁺ double-positive T cells and CD3⁺ macrophages across multiple datasets and donors. Spatial transcriptomics placed DP T cells preferentially in tertiary lymphoid-like structures, suggesting they may represent a quiescent, naive-like reserve compartment rather than an activated cytotoxic population.
Radiotherapy drives a persistent EMT state and immune-excluded niche: RT-exposed tumours were enriched for a distinct “EMT-persistent” malignant program that branches from the canonical epithelial-mesenchymal continuum. This was accompanied by expansion of tumour-associated endothelium, depletion of multiple T-cell subsets, and enhanced laminin-CD44 signalling, a known axis limiting T-cell infiltration. RT-associated genes like MYO1E and CDK14 correlated with worse survival in TCGA.
Cross-species benchmarking of mouse models: using the unified annotation framework, the authors directly compared autochthonous GEMMs (e.g., KPC) with orthotopic syngeneic allografts. Orthotopic models showed greater compositional and transcriptional similarity to advanced human PDAC (enriched for EMT and mesenchymal states), while GEMMs aligned more with earlier, epithelial-dominant disease. Core malignant programs, however, were highly conserved across species.

Why should we care?

Pancreatic cancer research has been hindered by fragmented datasets and uncertainty about which preclinical models best reflect the human disease patients actually face, particularly at advanced stages and after treatment. This work provides a unified, extensible reference that not only captures the full complexity of human PDAC but also offers a quantitative framework for benchmarking model fidelity. The findings on radiotherapy-induced remodelling suggest that standard-of-care treatment may inadvertently select for more aggressive, immune-excluded malignant states, pointing toward rational combination strategies (e.g., targeting laminin-CD44 or ADAM10) that could improve outcomes. For the broader biomedical community, this atlas serves as a template for how to build cross-species, treatment-aware resources that accelerate translation from model systems to the clinic.

Chemokine-Defined Macrophage Niches Establish Spatial Organization of Tumor Immunity

Ghosh, S. et al. Nature Immunology (2026). https://doi.org/10.1038/s41590-026-02445-2

The paper in one sentence

This study reveals that in lung cancer, spatially segregated macrophage subsets have opposing functions: CD206⁺ interstitial macrophages positioned along airways organize lymphocyte recruitment and tertiary lymphoid structures to restrain tumors, while CCL2-producing interstitial macrophages within tumor regions attract pro-tumorigenic recruited macrophages that suppress immunity.

Summary

Macrophages are abundant in tumors, but distinguishing their diverse roles has been hampered by overlapping surface markers. The authors used single-cell and spatial transcriptomics to map macrophage heterogeneity in mouse models of lung cancer (melanoma and adenocarcinoma). They identified three major populations: tissue-resident CD206⁺ interstitial macrophages (IMs), CD206⁻ IMs, and recruited macrophages (recMacs) derived from circulating monocytes. Spatial mapping revealed a striking division of labor. CD206⁺ IMs, expressing CXCL13, CXCL9, and CXCL10, localized along bronchovascular regions and were essential for lymphocyte recruitment and tertiary lymphoid structure (TLS) formation. Selective depletion of these cells using Pf4creCx3cr1DTR mice markedly increased tumor burden and eliminated TLS across three independent lung cancer models. In contrast, CD206⁻ IMs and recMacs populated tumor cores and expressed pro-tumorigenic genes including CCL2, Spp1, and Arg1. Using bone marrow chimeras with busulfan conditioning to preserve tissue-resident IMs, they showed that IM-derived CCL2, not endothelial or recruited cell-derived CCL2, was the dominant driver of recMac recruitment and tumor growth. Finally, they investigated the fate of recMacs in draining lymph nodes, where they differentiate into monocyte-derived dendritic cells (moDCs) that rely on CCR5 for migration. Transient CCR5 blockade with the FDA-approved drug maraviroc during neoantigen vaccination selectively reduced antigen-bearing moDC trafficking to lymph nodes and significantly enhanced vaccine efficacy, reducing metastatic burden.

Personal highlights

Spatial segregation defines macrophage function: using Xenium spatial transcriptomics at subcellular resolution, the authors mapped distinct macrophage subsets to specific anatomical niches. CD206⁺ IMs lined bronchial airways and expressed CXCL13, while recMacs and CD206⁻ IMs localized to tumor cores and expressed CCL2. This spatial organization directly correlated with opposing functions in tumor immunity.
CD206⁺ IMs are non-redundant organizers of TLS and tumor control: selective depletion of CD206⁺ IMs using Pf4creCx3cr1DTR mice, which targets chemokine-expressing IMs while sparing other macrophages, led to 3.7- to 7.2-fold increases in tumor burden across melanoma, lung adenocarcinoma transfer, and spontaneous GEMM models. This was accompanied by complete loss of TLS and marked reductions in CXCL9, CXCL10, and CXCL13 protein levels.
IM-derived CCL2, not endothelial CCL2, drives pro-tumorigenic macrophage recruitment: using busulfan conditioning to selectively preserve long-lived, tissue-resident IMs while depleting short-lived circulating myeloid cells, the authors elegantly dissected CCL2 sources. Only when CCL2 was absent from IMs, but present in recMacs and endothelium, did tumor burden drop. This refines the prevailing model that endothelial or cancer cell CCL2 is the dominant driver.
CCR5-dependent moDCs suppress vaccine-induced antitumor immunity: Ly6C⁺ recMacs differentiate into moDCs that migrate to draining lymph nodes via CCR5, distinguishing them from conventional DCs (which use CCR7). Transient CCR5 blockade with maraviroc specifically reduced antigen-bearing moDC trafficking and enhanced neoantigen vaccine efficacy by 3.1-fold compared to vaccination alone.
Translational potential of transient CCR5 blockade: unlike previous studies using prolonged systemic CCR5 blockade that broadly affects multiple immune cell types, this work shows that short-term maraviroc administration during the priming phase of vaccination selectively disrupts immunosuppressive moDC migration without impairing DC-mediated cross-priming. This offers a clinically feasible strategy to enhance cancer vaccine efficacy.

Why should we care?

This work fundamentally reframes how we think about macrophages in cancer. Rather than relying on oversimplified M1/M2 classifications or surface markers like CD206 that are shared across functionally distinct subsets, the authors show that macrophage function is determined by anatomical location and the specific chemokines they produce. The identification of spatially segregated, opposing macrophage niches, one pro-immunity, one pro-tumor, provides a roadmap for more precise therapeutic targeting. Critically, the finding that transient CCR5 blockade with an already-approved drug (maraviroc) can enhance neoantigen vaccination offers a clinically actionable strategy to tip the balance away from immunosuppressive moDCs and toward protective DC-mediated immunity.

Ageing Promotes Metastasis via Activation of the Integrated Stress Response

Patel, A. A. H. et al. Nature (2026). https://doi.org/10.1038/s41586-026-10216-0

The paper in one sentence

Using aged mouse models of KRAS-driven lung cancer, the authors show that physiological ageing reprograms tumour cells to limit primary tumour growth while promoting metastatic dissemination through epigenetic activation of the integrated stress response (ISR) and its effector ATF4, creating a glutamine-dependent vulnerability that can be therapeutically targeted.

Summary

Lung cancer predominantly affects older individuals, yet most preclinical studies use young mice, creating a disconnect between experimental models and the patients they aim to treat. The authors addressed this by inducing lung tumours in young (2-3 months) and aged (18-19 months) KRAS-driven KP mice, which approximate the median age of human NSCLC diagnosis. Aged mice developed 2.5-fold smaller primary lung tumours yet showed markedly higher incidence of lymph node and distant organ metastases compared to young mice. Primary tumour cultures from aged mice (KP-O) exhibited increased anoikis resistance, enhanced invasive capacity in 3D cultures, and dramatically higher metastatic potential in orthotopic, intravenous, and subcutaneous transplantation models. Mechanistically, aged tumour cells showed sustained activation of the PERK-eIF2α arm of the unfolded protein response, driving elevated and persistent expression of the integrated stress response (ISR) effector ATF4. ATAC-seq revealed increased chromatin accessibility at the Atf4 locus and reduced accessibility at UPR-resolving genes in aged cells, explaining the sustained stress response. ATF4 was both necessary and sufficient for the aged metastatic phenotype: ATF4 depletion abolished the enhanced metastatic capacity of KP-O cells, while ATF4 overexpression in young cells conferred metastatic competence. Metabolically, ATF4 drove a shift from glucose anaplerosis toward glutamine-dependent metabolism, creating sensitivity to glutaminase (GLS) inhibition. CB-839 (telaglenastat) treatment abrogated metastasis from KP-O tumours in vivo without affecting primary tumour growth, revealing a therapeutic window. Clinical analyses confirmed that ATF4 is enriched in tumours from older patients with KRAS-mutant lung adenocarcinoma and correlates with poor survival in advanced-stage disease.

Personal highlights

Ageing uncouples primary tumour growth from metastasis: aged mice developed smaller primary lung tumours but showed dramatically increased metastatic burden across multiple models. This paradox, smaller primary tumours but more aggressive dissemination, mirrors clinical observations in older patients and reveals that ageing actively reprograms tumour behaviour rather than simply accelerating growth.
ATF4 as the causal driver of ageing-induced metastasis: the ISR effector ATF4 emerged as a central node. Genetic depletion of ATF4 abolished the metastatic capacity of aged tumour cells, while ATF4 overexpression alone was sufficient to confer metastatic competence to young cells. This establishes ATF4 as a key molecular mediator of age-associated metastatic progression.
Epigenetic priming sustains ISR activation: aged tumour cells showed increased chromatin accessibility at the Atf4 locus and reduced accessibility at UPR-resolving genes (Hspa5, Wfs1, Gadd34). This explains why aged cells mount a stronger, more persistent ISR under stress, and it provides a mechanistic link between ageing-associated epigenetic changes and functional phenotypes in cancer.
ATF4 rewires metabolism toward glutamine dependence: stable isotope tracing revealed that aged tumour cells shift from glucose anaplerosis toward glutamine-dependent TCA cycle replenishment. This metabolic plasticity creates a selective vulnerability: GLS inhibition with CB-839 blocked metastasis from aged tumours without affecting primary tumour growth or young tumours.

Why should we care?

This work addresses a blind spot in cancer research: the vast majority of preclinical studies use young mice, yet most cancer patients are older. The finding that ageing actively reprograms tumour cells, suppressing primary growth while promoting metastasis, has profound implications for how we model and treat cancer. It suggests that drugs that fail in clinical trials might have succeeded if tested in age-appropriate preclinical models. More concretely, the identification of ATF4-driven glutamine dependence as a vulnerability in aged tumours offers a rational therapeutic strategy: GLS inhibitors, which have shown disappointing results in unselected patient populations, might be effective in older patients with ATF4-high tumours. This study argues that incorporating biological age into experimental design and therapeutic stratification is not just important, it may be essential for translating preclinical discoveries into meaningful clinical benefit for the patients who need it most.

A Genetically Encoded Device for Transcriptome Storage in Mammalian Cells

Chao, Y.-K. et al. Science (2026). https://doi.org/10.1126/science.adv9353

The paper in one sentence

The authors engineered a synthetic system called TimeVault that uses self-assembling vault particles to capture and stably store cellular mRNA within living cells, enabling retrospective retrieval of past transcriptomic states linked to future phenotypic outcomes.

Summary

Understanding how cells make decisions, whether to divide, differentiate, or become resistant to therapy, requires linking past molecular states to future outcomes. However, current methods are destructive (RNA-seq provides only snapshots) or lose lineage context (RNA export systems decouple transcripts from their cells of origin). To address this, the authors developed TimeVault, a genetically encoded system for time-resolved transcriptome storage in living mammalian cells. TimeVault leverages the vault particle, a large endogenous cytoplasmic ribonucleoprotein complex with a hollow barrel-shaped structure. By fusing the poly(A)-binding protein (PABP) to the vault-interacting (INT) domain, the authors engineered these particles to capture polyadenylated mRNA via PABP’s affinity for poly(A) tails. Inducible expression of both the major vault protein (MVP) and PABP-INT enables temporal control over transcriptome recording. TimeVault-captured RNA showed remarkable stability, with a half-life of 13.5 days in lysates and 132.5 hours in living cells, over sevenfold longer than unprotected cytosolic mRNA. The system was minimally perturbative, with no significant effects on cell viability or global gene expression beyond the two overexpressed components. RNA-seq revealed that captured transcriptomes were highly reproducible and correlated well with cytosolic profiles, with minimal bias except for expected depletion of mitochondrial and long non-coding RNAs. The authors demonstrated TimeVault’s utility by recording transient stress responses (heat shock and hypoxia), successfully retrieving past transcriptional signatures after the stress had subsided. They then applied the system to a clinically relevant problem: understanding pre-existing heterogeneity that enables drug persistence in PC9 lung cancer cells treated with the EGFR inhibitor osimertinib. 
TimeVault revealed a pre-drug persister state characterized by upregulation of oxidative phosphorylation and specific genes (PI3, FN1, LCN2, AKR1C1, CXCL8). Functional validation through chemical inhibition and RNAi confirmed that these pathways contribute to the persister phenotype.

Personal highlights

Engineering vault particles for RNA capture: the authors repurposed endogenous vault particles, large, hollow ribonucleoprotein complexes of unknown function, as molecular “time capsules.” By fusing PABP to the vault-interacting domain, they achieved selective capture of polyadenylated mRNA within the particle’s protective interior.
Exceptional RNA stability: TimeVault-protected RNA exhibited a half-life of 132.5 hours in living cells, compared to 17.1 hours for unprotected cytosolic RNA. In lysates, captured RNA persisted for nearly two weeks, approaching the theoretical limit of base-catalyzed RNA degradation. This enables recording windows far beyond what metabolic labeling can achieve.
Minimal cellular perturbation: overexpression of the vault components did not affect cell viability or global gene expression beyond the two transgenes themselves. This non-perturbative nature is critical for recording endogenous states without confounding the biology under study.
Recording transient stress responses: TimeVault successfully captured past transcriptional signatures of heat shock and hypoxia, even after the stress was removed and the cells had returned to baseline. The system excluded transcripts generated after the recording window, demonstrating temporal specificity.
Uncovering pre-drug persister states: in PC9 lung cancer cells, TimeVault revealed a pre-existing transcriptional state associated with future osimertinib persistence. Markers included FN1, PI3, LCN2, AKR1C1, and CXCL8, with oxidative phosphorylation enriched over proliferation signatures. Functional validation confirmed that these pathways contribute to the persister phenotype, offering potential therapeutic targets.
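
To put the half-lives quoted above in perspective, here is a small back-of-the-envelope calculation. It assumes simple first-order (single-exponential) decay, which is my simplification and not something taken from the paper, and the 72-hour readout time is an arbitrary choice for illustration.

```python
def fraction_remaining(t_hours, half_life_hours):
    """Fraction of RNA left after t_hours, assuming first-order decay."""
    return 0.5 ** (t_hours / half_life_hours)


# Half-lives in living cells as quoted in the highlights above.
VAULT_HALF_LIFE_H = 132.5      # TimeVault-protected RNA
CYTOSOLIC_HALF_LIFE_H = 17.1   # unprotected cytosolic RNA

# Ratio of the two half-lives (the "over sevenfold" figure).
fold_increase = VAULT_HALF_LIFE_H / CYTOSOLIC_HALF_LIFE_H

# Hypothetical readout three days after the recording window closes.
READOUT_H = 72.0
vault_left = fraction_remaining(READOUT_H, VAULT_HALF_LIFE_H)
cytosolic_left = fraction_remaining(READOUT_H, CYTOSOLIC_HALF_LIFE_H)

print(f"fold increase in half-life: {fold_increase:.2f}")
print(f"protected RNA left after {READOUT_H:.0f} h: {vault_left:.1%}")
print(f"unprotected RNA left after {READOUT_H:.0f} h: {cytosolic_left:.1%}")
```

Under this simple model, most of the vault-protected RNA is still present after three days, while only a small fraction of unprotected transcripts survives, which is the practical meaning of the longer recording window.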

Why should we care?

Biology is fundamentally dynamic, but most molecular measurements are static snapshots. TimeVault offers a fundamentally new capability: the ability to store a cell's transcriptomic state in a durable, lineage-retained format and retrieve it later to link past molecular events to future outcomes. This opens the door to retrospective analysis of cellular decision-making in complex, inaccessible systems where continuous observation is impossible. While the current system records only a single time window and requires bulk analysis, future iterations could enable multi-timepoint recording and single-cell resolution. TimeVault represents a powerful step toward making cellular history readable, turning transient transcriptional states into durable, retrievable records of the past.

Tripso: A Self-Supervised Transformer for Gene Program-Centric Single-Cell Analysis

Moullet, M. et al. bioRxiv (2026). https://doi.org/10.64898/2026.03.24.713961

The paper in one sentence

The authors introduce Tripso, a self-supervised transformer framework that represents cellular state through multiple interpretable gene program-specific embeddings, enabling principled comparison of cell states across conditions while maintaining biological interpretability.

Summary

Single-cell genomics generates high-dimensional data that captures complex biological states, but most computational approaches compress this information into a single latent representation per cell. This entanglement of multiple biological processes makes it difficult to interpret what drives differences between conditions or cell types. To address this, the authors developed Tripso (Transformers for learning Representations of Interpretable gene Programs in single-cell transcriptOmics), a self-supervised framework that models cellular state through multiple gene program (GP)-specific embeddings. Tripso operates in three hierarchical stages. First, a gene encoder learns contextualized gene embeddings within each cell using masked language modeling. Second, a set of GP-specific transformer blocks, each corresponding to a user-defined biological program such as a signaling pathway or transcription factor regulon, generates embeddings summarizing the activity of that program in each cell. Third, a global cell representation is learned by attending to these GP embeddings and reconstructing gene expression. The framework supports both hypothesis-driven analysis using curated GPs and discovery of novel, data-driven GPs by clustering attention patterns. Tripso embeddings can be visualized, used for optimal transport across conditions, and interrogated to quantify gene-level contributions to each GP and GP-level contributions to overall cell identity. Benchmarking on Perturb-seq data and complex tissues like the human endometrium showed that Tripso outperformed existing methods (Spectra, expiMap) at discriminating pathway activity and detecting genetic perturbations.

Personal highlights

Moving beyond single embeddings: unlike conventional methods that compress each cell into a single vector, Tripso represents cells through multiple GP-specific embeddings. This disentangles concurrent biological processes (e.g., signaling responses, transcription factor activity, cell cycle) that would otherwise be conflated, enabling targeted analysis of specific programs.
Transformer-based architecture for GPs: each GP is modeled with its own transformer block, using a CLS token that attends to the expression of program genes within each cell. This allows the model to capture context-dependent relationships (genes within the same GP can contribute differently depending on cell state) while maintaining interpretability.
Gene- and GP-level importance quantification: Tripso provides two interpretability mechanisms: (i) gene importance scores via cosine similarity between gene embeddings and GP CLS tokens, revealing which genes drive a program in specific contexts; and (ii) GP importance scores via ablation, quantifying how much each GP contributes to the overall cell representation.
Data-driven GP discovery: beyond predefined programs, Tripso’s discovery module identifies novel GPs by clustering CLS-to-gene attention patterns. In skin inflammation, this revealed GP23, an AD-associated program in IL13+ tissue-resident memory T cells with spatial enrichment near sebaceous glands, that showed minimal overlap with existing gene set databases.
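
The gene importance score described above (cosine similarity between a gene embedding and a GP's CLS token) can be sketched in plain Python. This is a toy illustration only: the three-dimensional vectors, the gene names, and the helper functions are hypothetical, and nothing here uses Tripso's actual code or API.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def rank_genes(cls_token, gene_embeddings):
    """Rank genes by similarity to a gene program's CLS embedding."""
    scores = {
        gene: cosine_similarity(embedding, cls_token)
        for gene, embedding in gene_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up embeddings for one gene program in one cell.
cls_token = [1.0, 0.0, 1.0]
gene_embeddings = {
    "GENE_A": [2.0, 0.0, 2.0],    # same direction as the CLS token
    "GENE_B": [1.0, 1.0, 0.0],    # partially aligned
    "GENE_C": [-1.0, 0.0, -1.0],  # opposite direction
}

ranking = rank_genes(cls_token, gene_embeddings)
for gene, score in ranking:
    print(f"{gene}: {score:+.2f}")
```

Because cosine similarity ignores vector length, a gene scores highly when its embedding points in the same direction as the program's summary token, regardless of how large the embedding is.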

Why should we care?

Single-cell atlases are growing rapidly, but extracting biologically meaningful insights often relies on compressing cellular complexity into a single embedding that obscures the interplay of distinct gene programs. Tripso addresses this by reframing cell state as a composition of interpretable, program-level representations. This approach enables researchers to ask more precise questions: How does the activity of a specific transcription factor regulon differ between fetal and adult stem cells? Which gene programs distinguish in vivo from in vitro cultures, and can we target those differences to improve culture conditions?

Epigenetic memory of colitis promotes tumour growth

Nagaraja et al. Nature (2026). https://doi.org/10.1038/s41586-026-10258-4

The paper in one sentence

Chronic colitis leaves a long-lasting epigenetic imprint in colonic stem cells that primes them for faster tumour growth upon oncogenic mutation.

Summary

This study addresses a long-standing question in cancer biology: how chronic inflammation increases cancer risk beyond simply inducing mutations. Using a mouse model of colitis, the authors show that intestinal stem cells retain a durable epigenetic memory of inflammation, even after tissue recovery. By combining single-cell transcriptomics, chromatin accessibility profiling, and lineage tracing (via their newly developed SHARE-TRACE method), they demonstrate that this memory is encoded primarily at the chromatin level rather than in gene expression. A key feature of this memory is a persistent increase in accessibility of AP-1 transcription factor binding sites, which is maintained for over 100 days and inherited across stem cell divisions. Functionally, this epigenetic state primes cells for a stronger regenerative and proliferative response. When oncogenic mutations (e.g. APC loss) are introduced, these “primed” cells give rise to larger tumours, particularly at early stages of tumorigenesis. Importantly, this effect depends on AP-1 activity: pharmacological inhibition reduces tumour growth in colitis-exposed tissue. Mechanistically, the study further identifies cooperation between AP-1 and FOX transcription factors in stabilizing this memory program, and links chromatin accessibility changes to DNA methylation alterations.

Personal highlights

Long-lived epigenetic memory in stem cells: colonic stem cells retain inflammation-induced chromatin changes for >100 days after recovery, despite largely normal gene expression profiles.
AP-1 as a central regulator of memory: persistent increases in AP-1 motif accessibility define the memory state and are maintained across cell divisions.
Clonal inheritance of epigenetic states: using SHARE-TRACE, the authors show that memory is cell-intrinsic and propagated through stem cell lineages, with substantial heterogeneity between clones.
Priming for tumorigenesis: colitis-exposed stem cells generate larger tumours upon oncogenic mutation, driven by enhanced early tumour outgrowth rather than increased tumour number.
Mechanistic and therapeutic angle: AP-1 inhibition reduces tumour growth in the colitis context but does not erase the underlying epigenetic memory, highlighting both opportunity and limitation for intervention.

Why should we care?

This work reframes how we think about the link between chronic inflammation and cancer. Rather than acting only through DNA damage, inflammation can leave a lasting “memory” in tissue stem cells, effectively priming them for future disease.

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

What a Failed Collaboration Taught Me

Sebastiaan Vanuytven — Wed, 25 Mar 2026 21:01:47 GMT

Disclaimer: This post describes a personal experience. All identifying details, including roles, project structure, and specific circumstances, have been altered or omitted to protect the privacy of everyone involved. Any resemblance to a specific individual is unintentional. The events described reflect my own perspective and recollection. This post is not intended to make any factual claims about any identifiable person, and nothing here should be read as a formal accusation or legal statement. My purpose in sharing this is to help others recognise difficult dynamics, not to cause harm to anyone.

Last month, I wrote about the mentors who shaped me and received feedback that the majority of it focused on the positive aspects of being a scientist. So, this week, I’d like to discuss a defining moment that shaped me differently, a collaboration that not only failed, but actively harmed me.

Academics rarely discuss this openly. Junior scientists frequently fail to recognise toxic dynamics until they are deeply involved, or worse, blame themselves. I hope my story and the few insights I gained will help someone else spot red flags earlier.

Setting the scene: this was a collaboration in which my role was intended to be advisory, providing guidance rather than doing the hands-on work myself. Due to circumstances, I became more involved and had to contribute directly. I saw this as an opportunity to prove to myself that I could be like the mentors I’d had in my career. On top of that, I liked the person I was working with; they seemed motivated, willing to go the extra mile, and as enthusiastic about science as I was.

The Red Flags (In Hindsight)

Looking back, I can see the warning signs that I either overlooked or chose to ignore. If any of these sound familiar, please take note.

Dismissiveness of expertise. They would make decisions about experimental design without consulting me, even when it directly involved analysis I would need to do. When I raised concerns, they’d dismiss them quickly or thank me and proceed with their original plan anyway. I thought I wasn’t explaining myself clearly enough. What I now see: they didn’t view my field as requiring equal expertise. I was technical support, not a peer scientist whose input should shape the project.
One-way collaboration. I helped with countless analyses, troubleshooting sessions, and manuscript edits, often dropping my own work when they needed something urgently. In return: delayed responses to my questions, minimal engagement with my work, never once asking “how can I help you?” I told myself our skill sets were different and that my help was an investment that would pay off. It didn’t. The relationship was extractive: they took freely and offered nothing back.
Treated as subordinate, not collaborator. Emails arrived as instructions, not discussions, requests that assumed my availability, assumed my agreement, and assumed the scientific approach had already been decided. The phrasing was polite enough on the surface, but the structure was always the same: here is what needs to happen, here is when. They made decisions about how my analysis would be presented without asking whether the approach was scientifically sound. Meetings were scheduled assuming my availability; deadlines were set without checking if I agreed. I told myself this was just efficient communication. What I now recognise: this was someone treating me as if I worked for them, not with them. A peer would have opened a discussion about the scientific approach and asked for my input. Instead, they issued directives and expected compliance.
Moving goalposts. Every completed analysis led to “that’s great, but can you also...” New requirements emerged endlessly. I thought this was normal iterative science. It wasn’t. The goalposts moved because there was no agreement about my role, and no intention to define one.
Never a peer. Despite months of consistent work, I never felt like my value was established. Each new request felt like starting from zero. They would discuss my analyses, decide how to interpret them, plan next steps, all without me in the room. I’d be brought in afterwards to execute what was already decided. Even on shared work, decisions about whether my contributions were valid were made by others without my involvement, as if my own analyses required external validation before I could claim them. I told myself I was being insecure, that they were just brainstorming efficiently. What I now see: I was never promoted to “peer” in their mind. They owned the ideas; I provided the labour.
Physical boundary violation. At some point, during a meeting, they made physical contact in a way that felt intrusive and presumptuous. I rationalised it as an awkward moment, something I shouldn’t make a big deal about. Still, I immediately spoke to my partner afterwards to process what had happened. Now I can admit what it was: an expression of the same entitlement I experienced throughout the collaboration. They felt that they could decide when I was done, not me. It was the clearest signal of how they viewed our dynamic.

The Moment I Knew It Wasn’t Fixable

For months, I clung to hope. Maybe if I explained it clearly enough? Maybe if I provided specific examples? Maybe if I named the pattern? Maybe they just didn’t realise. I assumed this was a misunderstanding that could be resolved through better communication. So I sent a message outlining what had happened and what I needed going forward. Their response dashed any hope.

They reframed my concerns as a communication problem and suggested we simply had different perspectives on what had happened. They expressed a wish to move past the negativity between us and focus on the future. They expressed sadness about how things had turned out.

Please read that again: my legitimate concerns became a matter of tone and timing rather than substance. Objective events, such as making decisions about my schedule without asking, making decisions about my contributions without consulting me, and excluding me from discussions, were recast as a matter of perspective. And their sadness took centre stage, making my boundary-setting about their hurt feelings.

In that moment, I realised that they are incapable of acknowledging that their actions had an impact on me. They will always dismiss my concerns as misunderstanding or exaggeration. They simply cannot accept responsibility. That was when something changed. I stopped asking, “How do I make them understand?” and instead asked, “How do I protect myself?”

The subsequent face-to-face conversation simply confirmed it. When we sat across from each other, they were wondering if I would still help them. They explained why things went the way they did. They expressed a hope that things would settle back to how they had been, which meant no actual change. They were upset that I hadn’t told them sooner, as if the issue was my timing rather than their behaviour. I watched in real time as they demonstrated what their message had already implied: they cannot and will not accept responsibility for impact. That’s when I knew. This wasn’t a communication issue. This wasn’t a tough time. This was a fundamental mismatch. They wanted to be perceived as generous and well-meaning without altering their behaviour. I required recognition of impact and actual change. These two things can’t coexist.

If you keep thinking “one more conversation will fix this” while waiting for someone to finally understand, pay attention to how they react when you name the problem. Do they acknowledge the impact or explain their intentions? Do they ask for what you need or justify their actions? The answer indicates whether or not a repair is possible.

The Cost

I’d like to be honest about how this affected me. I did not sleep well. I could not eat. I questioned my own scientific competence and whether I was a good person. I wondered if I was being “too sensitive” or “difficult.” I brought tension into my other work and relationships. There were days when I dreaded opening my inbox and going to work. The cognitive load of controlling another person’s emotions while suppressing my own was exhausting.
Recognising a toxic relationship is one thing. Another challenge is getting yourself out of it, especially after investing time, effort, and hope. It took longer than I wanted to admit.
What made it harder was that I had grown to like this person. Their competence. The rare instances in which a genuine “thank you” appeared in their messages. I succumbed to intermittent reinforcement, just enough warmth to keep me hopeful things would change. I began to believe that I was the problem.

What helped me gain perspective was having many conversations with my partner, allowing myself to feel sad and all of the other emotions I’d been suppressing, reconnecting with mentors who reminded me of my worth, and focusing on projects that actually energised me.

I wish I could say the hurt is over. It isn’t; it’s still a part of me. But it’s gradually becoming a badge of honour, a lesson I had to learn.

What I Learned

Here’s what I wish someone had told me, and what I hope will help others who feel stuck in a similar situation:

Trust your gut. If something feels wrong, it probably is. You’re not being overdramatic or too sensitive.
Document everything. Create a paper trail and keep a record of your interactions, especially if you sense a pattern. You may need it later, or it can be a reminder when you start to believe you’re imagining things.
Talk to someone, preferably multiple people. A mentor, a friend, your partner, a therapist. Toxic dynamics are easier to spot when you’re not the one inside them.
You are not responsible for someone else’s behaviour. This took me a long time to realise. You cannot fix a collaboration by being more accommodating, more patient, or more forgiving.
Setting boundaries is not aggression. Asking for respect as a scientist or peer is not “causing drama.” It’s the minimum requirement for a healthy professional relationship.
Walking away is not failure, even if the voice in your head screams it at night. Some collaborations cannot be saved. Recognising that is not you failing; it’s you gaining wisdom.
Recovery takes time. Don’t expect to bounce back the next day. Be patient with yourself and focus on what gives you energy.
It’s better to be disliked for who you are than liked for who you’re not. Standing up for yourself may make you “the difficult one,” and you may suffer more than the people you call out. But you’ll be able to look in the mirror without shame.
Don’t trade your wellbeing for your ego. No reward, not even a high-profile authorship, is worth staying in a collaboration that drains and harms you.
It’s never too late. Even if you only realise what’s happening after years, you have the right to speak up. You don’t owe anyone an explanation for why it took you this long to see it.

The Scientist I’m Becoming

This experience did not simply teach me what to avoid. It showed me what to build.

I’m becoming a collaborator who states boundaries from the start. In new collaborations, I now have explicit discussions about roles, expectations, and how we will work together. I pose the following questions: “How do we want to make decisions about the direction of this work?”, “What does successful collaboration look like to you?”, “How will we handle disagreements?” These conversations may feel awkward at first, but they’re far less awkward than attempting to retrofit boundaries into a broken dynamic.
I’m becoming a mentor who checks in on the relationship, not just the science. When I work with students or junior colleagues now, I don’t simply ask, “How’s the project going?” I also ask, “How’s our collaboration working for you?” and “Is there anything I could be doing differently?” I give them space to tell me when something isn’t working before it becomes unbearable.
I’m becoming someone who models what healthy collaboration looks like. I ensure that my contributions to others’ work are visible, as are their contributions to mine. I engage people in discussions about their work from the start. I seek permission before making decisions that will affect someone else’s time or analysis. Not because I’m being virtuous, but because I understand how it feels when these things are missing.
I’m becoming comfortable disappointing people who don’t respect my boundaries. The old me would have agreed to every request, afraid that saying no would make me “difficult” or “not a team player.” The new me understands that people who respect you will not disappear if you set a limit. People who do not respect you will see your boundaries as a problem. That is information, not a character flaw.
I’m becoming an advocate for others who are struggling. When I notice junior colleagues being treated as technical support rather than scientific contributors, I speak out. When I see someone’s expertise dismissed or their work claimed by others, I call it out, not aggressively, but clearly. Because I understand what it’s like to feel invisible, and how much it means when someone notices you.
Most importantly, I’m becoming someone who knows the difference between being helpful and being used. True collaboration makes both parties better off. True mentorship involves mutual development. Collaboration does not occur when you are the only one giving, when you are exhausted rather than energised, or when you feel as if you are begging for basic respect. Recognising this is not you being difficult. You’re being wise.

This experience cost me several months of peace and confidence. But it helped me understand the type of scientist I want to be and the culture I want to contribute to. I can’t take back the time I spent in that bad dynamic. But I can ensure that I apply what I’ve learned to make something better.

Building Better

Last month, I wrote that science is a collaborative effort. That is still true. However, you have the right to choose your team and to leave if it no longer treats you as a teammate.

And here’s what I’ve discovered: there are good teams out there. Collaborations in which your skills are valued. Mentors who promote your work. Colleagues who involve you in decisions regarding your own contributions. Environments in which setting boundaries does not make you the villain.

I’ve found a few of those people. The colleague who enquires, “Does this timing work for you?” rather than assuming my availability. The collaborator who ensures that my analysis is properly credited in all presentations. The mentor who cares about how I’m doing rather than what I’m producing. These relationships exist. This experience taught me to recognise them when I see them and to protect them once I find them.

If you are reading this from within a toxic dynamic, know that you are not imagining things. You are not too sensitive. You are not the problem for wanting to be treated with dignity. And there is life after this collaboration concludes.

You’re not a quitter. You’re saving your energy, motivation, knowledge, and care for people who will value what you bring.

And thank you to my fellow scientists who are working to improve collaborations. Every time you include someone from the start, correctly credit someone’s work, and ask instead of assuming, you are creating the culture we all deserve. Keep building. It matters more than you realise.

Cheers,

Seb

Weekly reads 16/3/26

Sebastiaan Vanuytven — Sun, 22 Mar 2026 18:58:18 GMT

This week’s reads push the boundaries of what we can measure, model, and ultimately intervene in, from multi-layered views of gene regulation to early cancer interception and unexpected forms of cellular cooperation. A new trimodal single-cell method, scHiCAR, simultaneously measures transcriptome, chromatin accessibility, and 3D genome interactions at very large scale, providing a “ground truth” of gene regulation without the need for computational reconstruction. On the therapeutic side, KRAS inhibitors are showing promise not just as a treatment for pancreatic cancer, but as a means of intercepting it in the first place, dramatically delaying tumour onset. Other papers are redefining our understanding of the tumour microenvironment: cancer stem cells exploit damaged mitochondria from neutrophils as a fuel source for metastasis via a “pseudohypoxic” state, while the immune effects of bacteria depend on whether they are inside or outside of tumour cells. New technologies are also expanding our analytical capabilities: scAPEX-seq adds subcellular context to single-cell transcriptomics, scCChain rethinks cell-cell interactions as “chains,” while GPS employs deep learning to find drugs by “reversing” disease gene signatures.

Preprints/articles that I managed to read this week

Trimodal single-cell profiling of transcriptome, epigenome and 3D genome in complex tissues with scHiCAR

Wei, X. et al. Nature Biotechnology (2026). https://doi.org/10.1038/s41587-026-03013-7

The paper in one sentence

scHiCAR is a plate-based combinatorial barcoding method that simultaneously profiles mRNA, open chromatin, and 3D chromatin conformation from the same single cells, enabling integrated analysis of gene-regulatory landscapes in complex tissues at unprecedented scale and resolution.

Summary

Understanding how genes are regulated requires integrating three layers of information: what genes are expressed (transcriptome), which regulatory elements are accessible (epigenome), and how those elements physically interact in 3D space (3D genome). Existing methods capture at most two of these modalities from the same cell, forcing computational integration across different cells and assays, a process prone to artifacts and to missing rare cell populations. Wei and colleagues develop scHiCAR (single-cell Hi-C with assay for transposase-accessible chromatin and RNA sequencing), a plate-based combinatorial barcoding protocol that captures all three modalities from the same nucleus. The workflow uses four rounds of barcoding (yielding up to 84.9 million unique cell barcodes) and a clever molecular design: after tagmentation and reverse transcription, nuclei are physically separated into RNA-containing supernatant and DNA-containing pellet. The DNA undergoes in situ digestion and ligation to specifically amplify fragments where accessible chromatin (marked by Tn5) is ligated to its interacting genomic regions, enriching for long-range cis interactions anchored at candidate regulatory elements (cCREs). In cell line mixtures (human H1 hESCs, GM12878, mouse ESCs), scHiCAR achieves 100% transcriptome-based cell type classification and 97–99% accuracy using open chromatin or 3D contacts alone. Critically, 48% of scHiCAR chromatin contacts are long-range (>20 kb) cis interactions and closely match bulk HiCAR and in situ Hi-C profiles. Applied to 1.62 million mouse frontal cortex cells (four batches, two replicates), scHiCAR resolves 22 major brain cell types. Open chromatin peaks (554,946 total, 260,781 unique cCREs) show 70% overlap with BICCN snATAC-seq data and strong concordance with matched bulk HiCAR from sorted astrocytes and Pvalb neurons.
Chromatin loops identified by a new deep-learning caller, scDeepLUCIA, are robust even when downsampled to 5,000 cells, enabling analysis of rare populations (0.32% of cells). Integrating all three modalities yields 20,270 high-confidence enhancer-gene pairs with stronger correlations than predictions from paired RNA-ATAC data alone. In skeletal muscle regeneration (41,578 nuclei from uninjured, day 5, day 7 post-injury), scHiCAR tracks myogenic differentiation with single-cell resolution, revealing gene-specific dynamics: at some loci (Myh3), 3D genome remodeling follows transcription; at others (Ncam1, Myog), it precedes or coincides.
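A quick sanity check on two of the numbers in this summary, written as a small Python sketch. The plate format is my own assumption: the summary only says four rounds of barcoding, and I am guessing 96 wells per round because that happens to reproduce the quoted 84.9 million. The variable names are mine, not the authors’.

```python
# Back-of-the-envelope checks on the scHiCAR numbers quoted above.
# ASSUMPTION (not stated in this post): each barcoding round uses 96 wells.
WELLS_PER_ROUND = 96  # hypothetical plate format
ROUNDS = 4            # four rounds of barcoding, as described in the summary

# Combinatorial barcoding: every round multiplies the number of possible barcodes.
barcode_space = WELLS_PER_ROUND ** ROUNDS
print(f"{barcode_space:,} possible barcode combinations")

# How many cells does a 0.32% population represent in a 1.62 million-cell atlas?
atlas_cells = 1_620_000
rare_fraction = 0.0032
rare_cells = round(atlas_cells * rare_fraction)
print(f"{rare_cells:,} cells in a 0.32% population")
```

Under that assumption the barcode space comes out at 84,934,656, and a 0.32% population in the atlas is roughly 5,200 cells, which sits right at the 5,000-cell level where the loop calls are reported to stay robust.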

Personal highlights

Trimodal profiling from the same cell, at scale: scHiCAR’s combinatorial barcoding design yields up to 84.9 million unique cell barcodes, enabling cost-effective (~$0.04 per cell) analysis of millions of cells in a single experiment. This is the first method to experimentally capture mRNA, open chromatin, and 3D genome from the same nucleus at this scale, providing ground-truth integration without computational alignment across cells or assays.
Enrichment of cCRE-anchored long-range interactions: unlike unbiased sc3C methods (Dip-C, LiMCA) where only 6–10% of contacts are long-range cis interactions, scHiCAR’s molecular design, ligating accessible chromatin fragments to their interacting partners, enriches these functionally relevant contacts to 48%. This enables high-resolution (5 kb) loop detection in rare cell types without prohibitive sequencing depth.
scDeepLUCIA: deep learning for robust loop calling in sparse data: existing loop callers (HiCCUPS, Peakachu) degrade sharply with low cell numbers. scDeepLUCIA, trained on bulk HiCAR loops and incorporating chromatin accessibility, maintains >75% loop recovery even when downsampled from 120,000 to 5,000 cells. This makes it possible to study 3D genome organization in rare populations comprising <0.5% of cells.
1.62 million-cell brain atlas with integrated modalities: the largest single-cell 3D genome dataset to date yields 22 major cell types, 260,781 unique cCREs, and 294,385 high-confidence loops. Integrating all three modalities identifies 20,270 enhancer-gene pairs with significantly stronger correlations than predictions from RNA+ATAC alone, demonstrating that chromatin contacts improve enhancer assignment.

Cancer interception with KRAS inhibitors in preclinical models of pancreatic ductal adenocarcinoma

Than, M. T. et al. Science (2026). https://doi.org/10.1126/science.aec7929

The paper in one sentence

Treating premalignant pancreatic lesions in mice with KRAS inhibitors dramatically delays tumour onset and triples survival, demonstrating that pharmacological interception of pancreatic cancer, targeting the disease before it becomes invasive, is feasible and more effective than treating established tumours.

Summary

Pancreatic ductal adenocarcinoma (PDAC) is almost invariably driven by KRAS mutations and arises from microscopic precursor lesions called pancreatic intraepithelial neoplasia (PanINs), which are present in healthy adults but rarely progress. Than and colleagues ask whether intercepting these lesions with KRAS inhibitors, before they become invasive, could prevent or delay tumour formation. Using the autochthonous KPC mouse model (Krasᴳ¹²ᴰ; Tp53ᴿ¹⁷⁷ᴴ), which recapitulates human PDAC progression from PanIN to invasive cancer, they treated tumour-free young mice (7–9 weeks old) with either a RAS(ON) multiselective inhibitor (RMC-7977, active against multiple RAS isoforms) or a KRASᴳ¹²ᴰ-selective inhibitor (RMC-9945). After just 10 days of treatment, both inhibitors significantly reduced PanIN burden. Mechanistically, this reflected apoptosis of neoplastic cells (increased cleaved caspase 3) rather than redifferentiation to acinar cells, and was accompanied by modest reductions in F4/80⁺ macrophages in the perilesional stroma but no changes in fibroblasts or T cells. A 28-day interception course delayed tumour onset by 30–80 days and improved overall survival. But the most striking result came from a metronomic regimen: 7-week-old mice received RMC-7977 on a 1-week-on, 1-week-off schedule continuously. This extended median tumour-free survival from 105 days to 329 days (p<0.0001) and tripled median overall survival (138 days to 376 days, p<0.0001). The pancreatic architecture remained largely normal, with sustained suppression of preneoplasia. Importantly, tumours that eventually emerged despite interception were no more aggressive than those in control mice, grew at similar rates, and remained sensitive to KRAS inhibitor treatment when resumed. 
Whole-genome sequencing of cell lines derived from these “escaped” tumours revealed no consistent genetic resistance mechanisms (e.g., no enrichment for MAPK pathway amplifications), suggesting that escape may be non-genetic rather than driven by selection for resistant clones. Comparing interception with treatment initiated at diagnosis (using the same metronomic schedule) showed that early intervention provided a significantly greater survival benefit, consistent with the idea that less evolved, less genomically unstable lesions are less able to adapt to treatment.
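To put the metronomic-regimen numbers side by side, here is a small Python sketch that turns the quoted medians into absolute gains and fold changes. It only uses the four values from the summary above; the dictionary layout and names are mine.

```python
# Median survival in days (control, metronomic RMC-7977), as quoted in the summary.
medians = {
    "tumour-free survival": (105, 329),
    "overall survival": (138, 376),
}

fold_changes = {}
for endpoint, (control, treated) in medians.items():
    # Fold change = treated median divided by control median.
    fold_changes[endpoint] = treated / control
    print(
        f"{endpoint}: {control} -> {treated} days, "
        f"+{treated - control} days, {fold_changes[endpoint]:.2f}x"
    )
```

Tumour-free survival comes out at about 3.1x and overall survival at about 2.7x, so “tripled” is a fair rounding for both, with the larger relative gain on tumour onset.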

Personal highlights

Pharmacological interception of pancreatic cancer is feasible: in a highly aggressive, autochthonous mouse model where every pancreatic epithelial cell harbours driver mutations, short-term KRAS inhibitor treatment significantly reduced PanIN burden. A metronomic 1-week-on/1-week-off regimen tripled median survival, from 138 days to 376 days, demonstrating that even intermittent suppression of KRAS signalling in premalignant lesions can profoundly delay tumourigenesis.
PanIN regression occurs via apoptosis, not redifferentiation: after 3 days of treatment, residual lesions showed increased cleaved caspase 3 staining, indicating that KRAS inhibitor-induced cell death, rather than reversion to acinar fate (amylase expression remained unchanged), drives the reduction in preneoplastic burden. This clarifies the immediate mechanism of interception.
Escaped tumours are not more aggressive and remain drug-sensitive: tumours that eventually arose despite interception were similar in size, growth rate, and histology to control tumours. When treatment was resumed at diagnosis, they responded similarly to tumours from non-intercepted mice, and whole-genome sequencing revealed no consistent genetic resistance mechanisms. This suggests that interception does not select for more lethal or resistant clones.
3D CODA reconstruction reveals whole-organ effects: using CODA (a technique for cubic-centimetre-scale 3D reconstruction from serial sections), the authors quantified PanIN burden across entire pancreata, showing near-complete suppression of preneoplasia in intercepted mice and confirming that pancreatic microstructure (acini, islets, vasculature) remained intact after long-term metronomic dosing.
Interception outperforms treatment of established disease: direct comparison of metronomic RMC-7977 initiated at the premalignant stage versus at diagnosis showed that early intervention conferred a greater survival benefit, supporting the principle that targeting less complex, pre-malignant lesions, before genomic instability and molecular plasticity accrue, is more effective than treating fully formed cancers.

Why should we care?

Pancreatic cancer is one of the deadliest cancers, with a 5-year survival below 13%. By the time most patients are diagnosed, the disease is advanced and largely untreatable. But we know that nearly all PDACs arise from microscopic precancerous lesions (PanINs) that harbour KRAS mutations, and these lesions are common in healthy adults. The challenge has been: can we do anything about them before they turn into cancer? This study provides a proof of principle that yes, we can. Using KRAS inhibitors already in clinical development, intermittent treatment of mice with PanINs delayed tumour onset by months and tripled survival. The intercepted tumours that did eventually form were no more aggressive and remained sensitive to the drugs, suggesting that this approach doesn’t just select for resistant cells but genuinely resets the clock on cancer development.

THY1+ cancer stem cells drive metastasis through a pseudohypoxic state shaped by neutrophil-derived mitochondria

Wan, W.-H. et al. Nature Cell Biology (2026). https://doi.org/10.1038/s41556-026-01876-1

The paper in one sentence

A rare subset of cancer stem cells marked by THY1 drives metastasis across multiple cancer types by acquiring neutrophil-derived damaged mitochondria, which induce a hypoxia-independent “pseudohypoxic” state that fuels their metastatic capacity.

Summary

Cancer stem cells (CSCs) are thought to drive metastasis, but whether a specific subpopulation is responsible, and how it acquires metastatic traits, has remained unclear. Wan, Li, and colleagues identify THY1⁺ CSCs as a conserved metastasis-initiating population across hepatocellular carcinoma, melanoma, breast, and colon cancer. Integrating single-cell and spatial transcriptomics from 94 HCC patients, they resolved CSCs into nine subpopulations. Cluster 3, defined by THY1 expression, accounted for ~50% of CSCs in metastatic tumours but <2% in non-metastatic tumours, and was enriched in portal vein tumour thrombi and circulating tumour cells. THY1⁺ CSCs were functionally validated: despite forming fewer spheres than EpCAM⁺ CSCs, they produced markedly more lung metastases in vivo. THY1 itself is a functional driver: knockout or antibody blockade suppressed metastasis, while EpCAM overexpression had no effect. THY1⁺ CSCs are maintained by an IL-6–MYC axis. MYC binds the THY1 promoter, and STAT3 inhibition blocks THY1⁺ CSC regeneration. In vivo, Il6ra knockdown in EpCAM⁺ cells reduced THY1⁺ CSC generation and lung metastasis without affecting primary tumour growth. Critically, THY1⁺ CSCs adopt a “pseudohypoxic” state: HIF1α is stabilized even in well-vascularized, oxygen-rich tumour regions, independent of true hypoxia. THY1 overexpression induced HIF1α in vivo but not in vitro, implicating microenvironmental cues. Spatial transcriptomics linked this pseudohypoxia to neutrophil-rich invasive margins. Mechanistically, THY1 on CSCs engages Mac1 on neutrophils, activating Src–Akt/Erk signalling and Rac1-dependent migrasome formation. Neutrophils then extrude damaged, ROS-enriched mitochondria via migrasomes. THY1⁺ CSCs internalize these mitochondria through macropinocytosis, and the transferred ROS stabilizes HIF1α by inhibiting its hydroxylation. Inhibition of macropinocytosis, mitochondrial ROS, or THY1–Mac1 interactions abrogated pseudohypoxia and metastasis.
In vivo mitochondrial transfer was confirmed using strain-specific mtDNA polymorphisms and single-cell MERCI analysis, which identified 11/49 HCC patients whose tumour cells contained neutrophil-derived mitochondrial signatures—associated with higher THY1⁺ CSC proportions and hypoxia/EMT gene programmes.

Personal highlights

THY1 marks a rare, conserved metastasis-initiating CSC subset: across pan-cancer single-cell analysis, THY1⁺ CSCs consistently showed higher metastatic scores than THY1⁻ CSCs. In paired primary and metastatic lesions, only THY1⁺ CSCs were enriched in metastases, not EpCAM⁺ CSCs. Functionally, THY1⁺ cells formed fewer spheres but generated more lung metastases, and gradually lost metastatic potential as they differentiated in culture, confirming that metastatic competence is transient and context-dependent.
Pseudohypoxia—HIF1α stabilization without oxygen deprivation: THY1⁺ CSCs stabilized HIF1α even in well-vascularized tumour regions, independent of true hypoxia. This pseudohypoxic state was essential for metastasis: Hif1α or Arnt knockout abrogated THY1-driven metastasis. The term captures a key conceptual shift: hypoxia-like transcriptional programmes can be activated by non-canonical cues, in this case mitochondrial ROS.
Neutrophils extrude damaged mitochondria via migrasomes upon THY1 engagement: THY1–Mac1 interaction triggers Src–Akt/Erk signalling in neutrophils, activating Rac1-dependent migrasome formation. Crucially, neutrophils selectively expel ROS-enriched, dysfunctional mitochondria, not healthy ones.
Mitochondrial transfer requires macropinocytosis, driven by THY1 signalling in CSCs: THY1 overexpression in CSCs activated Src, Akt, and Rac1, enhancing macropinocytosis. Uptake of neutrophil-derived migrasomes was blocked by the macropinocytosis inhibitor EIPA, but not by dynasore. In vivo, macropinocytosis-deficient tumours (Carmil1-AA mutants) failed to acquire neutrophil mitochondria and showed no pseudohypoxia or metastasis.

Why should we care?

The discovery that THY1⁺ CSCs are a conserved metastasis-initiating population across multiple cancer types provides a long-sought cellular target. But the real insight is how these cells become metastatic: not by accumulating mutations or responding to hypoxia, but by stealing damaged mitochondria from neutrophils. The neutrophils, in turn, are co-opted through THY1–Mac1 signalling to package these mitochondria into migrasomes and deliver them to CSCs, which internalize them via macropinocytosis. The transferred ROS then stabilizes HIF1α, creating a “pseudohypoxic” state that drives metastasis even in oxygen-rich environments. This reframes our understanding of tumour–immune crosstalk. Neutrophils are usually thought of as either tumour-killing or immunosuppressive. Here, they serve as organelle donors, supplying the very fuel, damaged mitochondria, that CSCs need to metastasize. It also reveals a new form of metabolic co-option: CSCs don’t just compete for nutrients; they literally incorporate bits of immune cells.

Subcellular transcriptome sequencing with single cell APEX-seq identifies regulators of cell-cell interactions

Xue, A. et al. bioRxiv (2026). https://doi.org/10.64898/2026.03.17.712496

The paper in one sentence

Single-cell APEX-seq (scAPEX-seq) profiles RNAs at the endoplasmic reticulum membrane in thousands of individual cells, revealing that cathepsin W (CTSW) promotes long-term CAR T cell persistence and function—a regulator missed by conventional scRNA-seq.

Summary

Single-cell RNA sequencing captures total transcript abundance but loses information about where RNAs are located within the cell, a critical dimension because RNA localization determines when and how transcripts are spliced, translated, and degraded. mRNAs encoding cell surface and secreted proteins, which mediate cell-cell interactions (CCIs), are specifically enriched at the endoplasmic reticulum (ER) membrane, where they are translated. Xue and colleagues develop scAPEX-seq, a proximity labeling method that maps subcellular transcriptomes at single-cell resolution. They first improve the original APEX-seq protocol with a more cell-permeable probe (phenol-azide, PA) and copper-free click chemistry, increasing RNA recovery 10-fold and enabling analysis from as few as 200,000 cells. By integrating this with the 10x Genomics droplet platform, performing reverse transcription before biotin enrichment, then separately sequencing the labeled (ER-associated) and unlabeled (supernatant) fractions, they obtain matched subcellular and whole-transcriptome profiles from the same single cells. In tumor–macrophage cocultures, scAPEX-seq resolved cell states invisible to conventional scRNA-seq, distinguished cocultured from non-cocultured cells, and detected many more ligand–receptor interactions (including Ccl2–Ccr2, TGFβ signaling, and Adam10/17 axes). Critically, it captured changes in RNA localization at the ER membrane, not just abundance, upon coculture. In HER2+ breast cancer cells cocultured with anti-HER2 CAR T cells, scAPEX-seq revealed four distinct CAR T cell states after just 2 hours of interaction, whereas conventional sequencing showed only two. These included a secondary memory-like population with unique interferon signaling and survival signatures. Long-term cocultures (21 days, 7 stimulation cycles) identified a rare CAR T cell subpopulation with reduced exhaustion markers and sustained effector function—again invisible to whole-transcriptome profiling. 
From this persistent population, the authors identified CTSW (Cathepsin W) as a top candidate. CTSW overexpression in CAR T cells promoted a stem-like memory phenotype (CD62L⁺CD45RA⁺), enhanced proliferation over multiple stimulation rounds, and improved tumor cell killing. Knockout of CTSW had the opposite effect. Mechanistically, CTSW appeared to modulate CD25 expression, consistent with its known protease activity. In patient cohorts, CTSW expression correlated with improved immunotherapy outcomes and survival in melanoma and non-small cell lung cancer.

Personal highlights

scAPEX-seq enables subcellular transcriptomics at single-cell resolution: by replacing biotin-phenol with a more cell-permeable phenol-azide probe and performing reverse transcription before biotin enrichment, the authors increase RNA recovery 10-fold, enabling compartment-specific profiling from thousands of single cells. The workflow yields matched ER-associated (“secretory”) and whole-transcriptome (“supernatant”) datasets from the same cells, providing two complementary views.
ER-localized transcriptomes reveal hidden cell-cell interaction states: in tumor–macrophage cocultures, scAPEX-seq resolved cell states that conventional scRNA-seq could not distinguish, and detected ~10× more differentially expressed genes between conditions. Crucially, it captured changes in RNA localization at the ER membrane, not just abundance, for dozens of immune-relevant transcripts, adding a new dimension to CCI analysis.
Early divergence in CAR T cell states predicts long-term fate: within 2 hours of tumor contact, scAPEX-seq resolved four CAR T cell subpopulations, including a secondary memory-like cluster with unique interferon and survival signatures. RNA velocity analysis suggested that cells in this state could differentiate toward a persistent effector population seen after 21 days of repeated stimulation—a trajectory invisible in whole-transcriptome data.
CTSW identified as a driver of CAR T cell persistence and function: from a rare scAPEX-seq-specific cluster of long-lived CAR T cells with reduced exhaustion markers, the authors identified CTSW (Cathepsin W) as a top candidate. Overexpression of CTSW promoted a stem-like memory phenotype, enhanced proliferation over multiple stimulation cycles, and improved tumor cell killing. Knockout had opposite effects, validating CTSW as a functional regulator.
Subcellular RNA localization dynamics as a predictive signal: by using scAPEX-seq counts as a proxy for “secretory commitment” and supernatant counts as baseline, the authors defined “secretory velocity”, a new metric tracking changes in ER localization. This revealed distinct trajectories from those of RNA velocity, suggesting that relocalization of transcripts to the ER represents an independent axis of cellular differentiation relevant for engineering durable T cell responses.

Why should we care?

Single-cell RNA sequencing has given us unprecedented views of cellular heterogeneity, but it treats the cell as a bag of RNA, ignoring that where a transcript sits determines what it does. mRNAs at the ER membrane are being translated into proteins that will be secreted or displayed on the cell surface, precisely the molecules that mediate cell–cell interactions. By capturing this subcellular information at scale, scAPEX-seq adds a new dimension to single-cell analysis.

Divergent tumor immunity determined by bacteria-cancer cell engagement

Yao, B. et al. Cell (2026). https://doi.org/10.1016/j.cell.2025.12.044

The paper in one sentence

The same strain of bacteria, depending on whether it resides inside or outside cancer cells, triggers opposite immune responses: intracellular bacteria drive immunosuppressive neutrophils and promote metastasis via the cGAS-STING-IL-17B axis, while extracellular bacteria stimulate anti-tumor immunity.

Summary

Intratumoral bacteria are increasingly recognized as players in cancer, but how they interact with host cells to shape immunity has remained unclear. Yao and colleagues dissect this question in a PyMT breast cancer model, focusing on Staphylococcus xylosus, a dominant member of the tumor microbiome. They first show that intracellular bacteria are critical for metastatic relapse. In mice harbouring bacteria-invaded tumour organoids, doxycycline elimination of bacteria reduced lung metastasis recurrence from 65% to 6.7%, an effect lost in immunocompromised NPSG mice, implicating the immune system. Intracellular bacteria recruit neutrophils to the lung within days of tumour cell arrival. Using germ-free mice and antibiotic treatment, the authors demonstrate that this recruitment depends on the tumour microbiota, not the gut microbiome. Single-cell RNA-seq revealed that intracellular bacteria induce a distinct neutrophil state, enriched for immunosuppressive genes (Wfdc17, S100a9, Arg2) and myeloid-derived suppressor cell (MDSC) signatures. These neutrophils suppressed CD8+ T cell function in co-culture assays. Strikingly, extracellular bacteria (injected intravenously or intratracheally) induced a completely opposite neutrophil phenotype: antigen-presenting, immunostimulatory, and associated with tumour control. The same bacterial strain, depending on its location, thus programmes neutrophils toward either pro- or anti-tumour fates. The key molecular mediator is IL-17B. Intracellular bacteria, but not extracellular ones, upregulate Il17b in cancer cells via the cGAS-STING pathway. Bacterial invasion triggers cytosolic dsDNA accumulation, activating cGAS-STING, which in turn drives Il17b expression through NF-κB and STAT3. IL-17B then acts on neutrophils to induce their suppressive state. Overexpression of Il17b in tumour cells phenocopied intracellular bacteria, promoting metastasis; knockout of Il17b or STING abrogated it. 
Neutrophil depletion similarly abolished IL-17B-driven metastasis. In a post-surgical recurrence model, eliminating intracellular bacteria with doxycycline, or depleting neutrophils with anti-Ly6G, dramatically reduced lung metastases. Conversely, intranasal administration of bacterial components (LPS, PGN), mimicking extracellular bacteria, reduced recurrence and improved survival. Finally, the mechanism translates to humans. Staphylococcus epidermidis isolated from human breast tumours invaded MDA-MB-231 cells, upregulated IL-17 pathway genes and STING, and induced MDSC-like gene expression in healthy donor neutrophils. In patient samples, bacterial load correlated with neutrophil infiltration. In the METABRIC cohort, a bacteria invasion signature, STING pathway, IL-17 pathway, and neutrophil response signature all correlated with worse prognosis.

Personal highlights

Opposing immune outcomes from the same bacterium: the most striking finding is that the same strain of S. xylosus triggers fundamentally different immune responses depending on its location. Intracellular bacteria induce immunosuppressive neutrophils and promote metastasis; extracellular bacteria induce antigen-presenting neutrophils and anti-tumour immunity. This resolves a long-standing paradox about why bacteria can be both pro- and anti-tumorigenic: it’s not the species, but the mode of engagement.
Intracellular bacteria activate cGAS-STING-IL-17B to reprogram neutrophils: bacterial invasion into the cytosol causes accumulation of dsDNA, activating cGAS-STING. This leads to upregulation of IL-17B specifically (not other IL-17 family members), which acts on neutrophils to induce an MDSC-like, T cell-suppressive state. The pathway is causal: Il17b overexpression phenocopies intracellular bacteria; STING or Il17b knockout abrogates it.
Extracellular bacteria drive a distinct, protective neutrophil state: intravenous or intratracheal administration of live bacteria (without tumour cells) induced neutrophils with high antigen presentation scores and anti-tumour signatures. This suggests that the immune system’s default response to bacteria is protective, and that invasion into cancer cells subverts this response for tumour benefit.
Eliminating intracellular bacteria prevents post-surgical metastatic recurrence: in a clinically relevant model where primary tumours are resected and mice monitored for relapse, doxycycline treatment (eliminating intracellular bacteria) reduced lung metastases from 80% to 20%. Neutrophil depletion had a similar effect. This points to a practical therapeutic strategy: antibiotics that penetrate cells could reduce recurrence in patients whose tumours harbour invasive bacteria.
Human relevance across multiple cohorts: S. epidermidis from human breast tumours invaded MDA-MB-231 cells and induced the same pathways. In patient samples, bacterial load correlated with neutrophil infiltration. In the METABRIC database, a bacteria invasion signature (derived from RNA-seq of invaded cells) correlated with G-MDSC scores and worse survival—outperforming signatures from dead bacteria. This provides clinical evidence that the mechanism operates in human disease.

Why should we care?

The tumour microbiome is one of the hottest, and most confusing, areas in cancer biology. Bacteria have been found inside tumours, but whether they help or hurt the host has been maddeningly inconsistent. This study provides the clarity that’s been missing: it’s not about which bacteria are there, but where they are. Intracellular bacteria activate a specific innate immune pathway (cGAS-STING) that turns on IL-17B, a cytokine that reprograms neutrophils into immunosuppressive cells. These neutrophils then shield metastatic cells from T cell attack. Extracellular bacteria, by contrast, trigger a completely different, protective neutrophil state.

GPS: A Deep-Learning Platform for Discovering Drugs by Reversing Disease Gene Signatures

Xing, J. et al. Cell (2026). https://doi.org/10.1016/j.cell.2026.02.016

The paper in one sentence

The authors developed a deep-learning model called GPS that predicts how a drug will change gene expression based solely on its chemical structure, enabling them to screen millions of compounds to find and optimize those that reverse the harmful gene signatures of diseases like liver cancer and pulmonary fibrosis.

Summary

A major goal in drug discovery is to find compounds that can “reset” the diseased state of a cell, a concept often measured by changes in gene expression. However, experimentally testing millions of compounds for their effect on the entire transcriptome is impossible. This study introduces a deep-learning platform called GPS (Gene expression Profile predictor on chemical Structures) to overcome this barrier. The team trained GPS on a large database of drug-induced gene expression profiles. The model learned to accurately predict a compound’s transcriptional impact from its 2D chemical structure alone. They used GPS to create a virtual library of predicted gene signatures for nearly 7 million compounds. To find potential drugs, they developed a new scoring method (Z-RGES) that measures how well a compound’s predicted profile reverses a disease-specific gene signature, for instance, downregulating genes that are overactive in cancer. They validated this approach across multiple diseases, showing that lower Z-RGES scores correlated with known drug efficacy. Using GPS, they identified and experimentally validated novel compounds for hepatocellular carcinoma (HCC), optimizing one lead candidate (MSU45302) that showed in vivo efficacy. In idiopathic pulmonary fibrosis (IPF), they used single-cell transcriptomics to identify compounds that reverse disease signatures in multiple pathogenic cell types, finding a repurposing candidate and a new anti-fibrotic compound. GPS thus provides a powerful, scalable platform for discovering and optimizing new therapeutics by directly targeting disease-associated transcriptional programs.
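To make the idea of a reversal score concrete, here is a minimal sketch. It is not the paper's Z-RGES formula: the function names, the mean-difference score, and the permutation-based z-normalization are all my own stand-ins, and the 20 genes and their log fold changes are invented. The only property it shares with the description above is the direction: lower (more negative) scores mean stronger reversal of the disease signature.

```python
import random
import statistics


def reversal_score(drug_profile, up_genes, down_genes):
    """Mean predicted change of disease-up genes minus that of disease-down genes.

    Negative values mean the compound pushes expression against the disease signature.
    """
    up = [drug_profile[g] for g in up_genes]
    down = [drug_profile[g] for g in down_genes]
    return statistics.fmean(up) - statistics.fmean(down)


def z_reversal_score(drug_profile, up_genes, down_genes, n_perm=500, seed=0):
    """Z-score the observed reversal against random gene sets of the same sizes.

    This permutation null is an assumption for illustration, not the paper's normalization.
    """
    rng = random.Random(seed)
    observed = reversal_score(drug_profile, up_genes, down_genes)
    genes = sorted(drug_profile)
    n_up, n_down = len(up_genes), len(down_genes)
    null = []
    for _ in range(n_perm):
        picked = rng.sample(genes, n_up + n_down)
        null.append(reversal_score(drug_profile, picked[:n_up], picked[n_up:]))
    spread = statistics.pstdev(null)
    if spread == 0:
        return 0.0
    return (observed - statistics.fmean(null)) / spread


# Toy predicted profiles over 20 hypothetical genes (values are made-up log fold changes).
genes = [f"G{i}" for i in range(20)]
up_genes, down_genes = ["G0", "G1", "G2"], ["G3", "G4", "G5"]
reverser = {g: 0.0 for g in genes}
reverser.update({g: -2.0 for g in up_genes})   # disease-up genes pushed down
reverser.update({g: 2.0 for g in down_genes})  # disease-down genes pushed up
mimic = {g: -v for g, v in reverser.items()}   # the opposite compound

z_reverser = z_reversal_score(reverser, up_genes, down_genes)
z_mimic = z_reversal_score(mimic, up_genes, down_genes)
print(z_reverser, z_mimic)
```

Ranking a virtual library would then amount to sorting compounds by this score in ascending order, with the predicted profiles coming from the structure-based model rather than from measurements.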

Personal highlights

Predicting transcriptomes from structures: the core advance is GPS, a deep-learning model that accurately predicts a compound’s effect on ~2,200 genes using only its chemical structure. This allows for truly de novo discovery, not just repurposing of already-profiled drugs.
A better scoring system for reversal: the authors developed Z-RGES, a refined scoring metric that normalizes for the number of genes a compound affects. This proved crucial for accurately ranking compounds and correlating their predicted profile with real-world anti-cancer activity.
From a failed analog to a new lead: the platform was used to optimize a problematic anti-cancer hit (niclosamide). GPS-guided selection of analogs improved water solubility and reduced toxicity, demonstrating its utility for hit-to-lead optimization.
From bulk to single-cell resolution: for IPF, the team used single-cell RNA-seq data to define disease signatures for specific cell types (like myofibroblasts and MUC5B+ epithelial cells). GPS identified a repurposed drug that simultaneously reversed the pathogenic state of both cell populations, highlighting a potential advantage over single-target therapies.
Experimental validation of in silico hits: the paper doesn’t just stop at predictions. It provides extensive wet-lab validation, including in vitro assays in multiple cell lines and in vivo xenograft mouse models for HCC, confirming that GPS-nominated compounds have genuine therapeutic potential.

Why should we care?

This work offers a new way to bridge the gap between the vast chemical universe and the complex biology of disease. Instead of relying on finding a drug that hits a single protein target, this approach allows us to search for molecules that can correct a broad disease state, which may be more effective for complex diseases like cancer and fibrosis. By moving drug discovery from a "target-based" to a "phenotype-reversal" model, GPS provides a powerful and scalable tool for both finding new drugs and understanding their mechanisms of action. While the approach has limitations, like the need for better disease signatures and the challenge of predicting the full spectrum of drug effects, it represents a significant step toward using the rich information in transcriptomics to accelerate and de-risk the early stages of drug development.

scCChain: Modeling Cell-Cell Communication as Chains to Map Spatial Signaling Programs in Tissues

Brunn, N., et al. bioRxiv (2026). https://doi.org/10.64898/2026.03.18.712664

The paper in one sentence

The authors introduce scCChain, a transformer-based framework that reframes spatial cell-cell communication as a sequence-modeling problem, using weighted random walks to assemble chains of cells and prioritizing communication programs based on how well they predict receiver cell gene expression.

Summary

Spatial transcriptomics allows researchers to map where genes are expressed in tissues, but inferring which cells are actually “talking” to each other remains challenging. Existing methods often focus on pairwise ligand-receptor interactions or use fixed-radius neighborhoods, which can miss the complexity of coordinated signaling programs and struggle with the noise and sparsity of spatial data. To address this, the authors developed scCChain, a framework that models cell-cell communication as chains of cells. First, it builds a distance-aware graph where cells are connected by transcriptional similarity and by potential ligand-receptor signaling. Structured dimensionality reduction then groups related ligand-receptor pairs into interpretable communication programs (CPs). For each CP, scCChain samples chains of cells using weighted random walks that can traverse the tissue, borrowing information from transcriptionally similar neighbors. Crucially, these chains are then fed into a transformer model that attempts to predict the gene expression of the last cell (receiver) from the expression of the preceding cells (senders). Chains that yield lower prediction errors are prioritized as more plausible communication events. Applied to human breast cancer data, scCChain identified a pro-angiogenic communication program (enriched for VEGF, midkine, and WNT signaling) localized to invasive tumor regions in Visium spot-level data. In higher-resolution Xenium data, a targeted analysis of CXCL12-CXCR4 signaling revealed cell-type-specific sender-receiver relationships and showed that the model’s attention mechanism highlighted intermediate-range senders as most informative. scCChain provides a flexible, interpretable way to discover and map spatial communication programs across different spatial transcriptomics platforms.
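The chain-sampling step can be sketched as a weighted random walk on a small graph. This is a toy under assumptions of mine, not scCChain's implementation: the graph, the cell names, and the edge weights (standing in for some combined proximity, similarity, and ligand-receptor score) are invented, and I arbitrarily start the walk at a sender and treat the last visited cell as the receiver.

```python
import random


def sample_chain(graph, start, max_len, rng):
    """Weighted random walk without revisits; returns the ordered list of visited cells."""
    chain = [start]
    while len(chain) < max_len:
        # Candidate next cells: positively weighted neighbours not yet in the chain.
        options = {n: w for n, w in graph[chain[-1]].items() if w > 0 and n not in chain}
        if not options:
            break  # dead end: the chain stays shorter than max_len
        nodes = list(options)
        step = rng.choices(nodes, weights=[options[n] for n in nodes], k=1)[0]
        chain.append(step)
    return chain


# Hypothetical toy tissue graph: node -> {neighbour: edge weight}.
graph = {
    "stromal_1": {"tumor_1": 0.9, "tcell_1": 0.3},
    "tumor_1": {"stromal_1": 0.9, "tumor_2": 0.6, "tcell_1": 0.2},
    "tumor_2": {"tumor_1": 0.6, "tcell_1": 0.5},
    "tcell_1": {"stromal_1": 0.3, "tumor_1": 0.2, "tumor_2": 0.5},
}

rng = random.Random(0)
chains = [sample_chain(graph, "stromal_1", 3, rng) for _ in range(5)]

# In this sketch the last cell plays the receiver and the preceding cells the senders.
senders, receiver = chains[0][:-1], chains[0][-1]
print(chains)
```

Each sampled chain would then be one input sequence for the transformer, with the prediction error on the receiver used to rank chains.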

Personal highlights

Re-framing communication as chains: instead of modeling pairwise interactions or fixed neighborhoods, scCChain uses weighted random walks to build variable-length “chains” of cells. This allows the model to integrate information from the local microenvironment while keeping each unit compact and computationally tractable for transformer architectures.
Transformers for prioritizing interactions: the core innovation is using a transformer model to predict a receiver cell’s transcriptome from its senders. The prediction error provides a data-driven criterion for communication plausibility, while the attention mechanism identifies which sender cells are most influential for a given receiver.
Discovering interpretable communication programs: by applying structured dimensionality reduction to ligand-receptor co-expression, scCChain groups related interactions into sparse, interpretable communication programs (CPs). In breast cancer, it identified a CP enriched for VEGF and midkine signaling that localized specifically to invasive tumor regions, consistent with known pro-angiogenic biology.
From bulk spots to single-cell resolution: the framework is compatible with both spot-based (Visium) and imaging-based (Xenium) spatial transcriptomics. In Xenium data, a targeted analysis of CXCL12-CXCR4 signaling revealed that attention-weighted senders were at intermediate distances (~48 µm), suggesting that the model captures biologically relevant spatial ranges beyond nearest neighbors.
Attention as a tool for biological insight: the transformer’s attention weights offer a built-in interpretability layer. The authors used them to identify which sender cell types (e.g., stromal cells as consistent CXCL12 producers) and which spatial distances (intermediate-range) were most informative for predicting receiver states, providing hypotheses about tissue architecture.

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 9/3/26

Sebastiaan Vanuytven — Mon, 16 Mar 2026 09:55:28 GMT

This week’s reads span a wide range, from metabolic tricks that boost immunotherapy to new computational tools reshaping single-cell analysis and foundation models that simulate cellular futures. A clinically viable 16-hour fasting regimen surprisingly increases CD8⁺ T-cell-mediated tumor cell killing by rewiring tumor metabolism through the amino acid isoleucine. On the computational front, CellSweep provides a fast and interpretable method for cleaning single-cell data, while PerturbGen, trained on over 100 million cells, predicts the effects of genetic perturbations on developmental trajectories. Other papers provide new insights into tumor cell plasticity and microenvironmental support, such as how lung cancer cells offload damaged mitochondria to fibroblasts to survive targeted therapy, and how head and neck cancer cells decouple differentiation from loss of self-renewal, which has implications for differentiation therapy. Finally, a systematic benchmark reveals the potential and pitfalls of measuring the activities of transposable elements in single-cell RNA-seq.

Preprints/articles that I managed to read this week

16-h fasting optimizes cancer immunotherapy in mice and humans

Chen, S. et al. Cell Metabolism (2026). https://doi.org/10.1016/j.cmet.2026.01.015

The paper in one sentence

A clinically feasible 16-hour fasting regimen reshapes the tumor microenvironment by causing cancer cells to release isoleucine, which fuels CD8+ T cell cytotoxicity via acetyl-CoA-driven epigenetic and metabolic remodeling, enhancing immunotherapy efficacy in both mice and patients with colorectal cancer.

Summary

Dietary interventions can influence cancer therapy, but prolonged fasting is poorly tolerated in patients who may already be malnourished. Chen and colleagues test a brief, overnight 16-hour fasting regimen, already standard preoperative practice, in mouse tumor models and a prospective cohort of colorectal cancer (CRC) patients. In B16 and MC38 tumour-bearing mice, 16-hour fasting remodelled the tumour immune microenvironment. Single-cell RNA-seq revealed enhanced CD8+ T cell cytotoxicity (increased IFNγ, GZMB) and reduced exhaustion markers (PD1, TIM3, TIGIT), without altering circadian rhythms. In a pilot human study (12 CRC patients), those undergoing preoperative fasting showed expansion of cytotoxic Temra (terminally differentiated effector memory) cells and reduced exhaustion trajectories compared to fed controls. The mechanism centres on the branched-chain amino acid isoleucine. Untargeted metabolomics of tumour interstitial fluid (TIF) identified isoleucine as the most significantly upregulated metabolite after 16-hour fasting—specifically at the 16-hour time point, not earlier. This accumulation correlated with fasting duration in a second patient cohort. Isoleucine proved essential for CD8+ T cell function. Depletion impaired proliferation and effector function; supplementation in nutrient-deprived tumour-conditioned medium (TCM) restored both. In vivo, isoleucine administration slowed tumour growth in a CD8+ T cell-dependent manner and synergized with anti-PD1 therapy. Mechanistically, isoleucine enters CD8+ T cells via the LAT1 transporter, is catabolized by BCAT2, and fuels the acetyl-CoA pool. Isotope tracing confirmed conversion of ¹³C-isoleucine into acetyl-CoA and TCA intermediates. This acetyl-CoA drives two critical processes: (1) histone acetylation at effector gene loci (Ifng, Gzmb, Tbx21), increasing chromatin accessibility, and (2) phospholipid synthesis, supporting membrane integrity and cytotoxic morphology. 
BCAT2-deficient T cells lacked these responses and failed to mediate anti-tumour effects. Crucially, the isoleucine originates from tumour cells, not serum. Under fasting-induced glutamine deprivation, tumour cells upregulate antiporter activity (SLC3A2/SLC7A5) that exchanges intracellular isoleucine for extracellular glutamine—a metabolic trade-off that releases isoleucine into the TME. Knockout of Slc3a2 in tumour cells abolished isoleucine accumulation and abrogated fasting-induced T cell enhancement. In a prospective neoadjuvant immunotherapy trial (NCT05731726), CRC patients who fasted for 16 hours before anti-PD1 infusion showed expanded Temra populations, enhanced cytotoxic signatures, and reduced tumour size compared to non-fasted controls.

Personal highlights

16-hour fasting opens a metabolic window for immunotherapy: unlike prolonged fasting regimens that are poorly tolerated, a single overnight 16-hour fast, already clinically routine before surgery, is sufficient to remodel the tumour immune microenvironment. In mice, this brief fast enhanced CD8+ T cell cytotoxicity and reduced exhaustion; in a pilot human study, it expanded cytotoxic Temra cells and improved anti-PD1 responses.
Isoleucine emerges as the critical fasting-induced metabolite: untargeted metabolomics of tumour interstitial fluid identified isoleucine as the most significantly upregulated metabolite after 16-hour fasting, specifically at the 16-hour time point. This accumulation correlated with fasting duration in CRC patients and was not observed for other branched-chain amino acids (leucine, valine) or immune-relevant amino acids (arginine, serine).
Isoleucine fuels CD8+ T cell effector function via acetyl-CoA: depletion of isoleucine impaired CD8+ T cell proliferation and IFNγ/GZMB production; supplementation in nutrient-deprived medium restored them. Isotope tracing showed conversion of ¹³C-isoleucine into acetyl-CoA and TCA intermediates. This acetyl-CoA drives histone acetylation at effector gene loci (Ifng, Gzmb, Tbx21) and supports phospholipid synthesis for membrane integrity, dual mechanisms linking a single amino acid to both epigenetic and metabolic control of cytotoxicity.
Tumour cells release isoleucine via a glutamine-exchange trade-off: under fasting-induced glutamine deprivation, tumour cells upregulate the antiporter SLC3A2/SLC7A5, exchanging intracellular isoleucine for extracellular glutamine. CRISPR screening with an isoleucine FRET sensor (OLIVE) identified SLC3A2 as the key efflux transporter. Slc3a2 knockout abolished isoleucine accumulation in TIF and abrogated fasting-induced T cell enhancement, confirming that tumour cells are the source.
Clinical proof-of-concept in neoadjuvant immunotherapy: in a prospective trial, pMMR/MSS rectal cancer patients who fasted for 16 hours before anti-PD1 infusion showed expanded Temra populations, enhanced cytotoxic signatures, and reduced tumour size compared to non-fasted controls. This demonstrates that a simple, well-tolerated dietary intervention can meaningfully improve immunotherapy outcomes.

Why should we care?

Immunotherapy has transformed cancer treatment, but most patients still don’t respond, and resistance remains a major challenge. Meanwhile, dietary interventions have shown promise but require prolonged, poorly tolerated regimens, a non-starter for patients already at risk of malnutrition and cachexia. This study flips that narrative. A single overnight 16-hour fast, already standard practice before surgery, is sufficient to reshape the tumour microenvironment in a way that enhances immunotherapy. The mechanism is elegant: fasting creates a metabolic tug-of-war where tumour cells, starved of glutamine, release isoleucine as a trade-off. CD8+ T cells capture this isoleucine and use it to fuel the very programs (epigenetic remodeling, lipid synthesis, mitochondrial respiration) that underlie effective killing. The clinical proof-of-concept, though small, is striking: patients who fasted showed expanded cytotoxic T cell populations and smaller tumours after neoadjuvant anti-PD1 therapy.

Single-cell genomics decontamination with CellSweep

Caskey, M. et al. bioRxiv (2026). https://doi.org/10.64898/2026.03.04.709349

The paper in one sentence

CellSweep is a fast, interpretable probabilistic model that removes ambient and bulk contamination from single-cell genomics data using an expectation-maximization algorithm, outperforming existing methods across multiple benchmarks while running in under a minute.

Summary

Caskey and colleagues introduce CellSweep, a generative model that decomposes observed counts for each barcode into three interpretable components: cell-type expression, ambient contamination, and global bulk contamination. The model assumes a multinomial distribution conditional on total UMI counts, with the expected expression profile for each cell as a convex combination of these sources. A key innovation is the use of non-cellular barcodes (empty droplets) to obtain an empirical estimate of the ambient RNA profile, which is stable and unbiased thanks to the large number of empty droplets typical in droplet-based assays. Bulk contamination is initialized from the global mean expression across all droplets. Cell-type labels are provided upfront (e.g., from CellTypist), and parameters are inferred via a closed-form expectation-maximization (EM) algorithm that parallelizes perfectly across cells. When non-cellular barcodes are unavailable (e.g., in well-based protocols like Smart-seq2), CellSweep offers an alternative model where ambient RNA is modeled as a mixture of cell-type profiles, with mixture weights updated via a nested EM procedure.
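The convex-combination idea can be illustrated with the textbook EM update for mixture weights when the component profiles are known. This is a simplified sketch, not CellSweep's actual model: I keep all three profiles fixed and estimate three per-cell weights, whereas the paper describes a per-cell ambient fraction and a global bulk fraction; the function name and all numbers are invented.

```python
def em_mixture_weights(counts, profiles, n_iter=200):
    """Estimate convex mixture weights for one cell, given fixed gene profiles.

    counts:   observed UMI counts per gene for one barcode
    profiles: list of probability vectors over genes (here: cell type, ambient, bulk)
    """
    k = len(profiles)
    weights = [1.0 / k] * k
    if sum(counts) == 0:
        return weights  # nothing to fit for an empty barcode
    for _ in range(n_iter):
        expected = [0.0] * k
        for g, x in enumerate(counts):
            if x == 0:
                continue
            # E-step: split the counts of gene g across sources by posterior responsibility.
            mix = [weights[j] * profiles[j][g] for j in range(k)]
            denom = sum(mix)
            if denom == 0:
                continue  # no source can explain this gene; skip it
            for j in range(k):
                expected[j] += x * mix[j] / denom
        assigned = sum(expected)
        if assigned == 0:
            break
        # M-step: closed-form update, the share of counts assigned to each source.
        weights = [v / assigned for v in expected]
    return weights


# Made-up profiles over four genes: cell-type, ambient (empty droplets), global bulk.
cell_type = [0.60, 0.30, 0.05, 0.05]
ambient = [0.10, 0.10, 0.70, 0.10]
bulk = [0.25, 0.25, 0.25, 0.25]
counts = [55, 30, 12, 3]

weights = em_mixture_weights(counts, [cell_type, ambient, bulk])
print(dict(zip(["cell_type", "ambient", "bulk"], weights)))
```

Because each cell's update only touches that cell's counts, a loop over cells can be run in parallel without any coordination, which is the property behind the reported speed.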

The authors benchmark CellSweep against SoupX, CellBender, DecontX, and scAR across multiple datasets and modalities. In a human-mouse mixture 10x dataset, CellSweep removes >98% of cross-species contamination while retaining >97% of true-species counts—substantially better than competitors. It performs similarly well on Smart-seq2 and ATAC-seq data. In a Visium HD spatial dataset with human cancer cells grafted in mouse, CellSweep not only removes cross-species contamination but also reveals spatial patterns of ambient noise: cells with high predicted ambient fractions (αᵢ) localize to tissue edges, consistent with edge artifacts. On a PBMC 8k dataset, CellSweep cleans up marker gene expression (e.g., removing neutrophil markers from non-neutrophil clusters) while preserving pan-leukocyte markers like PTPRC. It achieves this with mean removal of 668 counts per cell, less aggressive than scAR (2,647) but more effective than CellBender (121), which left substantial contamination. CellSweep is idempotent: reapplying it to already-cleaned data produces minimal additional changes, unlike CellBender and scAR, which continue to remove counts. It is also fast: on a PBMC 8k dataset, CellSweep runs in 25 seconds on 16 CPU threads, 10× faster than DecontX and SoupX, and orders of magnitude faster than neural-network-based methods requiring GPUs. In simulations with ground truth, CellSweep achieves near-perfect positive predictive value (0.981), matching DecontX and SoupX, while scAR performs poorly (0.686).
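For readers unfamiliar with species-mixing benchmarks, the two headline numbers (contamination removed, true counts retained) come down to simple bookkeeping. The sketch below shows that arithmetic on invented counts; the function name and data layout are my own, and the numbers are not from the paper.

```python
def species_mixture_metrics(cells):
    """Summarise decontamination on a species-mixing experiment.

    cells: list of (true_species, counts_before, counts_after), where each counts
    object is a dict mapping species name -> total UMI counts for that cell.
    Returns (fraction of cross-species counts removed, fraction of true counts retained).
    """
    contam_before = contam_after = true_before = true_after = 0
    for true_species, before, after in cells:
        for species, n in before.items():
            if species == true_species:
                true_before += n
            else:
                contam_before += n
        for species, n in after.items():
            if species == true_species:
                true_after += n
            else:
                contam_after += n
    removed = 1 - contam_after / contam_before
    retained = true_after / true_before
    return removed, retained


# Two made-up cells: one human, one mouse, before and after cleaning.
cells = [
    ("human", {"human": 1000, "mouse": 50}, {"human": 980, "mouse": 1}),
    ("mouse", {"human": 30, "mouse": 800}, {"human": 0, "mouse": 790}),
]
removed, retained = species_mixture_metrics(cells)
print(f"removed {removed:.1%} of contamination, retained {retained:.1%} of true counts")
```

A method that removes counts aggressively scores well on the first number and poorly on the second, which is why both are reported together.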

Personal highlights

Interpretable three-component mixture model: CellSweep explicitly models observed counts as a convex combination of cell-type expression, ambient contamination (from lysed cells), and global bulk contamination (from library prep). This decomposition, unlike black-box neural approaches, provides biologically meaningful parameters: αᵢ (per-cell ambient fraction) and β (global bulk fraction), enabling interpretability and quality control.
Empirical ambient estimation from empty droplets: by leveraging the large number of non-cellular barcodes typical in droplet-based assays, CellSweep obtains a stable, unbiased estimate of the ambient RNA profile via simple averaging. This avoids the need to infer ambient noise from cellular data alone, a key advantage over methods that must estimate everything simultaneously.
Closed-form EM with perfect parallelization: unlike variational inference or deep generative models, CellSweep’s EM algorithm has closed-form E- and M-steps that decompose independently across cells. This enables near-perfect parallelization and yields runtimes of seconds to minutes on a CPU, orders of magnitude faster than CellBender or scAR, which require GPUs and hours of compute.
Spatial mapping of ambient noise reveals edge artifacts: applying CellSweep to a Visium HD xenograft dataset, the authors show that cells with high predicted ambient fractions (αᵢ) localize to tissue edges, a striking spatial pattern that validates the model and provides a diagnostic for edge artifacts in spatial transcriptomics.
Idempotency ensures stable output: CellSweep, SoupX, and DecontX are nearly idempotent: repeated application produces minimal changes. In contrast, CellBender and scAR continue to remove counts across iterations, indicating instability and risking over-cleaning.
Versatility across technologies and modalities: CellSweep works on droplet-based (10x), combinatorial barcoding (Parse), well-based (Smart-seq2), and spatial (Visium HD) data, as well as ATAC-seq. The alternative model handles cases without empty droplets, broadening applicability.

title

Chi Hao, L. et al. bioRxiv (2026). https://doi.org/10.64898/2026.03.04.709254

The paper in one sentence

PerturbGen is a generative foundation model trained on 107 million single-cell transcriptomes that predicts how genetic perturbations introduced at one point along a cellular trajectory (differentiation, development, or immune activation) reshape downstream cell states and fate decisions.

Summary

Existing approaches for predicting perturbation responses operate within fixed cellular states; they cannot model how an intervention applied early (e.g., in a stem cell) propagates to alter later differentiated states. This limits their utility for understanding development, disease progression, and therapeutic timing. The authors develop PerturbGen, an encoder-decoder transformer that explicitly models state-to-state transitions. Cells are represented as rank-ordered, tokenized gene expression sequences (following Geneformer). During training, the model learns to predict gene expression at a target state (e.g., day 10 of differentiation) conditioned on source and intermediate states (e.g., days 0, 3, 7). This trajectory-aware architecture enables in silico perturbation: modify the source state representation (e.g., knock out a gene) and predict how that change propagates downstream. PerturbGen is pre-trained on ~107 million single-cell transcriptomes spanning embryonic, fetal, and postnatal stages, capturing diverse developmental transitions. It is then fine-tuned on task-specific time-resolved datasets.
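
As a rough illustration of what a Geneformer-style ranked representation looks like, the sketch below orders a cell's genes by expression scaled by a corpus-wide per-gene median, and models a knockout as removing that gene's token. The function names, the `max_len` default, and the toy numbers are assumptions for illustration; PerturbGen's actual tokenizer and perturbation procedure may differ.

```python
def rank_tokenize(cell_counts, gene_medians, max_len=2048):
    """Return gene names ordered by median-scaled expression, highest first."""
    scored = []
    for gene, count in cell_counts.items():
        median = gene_medians.get(gene, 0)
        if count > 0 and median > 0:
            scored.append((count / median, gene))
    # Sort by descending score; break ties alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [gene for _, gene in scored[:max_len]]


def knock_out(tokens, gene):
    """Simulate a knockout by dropping the gene's token from the sequence."""
    return [t for t in tokens if t != gene]


# Toy values, not taken from the paper.
cell = {"GAPDH": 100, "GATA1": 6, "ETV6": 4, "HBB": 0}
medians = {"GAPDH": 200, "GATA1": 2, "ETV6": 2, "HBB": 50}
tokens = rank_tokenize(cell, medians)
print(tokens)
print(knock_out(tokens, "ETV6"))
```

Scaling by the median pushes ubiquitously high genes such as housekeeping genes down the ranking, so the top of the sequence is dominated by genes that distinguish the cell state.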

Personal highlights

Trajectory-aware perturbation prediction: unlike prior models that predict effects within a single state, PerturbGen explicitly models state-to-state transitions. By conditioning target-state generation on source and intermediate states, it enables prediction of how early perturbations propagate to reshape downstream transcriptional programs, a critical capability for development, differentiation, and disease progression.
Massive pre-training on developmental transitions: pre-training on 107 million cells, including underrepresented embryonic and fetal datasets, exposes the model to diverse, densely sampled state changes. This improves generalization across tissues and contexts, as demonstrated by accurate prediction of unseen time points across three independent time-resolved datasets.
In silico perturbation atlases reveal perturbation-induced programs (PIPs): scaling in silico perturbations to 3,108 genes in hematopoiesis and 5,050 in skin organoids yields perturbation maps where genes with similar downstream effects cluster. These PIPs capture age- and lineage-specific regulatory programs (e.g., “postnatal lymphoid differentiation,” “fetal hematopoietic progenitor proliferation”) and are enriched for blood-trait-associated genes and monogenic disorder genes, demonstrating biological coherence and translational relevance.
Recapitulation of monogenic disease phenotypes: in silico ETV6 knockout in megakaryocyte progenitors predicted transcriptional changes that closely matched those observed in ETV6-related thrombocytopenia patients (81% pathway concordance). This included upregulation of MHC class II genes and downregulation of platelet programs, effects validated across patients and not driven by compositional shifts. This establishes a framework for modeling rare diseases where patient samples are limited.
Functional validation in skin organoids: PerturbGen prioritized GSK3B/Wnt activation as a candidate to promote fibroblast maturation. Experimental Wnt activation with CHIR99021 at day 6 phenocopied the predicted stromal shift, increasing transcriptional similarity to fetal skin fibroblasts. This demonstrates that trajectory-aware predictions can guide experimental optimization of complex multicellular systems.

Why should we care?

Biology is not static; it unfolds along trajectories. A stem cell today becomes a differentiated cell tomorrow; an immune cell at 90 minutes post-stimulus is not the same as at 6 hours. Yet most perturbation models treat cells as snapshots, asking: what happens if I perturb this cell in this state? They cannot ask: what happens if I perturb this cell now and look at its descendants later? PerturbGen bridges this gap. By learning how states transition across time, from 107 million cells spanning development, homeostasis, and disease, it can simulate the downstream consequences of early interventions. This is not just a technical advance; it reframes the questions we can ask.

Transfer of Damaged Mitochondria from Cancer Cells to Cancer-Associated Fibroblasts Promotes Tyrosine Kinase Inhibitor Tolerance in EGFR-Mutant Lung Cancer

Liu, T. et al. Cancer Research (2025). https://doi.org/10.1158/0008-5472.CAN-25-0433

The paper in one sentence

EGFR-mutant lung cancer cells under tyrosine kinase inhibitor stress transfer damaged mitochondria via tunneling nanotubes to a specific fibroblast subset (RGS5+MYL9+ CAFs), which act as "metabolic sinks" to reduce oxidative stress and promote drug-tolerant persister cell survival, a process that can be blocked by the FDA-approved Rho kinase inhibitor fasudil.

Summary

Liu and colleagues use single-cell RNA sequencing of treatment-naive EGFR-mutant lung adenocarcinomas to map the fibroblast landscape, identifying five distinct cancer-associated fibroblast (CAF) subsets. Among these, a previously unrecognized myofibroblast population marked by RGS5 and MYL9 stood out. In patient-derived organoid co-cultures, RGS5+MYL9+ CAFs, but not other CAF subsets, significantly attenuated osimertinib-induced cell death and promoted tumor regrowth after drug withdrawal. Higher infiltration of these CAFs correlated with advanced stage and poor prognosis in TCGA data. Mechanistically, osimertinib treatment generates mitochondrial reactive oxygen species (mtROS) in cancer cells. This triggers two parallel responses: (1) upregulation of CCL11, which recruits RGS5+MYL9+ CAFs to the drug-tolerant persister (DTP) niche, and (2) activation of Miro1 (mitochondrial Rho GTPase 1) and RhoA, which drive F-actin polymerization and the formation of tunneling nanotubes (TNTs), long membrane protrusions that connect cancer cells to adjacent CAFs. Through these nanotubes, damaged, ROS-producing mitochondria are transferred from stressed cancer cells to RGS5+MYL9+ CAFs. The CAFs accept this “toxic cargo,” thereby reducing mitochondrial burden and oxidative stress in the cancer cells and promoting DTP survival. The transferred mitochondria in CAFs show elevated mtROS and dysfunction, confirming they are indeed damaged. In vivo, xenografts containing RGS5+MYL9+ CAFs showed reduced tumor regression on osimertinib and accelerated regrowth after withdrawal, with evidence of mitochondrial transfer detectable by flow cytometry and confocal imaging. Blocking CCL11 with a neutralizing antibody reduced CAF recruitment, delayed relapse, and improved survival. Critically, the Rho kinase inhibitor fasudil, already FDA-approved for cerebral vasospasm, blocked TNT formation by inhibiting RhoA activity.
In xenograft models, combining osimertinib with fasudil significantly delayed tumor relapse and extended survival, even when treatment was initiated after minimal residual disease (MRD) was established. Human specimens from neoadjuvant osimertinib-treated patients showed increased RGS5+MYL9+ CAF infiltration and closer proximity to residual tumor cells, confirming clinical relevance.

Personal highlights

scRNA-seq identifies RGS5+MYL9+ CAFs as a clinically relevant subset: unbiased profiling of treatment-naive EGFR-mutant lung adenocarcinomas revealed five CAF subsets, including a novel myofibroblast population co-expressing RGS5 and MYL9. In patient-derived organoid co-cultures, only this subset conferred osimertinib resistance and promoted tumor regrowth after drug withdrawal. Higher RGS5+MYL9+ CAF infiltration correlated with advanced stage and poor prognosis in TCGA, and was enriched in post-treatment residual tumors from patients.
Damaged mitochondria are transferred from cancer cells to CAFs via tunneling nanotubes: under osimertinib stress, cancer cells form F-actin-rich membrane protrusions (tunneling nanotubes) that connect to adjacent RGS5+MYL9+ CAFs. Through these nanotubes, damaged, ROS-producing mitochondria are transferred from cancer cells to CAFs, visualized by mitoDsRed labeling, confocal imaging, and flow cytometry. This is not a one-way transfer of healthy mitochondria to cancer cells (as previously described), but rather a disposal mechanism where cancer cells offload damaged organelles to stromal “sinks.”
Miro1 and RhoA mediate nanotube formation and mitochondrial trafficking: Osimertinib-induced mtROS upregulates Miro1 (mitochondrial Rho GTPase 1), which moves damaged mitochondria toward the cell periphery, and activates RhoA, which drives F-actin polymerization to form nanotubes. Miro1 knockdown or RhoA inhibition (with fasudil) abrogates mitochondrial transfer and restores drug sensitivity. This establishes a molecular pathway linking oxidative stress to intercellular organelle transfer.
CCL11 recruits RGS5+MYL9+ CAFs to the DTP niche: DTP cells secrete CCL11, which acts as a chemoattractant specifically for RGS5+MYL9+ CAFs (not other subsets). Neutralizing CCL11 reduces CAF accumulation around stressed cancer cells, decreases mitochondrial transfer, and delays tumor relapse in vivo. This reveals a two-step mechanism: recruitment followed by nanotube-mediated transfer.
Fasudil, an FDA-approved drug, blocks TNT formation and prevents relapse: the Rho kinase inhibitor fasudil, already used clinically for cerebral vasospasm, effectively blocks TNT formation by inhibiting RhoA. In xenograft models, combining osimertinib with fasudil—even when started after MRD establishment—significantly delayed tumor relapse and extended survival. This offers an immediately translatable strategy to overcome TKI tolerance.

Why should we care?

Drug-tolerant persister cells are the hidden seeds of relapse in EGFR-mutant lung cancer: they survive initial therapy through non-genetic adaptations, then eventually regrow as fully resistant tumors. For years, we’ve known they exist, but we haven’t known how the microenvironment supports them. This work reveals a remarkable mechanism: stressed cancer cells don’t just suffer in silence. They actively recruit specific fibroblasts, hand off their damaged mitochondria like toxic waste, and thereby reduce their own oxidative burden to survive. The fibroblast acts as a “metabolic sink,” accepting damage to protect the cancer cell. This flips the conventional narrative of mitochondrial transfer (healthy mitochondria moving into cancer cells) on its head. The molecular pathway is unusually complete: from the initial ROS signal, to Miro1-mediated mitochondrial positioning, to RhoA-driven nanotube formation, to CCL11-mediated recruitment. And crucially, each node is targetable.

Plasticity of squamous differentiation drives drug resistance in HNSCC

Sipilä, K. et al. bioRxiv (2026). https://doi.org/10.64898/2026.03.09.710514

The paper in one sentence

A subset of head and neck squamous cell carcinoma cells resists differentiation-inducing signals, including the clinically used ErbB inhibitor afatinib, retaining clonogenic and tumorigenic potential despite expressing differentiation markers, revealing that differentiation and loss of self-renewal are uncoupled in these cells.

Summary

Differentiation therapy, forcing cancer cells to terminally differentiate and lose self-renewal, has transformed outcomes in acute promyelocytic leukaemia, but has shown limited success in solid tumours. Why? Sipilä and colleagues address this question using patient-derived head and neck squamous cell carcinoma (HNSCC) lines (SJGs) cultured on feeder layers, a system that preserves the mutational heterogeneity of primary tumours. When transplanted orthotopically into immunocompromised mice, these lines recapitulate the histological diversity of human HNSCC, including variable differentiation status, stromal desmoplasia, and perineural invasion. In normal keratinocytes, detachment from the basement membrane (methylcellulose suspension) triggers terminal differentiation and irreversible loss of clonogenic potential. HNSCC cells also upregulate differentiation markers (IVL, TGM1) in suspension, but they do not lose clonogenic capacity to the same extent. Immunostaining revealed a heterogeneous response: some cells became Ki67⁻ and expressed differentiation markers, but a subset remained Ki67⁺ or failed to upregulate IVL/TGM1 entirely. To track the fate of clonogenic cells in vivo, the authors used lentiviral fluorescent barcoding (mRuby2, mTagBFP2, acGFP). Cells pre-treated with methylcellulose suspension for 20h showed only a minor delay in tumour growth and no significant change in clonal density or clone size, indicating that the cells responsible for tumour formation are largely resistant to transient differentiation signals. A small-molecule screen targeting pathways known to regulate keratinocyte differentiation identified ErbB-MEK1/2-ERK1/2 inhibition (afatinib, PD0325901, VX-11e) as the most effective at inducing IVL expression. Afatinib, already clinically used in HNSCC, increased differentiation marker expression but, like methylcellulose, left a substantial fraction of cells undifferentiated. 
Fluorescent barcoding after afatinib pre-treatment showed no reduction in tumour growth and no change in clonal architecture: the tumorigenic cells were unaffected by drug-induced differentiation. Using an IVL promoter-driven mCherry reporter, the authors sorted cells by differentiation status after afatinib treatment. IVL-high cells formed markedly smaller tumours than IVL-low cells, but some IVL-high cells still generated progeny, and tumours derived from IVL-low cells remained capable of producing differentiated cells upon re-challenge. Even at supra-clinical concentrations, an afatinib-resistant subpopulation persisted. The key finding: differentiation and loss of self-renewal are partially uncoupled in HNSCC. Cells can express differentiation markers while retaining clonogenic potential, and the most tumorigenic cells are those that resist differentiation cues, not because they cannot differentiate but because they can escape the irreversible cell-cycle exit that normally accompanies it.

Personal highlights

Patient-derived models preserve heterogeneity: SJG lines cultured on feeder layers retain the mutational landscape of primary tumours (TP53, PIK3CA, FAT1, NOTCH1, CDKN2A). Orthotopic xenografts recapitulate key histopathological features: differentiation status, desmoplasia, perineural invasion, providing a clinically relevant platform to study differentiation dynamics.
Differentiation and self-renewal are uncoupled in HNSCC: in normal keratinocytes, detachment induces terminal differentiation and irreversible loss of clonogenicity. HNSCC cells upregulate differentiation markers (IVL, TGM1) in suspension but retain colony-forming ability. A subset of cells remains Ki67⁺ or fails to express differentiation markers entirely, revealing intrinsic heterogeneity in the response.
Clonogenic tumour-initiating cells resist differentiation signals: fluorescent barcoding enabled lineage tracing of individual clones in vivo. Pre-treatment with methylcellulose or afatinib did not reduce tumour growth, clonal density, or clone size. The cells that drive tumour formation are largely unaffected by differentiation-inducing stimuli.
ErbB-MAPK inhibition promotes differentiation but spares tumorigenic cells: a focused screen identified afatinib, MEKi (PD0325901), and ERKi (VX-11e) as the most potent inducers of IVL expression. Yet, even at supra-clinical concentrations, a fraction of cells remained undifferentiated, and these corresponded to the most clonogenic population in vivo.
IVL reporter reveals graded differentiation states: cells sorted by IVL-mCherry intensity after afatinib treatment showed an inverse relationship between IVL expression and tumorigenic potential. However, some IVL-high cells still formed tumours, and IVL-low cells remained capable of producing differentiated progeny upon re-challenge. Differentiation status is not a binary switch but a spectrum, and cells can move along it without losing self-renewal.

Why should we care?

Differentiation therapy transformed acute promyelocytic leukaemia from a deadly disease to one with >90% cure rates. The idea is elegant: instead of killing cancer cells, force them to mature into harmless, post-mitotic cells. But for solid tumours, this strategy has repeatedly failed. This work explains why. In HNSCC, differentiation and loss of self-renewal are not tightly coupled. Cancer cells can express differentiation markers (they look like they’re maturing) while retaining the ability to divide and form tumours. The cells that actually sustain tumour growth are precisely those that resist differentiation cues, not because they can’t differentiate, but because they can escape the irreversible cell-cycle exit that normally accompanies it. The clinical implications are sobering. Afatinib, already used in HNSCC, does induce differentiation, but it doesn’t eliminate the tumorigenic cells. Even at concentrations exceeding those achieved in patients, a resistant subpopulation persists. This suggests that simply measuring differentiation markers in response to therapy may overestimate efficacy; what matters is whether the clonogenic cells are eliminated.

Benchmarking computational tools for locus-specific analysis of transposable elements in single-cell RNA-seq datasets

Finazzi, V. et al. bioRxiv (2026). https://doi.org/10.64898/2026.02.26.708244

The paper in one sentence

This systematic benchmark evaluates computational tools for locus-specific transposable element quantification in short-read scRNA-seq, revealing that while older elements are reliably quantified, young repetitive TEs remain intrinsically difficult to resolve, and gene-TE misassignment is a pervasive, underappreciated challenge.

Summary

Transposable elements (TEs) are increasingly recognized as regulators of gene expression and cellular identity, but their repetitive nature makes them difficult to quantify, especially at single-locus resolution in sparse, 3’-biased single-cell RNA-seq data. Several tools have been developed, but their relative performance has not been systematically evaluated against ground truth. The authors present a comprehensive benchmarking framework combining real datasets (mouse ESCs, olfactory mucosa, human PBMCs) with controlled simulations that provide read-level ground truth. They evaluate three tools capable of locus-specific quantification: SoloTE, Stellarscope, and STARsolo (with and without EM-based multimapper handling). First, they show that TE-derived reads constitute a substantial fraction of scRNA-seq data (>24% across datasets) and that TE expression profiles alone can resolve cell types, often revealing substructure not apparent in gene-based clustering. However, the proportion of multimapping TE reads varies dramatically by cell state (e.g., highest in 2-cell-like cells expressing young TEs). The simulations, stratified by TE age (old vs. young), mixing, and inclusion of genes, reveal sharp performance contrasts. For old TEs, all tools achieve near-perfect detection and quantification. For young TEs, detection is plagued by false positives across all methods, with limited tool agreement. Including multimappers (via EM or threshold lowering) increases false positives without consistently improving accuracy. Stellarscope’s EM algorithm partially mitigates noise but at the cost of sensitivity; its posterior probability thresholds can be tuned, but the optimal trade-off depends on the analysis goal. Family-level analysis shows striking heterogeneity: L1 and ERVL elements are hardest to resolve accurately, while SINEs (Alu, B2) perform better. 
Aggregating to the subfamily level dramatically improves precision, confirming that the core challenge is locus-specific assignment, not family-level detection. Critically, gene-TE misassignment is a major, bidirectional problem. Reads from expressed genes are frequently misassigned to overlapping TE loci, and vice versa. Stellarscope, which does not filter gene-overlapping reads, is most affected, but all tools show some degree of cross-assignment. This confound can strongly bias biological interpretation. The authors distill their findings into practical recommendations: (i) use locus-level quantification confidently for older elements, but interpret young-locus calls with caution; (ii) prefer unique-mapper strategies (SoloTE default) when precision is paramount; (iii) for discovery-scale surveys, aggregate to subfamily level for robustness; (iv) explicitly check and report gene-TE overlaps.

Personal highlights

TE-derived reads are abundant and biologically informative in scRNA-seq: across three diverse datasets, >24% of reads mapped to TE loci, reads typically discarded in standard pipelines. TE expression profiles alone resolved major cell types and, in some cases, revealed substructure not visible in gene-based clustering, demonstrating that TEs encode meaningful biological signal.
Age matters: old TEs are reliable, young TEs are problematic: evolutionary age is the dominant predictor of quantification accuracy. Old elements (>2 million years) were detected and quantified with near-perfect precision across tools. Young elements, by contrast, generated pervasive false positives regardless of method, with limited tool agreement. This reflects fundamental sequence-level constraints: young TEs are too similar to resolve with short reads.
Multimapper handling offers limited gains, at a cost: including multimapped reads, via EM algorithms (Stellarscope, STARsolo) or threshold lowering (SoloTE), increased false positives without consistently improving accuracy. EM improved precision modestly but reduced sensitivity. For most applications, unique-mapper strategies (SoloTE default) performed comparably while producing fewer false positives, suggesting that aggressive multimapper inclusion may do more harm than good.
Gene-TE misassignment is pervasive and bidirectional: reads from expressed genes were frequently misassigned to overlapping TE loci, and TE-derived reads were misassigned to genes. Stellarscope was most affected (it does not filter gene-overlapping reads), but all tools showed cross-assignment. This confound can severely bias interpretation—for example, inflating apparent TE activity in gene-rich regions or masking genuine TE signals.
Family-specific performance guides tool choice: performance varied dramatically by TE family. L1 and ERVL elements (long, homogeneous) were hardest to resolve accurately; SINEs (Alu, B2) performed better. This suggests that optimal tool selection may depend on which families are expected to be active in a given biological system, and that family-aware quality control is essential.
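
The gap between locus-level and subfamily-level precision is easy to see with a toy calculation. The sketch below scores read assignments against simulated ground truth at both levels. The `subfamily:index` locus naming, the function names, and the reads are hypothetical and are not the benchmark's actual data or code.

```python
def subfamily(locus):
    """Map a locus label such as 'L1Md_T:12' to its subfamily ('L1Md_T')."""
    return locus.split(":")[0]


def precision(assigned, truth, level=lambda locus: locus):
    """Fraction of assigned reads whose label matches the truth at a given level."""
    hits = 0
    total = 0
    for read, locus in assigned.items():
        if locus is None:  # read left unassigned by the tool
            continue
        total += 1
        if level(locus) == level(truth[read]):
            hits += 1
    return hits / total if total else float("nan")


# Toy read-level ground truth and tool output (illustrative only).
truth = {"r1": "L1Md_T:1", "r2": "L1Md_T:2", "r3": "B2_Mm2:7", "r4": "L1Md_T:3"}
assigned = {"r1": "L1Md_T:1", "r2": "L1Md_T:5", "r3": "B2_Mm2:7", "r4": "L1Md_A:9"}

print(precision(assigned, truth))             # locus level
print(precision(assigned, truth, subfamily))  # subfamily level
```

Read `r2` lands on the wrong copy of the right subfamily, so it counts as an error at locus level but as a hit after aggregation, which is the pattern the benchmark reports for young elements.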

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 23/2/26

Sebastiaan Vanuytven — Sun, 08 Mar 2026 13:34:39 GMT

This week’s reads show how cancer biology is deeply shaped by context across multiple scales—from metabolism and tissue architecture to chromatin state and experimental models. One study overturns a long-standing paradigm by showing that systemic hypoxia suppresses tumor growth, not by activating canonical HIF programs but by crippling de novo purine synthesis and starving tumors of nucleotides. At the tissue level, the Wayfarer framework demonstrates that spatial gene expression patterns shift across length scales during lung cancer progression, revealing immune exclusion and tumor architecture changes invisible at any single resolution. New experimental platforms also expand how we study tumor heterogeneity and therapy response: GENEVA enables multiplexed in vivo pharmacogenomics by pooling diverse cancer models into mosaic tumors, uncovering mitochondrial hyperactivation as a mechanism of KRAS inhibitor killing and EMT-driven resistance. Computationally, REMAP reconstructs spatial tissue architecture from dissociated single-cell RNA-seq data, enabling spatial analysis of existing atlases without direct spatial measurements. Meanwhile, studies of early tumorigenesis and therapeutic resistance emphasize the importance of cellular plasticity and microenvironmental interactions, from fibroblast niche remodeling that determines whether nascent tumors persist, to chromatin-mediated EMT programs that drive resistance to KRAS-targeted therapies.

Preprints/articles that I managed to read this week

Systemic hypoxia suppresses solid tumor growth

Midha, A. D. et al. bioRxiv (2026). https://doi.org/10.64898/2026.02.09.704975

The paper in one sentence

Systemic hypoxia, reducing atmospheric oxygen to 8–11%, suppresses tumor growth across multiple cancer models by inhibiting de novo purine synthesis, an effect distinct from local tumor hypoxia, independent of HIF activation, and synergistic with chemotherapy and immunotherapy.

Summary

Local tumor hypoxia is a well-established negative prognostic factor, driving angiogenesis, therapy resistance, and aggressive progression. But what happens when the entire host is hypoxic? Midha and colleagues address this question across a remarkable range of models: syngeneic subcutaneous and orthotopic tumors (Panc02 pancreatic, E0771 breast), a genetically engineered KPC pancreatic cancer model, and a 20-cell-line mosaic xenograft pool (GENEVA) enabling lineage-resolved fitness measurements. Systemic hypoxia (8–11% O₂) consistently suppressed tumor growth, with orthotopic breast tumors in 8% O₂ growing ~60% slower than normoxic controls. Direct oxygen measurements confirmed tumors were indeed more hypoxic. Yet the mechanism defied obvious explanations. Hypoglycemia (a known effect of hypoxia) was not responsible, since glucose supplementation failed to rescue growth. Constitutive insulin signaling via PTEN knockout did not overcome suppression. And HIF activation, while present, was not required: ARNT-knockout tumors remained sensitive. The GENEVA platform revealed heterogeneous responses: most lines (18/20) showed reduced fitness, but two renal cell carcinoma lines (786O, Caki1) were resistant. By correlating transcriptional changes with fitness, the authors identified de novo purine synthesis genes (PPAT, MTHFD1, PAICS, ATIC) as strongly associated with sensitivity: their downregulation in hypoxia predicted growth suppression. In contrast, purine salvage genes showed no relationship. Metabolomics confirmed depletion of nucleotides (especially AMP, ADP, ATP) in hypoxic tumors and accumulation of the salvage intermediate hypoxanthine in tumor interstitial fluid. Stable isotope tracing with ¹⁵N-glutamine and ¹³C-adenine in vitro and in vivo demonstrated that hypoxia suppresses de novo purine synthesis while increasing reliance on salvage. This aligns with prior work showing de novo purine synthesis is essential for in vivo tumor growth but dispensable in culture.
Systemic hypoxia proved durable: tumors serially reimplanted over four passages showed no resistance. It synergized with gemcitabine in pancreatic cancer and with anti-CTLA4 immunotherapy in breast cancer, with the triple combination nearly abolishing growth. Finally, the authors show that HypoxyStat, a small molecule that increases hemoglobin’s oxygen affinity, mimicking systemic hypoxia, recapitulates the tumor-suppressive effect.

Personal highlights

Paradoxical tumor suppression by systemic hypoxia: while local tumor hypoxia promotes aggression, systemic hypoxia (8–11% O₂) consistently suppresses growth across subcutaneous, orthotopic, genetic, and multiplexed xenograft models. This challenges the long-held paradigm that hypoxia uniformly supports cancer progression and reveals that the scale of hypoxia, local vs. systemic, dramatically alters its consequences.
GENEVA platform enables lineage-resolved fitness mapping: by pooling 20 human cancer cell lines into mosaic tumors and using SNP-based deconvolution, the authors measure relative fitness of each line under hypoxia versus normoxia. This reveals heterogeneous responses (most lines sensitive, two renal cell lines resistant) and enables correlation of transcriptional changes with fitness, pinpointing de novo purine synthesis as the key pathway.
De novo purine synthesis, not HIF or glucose, drives sensitivity: hypoxia-induced growth suppression persists despite glucose supplementation, constitutive insulin signaling (PTEN knockout), and HIF inactivation (ARNT knockout). Instead, metabolomics and stable isotope tracing show that systemic hypoxia suppresses de novo purine synthesis, shifting tumors toward salvage pathways. Genes in this pathway (PPAT, MTHFD1, PAICS, ATIC) are among those whose downregulation best predicts sensitivity.
Synergy with chemotherapy and immunotherapy, no acquired resistance: systemic hypoxia enhances the efficacy of gemcitabine in pancreatic cancer and anti-CTLA4 in breast cancer, with combination therapy nearly abolishing tumor growth. Serial reimplantation over four passages shows no evidence of resistance: tumors remain sensitive to hypoxia therapy.
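
The pooled fitness readout in the GENEVA highlight can be sketched in a few lines: a line's relative fitness is the log2 change in its share of the mosaic tumor between conditions, which can then be correlated with a gene's expression change across lines. This is a minimal illustration under assumed inputs (per-line cell counts after deconvolution). The function names and numbers are hypothetical, and the authors' actual statistics may differ.

```python
import math


def relative_fitness(treated, control):
    """log2 change in each line's share of the pool (treated vs. control).

    Assumes both dicts share the same keys and contain no zero counts;
    real data would need a pseudocount.
    """
    total_t = sum(treated.values())
    total_c = sum(control.values())
    return {
        line: math.log2((treated[line] / total_t) / (control[line] / total_c))
        for line in control
    }


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy cell counts per line (illustrative, not from the paper).
normoxia = {"line_A": 100, "line_B": 100, "line_C": 200}
hypoxia = {"line_A": 50, "line_B": 150, "line_C": 200}
fitness = relative_fitness(hypoxia, normoxia)

# Hypothetical log2 expression change of one gene in each line under hypoxia.
expr_change = {"line_A": -1.2, "line_B": 0.4, "line_C": -0.1}
lines = sorted(fitness)
print(pearson([expr_change[l] for l in lines], [fitness[l] for l in lines]))
```

Because the measure is a share of the pool, it is relative by construction: a line can appear to gain fitness simply because its neighbors lose more.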

Why should we care?

For decades, hypoxia has been cast as a villain in cancer: a driver of angiogenesis, therapy resistance, and poor outcomes. This work turns that narrative on its head by showing that systemic hypoxia does something fundamentally different from local tumor hypoxia. When the entire host experiences low oxygen, the tumor cannot rely on well-oxygenated pockets to supply nucleotides or salvageable substrates. The division of labor that sustains growth in heterogeneous tumors collapses. The mechanistic finding, suppression of de novo purine synthesis, is particularly compelling. Purine synthesis is energetically costly (4 ATP, 1 GTP per AMP), and cancer cells in vivo depend on it more than cells in culture. Systemic hypoxia seems to exploit this vulnerability, shifting tumors toward salvage pathways that cannot keep up with proliferative demand. The translational implications are substantial. Systemic hypoxia synergizes with both chemotherapy and immunotherapy, and the small molecule HypoxyStat offers a practical route to achieve it without altitude or hypoxic chambers. The epidemiological association with altitude, while only correlational, adds a layer of real-world plausibility.

Wayfarer: A multiscale framework for spatial analysis of tumor progression

Moses, L. et al. bioRxiv (2026). https://doi.org/10.64898/2026.02.16.706245

The paper in one sentence

Wayfarer is a multiscale spatial analysis framework that tracks how spatial statistics, Moran's I, Lee's L, and MULTISPATI PCA, evolve across nested spatial aggregations, revealing that tumor progression in lung adenocarcinoma is accompanied by reproducible shifts in scale-dependent spatial patterns that are invisible at any single resolution.

Summary

Spatial biology operates across length scales, from subcellular organization to tissue architecture, yet most spatial transcriptomics analyses implicitly assume that biological patterns are scale-consistent, choosing a single resolution (e.g., Visium spots, 8 μm bins, or one neighborhood radius) and treating scale as a technical nuisance rather than a biological variable. The authors demonstrate that this assumption is false. Using Xenium data from a lung adenocarcinoma (LUAD) progression series (AIS A, AIS B, MIA C, IA), they systematically aggregate transcript counts into square bins ranging from 8 μm to 384 μm and compute how three spatial metrics change with scale: Moran’s I (spatial autocorrelation), Lee’s L (spatially informed correlation), and MULTISPATI PCA (spatially informed dimension reduction). The results reveal that spatial patterns at fine and coarse scales can co-exist. For example, genes can exhibit bimodal Moran’s I curves, with peaks at both small and large bin sizes, indicating spatial structure at two distinct scales simultaneously, a phenomenon reproduced in synthetic data with sparse large spots and dense small spots. Lee’s L curves similarly show that gene co-expression relationships can be scale-dependent, with some gene pairs exhibiting flat curves that nonetheless conceal qualitatively different spatial organizations at different scales. Crucially, these scale-response profiles differ reproducibly between LUAD stages. Using linear mixed models with spline terms, the authors test whether Moran’s I or Lee’s L curves vary significantly across stages. Over 80% of genes show significant stage-dependent effects, not explained by differential expression alone.
ERBB2, for instance, is not significantly differentially expressed between stages in pseudobulk, but its Moran’s I at fine scales is markedly higher in invasive adenocarcinoma (IA) than in earlier stages, reflecting coherent, homogeneous ERBB2-high tumor blocs that emerge only in late disease. This difference disappears at coarser (96 μm) resolutions comparable to Visium, suggesting that analyses at a single scale would miss this progression-associated phenotype. Immune markers reveal a fundamental restructuring of the tumor immune geography. ITGAE (CD103, tissue-resident memory T cells) shows minimal single-cell spatial autocorrelation across all stages, but at coarse scales, Moran’s I increases in IA, consistent with T cells becoming spatially restricted to boundaries rather than intermingled with tumor cells. GZMB (cytotoxic T cells) shifts from fine-scale infiltration to coarse-scale pockets at invasive margins. CXCL9, a T-cell-attracting chemokine, transitions from diffuse patterns in early stages to boundary-localized clustering in IA. These shifts from fine-scale mixing to coarse-scale exclusion would be invisible without multiscale analysis.
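
To make the "statistic as a function of bin size" idea concrete, here is a minimal sketch in Python. It is not Wayfarer's implementation: the function names, the rook (edge-sharing) neighbour weights, the toy point pattern, and the 1536 μm field of view are all assumptions chosen for illustration. It simply re-bins the same points at several sizes and computes Moran's I on each grid.

```python
import numpy as np


def bin_counts(x, y, values, bin_size, extent):
    """Sum per-point values into square bins of side `bin_size` on [0, extent)^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_bins = int(np.ceil(extent / bin_size))
    ix = np.minimum((x // bin_size).astype(int), n_bins - 1)
    iy = np.minimum((y // bin_size).astype(int), n_bins - 1)
    grid = np.zeros((n_bins, n_bins))
    np.add.at(grid, (iy, ix), values)  # unbuffered add, safe for repeated bins
    return grid


def morans_i_grid(grid):
    """Moran's I on a regular grid with binary rook (edge-sharing) weights."""
    z = grid - grid.mean()
    denom = (z ** 2).sum()
    if denom == 0:
        return float("nan")  # constant grid (or a single bin): undefined
    # Every horizontal/vertical neighbour pair counts in both directions.
    cross = 2.0 * ((z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum())
    n_rows, n_cols = grid.shape
    w_sum = 2.0 * (n_rows * (n_cols - 1) + (n_rows - 1) * n_cols)
    return (grid.size / w_sum) * (cross / denom)


def moran_curve(x, y, values, bin_sizes, extent):
    """Moran's I of the binned signal at each bin size (one scale-response curve)."""
    return {b: morans_i_grid(bin_counts(x, y, values, b, extent)) for b in bin_sizes}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    extent = 1536.0  # hypothetical field of view in micrometres
    x = rng.uniform(0, extent, 20_000)
    y = rng.uniform(0, extent, 20_000)
    # Toy signal: higher expected counts in the left half of the field.
    counts = rng.poisson(np.where(x < extent / 2, 3.0, 1.0))
    sizes = [8, 16, 32, 64, 128, 256, 384]
    for size, value in moran_curve(x, y, counts, sizes, extent).items():
        print(f"{size:>4} um  Moran's I = {value:.3f}")
```

Comparing such curves between samples, rather than a single value at one bin size, is the kind of analysis the paper argues for; the stage-wise testing with spline-based mixed models is not reproduced here.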

Personal highlights

Multiscale spatial analysis reveals co-existing patterns at different scales: by systematically aggregating Xenium data across bin sizes from 8 μm to 384 μm, the authors show that spatial autocorrelation (Moran’s I) and gene co-expression (Lee’s L) can exhibit multiple peaks or plateaus, indicating spatial structure at multiple scales simultaneously. Bimodal curves in real data, reproduced in synthetic data with sparse large spots and dense small spots, demonstrate that fine-scale and coarse-scale patterns are not mutually exclusive but can co-exist within the same tissue section.
Stage-specific spatial signatures invisible at single resolution: linear mixed models with spline terms reveal that over 80% of genes have Moran’s I curves that differ significantly between LUAD stages, independent of differential expression. ERBB2 exemplifies this: not differentially expressed in pseudobulk, but with markedly higher fine-scale Moran’s I in invasive adenocarcinoma (IA) than in earlier stages, reflecting coherent ERBB2-high tumor blocs that emerge only in late disease. This difference disappears at Visium-like resolutions, showing how single-scale analyses can miss progression-associated phenotypes.
Immune geography shifts from fine-scale infiltration to coarse-scale exclusion: T-cell markers (ITGAE, GZMB) and the chemokine CXCL9 exhibit scale-dependent changes with progression. In early stages, these markers show fine-scale spatial mixing with tumor cells; in IA, they become restricted to coarse-scale pockets at invasive margins. This shift from infiltration to exclusion, a hallmark of immune evasion, is only detectable by comparing how spatial statistics change across scales, not by any single-resolution measurement.
Gene co-expression relationships are scale-dependent and stage-specific: Lee’s L curves for gene pairs with known biological relevance (e.g., ERBB2-PRG4, SPP1-APOE, SPP1-CXCL9) show that correlation can change sign with scale and differ between stages. For ERBB2-PRG4, weak fine-scale correlation across all stages gives way to negative correlation in IA at coarse scales, reflecting PRG4 exclusion from ERBB2-high regions. For SPP1-APOE, positive correlation at fine scales transitions to negative correlation in IA, with local analysis revealing co-existing zones of positive and negative correlation that cancel out in global metrics—a phenomenon invisible without multiscale decomposition.
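
The claim that a gene pair's correlation can change sign with scale is easy to reproduce on toy data. The sketch below is an illustration only, not the authors' code: it uses Lee's L in its standard form (Lee, 2001) with a row-standardised weight matrix that includes each location itself, and it mimics "coarser scale" by widening the neighbourhood on a hypothetical 1D ring of locations instead of re-binning as Wayfarer does. All names and the synthetic signals are assumptions.

```python
import numpy as np


def ring_weights(n, radius):
    """Row-standardised weights on a ring: neighbours within `radius` steps, self included."""
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, n - d)  # wrap-around distance
    w = (d <= radius).astype(float)
    return w / w.sum(axis=1, keepdims=True)


def lees_l(x, y, w):
    """Lee's L: correlation of the spatially smoothed (lagged) deviations of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx, zy = x - x.mean(), y - y.mean()
    lag_x, lag_y = w @ zx, w @ zy
    scale = x.size / (w.sum(axis=1) ** 2).sum()
    return scale * (lag_x @ lag_y) / np.sqrt((zx ** 2).sum() * (zy ** 2).sum())


if __name__ == "__main__":
    n = 200
    i = np.arange(n)
    coarse = 0.5 * np.sin(2 * np.pi * i / n)  # shared large-scale trend
    fine = np.where(i % 2 == 0, 1.0, -1.0)    # opposing fine-scale pattern
    gene_a = coarse + fine
    gene_b = coarse - fine
    for radius in (0, 1, 2, 5, 10):
        value = lees_l(gene_a, gene_b, ring_weights(n, radius))
        print(f"radius {radius:>2}: Lee's L = {value:.3f}")
```

With radius 0 the weight matrix is the identity and Lee's L reduces to the ordinary Pearson correlation, which is negative here because the fine-scale pattern dominates; once the neighbourhood is wide enough to average that pattern out, the shared coarse trend takes over and the sign flips.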

Why should we care?

Spatial transcriptomics has given us the ability to see where genes are expressed in tissue, but most analyses implicitly assume that biological patterns are scale-invariant, that the right resolution can be chosen once and applied universally. Wayfarer demonstrates that this assumption is not just oversimplified but actively misleading. The same tissue can contain spatial structures at multiple scales simultaneously, and the relationship between scales can change with disease progression in ways that single-resolution analyses completely miss. The implications are profound. When we choose a single bin size or neighborhood radius, whether by convention (Visium spots), convenience (8 μm Xenium bins), or algorithmic default, we are not just simplifying; we are selecting which biological phenomena we can see. The ERBB2 example shows that a key progression-associated phenotype (coherent ERBB2-high tumor blocs in invasive cancer) is detectable at fine scales but disappears at Visium-like resolutions. The immune geography examples show that the shift from T-cell infiltration to exclusion, a central mechanism of immune evasion, manifests as a change in scale-dependent behavior, not a simple change in cell counts or correlation magnitude.

The GENEVA platform models tumor mosaicism to reveal variations of responses to KRAS inhibitors and identify improved drug combinations

Yu, J. X. et al. Nature Cancer (2026). https://doi.org/10.1038/s43018-026-01130-5

The paper in one sentence

GENEVA is a scalable platform that pools multiple cancer cell lines or patient-derived models into mosaic tumors, enabling single-cell-resolution profiling of drug responses across diverse genetic backgrounds within a single in vivo experiment, revealing that KRAS-G12C inhibitors kill cancer cells via mitochondrial hyperactivation and that EMT is a prominent in vivo resistance mechanism.

Summary

Preclinical cancer drug development relies on xenograft models that are costly, labor-intensive, and difficult to scale, limiting the number of genetic backgrounds that can be tested before clinical trials, where efficacy across diverse patients determines success. Yu, Suh, and colleagues introduce GENEVA (GENetically diverse and Endogenously controlled phenotypic Variation Assay), a platform that addresses this by pooling tens of cell lines or patient-derived models into mosaic 3D cultures or xenograft tumors. After drug treatment, single-cell RNA-seq combined with SNP-based deconvolution and MULTI-seq hashing assigns every cell to its line of origin and treatment condition, enabling simultaneous measurement of sensitivity, cell cycle state, and transcriptomic response across models. Applying GENEVA to KRAS-G12C inhibitors (ARS-1620, sotorasib, adagrasib) across panels of lung cancer lines, patient-derived organoids, and xenografts, the authors uncover several unexpected findings. First, cells surviving inhibitor treatment have significantly lower mitochondrial transcript content. CRISPRi screens confirm that knockdown of mitoribosomal and mitochondrial genes confers resistance. Acute treatment rapidly increases mitochondrial membrane potential, spare respiratory capacity, and oxygen consumption, effects that precede caspase cleavage and are specific to KRAS-G12C inhibition (not seen with other chemotherapies). Inhibiting complex III with antimycin A rescues cell death, demonstrating that mitochondrial hyperactivation is a direct mechanism of on-target killing. Second, GENEVA identifies mTOR and EMT as key tolerance pathways. mTOR signature genes are upregulated in persister cells, and combining KRAS-G12C inhibitors with the mTOR inhibitor INK128 shows strong Bliss synergy in vitro and in vivo. EMT genes emerge strongly only in vivo, not in vitro, highlighting the importance of the tumor microenvironment.
Combining ARS-1620 with the TGFβ receptor inhibitor galunisertib (targeting EMT) synergistically reduces tumor growth. Third, in vivo CRISPR screens validate GENEVA-prioritized targets: knocking down mTOR or EMT pathway genes sensitizes cells to KRAS-G12C inhibitors, while knocking down mitochondrial ribosomal genes protects them. Finally, GENEVA enables systematic mapping of drug combination synergies at single-cell resolution, revealing that mitochondrial genes are downregulated in synergistic combinations (ARS-1620 + galunisertib, ARS-1620 + INK128), consistent with the mechanism of action.

Personal highlights

Multiplexed in vivo pharmacogenomics at single-cell resolution: GENEVA pools tens of cell lines or patient-derived models into mosaic xenografts, then uses SNP-based deconvolution and MULTI-seq hashing to assign every sequenced cell to its line of origin and treatment condition. This enables simultaneous measurement of drug sensitivity, cell cycle effects, and transcriptomic responses across diverse genetic backgrounds within a single mouse, dramatically scaling preclinical pharmacogenomics while controlling for technical variation.
Mitochondrial hyperactivation as a mechanism of KRAS-G12C inhibitor killing: cells surviving treatment show reduced mitochondrial transcripts, and CRISPRi screens reveal that knocking down mitoribosomal or mitochondrial genes confers resistance. Acute inhibitor treatment rapidly increases mitochondrial membrane potential, spare respiratory capacity, and oxygen consumption, preceding caspase cleavage. Inhibiting complex III with antimycin A rescues cell death, demonstrating that KRAS-G12C inhibitors kill via on-target mitochondrial hyperactivation, a mechanism distinct from the canonical view of MAPK pathway suppression.
EMT emerges as an in vivo-specific resistance mechanism: while mTOR pathway upregulation is observed both in vitro and in vivo, epithelial-mesenchymal transition (EMT) signatures appear only in xenograft tumors, not in culture. This highlights the importance of the tumor microenvironment in shaping resistance and demonstrates GENEVA’s ability to capture in vivo-specific biology that would be missed by conventional in vitro screening.
Systematic mapping of drug combination synergies: GENEVA enables quantitative Bliss synergy analysis at single-cell resolution, revealing that ARS-1620 combinations with mTOR (INK128) or EMT (galunisertib) inhibitors show strong synergy across KRAS-G12C lines. Gene-level synergy modeling identifies mitochondrial genes as consistently downregulated in synergistic combinations, mechanistically linking the combination effect to the drug’s primary mechanism.
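
For readers who have not met Bliss synergy before: under Bliss independence, two drugs with fractional effects a and b (for example, fraction of cells killed) are expected to give a combined effect of a + b - a·b, and synergy is the amount by which the observed combination exceeds that expectation. A minimal sketch follows; the function names and the example numbers are made up for illustration and are not taken from the paper.

```python
def bliss_expected(effect_a, effect_b):
    """Expected combined fractional effect if the two drugs act independently."""
    for effect in (effect_a, effect_b):
        if not 0.0 <= effect <= 1.0:
            raise ValueError("effects must be fractions between 0 and 1")
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(observed_combo, effect_a, effect_b):
    """Observed minus expected effect; positive values suggest synergy."""
    return observed_combo - bliss_expected(effect_a, effect_b)


if __name__ == "__main__":
    # Hypothetical numbers: drug A alone kills 40% of cells, drug B alone 30%.
    expected = bliss_expected(0.4, 0.3)
    excess = bliss_excess(0.8, 0.4, 0.3)
    print(f"expected under independence: {expected:.2f}, excess over Bliss: {excess:.2f}")
```

In this toy case the expectation is 0.4 + 0.3 - 0.12 = 0.58, so an observed 80% kill would sit well above independence. GENEVA's appeal is that the same comparison can be made per cell line within one pooled experiment.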

Why should we care?

The gap between preclinical cancer models and clinical outcomes is stark: drugs that work in mice often fail in humans, in part because we test them in too few models before moving to trials. Xenografts are expensive and labor-intensive, so we typically evaluate candidates in a handful of cell lines or patient-derived models, hoping they represent the diversity of human tumors. They don’t. GENEVA offers a way out. By pooling dozens of models into a single mouse, it scales in vivo pharmacogenomics by an order of magnitude, turning what was a 50-mouse experiment into a 1-mouse experiment. More importantly, it provides rich molecular data, not just cell counts but transcriptomes, cell cycle states, and gene expression responses, across all models simultaneously. This turns drug testing from a binary “sensitive/resistant” readout into a high-dimensional portrait of how different genetic backgrounds respond. The biological discoveries enabled by this approach are striking. The finding that KRAS-G12C inhibitors kill cells by hyperactivating mitochondria, not just suppressing MAPK signaling, changes our understanding of how these drugs work and should inform the choice of combination partners: since complex III inhibition with antimycin A rescued cells from death, drugs that dampen mitochondrial respiration could blunt rather than enhance killing. The emergence of EMT as an in vivo-specific resistance mechanism underscores that we cannot rely on in vitro models alone; the tumor microenvironment fundamentally shapes drug response.

Reconstructing multi-scale tissue spatial architecture from single-cell RNA-seq with REMAP

Jiang, S. et al. bioRxiv (2026). https://doi.org/10.64898/2026.02.21.707167

The paper in one sentence

REMAP is a deep learning framework that reconstructs spatial locations of cells from dissociated single-cell RNA-seq data by integrating gene expression with neighborhood-level gene-gene covariance learned from one or multiple spatial transcriptomics references, enabling multi-scale spatial analysis of existing scRNA-seq atlases.

Summary

Single-cell RNA sequencing (scRNA-seq) provides transcriptomes at scale but loses spatial context; spatial transcriptomics (ST) preserves location but is costly and limited in gene coverage. REMAP bridges this gap by learning to predict where cells in scRNA-seq data likely originated, using ST data as a reference. The key innovation is the use of both first-order (individual gene expression) and second-order (gene-gene covariance within cellular neighborhoods) features. During training on ST data, REMAP identifies spatial neighbors for each cell, computes covariance among neighboring cells’ gene expression to capture local tissue context, and trains a neural network to predict coordinates from these combined features. For scRNA-seq inference, where neighbors are unknown, REMAP iteratively refines covariance estimates: an initial guess from ENVI, location prediction, then a second network refining covariance based on predicted locations, repeating to improve accuracy. When multiple ST references are available (e.g., covering different tissue regions), REMAP shifts from predicting absolute coordinates to predicting pairwise cell-cell distance matrices, which remain comparable across differently oriented slices. This enables reconstruction of global tissue relationships even from fragmented captures. Across extensive benchmarking (mouse brain with 10x Visium HD vs. Xenium, human fetal cortex with MERFISH, colorectal cancer with Visium HD vs. Xenium, and seven cancer types), REMAP consistently outperforms existing methods (CeLEry, iSORT, LUNA, CellContrast) in preserving pairwise distances, reconstructing fine structures (hippocampal subregions, V1/V2 cortical border, curved tumor architectures), and recovering cellular neighborhood (CN) networks. In a human multiple sclerosis atlas (15 samples, paired Visium and snRNA-seq), REMAP enabled spatial analysis of microglial neighborhoods.
It revealed that inactive MS samples stratify into two subgroups: one resembling controls (minimal microglial self-colocalization), the other mirroring active MS (increased microglia-microglia interactions). Within a rare microglial subpopulation colocalized with astrocytes, REMAP identified a transitioning, pro-inflammatory state enriched for interferon signaling and MS-relevant markers (CHIT1, SIGLEC1), insights invisible to snRNA-seq alone. Across five cancer types (cervical, ovarian, melanoma, lung, prostate), REMAP uncovered conserved spatial subtypes of cancer-associated fibroblasts (CAFs) based on neighborhood composition. These matched the s1–s4 CAF taxonomy: s1-CAFs tumor-adjacent (prognostic), s2-CAFs self-colocalized, s3-CAFs adjacent to macrophages, s4-CAFs near tertiary lymphoid structures. Transcriptional profiling confirmed distinct functional programs (ECM remodeling, stress response, antigen presentation), demonstrating REMAP’s ability to decode microenvironmental organization from dissociated cells.
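
The "second-order" feature can be sketched in a few lines. What follows is only my reading of the idea, not REMAP's code: for each cell in a spatial reference, take its k nearest neighbours, compute the gene-gene covariance of their expression, and append the flattened upper triangle to the cell's own expression vector. The function names, the brute-force neighbour search, and the choice of k are assumptions; the neural network, the ENVI initialisation, and the iterative refinement are not shown.

```python
import numpy as np


def knn_indices(coords, k):
    """Indices of the k nearest cells for each cell (the cell itself included)."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]


def neighborhood_covariance(expr, coords, k=10):
    """Per-cell gene-gene covariance over its k-neighbourhood, flattened (upper triangle)."""
    if k < 2:
        raise ValueError("need at least 2 cells per neighbourhood to estimate covariance")
    nbrs = knn_indices(coords, k)
    n_cells, n_genes = expr.shape
    upper = np.triu_indices(n_genes)
    feats = np.empty((n_cells, upper[0].size))
    for i in range(n_cells):
        # Rows are cells, columns are genes, hence rowvar=False.
        cov = np.atleast_2d(np.cov(expr[nbrs[i]], rowvar=False))
        feats[i] = cov[upper]
    return feats


def combined_features(expr, coords, k=10):
    """First-order (expression) and second-order (covariance) features side by side."""
    return np.hstack([expr, neighborhood_covariance(expr, coords, k)])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    expr = rng.poisson(3.0, size=(200, 5)).astype(float)  # toy cells x genes matrix
    coords = rng.uniform(0, 100, size=(200, 2))           # toy spatial coordinates
    print(combined_features(expr, coords, k=15).shape)
```

The brute-force distance matrix is fine for a toy example but scales quadratically with cell number; a real pipeline would presumably use a spatial index for the neighbour search.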

Personal highlights

Neighborhood covariance as a second-order spatial signal: REMAP goes beyond matching individual cell expression by learning gene-gene covariance within cellular neighborhoods, a proxy for local tissue context. This second-order information captures microenvironmental patterns (e.g., cell-type composition, signaling niches) that are more spatially informative than individual transcriptomes alone, enabling accurate reconstruction even when expression profiles alone are ambiguous.
Iterative refinement of latent spatial context: for scRNA-seq data where true neighbors are unknown, REMAP initializes covariance estimates using ENVI, then iteratively refines them: predict locations, then predict better covariance from those locations, then re-predict locations. This closed loop progressively aligns the latent spatial representation with the expression data, overcoming the initial lack of spatial information.
Multi-reference integration via pairwise distance prediction: when tissue samples exceed a single ST capture, common in practice, REMAP switches from absolute coordinates to pairwise distance matrices, which remain comparable across slices with different orientations. A grid-based downsampling strategy makes training tractable, and optional neighbor filtering scales inference to large datasets. This enables global tissue reconstruction from fragmented references.
Conserved CAF spatial subtypes across cancers: across five cancer types, REMAP recovered the s1–s4 CAF taxonomy from dissociated scRNA-seq data, based solely on predicted neighborhood compositions. These subtypes showed distinct transcriptional programs (ECM remodeling, stress response, antigen presentation) and spatial niches (tumor-adjacent, self-colocalized, macrophage-adjacent, TLS-adjacent), validating that microenvironmental organization can be inferred from single-cell data and revealing conserved principles of CAF architecture.
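
Why pairwise distances rather than coordinates? Because a distance matrix does not change when a slice is rotated or shifted on the slide, whereas the coordinates themselves do. A small numerical check of that property, with hypothetical helper names and random toy points (this illustrates the invariance only, not REMAP's training procedure):

```python
import numpy as np


def pairwise_distances(coords):
    """Euclidean distance between every pair of cells."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def rigid_transform(coords, angle_rad, shift):
    """Rotate 2D coordinates by `angle_rad` and translate by `shift`."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[c, -s], [s, c]])
    return coords @ rotation.T + np.asarray(shift, dtype=float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    slice_a = rng.uniform(0, 10, size=(30, 2))
    # The same cells as they might appear on a differently oriented capture.
    slice_b = rigid_transform(slice_a, np.pi / 3, (5.0, -2.0))
    print("coordinates match:", np.allclose(slice_a, slice_b))
    print("distances match:  ", np.allclose(pairwise_distances(slice_a),
                                            pairwise_distances(slice_b)))
```

A network trained to predict distances therefore never has to learn an arbitrary slide orientation, which is what makes references from several captures comparable.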

Precancerous niche remodelling dictates nascent tumour persistence

Skrupskelyte, G. et al. Nature (2026). https://doi.org/10.1038/s41586-026-10157-8

The paper in one sentence

A subset of nascent tumours in the mouse oesophagus survive by instructing local fibroblasts to form a fibronectin-rich stromal niche via an EGF-SOX9-FN1 signalling axis, and disrupting this interaction prevents tumour persistence.

Summary

Most studies of early tumorigenesis focus on mutations in cancer cells, but healthy tissues accumulate cancer-associated mutations with age, suggesting that additional factors determine whether mutant clones progress to tumours. Using a diethylnitrosamine (DEN) mouse model of upper gastrointestinal tract carcinogenesis, Skrupskelyte, Rojo Arias, and colleagues investigate why some nascent tumours persist while others are eliminated. At 10 days post-DEN, microscopic tumours (as few as 10 cells) fall into two categories: Niche⁻ lesions with no stromal reorganization, and Niche⁺ lesions where underlying PDGFRαˡᵒʷ lamina propria fibroblasts form a supportive scaffold protruding into the epithelium. Over time, Niche⁻ tumours are progressively eliminated, while Niche⁺ tumours persist, enlarge, and become enriched in the tissue. By 8 months, 82% of surviving tumours are Niche⁺. Lineage tracing shows that niche fibroblasts derive from local PDGFRαˡᵒʷ cells that clonally expand beneath persistent tumours. Single-cell RNA sequencing of microdissected tumours identifies a tumour-specific epithelial population (“Tumour 12”) characterized by high SOX9 expression and enrichment for EGF ligands (AREG, HBEGF) and ECM-interacting genes (LAMC2, ITGB6). This population signals to fibroblasts via EGF, promoting their migration and inducing a pro-fibrotic transcriptional program with marked upregulation of fibronectin (FN1) and other ECM components. Functional assays confirm that signals from the remodelled tumour stroma are sufficient: normal epithelium exposed to denuded tumour stroma acquires tumour-like features in 3D culture and shows enhanced engraftment in vivo. The EGF-SOX9-FN1 axis is functionally required: inhibiting EGFR signalling with gefitinib or blocking fibronectin fibrillogenesis with the FUD peptide reduces Niche⁺ tumour formation and overall tumour burden.

Personal highlights

Nascent tumours stratify by niche-forming ability within days: at 10 days post-carcinogen, microscopic tumours (≥10 cells) already segregate into Niche⁻ (no stromal change) and Niche⁺ (fibroblast scaffold) phenotypes. Longitudinal tracking shows Niche⁻ tumours are progressively eliminated, while Niche⁺ tumours persist and enlarge, demonstrating that fate is determined at the earliest stages by stromal interaction, not just mutation burden.
Local PDGFRαˡᵒʷ fibroblasts form the niche via clonal expansion: lineage tracing using Col1a2-Cre and Pdgfra-Cre mice reveals that niche fibroblasts derive from the lamina propria PDGFRαˡᵒʷ population, not deeper submucosal fibroblasts. These cells clonally expand beneath persistent tumours, indicating that the niche is built by proliferation of local fibroblasts, not recruitment from distant sources.
A rare tumour-specific epithelial state (Tumour 12) drives niche formation: scRNA-seq identifies a distinct keratinocyte population (Tumour 12) enriched in persistent tumours, marked by high SOX9, EGF ligands (AREG, HBEGF), and ECM-interacting genes (LAMC2, ITGB6). This stress-associated state, not present in all tumour cells, is the signalling hub that instructs fibroblast recruitment and ECM remodelling.
EGF-SOX9-FN1 axis is necessary and sufficient for niche formation: chemoattractant assays show tumour-derived AREG stimulates fibroblast migration. 3D epithelioid-fibroblast cocultures demonstrate that expanding keratinocytes (high SOX9) induce fibroblast segregation, FN1 deposition, and vimentin upregulation, all blocked by EGFR inhibition. In vivo, gefitinib or the fibronectin assembly inhibitor FUD reduces Niche⁺ tumour formation and overall burden, validating the axis as a therapeutic target.

Why should we care?

For decades, we’ve thought of cancer as a disease of mutations: accumulate enough drivers, and a tumour forms. But recent findings that normal ageing tissues are riddled with cancer-associated mutations have forced a rethink: mutations are common, but tumours are rare. Something else determines which mutant clones cross the line. This study provides a compelling answer: the ability to remodel the microenvironment. Nascent tumours that survive do so because a subset of their cells activate a stress program (SOX9 high, EGF high) that recruits local fibroblasts and builds a fibronectin-rich supportive niche. Tumours that fail to do this are eliminated, despite presumably carrying similar mutations. Persistence is not about what mutations you have, but about how you talk to your neighbours.

mSWI/SNF complex inhibition sensitizes KRAS-mutant lung cancers to targeted therapies via epithelial-mesenchymal subversion

Gentile, C. et al. bioRxiv (2026). https://doi.org/10.64898/2026.02.27.708377

The paper in one sentence

Inhibiting mSWI/SNF chromatin remodeling complexes with the clinical-grade SMARCA4/2 inhibitor FHD-286 reverses EMT-driven resistance to KRAS inhibitors in lung cancer by suppressing AXL and mesenchymal programs, synergizing with KRAS-targeted therapies across multiple mutation subtypes and in patient-derived models.

Summary

KRAS-mutant lung cancers respond to targeted inhibitors like sotorasib and adagrasib, but responses are typically short-lived (<8 months) and resistance inevitably emerges. In nearly half of patients, resistance occurs without new mutations, implicating non-genetic mechanisms such as chromatin-mediated transcriptional plasticity.

Gentile, Feng, and colleagues identify mSWI/SNF (BAF) chromatin remodeling complexes as critical determinants of this adaptive resistance. Analyzing KRAS-mutant tumors and cell lines, they find mSWI/SNF genes among the top transcriptional regulators. Combining the clinical-grade SMARCA4/2 ATPase inhibitor FHD-286 with KRAS-G12C inhibitors (sotorasib, adagrasib) produces strong synergy in 5/8 cell lines tested, but only in those with mesenchymal signatures. Cell lines with epithelial phenotypes lack acute synergy but still show enhanced response durability over time. The mechanistic dissection is comprehensive. CUT&RUN and ATAC-seq reveal that mSWI/SNF complexes occupy distinct genomic sites in synergy versus non-synergy lines, with synergy-specific sites enriched for EMT, cytoskeletal organization, and cell migration genes. Combination treatment (sotorasib + FHD-286) uniquely downregulates these EMT programs and reduces chromatin accessibility at loci including the receptor tyrosine kinase AXL, a known driver of EMT-mediated resistance. AXL overexpression confers sotorasib resistance, which FHD-286 reverses. The synergy extends beyond G12C: FHD-286 sensitizes KRAS-G12S, -G12A, -G13D, and -G12D lines to the pan-RAS inhibitor RMC-6236 or the G12D-specific inhibitor MRTX-1133, with durable suppression of regrowth after drug washout. AXL and vimentin induction by RAS inhibitors is blunted by FHD-286 co-treatment, and the AXL inhibitor bemcentinib partially phenocopies the effect. In sotorasib-resistant H358 cells (H358SR), established by 3-month drug exposure, mSWI/SNF complexes retarget to new genomic loci enriched for EMT, integrin signaling, and TNFα pathways. FHD-286 alone reverts AXL and vimentin expression, but it resensitizes cells only when combined with MEK inhibition to block pERK rebound, revealing a vertical inhibition strategy. Patient-derived ex vivo tumor spheroids (DFCI486, DFCI491) and PDX models (PHLC239, PHLC194) confirm the synergy.
In the G12Ci-resistant PDX_PHLC239, only combination treatment significantly reduced tumor volume over 56 days. In the initially sensitive PDX_PHLC194, tumors relapsed on sotorasib monotherapy but remained suppressed with FHD-286 co-treatment. The authors propose a model where mSWI/SNF complexes maintain mesenchymal chromatin states that enable adaptive resistance; their inhibition collapses this state, enhancing both depth and durability of KRAS inhibitor response.

Personal highlights

mSWI/SNF complexes as master regulators of EMT-mediated resistance: upstream regulator analysis of KRAS-mutant tumors ranked mSWI/SNF genes among top transcriptional regulators. CUT&RUN profiling revealed that synergy-specific mSWI/SNF occupancy sites are enriched for EMT, cytoskeletal organization, and cell migration genes, distinct from non-synergy lines where occupied sites enrich for WNT signaling and cell cycle. This establishes chromatin accessibility at EMT loci as a determinant of response.
FHD-286 synergy is predicted by EMT status, not STK11 mutation: while 4/5 synergy lines were STK11-mutant (“KL” subtype), CRISPR knockout or rescue experiments ruled out STK11 as the mechanism. Instead, synergy tracked with mesenchymal signature scores. Epithelial lines lacked acute synergy but still benefited from enhanced durability, suggesting mSWI/SNF inhibition blocks eventual EMT-mediated escape even when not immediately synergistic.
AXL as a critical downstream effector of mSWI/SNF-driven resistance: combination treatment reduced chromatin accessibility at the AXL locus and downregulated its expression. AXL overexpression in H358 cells conferred sotorasib resistance, which FHD-286 reversed. The AXL inhibitor bemcentinib phenocopied FHD-286 effects in non-G12C lines, validating AXL as a key node. This positions mSWI/SNF inhibition as a strategy to target AXL in the absence of effective clinical AXL inhibitors.
Broad efficacy across KRAS mutation subtypes and inhibitor classes: FHD-286 sensitized KRAS-G12S, -G12A, -G13D, and -G12D lines to pan-RAS (RMC-6236) and G12D-specific (MRTX-1133) inhibitors. In H441 (KRAS-G12V), combination treatment prevented regrowth after drug washout, a durable response not seen with single agents. This suggests the mechanism is mutation-agnostic and targets a common adaptive program.

Why should we care?

KRAS inhibitors have transformed treatment for the one-third of lung cancer patients with KRAS mutations, but the excitement is tempered by reality: responses are rarely durable, and resistance almost always emerges. The field has focused on genetic bypass mechanisms, but nearly half of progressing patients lack new resistance mutations, pointing to non-genetic, adaptive plasticity as the culprit. This work identifies that plasticity as chromatin-mediated, driven by mSWI/SNF complexes maintaining a mesenchymal state permissive for resistance. The clinical-grade SMARCA4/2 inhibitor FHD-286 collapses that state, suppressing AXL and other EMT programs, and synergizes with KRAS inhibitors across multiple mutation subtypes—including those not covered by G12C-specific drugs.

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

The Most Valuable Thing You Can Do as a Scientist

Sebastiaan Vanuytven — Sat, 28 Feb 2026 16:10:37 GMT

“What’s the most valuable thing you can do as a scientist?” That question has been on my mind this past week, so I’m switching out the weekly reads for my thoughts on why the answer is, without a doubt, mentoring.

Already during my PhD, I realised that the aspect of science I like most is playing a small part in the training and success of young scientists. During my PhD and internships, I had the opportunity to be in environments where collaboration and teamwork were not dirty words, and there was no distinction between wet-lab and dry-lab scientists. Nothing brought more fun to me than being able to work on a wide variety of topics and seeing other people succeed. If you asked me what my biggest accomplishment was during my PhD, I would answer without hesitation: guiding one exceptionally bright master’s student who exceeded everyone’s expectations and turned into a better bioinformatician than me, albeit with worse humour. I think his thesis defence really cemented in me that being a mentor is, for me, the most important part of being a scientist.

During my master’s degrees and PhD training, I was lucky enough to be trained and mentored by some incredible scientists. I was supervised by Thierry Voet, one of the pioneers in single-cell multi-omics, who made me into a skilled grant writer and showed me the importance of bringing together the right people at the right time with the same goal in mind. He gave me complete trust and freedom from the moment I entered his group, something I also see as a characteristic in other successful group leaders. My other supervisor, Peter Van Loo, was and still is a bit of a scientific superhero to me, and I feel very privileged that he gave me the opportunity to do bioinformatics at one of the leading institutes, with a Nobel Prize winner sitting just a few feet away. He showed me the importance of kindness, staying true to yourself, and having an open mind.

On top of that, I got the chance to be guided by three bioinformaticians who have all become PIs in the last five years: Maxime Tarabichi, Jonas Demeulemeester, and the great Alejandro Sifrim. Maxime was kind enough to teach me some of his tips and tricks for successfully juggling multiple projects at the same time, and to humble me from time to time with his presentations. Jonas, on the other hand, has been something of a guide in choosing my next career steps. Without him, my parents would probably not have allowed me to pursue an additional master’s degree. He showed me the importance of working together in science, and his cartoon recommendations were always gold. And then, lastly, the person who probably caused me the most sleepless nights during my PhD, but also the one I could count on time and time again: Alejandro. I will always be grateful that you saw potential in me and challenged me to evolve as a scientist, but also as a person. You gave me the opportunity to work on the project that allowed me to graduate and pursue a postdoc, without expecting anything in return, and for this, you will always be a hero in my book.

But one of the most fruitful and fun mentorships was with someone who was six days younger than me and not even a bioinformatician. Finding someone to bounce ideas off and dream up the craziest experiments with was the highlight for me in the post-COVID world. Having someone who always has the perfect experiment to test the weird findings of a bioinformatician, or the best personal advice, is a true blessing. My favourite moments with him were our journal clubs, which, no matter the subject or presenter, always turned into a debate between the two of us. His enthusiasm for science and dedication to research, driven not by publications and ego but by pure passion, has been truly inspiring. So if you ever have a crazy idea and want the help of a competent scientist in the region of Ghent, hit up Sam Kint.

I’ll be honest: having someone younger guide you felt strange at first. But the skill of all great mentors is that they make boundaries of hierarchy disappear and make you feel that you are doing science together, that they’re not teaching you so much as thinking alongside you. Sam is the kind of gold you search for in a collaborator, because he will make you a better person and scientist without even trying. Of course, you have to take the Ghent accent with it, but hey... the advice is worth the language barrier.

I’ve tried to distil what I learned from these mentors, combined with hard-won lessons of my own, into advice I wish I’d had earlier. In the absence of anyone to mentor in my current lab, I thought I’d share it with the internet.

Wellbeing & Sustainability

Health = 1, family & friends = 2, science = 3. Always. No excuses.
Take a walk every day. Your best ideas will come then—and so will the oxygen.
Get enough sleep. It’s a productivity tool, not a luxury.
Stay curious and passionate. When you feel drained, take time off.

Collaboration & Relationships

Do science with people who make you a better scientist and/or a better person—ideally both.
Treat everyone as your peer. Never assume you are (or will be) the smartest person in the room.
Science is a team effort. You cannot truly succeed alone, only fail alone.
Make jokes, also at your own expense. People relate to humility and humour.
Stay in touch with former mentors and colleagues. A friendly message or a quick scientific exchange goes a long way.

Mindset & Growth

There are no stupid ideas, only closed minds.
Enjoy the journey and have fun. The destination is never set in stone.
Read a paper every day—and read broadly to spark novel ideas.
Stay true to who you are. Academia is a political game, and we need more exceptions.

Impact & Legacy

Exponential impact in science comes from mentoring and teaching the next generation. Your own work will likely be incremental—and that’s okay.
A CNS paper is nice to have, but scientific outreach and teaching kids science? That’s unforgettable.

Practical Wisdom

Keep meetings short and in person whenever possible—better than long email threads.
Always reread emails before sending—from the recipient’s perspective.

I hope that for someone out there, one of these tips can have the same impact it had on me. I have them written in my copy of Letters to a Young Scientist, which I got from my supervisor during my first master’s. When times are rough, these are the words that make me fall in love with science again, that shift my mindset back to feeling blessed and privileged for having the opportunity to do science every day in the best conditions.
And when I do find someone to mentor, I hope I can pass even a fraction of this forward.

Cheers,

Seb

P.S. This post was inspired by my colleagues D.E., for recently highlighting how privileged I’ve been with the people who trained me, and A.P., for reminding me what I like most about science.

Weekly reads 16/2/26

Sebastiaan Vanuytven — Sun, 22 Feb 2026 13:59:15 GMT

This week's papers share a common insight: some of the most important biological signals are encoded in variance, context, and hidden structure across scales, rather than mean values. scAmp uses the stochastic inheritance of extrachromosomal DNA to detect oncogene amplifications from single-cell copy-number variation, resolving subclonal ecDNA heterogeneity and its phenotypic consequences directly in patient tumours. Svensson revisits a long-standing failure mode in SCVI, revealing that low-UMI cells converge toward a learned bias point rather than undergoing classical posterior collapse, and demonstrates how self-supervised augmentation can rescue biological signals that would otherwise be lost. SPATIA views spatial biology as inherently hierarchical, combining morphology, gene expression, and tissue context to enable the controlled generation of microenvironment-dependent phenotypes. The scTumor Atlas prioritises representative malignant states over maximal aggregation, resulting in a useful, interpretable pan-cancer reference for comparing cell lines and predicting gene dependencies. scVital reframes cross-species integration by explicitly removing species signals while preserving conserved cancer cell states, revealing common treatment-resistant programs. Furthermore, OneCell CUT&Tag demonstrates that epigenomic priming can occur before transcriptional commitment, capturing multi-omic state transitions within a single cell.

Across these studies, a recurring theme emerges: when we model biology with the appropriate structure—variance-aware, depth-invariant, hierarchical, species-agnostic, or multi-layered—we discover programs that bulk averages and single-modality analyses consistently overlook.

Preprints/articles that I managed to read this week

scAmp: Analyzing focal gene amplifications at single-cell resolution

Jones, M. G et al. bioRxiv (2026). https://doi.org/10.64898/2026.02.14.705928

The paper in one sentence

scAmp is a probabilistic framework that detects extrachromosomal DNA (ecDNA) amplifications from single-cell copy-number data by leveraging the increased variance caused by random ecDNA inheritance, enabling analysis of subclonal heterogeneity and phenotypic consequences across patient tumors.

Summary

scAmp is an algorithm that detects ecDNA-amplified genes from single-cell copy-number data by exploiting a fundamental biological difference: ecDNAs lack centromeres and are inherited randomly during mitosis, generating greater copy-number variance in cell populations compared to stable chromosomal amplifications. The authors train a multi-layer perceptron on simulated copy-number distributions from a forward-time evolutionary model, featurizing each gene’s distribution across cells by its mean, variance, coefficient of variation, deciles, and interquartile range. scAmp achieves an average precision of 0.96 on simulated data and perfect agreement with previously characterized cell lines, outperforming a null model based on mean copy-number alone (AP 0.89). Crucially, scAmp corrects misclassifications from WGS: the breast cancer cell line BT474 was predicted by WGS to have ecDNA-amplified ERBB2, but scAmp correctly predicted chromosomal amplification, confirmed by metaphase FISH showing ERBB2 co-localized with DAPI-stained chromosomes. Applying scAmp to 73 patient tumors profiled with single-cell ATAC-seq through TCGA reveals ecDNA prevalence across cancer types (gliomas, lung, breast), identifies frequently amplified oncogenes (EGFR, MYC, KRAS), and enables phenotypic analysis. Tumors with ecDNA show shifts in immune composition, with T cells enriched in BRCA and LUAD and macrophages in GBMx. Within tumors, ecDNA+ cancer cells exhibit upregulation of glycolysis and hypoxia pathways compared to ecDNA− cells from the same tumor. scAmp’s single-cell resolution reveals striking subclonal heterogeneity. In one GBM tumor, while ~80% of cells contain an MDM2-amplifying ecDNA, distinct subclones harbor additional ecDNAs amplifying MYC or CDK4, with corresponding changes in chromatin accessibility. Finally, scAmp generalizes to clinical FFPE samples analyzed by DNA FISH, correctly classifying ecDNA status in xenograft tumors and a tissue microarray of patient samples.
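To make the variance argument concrete, here is a minimal sketch of the summary statistics named above, computed for one gene across cells. It is not the scAmp code: the function name `featurize_copy_number` and the two toy distributions (a Poisson for a stable chromosomal amplification, an overdispersed negative binomial for an ecDNA-like amplification with the same mean) are my own assumptions, and the paper's forward-time simulator and classifier are not reproduced.

```python
import numpy as np

def featurize_copy_number(cn):
    """Summarise one gene's per-cell copy numbers (mean, variance, CV, deciles, IQR)."""
    cn = np.asarray(cn, dtype=float)
    mean = cn.mean()
    var = cn.var()
    q25, q75 = np.percentile(cn, [25, 75])
    return {
        "mean": mean,
        "variance": var,
        "cv": np.sqrt(var) / mean if mean > 0 else 0.0,
        "deciles": np.percentile(cn, np.arange(10, 100, 10)),
        "iqr": q75 - q25,
    }

rng = np.random.default_rng(0)
n_cells = 2000

# Toy stand-ins (assumptions, not the paper's simulator):
# both genes average about 20 copies, but the ecDNA-like one is far more dispersed.
chromosomal = rng.poisson(20, size=n_cells)
ecdna_like = rng.negative_binomial(n=2, p=2 / (2 + 20), size=n_cells)

f_chr = featurize_copy_number(chromosomal)
f_ec = featurize_copy_number(ecdna_like)
print(f"chromosomal: mean={f_chr['mean']:.1f}, variance={f_chr['variance']:.1f}")
print(f"ecDNA-like:  mean={f_ec['mean']:.1f}, variance={f_ec['variance']:.1f}")
```

The two toy genes are indistinguishable by mean alone, which is why a mean-only model struggles, while the variance, CV, and IQR separate them clearly.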

Personal highlights

Variance-based ecDNA detection from single-cell copy-number data: scAmp leverages the non-Mendelian inheritance of ecDNA, random segregation during mitosis due to lack of centromeres, which generates greater copy-number variance across cell populations compared to stable chromosomal amplifications. This biological insight enables discrimination that bulk WGS cannot achieve, as demonstrated by the BT474 case where WGS misclassified ERBB2 while scAmp correctly predicted chromosomal integration.
Simulation-trained neural network outperforms mean-based models: the authors train a multi-layer perceptron on copy-number distributions generated from a forward-time evolutionary model of ecDNA and chromosomal amplification dynamics. By featurizing distributions with statistics beyond the mean (variance, deciles, IQR), scAmp maintains accuracy even for highly amplified chromosomal loci (copy-number >10) where mean-based models fail.
Subclonal ecDNA heterogeneity and phenotypic consequences: In a GBM tumor, scAmp resolves a dominant MDM2-ecDNA clone with two subclones acquiring additional MYC or CDK4 ecDNAs, revealing ongoing ecDNA diversification. Within tumors, ecDNA+ cancer cells show distinct chromatin accessibility and pathway activation (glycolysis, hypoxia) compared to ecDNA− cells from the same tumor, enabling functional dissection of ecDNA effects.
Clinical applicability to FFPE and DNA FISH: scAmp generalizes beyond single-cell genomics to clinically relevant modalities, correctly classifying ecDNA status from interphase DNA FISH data in xenograft tumors and a 14-sample tissue microarray of patient tumors with MYC amplifications, demonstrating potential for retrospective analysis of archival pathology specimens.

Why should we care?

Extrachromosomal DNA is not a rare curiosity: it appears in approximately 17% of primary tumors overall and is associated with significantly worse patient outcomes. Yet our understanding of ecDNA has been constrained by the limitations of bulk sequencing, which cannot resolve which cells within a tumor carry ecDNA, how ecDNA evolves over time, or what transcriptional consequences it confers. scAmp opens these questions by transforming single-cell copy-number data, already generated by assays like scATAC-seq, into a quantitative readout of ecDNA status.

Improving SCVI for low-count cells through self-supervised augmentation

Svensson, V. bioRxiv (2026). https://doi.org/10.64898/2026.02.11.705441

The paper in one sentence

By adding binomial thinning augmentation and a cross-correlation loss during training, SCVI can learn representations that preserve biological signal for low-UMI cells, which typically collapse to a learned bias point, enabling analysis of cells that would otherwise be discarded.

Summary

Single-cell RNA sequencing data suffers from variation in total molecule counts (library size) between cells, a major source of nuisance variation. SCVI, a count-based variational autoencoder, is designed to integrate out this variation, but cells with extremely low UMI counts still separate from high-UMI cells in learned representations and are typically filtered out before analysis. Svensson investigates the mechanism behind this failure by artificially reducing the UMI counts of high-UMI cells through binomial thinning and passing them through trained SCVI encoders across six representative datasets. The key finding: as UMI depth decreases, cells converge toward a learned bias point in the encoder’s latent space, a fixed point representing a cell with zero observed molecules. This convergence is distinct from classical posterior collapse driven by KL regularization; massively increasing the KL term produces a different failure mode (collapse to the origin), while the bias point can be far from the origin. To address this, Svensson modifies the training procedure in two ways: (1) binomial thinning augmentation, artificially subsampling counts during training to expose the model to low-depth cells, and (2) a cross-correlation loss between embeddings of original and thinned cells, encouraging the encoder to produce similar representations regardless of depth. This approach is inspired by self-supervised learning methods like Barlow Twins, which reduce redundancy in representations. Ablation experiments show that augmentation alone is insufficient and degrades performance even at high depths. The cross-correlation loss is necessary, but reconstruction loss is also essential: pure self-supervision without reconstruction loses biological signal. The optimal configuration (JointEmbed with w=100) preserves cluster membership and condition differences down to ~100 UMI depth, where standard SCVI fails completely (cluster accuracy 0.083 vs. 0.280 at 100 UMI; condition accuracy 0.383 vs. 0.440). These gains come without sacrificing reconstruction quality.
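Binomial thinning itself is simple enough to sketch in a few lines. The snippet below is an illustration under my own assumptions (the function name `binomial_thin`, the toy cell, and the target depth of 100 are all hypothetical), not the training code from the preprint: each observed molecule is kept independently with probability equal to the target depth divided by the cell's total count.

```python
import numpy as np

def binomial_thin(counts, target_depth, rng):
    """Downsample a UMI count vector to roughly `target_depth` total molecules.

    Every observed molecule is kept independently with probability
    target_depth / total, so the result is a plausible shallower
    sequencing of the same cell.
    """
    counts = np.asarray(counts)
    total = counts.sum()
    if total == 0:
        return counts.copy()
    keep_prob = min(1.0, target_depth / total)
    return rng.binomial(counts, keep_prob)

rng = np.random.default_rng(0)

# Toy high-depth cell (assumption): 2,000 genes, about 10,000 UMIs in total.
cell = rng.poisson(lam=rng.gamma(2.0, 2.5, size=2000))
thinned = binomial_thin(cell, target_depth=100, rng=rng)

print("original depth:", cell.sum())
print("thinned depth: ", thinned.sum())
```

Because the thinned cell is derived from a real one, the pair (original, thinned) comes with a free label: both should land in the same place in latent space.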

Personal highlights

Identification of bias point convergence distinct from posterior collapse: by systematically thinning high-UMI cells and tracing their trajectories through trained encoders, Svensson reveals that low-UMI cells collapse to a learned bias point, not the origin, demonstrating that this failure mode is distinct from classical KL-driven posterior collapse. Massive KL weighting produces collapse to the origin, while the bias point can be arbitrarily far, clarifying a long-standing empirical observation in the field.
Binomial thinning as self-supervised augmentation: the training modification exposes the model to artificially subsampled versions of high-UMI cells during training, forcing the encoder to learn representations invariant to total count depth. This simple data augmentation strategy, binomial thinning of observed counts, is biologically grounded in the sampling process of scRNA-seq and requires no external annotations.
Cross-correlation loss preserves biological signal across depths: borrowing from self-supervised learning (Barlow Twins), the added loss term encourages the embeddings of original and thinned cells to be similar while reducing redundancy across dimensions. This prevents the encoder from learning depth-dependent features and maintains cluster structure and condition differences at low UMI depths where standard SCVI fails.
Ablation reveals necessity of both augmentation and reconstruction: augmentation alone degrades performance even at high depths, and pure self-supervision without reconstruction loses biological signal entirely. The optimal configuration requires the full combination: augmentation, cross-correlation loss, and reconstruction loss, demonstrating that representation learning and generative modeling are complementary rather than substitutable.
Practical extension of usable cell range without sacrificing quality: the modified model preserves cluster accuracy and condition differences down to ~100 UMI depth (vs. standard SCVI failing below ~1000 UMI) with minimal impact on reconstruction metrics. This extends the range of analyzable cells, enabling inclusion of low-quality cells from precious samples or cost-effective shallow sequencing.
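For intuition on the cross-correlation term, here is a rough NumPy sketch of a Barlow Twins-style loss between embeddings of original and thinned cells. It is an assumption-laden illustration rather than the preprint's implementation: the function name, the off-diagonal weight of 0.005, and the toy embeddings are mine, and the reconstruction and KL terms that the ablations show to be essential are left out.

```python
import numpy as np

def cross_correlation_loss(z_a, z_b, off_diag_weight=0.005, eps=1e-8):
    """Barlow Twins-style loss between two (cells x latent_dims) embeddings.

    Diagonal terms pull matching dimensions toward correlation 1 (depth
    invariance); off-diagonal terms push different dimensions toward
    correlation 0 (redundancy reduction).
    """
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    c = z_a.T @ z_b / z_a.shape[0]  # cross-correlation matrix, dims x dims
    diag = np.diag(c)
    on_diag = np.sum((diag - 1.0) ** 2)
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + off_diag_weight * off_diag

rng = np.random.default_rng(0)
z_full = rng.normal(size=(500, 8))  # toy embeddings of full-depth cells

# A depth-invariant encoder maps thinned cells close to the originals...
z_thin_good = z_full + rng.normal(scale=0.05, size=z_full.shape)
# ...while a depth-sensitive one produces unrelated embeddings.
z_thin_bad = rng.normal(size=(500, 8))

loss_good = cross_correlation_loss(z_full, z_thin_good)
loss_bad = cross_correlation_loss(z_full, z_thin_bad)
print(f"invariant encoder: {loss_good:.4f}, depth-sensitive encoder: {loss_bad:.4f}")
```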

Why should we care?

Single-cell genomics faces a persistent trade-off: to get good data, you need high-quality cells with many transcripts; to work with precious or difficult samples, you often get low-quality cells with few transcripts. Standard practice is to filter out the latter, discarding potentially valuable biological material because current computational tools cannot handle it. Svensson’s work shows that this trade-off is not inevitable. By understanding the precise mechanism by which SCVI fails on low-UMI cells (convergence to a learned bias point, not posterior collapse), he designs a targeted fix: train the model to be invariant to total count depth by showing it augmented versions of its own data and enforcing representation consistency. The result is a model that retains biological signal down to ~100 UMI—cells that would normally be thrown away.

SPATIA: Multimodal Generation and Prediction of Spatial Cell Phenotypes

Kong, Z et al. bioRxiv (2026). https://doi.org/10.64898/2026.02.18.706593

The paper in one sentence

SPATIA is a hierarchical multimodal model that integrates cell morphology, gene expression, and spatial context across scales, from individual cells to niches to whole tissues, to enable both predictive analysis and controllable generation of microenvironment-dependent cellular phenotypes.

Summary

Image-based spatial transcriptomics technologies provide matched measurements of cellular morphology and gene expression in intact tissue, but existing methods typically analyze these modalities in isolation, lack cell-level resolution, or cannot model how local spatial context shapes cellular phenotypes. Kong and colleagues introduce SPATIA, a unified framework that learns spatially aware representations by explicitly modeling biological structure across three nested scales: individual cells, local niches (256×256 px regions containing 10–30 cells), and whole-slide tissue context. At the cell level, SPATIA fuses image-derived morphological tokens and gene expression embeddings via cross-attention. At the niche level, a transformer aggregates neighboring cell embeddings with regional image patches to model local cell–cell interactions. At the tissue level, a global transformer captures long-range dependencies across the full slide. This hierarchical design enables SPATIA to learn representations that integrate intrinsic cell state with extrinsic spatial context. More importantly, SPATIA introduces a spatially conditioned generative framework for predicting morphological outcomes of perturbations without requiring paired pre–post data. The authors construct weak supervision pairs between control and perturbed cells using entropy-regularized optimal transport (OT) in gene expression space, constrained by lineage consistency and spatial proximity. To address noise in these weak matches, they propose a confidence-aware flow matching objective that reweights training trajectories based on OT coupling uncertainty. A morphology-profile alignment loss further ensures generated cells match the distribution of real target morphologies in CellProfiler feature space. 
Across 12 tasks spanning phenotype generation, cell annotation, clustering, gene imputation, and cross-modal prediction, SPATIA outperforms 18 existing models, achieving an 8% improvement in generative fidelity (FID/KID) and up to 3% gains in predictive benchmarks. Ablation studies confirm that each hierarchical level contributes meaningfully, and robustness analysis shows the model remains stable under moderate OT pairing errors (10–20% corruption).
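The weak-pairing idea can be sketched with a plain Sinkhorn solver. Everything here is a simplified assumption on my part: the cost is squared Euclidean distance in a toy expression space, the lineage and spatial constraints are omitted, and the entropy-based `coupling_confidence` is just one plausible way to turn coupling uncertainty into a weight, not necessarily the weighting used in SPATIA.

```python
import numpy as np

def sinkhorn_coupling(cost, reg=0.02, n_iter=200):
    """Entropy-regularised OT plan between uniform marginals (Sinkhorn iterations)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    K = np.exp(-(cost / cost.max()) / reg)  # rescale cost to [0, 1] for stability
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def coupling_confidence(plan):
    """Per-control-cell confidence: 1 minus the normalised entropy of its matches."""
    row_prob = plan / plan.sum(axis=1, keepdims=True)
    entropy = -np.sum(row_prob * np.log(row_prob + 1e-12), axis=1)
    return np.clip(1.0 - entropy / np.log(plan.shape[1]), 0.0, 1.0)

rng = np.random.default_rng(0)
control = rng.normal(size=(30, 5))  # toy expression embeddings
perturbed = control + rng.normal(scale=0.05, size=(30, 5))

# Pairwise squared Euclidean cost between control and perturbed cells.
cost = ((control[:, None, :] - perturbed[None, :, :]) ** 2).sum(axis=2)
plan = sinkhorn_coupling(cost)
confidence = coupling_confidence(plan)
print("mean confidence:", round(float(confidence.mean()), 3))
```

Cells whose coupling mass is spread over many candidates get a low weight, so their noisy matches contribute less to training.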

Personal highlights

Hierarchical multi-scale architecture from cells to tissue: SPATIA explicitly models biological organization across three nested levels, individual cells, local niches (256×256 px regions), and whole-slide tissue context, using cross-attention transformers at each scale. This design captures both fine-grained cellular features and the spatial dependencies that govern tissue function, enabling representations that integrate intrinsic cell state with extrinsic microenvironmental context.
Confidence-aware flow matching for perturbation modeling without paired data: to predict morphological outcomes of biological transitions, where paired pre–post observations are unavailable, SPATIA constructs weak supervision pairs via optimal transport in gene expression space, constrained by lineage and spatial proximity. A confidence-weighting scheme downweights uncertain OT matches during flow matching training, while a condition-contrastive regularization encourages the model to distinguish different transition types, enabling controllable generation without brittle one-to-one correspondences.
Morphology-profile alignment ensures biological fidelity: generated cell images are evaluated not only by perceptual metrics (FID/KID) but also by their alignment with real target distributions in CellProfiler feature space. A sliced Wasserstein distance loss explicitly enforces that generated morphologies match the statistical properties of true target cells, ensuring that improvements in visual realism translate to biological correctness.
MIST: A large-scale multi-platform spatial transcriptomics atlas: the authors assemble and curate MIST, a dataset of 25.9 million cell–gene pairs from 74 sources spanning 17 tissues, 60 donors, and four major platforms. This resource enables cross-platform benchmarking and provides a foundation for training models that generalize across technical and biological variation.
Unified performance across generative and predictive tasks: SPATIA achieves state-of-the-art results on both fronts: 8% improvement in generative fidelity over specialized models like GeneFlow and MorphDiff, while matching or exceeding task-specific models on cell annotation, clustering, biomarker prediction, and gene expression imputation. This demonstrates that a single model can support both exploratory simulation and quantitative downstream analysis without sacrificing either capability.
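The sliced Wasserstein distance mentioned in the morphology-profile alignment highlight is easy to illustrate. This is a generic NumPy sketch, assuming equally sized sets of real and generated feature vectors; the function name, the number of projections, and the random toy features standing in for CellProfiler profiles are assumptions, and this is not SPATIA's loss code.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=128, rng=None):
    """Approximate sliced 2-Wasserstein distance between two equally sized point clouds.

    Both clouds are projected onto random unit directions; in 1D, optimal
    transport reduces to matching sorted values.
    """
    rng = np.random.default_rng() if rng is None else rng
    directions = rng.normal(size=(n_projections, x.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    x_proj = np.sort(x @ directions.T, axis=0)
    y_proj = np.sort(y @ directions.T, axis=0)
    return np.sqrt(np.mean((x_proj - y_proj) ** 2))

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 10))                   # toy "real" morphology features
generated_close = rng.normal(size=(500, 10))        # same distribution
generated_far = rng.normal(size=(500, 10)) + 2.0    # shifted distribution

d_close = sliced_wasserstein(real, generated_close, rng=rng)
d_far = sliced_wasserstein(real, generated_far, rng=rng)
print(f"matched distribution: {d_close:.3f}, shifted distribution: {d_far:.3f}")
```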

Why should we care?

Spatial transcriptomics has transformed our ability to see where genes are expressed in tissue, but connecting that molecular information to what cells actually look like, and predicting how they might change under disease or treatment, has remained out of reach. Existing models either ignore morphology, lose cell-level resolution, or cannot simulate perturbations. SPATIA bridges these gaps by treating spatial biology as it actually is: hierarchical, multimodal, and context-dependent. The confidence-aware flow matching framework provides a general recipe for learning perturbation models when paired data doesn’t exist, a common scenario in biology where destructive measurements prevent tracking the same cell over time. The morphology-profile alignment loss offers a way to ground generative models in biologically meaningful features rather than pixel-level statistics alone. And the MIST dataset itself will likely become a valuable community resource for training and benchmarking spatial models.

A Pan-Cancer Single-Cell Atlas to Evaluate Tumor Identity, Cell Line Concordance, and Dependency Mapping

Reveron-Thornton, R. F et al. bioRxiv (2026). https://doi.org/10.64898/2026.02.14.705396

The paper in one sentence

The scTumor Atlas is a curated, quality-controlled pan-cancer single-cell reference of 135,424 malignant cells from 499 samples across 36 cancer types that enables systematic evaluation of tumor identity, benchmarking of cancer cell line models, and inference of gene dependencies directly from single-cell transcriptional states.

Summary

Bulk RNA sequencing has enabled large-scale pan-cancer analyses but obscures cancer cell-specific programs due to admixture with nonmalignant cells. Single-cell RNA sequencing resolves this, yet existing atlases often prioritize maximal data aggregation over biological coherence, resulting in unwieldy resources with variable data quality and limited interpretability. The authors here take a fundamentally different approach: rather than maximizing cell count, they prioritize representative malignant transcriptional states. Starting from public scRNA-seq datasets, they apply uniform stringent quality control (cells with <5,000 UMIs or >10% mitochondrial transcripts excluded), doublet removal with Scrublet, and careful malignant cell annotation. To prevent any single dataset from dominating, they implement a two-step downsampling framework using Mahalanobis distance from the centroid in principal component space: first per sample (capping at 5,000 cells), then per cancer type (capping at 5,000 representative cells). After integration with Harmony and scANVI, the final scTumor Atlas contains 135,424 high-quality malignant cells from 499 samples spanning 36 adult and pediatric malignancies. The atlas preserves lineage-specific transcriptional programs, with epithelial, mesenchymal, hematologic, and neuroendocrine cancers forming coherent clusters. Pathway analysis recapitulates expected biology: oxidative phosphorylation enriched in lung squamous carcinoma but not ALL, androgen signaling in prostate cancer, estrogen signaling in breast cancer, KRAS signaling in pancreatic cancer, and EMT signatures in sarcomas. These patterns align with independent TCGA bulk RNA-seq data, validating that the selected malignant states reflect broader tumor biology. The authors then use the atlas to evaluate cancer cell line (CCL) fidelity. By projecting single-cell CCL profiles into the same latent space, they quantify transcriptional similarity between cell lines and primary tumors. 
This reveals substantial heterogeneity: some pancreatic lines (PK59, DANG) closely match primary PAAD centroids, while others (PANC1, SW1990) diverge significantly, providing a quantitative framework for model selection. Most importantly, the atlas enables single-cell resolution gene dependency prediction. The authors train ElasticNet regression models on DepMap CRISPR screen data using pseudobulked scRNA-seq from matched cell lines, then apply these models to scTumor Atlas cells to generate predicted gene effect scores (PGES). This recapitulates known lineage-specific dependencies (CDK4 in medulloblastoma, BRAF in melanoma) and identifies putative novel vulnerabilities (QRICH1 in breast cancer, TCF7L2 in gastrointestinal cancers). In a proof-of-concept application to a primary retroperitoneal leiomyosarcoma profiled in-house, the framework predicts dependencies including IGF1R, a target with prior clinical investigation in sarcoma.
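As a rough illustration of the downsampling step, the sketch below keeps the cells closest to the centroid in Mahalanobis distance. I am assuming that "representative" means nearest to the centroid; the function name `mahalanobis_downsample`, the toy principal components, and the cap of 200 are hypothetical, and the authors' exact procedure may differ.

```python
import numpy as np

def mahalanobis_downsample(pcs, cap):
    """Return indices of up to `cap` cells nearest the centroid in Mahalanobis distance.

    `pcs` is a (cells x components) matrix of principal component scores.
    """
    n_cells = pcs.shape[0]
    if n_cells <= cap:
        return np.arange(n_cells)
    centered = pcs - pcs.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(pcs, rowvar=False))
    # Squared Mahalanobis distance of every cell to the centroid.
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.argsort(d2)[:cap]

rng = np.random.default_rng(0)
typical = rng.normal(size=(980, 5))         # toy bulk of malignant cells
outliers = rng.normal(size=(20, 5)) + 8.0   # toy atypical cells (indices 980-999)
pcs = np.vstack([typical, outliers])

keep = mahalanobis_downsample(pcs, cap=200)
print("cells kept:", len(keep), "| outliers kept:", int(np.sum(keep >= 980)))
```

Applied once per sample and once per cancer type, a cap like this prevents one large dataset from dominating the reference.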

Personal highlights

Representative-state sampling over maximal aggregation: unlike atlas efforts that prioritize cell count above all else, the authors use Mahalanobis distance-based downsampling to select up to 5,000 representative malignant cells per cancer type. This yields a compact (135k cells) yet biologically coherent reference that preserves lineage structure while remaining computationally lightweight and interpretable, trading exhaustive inclusion for practical utility.
Stringent quality control and standardized annotation: public datasets vary widely in depth and annotation quality. The authors apply uniform filters (≥5,000 UMIs, ≤10% mitochondrial reads), Scrublet doublet removal, and consistent malignant cell identification, either retaining original annotations or applying cancer-specific rules (e.g., CHGA expression for PNET, keratin scores for CESC). This rigor ensures the atlas reflects genuine malignant states, not technical artifacts or mislabeled cells.
Quantitative benchmarking of cancer cell line fidelity: by projecting single-cell CCL profiles into the same scANVI latent space, the authors compute normalized Euclidean distances between cell line centroids and primary tumor centroids. This provides a continuous, interpretable metric of model concordance, revealing that not all lines for a given cancer type are equally representative, and enabling rational selection of models for translational studies.
Single-cell resolution gene dependency prediction: adapting a framework originally developed for bulk RNA-seq, the authors train ElasticNet models on DepMap CRISPR screens using pseudobulked scCCL expression, then apply them to scTumor Atlas cells. This yields predicted gene effect scores at single-cell resolution, recapitulating known dependencies and nominating novel candidates, bridging high-throughput functional genomics with in vivo tumor heterogeneity.
Personalized dependency inference in a rare tumor: as a translational proof-of-concept, the authors profile a primary retroperitoneal leiomyosarcoma, integrate it into the atlas, and apply the dependency models. The predicted vulnerabilities include IGF1R, a target previously investigated in sarcoma clinical trials, demonstrating that this workflow can generate actionable hypotheses from a single patient sample, particularly valuable for rare cancers where large cohorts are unavailable.

Why should we care?

Cancer research faces a persistent translation gap: we have massive functional genomics datasets from cell lines (DepMap) and massive transcriptional datasets from tumors (TCGA), but connecting them is fraught with difficulty. Bulk tumor profiles are contaminated by stromal and immune signals; cell lines drift in culture; and single-cell atlases have become so large and heterogeneous that they are difficult to use as practical references. The scTumor Atlas takes a different tack. By prioritizing representative malignant states over maximal cell count, it creates a resource that is actually usable, small enough to distribute and query, clean enough to trust, and rich enough to support meaningful comparisons. The Mahalanobis downsampling strategy is a methodological contribution in itself, offering a principled way to balance representation without sacrificing biological signal.

Deep-Learning Tool ScVital Enables Species-Agnostic Integration of Cancer Cell States

Rub, J. et al. Cancer Research (2026). https://doi.org/10.1158/0008-5472.CAN-24-4889

The paper in one sentence

ScVital is a variational autoencoder with adversarial training that embeds single-cell RNA-seq data from different species into a shared latent space, enabling identification of conserved cancer cell states across mouse models and human tumors.

Summary

Genetically engineered mouse models (GEMMs) are essential for cancer research, but cross-species differences limit their predictive value for human disease. Single-cell RNA sequencing captures tumor heterogeneity, yet current integration methods treat cross-species comparison as a batch correction problem, failing to handle species-specific genes and often losing biological signal. Rub and colleagues develop scVital, a deep-learning framework specifically designed for species-agnostic integration. The model combines a conditional variational autoencoder with an adversarially trained discriminator. The encoder maps gene expression into a latent space while the decoder reconstructs the original data. The discriminator attempts to predict species from the latent representation, and the autoencoder is trained to fool it, removing species-specific signal while preserving cellular identity. Crucially, the reconstruction loss is designed to handle species-specific genes: mouse genes do not affect human cell reconstruction and vice versa, allowing integration without forcing all genes into a common feature space. To evaluate integration quality without relying on heuristic post-integration clustering, the authors introduce Latent Space Similarity (LSS), a metric that computes pairwise cosine distances between pre-annotated cell types in the latent space and calculates the AUC-F1 of correct cell-type pairings. LSS is robust to class imbalance and avoids the variability of clustering-dependent metrics like adjusted Rand index. Benchmarked on normal tissues (muscle, lung, pancreas, liver, bladder), scVital performs comparably to Harmony and scVI but with faster runtime than the deep-learning alternative scDREAMER, and better preserves species-specific cell types that other methods incorrectly merge. 
Applying scVital to pancreatic ductal adenocarcinoma (PDAC), it aligns classic and basal cell states across mouse models and 24 human patients, while the mouse-specific mesenchymal state remains separate, correctly reflecting biology. In lung adenocarcinoma (LUAD), scVital identifies shared AT2-like and high-plasticity cell states across species. Integration of healthy, injured, and malignant lung tissue reveals similarity between the LUAD high-plasticity state and a damage-associated transient progenitor in mice. Most strikingly, in undifferentiated pleomorphic sarcoma (UPS), a rare cancer with no prior knowledge of cross-species concordance, scVital integrates a KP GEMM with two patient-derived xenografts treated with doxorubicin. It uncovers a treatment-resistant cell state enriched for hypoxia signature (SLC2A1/Glut1) that is conserved across species and expands with prolonged chemotherapy, validated by immunohistochemistry. This state would have been missed by separate analysis of each dataset followed by marker intersection.
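The species-aware reconstruction idea can be illustrated with a masked loss. This is a toy sketch, not scVital's code: I use mean squared error as a stand-in for whatever reconstruction term the model actually uses, and the gene layout (three shared, two mouse-only, two human-only genes) and all names are assumptions made for the example.

```python
import numpy as np

def masked_reconstruction_loss(x, x_hat, gene_mask):
    """Mean squared error over genes that exist in each cell's species only.

    gene_mask[i, g] is 1 if gene g is defined for the species of cell i, else 0.
    """
    sq_err = (x - x_hat) ** 2 * gene_mask
    return sq_err.sum() / gene_mask.sum()

# Toy gene layout (assumption): 3 shared, then 2 mouse-only, then 2 human-only genes.
genes_in_mouse = np.array([1, 1, 1, 1, 1, 0, 0])
genes_in_human = np.array([1, 1, 1, 0, 0, 1, 1])
species = np.array(["human", "human", "mouse", "mouse"])
gene_mask = np.where((species == "mouse")[:, None], genes_in_mouse, genes_in_human)

rng = np.random.default_rng(0)
x = rng.poisson(3, size=(4, 7)).astype(float) * gene_mask
x_hat = x + 0.5  # toy reconstruction, off by 0.5 everywhere

# Corrupt the decoder output for mouse-only genes in the human cells.
x_hat_corrupted = x_hat.copy()
x_hat_corrupted[:2, 3:5] = 100.0

loss = masked_reconstruction_loss(x, x_hat, gene_mask)
loss_corrupted = masked_reconstruction_loss(x, x_hat_corrupted, gene_mask)
print(loss, loss_corrupted)
```

The loss is unchanged by the corruption, which is the point: whatever the decoder predicts for mouse-only genes in a human cell carries no gradient.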

Personal highlights

Species-agnostic latent space with adversarial species removal: scVital’s architecture, a VAE with an adversarially trained discriminator, explicitly removes species-specific signal from the latent representation while preserving cellular identity. The reconstruction loss is designed to handle species-specific genes independently, so mouse genes don’t interfere with human cell reconstruction and vice versa, enabling true cross-species integration without forcing all genes into a common feature space.
Latent Space Similarity (LSS): a clustering-free integration metric: Current evaluation metrics (ARI, FM) require clustering post-integration cells, a highly variable and heuristic step. LSS instead computes pairwise cosine distances between pre-annotated cell types in the latent space and calculates the AUC-F1 of correct pairings. It is robust to class imbalance, avoids clustering artifacts, and correctly scores integration quality even for rare cell types that other metrics mis-evaluate.
Preservation of species-specific cell types: In mouse-human muscle integration, other methods erroneously merge mouse neural/glial cells with human mature skeletal muscle. scVital and scDREAMER keep this mouse-specific cluster distinct, a difference reflected in LSS but not in ARI, demonstrating that LSS captures biologically meaningful distinctions that clustering-based metrics miss.
Identification of a conserved treatment-resistant hypoxia state in UPS: In a rare sarcoma with no prior knowledge of cross-species concordance, scVital integrates a GEMM and two PDXs treated with doxorubicin, revealing a shared cell state enriched for a hypoxia signature (SLC2A1/Glut1) that expands with prolonged treatment. Validated by IHC, this state would have been missed by separate analysis followed by marker intersection, demonstrating scVital’s power to uncover conserved biology masked by strong batch and species effects.
Linking mouse lung injury response to human LUAD plasticity: Integrating healthy lung, alveolar injury, and LUAD data reveals similarity between the mouse high-plasticity cancer cell state (HPCS) and a damage-associated transient progenitor state absent in healthy tissue, suggesting that cancer may co-opt regenerative programs and providing a functional hypothesis for the origin of this aggressive cell state.
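
The clustering-free idea behind LSS can be sketched in a few lines of Python. This is one possible reading, not the paper's definition: it assumes one latent centroid per annotated cell type and species, calls a mouse-human pair "matched" when their cosine distance falls under a threshold, and takes the area under the F1-versus-threshold curve. All names and toy vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; ranges from 0 (same direction) to 2 (opposite)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def f1_at_threshold(pairs, threshold):
    """pairs: list of (distance, is_same_cell_type) tuples."""
    tp = sum(1 for dist, same in pairs if dist <= threshold and same)
    fp = sum(1 for dist, same in pairs if dist <= threshold and not same)
    fn = sum(1 for dist, same in pairs if dist > threshold and same)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def area_under_f1(mouse_centroids, human_centroids, steps=100):
    """Trapezoidal area under F1 vs. distance threshold, rescaled to [0, 1]."""
    pairs = [
        (cosine_distance(m_vec, h_vec), m_type == h_type)
        for m_type, m_vec in mouse_centroids.items()
        for h_type, h_vec in human_centroids.items()
    ]
    thresholds = [2.0 * i / steps for i in range(steps + 1)]
    f1_values = [f1_at_threshold(pairs, t) for t in thresholds]
    area = sum(
        (f1_values[i] + f1_values[i + 1]) / 2 * (thresholds[i + 1] - thresholds[i])
        for i in range(steps)
    )
    return area / 2.0  # thresholds span [0, 2]


# Toy latent centroids: matching types are close in one case, swapped in the other.
mouse = {"T cell": (1.0, 0.0), "B cell": (0.0, 1.0)}
human_aligned = {"T cell": (0.9, 0.1), "B cell": (0.1, 0.9)}
human_swapped = {"T cell": (0.1, 0.9), "B cell": (0.9, 0.1)}
print(area_under_f1(mouse, human_aligned), area_under_f1(mouse, human_swapped))
```

No clustering step is involved: the score depends only on the existing annotations and on distances in the latent space, which is the property the authors emphasise.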

Why should we care?

Mouse models are the workhorses of cancer research, but their track record for predicting human outcomes is sobering: fewer than 10% of animal studies advance to clinical trials, and fewer than 1 in 10 of those gain FDA approval. A major reason is that cross-species differences, both technical and biological, obscure which features of mouse tumors actually reflect human disease. scVital addresses this by learning what is shared across species and what is specific. Rather than treating mouse and human as two batches to be forcibly merged, it explicitly removes species signal while preserving cellular identity. This lets us ask a fundamentally different question: not “do mouse models resemble human tumors?” but “which cell states are conserved, and which are species-specific?”

Matched single-cell chromatin, transcriptome, and surface marker profiling captures in vivo epigenomic reprogramming during basal-to-luminal transition in the mammary gland

Schwager, A. et al. bioRxiv (2026). https://doi.org/10.64898/2026.02.16.706078

The paper in one sentence

OneCell CUT&Tag is a low-input method that profiles histone modifications, full-length transcriptomes, and surface markers from the same single cell, revealing that basal mammary epithelial cells harbor epigenomic priming for luminal fate, undetectable at RNA or protein level, and that basal-to-luminal transdifferentiation proceeds via continuous epigenomic remodeling but a binary transcriptomic switch.

Summary

The authors develop OneCell CUT&Tag, a plate-based method that starts from individual cells (as few as one) and generates high-coverage histone modification profiles (H3K27me3, H3K4me1), full-length transcriptomes via FLASH-seq, and surface marker quantification from the same cell. Key innovations include: (i) optimized lysis buffer preserving both chromatin integrity and cytoplasmic mRNA; (ii) carboxylic beads for nuclei isolation enabling serial solution changes without loss; (iii) adaptation of FLASH-seq to limited cytoplasmic extracts. The method achieves median 26k unique DNA fragments/cell (0.77 FrIP) and 8k genes/cell in cell lines, outperforming droplet-based alternatives, and works on fresh or frozen tissues, including a triple-negative breast cancer tumor.

In the mammary gland, they profile 773 epithelial cells across basal and luminal lineages with matched H3K4me1, H3K27me3, RNA, and 14 surface markers. While cytometry and RNA annotations show near-perfect concordance (98%), a subset of basal cells (9%) exhibit luminal-like epigenomes, enriched for H3K4me1 at luminal genes and depleted of H3K27me3 repression, undetectable at RNA or protein level. This epigenomic priming, specific to basal cells, aligns with their known context-dependent multipotency upon lineage ablation or transplantation. To capture the transition in vivo, they transplant 10,000 basal cells into cleared fat pads and profile engrafted cells at 4.5 days. Descendants show continuous epigenomic progression from basal to luminal in H3K4me1 space, with intermediate cells absent in reference populations, while transcriptomes exhibit a binary switch. Transitioning cells upregulate proliferation and downregulate TNFα and p53 signaling, TNFα being a known restrictor of basal multipotency, and upregulate Axl, a stemness driver.

Personal highlights

OneCell CUT&Tag: low-input matched multi-omics from the same cell: unlike existing methods requiring 10⁴–10⁵ starting cells, OneCell works from one cell upward, generating high-coverage histone modification profiles (median 26k fragments), full-length transcriptomes (8k genes), and surface marker data per cell. The method is adaptable to fresh or frozen tissues, including patient tumors, and automation increases throughput to 1,536 cells per run with improved coverage.
Epigenomic priming of basal cells for luminal fate: a subset of basal mammary epithelial cells (9%) displays luminal-like H3K4me1 and H3K27me3 landscapes at luminal genes, despite expressing basal markers at RNA and protein levels. This priming, undetectable without matched multi-omics, aligns with basal cells’ known capacity to regenerate luminal lineages upon ablation or transplantation, suggesting epigenomic “readiness” enables rapid fate activation.
Continuous epigenomic remodeling vs. binary transcriptomic switch during transdifferentiation: following basal cell transplantation, descendants at 4.5 days show progressive H3K4me1 remodeling from basal to luminal states, with intermediate cells absent in steady-state epithelium. In contrast, transcriptomes exhibit a sharp binary switch between basal and luminal identities. This reveals that epigenomic reprogramming precedes and potentially enables the transcriptional commitment.
MOFA disentangles omic-layer-specific contributions to cell identity: Joint factor analysis of RNA, H3K4me1, and H3K27me3 identifies factors capturing basal identity through combined modalities (e.g., factor 7: Acta2, Krt14 expression + Trp63/Trp73 motif accessibility) and others revealing epigenomic-only distinctions (factor 2: H3K4me1 at stemness-associated Zfx motifs, undetectable in RNA). This demonstrates how matched multi-omics resolves regulatory layers that single modalities miss.

Why should we care?

Cell identity is not encoded in a single molecular layer; it emerges from the interplay of surface phenotype, transcriptional programs, and the epigenetic landscapes that prime or restrict them. Yet most single-cell technologies capture only one layer, forcing us to infer regulatory relationships across cells rather than measure them within the same cell. OneCell CUT&Tag changes this. By delivering matched epigenome, transcriptome, and surface data from the same cell, starting from as few as one, it opens the door to studying rare populations where every cell counts: early embryos, stem cell niches, patient biopsies. The mammary gland findings illustrate the power: a subset of basal cells are epigenomically primed for luminal fate, invisible to standard RNA or protein profiling. This priming likely explains their context-dependent multipotency and may represent a general mechanism by which tissues balance stability with regenerative capacity.

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.

Weekly reads 9/2/26

Sebastiaan Vanuytven — Sun, 15 Feb 2026 12:12:58 GMT

This week's reads look at how systemic biological programs, not isolated events, influence cancer progression, ageing, gene regulation, and lineage dynamics. In polymetastatic breast cancer, Insua-Rodríguez et al. found a conserved immunosuppressive macrophage niche driven by the MIF-CD74 axis that operates in the brain, lung, liver, and bone. Yang et al. present a first-principles measure of transcriptional entropy that quantifies ageing as a breakdown in gene coordination on a learned manifold. Vagiaki et al. reframe trans-eQTL mapping using LIVI, an interpretable generative model that identifies genetically influenced gene programs at the single-cell level. Patel and Kundaje's ARSENAL, a compact DNA language model trained on regulatory regions with motif-scale inductive bias, challenges the "scale is everything" paradigm. Yang et al. create tumor-homing probiotics that deliver chemo-immunotherapy locally with remarkable precision. Gao et al. explicitly model mitochondrial drift with MitoDrift, which turns mtDNA lineage tracing into a probabilistic, confidence-aware framework.

Preprints/articles that I managed to read this week

The MIF-CD74 Axis Drives a Systemic Immunosuppressive Niche in Polymetastatic Breast Cancer

Insua-Rodríguez et al. bioRxiv (2026). https://doi.org/10.64898/2026.01.31.701004

The paper in one sentence

A conserved program driven by cancer-secreted MIF recruits CD74+ lipid-associated macrophages across metastatic sites, creating an immunosuppressive niche that fuels systemic breast cancer colonization.

Summary

This study uses a synchronous multi-organ metastasis model combined with in vivo niche labeling and single-cell RNA sequencing to map the metastatic ecosystem in brain, lung, liver, and bone. The authors identify a universal proximal niche dominated by CD74+ lipid-associated macrophages (LA-MAMs), which are instructed by tumor-derived MIF via the CD74 receptor. These macrophages exhibit a potent immunosuppressive signature, drive T-cell exhaustion, and enable metastatic outgrowth across all organs. Disrupting the MIF-CD74 axis, genetically or pharmacologically, reduces LA-MAM accumulation, restores T-cell function, and impairs multi-organ metastasis. Clinical data from over 100 patients confirm that the MIF-CD74 axis is a hallmark of human polymetastatic disease and predicts poor survival.

Personal highlights

Conserved macrophage niche across metastatic organs: using in vivo proximity labeling and scRNA-seq, the authors reveal that bone marrow-derived CD74+ macrophages are consistently enriched in the immediate vicinity of metastases in the brain, lung, liver, and bone, highlighting a universal cellular convergence in systemic disease.
MIF-CD74 as a master paracrine regulator of immunosuppression: cancer cell-secreted MIF signals through CD74 on macrophages to orchestrate a lipid-associated, immunosuppressive phenotype (LA-MAM), establishing a coordinated signaling axis that is active in all metastatic sites.
Lipid-associated macrophages drive T-cell suppression and exhaustion: LA-MAMs exhibit strong lipid metabolism and oxidative phosphorylation signatures, directly suppress T-cell proliferation in co-culture assays, and correlate with increased exhausted T-cell subsets in vivo.
Disruption of the MIF-CD74 axis reduces systemic metastasis: genetic knockdown of MIF or pharmacological inhibition with 4-IPP significantly impairs metastatic burden across all four organs, demonstrating that targeting this axis can systemically compromise colonization.
Clinical translation and prognostic relevance: in a 100-patient cohort of breast cancer metastases, MIF and CD74 protein expression are nearly universal and stratify post-metastasis survival, confirming the human relevance of this mechanism and its potential as a therapeutic target.

Why should we care?

This work shifts from an organ-specific “seed and soil” model to a systemic, conserved program that cancer cells deploy to colonize multiple tissues simultaneously. By identifying the MIF-CD74 axis as a central coordinator of immunosuppressive macrophages and T-cell dysfunction, it offers a unifying therapeutic vulnerability for polymetastatic disease, a condition that currently lacks effective treatments and is managed only palliatively.

A manifold-based measure of transcriptional entropy for quantifying aging in single cells

Yang et al. bioRxiv (2026). https://doi.org/10.64898/2026.01.24.701460

The paper in one sentence

Researchers developed an unsupervised, manifold-based metric called transcriptional entropy to quantify aging in single cells by measuring the breakdown of gene expression coordination, revealing cell-type-specific vulnerabilities and molecular mechanisms across tissues.

Summary

This study introduces a first-principles framework to quantify transcriptional entropy, a measure of intrinsic gene expression noise, from single-cell RNA-seq data. Unlike supervised methods that rely on predefined markers, this approach uses the deviation of a cell’s transcriptome from a low-dimensional manifold to capture loss of transcriptional coordination with age. Applied to multiple aging datasets (Tabula Muris Senis, SenNet, kidney, liver), the method identifies stem and progenitor cells as particularly vulnerable, correlates with chromatin-based mitotic age, and disentangles two aging mechanisms: loss of expression precision and activation of stress-response programs.
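
As a rough intuition for "deviation from a low-dimensional manifold", the sketch below scores each cell by its distance to the leading principal axis of the data. This is a deliberately crude linear stand-in, not the paper's method: the learned manifold is replaced by a single straight line, the score is a residual distance rather than an entropy estimate, and the function names and toy cells are hypothetical.

```python
import math


def principal_axis(rows, iterations=200):
    """Mean and leading principal axis, found by power iteration on the covariance."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)] for a in range(d)]
    axis = [1.0] * d  # assumes the start vector is not orthogonal to the leading axis
    for _ in range(iterations):
        axis = [sum(cov[a][b] * axis[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in axis))
        axis = [v / norm for v in axis]
    return mean, axis


def off_manifold_deviation(rows):
    """Distance of every cell to the 1-D linear 'manifold' spanned by the leading axis."""
    mean, axis = principal_axis(rows)
    scores = []
    for row in rows:
        centered = [row[j] - mean[j] for j in range(len(row))]
        projection = sum(c * a for c, a in zip(centered, axis))
        residual = [c - projection * a for c, a in zip(centered, axis)]
        scores.append(math.sqrt(sum(v * v for v in residual)))
    return scores


# Eight cells with coordinated expression of three genes, plus one uncoordinated cell.
cells = [[x, 2.0 * x, 3.0 * x] for x in range(1, 9)] + [[4.0, 11.0, 9.0]]
scores = off_manifold_deviation(cells)
print(scores.index(max(scores)))
```

The last cell has unremarkable total expression but breaks the gene-gene coordination shared by the others, so it receives the largest deviation. That is the kind of signal a marker-based score would not see.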

Personal highlights

First-principles quantification of transcriptional noise: transcriptional entropy is derived from deviations of single-cell expression from a learned manifold, capturing intrinsic biological variation without relying on clustering or predefined gene sets, making it broadly applicable across cell types and tissues.
Distinguishes aging mechanisms at the gene level: the framework separately identifies genes that become more disordered with age (loss of precision) and genes whose expression correlates with cellular entropy (stress-response activation), offering a dual-axis view of transcriptional dysregulation.
Cross-modal validation with epigenetic aging: transcriptional entropy strongly correlates with chromatin-based mitotic age estimates from EpiTrace, linking transcriptomic noise to epigenetic drift and reinforcing its relevance as a fundamental aging metric.
Reveals tissue- and zone-specific aging patterns: in kidney proximal tubules, entropy increases specifically in injury-prone segments (S2/S3), while in liver hepatocytes, it peaks in the regenerative midlobular zone (zone 2), highlighting spatially structured vulnerability.
Unsupervised and marker-free, outperforming existing tools: unlike SenMayo or variance-based methods, transcriptional entropy consistently detects age-related dysregulation across diverse tissues and cell types, including in compartments where traditional senescence scores fail.

LIVI: Mapping trans-eQTLs at single-cell resolution with interpretable deep learning

Vagiaki, D. et al. bioRxiv (2026). https://doi.org/10.64898/2026.02.04.703363

The paper in one sentence

LIVI is an interpretable deep learning framework that uses a structured variational autoencoder with linear decoders to decompose single-cell gene expression into cell-state and donor-specific components, enabling scalable and statistically powerful discovery of trans-eQTLs that act across gene networks and continuous cell states.

Summary

Mapping trans-eQTLs, or genetic variants that affect gene expression on different chromosomes, has traditionally been difficult due to the massive multiple testing burden (billions of potential variant-gene pairs), small effect sizes, and the complex, context-dependent nature of these regulatory effects. Existing single-cell eQTL methods are optimised for cis effects, use predefined discrete cell types, or suffer from statistical circularity when genotypes are included during training. LIVI (Latent Interaction Variational Inference) overcomes these limitations with a purpose-built variational autoencoder architecture. The model takes raw single-cell counts as input and decomposes expression into three interpretable components: (1) canonical cell-state factors (C) capturing shared transcriptional programs across donors, (2) cell-state-specific donor factors (D×C) representing inter-individual variation that interacts with cellular context, and (3) global donor factors (V) capturing population structure. LIVI uses a linear decoder for each latent space, which maintains interpretability by directly linking each latent factor to a sparse set of genes using learned weights. This enables LIVI to operate as a factor analysis model embedded within a deep generative framework, scaling to millions of cells while maintaining biological transparency. The donor factors (D) are learned without access to genetic data, preventing circularity in subsequent association testing. After training, these factors serve as compact quantitative phenotypes (typically 500-700 dimensions, compared to ~200,000 gene-cell-type combinations) for efficient eQTL mapping. The discovered associations can then be projected back to single cells using the interaction model and decoded to identify the specific genes and pathways affected, effectively converting statistical associations into mechanistic hypotheses with cell-state resolution. 
When applied to the OneK1K dataset (981 donors, over 1 million PBMCs), LIVI identified more trans-eQTLs than alternative latent variable methods, recovered signals missed by conventional single-gene testing, and revealed how polygenic risk for autoimmune diseases manifests in specific cell types and gene programs.

Personal highlights

Structured decomposition of genetic and cellular variation: LIVI explicitly disentangles gene expression into canonical cell-state factors, cell-state-specific donor effects, and global donor variation through a multi-decoder architecture. This separation, enforced by keeping cell-state factors fixed during donor factor training, eliminates a key source of non-identifiability and enables independent interpretation of cellular context versus genetic influence.
Interpretable deep learning via linear decoders with sparsity constraints: unlike black-box VAEs, LIVI employs sparse linear decoders that map each latent factor directly to a weighted set of genes. This design choice preserves the scalability of deep generative models while returning biologically transparent outputs: each D×C factor is a sparse, interpretable gene program whose activity can be tracked across cells, donors, and genetic variants.
Cell-state-continuous eQTL mapping without predefined annotations: LIVI does not require discrete cell type labels. By modeling donor effects as interactions with a learned, continuous cell-state space, the framework naturally captures genetic effects that span continuous trajectories or that manifest in subsets of cells that do not align with canonical annotations, a class of associations systematically missed by conventional pseudobulk approaches.
Statistically rigorous, non-circular association testing: donor factors are inferred without access to genetic data, and association testing is performed post hoc using linear mixed models that account for population structure. This avoids the permutation-based calibration required by methods that incorporate genotypes during training, yielding calibrated test statistics without computational overhead.
From factor-level associations to single-cell resolution effect maps: LIVI bridges donor-level statistics and single-cell biology. The interaction model (Softmax(C)A ⊙ d_y) allows estimated SNP effect sizes to be propagated back to individual cells, producing continuous, genome-inferred perturbation maps that reveal exactly where in the cell-state space a genetic variant exerts its influence—and on which genes.
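
The interaction term Softmax(C)A ⊙ d_y quoted above can be written out in a few lines of Python. The shapes are my assumption, not taken from the paper: a cell-state vector of length K, a K×M matrix A linking cell states to donor factors, and a length-M vector d_y read here as the factor values for donor y. All names and numbers are illustrative.

```python
import math


def softmax(values):
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cell_level_donor_effect(cell_state, assignment, donor_factor):
    """Compute softmax(c) @ A, then multiply elementwise by d_y.

    cell_state:   length-K list (assumed shape)
    assignment:   K x M nested list (assumed shape)
    donor_factor: length-M list (assumed shape)
    """
    weights = softmax(cell_state)
    gate = [
        sum(weights[k] * assignment[k][m] for k in range(len(weights)))
        for m in range(len(donor_factor))
    ]
    return [g * d for g, d in zip(gate, donor_factor)]


# A cell that sits almost entirely in state 0; each state gates one donor factor.
cell_state = [10.0, 0.0]
assignment = [[1.0, 0.0], [0.0, 1.0]]
donor_factor = [2.0, -1.0]
print(cell_level_donor_effect(cell_state, assignment, donor_factor))
```

In this toy case the first donor factor passes through almost unchanged while the second is suppressed to nearly zero, which illustrates how the same donor-level effect can show up in some regions of cell-state space and vanish in others.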

Why should we care?

Trans-eQTLs represent the missing link between GWAS hits and the cellular phenotypes they influence. Yet they remain drastically underpowered in conventional analyses, buried under a mountain of statistical tests and obscured by their tendency to affect multiple genes with individually small effect sizes. LIVI offers a way out. By reframing trans-eQTL mapping as a problem of discovering genetically influenced gene programs rather than testing isolated variant–gene pairs, it collapses the search space from billions of hypotheses to thousands, while simultaneously aggregating weak signals into detectable, biologically coherent units.

ARSENAL: Short-context regulatory DNA language models with motif-discovery regularization

Patel, A., & Kundaje, A. bioRxiv (2026). https://doi.org/10.64898/2026.02.05.703637

The paper in one sentence

ARSENAL is a compact, short-context DNA language model pretrained exclusively on regulatory genomic regions with a frequency-domain Fourier regularizer that biases masked reconstructions toward motif-scale features, enabling superior zero-shot motif discovery, variant effect prediction, and transferable representations for downstream regulatory genomics tasks.

Summary

DNA language models (DNALMs) have emerged as powerful tools for learning regulatory sequence syntax from genomic data, yet most current approaches scale to millions or billions of parameters and train on whole-genome sequences, diluting their capacity for learning precise, motif-resolution regulatory features. ARSENAL takes a radically different, and surprisingly effective, contrarian stance: smaller is better, if you train on the right data with the right inductive biases. The model consists of a compact 8-layer transformer (768 embedding dimensions) trained exclusively on 350 bp windows centered on ENCODE candidate cis-Regulatory Elements (cCREs), a curated set of ~1.3 million experimentally supported regulatory regions. This targeted pretraining strategy concentrates the model’s capacity on functional non-coding sequence, avoiding dilution by the vast, information-sparse genomic background. But the core methodological innovation lies in ARSENAL’s Fourier motif-discovery regularizer. Drawing on prior work in supervised attribution priors, Patel and Kundaje adapt frequency-domain constraints to the self-supervised setting: the model is penalized when its per-base likelihood reconstructions contain excessive high-frequency (noise-like) or low-frequency (repeat-like) variation. This simple auxiliary loss biases the learned likelihood landscape to emphasize motif-scale features (6–20 bp), implicitly guiding the model toward the characteristic length scale of transcription factor binding sites without any supervised motif annotations.
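
The frequency-domain idea can be illustrated with a toy penalty in plain Python. This is a sketch under stated assumptions, not ARSENAL's regularizer: it takes any per-base score track, computes a discrete Fourier transform, and returns the fraction of spectral energy whose period lies outside the 6–20 bp motif band. The energy-fraction form, the function name, and the synthetic tracks are my own choices.

```python
import cmath
import math


def spectral_penalty(track, min_period=6.0, max_period=20.0):
    """Fraction of non-DC spectral energy outside the [min_period, max_period] band."""
    n = len(track)
    mean = sum(track) / n
    centered = [x - mean for x in track]
    in_band = 0.0
    total = 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        energy = abs(coeff) ** 2
        period = n / k  # period in bp of the k-th Fourier component
        total += energy
        if min_period <= period <= max_period:
            in_band += energy
    if total == 0.0:
        return 0.0  # a flat track has nothing to penalize
    return 1.0 - in_band / total


window = 350  # bp, matching the window length described above
motif_like = [math.sin(2 * math.pi * t / 10) for t in range(window)]    # 10 bp period
noise_like = [(-1.0) ** t for t in range(window)]                       # 2 bp period
repeat_like = [math.sin(2 * math.pi * t / 175) for t in range(window)]  # 175 bp period
for name, track in [("motif", motif_like), ("noise", noise_like), ("repeat", repeat_like)]:
    print(name, round(spectral_penalty(track), 3))
```

The motif-scale track is barely penalized, whereas the base-to-base flicker and the slow repeat-like wave are penalized almost fully. In a training loop a term like this would be added to the masked-reconstruction loss with some weight; how ARSENAL does that exactly is not reproduced here.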

Personal highlights

Regulatory-region-only pretraining concentrates model capacity: ARSENAL is pretrained exclusively on ENCODE cCREs rather than whole-genome sequence. This simple domain restriction ensures that the model’s limited capacity is allocated to learning functional regulatory syntax rather than memorizing repetitive, low-information genomic background, a form of data efficiency through biological curation.
Fourier-domain regularization induces motif-scale likelihood structure: the frequency-domain auxiliary loss penalizes reconstructions whose per-base likelihoods contain inappropriate spectral components, softly biasing the model toward learning 6–20 bp motif-scale features. This imports an inductive bias from supervised attribution priors into self-supervised learning, yielding interpretable likelihood landscapes without any supervised motif annotations.
Zero-shot variant scoring from 350 bp windows outperforms long-context models: ARSENAL achieves state-of-the-art correlation with experimentally measured dsQTL and caQTL effect sizes using only 350 bp of sequence context, substantially shorter than the 2 kb+ windows used in prior evaluations. This demonstrates that effective regulatory variant effect prediction does not require long-range context when motif-scale syntax is captured with sufficient fidelity.
Transferable embeddings improve supervised regulatory models: ARSENAL’s per-base embeddings, when substituted for one-hot encoding in ChromBPNet, yield consistent gains in chromatin accessibility prediction across five cell lines and improved counterfactual variant scoring. The self-supervised representations generalize across assay modalities and cellular contexts, providing useful inductive bias for downstream supervised tasks.
Controllable generation of cell-type-specific regulatory sequences: coupled with pretrained oracle models, ARSENAL supports objective-guided beam search to generate synthetic regulatory sequences with targeted properties, high predicted activity in HepG2, low activity in H1-hESC, or differential specificity. TF-MoDISco on the resulting sequences reveals emergent enrichment for appropriate cell-type transcription factor motifs, validating the approach for regulatory sequence design.

Why should we care?

For the past several years, the field of genomic language modeling has implicitly equated progress with scale: longer contexts, more parameters, more tokens, more FLOPs. ARSENAL suggests that this equation is, at best, incomplete. By training a compact model exclusively on regulatory regions and regularizing toward motif-scale features, the authors achieve state-of-the-art performance on zero-shot variant effect prediction and motif discovery, tasks that are supposed to require massive scale, with an 8-layer transformer and 350 bp windows.

Engineered probiotics for tumor-targeted combination chemoimmunotherapy

Yang, Z. et al. bioRxiv (2026). https://doi.org/10.64898/2026.02.04.703875

The paper in one sentence

A single strain of engineered E. coli Nissle 1917 delivers enzyme/prodrug chemotherapy, an IL-15 superagonist, and a PD-L1-blocking nanobody directly within tumors, achieving localized chemoimmunotherapy with synergistic antitumor immunity and minimal systemic toxicity.

Summary

Combining chemotherapy and immunotherapy is conceptually appealing but clinically challenging: chemotherapeutics have no tumour specificity and cause systemic toxicity, whereas immunotherapies rely on pre-existing immunity and can cause immune-related side effects. Yang and colleagues create a living bacterial platform that circumvents these limitations by confining both modalities to the tumour microenvironment. The authors begin with E. coli Nissle 1917 (EcN), a probiotic strain that selectively colonises tumours after intravenous administration. They modify it to express cytosine deaminase (CD), which converts the nontoxic prodrug 5-fluorocytosine (5-FC) into the cytotoxic chemotherapy drug 5-fluorouracil (5-FU). The enzyme is tagged with a heparin-binding peptide (pCD), which anchors it to extracellular matrix components and prevents it from leaving the tumour. However, wild-type EcN expresses a dihydropyrimidine dehydrogenase (encoded by the preTA operon) that converts 5-FU to inactive DHFU, which is the same metabolic pathway that causes 5-FU resistance in humans. Deleting the preTA operon eliminates bacterial drug catabolism, transforming marginal efficacy into effective tumour control. The immune phenotyping of the optimised enzyme/prodrug therapy reveals a double-edged effect: it activates CD8 T cells, NK cells, and NKT cells while also upregulating PD-L1 on suppressive myeloid populations and expanding activated regulatory T cells. This finding motivates further engineering: the authors co-express an IL-15 superagonist (s15), which promotes CD8 and NK proliferation while inhibiting Tregs, as well as a PD-L1-blocking nanobody (PDL1nb) from the same plasmid. The triple-engineered strain (EcNx^ΔpreTA-pCD/PDL1nb/s15) generates three payloads simultaneously. In the MC38 colorectal tumour model, a single intravenous dose followed by 5-FC administration results in complete tumour regression in a subset of animals, with no detectable body weight loss. 
The therapy activates dendritic cells, polarises M1 macrophages, promotes CD4 T cell proliferation, reverses exhaustion, and expands IFNγ-producing CD8 and NK cells.

Personal highlights

Tumor-selective enzyme/prodrug delivery with ECM anchoring: EcN bacteria naturally colonize tumors following intravenous injection, achieving >10⁹ CFU/g in tumor tissue with near-undetectable levels in healthy organs. The cytosine deaminase enzyme is tagged with a PlGF2-derived heparin-binding peptide that anchors it to extracellular matrix components, ensuring activated 5-FU remains localized rather than diffusing systemically.
Prevention of bacterial drug catabolism by preTA knockout: wild-type EcN expresses dihydropyrimidine dehydrogenase (encoded by preTA), the same enzyme that causes 5-FU resistance in humans, converting 5-FU to inactive DHFU. Deletion of the preTA operon eliminates this metabolic sink, increasing intratumoral 5-FU bioavailability and converting a marginal therapeutic effect into robust tumor control.
Mechanism-guided rational combination design: Their data-driven approach identifies IL-15 superagonist and PD-L1 blockade as logical partners to counteract the therapy’s immunosuppressive side effects while amplifying its immunostimulatory potential.
Single-strain co-delivery of three orthogonal payloads: the engineered bacteria simultaneously produce a prodrug-converting enzyme, a cytokine superagonist, and a checkpoint-blocking nanobody from a single stabilized plasmid. This demonstrates that living therapeutics can coordinate multi-agent combinations with precise temporal and spatial control, delivering chemotherapy, immunotherapy, and immunomodulation from a single intravenous injection.
Complete tumor regressions with no observable toxicity: in the MC38 model, the triple-engineered strain achieves complete regression in a subset of tumors following intratumoral injection and durable growth suppression following intravenous administration. No body weight loss or other signs of systemic toxicity are observed, a striking contrast to conventional 5-FU chemotherapy, which causes significant weight loss at efficacious doses.

Why should we care?

Cancer therapy is constrained by a persistent trade-off: effective treatments often lack specificity, and specific treatments are often ineffective. Chemotherapy kills tumors but damages healthy tissue; immunotherapy can produce durable responses but only in a minority of patients; combining them risks additive toxicities without guaranteed synergy. This work offers a way out by outsourcing drug delivery to a living system that does what no synthetic formulation can: actively home to tumors, sense its environment, and produce multiple therapeutic agents on-site, on-demand.

MitoDrift: Modeling mitochondrial inheritance enables high-precision single-cell lineage tracing in humans

Gao, T. et al. bioRxiv (2026). https://doi.org/10.64898/2026.02.12.705660

The paper in one sentence

MitoDrift is a probabilistic framework that models mitochondrial DNA heteroplasmy drift as a Wright-Fisher process, enabling confidence-refined lineage trees that accurately recover clonal relationships in primary human tissues without experimental barcoding.

Summary

The authors develop MitoDrift, a probabilistic framework that views mtDNA lineage tracing as an intracellular population-genetic process observed using noisy single-cell measurements. The model combines a discrete Wright-Fisher drift process along lineage edges and a binomial observation model at the leaves to compute tree likelihood using message passing on a hidden Markov tree. The parameters are learned using expectation maximisation, and the posterior clade support is estimated using Metropolis-Hastings MCMC sampling over tree topologies. Branches with low confidence are collapsed, resulting in a refined tree rich in accurate clonal relationships.
MitoDrift is validated against orthogonal ground truth in two complementary settings. First, they use lentiviral barcoding (LARRY) in primary human HSCs, where exogenous barcodes provide precise clone assignments. MitoDrift achieves 75% clone recovery (Jaccard ≥ 0.5) and 77% clade precision, outperforming standard phylogenetic methods (NJ: 55%, UPGMA: 28%). Second, they compare their results to whole-genome sequencing of single colonies from eight healthy donors, with nuclear SNV-based phylogenies serving as ground truth. MitoDrift achieves superior precision-recall, recovering approximately 13% of clades with 50% precision and ~10% with 70% precision. This is a lower bound given the challenges of detecting low-VAF heteroplasmy in WGS.
When MitoDrift is applied to native human haematopoiesis, it reveals age-related declines in clonal diversity with cell-type-specific patterns: myeloid, B, and erythroid compartments show significant reductions, while T cells maintain diversity, consistent with long-lived memory cells and lineage-biased output from dominant HSC clones. MitoDrift detects heritable regulatory programs in purified HSCs, including AP-1/stress response, stemness/lymphoid priming, and chromatin organization, with significant phylogenetic autocorrelation across longitudinal sampling.
In aged donors, AP-1 transcriptional activity correlates with clone size, indicating a link between inflammatory programs and clonal expansion. MitoDrift resolves therapy-associated clonal remodelling in multiple myeloma that would otherwise be undetectable by copy number analysis. Post-treatment tumours in a deep responder (MM1) exhibit increased clonal diversity, indicating the eradication of dominant clones and outgrowth from a polyclonal reservoir. Within a therapy-resistant 1q-gain subclone, phylogeny-state analysis identifies CD44+ adhesion/migratory cells as the most closely related to post-treatment persisters, indicating a potential resistance program.
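
To make the modelling idea in this summary concrete, here is a minimal sketch of a Wright-Fisher drift process on tree edges with a binomial read-count model at the leaves, scored by upward message passing. It is not the authors' implementation: the function names, the nested-tuple tree encoding, the toy copy number of 10, the three generations per edge, and the uniform root prior are all assumptions chosen for illustration, and a single variant is considered.

```python
from math import comb

def wf_transition(n_copies):
    """One-generation Wright-Fisher matrix: P[i][j] = Binomial(j; n, i/n)."""
    n = n_copies
    return [[comb(n, j) * (i / n) ** j * (1 - i / n) ** (n - j)
             for j in range(n + 1)] for i in range(n + 1)]

def mat_mul(a, b):
    """Plain matrix product for small square matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_pow(p, generations):
    """Transition matrix after `generations` rounds of drift."""
    size = len(p)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(generations):
        result = mat_mul(result, p)
    return result

def leaf_message(alt_reads, depth, n_copies):
    """Binomial emission: P(alt_reads | depth, heteroplasmy i/n) for each state i."""
    return [comb(depth, alt_reads) * (i / n_copies) ** alt_reads
            * (1 - i / n_copies) ** (depth - alt_reads)
            for i in range(n_copies + 1)]

def upward_message(node, edge_matrix, n_copies):
    """Likelihood of the data below `node` for each hidden state at `node`.

    A node is ("leaf", alt_reads, depth) or ("node", [children]).
    Every edge shares the same transition matrix (a simplifying assumption).
    """
    if node[0] == "leaf":
        return leaf_message(node[1], node[2], n_copies)
    msg = [1.0] * (n_copies + 1)
    for child in node[1]:
        child_msg = upward_message(child, edge_matrix, n_copies)
        for i in range(n_copies + 1):
            msg[i] *= sum(edge_matrix[i][j] * child_msg[j]
                          for j in range(n_copies + 1))
    return msg

def tree_likelihood(tree, root_prior, edge_matrix, n_copies):
    """Marginal likelihood: sum over root states of prior times root message."""
    root_msg = upward_message(tree, edge_matrix, n_copies)
    return sum(p * m for p, m in zip(root_prior, root_msg))

# Toy comparison: two cells carry the variant (8/10 reads), two do not (0/10).
N_COPIES = 10                                   # hypothetical mtDNA copy number
EDGE = mat_pow(wf_transition(N_COPIES), 3)      # hypothetical 3 generations per edge
PRIOR = [1 / (N_COPIES + 1)] * (N_COPIES + 1)   # uniform root prior (assumption)
mutant, wild = ("leaf", 8, 10), ("leaf", 0, 10)
grouped = ("node", [("node", [mutant, mutant]), ("node", [wild, wild])])
split = ("node", [("node", [mutant, wild]), ("node", [mutant, wild])])
lik_grouped = tree_likelihood(grouped, PRIOR, EDGE, N_COPIES)
lik_split = tree_likelihood(split, PRIOR, EDGE, N_COPIES)
print(lik_grouped > lik_split)
```

In this toy setting the topology that pairs the two variant-carrying cells scores a higher likelihood than the one that splits them, which is the signal a likelihood-based tree search would exploit. Loss and fixation fall out of the model for free, because states 0 and `N_COPIES` are absorbing in the transition matrix.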

Personal highlights

Wright-Fisher hidden Markov tree models mtDNA drift explicitly: MitoDrift treats heteroplasmy evolution as a discrete-state Markov chain along lineage edges, with transition probabilities derived from a Wright-Fisher process parameterized by effective population size and generations. This population-genetic foundation replaces ad hoc distance metrics with a generative model that accounts for mutation loss, fixation, and stochastic drift, the core biological processes that have confounded mtDNA lineage tracing.
Confidence-based topology refinement prioritizes accurate clades: rather than accepting a single tree topology, MitoDrift samples tree space via MCMC and computes posterior support for each clade. Collapsing low-confidence branches produces a refined tree that explicitly acknowledges uncertainty while retaining well-supported groupings. This enables downstream analyses to focus on reliable lineage structures, trading fine-scale resolution for precision in a dataset-specific, tunable manner.
Orthogonal validation against lentiviral barcoding and WGS: the authors establish rigorous ground-truth benchmarks using two independent modalities, namely exogenous LARRY barcodes in primary HSCs (definitive clone assignments) and nuclear SNV-based phylogenies from single-colony WGS (time-resolved lineages). MitoDrift consistently outperforms existing methods across both benchmarks, demonstrating that its performance gains are not dataset-specific and that mtDNA-based lineage tracing can achieve quantitative accuracy.
Cell-type-specific clonal diversity in aging hematopoiesis: Applying MitoDrift to healthy donors reveals that age-associated declines in clonal diversity are not uniform across lineages. Myeloid, B, and erythroid compartments show marked reductions, while T cells preserve diversity, consistent with long-lived memory T cells and lineage-biased output from dominant HSC clones. This suggests that reduced clonal complexity in myeloid/erythroid output may compromise hematopoietic redundancy and increase stress susceptibility.
Phylogeny-state analysis links heritable programs to clonal expansion and therapy resistance: Integrating MitoDrift trees with multiomic cell states enables quantitative dissection of heritable versus plastic programs. In HSCs, AP-1/stress-associated regulons show significant phylogenetic signal and associate with clone size in aged donors. In multiple myeloma, phylogeny-state analysis within a therapy-resistant subclone identifies CD44+ adhesion/migratory cells as most closely related to post-treatment persisters, nominating a candidate mechanism of cell adhesion-mediated drug resistance.
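
The confidence-based refinement in the highlights above can be sketched as follows: count how often each clade appears among sampled topologies, then splice out the internal nodes whose support falls below a threshold. This is only an illustration under stated assumptions. The four hand-written trees stand in for MCMC samples (no sampler is implemented here), and the nested-tuple encoding, the function names, and the 0.8 threshold are hypothetical rather than taken from the paper.

```python
from collections import Counter

def clades(tree):
    """Return (leaf_set, clades) for a nested-tuple tree whose leaves are strings."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)  # the clade defined by this internal node
    return leaves, found

def clade_support(sampled_trees):
    """Fraction of sampled trees that contain each clade."""
    counts = Counter()
    for tree in sampled_trees:
        counts.update(set(clades(tree)[1]))
    return {clade: n / len(sampled_trees) for clade, n in counts.items()}

def _collapse(tree, support, threshold):
    """Return the list of subtrees that replace `tree` under its parent."""
    if isinstance(tree, str):
        return [tree]
    children = []
    for child in tree:
        children.extend(_collapse(child, support, threshold))
    if support.get(clades(tree)[0], 0.0) >= threshold:
        return [tuple(children)]   # well-supported clade: keep the node
    return children                # weak clade: splice children into the parent

def collapse_low_support(tree, support, threshold):
    """Collapse weak internal nodes into polytomies; the root is always kept."""
    children = []
    for child in tree:
        children.extend(_collapse(child, support, threshold))
    return tuple(children)

# Hand-written stand-ins for topologies drawn by an MCMC sampler.
samples = [
    (("a", "b"), ("c", "d")),
    (("a", "b"), ("c", "d")),
    ((("a", "b"), "c"), "d"),
    ((("a", "b"), "d"), "c"),
]
support = clade_support(samples)
refined = collapse_low_support(samples[0], support, threshold=0.8)
print(refined)
```

Here the clade {a, b} appears in every sample and survives, while {c, d} appears in only half of them and is collapsed into a polytomy at the root. Raising or lowering the threshold is the precision-versus-resolution trade-off described in the highlight.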

Why should we care?

Lineage tracing is at the heart of developmental biology, cancer evolution, and stem cell biology, but experimental barcoding cannot be applied in humans. Mitochondrial DNA mutations provide a natural alternative, but their utility has been limited by a fundamental mismatch: the tools we use to analyse them (standard phylogenetic methods) assume that mutations are stable, binary, and inherited cleanly, whereas in reality mtDNA variants drift, disappear, and are measured noisily. MitoDrift addresses this mismatch by incorporating the appropriate biology into the model. Instead of treating heteroplasmy as a static trait, it explicitly models the Wright-Fisher drift process that governs mitochondrial inheritance. The end result is not only better trees, but trees with confidence estimates, allowing us to identify which branches are reliable and which are not. This transforms mtDNA lineage tracing from a qualitative, descriptive tool into a quantitative, hypothesis-testing framework.

Other papers that piqued my interest and were added to the purgatory of my “to read” pile

Thanks for reading.

Cheers,

Seb.