Weekly reads 29/12/25
Context, evolution, and scale in modern single-cell biology
This week’s selection spans spatial integration, cancer evolution, cell–cell communication, and perturbation biology, unified by a shared goal: moving from static descriptions to context-aware, mechanistic insight at scale. On the spatial front, ARCADIA introduces a generative framework that bridges scRNA-seq and spatial proteomics without requiring paired cells or feature correspondence, uncovering niche-specific transcriptional programs. In cancer biology, Padrão et al. reveal that the therapeutic benefits of fasting in breast cancer are mediated by glucocorticoid receptor activation and can be pharmacologically mimicked, while Pich et al. map how chemotherapy and immunotherapy leave lasting mutational and selective imprints on normal tissues. Wirth et al. further refine our view of tumour evolution by identifying distinct, ancestry-linked evolutionary trajectories in lung adenocarcinoma that transcend smoking status. Methodologically, FastCCC removes the permutation bottleneck from cell–cell communication analysis, enabling robust inference across millions of cells, and Pertpy delivers a long-awaited, end-to-end framework for scalable perturbation analysis in the scverse ecosystem.
Preprints/articles that I managed to read this week
ARCADIA Reveals Spatially Dependent Transcriptional Programs through Integration of scRNA-seq and Spatial Proteomics
Rozenman et al. bioRxiv (2025). https://doi.org/10.1101/2025.11.20.689521
The paper in one sentence
ARCADIA is a generative deep learning framework that integrates single-cell RNA-seq and spatial proteomics data without requiring feature-level linkage or paired cell barcodes, enabling the discovery of spatially-dependent transcriptional programs and cell states.
Summary
ARCADIA (ARchetype-based Clustering and Alignment with Dual Integrative Autoencoders) is a novel computational method designed to integrate single-cell RNA sequencing (scRNA-seq) with spatial proteomics data (e.g., CODEX) without relying on paired cell barcodes or direct gene-protein correspondence. The framework first identifies modality-specific “archetypes” representing extreme cell states, aligns them across modalities based on cell-type composition, and then trains dual variational autoencoders (VAEs) with cross-modal regularization to learn a shared, entangled latent space. This allows ARCADIA to infer spatial context for scRNA-seq cells and predict transcriptional programs tied to specific tissue niches. Applied to human tonsil data, ARCADIA successfully recovers spatially-resolved immune cell states—such as germinal center B cell hypermutation and T cell exhaustion programs—demonstrating superior performance over existing integration methods like MaxFuse and scMODAL in capturing spatially-dependent biological variation.
Personal highlights
Archetype-driven cross-modal alignment without feature linkage: ARCADIA uses Principal Convex Hull Analysis (PCHA) to identify extreme cell-state archetypes in each modality and aligns them based on cell-type composition profiles, enabling integration without requiring gene-protein correspondence or paired barcodes.
Dual variational autoencoders with multi-objective geometric regularization: the framework employs two VAEs, one for RNA, one for spatial protein, trained with a combined loss that includes reconstruction ELBO, anchor-guided cross-modal matching, cell-type structure preservation, and Maximum Mean Discrepancy (MMD) alignment, ensuring a biologically coherent shared latent space.
Spatially-aware protein feature engineering: ARCADIA augments protein expression data with neighborhood-aggregated features (mean protein expression from local cell neighbors), providing a spatially-informed input that captures microenvironmental influence beyond cell-intrinsic signals.
Interpretable mapping of spatial niches to transcriptional states: by predicting cell neighborhood (CN) labels for scRNA-seq cells, ARCADIA enables spatially-resolved differential expression analysis, revealing niche-specific programs such as B cell hypermutation in germinal centers and T cell exhaustion in peripheral regions.
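To make two of the ingredients above more concrete, here is a minimal NumPy sketch of neighbourhood-aggregated protein features and of an MMD term between two latent samples. It is not ARCADIA's code: the function names, the choice of k nearest neighbours, the RBF kernel, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def neighborhood_mean_features(coords, protein, k=5):
    """Append the mean protein profile of each cell's k nearest spatial neighbours."""
    coords = np.asarray(coords, dtype=float)
    protein = np.asarray(protein, dtype=float)
    # Pairwise squared distances between cell centroids (O(n^2), fine for a toy example).
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a cell is not its own neighbour
    neighbours = np.argsort(dist2, axis=1)[:, :k]
    neighbour_mean = protein[neighbours].mean(axis=1)
    return np.hstack([protein, neighbour_mean])


def rbf_mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between two samples (RBF kernel)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)

    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


# Simulated toy data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(50, 2))
protein = rng.gamma(2.0, 1.0, size=(50, 4))
features = neighborhood_mean_features(coords, protein, k=5)

z_rna = rng.normal(0.0, 1.0, size=(200, 2))
z_protein_aligned = rng.normal(0.0, 1.0, size=(200, 2))
z_protein_shifted = rng.normal(3.0, 1.0, size=(200, 2))
mmd_aligned = rbf_mmd2(z_rna, z_protein_aligned)
mmd_shifted = rbf_mmd2(z_rna, z_protein_shifted)
```

In this toy setting the MMD is small when the two latent samples come from the same distribution and grows when one is shifted, which is the behaviour an alignment penalty relies on.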
Fasting Mimetics: How Glucocorticoid Activation Unlocks the Power of Fasting to Boost Breast Cancer Therapy
Padrão et al. Nature (2025). https://doi.org/10.1038/s41586-025-09869-0
The paper in one sentence
This study reveals that the anti-tumour benefits of fasting when combined with endocrine therapy for breast cancer are driven by activation of the glucocorticoid receptor, and these effects can be replicated, without fasting, using clinically approved corticosteroid drugs.
Summary
The research explores how periodic fasting enhances the efficacy of endocrine therapy (like tamoxifen) in hormone receptor-positive (HR+) breast cancer. Using mouse models and clinical data from patients on fasting-mimicking diets, the team found that fasting causes profound epigenetic reprogramming in tumours. This shifts activity away from pro-growth signals (AP-1) and towards tumour-suppressive steroid hormone receptors, specifically the glucocorticoid receptor (GR) and progesterone receptor (PR). Fasting increases levels of cortisol and progesterone, activating these receptors. Crucially, the study shows that knocking out GR abolishes the benefit of fasting, while administering a GR agonist (dexamethasone) mimics the fasting effect, boosting therapy and delaying resistance. This positions GR activation as a druggable strategy to replace dietary restriction.
Personal highlights
Epigenetic reprogramming as the fasting switch: fasting combined with tamoxifen triggers large-scale changes in the active enhancer landscape (H3K27ac), silencing pro-tumour AP-1 sites while opening up GR/PR-driven tumour-suppressive gene programs.
GR activation is non-negotiable for the fasting benefit: genetic knockout of the glucocorticoid receptor completely abolishes the synergistic anti-tumour effect of fasting plus tamoxifen, proving GR is the central mediator, not just a bystander.
A fasting mimic in a pill: the glucocorticoid drug dexamethasone phenocopies the combined effect of fasting and endocrine therapy across multiple models (cell lines, xenografts, PDXs), achieving tumour regression without requiring dietary change.
Resolving a clinical paradox: the work offers a mechanistic lens for past contradictory trials on progesterone receptor modulators, suggesting that the anti-tumour effects of certain drugs like mifepristone may be partly due to their off-target GR agonist activity.
Somatic evolution following cancer treatment in normal tissue
Pich et al. Nature (2025). https://doi.org/10.1038/s41586-025-09792-4
The paper in one sentence
This study reveals how cancer treatments, including chemotherapy and immunotherapy, leave lasting mutational footprints and drive clonal selection in normal tissues, reshaping our understanding of treatment-related cancer risk and somatic evolution.
Summary
Using high-depth duplex sequencing of 168 cancer-free tissue samples from 22 autopsied cancer patients, the authors mapped how chemotherapy, immunotherapy, smoking, and alcohol exposure shape somatic mutations and selection across 16 organs. They identified treatment-specific mutational signatures, quantified mutation burdens relative to aging, and showed that even non-mutagenic therapies like immunotherapy can promote the expansion of clones carrying driver mutations in genes such as TP53 and PPM1D. The work highlights the profound and tissue-specific impact of lifetime exposures on the genomic landscape of normal cells.
Personal highlights
High-sensitivity mutation mapping in normal tissues: uses duplex sequencing (>30,000× coverage) to detect ultra-rare somatic mutations (VAF ~0.00003) across 16 organs, revealing a hidden landscape of treatment-induced and lifestyle-associated genomic scars.
Treatment-specific mutational signatures identified: links distinct mutational patterns to platinum chemotherapy, temozolomide, chlorambucil, and radiotherapy, and quantifies their contribution to driver mutations in genes like TP53, NFE2L2, and PPM1D.
Non-mutagenic therapies as selection pressures: shows that immunotherapy, while not increasing mutation rates, promotes clonal expansion of cells with TP53 and PPM1D mutations, demonstrating how therapy can sculpt evolution without direct DNA damage.
Tissue-specific vulnerability to mutagenesis: reveals that tissues vary dramatically in susceptibility; for example, blood accumulates platinum-induced mutations equivalent to 27 years of aging after just 6 cycles, whereas lung tissue is more resistant.
Lifetime exposure integration: models the cumulative impact of smoking, alcohol, and treatments on mutation burden, showing that exogenous sources account for >40% of mutations in liver but <10% in brain.
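A back-of-the-envelope calculation shows why the depth quoted above matters for variants at VAF ~0.00003. This sketch assumes that each unit of coverage is an independent duplex molecule and that sampling is binomial with no sequencing error. These are simplifications of my own, not the authors' statistical model.

```python
import math


def prob_at_least_one_mutant(depth, vaf):
    """P(at least one mutant molecule among `depth` independent molecules)."""
    # 1 - (1 - vaf)^depth, written with log1p/expm1 for numerical stability.
    return -math.expm1(depth * math.log1p(-vaf))


depth = 30_000  # coverage quoted in the highlight above
vaf = 3e-5      # variant allele frequency quoted in the highlight above

expected_mutant_molecules = depth * vaf
p_detect = prob_at_least_one_mutant(depth, vaf)
p_detect_shallow = prob_at_least_one_mutant(1_000, vaf)  # hypothetical shallower run
```

Under these assumptions, a site covered 30,000 times yields on average just under one mutant molecule, so each supporting molecule has to be trustworthy on its own, which is what duplex consensus calling is designed to provide.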
Why should we care?
This study provides the first systematic evidence that chemotherapy and immunotherapy can drive the evolution of pre-cancerous clones in healthy tissues, which may influence long-term cancer risk and aging. For patients, it underscores the importance of weighing treatment benefits against potential genomic consequences. For researchers and clinicians, it offers a new lens through which to understand secondary malignancies, therapy-related toxicity, and personalized risk assessment.
Revealing the Drivers Underlying Distinct Evolutionary Trajectories in Lung Adenocarcinoma
Wirth et al. bioRxiv (2025). https://doi.org/10.64898/2025.12.19.695410
The paper in one sentence
This study identifies three distinct evolutionary paths in lung adenocarcinoma (two driven by EGFR in never-smokers and one by KRAS in smokers) and reveals that roughly one in six never-smokers follows a smoking-like trajectory, characterized by KRAS mutations, lower genomic instability, and faster progression.
Summary
Using whole-genome sequencing of 550 lung adenocarcinomas, the authors applied a Plackett-Luce ordering model to uncover three evolutionary trajectories: two dominant in never-smokers (NSD-Loss and NSD-Gain) and one dominant in smokers (SD). Surprisingly, 18% of never-smoker tumours follow the smoking-dominant path, driven largely by KRAS mutations. These tumours evolve more rapidly, with fewer genomic alterations and shorter latency, and are three times more common in people of European vs. East Asian ancestry. The findings suggest that KRAS enables aggressive, smoking-like progression without the need for high genomic instability, with implications for personalized treatment strategies.
Personal highlights
De novo discovery of evolutionary trajectories using event ordering: the study employs a Plackett-Luce model to reconstruct the sequence of genomic events in each tumour, revealing three distinct evolutionary paths based on the timing and type of alterations, rather than static genomic snapshots.
KRAS as a smoking-mimic driver in never-smokers: in never-smokers, KRAS mutations are strongly associated with the smoking-dominant trajectory, enabling rapid progression with fewer copy number changes and lower genomic instability compared to EGFR-driven tumours.
Ancestry-linked differences in evolutionary route: never-smokers of European ancestry are nearly three times more likely to develop tumours following the smoking-like trajectory than those of East Asian ancestry, highlighting the role of genetic background in tumour evolution.
Divergent pathways to similar copy number states: despite different routes (whole-genome duplication with losses vs. incremental copy number gains), tumours converge toward similar overall copy number profiles, suggesting broad selective pressures for oncogene-tumour suppressor balance.
Clinical implications for targeted and immunotherapy: KRAS-mutant never-smoker tumours on the smoking-like path may respond better to KRAS inhibitors but less to immunotherapy due to lower tumour mutational burden, underscoring the need for trajectory-informed treatment selection.
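For readers unfamiliar with Plackett-Luce models, the sketch below shows how such a model scores an ordering of events: at each step, the next event is drawn from those remaining with probability proportional to its "worth". The event names and worth values are hypothetical, and the authors' actual parameterisation and fitting procedure may well differ.

```python
import itertools
import math


def plackett_luce_log_likelihood(ordering, log_worth):
    """Log-probability of an event ordering (earliest first) under Plackett-Luce.

    log_worth maps each event to a log "worth"; larger values tend to occur earlier.
    """
    remaining = list(ordering)
    total = 0.0
    for event in ordering:
        # Probability that `event` comes next among the events not yet placed.
        log_norm = math.log(sum(math.exp(log_worth[e]) for e in remaining))
        total += log_worth[event] - log_norm
        remaining.remove(event)
    return total


# Hypothetical worths, chosen only for illustration.
log_worth = {
    "KRAS_mutation": 2.0,
    "copy_number_gain": 1.0,
    "whole_genome_duplication": 0.0,
}

early_kras = plackett_luce_log_likelihood(
    ["KRAS_mutation", "copy_number_gain", "whole_genome_duplication"], log_worth
)
late_kras = plackett_luce_log_likelihood(
    ["whole_genome_duplication", "copy_number_gain", "KRAS_mutation"], log_worth
)

# Sanity check: the probabilities of all possible orderings sum to one.
total_probability = sum(
    math.exp(plackett_luce_log_likelihood(list(p), log_worth))
    for p in itertools.permutations(log_worth)
)
```

With these toy worths, an ordering that places the KRAS mutation first is more likely than one that places it last. Fitting the worths to observed tumour orderings is what would allow distinct trajectories to be compared.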
Why should we care?
This work reframes how we understand lung cancer evolution in never-smokers. By moving beyond smoking status as the sole classifier, it reveals that a significant subset of never-smokers follows a smoking-like genomic path driven by KRAS. This has direct clinical relevance: these tumours progress faster but with quieter genomes, meaning they may be missed by immunotherapy biomarkers but could be targeted with emerging KRAS inhibitors.
FastCCC: a permutation-free framework for scalable, robust, and reference-based cell-cell communication analysis in single cell transcriptomics studies
Hou et al., Nature Communications (2025). https://doi.org/10.1038/s41467-025-66272-z
The paper in one sentence
FastCCC is a fast, permutation-free, and reference-aware computational toolkit for detecting ligand–receptor interactions in single-cell RNA-seq data at scale, enabling robust and interpretable cell-cell communication inference across millions of cells.
Summary
FastCCC addresses bottlenecks in cell-cell communication (CCC) analysis by introducing an analytic permutation-free framework for p-value calculation via convolution and Fast Fourier Transform, a modular scoring system with 16 interaction patterns, and a novel reference-based inference paradigm leveraging a newly constructed human CCC reference panel of ~16 million cells across 19 tissues. It improves speed, scalability, and robustness over existing methods, as demonstrated in large-scale COVID-19 and thymus developmental datasets, while enabling biologically meaningful discovery even in small or biased query datasets through reference-guided analysis.
Personal highlights
Permutation-free analytic p-value calculation via convolution and FFT: FastCCC replaces computationally intensive permutation tests with an analytical solution derived from convolution of ligand and receptor expression distributions, accelerated by Fast Fourier Transform—making it orders of magnitude faster while maintaining or improving accuracy.
Modular and flexible communication scoring with 16 interaction patterns: instead of relying on a single communication score, FastCCC introduces a multi-layered algebraic framework that captures diverse ligand–receptor interaction modes (mean, quantile, geometric/arithmetic means, subunit-aware aggregation), ensuring robustness across biological contexts and complex interaction types.
First human CCC reference panel for context-aware inference: the authors constructed a comprehensive reference from ~16 million cells across 19 tissues and >450 cell types, enabling reference-based CCC analysis that corrects for dataset-specific biases and enhances discovery in smaller or skewed query data.
Scalable to million-cell datasets with minimal memory footprint: FastCCC efficiently processes datasets with over 2 million cells in under 20 minutes, using significantly less memory than existing tools—making it the only method currently capable of analyzing atlas-scale single-cell data without subsampling.
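The general idea behind replacing permutations with convolution can be illustrated on a toy problem: if a score is the sum of two independent discrete quantities, its null distribution is the convolution of their distributions, which an FFT computes directly. The sketch below is not FastCCC's derivation or scoring scheme. The discretised distributions, the additive score, and the function names are assumptions for illustration only.

```python
import numpy as np


def pmf_of_sum_fft(pmf_a, pmf_b):
    """PMF of A + B for independent, non-negative integer-valued A and B."""
    pmf_a = np.asarray(pmf_a, dtype=float)
    pmf_b = np.asarray(pmf_b, dtype=float)
    n = len(pmf_a) + len(pmf_b) - 1  # support size of the sum
    # Zero-padding to length n makes the circular FFT convolution equal the linear one.
    spectrum = np.fft.rfft(pmf_a, n) * np.fft.rfft(pmf_b, n)
    pmf = np.fft.irfft(spectrum, n)
    pmf = np.clip(pmf, 0.0, None)  # remove tiny negative round-off
    return pmf / pmf.sum()


def upper_tail_p_value(pmf, observed):
    """P(S >= observed) under the null PMF of the score S."""
    return float(pmf[observed:].sum())


# Hypothetical null PMFs over the values 0..3 for a ligand and a receptor summary.
ligand_null = np.array([0.5, 0.3, 0.15, 0.05])
receptor_null = np.array([0.6, 0.25, 0.1, 0.05])

score_null = pmf_of_sum_fft(ligand_null, receptor_null)
p_value = upper_tail_p_value(score_null, observed=5)
```

The p-value here comes from a single pass over the null distribution rather than from thousands of label shuffles, which is where the speed-up of an analytic approach comes from.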
Why should we care?
FastCCC removes the permutation bottleneck, enabling rapid, reproducible CCC analysis even on massive datasets.
Pertpy: an end-to-end framework for perturbation analysis
ref
The paper in one sentence
Pertpy is a Python-based modular framework designed for scalable, end-to-end analysis of single-cell perturbation data, integrating metadata annotation, analysis, and visualization in a unified and extensible workflow.
Summary
Pertpy addresses the growing complexity and scale of single-cell perturbation experiments by providing a comprehensive, open-source framework for analyzing genetic, chemical, and disease-related perturbations. Built within the scverse ecosystem, it offers harmonized access to datasets, metadata from public databases, and a wide range of fast, user-friendly methods, from guide RNA assignment and differential expression to perturbation-space embeddings and multicellular program discovery. Through three diverse use cases (CRISPRa screens, drug response deconvolution, and triple-negative breast cancer treatment analysis), the authors demonstrate Pertpy’s flexibility, speed, and ability to derive biologically meaningful insights.
Personal highlights
Unified perturbation space embeddings: Pertpy introduces the concept of perturbation spaces, learned representations that summarize cellular responses per perturbation rather than per cell, enabling direct comparison of perturbation effects and revealing shared and divergent biological programs.
Metadata-aware analysis pipelines: the framework automatically annotates perturbations and cell lines with curated ontologies and public resources (DepMap, GDSC, CMap, PubChem), integrating biological context directly into the analysis workflow for more interpretable and robust results.
Scalable, GPU-accelerated implementations: by leveraging JAX and parallelized algorithms, Pertpy offers optimized versions of key methods (Mixscape, scCODA, Augur, CINEMA-OT) that are significantly faster and more memory-efficient than original implementations, enabling analysis of datasets with millions of cells.
Multimodal and multicellular insight extraction: Pertpy extends beyond single-cell differential expression to identify multicellular programs (MCPs) via DIALOGUE, quantify compositional shifts with Bayesian models, and infer causal perturbation effects using optimal transport, capturing tissue-level and cell-type-specific responses.
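The notion of a perturbation space can be illustrated with the simplest possible version: one pseudobulk vector per perturbation, expressed relative to control cells. The sketch below uses plain NumPy on simulated data. It is not Pertpy's API, and the function names, labels, and effect sizes are hypothetical.

```python
import numpy as np


def pseudobulk_perturbation_space(expression, labels, control="control"):
    """One vector per perturbation: mean expression minus the control mean."""
    expression = np.asarray(expression, dtype=float)
    labels = np.asarray(labels)
    control_mean = expression[labels == control].mean(axis=0)
    space = {}
    for perturbation in np.unique(labels):
        if perturbation == control:
            continue
        mean_profile = expression[labels == perturbation].mean(axis=0)
        space[str(perturbation)] = mean_profile - control_mean
    return space


def cosine_similarity(u, v):
    """Cosine of the angle between two perturbation vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Simulated cells-by-genes matrix (hypothetical data).
rng = np.random.default_rng(1)
n_cells, n_genes = 100, 6
expression = rng.normal(0.0, 0.1, size=(3 * n_cells, n_genes))
labels = np.repeat(["control", "guide_A", "guide_B"], n_cells)
# guide_A and guide_B are simulated to shift the same two genes in the same direction.
expression[n_cells:2 * n_cells, :2] += 2.0
expression[2 * n_cells:, :2] += 1.5

space = pseudobulk_perturbation_space(expression, labels)
similarity = cosine_similarity(space["guide_A"], space["guide_B"])
```

Because each perturbation is reduced to a single vector, perturbations can be compared directly, here with a cosine similarity close to one for the two guides simulated to act on the same genes.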
Why should we care?
Pertpy provides one of the first end-to-end, scalable frameworks that unifies data access, annotation, analysis, and interpretation. For researchers, it dramatically reduces the overhead of integrating scattered databases and incompatible tools, enabling faster, more reproducible, and biologically contextualized insights. By making state-of-the-art perturbation methods accessible, efficient, and interoperable in Python, Pertpy empowers both computational and wet-lab scientists to explore complex perturbation screens, from CRISPR and drug screens to disease cohorts, with greater depth, speed, and confidence. In addition, it is a paper from the Theis lab… so yeah… that is a badge of quality.
Thanks for reading.
Cheers,
Seb.


