Weekly reads 18/05/26
Twin tumors, spatial maps, and stromal rewiring
This week’s reads focus on a common challenge in modern biology: how to disentangle signal from context, whether that means reconstructing the embryonic origin of a childhood cancer, separating true cellular expression from spatial contamination, or identifying the stromal programs that suppress anti-tumor immunity. One study uses naturally occurring developmental mutations to trace a rare sarcoma through monozygotic twins, showing that the tumor arose in one fetus and metastasized in utero to the other. Several computational advances push spatial biology towards tissue-scale resolution and reliability: HESTIA and Cellist address the growing challenge of analyzing and segmenting million-cell spatial datasets, SpatialArtifacts detects damaged regions that resemble biological signal, and DeSpotX addresses contamination in single-cell-resolution spatial transcriptomics via identifiable generative modeling. On the tumor microenvironment side, LRRC15-positive fibroblasts emerge as central conductors of immune suppression in lung cancer, and a bispecific antibody that delivers TGFβ blockade to these pathogenic fibroblasts avoids the systemic toxicity of broad TGFβ inhibition.
Preprints/articles that I managed to read this week
Embryonic origin of cancer in newborn twins
Walkowiak et al. bioRxiv (2026). 10.64898/2026.05.10.722519
The paper in one sentence
Whole‑genome sequencing of 23 normal tissues, 10 tumour samples, and 11 placental samples from newborn monozygotic twins with a rare sarcoma reveals that the tumour arose in one twin and spread in utero to the other, while early embryonic lineages contributed asymmetrically to the placenta and each twin.
Summary
The authors investigated the origin of an MN1::ZNF341‑rearranged undifferentiated sarcoma in newborn monozygotic twin girls (twin A with disseminated lesions, twin B with brain and skin lesions). They performed whole‑genome sequencing (WGS) on six normal samples per twin (various organs), 11 bulk placental samples, 10 tumour samples (eight from twin A, two from twin B), and 12 laser‑capture microdissected (LCM) trophoblast samples. After filtering out germline variants and sequencing artefacts, they identified 254 early embryonic somatic mutations (mosaic variants arising during development) by requiring: ≥300‑600 reads spanning the position across normal samples, VAF ≥0.1 in at least one sample, no significant strand bias, and high mapping quality. These mutations were used as lineage tracers to reconstruct twinning phylogeny. They also called clonal tumour mutations and copy number alterations (CNAs; loss of 1q and 18p) using ASCAT. Three mutation groups emerged: (A) mutations present in both twins (shared lineage), (B) twin‑A‑specific mutations (VAF ~0.5 in twin A, absent in twin B), and (C) twin‑B‑specific mutations. Unexpectedly, placental samples were dominated by twin B lineages (not equal contribution). The twinning phylogeny showed asymmetric fates: one lineage gave rise almost exclusively to twin A, another to both twins and placenta, and a third to twin B and placenta. For the tumour, all tumour samples (including those from twin B) carried twin‑A‑specific mutations but not twin‑B‑specific mutations, proving a single origin in twin A followed by in utero metastasis to twin B. Subclonal tumour analysis (13 mutation clusters based on VAFs across samples) revealed parallel evolution after the MN1::ZNF341 fusion, with different clones seeding different metastatic sites. Finally, using VAFs of twin‑specific mutations, they estimated substantial cell transfer (twin‑to‑twin transfusion) in both directions.
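The filtering criteria described above can be sketched as a simple predicate. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `(alt_reads, depth)` record layout, the function names, the pooled one-sided binomial test, and the `alpha=0.01` cut-off are all hypothetical, and the strand-bias and mapping-quality checks are left out. Only the depth range, the VAF ≥0.1 rule, and the idea of testing against the heterozygous germline expectation come from the paper summary.

```python
from math import comb


def binom_cdf_half(k, n):
    """Exact P(X <= k) for X ~ Binomial(n, 0.5), using integer arithmetic."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n


def is_candidate_mosaic(samples, min_depth=300, min_vaf=0.1, alpha=0.01):
    """Decide whether a variant looks like an early embryonic (mosaic) mutation.

    samples: list of (alt_reads, depth) tuples, one per normal sample.
    All thresholds are illustrative defaults, not the authors' exact values.
    """
    total_alt = sum(alt for alt, _ in samples)
    total_depth = sum(depth for _, depth in samples)

    # 1. Enough reads spanning the position across all normal samples.
    if total_depth < min_depth:
        return False

    # 2. VAF of at least min_vaf in at least one sample.
    if not any(depth > 0 and alt / depth >= min_vaf for alt, depth in samples):
        return False

    # 3. Germline filter: a heterozygous germline variant sits near VAF 0.5
    #    in the pooled reads, so keep the variant only if the pooled alt
    #    count is significantly below that expectation.
    return binom_cdf_half(total_alt, total_depth) < alpha
```

A twin-specific mutation (VAF near 0.5 in one twin, absent in the other) pools to roughly 0.25 and passes the germline filter, whereas a variant at about 0.5 in every sample does not.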
Personal highlights
Developmental somatic mutations as natural lineage tracers: instead of using experimental barcodes, the authors leveraged spontaneously occurring somatic mutations acquired during the first cell divisions of the embryo. By sequencing multiple normal tissues from both twins and the placenta, they identified 254 early mutations whose VAF patterns distinguish three embryonic lineages with asymmetric contributions to each twin and the placenta.
Stringent filtering to distinguish early embryonic from germline and artefactual variants: somatic variants were called with CaVEMan and filtered by depth (300‑600 reads), VAF ≥0.1, absence of strand bias, and manual JBrowse inspection. Germline variants were removed using a binomial test (heterozygous germline variants are expected at VAF ≈0.5). Variants in copy‑number‑altered regions were excluded. This rigorous pipeline is essential for reliable lineage tracing in a single family without technical replicates.
Estimating tumour cell infiltration in normal tissues using phased CNAs: tumours had loss of 1q and 18p. By phasing heterozygous SNPs in these regions (assigning alleles to the lost vs retained chromosome based on VAF >0.75 or <0.25 in a pure tumour sample), they calculated the fraction of tumour cells in each normal sample from the median VAF deviation from 0.5. This provided purity estimates consistent with direct counting of clonal tumour mutations, enabling correction for cross‑sample contamination.
Subclonal tumour phylogeny from mutation cluster VAFs: clonal tumour mutations (present in all samples) were distinguished from subclonal ones by clustering VAFs across eight tumour samples. Thirteen clusters emerged, revealing that the MN1::ZNF341 fusion was an early (truncal) event, followed by parallel evolution: e.g., cluster 11 mutations absent from the right parietal tumour, cluster 12 unique to that tumour, cluster 13 unique to liver metastasis. This demonstrates within‑patient metastatic heterogeneity.
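The purity estimate from phased CNAs mentioned in the highlights above can be made concrete with a small sketch. The write-up does not give the authors' exact estimator, so the formula here is an assumption: a standard two-population mixture in which normal cells carry both homologues and tumour cells have lost exactly one. Under that model the retained allele has VAF 1/(2 − ρ) for tumour fraction ρ, so a deviation d from 0.5 gives ρ = 4d/(1 + 2d). The function names and the dictionary layout are hypothetical.

```python
from statistics import median


def phase_snps(pure_tumour_vafs, high=0.75, low=0.25):
    """Phase heterozygous SNPs using a (near-)pure tumour sample.

    VAF > high: the alt allele sits on the retained homologue.
    VAF < low:  the alt allele sits on the lost homologue.
    SNPs in between are left unphased and ignored.
    """
    phase = {}
    for snp, vaf in pure_tumour_vafs.items():
        if vaf > high:
            phase[snp] = "alt_retained"
        elif vaf < low:
            phase[snp] = "alt_lost"
    return phase


def tumour_fraction(sample_vafs, phase):
    """Estimate the tumour cell fraction of a sample from phased SNP VAFs.

    Assumes a single-copy loss in tumour cells on a diploid background.
    """
    retained = []
    for snp, label in phase.items():
        if snp not in sample_vafs:
            continue
        vaf = sample_vafs[snp]
        # Express every SNP as the VAF of the allele on the retained homologue.
        retained.append(vaf if label == "alt_retained" else 1.0 - vaf)
    if not retained:
        return None
    deviation = max(median(retained) - 0.5, 0.0)
    return 4.0 * deviation / (1.0 + 2.0 * deviation)
```

For example, a retained-allele VAF of about 0.556 in a nominally normal sample corresponds to roughly 20% tumour cells under this model.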
Targeting LRRC15 in cancer-associated fibroblasts modifies the extracellular matrix and enhances tumor immune responses to suppress lung cancer progression
Qi et al. Cancer Research (2026). 86(10):2377–2392
The paper in one sentence
A specific type of fibroblast in lung tumors, marked by the protein LRRC15, promotes cancer growth by remodeling the surrounding matrix and polarizing immunosuppressive macrophages, and targeting LRRC15 with a novel bispecific antibody slows tumor progression in mice.
Summary
This study investigates LRRC15-positive cancer-associated fibroblasts (CAFs), a tumor-specific cell population enriched in lung cancer. Using single-cell transcriptomics of human and mouse samples, the authors show that LRRC15+ CAFs are associated with poor patient survival. Mechanistically, LRRC15 in CAFs drives the production of extracellular matrix components, particularly collagen I, which in turn promotes the polarization of CD206+ "M2-like" macrophages. These macrophages suppress CD8+ T-cell activity, creating an immunosuppressive environment that favors tumor growth. Genetic deletion of LRRC15 in CAFs reduces collagen deposition, decreases M2 macrophage polarization, restores CD8+ T-cell cytotoxicity, and slows tumor progression in multiple mouse models—effects that are dependent on an intact immune system. Finally, the authors develop a bispecific antibody that simultaneously targets LRRC15 and neutralizes TGFβ (the cytokine that induces LRRC15 expression). This antibody reduces LRRC15 expression in CAFs, limits tumor growth, and avoids the systemic toxicity associated with broad TGFβ inhibition.
Personal highlights
LRRC15+ CAFs are tumor-enriched and prognostic: in lung cancer patients, LRRC15+ CAFs constitute ~40% of all fibroblasts within tumors but are nearly absent in adjacent normal tissue, and their signature correlates with worse survival.
LRRC15 drives macrophage polarization via ECM remodeling: LRRC15 deficiency in CAFs reduces collagen I production, and this diminished extracellular matrix directly limits the polarization of macrophages toward an immunosuppressive CD206+ phenotype.
Immune-dependent tumor suppression: genetic deletion of LRRC15 in CAFs slows lung tumor growth in immunocompetent mice but has no effect in immunodeficient NSG mice or in direct co-culture with cancer cells, confirming that the effect is mediated by the immune system.
Macrophages as critical mediators: depleting macrophages abolishes the tumor-suppressive effect of LRRC15 deletion, placing macrophages downstream of LRRC15+ CAFs in the immunosuppressive cascade.
A bispecific antibody with improved safety: an LRRC15-TGFβ trap antibody preferentially accumulates in LRRC15+ CAFs within tumors, reduces LRRC15 expression and ECM density, suppresses tumor growth in mice, and avoids the splenomegaly seen with systemic TGFβ blockade.
Why should we care?
Tumors are not just masses of cancer cells; they are complex ecosystems. Fibroblasts, a type of connective tissue cell, can be “corrupted” by tumors to become allies that shield the cancer from the immune system. The researchers identified a specific protein called LRRC15 on these corrupted fibroblasts that acts like a master switch. When they blocked LRRC15 in mice, the fibroblasts stopped building dense barriers and no longer instructed immune cells called macrophages to suppress the body’s cancer-killing T-cells. The broader takeaway is that instead of trying to kill cancer cells directly, we might be able to reprogram the tumor’s supportive environment. The bispecific antibody developed here is particularly clever: it targets LRRC15 to deliver a TGFβ-blocking “payload” only to the problematic fibroblasts, avoiding the serious side effects that occur when TGFβ is blocked throughout the body. While this work is still in preclinical mouse models, it highlights a promising strategy for making lung tumors more vulnerable to the immune system, an approach that could eventually be combined with existing immunotherapies.
HESTIA: Scalable Multimodal Integration of Histology and High-Resolution Spatial Transcriptomics for Robust Spatial Domain Identification
Zhong et al. bioRxiv (2026). 10.64898/2026.05.14.723098v1
The paper in one sentence
HESTIA is a computationally efficient algorithm that integrates tissue histology images with high-resolution spatial gene expression data to map tissue structures at single-cell scale, overcoming the memory failures and data sparsity that plague existing methods.
Summary
Modern spatial transcriptomics technologies (e.g., Stereo-seq, Visium HD) can now map gene expression across entire tissue sections at subcellular resolution, generating datasets with millions of data points. However, existing analysis tools were not designed for this scale: they run out of memory or produce noisy results due to extreme sparsity (most genes are undetected in any given spot). The authors present HESTIA, a multimodal algorithm that addresses both challenges. HESTIA uses a hierarchical vision transformer to extract features from H&E-stained histology images, and a novel dual-autoencoder system that simultaneously processes high-resolution and spatially aggregated low-resolution transcriptomic data. A cross-resolution consistency constraint stabilizes sparse signals. The fused representation is then used for spatial domain identification (clustering). Benchmarking on a mouse brain dataset (Stereo-seq, >818,000 bins at highest resolution) shows that HESTIA is the only method among nine that can process the full dataset without memory failure. It achieves superior clustering accuracy and spatial continuity compared to eight competing algorithms. Applied to a human lung adenosquamous carcinoma sample (>2 million bins) and two colorectal cancer Visium HD datasets, HESTIA identifies clinically relevant intratumoral heterogeneity, including an immune-active B-cell niche in lung cancer and tumor subdomains associated with REG family genes or SPP1+ macrophages in colorectal cancer. Ablation studies confirm that the dual-resolution design provides greater benefits for sparser, lower-quality data.
Personal highlights
Unmatched scalability: HESTIA is the only multimodal method capable of processing a full-slice Stereo-seq dataset with >818,000 bins (or a 2-million-bin human lung cancer sample) on a single GPU with <120 GB RAM, while eight competing algorithms fail due to out-of-memory errors at much lower resolutions.
Dual-autoencoder with cross-resolution consistency: by learning from both high-resolution and spatially aggregated low-resolution transcriptomic data simultaneously, HESTIA stabilizes sparse molecular signals and improves clustering accuracy—a benefit that is most pronounced for lower-quality or sparser datasets.
Superior spatial domain identification in mouse brain: at both bin20 (grid) and single-cell (cellbin) resolution, HESTIA accurately delineates fine anatomical structures (e.g., CA1/CA3 stratum pyramidale, dentate gyrus, corpus callosum) with higher adjusted Rand index and spatial continuity than SpaGCN, MUSE, StereoMM, ConGR, and other competitors.
Clinically relevant discoveries in human cancer: in a large lung adenosquamous carcinoma sample, HESTIA identified an immune-active niche within the squamous region enriched for B-cell immunity genes (CCL19, IGLC2, IGLC3, MZB1). In colorectal cancer, it resolved intratumoral heterogeneity: one sample showed REG family gene-expressing tumor subdomains (linked to invasion and poor differentiation), and another revealed tumor subdomains co-localized with pro-tumorigenic SPP1+ macrophages.
Robust to transcriptomic sparsity: ablation studies show that the dual-resolution design yields greater performance gains when sequencing depth is low, demonstrating that HESTIA effectively mitigates the gene dropout problem inherent to high-resolution spatial platforms.
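To see why a spatially aggregated low-resolution view helps with sparsity, here is a minimal sketch of the aggregation step alone. It is not HESTIA's code: the autoencoders and the cross-resolution consistency constraint are not reproduced, and the function names, the `{(x, y): {gene: count}}` layout, and the square-block pooling are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_bins(bins, factor):
    """Sum-pool a regular grid: each factor x factor block of
    high-resolution bins becomes one low-resolution bin.

    bins: dict mapping (x, y) grid coordinates to {gene: count}.
    """
    coarse = defaultdict(lambda: defaultdict(int))
    for (x, y), counts in bins.items():
        pooled = coarse[(x // factor, y // factor)]  # create even if empty
        for gene, n in counts.items():
            pooled[gene] += n
    return {key: dict(pooled) for key, pooled in coarse.items()}


def detection_rate(bins, genes):
    """Fraction of (bin, gene) pairs with a non-zero count."""
    total = len(bins) * len(genes)
    if total == 0:
        return 0.0
    nonzero = sum(1 for counts in bins.values()
                  for gene in genes if counts.get(gene, 0) > 0)
    return nonzero / total
```

Pooling keeps the total counts but raises the fraction of non-zero entries, which is the kind of denser signal a low-resolution branch can use to stabilise the high-resolution one.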
Why should we care?
Spatial transcriptomics allows scientists to create detailed molecular maps of tissues showing exactly which genes are active in each cell and where. The catch is that the newest datasets contain millions of data points and are extremely sparse, so most existing tools either run out of memory or produce noisy results. HESTIA solves this problem by being smart about memory usage and by compensating for the inevitable gaps in the data (because no technology can measure every gene in every cell). With HESTIA, researchers can now analyze entire cancer biopsies at single-cell resolution to identify subtle but clinically important features, such as an immune hotspot that might predict immunotherapy response, or a tumor boundary that is actively invading healthy tissue.
Cellist: Accurate, Scalable and Cross-Platform Cell Identification for High-Resolution Spatial Transcriptomics
Sun et al. Nature Genetics (2026). 10.1038/s41588-026-02610-1
The paper in one sentence
Cellist is a computational method that integrates tissue images with gene expression data to accurately assign transcripts to individual cells across diverse spatial transcriptomics platforms, overcoming the memory and accuracy limitations of existing tools.
Summary
High-resolution spatial transcriptomics technologies can now map gene expression at subcellular resolution, but identifying which transcripts belong to which cell remains a major bottleneck. Existing methods are either platform-specific, computationally too slow for large datasets, or fail to preserve the biological integrity of gene expression within cells. The authors introduce Cellist, a multimodal approach that combines nuclear staining images with spatial gene expression to segment cells. Cellist first identifies nuclei from images (using Watershed or Cellpose), then uses a probabilistic model that balances expression similarity and physical distance to assign surrounding transcripts to the correct cell. Benchmarking on nine datasets across five platforms (Stereo-seq, Seq-Scope, seqFISH+, STARmap, and 10x Xenium) shows that Cellist consistently achieves higher within-cell expression consistency, better cell-type annotation accuracy, and superior computational efficiency compared to existing methods (SCS, StereoCell, Baysor, UCS). In an application to post-neoadjuvant immunotherapy NSCLC samples, Cellist-enabled segmentation revealed spatially distinct tumor clones with different stemness signatures and identified macrophage subtypes (CXCL9, SPP1, TREM2) with unique spatial distributions at the tumor-stroma boundary, offering insights into therapy response.
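The idea of balancing expression similarity against physical distance can be illustrated with a toy scoring rule. This is emphatically not Cellist's probabilistic model, whose details are not given in this summary: the Gaussian distance kernel, the cosine similarity, the linear `weight` mix, and every parameter value below are assumptions chosen only to show why using both signals can change an assignment.

```python
from math import exp, hypot, sqrt


def cosine(u, v):
    """Cosine similarity between two sparse expression profiles (dicts)."""
    dot = sum(u.get(g, 0) * v.get(g, 0) for g in set(u) | set(v))
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_bin(bin_xy, bin_expr, nuclei, sigma=5.0, weight=0.5, max_dist=15.0):
    """Assign one bin to the nucleus with the best combined score.

    nuclei: dict mapping cell_id -> ((x, y) nucleus centre, expression profile).
    weight: share of the score given to proximity (1.0 = distance only).
    Returns the chosen cell_id, or None if no nucleus is within max_dist.
    """
    best_cell, best_score = None, 0.0
    for cell_id, ((cx, cy), profile) in nuclei.items():
        dist = hypot(bin_xy[0] - cx, bin_xy[1] - cy)
        if dist > max_dist:
            continue
        proximity = exp(-(dist * dist) / (2 * sigma * sigma))
        similarity = cosine(bin_expr, profile)
        score = weight * proximity + (1 - weight) * similarity
        if score > best_score:
            best_cell, best_score = cell_id, score
    return best_cell
```

In this toy setting, a bin that lies slightly closer to an astrocyte nucleus but expresses only a neuronal marker is assigned to the neuron once expression is taken into account, and to the astrocyte when distance is used alone.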
Personal highlights
Cross-platform versatility: Cellist works on both barcoding-based platforms (Stereo-seq, Seq-Scope) and imaging-based platforms (seqFISH+, STARmap, Xenium), unlike most existing methods that are designed for only one technology family.
Superior within-cell expression purity: using novel metrics (random correlation, directional split correlation, and variance-based purity score), Cellist consistently outperforms competing methods in preserving transcriptomic coherence within segmented cells, meaning cleaner, less contaminated single-cell profiles.
Scalable to massive datasets: Cellist processes a full Stereo-seq mouse brain dataset (~140,000 cells) and a human NSCLC dataset (>2 million bins) on a single GPU with moderate memory, while methods like SCS fail on large samples due to memory constraints.
Improved cell-type annotation: in mouse brain, Cellist-segmented cells showed higher correlation with matched scRNA-seq reference data and yielded more specific marker gene expression (higher log fold-changes) compared to other segmentation methods.
Biological discovery in NSCLC: applied to post-immunotherapy lung cancer samples, Cellist identified two tumor clones with distinct copy number alterations, one with higher cancer stem cell signatures. It also revealed spatial co-localization of CXCL9 and SPP1 macrophages at the tumor boundary, with opposing roles in T-cell recruitment versus exclusion.
Why should we care?
Spatial transcriptomics allows scientists to see which genes are active in exactly which cells within a tissue, like a molecular Google Maps of a tumor or brain. However, the raw data are messy: transcripts (gene readouts) are scattered around, and no method draws perfect boundaries around each cell. If you assign a transcript to the wrong cell, your conclusions about what that cell is doing will be wrong. Cellist is like a smart sorting algorithm that uses both the tissue image (where the nucleus is) and the gene expression patterns to decide which transcripts belong to which cell. It works across different technologies and scales to millions of cells without crashing.
SpatialArtifacts: a computational framework for tissue artifact detection in spatial transcriptomics data
He et al. bioRxiv (2026). 10.64898/2026.05.15.725260
The paper in one sentence
SpatialArtifacts is a computational method that uses mathematical morphology operations to detect and classify spatially contiguous tissue artifacts (such as dry spots and edge damage) in spatial transcriptomics data, enabling precise removal of technical noise while preserving biologically meaningful low-expression regions.
Summary
Spatial transcriptomics allows researchers to map gene expression across tissue sections, but technical artifacts from sample preparation (tissue lifting, folding, uneven reagent coverage) create regions of artificially low RNA capture. Existing quality control methods either remove spots based on fixed global thresholds (which mistakenly discard biologically valid low-expression areas like brain white matter) or use local neighborhood statistics that miss large, irregularly shaped edge artifacts. The authors present SpatialArtifacts, a framework that first identifies outlier spots using median absolute deviation (MAD) thresholds, then applies morphological image processing operations (3×3 fill, 5×5 outline, star-pattern connectivity) to connect these outliers into coherent patches that match the irregular geometry of real tissue damage. The method classifies artifacts into four categories (large/small edge, large/small interior) and provides spot-level coordinates for targeted removal. Validation across human hippocampus, dorsolateral prefrontal cortex (DLPFC), and colorectal cancer datasets on both Visium and VisiumHD platforms shows that SpatialArtifacts removes 2–3% of spots compared to 13–22% removed by BLADE or 23% by global thresholds, while preserving known anatomical structures. Benchmarking against SpotSweeper and BLADE reveals complementary strengths: SpotSweeper excels at isolated low-quality spots, BLADE detects slide-level edge effects but lacks precision, and SpatialArtifacts fills the gap for spatially coherent regional artifacts.
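The two-stage logic described above (MAD-based outlier calling, then morphological clean-up into patches) can be sketched on a toy square grid. This is an illustration only and not the SpatialArtifacts implementation: the 3-MAD cut-off, the rule that a spot is filled when all eight neighbours are flagged, the 8-connectivity used to form patches, and the edge test based on touching the grid border are all assumptions, and the 5×5 outline, star-pattern step, and large/small size classes are omitted.

```python
from collections import deque
from statistics import median

# Offsets of the eight neighbours in a 3x3 window.
NEIGHBOURS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
              if (dr, dc) != (0, 0)]


def mad_outliers(umi, n_mads=3.0):
    """Flag spots whose UMI count lies more than n_mads MADs below the median.

    umi: dict mapping (row, col) -> total UMI count.
    """
    values = list(umi.values())
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    cutoff = centre - n_mads * mad
    return {pos for pos, v in umi.items() if v < cutoff}


def fill_3x3(flagged, spots, min_flagged=8):
    """Also flag any spot with at least min_flagged flagged 3x3 neighbours."""
    filled = set(flagged)
    for (r, c) in spots:
        if (r, c) in flagged:
            continue
        hits = sum((r + dr, c + dc) in flagged for dr, dc in NEIGHBOURS)
        if hits >= min_flagged:
            filled.add((r, c))
    return filled


def classify_patches(flagged, n_rows, n_cols):
    """Group flagged spots into 8-connected patches labelled edge/interior."""
    patches, seen = [], set()
    for start in sorted(flagged):
        if start in seen:
            continue
        patch, queue = set(), deque([start])
        seen.add(start)
        while queue:
            r, c = queue.popleft()
            patch.add((r, c))
            for dr, dc in NEIGHBOURS:
                nxt = (r + dr, c + dc)
                if nxt in flagged and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        on_edge = any(r in (0, n_rows - 1) or c in (0, n_cols - 1)
                      for r, c in patch)
        patches.append(("edge" if on_edge else "interior", patch))
    return patches
```

Working on patches rather than single spots is what lets an artifact be removed as a coherent region while isolated low-count spots elsewhere in the tissue are left alone.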
Personal highlights
Morphological operations adapted from computer vision: SpatialArtifacts applies focal kernels (3×3 fill, 5×5 outline, star-pattern) to connect outlier spots into irregularly shaped artifact regions, mimicking how pathologists identify tissue damage from images.
Preserves biologically meaningful low-expression regions: unlike global UMI thresholds that remove 23% of spots (including healthy white matter and mucosal crypts), SpatialArtifacts removes only 1.9–3.4% of spots, correctly retaining areas with naturally low transcription.
Hierarchical classification of artifact types: artifacts are categorized as large/small and edge/interior, enabling flexible filtering strategies (e.g., automatically remove edge artifacts but flag large interior artifacts for manual review).
Cross-platform compatibility: works on both standard Visium (hexagonal grid, ~5,000 spots) and VisiumHD (square grid, >130,000 bins) with resolution-aware parameter scaling, maintaining equivalent physical coverage across platforms.
Independent validation with expert annotations: in human DLPFC data, the 87 spots previously labeled as “Unannotated” by domain experts were entirely identified as artifacts by SpatialArtifacts, demonstrating automated recovery of manual quality control decisions.
DeSpotX: Identifiability-Based Decontamination for Spatial Transcriptomics
Wang and Gentles, bioRxiv (2026). 10.64898/2026.05.12.724704
The paper in one sentence
DeSpotX is a deep generative model that uses anchor genes (genes absent in a given cell cluster) to uniquely separate native expression from spatially structured ambient contamination in single-cell-resolution spatial transcriptomics data.
Summary
In single-cell-resolution spatial transcriptomics (platforms such as Xenium, MERFISH, CosMx, and Stereo-seq), 20–40% of transcripts are assigned to neighboring cells due to ambient diffusion, segmentation errors, or tissue overlap. This contamination compromises cell-type annotation, spatial expression patterns, and cell-cell communication networks. Existing decontamination methods face three fundamental challenges: (i) the native and contamination components cannot be uniquely separated from observed counts (non-identifiability), (ii) contamination is spatially local but most methods use a single global ambient profile, and (iii) low-expression genes are vulnerable to over-correction. The authors introduce DeSpotX, a deep generative model that addresses each challenge. For identifiability, it defines anchor genes, genes not natively expressed in a given cell cluster, inferred automatically from per-cluster expression rates, and proves mathematically that these constraints restore a unique decomposition. For spatial structure, it estimates contamination locally using a cluster-masked, distance-weighted average over cross-cluster spatial neighbors, excluding same-cluster cells to avoid signal dilution. For signal preservation, a learned diffusion prior regularizes latent expression states, preventing over-correction of low-but-real biological signal. Benchmarking on spike-in simulations across five datasets spanning four platforms shows DeSpotX achieves AUROC >0.94 on every dataset, outperforming SoupX, DecontX, ResolVI, and SpaceBender by 0.02–0.12, with the lowest per-cell and global calibration errors. On real tissues, decontaminated counts produce cleaner cluster separation in UMAP embeddings, tighter marker-gene localization to canonical cell types, and increased spatial autocorrelation (Moran’s I) for biologically relevant genes. The method is robust to inaccuracies in anchor masks and cell-cluster labels, and runs in 16–21 minutes on a million-cell dataset, substantially faster than competing deep-learning methods.
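The cluster-masked, distance-weighted average is easy to picture with a small sketch. It covers only that one ingredient and is not the DeSpotX model: the exponential kernel, the `radius` and `bandwidth` values, the function name, and the cell record layout are assumptions, and the anchor-gene constraint, graph attention encoder, and diffusion prior are not represented.

```python
from math import exp, hypot


def local_ambient_profile(cell, cells, radius=20.0, bandwidth=10.0):
    """Distance-weighted mean expression of nearby cells from OTHER clusters.

    cell, cells: dicts with keys "xy" (x, y), "cluster", and "counts"
    ({gene: count}). Same-cluster neighbours are masked out so the cell's
    own native signal does not leak into the contamination estimate.
    Returns {gene: weighted mean count}, or {} if there is no such neighbour.
    """
    weighted_sum, total_weight = {}, 0.0
    for other in cells:
        if other is cell or other["cluster"] == cell["cluster"]:
            continue  # cluster mask
        dist = hypot(other["xy"][0] - cell["xy"][0],
                     other["xy"][1] - cell["xy"][1])
        if dist > radius:
            continue  # contamination is treated as spatially local
        w = exp(-dist / bandwidth)  # closer neighbours count more
        total_weight += w
        for gene, n in other["counts"].items():
            weighted_sum[gene] = weighted_sum.get(gene, 0.0) + w * n
    if total_weight == 0.0:
        return {}
    return {gene: s / total_weight for gene, s in weighted_sum.items()}
```

For a T cell sitting next to tumour cells, the resulting profile is dominated by tumour genes such as EPCAM and contains nothing from neighbouring T cells, which is the contrast with a single global ambient profile.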
Personal highlights
Identifiability via anchor genes: the first method to formally prove non-identifiability of the native–contamination decomposition and restore identifiability using anchor genes, automatically identified from per-cluster expression rates, providing a provable guarantee rather than heuristic regularization.
Spatially local, cluster-isolated contamination estimation: estimates contamination from cross-cluster spatial neighbors only, using distance-weighted averaging and a graph attention network encoder. The cluster mask excludes same-cluster neighbors, preventing dilution of native signal and enabling the method to recover the cross-cluster contamination fraction that drives downstream artifacts.
Diffusion prior preserves low-expression signal: a denoising diffusion prior learned jointly with the model regularizes latent expression states, preventing over-correction of genuinely low but biologically important signal.
Superior benchmark performance across platforms: on spike-in simulations spanning Xenium, MERFISH, CosMx, and Stereo-seq, DeSpotX achieves AUROC >0.94 on every dataset, with gains of 0.02–0.12 over the best baseline (ResolVI), and the lowest per-cell calibration error, indicating that accurate per-cell contamination estimates drive global calibration rather than error cancellation.
Other papers that piqued my interest and were added to the purgatory of my “to read” pile
Differential assembly of mouse and human tumor microenvironments
Refining sequence-to-activity models by increasing model resolution
Genome instability triggers intercellular DNA transfer between human cells
Humanized patient-derived xenografts preserve tumour-specific immune microenvironments
A deep-learning framework reveals whole-body perturbations at cell level
An AI system to help scientists write expert-level empirical software
Thanks for reading.
Cheers,
Seb.


