Weekly reads 31/3/25
Over the past week, I’ve been diving into some of the latest papers and preprints in single-cell and spatial omics, with a focus on integration strategies, computational frameworks, tumor microenvironments, and early development. From benchmarking multimodal data integration for human reference atlases to novel spatial models disentangling cellular contexts, these studies reflect the rapidly evolving landscape of high-resolution tissue biology.
Preprints/articles that I managed to read this week
Systematic evaluation of single-cell multimodal data integration for comprehensive human reference atlas
Acera-Mateos, M., Adiconis, X., Li, J-K., Marchese, D., Caratù, G., Hon, C.-C., Tiwari, P., Kojima, M., Vieth, B., Murphy, M. A., Simmons, S. K., Lefevre, T., Claes, I., O'Connor, C. L., Menon, R., Otto, E. A., Ando, Y., Vandereyken, K., Kretzler, M., Bitzer, M., Fraenkel, E., Voet, T., Enard, W., Carninci, P., Heyn, H., Levin, J. Z., Mereu, E. (2025). bioRxiv Preprint.
The paper in one sentence
This paper systematically evaluates the integration of single-cell multimodal data (RNA-seq and ATAC-seq) for a comprehensive human reference atlas, providing insights into cell type resolution and marker detection.
Summary
This research focuses on integrating diverse single-cell sequencing modalities—namely, single-nucleus RNA-seq (snRNA-seq) and single-nucleus ATAC-seq (snATAC-seq)—to generate a more precise human reference atlas. The authors develop and apply computational models to assess how well different multimodal integration strategies resolve human cell types and identify cell-specific markers across various organs. Through a thorough comparison of integration methods, they identify optimal strategies to enhance the accuracy of cell type detection and the identification of biologically significant markers.
Personal highlights
Multimodal integration and computational tools: The paper introduces scOMM, a supervised machine learning framework designed to integrate multimodal data (RNA and chromatin accessibility). This tool provides interpretable feature importance, which is crucial for identifying the contribution of each modality to cell-type classification.
Benchmarking integration strategies: The study provides a rigorous benchmarking of different integration strategies, including horizontal (same data type across samples), vertical (different data types within the same samples), and diagonal/mosaic integration (paired and unpaired data). The authors find that while vertical integration yields some improvements in identifying cell subtypes, horizontal integration performs better at refining cell type resolution.
Challenges with data quality and consistency: While the multimodal data integration strategy significantly improves cell type identification, the paper acknowledges technical noise, such as batch effects and cell dissociation biases, which still affects the results. For instance, scRNA-seq and snRNA-seq data can yield complementary but often inconsistent findings, requiring careful harmonization and filtering. The paper also does not explore how these noise factors influence rare cell type detection across different tissue types.
Rare cell type detection and functional annotation: One of the most promising outcomes is the discovery of rare cell types that were previously undetectable using single modalities. For example, the identification of WFDC2-expressing TAL cells, linked to lupus nephritis, demonstrates the potential of integrating these technologies for discovering clinically relevant cell populations.
Computational model limitations: The scOMM framework, while innovative, has limitations in scalability and applicability across different types of tissues and organs. The framework depends heavily on pre-existing reference datasets, which may not always be available for all organs or diseases.
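To make the horizontal/vertical distinction above concrete, here is a toy sketch of the data layouts only. The matrices and their sizes are made up for illustration, and the plain stacking stands in for the actual integration methods benchmarked in the paper, which do far more (batch correction, joint embeddings, and so on).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy matrices: rows are cells, columns are features.
rna_sample_a = rng.poisson(1.0, size=(5, 4))      # 5 cells x 4 genes
rna_sample_b = rng.poisson(1.0, size=(3, 4))      # 3 other cells, same 4 genes
atac_sample_a = rng.integers(0, 2, size=(5, 6))   # the same 5 cells x 6 peaks

# Horizontal: same modality across samples, so cells (rows) are stacked.
horizontal = np.vstack([rna_sample_a, rna_sample_b])

# Vertical: different modalities for the same cells, so features (columns) are stacked.
vertical = np.hstack([rna_sample_a, atac_sample_a])

print(horizontal.shape)  # (8, 4)
print(vertical.shape)    # (5, 10)
```

Diagonal/mosaic integration is the harder case where neither cells nor features fully overlap, so no single stacking operation applies.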
Why should we care?
This study highlights an important step toward creating more detailed maps of the human body by integrating different types of cell data. The ability to combine information about gene expression and chromatin accessibility allows for a deeper understanding of how cells work and interact in complex organs, like the kidney. This work could eventually lead to more accurate ways of identifying disease markers, improving diagnoses, and personalizing treatments in the future.
SIMVI disentangles intrinsic and spatial-induced cellular states in spatial omics data
Dong, M., Kluger, Y., & Fraenkel, E. (2025). SIMVI disentangles intrinsic and spatial-induced cellular states in spatial omics data. Nature Communications, 16, 2990. https://doi.org/10.1038/s41467-025-58089-7
The paper in one sentence
SIMVI is a novel deep learning framework designed to disentangle intrinsic cellular variations (e.g., cell types) from spatially induced variations (e.g., gradients, interactions) in spatial omics data, offering new insights into cellular organization and spatial effects.
Summary
The paper presents SIMVI (Spatial Interaction Modeling using Variational Inference), a deep variational inference model that takes spatial omics data (e.g., gene expression, chromatin accessibility) and separates intrinsic from spatially induced variation. The framework is based on a generative model that uses graph-based approaches to model cellular spatial dependencies, enabling the identification of spatial effects on cell states. The model is particularly useful for understanding spatial heterogeneity in tissues, with applications ranging from gene expression analysis to identifying spatial patterns of cell interactions.
Personal highlights
SIMVI introduces a deep variational inference approach to spatial omics data, specifically designed to separate intrinsic cellular variability (e.g., cell type, state) from spatial-induced variations (e.g., environmental gradients and cell interactions).
Disentanglement of intrinsic and spatial effects: One of the primary innovations of SIMVI is its ability to disentangle intrinsic factors (such as gene expression related to cell type) from spatial effects (such as spatial gradients and local cell interactions), which is often a challenge in existing methods.
Graph attention networks (GAT) for spatial modeling: SIMVI utilizes graph attention networks to capture spatial dependencies between neighboring cells, allowing for more precise modeling of spatial variations and better representation of the spatial organization of tissues at the single-cell level.
Rigorous theoretical support: The paper provides solid theoretical guarantees for the model’s identifiability, which ensures that SIMVI can reliably separate spatial and intrinsic variations.
Quantification of spatial effects: SIMVI introduces a novel method to quantify the spatial effect (SE), the influence of spatial context on gene expression, at a single-cell level. This enables a clearer understanding of how the local tissue environment impacts cellular behavior.
Excellent performance across datasets: The authors demonstrate that SIMVI outperforms existing methods in multiple spatial omics datasets, including human cortex, mouse hippocampus, and melanoma tissues, showcasing its versatility and robustness in handling diverse spatial omics technologies.
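For intuition on how attention over spatial neighbors works, here is a minimal NumPy sketch. It is not SIMVI's architecture: it uses plain dot-product scores in place of the learned attention of a graph attention network, and the coordinates, features, and function names are all hypothetical.

```python
import numpy as np

def knn_indices(coords, k):
    """Return, for each cell, the indices of its k nearest spatial neighbors."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def attention_aggregate(features, neighbors):
    """Summarize each cell's neighborhood with softmax-weighted neighbor features."""
    neigh_feats = features[neighbors]                        # (cells, k, features)
    scores = np.einsum("nf,nkf->nk", features, neigh_feats)  # dot-product scores
    scores = scores - scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)   # softmax over neighbors
    context = np.einsum("nk,nkf->nf", weights, neigh_feats)
    return context, weights

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(20, 2))   # toy x/y positions
features = rng.normal(size=(20, 5))         # toy per-cell embeddings
neighbors = knn_indices(coords, k=4)
context, weights = attention_aggregate(features, neighbors)
print(context.shape, weights.shape)
```

The `context` vector is a per-cell summary of the local neighborhood. In a model like SIMVI, a learned version of such a summary would feed the spatially induced part of the latent representation.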
Why should we care?
SIMVI enables researchers to better understand how a cell's behavior is influenced by its environment, including spatial factors like gradients and interactions with neighboring cells. For non-experts, the key takeaway is that SIMVI can improve how we interpret the complex behavior of cells within tissues, potentially unlocking new insights into diseases like cancer and neurological disorders, where spatial organization is critical.
Mapping single-cell transcriptomes in the intra-tumoral and associated territories of kidney cancer
Li, R., et al. (2022). Mapping single-cell transcriptomes in the intra-tumoral and associated territories of kidney cancer. Cancer Cell, 40, 1583–1599. https://doi.org/10.1016/j.ccell.2022.11.001
The paper in one sentence
This study uses single-cell RNA sequencing and spatial transcriptomics to reveal the complex cellular interactions and expression programs in kidney cancer, focusing on tumor cells, immune cells, and their microenvironment.
Summary
The authors employed multi-region-based genomic and single-cell transcriptomic profiling on tissues from 12 patients with kidney cancer to understand the intra-tumoral heterogeneity and its impact on tumor progression. They discovered distinct cellular states within the tumor microenvironment (TME), identified epithelial-mesenchymal transition (EMT) and other cancer-related expression programs, and explored the spatial correlation between these states and immune cells, particularly macrophages. The research also highlights the association between IL1B-expressing tumor-associated macrophages (TAMs) and high EMT-expressing cancer cells at the tumor-normal interface, providing insights into tumor progression.
Personal highlights
IL1B-expressing TAMs correlate with high EMT-expressing RCC cells at the tumor-normal interface: The study found that IL1B-expressing tumor-associated macrophages (TAMs) were spatially correlated with epithelial-mesenchymal transition (EMT)-high RCC cells at the tumor-normal interface. This suggests that IL1B signaling from TAMs may promote tumor cell invasiveness and contribute to disease progression. They also express AREG.
Distinct polarization patterns of macrophages in the tumor microenvironment: The study identified nine distinct macrophage clusters, with six of them being classified as tumor-associated macrophages (TAMs) enriched in the tumor core and tumor-normal interface. These TAM subsets exhibited different polarization profiles, which may influence the aggressiveness of the tumor and its response to immune-based therapies.
IL1B produced by TAMs regulates EMT genes in RCC cells: The study demonstrated that IL1B-expressing TAMs (specifically from the TR Mac.2 subset) regulated the expression of genes associated with EMT in RCC cells. This suggests that IL1B signaling plays a pivotal role in driving the invasive behavior of cancer cells through the modulation of EMT pathways.
Expansion of CD8+ T cell clonotypes and their spatial localization: The study identified a strong link between CD8+ T cell clonotype expansion and their spatial localization within the tumor. Highly expanded clonotypes were found to be restricted to specific regions of the tumor, with exhausted T cells predominantly located in the tumor core and exhibiting high levels of exhaustion markers. The study also revealed that clonotypes with high expansion levels did not recirculate into peripheral blood, indicating a tissue-residency phenotype of these exhausted cells.
TCR clonotype exhaustion is influenced by spatial location rather than somatic mutations: The researchers found that the degree of exhaustion in CD8+ T cell clonotypes was more closely associated with their spatial location within the tumor than with the somatic mutation profile. This suggests that the local microenvironment plays a crucial role in T cell exhaustion, independent of the genetic heterogeneity of the tumor.
Why should we care?
This research enhances our understanding of immune-tumor interactions in kidney cancer, focusing on macrophages and their role in promoting invasive cancer behaviors. By identifying the spatial correlation between immune cells like IL1B-expressing macrophages and aggressive cancer cells, the study sheds light on potential therapeutic avenues for treating kidney cancer.
Chromatin accessibility landscape of mouse early embryos revealed by single-cell NanoATAC-seq2
Li, M., Jiang, Z., Xu, X., et al. (2025). Chromatin accessibility landscape of mouse early embryos revealed by single-cell NanoATAC-seq2. Science, 387, eadp4319. https://doi.org/10.1126/science.adp4319
The paper in one sentence
This study presents a novel method, scNanoATAC-seq2, to map chromatin accessibility at single-cell resolution during mouse preimplantation development, revealing crucial insights into early lineage specification and epigenetic reprogramming.
Summary
Li et al. developed the scNanoATAC-seq2 technique to analyze chromatin accessibility in single cells, using long-read sequencing to overcome challenges in mapping repetitive regions and low-mappability genomic areas. The study focused on chromatin dynamics during the preimplantation stages of mouse embryos, including zygotic genome activation (ZGA) and cell lineage segregation, providing insights into early embryonic development, such as the regulation of X chromosome inactivation (XCI) and noncanonical imprinting processes. Their findings also highlight key transcription factors involved in these processes and offer a detailed chromatin accessibility map at different developmental stages.
Personal highlights
Introduction of scNanoATAC-seq2: The study presents scNanoATAC-seq2, a single-cell chromatin accessibility assay using long-read sequencing, enabling accurate mapping of chromatin states in scarce samples, particularly in repetitive genomic regions often missed by short-read methods.
Chromatin accessibility during preimplantation development: The method generates comprehensive chromatin accessibility profiles across various stages of mouse preimplantation development, revealing critical chromatin dynamics during ZGA and the early segregation of lineages such as epiblast, primitive endoderm, and trophectoderm.
Analysis of repetitive elements: scNanoATAC-seq2 excels at analyzing repetitive elements like LINE1 and MERVL, uncovering their role in regulating gene expression during ZGA, which is difficult with short-read technologies.
Imprinted X chromosome inactivation (XCI): The method reveals detailed chromatin dynamics of XCI in female embryos, highlighting the shift in chromatin accessibility between Xist and Tsix domains, and showing how XCI is maintained or erased in specific cell lineages during development.
Lineage-specific chromatin signatures: The study identifies lineage-specific chromatin accessibility patterns that regulate key transcription factors such as Nanog and Gata3, which are involved in pluripotency and cell fate decisions during early development.
Why should we care?
The development of scNanoATAC-seq2 offers a major advancement in understanding how our genes are regulated during the earliest stages of life. By examining chromatin accessibility at a single-cell level, this method allows scientists to pinpoint when and how genes are activated or silenced as cells decide their roles in the embryo. This research not only contributes to fundamental developmental biology but could also have implications for understanding early-life diseases, regenerative medicine, and fertility treatments.
Evidence of off-target probe binding in the 10x Genomics Xenium v1 Human Breast Gene Expression Panel compromises accuracy of spatial transcriptomic profiling
Hallinan, C., Ji, H. J., Salzberg, S. L., & Fan, J. (2025). Evidence of off-target probe binding in the 10x Genomics Xenium v1 Human Breast Gene Expression Panel compromises accuracy of spatial transcriptomic profiling. bioRxiv. https://doi.org/10.1101/2025.03.31.646342
The paper in one sentence
This paper develops and applies a tool, the Off-target Probe Tracker (OPT), to identify off-target probe binding in the 10x Genomics Xenium v1 Human Breast Gene Expression Panel, highlighting the significant impact of such binding on the accuracy of spatial transcriptomic data.
Summary
The study addresses the challenge of off-target probe binding in spatial transcriptomics, specifically using the 10x Genomics Xenium platform. The authors developed a software tool, OPT, which predicts off-target binding by aligning probe sequences with annotated transcriptomes. Using OPT, they found that at least 21 out of 280 genes in the Xenium v1 Human Breast Gene Expression Panel were affected by off-target binding. This issue was further substantiated by comparisons with other spatial and single-cell transcriptomic data, showing that the expression patterns observed by Xenium may represent aggregate signals from both target and off-target genes. The findings suggest that such off-target binding can distort gene expression profiles and impact the biological interpretability of spatial transcriptomics data.
Personal highlights
Development of the Off-target Probe Tracker (OPT): The authors created OPT, a new open-source tool that aligns probe sequences to transcriptomes to systematically predict off-target binding—filling a critical gap in spatial transcriptomics quality control.
Discovery of widespread off-target binding in Xenium probes: Using OPT, the study identified off-target binding affecting at least 21 of the 280 genes in the 10x Genomics Xenium v1 Human Breast Gene Expression Panel, undermining the accuracy of gene expression quantification.
Cross-validation with orthogonal datasets: The researchers compared Xenium data with both Visium and single-cell RNA-seq from the same tissue samples, confirming that the observed expression of several genes reflects aggregate signals from both target and off-target genes.
Flexible alignment modes reveal additional off-targets: By allowing mismatches at the ends of probes, OPT revealed even more potential off-target bindings, highlighting that minor sequence mismatches can still distort spatial expression patterns.
Call for greater transparency and improved design: The study advocates for publishing full probe sequences and using tools like OPT in both commercial and academic settings to improve probe design, ensure reproducibility, and prevent misleading interpretations in spatial transcriptomics.
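The core idea of checking probes against a transcriptome can be sketched in a few lines. This is not OPT's implementation: it uses exact substring search as a crude stand-in for sequence alignment, it mimics tolerance for end mismatches by trimming bases from both probe ends, and the gene names and sequences are made up for illustration.

```python
def find_hits(probe, transcripts, end_slack=0):
    """Return genes whose transcript contains the probe.

    end_slack bases are trimmed from each end of the probe before the
    exact-match search, a rough proxy for tolerating end mismatches.
    """
    core = probe[end_slack:len(probe) - end_slack]
    return sorted(gene for gene, seq in transcripts.items() if core in seq)

def off_targets(probe, target_gene, transcripts, end_slack=0):
    """Return every hit that is not the intended target gene."""
    return [g for g in find_hits(probe, transcripts, end_slack) if g != target_gene]

# Hypothetical toy sequences, not real genes or real Xenium probes.
core = "CCGTAGGCTAACGTTA"
probe = "GA" + core + "GC"
transcripts = {
    "GENE_A": "TT" + probe + "AA",       # intended target, full match
    "GENE_B": "ATTC" + core + "CTAT",    # shares the core, differs at the probe ends
    "GENE_C": "AAAAACCCCCGGGGGTTTTT",    # unrelated
}

print(off_targets(probe, "GENE_A", transcripts))               # strict mode
print(off_targets(probe, "GENE_A", transcripts, end_slack=2))  # relaxed ends
```

In strict mode only the target matches, while relaxing the probe ends also flags GENE_B, the same pattern described in the highlight about flexible alignment modes.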
Why should we care?
This work emphasizes the importance of probe specificity in spatial transcriptomics, a technology increasingly used in cancer research and tissue analysis. By demonstrating that off-target binding can compromise the accuracy of gene expression profiles, this study helps researchers better interpret their spatial transcriptomic data and encourages more reliable experimental design.
Scalable High-Performance Single-Cell Data Analysis with BPCells
Parks, B., & Greenleaf, W. (2025). Scalable high-performance single-cell data analysis with BPCells. bioRxiv. https://doi.org/10.1101/2024.11.06.565990
The paper in one sentence
BPCells is a powerful open-source toolkit that dramatically accelerates and scales single-cell data analysis, enabling efficient handling of massive datasets on standard hardware.
Summary
Single-cell sequencing is transforming biology, but the volume of data it generates can be overwhelming for existing analysis tools. Parks and Greenleaf introduce BPCells, a new open-source software library designed to tackle this challenge. By optimizing memory usage and computation with advanced data encoding and storage techniques, BPCells allows researchers to perform complex analyses—like normalization, feature selection, and dimensionality reduction—on datasets with hundreds of millions of cells, all while using common desktop machines or modest servers. The system integrates seamlessly with popular bioinformatics tools and is highly extensible, making it suitable for both developers and biologists.
Personal highlights
BPCells enables analysis of datasets with 100 million+ cells on a single workstation.
Outperforms existing tools like Seurat and Scanpy in both memory efficiency and speed.
Uses smart encoding (e.g. block-wise compression) to reduce storage size without sacrificing performance.
Supports both RNA and protein data (e.g. CITE-seq), expanding applicability across modalities.
Open-source and compatible with existing pipelines—easy to adopt and extend.
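To give a feel for why block-wise encoding shrinks count data, here is a small pure-Python sketch of block-wise bit-packing. It is an illustration under my own assumptions and not BPCells's actual storage format: each block of integers is stored with the smallest bit width that fits its largest value, so blocks of small counts cost only a few bits per entry.

```python
def pack_block(values):
    """Pack non-negative ints using the smallest bit width that fits the block."""
    width = max(max(values).bit_length(), 1)
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    return width, len(values), packed

def unpack_block(width, count, packed):
    """Recover the original values from one packed block."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]

def compress(values, block_size=4):
    """Split values into fixed-size blocks and pack each one independently."""
    return [pack_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]

def decompress(blocks):
    out = []
    for width, count, packed in blocks:
        out.extend(unpack_block(width, count, packed))
    return out

# Toy UMI-like counts: mostly small values with one outlier.
counts = [1, 0, 2, 1, 0, 0, 1, 3, 120, 4, 0, 1]
blocks = compress(counts)
assert decompress(blocks) == counts

bits_used = sum(width * n for width, n, _ in blocks)
print(bits_used, "bits packed vs", 32 * len(counts), "bits as 32-bit integers")
```

Because the bit width is chosen per block, the single large value only inflates its own block rather than the whole array.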
Why should we care?
As the scale of single-cell experiments grows, traditional data analysis tools struggle to keep up, creating a bottleneck in discovery. BPCells breaks this barrier by democratizing access to high-performance analysis—no need for supercomputers or massive cloud budgets.
Other papers that piqued my interest and were added to the purgatory of my “to read” pile
Benchmarking spatial transcriptomics technologies with the multi-sample SpatialBenchVisium dataset: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03543-4
Quantitative measurement of phenotype dynamics during cancer drug resistance evolution using genetic barcoding: https://www.biorxiv.org/content/10.1101/2025.03.26.645251v1
SOAPy: a Python package to dissect spatial architecture, dynamics, and communication: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03550-5
The tumor microenvironment is an ecosystem sustained by metabolic interactions: https://www.cell.com/cell-reports/fulltext/S2211-1247(25)00203-7?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2211124725002037%3Fshowall%3Dtrue
Causal machine learning for single-cell genomics: https://www.nature.com/articles/s41588-025-02124-2
Quantitative characterization of tissue states using multiomics and ecological spatial analysis: https://www.nature.com/articles/s41588-025-02119-z
CREsted: modeling genomic and synthetic cell type-specific enhancers across tissues and species: https://www.biorxiv.org/content/10.1101/2025.04.02.646812v1
Defining effective strategies to integrate multi-sample single-nucleus ATAC-seq datasets via a multimodal-guided approach: https://www.biorxiv.org/content/10.1101/2025.04.02.646871v1
HyDrop v2: Scalable atlas construction for training sequence-to-function models: https://www.biorxiv.org/content/10.1101/2025.04.02.646792v1
In-silico biological discovery with large perturbation models: https://arxiv.org/abs/2503.23535
Thanks for reading.
Cheers,
Seb.