Weekly reads 11/05/26
How cells communicate, adapt, and age across space and time
This week’s reads explore how cellular states, spatial organization, and hidden layers of biology shape disease progression across cancer, aging, and tissue biology. In colorectal cancer, metastatic potential emerges not from additional driver mutations but from a reversible MAPK-high/WNT-low chromatin state that can be altered by KRAS inhibition. Two new methods push spatial transcriptomics past static neighbourhood maps: InterScale separates local from tissue-wide interaction programs, while CellNeighborEX v2 detects context-specific communication signals directly from Visium data without relying on ligand–receptor databases. Elsewhere, the TransCODE Consortium systematically catalogs non-canonical microproteins and “peptideins”; a mouse study shows that hematopoietic stem-cell dormancy, not apoptosis, is the key safeguard limiting mutation accumulation during aging; a UK Biobank analysis links both short and long sleep duration to accelerated biological aging across multiple organs; and BARseq3 demonstrates how transcriptomics, translatomics, and cellular lineage barcodes can now be integrated in the same tissue section.
Preprints/articles that I managed to read this week
A high-MAPK, low-WNT cell state drives metastatic dissemination in colorectal cancer
Heinlein et al. Nature Cancer (2026). 10.1038/s43018-026-01155-w
The paper in one sentence
Serial orthotopic passaging of CRISPR-engineered mouse intestinal organoids selects for a highly metastatic MAPK-high, WNT-low transcriptional state, driven by copy number gains in MAPK pathway genes and chromatin remodeling at AP-1 motifs, that is reversible by KRASG12D inhibition.
Summary
The authors generated an immunocompetent mouse model of metastatic colorectal cancer by sequentially introducing mutations in Apc, KrasG12D, Trp53, and Smad4 (AKPS) into small intestinal organoids. Parental organoids formed primary tumors but rarely metastasized in C57BL/6N mice. To enhance metastatic competence, they performed five rounds of serial orthotopic passaging: isolating liver metastases, expanding cells in vitro, and re-injecting into new mice. After five passages (P5), the resulting line (m484) showed high metastatic frequency to liver and lung. Whole-exome sequencing revealed no new driver mutations but stepwise increases in copy number alterations, particularly amplifications on chromosomes 6, 15, and 17 encompassing MAPK pathway genes (Kras, Braf, Raf1, Mapk11–14). Bulk RNA-seq on sorted EpCAM+ tumor epithelial cells showed elevated MAPK target genes (e.g., Spry4, Dusp4) and suppression of WNT targets (e.g., Lgr5, Smoc2). ATAC-seq on sorted tumor cells revealed increased chromatin accessibility at AP-1 motifs (FRA1, FOS, JUNB) and decreased accessibility at TCF/LEF motifs in metastatic cells. Integrating ATAC-seq and RNA-seq with BETA (Binding and Expression Target Analysis) identified direct target genes, including Emp1 (a metastasis marker) controlled by AP-1. Treatment with the KRASG12D inhibitor MRTX1133 in vivo reversed the MAPK-high/WNT-low state, reduced Emp1 expression, and suppressed liver and lung metastases. Human CRC patient data (AVANT and CALGB cohorts) showed that a high-MAPK/low-WNT gene signature is associated with shorter overall survival.
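The survival analysis relies on a high-MAPK/low-WNT gene signature. One common way to score such a signature is to z-score each gene across samples and subtract the mean WNT-target score from the mean MAPK-target score. The sketch below assumes exactly that; the gene lists are hypothetical, built only from the example targets named above, and the paper's actual signature and scoring method may differ.

```python
import numpy as np

# Hypothetical gene sets, taken only from the example targets named in this post;
# the paper's actual signature genes are not reproduced here.
MAPK_TARGETS = ["Spry4", "Dusp4", "Emp1"]
WNT_TARGETS = ["Lgr5", "Smoc2", "Nkd1"]


def zscore_rows(expr: np.ndarray) -> np.ndarray:
    """Z-score each gene (row) across samples (columns)."""
    mu = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes get z = 0 instead of dividing by zero
    return (expr - mu) / sd


def mapk_wnt_score(expr, genes, mapk=MAPK_TARGETS, wnt=WNT_TARGETS):
    """Mean z-score of MAPK targets minus mean z-score of WNT targets, per sample.

    expr: genes x samples matrix of log-normalized expression.
    genes: gene names matching the rows of expr.
    """
    z = zscore_rows(np.asarray(expr, dtype=float))
    idx = {g: i for i, g in enumerate(genes)}
    mapk_rows = [idx[g] for g in mapk if g in idx]
    wnt_rows = [idx[g] for g in wnt if g in idx]
    return z[mapk_rows].mean(axis=0) - z[wnt_rows].mean(axis=0)
```

A higher score marks a sample as more MAPK-high/WNT-low; in a cohort analysis such a score would then be dichotomized or used as a covariate in a survival model.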
Personal highlights
Serial orthotopic passaging in immunocompetent mice: rather than using immunodeficient hosts, the authors performed five rounds of injecting organoids into the colon of C57BL/6N mice, isolating liver metastases, and re-expanding them. This selected for metastatic competence without introducing new driver mutations, yielding a reproducible model (P5 line m484) that retains immune system interactions.
Copy number gains as drivers of MAPK pathway activation: WES revealed that metastatic P5 organoids acquired amplifications of chromosomes 6, 15, and 17, leading to increased copy numbers of Kras, Braf, Raf1, and multiple Mapk genes. These amplifications correlated with increased mRNA and protein-level MAPK activity, despite no additional point mutations in the pathway.
ATAC-seq reveals AP-1 motif opening and TCF/LEF closure: compared to non-metastatic P1 tumors, P5 tumor epithelial cells showed increased chromatin accessibility at AP-1 transcription factor binding sites (FRA1, FOS, JUNB) and reduced accessibility at WNT-associated TCF/LEF motifs, establishing a chromatin landscape permissive for MAPK-driven gene expression.
BETA integration of ATAC-seq and RNA-seq identifies direct regulatory targets: using Binding and Expression Target Analysis (BETA), the authors linked AP-1 motifs to genes upregulated in P5 cells (e.g., Emp1) and TCF/LEF motifs to genes more highly expressed in P1 cells (e.g., Smoc2, Nkd1). The Emp1 locus showed increased accessibility at predicted AP-1 binding sites, supported by ENCODE ChIP-seq data showing BATF, JUNB, and FOS binding at the human EMP1 promoter.
KRASG12D inhibition reverses the metastatic transcriptional state: treatment with MRTX1133 (30 mg/kg twice daily) in mice with established P5 tumors reduced MAPK target gene expression (Dusp4, Emp1), increased WNT target expression (Smoc2), and shifted the transcriptome toward the non-metastatic P1 state (PCA, PC1 dimension). Lung metastases showed greater sensitivity than liver metastases, suggesting tissue-specific modulation of pathway activity.
Why should we care?
This study provides a methodologically careful demonstration that metastatic competence in CRC can arise without new driver mutations, instead through copy number gains and chromatin remodeling that tip the balance between MAPK and WNT signaling. For cancer biologists, the serial orthotopic passaging approach in immunocompetent mice offers a tractable system to study metastasis while preserving immune interactions, unlike tail-vein or intrasplenic injection models that bypass dissemination. Importantly, the authors show that KRASG12D inhibition reverses the pro-metastatic chromatin state, but also note that WNT reactivation occurs as a form of adaptive resistance. This may help explain why KRAS inhibitors have limited efficacy in CRC and suggests that combining them with WNT pathway inhibitors may be needed. The main limitations are that the model uses small intestinal rather than colonic organoids, and that the human survival analysis is retrospective and based on gene signatures rather than direct measurement of the described cell state.
InterScale reveals multi-scale cellular interaction programs in spatial transcriptomics
Drummer et al. bioRxiv (2026). 10.64898/2026.05.07.723456
The paper in one sentence
InterScale integrates a graph convolutional network (local) and a transformer encoder (global) to jointly model short‑range and tissue‑scale cellular interactions from spatial transcriptomics data, with separate linear decoders and attention‑based interpretation to identify scale‑specific gene programs and directional communication.
Summary
InterScale is a modular framework for spatial transcriptomics that explicitly separates local neighborhood information from global tissue context. The input is a gene expression matrix and a spatial adjacency graph (e.g., radius‑based or hexagonal grid). A local component (default: two‑layer GCN) aggregates information from k‑hop neighbors to produce a local embedding H_local. This embedding, together with a CLS token, is passed to a transformer encoder whose attention mask is set to the inverse of the adjacency matrix, allowing attention between non‑neighboring cells to capture long‑range interactions, yielding a global embedding H_global. Two separate linear decoders reconstruct the masked gene expression from H_local and H_global, respectively. Training uses a self‑supervised masked node prediction objective (scaled cosine error or Gaussian negative log‑likelihood). After training, InterScale provides three levels of interpretation: (1) tissue level: CLS token attention reveals which cell types contribute to condition prediction; (2) cell level: net attention flow (A − Aᵀ) and gradient‑based relevance aggregation produce directional sender‑receiver maps; (3) gene level: standardized decoder loadings (weights scaled by embedding and gene standard deviations) identify genes that are preferentially reconstructed by the local vs. global decoder, which are then linked to biological pathways via enrichment analysis. The authors validate on a SHH‑induced neural organoid dataset (local genes: neuronal differentiation; global genes: progenitor regulators) and a type‑1 diabetes pancreas CosMx dataset (local: oxidative stress; global: PI3K‑AKT signaling). Benchmarking against GCN‑only, transformer‑only, and competing methods (AMICI, Steamboat) shows that InterScale improves condition classification and reduces sensitivity to graph radius selection.
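To make the inverse-adjacency mask concrete, here is a minimal single-head attention sketch in NumPy, assuming a radius graph that includes self-loops. It is not the InterScale code: random matrices stand in for learned query/key/value weights, the CLS token is omitted, and `radius_adjacency` and `masked_self_attention` are hypothetical helper names.

```python
import numpy as np


def radius_adjacency(coords: np.ndarray, radius: float) -> np.ndarray:
    """Boolean adjacency: cells within `radius` of each other (self included)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return d <= radius


def masked_self_attention(h, adjacency, rng=None):
    """Single-head self-attention restricted to NON-neighbouring cells.

    h: (n_cells, d) local embeddings (e.g., a GCN output such as H_local).
    adjacency: (n_cells, n_cells) boolean spatial graph.
    Returns the attended embeddings and the attention matrix.
    """
    n, d = h.shape
    allowed = ~adjacency  # the "inverse adjacency" mask
    if not allowed.any(axis=1).all():
        raise ValueError("every cell needs at least one non-neighbour to attend to")
    rng = np.random.default_rng(0) if rng is None else rng
    # Random projections stand in for learned query/key/value weights.
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(d)
    # Block attention to spatial neighbours (and self); exp(-inf) becomes 0.
    scores = np.where(allowed, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights
```

Because neighbours receive exactly zero attention weight, anything the global branch adds must come from cells outside the local graph, which is the design rationale described in the highlights.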
Personal highlights
Explicit separation of local and global embeddings via dual decoders: instead of merging multi‑scale signals into a single latent space, InterScale trains two linear decoders, one on H_local (GCN output) and one on H_global (transformer output). The reconstruction loss enforces that different spatial scales explain different parts of the gene expression variance, enabling downstream attribution of genes to local vs. global programs.
Inverse adjacency mask for global attention: unlike standard graph transformers that use full attention, InterScale masks out edges that exist in the spatial graph (i.e., it forces attention only between cells that are not direct neighbors). This design choice ensures that the transformer cannot simply recapitulate local information and must learn genuinely long‑range dependencies.
Standardized gene loadings from linear decoders: by rescaling decoder weights W_fe by σ(H_e)/σ(X_f), the method produces interpretable coefficients (change in gene expression in standard deviations per one‑SD change in latent dimension). This allows ranking of genes by their contribution to local vs. global embeddings without requiring cell‑type annotations or ligand‑receptor priors.
Net attention flow and gradient‑based relevance aggregation: raw attention scores are known to be unreliable as explanations. InterScale uses self‑attention relevance propagation (integrating attention maps with gradients) to collapse multi‑head attention into a single matrix, then computes net flow A − Aᵀ and normalizes by window‑wise maximum absolute flow. Directional sender‑receiver summaries (dot plots) are derived by averaging net flow across cell types, with dot size representing consistency (reciprocal standard deviation).
Modular architecture with replaceable components: the local component can be swapped for other GNNs (e.g., GIN), expression embeddings (scVI), or precomputed spatial domains (CellCharter, BANKSY). The global transformer can be replaced by sparse or linear attention variants. This design allows the framework to adapt to different data regimes (e.g., very large datasets) without retraining the entire pipeline.
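The two interpretation formulas above are simple enough to write down directly. The sketch below assumes a decoder of the form X_hat = H @ W with W shaped (latent dims, genes), and assumes the attention matrix A has already been aggregated (e.g., by relevance propagation) into one matrix for a single window; both function names are hypothetical.

```python
import numpy as np


def standardized_loadings(W, H, X):
    """Rescale linear-decoder weights to standardized coefficients.

    W: (n_latent, n_genes) decoder weights, with X_hat = H @ W.
    H: (n_cells, n_latent) embedding; X: (n_cells, n_genes) expression.
    Entry [e, f] = W[e, f] * sd(H[:, e]) / sd(X[:, f]), i.e. the change in
    gene f (in SDs) per one-SD change in latent dimension e.
    """
    return W * H.std(axis=0)[:, None] / X.std(axis=0)[None, :]


def net_attention_flow(A):
    """Antisymmetric net flow A - A^T, scaled by its maximum absolute value."""
    F = A - A.T
    m = np.abs(F).max()
    return F / m if m > 0 else F
```

Applying `standardized_loadings` separately to the local and global decoders gives two comparable coefficient matrices; positive entries of the net flow matrix indicate that the row cell attends to the column cell more than the reverse.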
Why should we care?
InterScale addresses a fundamental limitation of existing methods: most tools either look only at immediate neighbors (GNNs, niche models) or treat all cells as equally connected (standard transformers), but rarely separate these scales in a way that is both trainable and interpretable. The dual‑decoder architecture with inverse adjacency masking is a clean, practical solution to force the model to learn multi‑scale representations rather than collapsing to a single dominant scale. The standardized gene loading approach provides a hyperparameter‑free, cell‑type‑agnostic way to identify which genes are driven by local vs. global signaling, something that typically requires manual annotation or prior knowledge. The attention flow analysis, while still correlative, offers a more stable alternative to raw attention scores by focusing on directionality. The main limitations are: (1) sliding windows break interactions across window boundaries; (2) the method does not infer causal directionality (e.g., A → B via C); (3) scale interpretation is platform‑dependent (what is “global” in Visium may differ from CosMx).
Identifying context-specific cell-cell interaction genes without ligand-receptor databases from spatial transcriptomics
Kim et al. bioRxiv (2026). 10.64898/2026.05.08.723913
The paper in one sentence
CellNeighborEX v2 detects genes upregulated by cell-cell interactions from low‑resolution Visium data by comparing observed expression to scRNA‑seq‑derived expectations, then uses a hybrid statistical test and regression to infer context‑specific interaction genes and their source‑neighbor cell‑type pairs without relying on predefined ligand‑receptor databases.
Summary
CellNeighborEX v2 is a computational framework designed for low‑resolution spatial transcriptomics (e.g., 10x Visium) where each spot captures multiple cells. The method uses matched scRNA‑seq reference data to estimate “expected” expression per spot via cell2location deconvolution, then computes residuals (observed – expected). Positive residuals indicate potential cell‑cell interaction (CCI)‑driven upregulation. To identify context‑specific CCI genes (e.g., by spatial region or disease condition), the framework applies a hybrid statistical test: a permutation test across contexts (to control false positives) and a chi‑squared test within each context (to capture localized signals), combined via a Cauchy combination with empirically optimized weights (0.9 for chi‑squared, 0.1 for permutation). Detected genes are further processed with a two‑step regression: first a ridge‑regularized negative binomial model to estimate contributions of candidate source‑neighbor cell‑type pairs, then a linear model to isolate individual pair effects. The method infers directional interactions (source cell type expresses the CCI gene, neighbor modulates it) and can capture paracrine, contact‑dependent, and ECM‑mediated communication. Validation includes synthetic Visium data, pseudo‑Visium data aggregated from high‑resolution Slide‑seq, and real paired Visium/CosMx/Visium HD datasets from ovarian cancer, colorectal cancer, and mouse lymph node. Benchmarking against Niche‑DE and ligand‑receptor‑based methods (CellChat) shows improved precision and recall, particularly for non‑database genes.
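The residual idea can be illustrated with a composition-only null model. The sketch below assumes expected counts per spot are the deconvolved cell-type abundances multiplied by scRNA-seq reference profiles, rescaled to each spot's library size; this is a simplified stand-in, and the exact null model used by CellNeighborEX v2 may differ. Function names are hypothetical.

```python
import numpy as np


def expected_expression(abundance, ref_profiles, library_size):
    """Composition-only expected counts per spot (simplified, hypothetical null).

    abundance: (n_spots, n_types) estimated cell counts per spot, e.g. from a
        deconvolution tool such as cell2location.
    ref_profiles: (n_types, n_genes) mean normalized expression per cell type
        from scRNA-seq (each row sums to 1).
    library_size: (n_spots,) total UMI counts per spot.
    """
    mix = abundance @ ref_profiles                # (n_spots, n_genes)
    mix = mix / mix.sum(axis=1, keepdims=True)    # per-spot expression proportions
    return mix * library_size[:, None]


def residuals(observed, expected):
    """Observed minus expected; positive values flag candidate CCI upregulation."""
    return observed - expected
```

For example, a spot containing one cell of each of two types expects the average of their profiles; a gene observed well above that average in such mixed spots, but not in pure spots, is the kind of signal the downstream tests then evaluate.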
Personal highlights
Database‑free detection via residual modeling: instead of querying known ligand‑receptor pairs, CellNeighborEX v2 compares observed Visium expression to a null expectation derived from scRNA‑seq references and deconvolution. Genes with consistently higher expression in tissue than predicted from cell‑type composition alone are candidate CCI genes, covering canonical signaling, contact, and ECM‑mediated interactions without prior pathway knowledge.
Hybrid statistical test for context specificity: a permutation test (shuffling residuals across spots, 1,000 iterations) evaluates whether a gene’s upregulation is specific to a given spatial region or condition, while a chi‑squared goodness‑of‑fit test identifies localized deviations within a context. Cauchy combination (9:1 weight favoring chi‑squared) balances sensitivity and false‑positive control, as benchmarked on synthetic data.
Two‑step regression to infer directional cell‑type pairs: for each CCI gene, the method first uses a ridge‑regularized non‑negative negative binomial model to assess all candidate source‑neighbor pairs (from correlation‑filtered cell types). A second linear model isolates each pair’s contribution while adjusting for confounding interactions, producing Wald test p‑values for ranking.
Recovery of fine‑grained interactions from aggregated low‑resolution data: on pseudo‑Visium data (60 μm bins from 10 μm Slide‑seq), CellNeighborEX v2 recovered 92 of 102 (90%) previously validated contact‑dependent genes from mouse hippocampus and 33 of 34 from mouse liver cancer, demonstrating that interaction signals survive spatial downsampling and can be extracted from standard Visium.
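The weighted Cauchy combination used in the hybrid test has a closed form: T = Σ w_i tan((0.5 − p_i)π) with weights summing to 1, and the combined p-value is 0.5 − arctan(T)/π. A minimal sketch of that formula follows; the function name is hypothetical, and the 0.9/0.1 weights come from the description above.

```python
import math


def cauchy_combination(pvalues, weights):
    """Weighted Cauchy combination of (possibly dependent) p-values.

    T = sum_i w_i * tan((0.5 - p_i) * pi), with weights normalized to sum to 1;
    combined p = 0.5 - arctan(T) / pi.
    """
    total = sum(weights)
    t = sum((w / total) * math.tan((0.5 - p) * math.pi)
            for p, w in zip(pvalues, weights))
    return 0.5 - math.atan(t) / math.pi


# Order: [chi-squared p, permutation p], weighted 9:1 toward chi-squared.
p_combined = cauchy_combination([0.01, 0.5], [0.9, 0.1])
```

Because the tangent transform grows sharply as p approaches 0, a single strong chi-squared signal dominates the combination, while the permutation p-value mainly tempers borderline cases.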
Expanding the human proteome with microproteins and peptides from non‑canonical ORFs
Deutsch et al. Nature (2026). 10.1038/s41586-026-10459-x
The paper in one sentence
The TransCODE Consortium integrated 3.5 billion non‑HLA and 240 million HLA mass spectra, ribosome profiling, CRISPR screens, and a new evolutionary constraint metric (ORBL) to detect and classify 7,264 non‑canonical open reading frames (ncORFs) into a tiered system, introducing “peptidein” as a formal annotation category for translated products with indeterminate functional status.
Summary
This large‑scale collaborative effort set out to determine which of 7,264 GENCODE‑annotated ncORFs (including upstream, downstream, internal, and lncRNA‑derived ORFs) produce detectable microproteins. The authors built two PeptideAtlas resources: a non‑HLA build of 295 ProteomeXchange datasets (3.5 billion MS/MS spectra, mostly tryptic) and an HLA build (240 million spectra, no‑enzyme search). Stringent HUPO‑HPP criteria (≥2 unique peptides ≥9 aa, ≥18 aa coverage) and a decoy‑estimated protein‑level FDR <0.1% were applied. Only ~2.5% of ncORFs (183 out of 7,264) were detected in non‑HLA data, while 24.6% (1,785) were detected in HLA data – almost exclusively HLA‑I, with strong binding prediction concordance (NetMHCpan). To address the lack of amino‑acid conservation typical of ncORFs, the authors developed ORBL (ORF Relative Branch Length): a phylogenetic metric measuring conservation of start codon, stop codon, and reading frame across 116 placental mammals (or primates). ORBLv is the branch length fraction of species with a conserved ORF; ORBLq is the quantile of that score among size‑ and biotype‑matched untranslated ORFs, providing a measure of ORF‑level constraint independent of amino acid sequence. Using this, 30.4% of ncORFs showed high constraint (ORBLq >0.9). A tier classification system (1A: ≥2 non‑HLA peptides; 1B: ≥2 HLA peptides; 2A/2B: 1 peptide; 3: only HLA; 4: only Ribo‑seq; 5: in silico) was applied, with manual spectral and Ribo‑seq inspection. Only 15 ncORFs met tier 1A criteria for potential protein‑coding status; most were reclassified as “peptidein” – a new term for confidently detected translation products lacking sufficient evidence for a conventional protein‑coding gene. Functional CRISPR‑Cas9 screening across 8 cell lines targeting >2,000 ncORFs, combined with meta‑analysis of 25 screens, identified 51 pan‑essential ncORFs, including c10riboseqorf92 in the OLMA/LINC lncRNA, whose coding sequence rescued the knockout phenotype. 
The paper also includes targeted PRM validation, multi‑protease digestion experiments, and structural predictions (AlphaFold3, ESMFold), concluding with seven community consensus points on annotation guidelines.
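ORBL scores conservation of the ORF itself rather than of its amino-acid sequence. The sketch below is a deliberately simplified version, assuming each species contributes one branch length and that an ORF counts as conserved only when start codon, stop codon, and reading frame are all conserved; the published ORBL works on the phylogenetic tree itself, and all function names here are hypothetical.

```python
import numpy as np


def orf_conserved(start_ok, stop_ok, frame_ok):
    """An ORF counts as conserved in a species only if start, stop and frame are."""
    return (np.asarray(start_ok, bool) & np.asarray(stop_ok, bool)
            & np.asarray(frame_ok, bool))


def orbl_v(branch_lengths, conserved):
    """Fraction of total branch length carried by species with a conserved ORF.

    Simplification: one (hypothetical) branch length per species.
    """
    bl = np.asarray(branch_lengths, dtype=float)
    mask = np.asarray(conserved, dtype=bool)
    return bl[mask].sum() / bl.sum()


def orbl_q(score, null_scores):
    """Quantile of `score` among scores of matched untranslated ORFs."""
    null = np.asarray(null_scores, dtype=float)
    return float(np.mean(null <= score))
```

An ncORF would then be flagged as constrained when its ORBLq exceeds 0.9, i.e. when it is more conserved than 90% of size- and biotype-matched untranslated ORFs.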
Personal highlights
Two massive PeptideAtlas builds with stringent FDR control: the non‑HLA build (3.5 billion spectra, 1,172 experiments) and HLA build (240 million spectra, 592 experiments) used protein‑level FDR <0.1% and HUPO‑HPP criteria (≥2 unique peptides ≥9 aa, ≥18 aa coverage). Manual inspection of all ncORF PSMs (859 HLA spectra, 183 non‑HLA spectra) validated 88.7% of multi‑study HLA hits, but only 30/42 (71%) non‑HLA ncORFs with two peptides passed manual check.
ORBL: evolutionary constraint on ORFness, not amino acid sequence: unlike PhyloCSF (which scores amino‑acid conservation), ORBL quantifies conservation of start codon, stop codon, and reading frame across whole‑genome alignments. ORBLq normalizes against matched untranslated ORFs (same biotype, similar length), revealing that 30.4% of ncORFs (including 45.8% of uORFs) exhibit significant constraint, whereas only 2% have positive PhyloCSF scores. Detected HLA peptides were significantly enriched in high‑ORBLq ncORFs (P = 1.4×10⁻¹²).
Tier classification system for ncORF evidence: provisional tiers combine Ribo‑seq (all ncORFs have it by design), non‑HLA MS, and HLA MS. Final tiers after manual inspection: 15 tier 1A (meet HUPO‑HPP criteria in non‑HLA MS), 601 tier 1B (≥2 HLA peptides), 39 tier 2A (1 non‑HLA peptide), 1,059 tier 2B (1 HLA peptide). Only three tier 1A ncORFs were ultimately annotated as protein‑coding by GENCODE; the rest became “peptidein”.
Peptidein, a new annotation category for uncertain functional products: to resolve the paradox of confidently detected translation products that lack evidence for a conventional protein‑coding gene (e.g., only detected in cancer/immortalized cells, too short for HUPO‑HPP criteria, no known function), the consortium introduced “peptidein”. This formal category sits between “not detected” and “protein‑coding gene”. 121 initial peptidein annotations are provided, with full guidelines forthcoming.
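The HUPO-HPP thresholds quoted above (≥2 unique peptides of ≥9 aa, ≥18 aa total coverage) can be checked mechanically once peptides are mapped onto an ORF. This is a simplified sketch: it assumes peptides are given as 0-based half-open (start, end) coordinates of uniquely mapping peptides, and it also drops nested peptides, following my reading of the HPP guidelines rather than anything stated in the paper. The function name is hypothetical.

```python
def meets_hpp_criteria(peptides, min_len=9, min_peptides=2, min_coverage=18):
    """Check a simplified HUPO-HPP rule on peptides mapped to one ORF.

    peptides: iterable of (start, end) 0-based half-open coordinates of
    uniquely mapping peptides on the ORF's amino-acid sequence.
    """
    long_peps = sorted({(s, e) for s, e in peptides if e - s >= min_len})
    # Assumption: a peptide fully contained in another one does not count.
    distinct = [p for p in long_peps
                if not any(q != p and q[0] <= p[0] and p[1] <= q[1]
                           for q in long_peps)]
    if len(distinct) < min_peptides:
        return False
    covered = set()
    for s, e in distinct:
        covered.update(range(s, e))
    return len(covered) >= min_coverage
```

Two overlapping 9-mers fail this rule because together they cover fewer than 18 residues, which illustrates why many short ncORFs cannot reach tier 1A even when their peptides are confidently detected.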
Why should we care?
This paper is a landmark collaborative effort that finally brings methodological rigor to the “dark proteome” of non‑canonical ORFs. The main limitations are that the analysis is manual and labor‑intensive (not scalable for most labs), the peptidein concept may be interpreted as a “lesser” category rather than a legitimate status, and the field still lacks consensus on whether HLA‑presented peptides alone constitute proof of a stable protein. Nevertheless, the paper sets a new baseline for how to integrate MS, ribosome profiling, evolution, and functional genomics to systematically evaluate thousands of candidate ORFs.
Dormancy, not apoptosis, restricts hematopoietic stem cell mutagenesis during aging
Fotopoulou et al. bioRxiv (2026). 10.64898/2026.05.09.724021
The paper in one sentence
Using clonal in vitro expansion of single LT‑HSCs followed by whole‑genome sequencing (≥30X coverage, VAF cutoff 0.3), genetic ablation of the intrinsic apoptosis pathway (Bak⁻/⁻ BaxΔ/Δ), a doxycycline‑chase H2B‑GFP label‑retention model, and NanoSeq on pooled HSCs, the authors show that apoptosis does not limit mutation accumulation during aging, whereas dormancy reduces the mutation rate ~3‑fold, and sterile inflammation (poly(I:C)) accelerates aging‑associated mutagenesis.
Summary
The study investigates how hematopoietic stem cells (HSCs) restrict mutation accumulation during normal aging in mice. The authors first established a sensitive pipeline: single LT‑HSCs were sorted, expanded in vitro to generate clonal colonies, and subjected to whole‑genome sequencing (WGS) with paired tail germline controls. To ensure accurate mutation calling, they performed a benchmarking experiment by downsampling a deeply sequenced clone (89X) to 10–80X coverage in triplicate, identifying that 30X coverage captures ~87% of confident SNVs and ~73% of confident indels, which was set as the minimum threshold. Variants acquired during in vitro expansion were filtered out using a VAF cutoff <0.3. Using this method, they confirmed an age‑associated increase in SNVs (44 mutations per genome per year) with mutational signatures (HSPC, SBS1, SBS5, SBS18) similar to human aged HSCs. To test the role of intrinsic apoptosis, they used Scl‑Cre‑ERT2 Bak⁻/⁻ Baxᶠˡ/ᶠˡ mice (tamoxifen‑induced deletion of Bax in HSCs already lacking Bak). At 8 months of age, LT‑HSCs from Bak‑BaxΔ/Δ mice showed no significant difference in SNV, indel, or SV burden compared to wild‑type controls, and mutational signatures were largely unchanged, indicating that apoptosis is not a major restriction mechanism during physiologic aging. To test the role of dormancy, they used Scl‑tTA H2B‑GFP mice with an 18‑month doxycycline chase. Dormant (label‑retaining) LT‑HSCs had significantly fewer SNVs than active (non‑retaining) cells from the same aged mice, with a ~3‑fold slower accumulation rate. Dormant cells also had lower SBS1 (cell‑division‑associated) and SBS18 (ROS‑associated) signature burdens. Finally, to test whether inflammation accelerates mutagenesis, they treated young mice with three rounds of poly(I:C) (TLR3 agonist). Because inflammatory exposure reduces colony formation efficiency, they used NanoSeq directly on pooled ~1500 LT‑HSCs per mouse, avoiding in vitro expansion bias. 
Poly(I:C)‑treated mice showed ~25% higher SNV burden than PBS controls, with an aging‑like mutational spectrum (enriched for HSPC signature).
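Two numerical steps in this pipeline are easy to sketch: discarding sub-clonal calls below the VAF cutoff, and estimating a per-year mutation rate as the slope of burden against age. The sketch below assumes a plain least-squares fit; the data in the usage line are synthetic and chosen only to illustrate the call, and the function names are hypothetical.

```python
import numpy as np


def filter_in_vivo(vafs, min_vaf=0.3):
    """Keep clonal (in vivo) calls; lower-VAF calls are attributed to culture."""
    vafs = np.asarray(vafs, dtype=float)
    return vafs[vafs >= min_vaf]


def mutations_per_year(ages_years, burdens):
    """Least-squares slope of mutation burden against age (mutations/genome/year)."""
    slope, _intercept = np.polyfit(ages_years, burdens, deg=1)
    return slope


# Synthetic example: burden rising linearly with age.
rate = mutations_per_year([0, 1, 2], [10, 54, 98])
```

Fitting the same slope separately for dormant and active clones would give the kind of rate comparison behind the reported ~3-fold difference.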
Personal highlights
Benchmarking of coverage for single‑HSC mutation detection: by downsampling an 89X‑sequenced clone to lower depths (10–80X) in triplicate, the authors established that 30X coverage captures ~87% of confident SNVs and ~73% of confident indels. This empirically derived threshold is more rigorous than the coverage typically used in clonal barcoding studies.
Clonal expansion plus VAF filtering to exclude in vitro artifacts: single LT‑HSCs were expanded in vitro to generate sufficient DNA for WGS. Variants with VAF <0.3 were filtered out, as they arise from divisions during culture (diluted from the original 0.5 VAF). This ensures that called mutations reflect in vivo aging, not culture‑induced errors.
Genetic ablation of intrinsic apoptosis (Bak‑Bax double knockout): using Scl‑Cre‑ERT2‑driven deletion of floxed Bax in a Bak⁻/⁻ background, the authors disabled the mitochondrial apoptosis pathway specifically in HSCs. This is a clean genetic intervention, and the absence of increased mutation burden challenges the long‑held dogma that apoptosis is a key gatekeeper against mutagenesis in stem cells.
Label‑retention model (H2B‑GFP) to isolate dormant vs. active HSCs from the same aged mice: an 18‑month doxycycline chase allowed prospective sorting of label‑retaining (dormant) and non‑retaining (active) LT‑HSCs from the same 22‑month‑old donors. This design controls for chronological age and environment, directly attributing lower mutation burden to the dormant state rather than inter‑individual variation.
NanoSeq for mutation burden in inflamed HSCs without clonal expansion: Poly(I:C) treatment impairs colony formation, making the standard clonal expansion method biased (only proliferating clones would be sequenced). NanoSeq, a duplex‑sequencing‑based method that generates consensus from both DNA strands, allowed direct enumeration of mutations in pooled ~1500 LT‑HSCs per mouse, circumventing expansion bias and confirming inflammation‑accelerated mutagenesis.
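The coverage benchmark works by randomly discarding reads from a deeply sequenced clone and asking how many confident calls survive. A minimal simulation of that idea is sketched below, assuming binomial thinning of per-site read counts and hypothetical caller thresholds (at least 3 variant reads and 10 total reads); real callers apply many more filters, so absolute numbers will not match the paper's ~87%.

```python
import numpy as np


def downsample_counts(alt, ref, fraction, rng):
    """Binomially thin read counts, mimicking random read downsampling."""
    return rng.binomial(alt, fraction), rng.binomial(ref, fraction)


def retained_fraction(alt, ref, full_depth, target_depth, min_alt=3,
                      min_depth=10, replicates=3, seed=0):
    """Fraction of confident calls still passing hypothetical caller thresholds."""
    rng = np.random.default_rng(seed)
    alt, ref = np.asarray(alt), np.asarray(ref)
    frac = target_depth / full_depth
    kept = []
    for _ in range(replicates):  # triplicate downsampling, as in the benchmark
        a, r = downsample_counts(alt, ref, frac, rng)
        passing = (a >= min_alt) & (a + r >= min_depth)
        kept.append(passing.mean())
    return float(np.mean(kept))
```

Sweeping `target_depth` from 10 to 80 traces a sensitivity curve of the same shape as the empirical one, which is how a minimum-coverage threshold can be chosen.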
Why should we care?
The work convincingly demonstrates that dormancy, not apoptosis, is the dominant protective mechanism against age‑related mutagenesis in HSCs. However, mouse HSCs may differ from human HSCs in apoptosis dependency, and the inflammatory model uses a strong artificial agonist (poly(I:C)) rather than a chronic infection.
Sleep chart of biological ageing clocks in middle and late life
The MULTI Consortium et al. Nature (2026). 10.1038/s41586-026-10524-5
The paper in one sentence
Using generalized additive models (GAMs) on ~500,000 UK Biobank participants, the study quantifies U‑shaped associations between self‑reported sleep duration and 23 multi‑organ, multi‑omics biological age gaps (BAGs), then integrates GWAS, genetic correlation, survival analysis, structural equation mediation, and Mendelian randomization to dissect genetic and environmental contributions, disease risks, and causal directionality.
Summary
This large‑scale epidemiological study investigates whether sleep duration has a nonlinear (U‑shaped) relationship with biological ageing across multiple organ systems and molecular layers. The authors curated 23 previously developed BAGs: 7 from MRI (brain, heart, liver, pancreas, spleen, adipose, kidney), 11 from plasma proteomics (ProtBAGs), and 5 from plasma metabolomics (MetBAGs). Sleep duration (field 1160, self‑reported hours per 24h) was restricted to 4–10h to avoid sparse extremes. For each BAG, they fitted a generalized additive model (GAM) with cubic regression splines (mgcv package), adjusting for age, sex, BMI, blood pressure, assessment centre, and disease status. The effective degrees of freedom (e.d.f.) of the smooth term quantified nonlinearity; the sample‑specific BAG minimum (optimal sleep duration) was derived from the spline curve. Nine BAGs showed significant U‑shaped associations (Bonferroni‑corrected P < 0.05/23), with optimal sleep ranging 6.4–7.8h depending on organ and sex. To characterize the genetic architecture of short and long sleep, they performed GWAS (REGENIE) comparing short (<6h) vs normal (6–8h) and long (>8h) vs normal sleep in >300k individuals, identifying distinct genomic loci. Genetic correlations (LDSC) between sleep traits and 527 disease endpoints from FinnGen/PGC revealed broad systemic correlations for short sleep, but brain‑focused correlations for long sleep. Survival analyses (Cox proportional hazards) linked both short and long sleep to increased all‑cause mortality and incident disease endpoints. Mediation analysis using structural equation modelling (SEM; lavaan) tested whether MRI‑derived BAGs mediate the effect of sleep on two late‑life depression (LLD) subtypes. Short sleep showed direct effects on LLD, while long sleep acted predominantly through brain and adipose BAGs (62% mediated).
Finally, two‑sample Mendelian randomization (five estimators: IVW, Egger, weighted median, simple mode, weighted mode) assessed reverse causality from 525 disease endpoints to sleep traits, finding no widespread causal effect (although some estimates remained sensitive to pleiotropy). Replication attempts in two smaller cohorts (BLSA, MESA) showed similar U‑shaped patterns but did not reach statistical significance.
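The "optimal sleep duration" is simply the minimum of a fitted smooth curve. The sketch below swaps in a simpler technique than the paper's mgcv GAM: an unpenalized cubic regression spline with a truncated power basis and fixed, hypothetical knots, with no covariate adjustment. It only illustrates how a minimizer is read off a spline fit.

```python
import numpy as np


def cubic_spline_basis(x, knots):
    """Truncated-power cubic spline basis: 1, x, x^2, x^3, (x - k)^3_+ per knot."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def optimal_sleep(sleep_h, bag, knots=(5.5, 7.0, 8.5), grid_step=0.01):
    """Fit BAG ~ spline(sleep) by least squares and return the curve's minimizer.

    Knot positions are hypothetical; a penalized GAM would choose the
    smoothness from the data instead.
    """
    X = cubic_spline_basis(sleep_h, knots)
    coef, *_ = np.linalg.lstsq(X, np.asarray(bag, dtype=float), rcond=None)
    grid = np.arange(np.min(sleep_h), np.max(sleep_h) + grid_step, grid_step)
    fitted = cubic_spline_basis(grid, knots) @ coef
    return grid[np.argmin(fitted)]
```

With a penalized smoother, the effective degrees of freedom summarize how far the fitted curve departs from a straight line; in this unpenalized sketch the flexibility is fixed by the knot count instead.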
Personal highlights
Generalized additive models with cubic splines to detect U‑shaped associations without prior assumption: the authors used GAM with cubic regression splines to flexibly model sleep duration vs. BAG, allowing the data to determine nonlinearity. The effective degrees of freedom (e.d.f.) of the smooth term quantifies curve complexity; e.d.f. >1 indicates nonlinearity. Optimal sleep duration was derived from the spline curve’s minimum, not from arbitrary cutoffs.
23 multi‑organ, multi‑omics biological age gaps (BAGs) developed via nested cross‑validation: BAGs were trained on pathology‑free controls using repeated holdout cross‑validation (50 repetitions, 80/20 split) with multiple algorithms (LASSO, support vector regressor, elastic net, neural network). Age bias correction was applied. This framework is designed to limit overfitting and provides an interpretable “age gap” (biological age minus chronological age) for each organ/omics layer.
Binary GWAS for short and long sleep duration vs. normal sleep: rather than treating sleep as a continuous trait (which would obscure nonlinearity), the authors performed case‑control GWAS (REGENIE) comparing short (<6h) vs normal (6–8h) and long (>8h) vs normal. This design respects the U‑shaped relationship and identified distinct genetic architectures: short sleep associated with brain‑tissue enrichment (MAGMA), long sleep with multiple loci but less tissue specificity.
Structural equation mediation with organ‑specific BAGs as mediators: using the temporal ordering (sleep measured at baseline, MRI at follow‑up), the authors tested whether MRI‑derived BAGs mediate the sleep → late‑life depression pathway. The model included direct path (sleep → LLD) and indirect path (sleep → BAG → LLD), adjusted for covariates. This revealed that long sleep’s association with depression is largely indirect (e.g., brain BAG mediated 62% of the effect), whereas short sleep shows stronger direct effects.
Two‑sample Mendelian randomization with pleiotropy sensitivity analyses: to test whether disease causes sleep disturbances (reverse causality), they performed MR using 525 disease GWAS (FinnGen, PGC) as exposures and binary sleep traits as outcomes. Five estimators (IVW, Egger, weighted median, simple mode, weighted mode) were used, with heterogeneity tests (Cochran’s Q), MR‑Egger intercept for directional pleiotropy, MR‑PRESSO global test, and leave‑one‑SNP analysis. Most analyses did not support widespread causal effects of disease on sleep, though some pleiotropy biases were noted (e.g., depression to long sleep showed inconsistent estimates across estimators).
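Two of the quantities above have standard textbook forms: the fixed-effect IVW estimate, β = Σ w·β_X·β_Y / Σ w·β_X² with w = 1/se_Y², and the product-of-coefficients proportion mediated, ab / (ab + c′). The sketch below implements those generic formulas; it does not reproduce the paper's exact pipeline (e.g., random-effects IVW or lavaan's SEM estimation), and the function names are hypothetical.

```python
import numpy as np


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate and its standard error."""
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    w = 1.0 / np.asarray(se_outcome, dtype=float) ** 2
    estimate = np.sum(w * bx * by) / np.sum(w * bx**2)
    se = np.sqrt(1.0 / np.sum(w * bx**2))
    return estimate, se


def proportion_mediated(a, b, c_prime):
    """Share of the total effect carried by the indirect path: ab / (ab + c')."""
    indirect = a * b
    return indirect / (indirect + c_prime)
```

For binary sleep outcomes, β_Y would be a log-odds ratio per SNP; the sensitivity estimators listed above (Egger, median, mode) relax IVW's assumption that every instrument is valid.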
BARseq3: a modular system for integrating spatial multi-omics and cellular barcoding in single cells
Qi et al. bioRxiv (2026). 10.64898/2026.05.13.724900
The paper in one sentence
BARseq3 decouples barcode sequencing from spatial gene detection using independent rolling-circle amplification (RCA) libraries and sequential Illumina sequencing‑by‑synthesis, enabling modular combination of transcriptomics, translatomics, and cellular barcoding (e.g., viral lineage or connectivity barcodes) in fixed tissue sections at subcellular resolution.
Summary
BARseq3 is a modular in situ sequencing platform that separates the detection of cellular barcodes (e.g., random 30‑mer viral barcodes for neuronal tracing) from the measurement of other molecular modalities. The workflow consists of three modules: (1) Gene Module: hybridization of padlock probes (SNAIL for total mRNA or TRI for ribosome‑bound mRNA) followed by ligation, RCA, and crosslinking to the tissue; (2) Barcode Module: reverse transcription of barcode RNA, gap‑filling padlock hybridization, ligation, RCA, and crosslinking; (3) Sequencing Module: sequential Illumina sequencing‑by‑synthesis using orthogonal primers, first for gene IDs (encoded in padlock probes) and then for unknown barcodes. The Gene and Barcode Modules are experimentally independent, allowing any combination (gene only, barcode only, or both). The authors benchmark BARseq3 against BARseq2 (the previous, coupled method) using barcoded Sindbis virus in mouse motor cortex, showing significantly more gene amplicons per barcoded cell (p < 0.0001) with fewer probes per gene (4 vs 12) and lower probe concentrations. Specificity was validated using Pcp2 (Purkinje cell‑specific) and Malat1 (nuclear) probes, with BARseq3 achieving higher on‑target/off‑target ratios that increase with probe concentration. Simultaneous detection of transcriptome (SNAIL), translatome (TRI), and barcodes in the same cells is demonstrated, with no translatome signal when the ribosome‑binding splint probe is omitted. As a standalone spatial transcriptomics assay (1,745 genes), BARseq3 yields ~92 UMIs and ~71 genes per cell in mouse cerebellum, reproduces known cell types and marker expression, and shows high reproducibility across serial sections (Pearson r = 0.98). The method works on fresh‑frozen and PFA‑fixed tissue (with pepsin pretreatment) and across multiple species (zebra finch, frog, octopus).
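In the Sequencing Module, gene identity is read out as a short sequence encoded in the padlock probes and then matched against a codebook. A minimal sketch of that matching step, assuming a hypothetical codebook, equal-length codes and a tolerance of one mismatch; the actual BARseq3 code design and error-correction rules are not described in this summary.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def decode_gene(read: str, codebook: Dict[str, str], max_mismatch: int = 1) -> Optional[str]:
    """Assign a called gene-ID sequence to the unique closest codebook entry.

    Returns None when no code is within `max_mismatch` or when the best match is tied.
    """
    dists = sorted((hamming(read, code), gene) for gene, code in codebook.items())
    best_d, best_gene = dists[0]
    if best_d > max_mismatch:
        return None
    if len(dists) > 1 and dists[1][0] == best_d:
        return None  # ambiguous: two genes equally close
    return best_gene


# Hypothetical codebook; real gene codes are defined by the probe design.
CODEBOOK = {"Pcp2": "ACGTAC", "Malat1": "TTGCAG", "Gad1": "GACTTC"}
```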
Personal highlights
Decoupled barcode and gene libraries via independent RCA and orthogonal sequencing primers: unlike BARseq2, in which barcode and gene readout are coupled, BARseq3 physically separates the two libraries. Gene Module amplicons are sequenced first using a primer specific to the Gene Module library; the primer is then stripped and a barcode‑specific primer is hybridized for the subsequent cycles. This allows each module to be optimized independently and enables plug‑and‑play exchange of gene detection chemistries without redesigning barcode capture.
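Because the two libraries are read out in separate sequencing rounds, their readouts have to be brought back together per cell afterwards. A minimal sketch, assuming reads have already been assigned to segmented cells (segmentation is not described in this summary) and using a hypothetical majority rule for calling each cell's barcode; the function and parameter names are mine.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def merge_modules(
    gene_reads: Iterable[Tuple[Hashable, str]],
    barcode_reads: Iterable[Tuple[Hashable, str]],
    min_fraction: float = 0.5,
) -> Dict[Hashable, dict]:
    """Combine independently sequenced Gene and Barcode Module reads per cell.

    gene_reads:    (cell_id, gene) pairs from the gene-ID cycles
    barcode_reads: (cell_id, barcode_sequence) pairs from the barcode cycles
    A cell is assigned a barcode only if one sequence makes up strictly more
    than `min_fraction` of its barcode reads (hypothetical rule).
    """
    genes = defaultdict(Counter)
    for cell, gene in gene_reads:
        genes[cell][gene] += 1
    barcodes = defaultdict(Counter)
    for cell, seq in barcode_reads:
        barcodes[cell][seq] += 1

    merged = {}
    for cell in set(genes) | set(barcodes):
        counts = barcodes.get(cell, Counter())
        call = None
        if counts:
            seq, n = counts.most_common(1)[0]
            if n / sum(counts.values()) > min_fraction:
                call = seq
        merged[cell] = {"genes": dict(genes.get(cell, Counter())), "barcode": call}
    return merged
```

Cells with gene reads but no confident barcode, or a barcode but no gene reads, are kept rather than dropped, which mirrors the point that either module can be run on its own.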
Modular architecture supporting multiple spatial omics in the same cell: the Gene Module can accommodate any hybridization‑based assay. The authors demonstrate parallel detection of total mRNA (SNAIL probes, adapted from STARmap) and translating mRNA (TRI probes, adapted from RIBOmap) alongside barcodes. SNAIL probes use a split‑padlock design; TRI probes add a ribosome‑binding splint probe that only circularizes when the mRNA is bound to a ribosome. Control experiments confirm that translatome signal requires the splint probe.
Improved sensitivity and specificity over BARseq2: with only 4 SNAIL probes per gene (vs 12 padlocks in BARseq2) and lower concentrations, BARseq3 produces ~2‑fold more gene amplicons per barcoded cell (mean ~38 vs ~18). On‑target signal for Pcp2 and Malat1 is concentration‑dependent and significantly higher than BARseq2, while off‑target signal remains low and not significantly different. This improvement likely comes from more efficient RCA and crosslinking chemistry.
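Two bits of arithmetic sit behind this highlight: the reported means imply 38 / 18 ≈ 2.1‑fold more amplicons per barcoded cell, and specificity is expressed as an on‑target/off‑target ratio. The sketch below computes such a ratio for the Pcp2 case (Purkinje versus non‑Purkinje cells); for Malat1 the same function could take nuclear versus non‑nuclear amplicon counts. The exact metric the authors used is not spelled out in the summary, so the mean‑based definition and the pseudocount are assumptions.

```python
import numpy as np


def on_off_target_ratio(counts, is_target, pseudocount=0.1):
    """Mean amplicons in expected-positive units over mean in expected-negative units.

    counts:      amplicon counts per cell (or per compartment)
    is_target:   True where the probe's target should be expressed (e.g. Purkinje cells for Pcp2)
    pseudocount: added to both means so a near-zero background does not blow up the ratio
    """
    counts = np.asarray(counts, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    on = counts[is_target].mean()
    off = counts[~is_target].mean()
    return (on + pseudocount) / (off + pseudocount)
```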
High‑throughput barcode sequencing with Illumina chemistry: the Sequencing Module uses standard Illumina incorporation and cleavage reagents (from MiSeq kits) and four‑color imaging. Barcode signal‑to‑noise remains high across 10 sequencing cycles (Fig. 2D), with clear base calling. This enables de novo sequencing of unknown barcodes (e.g., random 30‑mers from MAPseq viruses) without prior sequence knowledge, unlike hybridization‑based barcode detection methods that require known barcodes.
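De novo barcode sequencing works because each cycle is a four‑colour image in which one channel should dominate for each amplicon. A minimal base‑calling sketch, assuming intensities have already been extracted per amplicon and cycle; the channel‑to‑base order (ACGT) and the quality threshold are hypothetical and would depend on the dye chemistry and imaging setup.

```python
import numpy as np

BASES = "ACGT"  # hypothetical channel order; the real mapping depends on the dyes used


def call_bases(intensities):
    """Call one base per cycle from a (cycles, 4) array of channel intensities.

    Returns the called sequence and a per-cycle quality = max / sum of channels,
    which is 1.0 for a single clean channel and 0.25 when all channels are equal.
    """
    x = np.clip(np.asarray(intensities, dtype=float), 0, None)  # negative background -> 0
    idx = x.argmax(axis=1)
    totals = x.sum(axis=1)
    quality = np.divide(x.max(axis=1), totals, out=np.zeros(len(x)), where=totals > 0)
    seq = "".join(BASES[i] for i in idx)
    return seq, quality


def passes_qc(quality, min_quality=0.5):
    """Keep a barcode read only if every cycle is confidently called (hypothetical threshold)."""
    return bool(np.all(quality >= min_quality))
```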
Why should we care?
By separating barcode readout from gene readout into independent RCA libraries and orthogonal sequencing primers, the method avoids the complexity of designing padlock probes that simultaneously capture both features. This modularity means that, in principle, any existing RCA‑based spatial assay (STARmap, RIBOmap, TEMPOmap) can be combined with any barcoding approach (viral barcodes for connectomics, genomic lineage barcodes, CRISPR sgRNA barcodes for perturbation screens) by simply running the two modules sequentially on the same section. For users, the key practical takeaway is that BARseq3 achieves higher sensitivity with fewer probes per gene than its predecessor, making probe design and synthesis more affordable. The demonstration of simultaneous transcriptomics + translatomics + barcodes in the same cells is technically impressive, though the biological utility of adding translatomics to spatial mapping is still emerging. The main limitations are the need for probe concentration optimization (not a one‑size‑fits‑all protocol) and the fact that barcode sequencing efficiency depends on barcode abundance (Sindbis virus drives high barcode expression; low‑abundance barcodes may be harder to detect).
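The abundance limitation can be illustrated with a toy model: if each barcode transcript is independently captured and amplified with probability p, the chance of detecting at least one copy in a cell is 1 − (1 − p)^n for n copies. The independence assumption and the capture efficiency used below are hypothetical, not BARseq3 measurements.

```python
def detection_probability(copies: int, efficiency: float) -> float:
    """P(at least one of `copies` barcode molecules is captured), assuming independent capture."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be in [0, 1]")
    if copies < 0:
        raise ValueError("copies must be non-negative")
    return 1.0 - (1.0 - efficiency) ** copies
```

With p = 0.05, five copies give roughly a 23% chance of detection, while a hundred copies give over 99%, which is why a highly expressing vector such as Sindbis virus makes barcode calling much easier.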
Other papers that piqued my interest and were added to the purgatory of my “to read” pile
Comprehensive Lineage Tracing Maps the Landscape of Cell Fate Decisions in Mouse Embryogenesis
scPlOver: inferring DNA content from amplification-free single-cell WGS using fragment overlaps
Optics-free spatial genomics for mapping mammalian brain aging by IRISeq
pyTrance finds co-localizing RNAs in subcellular spatial transcriptomics data
Epigenetic dysregulation and microenvironment remodeling in pancreatic cancer
Spurious correlation inflates performance in single-cell perturbation prediction
Ecotypes of triple-negative breast cancer in response to chemotherapy
scShapeBench: Discovering geometry from high dimensional scRNAseq data
Programmable synthetic cytokine receptors polarize macrophages to user-defined functional states
Whole-genome doubling drives immune evasion by silencing antigen presentation
DeSpotX: Identifiability-Based Decontamination for Spatial Transcriptomics
Thanks for reading.
Cheers,
Seb.


