

Relationships between amino acid composition, physicochemical properties, and proteome elemental composition across cellular and viral life. (A) Heatmap showing Spearman rank correlations (ρ) between amino acid frequencies and normalized elemental composition for carbon (C), hydrogen (H), nitrogen (N), oxygen (O), sulfur (S), and selenium (Se) across cellular domains and viral realms. Amino acids are shown on the y-axis using standard three- and one-letter abbreviations. (B) Heatmap showing Spearman correlations between aggregated amino acid physicochemical property classes and elemental composition across the same groups. Properties include aliphatic, aromatic, non-polar, polar, charged, basic, and acidic residues. Colors indicate correlation strength and direction from negative (yellow) to positive (blue), with gray cells indicating absent values. — q-bio.PE
Proteins are constructed from a limited alphabet of ~20 amino acids, yet the origins and selection of this specific alphabet are unresolved.
One largely overlooked aspect is whether elemental composition constrains the range of viable proteomes. Here, we analyze the elemental composition of thousands of proteomes spanning cellular domains and viral realms. Despite evolutionary divergence and orders-of-magnitude variation in proteome size and gene content, proteomes exhibit strikingly consistent elemental composition.
This consistency is substantially more constrained than amino acid frequencies or physicochemical properties and is not explained by evolutionary relatedness, biological function, or amino acid usage alone. Viral proteomes occupy the same elemental composition space observed in cellular organisms despite the absence of a single viral common ancestor, suggesting common biochemical constraints shape proteome organization across life.
To investigate the evolutionary origins of this pattern, we compare modern proteomes with multiple independent reconstructions of the Last Universal Common Ancestor (LUCA) and with synthetic reduced-alphabet proteomes generated from primordial amino acid alphabets. LUCA proteomes occupy the same constrained elemental composition space observed in modern Bacteria and Archaea, whereas reduced primordial-like alphabets systematically generate alternative elemental regimes outside the modern range despite retaining high sequence similarity to extant proteins.
Reduced alphabets disrupt fold space and reorganize relationships between elemental composition and predicted protein structural organization. Our results suggest that constrained elemental composition represents a fundamental organizational property of proteomes, which emerged early in evolution and may have contributed to the selection and stabilization of the modern amino acid alphabet.
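As a rough illustration of what a normalized elemental composition of a proteome could look like in practice, the sketch below sums atom counts over protein sequences and converts them to fractions. It is not the authors' pipeline: the function names `element_counts` and `proteome_composition` are hypothetical, and the normalization (atom-count fractions over C, H, N, O, S, and Se, using free amino acid formulas minus one water per peptide bond) is an assumption made for this example.

```python
from collections import Counter

ELEMENTS = ("C", "H", "N", "O", "S", "Se")

# Free amino acid molecular formulas as (C, H, N, O, S, Se) atom counts.
FORMULAS = {
    "A": (3, 7, 1, 2, 0, 0),   "R": (6, 14, 4, 2, 0, 0),
    "N": (4, 8, 2, 3, 0, 0),   "D": (4, 7, 1, 4, 0, 0),
    "C": (3, 7, 1, 2, 1, 0),   "Q": (5, 10, 2, 3, 0, 0),
    "E": (5, 9, 1, 4, 0, 0),   "G": (2, 5, 1, 2, 0, 0),
    "H": (6, 9, 3, 2, 0, 0),   "I": (6, 13, 1, 2, 0, 0),
    "L": (6, 13, 1, 2, 0, 0),  "K": (6, 14, 2, 2, 0, 0),
    "M": (5, 11, 1, 2, 1, 0),  "F": (9, 11, 1, 2, 0, 0),
    "P": (5, 9, 1, 2, 0, 0),   "S": (3, 7, 1, 3, 0, 0),
    "T": (4, 9, 1, 3, 0, 0),   "W": (11, 12, 2, 2, 0, 0),
    "Y": (9, 11, 1, 3, 0, 0),  "V": (5, 11, 1, 2, 0, 0),
    "U": (3, 7, 1, 2, 0, 1),   # selenocysteine
}


def element_counts(sequence):
    """Atom counts for one linear peptide; unrecognized letters are skipped."""
    residues = [aa for aa in sequence.upper() if aa in FORMULAS]
    totals = [0] * len(ELEMENTS)
    for aa in residues:
        for i, n in enumerate(FORMULAS[aa]):
            totals[i] += n
    # Each peptide bond releases one water molecule (2 H, 1 O).
    bonds = max(len(residues) - 1, 0)
    totals[ELEMENTS.index("H")] -= 2 * bonds
    totals[ELEMENTS.index("O")] -= bonds
    return dict(zip(ELEMENTS, totals))


def proteome_composition(sequences):
    """Fraction of atoms contributed by each element across all sequences."""
    totals = Counter()
    for seq in sequences:
        totals.update(element_counts(seq))
    n_atoms = sum(totals.values())
    if n_atoms == 0:
        raise ValueError("no recognizable residues in input")
    return {el: totals[el] / n_atoms for el in ELEMENTS}
```

Applied to one composition vector per proteome, fractions of this kind could then be compared across groups or rank-correlated against amino acid frequencies, as in the heatmaps described in the figure caption.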
L. Felipe Benites, Louie Slocombe, Sara I. Walker
Subjects: Biomolecules (q-bio.BM); Populations and Evolution (q-bio.PE)
Cite as: arXiv:2605.19333 [q-bio.BM] (or arXiv:2605.19333v1 [q-bio.BM] for this version)
https://doi.org/10.48550/arXiv.2605.19333
Submission history
From: Sara Walker
[v1] Tue, 19 May 2026 04:14:11 UTC (2,924 KB)
https://arxiv.org/abs/2605.19333






