Artificial Intelligence Aided Design Of Peptides With Custom Secondary Structure Motifs And Reduced Amino Acid Alphabets

editor | Astrobiology | 4 hours ago | 6 Views

Model accuracy in terms of predicted DSSP secondary structure. A. DSSP sequence percent identity of predicted sequences to the target (input) secondary structure. B. Example target structures of low- (green), medium- (orange), and high- (blue) structure-complexity proteins (top), displayed in ribbon format. Histograms (bottom) of DSSP sequence percent identity of predicted sequences to the target (input) secondary structures for low- (green), medium- (orange), and high- (blue) complexity protein targets. C. DSSP sequence percent identity for predicted sequences across protein sequence length and alphabet size. – biorxiv.org

Proteins are highly diverse functional polymers in which the specific sequence of amino acids, selected from a standard genetically encoded alphabet of twenty (C20), determines the structure and ultimately the function of the resulting folded protein.

The amino acids of this standard alphabet have been found to be non-randomly distributed across the physicochemical properties crucial to both structure formation and function, an observation often referred to as coverage theory.

While machine learning models have drastically improved protein structure prediction, protein design has yet to see comparable progress. Here we therefore bridge contemporary biological theory with recent advancements in artificial intelligence (AI) to develop and evaluate a generative AI protein design model, trained on hundreds of thousands of proteins within the RCSB PDB, for custom secondary structure motifs using reduced amino acid alphabets.
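To make the idea of a reduced amino acid alphabet concrete, here is a minimal Python sketch. It assumes a reduced alphabet is simply a subset of the twenty standard residues; the ten-letter subset and the helper names `uses_only` and `alphabet_size` are hypothetical and chosen purely for illustration, not taken from the preprint.

```python
# The 20 genetically encoded amino acids (C20), in one-letter code.
STANDARD_ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical reduced alphabet, for illustration only (not from the preprint).
EXAMPLE_REDUCED_ALPHABET = frozenset("ADEGILPSTV")


def uses_only(sequence: str, alphabet: frozenset) -> bool:
    """Return True if every residue in `sequence` belongs to `alphabet`."""
    return set(sequence.upper()) <= alphabet


def alphabet_size(sequence: str) -> int:
    """Return the number of distinct residue types used by `sequence`."""
    return len(set(sequence.upper()))
```

A designed peptide such as `"AEELLGSV"` would pass `uses_only(..., EXAMPLE_REDUCED_ALPHABET)`, while any sequence containing, say, tryptophan (`W`) would not.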

Results indicate overall success in designing novel proteins with desired secondary structure motifs across a broad range of amino acid alphabets. Interestingly, this tool often captures the full three-dimensional tertiary structure of a target protein despite training only on physicochemical sequence space and DSSP secondary structure.
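The accuracy measure named in the figure caption, DSSP sequence percent identity, can be sketched as follows. This is a minimal illustration that assumes the metric is the percentage of positions at which the DSSP string of a predicted sequence matches the target DSSP string of equal length; the preprint's exact definition may differ, and the function name `dssp_percent_identity` is hypothetical.

```python
def dssp_percent_identity(predicted: str, target: str) -> float:
    """Percent of positions where two equal-length DSSP strings agree.

    Assumption: identity is a simple position-wise match rate; this is an
    illustrative definition, not necessarily the one used in the preprint.
    """
    if len(predicted) != len(target):
        raise ValueError("DSSP strings must have the same length")
    if not target:
        raise ValueError("DSSP strings must be non-empty")
    # Count positions where the secondary structure codes are identical.
    matches = sum(p == t for p, t in zip(predicted, target))
    return 100.0 * matches / len(target)


# Example: seven of eight positions agree (H = helix, E = strand, - = coil).
print(dssp_percent_identity("HHHHEEEE", "HHHHEEE-"))  # 87.5
```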

The development of this model advances research across multiple disciplines, from general scientific AI/ML architecture development to protein design for biotechnology, astrobiology, and early-Earth evolutionary biology.

Major components of the bLSTMa encoder-decoder model architecture. Detailed architectures of the Encoder block, primarily made up of LSTM encoder layers and multi-head self-attention (bottom right), and of the model head, where the Decoder output is separately fed through a classifier and continuous value sequence to predict sequences and their associated properties (top right). – biorxiv.org

Astrobiology, Genomics
