Machine learning (ML) and AI approaches have changed how drug discovery is performed, shifting the process from resource-intensive, time-consuming screening campaigns to a predictive, hypothesis-driven design approach. AI methods are particularly relevant to the discovery and design of cyclic peptides. Macrocyclic peptides show a combinatorial explosion in the number of possible structures, driven by ring size, stereochemistry, backbone modifications, non-canonical amino acids, and other variables, which creates a design space that is too large to search exhaustively in a wet lab yet well structured for algorithmic exploration. Additionally, since the activity of cyclic peptides is often determined by their precise three-dimensional structures, structure-activity information can be represented as a set of molecular descriptors, and a model can be trained to learn hidden patterns between these descriptors and a desired chemical or biological property, such as target binding, protease resistance, or membrane permeability.
AI-assisted discovery has the potential to enable an entirely new approach to the development of therapeutic peptides by replacing laborious trial and error with predictive design. AI has proven particularly effective for the design of macrocyclic peptides, as their constrained structure and diverse chemistry create a high-dimensional, combinatorial search space that is beyond human intuition. ML approaches are well suited to cyclic peptides because the rigid ring limits conformational space while tolerating diverse modifications, which can allow structure-activity relationships to be derived from comparatively small datasets. The well-defined geometry of cyclic peptides makes it possible to encode them as molecular descriptors, which ML models use to predict how changes in ring size, stereochemistry, and backbone affect binding affinity, metabolic stability, and cell permeability. Shifting from traditional screening to predictive discovery also addresses a major bottleneck in peptide development: synthesizing and testing individual library members in a stepwise optimization process has been highly time-consuming and expensive. AI platforms instead allow millions of cyclic peptide sequences to be screened computationally before experimental validation, so that candidates with the best profiles across multiple parameters are selected. Furthermore, machine learning algorithms trained on diverse datasets that incorporate structural biology, pharmacokinetics, and clinical outcomes can identify non-linear patterns between peptide sequence and therapeutic characteristics, uncovering optimization strategies that may not be apparent through traditional rational design.
In short, AI has the potential to change the role of medicinal chemists from synthesists to designers, in which computational predictions inform experimental design, and bench efforts are concentrated on confirming high-confidence models rather than on generating random analogs.
AI has been a major agent of change in drug discovery. One of its most important effects, arguably, is a general move away from older, labor-intensive, and slow trial-and-error screening towards more targeted, hypothesis-driven, and therefore more resource-efficient discovery of drug candidates. Automation is a natural fit for cyclic peptide drug discovery in particular because the macrocyclic sequence space is very large: the design space expands exponentially when different ring sizes are combined with variations in chirality, backbone modifications, and non-canonical amino acids. This makes exhaustive experimental sampling infeasible, but the space can be navigated algorithmically with AI. In addition, cyclic peptides are well suited to AI because the mapping from sequence to activity often depends on relatively rigid 3D structures, which allows informative structural descriptors to be generated so that models can capture the relationship between those descriptors and functional properties such as binding, proteolytic stability, or membrane permeability. In contrast to linear peptides, which can adopt a very wide range of conformations, macrocycles are topologically constrained and populate fewer conformations, which should make computational structure prediction more tractable while leaving adequate chemical diversity.
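To make the scale of this design space concrete, the following sketch counts distinct head-to-tail cyclic sequences for a range of ring sizes. It is an illustration under stated assumptions, not a figure taken from any dataset: the 39-monomer alphabet (20 canonical residues in L and D forms, with achiral glycine counted once) and the 5 to 12 residue range are hypothetical choices, and backbone modifications and non-canonical residues would enlarge the numbers further. Rotations of the same ring are counted once using the standard necklace-counting formula.

```python
from math import gcd


def euler_phi(n: int) -> int:
    """Count the integers in 1..n that are coprime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def count_cyclic_sequences(ring_size: int, alphabet_size: int) -> int:
    """Number of distinct head-to-tail cyclic sequences (necklaces).

    Rotations of one ring are counted once. Mirror images are NOT merged,
    because the peptide backbone has an N-to-C direction.
    """
    total = sum(
        euler_phi(d) * alphabet_size ** (ring_size // d)
        for d in range(1, ring_size + 1)
        if ring_size % d == 0
    )
    return total // ring_size


# Hypothetical building-block set: 19 chiral residues in L and D forms
# plus achiral glycine, i.e. 19 * 2 + 1 = 39 monomers.
ALPHABET_SIZE = 39

for n in range(5, 13):
    print(f"ring size {n:2d}: {count_cyclic_sequences(n, ALPHABET_SIZE):.3e} sequences")
```

Each additional ring position multiplies the count by roughly the alphabet size, which is why exhaustive synthesis quickly becomes impractical while enumeration and scoring in silico remain feasible for subsets of the space.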
ML algorithms can also be used to predict the folding and conformation of cyclic peptides. Supervised learning models trained on large datasets of known peptide structures can predict three-dimensional conformations by recognizing complex patterns and relationships within those data, which in turn allows researchers to anticipate how a cyclic peptide folds and interacts with its target. Tools like AlphaFold can also be employed. AlphaFold was originally developed to predict protein structure, but with some adjustments it can be applied to cyclic peptides and can provide high-confidence predictions. AlphaFold has been used to generate cyclic peptides against the HIV gp120 trimer, where its use led to enhanced control and precision in the structure of the cyclic peptides.
Fig. 1 Peptide modeling requires synergy between multiple computational techniques and experiments.1,5
AI is also being applied to predict the binding affinity and docking of cyclic peptides to their targets. Neural networks have been used to learn the features that matter for binding affinity from large datasets and to use this information to design cyclic peptides with optimized binding properties. Several case studies describe the use of AI to design cyclic inhibitors of kinases and GPCRs. In another case study, targeting a protein–protein interaction rather than a kinase or GPCR, a research group used an AI-driven generative model to design cyclic peptides that mimic the natural helix-mediated interaction between MDM2 and p53, a key player in cancer biology. The designed peptides were synthesized and tested, and several of them exhibited promising binding affinities and inhibitory effects.
Generative AI has particularly influenced the discovery of cyclic peptides. Cyclic peptides have attracted interest in recent years because the cyclic scaffold reduces conformational heterogeneity compared to linear peptides, increases protease stability, and preorganizes the molecule into constrained geometries that may allow more specific target engagement and better selectivity profiles. Historically, de novo design of cyclic peptides has been less productive in terms of lead optimization than template-based design, in contrast to the situation for small molecules. However, because the sequence space of cyclic peptides can be encoded as input to a generative model, the latent space has been leveraged to propose novel macrocycles that have been successfully optimized for unvalidated targets. In particular, this has led to generative AI being applied to so-called "undruggable" targets with unstructured protein surfaces or no known ligands, with a focus on targeting protein–protein interactions. Macrocyclic constraints define the geometry of a cyclic peptide more tightly than that of an unconstrained peptide, and because a cyclic peptide is typically shorter and more preorganized, predicting its 3D structure is more tractable. Thus, constrained peptides were considered a tractable first case for demonstrating the capabilities of generative modeling.
Generative AI models such as generative adversarial networks (GANs) and transformer architectures have been used to design novel cyclic peptide sequences. These models are trained on large datasets of known peptide sequences and their associated properties. The model learns the underlying relationships between sequences and properties and can then generate new sequences with desired properties. GANs are composed of two neural networks: a generator and a discriminator. The generator is trained to create high-quality, chemically valid peptide sequences, while the discriminator is trained to distinguish between real and generated sequences. The two networks are trained in competition with each other, with the generator continuously improving its output to better fool the discriminator. Transformer architectures, on the other hand, use self-attention mechanisms to learn long-range dependencies within the sequence data, which allows the model to generate structurally and functionally diverse cyclic peptides. These generative models have been used to design cyclic peptides against "undruggable" targets such as protein–protein interactions (PPIs) and membrane proteins. For example, a transformer-based model was used to design cyclic peptides that mimic the natural helix-mediated MDM2–p53 interaction, and the most potent peptides were synthesized and experimentally validated for binding and inhibition.
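The self-attention step mentioned above can be illustrated with a deliberately stripped-down sketch. It assumes one-hot residue embeddings and uses the embeddings directly as queries, keys, and values; a real transformer would add learned projection matrices, multiple heads, and positional information, none of which are shown here. The example sequence is hypothetical.

```python
import math
from typing import List, Tuple

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> List[float]:
    """Encode a residue as a 20-dimensional one-hot vector."""
    return [1.0 if residue == a else 0.0 for a in ALPHABET]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence: str) -> Tuple[List[List[float]], List[List[float]]]:
    """Scaled dot-product self-attention with Q = K = V = one-hot embeddings.

    Returns (attention weights, attended outputs). No learned parameters
    are involved; this only shows the mechanics of the operation.
    """
    x = [one_hot(r) for r in sequence]
    dim = len(ALPHABET)
    weights = []
    for query in x:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in x
        ]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[i] for w, value in zip(row, x)) for i in range(dim)]
        for row in weights
    ]
    return weights, outputs


weights, outputs = self_attention("CWLAC")  # hypothetical 5-residue ring
print([round(w, 3) for w in weights[0]])
```

Every position attends to every other position in a single step, so the first and last residues of the written sequence, which are neighbors in the closed ring, can interact as easily as residues that are adjacent in the string.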
A variety of reinforcement learning (RL) frameworks have been applied to cyclic peptide sequence optimization. RL trains an agent to perform sequential decision-making: the agent receives reward signals that depend on whether its chosen actions achieve the desired outcome. RL can be used to fine-tune a generative peptide design model so that generation is biased towards desired pharmacological properties. The agent is optimized against a carefully designed reward function that balances chemical validity against property improvement. The resulting feedback loop between activity prediction and design modification allows the agent to autonomously sample diverse sequence variants and iteratively shift the predicted property distributions towards the design goals.
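A reward function that trades off chemical validity against property improvement could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `PropertyScores` fields, the toy validity rule (ring size limits and a 20-letter alphabet, with lowercase letters standing for D-residues), and the flat penalty for invalid sequences. In practice the property values would come from trained predictors rather than being passed in by hand.

```python
from dataclasses import dataclass

CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class PropertyScores:
    """Predicted properties, each assumed to be scaled to the range [0, 1]."""
    binding: float
    stability: float
    permeability: float


def is_valid(sequence: str, min_len: int = 5, max_len: int = 14) -> bool:
    """Toy validity check: allowed ring size and known residues only.

    Lowercase letters denote D-residues in this sketch.
    """
    return min_len <= len(sequence) <= max_len and all(
        res.upper() in CANONICAL for res in sequence
    )


def reward(
    sequence: str,
    new: PropertyScores,
    parent: PropertyScores,
    invalid_penalty: float = -1.0,
) -> float:
    """Return a penalty for invalid sequences, else the mean property gain."""
    if not is_valid(sequence):
        return invalid_penalty
    gains = (
        new.binding - parent.binding,
        new.stability - parent.stability,
        new.permeability - parent.permeability,
    )
    return sum(gains) / len(gains)


parent = PropertyScores(binding=0.5, stability=0.5, permeability=0.5)
child = PropertyScores(binding=0.7, stability=0.5, permeability=0.6)
print(reward("ACDEFGH", child, parent))  # positive: properties improved
print(reward("ACXEFGH", child, parent))  # penalty: 'X' is not a known residue
```

Gating on validity before scoring keeps the agent from exploiting property predictors on sequences that could never be synthesized, while rewarding the change relative to the parent encourages stepwise improvement.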
PepThink-R1 is an interpretable cyclic peptide optimization framework that combines large language models (LLMs), chain-of-thought (CoT) supervised fine-tuning, and RL. It explicitly reasons about monomer-level modifications during sequence generation, which allows users to efficiently design interpretable peptides with improved lipophilicity, stability, and exposure. Initial experiments show notable increases in property satisfaction rates over a state-of-the-art baseline, a promising step towards accelerating peptide design cycles. RL can also be used to optimize peptide sequences for multiple properties, such as cell permeability, antimicrobial activity, and binding affinity. Techniques such as policy gradients and reward shaping allow the model to balance trade-offs between multiple, potentially conflicting therapeutic goals and to design peptides that meet several criteria at once.
The data infrastructure and training pipeline lay the foundation for AI discovery efforts in cyclic peptides, as they largely determine the quality of model predictions and how well models generalize to novel chemical space. A well-designed machine learning model requires an adequate amount of high-quality training data that accurately reflects the complex relationships between peptide sequence, structure, and function. However, obtaining such experimental data is costly and time-consuming. In contrast to the millions of curated compounds and bioactivity profiles found in small-molecule databases, cyclic peptide databases are relatively limited in size because of the complexity of macrocycle synthesis and the expertise needed for macrocycle preparation. This paucity of data makes the design and training of deep learning models challenging, since such models require substantial data to learn complex patterns without overfitting. Data infrastructure is therefore oriented towards aggregating the available data sources, including phage display screening data, mRNA display enrichment data, structural biology databases, and drug-like property measurements, into a unified training set that covers as much chemical and biological diversity as possible. Data curation also requires careful quality control to maintain consistency, as experimental protocols, assay conditions, and measurement techniques vary across labs and introduce noise that can degrade model training. Additionally, representing cyclic peptides for machine learning requires specialized featurization methods that account not only for sequence information but also for topological constraints, stereochemistry, and ring-closure chemistry, considerations that do not arise in the same way for linear peptides or small molecules.
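One concrete consequence of ring topology is that a head-to-tail cyclic peptide has no natural first residue, so the same molecule can be written as several different strings. A minimal sketch of a rotation-invariant featurization follows. It assumes a single-letter notation in which lowercase letters mark D-residues and covers only head-to-tail rings; side-chain cyclizations, N-methylation, and non-canonical monomers would need a richer representation. The function and field names are hypothetical.

```python
from collections import Counter
from typing import Dict, Union


def canonical_rotation(sequence: str) -> str:
    """Return the lexicographically smallest rotation of a cyclic sequence."""
    if not sequence:
        return sequence
    return min(sequence[i:] + sequence[:i] for i in range(len(sequence)))


def featurize(sequence: str) -> Dict[str, Union[str, int, float, Dict[str, int]]]:
    """Build simple descriptors that do not depend on where the ring is 'cut'.

    Lowercase letters are treated as D-residues (an assumption of this sketch).
    """
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    ring = canonical_rotation(sequence)
    d_count = sum(1 for res in ring if res.islower())
    return {
        "canonical": ring,
        "ring_size": len(ring),
        "d_fraction": d_count / len(ring),
        "composition": dict(Counter(res.upper() for res in ring)),
    }


# Two spellings of the same hypothetical ring map to identical features.
print(featurize("GAVL") == featurize("VLGA"))
```

Canonicalizing before training also helps with curation, since duplicate entries of the same ring reported with different starting residues collapse to a single key.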
Data infrastructure development also faces the challenge of annotating negative data points, such as sequences that do not bind or have poor stability, which are not well-documented in literature but are crucial for training models to discriminate between successful and unsuccessful peptides. To address these data limitations, efforts have been made to develop data augmentation pipelines that can generate synthetic but realistic peptide sequences to expand training datasets without violating chemical principles. The integration of diverse data sources into a cohesive data infrastructure ultimately sets the stage for the predictive capabilities of AI-driven discovery platforms, as models are only as insightful as the data used to train them.
Fig. 2 A flowchart depicting the application of deep learning in peptide drug design.2,5
Training data availability and quality are critical factors in the design of generative models for cyclic peptides. High-throughput experimental techniques such as phage display and mRNA display have been used to generate large-scale datasets that capture the structural and functional diversity of peptides. These datasets, which include information on peptide sequences, their binding affinities, and functional activities, provide a rich source of training data for AI models. Integrating data from phage/mRNA display with computational models can improve training by providing diverse and high-quality data, which is essential for model learning and generalization. However, data scarcity can be a challenge in some cases. Transfer learning has been successfully used to address this issue. Transfer learning involves using a model pre-trained on related tasks or larger datasets and fine-tuning it for the specific domain of cyclic peptide design. This approach can significantly enhance the model's generalization and performance, even with limited data availability. For example, models pre-trained on large databases of protein structures can be fine-tuned with smaller datasets specific to cyclic peptides, improving their predictive accuracy and utility in cyclic peptide design.
Multi-parameter optimization in the design of cyclic peptides involves simultaneously optimizing multiple properties or objectives. This could include maximizing potency, solubility, and stability, among other desirable drug-like properties. AI models, such as machine learning and deep learning algorithms, can be used to perform multi-parameter optimization in the design of cyclic peptides. The model can be trained on datasets that include information on multiple parameters, such as potency, solubility, and stability. The model can then identify patterns and relationships in the data that are predictive of these properties and use this information to guide the design process. For example, a deep learning model could be trained to predict changes in binding affinity, solubility, and stability based on changes in peptide sequence, allowing researchers to rapidly explore the design space and identify peptide sequences that optimize multiple properties simultaneously. Reinforcement learning can also be used to perform multi-parameter optimization, with the model receiving rewards based on improvements in multiple parameters.
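When objectives conflict, one common way to handle multi-parameter optimization is to keep every candidate that is not beaten on all objectives by another candidate, that is, the Pareto front. The sketch below assumes three predicted scores per peptide (potency, solubility, stability), all scaled so that higher is better; the peptide names and values are hypothetical placeholders, not measured data.

```python
from typing import Dict, List, Tuple

# (potency, solubility, stability); higher is better for every objective.
Scores = Tuple[float, float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of all non-dominated candidates, sorted by name."""
    return sorted(
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    )


predicted = {
    "pep_A": (0.9, 0.2, 0.5),  # potent but poorly soluble
    "pep_B": (0.5, 0.8, 0.5),  # balanced profile
    "pep_C": (0.4, 0.7, 0.4),  # worse than pep_B on every objective
    "pep_D": (0.9, 0.2, 0.4),  # worse than pep_A on stability
}
print(pareto_front(predicted))  # ['pep_A', 'pep_B']
```

Unlike a single weighted score, the Pareto front does not force a choice of weights up front; it leaves the final trade-off between, for example, potency and solubility to the project team.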
The convergence of AI with automated synthesis platforms has led to the emergence of closed-loop systems that integrate AI-driven design, synthesis, and testing. These systems employ AI models to predict the most promising peptide sequences and synthesis conditions. The proposed peptides are then synthesized automatically, and the resulting compounds are tested using high-throughput screening methods. The data obtained from these experiments are fed back into the AI model, allowing it to refine its predictions and improve the overall cycle. This feedback loop significantly speeds up the discovery and optimization of cyclic peptides. For instance, AstraZeneca has unveiled an automated synthesis platform that combines AI-driven molecular generation with synthesis prediction, expediting the production and evaluation of focused libraries of molecules relevant to drug discovery. Integrating AI and automation in a closed loop not only shortens the time to discovery but also improves the precision and efficiency of peptide design, and it lets scientists explore a larger chemical space in less time, potentially uncovering more promising therapeutic candidates. Several startups specializing in AI-enabled lab automation are now emerging with the goal of advancing research on cyclic peptides, integrating AI-driven design with robotic synthesis and testing capabilities. For example, Allot, a startup founded in 2020, is developing agents powered by artificial intelligence to pinpoint health inequalities and optimize the allocation of healthcare resources.
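The design, synthesize, test, and feed-back cycle described above can be sketched as a small loop. This is a toy under explicit assumptions: the `assay` callable stands in for automated synthesis plus screening, the surrogate model is a crude per-residue average rather than a trained network, and the library and scoring rule in the usage example are invented purely to make the loop runnable.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def residue_means(measured: Dict[str, float]) -> Dict[str, float]:
    """Average measured score of all tested sequences containing each residue."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for sequence, score in measured.items():
        for residue in set(sequence):
            totals[residue] += score
            counts[residue] += 1
    return {res: totals[res] / counts[res] for res in totals}


def predict(sequence: str, means: Dict[str, float], default: float) -> float:
    """Toy surrogate: mean of per-residue averages, with a fallback value."""
    return sum(means.get(res, default) for res in sequence) / len(sequence)


def closed_loop(
    library: List[str],
    assay: Callable[[str], float],
    rounds: int,
    batch_size: int,
) -> Dict[str, float]:
    """Pick a batch with the surrogate, 'test' it, and refit each round."""
    measured: Dict[str, float] = {}
    for _ in range(rounds):
        untested = [seq for seq in library if seq not in measured]
        if not untested:
            break
        means = residue_means(measured)
        default = sum(measured.values()) / len(measured) if measured else 0.0
        ranked = sorted(
            untested, key=lambda seq: predict(seq, means, default), reverse=True
        )
        for seq in ranked[:batch_size]:
            measured[seq] = assay(seq)  # stand-in for synthesis + screening
    return measured


# Hypothetical library and assay: activity is simply the tryptophan count.
library = ["AAAAA", "AAAAW", "AAAWW", "AAWWW", "GGGGG", "GGGGW", "WWWWW", "AWWWW"]
results = closed_loop(library, lambda s: float(s.count("W")), rounds=2, batch_size=2)
print(max(results, key=results.get))
```

The point of the sketch is the data flow: each round's measurements change the surrogate, which changes what is made next, so experimental effort concentrates on the regions of sequence space the model currently rates highest.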
Despite these advancements, certain challenges in AI-driven cyclic peptide design persist. One such issue is data bias, where AI models can only be as good as the data they are trained on. Biased or incomplete datasets can result in suboptimal or inaccurate predictions. Ensuring data diversity and representativeness is crucial for developing reliable AI models. Another challenge is explainability; the complex nature of AI models often makes it difficult to understand how predictions are generated. This lack of transparency can impede trust in AI-driven discoveries and complicate regulatory approvals. Model validation is also essential, as AI predictions must be rigorously tested against experimental data to ensure their reliability. Overcoming these challenges will require more comprehensive and representative datasets, interpretable AI models, and robust validation frameworks. Collaboration between AI researchers, domain experts, and regulatory bodies will be necessary to address these issues and ensure the responsible deployment of AI in cyclic peptide research.
The next evolution in cyclic peptide research will likely be driven by the emergence of autonomous discovery platforms. These platforms, which seamlessly integrate AI, robotics, and high-throughput screening, will enable the creation of self-optimizing workflows. Capable of autonomously identifying knowledge gaps, designing experiments, executing synthesis and testing, and validating results, they will significantly accelerate the pace of scientific discovery. This automation will reduce the time and resources required to develop new therapeutics. For instance, autonomous chemistry laboratories are currently under development. These labs will conduct targeted robotic experimentation to refine models and generate high-quality data with little to no human involvement.
As robotic throughput continues to improve and multimodal data integration matures, these platforms will likely become instrumental in advancing cyclic peptide research. By facilitating continuous learning and adaptation, they will lead to more efficient and effective discovery processes. The development of autonomous discovery platforms is a transformative step towards a streamlined and innovative future in cyclic peptide therapeutics.
AI will enable a shift in cyclic peptide R&D from serendipity-based trial and error to precision design. The goal is to identify the most promising therapeutic candidates in silico before laboratory testing. This should reduce the need for costly and time-consuming screening campaigns and guide de novo design towards macrocycles with the desired properties. In addition, by combining generative AI, reinforcement learning, and multi-parameter optimization, it should become practical to optimize potency, metabolic stability, and permeability simultaneously and to weigh the compromises that peptide drug development requires. If this approach succeeds, it will be possible to build fully autonomous platforms for cyclic peptide discovery that span AI-based design, automated synthesis, and HTS, with results fed back so that the AI platform can self-optimize. In this way, AI-based precision design should also make the development of personalized medicines based on cyclic peptides increasingly targeted.
AI is redefining how cyclic peptides are conceived, modeled, and validated. Our AI-driven discovery engine integrates computational design, generative modeling, and automated prioritization to accelerate your innovation pipeline.
Our capabilities include:
Partner with us to design smarter, more effective cyclic peptide candidates powered by next-generation AI technologies.
References