Designing proteins from scratch with computer science

Marc Baiget-Francesch highlights interesting developments in the field of protein drug design and explains how continual software improvements are speeding up the process.

PROTEINS ARE among the most versatile biomolecules that exist. From structural roles – structural proteins being the most abundant class – to pathogen destruction (antibodies) and metabolic activities (enzymes), proteins are responsible for a wide array of functions. Consequently, protein malfunctions can create severe disorders in a host organism. Alzheimer’s and Parkinson’s diseases, for example, result from the presence of misfolded proteins;1 Becker muscular dystrophy and Crohn’s disease are caused by the production of an abnormally short protein;2,3 and phenylketonuria is the consequence of a missing protein.4

As proteins are encoded by genes, one of the most common approaches to tackling these kinds of disease is to focus on the defective genes. In the long term, this idea offers one of the most promising solutions; however, manipulating genes is rather complicated and has presented significant challenges so far, such as unwanted immune responses, complicated gene release, unstable expression, upstream processing and a lack of sufficient facilities for viral vector production.5,6 Given these complications, dealing with proteins – while by no means easy – presents a more straightforward solution: using proteins to deal with protein problems seems a logical approach.

Protein design has not always entailed designing new proteins from scratch. As with so many scientific fields, nature has always been a major source of inspiration. Antibodies are an example of this; in order to make antibodies that resemble the ones our bodies produce, antibodies from other animal species have been slightly modified, as is the case with chimeric and humanised antibodies (the latter being the most similar to human antibodies). Alemtuzumab and Mepolizumab are two examples of humanised antibodies that have reached the market. Alemtuzumab has been commercialised by Sanofi under the name Lemtrada for the treatment of multiple sclerosis, and Mepolizumab, from GlaxoSmithKline, has been launched under the name Nucala to treat eosinophilic asthma.7,8 While Nucala, first authorised in 2015, is still being monitored by the European Medicines Agency (EMA) to further assess its safety, Lemtrada, which received its first authorisation in 2013, was temporarily restricted by the EMA in April 2019 while the agency investigates some unexpected side effects.9

The temporary restriction of Lemtrada shows that designing new molecules, even if just slightly modified, can be more complicated than it seems. However, advances in computer science have revolutionised this field. Whereas mimicking natural proteins was once the main focus of new protein design, the in silico approach is increasingly becoming the go-to mechanism for designing synthetic biomolecules.

Computer simulations, while perhaps not as accurate as in vitro or in vivo experimentation, facilitate the exploration of thousands of molecular interactions in a short amount of time, saving significant sums of money and resources. With respect to antibodies, in 2017 a research group from the University of Texas at Austin, led by Dr Jennifer A Maynard, used computational methods to design new antibody complementarity determining regions – the part responsible for the antibody-antigen interaction.10 As with humanised antibodies, only a small fragment of the antibody was changed. To design their sequences, Maynard’s group used PyMOL, an open-source molecular visualisation tool that did not exist when Campath-1, the precursor of Alemtuzumab, was designed.11
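
To give a sense of why in silico exploration scales so well, the following minimal Python sketch enumerates every single and double point mutant of a short loop. It is purely illustrative and is not the method used by Maynard’s group: the 12-residue sequence and the function name point_mutants are hypothetical, and no scoring of the variants is attempted.

```python
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def point_mutants(seq, n_sites):
    """Yield every variant of seq with exactly n_sites positions substituted."""
    for positions in combinations(range(len(seq)), n_sites):
        # For each chosen position, allow every residue except the original.
        choices = [[aa for aa in AMINO_ACIDS if aa != seq[p]] for p in positions]
        for replacement in product(*choices):
            variant = list(seq)
            for p, aa in zip(positions, replacement):
                variant[p] = aa
            yield "".join(variant)


# Hypothetical 12-residue loop, used only to illustrate the counting.
loop = "ARDYYGSSYFDY"
singles = list(point_mutants(loop, 1))
doubles = list(point_mutants(loop, 2))
print(len(singles), len(doubles))  # 228 23826
```

Changing just two of twelve positions already yields more than 23,000 candidates, which would be impractical to test one by one at the bench but is trivial to enumerate before a computational filter is applied.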


PyMOL allows the user to visualise the structure of small to large biomolecules and simulate interactions between different molecules.12 Aside from PyMOL, which is popular among molecular biologists, other software packages have been developed to aid the design of new biomolecules. In many cases, researchers use a combination of different software packages. EvoDesign, for instance, is a computational algorithm that is also used to design new proteins. From an initial protein scaffold, EvoDesign helps researchers identify protein families with similar three-dimensional (3D) structures and folds.13 This is a powerful tool with which to preview protein-protein interactions and protein folds of newly designed structures. One of its principal advantages is its use of evolution-based design rather than purely physics-based approaches, which are less accurate at modelling atomic interactions and folds.14 In addition, the algorithm is continually being improved and new servers are created to enhance its functionality.15
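
The idea of letting evolutionary information guide design can be sketched in a few lines of Python. The example below is not EvoDesign’s algorithm, which draws on structural analogues and, in recent versions, a physical energy function; it is a minimal sketch that builds a position-specific log-odds profile from a small ungapped alignment and uses it to rank candidate sequences. The toy family, the candidate sequences, the uniform background and the names build_profile and profile_score are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def build_profile(alignment, pseudocount=1.0):
    """Return one dict per column mapping residue -> log-odds vs a uniform background."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        # Pseudocounts keep unseen residues from scoring minus infinity.
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def profile_score(seq, profile):
    """Sum the per-position log-odds of seq under the profile."""
    return sum(column[aa] for aa, column in zip(seq, profile))


# Toy, made-up family of aligned sequences (no gaps) and candidate designs.
family = ["MKVLA", "MKILA", "MRVLA", "MKVLG"]
profile = build_profile(family)
candidates = ["MKVLA", "MKVLW", "WWWWW"]
ranked = sorted(candidates, key=lambda s: profile_score(s, profile), reverse=True)
print(ranked)
```

Candidates that resemble the family at each position score highest, so a profile of this kind can serve as a fast first filter before more expensive structural or energy calculations.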

Rosetta is another popular software package in the field of synthetic biology and one of the most extensively developed for de novo design. Rosetta facilitates 3D structure prediction of proteins, redesigns existing structures and models new proteins from scratch.16 However, there are few well-established protocols for the de novo design of proteins, so its success (as with all protein design software) relies mostly on the user’s knowledge of protein design principles. It is predominantly the combination of extensive protein science knowledge and the use of bioinformatics tools that has sped up the process of de novo protein design.

Some research groups specialising in protein design have written papers about its principles, including David Baker’s and Daniel-Adriano Silva’s groups, both from the Institute for Protein Design at the University of Washington.17,18 In fact, David Baker’s group recently reported an interesting advance in this field: the creation of a bioactive protein switch.19,20 This complex system, named LOCKR (Latching Orthogonal Cage/Key pRotein), responds to environmental stimuli and has been used for many purposes, from inducing cell death to moving material in both yeast and human cells. One of the main characteristics of this newly designed protein is that it only activates its mechanism if a key molecule interacts with the protein.

The implications of this new design are extensive. In addition to the function of the LOCKR protein itself, the significance of this discovery is that it highlights what can be achieved by combining computer science with protein design. This discovery sets a precedent, marking the transition from just imitating what already exists to designing proteins with unique functions. Furthermore, advances in protein design software are enabling scientists to overcome the limitations of traditional methods for structural determination, such as X-ray crystallography and nuclear magnetic resonance spectroscopy, which are usually very time-consuming. This is especially relevant in the pharmaceutical field, where many proteins are used as pharmaceuticals. The drug discovery process is very long, especially given that drugs must undergo strict, lengthy clinical trials before reaching the market – and even then, there is no guarantee that the product will not cause side effects, as we have seen with Lemtrada. For this reason, enhancing the efficacy of new drugs and reducing the time they take to reach the clinical phase are vital to bringing new pharmaceuticals to the market as quickly as possible. Generating several new molecular models at the same time and simulating protein interactions in silico will certainly aid this endeavour: it appears that we are approaching the dawn of a revolution in synthetic biology.

About the author

Marc Baiget-Francesch graduated with an MSc in pharmaceutical engineering and design in 2017 from the Technical University of Denmark (DTU). He participated in the SensUs competition twice as a student team co-ordinator, designing biosensors for creatinine and NT-proBNP.


  1. Irvine GB, et al. Protein Aggregation in the Brain: The Molecular Basis for Alzheimer’s and Parkinson’s Diseases. Molecular Medicine, vol. 14, no. 7-8, 2008, pp. 451–464., doi:10.2119/2007-00100.irvine.
  2. Becker Muscular Dystrophy (BMD). Muscular Dystrophy Association, Muscular Dystrophy Association, 31 Jan. 2018,
  3. Ogura Y, et al. A Frameshift Mutation in NOD2 Associated with Susceptibility to Crohn’s Disease. Nature, vol. 411, no. 6837, 2001, pp. 603–606., doi:10.1038/35079114.
  4. Phenylketonuria: MedlinePlus Medical Encyclopedia. MedlinePlus, U.S. National Library of Medicine,
  5. Carbonell R, et al. A Technology Roadmap For Today’s Gene Therapy Manufacturing Challenges., 18 Apr. 2019, roadmap-for-today-s-gene-therapy-manufacturing-challenges-0001.
  6. Gonçalves GAR, Paiva RdMA. Gene Therapy: Advances, Challenges and Perspectives. Einstein (São Paulo), vol. 15, no. 3, 2017, pp. 369–375., doi:10.1590/s1679-45082017rb4024.
  7. Lemtrada Product Information. European Medicines Agency, 2018, information/lemtrada-epar-product-information_en.pdf.
  8. Nucala Product Information. European Medicines Agency, 2015, information/nucala-epar-product-information_en.pdf.
  9. Francisco EM. Use of Multiple Sclerosis Medicine Lemtrada Restricted While EMA Review Is Ongoing. European Medicines Agency, 12 Apr. 2019, news/use-multiple-sclerosis-medicine-lemtrada-restricted-while-ema-review-ongoing.
  10. Entzminger KC, et al. De Novo Design of Antibody Complementarity Determining Regions Binding a FLAG Tetra-Peptide. Scientific Reports, vol. 7, no. 1, 2017, doi:10.1038/s41598-017-10737-9.
  11. Riechmann L, et al. Reshaping Human Antibodies for Therapy. Nature, vol. 332, no. 6162, 1988, pp. 323–327., doi:10.1038/332323a0.
  12. PyMOL,
  13. EvoDesign: De Novo Protein Design Based on Structural and Evolutionary Profiles. Zhang Lab,
  14. Mitra P, et al. EvoDesign: De Novo Protein Design Based on Structural and Evolutionary Profiles. Nucleic Acids Research, vol. 41, no. W1, 2013, doi:10.1093/nar/gkt384.
  15. Pearce R, et al. EvoDesign: Designing Protein–Protein Binding Interactions Using Evolutionary Interface Profiles in Conjunction with an Optimized Physical Energy Function. Journal of Molecular Biology, vol. 431, no. 13, 2019, pp. 2467–2476., doi:10.1016/j.jmb.2019.02.028.
  16. The Rosetta Software. RosettaCommons, www.
  17. Marcos E, Silva DA. Essentials of De Novo Protein Design: Methods and Applications. Wiley Interdisciplinary Reviews: Computational Molecular Science, vol. 8, no. 6, 2018, doi:10.1002/wcms.1374.
  18. Koga N, et al. Principles for Designing Ideal Protein Structures. Nature, vol. 491, no. 7423, 2012, pp. 222–227., doi:10.1038/nature11600.
  19. Langan RA, et al. De Novo Design of Bioactive Protein Switches. Nature, vol. 572, no. 7768, 2019, pp. 205–210., doi:10.1038/s41586-019-1432-8.
  20. Ng AH, et al. Modular and Tunable Biological Feedback Control Using a De Novo Protein Switch. Nature, vol. 572, no. 7768, 2019, pp. 265–269., doi:10.1038/s41586-019-1425-7.