
Can AI unlock the biological mystery of proteins?

By: Jessica Wang


What are Proteins?

We all know what proteins are, right? There’s a lot of protein in chicken, beef, and other meats, as well as, surprisingly, broccoli. Protein also helps bodybuilders build large, strong muscles, right? Yes, but proteins do so much more. They are one of the four major classes of biological macromolecules, alongside carbohydrates, lipids, and nucleic acids (1). If your body were a house, these macromolecules would be the nails, wood, and bricks that make up every part of it. From allowing your brain to communicate with your limbs to letting you smell scents, almost every biological process involves a protein (1). So, what exactly are they?


Proteins are made up of molecules called amino acids, linked in a chain and folded into specific three-dimensional shapes called conformations (1). There are four levels of protein structure. To visualize them, let us compare a protein to a necklace of beads with a tassel hanging from each bead.


Level 1: Polypeptide sequence

This is a chain of amino acids joined by bonds called peptide bonds, like differently coloured beads strung on a necklace. Each amino acid has a constant region that is the same for all amino acids (the bead) and an R-group, or side chain, that differs from one amino acid to the next (the tassel). There are 20 different amino acids that make up biological proteins, and depending on which amino acids are in the sequence, how many there are, and in what order they appear, we get very different proteins (1).
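To make the necklace analogy concrete, the short Python sketch below writes a polypeptide as a string of the standard one-letter amino acid codes (a common convention, not something this post itself uses). It shows that two chains built from the same amino acids in a different order are still different sequences, and it counts how many distinct sequences are possible for a chain of a given length if every position can hold any of the 20 amino acids. The peptide sequences are made up purely for illustration.

```python
# A primary structure written as a string of one-letter amino acid codes,
# one letter per residue (bead) in the chain.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def possible_sequences(length: int) -> int:
    """Number of distinct chains of `length` residues if any of the
    20 amino acids can appear at every position."""
    return len(STANDARD_AMINO_ACIDS) ** length


def composition(sequence: str) -> dict[str, int]:
    """Count how many times each amino acid appears in a sequence."""
    counts = {aa: 0 for aa in STANDARD_AMINO_ACIDS}
    for residue in sequence:
        if residue not in counts:
            raise ValueError(f"unknown amino acid code: {residue!r}")
        counts[residue] += 1
    return counts


# Two made-up peptides containing the same amino acids in a different order
peptide_a = "MKTAYIAK"
peptide_b = "KAIYATKM"
print(composition(peptide_a) == composition(peptide_b))  # True: same beads...
print(peptide_a == peptide_b)                            # False: ...different order
print(possible_sequences(100))  # 20**100, roughly 1.3 x 10**130
```

Even a modest 100-residue chain has an astronomically large number of possible sequences, which is why the order of the beads, not just which beads are present, matters so much.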


Level 2: Alpha-helices and beta sheets

Alpha-helices and beta sheets form through hydrogen bonds between atoms of the protein backbone, which is made up of the constant parts of the amino acids. Because these bonds repeat along the chain, they fold stretches of it into regular patterns (1). This is like taking the necklace and twisting or folding it into different shapes. There are two main shapes: alpha-helices, which form a corkscrew shape, and beta sheets, which form a flat, pleated shape (1).


Level 3: R group interactions

Many different types of interactions occur between the R-groups, including disulfide bridges and hydrophobic interactions (1), and together they give the protein its overall three-dimensional shape. To visualize this, imagine certain tassels getting tangled with each other and twisting the whole necklace into a compact bundle.


Level 4: More than 1 polypeptide

Not all proteins have this level of structure. It occurs when multiple folded polypeptides from level 3 attach to each other to form a single functional protein (1). In our analogy, this is like several necklaces getting tangled together.


Why does structure matter?

Well, the structure dictates the function of proteins. Enzymes, a type of protein, work by binding molecules called substrates in a pocket called the active site (1), like a lock (enzyme) and key (substrate). If the lock has the wrong shape, the key can never open it. As a result, misfolding into the wrong 3D structure can lead to serious diseases. For example, cystic fibrosis can be caused by a mutation in a single gene. Such mutations can cause disastrous misfolding, so that a key protein is degraded by the cell or otherwise fails to work (2). Similarly, prion diseases, which affect the brain, involve a misfolded protein that acts as a template, causing normal copies of the protein to misfold and clump together into aggregates in the brain (3). To understand how proteins work, and what goes wrong when they fail, researchers have therefore focused on protein structure.


AI and Proteins?

To determine the structure of proteins, scientists have used various physical methods, including the ones listed below:

  1. X-ray crystallography involves growing crystals of the protein and then analyzing how X-rays are scattered (diffracted) by them (4).

  2. Nuclear magnetic resonance (NMR) spectroscopy places the protein in a strong magnetic field and measures the radio-frequency signals given off by atomic nuclei in the protein (4).

  3. Cryo-electron microscopy involves freezing the protein and imaging it with an electron microscope (5).


However, a new contender has entered the field: artificial intelligence. At the 14th Critical Assessment of Protein Structure Prediction (CASP14) competition, DeepMind's AlphaFold2 software was asked to predict, from their amino acid sequences alone, the structures of proteins that had been solved experimentally but not yet published (6). Its predictions of the protein backbone had a median accuracy of about 0.096 nm, which is more than a million times smaller than the thickness of a sheet of paper and roughly the width of a single atom (7,8). In other words, AlphaFold2 can often place the heavy atoms of a protein to within about the width of an atom of their true positions (7). This accuracy rivals the experimental methods above, and the software is also extremely efficient, turning work that could take months or even years into days or weeks (6).
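The 0.096 nm figure is a root-mean-square deviation (RMSD), a measure of how far the predicted positions of backbone atoms lie from the experimentally determined ones (7). The minimal Python sketch below shows only the basic RMSD formula (the square root of the average squared distance between matching atoms) on made-up coordinates, assuming the two structures have already been superimposed; real comparisons first align the structures, and the AlphaFold paper computes its figure over 95% of residues, steps that are omitted here. The sketch also checks the paper-thickness comparison, taking a sheet of paper to be about 100,000 nm thick (8).

```python
import math


def rmsd(predicted, experimental):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates, assumed to be already superimposed."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Squared distance between each predicted atom and its experimental partner
    squared = [
        sum((p - e) ** 2 for p, e in zip(p_atom, e_atom))
        for p_atom, e_atom in zip(predicted, experimental)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Made-up coordinates (in nanometres) for three backbone atoms
experimental = [(0.0, 0.0, 0.0), (0.38, 0.0, 0.0), (0.76, 0.0, 0.0)]
predicted = [(0.0, 0.1, 0.0), (0.38, 0.1, 0.0), (0.76, 0.1, 0.0)]
print(round(rmsd(predicted, experimental), 3))  # 0.1 (every atom is 0.1 nm off)

# Scale comparison from the text
PAPER_THICKNESS_NM = 100_000   # a sheet of paper is about 100,000 nm thick (ref. 8)
ALPHAFOLD_BACKBONE_NM = 0.096  # AlphaFold2 backbone accuracy (ref. 7)
print(PAPER_THICKNESS_NM / ALPHAFOLD_BACKBONE_NM)  # about 1.04 million
```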


How did they achieve this? The researchers used neural networks, a type of AI loosely inspired by how the human brain learns (7). They trained the network on “training data”: protein structures that had already been solved with the experimental methods above. During training, the computer essentially makes guesses, compares them with the known structures, and adjusts itself until it can reliably work out a structure from the amino acid sequence (7). In particular, the researchers combined two approaches: the physical interactions between atoms, and evolution, matching the new sequence with similar protein sequences that are likely to share a similar structure (7). The result is a program that could revolutionize the field of molecular biology.
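The evolutionary idea rests on the observation that proteins with similar sequences tend to fold into similar structures. AlphaFold itself feeds alignments of many related sequences into its neural network (7); the much simpler Python sketch below is only a hypothetical illustration of the underlying intuition, not AlphaFold's method. It scores pre-aligned, equal-length sequences by percent identity and picks the most similar entry from a small invented database (the protein names and sequences are made up). Real sequence searches use alignment tools such as BLAST, which also handle insertions and deletions.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions holding the same amino acid.
    Assumes the two sequences are already aligned and equally long."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def closest_known_structure(query: str, known: dict[str, str]) -> tuple[str, float]:
    """Return the name of the known protein whose sequence is most similar
    to the query, together with its percent identity."""
    best_name = max(known, key=lambda name: percent_identity(query, known[name]))
    return best_name, percent_identity(query, known[best_name])


# Hypothetical database of proteins with known structures (invented entries)
known_proteins = {
    "protein_X": "MKTAYIAKQR",
    "protein_Y": "GSHMLEDPVA",
}
new_sequence = "MKTAYLAKQR"  # differs from protein_X at a single position
print(closest_known_structure(new_sequence, known_proteins))  # ('protein_X', 90.0)
```

A high identity score suggests the new protein probably folds like protein_X, which is the kind of clue the evolutionary approach exploits, though AlphaFold extracts far richer information from related sequences than a single score.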



References

  1. Reece JB, Urry LA, Cain ML, Wasserman SA, Minorsky PV, Jackson RB, et al. Campbell biology. 10th ed. Boston: Pearson; 2014.

  2. Fraser-Pitt D, O’Neil D. Cystic fibrosis – a multiorgan protein misfolding disease. Future Sci OA. 2015 Sep 1;1(2):FSO57.

  3. Moore RA, Taubner LM, Priola SA. Prion protein misfolding and disease. Curr Opin Struct Biol. 2009 Feb;19(1):14–22.

  4. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. Analyzing protein structure and function. In: Molecular biology of the cell. 4th ed. New York: Garland Science; 2002 [cited 2021 Dec 3]. Available from: https://www.ncbi.nlm.nih.gov/books/NBK26820/

  5. Bhella D. Cryo-electron microscopy: an introduction to the technique, and considerations when working to establish a national facility. Biophys Rev. 2019 Aug 1;11(4):515–9.

  6. Callaway E. DeepMind’s AI predicts structures for a vast trove of proteins. Nature. 2021 Jul 22;595(7869):635.

  7. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 Aug;596(7873):583–9.

  8. Size of the nanoscale. National Nanotechnology Initiative [Internet]. [cited 2021 Dec 3]. Available from: https://www.nano.gov/nanotech-101/what/nano-size
