Conformational Transitions During Protein Folding

A beautiful representation (amino acid distance matrix) of the conformational transitions which a protein undergoes during folding. The bias-exchange metadynamics simulation is described in Pietrucci & Laio, J. Chem
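
For readers unfamiliar with the representation, the sketch below shows what one frame of an amino acid distance matrix contains: the Euclidean distance between every pair of residues. It assumes one (x, y, z) point per residue, such as its Cα atom. The coordinates are invented placeholders, and this is not the analysis code used by Pietrucci & Laio. Computing one matrix per simulation snapshot would give a series of frames of the kind shown in such a movie.

```python
import math

def distance_matrix(coords):
    """Return the symmetric matrix of Euclidean distances between residues.

    coords: list of (x, y, z) tuples, one per residue (for example C-alpha atoms).
    """
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])  # Euclidean distance (Python 3.8+)
            matrix[i][j] = matrix[j][i] = d       # the matrix is symmetric
    return matrix

# Hypothetical coordinates (in angstroms) for a four-residue fragment.
frame = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
for row in distance_matrix(frame):
    print(" ".join(f"{d:5.2f}" for d in row))
```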

GCSF Protein Folding Illustration Movie

Granulocyte Colony-Stimulating Factor (G-CSF or GCSF) is a colony-stimulating factor hormone. It is a glycoprotein, growth factor or cytokine produced by a number of different tissues to stimulate the bone marrow to produce granulocytes and stem cells. G-CSF then stimulates the bone marrow to release them into the blood. It also stimulates the survival, proliferation, differentiation, and function of neutrophil precursors and mature neutrophils.

G-CSF is also known as Colony-Stimulating Factor 3 (CSF 3).


Biological function

G-CSF is produced by endothelium, macrophages, and a number of other immune cells. The natural human glycoprotein exists in two forms, 174- and 177-amino-acid-long proteins with a molecular weight of 19,600 grams per mole. The more abundant and more active 174-amino-acid form has been used in the development of pharmaceutical products by recombinant DNA (rDNA) technology.
Mouse granulocyte colony-stimulating factor (G-CSF) was first recognised and purified in Australia in 1983, and the human form was cloned by groups from Japan and the United States in 1986.

The G-CSF-receptor is present on precursor cells in the bone marrow, and, in response to stimulation by G-CSF, initiates proliferation and differentiation into mature granulocytes.

The gene for G-CSF is located on chromosome 17, locus q11.2-q12. Nagata et al. found that the GCSF gene has 4 introns, and that 2 different polypeptides are synthesized from the same gene by differential splicing of mRNA.
The 2 polypeptides differ by the presence or absence of 3 amino acids. Expression studies indicate that both have authentic GCSF activity.

It is thought that stability of the G-CSF mRNA is regulated by an RNA element called the G-CSF factor stem-loop destabilising element.

Therapeutic use
G-CSF stimulates the production of white blood cells (WBC). In oncology and hematology, a recombinant form of G-CSF is used in certain cancer patients to accelerate recovery from neutropenia after chemotherapy, allowing higher-intensity treatment regimens. Chemotherapy can cause myelosuppression and unacceptably low levels of white blood cells, making patients prone to infections and sepsis. However, a Washington University School of Medicine study in mice showed that G-CSF reduces bone density even as it increases the WBC count; if the same effect occurs in humans, patients may need increased intake of calcium and vitamins A and D, and possibly drug therapy.

G-CSF is also used to increase the number of hematopoietic stem cells in the blood of the donor before collection by leukapheresis for use in hematopoietic stem cell transplantation. It may also be given to the recipient to compensate for conditioning regimens.

In 2004, Itescu planned to treat heart degeneration by injecting G-CSF into the bloodstream and SDF (stromal cell-derived factor) directly into the heart.

The recombinant human G-CSF synthesised in an E. coli expression system is called filgrastim. Its structure differs slightly from that of the natural glycoprotein. Most published studies have used filgrastim. Filgrastim (Neupogen®) and PEG-filgrastim (Neulasta®) are two commercially available forms of rhG-CSF (recombinant human G-CSF). The PEG (polyethylene glycol) form has a much longer half-life, reducing the need for daily injections.

Another form of recombinant human G-CSF called lenograstim is synthesised in Chinese Hamster Ovary cells (CHO cells). As this is a mammalian cell expression system, lenograstim is indistinguishable from the 174-amino acid natural human G-CSF. No clinical or therapeutic consequences of the differences between filgrastim and lenograstim have yet been identified, but there are no formal comparative studies.

RNA interference

RNA interference (also called "RNA-mediated interference", abbreviated RNAi) is a mechanism for RNA-guided regulation of gene expression in which double-stranded ribonucleic acid inhibits the expression of genes with complementary nucleotide sequences. Conserved in most eukaryotic organisms, the RNAi pathway is thought to have evolved as a form of innate immunity against viruses and also plays a major role in regulating development and genome maintenance.


The RNAi pathway is initiated by the enzyme dicer, which cleaves double-stranded RNA (dsRNA) into short double-stranded fragments of 20–25 base pairs. One of the two strands of each fragment, known as the guide strand, is then incorporated into the RNA-induced silencing complex (RISC) and base-pairs with complementary sequences. The best-studied outcome of this recognition event is a form of post-transcriptional gene silencing. This occurs when the guide strand base-pairs with a messenger RNA (mRNA) molecule and induces degradation of the mRNA by argonaute, the catalytic component of the RISC complex. The short RNA fragments are known as small interfering RNA (siRNA) when they derive from exogenous sources and microRNA (miRNA) when they are produced from RNA-coding genes in the cell's own genome. The RNAi pathway has been particularly well studied in certain model organisms such as the nematode worm Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and the flowering plant Arabidopsis thaliana.
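
The recognition step can be thought of as a string search: because the guide strand pairs antiparallel with its target, the mRNA site it recognizes is the guide's reverse complement. The minimal sketch below assumes perfect complementarity (typical of siRNAs, whereas most animal miRNAs pair only partially) and uses made-up sequences; the function names are illustrative, not taken from any library.

```python
# Watson-Crick partners for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))

def find_target_sites(guide, mrna):
    """Return 0-based positions in mrna that are perfectly complementary to guide."""
    site = reverse_complement_rna(guide)
    return [i for i in range(len(mrna) - len(site) + 1) if mrna[i:i + len(site)] == site]

# Hypothetical 21-nt guide strand and an mRNA fragment containing one target site.
guide = "UUCGAAGUACUCAGCGUAAGU"
mrna = "GGAUCC" + reverse_complement_rna(guide) + "AAAGCU"
print(find_target_sites(guide, mrna))  # [6]
```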

The selective and robust effect of RNAi on gene expression makes it a valuable research tool, both in cell culture and in living organisms; synthetic dsRNA introduced into cells can induce suppression of specific genes of interest. RNAi may also be used for large-scale screens that systematically shut down each gene in the cell, which can help identify the components necessary for a particular cellular process or an event such as cell division. Exploitation of the pathway is also a promising tool in biotechnology and medicine.

Historically, RNA interference was known by other names, including post-transcriptional gene silencing, transgene silencing, and quelling. Only after these apparently unrelated processes were fully understood did it become clear that they all described the RNAi phenomenon. RNAi has also been confused with antisense suppression of gene expression, which does not act catalytically to degrade mRNA but instead involves single-stranded RNA fragments physically binding to mRNA and blocking translation. In 2006, Andrew Fire and Craig C. Mello shared the Nobel Prize in Physiology or Medicine for their work on RNA interference in the nematode worm C. elegans, which they published in 1998.

Pharmacogenomics and Personalized Medicine

HIV Resistance

Nanotechnology for Targeted Cancer Therapy

From RNA to Protein Synthesis

DNA stores the information for protein synthesis, and RNA carries out the instructions encoded in DNA, but most biological activities are carried out by proteins. The accurate synthesis of proteins is thus critical to the proper functioning of cells and organisms.

The linear order of amino acids in each protein determines its three-dimensional structure and activity. For this reason, assembling amino acids in their correct order, as encoded in DNA, is the key to producing functional proteins.

Three kinds of RNA molecules perform different but cooperative functions in protein synthesis:

1. Messenger RNA (mRNA) carries the genetic information copied from DNA in the form of a series of three-base code “words,” each of which specifies a particular amino acid.

2. Transfer RNA (tRNA) is the key to deciphering the code words in mRNA. Each type of amino acid has its own type of tRNA, which binds it and carries it to the growing end of a polypeptide chain if the next code word on mRNA calls for it. The correct tRNA with its attached amino acid is selected at each step because each specific tRNA molecule contains a three-base sequence that can base-pair with its complementary code word in the mRNA.

3. Ribosomal RNA (rRNA) associates with a set of proteins to form ribosomes. These complex structures, which physically move along an mRNA molecule, catalyze the assembly of amino acids into protein chains. They also bind tRNAs and various accessory molecules necessary for protein synthesis. Ribosomes are composed of a large and small subunit, each of which contains its own rRNA molecule or molecules.

Translation is the whole process by which the base sequence of an mRNA is used to order and to join the amino acids in a protein. The three types of RNA participate in this essential protein-synthesizing pathway in all cells; in fact, the development of the three distinct functions of RNA was probably the molecular key to the origin of life.

RNA contains ribonucleotides of adenine, cytosine, guanine, and uracil; DNA contains deoxyribonucleotides of adenine, cytosine, guanine, and thymine. Because 4 nucleotides, taken individually, could represent only 4 of the 20 amino acids found in proteins, a group of nucleotides is required to represent each amino acid. The code must therefore be capable of specifying at least 20 words (i.e., amino acids).

If two nucleotides were used to code for one amino acid, then only 16 (4²) different code words could be formed, which would be an insufficient number. However, if a group of three nucleotides is used for each code word, then 64 (4³) code words can be formed. Any code using groups of three or more nucleotides will have more than enough units to encode 20 amino acids. Many such coding systems are mathematically possible. However, the actual genetic code used by cells is a triplet code, with every three nucleotides being “read” from a specified starting point in the mRNA. Each triplet is called a codon. Of the 64 possible codons in the genetic code, 61 specify individual amino acids and three are stop codons. Most amino acids are encoded by more than one codon. Only two, methionine and tryptophan, have a single codon; at the other extreme, leucine, serine, and arginine are each specified by six different codons. The different codons for a given amino acid are said to be synonymous. The code itself is termed degenerate, which means that it contains redundancies.
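
The counting argument above can be checked directly. The sketch below builds the standard genetic code from its usual compact 64-letter encoding (one-letter amino acid codes, codons ordered with the third base varying fastest through U, C, A, G, and * for stop), then tallies how many codons each amino acid receives.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in the usual compact form: codons run UUU, UUC, UUA, UUG,
# UCU, ... GGG (third base varying fastest); "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of distinct code words of length 1, 2 and 3 built from 4 bases.
print([len(BASES) ** n for n in (1, 2, 3)])  # [4, 16, 64]

counts = Counter(CODON_TABLE.values())
sense = sum(n for aa, n in counts.items() if aa != "*")
print("sense codons:", sense, "stop codons:", counts["*"])  # 61 and 3
print("one codon:", sorted(aa for aa, n in counts.items() if n == 1))   # ['M', 'W']
print("six codons:", sorted(aa for aa, n in counts.items() if n == 6))  # ['L', 'R', 'S']
```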

Synthesis of all protein chains in prokaryotic and eukaryotic cells begins with the amino acid methionine. In most mRNAs, the start (initiator) codon specifying this aminoterminal methionine is AUG. In a few bacterial mRNAs, GUG is used as the initiator codon, and CUG occasionally is used as an initiator codon for methionine in eukaryotes. The three codons UAA, UGA, and UAG do not specify amino acids but constitute stop (terminator) signals that mark the carboxyl terminus of protein chains in almost all cells. The sequence of codons that runs from a specific start site to a terminating codon is called a reading frame. This precise linear array of ribonucleotides in groups of three in mRNA specifies the precise linear sequence of amino acids in a protein and also signals where synthesis of the protein chain starts and stops.
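
The idea of a reading frame can be sketched in a few lines: find a start codon, then step through the mRNA three bases at a time until an in-frame stop codon appears. This simplified illustration takes the first AUG it finds, whereas real ribosomes use additional sequence context to choose the start site, and it ignores the alternative GUG and CUG initiators.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def reading_frame(mrna, start_codon="AUG"):
    """Return the codons from the first start codon up to and including the first in-frame stop.

    Returns None if there is no start codon or no in-frame stop codon.
    """
    start = mrna.find(start_codon)
    if start == -1:
        return None
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            return codons
    return None  # reached the end without an in-frame stop codon

mrna = "GGCAUGGCUUCAAAGUAGCC"  # hypothetical mRNA fragment
print(reading_frame(mrna))  # ['AUG', 'GCU', 'UCA', 'AAG', 'UAG']
```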

Because the genetic code is a commaless, non-overlapping triplet code, a particular mRNA theoretically could be translated in three different reading frames. Indeed, some mRNAs have been shown to contain overlapping information that can be translated in different reading frames, yielding different polypeptides. The vast majority of mRNAs, however, can be read in only one frame because stop codons encountered in the other two possible reading frames terminate translation before a functional protein is produced. Another unusual coding arrangement occurs because of frameshifting. In this case the protein-synthesizing machinery may read four nucleotides as one amino acid and then continue reading triplets, or it may back up one base and read all succeeding triplets in the new frame until termination of the chain occurs. These frameshifts are not common events, but a few dozen such instances are known.
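
A short sketch makes the point about the three frames concrete: splitting the same made-up sequence at offsets 0, 1 and 2 gives three different codon lists. In this example, frames 1 and 2 run into stop codons almost immediately, while frame 0 stays open until its final codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_frames(mrna):
    """Split an mRNA into codons in each of its three forward reading frames."""
    return [[mrna[i:i + 3] for i in range(offset, len(mrna) - 2, 3)] for offset in range(3)]

def first_stop(codons):
    """Index of the first stop codon in a codon list, or None if there is none."""
    for index, codon in enumerate(codons):
        if codon in STOP_CODONS:
            return index
    return None

mrna = "AUGGUAAGUGACUAA"  # hypothetical sequence
for offset, codons in enumerate(split_frames(mrna)):
    print(offset, codons, "first stop at codon", first_stop(codons))
```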


The Folded Structure of tRNA Promotes Its Decoding Functions

A central step in understanding the flow of genetic information from DNA to protein was to determine how the nucleotide sequence of mRNA is converted into the amino acid sequence of protein. This decoding process requires two types of adapter molecules: tRNAs and enzymes called aminoacyl-tRNA synthetases.

All tRNAs have two functions: to be chemically linked to a particular amino acid and to base-pair with a codon in mRNA so that the amino acid can be added to a growing peptide chain. Each tRNA molecule is recognized by one and only one of the 20 aminoacyl-tRNA synthetases. Likewise, each of these enzymes links one and only one of the 20 amino acids to a particular tRNA, forming an aminoacyl-tRNA. Once its correct amino acid is attached, a tRNA then recognizes a codon in mRNA, thereby delivering its amino acid to the growing polypeptide.
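
Because codon and anticodon pair antiparallel, the anticodon that pairs perfectly with a codon is its reverse complement. The sketch below computes that perfect-match anticodon; it deliberately ignores wobble pairing at the third codon position, which lets many real tRNAs read more than one codon.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon):
    """Anticodon (written 5'->3') that pairs perfectly with an mRNA codon (written 5'->3').

    Codon and anticodon pair antiparallel, so the anticodon is the reverse complement.
    Wobble pairing at the third codon position is ignored.
    """
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon))

print(anticodon("AUG"))  # CAU, as in the initiator methionine tRNA
print(anticodon("GGC"))  # GCC
```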

Ribosomes Are Protein-Synthesizing Machines

If the many components that participate in translating mRNA had to interact in free solution, the likelihood of simultaneous collisions occurring would be so low that the rate of amino acid polymerization would be very slow. The efficiency of translation is greatly increased by the binding of the mRNA and the individual aminoacyl-tRNAs to the most abundant RNA-protein complex in the cell: the ribosome. This two-part machine directs the elongation of a polypeptide at a rate of three to five amino acids added per second. Small proteins of 100–200 amino acids are therefore made in a minute or less. On the other hand, it takes 2 to 3 hours to make the largest known protein, titin, which is found in muscle and contains 30,000 amino acid residues. The machine that accomplishes this task must be precise and persistent.
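
The timing figures follow from simple division. The sketch below assumes a constant elongation rate and ignores initiation and termination. At 3 to 5 residues per second it gives roughly 20 to 67 seconds for 100 to 200 residues, and about 100 to 167 minutes (1.7 to 2.8 hours) for a 30,000-residue chain like titin, broadly in line with the estimates above.

```python
def synthesis_time_seconds(length_aa, rate_aa_per_s):
    """Time in seconds to polymerize length_aa residues at a constant elongation rate."""
    return length_aa / rate_aa_per_s

for name, length in [("small protein", 100), ("small protein", 200), ("titin", 30_000)]:
    fastest = synthesis_time_seconds(length, 5)  # 5 amino acids added per second
    slowest = synthesis_time_seconds(length, 3)  # 3 amino acids added per second
    print(f"{name}, {length} aa: {fastest:.0f}-{slowest:.0f} s "
          f"({fastest / 60:.1f}-{slowest / 60:.1f} min)")
```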

A ribosome is composed of several different ribosomal RNA (rRNA) molecules and more than 50 proteins, organized into a large subunit and a small subunit. The proteins in the two subunits differ, as do the molecules of rRNA. The small ribosomal subunit contains a single rRNA molecule, referred to as small rRNA; the large subunit contains a molecule of large rRNA and one molecule each of two much smaller rRNAs in eukaryotes. The lengths of the rRNA molecules, the quantity of proteins in each subunit, and consequently the sizes of the subunits differ in prokaryotic and eukaryotic cells. (The small and large rRNAs are about 1500 and 3000 nucleotides long in bacteria and about 1800 and 5000 nucleotides long in humans.) Perhaps of more interest than these differences are the great structural and functional similarities among ribosomes from all species. This consistency is another reflection of the common evolutionary origin of the most basic constituents of living cells.


Transcription is the process through which a DNA sequence is enzymatically copied by an RNA polymerase to produce a complementary RNA. In other words, it is the transfer of genetic information from DNA into RNA. In the case of protein-encoding DNA, transcription is the beginning of the process that ultimately leads to the translation of the genetic code (via the mRNA intermediate) into a functional peptide or protein. The stretch of DNA that is transcribed into an RNA molecule is called a transcription unit. Transcription has some proofreading mechanisms, but they are fewer and less effective than the controls for copying DNA; therefore, transcription has a lower copying fidelity than DNA replication.

As in DNA replication, transcription proceeds in the 5' → 3' direction (i.e., the template strand is read in the 3' → 5' direction and the new, complementary RNA is synthesized in the 5' → 3' direction). Transcription is divided into 3 stages: initiation, elongation and termination.

Initiation in prokaryotes
Transcription differs from DNA synthesis in that only one strand of DNA, the template strand, is used to make mRNA. Because RNA is synthesized only in the 5' → 3' direction, the template strand must be read in the 3' → 5' (complementary) direction. The strand that is not used as a template is called the non-template strand. The products also differ: DNA replication yields a double-stranded molecule, whereas transcription yields a single strand of RNA. This is because DNA replication is semi-conservative (each new molecule keeps one old strand), while transcription produces a new single strand of RNA de novo.
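
The strand bookkeeping can be sketched as string operations. Writing every strand 5' → 3', the template is the reverse complement of the non-template strand, and the RNA copied from the template has the same sequence as the non-template strand, with U in place of T. The sequence is hypothetical, and the sketch models only base pairing, not the polymerase itself.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Ribonucleotide incorporated opposite each template DNA base.
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}

def template_strand(coding_5to3):
    """Template strand (written 5'->3') for a given non-template (coding) strand."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(coding_5to3))

def transcribe(template_5to3):
    """RNA made from a template strand, returned 5'->3'.

    The template is read 3'->5' while RNA grows 5'->3', so we walk the template
    from its 3' end and emit the complementary ribonucleotide at each step.
    """
    return "".join(RNA_PARTNER[b] for b in reversed(template_5to3))

coding = "ATGGCTTCAAAGTAG"          # hypothetical non-template strand, 5'->3'
template = template_strand(coding)  # CTACTTTGAAGCCAT
rna = transcribe(template)          # AUGGCUUCAAAGUAG
print(template, rna)
```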

Transcription begins with the binding of RNA polymerase to the promoter in DNA. The bacterial RNA polymerase core enzyme consists of five subunits: 2 α subunits, 1 β subunit, 1 β' subunit, and 1 ω subunit. At the start of initiation, the core enzyme associates with a sigma factor (σ70) that helps it recognize the promoter's -35 and -10 sequences, which lie upstream of the transcription start site. Unlike DNA replication, transcription does not need a primer to start. The DNA unwinds to form a small open complex, and synthesis proceeds on the template strand only.
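
As a toy illustration of promoter recognition, the sketch below scans one DNA strand for the textbook σ70 consensus boxes, TTGACA at -35 and TATAAT at -10, separated by a spacer of 15 to 19 bp. The consensus sequences and spacer range are standard textbook approximations rather than values from this article. The mismatch threshold is an arbitrary assumption, and real promoters are far more variable, so this is not a usable promoter predictor.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_promoters(seq, max_mismatches=1, spacers=range(15, 20)):
    """Scan a DNA strand for -35 / -10 box pairs resembling the sigma-70 consensus.

    Returns (start_of_-35_box, spacer_length, start_of_-10_box) tuples.
    """
    hits = []
    for i in range(len(seq) - 6 + 1):
        if mismatches(seq[i:i + 6], "TTGACA") > max_mismatches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer  # start of the candidate -10 box
            box = seq[j:j + 6]
            if len(box) == 6 and mismatches(box, "TATAAT") <= max_mismatches:
                hits.append((i, spacer, j))
    return hits

# Hypothetical sequence: -35 box, a 17 bp spacer, then a -10 box.
seq = "GC" + "TTGACA" + "ATCGATCGATCGATCGA" + "TATAAT" + "GCGC"
print(find_promoters(seq))  # [(2, 17, 25)]
```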

Unlike DNA replication, transcription of a gene can involve multiple RNA polymerases at once, so many mRNA molecules can be produced from a single copy of the gene. Elongation also involves a proofreading mechanism that can replace an incorrectly incorporated ribonucleotide.

When RNA polymerase reaches a terminator sequence in the DNA template, transcription can stop through formation of a hairpin loop in the newly made RNA, which releases the transcript from the template (Rho-independent termination). Alternatively, a protein called Rho can pull the RNA away from the polymerase (Rho-dependent termination).
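
The hairpin route (intrinsic, or Rho-independent, termination) is classically associated with a GC-rich stem-loop in the RNA followed by a run of U's. The toy scanner below looks for exactly that pattern. The stem length, loop sizes and U-run length are assumed parameters, perfect Watson-Crick pairing is required (real terminators tolerate mismatches and G-U pairs), and the transcript is made up.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def revcomp(seq):
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[b] for b in reversed(seq))

def find_terminator_hairpins(rna, stem=6, loops=range(3, 9), min_u_run=4):
    """Find stem-loop-stem motifs with perfectly paired stems followed by a run of U's.

    Returns (stem_start, loop_length) tuples.
    """
    hits = []
    for i in range(len(rna) - 2 * stem):
        left = rna[i:i + stem]
        for loop in loops:
            j = i + stem + loop  # start of the right-hand stem
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + min_u_run]
            if len(right) == stem and right == revcomp(left) and tail == "U" * min_u_run:
                hits.append((i, loop))
    return hits

# Hypothetical transcript end: GC-rich stem, 4-nt loop, complementary stem, U run.
rna = "AAUC" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUU"
print(find_terminator_hairpins(rna))  # [(4, 4)]
```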

Prokaryotic vs. eukaryotic transcription

Prokaryotic transcription occurs in the cytoplasm alongside translation.
Eukaryotic transcription is primarily localized to the nucleus, where it is separated from the cytoplasm (where translation occurs) by the nuclear membrane.

Transcription factories
Active transcription units are clustered in the nucleus, in discrete sites called ‘transcription factories’. Such sites could be visualized after allowing engaged polymerases to extend their transcripts in tagged precursors (Br-UTP or Br-U), and immuno-labeling the tagged nascent RNA. Transcription factories can also be localized using fluorescence in situ hybridization, or marked by antibodies directed against polymerases. There are ~10,000 factories in the nucleoplasm of a HeLa cell, among which are ~8,000 polymerase II factories and ~2,000 polymerase III factories. Each polymerase II factory contains ~8 polymerases. As most active transcription units are associated with only one polymerase, each factory will be associated with ~8 different transcription units. These units might be associated through promoters and/or enhancers, with loops forming a ‘cloud’ around the factory.

Transcription initiation complex

Transcription factors mediate the binding of RNA polymerase and the initiation of transcription. The RNA polymerase only binds to the promoter after certain transcription factors are assembled. The completed assembly of transcription factors and RNA polymerase bound to the promoter is called the transcription initiation complex.

Reverse transcription

Some viruses (such as HIV, the cause of AIDS) have the ability to transcribe RNA into DNA, which can then be integrated into the host cell's genome. The main enzyme responsible for this type of transcription is called reverse transcriptase. In the case of HIV, reverse transcriptase is responsible for synthesising a complementary DNA strand (cDNA) to the viral RNA genome. An associated enzyme, ribonuclease H, digests the RNA strand, and reverse transcriptase synthesises a complementary strand of DNA to form a double-helical DNA structure. This cDNA is integrated into the host cell's genome by another enzyme (integrase), causing the host cell to generate viral proteins that reassemble into new viral particles. Subsequently, the host cell undergoes programmed cell death (apoptosis).
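
The two DNA-synthesis steps can be sketched as string operations, writing every strand 5' → 3'. The first-strand cDNA is the reverse complement of the viral RNA (with A placed opposite U), and the second DNA strand is the reverse complement of the cDNA, which reproduces the RNA sequence with T in place of U. The RNA fragment is hypothetical, and RNase H and integrase are not modeled.

```python
# Deoxyribonucleotide placed opposite each RNA base (A pairs with U).
DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(rna):
    """cDNA (5'->3') complementary to an RNA template written 5'->3'."""
    return "".join(DNA_PARTNER[b] for b in reversed(rna))

def second_strand(cdna):
    """DNA strand (5'->3') complementary to the cDNA."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(cdna))

rna = "AUGGCUUCAAAGUAG"         # hypothetical viral RNA fragment
cdna = first_strand_cdna(rna)
print(cdna)                     # CTACTTTGAAGCCAT
print(second_strand(cdna))      # ATGGCTTCAAAGTAG
```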

Diabetes Animation

Diabetes mellitus (IPA pronunciation: [ˌdaɪəˈbitəs]) is a metabolic disorder characterized by hyperglycemia (high blood sugar) and other signs; it is a group of related conditions rather than a single disease. The World Health Organization recognizes three main forms of diabetes: type 1, type 2, and gestational diabetes (occurring during pregnancy), which have similar signs, symptoms, and consequences, but different causes and population distributions. Type 1 is usually due to autoimmune destruction of the pancreatic beta cells, which produce insulin. Type 2 is characterized by tissue-wide insulin resistance and varies widely; it sometimes progresses to loss of beta cell function. Gestational diabetes is similar to type 2 diabetes in that it involves insulin resistance; the hormones of pregnancy cause insulin resistance in women genetically predisposed to developing this condition.


Type 1 diabetes mellitus, formerly known as insulin-dependent diabetes (IDDM), childhood diabetes, or juvenile diabetes, is characterized by loss of the insulin-producing beta cells of the islets of Langerhans of the pancreas, leading to a deficiency of insulin. There is no known preventive measure against type 1 diabetes. Most people affected by type 1 diabetes are otherwise healthy and of a healthy weight when onset occurs. Diet and exercise cannot reverse or prevent type 1 diabetes. Sensitivity and responsiveness to insulin are usually normal, especially in the early stages. This type accounts for up to 10% of total cases in North America and Europe, though this varies by geographical location. This type of diabetes can affect children or adults but was traditionally termed "juvenile diabetes" because it represents a majority of cases of diabetes affecting children.

The most common cause of beta cell loss leading to type 1 diabetes is autoimmune destruction, accompanied by antibodies directed against insulin and islet cell proteins. The principal treatment of type 1 diabetes, even from the earliest stages, is replacement of insulin. Without insulin, ketosis and diabetic ketoacidosis can develop and coma or death will result.

Currently, type 1 diabetes can be treated only with insulin, with careful monitoring of blood glucose levels using blood testing monitors. Emphasis is also placed on lifestyle adjustments (diet and exercise). Apart from the common subcutaneous injections, it is also possible to deliver insulin by a pump, which allows continuous infusion of insulin 24 hours a day at preset levels and the ability to program doses (a bolus) of insulin as needed at meal times. It is also possible to deliver insulin with an inhaled powder.

Type 1 treatment must be continued indefinitely. Treatment does not impair normal activities if sufficient awareness, appropriate care, and discipline in testing and medication are maintained. The average glucose level for the type 1 patient should be as close to normal (80–120 mg/dl, about 4.4–6.7 mmol/l) as possible. Some physicians suggest up to 140–150 mg/dl (about 7.8–8.3 mmol/l) for those having trouble with lower values, such as frequent hypoglycemic events. Values above 200 mg/dl (11.1 mmol/l) are often accompanied by discomfort and frequent urination leading to dehydration. Values above 300 mg/dl (16.7 mmol/l) usually require immediate treatment and may lead to ketoacidosis. Low levels of blood glucose, called hypoglycemia, may lead to seizures or episodes of unconsciousness.
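
Converting between the two units uses the molar mass of glucose, about 180.16 g/mol, so 1 mmol/l corresponds to roughly 18 mg/dl. The sketch below applies that factor to the thresholds mentioned in the paragraph above.

```python
GLUCOSE_MOLAR_MASS = 180.16  # g/mol, i.e. mg per mmol

def mg_dl_to_mmol_l(mg_dl):
    """Convert a blood glucose concentration from mg/dl to mmol/l."""
    # mg/dl -> mg/l (multiply by 10), then mg/l -> mmol/l (divide by mg per mmol)
    return mg_dl * 10 / GLUCOSE_MOLAR_MASS

for value in (80, 120, 140, 150, 200, 300):
    print(f"{value} mg/dl = {mg_dl_to_mmol_l(value):.1f} mmol/l")
```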

Type 2 diabetes mellitus, previously known as adult-onset diabetes, maturity-onset diabetes, or non-insulin-dependent diabetes mellitus (NIDDM), is due to a combination of defective insulin secretion and insulin resistance or reduced insulin sensitivity (defective responsiveness of tissues to insulin), which almost certainly involves the insulin receptor in cell membranes. In the early stage the predominant abnormality is reduced insulin sensitivity, characterized by elevated levels of insulin in the blood. At this stage hyperglycemia can be reversed by a variety of measures and medications that improve insulin sensitivity or reduce glucose production by the liver, but as the disease progresses the impairment of insulin secretion worsens, and therapeutic replacement of insulin often becomes necessary. There are numerous theories as to the exact cause and mechanism of this resistance, but central obesity (fat concentrated around the waist and the abdominal organs, apparently rather than subcutaneous fat) is known to predispose individuals to insulin resistance, possibly through its secretion of adipokines (a group of hormones) that impair glucose tolerance. Abdominal fat is especially active hormonally. Obesity is found in approximately 55% of patients diagnosed with type 2 diabetes. Other factors include aging (about 20% of elderly patients in North America are diabetic) and family history (type 2 is much more common in those with close relatives who have had it). In the last decade, however, it has increasingly begun to affect children and adolescents, likely in connection with the greatly increased childhood obesity seen in recent decades in some places.

Type 2 diabetes may go unnoticed for years in a patient before diagnosis, as visible symptoms are typically mild or non-existent, without ketoacidotic episodes, and can be sporadic as well. However, severe long-term complications can result from unnoticed type 2 diabetes, including renal failure, vascular disease (including coronary artery disease), vision damage, etc.

Type 2 diabetes is usually first treated by changes in physical activity (generally an increase), diet (generally a decrease in carbohydrate intake), and weight loss. These can restore insulin sensitivity, even when the weight loss is modest, for example around 5 kg (10 to 15 lb), especially when it comes from abdominal fat deposits. Some people with type 2 diabetes can achieve satisfactory glucose control, sometimes for years, as a result. However, the underlying tendency to insulin resistance is not lost, so attention to diet, exercise, and weight loss must continue. The usual next step, if necessary, is treatment with oral antidiabetic drugs. Because insulin production is initially unimpaired in type 2 diabetes, oral medication (often used in various combinations) can be used to improve insulin production (e.g., sulfonylureas), to regulate inappropriate release of glucose by the liver and attenuate insulin resistance to some extent (e.g., metformin), and to substantially attenuate insulin resistance (e.g., thiazolidinediones). According to one study, overweight patients treated with metformin, compared with diet alone, had relative risk reductions of 32% for any diabetes endpoint, 42% for diabetes-related death, and 36% for all-cause mortality and stroke. When oral medications fail (cessation of beta cell insulin secretion is not uncommon in type 2 diabetes), insulin therapy will be necessary to maintain normal or near-normal glucose levels. A disciplined regimen of blood glucose checks is recommended in most cases, and particularly when taking medications.

Sickle Cell Anemia

Sickle-cell disease (SCD), or sickle-cell anaemia (or anemia; SCA) or drepanocytosis, is an autosomal recessive genetic blood disorder characterized by red blood cells that assume an abnormal, rigid, sickle shape. Sickling decreases the cells' flexibility and results in a risk of various complications. The sickling occurs because of a mutation in the haemoglobin gene. Life expectancy is shortened, with studies reporting an average life expectancy of 42 years in males and 48 years in females.

The sickle-cell mutation probably arose spontaneously in several different geographic areas, as suggested by restriction endonuclease analysis. These variants are known as Cameroon, Senegal, Benin, Bantu and Saudi-Asian. Their clinical importance stems from the fact that some of them, e.g., the Senegal and Saudi-Asian variants, are associated with higher HbF levels and tend to cause milder disease.

In people heterozygous for HbS (carriers of sickling haemoglobin), the polymerisation problems are minor, because the normal allele is able to produce over 50% of the haemoglobin. In people homozygous for HbS, the presence of long-chain polymers of HbS distorts the shape of the red blood cell from a smooth, donut-like shape to a ragged one full of spikes, making it fragile and susceptible to breaking within capillaries. Carriers have symptoms only if they are deprived of oxygen (for example, while climbing a mountain) or while severely dehydrated. In patients with sickle-cell disease, painful crises occur about 0.8 times per year on average under normal circumstances. Sickle-cell disease occurs when the seventh amino acid (counting the initial methionine), glutamic acid, is replaced by valine, changing the protein's structure and function.

The gene defect is a known single-nucleotide mutation (see single-nucleotide polymorphism - SNP) (A to T) in the β-globin gene, which results in glutamate being replaced by valine at position 6. Haemoglobin carrying this mutation is referred to as HbS, as opposed to the normal adult HbA. At the codon level, the mutation changes GAG to GTG. This is normally a benign mutation, causing no apparent effects on the secondary, tertiary, or quaternary structure of haemoglobin in conditions of normal oxygen concentration. Under low oxygen concentration, however, it allows HbS to polymerize. The deoxy form of haemoglobin exposes a hydrophobic patch on the protein between the E and F helices. The hydrophobic side chain of the valine at position 6 of the beta chain can associate with this patch, causing haemoglobin S molecules to aggregate and form fibrous precipitates.
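
The effect of the point mutation can be shown with a tiny translation. The coding sequence below is illustrative: it was chosen to encode M V H L T P E E K, the first residues of β-globin, but is not claimed to match the HBB gene nucleotide for nucleotide. Changing the middle A of the seventh codon (counting ATG) to T turns GAG into GTG, replacing glutamate with valine at what is position 6 when the initial methionine is not counted.

```python
# Only the codons needed for this example (standard genetic code, DNA coding strand).
CODONS = {"ATG": "M", "GTG": "V", "CAT": "H", "CTG": "L", "ACT": "T",
          "CCT": "P", "GAG": "E", "AAG": "K"}

def translate(coding_dna):
    """Translate a coding-strand DNA sequence codon by codon."""
    return "".join(CODONS[coding_dna[i:i + 3]] for i in range(0, len(coding_dna) - 2, 3))

# Illustrative coding sequence for M V H L T P E E K.
normal = "ATGGTGCATCTGACTCCTGAGGAGAAG"
codon_index = 6                                  # 7th codon when the initiator ATG is counted
pos = codon_index * 3 + 1                        # middle base of that codon
sickle = normal[:pos] + "T" + normal[pos + 1:]   # A -> T turns GAG into GTG

print(translate(normal))  # MVHLTPEEK
print(translate(sickle))  # MVHLTPVEK
```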

The allele responsible for sickle-cell anaemia is autosomal recessive and is found on the short arm of chromosome 11. A person who receives the defective allele from both father and mother develops the disease; a person who receives one defective and one healthy allele remains healthy but can pass on the disease and is known as a carrier. If two carriers have a child, there is a 1-in-4 chance that the child will develop the disease and a 1-in-2 chance that the child will be a carrier. Because the allele is incompletely recessive, carriers can produce a few sickled red blood cells, not enough to cause symptoms but enough to give resistance to malaria. Because of this, heterozygotes have a higher fitness than either of the homozygotes. This is known as heterozygote advantage.
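
The 1-in-4 and 1-in-2 figures come from a Punnett square, which the sketch below enumerates, assuming each parent passes on either of its two alleles with equal probability. A stands for the normal allele and S for the sickle allele; these labels are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent1, parent2):
    """Probability of each child genotype, given two-letter parent genotypes."""
    # Each pairing of one allele from each parent is equally likely; sorting the
    # letters makes "SA" and "AS" count as the same genotype.
    combos = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(combos.values())
    return {genotype: Fraction(n, total) for genotype, n in combos.items()}

# Both parents are carriers (one normal and one sickle allele each).
print(offspring_genotypes("AS", "AS"))
```

For two carriers this gives AA with probability 1/4, AS (carrier) with 1/2 and SS (affected) with 1/4.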

Due to the adaptive advantage of the heterozygote, the disease is still prevalent, especially among people with recent ancestry in malaria-stricken areas, such as Africa, the Mediterranean, India and the Middle East.[15] Malaria was historically endemic to southern Europe, but it was declared eradicated in the mid-20th century, with the exception of rare sporadic cases.

The malaria parasite has a complex life cycle and spends part of it in red blood cells. In a carrier, the presence of the malaria parasite causes the red blood cells with defective haemoglobin to rupture prematurely, making the plasmodium unable to reproduce. Further, the polymerization of Hb affects the ability of the parasite to digest Hb in the first place. Therefore, in areas where malaria is a problem, people's chances of survival actually increase if they carry sickle-cell trait (selection for the heterozygote).

In the USA, where there is no endemic malaria, the prevalence of sickle-cell anaemia among African Americans is lower (about 0.25%) than in West Africa (about 4.0%) and is falling. Without endemic malaria, the sickle-cell mutation is purely disadvantageous and will tend to be selected out of the affected population. Another factor limiting the spread of sickle-cell genes in North America is the absence of a cultural tradition of polygamy.

Computational Analysis of Biological Sequence