New Genetic Technologies
With the completion of the Human Genome Project in 2003, scientists can now access—and develop—ever more sophisticated techniques to identify and record the genetic contributions to common, complex illnesses. Below are some of the more frequently used genetic techniques and areas of ongoing studies.
Ongoing genome projects are generating vast amounts of DNA sequence information, and because of this ever-growing influx of data, researchers increasingly rely on mathematical and computational techniques to extract meaning from the data. Sophisticated mathematical concepts are used to characterize the principles underlying biology at the genetic, molecular and cellular levels.
Computational biology includes the areas of molecular cell biology that are embracing the application of mathematical theory to advance discovery.
The field draws on foundations in computer science, applied mathematics, statistics (including biostatistics), biochemistry, chemistry, biophysics, molecular biology, and genetics, among others.
Epigenetics (including epigenetic techniques)
A relatively new area of research based on the science of epigenetics is uncovering how our environment interacts with certain genes to switch them on or off permanently, or alter their expression—much like a light-switch dimmer.
Epigenetics is the study of heritable changes in gene activity that do not involve alterations to the genetic code. These patterns of gene expression are governed by the cellular material that overlies the genome, known as the epigenome.
Epigenetic changes include DNA methylation and histone modification, both of which serve to regulate gene expression without altering the underlying DNA sequence.
DNA methylation is a naturally occurring process that can cause a potentially reversible change in the activity of a gene. It helps to determine which genes are turned ‘on’ or ‘off’ in each cell, and at what level or intensity of expression, thus influencing the cell’s function. In DNA methylation, enzymes (DNA methyltransferases, also called methylases) add methyl groups to specific cytosine nucleotides in genes and in so doing can block their activity. Many cancer biologists agree that methylation and other so-called epigenetic changes may be as important as genetic mutations in causing and promoting cancer.
Histone modification does not change the chemistry of DNA but does affect the protein platforms around which DNA is “spooled” (i.e., the core histones). These proteins condense DNA and provide an initial level of gene organization. Histones can be modified by methylation, acetylation and phosphorylation. These changes are dynamic, reversible and can act to recruit specific protein factors that are important for regulating gene expression. The combined interactions of the different types of modification yield a “histone code,” which influences the likelihood that associated DNA will be transcribed (or not).
Various techniques are used to identify epigenetic modifications to DNA. These may include:
- mapping locus-specific differences in DNA modification by using enzymes that recognize the same target sequence in the DNA but are either sensitive or insensitive to its modification;
- exposing DNA to bisulphite, which converts unmethylated cytosines to uracil while leaving methylated cytosines unchanged, and then sequencing, in effect allowing methylated sites to be identified;
- using chemical approaches or antibodies targeted to specific modifications (e.g., cytosine modification). Antibody-based approaches are also widely applied to identify the various types of histone protein modification in chromatin.
These techniques can be scaled up to analyze the entire epigenome by combining them with DNA microarrays or next-generation (high-throughput) sequencing techniques.
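The bisulphite approach in the list above can be illustrated with a short simulation. This is only a minimal sketch on a single strand: the reference sequence, the methylated positions, and the function names are hypothetical, and real bisulphite sequencing analysis must also handle both strands, incomplete conversion, and sequencing errors.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulphite treatment of one DNA strand.

    Unmethylated cytosines are converted to uracil, which is read as T
    after amplification and sequencing; methylated cytosines stay as C.
    """
    converted = []
    for index, base in enumerate(sequence):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylated_sites(reference, converted_read):
    """Return 0-based positions where a reference C is still read as C."""
    return [
        index
        for index, (ref_base, read_base) in enumerate(zip(reference, converted_read))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACGTCCGATCG"   # hypothetical reference sequence
true_methylation = {1, 5}   # hypothetical methylated cytosines (0-based)

read = bisulfite_convert(reference, true_methylation)
print(read)                                    # converted read
print(call_methylated_sites(reference, read))  # inferred methylated positions
```

Comparing the converted read with the untreated reference is what lets the methylated cytosines stand out: only they remain as C.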
Genome-wide association studies
Genome-wide association studies (GWAS) enable researchers to identify genes involved in human illnesses by testing for association between a disease and genetic polymorphisms (the recurrence within a population of two or more discontinuous genetic variants of a specific trait, such as blood type) spread across the entire genome. GWAS search the genome for small variations, called single nucleotide polymorphisms (SNPs), that occur more frequently in people with a particular illness than in those without it.
Once new genetic associations are identified for a particular
illness, researchers can use the information to develop better
strategies to detect, treat and prevent the disease. Such studies are
particularly useful in finding genetic variations that contribute to
common, complex illnesses such as arthritis, cancer, diabetes, heart
disease and psychiatric illnesses.
If certain genetic variations are found to be significantly more frequent in people with the disease compared to those without, the variations are said to be “associated” with the disease. The associated genetic variations can thereby indicate the region of the human genome where the disease-causing mutations reside.
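As a rough illustration of what "significantly more frequent" can mean for a single SNP, the sketch below runs a Pearson chi-square test on a 2x2 table of allele counts in cases and controls. The counts and the function name are hypothetical, and this simple allelic test is only one of several tests used in practice; real analyses typically also adjust for covariates and population structure.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Assumes every row and column total is non-zero.
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical allele counts: 1,000 cases and 1,000 controls, two alleles each.
statistic, p_value = allelic_chi_square(
    case_alt=600, case_ref=1400, control_alt=500, control_ref=1500
)
print(f"chi-square = {statistic:.2f}, p = {p_value:.2e}")
```

A small p-value for one SNP is only a starting point, because the same test is repeated across the whole genome.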
An analysis of GWAS data requires performing thousands to millions of statistical tests. Some adjustment for multiple testing is therefore required before a genetic variant can be declared significantly associated with the outcome of interest. Researchers often consider that an unadjusted p-value on the order of 5 × 10⁻⁸ is necessary to achieve genome-wide significance.
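One common way to motivate a threshold of this size is a Bonferroni correction: dividing a conventional significance level of 0.05 by roughly one million independent tests gives 5 × 10⁻⁸. The one-million figure, the SNP identifiers, and the p-values below are assumptions for illustration and do not come from the text above.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def genome_wide_significant(p_values, alpha=0.05, n_tests=1_000_000):
    """Keep only the variants whose unadjusted p-value beats the threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return {snp: p for snp, p in p_values.items() if p < threshold}


# Hypothetical unadjusted p-values for three SNPs.
p_values = {"snp_a": 3e-9, "snp_b": 2e-6, "snp_c": 1e-8}

print(bonferroni_threshold(0.05, 1_000_000))
print(genome_wide_significant(p_values))
```

Bonferroni is conservative when tests are correlated, as neighbouring SNPs often are, so it is shown here as one simple option rather than the only way to adjust.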
However, the associated variants themselves may not directly cause the disease. Therefore, researchers often take additional steps, such as fine-mapping by imputation of dense SNPs (i.e., statistically inferring genotypes at SNPs that were not directly measured) or by sequencing DNA base pairs in that particular region of the genome, to identify the exact genetic change involved in the disease.
Some researchers are now moving away from GWAS, which were originally motivated by the “common variants, common diseases” hypothesis. For many diseases and complex traits, GWAS have not been able to identify genetic variants that could explain a large proportion of heritability. Therefore, many investigators are now using next-generation sequencing techniques (described below) to search for novel rare variants that could explain this “missing heritability.”
When using family data, linkage analysis can be a powerful approach to find rare variants that segregate through families. A linkage signal can be detected at markers up to 20 Mb away from the disease or trait locus of interest.
Microarray technology (i.e., DNA microarray)
Studying which genes are active and which are inactive in different cell types helps researchers understand both how these cells function normally and how they are affected if and when the activities of various genes are compromised.
Researchers use DNA microarrays to measure the expression levels of large numbers of genes simultaneously, to detect SNPs, or to genotype multiple regions of a genome. For these tests, a collection of microscopic DNA spots is attached to a solid surface, and each DNA spot contains specific DNA sequences (known as probes). Since an array can contain tens of thousands of probes, a microarray experiment can accomplish many genetic tests simultaneously. Arrays have therefore dramatically accelerated many types of investigation.
Microarray technology brings new knowledge about complex illnesses such as cancer. Specifically, with the help of microarray technology researchers can classify tumours not only by the organs in which they reside, but also by the patterns of gene activity in the tumour cells. This knowledge assists in the development of new treatments targeted directly to each specific type of cancer.
Next-generation sequencing (also referred to as second-generation sequencing, or high-throughput sequencing)
Knowledge of DNA sequences has become indispensable for basic biological research and new insights into health and disease.
DNA sequencing includes several methods and technologies used to determine the order of nucleotide bases in a molecule of DNA.
While first-generation (Sanger) sequencing techniques segregate individual samples of DNA into lanes or capillaries and separate bases in space, the new generation of sequencing techniques relies on arraying several hundred thousand sequencing templates in either picotiter plates or agarose thin layers, so that these sequences can be analyzed in parallel. This is a significant increase in capability compared with the maximum of 96 sequencing templates on a contemporary Sanger capillary sequencer.
New techniques have significantly increased the speed and decreased the cost of DNA sequencing, enabling the sequencing of the entire genomes of many individuals. The high demand for low-cost sequencing has driven the development of high-throughput sequencing technologies that simplify the sequencing process, producing thousands or millions of sequences concurrently. The technology is still developing rapidly and miniaturized sequencers have been assembled that are the size of large USB keys.
Exons are short, functionally important sequences of DNA that represent the regions in genes that are translated into proteins. The human genome contains about 180,000 exons, which together represent about 1% of the entire genome (the untranslated regions flanking exons are usually not included in exome studies).
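The figures above can be turned into rough sizes with a little arithmetic. The sketch below assumes a human genome of about 3.1 billion base pairs, a value that is not stated in the text, and treats the 1% and 180,000 figures as exact for the sake of illustration; under those assumptions the exome comes to roughly 31 Mb, or a little over 170 base pairs per exon on average.

```python
def exome_estimates(genome_bp, exome_fraction, exon_count):
    """Return (total exome size in bp, mean exon length in bp)."""
    exome_bp = genome_bp * exome_fraction
    mean_exon_bp = exome_bp / exon_count
    return exome_bp, mean_exon_bp


# Assumed genome size of ~3.1 billion bp; 1% and 180,000 come from the text.
exome_bp, mean_exon_bp = exome_estimates(
    genome_bp=3_100_000_000, exome_fraction=0.01, exon_count=180_000
)
print(f"Exome: about {exome_bp / 1e6:.0f} Mb")
print(f"Mean exon length: about {mean_exon_bp:.0f} bp")
```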
Exome sequencing selectively sequences the coding regions of the genome and is sometimes used as an alternative to whole genome sequencing.
Whole-exome sequencing is a cheaper, faster, yet still efficient strategy for reading the parts of the genome that researchers believe are the most important for diagnosing disease.
It is estimated that most disease-causing mutations are found within the regions of the genome that encode proteins. Whole-exome sequencing reads only the parts of the human genome that encode proteins (in some designs also including flanking untranslated regions), leaving the other 99% of the genome unread (i.e., in effect the technique is akin to a high-level scan of the genome).