Three billion puzzle pieces: insights into the Human Genome Project

The largest jigsaw puzzle created by students at the University of Economics of Ho Chi Minh City in Vietnam in 2011. Photo source.

Jigsaw puzzles, as fun as they are, can be a nuisance. We’ve all struggled to find the right piece for that last corner of the board. According to Guinness World Records, the largest jigsaw puzzle, made up of 551,232 pieces, was assembled by 1,600 students at the University of Economics of Ho Chi Minh City in Vietnam in 2011 [1]. Impressive!

Now imagine completing a puzzle made of 3 billion pieces! A challenge of that size can be hard to picture, but assembling this humongous puzzle was the goal of the Human Genome Project (HGP). A genome is the complete set of an organism’s genetic information, which is stored in its DNA. DNA is made up of 4 types of bases (i.e., building blocks): A, T, C and G. The arrangement of these letters helps determine our physical traits, such as eye color, hair type and ear size. Therefore, if we can read the order of these building blocks in each gene, we can begin to learn how our traits are controlled. For example, knowing the exact order of the bases in a gene allows geneticists to study how certain groups of humans thrive at higher altitudes or how diabetes is passed on from one generation to another. These are just two of the many questions that DNA can help us answer about ourselves and other organisms.

If the genome of any organism contains such a treasure trove of information, why haven’t we already sequenced the DNA of all creatures? No worries: several genome sequencing projects are under way! One such venture is the 1000 Genomes Project, which aspires to find genetic variants that occur in at least 1% of the human populations of Asian, African, European and American ancestry [2]. This endeavor is made possible by next-generation sequencing (NGS) technologies. Sequencing is the process of resolving the precise order of A’s, T’s, C’s and G’s in a DNA fragment. The HGP was completed by sequencing numerous fragments of DNA and piecing them together into an entire genome of three billion bases [4]. Several more human (as well as non-human) genomes have been described since the finalized draft of the human genome was presented in 2004. All of these later studies have used one of the available NGS methods [5].
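To make the puzzle analogy concrete, here is a minimal Python sketch of how overlapping fragments can be pieced back together. It is a toy illustration only, not the pipeline the HGP used: the function names `overlap` and `assemble` and the three short reads are made up for this example, and the sketch assumes error-free fragments that all come from the same strand. Real assemblers also have to cope with sequencing errors, repeated regions and billions of reads.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for size in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def assemble(fragments: list[str]) -> str:
    """Greedily merge the pair of fragments with the largest overlap
    until a single sequence remains (toy approach, hypothetical helper)."""
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best_size, best_i, best_j = -1, 0, 1
        # Compare every ordered pair and remember the best-overlapping one.
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    size = overlap(a, b)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        # Join the two pieces, writing the shared bases only once.
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces[0]


# Three made-up reads that overlap like neighboring puzzle pieces.
reads = ["ATTCGGA", "CGGATTA", "ATTACCG"]
print(assemble(reads))  # ATTCGGATTACCG
```

Each merge is like snapping two puzzle pieces together along a shared edge: the longer the shared stretch of letters, the more confident we can be that the two fragments really are neighbors.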

"I think I found a corner piece." Photo source.

Current NGS technologies have lowered the cost of sequencing DNA from millions of dollars to only a few thousand [5]. Illumina Sequencing is one of today’s most popular NGS platforms and is used by labs around the world. Illumina, Inc., an American biotech company based in San Diego, CA, holds the patents for this technology. The initial work, however, was conducted at the University of Cambridge in the UK by two biochemists, Drs. Shankar Balasubramanian and David Klenerman [3]. The fundamental concept behind Illumina Sequencing is similar to the original technique (i.e., the Sanger method), in which the DNA bases (A, T, C, G) are labeled with tags that emit a different color signal for each base as they pass through a gel. A machine detects the color of each signal and records the result as a string of A’s, T’s, C’s and G’s. The largest difference between the Sanger method and NGS is the scale of the sequencing process: the Sanger technique can sequence scores of DNA fragments at a time, whereas Illumina can sequence billions of DNA fragments from one or more individual organisms simultaneously. Even though Illumina Sequencing was not used in the Human Genome Project, quite a few genomes have since been resequenced with NGS methods, including Dr. James D. Watson’s in 2008. Watson co-discovered the structure of DNA in March 1953 [7]. According to the study’s researchers, Watson’s genome “was completed in two months at approximately one-hundredth of the cost” of more conventional methods [6]. That project stands in stark contrast to the HGP, which lasted over 10 years and cost $2.7 billion [4].
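The color-to-letter step described above can be sketched in a few lines of Python. This is only an illustration: the color names in `COLOR_TO_BASE` and the helper `call_bases` are assumptions made up for this example, since real instruments use their own dye chemistry and far more sophisticated signal processing.

```python
# Hypothetical dye colors; actual sequencers assign dyes differently.
COLOR_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}


def call_bases(signals: list[str]) -> str:
    """Translate a series of detected colors into a string of bases.

    Signals that match no known dye become "N", the conventional
    symbol for a base that could not be determined.
    """
    return "".join(COLOR_TO_BASE.get(color, "N") for color in signals)


detected = ["green", "red", "red", "blue", "yellow", "violet"]
print(call_bases(detected))  # ATTCGN
```

Whether a machine reads dozens of fragments or billions, the idea is the same: every flash of color becomes one more letter in the sequence.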

The $1000 Genome. Photo source.

The field of NGS technologies continues to develop at a fast pace, although scientists still face several challenges in storing, transferring and analyzing the resulting data. Even so, these developments have the potential to personalize medicine, among many other exciting advances in the field of genetics. Considering that the first DNA strand was sequenced only about 40 years ago, the science behind this process has come a long way, and many possibilities for further discovery remain.

Taruna Aggarwal is a Plant Biology graduate student at the University of New Hampshire, where her research focuses on understanding the evolution of forest fungal pathogens. She’s excited about using bioinformatic tools to study pathogenesis in an emerging fungal pathogen called Geosmithia morbida. Environmentalism is her passion, and she’s a firm believer in Reducing, Reusing, and Recycling.

References:

  1. Largest jigsaw puzzle – most pieces. Guinness World Records. Accessed December 15, 2014. http://www.guinnessworldrecords.com/world-records/largest-jigsaw-puzzle-most-pieces
  2. About the 1000 Genomes Project. 1000 Genomes: A Deep Catalog of Human Genetic Variation. Accessed December 20, 2014. http://www.1000genomes.org/about#ProjectSamples
  3. History of Illumina Sequencing. Illumina. Accessed December 19, 2014. http://technology.illumina.com/technology/next-generation-sequencing/solexa-technology.html
  4. The Human Genome Project Completion: Frequently Asked Questions. National Human Genome Research Institute. Accessed December 20, 2014. http://www.genome.gov/11006943
  5. Metzker, Michael L. Sequencing technologies—the next generation. Nature Reviews Genetics 11: 31-46. 2010.
  6. Wheeler, David A. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452: 872-876. 2008.
  7. James Watson—Biographical. Nobelprize.org. Accessed December 23, 2014. http://www.nobelprize.org/nobel_prizes/medicine/laureates/1962/watson-bio.html