National Science Foundation - Computer & Information Science & Engineering (CISE)

Discovery
Placing Landmarks on the Genome Map

Vishy Iyer and colleagues use supercomputers and next-generation gene sequences to explore DNA and heredity

Schematic diagram showing human chromosome 21 with a small region outlined in red.


May 31, 2011

Supercomputers and next-generation gene sequencers allow researchers to explore DNA and heredity.

We typically think of heredity (eye color, body type or susceptibility to a disease) as rooted in our genes. And it is. But as biologists sequence more genomes and analyze the results, they're finding that the non-coding regions of the genome, the stretches outside the genes once dismissed as "junk," play an important role in our genetic make-up as well.

Since 2001, the cost of sequencing a human genome has dropped from billions of dollars to tens of thousands, enabling more focused investigations of gene expression. This has greatly improved scientists' ability to understand biological systems and how they relate to illness.

Many common diseases have a genetic component that predisposes people to illness, but the connection is rarely simple. Together, next-generation gene sequencers and high-performance computers are enabling biologists to ask novel questions about our DNA and to glean new insights into disease and heredity.

An important example, which scientists are just beginning to explore, is the role of transcription factor proteins in gene regulation. These proteins bind to landing pads on the genome and act as control dials, turning genes on or off and setting the level of gene activity in a cell.

"If you're comparing normal cells to cancer cells, you want to know what happened in the cancer cell that makes it different," said Vishy Iyer, at the University of Texas at Austin. "The gene expression patterns change, and we want to know which genes are regulated up or down, and how that came about."

About 2,000 transcription factor proteins have been identified, and some have been linked to breast and other cancers, Rett syndrome, and autoimmune diseases. However, little is known about how they work.

Iyer, along with colleagues at Duke University, the University of North Carolina at Chapel Hill, the National Human Genome Research Institute and the Wellcome Trust Genome Campus, is trying to change that. Their research, published in the journal Science in 2010, was one of the first studies to use next-generation sequencing and supercomputers to explore the expression of genes associated with a specific regulatory transcription factor, called CTCF. The team determined that transcription factor binding is a heritable trait.

"We showed for the first time that some of the differences in DNA between individuals can affect the binding of transcription factors," said Iyer. "More importantly, that those differences could be inherited."

The group used a relatively new technology called ChIP-Seq (chromatin immunoprecipitation followed by sequencing) to isolate only the regions of DNA to which the proteins of interest were bound. These DNA fragments were then sequenced to determine their order of nucleotides, and the number of fragments recovered from each region indicated how much of the protein was bound there.

Sounds simple enough, until you try to pinpoint the exact positions of millions of these sequenced fragments among the approximately three billion base pairs in the human genome.

"The genome is a vast area with many features," said Iyer. "You can think of the proteins as landmarks that we're trying to place on the genome map."

The National Science Foundation-funded Ranger supercomputer at the Texas Advanced Computing Center took the short sequence reads generated by ChIP-Seq and aligned them to the reference genome.

"It's like a text search. Though if you tried to run it in Microsoft Word, it would never finish," Iyer joked.
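The "text search" Iyer describes is read alignment: finding where each short sequenced read occurs in the reference genome. The Python sketch below is a toy illustration, not the pipeline the team ran on Ranger. It builds a simple hash-table index of fixed-length "seeds" and accepts a read where it matches with at most a small number of mismatches; production short-read aligners such as Bowtie and BWA instead use compressed Burrows-Wheeler indexes to search a three-billion-base genome efficiently. The reference string, seed length and the function names `build_seed_index` and `align_read` are all hypothetical.

```python
from collections import defaultdict


def build_seed_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring (seed) of the reference to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def align_read(read: str, reference: str, index: dict[str, list[int]],
               k: int, max_mismatches: int = 1) -> list[int]:
    """Return reference positions where the read matches with <= max_mismatches substitutions.

    The read's first k bases are looked up as an exact-match seed; each candidate
    position is then checked base by base over the full read length.
    """
    hits = []
    for pos in index.get(read[:k], []):
        candidate = reference[pos:pos + len(read)]
        if len(candidate) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, candidate))
        if mismatches <= max_mismatches:
            hits.append(pos)
    return hits


# Toy example: a 24-base hypothetical "reference genome" and 8-base reads.
reference = "GATTACAGGCTTAGCCATGCATCG"
index = build_seed_index(reference, k=4)
print(align_read("GCTTAGCC", reference, index, k=4))  # exact match -> [8]
print(align_read("GCTTAGTC", reference, index, k=4))  # one mismatch -> [8]
```

Because only the first `k` bases serve as the seed, a sequencing error inside the seed would cause a true hit to be missed; real aligners reduce that risk by trying several seeds per read.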

Using several thousand processors simultaneously on Ranger, the alignment took several hours for each data set and, in total, used the equivalent of 20 years of computing time on a single processor.
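As a rough check on the scale, 20 years on one processor is about 175,200 processor-hours. The short sketch below does that conversion and divides the total across a hypothetical 4,000 processors, standing in for the article's "several thousand." It assumes perfect parallel scaling; the actual processor count and the number of data sets are not given here, and the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap days ignored


def processor_hours(years_on_one_processor: float) -> float:
    """Convert single-processor compute time in years to processor-hours."""
    return years_on_one_processor * HOURS_PER_YEAR


def ideal_wall_clock_hours(total_processor_hours: float, processors: int) -> float:
    """Wall-clock hours if the work splits perfectly evenly across the processors."""
    return total_processor_hours / processors


total = processor_hours(20)
print(total)                                # 175200
print(ideal_wall_clock_hours(total, 4000))  # 43.8 (hypothetical 4,000 processors)
```

Spread over several data sets, roughly 44 hours of total wall-clock time would fit the reported runs of several hours each, though real jobs rarely scale perfectly.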

The single-base resolution of next-generation sequencing let the researchers look at individual, known differences in the DNA. Using those differences, they could distinguish the two copies of each chromosome, one inherited from each parent, and examine how genes on each copy bind transcription factors.
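Allele-specific binding of this kind is commonly detected by counting, at each heterozygous SNP inside a binding site, how many aligned ChIP-Seq reads carry each allele, then testing whether the split departs from the 50:50 expected if both copies bind equally. The Python sketch below uses an exact two-sided binomial test for that purpose; it is one standard approach, not necessarily the statistics used in the Science study. It assumes the reads are already aligned and that each allele has already been assigned to the maternal or paternal chromosome (for example, from the parents' genotypes). The function names and read counts are hypothetical.

```python
from math import comb


def allele_counts(read_bases: list[str], ref_allele: str, alt_allele: str) -> tuple[int, int]:
    """Count reads supporting each allele at one heterozygous SNP.

    Bases matching neither allele (e.g. sequencing errors) are ignored.
    """
    ref = sum(1 for b in read_bases if b == ref_allele)
    alt = sum(1 for b in read_bases if b == alt_allele)
    return ref, alt


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance keeps outcomes tied with the observed one in the sum.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


# Hypothetical pileup at one SNP: 18 ChIP-Seq reads carry "A", 2 carry "G".
bases = ["A"] * 18 + ["G"] * 2
ref, alt = allele_counts(bases, "A", "G")
p_value = binomial_two_sided_p(ref, ref + alt)
print(ref, alt, p_value)  # a small p-value suggests binding favors one parent's copy
```

In a genome-wide analysis, many SNPs are tested at once, so the resulting p-values would also need a multiple-testing correction.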

"We could tell the difference in binding from the gene that you inherited from your father and mother; that was the big advance," said Iyer. "Now, we're applying this technology to cases where you know that the gene from one of your parents has a mutation that predisposes you to some disease."

These findings bring science one step closer to personalized medicine based on a detailed reading of an individual's genome, including the non-coding regions. Despite the tremendous complexity of the genome, Iyer is optimistic that the research will have an impact on human health.

"There are lots of diseases and for a subset, they're affecting gene expression by impacting transcription factors," he said. "If we pick the diseases and the factors smartly, I think we'll find them."

The research was also supported by the National Human Genome Research Institute.

-- Aaron Dubrow, Texas Advanced Computing Center, aarondubrow@tacc.utexas.edu

This Behind the Scenes article was provided to LiveScience in partnership with the National Science Foundation.

Investigators
Vishy Iyer
Ryan McDaniel
Bum-Kyu Lee
Lingyun Song
Zheng Liu
Alan Boyle
Michael Erdos
Laura Scott
Mario Morken
Katerina Kucera
Anna Battenhouse
Damian Keefe
Francis Collins
Huntington Willard
Jason Lieb
Terrence Furey
Gregory Crawford
Ewan Birney

Related Institutions/Organizations
University of Texas at Austin
Duke University
University of North Carolina at Chapel Hill
Wellcome Trust Genome Campus

Locations
Maryland
North Carolina
Texas
United Kingdom

Related Awards
#0622780 World-Class Science Through World Leadership in HPC

Related Agencies
National Human Genome Research Institute

Related Websites
LiveScience.com: Behind the Scenes: Placing Landmarks on the Genome Map: http://www.livescience.com/14259-computing-transcription-factors-genome-bts.html
Iyer Lab: http://microarray.icmb.utexas.edu/research.html
Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans, Science, April 9, 2010: http://www.sciencemag.org/content/328/5975/235.short

Representation of allele-specific and non-allele-specific SNPs across the CTCF binding motif (17).