NIH expands its ENCODE project

BETHESDA, Md.—The National Institutes of Health (NIH) has announced plans to expand its Encyclopedia of DNA Elements (ENCODE) Project, a genomics resource used by many scientists to study human health and disease. Funded by the National Human Genome Research Institute (NHGRI), part of NIH, the ENCODE Project is generating a catalog of all the genes and regulatory elements—the parts of the genome that control whether genes are active or not—in humans and select model organisms. With four years of additional support, NHGRI builds on a long-standing commitment to developing freely available genomics resources for use by the scientific community.

“ENCODE has created high-quality and easily accessible sets of data, tools and analyses that are being used extensively in studies to interpret genome sequences and to understand the consequence of genomic variation,” remarked Dr. Elise Feingold, a program director in the Division of Genome Sciences at NHGRI. “These awards provide the opportunity to strengthen this foundation by expanding the breadth and depth of the resource.”

Awards went to eight mapping centers, five characterization centers and six computational analysis groups, along with one data coordinating center and one data analysis center.

“The idea behind ENCODE is to make it easier and more efficient for scientists to interpret genome structure and function. The ENCODE resource consists of data from genome-wide assays on a number of cell types, as well as analyses to identify candidate functional elements in the genome (genes and regulatory regions),” Dr. Mike Pazin, program director in NHGRI’s Division of Genome Sciences, tells DDNews.

“The project will continue development of the ENCODE Encyclopedia and the ENCODE portal, which are both freely shared with anyone that has internet access. Additionally, ENCODE now supports six computational analysis groups to develop analytical tools and methods with the goal of increasing the utility of ENCODE resources for the community,” he adds.

Since launching in 2003, ENCODE has funded a network of researchers to develop and apply methods for mapping candidate functional elements in the genome and to analyze the enormous database of generated genomic information. The data and tools generated by ENCODE are organized by two groups: a data coordinating center, which houses the data and provides access to the resource through an open-access portal, and a data analysis center, which synthesizes the data into an encyclopedia for use by the research community.

Pending the availability of funds, NHGRI plans to commit up to $31.5 million in fiscal year 2017 for these awards. With this funding, ENCODE will expand the scope of these efforts to include characterization centers, which will study the biological role that candidate functional elements may play and develop methods to determine how they contribute to gene regulation in a variety of cell types and model systems. Additionally, the project will enhance the ENCODE catalog by developing a way to incorporate data provided by the research community, and will use biological samples from research participants who have explicitly consented to unrestricted sharing of their genomic data.

According to Pazin, “The goal of characterization centers is to develop generalizable approaches to learn how to characterize these candidate elements, with respect to when and where they are active, and what they do (if anything). As they develop their approaches, we will learn some information about the specific elements they test, in the particular settings they use. They will use a variety of approaches including massively parallel reporter assays, genome editing, epigenome editing, as well as genome editing and reporter assays in mice.”

At its core, ENCODE is about enabling the scientific community to make discoveries by using basic science approaches to understand genomes at the most fundamental level. Its catalog of genomic information can be used for a variety of research projects—for example, generating hypotheses about what goes wrong in specific diseases or understanding the processes that determine how the same genome sequence is used in different parts of the body to make cells with specialized functions. More than 1,600 scientific publications by the research community have used ENCODE data or tools.

“We found that many of the people that are using the ENCODE resource are doing so for disease studies, and this attests to its translational value,” says Pazin. “The NHGRI ENCODE team has found at least 1,700 papers that have used ENCODE data, from research that was not funded by ENCODE (we call them community papers). We share them through our project portal, so they can provide examples for others of how the data are being used. We see these community publications fitting into four broad categories that we categorize as human disease, basic biology, software tools and model organism biology. There are about 600 human disease publications, which we think attests to the high translational value of the resource. ENCODE is often used to provide additional annotation of genetic associations with human disease, and is sometimes used to guide further experiments.”