The NCI Genomic Data Commons (GDC) was established by the NCI Center for Cancer Genomics (CCG) to support
the receipt, harmonization, distribution, and analysis of genomic and clinical data from cancer research
programs. The GDC gives the cancer research community a data repository and computational platform for
studying cancer, its clinical progression, and its response to therapy. The GDC
accomplishes this by harmonizing raw sequence data against a common reference genome (GRCh38), applying
state-of-the-art methods for generating high-level data such as mutation calls and structural variants, and
providing scalable tools supporting data download and analysis. The GDC maintains harmonized data from
several supported cancer research programs, including The Cancer Genome Atlas (TCGA), Therapeutically Applicable
Research to Generate Effective Treatments (TARGET), and the Clinical Proteomic Tumor Analysis Consortium (CPTAC).
List of Services
-
GDC Data Portal – The GDC Data Portal is a robust web-based platform that allows users to search, download,
and analyze data from cancer genomic studies. The GDC Data Portal provides advanced search and cohort building
features, gene and variant level data visualization and analysis, and a repository for data download.
Links:
Launch the GDC Data Portal |
User’s Guide
-
GDC Data Transfer Tool (DTT) – The GDC DTT is a command-line application for downloading and uploading
large volumes of data. It provides an optimized method for transferring data to and from the GDC and
can resume interrupted transfers. The GDC DTT Client provides a command-line interface supporting both
GDC data downloads and submissions. The GDC DTT User Interface (UI) provides a user-friendly interface to the GDC
DTT Client for downloading data from the GDC.
Links:
Download the GDC DTT Client and UI |
User’s Guide
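As a rough illustration of driving the DTT Client from a script, the sketch below assembles a manifest-based download command. It assumes the client executable is named `gdc-client` and accepts `download` with `-m` (manifest), `-t` (token file), and `-d` (output directory); none of these are stated on this page, so check them against the DTT User's Guide. The file names are placeholders.

```python
import shlex


def build_download_command(manifest_path, token_path=None, out_dir=None):
    """Assemble an argument list for a manifest-based DTT download.

    Assumption: the executable name and the -m / -t / -d options match
    the installed gdc-client version.
    """
    cmd = ["gdc-client", "download", "-m", manifest_path]
    if token_path:
        # A token file is only needed for controlled-access data.
        cmd += ["-t", token_path]
    if out_dir:
        cmd += ["-d", out_dir]
    return cmd


# Placeholder file names for illustration only.
cmd = build_download_command("gdc_manifest.txt", out_dir="downloads")
print(shlex.join(cmd))
```

Once the client is installed, the list can be handed to `subprocess.run(cmd, check=True)`; building it as a list rather than a string avoids shell-quoting problems with paths that contain spaces.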
-
GDC Application Programming Interface (API) – The GDC API is a programmatic interface for searching,
downloading, submitting, and analyzing GDC data and metadata. It is the external-facing Representational
State Transfer (REST) interface for the GDC; it uses JSON as its communication format and standard HTTP methods
(GET, PUT, POST, and DELETE).
Links:
User’s Guide
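To make the REST and JSON description concrete, here is a minimal sketch that builds a GET request URL for a file search. The base URL `https://api.gdc.cancer.gov`, the `/files` endpoint, the `filters`, `fields`, `format`, and `size` parameters, and the field names are assumptions not given on this page; confirm them in the API User's Guide before relying on them. The sketch only constructs the URL and does not contact the service.

```python
import json
from urllib.parse import urlencode

# Assumed public base URL of the GDC API (not stated on this page).
GDC_API_BASE = "https://api.gdc.cancer.gov"


def build_files_query(project_id, data_type, size=10):
    """Return a URL for a GET request against the assumed /files endpoint."""
    # Filters are expressed as nested JSON: an "and" of two "in" clauses.
    filters = {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id",
                         "value": [project_id]}},
            {"op": "in",
             "content": {"field": "data_type", "value": [data_type]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_type",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_API_BASE}/files?{urlencode(params)}"


# Example values are illustrative; any project and data type could be used.
url = build_files_query("TCGA-BRCA", "Gene Expression Quantification", size=5)
print(url)
```

The resulting URL could be fetched with `urllib.request.urlopen(url)` and the body decoded with `json.loads`. Keeping query construction separate from the network call makes the filter logic easy to inspect and test offline.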
-
GDC Data Dictionary and Data Model – The GDC Data Dictionary is a resource that describes the clinical,
biospecimen, administrative, and genomic metadata that can be used in parallel with the genomic data generated
by the GDC. The dictionary defines the structure of the GDC graph-based data model and the rules the data need
to follow. In addition, the dictionary includes information about the relationships between entities within the
data model.
Links:
View the GDC Data Dictionary |
GDC Data Model
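The idea of a dictionary that defines both per-entity rules and the relationships between entities can be sketched with a toy example. Everything below is hypothetical and heavily simplified: the entity types, required properties, and parent links are illustrative stand-ins, not the actual GDC Data Dictionary schema.

```python
# Hypothetical toy dictionary: NOT the real GDC schema. Each entity type
# lists its required properties and the type of entity it must link to.
TOY_DICTIONARY = {
    "case": {"required": ["submitter_id"], "parent": "project"},
    "sample": {"required": ["submitter_id", "sample_type"], "parent": "case"},
    "aliquot": {"required": ["submitter_id"], "parent": "sample"},
}


def validate_entity(entity, dictionary=TOY_DICTIONARY):
    """Return a list of rule violations for one entity (empty if valid)."""
    node_type = entity.get("type")
    schema = dictionary.get(node_type)
    if schema is None:
        return [f"unknown entity type: {node_type!r}"]

    errors = []
    # Rule 1: every required property must be present.
    for prop in schema["required"]:
        if prop not in entity:
            errors.append(f"missing required property: {prop}")
    # Rule 2: the entity must link to a parent of the expected type.
    parent = entity.get("parent") or {}
    if parent.get("type") != schema["parent"]:
        errors.append(f"{node_type} must link to a {schema['parent']}")
    return errors


good = validate_entity({
    "type": "sample",
    "submitter_id": "S1",
    "sample_type": "Primary Tumor",
    "parent": {"type": "case", "submitter_id": "C1"},
})
bad = validate_entity({
    "type": "sample",
    "submitter_id": "S2",
    "parent": {"type": "project"},
})
print(good)
print(bad)
```

Here the first entity passes both rules, while the second is missing a required property and links to the wrong parent type. The real dictionary is far richer, but the split between property rules and link rules is the point of the sketch.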
-
GDC Data Submission Portal – The GDC Data Submission Portal is a web-based tool for submitting clinical,
biospecimen, and molecular data associated with projects that are registered in dbGaP and accepted for submission
into the GDC. Submitted data are validated using built-in GDC review/QC tools.
Links:
Request Data Submission |
GDC Data Submission Processes and Tools |
Launch the GDC Data Submission Portal (Login Required) |
User’s Guide
-
GDC Bioinformatics Pipelines – GDC Bioinformatics Pipelines are standard workflows supporting DNA, RNA, and
miRNA alignments against a common reference genome (GRCh38) and the generation of higher-level data from these
and other data types.
Links:
GDC DNA-Seq Pipelines (DNA-Seq, RNA-Seq, miRNA-Seq) |
GitHub Repository (GDC Workflows) |
Reference Files Used by GDC Pipelines
-
GDC Publication Pages – GDC Publication Pages provide access to information and supplementary files from
publications associated with NCI-supported programs. Search facilities are provided to filter publications by
program, project, publication year, and keywords.
Links:
View GDC Publication Pages
-
GDC Web Site – The GDC Web Site provides information on the GDC, data hosted in the GDC, and processes
and tools supporting data access, submission, and analysis. The GDC Web Site also provides access to GDC support
information including webinar videos and news on GDC releases.
Links:
GDC Web Site
-
GDC Documentation Site – The GDC Documentation Site provides access to User’s Guides for GDC applications
and services and includes detailed information on GDC Bioinformatics Pipelines. The site also hosts the GDC Data
Dictionary.
Links:
GDC Documentation Site