The Proteomic Data Commons (PDC) is the NCI’s largest public repository of comprehensive proteogenomic tumor datasets, essentially a Proteogenomic Cancer Atlas. It was developed to advance our understanding of how proteins help shape the risk, diagnosis, development, progression, and treatment of cancer. The objectives are (1) to make cancer-related proteomic datasets easily accessible to the public, and (2) to facilitate multi-omic integration in support of precision medicine through interoperability with other resources. The PDC takes a cloud-based approach, storing all datasets on Amazon Web Services (AWS), which allows flexible analysis within the NCI Cloud Resources. The data in the PDC are structured and queryable using the PDC data model and data dictionary. Submitted data are processed and then harmonized to maintain the consistency, integrity, and availability of data and metadata for PDC users.
A core principle of the PDC is the sharing and re-use of data across the biomedical research community, which is vital to accelerating scientific discovery and its clinical translation to patient care. Proteomic data and related data files are organized into datasets by tumor type, study, and sub-proteome. Beyond hosting the raw mass spectrometry data files, the PDC performs computational processing to map spectra to peptide sequences and identify proteins. Proteomic data from the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) and the International Cancer Proteogenome Consortium (ICPC) can be found on the PDC.
For general questions or assistance, email PDCHelpDesk@mail.nih.gov.
In addition to providing data, the PDC also offers analysis tools:
All data are freely available to the public, subject to the Data Use Agreement. Through the PDC, researchers have access to highly curated and standardized biospecimen, clinical, and proteomic data, as well as an intuitive interface to filter, query, search, visualize, and download all data and metadata. Data from the Genomic Data Commons (GDC) and The Cancer Imaging Archive (TCIA) for the same sample group are accessible through the PDC when available. In addition to the PDC’s graphical user interface, there is also an Application Programming Interface (API) that can be used to query the data programmatically.
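As a rough illustration of programmatic access, the sketch below prepares a query for the PDC API using only the Python standard library. It assumes the API accepts GraphQL queries posted as JSON; the endpoint URL, the `allPrograms` query, its `acceptDUA` argument, and the field names are assumptions made for this example and should be checked against the PDC API documentation. The helper names `build_request` and `run_query` are hypothetical.

```python
import json
import urllib.request

# Assumption: endpoint URL used for illustration; confirm it in the PDC API docs.
PDC_GRAPHQL_URL = "https://pdc.cancer.gov/graphql"

# Assumption: query name, argument, and fields are illustrative only.
EXAMPLE_QUERY = """
{
  allPrograms(acceptDUA: true) {
    program_id
    name
  }
}
"""


def build_request(query, url=PDC_GRAPHQL_URL):
    """Build (but do not send) a POST request carrying a GraphQL query as JSON."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, url=PDC_GRAPHQL_URL, timeout=30):
    """Send the query over the network and return the decoded JSON response."""
    request = build_request(query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_query(EXAMPLE_QUERY)` would send the request and return the parsed response. Building the request separately from sending it makes it possible to inspect the payload before any network traffic occurs.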