Web Page
Bioinformatics
Single cell RNA sequencing (scRNA-Seq) is becoming increasingly common in biomedical research, but what is scRNA-Seq? How does it differ from other transcriptomic approaches (e.g., bulk RNA-Seq), and what are the potential applications, Read More...
Web Page
Bioinformatics
Introduction to single cell RNA-Seq Single cell RNA sequencing (scRNA-Seq) is becoming increasingly common in biomedical research, but what is scRNA-Seq? How does it differ from other transcriptomic approaches (e.g., bulk RNA-Seq), and Read More...
Web Page
Bioinformatics
04/24/2024 - This seminar provides an introduction to R in the context of single cell RNA-Seq analysis with Seurat. In this seminar, attendees will learn about options for analyzing scRNA-Seq data, resources for learning R, how Read More...
Web Page
Bioinformatics
The CCBR Single-cell RNA-seq Workflow on NIDAP Josh Meyer, bioinformatics analyst (CCBR), will cover a scRNA-seq workflow available to NCI researchers on NIDAP. NIDAP, the NIH Integrated Data Analysis Platform, is a cloud-based and collaborative Read More...
Web Page
Bioinformatics
Josh Meyer, bioinformatics analyst (CCBR), will cover a scRNA-seq workflow available to NCI researchers on NIDAP. NIDAP, the NIH Integrated Data Analysis Platform, is a cloud-based and collaborative data aggregation and analysis platform that hosts Read More...
Web Page
Bioinformatics
Welcome to Getting Started with scRNA-Seq This is a mini seminar series designed to help attendees learn more about single cell RNA-Seq, from applicable technologies to data analysis. Seminar Schedule April 3, 2024 - The CCR Single Read More...
Web Page
Bioinformatics
Learn about options for analyzing your scRNA-Seq data. Learn about resources for learning R programming. Learn how to import your data for working with R. Learn about Seurat and the Seurat object including how to Read More...
Web Page
Bioinformatics
The role of cell clustering is to identify cells with similar transcriptomic profiles by computing Euclidean distances across genes. However, scRNA-Seq experiments produce high-dimensional data, with each cell associated with the expression of thousands Read More...
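As a toy illustration of Euclidean distances between cells, the base R sketch below uses a small genes-by-cells matrix of made-up values. It groups the cells with dist() and hierarchical clustering (hclust()), which is only a simple stand-in chosen for illustration; Seurat itself uses a graph-based method on reduced dimensions rather than this approach.

```r
# Toy genes-by-cells expression matrix (made-up values for illustration)
expr <- matrix(
  c(1.0, 1.1, 0.0, 0.0,   # Gene1
    0.0, 0.0, 5.0, 5.2,   # Gene2
    0.0, 0.0, 5.0, 5.0),  # Gene3
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:3), paste0("cell", 1:4))
)

# dist() computes distances between rows, so transpose to cells-by-genes first
cell_dist <- dist(t(expr), method = "euclidean")

# Split the cells into two groups; hclust() is only a simple stand-in here
tree <- hclust(cell_dist)
groups <- cutree(tree, k = 2)
print(groups)
```

Cells 1 and 2 have nearly identical profiles, as do cells 3 and 4, so each pair ends up in the same group.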
Web Page
Bioinformatics
Following normalization, the next step is to find variable features. In most scRNA-seq experiments only a small proportion of the genes will be informative and biologically variable. A subset of cells with high cell to Read More...
Web Page
Bioinformatics
Use the read functions to import data (e.g., read.csv, read.delim, etc.). Use the write functions to export data (e.g., write.table). There are also specific functions for specialized data types. For example, we will Read More...
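A minimal base R sketch of an export/import round trip, assuming a small made-up counts table and a temporary file so that nothing is written to your working directory:

```r
# A small counts table to export (hypothetical gene names and values)
counts <- data.frame(
  gene  = c("Actb", "Gapdh", "Cd36"),
  cell1 = c(10, 3, 0),
  cell2 = c(7, 0, 2)
)

# Export with a write function: tab-separated, no quotes, no row names
out_file <- tempfile(fileext = ".tsv")
write.table(counts, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Import it again with a read function; read.delim() expects tab-separated text
counts_in <- read.delim(out_file)
str(counts_in)
```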
Web Page
Bioinformatics
BTEP scRNA-Seq FAQs Training modules available on GitHub Orchestrating Single Cell Analysis with Bioconductor Single Cell Best Practices 2023 BTEP Single Cell Annotation Seminar Series Event recordings are located in the BTEP Video Archive. Analysis Guides Read More...
Web Page
Bioinformatics
Now that we have loaded the package, we can import our data. Generally, in R programming, functions that involve data import begin with "read / Read". Seurat includes a number of read functions for Read More...
Web Page
Bioinformatics
This lesson provides an introduction to R in the context of single cell RNA-Seq analysis with Seurat. Learning Objectives Learn about options for analyzing your scRNA-Seq data. Learn about resources for learning R programming. Learn Read More...
Web Page
Bioinformatics
April 3, 2024 - The CCR Single Cell Analysis Facility (SCAF): An Overview (Mike Kelly, SCAF) ( Recording ) April 10, 2024 - Introduction to single cell RNA-Seq (Charlie Seibert, Saeed Yadranji Aghdam, SCAF) ( Recording ) April 17, 2024 - SCAF: Overview of Cell Read More...
Web Page
Bioinformatics
This tutorial was designed to demonstrate common secondary analysis steps in a scRNA-Seq workflow. We will start with a merged Seurat Object with multiple data layers representing multiple samples. Throughout this tutorial we will Apply Read More...
Web Page
Bioinformatics
12/10/2020 - Single cell sequencing has reopened the definition of a cell’s identity and the ways in which that identity is regulated by the cell’s molecular circuitry. Learn the types of studies that Read More...
Web Page
Bioinformatics
This is a mini seminar series designed to help attendees learn more about single cell RNA-Seq, from applicable technologies to data analysis.
Web Page
Bioinformatics
This tutorial has been designed to demonstrate common secondary analysis steps in a scRNA-Seq workflow. We will start with a merged Seurat Object with multiple data layers representing multiple samples that have already been filtered Read More...
Web Page
Bioinformatics
In this tutorial, we will continue to use data from Nanduri et al. 2022, Epigenetic regulation of white adipose tissue plasticity and energy metabolism by nucleosome binding HMGN proteins , published in Nature Communications . As a reminder, Read More...
Web Page
Bioinformatics
Additional Resources BTEP scRNA-Seq FAQs Training modules available on GitHub Orchestrating Single Cell Analysis with Bioconductor Single Cell Best Practices 2023 BTEP Single Cell Annotation Seminar Series Event recordings are located in the BTEP Video Archive. Read More...
Web Page
Bioinformatics
Here, we will start with the data stored in a Seurat object. For instructions on data import and creating the object, see an Introduction to scRNA-Seq with R (Seurat). adp
Web Page
Bioinformatics
Following Cell Ranger and/or other pre-processing tools, you will have a gene-by-cell counts table for each sample. The three most popular frameworks for analyzing these count matrices include: R (Seurat). Seurat, brought to you Read More...
Web Page
Bioinformatics
The following sources inspired this content: https://www.sc-best-practices.org https://hbctraining.github.io/scRNA-seq_online/ https://bioconductor.org/books/3.15/OSCA.basic/
Web Page
Bioinformatics
The following resources were instrumental in designing this lesson: https://www.sc-best-practices.org/introduction/analysis_tools.html#single-cell-analysis-frameworks-and-consortia https://satijalab.org/seurat/articles/essential_commands https://github.com/hbctraining/scRNA-seq_online/blob/master/lessons/03_SC_ Read More...
Web Page
Bioinformatics
The following sources inspired this content: https://www.sc-best-practices.org https://hbctraining.github.io/scRNA-seq_online/ https://bioconductor.org/books/3.15/OSCA.basic/ This is only a small subset of the tools available for single cell RNA-Seq analysis. Read More...
Web Page
Bioinformatics
This tutorial assumes that all pre-processing steps (read demultiplexing, FASTQ QC, reference-based alignment, error correction) have been completed. At this stage of the analysis, a gene-by-cell count matrix has been generated for each sample.
Web Page
Bioinformatics
The Seurat Object is a data container for single cell RNA-Seq and related data. It is an S4 object, which is a type of data structure that stores complex information (e.g., scRNA-Seq count matrix, Read More...
Web Page
Bioinformatics
Here, we will start with the data stored in a Seurat object. For instructions on data import and creating the object, see an Introduction to scRNA-Seq with R (Seurat) and Getting Started with Seurat: QC Read More...
Web Page
Bioinformatics
Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data. Seurat aims to enable users to identify and interpret sources of heterogeneity from single-cell transcriptomic measurements, and to integrate diverse Read More...
Web Page
Bioinformatics
Data frames hold tabular data composed of rows and columns; they can be created using data.frame(). To understand more about the structure of an object and data frame, consider the following functions: str() displays Read More...
Web Page
Bioinformatics
Learning Objectives This tutorial was designed to demonstrate common secondary analysis steps in a scRNA-Seq workflow. We will start with a merged Seurat Object with multiple data layers representing multiple samples. Throughout this tutorial we Read More...
Web Page
Bioinformatics
1. Introduction and Learning Objectives This tutorial has been designed to demonstrate common secondary analysis steps in a scRNA-Seq workflow. We will start with a merged Seurat Object with multiple data layers representing multiple samples that Read More...
Web Page
Bioinformatics
help() and ? "provide access to the documentation pages for R functions, data sets, and other objects". help.search() "allows for searching the help system for documentation matching a given character string in Read More...
Web Page
Bioinformatics
R is both a computational language and environment for statistical computing and graphics. It is open-source and widely used by scientists, not just bioinformaticians. Base packages of R are built into your initial installation, but Read More...
Web Page
Bioinformatics
Please email us at ncibtep@nih.gov with questions, comments, or concerns.
Web Page
Bioinformatics
All seminars will be recorded and made available on the BTEP Video Archive 24 to 48 hours following the event.
Web Page
Bioinformatics
RStudio is an integrated development environment for R and, now, Python. RStudio includes a console, editor, and tools for plotting, history, debugging, and workspace management. It provides a graphical user interface for working with Read More...
Web Page
Bioinformatics
You can annotate your code by starting annotations with #. Comments to the right of # will be ignored by R. Use # ---- to create navigable code sections. For report generation, use R Markdown or Quarto.
Web Page
Bioinformatics
R functions perform specific tasks. R has a ton of built-in functions and functions available through additional packages. You can also create your own functions. The general syntax for a function is the name followed Read More...
Web Page
Bioinformatics
A vector is a collection of values that are all of the same type (numbers, characters, etc.) --- datacarpentry.org c() - used to combine elements of a vector When you combine elements of different Read More...
Web Page
Bioinformatics
Unlike vectors, lists can hold values of different types. list(1, "apple", 3)
Web Page
Bioinformatics
There are 3 primary plotting systems in R: base R, ggplot2, and lattice. Data visualization functions from Seurat primarily use ggplot2 and can easily be customized by adding additional ggplot2 layers. Check out the R Graph Read More...
Web Page
Bioinformatics
Seurat can be installed directly from CRAN. install.packages("Seurat") If you would like to install the development version or previous versions, see the installation instructions available here. Load the package from your Read More...
Web Page
Bioinformatics
Who can you contact with questions? NCI CCR Single Cell Analysis Facility (SCAF) provides single cell support to NCI-CCR Researchers. NCI CCR Bioinformatics Training and Education Program (BTEP) provides bioinformatics training support. Contact BTEP via Read More...
Web Page
Bioinformatics
ALWAYS, ALWAYS, ALWAYS read the documentation. Use the help pane in the lower right of RStudio or the functions help() and help.search() or ? and ??. Check out package vignettes (vignette()). Check the GitHub site if Read More...
Web Page
Bioinformatics
Rather than relying on the above steps (NormalizeData(), FindVariableFeatures(), and ScaleData()), we are going to proceed with a newer method (SCTransform()) instead. This method uses Pearson residuals for transformation, which better accounts for the overall Read More...
Web Page
Bioinformatics
In this tutorial, we are using data from Nanduri et al. 2022, Epigenetic regulation of white adipose tissue plasticity and energy metabolism by nucleosome binding HMGN proteins , published in Nature Communications . The raw count matrices are Read More...
Web Page
Bioinformatics
Single cell RNASeq is a remarkably powerful tool for analyzing populations of cells that can be recovered from various experiments. Clustering and cell type annotation can be used to distinguish different populations with a level Read More...
Web Page
Bioinformatics
Look at the distribution of nCount_RNA with a Violin plot: # set colors cnames% ggplot(aes(color=orig.ident, x=nCount_RNA, fill= orig.ident)) + geom_density(alpha = 0.2) + theme_classic() + scale_x_log10() + geom_vline( Read More...
Web Page
Bioinformatics
The total number of detected transcripts expressed in a cell is dependent on the amount of mRNA in a cell. Cells naturally vary in the total amount of mRNA expressed. However, the chemistry of the Read More...
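To illustrate why per-cell normalization is needed, here is a base R sketch of log-normalization in the style of Seurat's default LogNormalize method: divide each cell's counts by its total counts, multiply by a scale factor of 10,000, then take the natural log of 1 + x. The counts are made up; in practice you would call NormalizeData() on the Seurat object.

```r
# Toy genes-by-cells UMI counts; cellB was sequenced twice as deeply as cellA
counts <- matrix(
  c(10,  20,
    30,  60,
    60, 120),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("cellA", "cellB"))
)

scale_factor <- 10000
library_size <- colSums(counts)   # total counts per cell

# Divide each column (cell) by its library size, rescale, then log1p
log_norm <- log1p(sweep(counts, 2, library_size, "/") * scale_factor)

# After normalization the two cells have identical expression profiles
log_norm
```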
Web Page
Bioinformatics
The next standard step is to scale and center the features in the data set prior to dimension reduction or visualization via heatmap. Scaling the data will keep highly expressed genes from dominating our analysis. This Read More...
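Centering and scaling each gene can be sketched in base R with scale(), which works column-wise, so the genes-by-cells matrix is transposed first. This is only a base R analogue of what ScaleData() does, on made-up values; after scaling, a highly expressed gene and a lowly expressed gene contribute on the same scale.

```r
# Toy genes-by-cells matrix (made-up values)
expr <- matrix(
  c(100, 120, 80, 110,   # highly expressed gene
      1,   3,  2,   2),  # lowly expressed gene
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GeneHigh", "GeneLow"), paste0("cell", 1:4))
)

# scale() centers and scales columns, so transpose to cells-by-genes and back
scaled <- t(scale(t(expr)))

round(rowMeans(scaled), 10)   # each gene now has mean 0
apply(scaled, 1, sd)          # and standard deviation 1
```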
Web Page
Bioinformatics
Transformed data will be available in the SCT assay, which is set as the default assay after running SCTransform(). During normalization, we can also remove confounding sources of variation, for example, mitochondrial mapping percentage. The glmGamPoi Read More...
Web Page
Bioinformatics
Clustering is used to group cells by similar transcriptomic profiles. Seurat uses a graph-based clustering method. You can read more about it here. The first step is to compute the nearest neighbors of each Read More...
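The nearest-neighbor step can be sketched in base R, assuming a toy matrix of cell coordinates in PCA space (made-up values). Seurat's FindNeighbors() does this far more efficiently and goes on to build a shared-nearest-neighbor graph that FindClusters() partitions; the code below only shows the underlying idea of finding each cell's k closest cells by Euclidean distance.

```r
# Toy cell embeddings: two well-separated groups of three cells (made-up values)
emb <- matrix(
  c(0.0, 0.0,
    0.1, 0.0,
    0.0, 0.1,
    5.0, 5.0,
    5.1, 5.0,
    5.0, 5.1),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("cell", 1:6), c("PC_1", "PC_2"))
)

k <- 2
d <- as.matrix(dist(emb))   # pairwise Euclidean distances between cells

# For each cell, sort by distance and drop the first hit (the cell itself)
knn <- t(apply(d, 1, function(dists) order(dists)[2:(k + 1)]))
knn
```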
Web Page
Bioinformatics
The answer to this question will largely depend on the user. While some will be able to learn R on the go, others will need to know some amount of R programming prior to beginning Read More...
Web Page
Bioinformatics
setwd() Set working directory (equivalent to cd ) getwd() Get working directory (equivalent to pwd )
Web Page
Bioinformatics
Anything that you want assigned to memory must be assigned to an R object.
Web Page
Bioinformatics
See the attached resources on for loops, apply functions, purrr::map, and conditionals.
Web Page
Bioinformatics
sessionInfo() Print version information about R, the OS and attached or loaded packages. This is useful for reporting methods for publication. Consider using the package renv to track and share exact versions of packages used Read More...
Web Page
Bioinformatics
Base R cheat sheet Other cheat sheets can be found here . There is also a nice review here . There are a ton of free tutorials. There are at least 230 Git repositories that focus on R Read More...
Web Page
Bioinformatics
The general experimental design was as follows: WT vs DKO mice at two time points, 0 and 6 days. At day 0, cells were in a preadipocyte state, while at day 6 they had differentiated into adipocytes. Each time Read More...
Web Page
Bioinformatics
We can add information to our metadata by accessing and assigning to metadata columns or by using AddMetaData(). Add condition to metadata (either wild type or double knockout). adp$condition
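Deriving a condition column can be sketched on a plain data frame standing in for adp@meta.data. The sample IDs below (WT_d0, DKO_d0, DKO_d6) are hypothetical, so the pattern would need to be adapted to your own orig.ident values.

```r
# Stand-in for per-cell metadata; sample IDs are hypothetical
meta <- data.frame(
  orig.ident = c("WT_d0", "WT_d0", "DKO_d0", "DKO_d6"),
  row.names  = paste0("cell", 1:4)
)

# Derive the condition from the sample ID prefix
meta$condition <- ifelse(grepl("^WT", meta$orig.ident), "WT", "DKO")
meta
```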
Web Page
Bioinformatics
At this point, our Seurat object is fairly large (~2.1 GB). It is wise to save this for downstream applications using saveRDS() . saveRDS(adp,"../outputs/merged_Seurat_adp.rds")
Web Page
Bioinformatics
library(tidyverse) # dplyr and ggplot2 library(Seurat) # Seurat toolkit library(hdf5r) # for data import library(patchwork) # for plotting library(presto) # for differential expression library(glmGamPoi) # for sctransform Warning: package 'glmGamPoi' was built under R Read More...
Web Page
Bioinformatics
metadata
Web Page
Bioinformatics
The quality of the cells should be assessed considering the above metrics jointly and not simply in isolation. QC can be applied manually by subjectively choosing and applying thresholds, which can be quite arbitrary. Alternatively, Read More...
Web Page
Bioinformatics
Now, let's take a look at the number of detected features. VlnPlot(adp, features = "nFeature_RNA", group.by="orig.ident") + scale_fill_manual(values=cnames) Warning: Default search for " Read More...
Web Page
Bioinformatics
Due to limits on computational resources, you may be interested in running your analysis on an HPC. Biowulf is the NIH high-performance computing cluster. It has more than 90,000 processors, and can easily perform Read More...
Web Page
Bioinformatics
To work interactively with RStudio on Biowulf, you will need to request an interactive session using sinteractive . When using R, you will need to include a few more options while obtaining your interactive session. For Read More...
Web Page
Bioinformatics
Integration is the process of aligning the same cell types across samples, treatments, data sets, batches, etc. Clustering should represent biological differences and not technical artifacts. Integration is not always necessary. You should run through Read More...
Web Page
Bioinformatics
The metadata in the Seurat object is located in adp@meta.data and contains the information associated with each cell. We can access the metadata using: head(adp@meta.data) #using head to return only the Read More...
Web Page
Bioinformatics
library(tidyverse) # dplyr and ggplot2; CRAN library(Seurat) # Seurat toolkit; CRAN library(hdf5r) # for data import; CRAN library(patchwork) # for plotting; CRAN library(presto) # for differential expression; GitHub library(glmGamPoi) # for sctransform; Bioconductor library( Read More...
Web Page
Bioinformatics
As we can see above, the glimpse command shows the metadata that can be used to classify the cells. Within Seurat, the metadata is used to define the "identity" of the dataset. This Read More...
Web Page
Bioinformatics
The FindAllMarkers function is particularly useful in identifying the differentially expressed genes that distinguish several groups, such as seen here in the clusters. What makes this unique is that none of the identities are initially Read More...
Web Page
Bioinformatics
Many experiments aim to compare two distinct populations, and scRNA-Seq is no exception. The two populations being compared can vary widely from experiment to experiment; some look to draw comparisons based on the experimental condition, Read More...
Web Page
Bioinformatics
The goal of quality control is to keep only high quality cells (i.e., remove low quality cells (dead or dying cells), cell-free RNA, or doublets). Low quality cells will impact downstream analyses. Take care Read More...
Web Page
Bioinformatics
There are several metrics that can be used to assess overall quality. The base workflow from Seurat suggests the following: nCount_RNA - the absolute number of RNA molecules (UMIs) per cell (i.e., count Read More...
Web Page
Bioinformatics
VlnPlot(adp, features = "percent.mt", group.by="orig.ident") + scale_fill_manual(values=cnames) + geom_hline(yintercept=10,color="red") Warning: Default search for "data" layer in " Read More...
Web Page
Bioinformatics
To look at how these metrics correlate, we can use FeatureScatter() , which can be used to visualize feature-feature relationships and also be applied to other data stored in our Seurat object (e.g., metadata columns, Read More...
Web Page
Bioinformatics
Now, we can either use our filtering parameters directly with subset() or provide a cells argument. # use different parameters; established above adp_filt 350 & nCount_RNA >650 & percent.mt
Web Page
Bioinformatics
There are many tools and strategies available for cell annotation. I've included a non-comprehensive list for your convenience: Azimuth, SingleR, scType, Garnet, Digital Cell Sorter, and Cell Marker 2.0. Here is a list of resources that Read More...
Web Page
Bioinformatics
Seurat v5 introduced the following new features: integrative multimodal analysis with bridge integration; ‘sketch’-based analysis of large data sets; methods for spatial transcriptomics; and assay layers. You can read about major changes between Seurat v5 Read More...
Web Page
Bioinformatics
The base data type (e.g., numeric, character, logical, etc.) and the class (data frame, matrix, etc.) determine what you can do with an object. Learn more about an object with the following: Read More...
Web Page
Bioinformatics
R Introductory Series: Data Wrangling with R, Data Visualization with R, Toward Reproducibility with R on Biowulf, A Beginner's Guide to Troubleshooting R Code
Web Page
Bioinformatics
If you plan to use RStudio to interactively analyze your data, RStudio Server, a browser-based interface very similar to the standard RStudio desktop environment, is the best option to avoid issues with lag. Each RStudio Read More...
Web Page
Bioinformatics
If you need help with Biowulf, the NIH HPC systems are well-documented at hpc.nih.gov. The User guides, Training documentation, and How To docs are also fantastic resources for getting help with most HPC Read More...
Web Page
Bioinformatics
There are several built-in functions for visualizing data with Seurat. We can use violin plots and scatter plots to check out the individual distributions and correlations between metrics.
Web Page
Bioinformatics
To take full advantage of R, you need to install R packages. R packages are loadable extensions that contain code, data, documentation, and tests in a standardized shareable format that can easily be installed by Read More...
Web Page
Bioinformatics
Regarding computational requirements, a rule of thumb is to keep the analysis to 8 samples or fewer when running on a laptop. You may be able to stretch it to 10 samples, but any more than that strains the Read More...
Web Page
Bioinformatics
To load a single file, we use W10
Web Page
Bioinformatics
Once we have read in the matrices, the next step is to create a Seurat object. The Seurat object will be used to store the raw count matrices, sample information, and processed data (normalized counts, Read More...
Web Page
Bioinformatics
The Seurat v5 object doesn’t require all assays have the same cells. In this case, Cells() can be used to return the cell names of the default assay while colnames() can be used to Read More...
Web Page
Bioinformatics
Differential expression analysis is the process of identifying genes that have a significant difference in expression between two or more groups. For many sequencing experiments, regardless of methodology, differential analysis lays the foundation of the Read More...
Web Page
Bioinformatics
In Seurat (since version 4), differential analysis requires a preprocessing step to appropriately scale the normalized SCTransform assay across samples: adp = PrepSCTFindMarkers(adp) Found 8 SCT models. Recorrecting SCT counts using minimum median counts: 8146 As covered earlier, Read More...
Web Page
Bioinformatics
Some tools have been described in the previous session (see here). Today, we will be focusing on the SingleR tool, which also requires the celldex package. In short, SingleR operates by comparing your current dataset Read More...
Web Page
Bioinformatics
To calculate percent.mt , we use PercentageFeatureSet() , which calculates the percentage of all the counts belonging to a subset of the possible features (e.g., mitochondrial genes, ribosomal genes) for each cell. adp[["percent. Read More...
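The calculation behind percent.mt can be sketched in base R: for each cell, sum the counts of genes matching a pattern and divide by the cell's total counts. The toy matrix and the mouse-style "^mt-" prefix are assumptions for illustration (human mitochondrial genes typically start with "MT-").

```r
# Toy genes-by-cells counts (made-up values); each cell has 100 total counts
counts <- matrix(
  c( 5,  0, 10,   # mt-Co1 (mitochondrial)
    45, 20, 10,   # Actb
    50, 80, 80),  # Gapdh
  nrow = 3, byrow = TRUE,
  dimnames = list(c("mt-Co1", "Actb", "Gapdh"), c("cellA", "cellB", "cellC"))
)

# Flag mitochondrial genes by name, then compute their share of each cell's counts
is_mt <- grepl("^mt-", rownames(counts))
percent_mt <- colSums(counts[is_mt, , drop = FALSE]) / colSums(counts) * 100
percent_mt   # cellA 5, cellB 0, cellC 10
```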
Web Page
Bioinformatics
There are a number of ways to explore the PCA results. Two of the more useful visualizations include the DimHeatmap() and ElbowPlot() . DimHeatmap() allows us to visualize the top genes contributing to each PC. Both Read More...
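What an elbow plot summarizes can be sketched with base R's prcomp(): the standard deviation of each principal component, which drops off sharply after the components that capture real structure. The simulated matrix (two hidden signals plus noise) is an assumption for illustration; ElbowPlot() plots the corresponding standard deviations stored in the Seurat object's PCA reduction.

```r
set.seed(42)
n_cells <- 200
n_genes <- 50

# Simulate cells-by-genes data driven by two hidden signals plus random noise
signals  <- matrix(rnorm(n_cells * 2), ncol = 2)
loadings <- matrix(rnorm(2 * n_genes), nrow = 2)
x <- signals %*% loadings * 3 + matrix(rnorm(n_cells * n_genes), ncol = n_genes)

pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Standard deviation per PC; plotting these against PC index gives an elbow plot
head(pca$sdev, 5)
```

Here the first two PCs stand well above the rest, so the "elbow" falls after PC 2.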
Bethesda, MD
Repositories
Trans NIH Facility
The NCI Genomic Data Commons (GDC) was established by the NCI Center for Cancer Genomics (CCG) to support the receipt, harmonization, distribution, and analysis of genomic and clinical data from cancer research programs. The GDC Read More...
Bethesda, MD
Collaborative
Our operational objectives are to provide state-of-the-art OMICS technologies in support of the Genetics Branch (GB) investigators and collaborators. Research Services Wet Lab Single cell isolation from fresh, frozen, and FFPE tissue, DNA/RNA extractions Read More...
Bethesda, MD
Collaborative
The goals of the Bioinformatics Training and Education Program within NCI/CCR are: To make researchers aware of the bioinformatics resources available to them, To provide training and guidance on these resources regularly and at Read More...
Bethesda, MD
Trans NIH Facility
The NIH Center for Human Immunology, Inflammation, and Autoimmunity (CHI) is a trans-NIH resource whose mission is to provide a collaborative hub of advanced translational immunology for NIH clinical and pre-clinical studies. This uniquely structured Read More...
Frederick, MD
Core Facility
The introduction of DNA sequencing instruments capable of producing millions of DNA sequence reads in a single run has profoundly altered the landscape of genetics and cancer biology. Complex questions can now be answered at Read More...
Bethesda, MD
Collaborative
The CCR Collaborative Bioinformatics Resource (CCBR) is a centrally funded resource group which provides a mechanism for CCR researchers to obtain many different types of bioinformatics assistance to further their research goals. The group has Read More...
Rockville, MD
Trans NIH Facility
NISC’s role within NHGRI, and more broadly across NIH, aims to advance genome sequencing and its many applications, with a goal not simply to produce sequence data, but to produce the infrastructure required to Read More...
Web Page
CREx News & Updates January 2022 Learn about the NIH Collaborative Research Exchange (CREx), Core Facilities, Webinars, & More Site Spotlight NCI ANTIBODY CHARACTERIZATION LABORATORY (ACL) The ACL specializes in rigorously validating antibodies for signaling and Read More...
Bethesda, MD
Collaborative
The Pan-Microbial Serology Facility (PMSF) is part of the Center for Cancer Research (CCR) at the National Cancer Institute (NCI). The PMSF focuses on determining individualized pan-microbial immune profiles associated with human diseases including immunological Read More...
Bethesda, MD
Core Facility
The CCR Genomics Core is located in Building 41 on the NIH Bethesda campus. The primary goal of the Core is to provide investigators from CCR/NCI and other NIH Institutes access to genomic technologies and Read More...