Web Page
Bioinformatics
%in% "returns a logical vector indicating if there is a match or not for its left operand". This logical vector can then be used to filter the data frame to only matched values. For Read More...
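A minimal sketch of the filtering idea in this snippet, using base R. The data frame `df`, its columns, and the `keep` vector are hypothetical and made up for illustration only:

```r
# Hypothetical example data; not taken from the lesson's dataset.
df <- data.frame(
  gene  = c("TP53", "BRCA1", "MYC", "EGFR"),
  count = c(10, 25, 7, 42),
  stringsAsFactors = FALSE
)
keep <- c("MYC", "TP53", "KRAS")

# %in% returns one TRUE/FALSE per element of its left operand (df$gene).
matched <- df$gene %in% keep

# Use the logical vector to keep only the rows whose gene is in `keep`.
filtered <- df[matched, ]
```

Values in `keep` that never occur in `df$gene` (here "KRAS") are simply ignored, because the result always has the length of the left operand.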
Web Page
Bioinformatics
To align raw sequencing reads to a reference transcriptome, we will need a reference transcriptome (i.e., the sequences of known transcripts in FASTA format). Fortunately, a reference transcriptome is included with the Golden Snidget dataset, Read More...
Web Page
Bioinformatics
The filtlowabund_scaledcounts_airways.txt includes normalized and non-normalized transcript count data from an RNAseq experiment. You can read more about the experiment here. We are going to use the filtlowabund_scaledcounts_airways.txt file Read More...
Web Page
Bioinformatics
Lesson 2 Exercise Questions: Part 1 (BaseR subsetting and Factors) The filtlowabund_scaledcounts_airways.txt includes normalized and non-normalized transcript count data from an RNAseq experiment. You can read more about the experiment here. We are going Read More...
Web Page
Bioinformatics
The following represents the basic ggplot2 template. ggplot(data = <DATA>) + <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)) The only required components to begin plotting are the data we want to plot, geom function(s), and mapping aesthetics. Notice the + symbol following Read More...
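One way to fill in the template, as a hedged sketch: the `toy` data frame and its columns `x` and `y` are hypothetical, and `geom_point()` is just one possible geom function.

```r
library(ggplot2)

# Hypothetical toy data, made up for illustration.
toy <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 3, 6))

# The three required pieces: data, a geom function, and aesthetic mappings.
# The + at the end of the first line tells R that another layer follows.
p <- ggplot(data = toy) +
  geom_point(mapping = aes(x = x, y = y))
```

Printing `p` (or calling `print(p)`) draws the scatter plot; storing it in an object first makes it easy to add further layers later.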
Web Page
Bioinformatics
In lesson 3, we learned how to read and save Excel spreadsheet data to an R object using the tidyverse package readxl. Today we will use some example data from an Excel spreadsheet to learn the Read More...
Web Page
Bioinformatics
How do we ultimately get our figures to a publishable state? The bread and butter of pretty plots really falls to the additional non-data layers of our ggplot2 code. These layers will include code to Read More...
Web Page
Bioinformatics
Aligners typically align against the entire genome and provide an output where the results can be visually inspected (BAM file via IGV). They must be used for detecting novel genes/transcripts. Quantitation of aligned reads Read More...
Web Page
Bioinformatics
Now that we have downloaded the Golden Snidget reference files let's take a moment to get to know the references. First, change into the refs folder. How do we do this from the ~/biostar_ Read More...
Web Page
Bioinformatics
The following represents the basic ggplot2 template: ggplot(data = <DATA>) + <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)) We need three basic components to create a plot: the data we want to plot, geom function(s), and mapping aesthetics. Notice the + symbol Read More...
Web Page
Bioinformatics
Let's create a column in our original differential expression data frame denoting significant transcripts (those with an FDR corrected p-value less than 0.05 and a log fold change greater than or equal to 2). dexp_sigtrnsc Read More...
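A minimal sketch of adding such a column with dplyr's mutate(). The data frame `dexp` and the column names `FDR` and `logFC` are assumptions chosen for illustration; the lesson's actual object and column names are not shown in this snippet.

```r
library(dplyr)

# Hypothetical differential expression results; values are made up.
dexp <- data.frame(
  transcript = c("t1", "t2", "t3", "t4"),
  FDR   = c(0.01, 0.20, 0.04, 0.03),
  logFC = c(2.5, 3.0, 1.0, 2.0),
  stringsAsFactors = FALSE
)

# Flag transcripts with FDR < 0.05 and log fold change >= 2,
# following the thresholds stated in the text.
# (Assumption: to also capture down-regulated transcripts,
# one could test abs(logFC) >= 2 instead.)
dexp_sig <- dexp |>
  mutate(significant = FDR < 0.05 & logFC >= 2)
```

The new logical column can then be used for filtering or for coloring points in a plot.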
Web Page
Bioinformatics
Let's create a column in our original differential expression data frame denoting significant transcripts (those with an FDR corrected p-value less than 0.05 and a log fold change greater than or equal to 2). dexp_ Read More...
Web Page
Bioinformatics
Using what you have learned about select() and filter(), use the pipe (|>) to create a subset data frame from scaled_counts that only includes the columns 'sample', 'cell', 'dex', 'transcript', and 'counts_scaled' and Read More...
Web Page
Bioinformatics
The geom functions require a mapping argument. The mapping argument includes the aes() function, which "describes how variables in the data are mapped to visual properties (aesthetics) of geoms" (ggplot2 R Documentation). If Read More...
Web Page
Bioinformatics
As we have discussed, R objects are used to store things created in R to memory. This includes plots. scatter_plot
Web Page
Bioinformatics
First, notice that you can easily access columns from the sample metadata (colData()) using $. Using brackets to subset: se$SampleName ## [1] GSM1275862 GSM1275863 GSM1275866 GSM1275867 GSM1275870 GSM1275871 GSM1275874 ## [8] GSM1275875 ## 8 Levels: GSM1275862 GSM1275863 GSM1275866 GSM1275867 ... GSM1275875 se$ Read More...
Web Page
Bioinformatics
The SummarizedExperiment class and the inherited class RangedSummarizedExperiment are available in the R package SummarizedExperiment. SummarizedExperiment is a matrix-like container where rows represent features of interest (e.g. genes, transcripts, exons, etc.) and columns represent Read More...
Web Page
Bioinformatics
Excel files are the primary means by which many people save spreadsheet data. .xls or .xlsx files store workbooks composed of one or more spreadsheets. Importing Excel files requires the R package readxl. While this Read More...
Web Page
Bioinformatics
Why do we need a reference genome? Solution: The reference genome serves as a "known" that guides us in constructing the genome of the unknown from sequencing data. What file Read More...
Web Page
Bioinformatics
Technical Replicates: It’s generally accepted that they are not necessary because of the low technical variation in RNASeq experiments. Biological Replicates (always useful): Not strictly needed for the identification of novel transcripts and transcriptome Read More...
Web Page
Bioinformatics
The first step in using IGV is to load our reference genome. Take some time to see if you recall how to do this. After loading the genome, let's view Read More...
Web Page
Bioinformatics
One of the things we will be doing quite often is visualizing genomics data using some sort of genome browser. In this course series, we will use a popular one called Integrative Genomics Viewer ( Read More...
Web Page
Bioinformatics
While we can always download reference genomes and reference transcriptomes from repositories such as NCBI or Ensembl, we will use gffread to create one from the chromosome 22 genome (22.fa) that we have used when analyzing Read More...
Web Page
Bioinformatics
Lesson 9 Practice Objectives In this practice session, we will apply our knowledge to learn about the reference genome and annotation file for the Golden Snidget dataset visualize the Golden Snidget genome using the Integrative Genome Read More...
Rockville, MD
Collaborative
We are a bioinformatics team within the Center for Biomedical Informatics and Information Technology’s (CBIIT’s) Cancer Informatics Branch (CIB)—soon to be referred to as the Informatics and Data Science (IDS) Program. Headed Read More...
Web Page
NanoString Technology The nCounter® Analysis System is an automated, multi-application, digital detection and counting system which directly profiles up to 800 molecules simultaneously from a single sample using a novel barcoding technology. Profile hundreds of mRNAs, Read More...
Web Page
Bioinformatics
The total number of detected transcripts expressed in a cell is dependent on the amount of mRNA in a cell. Cells naturally vary in the total amount of mRNA expressed. However, the chemistry of the Read More...
Web Page
Bioinformatics
The filtlowabund_scaledcounts_airways.txt includes normalized and non-normalized transcript count data from an RNAseq experiment. You can read more about the experiment here. You can obtain the data outside of class here. The diffexp_ Read More...
Web Page
Bioinformatics
Lesson 2 Exercise Questions: Part 2 (Tidyverse) The filtlowabund_scaledcounts_airways.txt includes normalized and non-normalized transcript count data from an RNAseq experiment. You can read more about the experiment here. You can obtain the data outside Read More...
Web Page
Bioinformatics
Lesson 4 Exercise Questions: Tidyverse The filtlowabund_scaledcounts_airways.txt includes normalized and non-normalized transcript count data from an RNAseq experiment. You can read more about the experiment here. You can obtain the data outside of Read More...
Web Page
Bioinformatics
#Setting a theme my_theme
Web Page
Bioinformatics
Now, if we want the top five transcripts with the greatest median scaled counts by treatment, we need to organize our data frame and then return the top rows. We can use arrange() to arrange Read More...
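A hedged sketch of one way to get the top five transcripts by median scaled counts within each treatment. The data frame `sc` and its values are hypothetical; the column names `transcript`, `dex`, and `counts_scaled` mirror names mentioned elsewhere in these lessons, and treating `dex` as the treatment column is an assumption.

```r
library(dplyr)

# Hypothetical long-format data: 6 transcripts, 2 treatments, 2 samples each.
sc <- data.frame(
  transcript = rep(paste0("tx", 1:6), times = 4),
  dex = rep(c("trt", "untrt"), each = 12),
  counts_scaled = c(seq(10, 60, 10), seq(12, 62, 10),
                    seq(5, 55, 10), seq(7, 57, 10)),
  stringsAsFactors = FALSE
)

top5 <- sc |>
  group_by(dex, transcript) |>
  summarize(median_counts = median(counts_scaled), .groups = "drop") |>
  # Sort so that the largest medians come first within each treatment.
  arrange(dex, desc(median_counts)) |>
  group_by(dex) |>
  # Keep the first five rows of each treatment group.
  slice_head(n = 5) |>
  ungroup()
```

Sorting with arrange() before taking the first rows is what makes "the top rows" meaningful; without the sort, the first rows would just reflect the original row order.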
Web Page
Bioinformatics
Lesson 3 Exercise Questions: BaseR dataframe manipulation and factors The filtlowabund_scaledcounts_airways.txt includes normalized and non-normalized transcript count data from an RNAseq experiment. You can read more about the experiment here. We are going Read More...
Web Page
Bioinformatics
Lesson 5 Exercise Questions: Tidyverse The filtlowabund_scaledcounts_airways.txt includes normalized and non-normalized transcript count data from an RNAseq experiment. You can read more about the experiment here. You can obtain the data outside of Read More...
Web Page
Bioinformatics
Many graphs, like scatterplots, plot the raw values of your dataset. Other graphs, like bar charts, calculate new values to plot: bar charts, histograms, and frequency polygons bin your data and then plot bin counts, Read More...
Web Page
Bioinformatics
A geom is the geometrical object that a plot uses to represent data. People often describe plots by the type of geom that the plot uses. --- R4DS There are multiple geom functions that Read More...
Web Page
Bioinformatics
mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name. --- dplyr.tidyverse.org Let's create a column in Read More...
Web Page
Bioinformatics
ggplot2 will automatically assign colors to the categories in our data. Colors are assigned to the fill and color aesthetics in aes() . We can change the default colors by providing an additional layer to our Read More...
Web Page
Bioinformatics
Introduction to ggplot2 Objectives Learn the ggplot2 syntax. Build a ggplot2 general template. By the end of the course, students should be able to create simple, pretty, and effective figures. Data Visualization in the tidyverse Read More...
Web Page
Bioinformatics
10/25/2023 - Every week, thousands of biomedical research papers are published with a portion of them containing supporting tables with data about genes, transcripts, variants, and proteins. For example, supporting tables may contain differentially expressed genes Read More...
Web Page
Bioinformatics
In Lesson 9, we got a short introduction on what IGV can do. It allows us to visualize genomic data such as reference genomes and how features such as genes and transcripts align to them. A Read More...
Web Page
Bioinformatics
Due to time constraints, we will not be going over the creation of bigWig files in class. The information below is for your reference and you can view these during your spare time. In Lesson 9, Read More...
Web Page
Bioinformatics
Variant annotation means predicting the effects of genetic variants (SNPs, insertions, deletions, copy number variations (CNV) or structural variations (SV)) on the function of genes, transcripts, and protein sequence, as well as regulatory regions. The Read More...
Web Page
Bioinformatics
Lesson 16 Practice Objectives In this lesson, we learned about the classification based approach for RNA sequencing analysis. In this approach, we are aligning our raw sequencing reads to a reference transcriptome rather than a genome. Read More...
Web Page
Bioinformatics
The test data consists of two commercially available RNA samples: Universal Human Reference (UHR) and Human Brain Reference (HBR) . The UHR is total RNA isolated from a diverse set of 10 cancer cell lines. The HBR Read More...
Web Page
Bioinformatics
Lesson 12: RNA sequencing review 1 Learning objectives Here, we will do a quick review of what we have learned about RNA sequencing in Lessons 8 through 11. Accessing the Biostar handbook The URL for the Biostar handbook is Read More...
Web Page
Bioinformatics
Two commercially available RNA samples. Universal Human Reference (UHR) is total RNA isolated from a diverse set of 10 cancer cell lines. Human Brain Reference (HBR) is total RNA isolated from the brains of 23 Caucasians, male Read More...
Web Page
Bioinformatics
This page uses content directly from the Biostar Handbook by Istvan Albert. Obtain RNA-seq test data. The test data consists of two commercially available RNA samples: Universal Human Reference (UHR) and Human Brain Reference (HBR) . Read More...
Web Page
Bioinformatics
Retrieve R "helper" scripts developed for Biostars environment. curl -O http://data.biostarhandbook.com/rnaseq/code/deseq1.r curl -O http://data.biostarhandbook.com/rnaseq/code/deseq2.r curl -O http://data.biostarhandbook. Read More...
Web Page
Bioinformatics
Let's now take a look at our final differential analysis results table (results_with_gene_names_labeled.txt), using the SLC2A11 gene as an example and below we use the column command to Read More...
Web Page
Bioinformatics
Note that we now have differential expression by transcripts and our first column contains the transcript IDs. But what genes do these transcripts map to? We will need to do some data wrangling to find Read More...
Web Page
Bioinformatics
After alignment of sequencing data to the genome, we will need to count how many reads aligned to which gene. Using the tool featureCounts, we were able to do this. This tool takes as input our Read More...
Web Page
Bioinformatics
Next, we need to generate the counts (i.e., the number of reads that map to a transcript). But first, change back into the ~/biostar_class/snidget folder and then take a moment to think about how Read More...
Web Page
Bioinformatics
STAR 2-pass mode: --sjdbGTFfile is the path to the file with annotated transcripts in standard GTF format. STAR extracts splice junctions from this file, which improves the accuracy of mapping. Using annotations is highly recommended whenever they Read More...
Web Page
Bioinformatics
Alignment RNASeq Mapping Challenges The majority of mRNA derived from eukaryotes is the result of splicing together discontinuous exons, and this creates specific challenges for the alignment of RNASEQ data. Mapping Challenges Reads not perfect Read More...
Web Page
Bioinformatics
The Golden Snidget reference genome is located at http://data.biostarhandbook.com/books/rnaseq/data/golden.genome.tar.gz. Can you download and extract it? Solution: Download wget http://data.biostarhandbook.com/books/rnaseq/ Read More...
Web Page
What is Visium FFPE v2 with CytAssist? Visium FFPE v2 is a sequencing-based spatial profiling technology developed by 10x Genomics. This assay can take mouse or human tissue sections on normal glass slides as input and Read More...
Web Page
What is Xenium? Xenium is a high-resolution, imaging-based in situ spatial profiling technology from 10x Genomics that allows for simultaneous expression analysis of RNA targets (currently in the range of hundreds) within the same tissue section. Read More...
Web Page
CREx News & Updates July 2021 Learn about the NIH Collaborative Research Exchange (CREx), Core Facilities, Webinars, & More NIH Collaborative Research Exchange (CREx) News Site Spotlight FACILITY HIGHLIGHTS Learn more about services from the NHLBI Read More...
Bethesda, MD
Collaborative
The NCI High-Throughput Imaging Facility (HiTIF) works in a collaborative fashion with NCI/NIH Investigators by providing them with the necessary expertise, instrumentation, and software to develop and execute advanced High-Throughput Imaging (HTI) assays. These Read More...
Frederick, MD
Core Facility
The introduction of DNA sequencing instruments capable of producing millions of DNA sequence reads in a single run has profoundly altered the landscape of genetics and cancer biology. Complex questions can now be answered at Read More...
Rockville, MD
Trans NIH Facility
NISC’s role within NHGRI, and more broadly across NIH, aims to advance genome sequencing and its many applications, with a goal not simply to produce sequence data, but to produce the infrastructure required to Read More...
Web Page
Bioinformatics
Long read sequencing was recently named 2022’s method of the year by Nature Methods. Long read sequencing technologies, those that generate sequence reads with lengths of 10s of kilobases or longer, have several advantages over Read More...
Web Page
Bioinformatics
Learning Objectives This tutorial was designed to demonstrate common secondary analysis steps in a scRNA-Seq workflow. We will start with a merged Seurat Object with multiple data layers representing multiple samples. Throughout this tutorial we Read More...
Web Page
Bioinformatics
Objectives Review the grammar of graphics template. Learn about the statistical transformations inherent to geoms. Learn more about fine tuning figures with labels, legends, scales, and themes. Learn how to save plots with ggsave() . Review Read More...
Web Page
Bioinformatics
Data visualization with ggplot2 Objectives To learn how to create publishable figures using the ggplot2 package in R. By the end of this lesson, learners should be able to create simple, pretty, and effective figures. Read More...
Web Page
Bioinformatics
This lesson will introduce data wrangling with R. Attendees will learn to filter data using base R and tidyverse (dplyr) functionality. Learning Objectives Understand the concept of tidy data. Become familiar with the tidyverse packages. Read More...
Web Page
Bioinformatics
In this lesson, attendees will learn how to transform, summarize, and reshape data using functions from the tidyverse. Learning Objectives Continue to wrangle data using tidyverse functionality. To this end, you should understand: how to Read More...
Web Page
Bioinformatics
Objectives To explore Bioconductor, a repository for R packages related to biological data analysis. To better understand S4 objects as they relate to the Bioconductor core infrastructure. To learn more about a popular Bioconductor S4 Read More...
Web Page
Bioinformatics
All solutions should use the pipe. Import the file "./data/filtlowabund_scaledcounts_airways.txt" and save to an object named sc . Create a subset data frame from sc that only includes the columns Read More...
Web Page
Bioinformatics
Help Session Lesson 5 All solutions should use the pipe. Import the file "./data/filtlowabund_scaledcounts_airways.txt" and save to an object named sc . Create a subset data frame from sc that only Read More...
Web Page
Bioinformatics
As you can see from the image, there are several accessor functions to access the data from the object: assays() - access matrix-like experimental data (e.g., count data). Rows are genomic features (e.g., Read More...
Web Page
Bioinformatics
Data import and reshape Objectives 1. Learn to import multiple data types 2. Data reshape with tidyr: pivot_longer(), pivot_wider(), separate(), and unite() Installing and loading packages So far we have only worked with objects that Read More...
Web Page
Bioinformatics
dplyr: joining, transforming, and summarizing data frames Objectives Today we will continue to wrangle data using the tidyverse package, dplyr. We will learn: how to join data frames using dplyr how to transform and create Read More...
Web Page
Bioinformatics
09/06/2023 - With the advancements in single cell and spatial profiling technologies and methods, some of us thought it would be helpful to re-establish a community of end users on campus. We invite those that are Read More...
Web Page
Bioinformatics
Before we can align the HBR and UHR raw sequencing data to the human chromosome 22 transcriptome, we need to create an index of this transcriptome (like we did with the genome). This will make the alignment Read More...
Web Page
Bioinformatics
This page uses content taken directly from the Biostar Handbook by Istvan Albert. Remember to activate the class bioinformatics environment. conda activate bioinfo Introduction to Genomic Variation Genomic variations are typically categorized into different classes Read More...
Web Page
Bioinformatics
This page contains content taken directly from the Biostar Handbook (Istvan Albert). Always remember to activate the class bioinformatics environment. conda activate bioinfo For this data analysis, we will be using: Two commercially available RNA Read More...
Web Page
Bioinformatics
Lesson 9: Reference genomes and genome annotations used in RNA sequencing Before getting started, remember to be signed on to the DNAnexus GOLD environment. Lesson 8 Review In Lesson 8, we learned about the basics of RNA sequencing, Read More...
Web Page
Bioinformatics
This page contains content directly from The Biostar Handbook . Always remember to start the bioinformatics environment. conda activate bioinfo Pseudoalignment-based methods identify locations in the genome using patterns rather than via alignment type algorithms. It Read More...
Web Page
Bioinformatics
RNA-SEQ Overview What is RNASEQ? RNA-Seq (RNA sequencing) uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment. (Wikipedia) Strictly speaking this could be any Read More...
Web Page
Bioinformatics
Lesson 16: RNA sequencing review and classification based analysis Before getting started, remember to be signed on to the DNAnexus GOLD environment. Review In the previous classes, we learned about the steps involved in RNA sequencing Read More...
Web Page
Bioinformatics
Now that we have downloaded the HBR and UHR dataset and know where analysis tools are, let's start learning about RNA sequencing, by first learning about our reference genome and annotation files. Let's Read More...
Web Page
Bioinformatics
“Gene set enrichment analysis” refers to the process of discovering the common characteristics potentially present in a list of genes. When these characteristics are GO terms, the process is called “functional enrichment.” Warning Overall GO Read More...
Web Page
Bioinformatics
How to download data from the Sequence Read Archive (NCBI/SRA) to your account on NIH HPC Biowulf You will need: active, unlocked Biowulf account (hpc.nih.gov) active Globus account for transferring files OR Read More...