---
title:
"VaSP: Quantification and Visulization of
Variations of Splicing in Population"
author: "*Huihui Yu, Qian Du and Chi Zhang*"
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
toc: true
vignette: >
%\VignetteIndexEntry{Variations of Splicing in Population}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
options(tinytex.verbose = TRUE)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## 1. Introduction {.tabset .tabset-fade .tabset-pills}
**VaSP** is an R package for the discovery of genome-wide variable splicing
events from short-read RNA-seq data. Based on R package **Ballgown**, VaSP
calculates the Single Splicing Strength (3S) score of an intron by the junction
count normalized by the gene-level average read coverage (score=junction
count/gene-level average read coverage). The 3S scores can be used for further
analysis, such as differential splicing analysis between two sample groups and
sQTL (splicing Quantitative Trait Locus) identification in a large population.
The VaSP package provides a function to find large-effect differential splicing
events without the need of genotypic information in an inbred plant population,
so called genotype-specific splicing (GSS). Integrated with functions from R
package **Sushi**, VaSP package also provides a function to visualize gene
splicing information for publication-quality multi-panel figures.
## 2. Installation
Start R (>= 4.0) and run:
```{r,eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
# The following initializes usage of Bioc devel
BiocManager::install(version='devel')
BiocManager::install("vasp", build_vignettes=TRUE)
vignette('vasp')
```
If you use an older version of R (>=3.5.0), enter:
```{r,eval=FALSE}
BiocManager::install("yuhuihui2011/vasp@R3", build_vignettes=TRUE)
vignette('vasp')
```
## 3. Data input
Users need to follow the manual of R package Ballgown
() to create a ballgown object as an
input for the VaSP package. See `?ballgown` for detailed information on
creating Ballgown objects. The object can be stored in a `.RDate` file by
`save()` . Here is an example of constructing rice.bg object from
HISAT2+StringTie output
```{r,eval=FALSE}
library(vasp)
?ballgown
path<-system.file('extdata', package='vasp')
rice.bg<-ballgown(samples = list.dirs(path = path,recursive = F) )
```
## 4. Quick start
Calculate 3S (Single Splicing Strength) scores, find GSS
(genotype-specific splicing) events and display the splicing information.
* Calculating 3S scores:
```{r}
library(vasp)
data(rice.bg)
?rice.bg
rice.bg
score<-spliceGene(rice.bg, gene="MSTRG.183", junc.type = "score")
tail(round(score,2),2)
```
* Discovering GSS:
```{r}
gss <- BMfinder(score, cores = 1)
gss
```
* Extracing intron information
```{r}
gss_intron<-structure(rice.bg)$intron
(gss_intron<-gss_intron[gss_intron$id%in%rownames(gss)])
range(gss_intron)
```
* Showing the splicing information
```{r splicePlot, fig.width=8, fig.height=5}
splicePlot(rice.bg,gene='MSTRG.183',samples = sampleNames(rice.bg)[c(1,3,5)],
start = 1179000, end = 1179300)
```
## 5. Functions
Currently, there are 6 functions in VaSP:
***getDepth***: Get read depth from a BAM file (in bedgraph format)
***getGeneinfo***: Get gene informaton from a ballgown object
***spliceGene***: Calculate 3S scores for one gene
***spliceGenome***: Calculate genome-wide splicing scores
***BMfinder***: Discover bimodal distrubition features
***splicePlot***: Visualization of read coverage, splicing information and
gene information in a gene region
### 5.1 getDepth
Get read depth from a BAM file (in bedgraph format) and return a data.frame in
bedgraph file format which can be used as input for `plotBedgraph` in
the **SuShi** package.
```{r plotBedgraph, fig.height=4, fig.width=8}
path <- system.file("extdata", package = "vasp")
bam_files <- list.files(path, "*.bam$")
bam_files
depth <- getDepth(file.path(path, bam_files[1]), "Chr1", start = 1171800,
end = 1179400)
head(depth)
library(Sushi)
par(mar=c(3,5,1,1))
plotBedgraph(depth, "Chr1", chromstart = 1171800, chromend = 1179400,yaxt = "s")
mtext("Depth", side = 2, line = 2.5, cex = 1.2, font = 2)
labelgenome("Chr1", 1171800, 1179400, side = 1, scipen = 20, n = 5,scale = "Kb")
```
### 5.2 getGeneinfo
Get gene informaton from a ballgown object by genes or by genomic regions and
return a data.frame in bed-like file format that can be used as input
for `plotGenes` in the **SuShi** package
```{r plotGenes, fig.height=5,fig.width=8}
unique(geneIDs(rice.bg))
gene_id <- c("MSTRG.181", "MSTRG.182", "MSTRG.183")
geneinfo <- getGeneinfo(genes = gene_id, rice.bg)
trans <- table(geneinfo$name) # show how many exons each transcript has
trans
chrom = geneinfo$chrom[1]
chromstart = min(geneinfo$start) - 1e3
chromend = max(geneinfo$stop) + 1e3
color = rep(SushiColors(2)(length(trans)), trans)
par(mar=c(3,1,1,1))
p<-plotGenes(geneinfo, chrom, chromstart, chromend, col = color, bheight = 0.2,
bentline = FALSE, plotgenetype = "arrow", labeloffset = 0.5)
labelgenome(chrom, chromstart , chromend, side = 1, n = 5, scale = "Kb")
```
### 5.3 spliceGene
Calculate 3S Scores from ballgown object for a given gene. This function can
only calculate one gene. Please use function `spliceGenome` to obtain
genome-wide 3S scores.
```{r}
rice.bg
head(geneIDs(rice.bg))
score <- spliceGene(rice.bg, "MSTRG.183", junc.type = "score")
count <- spliceGene(rice.bg, "MSTRG.183", junc.type = "count")
## compare
tail(score)
tail(count)
## get intron structrue
intron <- structure(rice.bg)$intron
intron[intron$id %in% rownames(score)]
```
### 5.4 spliceGenome
Calculate 3S scores from ballgown objects for all genes and return a list of
two elements: "score' is a matrix of intron 3S scores with intron rows and
sample columns and "intron" is a `GRanges` object of intron structure.
```{r}
rice.bg
splice <- spliceGenome(rice.bg, gene.select = NA, intron.select = NA)
names(splice)
head(splice$score)
splice$intron
```
### 5.5 BMfinder
Find bimodal distrubition features and divide the samples into 2 groups by
k-means clustering and return a matrix with feature rows and sample columns.
```{r}
score <- spliceGene(rice.bg, "MSTRG.183", junc.type = "score")
score <- round(score, 2)
as <- BMfinder(score, cores = 1) # 4 bimodal distrubition features found
## compare
as
score[rownames(score) %in% rownames(as), ]
```
### 5.6 spliceplot
Visualization of read coverage, splicing information and gene information in a
gene region. This function is a wrapper of `getDepth`, `getGeneinfo`,
`spliceGene`, `plotBedgraph` and `plotGenes`.
```{r, fig.width=8, fig.height=4.4}
samples <- paste("Sample", c("027", "102", "237"), sep = "_")
bam.dir <- system.file("extdata", package = "vasp")
## plot the whole gene region without junction lables
splicePlot(rice.bg, samples, bam.dir, gene = "MSTRG.183", junc.text = FALSE,
bheight = 0.2)
## plot the alternative splicing region with junction splicing scores
splicePlot(rice.bg, samples, bam.dir, gene = "MSTRG.183", start = 1179000)
```
If the bam files are provided (`bam.dir` is not NA), the read depth for each
sample is plotted. Otherwise (`bam.dir=NA`), the conserved exons of the samples
are displayed by rectangles (an example is the figure in **4. Quick start**).
And by default (`junc.type = 'score'`, `junc.text = TRUE`), the junctions
(represented by arcs) are labeled with splicing scores. You can change the
argument `junc.text = FALSE` to unlabel the junctions or change the argument
`junc.type = 'count'` to label with junction read counts.
```{r, fig.width=8, fig.height=4.4}
splicePlot(rice.bg, samples, bam.dir, gene = "MSTRG.183", junc.type = 'count',
start = 1179000)
```
There are other more options to modify the plot, please see the function
`?splicePlot` for details.
## 6. Session Information
```{r}
sessionInfo()
```