---
title: "Evaluate impact of Semantic Similarity choice"
author:
- name: Aurelien Brionne
  affiliation: Institut National de la Recherche Agronomique (INRA)
- name: Amelie Juanchich
  affiliation: Institut National de la Recherche Agronomique (INRA)
- name: Christelle Hennequet-Antier
  affiliation: Institut National de la Recherche Agronomique (INRA)
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
  BiocStyle::html_document:
    highlight: tango
vignette: >
  %\VignetteIndexEntry{3: SS_choice}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: "`r system.file('extdata','bibliography.bib',package='ViSEAGO')`"
csl: "`r system.file('extdata','bmc-genomics.csl',package='ViSEAGO')`"
---
```{r setup,include=FALSE}
# load
library(ViSEAGO)
# knitr document options
knitr::opts_chunk$set(
eval=FALSE,fig.path='./data/output/',echo=TRUE,fig.pos = 'H',
fig.width=8,message=FALSE,comment=NA,warning=FALSE
)
```
# Introduction{-}
In the overview vignette (see `utils::vignette("overview", package ="ViSEAGO")`), we explained how to use the `r BiocStyle::Biocpkg("ViSEAGO")` package.
In this vignette, we explain how to explore the effect of the GO semantic similarity algorithm on the tree structure and on the resulting clusters of GO terms, using the dataset of the mouse_bioconductor vignette (see `utils::vignette("2_mouse_bioconductor", package ="ViSEAGO")`).
# Data{-}
To reduce the vignette build time and size, the data are pre-computed (and provided with the package), and the illustrations are not interactive.
```{r vignette_data_used}
# load vignette data
data(
myGOs,
package="ViSEAGO"
)
```
# Clusters-heatmap of GO terms
The GO annotations of the genes and the enriched GO terms are combined using `ViSEAGO::build_GO_SS`. The semantic similarity (SS) between enriched GO terms is then computed with `ViSEAGO::compute_SS_distances`. We compute the distances for the *Resnik*, *Lin*, *Rel*, *Jiang*, and *Wang* algorithms implemented in the `r BiocStyle::Biocpkg("GOSemSim")` package @pmid20179076. The resulting `myGOs` object contains all the information about the enriched GO terms and the SS distances between them.
Then, for each SS distance, `ViSEAGO::GOterms_heatmap` performs a hierarchical clustering of the enriched GO terms using the `ward.D2` aggregation criterion. Clusters of enriched GO terms are obtained by cutting branches off the dendrogram. Here, we use a dynamic branch cutting method, based on the shape of the clusters, from the `r BiocStyle::CRANpkg("dynamicTreeCut")` package [@pmid18024473; @dynamicTreeCut].
```{r SS_build,eval=FALSE}
# compute Semantic Similarity (SS)
myGOs<-ViSEAGO::compute_SS_distances(
myGOs,
distance=c("Resnik","Lin","Rel","Jiang","Wang")
)
```
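To give an intuition of what the four IC-based algorithms compare, the base-R sketch below computes the textbook Resnik, Lin, Rel and Jiang measures for two GO terms from the annotation probabilities of the terms and of their most informative common ancestor (MICA). The helper `ic_measures` and the probabilities are hypothetical; GOSemSim derives the information content (IC) from the GO annotation corpus and may normalize these values differently. The graph-based *Wang* measure does not use IC and is not covered here.

```{r sketch_ic_measures}
# Hypothetical helper: textbook IC-based measures for two GO terms.
# p1, p2: annotation probabilities of the two terms
# p_mica: annotation probability of their most informative common ancestor
ic_measures <- function(p1, p2, p_mica) {
  # information content: IC(t) = -log(p(t))
  ic1 <- -log(p1)
  ic2 <- -log(p2)
  ic_mica <- -log(p_mica)
  lin <- 2 * ic_mica / (ic1 + ic2)
  c(
    Resnik = ic_mica,                          # similarity = IC of the MICA
    Lin = lin,                                 # IC of the MICA relative to the terms' own IC
    Rel = lin * (1 - p_mica),                  # Lin, down-weighted for very general MICAs
    Jiang_distance = ic1 + ic2 - 2 * ic_mica   # a distance: 0 for identical terms
  )
}
# two specific terms sharing a fairly specific ancestor (made-up probabilities)
ic_measures(p1 = 0.001, p2 = 0.002, p_mica = 0.01)
```

Because Resnik depends only on the shared ancestor while Lin, Rel and Jiang also account for the specificity of each term, the same pair of GO terms can receive different relative distances depending on the algorithm.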
1. Resnik distance
```{r SS_terms_Resnik-wardD2}
# GO terms heatmap
Resnik_clusters_wardD2<-ViSEAGO::GOterms_heatmap(
myGOs,
showIC=TRUE,
showGOlabels=TRUE,
GO.tree=list(
tree=list(
distance="Resnik",
aggreg.method="ward.D2"
),
cut=list(
dynamic=list(
deepSplit=2,
minClusterSize =2
)
)
),
samples.tree=NULL
)
```
2. Lin distance
```{r SS_Lin-wardD2}
# GO terms heatmap
Lin_clusters_wardD2<-ViSEAGO::GOterms_heatmap(
myGOs,
showIC=TRUE,
showGOlabels=TRUE,
GO.tree=list(
tree=list(
distance="Lin",
aggreg.method="ward.D2"
),
cut=list(
dynamic=list(
deepSplit=2,
minClusterSize =2
)
)
),
samples.tree=NULL
)
```
3. Rel distance
```{r SS_Rel-wardD2}
# GO terms heatmap
Rel_clusters_wardD2<-ViSEAGO::GOterms_heatmap(
myGOs,
showIC=TRUE,
showGOlabels=TRUE,
GO.tree=list(
tree=list(
distance="Rel",
aggreg.method="ward.D2"
),
cut=list(
dynamic=list(
deepSplit=2,
minClusterSize =2
)
)
),
samples.tree=NULL
)
```
4. Jiang distance
```{r SS_Jiang-wardD2}
# GO terms heatmap
Jiang_clusters_wardD2<-ViSEAGO::GOterms_heatmap(
myGOs,
showIC=TRUE,
showGOlabels=TRUE,
GO.tree=list(
tree=list(
distance="Jiang",
aggreg.method="ward.D2"
),
cut=list(
dynamic=list(
deepSplit=2,
minClusterSize =2
)
)
),
samples.tree=NULL
)
```
5. Wang distance
```{r SS_Wang-wardD2}
# GO terms heatmap
Wang_clusters_wardD2<-ViSEAGO::GOterms_heatmap(
myGOs,
showIC=TRUE,
showGOlabels=TRUE,
GO.tree=list(
tree=list(
distance="Wang",
aggreg.method="ward.D2"
),
cut=list(
dynamic=list(
deepSplit=2,
minClusterSize =2
)
)
),
samples.tree=NULL
)
```
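Each of the five calls above clusters the same enriched GO terms with the same `ward.D2` criterion and the same dynamic cut; only the SS distance changes. The base-R sketch below illustrates why this matters on a hypothetical example of five GO terms: two made-up distance matrices are clustered with `stats::hclust(method = "ward.D2")`. For simplicity, it uses a static `stats::cutree(k = 2)` instead of the dynamic branch cutting used by ViSEAGO.

```{r sketch_ward_clustering}
terms <- paste0("T", 1:5)
# hypothetical distances from a first SS algorithm:
# T1, T2 and T3 are close, T4 and T5 form a second group
dA <- matrix(c(
  0.0, 0.1, 0.2, 0.9, 0.9,
  0.1, 0.0, 0.2, 0.9, 0.9,
  0.2, 0.2, 0.0, 0.8, 0.8,
  0.9, 0.9, 0.8, 0.0, 0.1,
  0.9, 0.9, 0.8, 0.1, 0.0
), nrow = 5, dimnames = list(terms, terms))
# hypothetical distances from a second algorithm: T3 now sits next to T4 and T5
dB <- matrix(c(
  0.0, 0.1, 0.8, 0.9, 0.9,
  0.1, 0.0, 0.8, 0.9, 0.9,
  0.8, 0.8, 0.0, 0.2, 0.2,
  0.9, 0.9, 0.2, 0.0, 0.1,
  0.9, 0.9, 0.2, 0.1, 0.0
), nrow = 5, dimnames = list(terms, terms))
# Ward clustering with the ward.D2 criterion on each distance
hA <- stats::hclust(stats::as.dist(dA), method = "ward.D2")
hB <- stats::hclust(stats::as.dist(dB), method = "ward.D2")
# static cut into two clusters (a simplification of the dynamic cut)
clA <- stats::cutree(hA, k = 2)
clB <- stats::cutree(hB, k = 2)
# T3 changes cluster depending on the distance used
rbind(A = clA, B = clB)
```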
# Trees comparison
## Global trees comparisons
The `r BiocStyle::CRANpkg("dendextend")` package @dendextend offers a set of functions for extending dendrogram objects in R, letting you visualize and compare trees of hierarchical clusterings (see `utils::vignette("introduction", package ="dendextend")`). Here, we use the `dendextend::dendlist` and `dendextend::cor.dendlist` functions to compute a correlation matrix between the trees. By default, `cor.dendlist` uses the cophenetic correlation; other measures, such as Baker's Gamma, can be selected with its `method` argument.
The correlation matrix can then be visualized with the `corrplot::corrplot` function from the `r BiocStyle::CRANpkg("corrplot")` package @corrplot.
```{r parameters_dend_correlation}
# build the list of trees
dend<- dendextend::dendlist(
"Resnik"=slot(Resnik_clusters_wardD2,"dendrograms")$GO,
"Lin"=slot(Lin_clusters_wardD2,"dendrograms")$GO,
"Rel"=slot(Rel_clusters_wardD2,"dendrograms")$GO,
"Jiang"=slot(Jiang_clusters_wardD2,"dendrograms")$GO,
"Wang"=slot(Wang_clusters_wardD2,"dendrograms")$GO
)
# build the trees matrix correlation
dend_cor<-dendextend::cor.dendlist(dend)
```
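The cophenetic correlation compares two trees through their cophenetic distances, i.e. the height at which each pair of leaves is first merged, and is the Pearson correlation between these two sets of distances. The base-R sketch below computes it with `stats::cophenetic` for two trees built on the same hypothetical labels; the random coordinates and the two distances are assumptions standing in for two SS algorithms.

```{r sketch_cophenetic}
set.seed(1)
# hypothetical 8 GO terms described by random coordinates
x <- matrix(stats::runif(16), nrow = 8, dimnames = list(paste0("GO_", 1:8), NULL))
d1 <- stats::dist(x)                        # a first distance
d2 <- stats::dist(x, method = "manhattan")  # a second, different distance
t1 <- stats::hclust(d1, method = "ward.D2")
t2 <- stats::hclust(d2, method = "ward.D2")
# cophenetic() keeps the label order of the input, so pairs of leaves line up
coph_cor <- stats::cor(
  as.vector(stats::cophenetic(t1)),
  as.vector(stats::cophenetic(t2))
)
coph_cor
```

A value close to 1 means that the two trees merge the same pairs of GO terms at comparable relative heights, whatever the leaf order in the plot.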
```{r parameters_dend_correlation_print}
# corrplot
corrplot::corrplot(
dend_cor,
method="pie",
type="lower",
is.corr=FALSE,
cl.lim=c(0,1)
)
```
As expected, the trees built with the algorithms based on the Information Content (IC-based), *Resnik*, *Lin*, *Rel*, and *Jiang*, are more similar to each other than to the tree built with *Wang*, which is based on the topology of the GO graph structure (graph-based).
## Paired trees comparison
We can also compare the dendrograms built with, for example, the *Resnik* and the *Wang* algorithms using the `dendextend::dendlist`, `dendextend::untangle`, and `dendextend::tanglegram` functions.
The quality of the alignment of the two trees can be measured with `dendextend::entanglement`, from 0 (good alignment) to 1 (poor alignment).
```{r parameters_dend_comparison,fig.cap="dendrograms comparison"}
# dendrogram list
dl<-dendextend::dendlist(
slot(Resnik_clusters_wardD2,"dendrograms")$GO,
slot(Wang_clusters_wardD2,"dendrograms")$GO
)
# untangle the trees (effective but very time-consuming)
tangle<-dendextend::untangle(
dl,
"step2side"
)
# display the entanglement
dendextend::entanglement(tangle) # 0.08362968
# display the tanglegram
dendextend::tanglegram(
tangle,
margin_inner=5,
edge.lwd=1,
lwd = 1,
lab.cex=0.8,
columns_width = c(5,2,5),
common_subtrees_color_lines=FALSE
)
```
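The entanglement score only depends on the left-to-right order of the leaves in the two trees. The base-R sketch below follows the definition we understand `dendextend::entanglement` to use (an L-norm, with L = 1.5, of the differences between the positions of matching leaves, divided by the value obtained when one order is fully reversed); the helper `entanglement_sketch` and the leaf labels are hypothetical, and this is not the package's own code.

```{r sketch_entanglement}
# Hypothetical helper: entanglement between two leaf orders (vectors of labels)
entanglement_sketch <- function(order1, order2, L = 1.5) {
  stopifnot(setequal(order1, order2))
  n <- length(order1)
  # position, in tree 2, of each leaf taken in the order of tree 1
  pos2 <- match(order1, order2)
  observed <- sum(abs(seq_len(n) - pos2)^L)
  worst <- sum(abs(seq_len(n) - rev(seq_len(n)))^L)  # one order fully reversed
  observed / worst
}
leaves <- c("GO_a", "GO_b", "GO_c", "GO_d")
entanglement_sketch(leaves, leaves)                             # 0
entanglement_sketch(leaves, rev(leaves))                        # 1
entanglement_sketch(leaves, c("GO_b", "GO_a", "GO_c", "GO_d"))  # one local swap
```

This is why `untangle` rotates branches before plotting: rotations do not change the trees themselves, but they can lower the entanglement of the displayed orders.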
# Clusters comparison
Another approach is to compare the clusters obtained from the dendrograms.
## Multiple clusters comparison
We can compare how the GO terms are assigned to clusters with each set of parameters using `ViSEAGO::clusters_cor`, and plot the results with the `corrplot::corrplot` function from the `r BiocStyle::CRANpkg("corrplot")` package.
```{r parameters_clusters_correlation}
# clusters to compare
clusters=list(
Resnik="Resnik_clusters_wardD2",
Lin="Lin_clusters_wardD2",
Rel="Rel_clusters_wardD2",
Jiang="Jiang_clusters_wardD2",
Wang="Wang_clusters_wardD2"
)
# global dendrogram partition correlation
clust_cor<-ViSEAGO::clusters_cor(
clusters,
method="adjusted.rand"
)
```
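With `method="adjusted.rand"`, the agreement between two partitions is measured by the adjusted Rand index, which equals 1 for identical partitions (whatever the cluster numbering) and is close to 0 for chance-level agreement. The base-R sketch below computes it from the contingency table of two cluster assignments; `ari_sketch` is a hypothetical helper, not necessarily the code ViSEAGO calls internally.

```{r sketch_ari}
# Hypothetical helper: adjusted Rand index between two cluster assignments
ari_sketch <- function(a, b) {
  tab <- table(a, b)                      # contingency table of the two partitions
  pairs <- function(x) sum(choose(x, 2))  # number of pairs within cells or groups
  index <- pairs(tab)
  rows <- pairs(rowSums(tab))
  cols <- pairs(colSums(tab))
  expected <- rows * cols / choose(sum(tab), 2)
  max_index <- (rows + cols) / 2
  (index - expected) / (max_index - expected)
}
ari_sketch(c(1, 1, 2, 2), c(2, 2, 1, 1))  # 1: same partition, different labels
ari_sketch(c(1, 1, 2, 2), c(1, 2, 1, 2))  # -0.5: worse than chance
```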
```{r parameters_clusters_correlation_print}
# global dendrogram partition correlation
corrplot::corrplot(
clust_cor,
method="pie",
type="lower",
is.corr=FALSE,
cl.lim=c(0,1)
)
```
As in the global trees comparison, the partitions obtained with the IC-based algorithms, *Resnik*, *Lin*, *Rel*, and *Jiang*, are more similar to each other than to the partition obtained with the graph-based *Wang* algorithm.
## Paired clusters comparison
We can also explore *in detail* how each GO term is assigned to clusters with the different parameters using `ViSEAGO::compare_clusters`.
```{r parameters_clusters_comparison,fig.height=8}
# clusters content comparisons
ViSEAGO::compare_clusters(clusters)
```
NB: For this vignette, this illustration is not interactive.
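For two partitions, the same term-by-term information can also be summarized without a plot, as a contingency table counting the GO terms shared by each pair of clusters. A base-R sketch on hypothetical assignments of six GO terms (names and cluster numbers are made up):

```{r sketch_crosstab}
# hypothetical cluster assignments of the same GO terms with two SS algorithms
resnik_cl <- c(GO_1 = 1, GO_2 = 1, GO_3 = 2, GO_4 = 2, GO_5 = 3, GO_6 = 3)
wang_cl   <- c(GO_1 = 1, GO_2 = 1, GO_3 = 1, GO_4 = 2, GO_5 = 2, GO_6 = 2)
# align both vectors on the GO term names before comparing
wang_cl <- wang_cl[names(resnik_cl)]
# number of GO terms shared by each pair of clusters
shared <- table(Resnik = resnik_cl, Wang = wang_cl)
shared
```

Here, Resnik cluster 2 is split between Wang clusters 1 and 2, while Resnik clusters 1 and 3 are kept whole.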
# Conclusion
The `r BiocStyle::Biocpkg("ViSEAGO")` package provides convenient methods to explore the effect of the GO semantic similarity algorithm on the tree structure and on the resulting clusters, a choice that plays a key role in ensuring the functional coherence of the clusters.
# References{-}