---
title: "3.4 - Motif enrichment analysis on the selected probes"
output: 
  html_document:
    self_contained: true
    number_sections: no
    theme: flatly
    highlight: tango
    mathjax: default
    toc: true
    toc_float: true
    toc_depth: 2
    css: style.css
  
bibliography: bibliography.bib    
vignette: >
  %\VignetteIndexEntry{"3.4 - Motif enrichment analysis on the selected probes"}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---


```{r, echo = FALSE,hide=TRUE, message=FALSE, warning=FALSE}
library(ELMER)
library(DT)
library(dplyr)
library(BiocStyle)
```

<br>

# Motif enrichment analysis on the selected probes

## Introduction

This step is to identify enriched motif in a set of probes which is carried out by
function `get.enriched.motif`.

## Description

In order to identify enriched motifs and potential upstream regulatory TFs, all probes with occurring in significant probe-gene pairs are combined for motif enrichment analysis. HOMER (Hypergeometric Optimization of Motif EnRichment) [@heinz2010simple] is used to find motif occurrences in a $\pm 250bp$ region around each probe, using HOCOMOCO (HOmo sapiens COmprehensive MOdel COllection) v11 [@kulakovskiy2016hocomoco]. Transcription factor (TF) binding models are available at http://hocomoco.autosome.ru/downloads. HOCOMOCO is the most comprehensive TFBS database and is consistently updated, marking an improvement over ELMER version 1.

For each probe set tested (i.e. the set of all probes occurring in significant probe-gene pairs), we quantify enrichments using Fisher's exact test (where $a$ is the number of probes within the selected probe set that contains one or more motif occurrences; $b$ is the number of probes within the selected probe set that do not contain a motif occurrence; $c$ and $d$ are the same counts within 
the entire array probe set drawn from the same set of distal-only probes using the same definition as the primary analysis) and multiple testing correction with the Benjamini-Hochberg procedure [@fisher].  

A probe set was considered significantly enriched 
for a particular motif if the 95\% confidence interval of the Odds Ratio was greater than $1.1$ (specified by option `lower.OR`, $1.1$ is default),  the motif 
occurred at least 10 times (specified by option `min.incidence`, $10$ is default) in 
the probe set and $FDR < 0.05$.


# Function arguments
<div class="panel panel-info">
<div class="panel-heading">Main get.pair arguments </div>
<div class="panel-body">
| Argument | Description |
|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| data | A multi Assay Experiment from  createMAE function. If set and probes.motif/background probes are missing this will be used to get this other two arguments correctly. This argument is not require, you can set probes.motif and the backaground.probes manually. |
| probes | A vector lists the name of probes to define the set of probes in which motif enrichment OR and confidence interval will be calculated. |
| lower.OR | A number specifies the smallest lower boundary of 95% confidence interval for Odds Ratio. The motif with higher lower boudnary of 95% confidence interval for Odds Ratio than the number are the significantly enriched motifs (detail see reference). |
| min.incidence | A non-negative integer specifies the minimum incidence of motif in the given probes set. 10 is default. |
</div>
</div>

# Example of use
```{r,eval=TRUE, message=FALSE, warning = FALSE,results = "hide"}
# Load results from previous sections
mae <- get(load("mae.rda"))
sig.diff <- read.csv("result/getMethdiff.hypo.probes.significant.csv")
pair <- read.csv("result/getPair.hypo.pairs.significant.csv")
head(pair) # significantly hypomethylated probes with putative target genes

# Identify enriched motif for significantly hypomethylated probes which 
# have putative target genes.
enriched.motif <- get.enriched.motif(data = mae,
                                     probes = pair$Probe, 
                                     dir.out = "result", 
                                     label = "hypo",
                                     min.incidence = 10,
                                     lower.OR = 1.1)
```

```{r,eval=TRUE, message=FALSE, warning = FALSE}
names(enriched.motif) # enriched motifs
head(enriched.motif[names(enriched.motif)[1]]) ## probes in the given set that have the first motif.

# get.enriched.motif automatically save output files. 
# getMotif.hypo.enriched.motifs.rda contains enriched motifs and the probes with the motif. 
# getMotif.hypo.motif.enrichment.csv contains summary of enriched motifs.
dir(path = "result", pattern = "getMotif") 

# motif enrichment figure will be automatically generated.
dir(path = "result", pattern = "motif.enrichment.pdf") 
```

# Bibliography