---
title: "Data for mCSEA package"
author:
- name: Jordi Martorell-Marugán
  affiliation:
  - Bioinformatics Unit. GENYO, Centre for Genomics and Oncological Research
- name: Pedro Carmona-Sáez
  affiliation:
  - Bioinformatics Unit. GENYO, Centre for Genomics and Oncological Research
  email: pedro.carmona@genyo.es
package: mCSEAdata
date: "`r doc_date()`"
abstract: >
  The _mCSEAdata_ package contains the files needed to run the core analysis in 
  the _mCSEA_ package. It also contains example data used by _mCSEA_ to show its 
  functionality.
output:
  BiocStyle::pdf_document
vignette: >
  %\VignetteIndexEntry{Data for mCSEA package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}

---


# Package contents

```{r}
library(mCSEAdata)
data(mcseadata)
data(bandTable)
```

First, **betaTest**, **phenoTest** and **exprTest** are the objects needed 
to run the examples in the _mCSEA_ package. **betaTest** is a matrix with the 
beta-values of 10000 EPIC probes for 20 samples. **exprTest** is a subset with 
the expression of 100 genes in bone marrow samples from 10 healthy individuals 
and 10 leukemia patients. **phenoTest** is a data frame with the 
explanatory variable and the covariates associated with the samples.

```{r}
class(betaTest)
dim(betaTest)
head(betaTest, 3)
class(phenoTest)
dim(phenoTest)
head(phenoTest, 3)
class(exprTest)
dim(exprTest)
head(exprTest, 3)
```
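
As a quick sanity check before using these objects, the sketch below inspects 
the range of the beta-values and compares sample names across objects. It 
assumes that samples are stored as column names of **betaTest** and as row 
names of **phenoTest**, and that the first column of **phenoTest** holds the 
explanatory variable. Neither assumption is stated above, so the output should 
be read as informative rather than guaranteed.

```{r}
# Beta-values are methylation proportions, so they are expected within [0, 1]
range(betaTest, na.rm = TRUE)

# Assumption: samples are columns of betaTest and rows of phenoTest
all(colnames(betaTest) %in% rownames(phenoTest))

# Assumption: the first column of phenoTest is the explanatory variable
table(phenoTest[, 1])
```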




In addition, there are six association objects. Each one is a list of 
features with their associated 450k or EPIC CpG probes. The features included 
are promoters (**assocPromoters450k** and **assocPromotersEPIC**), gene bodies 
(**assocGenes450k** and **assocGenesEPIC**) and CpG islands (**assocCGI450k** 
and **assocCGIEPIC**). These objects are used internally by the _mCSEATest()_ 
function in the _mCSEA_ package.

```{r}
class(assocPromoters450k)
length(assocPromoters450k)
head(assocPromoters450k, 3)
class(assocGenes450k)
length(assocGenes450k)
head(assocGenes450k, 3)
class(assocCGI450k)
length(assocCGI450k)
head(assocCGI450k, 3)
class(assocPromotersEPIC)
length(assocPromotersEPIC)
head(assocPromotersEPIC, 3)
class(assocGenesEPIC)
length(assocGenesEPIC)
head(assocGenesEPIC, 3)
class(assocCGIEPIC)
length(assocCGIEPIC)
head(assocCGIEPIC, 3)
```
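
Assuming each association object is a named list whose names are the features 
and whose elements are character vectors of probe identifiers, the probes of a 
single feature can be retrieved by position or by name, and `lengths()` 
summarises how many probes each feature has. This is only an illustrative 
sketch of how the objects can be explored.

```{r}
# Name of the first promoter in the EPIC association list
names(assocPromotersEPIC)[1]

# Probes associated with that promoter
assocPromotersEPIC[[1]]

# Distribution of the number of probes per promoter
summary(lengths(assocPromotersEPIC))
```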


There are also two _GRanges_ objects with the locations of the 450k and EPIC 
probes, used by the _mCSEAPlot()_ and _mCSEAIntegrate()_ functions:

```{r, message = FALSE}
class(annot450K)
head(annot450K, 3)
class(annotEPIC)
head(annotEPIC, 3)
```
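
The location of an individual probe can be looked up in these objects. The 
sketch below assumes that the _GRanges_ objects are named by probe identifier; 
the lookup is guarded so that nothing is printed if that assumption does not 
hold. The probe identifier is just an example.

```{r, message = FALSE}
library(GenomicRanges)

# Assumption: the GRanges objects are named by probe identifier
probe <- "cg00212031"
if (probe %in% names(annot450K)) {
    annot450K[probe]
}

# Number of annotated 450k probes on the first few chromosomes
head(table(as.character(seqnames(annot450K))))
```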

Finally, the **bandTable** object contains chromosome band information and 
centromere locations. It is used by the _mCSEAPlot()_ function to plot the 
chromosome track.

```{r}
head(bandTable)
```
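
In UCSC cytoband tables, centromeric bands carry the stain label `acen`. 
Assuming **bandTable** follows that layout and has a `gieStain` column (an 
assumption that is not stated above), the centromere rows could be extracted 
as follows.

```{r}
# Guarded so the chunk prints nothing if the column is named differently
if ("gieStain" %in% colnames(bandTable)) {
    head(bandTable[bandTable$gieStain == "acen", ])
}
```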

# Sources
* Example objects:
    + **betaTest** contains simulated beta-values for EPIC platform probes.
    + **exprTest** contains expression data from leukemia patients and healthy 
individuals, extracted from the `r Biocpkg("leukemiaEset")` package.
    + **phenoTest** contains arbitrary phenotypes for each sample.
    
* Association objects: They were all constructed from the annotation data in the 
`r Biocpkg("IlluminaHumanMethylation450kanno.ilmn12.hg19")` and 
`r Biocpkg("IlluminaHumanMethylationEPICanno.ilm10b2.hg19")` 
packages. For that purpose, an _RGChannelSet_ object was
obtained with the `r Biocpkg("minfi")` package and the 
_getAnnotation()_ function 
was applied to it in order to get the annotation DataFrame. That was 
done for both the 450k and EPIC platforms. The annotation DataFrame contains 
several pieces of information about each CpG probe, and we used that 
information to associate each probe with one or more promoters, gene bodies or 
CpG islands following this scheme:

Region type | mCSEAdata association objects | Column from annotation DataFrame used | Column values | Feature name column
----------- | ----------------------------- | -------------------------------------- | ------------- | -------------------
Promoters | assocPromoters450k and assocPromotersEPIC | UCSC_RefGene_Group | TSS1500, TSS200, 5'UTR or 1stExon | UCSC_RefGene_Name
Gene bodies | assocGenes450k and assocGenesEPIC | UCSC_RefGene_Group | Body | UCSC_RefGene_Name
CpG Islands | assocCGI450k and assocCGIEPIC | Relation_to_Island | Island, N_Shore, S_Shore, N_Shelf or S_Shelf | Islands_Name
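
The scheme above can be sketched on a toy annotation table. This is not the 
code used to build the package: the probe and gene names are hypothetical, and 
it assumes that, when a probe maps to several transcripts, the values in 
`UCSC_RefGene_Name` and `UCSC_RefGene_Group` are stored as parallel 
semicolon-separated strings.

```{r}
# Toy annotation table (hypothetical probes and genes)
annot <- data.frame(
    UCSC_RefGene_Name = c("GENEA", "GENEA;GENEA;GENEB", ""),
    UCSC_RefGene_Group = c("TSS200", "Body;TSS1500;5'UTR", ""),
    row.names = c("cgToy0001", "cgToy0002", "cgToy0003"),
    stringsAsFactors = FALSE
)

# Split the parallel semicolon-separated fields (assumed layout)
genes <- strsplit(annot$UCSC_RefGene_Name, ";", fixed = TRUE)
groups <- strsplit(annot$UCSC_RefGene_Group, ";", fixed = TRUE)
stopifnot(identical(lengths(genes), lengths(groups)))

# One row per probe/gene/group combination
pairs <- data.frame(
    probe = rep(rownames(annot), lengths(genes)),
    gene = unlist(genes),
    group = unlist(groups),
    stringsAsFactors = FALSE
)

# Promoters: keep the promoter-related groups, then list probes per gene
promoterGroups <- c("TSS1500", "TSS200", "5'UTR", "1stExon")
promoterPairs <- unique(pairs[pairs$group %in% promoterGroups,
                              c("probe", "gene")])
assocPromotersToy <- split(promoterPairs$probe, promoterPairs$gene)
assocPromotersToy

# Gene bodies: same procedure with the "Body" group
bodyPairs <- unique(pairs[pairs$group == "Body", c("probe", "gene")])
assocGenesToy <- split(bodyPairs$probe, bodyPairs$gene)
assocGenesToy
```

Note that a probe can end up in several features (here `cgToy0002` is in the 
promoters of two genes and in the body of one), which matches the 
"one or more" wording above.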

For instance, the _cg00212031_ probe from the 450k platform has the following 
annotation data in the annotation DataFrame:

UCSC_RefGene_Group | UCSC_RefGene_Name | Relation_to_Island | Islands_Name
------------------ | ----------------- | ------------------ | ------------
TSS200 | TTTY14 | Island | chrY:21238448-21240005

So this probe is associated with the TTTY14 promoter in the assocPromoters450k 
object and with the chrY:21238448-21240005 CpG island in the assocCGI450k object.
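
This example can be checked against the packaged objects. The sketch assumes 
that the association lists are named by feature and hold character vectors of 
probe identifiers.

```{r}
probe <- "cg00212031"

# Is the probe listed under the TTTY14 promoter?
probe %in% assocPromoters450k[["TTTY14"]]

# Is the probe listed under the corresponding CpG island?
probe %in% assocCGI450k[["chrY:21238448-21240005"]]
```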

* Annotation objects (**annot450K** and **annotEPIC**): They were both 
constructed with the `r Biocpkg("minfi")` package. An _RGChannelSet_ object was
obtained for each platform and the _getLocations()_ function 
was applied to each of them.
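
The `r Biocpkg("minfi")` steps described in this section could look roughly 
like the following for the 450k platform. How the original _RGChannelSet_ 
objects were obtained is not stated here, so this sketch borrows the example 
object `RGsetEx` from the `r Biocpkg("minfiData")` package, which is an 
assumption made only for illustration. The chunk is not evaluated because it 
needs these additional packages and the 450k annotation package to be 
installed.

```{r, eval = FALSE}
library(minfi)
library(minfiData)  # assumption: used only as a source of an RGChannelSet

# Annotation DataFrame with one row per probe
annotDF <- getAnnotation(RGsetEx)
head(annotDF[, c("UCSC_RefGene_Group", "UCSC_RefGene_Name",
                 "Relation_to_Island", "Islands_Name")], 3)

# GRanges object with the genomic location of each probe
locs <- getLocations(RGsetEx)
head(locs, 3)
```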

* **bandTable**: It was constructed with the `r Biocpkg("Gviz")` package, 
specifically with the _IdeogramTrack()_ function.
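
A minimal sketch of how such a table can be retrieved with 
`r Biocpkg("Gviz")` is shown below. The genome build (`hg19`, chosen to match 
the annotation packages above) and the chromosome are assumptions, and the 
exact call used to build **bandTable** may differ. The chunk is not evaluated 
because _IdeogramTrack()_ downloads the band information from UCSC and 
therefore needs an internet connection.

```{r, eval = FALSE}
library(Gviz)

# Assumption: hg19 build; the chromosome is arbitrary
itrack <- IdeogramTrack(genome = "hg19", chromosome = "chr1")

# The downloaded band information is stored in the bandTable slot
head(itrack@bandTable)
```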


# Session info
```{r sessionInfo, echo=FALSE}
sessionInfo()
```