--- title: "Make A Package" author: "Lori Shepherd" date: "`r Sys.Date()`" output: BiocStyle::html_document: toc: true toc_depth: 2 vignette: > %\VignetteIndexEntry{Make A Package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Make A Package A package is a collection of functions that work together in a cohesive manner to achieve a goal. It (should) include detailed documentation through man pages and vignettes and (should) be tested for accuracy and efficiency. [Writing R extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html) is a very detailed manual to writing packages and what the structure of a package entails. ## Using `devtools` to create a package The `devtools` package provides a lot of options and utility for helping to construct a new package. You can get a list of all available devtools functions with `ls("package:devtools")`. Some useful references for using devtools to build packages are [Rstudio Devtools Cheetsheet](https://www.rstudio.com/wp-content/uploads/2015/03/devtools-cheatsheet.pdf) and [Jennifer Bryan class](http://stat545.com/packages06_foofactors-package.html) ### set up shell of a package ```{r, eval=FALSE} library(devtools) create("myFirstPackage") ``` Create will instantiate all the necessary files and sub-directories that are required by R to be a valid package: DESCRIPTION, NAMESPACE, and R directory. You will have to edit the DESCRIPTION to insert the information pertinent to your package. After running create, the package has a valid package structure and can installed and loaded: `install("myFirstPackage")`, `library("myFirstPackage")`, `library(help="myFirstPackage")`. ### version control It is an excellent idea to version control whenever creating a package and especially when collaborating on a project, where multiple users are allowed to make changes. It allows for a constant record of changes that can be advanced or reverted if necessary. Only a project can be version controlled, to make a directory a project in Rstudio go to: `File -> New Project` . In this case we started creating the directory so we follow the prompts for the option `Use Existing Directory`. Now that it is a project we can go to `Tools -> Version Control -> Project Setup` Change the `Version Control System` to `Git` and follow the prompts. Notice in the Rstudio pane for environments/history/build there is a new tab Git. The package now can start using `git` version control by making commits. To make a commit, you can go to the Git tab, select the check box next to any files that have been modified, added, or deleted that you would like to track, and select `commit`. Enter a new commit message in the window that pops up and select `commit`. We should tell Git who we are. In Rstudio, select `Tools -> Shell` and type the following subsituting your user.email and the user.name (if you have github we recommend using your email and user.name associated with github here). ``` git config --global user.email "" git config --global user.name "" ``` We coud stop here but we also would like put the package on GitHub. (This assumes you have a github account). First, in Rstudio go to `Tools -> Global Options` and select Git/SVN. Ensure the paths are correct. If you have not linked a Rstudio project to Github, select `Create RSA key`. Close the window. Click on `View public key` and copy the displayed public key. Now in a web browser, open your GitHub account. Go to `Settings` and `SSH and GPG keys`. Click on the option for `New SSH key` and paste the public key that was copied. Also on GitHub, create a new repository with the same name as the one you created in Rstudio using `create()`. Back in Rstudio, select `Tools -> Shell` and type the following subsituting your github user.name and the package name. ``` git remote add origin https://github.com//.git git config remote.origin.url git@github.com:/.git git pull origin master git push -u origin master ``` For instance, my package repository on github is "myFirstPackage" and my git hub user.name is lshep: ``` https://github.com/lshep/myFirstPackage.git git@github.com:lshep/myFirstPackage.git ``` Now if you look in the Rstudio tab for Git, the push and pull options are available. You can now push and pull from/to the local version of your package and the GitHub repository version of the package. ### Back to making a valid package `devtools` proviodes built in functions for building, checking, and installing a package. The package we created using `create` has a valid package structure but if we did `check` we will find the DESCRIPTION needs updating. Update the DESCRIPTION file to be appropriate for your package; throughout the development of your package you might have to update the DESCRIPTION for appropraite Depends, Imports, and Suggest fields. Now we want to start writing R functions. In Rstudio you can open an empty file by `File -> New File` and selecting `R Script`. Save the file in the R directory. Write your functions and document. You can either document functions manually or if you use roxygen you can use the devtools function `document()`. See the [Writing R extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html) for manual creation of Rd files (located in the man directory) but roxygen is growing increating popular. Some helpful links for roxygen tags can be found [Rstudio Devtools Cheetsheet](https://www.rstudio.com/wp-content/uploads/2015/03/devtools-cheatsheet.pdf) and [Roxygen Help](https://cran.rstudio.com/web/packages/roxygen2/vignettes/rd.html). Some useful devtools commands while creating functions: * `load_all()` loads all package functions in environment to test * `check()` checks the package (R CMD check) * `document()` generates or updates andy documentation files and using the Rstudio options `Build -> Build And Reload` and `Build -> Clean and Rebuild`. It is also recommended to have a man page for your package. devtools provides framework for this and creates the file that needs to be modified by calling `use_package_doc()`. If you import any functions in your code, don't forget to update the DESCRIPTION file for Depends, Imports, or Suggests. ### Testing It is highly recommended to add unit tests to your package. Unit test ensure the package is working as expect. The two main ways to test are using `RUnit` or `testthat`. testthat functionality is included in devtools. Using `use_testthat()` will set up the needed directory structure and add the package suggestion to the DESCRIPTION. Here are some examples of the structure of tests for testthat: `expect_identical`, `expect_true`, `expect_error`. There are other options as well that are discussed in [testthat Wickham](https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf) and [testthat](http://r-pkgs.had.co.nz/tests.html). ### Vignettes Vignettes are another major documentation piece to a package and more and more repository systems (CRAN, Bioconductor, ROpenSci) and making vignettes a standard requirement. They are a more indepth description and examples of package usage. devtools also provides the function `use_vignette()` to set up the directory structure and initial file for creating a vignette. For Bioconductor submissions we recommend changing the `output:` section of the vignette header to the following (this would require adding BiocStyle to the Suggest field in the DESCRIPTION): ``` output: BiocStyle::html_document: toc: true toc_depth: 2 ``` Or if BiocStyle is already installed on your system, you can also use Rstudio: `New File -> Rmarkdown -> From Template -> Bioconductor HTML/PDF Vignette` A helpful rmarkdown link, which is commonly used for vignette creation can be found here: [rmarkdown cheetsheet](http://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf) ## Bioconductor Standards 1. proper coding and efficient coding: * [Efficient and Robust Code](http://bioconductor.org/developers/how-to/efficient-code/) * [Query Web Resource](http://bioconductor.org/developers/how-to/web-query/) 2. Bioconductor interconnectivity and S4 classes * S4 over S3 classes * reuse existing infastructure first! + classes - DNA/RNA - Biostrings DNAstringset - gene sets - GSEABase GeneSet - genomic intervals - GenomicRanges Granges - rectangular freature x sample data - (RNAseq count matrix, microarray) - SummarizedExperiment/MultiAssayExperiment - Single cell - SingleCellExperiment - Mass Spec - MSnbase + import/loading data - GTF, GFF, BED, BigWig, ... rtracklayer::import() - FASTA Biostrings::readDNAStringSet() - SAM / BAM Rsamtools::scanBam(), GenomicAlignments::readGAlignment*() - VCF VariantAnnotation::readVcf() - FASTQ ShortRead::readFastq() * [BiocViews](http://bioconductor.org/packages/release/BiocViews.html#___Software) 3. Tests * [unitTest guidelines](http://bioconductor.org/developers/how-to/unitTesting-guidelines/) 4. complete and detailed Vignettes and man pages with executable examples 5. check time < 5min 6. package size < 4Mb 7. All package guidelines can be found [here](http://bioconductor.org/developers/package-guidelines/) A clean build and check and BiocCheck is not a guarantee to acceptance. It will go through a formal view process. ### Submitting to Bioconductor Read the [Contributions Page](https://github.com/Bioconductor/Contributions) and when ready to submit open a [New Issue](https://github.com/Bioconductor/Contributions/issues). The Title: should be the name of your package. Once the package is approved for building, don't forget to set up the [webhook](https://github.com/Bioconductor/Contributions/blob/master/CONTRIBUTING.md#adding-a-web-hook)