---
title: "Providing Bioimage Dataset for ExperimentHub"
author: 
- name: Satoshi Kume
  email: satoshi.kume.1984@gmail.com
date: "`r Sys.Date()`"
graphics: no
package: BioImageDbs
output:
  BiocStyle::html_document:
      toc_float: true
bibliography: BioImageDbs.bib
csl: bj.csl
vignette: >
  %\VignetteIndexEntry{Providing Bioimage Dataset for ExperimentHub}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{ExperimentHub}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r style, echo = FALSE, results = 'asis', message=FALSE}
BiocStyle::markdown()
```

**Last modified:** `r file.info("BioImageDbs.Rmd")$mtime`<br />
**Compiled**: `r Sys.time()`

# Utilisation and prospects of bioimage datasets

In recent years, there has been a growing need for data analysis using machine
learning in the field of bioimaging research. Machine learning is an inductive
approach using data, and the construction of models, such as image segmentation
and classification, involves the use of image data itself. Therefore, the
publication and sharing of bioimage datasets [@Williams2017] as well as knowledge creation
through providing metadata to bioimages [@Kobayashi2018;@Kume2017] are important issues to be 
discussed. At present, there is no commonly used format for sharing bioimage
datasets. Also, the data is scattered among various repositories. Therefore,
different image repositories manage the data in different formats (image data 
itself and metadata, including image format, instruments/microscopes and
biosamples).

In the data analysis and quantification using those images, it is assumed that
several steps of image pre-processing are performed depending on the analysis
environment used. However, the implementation of supervised learning starts with
finding a repository of the bioimage dataset that contains original images and 
their corresponding supervised labels. Once the repository is found, the image 
data is downloaded from the repository, the data is loaded into each 
environment and it is prepared in a format suitable for analytical package. 
These processes are time consuming before the main analysis. Also, in most of 
the image repositories, the data are not published in a format suitable for 
reading and processing in R (.Rdata, etc.), and the data are not easy to use 
for R users. 

For performing supervised learning of bioimage data, BioImageDbs provides R 
list objects of the original images and their corresponding supervised labels
converted into a 4D or 5D array. After retrieving the data from ExperimentHub, 
it can be utilised for deep learning using Keras/Tensorflow [@Chollet2017kerasR] 
and other machine learning methods, without the need for pre-processing.

On the other hand, many image analysis packages are also available on R; 
however, there is a lack of standardisation in image analysis. The use of 
common, open datasets is one of the essential steps in standardising and 
comparing the analytical methods. The provision of the array data of images 
through ExperimentHub is also intended for applications such as (1) comparing 
models using common-sharing data among R users and (2) applying predictions 
to new datasets through transfer learning and fine-tuning based on these arrays.

# Fetch Bioimage Datasets from ExperimentHub

The `BioImageDbs` package provides the metadata for all BioImage 
databases in `r Biocpkg("ExperimentHub")`. 

The `BioImageDbs` package provides the metadata for bioimage datasets,
which is preprocessed as array format and saved in
`r Biocpkg("ExperimentHub")`.

First we load/update the `ExperimentHub` resource.

```{r load-lib, message = FALSE}
library(ExperimentHub)
eh <- ExperimentHub()
```

Next we list all BioImageDbs entries from `ExperimentHub`.

```{r list-BioImageDbs}
query(eh, "BioImage")
```

We can confirm the metadata in ExperimentHub in Bioconductor S3 bucket
with `mcols()`.

```{r confirm-metadata}
mcols(query(eh, "BioImage"))
```

We can retrieve only the BioImageDbs tibble files as follows.

```{r query-mouse}
qr <- query(eh, c("BioImageDbs", "LM_id0001"))
qr

#Import data
#BioImageDbs_image_Dat <- qr[[1]]
```

# 5D Arrays from the ExperimentHub

The ordering of the array dimensions corresponds to the channels_last format
(default) in R/Keras. The input shape of 5D array is to be batch, spatial_dim1,
spatial_dim2, spatial_dim3 and channels. The number of this batch is the same 
as the number of the 3D image sets. The number of channels is 1 for grey 
images and 3 for RGB images.

# 4D Arrays from the ExperimentHub

The ordering of the array dimensions corresponds to the channels_last format
(default) in R/Keras. The input shape of 4D array is to be batch, height, 
width and channels. The number of this batch is the same as the number of 
the 2D images.

# Visualization of gif images from the ExperimentHub

As a test, we also provided gif files of some arrays for visualizations.
We visualize the files using `magick::image_read` function.

```{r}
qr <- query(eh, c("BioImageDbs", ".gif"))
qr

#EM_id0001_Brain_CA1_hippocampus_region_5dTensor_train_data
qr[1]

##Display the gif image
#magick::image_read(qr[[1]])
```

```{r Fig001, fig.cap = "EM_id0001_Brain_CA1_hippocampus_region_5dTensor_train_dataset.gif", echo = FALSE}
#magick::image_read(qr[[1]])
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0001.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig002, fig.cap = "EM_id0002_Drosophila_brain_region_5dTensor_train_dataset.gif", echo = FALSE}
#magick::image_read(qr[[2]])
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0002.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig003, fig.cap = "EM_id0003_J558L_4dTensor_train_dataset.gif", echo = FALSE}
#magick::image_read(qr[[9]])
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0003.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig004, fig.cap = "EM_id0004_PrHudata_4dTensor_train_dataset.gif", echo = FALSE}
#magick::image_read(qr[[10]])
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0004.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig008, fig.cap = "EM_id0005_Mouse_Kidney_2D_All_Mito_1024_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0005_Mouse_Kidney_2D_Mito.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig009, fig.cap = "EM_id0005_Mouse_Kidney_2D_All_Nuc_1024_4dtensor.Rds", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0005_Mouse_Kidney_2D_Nuc.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig010, fig.cap = "EM_id0006_Rat_Liver_2D_All_Mito_1024_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0006_Rat_Liver_Mito.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```


```{r Fig011, fig.cap = "EM_id0006_Rat_Liver_2D_All_Nuc_1024_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0006_Rat_Liver_Nuc.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig012, fig.cap = "EM_id0007_Mouse_Kidney_MultiScale_All_Low_Glomerulus_1024_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0007_Mouse_Kidney_MultiScale_Glomerulus.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig013, fig.cap = "EM_id0007_Mouse_Kidney_MultiScale_All_Middle_Podocyte_1024_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0007_Mouse_Kidney_MultiScale_Podocyte.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig014, fig.cap = "EM_id0008_Human_NB4_2D_All_Cel_512_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0008_Human_NB4_Nuc.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig015, fig.cap = "EM_id0008_Human_NB4_2D_All_Nuc_1024_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0008_Human_NB4_Cell.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig016, fig.cap = "EM_id0009_MurineBMMC_All_512_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0009.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig017, fig.cap = "EM_id0010_HumanBlast_All_512_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0010.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig018, fig.cap = "EM_id0011_HumanJurkat_All_512_4dTensor_dataset.gif", echo = FALSE}
options(EBImage.display = "raster")
img <- system.file("images", "EM_id0011.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig005, fig.cap = "LM_id0001_DIC_C2DH_HeLa_4dTensor_train_dataset.gif", echo = FALSE}
#magick::image_read(qr[[3]])
options(EBImage.display = "raster")
img <- system.file("images", "LM_id0001.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig006, fig.cap = "LM_id0002_PhC_C2DH_U373_4dTensor_train_dataset.gif", echo = FALSE}
#magick::image_read(qr[[5]])
options(EBImage.display = "raster")
img <- system.file("images", "LM_id0002.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

```{r Fig007, fig.cap = "LM_id0003_Fluo_N2DH_GOWT1_4dTensor_train_dataset.gif", echo = FALSE}
#magick::image_read(qr[[7]])
options(EBImage.display = "raster")
img <- system.file("images", "LM_id0003.png", package="BioImageDbs")
EBImage::display(EBImage::readImage(files = img))
```

# A simple execution command using Keras/Tensorflow

We select a data array and a label array from the data list and assign 
them to variables. These variables are then used as the x and y arguments 
of the fit (<keras.engine.training.Model>) function of Keras as an example. 
The model in Keras should be prepared before the execution.

```{r}
## Not Run ##
# qr <- query(eh, c("BioImageDbs"))
# BioImageData <- qr[[1]]
# data <- BioImageData$Train$Train_Original
# labels <- BioImageData$Train$Train_GroundTruth
# dim(data); dim(labels)
# model %>% fit( x = data, y = labels )
```

# About the imaging dataset and its metadata in BioImageDbs

For this dataset in BioImageDbs, the published open data was used as follows:

1.	For cellular ultra-microstructures, electron microscopy-based imaging data of 
mouse B myeloma cell line J558L (ex. EM_id0003_J558L_4dTensor.Rda) [@Morath2013] 
and primary human T cell isolated from peripheral blood mononuclear cells 
(ex. EM_id0004_PrHudata_4dTensor.Rda) [@Morath2013], Human NB-4 cell
(ex. EM_id0008_Human_NB4_2D_All_Cel_512_4dTensor.Rds) [@Kume2017], 
murine bone marrow derived-mast cells (ex. EM_id0009_MurineBMMC_All_512_4dTensor.Rds) [@Morath2013], 
human blasts (ex. EM_id0010_HumanBlast_All_512_4dTensor.Rds) [@Morath2013], and
human T-cell line Jurkat (ex. EM_id0011_HumanJurkat_All_512_4dTensor.Rds) [@Morath2013]
were used.

2. For bio-tissue ultra-microstructures, electron microscopy-based imaging data 
of the mouse brain (ex. EM_id0001_Brain_CA1_hippocampus_region_5dTensor.Rda) 
[@Lucchi2012;@Lucchi2013bib], Drosophila brain (ex. EM_id0002_Drosophila_brain_region_5dTensor.Rda) 
[@Cardona2010;@ArgandaCarreras2015], mouse kidney (ex. EM_id0005_Mouse_Kidney_2D_All_Nuc_1024_4dtensor.Rds) [@Kume2016]
and rat liver (ex. EM_id0006_Rat_Liver_2D_All_Mito_1024_4dtensor.Rds) [@Kume2016] were used.

3.	For cell tracking, light microscopy-based imaging data of the human HeLa 
cells on a flat glass (ex. LM_id0001_DIC_C2DH_HeLa_4dTensor.Rda) [@Maska2014;@Ulman2017], 
human glioblastoma-astrocytoma U373 cells on a polyacrylamide substrate 
(ex. LM_id0002_PhC_C2DH_U373_4dTensor.Rda) [@Maska2014;@Ulman2017] and GFP-GOWT1 mouse stem 
cells (ex. LM_id0003_Fluo_N2DH_GOWT1_4dTensor.Rda) [@Bartova2011] were used.

The values of the supervised labels were provided as array data with binary 
or multiple values. The detailed information was described in the metadata 
file of BioImageDbs. 
Some of cell tracking data were obtained from the [cell tracking challenge](http://celltrackingchallenge.net/2d-datasets/).

# Session information {.unnumbered}

```{r sessionInfo, echo=FALSE}
sessionInfo()
```

# References