---
title: "Generating Zeros using the Species List"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Generating Zeros using the Species List}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```


## Introduction

The aim of this document is demonstrate how the Species List table (SL) of the RDBES can be used to complement the sample table with zeros in cases where, e.g., a species was looked for but not found and therefore does not appear in the Sample table (SA) of the RDBES. The task of adding zeros to the Sample table (SA) is made easy by using the function `generateZerosUsingSL` available in the `RDBEScore` package.

## Load the package

```{r setup}
library(RDBEScore)
library(data.table)
```

## Load and validate example data

```{r}
# read an example dataset and simplify it to 1 trip and 1 haul [dev bote: this section needs to be reworked when data and filterRDBESDataObject are  updated]
data(Pckg_survey_apistrat_H1)
myH1DataObject1 <- Pckg_survey_apistrat_H1
myH1DataObject1$SL<-myH1DataObject1$SL[grepl(myH1DataObject1$SL$SLspeclistName, pat="Pckg_survey_apistrat_H1"),]
#myH1DataObject1<-filterAndTidyRDBESDataObject(myH1DataObject1, fieldsToFilter="FOid",valuesToFilter=70849, killOrphans = TRUE)
myH1DataObject1<-filterRDBESDataObject(myH1DataObject1, fieldsToFilter="SSid",valuesToFilter=227694, killOrphans = TRUE)
# check it is a valid RDBESobject
validateRDBESDataObject(myH1DataObject1, checkDataTypes = TRUE)
```

## A closer look the example data and its characteristics

The example is from data in hierarchy 1. It contains a single trip with a single haul. For simplicity, we restrict our analysis to the tables SL, SS and SA which are the ones handled by the functions we which behaviour we want to demonstrate. 

Examining a print of the Species List (SL) one can conclude that the sampling targeted the landings of only 1 species. In this case the species was *Nephrops norvegicus* (aphiaId 107254).

```{r}
myH1DataObject1[c("SL")]
```

Examining a print of the Species Selection table (SS), one can confirm that only one fishing operation is present in the data (FOid 70849) and that landings were indeed sampled from it (for simplicity only a subset of columns is printed). Note that variable \code{SSuseCalcZero} is set to "N" (i.e., No). This will have to be changed later on if we want zeros calculated.

```{r}
myH1DataObject1[[c("SS")]][,c(1:15,19)]
```
Given the previous, it is  expected that if *Nephrops norvegicus* was sampled it will appear in the RDBES Sample table (SA). One can confirm that happened by printing that table (for simplicity only a subset of columns is printed).
```{r}
myH1DataObject1[[c("SA")]][,c(1:9,48:49)]
```

## Generating Zeros for species looked for but not reported in the SA table

Lets change the example, by adding a couple of new species (*Pandalus borealis* and *Cancer pagurus*) to the Species List table (SS). We also change variable \code{SSuseCalcZero} to "Y" so that zeros can be calculated.     
```{r}
# first we duplicate the SL
myH1DataObject1$SL<-rbind(myH1DataObject1$SL, myH1DataObject1$SL, myH1DataObject1$SL)
# then we update a few fields and reset the SL key
myH1DataObject1$SL[2:3,c("SLid","SLcommTaxon","SLsppCode")]<-data.frame(c(47892, 47893),c(107276, 107649),   c(107276, 107649))
setkeyv(myH1DataObject1$SL,"SLid")
# change SSuseCalcZero to "Y"
myH1DataObject1$SS$SSuseCalcZero<-"Y"
# finally we make sure the object we created is a valid RDBES data object. No message is a good sign.
validateRDBESDataObject(myH1DataObject1, checkDataTypes = TRUE)
# display new SL
myH1DataObject1[c("SL")]
```

After the update, the new dataset likens a situation where observers looked for three species (*Nephrops norvegicus*, *Cancer pagurus* and *Pandalus borealis*) with only one of them (*Nephrops norvegicus*) having been found in the sample. Running function \code{generateZerosUsingSL} the zeros for those additional species can be quickly added. 

```{r}
myH1DataObject1updte<-generateZerosUsingSL(myH1DataObject1)
myH1DataObject1updte$SA[,c(1:9,48:49)]
```
Note that the new rows have floating points values for SAid and SAseqNum (we use sprintf to ensure the decimal places are displayed). This facilitates the ordering of the samples and prevents overlaps when different datasets are joined. Also a SAunitName was created for the new rows that is identical to the SAid.
```{r}
sprintf(myH1DataObject1updte[['SA']]$SAid, fmt = '%.3f')
sprintf(myH1DataObject1updte[['SA']]$SAseqNum, fmt = '%.3f')
print(myH1DataObject1updte[['SA']]$SAunitName)
```


## ...