Generating Zeros using the Species List

Introduction

The aim of this document is to demonstrate how the Species List table (SL) of the RDBES can be used to complement the Sample table (SA) with zeros in cases where, e.g., a species was looked for but not found and therefore does not appear in the SA table. Adding these zeros is made easy by the function generateZerosUsingSL, available in the RDBEScore package.

Load the package

library(RDBEScore)
library(data.table)

Load and validate example data

# read an example dataset and simplify it to 1 trip and 1 haul [dev note: this section needs to be reworked when data and filterRDBESDataObject are updated]
data(Pckg_survey_apistrat_H1)
myH1DataObject1 <- Pckg_survey_apistrat_H1
# keep only the species list that belongs to this example dataset
myH1DataObject1$SL<-myH1DataObject1$SL[grepl("Pckg_survey_apistrat_H1", myH1DataObject1$SL$SLspeclistName),]
#myH1DataObject1<-filterAndTidyRDBESDataObject(myH1DataObject1, fieldsToFilter="FOid",valuesToFilter=70849, killOrphans = TRUE)
myH1DataObject1<-filterRDBESDataObject(myH1DataObject1, fieldsToFilter="SSid",valuesToFilter=227694, killOrphans = TRUE)
# check it is a valid RDBESobject
validateRDBESDataObject(myH1DataObject1, checkDataTypes = TRUE)

A closer look at the example data and its characteristics

The example uses data from hierarchy 1. It contains a single trip with a single haul. For simplicity, we restrict our analysis to the tables SL, SS and SA, which are the ones handled by the function whose behaviour we want to demonstrate.

Examining a print of the Species List (SL), one can conclude that the sampling targeted the landings of only one species. In this case the species was Nephrops norvegicus (AphiaID 107254).

myH1DataObject1[c("SL")]
#> $SL
#> Key: <SLid>
#>     SLid SLrecType  SLcou SLinst                             SLspeclistName
#>    <int>    <char> <char> <char>                                     <char>
#> 1: 47891        SL     ZW   4484 WGRDBES-EST_TEST_1_Pckg_survey_apistrat_H1
#>    SLyear SLcatchFrac SLcommTaxon SLsppCode
#>     <int>      <char>       <int>     <int>
#> 1:   1965         Lan      107254    107254

Examining a print of the Species Selection table (SS), one can confirm that only one fishing operation is present in the data (FOid 70849) and that landings were indeed sampled from it (for simplicity only a subset of columns is printed). Note that the variable SSuseCalcZero is set to “N” (i.e., No). This will have to be changed later on if we want zeros to be calculated.

myH1DataObject1[[c("SS")]][,c(1:15,19)]
#> Key: <SSid>
#>      SSid  LEid  FOid  TEid  FTid  SLid  OSid SSrecType SSseqNum
#>     <int> <int> <int> <int> <int> <int> <int>    <char>    <int>
#> 1: 227694    NA 70849    NA    NA 47891    NA        SS        1
#>    SSstratification SSstratumName SSclustering SSclusterName SSobsActTyp
#>              <char>        <char>       <char>        <char>      <char>
#> 1:                N             U            N             U        Sort
#>    SScatchFra SSuseCalcZero
#>        <char>        <char>
#> 1:        Lan             N

Given the above, if Nephrops norvegicus was sampled, it is expected to appear in the RDBES Sample table (SA). One can confirm this by printing that table (for simplicity only a subset of columns is printed).

myH1DataObject1[[c("SA")]][,c(1:9,48:49)]
#> Key: <SAid>
#>      SAid   SSid  LEid SArecType SAseqNum SAparSequNum SAstratification
#>     <num>  <int> <int>    <char>    <num>        <num>           <char>
#> 1: 572813 227694    NA        SA        1           NA                N
#>    SAstratumName SAspeCode SAtotalWtMes SAsampWtMes
#>           <char>    <char>        <int>       <int>
#> 1:             U    107254          276         276

Generating Zeros for species looked for but not reported in the SA table

Let's change the example by adding a couple of new species (Pandalus borealis and Cancer pagurus) to the Species List table (SL). We also change the variable SSuseCalcZero in the SS table to “Y” so that zeros can be calculated.

# first we replicate the single SL row so that the table has three rows
myH1DataObject1$SL<-rbind(myH1DataObject1$SL, myH1DataObject1$SL, myH1DataObject1$SL)
# then we update a few fields and reset the SL key
myH1DataObject1$SL[2:3,c("SLid","SLcommTaxon","SLsppCode")]<-data.frame(c(47892, 47893), c(107276, 107649), c(107276, 107649))
setkeyv(myH1DataObject1$SL,"SLid")
# change SSuseCalcZero to "Y"
myH1DataObject1$SS$SSuseCalcZero<-"Y"
# finally we make sure the object we created is a valid RDBES data object. No message is a good sign.
validateRDBESDataObject(myH1DataObject1, checkDataTypes = TRUE)
# display new SL
myH1DataObject1[c("SL")]
#> $SL
#> Key: <SLid>
#>     SLid SLrecType  SLcou SLinst                             SLspeclistName
#>    <int>    <char> <char> <char>                                     <char>
#> 1: 47891        SL     ZW   4484 WGRDBES-EST_TEST_1_Pckg_survey_apistrat_H1
#> 2: 47892        SL     ZW   4484 WGRDBES-EST_TEST_1_Pckg_survey_apistrat_H1
#> 3: 47893        SL     ZW   4484 WGRDBES-EST_TEST_1_Pckg_survey_apistrat_H1
#>    SLyear SLcatchFrac SLcommTaxon SLsppCode
#>     <int>      <char>       <int>     <int>
#> 1:   1965         Lan      107254    107254
#> 2:   1965         Lan      107276    107276
#> 3:   1965         Lan      107649    107649
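Before generating zeros, it can be useful to check which listed species are absent from the Sample table. The following is a minimal sketch of that check, assuming toy tables (`sl`, `sa`) that only mimic a few of the columns printed above; they are hypothetical stand-ins, not the full RDBES tables, and the check is not part of the RDBEScore package.

```r
library(data.table)

# Toy stand-ins for SL and SA with only the columns needed for the check
sl <- data.table(SLid = c(47891L, 47892L, 47893L),
                 SLsppCode = c(107254L, 107276L, 107649L))
sa <- data.table(SAid = 572813, SSid = 227694L, SAspeCode = "107254")

# SAspeCode is character while SLsppCode is integer, so convert before comparing
missingSpp <- setdiff(sl$SLsppCode, as.integer(sa$SAspeCode))
print(missingSpp)
#> [1] 107276 107649
```

The two codes returned are the species that were looked for but have no SA row, i.e., the ones for which zeros are expected.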

After the update, the new dataset resembles a situation where observers looked for three species (Nephrops norvegicus, Cancer pagurus and Pandalus borealis), with only one of them (Nephrops norvegicus) having been found in the sample. By running the function generateZerosUsingSL, the zeros for those additional species can be quickly added.

myH1DataObject1updte<-generateZerosUsingSL(myH1DataObject1)
myH1DataObject1updte$SA[,c(1:9,48:49)]
#> Key: <SAid>
#>      SAid   SSid  LEid SArecType SAseqNum SAparSequNum SAstratification
#>     <num>  <int> <int>    <char>    <num>        <num>           <char>
#> 1: 572813 227694    NA        SA    0.998           NA                N
#> 2: 572813 227694    NA        SA    0.999           NA                N
#> 3: 572813 227694    NA        SA    1.000           NA                N
#>    SAstratumName SAspeCode SAtotalWtMes SAsampWtMes
#>           <char>    <char>        <num>       <num>
#> 1:             U    107649            0           0
#> 2:             U    107276            0           0
#> 3:             U    107254          276         276
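Conceptually, the result above amounts to adding one zero-weight row per species that is listed in SL but missing from SA. The sketch below illustrates that idea on toy tables; it is an assumption-laden illustration, not the implementation of generateZerosUsingSL, and it deliberately leaves out how the package assigns SAid, SAseqNum and SAunitName to the new rows.

```r
library(data.table)

# Toy stand-ins (hypothetical, only a few of the real columns)
sl <- data.table(SLsppCode = c(107254L, 107276L, 107649L))
sa <- data.table(SAid = 572813, SSid = 227694L, SAseqNum = 1,
                 SAspeCode = "107254", SAtotalWtMes = 276, SAsampWtMes = 276)

# Species on the list that have no row in SA
missingSpp <- setdiff(sl$SLsppCode, as.integer(sa$SAspeCode))

# One row per missing species, copied from the existing sample row,
# then overwritten with the missing species code and zero weights
zeros <- sa[rep(1L, length(missingSpp))]
zeros[, `:=`(SAspeCode = as.character(missingSpp),
             SAtotalWtMes = 0, SAsampWtMes = 0)]

saFilled <- rbind(sa, zeros)
print(saFilled[, .(SSid, SAspeCode, SAtotalWtMes, SAsampWtMes)])
```

In practice the package function should be preferred, since it also takes care of identifiers and of the SSuseCalcZero setting.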

Note that the new rows have floating-point values for SAid and SAseqNum. In the print above they appear rounded, so we use sprintf below to ensure the decimal places are displayed. These values facilitate the ordering of the samples and prevent overlaps when different datasets are joined. Also, an SAunitName was created for the new rows that is identical to the SAid.

sprintf(myH1DataObject1updte[['SA']]$SAid, fmt = '%.3f')
#> [1] "572812.998" "572812.999" "572813.000"
sprintf(myH1DataObject1updte[['SA']]$SAseqNum, fmt = '%.3f')
#> [1] "0.998" "0.999" "1.000"
print(myH1DataObject1updte[['SA']]$SAunitName)
#> [1] "572812.998" "572812.999" "1"
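The identifiers of the zero rows sit just below the identifier of the original sample, which is why they sort ahead of it and only show up when enough decimal places are printed. A small self-contained sketch, assuming a plain numeric vector `ids` holding the three SAid values shown above:

```r
# The three SAid values from the output above, as a plain numeric vector
ids <- c(572812.998, 572812.999, 572813.000)

# Fixed three-decimal formatting makes the generated identifiers visible
shown <- sprintf("%.3f", ids)
print(shown)
#> [1] "572812.998" "572812.999" "572813.000"

# The generated rows sort before the original sample row
alreadySorted <- !is.unsorted(ids)
print(alreadySorted)
#> [1] TRUE
```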