---
title: "Manipulating RDBESDataObjects"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Manipulating RDBESDataObjects}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

The aim of this document is to illustrate some of the ways of manipulating RDBESDataObjects.

```{r}
library(RDBEScore)
```

## Prerequisites

First we'll load some example data from the RDBES and check it's valid. It's a good idea to check that your RDBESDataObjects are valid after any manipulations you perform. To import your own data, see the vignette [Import RDBES data](raw-data-import.html). This vignette uses the example data shipped with the package.

```{r load H1}
# load Hierarchy 1 demo data
myH1RawObject <- H1Example

validateRDBESDataObject(myH1RawObject, verbose = FALSE)
```

### Print, sort and summary of RDBESDataObject

The print method gives a list of the non-null tables in the RDBESDataObject. The structure of the output for each table is:

- Table name (DE, TE, LE, etc.)
- Number of rows
- Sampling method (if applicable, SRSWR, NPJS, CENSUS, etc.)
- Range of number sampled (if applicable)
- Range of number total (if applicable)

If there is a single hierarchy present, the output is ordered by the RDBES hierarchy structure.

```{r}
print(myH1RawObject)
```

Underlying the print function is a summary function that retains only the unique rows for some of the columns used in print.
```{r}
h1summary <- summary(myH1RawObject)
# get the hierarchy
h1summary$hierarchy
# extract the number of rows in tables from the summary
sapply(h1summary$data, function(x){x$rows})
```

To get the **number of rows** in each non-null table you can simply call the object:

```{r}
myH1RawObject
# equivalent to print(summary(myH1RawObject))
```

To load example Hierarchy 5 data:

```{r}
# Hierarchy 5 demo data
myH5RawObject <- H5Example

validateRDBESDataObject(myH5RawObject, verbose = FALSE)

# Number of rows in each non-null table and hierarchy
print(myH5RawObject)
```

### Sorting RDBESDataObject

If the data has a single hierarchy in the DE table, a correct sort order is defined for it, and you can apply it with the **sort()** method. Printing the object sorts it automatically if possible.

```{r}
# before sorting
names(H8ExampleEE1)
```

```{r}
# after sorting
names(sort(H8ExampleEE1))
```

```{r}
# printing the summary
H8ExampleEE1
```

## Combining RDBESDataObjects

RDBESDataObjects can be combined using the **combineRDBESDataObjects()** function. This might be required when different sampling schemes are used to collect on-shore and at-sea samples, because all the data often need to be combined before further analysis.

```{r combine}
myCombinedRawObject <- combineRDBESDataObjects(myH1RawObject,
                                               myH5RawObject)

# Number of rows in each non-null table and hierarchies
print(myCombinedRawObject)

validateRDBESDataObject(myCombinedRawObject, verbose = FALSE)
```

## Filtering RDBESDataObjects

RDBESDataObjects can be filtered using the **filterRDBESDataObject()** function, which allows the RDBESDataObject to be filtered by any field. A typical use of filtering might be to extract all data collected in a particular ICES division.
```{r filter}
myH1RawObject <- H1Example

# Number of rows in each non-null table
print(myH1RawObject)

myFields <- c("SDctry","VDctry","VDflgCtry","FTarvLoc")
myValues <- c("ZW","ZWBZH","ZWVFA" )

myFilteredObject <- filterRDBESDataObject(myH1RawObject,
                                         fieldsToFilter = myFields,
                                         valuesToFilter = myValues )

# Number of rows in each non-null table
sapply(summary(myFilteredObject)$data, function(x){x$rows})

validateRDBESDataObject(myFilteredObject, verbose = FALSE)
```

It is important to note that filtering is likely to produce "orphan" rows, so it is usual to also apply the **findAndKillOrphans()** function to the filtered data to remove these records.

```{r clean1}
myFilteredObjectNoOrphans <- findAndKillOrphans(objectToCheck = myFilteredObject,
                                                verbose = FALSE)

validateRDBESDataObject(myFilteredObjectNoOrphans, verbose = FALSE)
```

You can also remove any records that do not link to a row in the VesselDetails (VD) table using the **removeBrokenVesselLinks()** function.

```{r clean2}
myFields <- c("VDlenCat")
myValues <- c("18-<24" )

myFilteredObject <- filterRDBESDataObject(myFilteredObjectNoOrphans,
                                         fieldsToFilter = myFields,
                                         valuesToFilter = myValues )

myFilteredObjectValidVesselLinks <- removeBrokenVesselLinks(
  objectToCheck = myFilteredObject,
  verbose = FALSE)

validateRDBESDataObject(myFilteredObjectValidVesselLinks, verbose = FALSE)
```

Finally, you can also remove any records that do not link to an entry in the SpeciesListDetails (SL) table using the **removeBrokenSpeciesListLinks()** function.
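To make the idea of a broken link concrete, here is a minimal base R sketch using two toy data frames. The table and column names (`speciesList`, `samples`, `listName`) are hypothetical and are not RDBES fields, and the matching rule shown is an assumption for illustration only; the package function works on the real SL table and may apply different rules.

```r
# Toy lookup table standing in for a species list (hypothetical columns).
speciesList <- data.frame(
  listName = c("listA", "listB"),
  species  = c("cod", "herring"),
  stringsAsFactors = FALSE
)

# Toy child table whose rows point at a species list by name.
samples <- data.frame(
  sampleId = 1:4,
  listName = c("listA", "listB", "listB", "listC"),
  stringsAsFactors = FALSE
)

# Filtering the lookup table leaves some samples pointing at nothing.
keptLists <- speciesList[speciesList$listName == "listA", ]

# A link is broken when the referenced name is absent from the kept lookup rows.
isBroken <- !(samples$listName %in% keptLists$listName)

# Keep only the samples whose link still resolves.
samplesValid <- samples[!isBroken, ]
print(samplesValid)
```

The package example that follows applies the same kind of clean-up to the real data through `removeBrokenSpeciesListLinks()`.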
```{r clean3}
myFields <- c("SLspeclistName")
myValues <- c("ZW_1965_SpeciesList" )

myFilteredObjectValidSpeciesLinks <- filterRDBESDataObject(
  myFilteredObjectValidVesselLinks,
  fieldsToFilter = myFields,
  valuesToFilter = myValues )

myFilteredObjectValidSpeciesLinks <- removeBrokenSpeciesListLinks(
  objectToCheck = myFilteredObjectValidSpeciesLinks,
  verbose = FALSE)

validateRDBESDataObject(myFilteredObjectValidSpeciesLinks, verbose = FALSE)
```

See also other package vignettes:

```{r echo=FALSE, results='asis'}
vignettes <- as.data.frame(browseVignettes(package = "RDBEScore")$RDBEScore)
for (i in seq_along(vignettes$PDF)) {
  cat(sprintf("- [%s](%s)\n", vignettes$Title[i], vignettes$PDF[i]))
}
#END
```