| Title: | Functions for the ICES Regional Database and Estimation System (RDBES) |
|---|---|
| Description: | The RDBEScore package provides functions to import and work with fisheries data downloaded from the ICES RDBES database. It also contains functions to perform estimation analysis using the resulting objects. |
| Authors: | David Currie [aut] (ORCID: 0000-0002-3523-6895), Richard Meitern [aut] (ORCID: 0000-0002-2600-3002), Nuno Prista [aut] (ORCID: 0000-0002-5145-7241), Nicholas Carey [aut], Petri Sarvamaa [aut], Kirsten Birch Håkansson [aut], Karolina Molla Gazi [aut], Julia Wischnewski [aut], Ana Cláudia Fernandes [aut], Katarzyna Krakówka [aut], Marta Szymańska [aut], Nicolas Goñi [aut], Annica de Groote [ctb], Jonathan Ball [ctb], Jonathan Rault [ctb], Antti Sykkö [ctb], Liz Clarke [ctb], Chun Chen [ctb], Hongru Zhai [ctb], Eros Quesada [ctb], Jonathan Stounberg [ctb], Ana Ribeiro Santos [ctb], Jose Castro [ctb], Jessica Craig [ctb] |
| Maintainer: | Colin Millar |
| License: | GPL-3 + file LICENSE |
| Version: | 0.3.5 |
| Built: | 2026-05-06 09:16:55 UTC |
| Source: | https://github.com/ices-tools-dev/RDBEScore |
This function adds data from a CL table in an RDBESDataObject to a
BV or FM table. It combines information from the CS and CL tables and
calculates aggregate statistics such as the sum of the specified fields in the CL table.
addCLtoLowerCS(rdbes, strataListCS, strataListCL, combineStrata = T, lowerHierarchy = "C", CLfields = c("CLoffWeight"), verbose = FALSE)
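rdbes |
An object of class RDBESDataObject. |
strataListCS |
A named list of filter criteria for subsetting the CS data. |
strataListCL |
A named list of filter criteria for subsetting the CL data. |
combineStrata |
Logical; if TRUE, strata columns from the CS data are collapsed using a vertical bar. |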
lowerHierarchy |
A character string specifying the level of the lower hierarchy table to which the CL data will be added. Currently, only "C" is supported, i.e. BV data only. |
CLfields |
A character vector of field names from the CL table to be summed. |
verbose |
Logical; if TRUE, informative text is printed. |
The function first subsets the biological data in the RDBESDataObject based on the criteria in strataListCS. It then retrieves
the corresponding CL data based on the criteria in strataListCL, sums the fields specified in CLfields, and adds them as new columns
to the biological data. If combineStrata is TRUE, strata columns from the CS data are collapsed using a vertical bar (|). The function
currently supports only biological data at the "C" hierarchy level.
A data.table containing the biological data from the lower hierarchy with added strata information from the CL table and
the sum of the specified fields from the CL data.
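The subsetting, summing and strata-collapsing steps described above can be sketched in base R. This is a minimal illustration on a toy CL table, not the package's implementation; the values and the result name `sumCLoffWeight` are assumptions made for the example.

```r
# Toy CL table (hypothetical values; real CL tables have many more columns)
cl <- data.frame(
  CLarea = c("27.3.d.28.1", "27.3.d.28.1", "27.3.d.25"),
  CLquar = c(1, 1, 1),
  CLoffWeight = c(1200, 800, 500)
)
strataListCL <- list(CLarea = "27.3.d.28.1", CLquar = 1)

# Keep only the rows that match every criterion in the named list
keep <- rep(TRUE, nrow(cl))
for (field in names(strataListCL)) {
  keep <- keep & cl[[field]] %in% strataListCL[[field]]
}

# Sum the requested CL field over the matching rows
sumCLoffWeight <- sum(cl$CLoffWeight[keep])

# Collapse a multi-valued stratum into one label with a vertical bar
strataLabel <- paste(month.name[1:3], collapse = "|")
```

Here two of the three CL rows match the criteria, so the summed weight covers only those rows, and the three month names become a single combined stratum label.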
getLowerTableSubsets, upperTblData
## Not run:
strataListCS <- list(LEarea = "27.3.d.28.1", LEmetier6 = "OTM_SPF_16-31_0_0",
                     TEstratumName = month.name[1:3], SAspeCodeFAO = "SPR")
strataListCL <- list(CLarea = "27.3.d.28.1", CLquar = 1,
                     CLmetier6 = "OTM_SPF_16-31_0_0", CLspecFAO = "SPR")
biolCL <- addCLtoLowerCS(rdbesObject, strataListCS, strataListCL,
                         combineStrata = TRUE, lowerHierarchy = "C",
                         CLfields = c("CLoffWeight"))
## End(Not run)
Wrapper to generate probabilities. The wrapper first calls runChecksOnSelectionAndProbs, whose main tests must pass before probabilities can be calculated. It then calls generateProbs for each sample at each sampling level of the hierarchy.
applyGenerateProbs(x, probType, overwrite, runInitialProbChecks = TRUE, verbose = FALSE, strict = TRUE)
x |
|
probType |
|
overwrite |
|
runInitialProbChecks |
|
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
a list of all the RDBES data tables with probabilities calculated
runChecksOnSelectionAndProbs
generateProbs
# To be added
This function checks if a specified column exists in a given data table and has unique values. If the column does not exist or has non-unique values, an error is thrown.
check_key_column(dt, col)
dt |
A data table to check |
col |
A character string specifying the name of the column to check |
nothing if the column exists and has unique values, otherwise an error is thrown
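The behaviour described above (error if the column is missing or not unique, otherwise nothing) can be illustrated with a small base R stand-in. The function name `check_key_sketch` and its error messages are hypothetical; the internal package function may differ in detail.

```r
# Hypothetical stand-in for the documented behaviour of check_key_column()
check_key_sketch <- function(dt, col) {
  if (!col %in% names(dt)) {
    stop("Column '", col, "' not found")
  }
  if (anyDuplicated(dt[[col]]) > 0) {
    stop("Column '", col, "' has duplicate values")
  }
  invisible(NULL)
}

ok  <- data.frame(DEid = 1:3)          # unique key: passes silently
bad <- data.frame(DEid = c(1, 1, 2))   # duplicated key: raises an error

check_key_sketch(ok, "DEid")
result <- tryCatch(check_key_sketch(bad, "DEid"),
                   error = function(e) conditionMessage(e))
```

After running the sketch, `result` holds the error message raised for the duplicated key.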
## Not run:
RDBEScore:::check_key_column(H1Example$DE, "DEid")
## End(Not run)
Combines two RDBESDataObjects into a single RDBESDataObject by merging the individual tables one by one.
combineRDBESDataObjects(RDBESDataObject1, RDBESDataObject2, verbose = FALSE, strict = TRUE)
RDBESDataObject1 |
The first object to combine |
RDBESDataObject2 |
The second object to combine |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
When combining RDBESDataObjects from different hierarchies (e.g., H1 and H5), a warning is issued. The resulting combined object will have a mixed hierarchy, which may be structurally and statistically invalid for some analyses. However, such combinations can be useful for fisheries overviews, annual reports, or countries performing broader estimations.
the combination of RDBESDataObject1 and RDBESDataObject2
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19")
myH5RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h5_v_1_19")
myCombinedRawObject <- combineRDBESDataObjects(RDBESDataObject1 = myH1RawObject,
                                               RDBESDataObject2 = myH5RawObject)
## End(Not run)
Loads a raw object and creates a prepared object. The function relies on the data being correctly named following the established hierarchy.
createDBEPrepObj(input, output)
input |
a string pointing towards the input folder |
output |
a string pointing towards the output folder |
.Rdata files
## Not run:
input <- "WKRDB-EST2/testData/output/DBErawObj/"
output <- "WKRDB-EST2/subGroup1/personal/John/PreparedOutputs/"
createDBEPrepObj(input = input, output = output)
## End(Not run)
This function lets you create an RDBES Data object in your current R environment.
createRDBESDataObject(input = NULL, listOfFileNames = NULL, castToCorrectDataTypes = TRUE, verbose = FALSE, ...)
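input |
Strings or a list of data frames: path(s) to zip file(s), a path to a folder of csv files, or a named list of data frames. A NULL input returns an empty object. |
listOfFileNames |
A named list of file names, used when the csv files do not have the default RDBES names, e.g. list("DE" = "DE.csv", "SD" = "SD.csv"). |
castToCorrectDataTypes |
Logical. If TRUE, the columns are cast to the correct data types. The default is TRUE. |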
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
... |
Additional parameters forwarded to helper functions used by this
function. Most commonly these are forwarded to
|
The input should be either:
A zip file downloaded from RDBES (or multiple zip files if you want to include or overwrite tables, for example CL and CE data). NOTE: Only the downloaded
RDBES data with Table data format with ids is loaded by this function and not the uploaded format.
A folder containing csv files downloaded from RDBES (e.g. the unzipped file), or any set of csv files of the RDBES tables.
A list of data frames in the current environment representing different tables in the hierarchy.
A NULL input will return an empty RDBES data object.
ZIP file inputs
This input should be a path to a zip file downloaded from RDBES. Multiple
zip files can be entered if you want to include additional tables, for
example CL and CE, e.g. input = c("path/to/H1.zip", "path/to/CL.zip"). If
any tables in the first input are overwritten by other inputs a warning is
given. You should not input different hierarchy files; this function will not
combine them.
If the zip contains multiple hierarchies (e.g., H1 and H5 within the same
archive), you can select which one to import by passing Hierarchy via
..., for example: Hierarchy = 1. If Hierarchy is not specified and the
zip contains multiple hierarchies, an error is raised prompting you to set it.
CSV file inputs
This input should be a path to a folder of csv files. These can be the
csv files downloaded from RDBES (e.g. an unzipped hierarchy), or any set
of csv files containing RDBES tables. If the files do not have the default
RDBES name (e.g. 'Design.csv') the listOfFileNames input can be used to
specify the file names e.g. list("DE" = "DE.csv", "SD" = "SD.csv", etc.).
List of data frames inputs
This input should be a list object containing data frames (or
data.tables) for each table in your hierarchy. They should be named with the
appropriate 2-letter code (DE, SD, etc.). Columns within these tables
will be renamed to the RDBES model documentation 'R name'. Note that if you choose
to create an RDBESDataObject from local data frames, these may not have
passed the data integrity checks performed when you upload to RDBES!
NULL inputs
This input produces an empty RDBESDataObject, i.e. all tables with
correct data classes but the tables will be empty.
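The input forms described above can be sketched as follows. All paths are hypothetical placeholders, and the two-entry `listOfFileNames` is only illustrative. Each package call is guarded so that the snippet runs even when RDBEScore is not installed or the files are absent.

```r
# Hypothetical paths; replace them with your own RDBES downloads
zipInputs <- c("path/to/H1.zip", "path/to/CL.zip")
csvFolder <- "path/to/unzipped_H1"
fileNames <- list("DE" = "DE.csv", "SD" = "SD.csv")

if (requireNamespace("RDBEScore", quietly = TRUE)) {
  # NULL input: an empty object with correctly typed, empty tables
  emptyObj <- RDBEScore::createRDBESDataObject(input = NULL)

  # Zip input(s): only attempted if the placeholder files exist
  if (all(file.exists(zipInputs))) {
    zipObj <- RDBEScore::createRDBESDataObject(input = zipInputs)
  }

  # Folder of csv files with non-default file names
  if (dir.exists(csvFolder)) {
    csvObj <- RDBEScore::createRDBESDataObject(
      input = csvFolder,
      listOfFileNames = fileNames
    )
  }
}
```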
An RDBESDataObject
# Create an empty object
myEmptyRDBESObject <- createRDBESDataObject(input = NULL)
Creates an RDBESEstObject from RDBES data
createRDBESEstObject(rdbesPrepObject, hierarchyToUse = NULL, stopTable = NULL, verbose = FALSE, strict = TRUE, incDesignVariables = TRUE)
rdbesPrepObject |
The RDBES object that should be used to create an estimation object |
hierarchyToUse |
The upper RDBES hierarchy to use |
stopTable |
(Optional) The table to stop at in the RDBES hierarchy. If specified, only tables up to and including this table will be included in the resulting RDBESEstObject. The default is NULL, which means all tables in the hierarchy will be included. |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
incDesignVariables |
(Optional) Should the design variables be included? The default is TRUE. |
An object of class RDBESEstObject ready for use in design based estimation
myH1EstObj <- createRDBESEstObject(H1Example, 1, "SA")
For examples, for now see https://github.com/ices-eg/WK_RDBES/tree/master/WKRDB-EST2/chairs/Nuno
createTableOfRDBESIds(x, addSAseqNums = TRUE)
x |
An RDBESDataObject |
addSAseqNums |
should SAseqNum be included? Default value is TRUE |
data frame of Ids of all tables in sampling hierarchy
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19_13")
myTableOfIds <- createTableOfRDBESIds(myH1RawObject)
## End(Not run)
The default file names when you download data from the RDBES.
DefaultFileNames
A list containing the default file names for each RDBES table
A dataset containing the RDBES "design variable" names
designVariables
A vector containing the short R names of the RDBES design variables (without any 2-letter table prefixes):
R field name: The design variable names
This function estimates catch at number (CANUM) for a specified biological variable, such as age or length. It aggregates data based on specified columns and generates a "plus group" for the highest value in the defined classes. The function supports grouping by various units (e.g., age, length, weight) and calculates required indices, totals, and proportions for the groups.
doBVestimCANUM(bv, addColumns, classUnits = "Ageyear", classBreaks = 1:8, verbose = FALSE)
bv |
A |
addColumns |
A character vector of additional column names used to group the data for aggregation (e.g., |
classUnits |
A character string specifying the class units of the biological variable to use for grouping (e.g., "Ageyear", "Lengthmm", "Weightg"). Default is "Ageyear". |
classBreaks |
A numeric vector specifying the breakpoints for classifying the biological variable. The last value defines the lower bound of the "plus group". Default is 1:8. |
verbose |
Logical; if TRUE, informative text is printed. Default is FALSE. |
The function performs the following steps:
Validates the presence of the classUnits in the biological variable data.
Reshapes the input data using dcast and groups the biological variable into classes using cut().
Aggregates mean weights and lengths by the defined classes, along with calculating proportions and indices based on the sample size.
A "plus group" is created for values exceeding the highest classBreaks value.
Calculates total weights, catch numbers, and performs a sanity check to ensure there are no rounding errors in the final results.
Let:
$\bar{w}_g$ be the mean weight for each group.
$\bar{l}_g$ be the mean length for each group.
$n_g$ be the number of weight measurements in each group.
$n$ be the total number of measurements in the sample.
$p_g$ be the proportion of the sample represented by each group.
$WI_g$ be the weight index for each group.
$WI_{sum}$ be the sum of weight indices across all groups.
$W$ be the total catch weight.
$W_g$ be the total weight for each group.
$N_g$ be the total catch number for each group.
The calculations are as follows:
Proportion of sample: $p_g = n_g / n$
Weight Index: $WI_g = p_g \, \bar{w}_g$
Sum of Weight Indices: $WI_{sum} = \sum_g WI_g$
Total Weight Coefficient: $c = W / WI_{sum}$
Total Weight per Group: $W_g = c \, WI_g$
Total Catch Number per Group: $N_g = W_g / \bar{w}_g$
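The listed quantities can be traced on a toy sample in base R. This is a sketch under stated assumptions, not the package code. It assumes that the weight index is the sample proportion times the mean weight, that the total-weight coefficient is the total catch weight divided by the sum of indices, and that all weights share one unit (the package may apply unit conversions not shown here). All values and variable names are hypothetical.

```r
# Toy sample: ages and individual weights (hypothetical values)
age    <- c(1, 1, 2, 2, 2, 3, 5, 9, 10)
weight <- c(10, 12, 20, 22, 24, 30, 50, 90, 100)
classBreaks <- 1:4  # the last value (4) is the lower bound of the plus group

# An open upper bound puts every age >= 4 into the plus group
ageClass <- cut(age, breaks = c(classBreaks, Inf), right = FALSE,
                labels = c("1", "2", "3", "4+"))

meanWeight <- tapply(weight, ageClass, mean)    # mean weight per group
nGroup     <- tapply(weight, ageClass, length)  # measurements per group
propSample <- nGroup / sum(nGroup)              # proportion of the sample

weightIndex      <- propSample * meanWeight     # assumed form of the index
weightIndexSum   <- sum(weightIndex)
totalCatchWeight <- 1e6                         # assumed total catch weight
twCoef           <- totalCatchWeight / weightIndexSum
totWeight        <- weightIndex * twCoef        # total weight per group
totNum           <- totWeight / meanWeight      # catch number per group

# Sanity check: group weights add back up to the total catch weight
stopifnot(isTRUE(all.equal(sum(totWeight), totalCatchWeight)))
```

The final check mirrors the sanity check mentioned in the steps: because the coefficient rescales the indices, the per-group weights sum to the total catch weight up to floating-point tolerance.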
A data.table containing the aggregated results, including groupings, calculated means, proportions, indices, and totals for the specified biological variable.
Generates the DBE estimation object for the upper hierarchy tables
doDBEestimantionObjUpp(inputList)
inputList |
All the data tables in a named list. Name should be equal to the short table names e.g. DE, SD, TE, FO. |
The upper hierarchy tables in the DBE estimation object (DBEestimantionObjUpp)
## Not run:
H1 <- readRDS("./WKRDB-EST2/testData/output/DBErawObj/DBErawObj_DK_1966_H1.rds")
H1out <- doDBEestimantionObjUpp(H1)
## End(Not run)
Create design-based point and variance estimates from RDBES estimation object (rdbesEstimObj)
doDBestimation(x = rdbesEstimObj, estimateType = "total", pointEstimator = "Unbiased", varEstimator = "WRonPSUviaPik", stage = 0, domainOfinterest = NULL)
x |
a data.frame (or data.table) in rdbesEstimObj format with value of target variable in column targetValue |
estimateType |
a string with type of estimate. As of now only "total" is defined |
pointEstimator |
a string with type of point estimator. As of now only "Unbiased" is defined |
varEstimator |
a string with type of variance estimator. As of now only "WRonPSUviaPik" is defined |
stage |
a natural number (0,1,..) with sampling stage of estimate. 0 corresponds to DE level. |
domainOfinterest |
list of domains of interest (e.g., SAarea). As of now only NULL (= no domain estimate) is defined |
a list of values for pointEstimate, varEstimate and estimation options
## Not run:
data(shrimps)
doDBestimation(x = shrimps, estimateType = "total", pointEstimator = "Unbiased",
               varEstimator = "WRonPSUviaPsi", stage = 0, domainOfinterest = NULL)
## End(Not run)
Estimate totals and means, and try to generate sample variances for all strata in an RDBESEstObject
doEstimationForAllStrata(RDBESEstObjectForEstim, targetValue, verbose = FALSE)
RDBESEstObjectForEstim |
The RDBESEstObject to generate estimates for |
targetValue |
The field to estimate for, for example "SAsampWtLive" |
verbose |
(Optional) If set to TRUE more detailed text will be printed out by the function. Default is FALSE |
A data frame containing estimates for all strata
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19")
# Update our test data with some random sample measurements
myH1RawObject[["SA"]]$SAsampWtLive <- round(runif(n = nrow(myH1RawObject[["SA"]]), min = 1, max = 100))
myH1EstObj <- createRDBESEstObject(myH1RawObject, 1)
myStrataEst <- doEstimationForAllStrata(
  RDBESEstObjectForEstim = myH1EstObj,
  targetValue = "SAsampWtLive"
)
## End(Not run)
The function is under development and does not work yet.
doEstimationRatio(RDBESDataObj, targetValue = "LengthComp", raiseVar = "Weight", classUnits = "mm", classBreaks = c(100, 300, 10), LWparam = NULL, lowerAux = NULL, verbose = FALSE)
RDBESDataObj |
A validated RDBESDataObject containing hierarchical sampling and biological data. Must include appropriate tables (e.g., CL, CE, SA, FM, or BV) depending on estimation requirements. |
targetValue |
A character string specifying the type of composition to estimate. Options are "LengthComp" or "AgeComp". |
raiseVar |
The variable used to construct the ratio. |
classUnits |
Units of the class intervals for length or age, typically "mm" for millimeters or "cm" for centimeters. Used in defining class intervals. |
classBreaks |
A numeric vector of three values: minimum value, maximum value, and class width (e.g., c(100, 300, 10)). Defines the class intervals for grouping lengths or ages. |
LWparam |
A numeric vector of length two specifying parameters (a, b) for the weight-length relationship (W = a * L^b). Used if no direct weights are available but lengths are provided. |
lowerAux |
A numeric or character vector referencing a variable in the SA table used as an auxiliary variable for ratio estimation (e.g., sample weights, sub-sample expansion factors). |
verbose |
Logical; if TRUE, detailed messages are printed during processing. |
A list or data.table containing the estimated numbers at length or age and associated mean values such as weight and length, depending on input and target type.
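Two of the arguments above lend themselves to a small worked example: the (minimum, maximum, width) form of classBreaks and the weight-length relationship W = a * L^b behind LWparam. The sketch below is base R only and does not call doEstimationRatio, which is still under development. The parameter values are hypothetical, and expanding classBreaks with `seq()` is an assumption about how the triple is meant to be read.

```r
# classBreaks as (minimum, maximum, class width), expanded to class boundaries
classBreaks <- c(100, 300, 10)
breaks <- seq(classBreaks[1], classBreaks[2], by = classBreaks[3])

# Hypothetical weight-length parameters (a, b) for W = a * L^b
LWparam <- c(a = 5e-6, b = 3)

# Toy length measurements in mm
lengths <- c(105, 150, 299)

# Predicted weights from the weight-length relationship
weights <- LWparam[["a"]] * lengths^LWparam[["b"]]

# Assign each length to a left-closed class such as [100,110)
lengthClass <- cut(lengths, breaks = breaks, right = FALSE)
```

With these settings there are 21 class boundaries from 100 to 300, and each length falls in the 10 mm class whose lower bound it meets or exceeds.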
Generic function for estimation of population total and variance
estim(y, enk, enkl, method = "SRSWOR", estFunction, varFunction, verbose = FALSE)
y |
numeric variable to be estimated |
enk |
expected value of k |
enkl |
expected value of k, given l |
method |
character selection method code, e.g. SRSWOR |
estFunction |
the function to use to estimate total given parameters y and enk |
varFunction |
the function to use to estimate variance given parameters y,enk and enkl |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
list of 7 elements including the population mean, total (and their variance), the algorithm name used and the first-order inclusion probabilities
estimMC(c(3, 4, 4, 5), c(4, 4, 4, 4), c(8, 8, 8, 8))
Multiple Count Estimator for Population Total and Variance
estimMC(y, sampled, total, method = "SRSWOR", selProb = NULL, incProb = NULL, verbose = FALSE)
y |
numeric variable to be estimated |
sampled |
numeric total number of units sampled |
total |
numeric total number of units in the population |
method |
character selection method code, e.g. SRSWOR |
selProb |
the selection probabilities (if known) |
incProb |
the inclusion probabilities (if known) |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
list of 7 elements including the population mean, total (and their variance), the algorithm name used and the first-order inclusion probabilities
estimMC(c(3, 4, 4, 5), c(4, 4, 4, 4), c(8, 8, 8, 8))
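For orientation, the textbook design-based estimator for simple random sampling without replacement (SRSWOR) can be computed by hand for the same inputs as the example call. This is a sketch of the standard formulas, assuming 4 of 8 units were sampled. It does not reproduce the package's internal code, and the variable names are chosen for illustration.

```r
y <- c(3, 4, 4, 5)  # observed values
n <- 4              # number of units sampled
N <- 8              # number of units in the population

# First-order inclusion probability under SRSWOR
incProb <- n / N

# Estimated population total and mean
totalHat <- sum(y / incProb)
meanHat  <- totalHat / N

# Textbook SRSWOR variance of the estimated total,
# with finite population correction (1 - n / N)
varTotalHat <- N^2 * (1 - n / N) * var(y) / n
```

With half of the population sampled, each observation represents two units, so the estimated total is twice the sample sum.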
This function transforms the estimation results into the InterCatch format.
exportEstimationResultsToInterCatchFormat(dataToExport, verbose = FALSE)
dataToExport |
A data frame containing the estimation results. This should include the output from the doEstimationForAllStrata function and already have the InterCatch columns present. |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
A character vector representing the flattened InterCatch exchange format. The vector includes all fields from the HI, SI, and SD components, ordered by their associated keys, and is suitable for writing to an InterCatch-formatted exchange file.
This function filters an RDBESDataObject based on specified fields and values, and can optionally remove any orphan records.
The returned object will include all rows which either: a) do not include any of the field names in fieldsToFilter, or b) do include the field names and have one of the allowed values in valuesToFilter.
If killOrphans is set to TRUE, the function will remove orphaned rows. The default is TRUE.
filterAndTidyRDBESDataObject(RDBESDataObjectToFilterAndTidy, fieldsToFilter, valuesToFilter, killOrphans = TRUE, verbose = FALSE, strict = TRUE)
RDBESDataObjectToFilterAndTidy |
The RDBESDataObject to filter. |
fieldsToFilter |
A vector of the field names you wish to check. |
valuesToFilter |
A vector of the field values you wish to filter for. |
killOrphans |
Controls if orphan rows are removed. Default is TRUE. |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
The filtered input object of the same class as RDBESDataObjectToFilterAndTidy.
## Not run:
myH1RawObject <- createRDBESDataObject(input = "tests\\testthat\\h1_v_1_19_13")
# To check how removeBrokenVesselLinks() works
myH1RawObject$VD$VDlenCat[which(myH1RawObject$VD$VDencrVessCode == "VDcode_10")] <- "VL40XX"
myFields <- c("VSencrVessCode", "VDlenCat")
myValues <- c("VDcode_1", "VDcode_2", "VDcode_10", "VL1518", "VL2440")
myFilteredObject <- filterAndTidyRDBESDataObject(myH1RawObject,
  fieldsToFilter = myFields,
  valuesToFilter = myValues,
  killOrphans = TRUE,
  verbose = TRUE
)
## End(Not run)
The returned object will include all rows which either: a) do not include
any of the field names in fieldsToFilter, or b) do include the field names
and have one of the allowed values in valuesToFilter.
If you want to filter on an id field such as DEid or FTid, the filtering
works only on the table where the id field is the key. For example, if you
try to filter on FOid, the function does not look for FOid in other tables such as FT,
although the field FOid exists in the FT table.
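The filter rule can be illustrated on a plain list of data frames. This is a hedged sketch, not the package implementation. The helper `filter_sketch` is hypothetical. It also assumes that, when several listed fields occur in the same table, a row must match on all of them. The special handling of id fields is not modelled.

```r
# Toy "object": a named list of tables (hypothetical values)
tables <- list(
  SD = data.frame(SDid = 1:3, SDctry = c("ZW", "ZW", "XX")),
  SA = data.frame(SAid = 1:2, SAspeCode = c("A", "B"))
)

# Hypothetical helper: tables without any listed field are returned unchanged;
# tables with a listed field keep only rows holding an allowed value
filter_sketch <- function(tables, fieldsToFilter, valuesToFilter) {
  lapply(tables, function(tbl) {
    present <- intersect(fieldsToFilter, names(tbl))
    for (field in present) {
      tbl <- tbl[tbl[[field]] %in% valuesToFilter, , drop = FALSE]
    }
    tbl
  })
}

filtered <- filter_sketch(tables, fieldsToFilter = "SDctry", valuesToFilter = "ZW")
```

In this toy case the SD table loses the row with country "XX", while the SA table is untouched because it has no SDctry column. SA rows that belonged to the removed SD row would now be orphans, which is what the killOrphans option addresses.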
filterRDBESDataObject(RDBESDataObjectToFilter, fieldsToFilter, valuesToFilter, killOrphans = FALSE, verbose = FALSE, strict = TRUE)
RDBESDataObjectToFilter |
The RDBESDataObject to filter |
fieldsToFilter |
A vector of the field names you wish to check |
valuesToFilter |
A vector of the field values you wish to filter for |
killOrphans |
Controls if orphan rows are removed. Default is FALSE. |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
killOrphans allows you to remove orphaned rows if set to TRUE. The
default is FALSE.
the filtered input object of the same class as
RDBESDataObjectToFilter
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19")
myFields <- c("SDctry", "VDctry", "VDflgCtry", "FTarvLoc")
myValues <- c("ZW", "ZWBZH", "ZWVFA")
myFilteredObject <- filterRDBESDataObject(myH1RawObject,
  fieldsToFilter = myFields,
  valuesToFilter = myValues
)
# Inverse filtering (exclude certain values)
# Example: keep all DE rows except those with DEid in `excludedValues`
# Compute the complement of the excluded set using setdiff
allValues <- unique(myH1RawObject$DE$DEid)
excludedValues <- c(5351)
myInverseFiltered <- filterRDBESDataObject(
  myH1RawObject,
  fieldsToFilter = "DEid",
  valuesToFilter = setdiff(allValues, excludedValues)
)
## End(Not run)
The returned object will include all rows which include the field names
and have one of the allowed values in valuesToFilter.
filterRDBESEstObject(RDBESEstObjectToFilter, fieldsToFilter, valuesToFilter, verbose = FALSE)
RDBESEstObjectToFilter |
The RDBESEstObject to filter |
fieldsToFilter |
A vector of the field names you wish to check |
valuesToFilter |
A vector of the field values you wish to filter for |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
the filtered input object of the same class as
RDBESEstObjectToFilter
## Not run:
myRawObject <- createRDBESDataObject(input = "tests\\testthat\\h1_v_1_19_26")
myEstObject <- createRDBESEstObject(myRawObject, 1)
myFilteredEst <- filterRDBESEstObject(myEstObject, c("BVid"), c(7349207))
## End(Not run)
This function finds and removes any orphan records in an RDBESDataObject. Normally, data downloaded from the RDBES will not contain orphan records. However, if the data is subsequently filtered, orphan records can be introduced.
findAndKillOrphans(objectToCheck, verbose = FALSE, strict = TRUE)
objectToCheck |
an RDBESDataObject. |
verbose |
(Optional) If set to TRUE more detailed text will be printed out by the function. Default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
an RDBESDataObject with any orphan records removed
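An orphan record is a child row whose parent row no longer exists. The idea can be shown with two toy tables in base R. This is a minimal sketch, not the package's logic, and the table contents are hypothetical.

```r
# Parent table after filtering: only FO records 1 and 2 remain
parent <- data.frame(FOid = c(1, 2))

# Child table still references FOid 3, which no longer exists
child <- data.frame(SSid = 1:4, FOid = c(1, 2, 3, 3))

# A child row is an orphan when its foreign key has no match in the parent
isOrphan <- !(child$FOid %in% parent$FOid)

# "Killing" the orphans means dropping those rows
childClean <- child[!isOrphan, , drop = FALSE]
```

In a full hierarchy the same check would be repeated table by table, since removing rows from one table can create new orphans further down.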
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19")
myFields <- c("SDctry", "VDctry", "VDflgCtry", "FTarvLoc")
myValues <- c("ZW", "ZWBZH", "ZWVFA")
myFilteredObject <- filterRDBESDataObject(myH1RawObject,
  fieldsToFilter = myFields,
  valuesToFilter = myValues
)
myObjectNoOrphans <- findAndKillOrphans(objectToCheck = myFilteredObject,
                                        verbose = FALSE)
## End(Not run)
Internal function to identify orphan records in a given RDBESDataObject table
findOrphansByTable(tableToCheck, objectToCheck, foreignKeyIds, verbose = FALSE)
tableToCheck |
The two letter code for the table to check |
objectToCheck |
An RDBESDataObject |
foreignKeyIds |
A vector of the foreign key field names to check |
verbose |
(Optional) If set to TRUE more detailed text will be printed out by the function. Default is FALSE. |
A data frame with the primary keys of the table checked, the two letter table identifier, and their orphan status.
Fixes SLid in SL table (facilitating SS-SL joins).
fixSLids(RDBESDataObject, verbose = FALSE, validate = TRUE, strict = TRUE)
RDBESDataObject |
A valid RDBESDataObject |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
validate |
(Optional) Should the function validate its input data? The default is TRUE. |
strict |
(Optional) If the function validates its input data - should the validation be strict? The default is TRUE. |
The RDBES SL table can be seen as a join of two tables: one that identifies the species list in terms of SLcou * SLinst * SLspeclistName * SLyear * SLcatchFrac, and one that specifies the taxa (SLcommTaxon * SLsppCode) in the list. In SS, SLid refers to the first taxon in a species list and not, as would be expected, to the species list itself. This function fixes this by creating a new SLtaxaId variable in SL and assigning all taxa in a species list to a single SSid.
an RDBESDataObject with SL ids reworked
# To add
Generate any missing SS rows. When FOcatchReg=="All" it is expected that SScatchFraction is either "Catch" OR "Lan"+"Dis". In the latter case, if one is missing the other is to be assumed 0. This function generates SS rows for any missing catch fractions.
generateMissingSSRows(RDBESDataObject, speciesListName, verbose = FALSE, strict = TRUE)
RDBESDataObject |
A valid RDBESDataObject |
speciesListName |
The name of the Species List you want to use for any SS rows that are created. |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
A data table of SS data with any missing rows added
# To follow
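Until a package example is available, the idea of completing missing catch fractions can be sketched in base R. This is an illustration under simplifying assumptions, not the package's implementation: it ignores FOcatchReg and the "Catch" fraction, the toy SS table contains only two columns, and the helper `missingFraction` is hypothetical.

```r
# Toy SS table: FO 1 has both fractions, FO 2 only landings, FO 3 only discards.
SS <- data.frame(
  FOid = c(1, 1, 2, 3),
  SScatchFraction = c("Lan", "Dis", "Lan", "Dis"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: FOids that have no SS row for the given catch fraction.
missingFraction <- function(ss, fraction) {
  allFO <- unique(ss$FOid)
  setdiff(allFO, ss$FOid[ss$SScatchFraction == fraction])
}

# Build placeholder rows for every missing "Lan" or "Dis" fraction.
newRows <- do.call(rbind, lapply(c("Lan", "Dis"), function(fr) {
  ids <- missingFraction(ss = SS, fraction = fr)
  data.frame(FOid = ids, SScatchFraction = rep(fr, length(ids)),
             stringsAsFactors = FALSE)
}))

# Append the new rows and sort so each FO's fractions sit together.
SScomplete <- rbind(SS, newRows)
SScomplete <- SScomplete[order(SScomplete$FOid, SScomplete$SScatchFraction), ]
print(SScomplete)
```

After completion every fishing operation in the toy table has one "Lan" and one "Dis" row.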
Generate NAs in samples using Species List information
generateNAsUsingSL(RDBESDataObject, targetAphiaId, overwriteSampled = TRUE, validate = TRUE, verbose = FALSE, strict = TRUE)
RDBESDataObject |
An RDBESDataObject. |
targetAphiaId |
a vector of aphiaId. |
overwriteSampled |
(Optional) should SAtotalWtMes and SAsampWtMes be set to 0 if spp recorded but absent from SL? The default is TRUE. |
validate |
(Optional) Set to TRUE if you want validation to be carried out. The default is TRUE. |
verbose |
(Optional) Set to TRUE if you want informative text on validation printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function can validate its input data - should the validation be strict? The default is TRUE. |
RDBES data object where SA was complemented with NAs for species not looked for (sensu in SL)
# To be added
Generate vector of selection or inclusion probabilities
generateProbs(x, probType, verbose = FALSE)
x |
RDBES data object |
probType |
"selection" or "inclusion" for selection and inclusion probabilities respectively |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
When the selection method is SRSWR, selection probabilities are
calculated as $1/N$ and inclusion probabilities as
$1 - (1 - 1/N)^n$, where $N$ is the total number of units and $n$ is the
number sampled. When the selection method is SRSWOR, selection
probabilities are not currently implemented; inclusion probabilities are
calculated as $n/N$. When the selection method is CENSUS, both types of
probabilities are set to 1. Probabilities for the selection methods UPSWR and
UPSWOR are not calculated (they need to be supplied by the user). The same
applies to non-probabilistic methods.
A vector of probabilities
## Not run:
generateProbs(x = Pckg_SDAResources_agstrat_H1[["VS"]], probType = "inclusion")
# population size
a <- generateProbs(x = Pckg_SDAResources_agstrat_H1[["VS"]], probType = "inclusion")
sum(1 / a$VSincProb)
# returns error
generateProbs(x = Pckg_SDAResources_agstrat_H1[["VS"]], probType = "selection")
## End(Not run)
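The calculations described for generateProbs can be sketched for a single sampling table as follows. This is a hedged illustration rather than the package code: the helper `sketchProbs` is hypothetical, `N` and `n` stand for the total and sampled number of units, and the formulas are the standard textbook expressions for simple random sampling, assumed here to match what the function computes.

```r
# Hypothetical helper: N = total number of units, n = number sampled.
sketchProbs <- function(N, n, selMeth, probType) {
  if (selMeth == "CENSUS") return(1)
  if (selMeth == "SRSWR" && probType == "selection") return(1 / N)
  if (selMeth == "SRSWR" && probType == "inclusion") return(1 - (1 - 1 / N)^n)
  if (selMeth == "SRSWOR" && probType == "inclusion") return(n / N)
  stop("not implemented in this sketch: ", selMeth, " / ", probType)
}

sketchProbs(N = 100, n = 10, selMeth = "SRSWOR", probType = "inclusion")
sketchProbs(N = 100, n = 10, selMeth = "SRSWR", probType = "inclusion")

# With SRSWOR, the inverse inclusion probabilities of the n sampled units
# sum to the population size N.
incProb <- rep(sketchProbs(100, 10, "SRSWOR", "inclusion"), 10)
sum(1 / incProb)
```

The last line mirrors the "population size" check in the package example: summing the inverse inclusion probabilities recovers N.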
Private function to generate SS rows
generateSSRows(FOids, speciesListName, catchFra)
FOids |
Vector of FOids |
speciesListName |
Name of the species list |
catchFra |
The catch fraction to create |
SS data frame
Generates a named list of data tables that follow the structure of an RDBESDataObject. The tables only have the columns required for testing.
generateTestTbls(tblNames, prevTbls = list(), ...)
tblNames |
character vector of table names to be created |
prevTbls |
list of data.tables upstream of the generated table. Defaults to empty list |
... |
Arguments passed on to
|
a list of named data.table's
## Not run:
generateTestTbls(c("A", "B", "C"), selMeth = "SRSWOR")
generateTestTbls(LETTERS[1:5]) # makes 5 tables with method CENSUS
## End(Not run)
For examples, for now see https://github.com/ices-eg/WK_RDBES/tree/master/WKRDB-EST2/chairs/Nuno
generateZerosUsingSL(x, verbose = FALSE, strict = TRUE)
x |
RDBES data frame |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
RDBES data frame where SA was complemented with species looked for (sensu in sampling objectives) but not registered in sample
Private function used by doEstimationForAllStrata to get the estimates
getEstimForStratum(x)
x |
The input |
Data frame with estimated values
The getLinkedDataFromLevel function facilitates the retrieval of linked data between different levels of RDBES tables. Depending on the relative positions of the source and target tables within the RDBESDataObject, the function determines whether to traverse "up" or "down" the data hierarchy to obtain the desired linked data.
getLinkedDataFromLevel(field, values, rdbesTables, level, verbose = FALSE)
field |
A character string specifying the field name from which to retrieve linked data. The first two characters of this field indicate the source table. |
values |
A vector of values corresponding to the specified field. |
rdbesTables |
An RDBESDataObject. |
level |
A character string specifying the target table level from which to retrieve linked data. This must be one of the names within the rdbesTables object. |
verbose |
Logical flag indicating whether to print detailed information about the data retrieval process. Default is FALSE. |
The subset of the table at the specified level.
## Not run:
# Example 1: Going up in the table hierarchy to retrieve data from the DE table
# Retrieve data from the DE level based on BVid from the BV table
# This returns 1 row from the DE table
getLinkedDataFromLevel("BVid", c(1), H8ExampleEE1, "DE", TRUE)

# Example 2: Going down in the table hierarchy to retrieve data from the SA table
# Retrieve data from the SA level based on DEid from the DE table
# This returns 15 rows from the SA table
getLinkedDataFromLevel("DEid", c(1), H8ExampleEE1, "SA", TRUE)

# Example 3: Going up in the table hierarchy to see the Vessel that caught a specific fish
# Retrieve data from the VS level based on BVfishId from the BV table
getLinkedDataFromLevel("BVfishId", c("410472143", "410472144"), H8ExampleEE1, "VS", TRUE)
## End(Not run)
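The "going up" traversal can be pictured as repeatedly following foreign keys from a child table to its parent. The sketch below is a simplified base-R illustration, not the package's algorithm: the three toy tables, the assumption that each table carries its parent's `<XX>id` column, and the helper `goUp` are all hypothetical, and only upward traversal is covered.

```r
# Toy three-level hierarchy: DE -> SA -> BV (names mimic RDBES, data made up).
tables <- list(
  DE = data.frame(DEid = 1:2),
  SA = data.frame(SAid = 1:4, DEid = c(1, 1, 2, 2)),
  BV = data.frame(BVid = 1:6, SAid = c(1, 1, 2, 3, 4, 4))
)

# Hypothetical helper: walk up from the selected child rows to the linked
# rows of an ancestor table, one level at a time.
goUp <- function(tables, from, to, ids) {
  path <- names(tables)
  lvl <- match(from, path)
  fromKey <- paste0(from, "id")
  rows <- tables[[from]][tables[[from]][[fromKey]] %in% ids, , drop = FALSE]
  while (path[lvl] != to) {
    parent <- path[lvl - 1]
    key <- paste0(parent, "id")
    rows <- tables[[parent]][tables[[parent]][[key]] %in% rows[[key]], , drop = FALSE]
    lvl <- lvl - 1
  }
  rows
}

# Which DE row do the biological records 4 and 5 belong to?
linkedDE <- goUp(tables, from = "BV", to = "DE", ids = c(4, 5))
print(linkedDE)
```

Going down the hierarchy works the same way in reverse, matching the parent's id column in each successive child table.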
This function takes a list of subsets, a target lower level table name, and a list of tables. It returns a unique data frame containing the rows of the target lower level table that are associated with the given values of the upper table field in each subset. The function can also add the subset values to the result for reference.
getLowerTableSubsets(subsets, tblName, rdbesTables, combineStrata = TRUE, verbose = FALSE)
subsets |
A named list of vectors. Each vector contains values for a specific upper table field. |
tblName |
A character string specifying the name of the target lower level table. |
rdbesTables |
An RDBESDataObject containing the tables. |
combineStrata |
A logical value indicating whether to include the strata information in the result.
If |
verbose |
A logical value indicating whether to print informative text. |
The function recursively intersects the rows of the target lower level table that match the values from each subset in the upper tables. It then ensures that only unique rows are returned, based on the ID column of the target table.
A unique data frame containing the rows of the target lower level table that are associated with
the given values of the upper table field in each subset. If combineStrata = TRUE, the result will also include
a column for each subset with the corresponding collapsed values.
Private function to find which FO rows are not matching SS
getMissingSSCatchFraction(FOdata, SSdata, catchFra, verbose)
FOdata |
The FOdata |
SSdata |
The SSdata |
catchFra |
The catchfra |
verbose |
verbose or not? |
Vector of FOids that aren't matching SS rows
Returns the tables for a given hierarchy
getTablesInRDBESHierarchy(hierarchy, includeOptTables = TRUE, includeLowHierTables = TRUE, includeTablesNotInSampHier = TRUE, verbose = FALSE)
hierarchy |
Integer value between 1 and 13 inclusive |
includeOptTables |
Include any optional tables? Default value is TRUE |
includeLowHierTables |
Include the lower hierarchy tables? Default value is TRUE |
includeTablesNotInSampHier |
Include tables that aren't sampling units in that hierarchy? Default value is TRUE |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
A vector containing the 2-letter names of the tables in the requested hierarchy
getTablesInRDBESHierarchy(5)
A dataset containing test RDBES data for H1 in the RDBESDataObject structure
H1Example
A list containing entries required for H1 RDBES data:
the Design data table
the Sampling Details data table
the Vessel Selection data table
the Fishing Trip data table
the Fishing Operation data table
the Species Selection data table
the Sample data table
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
the Individual Species table
A dataset containing test RDBES data for H5 in the RDBESDataObject structure
H5Example
A list containing entries required for H5 RDBES data:
the Design data table
the Sampling Details data table
the Fishing Trip data table
the Onshore Event data table
the Landing Event data table
the Species Selection data table
the Sample data table
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
the Individual Species table
This dataset has not passed the RDBES upload checks, so the object might be somewhat invalid; however, it resembles real data from the Estonian Market Sampling for 2022 for 2 species.
H7Example
A list containing entries required for H7 RDBES data:
the Design data table
the Sampling Details data table
the Onshore Sample data table
the Landing Event data table
the Species Selection data table
the Sample data table
the Biological Variable data table
the Species List data table
the Individual Species table
Richard Meitern @ Estonian Marine Institute, 2025
This dataset has not passed the RDBES upload checks, so the object might be somewhat invalid; however, it resembles real data from the Estonian Baltic Trawling fleet for 2022 sprat total landings and commercial sampling.
H8ExampleEE1
A list containing entries required for H8 RDBES data:
the Design data table
the Sampling Details data table
the Temporal Event data table
the Vessel Selection data table
the Landing Event data table
the Species Selection data table
the Sample data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
the Individual Species table
the Commercial Landing data table
the Commercial Effort data table
Richard Meitern @ Estonian Marine Institute, 2025
A dataset containing a copy of the ICES 'Species (WoRMS)' code list. The latest code list can be downloaded from https://vocab.ices.dk/.
icesSpecWoRMS
A data frame with the following columns:
GUID of the code type in ICES Vocabulary (e.g. the 'Species (WoRMS)' list).
Numeric ID of the code type.
GUID identifying this code record.
Numeric ID of this code record.
AphiaID (numeric key from WoRMS).
Scientific name.
(If present) additional description; often not used.
Datetime the record was last modified at ICES.
Logical; whether this code is deprecated at ICES.
Date the snapshot was downloaded, e.g. "2023-10-18".
Internal function to remove orphan records from an RDBESDataObject
killOrphans(objectToCheck, orphansToRemove)
objectToCheck |
an RDBESDataObject |
orphansToRemove |
The output from the findOrphansByTable function (A data frame with the primary keys of the table checked, the two letter table identifier, and their orphan status.) |
RDBESDataObject with orphan records removed
This function extracts the list of functions contained within a specified R package and retrieves a brief description for each function from the package documentation. The description is obtained by parsing the Rd file associated with each function to extract the text from the \title field. In addition, the function determines whether each function is exported from the package by comparing against the package’s exported names.
listPackageFunctions(pkg)
pkg |
A character string or an unquoted name specifying the package from which to extract functions. The package must be installed and accessible. |
The function first accesses the package namespace using asNamespace and retrieves all objects using ls. It filters these objects to include only functions. For each function, the associated help file is retrieved using utils::help and the Rd file is extracted with utils:::.getHelpFile. The internal helper function getRdTitle is then used to parse the Rd object and extract the text in the \title field. Finally, the function assembles the output into a data frame that also includes an indicator of whether each function is exported.
A data frame with three columns. The Function column lists the names of the functions found in the package. The Description column contains the brief descriptions extracted from each function’s documentation, and the Exported column is a logical vector indicating whether the function is exported from the package.
## Not run:
# Extract functions from the stats package along with their descriptions and export status.
tab <- listPackageFunctions("stats")
print(tab)
## End(Not run)
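The namespace-inspection part of this approach can be sketched with base R alone. The helper `listFunctionsSketch` below is hypothetical and deliberately simplified: it lists the functions in a namespace and flags which are exported, but it does not parse the Rd files, so it has no Description column.

```r
# Hypothetical, simplified variant of the approach described above:
# functions and export status only, no Rd title parsing.
listFunctionsSketch <- function(pkg) {
  ns <- asNamespace(pkg)
  objs <- ls(ns)
  # Keep only the objects that are functions
  isFun <- vapply(objs, function(nm) is.function(get(nm, envir = ns)), logical(1))
  funs <- objs[isFun]
  data.frame(
    Function = funs,
    Exported = funs %in% getNamespaceExports(pkg),
    stringsAsFactors = FALSE
  )
}

tab <- listFunctionsSketch("stats")
head(tab)
```

Comparing against `getNamespaceExports()` is what separates the public interface from internal helpers that exist only inside the namespace.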
Generate a Data Table
makeTbl(tblName, prevTbls = list(), rows = 4, propSamp = 0.5, selMeth = "CENSUS", stratums = c("U"), mean = 5)
tblName |
Name of the table |
prevTbls |
list of data.tables upstream of the generated table. Defaults to empty list |
rows |
numeric number of rows per parent record. Defaults to 4. |
propSamp |
numeric proportion of how many of total are sampled. This is ignored for "CENSUS". Defaults to 0.5 |
selMeth |
character selection method used. Defaults to "CENSUS". Others such as "SRSWR" or "SRSWOR" can be used as well |
stratums |
character vector of the stratum names to be created. Defaults to c("U"), meaning not stratified. |
mean |
numeric the expected mean of the target variable.
The variable is created using |
a data.table
A dataset containing the mapping from database column names to R field names
mapColNamesFieldR
A data frame containing database field names and their equivalent R field name:
The two letter prefix of the relevant RDBES table
The database field names
The equivalent R field name
The equivalent R data type (e.g. "integer", "character" etc)
The Data type in the RDBES documentation (e.g. "Decimal", etc)
Is this column considered essential?
...
Constructor for RDBESDataObject class
newRDBESDataObject(DE = NULL, SD = NULL, VS = NULL, FT = NULL, FO = NULL, TE = NULL, LO = NULL, OS = NULL, LE = NULL, SS = NULL, SA = NULL, FM = NULL, BV = NULL, VD = NULL, SL = NULL, IS = NULL, CL = NULL, CE = NULL, verbose = FALSE)
DE |
Data table of RDBES DE data or NULL |
SD |
Data table of RDBES SD data or NULL |
VS |
Data table of RDBES VS data or NULL |
FT |
Data table of RDBES FT data or NULL |
FO |
Data table of RDBES FO data or NULL |
TE |
Data table of RDBES TE data or NULL |
LO |
Data table of RDBES LO data or NULL |
OS |
Data table of RDBES OS data or NULL |
LE |
Data table of RDBES LE data or NULL |
SS |
Data table of RDBES SS data or NULL |
SA |
Data table of RDBES SA data or NULL |
FM |
Data table of RDBES FM data or NULL |
BV |
Data table of RDBES BV data or NULL |
VD |
Data table of RDBES VD data or NULL |
SL |
Data table of RDBES SL data or NULL |
IS |
Data table of RDBES IS data or NULL |
CL |
Data table of RDBES CL data or NULL |
CE |
Data table of RDBES CE data or NULL |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
a named list
This data set is derived from the data(agstrat) used in Lohr examples 3.2 and 3.6. Table VS is stratified with VSstratumName set to agstrat$region, and VSnumberSampled and VSnumberTotal set according to agstrat. VSunitName is set to a combination of the original agstrat$county, agstrat$state, agstrat$region and agstrat$agstrat row numbers. Table SA contains the measured variable agstrat$acres92 in SAtotalWeightMeasured and SAsampleWeightMeasured, with SAconversionFactorMeasLive set to 1. Tables DE, SD, FT and FO are for the most part dummy tables inserted to meet RDBES model requirements, to be aggregated during estimation tests. Mandatory fields have dummy values taken from an onboard programme, with the exception of selectionMethod, which is set to CENSUS. BV, FM, CL, and CE are not provided. SL and VD are subset to the essential rows.
Pckg_SDAResources_agstrat_H1
A list containing entries required for H1 RDBES data:
the Design data table. Contains dummy values with exception of selectionMethod that is set to CENSUS
the Sampling Details data table. Contains dummy values
the Vessel Selection data table. Contains core information of data(agstrat), VSstratumName set to agstrat$region, and VSnumberSampled and VSnumberTotal set according to agstrat, VSunitName is set to a combination of original agstrat$county, agstrat$state, agstrat$region and agstrat$agstrat row numbers
the Fishing Trip data table. Contains dummy values
the Fishing Operation data table. Contains dummy values
the Species Selection data table. Contains dummy values
the Sample data table. Contains the variable measured agstrat$acres92 in SAtotalWeightMeasured, SAsampleWeightMeasured and SAconversionFactorMeasLive set to 1
the Frequency Measure data table. Not provided
the Biological Variable data table. Not provided
the Vessel Details data table. Subset to the essential rows
the Species List data table. Subset to the essential rows
the Individual Species table
https://CRAN.R-project.org/package=SDAResources
This data set is derived from the Academic Performance Index computed for all California schools based on standardized testing of students. The original data sets contain information for all schools with at least 100 students and for various probability samples of the data. The design is 2-stage cluster sampling with clusters of unequal sizes. An SRS of 40 districts is selected (psus) from the 757 districts in the population and then up to 5 schools (min
were selected from each district (ssus).
Pckg_survey_apiclus2_H1
A list containing entries required for H1 RDBES data:
the Design data table. Contains 1 DE row with DEstratumName == "Pckg_SDAResources_apiclus2_H1"
the Sampling Details data table. Contains 1 child SD row
the Vessel Selection data table. Contains 40 child rows (the 40 districts), VSnumberTotal is 757, VSnumberSampled is 40
the Fishing Trip data table. Contains 126 child rows (the 126 schools finally observed), each associated to its cluster (dname), FTnumberTotal is the number of schools in district, FTnumberSampled is 1...5 schools sampled
the Fishing Operation data table. Just 1:1 links to the final data (in SA)
the Species Selection data table. Just 1:1 links to the final data (in SA)
the Sample data table. SAsampleWeightMeasured is enroll (NB! there are 4 NAs)
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
the Individual Species table
https://CRAN.R-project.org/package=survey
This data set is a stratified version of the previous "apiclus2" data. It is derived from the Academic Performance Index computed for all California schools based on standardized testing of students. The original data sets contain information for all schools with at least 100 students and for various probability samples of the data. The design is 1-stage cluster sampling with clusters of unequal sizes. An SRS of 200 districts is selected (psus) from the 755 districts in the population. All schools within district are selected (ssus).
Pckg_survey_apistrat_H1
A list containing entries required for H1 RDBES data:
the Design data table. Contains 1 DE row
the Sampling Details data table. Contains 1 child SD row
the Vessel Selection data table. Contains 200 child rows (the 200 schools finally observed), each associated to its cluster (dname), VSnumberTotalClusters is 755, VSnumberTotal is 50-100 schools sampled
the Fishing Trip data table. Contains 200 child rows (the 200 schools finally observed), each associated to its cluster (dname), FTnumberTotal is the number of schools in the cluster (census)
the Fishing Operation data table. Just 1:1 links to the final data (in SA)
the Species Selection data table. Just 1:1 links to the final data (in SA)
the Sample data table. SAsampleWeightMeasured is enroll
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table. Contains 311 child rows
the Species List data table. Contains 1 child row
the Individual Species table
https://CRAN.R-project.org/package=survey
Private function to get sub-sample level and top-level SAid for SA data
prepareSubSampleLevelLookup(SAdata)
SAdata |
The SA data to check |
A data.table with SAid, topLevelSAid and subSampleLevel
This method prints the hierarchy of the DE data.table (if it exists), and the number of rows for each data.table in the RDBESDataObject that is not NULL. It also provides the sampling method and number sampled and number total for tables where it is applicable. If the RDBESDataObject has a mixed hierarchy, a warning message is printed.
This method sorts the RDBESDataObject based on the hierarchy.
This method returns a list containing the hierarchy of the DE data.table, the number of rows for each data.table in the RDBESDataObject that is not NULL, and a logical value indicating if the hierarchy is not NULL.
## S3 method for class 'RDBESDataObject'
print(x, ...)
## S3 method for class 'RDBESDataObject'
sort(x, decreasing = TRUE, ...)
## S3 method for class 'RDBESDataObject'
summary(object, ...)
x |
An object of class RDBESDataObject. |
... |
parameters passed to underlying functions (not used currently) |
decreasing |
should hierarchy tables be the first ones |
object |
An object of class RDBESDataObject. |
None.
The sorted RDBESDataObject by hierarchy.
A list with three elements:
hierarchy: The hierarchy of the DE data.table in the RDBESDataObject.
rows: A named list where the names are the names of the data.tables in the RDBESDataObject and the values are the number of rows in each data.table. NULL values are excluded.
CS: A logical value indicating if the hierarchy is not NULL.
# Print the package data object
print(H1Example)
# Sort the package data
sort(H8ExampleEE1)
# Get summary of the package data
summary(H1Example)
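How a summary method of this kind can be attached to a list-based class is sketched below. The class `toyDataObject`, its contents and the method body are hypothetical stand-ins, not the RDBESDataObject code; the sketch only shows S3 dispatch and the exclusion of NULL tables from the row counts.

```r
# Hypothetical toy class holding a few tables, one of which is NULL.
toyObj <- structure(
  list(
    DE = data.frame(DEid = 1, DEhierarchy = 1),
    SD = data.frame(SDid = 1:2),
    VS = NULL
  ),
  class = c("toyDataObject", "list")
)

# S3 summary method: report the hierarchy and the row count of each
# non-NULL table.
summary.toyDataObject <- function(object, ...) {
  present <- Filter(Negate(is.null), unclass(object))
  list(
    hierarchy = unique(object$DE$DEhierarchy),
    rows = lapply(present, nrow)
  )
}

s <- summary(toyObj)
str(s)
```

Because `summary()` is a generic, calling it on the object dispatches to the class-specific method without any change to the calling code.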
Private function to process the lower hierarchies when creating the RDBESEstObject
procRDBESEstObjLowHier(rdbesPrepObject, verbose = FALSE)
rdbesPrepObject |
A prepared RDBESRawObj |
verbose |
logical. Output messages to console. |
allLower - the FM and BV tables combined
Private function to process the upper hierarchies when creating the RDBESEstObject
procRDBESEstObjUppHier(myRDBESEstObj = NULL, rdbesPrepObject, hierarchyToUse, i = 1, targetTables, verbose = FALSE)
myRDBESEstObj |
An RDBESEstObj to add data to |
rdbesPrepObject |
A prepared RDBESDataObject |
hierarchyToUse |
The hierarchy we are using |
i |
Integer to keep track of where we are in the list of tables |
targetTables |
The RDBES tables we are interested in |
verbose |
logical. Output messages to console. |
A partial RDBESEstObject with the data from the upper hierarchy
Remove rows which do not point to a valid SpeciesListDetails (SL) record, i.e. those rows which have a value of SpeciesListName that does not exist in the SL table.
removeBrokenSpeciesListLinks(objectToCheck, verbose = FALSE, strict = TRUE)
objectToCheck |
an RDBESDataObject. |
verbose |
(Optional) If set to TRUE more detailed text will be printed out by the function. Default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
an RDBESDataObject with any rows with an invalid SpeciesListName removed
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19")
myFields <- c("SLspeclistName")
myValues <- c("WGRDBES-EST TEST 5 - sprat data")
myFilteredObject <- filterRDBESDataObject(myH1RawObject,
  fieldsToFilter = myFields,
  valuesToFilter = myValues
)
myObjectValidSpeciesListLinks <- removeBrokenSpeciesListLinks(
  objectToCheck = myFilteredObject,
  verbose = FALSE
)
## End(Not run)
Remove rows which do not point to a valid VesselDetails (VD) record, i.e. those rows which have a value of VDid that does not exist in the VD table.
removeBrokenVesselLinks(objectToCheck, verbose = FALSE, strict = TRUE)
objectToCheck |
an RDBESDataObject. |
verbose |
(Optional) If set to TRUE more detailed text will be printed out by the function. Default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
an RDBESDataObject with any records with an invalid VDid removed
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19")
myFields <- c("VDlenCat")
myValues <- c("18-<24")
myFilteredObject <- filterRDBESDataObject(myH1RawObject,
  fieldsToFilter = myFields,
  valuesToFilter = myValues
)
myObjectValidVesselLinks <- removeBrokenVesselLinks(
  objectToCheck = myFilteredObject,
  verbose = FALSE
)
## End(Not run)
This function runs some basic checks on the selection methods and probabilities of the different sampling tables of a hierarchy. It should be run ahead of generateProbs to ensure its correct execution, and for that reason it is included in the wrapper applyGenerateProbs.
runChecksOnSelectionAndProbs(x, verbose = FALSE, strict = TRUE)
x |
|
verbose |
|
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
nothing
applyGenerateProbs
generateProbs
For examples, for now see https://github.com/ices-eg/WK_RDBES/tree/master/WKRDB-EST2/chairs/Nuno
For a given RDBESDataObject convert the required columns to the correct data types. (This function can cause an error if we have data in the columns that can't be cast to the desired data type.)
setRDBESDataObjectDataTypes(RDBESDataObjectToConvert)
RDBESDataObjectToConvert |
list - the raw item for conversion |
An RDBESDataObject with the correct data types for the required fields
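As a rough illustration of what converting columns to their expected types involves, the base R sketch below casts the columns of a toy table using a hypothetical lookup of converter functions. The table contents and the `converters` list are assumptions made for this example and are not taken from the package. Note that base R turns values it cannot cast into NA with a warning, which may differ from how the package reports such problems.

```r
# Hypothetical raw table in which every column was read in as character
raw <- data.frame(
  DEyear = c("2021", "2022"),
  DEsampled = c("Y", "N"),
  stringsAsFactors = FALSE
)

# Hypothetical mapping of column name to the function that performs the cast
converters <- list(DEyear = as.integer, DEsampled = as.character)

converted <- raw
for (col in names(converters)) {
  # Apply the matching converter to each listed column
  converted[[col]] <- converters[[col]](raw[[col]])
}
str(converted)
```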
A dataset of rdbesEstimObj type containing simplified haul-level samples (rows) of shrimp landings (targetValue, in kg) observed onboard using H1 of RDBES with UPWOR on vessels. Data is provided for developing/testing purposes only.
shrimps
A data frame with 10 rows and 95 variables:
DEsamplingScheme - Sampling Scheme
DEyear - Year of data collection
DEstratumName - Fishery code
DEhierarchyCorrect - Design Variable of RDBES. More details in RDBES documentation
DEhierarchy - Design Variable of RDBES. More details in RDBES documentation
DEsampled - Design Variable of RDBES. More details in RDBES documentation
DEreasonNotSampled - Design Variable of RDBES. More details in RDBES documentation
SDcountry - Country that collected the data
SDinstitution - Institution that collected the data
su1, su2, su3, su4, su5 - sampling units of RDBES. More details in RDBES documentation
XXXnumberSampled, ... - Design Variables of RDBES. More details in RDBES documentation
targetValue - estimate of weight landed in each haul (in kg)
plus XX other columns
Nuno Prista @ SLU Aqua, 2022
A dataset of rdbesEstimObj type containing simplified haul-level samples (rows) of shrimp catches (targetValue, in kg) observed onboard using H1 of RDBES with UPWOR on vessels. Catches are divided into three strata (91, 92, 93_94) that correspond to sorting sieves used onboard. Data is provided for developing/testing purposes only.
shrimpsStrat
A data frame with 10 rows and 95 variables:
DEsamplingScheme - Sampling Scheme
DEyear - Year of data collection
DEstratumName - Fishery code
DEhierarchyCorrect - Design Variable of RDBES. More details in RDBES documentation
DEhierarchy - Design Variable of RDBES. More details in RDBES documentation
DEsampled - Design Variable of RDBES. More details in RDBES documentation
DEreasonNotSampled - Design Variable of RDBES. More details in RDBES documentation
SDcountry - Country that collected the data
SDinstitution - Institution that collected the data
su1, su2, su3, su4, su5 - sampling units of RDBES. More details in RDBES documentation
XXXnumberSampled, ... - Design Variables of RDBES. More details in RDBES documentation
su5stratumName - sieve fraction
targetValue - estimate of weight fraction in each haul (in kg)
plus XX other columns
Nuno Prista @ SLU Aqua, 2022
A data frame containing the tables required for each RDBES hierarchy
tablesInRDBESHierarchies
A data frame containing the tables required for each RDBES hierarchy.
the hierarchy this applies to (H1 to H13)
the 2-letter table name
is this a lower hierarchy table?
is this table optional within the hierarchy?
is this table a sampling unit within the hierarchy?
the table sort order within the hierarchy
https://github.com/davidcurrie2001/MI_RDBES_ExchangeFiles
The function checks the rank of the aphia ID in both the SA and SL tables, and tries to replace a more specific rank in SA with a broader rank from SL. There are 3 possible situations:
If the aphia ID in the SA table is more specific than in the SL table, and both aphia IDs are from the same kingdom, phylum, class, order, family, or genus, then the aphia ID in the SA table is changed to the aphia ID from the SL table.
If the aphia IDs are from a different kingdom, phylum, class, order, family, or genus, then the function retains the original SA species code.
If the SL table has a more specific rank than the SA table, the function also retains the original SA species code.
updateSAwithTaxonFromSL( RDBESDataObject, validate = TRUE, verbose = FALSE, strict = TRUE )
RDBESDataObject |
An RDBESDataObject. |
validate |
Set to TRUE if you want validation to be carried out. The default is TRUE. |
verbose |
(Optional) Set to TRUE if you want informative text on validation printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function can validate its input data - should the validation be strict? The default is TRUE. |
An RDBES data object in which species in SA have been renamed to the taxon occurring in SL where SL holds a broader rank. For example, if SA contains Sprat (126425) and SL contains Clupeidae (125464), the function renames Sprat in SA to Clupeidae. Clupeidae (family rank) is a higher rank than Sprat (species rank).
## Not run:
myObject <- createRDBESDataObject(input = "WGRDBES-EST/personal/Kasia/vignettes/vignetteData")
updateSAwithTaxonFromSL(RDBESDataObject = myObject, validate = TRUE, verbose = FALSE, strict = TRUE)
## End(Not run)
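The three situations described for this function can be sketched in base R as a small decision helper. This is a simplified illustration only: `chooseSAcode`, the `taxa` lookup table and the taxon with ID 999999 are hypothetical, and lineage is compared at family level only, whereas the text above lists every rank from kingdom to genus.

```r
# Ranks ordered from broadest to most specific
rankOrder <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Toy taxonomy lookup; the 999999 row is a made-up taxon for illustration
taxa <- data.frame(
  aphiaId = c(126425, 125464, 999999),
  name    = c("Sprat", "Clupeidae", "hypothetical other family"),
  rank    = c("Species", "Family", "Family"),
  family  = c("Clupeidae", "Clupeidae", "OtherFamily"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the aphia ID that SA should end up with
chooseSAcode <- function(saId, slId, taxa) {
  sa <- taxa[taxa$aphiaId == saId, ]
  sl <- taxa[taxa$aphiaId == slId, ]
  saLevel <- match(sa$rank, rankOrder)
  slLevel <- match(sl$rank, rankOrder)
  # Situation 3: SL is at least as specific as SA, so keep the SA code
  if (slLevel >= saLevel) {
    return(saId)
  }
  # Simplification: only a family-level SL entry is compared here
  sameLineage <- sl$rank == "Family" && sa$family == sl$family
  # Situation 1: same lineage, use the broader SL code
  # Situation 2: different lineage, keep the SA code
  if (sameLineage) slId else saId
}

chooseSAcode(126425, 125464, taxa)  # Sprat in SA, Clupeidae in SL
chooseSAcode(126425, 999999, taxa)  # SL entry from a different family
chooseSAcode(125464, 126425, taxa)  # SL more specific than SA
```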
RDBESDataObject is in a Valid Format
Perform basic checks on an object.
validateRDBESDataObject( objectToCheck, checkDataTypes = FALSE, verbose = FALSE, strict = TRUE )
checkRDBESDataObject( objectToCheck, checkDataTypes = FALSE, verbose = FALSE, strict = TRUE )
objectToCheck |
RDBESDataObject i.e. a list of data.tables |
checkDataTypes |
(Optional) Set to TRUE if you want to check that the data types of the required columns are correct, or FALSE if you don't care. Default value is FALSE. |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) Set to TRUE if you want an error if validation fails, set to FALSE if you want only a warning to be issued. The default is TRUE. |
Checks whether the 'objectToCheck' parameter is valid. Returns the parameter if it is valid; otherwise it stops with an error (or, when strict = FALSE, only issues a warning).
It checks that:
the object is of class RDBESDataObject
the tables don't have column names that aren't allowed
the tables have all the required column names
It does not check if the data is valid. The RDBES upload system performs an extensive set of checks on the uploaded data.
Returns objectToCheck
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests/testthat/h1_v_1_19")
validateRDBESDataObject(myH1RawObject)
## End(Not run)
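The difference between strict = TRUE (error) and strict = FALSE (warning) described for the strict argument can be pictured with a small base R sketch. `checkHasColumns` and the toy table are hypothetical and only mimic the pattern; they are not the package's implementation.

```r
# Hypothetical helper: complain about missing columns, return the input if fine
checkHasColumns <- function(tbl, required, strict = TRUE) {
  missingCols <- setdiff(required, names(tbl))
  if (length(missingCols) > 0) {
    msg <- paste("Missing columns:", paste(missingCols, collapse = ", "))
    # strict = TRUE stops execution; strict = FALSE only warns
    if (strict) stop(msg) else warning(msg)
  }
  invisible(tbl)
}

toy <- data.frame(DEyear = 2022)
required <- c("DEyear", "DEsampled")

# Capture what kind of condition each mode raises
strictResult <- tryCatch(
  checkHasColumns(toy, required),
  error = function(e) "error"
)
lenientResult <- tryCatch(
  checkHasColumns(toy, required, strict = FALSE),
  warning = function(w) "warning"
)
```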
Checks the data types of the columns in an RDBESDataObject against an expected list of data types. Any differences are returned.
validateRDBESDataObjectDataTypes(objectToCheck)
objectToCheck |
An RDBESDataObject to check |
A data frame containing any data type differences (an empty data frame if there are no differences)
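Comparing actual column types against an expected list, and returning only the mismatches, might look like the base R sketch below. The `expected` table, its column names and the toy data are assumptions for illustration; the package's own lookup of expected types is not reproduced here.

```r
# Toy table in which DEyear was read in as character
tbl <- data.frame(
  DEyear = c("2021", "2022"),
  DEsampled = c("Y", "N"),
  stringsAsFactors = FALSE
)

# Hypothetical list of expected types per field
expected <- data.frame(
  field = c("DEyear", "DEsampled"),
  expectedType = c("integer", "character"),
  stringsAsFactors = FALSE
)

# Record the first class of each column as its actual type
actual <- data.frame(
  field = names(tbl),
  actualType = vapply(tbl, function(col) class(col)[1], character(1)),
  stringsAsFactors = FALSE
)

# Join on field name and keep only the rows where the types disagree
comparison <- merge(expected, actual, by = "field")
differences <- comparison[comparison$expectedType != comparison$actualType, ]
print(differences)
```

An empty `differences` data frame would mean that every listed column already has its expected type.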
Check RDBES Raw Object Content
Private function to do some basic checks on the content of the RDBESDataObject (e.g. all required field names are present). The function is only used by checkRDBESDataObject and should only be passed a list of non-null objects.
validateRDBESDataObjectDuplicates( objectToCheck, verbose = FALSE, strict = TRUE )
objectToCheck |
|
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) Set to TRUE if you want to be sure all columns are present in the data, set to FALSE if you only want to check that essential columns are present. The default is TRUE. |
A list whose first element is the object and whose second element contains the warnings.
Check RDBES Data Object field names
Private function to do some checks on the columns of an RDBESDataObject: 1) are all required fields present? 2) are there any extra fields present? It is used by validateRDBESDataObject() and should only be passed a list of non-null objects.
validateRDBESDataObjectFieldNames( objectToCheck, verbose = FALSE, strict = TRUE )
objectToCheck |
|
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) Set to TRUE if you want to be sure all columns are present in the data, set to FALSE if you only want to check that essential columns are present. The default is TRUE. |
A list whose first element is a boolean indicating validity and whose second element contains any warnings.
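The two questions this check asks (are required fields missing, are extra fields present) and the shape of the returned list can be sketched in base R as follows. `checkFieldNames`, the list element names `valid` and `warnings`, and the toy inputs are hypothetical and chosen for this example.

```r
# Hypothetical helper returning list(valid = <boolean>, warnings = <character>)
checkFieldNames <- function(tbl, required) {
  missingFields <- setdiff(required, names(tbl))
  extraFields <- setdiff(names(tbl), required)
  warnings <- character(0)
  if (length(missingFields) > 0) {
    warnings <- c(warnings,
      paste("Missing fields:", paste(missingFields, collapse = ", ")))
  }
  if (length(extraFields) > 0) {
    warnings <- c(warnings,
      paste("Extra fields:", paste(extraFields, collapse = ", ")))
  }
  list(valid = length(warnings) == 0, warnings = warnings)
}

# Toy table with one required field missing and one unexpected field
res <- checkFieldNames(
  data.frame(DEyear = 2022, foo = 1),
  required = c("DEyear", "DEsampled")
)
print(res)
```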
Check whether an object is a valid RDBESEstObject
validateRDBESEstObject(objectToCheck, verbose = FALSE)
objectToCheck |
The object to check |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
Whoever revises this function please specify what it returns here
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests/testthat/h1_v_1_19")
myEStObj <- createRDBESEstObject(myH1RawObject,1)
validateRDBESEstObject(myEStObj)
## End(Not run)
A dataset containing aphia records for species found in icesSpecWoRMS
wormsAphiaRecord
A data frame
E.g. 100684
E.g. "https://www.marinespecies.org/aphia.php?p=taxdetails&id=100684"
E.g. "Cerianthidae"
E.g. "Milne Edwards & Haime, 1851"
E.g. "accepted"
E.g. NA
E.g. 140
E.g. "Family" "Genus" "Species" "Species"
E.g. 100684
E.g. "Cerianthidae"
E.g. "Milne Edwards & Haime, 1851"
E.g. 151646
E.g. "Animalia"
E.g. "Cnidaria"
E.g. "Anthozoa"
E.g. "Spirularia"
E.g. "Cerianthidae"
E.g. NA "Cerianthus"
E.g. "Molodtsova, T. (2023). World List of Ceriantharia. Cerianthidae Milne Edwards & Haime, 1851. Accessed through: "...
internal database identifier
E.g. 1
E.g. 1
E.g. 0
E.g. 0
E.g. NA
E.g. "exact"
E.g. "2018-01-22T17:48:34.063Z"
E.g. "2023-10-18"
...
https://www.marinespecies.org/