Title: | Functions for the ICES Regional Database and Estimation System (RDBES) |
---|---|
Description: | The RDBEScore package provides functions to import and work with fisheries data downloaded from the ICES RDBES database. It also contains functions to perform estimation analysis using the resulting objects. |
Authors: | c( person(given = "David", family = "Currie", role = c("aut"), comment = c(ORCID = "0000-0002-3523-6895")), person(given = "Richard", family = "Meitern", role = c("aut"), email = "[email protected]", comment = c(ORCID = "0000-0002-2600-3002")), person(given = "Nuno", family = "Prista", role = c("aut"), email = "[email protected]", comment = c(ORCID = "0000-0002-5145-7241")), person(given = "Nicholas", family = "Carey", role = c("aut"), email = "[email protected]"), person(given = "Petri", family = "Sarvamaa", role = c("aut"), email = "[email protected]"), person(given = "Kirsten", family = "Birch Håkansson", role = c("aut"), email = "[email protected]"), person(given = "Karolina", family = "Molla Gazi", role = c("aut"), email = "[email protected]"), person(given = "Julia", family = "Wischnewski", role = c("aut"), email = "[email protected]"), person(given = "Ana Cláudia", family = "Fernandes", role = c("aut"), email = "[email protected]"), person(given = "Katarzyna", family = "Krakówka", role = c("aut"), email = "[email protected]"), person(given = "Marta", family = "Szymańska", role = c("aut"), email = "[email protected]"), person(given = "Nicolas", family = "Goñi", role = c("aut"), email = "[email protected]"), person(given = "Annica", family = "de Groote", role = c("ctb"), email = "[email protected]"), person(given = "Jonathan", family = "Ball", role = c("ctb"), email = "[email protected]"), person(given = "Jonathan", family = "Rault", role = c("ctb"), email = "[email protected]"), person(given = "Antti", family = "Sykkö", role = c("ctb"), email = "[email protected]"), person(given = "Liz", family = "Clarke", role = c("ctb"), email = "[email protected]"), person(given = "Chun", family = "Chen", role = c("ctb"), email = "[email protected]"), person(given = "Hongru", family = "Zhai", role = c("ctb"), email = "[email protected]"), person(given = "Eros", family = "Quesada", role = c("ctb"), email = "[email protected]"), person(given = "Jonathan", family = 
"Stounberg", role = c("ctb"), email = "[email protected]"), person(given = "Ana", family = "Ribeiro Santos", role = c("ctb"), email = "[email protected]"), person(given = "Jose", family = "Castro", role = c("ctb"), email = "[email protected]"), person(given = "Jessica", family = "Craig", role = c("ctb"), email = "[email protected]") ) |
Maintainer: | Colin Millar <[email protected]> |
License: | GPL-3 + file LICENSE |
Version: | 0.3.1 |
Built: | 2024-12-13 05:59:14 UTC |
Source: | https://github.com/ices-tools-dev/RDBEScore |
Wrapper to generate probabilities. The wrapper first calls runChecksOnSelectionAndProbs, which runs the main tests that must pass before probabilities can be calculated. It then calls generateProbs for each sample in each sampling level of the hierarchy.
applyGenerateProbs( x, probType, overwrite, runInitialProbChecks = TRUE, strict = TRUE )
x |
|
probType |
|
overwrite |
|
runInitialProbChecks |
|
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
a list of all the RDBES data tables with probabilities calculated
runChecksOnSelectionAndProbs
generateProbs
# To be added
This function checks if a specified column exists in a given data table and has unique values. If the column does not exist or has non-unique values, an error is thrown.
check_key_column(dt, col)
dt |
A data table to check |
col |
A character string specifying the name of the column to check |
nothing if the column exists and has unique values, otherwise an error is thrown
## Not run:
RDBEScore:::check_key_column(H1Example$DE, "DEid")
## End(Not run)
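The check described for check_key_column (the column must exist and hold unique values) can be illustrated in base R. This is only a sketch under stated assumptions: check_key_sketch and the two toy data frames are hypothetical names made up for illustration, and the package's own implementation and error messages may differ.

```r
# Hypothetical stand-in for the check described above; not the package's code.
check_key_sketch <- function(dt, col) {
  # The column has to be present in the table
  if (!col %in% names(dt)) {
    stop("Column '", col, "' not found in the table")
  }
  # anyDuplicated() returns 0 when every value is unique
  if (anyDuplicated(dt[[col]]) > 0) {
    stop("Column '", col, "' contains non-unique values")
  }
  invisible(NULL)
}

# Toy tables (assumed data, not RDBES extracts)
good <- data.frame(DEid = 1:3, DEyear = 2022)
bad  <- data.frame(DEid = c(1, 1, 2), DEyear = 2022)

check_key_sketch(good, "DEid")   # passes silently
res <- tryCatch(check_key_sketch(bad, "DEid"),
                error = function(e) conditionMessage(e))
print(res)
```

Returning nothing on success and stopping on failure mirrors the documented behaviour, so a call can be placed at the top of a function as a guard.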
Combine Two RDBES Raw Objects. Combines two RDBESDataObjects into a single RDBESDataObject by merging the individual tables one by one.
combineRDBESDataObjects(RDBESDataObject1, RDBESDataObject2, strict = TRUE)
RDBESDataObject1 |
The first object to combine |
RDBESDataObject2 |
The second object to combine |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
the combination of RDBESDataObject1 and RDBESDataObject2
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19")
myH5RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h5_v_1_19")
myCombinedRawObject <- combineRDBESDataObjects(RDBESDataObject1 = myH1RawObject,
                                               RDBESDataObject2 = myH5RawObject)
## End(Not run)
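The idea of merging two objects table by table can be sketched with plain lists of data frames. This is an illustration only: combine_tables_sketch is a hypothetical helper, and dropping duplicated rows after binding is an assumption, not a statement of what combineRDBESDataObjects does internally.

```r
# Hypothetical helper: merges two named lists of data frames table by table.
combine_tables_sketch <- function(obj1, obj2) {
  tableNames <- union(names(obj1), names(obj2))
  combined <- lapply(tableNames, function(tbl) {
    # rbind() ignores NULL entries, so a table present in only one
    # object is carried over unchanged; unique() drops repeated rows
    unique(rbind(obj1[[tbl]], obj2[[tbl]]))
  })
  names(combined) <- tableNames
  combined
}

# Toy objects (assumed data): both share DE and VD tables
h1 <- list(DE = data.frame(DEid = 1:2), VD = data.frame(VDid = 1:3))
h5 <- list(DE = data.frame(DEid = 3L),  VD = data.frame(VDid = 2:4))

both <- combine_tables_sketch(h1, h5)
nrow(both$DE)  # 3
nrow(both$VD)  # 4 (VDid 2 and 3 appear in both inputs)
```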
Load raw object and create prepared object. The function relies on the data being correctly named following the established hierarchy.
createDBEPrepObj(input, output)
input |
a string pointing towards the input folder |
output |
a string pointing towards the output folder |
.Rdata files
## Not run:
input <- "WKRDB-EST2/testData/output/DBErawObj/"
output <- "WKRDB-EST2/subGroup1/personal/John/PreparedOutputs/"
createDBEPrepObj(input = input, output = output)
## End(Not run)
This function lets you create an RDBES Data object in your current R environment.
createRDBESDataObject( input = NULL, listOfFileNames = NULL, castToCorrectDataTypes = TRUE, ... )
input |
Strings or |
listOfFileNames |
|
castToCorrectDataTypes |
Logical. If |
... |
parameters passed to validateRDBESDataObject
if input is list of data frames e.g. |
The input should be one of the following:
A zip file downloaded from RDBES (or multiple zip files if you want to include or overwrite tables, for example CL and CE data).
A folder containing csv files downloaded from RDBES (e.g. the unzipped file), or any set of csv files of the RDBES tables.
A list of data frames in the current environment representing different tables in the hierarchy.
A NULL input, which will return an empty RDBES data object.
ZIP file inputs
This input should be a path to a zip file downloaded from RDBES. Multiple zip files can be entered if you want to include additional tables, for example CL and CE, e.g. input = c("path/to/H1.zip", "path/to/CL.zip"). If any tables in the first input are overwritten by other inputs a warning is given. You should not input different hierarchy files; this function will not combine them.
CSV file inputs
This input should be a path to a folder of csv files. These can be the csv files downloaded from RDBES (e.g. an unzipped hierarchy), or any set of csv files containing RDBES tables. If the files do not have the default RDBES name (e.g. 'Design.csv') the listOfFileNames input can be used to specify the file names, e.g. list("DE" = "DE.csv", "SD" = "SD.csv", etc.).
List of data frames inputs
This input should be a list object containing data frames (or data.tables) for each table in your hierarchy. They should be named with the appropriate 2-letter code (DE, SD, etc.). Columns within these tables will be renamed to the RDBES model documentation 'R name'. Note that if you choose to create an RDBESDataObject from local data frames, these may not have passed the data integrity checks performed when you upload to RDBES!
NULL inputs
This input produces an empty RDBESDataObject, i.e. all tables with correct data classes but the tables will be empty.
An RDBESDataObject
myEmptyRDBESObject <- createRDBESDataObject(input = NULL)
Creates an RDBESEstObject from prepared RDBES data
createRDBESEstObject( rdbesPrepObject, hierarchyToUse = NULL, stopTable = NULL, verbose = FALSE, strict = TRUE )
rdbesPrepObject |
The prepared RDBES object that should be used to create an estimation object |
hierarchyToUse |
The upper RDBES hierarchy to use |
stopTable |
(Optional) The table to stop at in the RDBES hierarchy. If specified, only tables up to and including this table will be included in the resulting RDBESEstObject. The default is NULL, which means all tables in the hierarchy will be included. |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
An object of class RDBESEstObject ready for use in design based estimation
# Creates an RDBESEstObject from prepared RDBES data
myH1EstObj <- createRDBESEstObject(H1Example, 1, "SA")
For examples, for now see https://github.com/ices-eg/WK_RDBES/tree/master/WKRDB-EST2/chairs/Nuno
createTableOfRDBESIds(x, addSAseqNums = TRUE)
x |
RDBESDataObject |
addSAseqNums |
should SAseqNum be included? Default value is TRUE |
data frame of Ids of all tables in sampling hierarchy
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19_13")
myTableOfIds <- createTableOfRDBESIds(myH1RawObject)
## End(Not run)
A dataset containing the RDBES "design variable" names
designVariables
A vector containing the short R names of the RDBES design variables (without any 2-letter table prefixes).
R field name: The design variable names
Generates the DBE estimation object for the upper hierarchy tables
doDBEestimantionObjUpp(inputList)
inputList |
All the data tables in a named list. Name should be equal to the short table names e.g. DE, SD, TE, FO. |
The upper hierarchy tables in the DBE estimation object (DBEestimantionObjUpp)
## Not run:
H1 <- readRDS("./WKRDB-EST2/testData/output/DBErawObj/DBErawObj_DK_1966_H1.rds")
H1out <- doDBEestimantionObjUpp(H1)
## End(Not run)
Create design-based point and variance estimates from RDBES estimation object (rdbesEstimObj)
doDBestimation( x = rdbesEstimObj, estimateType = "total", pointEstimator = "Unbiased", varEstimator = "WRonPSUviaPik", stage = 0, domainOfinterest = NULL )
x |
a data.frame (or data.table) in rdbesEstimObj format with value of target variable in column targetValue |
estimateType |
a string with type of estimate. As of now only "total" is defined |
pointEstimator |
a string with type of point estimator. As of now only "Unbiased" is defined |
varEstimator |
a string with type of variance estimator. As of now only "WRonPSUviaPik" is defined |
stage |
a natural number (0,1,..) with sampling stage of estimate. 0 corresponds to DE level. |
domainOfinterest |
list of domains of interest (e.g., SAarea). As of now only NULL (= no domain estimate) is defined |
a list of values for pointEstimate, varEstimate and estimation options
## Not run:
data(shrimps)
doDBestimation(x = shrimps, estimateType = "total", pointEstimator = "Unbiased",
               varEstimator = "WRonPSUviaPik", stage = 0, domainOfinterest = NULL)
## End(Not run)
Estimate totals and means, and try to generate sample variances for all strata in an RDBESEstObject
doEstimationForAllStrata(RDBESEstObjectForEstim, targetValue, verbose = FALSE)
RDBESEstObjectForEstim |
The RDBESEstObject to generate estimates for |
targetValue |
The field to estimate for, for example "SAsampWtLive" |
verbose |
(Optional) If set to TRUE more detailed text will be printed out by the function. Default is FALSE |
A data frame containing estimates for all strata
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19")
# Update our test data with some random sample measurements
myH1RawObject[["SA"]]$SAsampWtLive <- round(runif(n = nrow(myH1RawObject[["SA"]]), min = 1, max = 100))
myH1EstObj <- createRDBESEstObject(myH1RawObject, 1)
myStrataEst <- doEstimationForAllStrata(
  RDBESEstObjectForEstim = myH1EstObj,
  targetValue = "SAsampWtLive"
)
## End(Not run)
Generic function for estimation of population total and variance
estim(y, enk, enkl, method = "SRSWOR", estFunction, varFunction)
y |
numeric variable to be estimated |
enk |
expected value of k |
enkl |
expected value of k, given l |
method |
character selection method code, e.g. SRSWOR |
estFunction |
the function to use to estimate total given parameters y and enk |
varFunction |
the function to use to estimate variance given parameters y,enk and enkl |
list of 7 elements including the population mean, total (and their variance), the algorithm name used and the first-order inclusion probabilities
estimMC(c(3, 4, 4, 5), c(4, 4, 4, 4), c(8, 8, 8, 8))
Multiple Count Estimator for Population Total and Variance
estimMC(y, sampled, total, method = "SRSWOR", selProb = NULL, incProb = NULL)
y |
numeric variable to be estimated |
sampled |
numeric total number of units sampled |
total |
numeric total number of units in the population |
method |
character selection method code, e.g. SRSWOR |
selProb |
the selection probabilities (if known) |
incProb |
the inclusion probabilities (if known) |
list of 7 elements including the population mean, total (and their variance), the algorithm name used and the first-order inclusion probabilities
estimMC(c(3, 4, 4, 5), c(4, 4, 4, 4), c(8, 8, 8, 8))
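To give a feel for what a population total estimate under SRSWOR looks like, the sketch below applies the textbook expansion (Horvitz-Thompson) estimator to the same numbers as the example call. It is an assumption-laden illustration: ht_total_sketch is a hypothetical function, and estimMC itself returns a richer list and may arrive at its values differently.

```r
# Hypothetical sketch: expansion (Horvitz-Thompson) estimate of a total
# under SRSWOR, where every sampled unit has inclusion probability n/N.
ht_total_sketch <- function(y, sampled, total) {
  incProb <- sampled / total   # n/N for each sampled unit
  sum(y / incProb)             # weight each observation by 1/incProb
}

y       <- c(3, 4, 4, 5)
sampled <- c(4, 4, 4, 4)   # n: units sampled
total   <- c(8, 8, 8, 8)   # N: units in the population

est_total <- ht_total_sketch(y, sampled, total)
est_mean  <- est_total / total[1]
est_total  # 32
est_mean   # 4
```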
This function filters an RDBESDataObject based on specified fields and values, and can optionally remove any orphan records.
The returned object will include all rows which either: a) do not include any of the field names in fieldsToFilter, or b) do include the field names and have one of the allowed values in valuesToFilter.
If killOrphans is set to TRUE, the function will remove orphaned rows. The default is FALSE.
filterAndTidyRDBESDataObject( RDBESDataObjectToFilterAndTidy, fieldsToFilter, valuesToFilter, killOrphans = FALSE, verboseOrphans = FALSE, verboseBrokenVesselLinks = FALSE )
RDBESDataObjectToFilterAndTidy |
The RDBESDataObject to filter. |
fieldsToFilter |
A vector of the field names you wish to check. |
valuesToFilter |
A vector of the field values you wish to filter for. |
killOrphans |
Controls if orphan rows are removed. Default is FALSE. |
verboseOrphans |
Controls if verbose output for orphan rows is printed. Default is FALSE. |
verboseBrokenVesselLinks |
Controls if verbose output for broken vessel links is printed. Default is FALSE. |
The filtered input object of the same class as RDBESDataObjectToFilterAndTidy.
## Not run:
myH1RawObject <- createRDBESDataObject(input = "tests\\testthat\\h1_v_1_19_13")
# To check how removeBrokenVesselLinks() works
myH1RawObject$VD$VDlenCat[which(myH1RawObject$VD$VDencrVessCode == "VDcode_10")] <- "VL40XX"
myFields <- c("VSencrVessCode", "VDlenCat")
myValues <- c("VDcode_1", "VDcode_2", "VDcode_10", "VL1518", "VL2440")
myFilteredObject <- filterAndTidyRDBESDataObject(myH1RawObject,
  fieldsToFilter = myFields,
  valuesToFilter = myValues,
  killOrphans = TRUE,
  verboseBrokenVesselLinks = TRUE
)
## End(Not run)
The returned object will include all rows which either: a) do not include any of the field names in fieldsToFilter, or b) do include the field names and have one of the allowed values in valuesToFilter.
If you want to filter on an id field like DEid, FTid etc., the filtering works only on the table where the id field is its key. For example, if you try to filter on FOid, the function does not look for FOid in other tables like FT, although the field FOid exists in the FT table.
filterRDBESDataObject( RDBESDataObjectToFilter, fieldsToFilter, valuesToFilter, killOrphans = FALSE, verbose = FALSE, strict = TRUE )
RDBESDataObjectToFilter |
The |
fieldsToFilter |
A vector of the field names you wish to check |
valuesToFilter |
A vector of the field values you wish to filter for |
killOrphans |
Controls if orphan rows are removed. Default is FALSE. |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
killOrphans allows you to remove orphaned rows if set to TRUE. The default is FALSE.
the filtered input object of the same class as RDBESDataObjectToFilter
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19")
myFields <- c("SDctry", "VDctry", "VDflgCtry", "FTarvLoc")
myValues <- c("ZW", "ZWBZH", "ZWVFA")
myFilteredObject <- filterRDBESDataObject(myH1RawObject,
  fieldsToFilter = myFields,
  valuesToFilter = myValues
)
## End(Not run)
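The filter rule (tables without the named fields are kept whole; tables with them keep only rows holding an allowed value) can be sketched on a plain list of data frames. filter_sketch and the toy tables are hypothetical, and applying several fields within one table as successive filters is an assumption of this sketch rather than documented behaviour of filterRDBESDataObject.

```r
# Hypothetical helper: applies the filter rule to a named list of data frames.
filter_sketch <- function(obj, fieldsToFilter, valuesToFilter) {
  lapply(obj, function(tbl) {
    # Only the filter fields that actually exist in this table matter
    hits <- intersect(names(tbl), fieldsToFilter)
    for (fld in hits) {
      tbl <- tbl[tbl[[fld]] %in% valuesToFilter, , drop = FALSE]
    }
    tbl  # tables with no matching field are returned unchanged
  })
}

# Toy object (assumed data): SD has the filter field, FT does not
obj <- list(
  SD = data.frame(SDid = 1:3, SDctry = c("ZW", "XX", "ZW")),
  FT = data.frame(FTid = 1:2, SDid = c(1, 2))
)

out <- filter_sketch(obj, fieldsToFilter = "SDctry", valuesToFilter = "ZW")
nrow(out$SD)  # 2
nrow(out$FT)  # 2 (untouched: FT has no SDctry column)
```

In this toy result the FT row with SDid 2 now points at an SD row that no longer exists, which is the kind of orphan record that killOrphans = TRUE is meant to remove.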
This function finds and removes any orphan records in an RDBESDataObject. Normally data that has been downloaded from the RDBES will not contain orphan records - however if the data is subsequently filtered it is possible to introduce orphan records.
findAndKillOrphans(objectToCheck, verbose = FALSE, strict = TRUE)
objectToCheck |
an RDBESDataObject. |
verbose |
(Optional) If set to TRUE more detailed text will be printed out by the function. Default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
an RDBESDataObject with any orphan records removed
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19")
myFields <- c("SDctry", "VDctry", "VDflgCtry", "FTarvLoc")
myValues <- c("ZW", "ZWBZH", "ZWVFA")
myFilteredObject <- filterRDBESDataObject(myH1RawObject,
  fieldsToFilter = myFields,
  valuesToFilter = myValues
)
myObjectNoOrphans <- findAndKillOrphans(objectToCheck = myFilteredObject,
                                        verbose = FALSE)
## End(Not run)
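An orphan record, in the sense used here, is a child row whose foreign key no longer matches any row in its parent table. The base R sketch below shows that idea for a single parent/child pair; drop_orphans_sketch and the toy tables are hypothetical, and the package function walks the whole hierarchy rather than one pair.

```r
# Hypothetical helper: drops child rows whose foreign key has no match
# in the parent table.
drop_orphans_sketch <- function(child, parent, key) {
  child[child[[key]] %in% parent[[key]], , drop = FALSE]
}

# Toy tables (assumed data): SD row 2 was filtered out earlier,
# leaving the FT row that references it orphaned
SD <- data.frame(SDid = c(1, 3))
FT <- data.frame(FTid = 1:3, SDid = c(1, 2, 3))

FT_clean <- drop_orphans_sketch(FT, SD, "SDid")
FT_clean$FTid  # 1 3
```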
Internal function to identify orphan records in a given RDBESDataObject table
findOrphansByTable(tableToCheck, objectToCheck, foreignKeyIds, verbose = FALSE)
tableToCheck |
The two letter code for the table to check |
objectToCheck |
An RDBESDataObject |
foreignKeyIds |
A vector of the foreign key field names to check |
verbose |
(Optional) If set to TRUE more detailed text will be printed out by the function. Default is FALSE. |
A data frame with the primary keys of the table checked, the two letter table identifier, and their orphan status.
Fixes SLid in SL table (facilitating SS-SL joins).
fixSLids(RDBESDataObject, verbose = FALSE, strict = TRUE)
RDBESDataObject |
A valid RDBESDataObject |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
RDBES SL can be seen as a join of two tables: one that identifies the species list in terms of SLcou * SLinst * SLspeclistName * SLyear * SLcatchFrac, and one that specifies the taxa (SLcommTaxon * SLsppCode) in the list. In SS, SLid refers to the first taxon in a species list and not, as would be expected, to the species list itself. This function fixes this by creating a new SLtaxaId variable in SL and assigning all taxa in a species list to a single SSid.
an RDBESDataObject with SL ids reworked
# To add
Generate any missing SS rows. When FOcatchReg=="All" it is expected that SScatchFraction is either "Catch" OR "Lan"+"Dis". In the latter case, if one is missing the other is to be assumed 0. This function generates SS rows for any missing catch fractions.
generateMissingSSRows( RDBESDataObject, speciesListName, verbose = FALSE, strict = TRUE )
RDBESDataObject |
A valid RDBESDataObject |
speciesListName |
The name of the Species List you want to use for any SS rows that are created. |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
A data table of SS data with any missing rows added
# To follow
Generate NAs in samples using Species List information
generateNAsUsingSL( RDBESDataObject, targetAphiaId, overwriteSampled = TRUE, validate = TRUE, verbose = FALSE, strict = TRUE )
RDBESDataObject |
An RDBESDataObject. |
targetAphiaId |
a vector of aphiaId. |
overwriteSampled |
(Optional) should SAtotalWtMes and SAsampWtMes be set to 0 if spp recorded but absent from SL? The default is TRUE. |
validate |
(Optional) Set to TRUE if you want validation to be carried out. The default is TRUE. |
verbose |
(Optional) Set to TRUE if you want informative text on validation printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function can validate its input data - should the validation be strict? The default is TRUE. |
RDBES data object where SA was complemented with NAs for species not looked for (sensu in SL)
# To be added
Generate vector of selection or inclusion probabilities
generateProbs(x, probType)
x |
RDBES data object |
probType |
"selection" or "inclusion" for selection and inclusion probabilities respectively |
When the selection method is SRSWR, selection probabilities are calculated as 1/N and inclusion probabilities as 1 - (1 - 1/N)^n, where n is the number of units sampled and N is the total number of units. When the selection method is SRSWOR, selection probabilities are not currently implemented; inclusion probabilities are calculated as n/N. When the selection method is CENSUS, both types of probabilities are set to 1. Probabilities for the selection methods UPSWR and UPSWOR are not calculated (they need to be supplied by the user). The same applies to non-probabilistic methods.
A vector of probabilities
## Not run:
generateProbs(x = Pckg_SDAResources_agstrat_H1[["VS"]], probType = "inclusion")
# population size
a <- generateProbs(x = Pckg_SDAResources_agstrat_H1[["VS"]], probType = "inclusion")
sum(1/a$VSincProb)
# returns error
generateProbs(x = Pckg_SDAResources_agstrat_H1[["VS"]], probType = "selection")
## End(Not run)
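The per-method rules described for generateProbs can be sketched with the textbook formulas for simple random sampling, writing n for the number of units sampled and N for the total number of units. This is a sketch under stated assumptions: probs_sketch is a hypothetical function that takes n and N directly, whereas generateProbs works on an RDBES table, and the formulas used here (1/N and 1 - (1 - 1/N)^n for SRSWR, n/N for SRSWOR) are the standard ones rather than a copy of the package code.

```r
# Hypothetical helper illustrating the formulas; n = units sampled,
# N = units in total. Not the package's implementation.
probs_sketch <- function(n, N, selMeth, probType = c("selection", "inclusion")) {
  probType <- match.arg(probType)
  if (selMeth == "CENSUS") {
    return(rep(1, length(N)))            # everything is selected
  }
  if (selMeth == "SRSWR") {
    if (probType == "selection") {
      return(1 / N)                      # chance of selection in a single draw
    }
    return(1 - (1 - 1 / N)^n)            # chance of appearing at least once
  }
  if (selMeth == "SRSWOR" && probType == "inclusion") {
    return(n / N)
  }
  # SRSWOR selection probabilities, UPSWR, UPSWOR and non-probabilistic
  # methods are not calculated in this sketch
  stop("Not calculated for selMeth = ", selMeth, " and probType = ", probType)
}

probs_sketch(n = 4, N = 8, selMeth = "SRSWOR", probType = "inclusion")  # 0.5
probs_sketch(n = 4, N = 8, selMeth = "SRSWR",  probType = "selection")  # 0.125
probs_sketch(n = 4, N = 8, selMeth = "CENSUS", probType = "inclusion")  # 1
```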
Private function to generate SS rows
generateSSRows(FOids, speciesListName, catchFra)
FOids |
Vector of FOids |
speciesListName |
Name of the species list |
catchFra |
The catch fraction to create |
SS data frame
Generates a named list of data tables that follow the structure of RDBESDataObject. The tables only have the columns required for testing.
generateTestTbls(tblNames, prevTbls = list(), ...)
tblNames |
character vector of table names to be created |
prevTbls |
list of data.tables upstream of the generated table. Defaults to empty list |
... |
Arguments passed on to
|
a named list of data.tables
## Not run:
generateTestTbls(c("A", "B", "C"), selMeth = "SRSWOR")
generateTestTbls(LETTERS[1:5]) # makes 5 tables with method CENSUS
## End(Not run)
For examples, for now see https://github.com/ices-eg/WK_RDBES/tree/master/WKRDB-EST2/chairs/Nuno
generateZerosUsingSL(x, verbose = FALSE, strict = TRUE)
x |
RDBES data frame |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
RDBES data frame where SA was complemented with species looked for (sensu in sampling objectives) but not registered in sample
Private function used by doEstimationForAllStrata to get the estimates
getEstimForStratum(x)
x |
The input |
Data frame with estimated values
Private function to find which FO rows are not matching SS
getMissingSSCatchFraction(FOdata, SSdata, catchFra, verbose)
FOdata |
The FOdata |
SSdata |
The SSdata |
catchFra |
The catchfra |
verbose |
verbose or not? |
Vector of FOids that aren't matching SS rows
Private function to get sub-sample level and top-level SAid for SA data
getSubSampleLevel(SAdata, SAidToCheck, subSampleLevel = 1)
SAdata |
The SA data to check |
SAidToCheck |
The SAid to check |
subSampleLevel |
The current level of sampling |
Whoever revises this function please specify what it returns here
Returns the tables for a given hierarchy
getTablesInRDBESHierarchy( hierarchy, includeOptTables = TRUE, includeLowHierTables = TRUE, includeTablesNotInSampHier = TRUE )
hierarchy |
Integer value between 1 and 13 inclusive |
includeOptTables |
Include any optional tables? Default value is TRUE |
includeLowHierTables |
Include the lower hierarchy tables? Default value is TRUE |
includeTablesNotInSampHier |
Include tables that aren't sampling units in that hierarchy? Default value is TRUE |
A vector containing the 2-letter names of the tables in the requested hierarchy
getTablesInRDBESHierarchy(5)
A dataset containing test RDBES data for H1 in the RDBESDataObject structure
H1Example
A list containing entries required for H1 RDBES data:
the Design data table
the Sampling Details data table
the Vessel Selection data table
the Fishing Trip data table
the Fishing Operation data table
the Species Selection data table
the Sample data table
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
A dataset containing test RDBES data for H5 in the RDBESDataObject structure
H5Example
A list containing entries required for H5 RDBES data:
the Design data table
the Sampling Details data table
the Fishing Trip data table
the Onshore Event data table
the Landing Event data table
the Species Selection data table
the Sample data table
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
This dataset has not passed the RDBES upload checks, hence the object might be somewhat invalid; however, it resembles real data from the Estonian Baltic Trawling fleet for 2022 sprat total landings and commercial sampling
H8ExampleEE1
A list containing entries required for H8 RDBES data:
the Design data table
the Sampling Details data table
the Temporal Event data table
the Vessel Selection data table
the Landing Event data table
the Species Selection data table
the Sample data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
the Commercial Landing data table
the Commercial Effort data table
Source: Richard Meitern @ Estonian Marine Institute, 2023
A dataset containing a copy of the icesSpecWoRMS code list. The latest code list data can be downloaded from https://vocab.ices.dk/
icesSpecWoRMS
A data frame
Globally unique identifier assigned by ICES
AphiaID
Scientific name
Ignore
Date when the code was last updated
Is this still a valid code? If FALSE the code is no longer valid within ICES.
E.g. "2023-10-18"
...
Internal function to remove orphan records from an RDBESDataObject
killOrphans(objectToCheck, orphansToRemove)
objectToCheck |
an RDBESDataObject |
orphansToRemove |
The output from the findOrphansByTable function (A data frame with the primary keys of the table checked, the two letter table identifier, and their orphan status.) |
RDBESDataObject with orphan records removed
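As an illustration of the idea only, the following base R sketch drops rows whose primary keys have been flagged as orphans. The helper `dropOrphans` and the column names `pk`, `tbl` and `isOrphan` are hypothetical; they are assumptions made for this example and may not match the actual output of findOrphansByTable or the internals of killOrphans.

```r
# Hypothetical helper: remove rows of one table whose primary key is flagged
# as an orphan. Column names "pk", "tbl" and "isOrphan" are assumptions.
dropOrphans <- function(tbl, keyCol, orphans) {
  badKeys <- orphans$pk[orphans$isOrphan]
  tbl[!(tbl[[keyCol]] %in% badKeys), , drop = FALSE]
}

# Made-up Fishing Trip rows; row 3 points to a parent that does not exist
ft <- data.frame(FTid = 1:4, VSid = c(10L, 10L, 99L, 11L))

# Made-up orphan report for the FT table
orphans <- data.frame(
  pk = 1:4,
  tbl = "FT",
  isOrphan = c(FALSE, FALSE, TRUE, FALSE)
)

ftClean <- dropOrphans(ft, "FTid", orphans)
print(ftClean)
```

In a full object the same filter would presumably be applied table by table, so that children of removed rows are removed in turn.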
This data set is created for testing the idea of manipulating Sample data (SA) based on Species List (SL). It represents the simplest case for testing this idea. The data set contains two species in SL for the same SLcountry, SLinstitute, SLspeciesListName, SLyear, SLcatchFraction, SLcommercialTaxon, SLspeciesCode & SLcommercialTaxon == SLspeciesCode. There is one species in SA - one row in SS with keys equal to the SL keys.
MadeUpData_for_SL_SA_tests_v1
A list containing entries required for H1 RDBES data:
the Design data table
the Sampling Details data table
the Vessel Selection data table
the Fishing Trip data table
the Fishing Operation data table
the Species Selection data table. Contains one row with keys equal to the SL keys
the Sample data table. Contains one species
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table
the Species List data table. Contains two species for the same SLcountry, SLinstitute, SLspeciesListName, SLyear, SLcatchFraction, SLcommercialTaxon, SLspeciesCode & SLcommercialTaxon == SLspeciesCode
Generate a Data Table
makeTbl( tblName, prevTbls = list(), rows = 4, propSamp = 0.5, selMeth = "CENSUS", stratums = c("U"), mean = 5 )
tblName |
Name of the table |
prevTbls |
list of data.tables upstream of the generated table. Defaults to empty list |
rows |
numeric number of rows per parent record. Defaults to 4. |
propSamp |
numeric proportion of how many of total are sampled. This is ignored for "CENSUS". Defaults to 0.5 |
selMeth |
character selection method used. Defaults to "CENSUS". Others, such as SRSWR or SRSWOR, can be used as well |
stratums |
character vector of the stratum names to be created. Defaults to c("U"), meaning not stratified. |
mean |
numeric the expected mean of the target variable.
The variable is created using |
a data.table
A dataset containing the mapping from database column names to R field names
mapColNamesFieldR
A data frame containing database field names and their equivalent R field name:
The two letter prefix of the relevant RDBES table
The database field names
The equivalent R field name
The equivalent R data type (e.g. "integer", "character" etc)
The Data type in the RDBES documentation (e.g. "Decimal", etc)
Is this column considered essential?
...
Constructor for RDBESDataObject class
newRDBESDataObject( DE = NULL, SD = NULL, VS = NULL, FT = NULL, FO = NULL, TE = NULL, LO = NULL, OS = NULL, LE = NULL, SS = NULL, SA = NULL, FM = NULL, BV = NULL, VD = NULL, SL = NULL, CL = NULL, CE = NULL )
DE |
Data table of RDBES DE data or null |
SD |
Data table of RDBES SD data or null |
VS |
Data table of RDBES VS data or null |
FT |
Data table of RDBES FT data or null |
FO |
Data table of RDBES FO data or null |
TE |
Data table of RDBES TE data or null |
LO |
Data table of RDBES LO data or null |
OS |
Data table of RDBES OS data or null |
LE |
Data table of RDBES LE data or null |
SS |
Data table of RDBES SS data or null |
SA |
Data table of RDBES SA data or null |
FM |
Data table of RDBES FM data or null |
BV |
Data table of RDBES BV data or null |
VD |
Data table of RDBES VD data or null |
SL |
Data table of RDBES SL data or null |
CL |
Data table of RDBES CL data or null |
CE |
Data table of RDBES CE data or null |
a named list
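The general pattern of a constructor that returns a named list with a class attribute can be sketched in base R as follows. The function `newToyRDBESDataObject` and the class name `ToyRDBESDataObject` are hypothetical stand-ins, limited to three tables; this is not the package implementation.

```r
# Hypothetical stand-in for a constructor; NOT the RDBEScore implementation.
# Each argument is a table (here a data.frame) or NULL.
newToyRDBESDataObject <- function(DE = NULL, SD = NULL, VS = NULL) {
  # list() keeps NULL arguments as named elements
  obj <- list(DE = DE, SD = SD, VS = VS)
  class(obj) <- c("ToyRDBESDataObject", "list")
  obj
}

toy <- newToyRDBESDataObject(DE = data.frame(DEid = 1L, DEyear = 2022L))

# All table names are present, but only DE holds data
print(names(toy))
nonNull <- names(toy)[!vapply(toy, is.null, logical(1))]
print(nonNull)
```

Keeping every table name in the list, even when its value is NULL, lets later code refer to tables by name without first checking that the element exists.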
This data set is derived from data(agsrs), used in Lohr's examples 2.6, 2.7 and 2.11 of the SDA book. Information required for example 4.8 (domain estimation) is also added to SA (farmcat <=> SAarea). VSnumberSampled and VSnumberTotal are set according to agsrs and the book's population values. VSunitName is set to a combination of the original agsrs$county, agsrs$state, agsrs$region and row numbers. Table SA contains the variable measured agsrs$acres92 in SAtotalWeightMeasured, SAsampleWeightMeasured and SAconversionFactorMeasLive set to 1. Table SA also contains the domain information, coded in SAarea. Tables DE, SD, FT and FO are for the most part dummy tables inserted to meet RDBES model requirements, to be aggregated during estimation tests. Mandatory fields have dummy values, with the exception of the design variables in VS, which match the book. BV, FM, CL, and CE are not provided. SL and VD are subset to the essential rows.
Pckg_SDAResources_agsrs_H1
A list containing entries required for H1 RDBES data:
the Design data table. Contains dummy values with exception of Design-Variables in VS that match the book
the Sampling Details data table. Contains dummy values
the Vessel Selection data table. Contains core information of data(agsrs), VSnumberSampled and VSnumberTotal set according to agsrs and book pop values, VSunitName is set to a combination of original agsrs$county, agsrs$state, agsrs$region and row numbers
the Fishing Trip data table. Contains dummy values
the Fishing Operation data table. Contains dummy values
the Species Selection data table. Contains dummy values
the Sample data table. Contains the variable measured agsrs$acres92 in SAtotalWeightMeasured, SAsampleWeightMeasured and SAconversionFactorMeasLive set to 1, and the domain information, coded in SAarea
the Frequency Measure data table. Not provided
the Biological Variable data table. Not provided
the Vessel Details data table. Subset to the essential rows
the Species List data table. Subset to the essential rows
https://CRAN.R-project.org/package=SDAResources
This data set is derived from data(agstrat), used in Lohr's examples 3.2 and 3.6. Table VS is stratified with VSstratumName set to agstrat$region, and VSnumberSampled and VSnumberTotal set according to agstrat. VSunitName is set to a combination of original agstrat$county, agstrat$state, agstrat$region and agstrat$agstrat row numbers. Table SA contains the variable measured agstrat$acres92 in SAtotalWeightMeasured, SAsampleWeightMeasured and SAconversionFactorMeasLive set to 1. Tables DE, SD, FT and FO are for the most part dummy tables inserted to meet RDBES model requirements, to be aggregated during estimation tests. Mandatory fields have dummy values taken from an onboard programme, with the exception of selectionMethod, which is set to CENSUS. BV, FM, CL, and CE are not provided. SL and VD are subset to the essential rows.
Pckg_SDAResources_agstrat_H1
A list containing entries required for H1 RDBES data:
the Design data table. Contains dummy values with exception of selectionMethod that is set to CENSUS
the Sampling Details data table. Contains dummy values
the Vessel Selection data table. Contains core information of data(agstrat), VSstratumName set to agstrat$region, and VSnumberSampled and VSnumberTotal set according to agstrat, VSunitName is set to a combination of original agstrat$county, agstrat$state, agstrat$region and agstrat$agstrat row numbers
the Fishing Trip data table. Contains dummy values
the Fishing Operation data table. Contains dummy values
the Species Selection data table. Contains dummy values
the Sample data table. Contains the variable measured agstrat$acres92 in SAtotalWeightMeasured, SAsampleWeightMeasured and SAconversionFactorMeasLive set to 1
the Frequency Measure data table. Not provided
the Biological Variable data table. Not provided
the Vessel Details data table. Subset to the essential rows
the Species List data table. Subset to the essential rows
https://CRAN.R-project.org/package=SDAResources
This data set is derived from fictional data for an SRS of 12 algebra classes in a city, from a population of 187 classes. The design is 1-stage cluster sampling with clusters of unequal sizes. Clusters are classes (class). Clusters (psu) are unequal sized (Mi). In each cluster, all students are selected (ssus, nrows). The total number of psus is known (187). The target variable is score.
Pckg_SDAResources_algebra_H1
A list containing entries required for H1 RDBES data:
the Design data table. Contains 1 DE row with DEstratumName == "Pckg_SDAResources_algebra_H1"
the Sampling Details data table. Contains 1 child SD row
the Vessel Selection data table. Contains 299 child rows (the 299 students observed), each associated to its cluster (class), VSnumberTotalClusters is 187, VSnumberSampledClusters is 12, VSnumberTotal is Missing
the Fishing Trip data table. Just 1:1 links to the final data (in SA)
the Fishing Operation data table. Just 1:1 links to the final data (in SA)
the Species Selection data table. Just 1:1 links to the final data (in SA)
the Sample data table. Each score is a SAsampleWeightMeasured
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
https://CRAN.R-project.org/package=SDAResources
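For orientation, the textbook estimators for a one-stage cluster sample with unequal cluster sizes can be sketched in base R. The cluster sizes and totals below are made up for illustration (they are not the algebra data), and the code does not use or reproduce the package's own estimation functions.

```r
# Made-up one-stage cluster sample: n = 3 clusters drawn by SRS from N = 187.
# These numbers are illustrative only; they are NOT the algebra data.
N  <- 187                      # total number of clusters (psus) in the population
Mi <- c(20, 25, 30)            # cluster sizes (all elements observed)
ti <- c(1200, 1600, 1700)      # cluster totals of the target variable
n  <- length(Mi)

# Ratio estimator of the population mean per element
yBarRatio <- sum(ti) / sum(Mi)

# Unbiased estimator of the population total (uses the known N)
tHatUnb <- N / n * sum(ti)

# Standard error of the ratio estimator, with finite population correction
s2r <- sum((ti - yBarRatio * Mi)^2) / (n - 1)
seRatio <- sqrt((1 - n / N) * s2r / (n * mean(Mi)^2))

print(c(mean = yBarRatio, total = tHatUnb, se = seRatio))
```

The ratio estimator needs only the sampled cluster sizes, whereas the unbiased total relies on the total number of psus being known, as it is in this data set.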
This data set is derived from the data(coots). The design is 2-stage cluster sampling with clusters of unequal sizes and Npsu not known. Clusters are clutches of eggs (nests) with at least 2 eggs. In each cluster, the volume of two eggs is measured. Clusters (psu) are unequal sized. In each cluster, 2 eggs are selected (ssus) and measured. The total number of psus is not known (a drawback in this example). It is assumed very large (fpc negligible). The target variable is volume (others are available).
Pckg_SDAResources_coots_H1
A list containing entries required for H1 RDBES data:
the Design data table. Contains 1 DE row with DEstratumName == "Pckg_SDAResources_coots_H1"
the Sampling Details data table. Contains 1 child SD row
the Vessel Selection data table. Contains 368 child rows (the 368 eggs/psus observed), each associated to its cluster (clutch), VSnumberTotalClusters is not known, VSnumberTotal is csize
the Fishing Trip data table. Just 1:1 links to the final data (in SA)
the Fishing Operation data table. Just 1:1 links to the final data (in SA)
the Species Selection data table. Just 1:1 links to the final data (in SA)
the Sample data table. Each volume is a SAsampleWeightMeasured. Note that volumes are multiplied by 100000000 to meet the type requirement (integer)
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
https://CRAN.R-project.org/package=SDAResources
This data set is derived from the data(coots). The design is 2-stage cluster sampling with clusters of unequal sizes and Npsu not known. Clusters are clutches of eggs (nests) with at least 2 eggs. In each cluster, the volume of two eggs is measured. Clusters (psu) are unequal sized. In each cluster, 2 eggs are selected (ssus) and measured. The total number of psus is not known (a drawback in this example). It is assumed very large (fpc negligible). The target variable is volume.
Pckg_SDAResources_coots_multistage_H1
A list containing entries required for H1 RDBES data:
the Design data table. Contains 1 DE row with DEstratumName == "Pckg_SDAResources_coots_H1"
the Sampling Details data table. Contains 1 child SD row
the Vessel Selection data table. Contains 184 child rows (the 184 clutches/psus observed), each associated to its cluster (clutch), VSnumberTotal is not known, VSnumberSampled is 184
the Fishing Trip data table. Contains 368 child rows (the 368 eggs/ssus measured), each associated to its vessel (clutch), FTnumberSampled is 2, FTnumberTotal is csize
the Fishing Operation data table. Just 1:1 links to the final data (in SA)
the Species Selection data table. Just 1:1 links to the final data (in SA)
the Sample data table. Each volume is a SAsampleWeightMeasured. Note that volumes are multiplied by 100000000 to meet the type requirement (integer)
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
https://CRAN.R-project.org/package=SDAResources
This data set is derived from the data(gpa). The design is 1-stage cluster sampling with clusters of equal sizes. Each cluster (suite) has 4 elements with the same weight. The target variable is gpa.
Pckg_SDAResources_gpa_H1
A list containing entries required for H1 RDBES data:
the Design data table. Contains 1 DE row with DEstratumName == "Pckg_SDAResources_gpa_H1"
the Sampling Details data table. Contains 1 child SD row
the Vessel Selection data table. Contains 20 child rows (the 20 observations), each associated to its cluster (suite), VSnumberTotalClusters is 100, VSnumberTotal is 4 because all elements in cluster are sampled
the Fishing Trip data table. Just 1:1 links to the final data (in SA)
the Fishing Operation data table. Just 1:1 links to the final data (in SA)
the Species Selection data table. Just 1:1 links to the final data (in SA)
the Sample data table. Each gpa score is a SAsampleWeightMeasured. Note that gpa scores are multiplied by 100 to meet the type requirement (integer)
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
https://CRAN.R-project.org/package=SDAResources
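Storing a decimal score in an integer field by multiplying it by 100 can be sketched as below. The scores are made up, and the use of round() before as.integer() is an assumption about a sensible way to do the conversion, not a description of how the data set was actually built.

```r
# Made-up gpa scores with two decimals; NOT the values in the data set
gpa <- c(3.08, 2.60, 3.44)

# Scale to integers. round() guards against floating-point results such as
# 307.99999..., which as.integer() alone would truncate to 307.
gpaInt <- as.integer(round(gpa * 100))
print(gpaInt)

# Any estimate computed on the scaled values must be scaled back
meanGpa <- mean(gpaInt) / 100
print(meanGpa)
```

Estimates of means and totals obtained from the scaled field therefore need dividing by the same factor before they are compared with the book values.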
This data set is derived from the data(schools). The design is 2-stage cluster sampling with clusters of unequal sizes and Npsu not known. Clusters are schools (schoolid). Clusters (psu) are unequal sized (Mi). In each cluster, 20 students are selected (ssus) and measured (nrows). The total number of psus is known (75). The target variable is mathlevel.
Pckg_SDAResources_schools_H1
A list containing entries required for H1 RDBES data:
the Design data table. Contains 1 DE row with DEstratumName == "Pckg_SDAResources_schools_H1"
the Sampling Details data table. Contains 1 child SD row
the Vessel Selection data table. Contains 200 child rows (the 200 students observed), each associated to its cluster (schoolid), VSnumberTotalClusters is 100, VSnumberTotal is Mi
the Fishing Trip data table. Just 1:1 links to the final data (in SA)
the Fishing Operation data table. Just 1:1 links to the final data (in SA)
the Species Selection data table. Just 1:1 links to the final data (in SA)
the Sample data table. Each mathlevel value is a SAsampleWeightMeasured
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
https://CRAN.R-project.org/package=SDAResources
This data set is derived from the Academic Performance Index computed for all California schools based on standardized testing of students. The original data sets contain information for all schools with at least 100 students and for various probability samples of the data. The design is 1-stage cluster sampling with clusters of unequal sizes. An SRS of 15 districts is selected (psus) from the 757 districts in the population. All schools within district are selected (ssus). The weights (pw) do not match 757/15 probably because they have been calibrated. The target variable is enroll.
Pckg_survey_apiclus1_v2_H1
A list containing entries required for H1 RDBES data:
the Design data table. Contains 1 DE row with DEstratumName == "Pckg_SDAResources_apiclus1_v2_H1"
the Sampling Details data table. Contains 1 child SD row
the Vessel Selection data table. Contains 183 child rows (the 183 schools finally observed), each associated to its cluster (dname), VSnumberTotalClusters is 757, VSnumberTotal is the number of schools in the cluster (census), calibrated weights are provided as 1/pw in VSinclusionProbCluster
the Fishing Trip data table. Just 1:1 links to the final data (in SA)
the Fishing Operation data table. Just 1:1 links to the final data (in SA)
the Species Selection data table. Just 1:1 links to the final data (in SA)
the Sample data table. SAsampleWeightMeasured is enroll
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
https://CRAN.R-project.org/package=survey
This data set is derived from the Academic Performance Index computed for all California schools based on standardized testing of students. The original data sets contain information for all schools with at least 100 students and for various probability samples of the data. The design is 2-stage cluster sampling with clusters of unequal sizes. An SRS of 40 districts is selected (psus) from the 757 districts in the population and then up to 5 schools (min
were selected from each district (ssus).
Pckg_survey_apiclus2_H1
A list containing entries required for H1 RDBES data:
the Design data table. Contains 1 DE row with DEstratumName == "Pckg_SDAResources_apiclus2_H1"
the Sampling Details data table. Contains 1 child SD row
the Vessel Selection data table. Contains 40 child rows (the 40 districts), VSnumberTotal is 757, VSnumberSampled is 40
the Fishing Trip data table. Contains 126 child rows (the 126 schools finally observed), each associated to its cluster (dname), FTnumberTotal is the number of schools in district, FTnumberSampled is 1...5 schools sampled
the Fishing Operation data table. Just 1:1 links to the final data (in SA)
the Species Selection data table. Just 1:1 links to the final data (in SA)
the Sample data table. SAsampleWeightMeasured is enroll (NB! there are 4 NAs)
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
https://CRAN.R-project.org/package=survey
This data set is derived from the Academic Performance Index computed for all California schools based on standardized testing of students. The original data sets contain information for all schools with at least 100 students and for various probability samples of the data. The design is 2-stage cluster sampling with clusters of unequal sizes. An SRS of 40 districts is selected (psus) from the 757 districts in the population and then up to 5 schools (min
were selected from each district (ssus). The target variable is enroll - note that it contains 4 NA values.
Pckg_survey_apiclus2_v2_H1
A list containing entries required for H1 RDBES data:
the Design data table. Contains 1 DE row with DEstratumName == "Pckg_SDAResources_apiclus2_v2_H1"
the Sampling Details data table. Contains 1 child SD row
the Vessel Selection data table. Contains 126 child rows (the 126 schools finally observed), each associated to its cluster (dname), VSnumberTotalClusters is 757, VSnumberTotal is 1...5 schools sampled
the Fishing Trip data table. Just 1:1 links to the final data (in SA)
the Fishing Operation data table. Just 1:1 links to the final data (in SA)
the Species Selection data table. Just 1:1 links to the final data (in SA)
the Sample data table. SAsampleWeightMeasured is enroll (note the 4 NAs)
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table
the Species List data table
https://CRAN.R-project.org/package=survey
This data set is a stratified version of the previous "apiclus2" data. It is derived from the Academic Performance Index computed for all California schools based on standardized testing of students. The original data sets contain information for all schools with at least 100 students and for various probability samples of the data. The design is 1-stage cluster sampling with clusters of unequal sizes. An SRS of 200 districts is selected (psus) from the 755 districts in the population. All schools within district are selected (ssus).
Pckg_survey_apistrat_H1
A list containing entries required for H1 RDBES data:
the Design data table. Contains 1 DE row
the Sampling Details data table. Contains 1 child SD row
the Vessel Selection data table. Contains 200 child rows (the 200 schools finally observed), each associated to its cluster (dname), VSnumberTotalClusters is 755, VSnumberTotal is 50-100 schools sampled
the Fishing Trip data table. Contains 200 child rows (the 200 schools finally observed), each associated to its cluster (dname), FTnumberTotal is the number of schools in the cluster (census)
the Fishing Operation data table. Just 1:1 links to the final data (in SA)
the Species Selection data table. Just 1:1 links to the final data (in SA)
the Sample data table. SAsampleWeightMeasured is enroll
the Frequency Measure data table
the Biological Variable data table
the Vessel Details data table. Contains 311 child rows
the Species List data table. Contains 1 child row
https://CRAN.R-project.org/package=survey
This method prints the hierarchy of the DE data.table (if it exists), and the number of rows for each data.table in the RDBESDataObject that is not NULL. It also provides the sampling method and number sampled and number total for tables where it is applicable. If the RDBESDataObject has a mixed hierarchy, a warning message is printed.
This method sorts the RDBESDataObject based on the hierarchy.
This method returns a list containing the hierarchy of the DE data.table, the number of rows for each data.table in the RDBESDataObject that is not NULL, and a logical value indicating if the hierarchy is not NULL.
## S3 method for class 'RDBESDataObject'
print(x, ...)

## S3 method for class 'RDBESDataObject'
sort(x, decreasing = TRUE, ...)

## S3 method for class 'RDBESDataObject'
summary(object, ...)
x |
An object of class RDBESDataObject. |
... |
parameters passed to underlying functions (not used currently) |
decreasing |
should hierarchy tables be the first ones |
object |
An object of class RDBESDataObject. |
None.
The sorted RDBESDataObject by hierarchy.
A list with three elements:
hierarchy: The hierarchy of the DE data.table in the RDBESDataObject.
rows: A named list where the names are the names of the data.tables in the RDBESDataObject and the values are the number of rows in each data.table. NULL values are excluded.
CS: A logical value indicating if the hierarchy is not NULL.
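How S3 print and summary methods of this kind can be structured is sketched below for a toy class. The class `ToyRDBESDataObject`, its two small tables and the method bodies are assumptions made for illustration; the package's own methods report more information (for example selection methods and numbers sampled).

```r
# Toy object: a classed list of tables, some of which may be NULL
toy <- structure(
  list(
    DE = data.frame(DEid = 1L, DEhierarchy = 1L),
    SD = data.frame(SDid = 1:2, DEid = 1L),
    VS = NULL
  ),
  class = "ToyRDBESDataObject"
)

# summary(): return the hierarchy, row counts of non-NULL tables, and a flag
summary.ToyRDBESDataObject <- function(object, ...) {
  tabs <- unclass(object)
  present <- tabs[!vapply(tabs, is.null, logical(1))]
  hierarchy <- if (!is.null(tabs$DE)) unique(tabs$DE$DEhierarchy) else NULL
  list(
    hierarchy = hierarchy,
    rows = lapply(present, nrow),
    CS = !is.null(hierarchy)
  )
}

# print(): show the same information and return the object invisibly
print.ToyRDBESDataObject <- function(x, ...) {
  s <- summary(x)
  cat("Hierarchy:", s$hierarchy, "\n")
  for (nm in names(s$rows)) cat(nm, ":", s$rows[[nm]], "rows\n")
  invisible(x)
}

s <- summary(toy)
print(toy)
```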
# Print the package data object
print(H1Example)
# Sort the package data
sort(H8ExampleEE1)
# Get summary of the package data
summary(H1Example)
Private function to process the lower hierarchies when creating the RDBESEstObject
procRDBESEstObjLowHier(rdbesPrepObject, verbose = FALSE)
rdbesPrepObject |
A prepared RDBESRawObj |
verbose |
logical. Output messages to console. |
allLower - the FM and BV tables combined
Private function to process the upper hierarchies when creating the RDBESEstObject
procRDBESEstObjUppHier( myRDBESEstObj = NULL, rdbesPrepObject, hierarchyToUse, i = 1, targetTables, verbose = FALSE )
myRDBESEstObj |
An RDBESEstObj to add data to |
rdbesPrepObject |
A prepared RDBESRawObj |
hierarchyToUse |
The hierarchy we are using |
i |
Integer to keep track of where we are in the list of tables |
targetTables |
The RDBES tables we are interested in |
verbose |
logical. Output messages to console. |
Whoever revises this function please specify what it returns here
Remove rows which do not point to a valid SpeciesListDetails (SL) record, i.e. those rows which have a value of SpeciesListName that does not exist in the SL table.
removeBrokenSpeciesListLinks(objectToCheck, verbose = FALSE, strict = TRUE)
objectToCheck |
an RDBESDataObject. |
verbose |
(Optional) If set to TRUE, more detailed text will be printed out by the function. Default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
an RDBESDataObject with any records that have an invalid SpeciesListName removed
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19")
myFields <- c("SLspeclistName")
myValues <- c("WGRDBES-EST TEST 5 - sprat data")
myFilteredObject <- filterRDBESDataObject(myH1RawObject,
  fieldsToFilter = myFields,
  valuesToFilter = myValues
)
myObjectValidSpeciesListLinks <- removeBrokenSpeciesListLinks(
  objectToCheck = myFilteredObject,
  verbose = FALSE
)
## End(Not run)
Remove rows which do not point to a valid VesselDetails (VD) record, i.e. those rows which have a value of VDid that does not exist in the VD table.
removeBrokenVesselLinks(objectToCheck, verbose = FALSE, strict = TRUE)
objectToCheck |
an RDBESDataObject. |
verbose |
(Optional) If set to TRUE, more detailed text will be printed out by the function. Default is FALSE. |
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
an RDBESDataObject with any records with an invalid VDid removed
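The underlying idea, keeping only rows whose VDid exists in the VD table, can be sketched with two small made-up tables in base R. This illustrates the filtering rule only; it is an assumption that the package applies an equivalent check, and the real function works on a whole RDBESDataObject rather than a single table.

```r
# Made-up Vessel Details table: only vessels 1 and 2 exist
vd <- data.frame(VDid = c(1L, 2L))

# Made-up Vessel Selection table: row 3 refers to a missing vessel (VDid 3)
vs <- data.frame(VSid = 1:4, VDid = c(1L, 2L, 3L, 2L))

# Keep rows whose VDid is present in VD; the others are broken links
brokenLinks <- vs[!(vs$VDid %in% vd$VDid), , drop = FALSE]
vsValid <- vs[vs$VDid %in% vd$VDid, , drop = FALSE]

print(brokenLinks)
print(vsValid)
```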
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19")
myFields <- c("VDlenCat")
myValues <- c("18-<24")
myFilteredObject <- filterRDBESDataObject(myH1RawObject,
  fieldsToFilter = myFields,
  valuesToFilter = myValues
)
myObjectValidVesselLinks <- removeBrokenVesselLinks(
  objectToCheck = myFilteredObject,
  verbose = FALSE
)
## End(Not run)
Remove table prefix from variable names
removePrefixFromVarNames(x)
x |
RDBES raw object |
updated RDBES raw object where table prefix has been removed from all variables names except ids
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests\\testthat\\h1_v_1_19")
removePrefixFromVarNames(x = myH1RawObject)
## End(Not run)
This function runs some basic checks on the selection methods and probabilities of the different sampling tables of a hierarchy. It should be run ahead of generateProbs to ensure its correct execution, and for that reason it is included in the wrapper applyGenerateProbs.
runChecksOnSelectionAndProbs(x, verbose = FALSE, strict = TRUE)
x |
|
verbose |
|
strict |
(Optional) This function validates its input data - should the validation be strict? The default is TRUE. |
nothing
applyGenerateProbs
generateProbs
For now, see the examples at https://github.com/ices-eg/WK_RDBES/tree/master/WKRDB-EST2/chairs/Nuno
For a given RDBESDataObject convert the required columns to the correct data types. (This function can cause an error if we have data in the columns that can't be cast to the desired data type.)
setRDBESDataObjectDataTypes(RDBESDataObjectToConvert)
RDBESDataObjectToConvert |
list - the raw item for conversion |
An RDBESDataObject with the correct data types for the required fields
A dataset of rdbesEstimObj type containing simplified haul-level samples (rows) of shrimp landings (targetValue, in kg) observed onboard using H1 of RDBES with UPWOR on vessels. Data is provided for developing/testing purposes only.
shrimps
A data frame with 10 rows and 95 variables:
DEsamplingScheme - Sampling Scheme
DEyear - Year of data collection
DEstratumName - Fishery code
DEhierarchyCorrect - Design Variable of RDBES. More details in RDBES documentation
DEhierarchy - Design Variable of RDBES. More details in RDBES documentation
DEsampled - Design Variable of RDBES. More details in RDBES documentation
DEreasonNotSampled - Design Variable of RDBES. More details in RDBES documentation
SDcountry - Country that collected the data
SDinstitution - Institution that collected the data
su1, su2, su3, su4, su5 - sampling units of RDBES. More details in RDBES documentation
XXXnumberSampled, ... - Design Variables of RDBES. More details in RDBES documentation
targetValue - estimate of weight landed in each haul (in kg)
plus XX other columns
Nuno Prista @ SLU Aqua, 2022
A dataset of rdbesEstimObj type containing simplified haul-level samples (rows) of shrimp catches (targetValue, in kg) observed onboard using H1 of RDBES with UPWOR on vessels. Catches are divided into three strata (91, 92, 93_94) that correspond to sorting sieves used onboard. Data is provided for developing/testing purposes only.
shrimpsStrat
A data frame with 10 rows and 95 variables:
DEsamplingScheme - Sampling Scheme
DEyear - Year of data collection
DEstratumName - Fishery code
DEhierarchyCorrect - Design Variable of RDBES. More details in RDBES documentation
DEhierarchy - Design Variable of RDBES. More details in RDBES documentation
DEsampled - Design Variable of RDBES. More details in RDBES documentation
DEreasonNotSampled - Design Variable of RDBES. More details in RDBES documentation
SDcountry - Country that collected the data
SDinstitution - Institution that collected the data
su1, su2, su3, su4, su5 - sampling units of RDBES. More details in RDBES documentation
XXXnumberSampled, ... - Design Variables of RDBES. More details in RDBES documentation
su5stratumName - sieve fraction
targetValue - estimate of weight fraction in each haul (in kg)
plus XX other columns
Nuno Prista @ SLU Aqua, 2022
A data frame containing the tables required for each RDBES hierarchy
tablesInRDBESHierarchies
A data frame containing the tables required for each RDBES hierarchy.
the hierarchy this applies to (H1 to H13)
the 2-letter table name
is this a lower hierarchy table?
is this table optional within the hierarchy?
is this table a sampling unit within the hierarchy?
the table sort order within the hierarchy
https://github.com/davidcurrie2001/MI_RDBES_ExchangeFiles
Check whether an RDBESDataObject is in a valid format.
Perform basic checks on an RDBESDataObject.
validateRDBESDataObject(
  objectToCheck,
  checkDataTypes = FALSE,
  verbose = FALSE,
  strict = TRUE
)

checkRDBESDataObject(
  objectToCheck,
  checkDataTypes = FALSE,
  verbose = FALSE,
  strict = TRUE
)
objectToCheck |
RDBESDataObject i.e. a list of data.tables |
checkDataTypes |
(Optional) Set to TRUE if you want to check that the data types of the required columns are correct, or FALSE if you don't care. Default value is FALSE. |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) Set to TRUE if you want to be sure all columns are present in the data, set to FALSE if you only want to check that essential columns are present. The default is TRUE. |
Checks whether the 'objectToCheck' parameter is valid. Returns the parameter if it is valid; otherwise stops with an error.
It checks that:
the object is of class RDBESDataObject
the tables have no column names that aren't allowed
the tables have all the required column names
It does not check if the data is valid. The RDBES upload system performs an extensive set of checks on the uploaded data.
Returns objectToCheck
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests/testthat/h1_v_1_19")
validateRDBESDataObject(myH1RawObject)
## End(Not run)
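Because the function returns its input when the object is valid and stops with an error otherwise, a call can either be chained or wrapped in `tryCatch`. The sketch below shows that pattern with a hypothetical stand-in, `toyValidate`, that performs only the class check; it is not the package implementation, and the toy objects are made up.

```r
# Hypothetical stand-in validator: returns its input when the class check
# passes and stops with an error otherwise (not the package implementation).
toyValidate <- function(objectToCheck) {
  if (!inherits(objectToCheck, "RDBESDataObject")) {
    stop("objectToCheck is not of class RDBESDataObject")
  }
  objectToCheck
}

# Toy objects: one carrying the class attribute, one plain list
goodObj <- structure(
  list(DE = data.frame(DEid = 1L)),
  class = c("RDBESDataObject", "list")
)
badObj <- list(DE = data.frame(DEid = 1L))

# A valid object is passed straight through, so the call can be chained
checked <- toyValidate(goodObj)

# An invalid object raises an error; tryCatch turns it into a message
result <- tryCatch(
  toyValidate(badObj),
  error = function(e) conditionMessage(e)
)
print(result)
```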
Checks the data types of the columns in an RDBESDataObject against an expected list of data types. Any differences are returned.
validateRDBESDataObjectDataTypes(objectToCheck)
objectToCheck |
An RDBESDataObject to check |
A data frame containing any data type differences (an empty data frame if there are no differences)
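The idea of comparing actual column types against an expected list can be sketched in base R as follows. This is a minimal illustration under stated assumptions: the toy object, the `expectedTypes` table and its column names are all made up here, and the package's own comparison may work differently.

```r
# Toy object and expected types; both are made up for this sketch.
toyObj <- list(
  DE = data.frame(
    DEid = 1:2,
    DEyear = c("2021", "2022"),
    stringsAsFactors = FALSE
  )
)
expectedTypes <- data.frame(
  table    = c("DE", "DE"),
  column   = c("DEid", "DEyear"),
  expected = c("integer", "integer"),
  stringsAsFactors = FALSE
)

# Look up the actual class of each listed column
expectedTypes$actual <- mapply(
  function(tbl, col) class(toyObj[[tbl]][[col]])[1],
  expectedTypes$table, expectedTypes$column,
  USE.NAMES = FALSE
)

# Keep only the rows where expected and actual types differ;
# this is an empty data frame when everything matches
typeDiffs <- expectedTypes[expectedTypes$expected != expectedTypes$actual, ]
print(typeDiffs)
```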
Check RDBES raw object content
Private function to do some basic checks on the content of the RDBESDataObject (e.g. all required field names are present). The function is only used by checkRDBESDataObject and should only be passed a list of non-null objects.
validateRDBESDataObjectDuplicates( objectToCheck, verbose = FALSE, strict = TRUE )
objectToCheck |
|
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) Set to TRUE if you want to be sure all columns are present in the data, set to FALSE if you only want to check that essential columns are present. The default is TRUE. |
list with first element as the object and the second the warnings
Check RDBES data object field names
Private function to do some checks on the columns of an RDBESDataObject: 1) are all required fields present? 2) are there any extra fields present? It is used by validateRDBESDataObject() and should only be passed a list of non-null objects.
validateRDBESDataObjectFieldNames( objectToCheck, verbose = FALSE, strict = TRUE )
objectToCheck |
|
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
strict |
(Optional) Set to TRUE if you want to be sure all columns are present in the data, set to FALSE if you only want to check that essential columns are present. The default is TRUE. |
list with first element as a boolean indicating validity and the second element contains any warnings
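The two questions above (missing required fields, unexpected extra fields) map naturally onto `setdiff`. The sketch below is an illustrative base R version for a single table; `toyCheckFieldNames`, its arguments, and the toy column names are hypothetical and not part of the package. It returns a list whose first element is a validity flag and whose second element holds any warnings, mirroring the return shape described above.

```r
# Hypothetical helper: compare a table's column names with the required and
# allowed field names (not the package implementation).
toyCheckFieldNames <- function(tbl, required, allowed = required) {
  missingCols <- setdiff(required, names(tbl))
  extraCols   <- setdiff(names(tbl), allowed)
  warningsFound <- c(
    if (length(missingCols) > 0) {
      paste("Missing fields:", paste(missingCols, collapse = ", "))
    },
    if (length(extraCols) > 0) {
      paste("Unexpected fields:", paste(extraCols, collapse = ", "))
    }
  )
  # First element: validity flag; second element: any warnings (NULL if none)
  list(valid = length(warningsFound) == 0, warnings = warningsFound)
}

# Toy table with one required field missing and one unexpected field
toyDE <- data.frame(DEid = 1L, DEfoo = "x")
res <- toyCheckFieldNames(toyDE, required = c("DEid", "DEyear"))
print(res)
```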
Check whether an object is a valid RDBESEstObject
validateRDBESEstObject(objectToCheck, verbose = FALSE)
objectToCheck |
The object to check |
verbose |
(Optional) Set to TRUE if you want informative text printed out, or FALSE if you don't. The default is FALSE. |
Whoever revises this function please specify what it returns here
## Not run:
myH1RawObject <- importRDBESDataCSV(rdbesExtractPath = "tests/testthat/h1_v_1_19")
myEStObj <- createRDBESEstObject(myH1RawObject, 1)
validateRDBESEstObject(myEStObj)
## End(Not run)
A dataset containing aphia records for species found in icesSpecWoRMS
wormsAphiaRecord
A data frame
E.g. 100684
E.g. "https://www.marinespecies.org/aphia.php?p=taxdetails&id=100684"
E.g. "Cerianthidae"
E.g. "Milne Edwards & Haime, 1851"
E.g. "accepted"
E.g. NA
E.g. 140
E.g. "Family" "Genus" "Species" "Species"
E.g. 100684
E.g. "Cerianthidae"
E.g. "Milne Edwards & Haime, 1851"
E.g. 151646
E.g. "Animalia"
E.g. "Cnidaria"
E.g. "Anthozoa"
E.g. "Spirularia"
E.g. "Cerianthidae"
E.g. NA "Cerianthus"
E.g. "Molodtsova, T. (2023). World List of Ceriantharia. Cerianthidae Milne Edwards & Haime, 1851. Accessed through: "...
internal database identifier
E.g. 1
E.g. 1
E.g. 0
E.g. 0
E.g. NA
E.g. "exact"
E.g. "2018-01-22T17:48:34.063Z"
E.g. "2023-10-18"
...
https://www.marinespecies.org/