The overarching goal of the Transparent Assessment Framework (TAF, Magnusson and Millar 2023) is to support open and reproducible research. To achieve this goal, the following objectives have guided the design of TAF:
- Provide a standard workflow structure that is general enough for any analysis that can be run from R.
- Introduce minimal constraints or learning curve, making it easy for a beginner to create a new workflow or convert an existing workflow to TAF format.
- Enable reviewers to browse the data, model settings, and results, without being experts in R or the specific methods used.
- Enable anyone to rerun the analysis on another computer and get the same results.
- Require the scientist to briefly describe the data that are used in the analysis and where they came from.
- Invite the scientist to document with scripts how they processed the data before feeding them to the model.
- Invite the scientist to specify which versions of software are used, so the original analysis can be rerun at a later time.
TAF divides a workflow into four steps:
Script | Purpose |
---|---|
data.R | Prepare data, write CSV tables |
model.R | Run model |
output.R | Extract results, write CSV tables |
report.R | Plots and tables for report |
These scripts all share the same general structure, starting with loading packages and reading in files, then performing computations and writing out files. They are run sequentially in alphabetical order, where each script reads from files created in a previous step.
The initial data that are used in the analysis are declared in a file called DATA.bib, which is processed by the taf.boot() function. During this boot procedure, each data entry is processed and the data are made available in the boot/data subfolder, where the data.R script will read them.
The SOFTWARE.bib file is optional. It is not used in the linear regression example below, but its functionality is covered in the boot procedure section later in this vignette.
To demonstrate how a TAF analysis works, consider a simple linear regression where the x and y coordinates come from a text file. The linreg example comes with the TAF package and can be copied to a convenient place to test and run:
library(TAF)
taf.example("linreg")
setwd("linreg")
Before running the analysis, the workflow consists of a boot folder and four TAF scripts.
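The layout is as follows (a sketch reconstructed from the file names discussed in this vignette; the exact listing may differ slightly):

boot/DATA.bib
boot/initial/data/ezekiel.txt
data.R
model.R
output.R
report.R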
To run a TAF analysis, the first step is to start R and make sure that the current working directory is set to the location of the TAF scripts: data.R, model.R, etc. Some R editors do this automatically when opening an R script, and in RStudio there is a menu command: Session > Set Working Directory > To Source File Location.
All TAF analyses can be run using the following commands in R:
library(TAF)
taf.boot()
source.all()
The taf.boot() function looks for an existing boot folder to set up the data and software required for the analysis. Then source.all() runs the data.R, model.R, output.R, and report.R scripts in that order. The individual scripts can also be run using source() or line by line in an R editor.
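For example, the same analysis can be run script by script:

source("data.R")
source("model.R")
source("output.R")
source("report.R")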
After running the analysis, each script has created a corresponding folder, data.R creating a data folder, etc.
The data.R script has populated the data folder with comma-separated values (CSV) files representing the data that are used in the analysis, and the model.R script produced model results in a machine-readable format. The purpose of the output.R script is to read the results from the model folder and write out CSV files representing the results that are of primary interest. Finally, report.R reads in the CSV output and produces plots and formatted tables, often with rounded numbers, which can be incorporated into a report. The report.R script can also produce a dynamic document in various formats if the scientist writes the script in that way.
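As an illustration, a minimal model.R for the linreg example might look like the following sketch (the file name ezekiel.csv and the column names speed and dist are assumptions, not necessarily those used in the actual example):

# Run model
# Before: data/ezekiel.csv
# After:  model/linreg.rds
library(TAF)
mkdir("model")
dat <- read.taf("data/ezekiel.csv")   # assumed file written by data.R
fit <- lm(dist ~ speed, data=dat)     # assumed column names
saveRDS(fit, "model/linreg.rds")      # machine-readable model results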
It is important to note that TAF does not do any of this by itself. All the work is performed by the R scripts that the scientist writes. Typically, a TAF script has the following general structure:
# What the script does
# Before: file1.csv, file2.rds, file3.spec (infolder)
# After: file4.csv, file5.csv, file6.png (outfolder)
library(TAF)
library(SomePackage)
source("utilities.R")
mkdir("folder")
# Read files from previous step
dat1 <- read.taf("infolder/file1.csv")
dat2 <- readRDS("infolder/file2.rds")
dat3 <- importSpecial("infolder/file3.spec")
# Some computations
# [...]
# Write out tables and plots
write.taf(dat4, dir="outfolder")
write.taf(dat5, dir="outfolder")
taf.png("file6")
SpecialPlot(dat6)
dev.off()
A TAF script traditionally starts with a comment section that provides a brief description of what the script does, followed by a description of the state before and after the script is run, in terms of input and output files. In the pseudocode example above, the input files are in various formats that are not necessarily CSV format.
The next section loads the TAF package and specific packages that are required for the analysis. This section may also load functions that the scientist has written and stored in a dedicated file that could be called utilities.R. This is also a convenient time to create the directory for the output files of this script. Using the TAF function mkdir() rather than the base R function dir.create() has the minor benefit of not producing a warning if the directory already exists, which is sometimes the case.
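For example:

mkdir("data")   # creates the data folder
mkdir("data")   # silently does nothing, since the folder already exists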
Likewise, the TAF function read.taf() is similar to the base R function read.csv() but with some sensible defaults and useful features for typical TAF workflows. The RDS file format can be practical for reading and writing list-like objects in R, while importSpecial() in the above example is a function provided by SomePackage to import a software-specific file format. The TAF functions write.taf() and taf.png() are equivalent to the base R functions write.csv() and png(), with some sensible defaults and useful features for TAF workflows.
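A small illustration of this symmetry (the file name ezekiel.csv is an assumption; when no file name is given, write.taf() derives it from the object name):

dat <- read.taf("data/ezekiel.csv")   # read CSV into a plain data frame
write.taf(dat, dir="output")          # write output/dat.csv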
An important observation in computer science (Spolsky 2004) is that it is harder to read code than to write it. For analytical work, this poses challenges for scientific reviews and the reuse of code at a later time, often by another person. TAF addresses this challenge by structuring a workflow in four separate scripts, each handling a clear and well-defined task, rather than in one monolithic script that does everything.
A medium-sized scientific workflow might consist of data.R, model.R, output.R, and report.R, each around 100 lines of code. In a larger workflow, where some of these steps might require a few hundred lines of code, TAF invites the scientist to organize the work in smaller steps. For example, a very short output.R script could call secondary scripts such as output_parameters.R, output_predictions.R, and output_likelihoods.R. To keep scripts short and readable, the scientist can write functions and store them inside utilities.R, allowing lengthy or repeated computations to be performed in a single line in the scripts.
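For example, a short output.R along these lines delegates the work to the secondary scripts mentioned above:

# Extract results of interest, write TAF output tables
library(TAF)
mkdir("output")
source("output_parameters.R")
source("output_predictions.R")
source("output_likelihoods.R")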
The CSV file format is the default and is convenient for saving tables in the data, output, and report folders, while the model folder contains results in any format, often determined by the software used in the analysis. For plots, the PNG file format is commonly used in TAF analyses; these plots can then be incorporated into reports in Word, HTML, or other formats. PDF is another file format for plots, practical for bundling many plots together in a multipage file. All other file formats provided by R can be used as well.
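A minimal sketch of bundling plots in a multipage PDF using base R graphics (the file name is hypothetical, and the built-in cars dataset stands in for actual results):

pdf("report/diagnostics.pdf")   # open a multipage PDF device
plot(cars)                      # each plot call adds a page
hist(cars$speed)
dev.off()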
Similar to booting a computer, the TAF boot procedure readies the data and software components that are required for subsequent computations. The boot procedure takes place inside the boot folder, where the taf.boot() function looks for files called DATA.bib (required) and SOFTWARE.bib (optional).
As indicated by the bib file extension, TAF metadata entries follow the BibTeX format (Patashnik 2003), originally designed for bibliographic information. One of the benefits of using the BibTeX format for TAF metadata is that information about all R packages is already available in this format, for example:
citation(package="TAF")
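The base R function toBibtex() converts such a citation to raw BibTeX, which can serve as a starting point for a TAF metadata entry:

toBibtex(citation(package="TAF"))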
In the linreg example, the DATA.bib file contains a single metadata entry:
@Misc{ezekiel.txt,
  originator = {Mordecai Ezekiel},
  year       = {1930},
  title      = {Speed of automobile and distance to stop after signal},
  source     = {file},
}
Most of the metadata are meant to be informative and the values are not subject to strict rules. The metadata example above informs us that the data were prepared by Mordecai Ezekiel for an analysis conducted in 1930. The only metadata fields that are parsed by taf.boot() are the reference key ezekiel.txt and source = file.
The source field specifies where data or software originate from. The following types of values can be used in the source field, with an example entry after the list:
- A GitHub reference of the form owner/repo[/subdir]@ref, identifying a specific version of a GitHub resource.
- A URL, identifying a file to download.
- The special value script, indicating that a boot script (a custom R script) should be run to fetch or produce files.
- A relative path starting with initial, identifying the location of a file or directory somewhere inside the boot/initial folder.
- The special value file or folder, indicating that the file or folder is inside boot/initial/data or boot/initial/software.
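For illustration, a hypothetical DATA.bib entry pointing to a URL might look like this (all names and values are made up):

@Misc{catch.csv,
  originator = {Some Data Provider},
  year       = {2024},
  title      = {Reported catches by year},
  source     = {https://example.org/data/catch.csv},
}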
In the linreg example, the boot procedure simply copies ezekiel.txt from boot/initial/data to boot/data, where it is now available for subsequent computations. In other TAF workflows, DATA.bib may bring in data from multiple sources, involving local and remote databases, online data repositories, and GitHub resources.
When metadata entries in SOFTWARE.bib point to R packages, they are installed in a local library for that workflow. The SOFTWARE.bib file is where the scientist can specify the exact version of key software components that were used in the analysis, pointing to specific releases, tags, or commits of each software to strengthen reproducibility.
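For illustration, a hypothetical SOFTWARE.bib entry pinning an R package to a specific GitHub release might look like this (the owner, package name, and version are made up):

@Manual{somepackage,
  author = {Some Author},
  year   = {2024},
  title  = {somepackage: Tools Used in the Analysis},
  source = {someowner/somepackage@v1.2.0},
}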
The TAF package comes with utility functions draft.data() and draft.software() that facilitate the creation of the DATA.bib and SOFTWARE.bib files. The online TAF Wiki provides more details and examples of metadata entries.
When authoring a TAF analysis, one can either start with a similar workflow and adapt it to the current analysis or start with a new workflow. The taf.skeleton() function creates a new workflow, consisting of an empty boot/initial/data folder and the four TAF scripts: data.R, model.R, output.R, and report.R. Each script provides a starting point for that step of the analysis; for example, a new data.R script contains the following lines:
# Prepare data, write TAF data tables
# Before:
# After:
library(TAF)
mkdir("data")
After running taf.skeleton() to create a new TAF workflow, the scientist can populate the boot/initial/data folder with initial data files and run draft.data(file=TRUE) to produce a DATA.bib file.
The next step is then to run taf.boot() to populate the boot/data folder and start editing the data.R script, reading files from the boot/data folder.
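Putting these steps together, authoring a new workflow might look like this (the folder name myanalysis is arbitrary):

library(TAF)
taf.skeleton("myanalysis")   # create an empty workflow
setwd("myanalysis")
# copy initial data files into boot/initial/data, then:
draft.data(file=TRUE)        # write DATA.bib
taf.boot()                   # make the data available in boot/data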
Several functions from the TAF package have been mentioned in this introduction:
- Initial TAF steps: draft.data(), draft.software(), taf.boot(), taf.example(), taf.skeleton()
- Running scripts: source.all()
- File management: mkdir(), read.taf(), write.taf()
- Plots: taf.png()
The TAF package provides many additional functions that can be useful but are not required for authoring or running TAF workflows. Every function comes with a help page that includes examples and cross-references. Furthermore, typing ?TAF opens a package help page that gives an overview of all the functions in the package, grouped by functionality.
Most TAF analyses are organized on GitHub, either in public or private repositories, although the TAF design does not require GitHub. Public repositories are a common standard for open and reproducible research, while private repositories can also be used, e.g., when data are sensitive and cannot be shared.
TAF repositories often contain the workflow in a ready-to-run state that does not include the results. This means that anyone browsing that analysis will need to run the workflow to see the results, in the same way as the linreg example. The advantage of this is a guarantee that a repository does not contain scripts and results that might be out of sync. On the other hand, uploading the results to the repository may be a good choice if the analysis takes a long time to run, or to facilitate scientific review without requiring the reviewer to install software or learn how to run TAF analyses.
A TAF analysis on GitHub can be reviewed directly in a web browser, where the code is presented in a readable format and tables and plots are also easy to view. For a fuller review, the analysis can be downloaded or cloned, allowing the reviewer to open and analyze the contents of CSV tables in a spreadsheet, R, or other statistical software of choice. Furthermore, the reviewer can modify and rerun the analysis, evaluating the effect of alternative methods and model assumptions.
The development of TAF workflows started in 2016 when the authors of this vignette were hired as full-time developers at the International Council for the Exploration of the Sea (ICES), an international organization with 20 member countries collaborating in marine science. The aim was to develop a new framework for organizing ICES analyses, improving reviewability and reproducibility.
In 2017, the first version of the icesTAF package (Millar and Magnusson 2023) was released on CRAN, followed by later version updates. In 2021-2022, the package was split into two separate but related packages:
- TAF provides the core functionality and has no package dependencies.
- icesTAF loads all of TAF and adds features that are primarily used within ICES and may depend on external packages.
ICES workflows use the icesTAF package, benefiting from access to a dedicated ICES TAF server, ICES databases, and other infrastructure. Other workflows that are not related to ICES use the smaller TAF package and reap the various benefits of zero package dependencies.
The development and maintenance of TAF and icesTAF is tightly synchronized, with new CRAN releases occurring within days of each other and using the same version number.
The make() function, a part of the TAF package since 2017, is a powerful tool to run only the parts of a workflow where the underlying files have changed. For example, after plot scripts are modified, those scripts should be rerun to update the plots in seconds, without rerunning the entire workflow that might take minutes or hours.
The TAF package contains the make() function, along with the related make.taf() and make.all() functions. These expert tools are not promoted for general use in TAF tutorials or introductory workshops, but they are useful for working with TAF at scale: managing, checking, and debugging a large number of workflows, providing support to multiple users at the same time, etc.
As the name implies, the make() function is closely based on the shell command of the same name (Stallman et al. 2023) but implemented in R. As a generally useful tool beyond the context of TAF workflows, the make() function was eventually packaged and released on CRAN in its own package called makeit in 2023. The package contains annotated examples and a discussion.
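As a sketch, assuming the make(recipe, prereq, target) interface documented in makeit, a plot script could be rerun only when its input has changed (the file names here are hypothetical):

library(TAF)
make("report.R",
     prereq = "output/results.csv",   # input file read by report.R
     target = "report/plot.png")      # rerun report.R only if target is older than prereq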
The UN Food and Agriculture Organization (FAO) monitors the status of global marine resources and publishes a summary report (FAO 2024) every two years, known as SOFIA. The work behind this publication involves organizing and conducting analyses of hundreds of fish and invertebrate stocks from all the world’s oceans.
FAO is currently undertaking a methodological update that leverages advancements in computing and data availability to enhance the understanding of the state of the world’s fish stocks (Sharma 2023). One part of this restructuring is the adoption of TAF, which provides a standard structure that supports open and reproducible research. The SOFIA package (Magnusson and Sharma 2024) is a collection of tools to facilitate this work.
The targets package (Landau 2021) has somewhat similar objectives to TAF: to organize and run computational workflows. A key distinction between the packages is that in TAF the user organizes their workflow by writing scripts that produce files, while in targets the user organizes their workflow by writing functions that produce objects, to be passed to the next step. Thus, the two packages present two different paradigms with a similar end result.
A TAF workflow structure is predetermined, as the main scripts will always be named data.R, model.R, output.R, and report.R. After running a TAF analysis, one can always expect to find the data and results in CSV format in the data, output, and report folders. In contrast, the targets package does not confine the user to a predetermined workflow design, and tables containing data and results are generally accessed by the user as R objects rather than files.
The targets package requires a higher level of R expertise than TAF, both for the scientist to design a workflow using interconnected functions and for a reviewer to browse the data, model settings, and results. The utility functions that come with TAF are for basic data and file management, while the utility functions that come with targets are for parallel computations and a variety of cloud services.
Using SOFTWARE.bib declarations of software and versions, TAF supports reproducibility and allows the workflow to be rerun at a later time and produce the same results as the original analysis. Alternatively, TAF can be used in combination with the renv package (Ushey and Wickham 2025), rig (Csárdi 2024), Docker (Merkel 2014), and other tools dedicated to reproducible research.
A common question about TAF is “Can TAF do X?”, where X is some desired functionality. The answer is yes, if it can be done using R scripts, but TAF is not going to do it by itself; the work is performed by the scripts that the scientist writes. Thanks to the general nature of the R language, R scripts can do a wide variety of things, including running software written in languages other than R.
Incremental development of TAF analyses usually takes place in a GitHub repository that can be kept private while the work is ongoing and then made public once the analysis is ready for presentation. There are essentially three paths to produce a TAF analysis: (1) start from scratch using taf.skeleton(), (2) start from a template such as taf.example("linreg") or another existing TAF analysis, or (3) add TAF scripts to an already completed analysis, to strengthen its reproducibility and reviewability.
Among the online examples, one-off analyses such as corona, weather, and trip are created from scratch, while ICES and FAO analyses are often based on existing templates. The SPC analyses of yellowfin and albacore tuna are examples where the model development and fitting are conducted independently of TAF using Bash shell scripts, with TAF scripts added afterwards to showcase the data and results of the analysis.
The skipjack growth analysis is an example where only the scripts can be made public, as the data are sensitive and cannot be shared. Sharing the scripts allows review and reuse of the methods used in the analysis, and the TAF repository serves as a technical annex of the published report. The initial data should be included in the TAF repository if the data are not sensitive, allowing anyone to rerun the analysis and explore modifications.
For review purposes, the folders produced by running the TAF analysis can also be uploaded, allowing anyone to browse the data and output in CSV format, along with plots, typically in PNG or PDF format. This is especially helpful for a scientific committee reviewing the analyses, as well as anyone else who would like to browse and use these public data.
The TAF design is intuitive and practical, dividing any analysis into four steps, which makes the code easy to organize, read, understand, adapt, and extend. For the scientist, the modular scripts offer a separation of concerns, task-oriented focus, and a structure that facilitates revisiting and debugging code. For scientific workplaces and organizations, adopting an agreed way to work on analytical workflows can greatly enhance teamwork, review processes, and institutional memory when new staff inherit earlier workflows.
TAF is intentionally low tech, simply running four scripts one after the other, but the simple design allows it to be paired with specialized or new and upcoming technologies. The report step, for example, can be paired with R Markdown, Quarto, or Shiny.
Csárdi, G. (2024). rig: The R Installation Manager. Version 0.7.0.
https://github.com/r-lib/rig
Food and Agriculture Organization (FAO). (2024). The State of World Fisheries and Aquaculture 2024: Blue Transformation in Action. Rome.
https://doi.org/10.4060/cd0683en
Landau, W.M. (2021). The targets R package: a dynamic Make-like function-oriented pipeline toolkit for reproducibility and high-performance computing. Journal of Open Source Software, 6(57), 2959.
https://doi.org/10.21105/joss.02959, https://cran.r-project.org/package=targets
Magnusson, A. (2023). makeit: Run R Scripts if Needed. R package version 1.0.1.
https://cran.r-project.org/package=makeit
Magnusson, A. and Millar, C. (2023). TAF: Transparent Assessment Framework for Reproducible Research. R package version 4.2.0.
https://cran.r-project.org/package=TAF
Magnusson, A. and Sharma, R. (2024). SOFIA: Tools to Work with SOFIA Analyses. R package version 2.1.3.
https://github.com/sofia-taf/SOFIA
Merkel, D. (2014). Docker: lightweight Linux containers for consistent development and deployment. Linux Journal, 239, 76-91.
https://dl.acm.org/doi/pdf/10.5555/2600239
Millar, C. and Magnusson, A. (2023). icesTAF: Functions to Support the ICES Transparent Assessment Framework. R package version 4.2.0.
https://cran.r-project.org/package=icesTAF
Patashnik, O. (2003). BibTeX yesterday, today, and tomorrow. TUGboat, 24, 25-30.
https://tug.org/TUGboat/tb24-1/patashnik.pdf
Sharma, R. (2023). The state of world fishery resources: Expanding efforts to determine the status of global fish stocks. Research Outreach.
https://researchoutreach.org/wp-content/uploads/2023/05/Rishi-Sharma.pdf
Spolsky, J. (2004). Joel on Software: And on Diverse and Occasionally Related Matters That Will Prove of Interest to Software Developers, Designers, and Managers, and to Those Who, Whether by Good Fortune or Ill Luck, Work with Them in Some Capacity. Berkeley: Apress.
https://doi.org/10.1007/978-1-4302-0753-5
Stallman, R.M., McGrath, R., and Smith, P.D. (2023). The GNU Make Reference Manual. Version 4.4.1.
https://www.gnu.org/software/make/manual/make.html
Ushey, K. and Wickham, H. (2025). renv: Project Environments. R package version 1.1.2.
https://cran.r-project.org/package=renv
The online examples listed below come from international organizations where TAF has been adopted to organize reproducible workflows and present data, methods, and results as open science. The organizations are the International Council for the Exploration of the Sea (ICES), UN Food and Agriculture Organization (FAO), and the Pacific Community (SPC). These examples are followed by a collection of workflows covering domains outside of fisheries science.
ICES maintains four core examples that demonstrate TAF analyses using the icesTAF package. These simple workflows are used in icesTAF workshops and system tests. They use only core TAF functions, so the library(icesTAF) call at the top of each script could be replaced with library(TAF) and produce the same results.
2015_rjm-347d | |
---|---|
Analysis | Spotted ray in the North Sea |
Repository | https://github.com/ices-taf/2015_rjm-347d |
Methods | ICES method for data-limited stocks |
External deps | 1 CRAN package |
Data sources | 2 local files |
Results | 3 tables, 1 plot |
Notes | Simple arithmetic, the hello world of icesTAF analyses |
2015_had-iceg | |
---|---|
Analysis | Haddock in Icelandic waters |
Repository | https://github.com/ices-taf/2015_had-iceg |
Methods | Age-structured model, C++ standalone executable |
External deps | - |
Data sources | 1 local file |
Results | 10 tables, 2 plots |
Notes | Model executable is in bootstrap/initial/software |
2016_ple-eche | |
---|---|
Analysis | Plaice in the English Channel |
Repository | https://github.com/ices-taf/2016_ple-eche |
Methods | Age-structured model, C++ standalone executable |
External deps | 2 GitHub packages, many CRAN packages |
Data sources | 2 local files |
Results | 14 tables, 2 plots |
Notes | ggplot2 depends on many packages |
2016_cod-347d | |
---|---|
Analysis | Cod in the North Sea |
Repository | https://github.com/ices-taf/2016_cod-347d |
Methods | Age-structured model, C++ standalone executable |
External deps | 1 CRAN package |
Data sources | 13 local files |
Results | 23 tables, 2 plots |
Notes | Similar to Icelandic haddock example, just more parts |
FAO uses the SOFIA and sraplus packages to analyze the status of global marine resources. The sraplus package comes with over 100 package dependencies, and the precise number changes over time as underlying packages change their dependencies. The sraplus-deps analysis is a simple workflow that describes the dependencies, while the WorkshopEffortShared analysis demonstrates a SOFIA-TAF analysis using the sraplus package and a small dataset.
sraplus-deps | |
---|---|
Analysis | Dependency analysis of the sraplus package |
Repository | https://github.com/sofia-taf-dev/sraplus-deps |
Methods | pdeps(), a function to analyze package dependencies |
External deps | - |
Data sources | Online (the sraplus software project) |
Results | 2 tables, 1 plot |
Notes | No local data files |
WorkshopEffortShared | |
---|---|
Analysis | FAO analysis of the relative status of marine stocks |
Repository | https://github.com/sofia-taf/WorkshopEffortShared |
Methods | sraplus, an R package |
External deps | 2 GitHub packages, >100 CRAN packages |
Data sources | 4 local files |
Results | 6 tables, 16 plots |
Notes | sraplus depends on many packages |
WorkshopEffortByStock | |
---|---|
Analysis | FAO analysis of the relative status of marine stocks |
Repository | https://github.com/sofia-taf/WorkshopEffortByStock |
Methods | sraplus, an R package |
External deps | 2 GitHub packages, >100 CRAN packages |
Data sources | 4 local files |
Results | 6 tables, 16 plots |
Notes | sraplus depends on many packages |
SPC uses TAF to make the results of tuna and billfish analyses available online, inviting anyone to browse the data and results and/or rerun the analysis.
ofp-sam-yft-2023-diagnostic | |
---|---|
Analysis | Yellowfin tuna in the Western and Central Pacific |
Repository | https://github.com/PacificCommunity/ofp-sam-yft-2023-diagnostic |
Methods | Age-structured model, C++ standalone executable |
External deps | 2 GitHub packages, many CRAN packages |
Data sources | 7 local files |
Results | 24 tables, 4 plots |
Notes | TAF analysis is organized inside a subdirectory |
ofp-sam-alb-2024-diagnostic | |
---|---|
Analysis | South Pacific Albacore tuna |
Repository | https://github.com/PacificCommunity/ofp-sam-alb-2024-diagnostic |
Methods | Age-structured model, C++ standalone executable |
External deps | 2 GitHub packages, many CRAN packages |
Data sources | 7 local files |
Results | 22 tables, 4 plots |
Notes | TAF analysis is organized inside a subdirectory |
ofp-sam-skj-2025-tag-growth-public | |
---|---|
Analysis | Skipjack tuna growth analysis |
Repository | https://github.com/PacificCommunity/ofp-sam-skj-2025-tag-growth-public |
Methods | Comparison of growth models fitting to tags and otolith data |
External deps | 4 CRAN packages |
Data sources | 3 local files |
Results | 34 tables, 10 plots |
Notes | Public repository includes only scripts, not sensitive data |
The following examples demonstrate various uses of TAF covering domains outside of fisheries science.
taf-four-minutes | |
---|---|
Analysis | Comparison between TAF and targets |
Repository | https://github.com/ices-taf-dev/taf-four-minutes |
Methods | Linear regression |
External deps | Many CRAN packages |
Data sources | 1 local file |
Results | 2 tables, 1 plot |
Notes | Tidyverse workflow |
corona | |
---|---|
Analysis | COVID-19 cases and deaths by country |
Repository | https://github.com/arni-magnusson/corona |
Methods | Basic arithmetic and plots, global and selected countries |
External deps | Many CRAN packages |
Data sources | Online (Johns Hopkins University) and 1 local file |
Results | 15 tables, 64 plots |
Notes | Useful during the global epidemic to view the latest data |
weather | |
---|---|
Analysis | Compare climate in selected cities around the world |
Repository | https://github.com/arni-magnusson/weather |
Methods | World map of UV, trigonometry for sunrise calculations, etc. |
External deps | 3 CRAN packages |
Data sources | Online (NASA Earth Observations) and 15 local files |
Results | 5 tables, 12 plots |
Notes | Winter and summer temperatures, snowfall, humidity, etc. |
trip | |
---|---|
Analysis | Family vacation plan |
Repository | https://github.com/arni-magnusson/trip |
Methods | World map with flight routes, testing R Markdown and HTML |
External deps | Many CRAN packages |
Data sources | 2 local files |
Results | 2 tables, 1 map |
Notes | Creates the webpage https://arni-magnusson.github.io/trip/ |