In this vignette, we cover the main steps involved in creating the Core GeoLocator Data Package (i.e., before analysis) with Swiss Ornithological Institute data, starting from the placement of the order. Here are the main steps, with the corresponding code provided below:
1. Placing order
Ask the collaborator to enter basic metadata on the project by creating a datapackage.json:
- title
- contributors: including email, to define who has access to the private Zenodo record during the embargo
- licenses: CC-BY-4.0 by default
- embargo
Tip
This can be done with GeoLocatoR::create_gldp() (see below) or using this datapackage.json template file.
2. Sending geolocator together with pre-filled tags.csv and observations.csv
Provide basic instructions to ringers/collaborators on how to fill in these CSV files, with links to the documentation for tags.csv and observations.csv:
Caution
- Do not modify the column headers, and do not add or remove columns.
- Make sure to fill in all required fields (marked by * in the documentation).
- Each tag should be present only once in tags.csv and at least once in observations.csv with observation_type: "equipment".
- You will only have access to the data when these two tables are returned complete.
It is strongly recommended to generate these two files with add_gldp_soi(), with fields pre-filled from the Access database (see below).
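The two tag-related rules in the caution box can be checked programmatically before the tables are sent out or once they come back. The following is a minimal sketch in base R, not part of GeoLocatoR: the helper check_tag_tables() is hypothetical, and it assumes that both tables have a tag_id column and that the observations table has an observation_type column.

```r
# Hypothetical helper (not part of GeoLocatoR): checks two of the rules above
# on data frames read from tags.csv and observations.csv.
# Assumes a `tag_id` column in both tables and an `observation_type`
# column in the observations table.
check_tag_tables <- function(tags, observations) {
  # Rule 1: each tag appears only once in tags
  duplicated_tags <- unique(tags$tag_id[duplicated(tags$tag_id)])

  # Rule 2: each tag has at least one "equipment" observation
  is_equipment <- observations$observation_type %in% "equipment"
  equipped <- observations$tag_id[is_equipment]
  missing_equipment <- setdiff(tags$tag_id, equipped)

  list(
    duplicated_tags = duplicated_tags,
    missing_equipment = missing_equipment,
    ok = length(duplicated_tags) == 0 && length(missing_equipment) == 0
  )
}

# Stand-in data for illustration only
tags <- data.frame(tag_id = c("A1", "A2", "A2"), stringsAsFactors = FALSE)
observations <- data.frame(
  tag_id = c("A1", "A2"),
  observation_type = c("equipment", "retrieval"),
  stringsAsFactors = FALSE
)

result <- check_tag_tables(tags, observations)
```

Here result$ok is FALSE because tag A2 is listed twice and has no equipment observation. The sketch does not check the required fields or the column headers.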
3. Processing returned data
Upon reception of the geolocator:
- Extract the geolocator data and store it (as usual) on the Z: drive.
- Also add the completed tags.csv and observations.csv to the same folder.
- Create the first core DataPackage with add_gldp_soi() in any temporary drive (see below).
- Upload it to Zenodo.
- Share access rights with contributors based on their urls.
1. Initiate DataPackage with datapackage.json
In the contract/agreement, the following terms should be specified:
- List of contributors with their emails, giving them write access to the datapackage on Zenodo while it is still under embargo
- Licenses of the data according to agreement
- Embargo date according to agreement
Here is a simple example that creates a geolocator data package from this basic information:
library(GeoLocatoR)

pkg <- create_gldp(
  title = "Geolocator study of {species_name} in {location}", # required
  contributors = list( # required
    list(
      title = "Raphaël Nussbaumer",
      roles = c("ContactPerson", "DataCurator", "ProjectLeader"),
      email = "raphael.nussbaumer@vogelwarte.ch",
      path = "https://orcid.org/0000-0002-8185-1020",
      organization = "Swiss Ornithological Institute"
    ),
    list(
      title = "Yann Rime",
      roles = c("Researcher"),
      email = "yann.rimme@vogelwarte.ch",
      path = "https://orcid.org/0009-0005-7264-6753",
      organization = "Swiss Ornithological Institute"
    )
  ),
  # The default license is Creative Commons Attribution 4.0
  licenses = list(list(
    name = "CC-BY-4.0",
    title = "Creative Commons Attribution 4.0",
    path = "https://creativecommons.org/licenses/by/4.0/"
  )),
  # This is the default value, set to a past date so that there is no embargo
  embargo = "1970-01-01"
)
Read the datapackage specification to learn about all the recommended metadata that can be added. These fields can be set in create_gldp() or updated manually on pkg directly, as below:
# Description is really important to provide some textual background information on the project.
pkg$description <- "Geolocator study of Mangrove Kingfisher and Red-capped Robin-chat on the coast of Kenya"
# If you have a website link, it's quite a nice way to link them up
pkg$homepage <- "https://github.com/Rafnuss/MK-RCRC"
# And the url to an image describing your datapackage
pkg$image <- NULL
# Versioning the datapackage is a good idea, as it allows the data to be updated later.
pkg$version <- "1.0.0"
# You can also add keywords:
pkg$keywords <- c("Mangrove Kingfisher", "Red-capped Robin Chat", "multi-sensor geolocator")
# Important field for publication/researcher
# By default, the license of the data is Creative Commons BY 4.0.
# pkg$licenses
# Add DOI of the datapackage if already available or reserve it https://help.zenodo.org/docs/deposit/describe-records/reserve-doi/#reserve-doi
pkg$id <- "https://doi.org/10.5281/zenodo.11207141"
# Provide the recommended citation for the package
pkg$citation <- "Nussbaumer, R., & Rime, Y. (2024). Woodland Kingfisher: Migration route and timing of South African Woodland Kingfisher (v1.1). Zenodo. https://doi.org/10.5281/zenodo.11207141"
# Funding sources
pkg$grants <- c("Swiss Ornithological Institute")
# Identifiers of resources related to the package (e.g. papers, project pages, derived datasets, APIs, etc.).
pkg$relatedIdentifiers <- c("Nussbaumer, R., Jackson, C. Using geolocators to unravel intra-African migrant strategies. http://dx.doi.org/10.13140/RG.2.2.34477.10721 ")
# List of references related to the package
pkg$references <- NULL
Once you’re done, you can visualize the metadata and export it as datapackage.json.
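As a minimal sketch of this inspect-and-export step, the example below uses a plain named list as a stand-in for the metadata held in pkg and serializes it with the jsonlite package. Both choices are assumptions made for illustration: the real pkg object is created by create_gldp(), and GeoLocatoR may provide its own way of writing the file.

```r
library(jsonlite)

# Stand-in for `pkg`: a plain named list holding only metadata
# (an assumption for illustration; the real object comes from create_gldp()).
pkg_metadata <- list(
  title = "Geolocator study of {species_name} in {location}",
  version = "1.0.0",
  embargo = "1970-01-01",
  keywords = c("Mangrove Kingfisher", "multi-sensor geolocator")
)

# Inspect the structure of the metadata
str(pkg_metadata)

# Serialize to JSON; auto_unbox = TRUE writes length-one vectors as scalars
json <- toJSON(pkg_metadata, pretty = TRUE, auto_unbox = TRUE)

# Written to a temporary folder here; in practice the target would be
# the project folder
json_path <- file.path(tempdir(), "datapackage.json")
writeLines(json, json_path)
```

Reading the file back with fromJSON(json_path) is a quick way to confirm that the metadata survived the round trip.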
Tip
Save this file in the root of the project folder on the Z: drive.
2. Create pre-filled tags.csv and observations.csv from SOI database (no data)
First, we need to extract the data from the Access database.
# Root folder of the dataset (typically the Z-drive)
soi_data_directory <- "/Users/rafnuss/Library/CloudStorage/Box-Box/geolocator_data/UNIT_Vogelzug/"
# Load the geolocator (GDL) database from the access file
gdl0 <- read_gdl(access_file = file.path(soi_data_directory, "database/GDL_Data.accdb"), filter_col = FALSE)
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#> dat <- vroom(...)
#> problems(dat)
For this example, we will take the geolocator data from the order OtuScoES24 and add them to the pkg created earlier.
library(dplyr) # provides %>% and filter()
library(stringr) # provides str_detect()

pkg <- pkg %>%
  add_gldp_soi(
    gdl = gdl0 %>% filter(str_detect(OrderName, "OtuScoES24")),
    directory_data = file.path(soi_data_directory, "data"),
    allow_empty_o = TRUE # This is required to create an empty observations table
  )
#> Warning: We could not find the data directory for 49 tags (out of 49). GDL_IDs: 32TK,
#> 32OD, 32UE, 32TI, 32OQ, 32SL, 32SV, 32OU, 32UF, 32TJ, 32PJ, 32UX, 32OR, 32OT,
#> 32TM, 32NK, 32PI, 32UV, …, 32SS, and 32NL. These will not be imported.
A warning message should confirm that no data is available for these tags. However, because we set allow_empty_o = TRUE, an empty observations table is still created.
Note
In this example we used the initial datapackage pkg, but if you want to use an existing datapackage.json, you can create the package pkg with
pkg <- read_gldp("https://gist.githubusercontent.com/Rafnuss/457b9096a4fc0ad004115df1712a8923/raw/46d5d6e4461f919f725961b037199f4fcf8313b0/datapackage.json")
#> Warning: `file`
#> 'https://gist.githubusercontent.com/Rafnuss/457b9096a4fc0ad004115df1712a8923/raw/46d5d6e4461f919f725961b037199f4fcf8313b0/datapackage.json'
#> should have a resources property containing at least one resource.
#> ℹ Use `add_resource()` to add resources.
The warning message is normal; it’s reminding you that the package is actually empty.
As the geolocators have just been built and are ready to be shipped, you can generate the observations.xlsx and tags.xlsx tables to be sent to the ringers. I suggest creating a .xlsx spreadsheet rather than a .csv file, to preserve the column classes.
library(writexl) # provides write_xlsx()

write_xlsx(tags(pkg), "./tags.xlsx")
write_xlsx(observations(pkg), "./observations.xlsx")
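To illustrate why a CSV round trip can change column classes, here is a small base R sketch with a stand-in table (the values are made up for illustration). Identifiers stored as text with leading zeros are read back as integers by default, unless the column class is declared explicitly.

```r
# Stand-in table: identifiers with leading zeros stored as character
tags_example <- data.frame(
  tag_id = c("0012", "0345"),
  ring_number = c("007", "120"),
  stringsAsFactors = FALSE
)

csv_path <- tempfile(fileext = ".csv")
write.csv(tags_example, csv_path, row.names = FALSE)

# With default type guessing, the columns come back as integers
# and the leading zeros are lost
guessed <- read.csv(csv_path)
class(guessed$tag_id)

# Declaring the class explicitly keeps the values as written
kept <- read.csv(csv_path, colClasses = "character")
kept$tag_id
```

A spreadsheet format stores the cell type alongside the value, which is one reason it is less error-prone when the file is edited by hand.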
3. Create core DataPackage with data from SOI database
If we have data available on the Z: drive, we can use the same function add_gldp_soi() to add the three core resources to pkg:
# Initiate pkg with minimal metadata; use an existing datapackage.json if possible
pkg <- create_gldp(
  title = "Geolocator data of Cossypha natalensis in Kenya",
  contributors = list(
    list(
      title = "Raphaël Nussbaumer",
      roles = c("ContactPerson", "DataCurator", "ProjectLeader")
    )
  )
)
pkg <- pkg %>% add_gldp_soi(
  gdl = gdl0 %>% filter(str_detect(OrderName, "CosNatKE")),
  directory_data = file.path(soi_data_directory, "data")
)
#> Warning: We could not find the data directory for 55 tags (out of 65). GDL_IDs: 27MK,
#> 27LF, 27LC, 27MM, 27ML, 27MO, 27LE, 24SX, 24SW, 24SY, 26YC, 26YE, 28AF, 28CF,
#> 28BE, 28BG, 28BF, 28BH, …, 32YS, and 30IJ. These will not be imported.
print(pkg)
#> A GeoLocator Data Package with 3 resourcess:
#> • tags
#> • measurements
#> • observations
#> Use `unclass()` to print the Data Package as a list.
Whenever you add new resources to a datapackage, update_gldp() should be called so that the following fields are updated: created, temporal, reference_location, taxonomic.
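To give an idea of what such derived metadata looks like, the sketch below computes a temporal range and a list of taxa from stand-in tables in base R. This only illustrates the kind of values involved and is not the package's implementation: the column names datetime and scientific_name and the start/end layout of temporal are assumptions.

```r
# Stand-in tables (illustrative values only)
measurements <- data.frame(
  tag_id = c("A1", "A1", "A2"),
  datetime = as.POSIXct(
    c("2023-06-01 00:00:00", "2024-05-20 12:00:00", "2023-07-15 06:00:00"),
    tz = "UTC"
  )
)
tags <- data.frame(
  tag_id = c("A1", "A2"),
  scientific_name = c("Cossypha natalensis", "Cossypha natalensis"),
  stringsAsFactors = FALSE
)

# Temporal coverage: dates of the first and last measurement
temporal <- list(
  start = format(min(measurements$datetime), "%Y-%m-%d"),
  end = format(max(measurements$datetime), "%Y-%m-%d")
)

# Taxonomic coverage: unique scientific names found in tags
taxonomic <- unique(tags$scientific_name)
```

Because these values depend on the content of the tables, they go stale as soon as a resource is added or edited, which is why the update step matters.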
4. Update tags and/or observations table
There are different ways to modify your resource tables.
- In the ideal case, tags.csv and observations.csv have been returned by the ringer. In this case you can simply replace the default tables with the returned ones.
- In other cases, you might want to edit the default table. I suggest creating a temporary .xlsx spreadsheet (and not .csv, to preserve the column classes), modifying it in Excel, and reading it back into R. You can also edit the tibble directly with dplyr functions.
- Finally, you can also modify the table (not recommended). For this vignette, I’ll just randomly create values for the two required fields scientific_name and ring_number.
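As a minimal sketch of that last option, the example below fills the two required fields on a stand-in tags table in base R. The table, the species name and the ring number format are placeholders chosen for illustration; with a real package, the table would come from tags(pkg).

```r
set.seed(1) # make the placeholder values reproducible

# Stand-in for the tags table (illustrative; normally obtained from the package)
tags <- data.frame(
  tag_id = c("A1", "A2", "A3"),
  scientific_name = NA_character_,
  ring_number = NA_character_,
  stringsAsFactors = FALSE
)

# Same species for all tags in this example
tags$scientific_name <- "Cossypha natalensis"

# Placeholder ring numbers: a fixed prefix plus five digits,
# sampled without replacement so that they are unique
tags$ring_number <- paste0(
  "XX",
  sprintf("%05d", sample(0:99999, nrow(tags)))
)
```

Placeholder values like these are only suitable for testing the workflow and should be replaced by the real ring numbers before publication.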
Warning
Don’t forget to update the metadata of your Data Package whenever you’ve updated any table:
pkg <- update_gldp(pkg)
5. Check and Validate datapackage
Before publishing your data, it is essential to validate your GeoLocator Data Package.
A first check can be done with check_gldp():
check_gldp(pkg)
The Python package frictionless-py offers more advanced and general validation.
First, you’ll need to install Python and the frictionless package:
pip install frictionless
validate_gldp(pkg, path = "/Users/rafnuss/anaconda3/bin/")
6. Write package to directory
Now that we have confirmed that all the data are correct, we can write the package with:
write_package(pkg, directory = file.path("~", pkg$name))
Tip
The folder created can now be uploaded to Zenodo. All the information needed for the upload is provided in datapackage.json!