tag <- tag_create("18LX",
crop_start = "2017-06-20",
crop_end = "2018-05-02",
quiet = TRUE
)
8 Labelling tracks
In this last chapter, we tackle the challenge of labelling your tag!
This chapter assumes that you are familiar with the overall process of GeoPressureR presented in the Basic tutorial, as well as the concept of PressurePath and the use of GeoPressureViz.
8.1 Labelling principles
Labelling your tracks is imperative because GeoPressureR requires a highly precise and well-defined pressure timeseries recorded at a fixed location, both horizontally (geographical: +/- 10-50km) and vertically (altitude: +/- 2m).
The procedure involves labelling each datapoint (1) with the flight label when the bird is in active migratory flight, (2) with the discard label for pressure datapoints that should be excluded from the matching exercise, and (3) with elev_* for recurring changes of altitude within a stationary period (stap). The overall objective is to create a pressure timeseries for each stationary period during which the bird can be assumed to remain at the same location and elevation.
Labelling flight defines stationary periods and flight durations. A stationary period is a period during which the bird is considered static relative to the size of the grid (~10-50km). The start and end of each stationary period define the pressure timeseries to be matched. An accurate flight duration is critical to correctly estimate the distance travelled by the bird between two stationary periods.
Labelling discard eliminates vertical (altitudinal) movements of the bird. The algorithm matching the pressure timeseries is sensitive to small pressure variations of a few hPa, such that even altitudinal movements of a couple of meters can throw off the estimation map for short stationary periods. Since the reanalysis data to be matched is provided at a single pressure level, we must discard all datapoints corresponding to a different elevation.
Labelling stapelev with elev_* accounts for changes in altitude within a single stationary period. This more advanced labelling technique is introduced in section Elevation period.
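To get a feel for the +/- 2m vertical tolerance, the sketch below converts altitude to pressure with the International Standard Atmosphere formula. It is only an illustration under standard-atmosphere assumptions (isa_pressure_hpa() is a hypothetical helper); GeoPressureR itself compares the tag pressure with ERA5 pressure directly.

```r
# International Standard Atmosphere (troposphere): pressure in hPa at altitude h_m (m).
# Illustration only; not part of GeoPressureR.
isa_pressure_hpa <- function(h_m) 1013.25 * (1 - 2.25577e-5 * h_m)^5.25588

# Pressure drop caused by climbing 2 m and 10 m from sea level
dp_2m <- isa_pressure_hpa(0) - isa_pressure_hpa(2)
dp_10m <- isa_pressure_hpa(0) - isa_pressure_hpa(10)
round(c(dp_2m, dp_10m), 2)
```

Near sea level, 2 m of altitude corresponds to roughly 0.24 hPa, which is why even a change of perch or roost can show up in the pressure timeseries.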
Each species’ migration behaviour is so specific that manual editing remains the fastest option. Indeed, small changes in pressure and activity can correspond to either local movement or slow migration. Expertise on your bird’s expected migration style will be essential to correctly label your tracks. As you label, you will learn how the bird is moving (e.g. long continuous high altitude flights, short flights over multiple days, alternation between short migration flights and stopovers, etc.). Manual editing also gives a sense of the uncertainty of your labelling, which is useful to interpret your results.
8.2 With or without acceleration data
Acceleration data can significantly improve our understanding of bird activity and movement. It is often very helpful for labelling flights, and even essential for some species. For instance, acceleration data is the only way to identify low-altitude migration flights. In addition, acceleration is typically recorded at a higher temporal resolution (5min), which can refine flight duration and thus the movement model when building the trajectory.
When acceleration data is available, use the flight label on the acceleration timeseries, and the discard label on the pressure timeseries.
In the absence of acceleration data, both labels are applied to the pressure timeseries.
Let’s see an example using acceleration data with 18LX.
Acceleration data can be used to initialize the flight label automatically. tag_label_auto() first classifies low and high activity using k-means clustering, and then identifies and labels long periods of high activity (e.g., lasting more than 30 minutes) as flights. A post-processing step then reclassifies short high-activity periods based on the proximity of other flights, which improves the classification of low-activity periods occurring during a migration flight (e.g., gliding phases).
tag <- tag_label_auto(tag, min_duration = 30)
plot(tag, type = "acceleration")
More classification methods are described in the PALMr manual.
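The core idea of this automatic classification can be sketched in base R. This is not the implementation of tag_label_auto(): label_flight_sketch() is a hypothetical helper that assumes a regularly sampled activity vector (5-min steps, at least two distinct values) and skips the post-processing step described above.

```r
# Hypothetical sketch of k-means + minimum-duration flight labelling.
# act: activity values at a regular time step of dt minutes.
label_flight_sketch <- function(act, dt = 5, min_duration = 30) {
  # 1. Two-class k-means, initialised at the lowest and highest activity
  km <- stats::kmeans(act, centers = c(min(act), max(act)))
  high <- km$cluster == which.max(km$centers[, 1])
  # 2. Keep only runs of high activity lasting at least min_duration minutes
  runs <- rle(high)
  runs$values <- runs$values & (runs$lengths * dt >= min_duration)
  ifelse(inverse.rle(runs), "flight", "")
}

# Simulated activity: a 50-min active bout and a 15-min active bout
act <- c(rep(0, 20), rep(30, 10), rep(0, 5), rep(30, 3), rep(0, 10))
lab <- label_flight_sketch(act)
table(lab)
```

Only the 50-min bout is labelled as flight; the short bout stays unlabelled, which is exactly the kind of case the post-processing (and your manual edits on TRAINSET) must handle.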
8.3 The process
The general process to perform the labelling is as follows:
# 1. Create the csv label file `./data/tag-label/18LX.csv` (only once)
tag_label_write(tag)
# 2. Edit csv file on TRAINSET
# *on TRAINSET*
# 3. Export csv file `./data/tag-label/18LX-labeled.csv`
# *on TRAINSET*
# 4a. Read exported label file
tag <- tag_label_read(tag)
# 4b. Compute stationary period data.frame tag$stap
tag <- tag_label_stap(tag)
# Steps 4a and 4b are typically performed simultaneously with
tag <- tag_label(tag)
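Conceptually, stationary periods are the runs of datapoints between flights. The sketch below derives them from a label vector with rle(); label_to_stap() is a hypothetical helper, not the code of tag_label_stap(). It assumes that every label other than flight (including discard and elev_*) belongs to a stationary period, and uses the first and last non-flight timestamps as boundaries; GeoPressureR may place the boundaries differently.

```r
# Hypothetical helper: derive stationary periods from TRAINSET-style labels.
label_to_stap <- function(date, label) {
  runs <- rle(label == "flight")           # alternating runs of flight / non-flight
  end_idx <- cumsum(runs$lengths)          # last datapoint of each run
  start_idx <- end_idx - runs$lengths + 1  # first datapoint of each run
  keep <- !runs$values                     # non-flight runs are stationary periods
  data.frame(
    stap_id = seq_len(sum(keep)),
    start = date[start_idx[keep]],
    end = date[end_idx[keep]]
  )
}

date <- as.POSIXct("2018-04-15 00:00:00", tz = "UTC") + (0:9) * 3600
label <- c("", "", "flight", "flight", "", "discard", "", "flight", "", "")
stap <- label_to_stap(date, label)
stap
```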
In practice, labelling your tags will be a long, iterative process in which you check the validity of the labels and then refine them by modifying the csv file ./data/tag-label/18LX-labeled.csv on TRAINSET.
You can expect to spend 1 to 60 minutes per track, depending on the species’ complexity, and likely more for the first track you label. The more you label, the faster it will get!
8.4 Introduction to TRAINSET
TRAINSET is a web-based graphical tool for labelling timeseries.
We customized the original TRAINSET app for a smoother experience in labelling tags and twilight data: https://trainset.raphaelnussbaumer.com/
Please use this new version and report issues or bugs on the GitHub repository.
Start by uploading your .csv file (e.g., data/tag-label/18LX.csv) using the “Upload Tag Label” button.
A few tips:
- Keyboard shortcuts can considerably speed up navigation (zoom in/out, move left/right) and labelling (add/remove a label), e.g. using SHIFT.
- Because of the large number of datapoints, keep a narrow temporal window to prevent your browser from becoming slow or unresponsive.
- Change the Active Series and Reference Series depending on what you are labelling, but use both timeseries together to help you determine what the bird might be doing.
- Adapt the y-axis range to each stationary period to see the small (but essential) pressure variations that are not visible in the full view.
8.5 Elevation period
It is common for birds to change elevation level within the same stationary period (e.g., roost vs feeding site or altitudinal movements for mountainous species). Such movements can result in drastic variations in pressure, which interfere with the ERA5 matching exercise.
To circumvent this issue while preserving as much data as possible in the match, you can label pressure data with different elevation levels by using elev_x in TRAINSET. To do so, click on the + sign in the bottom right to create a new elevation level (e.g., elev_1). Assign the new label to all pressure datapoints belonging to the same elevation period.
Note 1: Conceptually, you can think of unlabelled datapoints (i.e., black dots) as elev_0.
Note 2: The elevation levels do not have to be continuous: it is even better if the same elevation level recurs several times during the same stationary period. For instance, you can leave the datapoints at the feeding site unlabelled every day and label the datapoints at the roosting location every night as elev_1.
Note 3: You can restart the count of elev_x for each new stationary period (e.g., the datapoints from stap 1 labelled as elev_1 are not connected to the datapoints from stap 2 also labelled as elev_1).
Note 4: In general, short elevation levels (< 12-24h) are often better left out (i.e., discarded) to reduce computational and memory cost, especially if you have a long elevation level in the same stap. But see Note 5.
Note 5: Ideally, include the highest and lowest elevations, as they can bring tremendous constraints on the elevation mask of the likelihood map.
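Why elevation levels help can be seen in a small sketch of the mismatch. Assuming (as a simplification of the matching in GeoPressureR) that each stapelev is allowed its own constant pressure offset, removing the mean error of each group before computing the MSE absorbs a constant altitude difference between roost and feeding site. stapelev_mse() and the data are hypothetical.

```r
# Hypothetical sketch: MSE after removing one constant offset per stapelev.
stapelev_mse <- function(p_tag, p_era5, stapelev) {
  err <- p_tag - p_era5
  err_centered <- err - ave(err, stapelev)  # subtract the mean error of each group
  mean(err_centered^2)
}

p_era5 <- c(1000, 1001, 1002, 1001, 1000, 999)
elev <- c("elev_0", "elev_0", "elev_1", "elev_1", "elev_0", "elev_1")
# Same true location, but the two sites differ by a constant pressure offset
p_tag <- p_era5 - ifelse(elev == "elev_1", 12, 5)

stapelev_mse(p_tag, p_era5, elev)              # with elevation levels
stapelev_mse(p_tag, p_era5, rep("elev_0", 6))  # everything as a single level
```

With the elevation levels, both constant offsets vanish and the mismatch is zero; treating all datapoints as a single level leaves an error of 12.25 hPa² that would wrongly penalise the true location.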
8.6 Labelling checks
Use the checks outlined below to evaluate and improve the quality of your labelling.
8.6.1 Initial checks
Check 1: Duration of stationary periods and flights
The first test consists of checking the duration of flights and stationary periods. This is systematically checked when computing stationary periods, and a message tells you where to find potential errors:
tag <- tag_label(tag, "./data/tag-label/18LX-labeled-v1.csv")
── Short stationary periods (<6hr): ────────────────────────────────────────────
! Stap 7 (2017-08-30 23:47 - 2017-08-30 23:52) : 5m
! Stap 26 (2018-04-15 14:57 - 2018-04-15 18:47) : 3h 49m 60s
! Stap 27 (2018-04-15 19:32 - 2018-04-15 20:07) : 35m
! Stap 30 (2018-04-29 23:37 - 2018-04-29 23:42) : 5m
! Stap 32 (2018-04-30 19:22 - 2018-04-30 19:37) : 15m
! Stap 33 (2018-04-30 21:47 - 2018-04-30 21:52) : 5m
! Stap 34 (2018-04-30 23:02 - 2018-04-30 23:07) : 5m
! Stap 35 (2018-05-01 00:02 - 2018-05-01 00:07) : 5m
── Short flights (<2hr): ───────────────────────────────────────────────────────
! Flight 14 -> 15 (2017-09-11 23:32 - 2017-09-12 00:22) : 50m
! Flight 18 -> 19 (2017-09-19 23:32 - 2017-09-20 00:42) : 1h 10m
! Flight 26 -> 27 (2018-04-15 18:47 - 2018-04-15 19:32) : 45m
! Flight 27 -> 28 (2018-04-15 20:07 - 2018-04-15 21:37) : 1h 30m
! Flight 31 -> 32 (2018-04-30 18:37 - 2018-04-30 19:22) : 45m
! Flight 33 -> 34 (2018-04-30 21:52 - 2018-04-30 23:02) : 1h 10m
! Flight 34 -> 35 (2018-04-30 23:07 - 2018-05-01 00:02) : 55m
! Flight 35 -> 36 (2018-05-01 00:07 - 2018-05-01 00:47) : 40m
Here, I used the labels produced by tag_label_auto() without making any edits on TRAINSET. In such cases, the most common error is that a flight is split because the bird was inactive for a few datapoints. You can correct this by cleaning up flights on TRAINSET.
tag <- tag_label(tag, "./data/tag-label/18LX-labeled-v2.csv")
── Short stationary periods (<6hr): ────────────────────────────────────────────
! Stap 25 (2018-04-15 15:02 - 2018-04-15 18:47) : 3h 45m
── Short flights (<2hr): ───────────────────────────────────────────────────────
! Flight 13 -> 14 (2017-09-11 23:32 - 2017-09-12 00:22) : 50m
! Flight 17 -> 18 (2017-09-19 23:32 - 2017-09-20 01:07) : 1h 35m
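The duration check boils down to comparing each stationary period and each flight with a threshold. A minimal sketch, assuming a stap table with start and end columns similar to tag$stap; check_durations() and the example data are hypothetical, and the 6h and 2h thresholds mirror the messages above.

```r
# Hypothetical sketch of the duration check (not the GeoPressureR code).
check_durations <- function(stap, warn_stap_h = 6, warn_flight_h = 2) {
  n <- nrow(stap)
  stap_h <- as.numeric(difftime(stap$end, stap$start, units = "hours"))
  # flight i goes from the end of stap i to the start of stap i + 1
  flight_h <- as.numeric(difftime(stap$start[-1], stap$end[-n], units = "hours"))
  list(
    short_stap = stap$stap_id[stap_h < warn_stap_h],
    short_flight = which(flight_h < warn_flight_h)
  )
}

t0 <- as.POSIXct("2018-04-15 00:00:00", tz = "UTC")
stap <- data.frame(
  stap_id = 1:3,
  start = t0 + c(0, 10, 14) * 3600,
  end = t0 + c(8, 13, 30) * 3600
)
res <- check_durations(stap)
res$short_stap    # stap 2 lasts 3 h
res$short_flight  # flight 2 -> 3 lasts 1 h
```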
Check 2: Pressure timeseries
Here we visually check that the pressure timeseries of each stationary period (1) is correctly grouped and (2) does not include pressure outliers (e.g., altitudinal movements). It is worth zooming in and panning on each individual stationary period to inspect the timeseries manually.
plot(tag, type = "pressure", quiet = TRUE)
It is acceptable to have a few stationary periods or flights shorter than the warning thresholds of Check 1. Feel free to override the default thresholds (e.g., warning_flight_duration).
However, having too many stationary periods slows down the code, as the number of likelihood maps and, more importantly, the size of the graph become excessively large. This is where the parameter include_min_duration of tag_set_map() is particularly useful: it allows you to ignore short stationary periods and merge the flights around them.
Note that while the MSE mismatch is usually not useful for stationary periods shorter than a few hours, a single extreme pressure datapoint can still indicate an unusually low or high altitude location, which only a small part of the map may match!
8.6.2 GeoPressureViz check
At this stage, the label file is good enough to start estimating the position of the bird. This will be super helpful to then compare the pressure measurements to the ERA5 pressure and refine the labelling. To help in this process, we’ll be using GeoPressureViz.
Before this, we need to compute the likelihood maps. It’s generally better to start from scratch to be sure everything is working fine:
tag <- tag_create("18LX",
crop_start = "2017-06-20",
crop_end = "2018-05-02", quiet = TRUE
) |>
tag_label("./data/tag-label/18LX-labeled-v3.csv", quiet = TRUE) |>
tag_set_map(
extent = c(-16, 23, 0, 50),
scale = 1, # use a coarse scale at first: 1° is usually enough
known = data.frame(
stap_id = 1,
known_lat = 48.9,
known_lon = 17.05
)
)
tag <- geopressure_map(tag, quiet = TRUE)
geopressureviz(
tag,
path = tag2path(tag, interp = 1) # interpolate stap shorter than 1 day to have a smoother trajectory
)
In GeoPressureViz, your main task is to draw a coherent trajectory with the following steps:
- Toggle the “Full Track” button to be in the stationary period view
- “Edit position” of the stationary periods to be coherent.
- “Query pressure” to retrieve the ERA5 pressure at this position
- Compare the ERA5 pressure to the tag pressure in the bottom panel
- Refine the label in TRAINSET accordingly: (1) modify flight duration with flight, (2) label pressure outliers with discard, (3) label stapelev with elev_x, or even (4) merge or split stationary periods
To make sure that the labelling is correctly performed, you should be able to manually edit the position of the trajectory so that (1) the distance between consecutive stationary periods matches the flight duration (i.e., dots lying on the circles) and (2) the pressure measurements match the ERA5 pressure at this location (i.e., high likelihood value on the map).
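The first criterion can be checked numerically: the great-circle distance between consecutive positions divided by the flight duration gives the ground speed the bird would need. A minimal sketch using the haversine formula; haversine_km(), check_speed() and the 80 km/h threshold are hypothetical choices for illustration, not GeoPressureViz internals.

```r
# Great-circle distance (km) with the haversine formula on a spherical Earth
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Ground speed required for each flight between consecutive positions
check_speed <- function(lon, lat, flight_h, max_kmh = 80) {
  n <- length(lon)
  dist_km <- haversine_km(lon[-n], lat[-n], lon[-1], lat[-1])
  speed_kmh <- dist_km / flight_h
  data.frame(flight = seq_len(n - 1), dist_km, speed_kmh, too_fast = speed_kmh > max_kmh)
}

path <- check_speed(
  lon = c(17.05, 17.05, 17.05), lat = c(48.9, 47.9, 43.9), flight_h = c(2, 2)
)
path
```

Here the second flight would require more than 200 km/h, a sign that either the position or the flight duration (the flight label) needs revisiting.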
Note that the initial path shown on GeoPressureViz is likely not realistic, as no movement model has been included (i.e., there is no constraint from flight duration). This is fine at this stage: we don’t really want to assume a realistic path, just to see what pressure can tell us without assuming anything. Using this path, we can retrieve the ERA5 pressure along it.
Mismatches between the geolocator and ERA5 are usually indicative of altitudinal movement of the bird. Depending on the situation, there are multiple ways to label this mismatch.
- In the easiest case, the bird simply flew within the same stationary site (<10-50km) for a short time and came back to the same location. In this case, you can simply discard the pressure timeseries during the temporary change of altitude.
- If the bird changes altitude but never comes back to the same elevation, you can either consider that the new altitude is a new stationary period and label the activity data, or you can discard the timeseries of the shorter period. It is essential that the resulting timeseries matches the ERA5 pressure everywhere. Matches are usually better for longer periods. Looking at activity data for the same period can also help understand what the bird is doing.
- If the bird changes back and forth between two elevation levels, use the elev_x label to label them accordingly.
As a general guideline, it is better to remove a bit more for long stationary periods to get a better estimation of the position. You can do this iteratively by removing a bit and seeing whether the position improves as a result.
The main advantages of using GeoPressureViz are:
- Only query the ERA5 pressure for stationary periods that need to be checked
- Compare the ERA5 pressure for different positions
- Separate the likelihood of pressure (mask and mismatch) and light as well as flight duration to see where they agree/disagree
- Filter out short stationary periods to see if they are coherent, before refining the trajectory by adding the shorter ones
8.6.3 Pressurepath check
A final check should be performed on the pressurepath computed from your most likely trajectory.
We want to look at the histogram of the pressure error between geolocator and ERA5 timeseries for each stapelev.
load("./data/interim/18LX.Rdata")
plot_pressurepath(pressurepath, type = "hist", plot_plotly = FALSE)
- For long stationary periods (over 5 days), check that there is a single mode in the distribution. Two modes indicate that the bird is spending time at two different altitudes.
- This is common when birds have a day site and a night roost at different elevations. In such cases, use the elev_x label.
- The red vertical dotted line indicates +/-3 sd, which can be helpful to identify potential outliers.
- Stationary periods with an empirical sd greater than the one used (sd) are highlighted in red. The likelihood map for these stationary periods might not be correct.
sd?
The conversion of the mean squared error (MSE) into a likelihood performed by geopressure_map_likelihood() assumes that the pressure error is normally distributed. An important consequence is that it does not perform well in the presence of large errors, typically resulting in a map with a single possible pixel.
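The sketch below illustrates this consequence on a handful of pixels. It assumes independent Gaussian errors with standard deviation sd and uses the plain log-likelihood -n * MSE / (2 * sd^2); the actual geopressure_map_likelihood() may weight the number of datapoints differently, and mse_to_lik() and the MSE values are hypothetical.

```r
# Hypothetical sketch: Gaussian likelihood from the per-pixel MSE,
# normalised over a few candidate pixels.
mse_to_lik <- function(mse, n, sd) {
  loglik <- -n / (2 * sd^2) * mse   # log-likelihood up to a constant
  lik <- exp(loglik - max(loglik))  # subtract the max to avoid underflow
  lik / sum(lik)                    # normalise over the candidate pixels
}

mse <- c(1.0, 1.2, 2.0)  # hypothetical MSE (hPa^2) at three pixels
n <- 288                 # one day of 5-min pressure data
p1 <- mse_to_lik(mse, n, sd = 1)
p3 <- mse_to_lik(mse, n, sd = 3)
round(rbind(sd_1 = p1, sd_3 = p3), 3)
```

With sd = 1 and a full day of data, a small difference in MSE already puts virtually all the weight on one pixel, while with sd = 3 the second pixel keeps some weight. Setting sd too small relative to the real error therefore collapses the map.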
As mentioned in Pressure map, sd should be adjusted to your own data. You can use the empirical sd value displayed on the histogram to guide you in setting the standard deviation parameter sd in geopressure_map().
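The statistics shown on the histogram can be approximated per stapelev as follows. This is a sketch, not plot_pressurepath(): it assumes the error is the tag pressure minus the ERA5 pressure, removes the mean offset of each stapelev, and uses sd_used (the sd passed to geopressure_map()) for the +/-3 sd outlier bounds. pressure_error_summary() and the data are hypothetical.

```r
# Hypothetical sketch of per-stapelev error diagnostics.
pressure_error_summary <- function(err, stapelev, sd_used = 1) {
  err_centered <- err - ave(err, stapelev)     # remove the mean offset per stapelev
  sd_emp <- tapply(err_centered, stapelev, sd) # empirical sd per stapelev
  list(
    sd_empirical = sd_emp,
    above_sd = names(sd_emp)[sd_emp > sd_used], # analogous to the red highlight
    outlier = abs(err_centered) > 3 * sd_used   # outside the +/-3 sd lines
  )
}

err <- c(0.1, -0.1, 0.2, -0.2, 5, 5.1, 4.9, 5, 0, 0, 0, 8)
stapelev <- rep(c("1", "2", "3"), each = 4)
res <- pressure_error_summary(err, stapelev, sd_used = 1)
round(res$sd_empirical, 2)
res$above_sd
which(res$outlier)
```

Stapelev 3 has an empirical sd of 4 hPa, well above sd_used = 1, and its last datapoint falls outside +/-3 sd: a candidate for discard or for a separate elevation level.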
Note that you can use a different sd value to account for stationary periods with high altitudinal variation (e.g., mountainous areas), while keeping a low sd value when the bird is in a low-topography area.
In this case, sd = 1 (the default value) seems adequate, though 0.8 or 0.9 might offer more precise positioning.
This check should also be repeated at the very end of the GeoPressureR workflow with the most likely trajectory and the associated pressurepath.
In addition, a last check with GeoPressureViz is highly recommended to see the difference between the likelihood maps and the marginal map. Remember that models are only as good as the data provided!