8 Labelling tracks

In this last chapter, we tackle the challenge of labelling your tag!

Important

This chapter assumes that you are familiar with the overall process of GeoPressureR presented in the Basic tutorial, as well as the concept of PressurePath and the use of GeoPressureViz.

8.1 Labelling principles

Labelling your tracks is imperative because GeoPressureR requires highly precise and well-defined pressure timeseries of a fixed/constant location both horizontally (geographical: +/- 10-50km) and vertically (altitude: +/- 2m).

The procedure involves labelling each datapoint (1) with the flight label when the bird is in active migratory flight, (2) with the discard label for pressure datapoints that should be excluded from the matching exercise and (3) with elev_* for sustained changes of altitude within a stationary period (stap). The overall objective is to create a pressure timeseries for each stationary period where the bird can be assumed to remain at the same location and elevation during the entire period.

Labelling flight defines stationary periods and flight duration. A stationary period is a period during which the bird is considered static relative to the size of the grid (~10-50km). The start and end of each stationary period are used to define the pressure timeseries to be matched. Having an accurate flight duration is critical to correctly estimate the distance travelled by the bird between two stationary periods.
Labelling discard eliminates vertical (altitudinal) movements of the bird. The algorithm matching the pressure timeseries is sensitive to small pressure variations of a few hPa, such that even altitudinal movements of a couple of meters can throw off the estimation map for short stationary periods. Since the reanalysis data to be matched is provided at a single pressure level, we must discard all datapoints corresponding to a different elevation.
Labelling stapelev with elev_* accounts for changes in altitude within a single stationary period. This more advanced labelling technique is introduced in section Elevation period.
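To make the link between labels and stationary periods concrete, here is a minimal base R sketch. The timestamps and labels are hypothetical, and the logic only illustrates the principle (each uninterrupted run of datapoints without the flight label forms one stationary period). It is not the implementation used by tag_label_stap(), which may define the start and end of each period differently.

```r
# Hypothetical labels for a pressure timeseries sampled every 30 minutes.
time <- seq(as.POSIXct("2017-08-30 18:00", tz = "UTC"), by = "30 min", length.out = 12)
label <- c("", "", "discard", "", "flight", "flight", "flight", "", "", "", "flight", "")

# Run-length encode the flight / not-flight status of each datapoint.
runs <- rle(label == "flight")
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# Every run without the flight label is one stationary period (stap).
is_stap <- !runs$values
stap <- data.frame(
  stap_id = seq_len(sum(is_stap)),
  start = time[run_start[is_stap]],
  end = time[run_end[is_stap]]
)

# Stap index of each datapoint (0 during flights).
stap_of_point <- rep(cumsum(is_stap) * is_stap, runs$lengths)

# Datapoints kept for the pressure match: not in flight and not discarded.
keep <- stap_of_point > 0 & label != "discard"
n_matched <- table(stap_of_point[keep])
```

In this toy example, the two flights split the timeseries into three stationary periods, and the discarded datapoint is excluded from the first one.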

Each species’ migration behaviour is so specific that manual editing remains the fastest option. Indeed, small changes in pressure and activity can correspond to either local movement or slow migration. Expertise on your bird’s expected migration style will be essential to correctly label your tracks. As you label, you will learn how the bird is moving (e.g. long continuous high altitude flights, short flights over multiple days, alternation between short migration flights and stopovers, etc.). Manual editing also gives a sense of the uncertainty of your labelling, which is useful to interpret your results.

8.2 With or without acceleration data

Acceleration data can significantly improve our understanding of bird activity and movement. It is often very helpful for labelling flights, and even essential for some species. For instance, acceleration data is the only way to identify low-altitude migration flights. In addition, acceleration is typically recorded at a higher temporal resolution (5min), which can refine flight duration and thus the movement model when building the trajectory.

Which timeseries should I label?

When acceleration data is available, use the flight label on the acceleration timeseries, and the discard label on the pressure timeseries.

In the absence of acceleration data, both labels are applied to the pressure timeseries.

Let’s see an example using acceleration data with 18LX.

tag <- tag_create("18LX",
  crop_start = "2017-06-20",
  crop_end = "2018-05-02",
  quiet = TRUE
)

Acceleration data can be used to initialize the flight label automatically. tag_label_auto() first classifies activity as low or high using k-means clustering, and then identifies and labels long periods of high activity (e.g., lasting more than 30 minutes) as flights. A post-processing step then reclassifies short high-activity periods based on their proximity to other flights, which improves the classification of low-activity periods occurring during a migration flight (e.g., a gliding phase).

tag <- tag_label_auto(tag, min_duration = 30)
plot(tag, type = "acceleration")
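The two main steps described above can be illustrated with a minimal base R sketch on synthetic data. The activity values, the 5-minute sampling interval and all variable names are assumptions made for the example. This is not the code of tag_label_auto(), and the post-processing step is left out.

```r
set.seed(42) # reproducible synthetic data

# Hypothetical activity values every 5 minutes: rest, a 2 h flight, rest,
# a 10 min burst of local activity, rest.
truth <- rep(c(FALSE, TRUE, FALSE, TRUE, FALSE), times = c(36, 24, 36, 2, 22))
activity <- ifelse(
  truth,
  rnorm(length(truth), mean = 30, sd = 3),
  rnorm(length(truth), mean = 2, sd = 1)
)

# Step 1: a two-cluster k-means separates low from high activity.
km <- kmeans(activity, centers = 2, nstart = 5)
high <- km$cluster == which.max(km$centers)

# Step 2: only runs of high activity lasting at least min_duration are flights.
min_duration <- 30 # minutes
dt <- 5 # sampling interval in minutes
runs <- rle(high)
runs$values <- runs$values & runs$lengths * dt >= min_duration
label <- ifelse(inverse.rle(runs), "flight", "")
```

Here the 2-hour block of high activity is labelled as flight, while the 10-minute burst is too short and stays unlabelled.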

More classification methods are described in the PAMLr manual.

8.3 The process

The general process to perform the labelling is as follows:

# 1. Create the csv label file `./data/tag-label/18LX.csv` (only once)
tag_label_write(tag)

# 2. Edit csv file on TRAINSET
# *on TRAINSET*

# 3. Export csv file `./data/tag-label/18LX-labeled.csv`
# *on TRAINSET*

# 4a. Read exported label file
tag <- tag_label_read(tag)

# 4b. Compute stationary period data.frame tag$stap
tag <- tag_label_stap(tag)

# Steps 4a and 4b are typically performed simultaneously with
tag <- tag_label(tag)

In practice, labelling of your tags will then be a long iterative process where you check the validity of the labels and subsequently refine them by modifying the csv file ./data/tag-label/18LX-labeled.csv on TRAINSET.
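Between iterations, it can help to take a quick look at the content of the exported file. The sketch below assumes that the label file follows the TRAINSET layout with the columns series, timestamp, value and label. This layout and the excerpt itself are assumptions for illustration, so check them against your own file.

```r
# Hypothetical excerpt of an exported label file (column layout assumed).
csv <- "series,timestamp,value,label
pressure,2017-08-30T22:30:00Z,985.2,
pressure,2017-08-30T23:00:00Z,984.9,discard
acceleration,2017-08-30T23:30:00Z,28,flight
pressure,2017-08-31T00:00:00Z,985.6,elev_1"

labels <- read.csv(text = csv, colClasses = c(label = "character"))
labels$label[is.na(labels$label)] <- ""

# Count how often each label is used in each series.
counts <- table(labels$series, labels$label, dnn = c("series", "label"))
print(counts)

# Flag labels outside the expected set (empty, flight, discard, elev_*).
valid <- labels$label %in% c("", "flight", "discard") |
  grepl("^elev_[0-9]+$", labels$label)
unexpected <- unique(labels$label[!valid])
```

An empty `unexpected` vector means that every label belongs to the set used in this chapter.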

You can expect to spend between 1 and 60 minutes per track, depending on the species’ complexity, and likely more for the first track you label. The more you label, the faster it will get!

8.4 Introduction to TRAINSET

TRAINSET is a web-based graphical tool for labelling timeseries.

New TRAINSET

We customized the original TRAINSET app for a smoother experience in labelling tags and twilight data: https://trainset.raphaelnussbaumer.com/

Please use this new version and report issues or bugs on the GitHub repository.

Start by uploading your .csv file (e.g., data/tag-label/18LX.csv) using the “Upload Tag Label” button.

Example of a labelled file on TRAINSET. First, the pressure timeseries is separated into stationary periods by the “flight” label (red). When the bird is changing altitude or flying for short periods, the pressure datapoints are labelled as “discard” (gray).

A few tips:

Keyboard shortcuts can considerably speed up navigation (zoom in/out, move left/right) and labelling (add/remove a label), e.g. using SHIFT.
Because of the large number of datapoints, keep a narrow temporal window to prevent your browser from becoming slow or unresponsive.
Change the Active Series and Reference Series depending on what you are labelling but use both timeseries at the same time to help you determine what the bird might be doing.
Adapt the y-axis range to each stationary period to see the small (but essential) pressure variations which are not visible in the full view.

8.5 Elevation period

It is common for birds to change elevation level within the same stationary period (e.g., roost vs feeding site or altitudinal movements for mountainous species). Such movements can result in drastic variations in pressure, which interfere with the ERA5 matching exercise.
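To give a sense of scale for these pressure variations, the sketch below converts a few elevation changes into pressure differences with the standard-atmosphere (ISA) barometric formula. This is a textbook approximation used here for illustration only, not the conversion used internally by GeoPressureR, and the function name and climb heights are hypothetical.

```r
# Pressure (hPa) at a given altitude (m) in the International Standard Atmosphere.
isa_pressure <- function(altitude_m, p0 = 1013.25) {
  p0 * (1 - 2.25577e-5 * altitude_m)^5.25588
}

# Pressure drop for a bird moving up from sea level by a given height.
climb_m <- c(2, 20, 100, 500)
drop_hpa <- isa_pressure(0) - isa_pressure(climb_m)
data.frame(climb_m, drop_hpa = round(drop_hpa, 2))
```

Near sea level, this corresponds to roughly 0.12 hPa per metre, so a roost located 100 m above a feeding site shifts the measured pressure by about 12 hPa.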

To circumvent this issue while preserving as much data as possible in the match, you can label pressure data with different elevation levels by using elev_x in TRAINSET. To do so, click on the + sign in the bottom right to create a new elevation level (e.g., elev_1). Assign the new label to all pressure datapoints belonging to the same elevation period.

Note 1: Conceptually, you can think of unlabelled datapoints (i.e., black dots) as elev_0.
Note 2: The elevation levels do not have to be continuous: it’s even better if the same elevation period comes back several times during the same stationary period. For instance, you can have unlabelled datapoints for the feeding site every day and labelled datapoints at the roosting location of every night as elev_1.
Note 3: You can restart the count of elev_x for each new stationary period (e.g., the datapoints from stap 1 labelled as elev_1 are not connected to the datapoints from stap 2 also labelled as elev_1).
Note 4: In general, short elevation levels (< 12-24h) might be better left out (i.e., discarded) to reduce computational/memory cost, especially if you have long elevation levels in the same stap. But see note 5.
Note 5: It’s ideal to include the highest and lowest elevations, as they can bring tremendous constraints on the elevation mask of the likelihood map.

Example of elevation period labels. The “elev_x” label allows you to identify periods when the bird changes altitude for a considerable amount of time within the same stationary period. Note that you can re-use the same elev label for two distinct stationary periods without implying the same elevation.

8.6 Labelling checks

Use the checks outlined below to evaluate and improve the quality of your labelling.

8.6.1 Initial checks

Check 1: Duration of stationary periods and flights

The first test consists in checking the duration of flights and stationary periods. This is systematically checked when computing stationary periods and a message will give you feedback on where to find potential errors:

tag <- tag_label(tag, "./data/tag-label/18LX-labeled-v1.csv")

── Short stationary periods (<6hr): ────────────────────────────────────────────

! Stap 7 (2017-08-30 23:47 - 2017-08-30 23:52) : 5m

! Stap 26 (2018-04-15 14:57 - 2018-04-15 18:47) : 3h 49m 60s

! Stap 27 (2018-04-15 19:32 - 2018-04-15 20:07) : 35m

! Stap 30 (2018-04-29 23:37 - 2018-04-29 23:42) : 5m

! Stap 32 (2018-04-30 19:22 - 2018-04-30 19:37) : 15m

! Stap 33 (2018-04-30 21:47 - 2018-04-30 21:52) : 5m

! Stap 34 (2018-04-30 23:02 - 2018-04-30 23:07) : 5m

! Stap 35 (2018-05-01 00:02 - 2018-05-01 00:07) : 5m

── Short flights (<2hr): ───────────────────────────────────────────────────────

! Flight 14 -> 15 (2017-09-11 23:32 - 2017-09-12 00:22) : 50m

! Flight 18 -> 19 (2017-09-19 23:32 - 2017-09-20 00:42) : 1h 10m

! Flight 26 -> 27 (2018-04-15 18:47 - 2018-04-15 19:32) : 45m

! Flight 27 -> 28 (2018-04-15 20:07 - 2018-04-15 21:37) : 1h 30m

! Flight 31 -> 32 (2018-04-30 18:37 - 2018-04-30 19:22) : 45m

! Flight 33 -> 34 (2018-04-30 21:52 - 2018-04-30 23:02) : 1h 10m

! Flight 34 -> 35 (2018-04-30 23:07 - 2018-05-01 00:02) : 55m

! Flight 35 -> 36 (2018-05-01 00:07 - 2018-05-01 00:47) : 40m

Here, I used the label produced by tag_label_auto() without making any edits on TRAINSET. In such cases, the most common error is that a flight is cut because the bird was inactive during a few data-points. You can correct this by cleaning up flights on TRAINSET.

tag <- tag_label(tag, "./data/tag-label/18LX-labeled-v2.csv")

── Short stationary periods (<6hr): ────────────────────────────────────────────

! Stap 25 (2018-04-15 15:02 - 2018-04-15 18:47) : 3h 45m

── Short flights (<2hr): ───────────────────────────────────────────────────────

! Flight 13 -> 14 (2017-09-11 23:32 - 2017-09-12 00:22) : 50m

! Flight 17 -> 18 (2017-09-19 23:32 - 2017-09-20 01:07) : 1h 35m
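The logic behind these messages can be sketched in base R as follows. The stationary periods are hypothetical, and the thresholds (6 hours for stationary periods, 2 hours for flights) are simply those shown in the messages above. This is an illustration of the check, not the code run by tag_label().

```r
# Hypothetical stationary periods.
stap <- data.frame(
  stap_id = 1:3,
  start = as.POSIXct(
    c("2018-04-14 02:00", "2018-04-15 15:02", "2018-04-15 21:37"),
    tz = "UTC"
  ),
  end = as.POSIXct(
    c("2018-04-15 14:17", "2018-04-15 18:47", "2018-04-29 20:00"),
    tz = "UTC"
  )
)

# Duration of each stationary period, in hours.
stap$duration_h <- as.numeric(difftime(stap$end, stap$start, units = "hours"))

# A flight runs from the end of one stap to the start of the next.
flight <- data.frame(
  from = head(stap$stap_id, -1),
  to = tail(stap$stap_id, -1),
  duration_h = as.numeric(
    difftime(tail(stap$start, -1), head(stap$end, -1), units = "hours")
  )
)

# Flag periods shorter than the thresholds.
short_stap <- stap[stap$duration_h < 6, ]
short_flight <- flight[flight$duration_h < 2, ]
```

In this example, stap 2 (3 h 45 min) and the flight from stap 1 to stap 2 (45 min) are flagged.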

Check 2: Pressure timeseries

Here we visually inspect that the pressure timeseries of each stationary period (1) is correctly grouped and (2) does not include pressure outliers (e.g., altitudinal movements). It is worth zooming in and panning on each individual stationary period to manually inspect the timeseries.

plot(tag, type = "pressure", quiet = TRUE)

How short can a stationary period be?

It is acceptable to have a few stationary periods/flights shorter than the warning threshold of checks 1 and 2. Feel free to override the default warning thresholds (e.g., warning_flight_duration).

However, having too many stationary periods leads to slow code as the number of likelihood maps and, more importantly, the size of the graph becomes excessively large. This is where the parameter include_min_duration from tag_set_map() is particularly useful: it allows you to ignore the short stationary periods and merge flights.

Note that while the MSE mismatch is usually not useful for stationary periods shorter than a few hours, a single high or low pressure datapoint can indicate a high or low altitude location which is not common on the map!

8.6.2 GeoPressureViz check

At this stage, the label file is good enough to start estimating the position of the bird. This will be super helpful to then compare the pressure measurements to the ERA5 pressure and refine the labelling. To help in this process, we’ll be using GeoPressureViz.

Before this, we need to compute the likelihood maps. It’s generally better to start from scratch to be sure everything is working fine:

tag <- tag_create("18LX",
  crop_start = "2017-06-20",
  crop_end = "2018-05-02", quiet = TRUE
) |>
  tag_label("./data/tag-label/18LX-labeled-v3.csv", quiet = TRUE) |>
  tag_set_map(
    extent = c(-16, 23, 0, 50),
    scale = 1, # Use a coarse scale at first: 1° is usually enough
    known = data.frame(
      stap_id = 1,
      known_lat = 48.9,
      known_lon = 17.05
    )
  )

tag <- geopressure_map(tag, quiet = TRUE)

geopressureviz(
  tag,
  path = tag2path(tag, interp = 1) # interpolate stap shorter than 1 day to have a smoother trajectory
)

In GeoPressureViz, your main task is to draw a coherent trajectory with the following steps:

Toggle the “Full Track” button to be in the stationary period view
“Edit position” of the stationary periods to be coherent.
“Query pressure” to retrieve the ERA5 pressure at this position
Compare the ERA5 pressure to the tag pressure in the bottom panel
Refine the label in TRAINSET accordingly: (1) Modify flight duration with flight, (2) label pressure outliers with discard, (3) use stapelev with elev_x label or even (4) merge or split stationary periods

What do you mean by “drawing” a coherent trajectory?

To make sure that the labelling is correctly performed, you should be able to manually edit the position of the trajectory so that (1) the distance between consecutive stationary periods matches the flight duration (i.e., dots falling on the circles) and (2) the pressure measurements match the ERA5 pressure at this location (i.e., high likelihood value on the map).
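The first condition can also be checked by hand. The sketch below computes the great-circle distance between two positions and compares the implied groundspeed with a plausible range. The second position, the flight duration and the speed range are hypothetical values chosen for illustration, and should be adapted to your species.

```r
# Great-circle distance (km) between two points given in decimal degrees.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Known first position from this chapter, and a hypothetical next position.
dist_km <- haversine_km(48.9, 17.05, 45.0, 14.0)

# Hypothetical flight duration and an assumed range of plausible groundspeeds.
flight_h <- 8
speed_kmh <- c(20, 80)
implied_speed <- dist_km / flight_h
coherent <- implied_speed >= speed_kmh[1] & implied_speed <= speed_kmh[2]
```

If `coherent` is FALSE, either the position of one of the stationary periods or the flight label between them probably needs to be revised.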

Note that the initial path shown on GeoPressureViz is likely not realistic as no movement model has been included (i.e., no limitation on bird flight duration). This is fine at this stage: we don’t really want to assume a realistic path, just to see what pressure can tell us without assuming anything. Using this path, we can retrieve the ERA5 pressure along it.

What to do when there is mismatch between ERA5 and tag pressure?

Mismatches between the geolocator and ERA5 are usually indicative of altitudinal movement of the bird. Depending on the situation, there are multiple ways to label this mismatch.

In the easiest case, the bird simply flew within the same stationary site (<10-50km) for a short time and came back to the same location. In this case, you can simply discard the pressure timeseries during the temporary change of altitude.
If the bird changes altitude but never comes back to the same elevation, you can either consider that the new altitude is a new stationary period and label the activity data, or you can discard the timeseries of the shorter period. It is essential that the resulting timeseries matches the ERA5 pressure everywhere. Matches are usually better for longer periods. Looking at activity data for the same period can also help understand what the bird is doing.
If the bird changes back and forth between two elevation levels, use the elev_x label to label them accordingly.

As a general guideline, it is better to remove a bit more for long stationary periods to get a better estimation of the position. You can do this iteratively by removing a bit and seeing whether the position improves as a result.

The main advantages of using GeoPressureViz are:

Only query the ERA5 pressure for stationary periods that need to be checked
Compare the ERA5 pressure for different positions
Separate the likelihood of pressure (mask and mismatch) and light as well as flight duration to see where they agree/disagree
Filter out short stationary periods to see if they are coherent, before refining the trajectory by adding the shorter ones

8.6.3 Pressurepath check

A final check should be performed on the pressurepath computed from your most likely trajectory.

We want to look at the histogram of the pressure error between geolocator and ERA5 timeseries for each stapelev.

load("./data/interim/18LX.Rdata")
plot_pressurepath(pressurepath, type = "hist", plot_plotly = FALSE)

For long stationary periods (over 5 days), you want to check that there is a single mode in your distribution. Two modes indicate that the bird is spending time at two different altitudes. This is common when birds have a day site and a night roost at different elevations. In such cases, use the elev_x label.
The red vertical dotted lines indicate +/-3 sd, which can be helpful to identify potential outliers.
Stationary periods which have an empirical sd greater than the one used (sd) are highlighted in red. The likelihood map for these stationary periods might not be correct.
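The quantities shown on these histograms can be reproduced on synthetic data with the minimal sketch below. The pressure values are simulated, the variable names are hypothetical, and the centring of both timeseries before comparison is an assumption of this example rather than a description of how pressurepath is computed.

```r
set.seed(1)

# Hypothetical hourly pressure (hPa) for one stationary period: the tag follows
# ERA5 with about 0.5 hPa of noise, plus two datapoints recorded ~80 m higher.
era5 <- 980 + 3 * sin(seq(0, 4 * pi, length.out = 120))
tag_pressure <- era5 + rnorm(120, sd = 0.5)
tag_pressure[c(40, 41)] <- tag_pressure[c(40, 41)] - 10

# Only the shape of the timeseries is compared, so each series is centred first.
error <- (tag_pressure - mean(tag_pressure)) - (era5 - mean(era5))

sd_used <- 1 # sd assumed when building the likelihood map
sd_empirical <- sd(error)
outlier <- abs(error) > 3 * sd_used

hist(error, breaks = 30, main = "Pressure error (hPa)", xlab = "tag - ERA5")
abline(v = c(-3, 3) * sd_used, col = "red", lty = 3)
```

The two datapoints recorded at a different elevation fall outside the +/-3 sd lines and inflate the empirical sd above the value used. Labelling them as discard brings the empirical sd back below it.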

How to calibrate sd?

The conversion of the mean squared error (MSE) into a likelihood performed by geopressure_map_likelihood() assumes that the pressure error is normally distributed. This has important consequences: the conversion does not perform well in the presence of large errors, which typically results in a map with a single possible pixel.
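The effect of sd can be illustrated with a plain Gaussian likelihood of n independent errors, written as a function of the MSE. This is a simplified sketch under that assumption. Any additional weighting applied by geopressure_map_likelihood() is left out, and the MSE values and function names are hypothetical.

```r
# Gaussian log-likelihood of n independent pressure errors summarised by their
# MSE. Working in log space avoids numerical underflow for long timeseries.
log_likelihood <- function(mse, n, sd) {
  -n / 2 * log(2 * pi * sd^2) - n * mse / (2 * sd^2)
}

# Hypothetical MSE (hPa^2) at four pixels for a stationary period of n datapoints.
mse <- c(best = 0.8, close = 1.0, fair = 1.5, poor = 4.0)
n <- 200

# Likelihood of each pixel relative to the best one, for a given sd.
relative <- function(sd) {
  exp(log_likelihood(mse, n, sd) - log_likelihood(min(mse), n, sd))
}
rel_sd1 <- relative(sd = 1)
rel_sd2 <- relative(sd = 2)
```

With sd = 1, even the pixel whose MSE is only slightly larger than the best one ends up with a negligible relative likelihood, so the map collapses to a single pixel. A larger sd spreads the likelihood over more pixels.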

As mentioned in Pressure map, sd should be adjusted to your own data. You can use the empirical sd value displayed on the histogram to guide you in setting the standard deviation parameter sd in geopressure_map().

Note that you can use a different sd value to account for stationary periods with high altitudinal variation (e.g., mountainous areas), while keeping a low sd value when the bird is in a low-topography area.

In this case, sd = 1 (the default value) seems adequate, though 0.8 or 0.9 might offer more precise positioning.

This check should also be repeated at the very end of the GeoPressureR workflow with the most likely trajectory and the associated pressurepath.

In addition, a last check with GeoPressureViz is highly recommended to see the difference between the likelihood maps and the marginal map. Remember that models are only as good as the data provided!