8 Labelling tracks

In this last chapter, we will tackle the challenge of labelling your tag!

Important

This chapter assumes that you are familiar with the overall process of GeoPressureR presented in the Basic tutorial, as well as the concept of pressurepath and the use of GeopressureViz.

8.1 Labelling principles

Labelling your tracks is imperative because GeoPressureR requires highly precise and well-defined pressure timeseries of a fixed/constant location both horizontally (geographical: +/- 10-50km) and vertically (altitude: +/- 2m).

The procedure involves labelling each datapoint (1) with the flight label when the bird is in active migratory flight and (2) with the discard label for pressure datapoints that should be discarded from the matching exercise. The overall objective is to create a pressure timeseries for each stationary period where the bird can be assumed to remain at the same location and elevation during the entire period.

Labelling flight defines stationary periods and flight duration. A stationary period is a period during which the bird is considered static relative to the size of the grid (~10-50km). The start and end of the stationary period is used to define the pressure timeseries to be matched. Having an accurate flight duration is critical to correctly estimate the distance traveled by the bird between two stationary periods.
Labelling discard eliminates vertical (altitudinal) movements of the bird. The algorithm matching the pressure timeseries is sensitive to small pressure variations of a few hPa, such that even altitudinal movements of a couple of meters can throw off the estimation map for short stationary periods. Since the reanalysis data to be matched is provided at a single pressure level, we must discard all data points corresponding to a different elevation.

Each species’ migration behaviour is so specific that manual editing remains the fastest option. Indeed, small changes in pressure and activity can correspond to either local movement or slow migration. Expertise on your bird’s expected migration style will be essential to correctly label your tracks. As you label, you will learn how the bird is moving (e.g. long continuous high altitude flights, short flights over multiple days, alternation between short migration flights and stopovers, etc.). Manual editing also gives a sense of the uncertainty of your labelling, which is useful to interpret your results.

8.2 With or without acceleration data

Acceleration data can significantly improve our understanding of bird activity and movement. One of its main strength is to refine short stationary periods or flights at the end of the night, when birds tend to flight low. In addition, acceleration is typically recorded at a higher temporal resolution (5min), which can refine flight duration and thus the movement model when building the trajectory.

Which timeseries should I label?

When acceleration data is available, use the flight label on the acceleration timeseries, and the discard label on the pressure timeseries.

In the absence of acceleration data, both labels are applied to the pressure timeseries.

Let’s see an example using acceleration data with 18LX.

tag <- tag_create("18LX",
  crop_start = "2017-06-20",
  crop_end = "2018-05-02"
)

✔ Read './data/raw-tag/18LX/18LX_20180725.pressure'

✔ Read './data/raw-tag/18LX/18LX_20180725.glf'

✔ Read './data/raw-tag/18LX/18LX_20180725.acceleration'

✔ Read './data/raw-tag/18LX/18LX_20180725.temperature'

Acceleration can be used to initialize the flight label automatically. tag_label_auto() first classifies low and high activity using a k-mean clustering, and then identifies and labels long periods of high activity (e.g., lasting more than 30 minutes) as flights.

tag <- tag_label_auto(tag, min_duration = 30)
plot(tag, type = "acceleration")

More classification methods are described in the PALMr manual.

8.3 The process

Labelling is an iterative process where you will need to check the validity of the pressure timeseries for a given stationary period against the reanalysis data several times. You can expect to spend 30sec to 30min per track, depending on the species’ complexity (acceleration data, number of flights, altitude of flight etc…).

The tag_label() function can be used to guide you through the entire labelling process, but we also outline each step below:

# 1. Create the csv label file `"./data/tag-label/18LX.csv"`
tag_label_write(tag)

# 2. Edit csv file on TRAINSET
# *on TRAINSET*

# 3. Export csv file `"./data/tag-label/18LX-labeled.csv"`
# *on TRAINSET*

# 4. Read exported label file
tag <- tag_label_read(tag)

# 5. Compute stationary period data.frame tag$stap
tag <- tag_label_stap(tag)

Any subsequent modification of the csv file "./data/tag-label/18LX-labeled.csv" can be directly processed (steps 4-5) using tag_label().

8.4 Introduction to TRAINSET

We suggesting using TRAINSET, a web-based graphical tool for labelling timeseries.

New TRAINSET

We customized the original TRAINSET app for a smoother experience in labelling tags and twilight data: https://trainset.raphaelnussbaumer.com/

Please use this new version and report issues or bugs on the Github repository.

Start by uploading your .csv file (e.g., data/tag-label/18LX.csv) using the “Upload Tag Label” button.

*Example of a labeled file on TRAINSET. First, the pressure timeseries is separated into stationary periods by “flight” label (red). When the bird is changing altitude or flying for short periods, the pressure datapoint are labeled as “discard” (gray)

8.5 A few tips:

Keyboard shortcuts can considerably speed up navigation (zoom in/out, move left/right) and labelling (add/remove a label)., e.g. using SHIFT.
Because of the large number of datapoints, keep a narrow temporal window to avoid your browser from becoming slow or unresponsive.
Change the Active Series and Reference Series depending on what you are labelling but use both timeseries at the same time to help you determine what the bird might be doing.
Adapt the y-axis range to each stationary period to see the small (but essential) pressure variations which are not visible in the full view.

8.6 Elevation period

It is common for birds to change elevation level within the same stationary period (e.g., roost vs feeding site or altitudinal movements for mountainous species). Such movements can result in drastic variations in pressure, which interfere with the ERA5 matching exercise.

To circumvent this issue while preserving as much data as possible in the match, you can label pressure data with different elevation levels by using elev_x in TRAINSET To do so, click on the + sign in the bottom right to create a new elevation level (e.g., elev_1). Assign the new label to all pressure datapoints belonging to the same elevation period. You can use multiple elevation periods, but for computational reasons, try to limit them to a minimum. Unlabelled datapoints can essentially be thought of as evel_0.

Note that the elevation levels do not have to be continuous: it’s even better if the same elevation period comes back several times during the same stationary period. For instance, you can have unlabelled datapoints for the feeding site every day and labelled datapoints at the roosting location of every night as elev_1.

Note that you can restart the count of elev_x for each new stationary period (e.g., the datapoints from stap 1 labelled as elev_1 are not connected to the datapoints from stap 2 also labelled as elev_1).

Example of elevation periods label. “elev_x” label allows to identify periods when the bird changes altitude for a considerable amount of time within the same stationary period . Note that you can re-use the same elev label for two distinct stationary periods without implying the same elevation.

8.7 Labelling checks

Use the checks outlined below to evaluate and improve the quality of your labelling.

8.7.1 Simple checks

Check 1: Duration of stationary periods and flights

The first test consists in checking the duration of flights and stationary periods. This is systematically checked when computing stationary periods and a message will give you feedback on where to find potential errors:

tag <- tag_label(tag, "./data/tag-label/18LX-labeled-v1.csv")

── Short stationary periods (<6hr): ────────────────────────────────────────────

! Stap 7 (2017-08-30 23:47 - 2017-08-30 23:52) : 5m

! Stap 26 (2018-04-15 14:57 - 2018-04-15 18:47) : 3h 49m 60s

! Stap 27 (2018-04-15 19:32 - 2018-04-15 20:07) : 35m

! Stap 30 (2018-04-29 23:37 - 2018-04-29 23:42) : 5m

! Stap 32 (2018-04-30 19:22 - 2018-04-30 19:37) : 15m

! Stap 33 (2018-04-30 21:47 - 2018-04-30 21:52) : 5m

! Stap 34 (2018-04-30 23:02 - 2018-04-30 23:07) : 5m

! Stap 35 (2018-05-01 00:02 - 2018-05-01 00:07) : 5m

── Short flights (<2hr): ───────────────────────────────────────────────────────

! Flight 14 -> 15 (2017-09-11 23:32 - 2017-09-12 00:22) : 50m

! Flight 18 -> 19 (2017-09-19 23:32 - 2017-09-20 00:42) : 1h 10m

! Flight 26 -> 27 (2018-04-15 18:47 - 2018-04-15 19:32) : 45m

! Flight 27 -> 28 (2018-04-15 20:07 - 2018-04-15 21:37) : 1h 30m

! Flight 31 -> 32 (2018-04-30 18:37 - 2018-04-30 19:22) : 45m

! Flight 33 -> 34 (2018-04-30 21:52 - 2018-04-30 23:02) : 1h 10m

! Flight 34 -> 35 (2018-04-30 23:07 - 2018-05-01 00:02) : 55m

! Flight 35 -> 36 (2018-05-01 00:07 - 2018-05-01 00:47) : 40m

Here, I used the label produced by tag_label_auto() without making any edits on TRAINSET. In such cases, the most common error is that a flight is cut because the bird was inactive during a few data-points. You can correct this by cleaning up flights on TRAINSET.

tag <- tag_label(tag, "./data/tag-label/18LX-labeled-v2.csv")

── Short stationary periods (<6hr): ────────────────────────────────────────────

! Stap 25 (2018-04-15 15:02 - 2018-04-15 18:47) : 3h 45m

── Short flights (<2hr): ───────────────────────────────────────────────────────

! Flight 13 -> 14 (2017-09-11 23:32 - 2017-09-12 00:22) : 50m

! Flight 17 -> 18 (2017-09-19 23:32 - 2017-09-20 01:07) : 1h 35m

Check 2: Pressure timeseries

Here we visually inspect that the pressure timeseries of each stationary period (1) is correctly grouped and (2) does not include pressure outliers (e.g., altitudinal movements). It is worth zooming-in and panning on each individual stationary period manually inspect the timeseries.

plot(tag, type = "pressure")

── Pre-processed pressure data length

✔ All stationary periods have more than 12 datapoints.

── Pressure difference

→ 2 timestamps show abnormal hourly change in pressure (i.e., >3hPa):

! 2017-09-01 10:30:00 | stap: 8 | 3.8 hPa

! 2017-10-01 17:30:00 | stap: 18 | 3 hPa

How short can a stationary period be?

It can acceptable to have a few stationary periods/flights shorter than the warning threshold of check 1 and 2. Feel free to overwrite the default warning (e.g., warning_flight_duration).

However, having too many stationary periods will lead to slow code as the number of likelihood maps and, more importantly, the size of the graph will become excessively large. This is where the parameter include_min_duration from tag_set_map() is particularly useful: it allows to ignore the short stationary periods and merge flights.

Note, that while the MSE mismatch is usually not useful for stationary period shorter than a few hours, a single high or low pressure datapoint can indicate an high or low altitude location which are not common on the map!

8.7.2 Pressurepath checks

The second set of checks are more complex and computationally costly but allows to fine-tune the final trajectory. Depending on your species and preference, they can be done in any order or in parrellel.

The general idea is to perform the following steps iteratively:

Estimate our current best guess of the trajectory (i.e. a path)
Compute the ERA5 pressure on this path (i.e., pressurepath),
Compare the ERA5 pressure to the tag pressure
Refine the label accordingly: (1) label outliers, (2) merge or split stationary periods, and (3) use the elev_x label.

First, let’s compute the light and pressure likelihood maps on a coarse map (e.g. scale = 1) and low precision of mismatch (e.g. max_sample = 50) to minimize the computational cost.

tag <- tag_create("18LX", crop_start = "2017-06-20", crop_end = "2018-05-02", quiet = TRUE) |>
  tag_label("./data/tag-label/18LX-labeled-v3.csv", quiet = TRUE) |>
  tag_set_map(
    extent = c(-16, 23, 0, 50),
    scale = 1,
    known = data.frame(
      stap_id = 1,
      known_lat = 48.9,
      known_lon = 17.05
    )
    # include_min_duration = 24 # Filtering long stap might also be useful at first
  ) |>
  geopressure_map(max_sample = 50, quiet = TRUE)

tag <- tag |>
  twilight_create() |>
  twilight_label_read() |>
  geolight_map(quiet = TRUE)

Check 3: Pressure timeseries match

From these maps, we can compute the path that goes through the stationary periods with the highest probability.

path <- tag2path(tag)

Note that this path is likely not realistic as no movement model has been included (i.e., no limitation on bird flight duration). This is fine at this stage: we don’t really want to assume a realistic path, just to see what pressure can tell us without assuming anything. Using this path, we can retrieve the ERA5 pressure along this path,

pressurepath <- pressurepath_create(tag, path = path, quiet = TRUE)

We can compare the pressure timeseries of the tag (grey) to the pressurepath. Zoom on each stationary period to get a better sense of the likely natural variation of pressure.

plot_pressurepath(pressurepath)

What is an outlier?

The conversion of the mean squared error (MSE) into a likelihood performed by geopressure_map_likelihood() assumes that the error distribution of pressure is normally distributed. This has important consequences in that it does not perform well in the presence of large errors, typically resulting in a map with a single possible pixel.

Assuming this normal distribution allows us to define more formally outliers as any value outside +/- 3 standard deviation (warning_std_thr). This standard deviation is defined by the parameter sd, with a default value of 1.

plot_pressurepath() displays outliers with an orange triangle so you can check each of them manually.

Besides outliers, you can use this figure to identify any period where there is a mismatch between the geolocator and ERA5, usually indicative of altitudinal movement of the bird. Depending on the situation, there are multiple ways of labelling this mismatch.

In the easiest case, the bird simply flew within the same stationary site (<10-50km) for a short time and came back to the same location. In this case, you can simply discard the pressure timeseries during the temporary change of altitude.
If the bird changes altitude but never comes back to the same elevation, you can either consider that the new altitude is a new stationary period and label the activity data, or you can discard the timeseries of the shorter period. It is essential that the resulting timeseries matches the ERA5 pressure everywhere. Matches are usually better for longer periods. Looking at activity data for the same period can also help understand what the bird is doing.
If the bird changes back and forth between two elevation levels, use the elev_x label to label them accordingly.

As a general guideline, it is better to remove a bit more for long stationary periods to get a better estimation of the position. You can do this iteratively by removing a bit and seeing whether the position improves as a result.

Once you’re happy with your new labels, you have to update the tag object. To avoid running geopressure_map() and pressurepath_create() for the full timeseries, use tag_upate() and pressurepath_upate() to update only the stationary periods that have changed.

tag2 <- tag_update(tag, file = "./data/tag-label/18LX-labeled-v4.csv", quiet = TRUE)
pressurepath <- pressurepath_update(pressurepath, tag, quiet = TRUE)

Check 4: Histogram of pressure error

In addition to the pressure timeseries, you can also look at the histogram of the pressure error between geolocator and ERA5 timeseries.

For long stationary periods (over 5 days), you want to check that there is a single mode in your distribution. Two modes indicate that the bird is spending time at two different altitudes. This is usual when birds have a day site and a night roost at different elevations. In such cases, use the elev_x label.
The red vertical dotted line indicates +/-3 sd which can be helpful to identify potential outliers (i.e., identical to the orange dot in the timeseries plot).
Stationary periods which have an empirical sd greater than the one used (sd) are highlighted in red. The likelihood map for these stationary periods might not be correct.

plot_pressurepath(pressurepath, type = "hist", plot_plotly = FALSE)

How to calibrate sd?

As mentioned in [pressure map], sd should be adjusted to your own data. Assuming that the position of the pressurepath is correct, you can use the empirical sd value displayed on the histogram to guide you in setting the standard deviation parameter sd in geopressure_map().

Note that you can use different a sd value to account for stationary periods with high altitudinal variation (e.g. mountainous areas), while keeping a low sd value when the bird is in a low-topography area.

In this case, an sd=1 (default value) seems adequate, though 0.8-0.9 might offer more precision positioning.

8.7.3 GeoPressureViz checks

Check 5: GeoPressureViz

The shiny app GeoPressureViz is another important tool to be used instead of or in parallel to the pressurepath checks. The main advantages of using the apps are

Only query the pressurepath for position that need to be checked with “query pressure”.
Quickly test the pressurepath of different position using “Start editing”.
Separate the likelihood of pressure and light as well as flight duration to see where they agree/disagree
Filter-out short stationary period to see if they are coherent, before refining the trajectory by adding the shorter ones.

More details on this app and how to use it can be found in the [GeopressureViz] chapter.

geopressureviz(
  tag, # required
  # path = pressurepath,
)

Can you “draw” the trajectory?

A good test before building the graph, is to see if you can manually roughly draw a path from the combination of the pressure/light maps and the flight duration.

You should be able to move the position of the trajectory such that the flight distances are realistic and positing are likely according to the likelihood maps.

Remember that birds can have a much higher groundspeed when the wind is blowing their way (up to 150km/h for instance) - such extremes are usually restricted to long flights. In such cases, the circle on GeopressureViz will look too small, but that’s ok.

8.7.4 Final checks

Check 6: Marignal and most likely path

Labelling should finally be checked at the end of the workflow using the marginal and most-likely path.

# Update tag and pressurepath
tag <- tag_update(tag, file = "./data/tag-label/18LX-labeled.csv", quiet = TRUE)

# Build graph, add wing, add movement
graph <- graph_create(tag, quiet = TRUE) |>
  graph_add_wind(tag$pressure, quiet = TRUE) |>
  graph_set_movement(bird = bird_create("Acrocephalus arundinaceus"))

# Compute most likely path
path_most_likely <- graph_most_likely(graph, quiet = TRUE)

# Compute marginal
marginal <- graph_marginal(graph, quiet = TRUE)

# Compute the corresponding pressurepath
pressurepath_most_likely <- pressurepath_create(tag, path = path_most_likely, quiet = TRUE)

The same figure can be checked again and hopefully everything matches well now! If not, it is important to troubleshoot!

plot_pressurepath(pressurepath_most_likely)

plot_pressurepath(pressurepath_most_likely, type = "hist", plot_plotly = FALSE)

A last check with geopressureviz is highly recommended to see the difference between the likelihood maps and the marginal map. Remember that models are only as good as the data provided!

geopressureviz(tag,
  path = pressurepath_most_likely,
  marginal = marginal
)