A simple map matching approach for bicycle GPS tracks

welcome-post

Click on the image to download the KML file.

Click on the image to download the KML file.

The number of location-based apps with tracking function is constantly growing. In conjunction with smart wearables, social media and a prevalent fitness boom a huge amount of digital, geospatial traces is generated.

Regarding tracks from bicycling, data from Strava internet are probably the most extensive one. Nevertheless, for several regions – especially where the number of Strava users/contributors is rather low – the data are biased towards leisure traffic. In the case of Salzburg, the route to the top of ‘Gaisberg’ internet, for instance, is significantly over-represented. No wonder – it’s one of the most popular sportive routes.
An alternative source of bicycle tracks comes from Bike Citizens’ internet routing app. This app, which is fueld by OpenStreetMap data, is primarily intended to support utilitarian bicyclists. Thus, the recorded tracks better represent the overall bicycle traffic, especially in cities. Bike Citizens use the tracks, amongst others, to produce incredible beautiful heatmaps internet. However, until now, these maps are only visual overlays and not ready for further spatial analysis.

The latter point is exactly where GIS comes in. A bunch of questions in the context of bicycle research and planning could be answered when bicycle flows in cities are (1) known and (2) related to other data layers.
Several, very well established map matching algorithms, which reference GPS tracks to a digital road network, already exist. Quddus et al. (2007 internet) provide a comprehensive overview. Most of these algorithms are mainly designed for GPS tracks from cars (as a side note: in this year’s GI-Forum internet transport session, Mario will present a kind of reverse map matching algorithm, where a detailed network graph is constructed from FCD GPS tracks). Because most of these map matching algorithms aim (of course for good reasons!) for most optimal solutions, they can become quite sophisticated.
However, for most of our questions we are rather interested in “good guesses” about collective bicycle flows. This is why we tried to develop a prototype of a simple map matching algorithm that requires as little additional data as possible and still produces reasonable results. As a test sample we used nearly 2,000 GPX trajectories, which were provided by Bike Citizens internet (thanks a lot!).

As a reference network graph we used authoritative data internet, which are available as Open Government Data (OGD). Additionally we set up a routing engine, which calculates shortest paths. The principle idea of the map matching algorithm is the following:

workflow-map-matching

  1. For performance reasons we restrict the whole analysis to the immediate surrounding of a track.
  2. Accordingly, the network is reduced to a minimum.
  3. In order to represent the areal characteristic of the road (which is abstracted to a line in common network graphs) a buffer around the road center line is created.
  4. Buffers are calculated around the GPS track’s vertices (waypoints) in order to compensate position errors.
  5. The buffered road network and the buffered vertices are overlayed.
  6. Track vertices which can be unambiguously assigned to an edge are selected. Conversely, vertices around intersections, which could be assigned to more than one edge are excluded.
  7. The selected vertices are snapped to the respective edge.
  8. In order to interpolate between the assigned vertices, they are fed into a routing engine as stops.
  9. After the route is calculated, it can be matched to the reference network.

Although this approach is naïve in some respects, it generates acceptable solutions for an estimation of collective bicycle flows (click on the image to enlarge it).

map-matching-resultWith the GPS track matched to the reference network, several analysis can be done: the number of bicyclists traversing a segment can serve as population for risk analysis, it can be used to calibrate and validate simulations (such as Wallentin & Loidl 2015 internet), the flows can be related to infrastructure data or route preferences could be derived and fed to route choice models etc.
Nevertheless, the algorithm has limitations, which should be mentioned as well. I just want to focus on two issues:

map-matching-biased-result
1. Roads and shortcuts might not be represented in the reference graph. In this case, the presented, naïve approach fails.

map-matching-false-positive2. In cases of highly distorted GPS signals (biased GPS tracks) and a dense road network, the map-matching algorithms might produce false positives. In the example on the left side, the tracks were assigned to an edge which was actually not traversed.

Both sources of errors – and there are some more – need to be considered whenever the map matched data are used for subsequent purposes.

However, for a first “good guess” of the spatial distribution of bicycle traffic in a city, the map-matching algorithm produces adequate results. In contrast to a simple visual overlay of track bundles (as it is done for Strava’s and Bike Citizens’ heatmaps), map-matched data are an enormously helpful data source for geospatial analysis. Besides potential pitfalls that arise during the map-matching, the suitability of the track data as such needs to be critically reflected. As I’ve shown, popular data sources, such as Strava, are heavily biased towards leisure traffic. In the near future a lot more data sources, similar to floating car data, might be available from utilitarian traffic (from apps such as Bike Citizens’). In order to exploit this (future) data wealth, map matching the raw data definitely is the initial step to take.

Advertisements

2 comments

    • gicycle

      Thanks for this hint. I knew your “popularity route” paper from last year’s GI-Forum, but I was not aware of the applied map matching algorithm.
      The prototypical algorithm we’ve proposed is more a topological map matching algorithm, as we don’t use any probability or cost-minimization function (the pre-selection of points that are fed as stops into the routing engine might be regarded as a simplified aquivalent).
      Thus, the computing cost is even lower. However, we haven’t systematcially validated the result. But as far as we can conclude from a visual inspection and feedback from local experts, the results seem to be reasonable. We’ll see …
      Concerning a potential comparison “mini-study”, we can discuss any option and setting. Let’s exchange ideas face-to-face at the next occasion (and whoever is interested to join us >> we can link real-world and online discussion easily!).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s