The number of location-based apps with tracking function is constantly growing. In conjunction with smart wearables, social media and a prevalent fitness boom a huge amount of digital, geospatial traces is generated.
Regarding tracks from bicycling, data from Strava are probably the most extensive one. Nevertheless, for several regions – especially where the number of Strava users/contributors is rather low – the data are biased towards leisure traffic. In the case of Salzburg, the route to the top of ‘Gaisberg’ , for instance, is significantly over-represented. No wonder – it’s one of the most popular sportive routes.
An alternative source of bicycle tracks comes from Bike Citizens’ routing app. This app, which is fueld by OpenStreetMap data, is primarily intended to support utilitarian bicyclists. Thus, the recorded tracks better represent the overall bicycle traffic, especially in cities. Bike Citizens use the tracks, amongst others, to produce incredible beautiful heatmaps . However, until now, these maps are only visual overlays and not ready for further spatial analysis.
The latter point is exactly where GIS comes in. A bunch of questions in the context of bicycle research and planning could be answered when bicycle flows in cities are (1) known and (2) related to other data layers.
Several, very well established map matching algorithms, which reference GPS tracks to a digital road network, already exist. Quddus et al. (2007 ) provide a comprehensive overview. Most of these algorithms are mainly designed for GPS tracks from cars (as a side note: in this year’s GI-Forum transport session, Mario will present a kind of reverse map matching algorithm, where a detailed network graph is constructed from FCD GPS tracks). Because most of these map matching algorithms aim (of course for good reasons!) for most optimal solutions, they can become quite sophisticated.
However, for most of our questions we are rather interested in “good guesses” about collective bicycle flows. This is why we tried to develop a prototype of a simple map matching algorithm that requires as little additional data as possible and still produces reasonable results. As a test sample we used nearly 2,000 GPX trajectories, which were provided by Bike Citizens (thanks a lot!).
As a reference network graph we used authoritative data , which are available as Open Government Data (OGD). Additionally we set up a routing engine, which calculates shortest paths. The principle idea of the map matching algorithm is the following:
- For performance reasons we restrict the whole analysis to the immediate surrounding of a track.
- Accordingly, the network is reduced to a minimum.
- In order to represent the areal characteristic of the road (which is abstracted to a line in common network graphs) a buffer around the road center line is created.
- Buffers are calculated around the GPS track’s vertices (waypoints) in order to compensate position errors.
- The buffered road network and the buffered vertices are overlayed.
- Track vertices which can be unambiguously assigned to an edge are selected. Conversely, vertices around intersections, which could be assigned to more than one edge are excluded.
- The selected vertices are snapped to the respective edge.
- In order to interpolate between the assigned vertices, they are fed into a routing engine as stops.
- After the route is calculated, it can be matched to the reference network.
Although this approach is naïve in some respects, it generates acceptable solutions for an estimation of collective bicycle flows (click on the image to enlarge it).
With the GPS track matched to the reference network, several analysis can be done: the number of bicyclists traversing a segment can serve as population for risk analysis, it can be used to calibrate and validate simulations (such as Wallentin & Loidl 2015 ), the flows can be related to infrastructure data or route preferences could be derived and fed to route choice models etc.
Nevertheless, the algorithm has limitations, which should be mentioned as well. I just want to focus on two issues:
1. Roads and shortcuts might not be represented in the reference graph. In this case, the presented, naïve approach fails.
2. In cases of highly distorted GPS signals (biased GPS tracks) and a dense road network, the map-matching algorithms might produce false positives. In the example on the left side, the tracks were assigned to an edge which was actually not traversed.
Both sources of errors – and there are some more – need to be considered whenever the map matched data are used for subsequent purposes.
However, for a first “good guess” of the spatial distribution of bicycle traffic in a city, the map-matching algorithm produces adequate results. In contrast to a simple visual overlay of track bundles (as it is done for Strava’s and Bike Citizens’ heatmaps), map-matched data are an enormously helpful data source for geospatial analysis. Besides potential pitfalls that arise during the map-matching, the suitability of the track data as such needs to be critically reflected. As I’ve shown, popular data sources, such as Strava, are heavily biased towards leisure traffic. In the near future a lot more data sources, similar to floating car data, might be available from utilitarian traffic (from apps such as Bike Citizens’). In order to exploit this (future) data wealth, map matching the raw data definitely is the initial step to take.