Our colleagues from Salzburg Research (SR) are very active in the field of floating car data generation, management and analysis. Among other things, their data feed this real-time traffic status service.
In order to establish a community of researchers, authorities and companies around the topic of floating car data, SR hosts the annual “FCD Forum” in Salzburg. This year, I had the honor to contribute to the program. Since we have been working a lot with bicycling data over the last years, I was asked to evaluate the potential of a conceptual transfer from FCD to “Floating Bicycle Data”. A very fundamental finding of my research is that the term “Floating Bicycle Data” is not yet established in the scientific literature. Thus, the term is to be regarded as a play on words derived from the forum’s agenda. However, I think it makes perfect sense to invest some effort in this context.
In my presentation, I based my argument on the facts that a) bicycle traffic is a relevant element of urban mobility, b) the modal share is likely to increase in the coming years and c) a sound evidence base is required for future investments in bicycling infrastructure.
Currently, very little is known about the spatial and temporal distribution of bicycle traffic within cities. Comparatively few permanent counting stations, sporadic counting campaigns at single locations and irregular mobility surveys do not provide sufficient and reliable data to support evidence-based policies at the local scale. On the other hand, the popularization of the “humans as sensors” concept (Goodchild 2007) has opened new possibilities to acquire data on bicyclists’ movements in urban networks. When talking about floating bicycle data, I use it as a catchy term that summarizes all kinds of geo-located movement data from bicyclists; the data do not necessarily need to be available in real time.
As I’ve shown in my presentation, there are numerous application examples where floating bicycle data would make perfect sense. However, there are several conceptual challenges that need to be considered (most of them are also relevant for floating car data):
- When floating bicycle data are harvested through crowd-sourcing applications, the data are not necessarily representative of the entire population. I referred to participation inequality or the 90-9-1 rule (see Nielsen 2006) in this context. Additionally, different apps are used for different purposes. Thus, the data might be biased, for example towards leisure trips (as is the case with Strava data in Salzburg).
- Currently, there is no common data standard and the heterogeneity of bicycle mobility data is huge. Good news in this context was published earlier this year by the European Commission (see this report from the COWI project).
- Since there is no obligation to register bicycles, the (spatial distribution of the) total population is unknown. Consequently, it is hard to estimate the total bicycle traffic volume from samples. In contrast to that, cars are registered and at least the car holders’ address is known.
- In order to further process movement data (GPS trajectories), a sound and very detailed reference graph is required for map matching. In most cases network graphs are not available at this level of detail (this holds true for authoritative data as well as for OSM). Consequently, GPS trajectories can only be matched to center lines at the moment.
Although this selection of challenges might be regarded as an obstacle to a broader engagement (I prefer to interpret them as research opportunities), I expect the topic of floating bicycle data to emerge in the coming years for a simple reason. The market for floating bicycle data is definitely smaller than for floating car data, but bicycle traffic is already a major element of urban traffic and its share will become even more substantial in the coming years. As a consequence, cities need to invest in adequate infrastructure, and these investments will hardly be made without a sound evidence base. Floating bicycle data could close a significant gap in this regard.
If you are already working with floating bicycle data (but haven’t used the term yet), have ideas on how to further push the topic or simply want to comment on the concept, please do not hesitate to contact me! I’m happy to learn from your expertise.
For those who are about to write a thesis in this or a related context, have a look at this proposal.
The number of location-based apps with a tracking function is constantly growing. In conjunction with smart wearables, social media and a prevalent fitness boom, a huge number of digital geospatial traces is being generated.
Regarding tracks from bicycling, data from Strava are probably the most extensive. Nevertheless, for several regions – especially where the number of Strava users/contributors is rather low – the data are biased towards leisure traffic. In the case of Salzburg, the route to the top of ‘Gaisberg’, for instance, is significantly over-represented. No wonder – it’s one of the most popular routes for sporty riders.
An alternative source of bicycle tracks comes from Bike Citizens’ routing app. This app, which is fueled by OpenStreetMap data, is primarily intended to support utilitarian bicyclists. Thus, the recorded tracks better represent the overall bicycle traffic, especially in cities. Bike Citizens use the tracks, among other things, to produce incredibly beautiful heatmaps. However, until now, these maps are only visual overlays and not ready for further spatial analysis.
The latter point is exactly where GIS comes in. A bunch of questions in the context of bicycle research and planning could be answered when bicycle flows in cities are (1) known and (2) related to other data layers.
Several very well established map matching algorithms, which reference GPS tracks to a digital road network, already exist. Quddus et al. (2007) provide a comprehensive overview. Most of these algorithms are mainly designed for GPS tracks from cars (as a side note: in this year’s GI-Forum transport session, Mario will present a kind of reverse map matching algorithm, where a detailed network graph is constructed from FCD GPS tracks). Because most of these map matching algorithms aim (of course for good reasons!) for optimal solutions, they can become quite sophisticated.
However, for most of our questions we are rather interested in “good guesses” about collective bicycle flows. This is why we tried to develop a prototype of a simple map matching algorithm that requires as little additional data as possible and still produces reasonable results. As a test sample we used nearly 2,000 GPX trajectories, which were provided by Bike Citizens (thanks a lot!).
As a reference network graph we used authoritative data, which are available as Open Government Data (OGD). Additionally, we set up a routing engine, which calculates shortest paths. The basic idea of the map matching algorithm is as follows:
- For performance reasons we restrict the whole analysis to the immediate surrounding of a track.
- Accordingly, the network is reduced to a minimum.
- In order to represent the areal characteristic of the road (which is abstracted to a line in common network graphs) a buffer around the road center line is created.
- Buffers are calculated around the GPS track’s vertices (waypoints) in order to compensate position errors.
- The buffered road network and the buffered vertices are overlaid.
- Track vertices which can be unambiguously assigned to an edge are selected. Conversely, vertices around intersections, which could be assigned to more than one edge, are excluded.
- The selected vertices are snapped to the respective edge.
- In order to interpolate between the assigned vertices, they are fed into a routing engine as stops.
- After the route is calculated, it can be matched to the reference network.
Although this approach is naïve in some respects, it generates acceptable solutions for an estimation of collective bicycle flows.
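The steps listed above can be sketched in a few lines of plain Python. This is only a minimal illustration under several assumptions, not the prototype we actually ran: coordinates are assumed to be planar (metres), every edge is a straight line between two nodes, the overlay of the two buffers is replaced by an equivalent distance test (a vertex buffer and a road buffer overlap when the distance is at most the sum of both radii), and a small Dijkstra function stands in for the routing engine. All function names, parameter names, default values and the toy data are made up for this example.

```python
import heapq
import math


def project(p, a, b):
    """Return (distance, snapped point) of p relative to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2
    t = max(0.0, min(1.0, t))
    snapped = (a[0] + t * dx, a[1] + t * dy)
    return math.dist(p, snapped), snapped


def clip_network(edges, nodes, track, margin):
    """Keep edges with an end node inside the track's padded bounding box."""
    xs, ys = [p[0] for p in track], [p[1] for p in track]
    x0, x1 = min(xs) - margin, max(xs) + margin
    y0, y1 = min(ys) - margin, max(ys) + margin

    def inside(n):
        return x0 <= nodes[n][0] <= x1 and y0 <= nodes[n][1] <= y1

    return {e: (u, v) for e, (u, v) in edges.items() if inside(u) or inside(v)}


def snap_unambiguous(track, edges, nodes, tol):
    """Snap vertices whose buffer overlaps exactly one road buffer."""
    stops = []
    for p in track:
        hits = []
        for e, (u, v) in edges.items():
            dist, snapped = project(p, nodes[u], nodes[v])
            if dist <= tol:  # buffers overlap iff distance <= sum of radii
                hits.append((e, snapped))
        if len(hits) == 1:  # vertices near intersections hit several edges
            stops.append(hits[0])
    return stops


def shortest_path(adj, source, target):
    """Dijkstra; returns (cost, [edge ids]), or (inf, []) if unreachable."""
    best, prev, queue = {source: 0.0}, {}, [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best[node]:
            continue
        for nxt, e, length in adj.get(node, []):
            if cost + length < best.get(nxt, math.inf):
                best[nxt] = cost + length
                prev[nxt] = (node, e)
                heapq.heappush(queue, (best[nxt], nxt))
    if target not in best:
        return math.inf, []
    path, node = [], target
    while node != source:
        node, e = prev[node]
        path.append(e)
    return best[target], path[::-1]


def match_track(track, edges, nodes, road_buffer=5.0, gps_buffer=10.0, margin=50.0):
    """Return the ordered list of edge ids a track is matched to."""
    local = clip_network(edges, nodes, track, margin)
    adj = {}
    for e, (u, v) in local.items():
        length = math.dist(nodes[u], nodes[v])
        adj.setdefault(u, []).append((v, e, length))
        adj.setdefault(v, []).append((u, e, length))
    stops = snap_unambiguous(track, local, nodes, road_buffer + gps_buffer)
    matched = [stops[0][0]] if stops else []
    for (e1, s1), (e2, s2) in zip(stops, stops[1:]):
        if e1 == e2:
            continue
        options = []  # try every combination of end nodes of the two edges
        for a in local[e1]:
            for b in local[e2]:
                cost, path = shortest_path(adj, a, b)
                total = math.dist(s1, nodes[a]) + cost + math.dist(nodes[b], s2)
                options.append((total, path))
        _, path = min(options, key=lambda o: o[0])
        for e in path + [e2]:
            if e != matched[-1]:
                matched.append(e)
    return matched


# Toy data (made up): a T-junction at B plus one far-away edge.
nodes = {"A": (0, 0), "B": (100, 0), "C": (200, 0), "D": (100, 100),
         "E": (1000, 1000), "F": (1100, 1000)}
edges = {"e1": ("A", "B"), "e2": ("B", "C"), "e3": ("B", "D"), "e4": ("E", "F")}
track = [(10, 3), (50, -2), (95, 4), (103, 50), (98, 90)]
print(match_track(track, edges, nodes))
```

In the toy example the third vertex lies close to the junction at B, falls within reach of three edges and is therefore dropped; the remaining vertices are snapped to `e1` and `e3`, and the routing step connects them. Clipping by end nodes only is a simplification that can miss long edges crossing the bounding box, which a real implementation would handle with a proper spatial index.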
With the GPS track matched to the reference network, several analyses can be done: the number of bicyclists traversing a segment can serve as the population for risk analysis, it can be used to calibrate and validate simulations (such as Wallentin & Loidl 2015), the flows can be related to infrastructure data, or route preferences can be derived and fed into route choice models, etc.
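As a rough illustration of the first use case, the matched tracks can be aggregated into per-segment counts and related to another data layer. The sketch below assumes that each matched track is simply a list of edge ids; the function names and all numbers are invented for the example, and the counts describe the sample of recorded tracks only, not the total bicycle traffic.

```python
from collections import Counter


def segment_counts(matched_tracks):
    """Count the tracks traversing each edge; a track counts once per edge."""
    counts = Counter()
    for edge_ids in matched_tracks:
        counts.update(set(edge_ids))
    return counts


def incidents_per_passage(counts, incidents):
    """Relate (hypothetical) incident numbers per edge to the observed flow."""
    return {e: n / counts[e] for e, n in incidents.items() if counts[e] > 0}


# Made-up matched tracks and incident numbers, for illustration only.
matched_tracks = [["e1", "e3"], ["e1", "e2"], ["e3", "e1", "e3"]]
counts = segment_counts(matched_tracks)
rates = incidents_per_passage(counts, {"e1": 1, "e3": 1, "e9": 2})
```

Edges without any recorded passage (here the hypothetical `e9`) are left out of the result, since no exposure is known for them.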
Nevertheless, the algorithm has limitations, which should be mentioned as well. I just want to focus on two issues:
1. Roads and shortcuts might not be represented in the reference graph. In this case, the presented, naïve approach fails.
2. In cases of highly distorted GPS signals (biased GPS tracks) and a dense road network, the map-matching algorithm might produce false positives. In the example on the left side, the tracks were assigned to an edge which was actually not traversed.
Both sources of error – and there are some more – need to be considered whenever the map-matched data are used for subsequent purposes.
However, for a first “good guess” of the spatial distribution of bicycle traffic in a city, the map-matching algorithm produces adequate results. In contrast to a simple visual overlay of track bundles (as is done for Strava’s and Bike Citizens’ heatmaps), map-matched data are an enormously helpful data source for geospatial analysis. Besides potential pitfalls that arise during map matching, the suitability of the track data as such needs to be critically assessed. As I’ve shown, popular data sources, such as Strava, are heavily biased towards leisure traffic. In the near future, a lot more data sources, similar to floating car data, might be available from utilitarian traffic (from apps such as Bike Citizens’). In order to exploit this (future) data wealth, map matching the raw data is definitely the first step to take.