OpenStreetMap is much more than a free map of the world. It’s a huge geo-database, which is still growing and improving in quality. OpenStreetMap is a great project in many respects!
But because it is a community project to which basically everyone can contribute, it has some particularities that are rather uncommon in authoritative data sets. There, data are generated according to a predefined data standard, so (in an ideal world) the data are consistent in terms of attribute structure and values. In contrast, attribute data in OpenStreetMap can exhibit a certain degree of (semantic) heterogeneity, misclassification and error. The OSM wiki helps a lot, but it is not binding.
Another particularity of OpenStreetMap is the data model. Coming from a GIS background, I was taught to represent spatial networks as a (planar) graph with edges and nodes. In the case of transportation networks, junctions are commonly represented by nodes and the segments between them as edges. OpenStreetMap is not designed this way. Without going into details, the effect of OSM’s data model is that nodes are not necessarily introduced at junctions. This doesn’t matter for mapping, but it does for network analysis, such as routing!
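To make this concrete, here is a minimal sketch (in Python, with made-up toy data, not OSM’s actual API) of how ways can be split at shared nodes in order to obtain a routable graph:

```python
# Sketch (toy data): turning OSM-style ways into routable edges.
# In OSM, a "way" is an ordered list of node IDs that may run straight
# through a junction without being split there. For routing, every way
# must be cut at nodes shared by more than one way, so junctions become
# graph vertices.
from collections import Counter

def split_ways_into_edges(ways):
    """ways: list of node-ID sequences. Returns routable edges (one node tuple per segment)."""
    # Count how often each node appears across all ways; a node used by
    # two ways is a junction.
    counts = Counter(n for way in ways for n in way)
    edges = []
    for way in ways:
        segment = [way[0]]
        for n in way[1:]:
            segment.append(n)
            # cut at junctions and at the way's end
            if counts[n] > 1 or n == way[-1]:
                edges.append(tuple(segment))
                segment = [n]
    return edges

# Two toy ways crossing at node 3: neither is split there in raw OSM.
ways = [[1, 2, 3, 4], [5, 3, 6]]
print(split_ways_into_edges(ways))
# -> [(1, 2, 3), (3, 4), (5, 3), (3, 6)]
```

The result is what a router needs: node 3 is now an explicit vertex where all four segments meet.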
In 2014 I presented and published an approach that deals with attributive heterogeneity in OSM data. Later I joined forces with Stefan Keller from the University of Applied Sciences in Rapperswil, Switzerland and presented our work at the AAG annual meeting 2015 in Chicago.
Since then Stefan and his team have lifted our initial ideas of harmonized attribute data to an entirely different level. They formalized data cleaning routines, introduced subordinate attribute categories and developed an OSM export service that generates real network graphs from OSM data. The result is just brilliant!
The service can be accessed via osmaxx.hsr.ch, where a login with an OSM account is required. Users can then choose whether to go with an existing excerpt or define an individual area of interest. In the latter case the area can be clipped on a map, and the export format (from Shapefiles to GeoPackage to SQLite DB) and spatial reference system can be chosen. The excerpt is then processed and published on a download server. At this stage I came across the only shortcoming of the service: you don’t get any information that the processing of the excerpt can take up to several hours (see here).
However, the rest of the service is just perfect. After “Hollywood has called” the processed data set can be downloaded from a web server.
The downloaded *.zip file contains three folders: data, static and symbology. The first contains the data in the chosen format. In the static folder all licence files and metadata can be found. The latter is especially valuable because it contains the entire OSMaxx schema documentation. This excellent piece of work, which is the “brain” of the service, is also available on GitHub. Those who are interested in data models and attribute structure should definitely have a look at it!
The symbology folder contains three QGIS map documents and a folder packed full of SVG map symbols. The QGIS map documents are optimized for three different scale levels and can be used for the visualization of the data. I’ve tried them with a rather small dataset (a 500 MB ESRI File Geodatabase), but QGIS (2.16.3) always crashed. However, I think there is hardly any application context where the entire content of an OSM dataset needs to be visualized at once.
Of course, OSMaxx is not the first OSM export service. But besides the ease of use and the rich functionality (export format, coordinate system and level of detail), the attribute data cleaning and clustering are real assets. With this it is easy, for example, to map all shops in a town or all roads where motorized vehicles are banned. Using the native OSM data can make such a job quite cumbersome.
I have also tried to use the data as input for network analysis. Although the original OSM road data are transformed into a network dataset (ways are split into segments at junctions), the topology (connectivity) is invalid at several locations in the network. Before the data are used for routing etc., I would recommend a thorough data validation. For the detection of topological errors in a network, see this post. Maybe a topology validation and correction routine can be implemented in a future version of OSMaxx.
In the current version the OSMaxx service is especially valuable for the design of maps that go beyond standard OSM renderings. But the pre-processed data are also suitable for all kinds of spatial analyses, as long as (network) topology doesn’t play a central role. Again, mapping and spatial analysis on the basis of OSM data were possible long before OSMaxx, but with this service it isn’t necessary to be an OSM expert, and thus I see a big potential (from mapping to teaching) for this “intelligent” export service.
This is only a quick note on a recent observation I’ve made while using bicycle routing portals on the web. Nevertheless, it illustrates the relevance of data quality and implemented model routines very nicely. And because I’ve been struggling with these issues for quite a while now and things don’t necessarily turn for the better, I’m curious about your ideas on the following examples.
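As an illustration of the kind of check I mean (my own sketch, not part of OSMaxx), counting connected components over segment endpoints quickly reveals disconnected parts of a network:

```python
# Minimal connectivity check before routing (illustrative code):
# build an undirected graph from segment endpoints and count connected
# components. More than one component means parts of the network are
# unreachable, i.e. the topology is broken somewhere.
from collections import defaultdict

def connected_components(segments):
    """segments: iterable of (node_a, node_b) pairs. Returns a list of node sets."""
    adj = defaultdict(set)
    for a, b in segments:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        components.append(comp)
    return components

# A segment that ends just short of a junction gets its own node pair:
segments = [("A", "B"), ("B", "C"), ("D", "E")]  # D-E is disconnected
print(len(connected_components(segments)))  # -> 2
```

In a valid routable network this count should be 1 (or at least each component should be explainable, e.g. an island).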
Imagine an absolutely normal situation in your daily mobility routines. You are at location A and you need to go to location B. Because you are a good guy, you choose the bicycle as your preferred mode of transport. What do you do? Of course you consult a routing service on the web, either via your desktop browser or mobile app.
But which service do you trust, which recommendations are reliable and relevant to you? Give it a try.
- For many people the big elephant Google Maps is their first choice. Whether you like it or not, Google has made a big leap forward with their bicycle routing service.
- Because you love OpenStreetMap and the GIScience group at Heidelberg University did a great job, you try the bicycle version of OpenRouteService. What you get is what you know from Google.
- If you consult another routing portal that is based on OSM data, you might get surprised. Naviki suggests the following route:
- So far we’ve tried a commercial service and two platforms which are fueled by crowd-sourced, open data. Let’s turn to authoritative data now. The goal of the federal routing service VAO is primarily the provision of a multi-modal routing service, with a focus on public transport. The bicycle version gives you this recommendation:
- The bicycle routing portal for the city and federal state of Salzburg, Radlkarte.info, is designed for the specific needs of utilitarian bicyclists. The data base is identical to the VAO service, but the result differs significantly.
The intention of this blog post is not to assess the quality (validity, reliability, relevance) of the routing recommendations as such. What I want to point to is the fact that three different services, with different data sources behind them, produce exactly the same routing recommendation, whereas services that are built upon the same data produce significantly different suggestions. That’s really mysterious. And it tells me that the data and data quality are only one side of the coin. Obviously the parametrization of the routing engine and the implemented model routines have a huge impact on the result. By the way, for all five examples I’ve used the default settings.
Following the argument of the impact of parametrization and modelling, one can conclude that it is not so much about the data (they seem to be of adequate quality in all three cases), but about how well you know the users’ specific needs and preferences and turn this knowledge into appropriate models and services. Thus the next logical step is to offer users the possibility to influence the parametrization of the routing engine, so that they get what they expect: routing recommendations that perfectly fit their preferences.
Do you know routing services on the web that allow for maximum personalization (not only pre-defined categories)? To what degree would users benefit from personalized routing? And finally, would bicyclists use it at all? Let me know what you think and share your ideas!
Returning from Brussels, I’ve been sitting in the train for 8 hours now, and because the ICE has been delayed since Stuttgart and I’m going to miss my connecting train in Munich, it will be another 3 hours* until I arrive in Salzburg. I’ve spent most of my “train time” wrestling with a research paper that I need to rework for resubmission. Now it’s time to do something else: for example, reflecting on my two days at this year’s POLIS conference.
First things first: the conference was an awesome event at a very, very cool location. The conference organization was perfect. The same holds true for the opportunities to exchange, both face to face and in the Twitter sphere. The mix of participants from city authorities, researchers and practitioners resulted in a stimulating atmosphere with lots of inspiration, information and examples to learn from.
The overall topic of the conference was “Innovation in Transport for Sustainable Cities and Regions”. However, I’d say the conference (or, to be fair, the sessions I attended) was very much about how better data could help to better understand the complex phenomenon of urban mobility and how these insights lead to better services (not only apps!) for citizens. Right from the first session on, the data topic was omnipresent: Dovile Adminaite from the ETSC pointed to the fact that risk calculations for vulnerable road users (VRU) are still hard to do because of the absence of sound exposure data. Well, this is a topic we have been working on for quite a while. And as I learned today, a recently started H2020 project, FLOW, deals with exactly this issue.
In a very insightful workshop session, chaired by George Yannis from the Technical University of Athens, the data issue was at the center again. Alexandre Santacreu from TfL nicely showed how crucial the choice of exposure variable is for the interpretation of bicycle accidents. He came to the conclusion that only the distance travelled allows for sound risk calculations; inhabitants are a useless denominator and the number of trips is tricky. Apart from the exposure variable, Alexandre elaborated on how the level of spatial aggregation determines the emerging risk patterns. My personal highlight of his presentation was the hexgrid map with disaggregated risk calculations for London – it reminded me of my own maps, which I recently presented in Hannover at the ICSC. The following presentation by Eric de Kievit (City of Amsterdam) also had a lot in common with what we have been doing for more than five years now. He presented a Safety Performance Indicator (SPI) which is used for the assessment of road networks. As Eric said, such modelling approaches are especially valuable when the data situation (accidents, exposure variables) is suboptimal. In turn – and we spent some time discussing this issue – it is hard to validate models and calculations in the absence of sound data. Véronique Feypell from the International Transport Forum finally presented the IRTAD database. Under the umbrella of the OECD data portal, safety-relevant data are collected in a standardized way and subsequently harmonized. I’m looking forward to the updated and improved data resource!
What would a conference be these days without discussing smart cities? Here it happened right in the opening plenary session. Commissioner Jyrki Katainen mentioned the special role of cities as driving forces for growth and innovation. This is exactly where Commissioner Katainen linked smart cities to smart citizens who are engaged in life-long learning (to be honest, I’ve never connected UNIGIS with smart cities, but maybe we should think about it …). After the welcome addresses a panel dealt with several aspects and connotations of smart cities. A recurring statement was that the wheel should not be reinvented multiple times and that we don’t need more technology and more research; rather, island solutions must be fused in order to generate value. Well, I clearly see the argument, but I think we need much more research! Maybe not necessarily on technology, but definitely on the social and ethical implications of the digitalization of the human sphere!
The last session of the first conference day was dedicated to data as an asset. It was opened by a brilliant contribution from Madrid. Sergio Fernandez shared EMT’s (Madrid’s PT operator) experiences with a radical open data approach. They publish all generated data as open data and are currently witnessing how these data fuel a bunch of newly developed, cool applications. The value generated by publishing data as Open (Government) Data was the take-home message of my presentation, which I gave in this session. In case you are interested, here are my slides:
The second conference day started with a fireworks display of best-practice examples at the interface of ICT and active mobility. I got especially excited by the Beat my Street project from London, which is tightly connected to the Switch Project. The idea behind the project is rather simple, but the impact is huge. What I take home as key for a successful implementation is the move from a pure public health project (although this is exactly what it is) to a participatory, integrated community project, with fun and not health as the main promotion argument.
This project from London perhaps illustrates best what became evident throughout the conference: cities and regions do have the capacity to make themselves livable places, and they are the driving forces for societal and technological transformations towards sustainability. But they need visions and an organizational and financial environment that stimulates the big leaps forward.
On a personal level, I’ve learned that several ideas we’ve been working on would perfectly correspond to past or currently running projects. Thus I can only say that I’d be more than happy if we could participate and contribute in the future. Please don’t hesitate to use the contact form, get connected on Twitter or simply have a look at our department’s website.
[Update: I’ve added my Twitter timeline as a Storify dashboard ]
* While writing this blog post my last train for today got delayed for another hour – too bad!
Last week the twin conferences AGIT and GI-Forum took place in Salzburg, Austria. Once again it was a very intensive but stimulating event with great conversations, new contacts, nice social events and of course the everlasting struggle to choose the right session from an extensive offer of attractive parallel tracks. Whereas the general tenor of the keynotes was the increasingly tight relation between GIS and IT, my personal conference focus lay on spatial modelling and analysis in the context of transportation.
Searching the web you’ll find lots of personal reviews (this one by Anita is a great example!) and social media snippets (#AGIT2015, #GIForum2015). Nevertheless, here is a list of links you might find useful:
- All GI-Forum journal papers as open access: http://hw.oeaw.ac.at/7826-2_inhalt
- AGIT papers as open access as well: http://gispoint.de/gisopen.html (search for AGIT 2015 in the conference search mask)
- Poster contributions: http://agitposter2015.blogspot.co.at/?view=flipcard (winner of the best poster award is number 57)
- Photo stream of both conferences: https://www.flickr.com/photos/uni-salzburg/sets/72157654856614220
My conference week was dominated by the impressions from the two keynotes I could attend (unfortunately I missed the other ones due to overlaps in the program) and by my involvement in a double session on transportation modelling (have a look at my recent post), the OpenStreetMap special forum and the track on Austria’s harmonized road graph, GIP.
In Tuesday’s keynote Ingo Simonis from OGC talked about the role of standards in the context of smart cities. His motivation for establishing geospatial intelligence (… and with it, standards) in enormously fast-growing urban agglomerations is the correlation between size and opportunities/challenges: “The bigger a city, the more of everything is there.” A geospatial framework of connected devices is thereby regarded as part of sustainable solutions that turn these vibrant urban hot spots into smart cities. As in nearly every presentation on smart cities, Songdo in South Korea served as role model and poster child in Ingo’s argumentation (a reference I personally find not that convincing – but this would be an entirely different discussion on liveable vs. smart cities).
What I found really intriguing were Ingo’s elaborations on the “social” aspect of standards. Until recently, standards were little more than a bone-dry subject to me. But Ingo made a very important point here: he illustrated how standards are, as he put it, the distilled wisdom of people with expertise in their respective field. In other words, standards don’t necessarily define in advance how things have to be done; rather, they are recommendations or a framework for activities that are already established. Standards are about a common understanding and language of domain knowledge and practice.
The second keynote, on Wednesday morning, came from the opposite end of the spectrum of handling large amounts of data, or rather data streams. Manfred Hauswirth gave an inspiring overview of what is currently going on in the field of linked data and of the role of GIS in the never-ending stream of data, semantic relations and interdependencies. He spoke of the internet of everything, where the most relevant thing (above all in terms of business models) is to extract useful information from data; something Manfred called a rather untapped resource. Four take-home messages made it into my notepad:
- Linking is the new paradigm in the handling of data sets/streams (is it really new? Actually this is how our human brains have worked for millennia).
- Data are increasingly dynamic. This is why the whole processing chain needs to be designed to be adaptable.
- As geonames are central to the semantic web, geospatial data and knowhow are of great importance.
- Privacy is gone. The latter point was of course not revolutionary or new. But it was the first time I heard this statement explicitly and without any dilution in a keynote at a GI conference (probably because the keynote speaker has a background in computer science) – normally we hear flowery mantras such as “GIS helps to make the earth a better place”, blablabla. Maybe the organizers of next year’s GI-Forum could invite a philosopher as keynote speaker, talking about the responsibility we have in science and IT!
As in the years before, a highlight of the German-speaking conference, the AGIT, was the OpenStreetMap special forum, organized by TraffiCon. This year I had the chance to contribute actively; it was a great honor to be invited for a presentation on the suitability of OSM and OGD data for network modelling and analysis. Here are the slides of my presentation (sorry, German language), which I think are self-explanatory and don’t need any further comments:
After speaking at the OSM special forum the day before, it was a somehow exotic experience to give another presentation in a session dedicated to authoritative road data on Thursday morning. For six years now (with several more years of preparatory projects), all administrative bodies in Austria have edited and managed their road-related data in the so-called Graphenintegrationsplattform, GIP (engl. harmonized road graph). This standard allows for nationwide applications and prevents cost-intensive data redundancies within administrations.
We’ve been working with GIP data in the context of bicycle routing for quite a while. Currently the web application http://www.radlkarte.info is based on authoritative road data. Over the last two years the quality of the GIP data has been significantly improved. But still, there are some critical issues that become evident when the data are used in an operative environment. This is why we have developed several quality control routines, covering above all topology and attributes. The latter are important for (spatial) modelling approaches, with which the data are interpreted and fitted into the specific application context. With this parallel approach – quality testing plus modelling – the reliability and robustness of the data could be significantly increased, as I demonstrated in my presentation:
Any comments and questions? I’m looking forward to reading and learning from you!
Transportation modelling is a well-established domain with dedicated experts and sophisticated software packages. Still, we thought it could be worth taking a closer look at it from an explicitly spatial perspective. This is why Gudrun and I organized a special session entitled “Spatial perspective on transportation modelling” at this year’s GI-Forum conference (http://gi-forum.org).
We had a paper session with five short presentations and an extended joint discussion, plus a workshop session. This very brief summary simply serves as a reminder of some of the major issues that were raised.
The paper session on Wednesday was a real personal highlight. Not only were the presentations inspiring, but the audience was also large and active. We had presentations from various fields, covering quite a broad range of topics (all papers are online as open access):
1) Gudrun provided insights into a first version of an agent-based bicycle flow model, demonstrating how aggregated flows emerge from the individual behaviour of numerous agents in space and time. One of the major conclusions was that while the model as such seems to generate feasible results, the validation is rather tricky since the necessary data are hardly available.
2) Christoph gave an excellent presentation on how to link the abstract model space with geographical space and the model steps with a temporal continuum. Additionally, he presented his approach to speeding up model performance when the model contains routing functionality. With an intelligent network simplification he was able to run the simulation 12 times faster than with the initial network graph.
3) Somehow connected to the preceding two presentations, Johannes gave an introduction to cognitive agents as counterparts of selfish agents, which are assumed in most routing and navigation applications. With regard to current transportation models, Johannes estimated that those models might be more accurate and thus more meaningful when “smart” agents are incorporated.
4) Leaving the field of agent-based models, Rita answered the question of what geographers can contribute to transportation modelling in a very beautiful (literally!) way. Working on the TAPAS traffic model, she emphasized the role of visualization for the validation and communication of model results. Especially the spatial context of a map helps to make sense of what the model calculates and how it actually works.
5) In the last presentation of this session, the winner of the AGEO student award, Daniel Steiner, presented parts of his master’s thesis, in which he worked with real-time data from public transit. What became very clear in this presentation was that it is hard to find PT companies that provide real-time data, and that it is even harder to use these data in models and analyses because of quality issues.
In a second session, organized in a workshop format, three topics that were raised in the presentations and the joint discussion were worked on further:
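Regarding the network simplification mentioned in point 2: I don’t know the details of Christoph’s implementation, but a common approach with a similar effect is to contract chains of pass-through (degree-2) nodes, since they add nothing to routing decisions. A sketch:

```python
# Illustrative network simplification (my own sketch, not Christoph's code):
# a node with exactly two incident edges is a pure pass-through node, so its
# two edges can be merged into one (summing the lengths). The contracted
# graph gives the same shortest-path distances with far fewer elements.
from collections import defaultdict

def contract_degree2(edges):
    """edges: dict {(a, b): length}. Returns a contracted copy."""
    edges = dict(edges)

    def degree():
        d = defaultdict(list)
        for (a, b) in edges:
            d[a].append((a, b))
            d[b].append((a, b))
        return d

    changed = True
    while changed:
        changed = False
        for node, incident in degree().items():
            if len(incident) != 2:
                continue
            e1, e2 = incident
            ends = [n for e in (e1, e2) for n in e if n != node]
            if len(set(ends)) < 2:
                continue  # skip: contraction would create a self-loop
            new = tuple(sorted(ends))
            if new in edges:
                continue  # skip: keep parallel-edge handling simple
            edges[new] = edges.pop(e1) + edges.pop(e2)
            changed = True
            break
    return edges

# A---B---C---D, with B and C as pure pass-through nodes:
print(contract_degree2({("A", "B"): 1.0, ("B", "C"): 2.0, ("C", "D"): 1.5}))
# -> {('A', 'D'): 4.5}
```

Real junctions (degree 3 or more) are left untouched, so routing decisions are preserved.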
In the very active small working groups, it quickly turned out that we as geographers do have something to contribute to the domain of transportation modelling and that there is still a lot of work to do!
In the context of data for transportation models these points were – among others – briefly discussed:
- There are lots of static data available, mostly following an established standard. Although the number of sensors is skyrocketing, sensor data are far less accessible, at least in many parts of the world. Additionally, there are numerous standards for all kinds of sensor data, which makes it cumbersome to integrate data from different sources in one and the same model. Besides measured data there are also calculated or estimated data, such as interpolations. For such data hardly any standard exists; most often these data are a kind of black box where you don’t know how they were generated.
- The latter factor directly leads to the urgent need of sound metadata for transportation data and derived products. It is of crucial importance to know under which circumstances and for what purpose data were captured. For the interpretation of derived data (e.g. flow volumes) it is necessary to know how they were calculated etc. Without providing such information the reliability of modelling results suffers enormously.
- An interesting observation was that whereas most often spatial data are used as inputs for transportation models, the models themselves are non-spatial, meaning that the relation between the model objects is abstract and not geographically defined.
- Concerning the scale and aggregation level of data a rather pragmatic rule of thumb emerged: data availability, the availability of tools, processing power and the research question decide on what data are being used.
The group working on ABM and cognitive agents drafted a rather straightforward research agenda. The group started from three distinct characteristics of agent-based models: exploration of cause-effect relations, non-intuitive phenomena at the system level, and local scale. From there, the group identified three areas of research.
- How to shift between scales and model types (top-down vs. bottom-up)?
- How does ‘smart’ behaviour of cognitive agents impact traffic flows on a broader scale?
- How can the performance issue be dealt with in a reasonable way?
The third group worked on the role of geovisualization and came up with a nicely paradigmatic (in the cartography community) conclusion: maps and geovisualizations are not only for communicating results (one way), but also serve as a capable interface for the exploration of and interaction with the data and the model. Besides, maps and map-related visualizations put transportation models into an explicit spatial context. Thus the model and the results can be related to the environment, which on the one hand can explain results and on the other hand generates new hypotheses for further investigation. At least two issues were regarded as yet unsolved:
- How to determine the appropriate trade-off between complexity (information load) and simplicity in geovisualizations?
- How to design visualization environments that are flexible and adaptable to facilitate real multi-perspective approaches?
Some of the aspects we were working on are documented on these flipcharts.
Of course there is a lot more to work on. And that’s exactly what we are going to do now. If you want to contribute or have comments on the few points raised here, just leave me a note. I’d be more than happy to learn from you and extend the group of geographers and GIS experts who strive to contribute their spatial know-how to transportation models. Such an interdisciplinary approach is, from my point of view, especially valuable where established transportation models have fallen short so far, and that is in the field of active transport.
While Open Government Data are currently a big deal in the German-speaking countries, the OpenStreetMap project celebrates its 10th anniversary. How these different data sources can be dealt with in spatial modelling approaches, and how they can even be used in combination, were the two major topics of a presentation I gave last Friday at a UNIGIS workshop in Salzburg.
Spatial modelling allows for interpreting and relating data for specific applications, without necessarily manipulating them. Neglecting this option and building applications directly on databases can produce rather weird and/or useless results. The reason for this is simple: generally, data are captured for a certain purpose. Naturally, this purpose determines the data model, the attribute structure and the data maintenance. And these determining factors might diverge from the requirements of the intended application.
In the case of OGD, the published data are made available by different public agencies. For example, the responsible department is obliged by law to monitor air quality and, if necessary, intervene efficiently. Thus different parameters are sensed for this very purpose. When these data are published as OGD, one can, for example, use them to build a “health map”. But in such a case the direct visualization of micrograms and PPMs of the sensed pollutants wouldn’t make much sense. The data need to be interpreted, aggregated, classified, related – in short, modelled – in order to fit the intended purpose of the map.
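A tiny illustration of such a modelling step (the thresholds are invented for the example, not official limit values):

```python
# Instead of showing raw PM10 micrograms on a "health map", classify the
# sensed values into categories a map reader can actually interpret.
# Thresholds below are purely illustrative.
def classify_pm10(value_ug_m3):
    """Map a raw PM10 reading (µg/m³) to a map-ready class label."""
    if value_ug_m3 < 20:
        return "good"
    elif value_ug_m3 < 40:
        return "moderate"
    else:
        return "poor"

readings = {"station_A": 12.5, "station_B": 31.0, "station_C": 55.2}
print({s: classify_pm10(v) for s, v in readings.items()})
# -> {'station_A': 'good', 'station_B': 'moderate', 'station_C': 'poor'}
```

The same principle scales up: aggregation over time, relating values to population, and so on, all happen in the model layer, not in the database.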
A similar mechanism holds true for data from the OpenStreetMap project. Originally the data were mapped for the purpose of building a free world map. Meanwhile the extent of the database has grown enormously and the data can be used for much more sophisticated applications than a “simple” world map. But again, if the data – and especially the attributes – which were originally collected for a specific purpose are being used in any other context, they have to be processed and modelled.
When applications are built not only on one dataset that was originally created for a different purpose, but on several datasets (e.g. because data availability ends at the border of an administrative unit), modelling becomes necessary anyway. As an example, I referred to our current work in the context of the web application Radlkarte.
Here it was necessary to combine authoritative data (mainly published as OGD) with crowd-sourced data. Because of the fundamental differences between these data sources – concerning the data model, attribute structure, data quality and the competence for data management – evaluation and correction routines, as well as an extensive modelling workflow, had to be implemented. But, as demonstrated in the presentation, this effort pays off significantly when the validity and plausibility of the results are examined.
Geographical information systems (GIS) are intuitive and powerful environments for the implementation of such multi-stage workflows. They allow for data storage and management in spatial databases, provide modelling interfaces and facilitate immediate analysis and visualization.
Knowing when and where how many people cycle would be valuable for many questions associated with bicycle research, planning and promotion. As I’ve already noted on several occasions, the data availability and quality is generally very poor in this context.
For motorized traffic and public transportation networks, a whole lot of different traffic models and extensive data capture systems (floating car data, sensor networks etc.) exist. There, the demand for high-quality data is enormous, as many involved parties have an interest in efficient traffic management systems.
There might be several reasons why there are – in contrast to MIT and PT – only very few cases of sound bicycle traffic models (to be exact, I don’t know of any real macroscopic bicycle traffic model for a whole urban network). On the other hand, lots of practical questions as well as research approaches depend on sound data.
Take for example the investments in bicycle infrastructure. The sums spent on the construction and maintenance of the “hardware” are not too small (the bicycle advocate of Salzburg, with its 150,000 inhabitants, has an annual budget of at least €1 million). But without exact knowledge of how many cyclists actually use this infrastructure, or could have been attracted, it’s impossible to judge whether the return on investment is positive or not.
Another example would be the analysis of bicycle accidents and the calculation of risk factors. Again, highly aggregated or roughly estimated data – as are used quite often – don’t really help.
Slides of introductory lesson for Master’s students
I confronted a colleague at our department, who is an expert in spatial simulation, with this kind of “problem statement”. As she was looking for a nice topic for her course in the Master’s programme in Applied Geoinformatics anyway, we decided to give the idea of modelling bicycle traffic a try.
Within the next months students will work on several research questions, ranging from model optimization and calibration to different scenarios. In the end we expect a first estimation of bicycle flows for every segment in the road network for different time intervals, environments and scenarios. The bicycle traffic model will be based on agent-based simulation, implemented in a NetLogo environment.
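The actual model will be built in NetLogo, but the core mechanism of deriving aggregate segment flows from individual agents can be sketched in a few lines of Python (toy network and route probabilities invented):

```python
# Minimal sketch of flow aggregation in an agent-based setting: each agent
# follows a route (here pre-assigned at random instead of computed), and the
# aggregate flow per segment is simply the count of agents traversing it.
import random
from collections import Counter

random.seed(42)

# Toy network: routes are sequences of nodes; each agent picks one route.
routes = {
    "riverside": ["A", "B", "C", "D"],
    "main_road": ["A", "E", "D"],
}

def simulate(n_agents, route_probs):
    """Assign agents to routes at random and count flow on every segment."""
    flows = Counter()
    names, probs = zip(*route_probs.items())
    for _ in range(n_agents):
        route = routes[random.choices(names, weights=probs)[0]]
        for a, b in zip(route, route[1:]):
            flows[(a, b)] += 1
    return flows

# Assume 70 % of agents prefer the riverside route.
flows = simulate(1000, {"riverside": 0.7, "main_road": 0.3})
for segment, count in sorted(flows.items()):
    print(segment, count)
```

A real model replaces the pre-assigned routes with agent decisions in space and time, but the output structure – a flow estimate per segment and time interval – is exactly what we are after.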
While I was preparing some basic literature (Bazzan & Klügl (2014), for example, is worth a read, PDF here), I realized that it might be a long road to meaningful and usable results. There are still major white spots on the map when it comes to using agent-based simulation for modelling bicycle traffic. Actually, little methodological work has been done so far and use cases are rare.
If you have any hints, know relevant papers or have applied agent-based simulation in this context, please let me know!