Creating Responsive Maps with Vector Tiles

Vector tiles have been around for a while, and they seem to combine the best of both worlds. They provide design flexibility, something we usually associate with vector data, while enabling fast delivery, as we generally see with raster services. The MVT specification, based on Google’s protobuf format, packages geographic data into predefined, roughly square “tiles” for transfer over the web.

The OGC API – Tiles standard enables sharing vector tiles while ensuring interoperability among services. It is a very simple standard, which formalizes what most applications are already doing in terms of tiling, while adding some interesting (optional) features. You can find more information at tiles.developer.ogc.org.
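At its core, the standard defines a URL template for requesting individual tiles. Here is a minimal sketch in Python, assuming a hypothetical server and collection; the path template and the MVT media type come from the standard itself.

    # Request a single vector tile from a (hypothetical) OGC API - Tiles endpoint.
    import requests

    BASE = "https://example.org/ogcapi"  # hypothetical server
    # .../tiles/{tileMatrixSetId}/{tileMatrix}/{tileRow}/{tileCol}
    url = f"{BASE}/collections/my_collection/tiles/WebMercatorQuad/12/1343/2040"
    resp = requests.get(url, headers={"Accept": "application/vnd.mapbox-vector-tile"})
    resp.raise_for_status()
    with open("tile.mvt", "wb") as f:
        f.write(resp.content)  # protobuf-encoded tile payload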

If you want to publish vector tiles using this standard, you could use pygeoapi, a Python server implementation of the OGC API suite of standards and a reference implementation of OGC API – Tiles. With its plugin architecture, pygeoapi supports many different providers to render the tiles in the backend. One option is the Elasticsearch backend (mvt-elastic), which enables rendering vector tiles on the fly from any index stored in Elasticsearch. This provider recently gained support for retrieving the properties (i.e., the fields) along with the geometry, which is needed for client-side styling.

You can check some OGC API – Tiles collections in the eMOTIONAL Cities catalogue. On this map, we show urban health outcomes (prevalence rates of cardiovascular disease) on a 350 m hexagonal grid of Inner London, rendered according to the mean value.

In the developer console, we can inspect how the attribute values of the vector tiles are exposed to the client.

Another option for interactive maps that require access to attributes would be to retrieve GeoJSON from an OGC API – Features endpoint. In that case, the client needs to load all the features at the start and then carry them in memory. With a high number of features, or many different layers, this can result in a less responsive application.
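To make the trade-off concrete, this is roughly what loading everything up front looks like; the endpoint below is hypothetical, while the /items path, the limit parameter and the rel="next" paging links are defined by the standard.

    # Download an entire collection from a (hypothetical) OGC API - Features endpoint.
    import requests

    url = "https://example.org/ogcapi/collections/my_collection/items?f=json&limit=1000"
    features = []
    while url:
        page = requests.get(url).json()
        features.extend(page["features"])
        # follow the rel="next" link until the collection is exhausted
        url = next((link["href"] for link in page.get("links", [])
                    if link.get("rel") == "next"), None)
    print(f"{len(features)} features now held in memory")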

As an experiment, we loaded a web application with a base layer and two collections of 3480 and 3517 features (“hex350_grid_cardio_1920” and “hex350_grid_pm10_2019”). When the collections were loaded as vector tiles, the application took 20 milliseconds to load; when they were loaded as features, it took 6887 milliseconds.

You can check out the code for this experiment at: https://github.com/emotional-cities/vtiles-example/tree/ecities and a map showing the vector tile layers at: https://emotional-cities.github.io/vtiles-example/demo-oat.htm

Geocoding in QGIS with OpenCage

Anyone working with geospatial data has probably encountered, at some point, the need for geocoding. Transforming an address (e.g., a place name, city, or postcode) into a pair of coordinates (e.g., a point geometry) is called forward geocoding, while transforming a pair of coordinates into an address is called reverse geocoding.

As of today, there is some support for geocoding in QGIS, using third-party geocoding APIs. A geocoding API is a service which receives an address or a pair of coordinates as input and returns a point or an address as a result. There are many commercial geocoding APIs on the market (including the well-known Google Maps API), and there is one free API (Nominatim) which relies on OSM data. There is no silver bullet when it comes to geocoding, and you should carefully evaluate the option that best suits your use case.

The table below shows different QGIS plugins which support geocoding. Some of them are focused on geocoding, while others bundle it with other functionality.

Plugin         Downloads   Last Release   Forward   Reverse   API Key   Focus on geocoding   Geocoding API
MMQGIS         157418      2021           y         y         n         n                    Google/OSM/…
GeoCoding      146960      2018           y         y         y         y                    OSM, Google
GoogleMaps     52717       2021           y         n         y         y                    Google
Maptiler       15696       2022           y         n         y         n                    Maptiler
Nominatim LF   9883        2021           y         y         n         y                    OSM
TravelTime     7460        2023           y         y         y         n                    TravelTime
TomTom         1450        2020           y         n         y         y                    TomTom
Comparison between geocoding plugins in QGIS (data from 09/01/2023)

After reviewing these plugins, it became clear that there was space for a plugin addressing the following items:

  • Bulk processing: Although on some occasions it may be useful to geocode a single address, this is rarely the case in GIS projects. Moreover, single-address geocoding can be accomplished with an online tool, or simply covered by a bulk tool. This line of thought makes a location filter less interesting than a bulk processing tool.
  • Responsive and performant: Some of the existing geocoding tools become unresponsive while handling a large number of rows. The ability to perform batch (i.e., asynchronous) geocoding can address some of these issues.
  • Forward/reverse geocoding: Forward geocoding is implemented far more often than reverse geocoding. This could be due to market demand, but also to technological reasons (e.g., reverse geocoding is not implemented in the QGIS core). Still, if it does not take too much effort, it would be nice to offer reverse geocoding to users, even if only for a few use cases.
  • Support for options: It would be nice to expose through the plugin some of the options offered by the API. These could include restricting results to a country (or bounding box) and the ability to control the output fields.
  • Help/documentation: A lot of the existing plugins have UIs which are not intuitive and do not offer any useful help or documentation. This makes using the plugins (or even finding them) very challenging. Even resources like a tutorial or a README page on GitHub, referenced from the plugin, could improve this situation.
  • Intuitive UI: One of the problems with QGIS plugins is the lack of standardisation of the UI. Some plugins add icons to the toolbar, others add entries to the plugins menu or even to other menus. Some plugins do all of these things, and instead of one widget, they add multiple widgets. This can make finding, setting up and using a plugin very complicated. One way of overcoming this is to use the processing UI, which is more or less standard. Although the menu entries can be configured, the look & feel is always the same, and the plugin can always be found through the processing toolbox.

The OpenCage Geocoding plugin is a processing plugin that offers forward and reverse geocoding within QGIS. Being a processing plugin, it benefits from many features out-of-the-box, such as batch/asynchronous processing, integration with the modeller, and the ability to run from the Python console. It also features a standard UI, with inputs, outputs, options and feedback, which should be familiar to processing users.
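Running it from the Python console could look roughly like this; note that the algorithm id and parameter names below are assumptions for illustration, so check the processing toolbox for the actual ones.

    # Hypothetical invocation of the plugin from the QGIS Python console.
    import processing

    result = processing.run(
        "opencage:forward",             # hypothetical algorithm id
        {
            "INPUT": "addresses.csv",   # layer with an address field
            "FIELD": "address",         # field holding the address strings
            "API_KEY": "your-opencage-key",
            "OUTPUT": "TEMPORARY_OUTPUT",
        },
    )
    geocoded_layer = result["OUTPUT"]   # a point layer with the matches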

This plugin relies on the OpenCage geocoding API, an API that offers worldwide geocoding based on different datasets. While OpenCage makes extensive use of Nominatim, it is worth mentioning that they contribute back to the project, both in terms of funding and of actual code.

Being a commercial API, you will need to sign up for a key before using this plugin. You can check the different plans on their website. If you choose a trial key, you can sign up without using a credit card, which is not always the case with other providers.

Although the plugin can be run with minimal configuration using the default options, the configuration parameters leverage the capabilities of the underlying API to generate results that best fit your use case. For instance, if you want to geocode addresses and you know that they are all within a given region, you can feed the algorithm a country name or even a bounding box. This bounding box can be hardcoded, but it can also be calculated from the layer extent or the canvas extent, or even drawn by hand.

Apart from the formatted address and the coordinates, the algorithm can optionally return additional structured information about the location in the results. This includes, for instance, the timezone, the flag of the country and the currency (you can read here about the different annotations that the API returns). As this may slow down the response, it is switched off by default, to ensure people only request it if they are really interested in this feature.

Whether you want to geocode addresses or coordinates, you may want the resulting address to be in a specific language. If you set the language parameter, the API will make a best effort to return results in that language.

I hope this plugin can be useful to users with different degrees of expertise, from the simplest use cases to the more advanced ones (through the options). Overall, the merits of this plugin are largely due to the capabilities of the processing toolbox and of the OpenCage API.

If you find any issues, please report them in the issue tracker of the project. This plugin is released under GPLv2. Feel free to fork it, look at the code and modify it for other use cases. If you feel like contributing back to the project, pull requests are also welcome (:

Happy geocoding!

Mapping the IVAucher

As a reaction to record-high fuel prices, the Portuguese government has updated the IVAucher program to allow each citizen to recover 10 cents per liter of fuel, up to a maximum of 5 EUR/month. This blog post is not going to discuss whether this is a good way of spending the public budget, or whether it will make a real impact on the lives of the people who manage to subscribe to the program. Instead, I want to focus on the data.

Once you subscribe to the program as a consumer, you just need to fill the tank at one of the gas stations that subscribed to the program as businesses. The IVAucher website publishes a list of subscribed stations, which seems to be updated from time to time. The list is published as a PDF, with 2746 records, ordered by “distrito” and “concelho” administrative units.

When I looked for the stations around me, in the “concelho” of Lisbon, I found 67 records. In order to know where to go, I would literally need to go through each one and check whether I knew the address or the name of the station. Lisbon is a big city, and I admit that there are lots of street names I don’t know – and I don’t need to, because this is why we have maps. My first thought was that this data belonged on a map, and my second thought was that the data should be published in such a way that other people could create maps with it – and this is how this project was born.

In the five-star deployment scheme for Open Data, PDF is at the very bottom, and it is easy to understand why. There is only so much you can do with a format which is largely unstructured.

In order to be able to process these data, I had to transform them into a structured format, preferably a non-proprietary one, so I chose CSV (3 stars). This was achieved using a combination of command-line processing tools (e.g., pdftotext, sed and grep).

The next step was to publish these data following the FAIR principles, so that they are Findable, Accessible, Interoperable and Reusable. In order to do that, I chose the OGC API – Features standard, which allows publishing vector geospatial data on the web. This standard defines a RESTful API with JSON encodings, which fits the expectations of modern web applications. I used a Python implementation of OGC API – Features, called pygeoapi.

Before getting the data into pygeoapi, I had to georeference it. For the forward geocoding, I used the OpenCage API, and more specifically its Python client, which is one of the many supported SDKs. After tweaking the parameters, the results were quite good, and I was even able to georeference some incomplete addresses, something that was not possible using the Nominatim OSM API.
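A minimal sketch of this kind of call with the OpenCage Python SDK; the address and the tuning parameters below are illustrative, not the exact values used in the project.

    # Forward-geocode an address with the OpenCage Python SDK.
    from opencage.geocoder import OpenCageGeocode

    geocoder = OpenCageGeocode("your-opencage-key")
    results = geocoder.geocode(
        "Avenida da Liberdade 1, Lisboa",  # example address
        countrycode="pt",    # restrict matches to Portugal
        no_annotations=1,    # skip extra metadata for a faster response
        limit=1,
    )
    if results:
        lat = results[0]["geometry"]["lat"]
        lng = results[0]["geometry"]["lng"]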

The next step was to get the data into a format which supports geometry. The CSV was transformed into GeoJSON using GDAL/ogr2ogr. I could have published the GeoJSON directly in pygeoapi, but indexing the data in a database adds support for more functionality, so I decided to store it in a MongoDB NoSQL data store. Everything was virtualized in Docker containers, and orchestrated using this docker-compose file.
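For reference, the conversion can be done with a single ogr2ogr invocation; here is a sketch (wrapped in Python’s subprocess), where the file and column names are assumptions.

    # Convert a CSV with coordinate columns into GeoJSON via ogr2ogr.
    import subprocess

    subprocess.run([
        "ogr2ogr", "-f", "GeoJSON",
        "gas_stations.geojson", "gas_stations.csv",   # assumed file names
        "-oo", "X_POSSIBLE_NAMES=longitude",          # assumed column names
        "-oo", "Y_POSSIBLE_NAMES=latitude",
        "-a_srs", "EPSG:4326",
    ], check=True)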

The application was deployed on AWS, and the collection is available at this endpoint:

https://features.byteroad.net/collections/gas_stations

This means that anyone is able to consume these data and create their own maps, whether they are using QGIS, ArcGIS, JavaScript, Python, etc. All they need is an application which implements the OGC API – Features standard.
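For instance, a few lines of Python are enough to pull the stations as GeoJSON; the /items path and the query parameters below follow the OGC API – Features standard.

    import requests

    url = "https://features.byteroad.net/collections/gas_stations/items"
    data = requests.get(url, params={"f": "json", "limit": 100}).json()
    print(len(data["features"]), "stations in the first page")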

I also created a map, using React.js and the Leaflet library. Although Leaflet does not support OGC API Features natively, I was able to fetch the data as GeoJSON, by following this approach.

The resulting application is available here:

https://ivaucher.byteroad.net

Now you can navigate through the map until you find your area of interest, or even type an address in the search box to let the map fly to that location.

Hopefully, this application will make the user experience of the IVAucher program a bit easier, but it also demonstrates the importance of using standards in order to leverage geospatial information. Making data available on the web is good, but it is time to move a step forward and question “how” we are making the data available, in order to ensure that its full potential is unlocked.

DevRel – What is that?

Almost a year ago, I heard the term DevRel for the first time when Sara Safavi, from Planet, gave a talk at CodeOp and used that word to describe her new role. I knew Sara as a developer, like myself, so I was curious to learn what this role entailed and understand how it could attract someone with a strong technical background.

It turns out that DevRel – Developer Relations – is as close as you can be to the developer world without actually writing code. All those things that I used to do in my spare time, like participating in hackathons, writing blog posts, joining conversations on Twitter and speaking at events, are now the core part of my job. I did them because they are fun, and also because I believe that, ultimately, writing code has an impact on society, and in order to run that last mile we need to get out of our compilers and reach out to the world. Technology is like a piece of art – it only fulfills its mission when it leaves the artist’s basement and reaches the museums, or at least the living room of someone who appreciates it.

I am happy to say that I am now the DevRel at the Open Geospatial Consortium. In a way, it is a bit ironic that I ended up taking this role in an organization that does not actually produce software as its main outcome. But in a way, OGC is the ultimate software facilitator, producing the standards that developers use to build their interoperable, geospatial-aware products and services. If you are reading this and you are not a geogeek, you may think of the W3C as a somewhat similar organization: it produces the HTML specification, which is not itself software, but how could we build all these frontend applications using React, Vue and so many other frameworks without HTML? It is that important. Now you may be thinking, “so tell me an OGC standard that I use, or at least know”, and, again, if you are not a geogeek, maybe you won’t know any of the standards I could mention – even if you use, or have used at some point, location data. And this is part of the reason why I am at OGC.

Location data is increasingly part of the mainstream. We all carry devices in our pockets that produce georeferenced data with an accuracy that was undreamed of ten years ago. Getting hold of these data opens a world of possibilities for data scientists and data engineers, but in order for all these applications to understand each other, we need sound, well-articulated standards in place. My main goal as DevRel at OGC will be to bring the OGC standards closer to the developer community, by making them easier to use, and by making sure that they are actually used. And maybe, just maybe, I will also get to write some code along the way.

Interactive Maps within React.js

Recently, I have been teaching a Full-stack development bootcamp at CodeOp (great experience!).

When the students reached the project phase, I was very pleased to see a lot of interest in using maps. And that is easy to understand, right? Geospatial information is associated with most activities these days (e.g., travel, home exchange, volunteering), and interactive maps are the backbone of any application which uses geospatial information.

This made me think of a nice way of introducing the students to interactive mapping. I realized that most of them wanted to do one thing: read an address and display it on the map, which also requires the use of a geocoder. In order to demonstrate how to put all these things together within a React application, which is the framework they are using, I created a small demo on GitHub. This was also an opportunity to practice and improve my front-end skills! 🙂

Following a good GitHub tradition, I started by forking an existing project which I thought was similar to what I wanted to achieve. Although that project is extremely cool, I realized that I wanted to move in quite a different direction, so I ended up diverging a lot from the original code base.

To implement the map, I used my favourite library for interactive maps, Leaflet. This library is actually packaged as a React component, so it is really easy to incorporate it into an application.

Of course, maps only understand coordinates, and most of the time people have nominal locations such as street names, cities, or even postcodes. This was also the case with my students. Translating address strings into a pair of coordinates is not a trivial task, so the best thing is to leave it to the experts. I used the OpenCage geocoder, an API to convert coordinates to and from places. Why? It has a much more generous free tier than the Google Maps API, and it is open-source. And although it is built on top of OSM Nominatim, it contains several improvements.

The good news is that OpenCage also has a package for JavaScript and Node, and it is really easy to use. This is the piece of code to retrieve the coordinates from a given string:

    // Adds a marker to the map and flies to it with an animation
    addLocation = () => {
      opencage
        .geocode({ q: this.state.input, key: OCD_API_KEY })
        .then((data) => {
          // Found at least one result
          if (data.results.length > 0) {
            console.log('Found: ' + data.results[0].formatted);
            const latlng = data.results[0].geometry;
            const { markers } = this.state;
            markers.push(latlng);
            console.log(latlng);
            this.setState({ markers });
            // Grab the underlying Leaflet map and fly to the result
            const mapInst = this.refs.map.leafletElement;
            mapInst.flyTo(latlng, 12);
          } else {
            alert('No results found!!');
          }
        })
        .catch((error) => {
          console.log('error', error.message);
        });
    };

In order to do this, you need to sign up for a free API key first, and store it in a secrets file (.env).

The application allows the user to type any address; the map will fly to it with an animation and add a marker.

You can check out the final result at https://leaflet-react.herokuapp.com/


Data Analytics Bootcamp

I have always dreamed about making a contribution towards improving the gender balance in technology, which, as you may know, is far from ideal.

Fortunately, the opportunity arose when Katrina Walker invited me to teach the “Data Analytics” bootcamp at CodeOp, an international code school for women and TGNC individuals.

Over the 6-month course, I will share my hands-on experience with the various stages of the data analysis pipeline, specifically how to apply various technologies to ingest, model and visualize data insights.

Rather than focusing on a specific technology, I will take a “best tool for the job” approach, which is what I do when I want to analyse data. This means learning different tools, such as Python, R, SQL or QGIS, and often combining them.

For me, “data analytics” is like a journey: we start with a high-level problem, translate it into data and algorithms, and finally extract a high-level idea. At the start and the end of the journey, we should always be able to communicate with people who are not “data geeks”, and this is one idea that I would like to pass on in the course.

I will not add anything else, apart from the fact that I am really excited to get started!


Modular Architectures Made Easier with docker-compose

The Open GeoPortal is a Free and Open Source framework for rapidly discovering, previewing and retrieving curated geospatial data from multiple repositories. It implements a modular architecture, including a database, a search engine and several web applications.


It can be argued that such a system is difficult to set up and run. While collaborating with Tufts University, I had the opportunity to dockerize some of these applications and articulate them together in a Docker composition.


The final result? The entire framework can be launched within a couple of minutes, with one single command: docker-compose up

If you don’t believe it, check the video below! 😉

The Data Ingest API from Joana Simoes on Vimeo.

If you want to try it yourself: git clone https://github.com/OpenGeoportal/Data-Ingest.git. The Docker composition lives inside the docker folder.

Have fun with docker-compose! 🙂

Spatial Data Mining

Social media streams may generate massive clouds of geolocated points, but how can we extract useful information from these, sometimes huge, datasets? I think machine learning and GIS can be helpful here.

My PechaKucha talk at DataBeers: “Visualizing Geolocated Tweets: a Spatial Data Mining Approach”.

Cluster Explorer Demo

Cluster Explorer is a piece of software I was working on, which blends machine learning and GIS to produce a cluster visualization on top of a virtual globe.
The virtual globe uses NASA WorldWind, a Java framework for 3D geo-visualization based on OpenGL, while the clustering algorithms use the ELKI data mining framework for unsupervised learning.
The tool allows the user to cluster a bunch of points (for instance, geolocated tweets) using one of two algorithms (or both), and to explore the contextual information provided by the geographic layers (e.g., OSM, Bing).

CSV 2 GeoJSON

Recently I had another challenge, which I believe has the characteristics of a common problem: I have a table with attributes in CSV format, part of which is geospatial.

CSV is a structured format for storing tabular data (text and numbers), where each row corresponds to a record and each field is separated by a known character (generally a comma). It is probably one of the most common formats for distributing data, perhaps because it is a standard output from relational databases.

Since people often hand me data in this format, and for a number of reasons it is more convenient for me to use JSON data, I thought it would be handy to have a method for translating CSV into JSON, and this was the first milestone of this challenge.

The second part of the challenge is that there is some geospatial information within these data, serialized in a non-standard format, and I would like to convert it into the standard JSON format for spatial data, i.e., GeoJSON. So the second milestone actually has two parts:

  • parse a GeoJSON geometry from the CSV fields
  • pack the geometry and the properties into a GeoJSON feature

To convert CSV (or XML) to JSON, I found this really nice website. It lets you upload a file and save the results into another file, so I could transform this:

TMC,ROADNUMBER,DIR,PROV,CCAA,StartLatitude,StartLongitude,EndLatitude,EndLongitude
E17+02412,A-2,E-90/AP-2/BARCELONA-ZARAGOZA (SOSES),LLEIDA,CATALUNYA,41.5368273,0.4387071,41.5388396,0.4638462

into this:

{
  "TMC": "E17+02412",
  "ROADNUMBER": "A-2",
  "DIR": "E-90/AP-2/BARCELONA-ZARAGOZA (SOSES)",
  "PROV": "LLEIDA",
  "CCAA": "CATALUNYA",
  "StartLatitude": "41.5368273",
  "StartLongitude": "0.4387071",
  "EndLatitude": "41.5388396",
  "EndLongitude": "0.4638462"
}

This gave me a nicely formatted JSON output (the first milestone!), but as you can see, the geometry does not conform to any OGC standard. It is actually a linestring, defined by a start point (StartLongitude, StartLatitude) and an end point (EndLongitude, EndLatitude).
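Incidentally, if you prefer to avoid an online converter, the same CSV-to-JSON step can be scripted with the Python standard library; a minimal sketch, where the input file name is an assumption:

    # Convert the CSV records into an array of JSON objects.
    import csv
    import json

    with open("tramos.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    with open("tramos.json", "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)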

According to the GeoJSON spec, a linestring is defined by an array of coordinate pairs.

So the goal would be to transform the geometry above into:

{
  "type": "LineString",
  "coordinates": [
    [0.4387071, 41.5368273], [0.4638462, 41.5388396]
  ]
}

Once more, jq comes in really handy for this task.

The JSON can be transformed into an array of features using this syntax:

cat tramos.json | jq -c '[.[] | { type: "Feature", "geometry": {"type": "LineString","coordinates": [ [.StartLongitude, .StartLatitude| tonumber], [ .EndLongitude, .EndLatitude | tonumber] ] }, properties: {tmc: .TMC, roadnumber: .ROADNUMBER, dir: .DIR, prov: .PROV, ccaa: .CCAA}}]' > tramos.geojson

Since the JSON converter parses all the values into strings, it is important to apply a filter (tonumber) to make sure that the coordinates are converted back into numbers. Each resulting feature looks like this:

{
  "properties": {
    "ccaa": "CATALUNYA",
    "prov": "LLEIDA",
    "dir": "N-IIA/SOSES/TORRES DE SEGRE/ALCARRàS",
    "roadnumber": "A-2",
    "tmc": "E17+02413"
  },
  "geometry": {
    "coordinates": [
      [0.4714937, 41.5420936],
      [0.4891472, 41.5497014]
    ],
    "type": "LineString"
  },
  "type": "Feature"
}

Since we are creating an array of features (a “FeatureCollection”), to conform with GeoJSON it is important to declare the root element too, by wrapping everything in this outer element:

{ "type": "FeatureCollection","features": [ ]}
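This last step can also be scripted; a minimal sketch in Python, assuming the array of features produced by the jq command above was saved as tramos.geojson:

    # Wrap the array of features in a FeatureCollection root element.
    import json

    with open("tramos.geojson", encoding="utf-8") as f:
        features = json.load(f)

    collection = {"type": "FeatureCollection", "features": features}

    with open("tramos_fc.geojson", "w", encoding="utf-8") as f:
        json.dump(collection, f, ensure_ascii=False)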

The result should be a valid GeoJSON file, which you can view and manipulate in your favourite GIS (for instance, QGIS!) 🙂
