Mapping the IVAucher

Featured

As a reaction to the record high of fuel prices, the Portuguese government has updated the IVAucher program, to allow each citizen to recover 10 cents per each liter of fuel spent, up to a maximum of 5 EUR/month. This blog post is not going to discuss whether this is good way of spending the public budget, or if it is going to make a real impact in the lives of the people that manage to subscribe to this program. Instead, I want to focus on data.

Once you subscribe to the program as a consumer, you just need to fill the tank in one of the gas stations that subscribed the program, as businesses. The IVAucher website publishes a list of subscribed stations, which seems to be updated, from time to time. The list is published as a PDF, with 2746 records, ordered by “districto” and “concelho” administrative units.

When I look for the stations around me, in the “concelho” of Lisbon, I found 67 records. In order to know where to go, I would literally need to go through each and check if I know the address or the name of the station. Lisbon is a big city, and I admit that there are lots of street names that I don’t know – and I don’t need to, because this is “why” we have maps. My first though was that this data belonged in a map, and my second though was that the data should be published in such a way that it would enable other people to create maps – and this is how this project was born.

In the five-star deployment scheme for Open Data, PDF is at the very bottom, and it is easy to understand why. There is so much you can do with a format, which is largely unstructured.

In order to be able to process these data, I had to transform it into a structured format, preferentially non proprietary, so I chosen CSV (3 stars). This was achieved using a combination of command-line processing tools (e.g.: pdftotext, sed and grep).

The next step was to publish these data, following the FAIR principles, so that it is Findable, Accessible, Interoperable and Reusable. In order to do that, I have chosen the OGC API Features standard, which allows to publish vector geospatial data on the web. This standard defines a RESTfull API with JSON encodings, which fits the expectations of modern web applications. I used a Python implementation of OGC API Features, called pygeoapi.

Before getting the data into pygeoapi, I had to georeference it. In order to do forward geocoding, I used the OpenCage API, and more specifically a Python client, which is one of the many supported SDKs. After tweaking the parameters, the results were quite good, and I was even able to georeference some incomplete addresses, something that was not possible using the Nominatum OSM API.

The next thing was to get the data into a format which supports geometry. The CSV was transformed into a GeoJSON using GDAL/ogr2ogr. I could have published it as a GeoJSON int pygeoapi, but indexing into a database adds support to more functionality, so I decided to store it in a MongoDB NoSQL data store. Everything was virtualized into docker containers, and orchestrated using this docker-compose file.

The application was deployed in AWS and the collection is available at this endpoint:

https://features.byteroad.net/collections/gas_stations

This means that anyone is able to consume this data and create their own maps, whether they are using QGIS, ArcGIS, JavaScript, Python, etc. All they need is an application which implements the OGC API Features standard.

I also created a map, using React.js and the Leaflet library. Although Leaflet does not support OGC API Features natively, I was able to fetch the data as GeoJSON, by following this approach.

The resulting application is available here:

https://ivaucher.byteroad.net

Now you can navigate through the map until you find you area of interest, or even type an address in the search box, to let the map fly to that location.

Hopefully, this application will make the user experience of the IVAucher program a bit easier, but it will also demonstrate the importance of using standards in order to leverage the use of geospatial information. Making data available on the web is good, but it is time that we move a step forward and question “how” we are making the data available, in order to ensure that its full potential is unlocked.

Data Analytics Bootcamp

I have always dreamed about doing some contribution towards improving the gender balance in technology, which as you may know, is far from ideal.

Fortunately the opportunity arose, when Katrina Walker has invited me to teach the “Data Analytics”  bootcamp at CodeOp, an international code school for women and TGNC individuals.

Over the 6-month course, I will share my hands-on experience with the various stages of the data analysis pipeline, specifically on how to apply various technologies to ingest, model and visualize data insights.

Rather than focusing on a specific technology, I will leverage on the “best tool for the job, approach”, which is what I do when I want to analyse data. This means learning different tools, such as Python, R, SQL or QGIS, and often combine them together.

For me “data analytics” is like a journey, where we start with a high-level problem, translate it into data and algorithms, and finally extract a high-level idea. At the start and the end of journey, we should always be able to communicate with people that are not “data geeks” and this is one idea that I would like to pass in the course.

I will not add anything else, apart that I am really excited to get started!

codeops2

Docker for Programmers

In some ways, docker can be seen as the holy grail of DevOps: develop locally, ship everywhere.

cports_800.png

Although it is still a relatively recent technology, docker’s adoption curve has been so steep that it has become almost a standard-de-facto in the software industry, for shipping software applications.

docker_use.png

Companies such as CloudBees or Elastic, and Free and Open Source projects such as PostgreSQL or Debian, all make their applications available through the official repositories of docker hub, the largest public container repository, where you can find anything from a text parser to an operating system.

Are people really using docker in production? The answer is “yes”, and perhaps the best use case is Spotify, who is not only using it, but also contributing to its usage, by making available their client Java libraries.

As an earlier adopter, I consider myself as an enthusiast, although I already had some “oops” moments which made me question if I want to be always riding on the “crest of the wave” (specially on production). Overall, I think it is a fascinating technology and I would recommend every programmer to at least know it, and apply it even if just for the simplest use cases: quickly try a software application without “polluting” your local environment, and test your software in a “clean” environment which mimics the customer’s settings. A more serious use of docker could be facilitating a continuous deployment and testing pipeline, in a cloud platform.

I recently took the challenge of Kato global to start teaching a series of docker courses, specially aimed at programmers. The first course will be an introduction, and thus it will not require any prior knowledge of docker, and subsequent courses will build on this knowledge to take students one step further. The idea is to share my first-hand knowledge of using docker in production, by doing “hands-on” courses, for people working in the software industry, with real life challenges. The first course is schedule for September, in Lisbon.

BrainGym: Docker for Programmers Class 01

Monday, Sep 17, 2018, 7:00 PM

LED’s AND CHIPS – MILL
Calçada do Moinho de Vento, 14B, 1150-236 Lisbon, Portugal Lisbon, PT

6 KATOnians Attending

Docker has the power to turn infrastructure into code, and to turn developers into devops. This course is designed to teach developers how to take advantage of one of the most revolutionary technologies in recent years. Book your space here: https://www.eventbrite.com/e/braingym-docker-for-programmers-2-day-course-tickets-48117883886 This is an 8 h…

Check out this Meetup →

https://www.eventbrite.com/e/braingym-docker-for-programmers-2-day-course-tickets-48117883886

If you are a developer, don’t miss this opportunity to extend your skills set as a DevOps, and find in which ways docker could make your life easier.

docker-course

Hope to meet you in September!

Modular Architectures Made Easier with docker-compose

The Open GeoPortal is a Free and Open Source framework for rapidly discovering, previewing and retrieving curated geospatial data from multiple repositories. It implements a modular architecture, including a database, a search engine and several web applications.

ogp_architecture2

While it can be argued that it is difficult to setup and run such a system, while collaborating with Tufts University, I had the opportunity to dockerize some of these applications and articulate them together in a docker composition.

docker-compose

The final result? the entire framework can be launched within a couple of minutes, with one single command: docker-compose up

If you don’t believe it, check the video bellow! 😉

The Data Ingest API from Joana Simoes on Vimeo.

If you want to try it yourself: git clone https://github.com/OpenGeoportal/Data-Ingest.git. The docker composition lives inside the docker folder.

Have fun with docker-compose! 🙂