Docker for Programmers

In some ways, docker can be seen as the holy grail of DevOps: develop locally, ship everywhere.


Although it is still a relatively recent technology, docker’s adoption curve has been so steep that it has become almost a de facto standard in the software industry for shipping software applications.


Companies such as CloudBees or Elastic, and Free and Open Source projects such as PostgreSQL or Debian, all make their applications available through the official repositories of docker hub, the largest public container repository, where you can find anything from a text parser to an operating system.

Are people really using docker in production? The answer is “yes”, and perhaps the best-known example is Spotify, which is not only using it, but also contributing back to the ecosystem by releasing its Java client libraries.

As an early adopter, I consider myself an enthusiast, although I have already had some “oops” moments which made me question whether I want to always be riding the “crest of the wave” (especially in production). Overall, I think it is a fascinating technology and I would recommend that every programmer at least get to know it, and apply it even if just for the simplest use cases: quickly trying a software application without “polluting” your local environment, and testing your software in a “clean” environment which mimics the customer’s settings. A more serious use of docker could be facilitating a continuous deployment and testing pipeline on a cloud platform.
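
As a quick illustration of those simple use cases, a throwaway container can be launched and discarded without leaving a trace on your machine (the images below are just examples from docker hub):

docker run --rm -it python:3.6 python    # try out a Python version without installing it
docker run --rm -d -p 5432:5432 postgres:10    # spin up a disposable PostgreSQL server

The --rm flag removes the container as soon as it stops, which keeps your local environment clean.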

I recently took up the challenge from Kato global to start teaching a series of docker courses, especially aimed at programmers. The first course will be an introduction, and thus will not require any prior knowledge of docker; subsequent courses will build on this knowledge to take students one step further. The idea is to share my first-hand knowledge of using docker in production, through “hands-on” courses for people working in the software industry, with real-life challenges. The first course is scheduled for September, in Lisbon.

BrainGym: Docker for Programmers Class 01

Monday, Sep 17, 2018, 7:00 PM

LED’s AND CHIPS – MILL
Calçada do Moinho de Vento, 14B, 1150-236 Lisbon, Portugal

Docker has the power to turn infrastructure into code, and to turn developers into devops. This course is designed to teach developers how to take advantage of one of the most revolutionary technologies in recent years. Book your space here: https://www.eventbrite.com/e/braingym-docker-for-programmers-2-day-course-tickets-48117883886

If you are a developer, don’t miss this opportunity to extend your skill set towards DevOps, and to find out in which ways docker could make your life easier.


Hope to meet you in September!


Modular Architectures Made Easier with docker-compose

The Open GeoPortal is a Free and Open Source framework for rapidly discovering, previewing and retrieving curated geospatial data from multiple repositories. It implements a modular architecture, including a database, a search engine and several web applications.

[Figure: Open GeoPortal modular architecture]

It can be argued that such a system is difficult to set up and run. While collaborating with Tufts University, I had the opportunity to dockerize some of these applications and articulate them together in a docker composition.


The final result? The entire framework can be launched within a couple of minutes, with one single command: docker-compose up

If you don’t believe it, check the video below! 😉

[Video: The Data Ingest API, from Joana Simoes on Vimeo]

If you want to try it yourself: git clone https://github.com/OpenGeoportal/Data-Ingest.git. The docker composition lives inside the docker folder.
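
A minimal sketch of the whole procedure, assuming docker and docker-compose are already installed:

git clone https://github.com/OpenGeoportal/Data-Ingest.git
cd Data-Ingest/docker
docker-compose up    # add -d to run the services in the background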

Have fun with docker-compose! 🙂

Women in Tech: Learn How to Code

If you ask me what sort of women become coders, I would say: any.


It is a fact that, despite recent efforts, women are still underrepresented in IT. Although I think that changing this requires a focus on early education, it is also true that many women can discover the joys of programming at a later stage of their lives, and not necessarily connected to their main activity. Programming can be, or at least start as, a hobby, or an extension of another activity. For instance, biologists may find that they want to learn how to code in order to crunch observation data, and makers may want to program their hardware devices in order to schedule a process. Whatever the reason that brings people into programming, it is important to say that it is not out of reach for people of any age or academic background.

Although there are no miracles, openness, curiosity and effort can pave the way to great progress. And the most important thing is that the journey itself can be fun.

In this context, a little push at the beginning can save a lot of time and effort. It goes without saying that programming is also a craft, and therefore it requires a lot of self-learning. However, getting the basic principles right from the beginning is likely to put people on the right track: a more pleasant, fruitful and, especially, quicker path.

Starting in September, I will be teaching an introductory programming course, especially aimed at women (although everyone is, obviously, welcome). The course is designed to guide students through the initial steps of programming, from logical operations to object-oriented concepts. Although Python will be used as the main programming language, I would like to think of this as a general programming course, which introduces the foundations for learning and using any object-oriented programming language, rather than as a specific Python course.

I know from first-hand experience that it is sometimes a bit intimidating to be the only woman in the class, and this can stop us from raising our hands and asking questions, which is an invaluable way to learn and stay motivated. In this course we are committed to providing a welcoming environment for women of all ages to participate in the class and learn about programming.

If you are a woman and your range of interests intersects STEM (Science, Technology, Engineering and Maths), don’t miss this opportunity to extend your skills. Accept the challenge and embark on a fun journey, which can ultimately bring you a lot of fulfilment and joy.

https://www.eventbrite.com/e/braingym-women-in-tech-python-2-day-workshop-tickets-48063500223

Looking forward to meeting you in September!

Docker & Microservices

In this presentation I share some “lessons learned”, through ups & downs in a “journey” to implement a microservices architecture using the docker framework.

My overall feeling is that, although it has sometimes been a “bumpy” road, the microservices paradigm is a good approach to complex software projects, and the docker technology has some really great features to support it.


Great crowd at the #DockerBcn meetup: really enjoyed the meeting! Thanks to Dimitris and Skyscanner for hosting the event.

Automating the Generation of Python Bindings in QGIS

If you are a PyQGIS developer, you have probably already stumbled upon a situation where you needed to look at the signature of a QGIS function and had to dive into the C++ documentation or source code. This is not the friendliest experience if you are not a C++ developer…

Fortunately, @timlinux developed a tool for generating documentation for the Python API, and thanks to the great work of @RouzaudDenis and @_mkuhn, it is now possible to generate the sip files automatically from the header files. Before, the sip file had to be created manually by the developer, which means it was subject to human mistakes (like, for instance, forgetting to port a function).

To support this automated generation for the entire source code, it is necessary to annotate the headers with the relevant SIP annotations. As an example, I may not want to port the dataItem_t pointer to Python, in which case I would annotate the header with SIP_SKIP:

typedef QgsDataItem *dataItem_t( QString, QgsDataItem * ) SIP_SKIP;

A good place to start, before adding automated SIP generation to headers, is to read the SIP Bindings section of the QGIS coding standards, or even to have a look at qgis.h, where all these annotations (macros) are defined.

The sip files which are currently not generated automatically are listed in autosip_blacklist.sh, so if you want to automate a sip file, the first thing to do is to remove that file from the list. Then you may run the sipify_all.sh script, which will scan through all the files which are not blacklisted and generate the sip files for them, including a new sip file for the one you removed from the blacklist. If you compare the new file with the old one, in most cases the signatures of the functions won’t change; in that case, you don’t need to do anything. If you do find differences in the signatures, it is because there are some special instructions between slashes, like /Factory/, which you need to support. To do that, you need to add the appropriate annotations in the header file; in that case, do not forget to include the “qgis.h” header, in order to support the macros:

#include "qgis.h"

When you finish annotating the file, run the script again and check whether the old and new sip files match. If they do, then you have supported the automated generation of that sip file; otherwise, you need to go back to the header and check what is missing.
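
A minimal sketch of the full workflow, assuming it is run from the root of the QGIS source tree (the sip file name below is just an example, and the exact script paths may differ between QGIS versions):

# 1. remove the header you want to automate from the blacklist
vim autosip_blacklist.sh
# 2. regenerate the sip files for all non-blacklisted headers
./scripts/sipify_all.sh
# 3. compare the regenerated sip file with the committed version
git diff python/core/qgsexample.sip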

There are still many sip files left to be automated, so we encourage you to contribute to QGIS with a PR on this matter! 🙂

Easing the Creation of Metadata in QGIS

In a previous blog post, I presented QGIS enhancement #91, which aims at providing the infrastructure in QGIS to author, consume and share standards-based metadata (e.g.: ISO).

In this post I would like to focus on a specific work package (WP), which aims at easing the task of authoring metadata. Let’s face it: being told they need to create metadata puts a long face on many people.


We would like to at least reduce this effort by letting users create a metadata template, which would then be reused across the project, enabling the automated population of metadata. With the repetitive bits out of the way, they could focus on the fun parts: creating layer-specific metadata and, of course, working with the data.

More specifically, this WP covers support for two scenarios:

  • Filling in the template, which would then be associated with the project; this can happen in one tab of the project settings.
  • Automated population of metadata for a layer, based on this template; this can be triggered through the layer properties, or when the user loads/creates a layer.

The mockups below illustrate these application scenarios.

[Mockup: Creation of the Metadata Template]

[Mockup: Application of the Metadata Template]

This template would be based on the QGIS internal schema, developed in WP1. The fields presented in these mockups are only examples, based on the Dublin Core schema.

One interesting enhancement would be to support the import/export of this template, so that it could be shared across an organization. One user could also have multiple templates, according to the layers they are working on (see image below). Both these scenarios would require detaching the template from the project file and storing it in an external format.

[Mockup: Support to External Templates]

We envision this WP to deliver the following:

  • UI and handlers for creating the template.
  • UI and handlers for applying the template.
  • UI and handlers for exporting/importing the template (optional).

I will submit a proposal for these developments to the QGIS Grant Applications Programme and will be looking forward to having the support of the community to ease the creation of metadata in QGIS 🙂

Welcoming the QGIS Metadata Store

Support for standards-based metadata (e.g.: ISO) has been greatly missed in QGIS. We would like that to no longer be the case in QGIS 3.0, with this enhancement proposal.


This blog post focuses on WP3, the “QGIS Metadata Store”, which will introduce an external physical format for storing QGIS’s internal metadata. The goal is to support portability, enabling users to share their layer metadata, even in offline scenarios. This WP builds directly on the outputs of WP1, which will define an “internal metadata schema”, and of WP2, the “QGIS metadata API”, which will encode/decode between the internal schema and the supported schemas.

The final goal is for QGIS to support two types of metadata stores: remote and local. In this WP we will focus on local stores only.

[Diagram: remote and local metadata stores]

In the diagram below we depict the inheritance model for metadata stores, where an abstract metadata store has polymorphic behavior, according to the particular data format. For instance, in the case of a PostgreSQL database the “save” method will create a table in the database, whereas in the case of a Shapefile it will create an XML file.

[Diagram: metadata store inheritance model]

Some formats, such as text files, can be more limited than others; searches in text files, for example, can be quite slow. For that reason, we will create a “prime” format, the “QGIS metadata store”, which can accompany more restrictive formats. The prime format will be an SQLite database, because it is lightweight and well-known within the QGIS community.
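
Purely as an illustration of the concept (the database name, table and columns below are hypothetical, and not part of the proposal), such a store could be bootstrapped with something as simple as:

sqlite3 qgis_metadata.db "CREATE TABLE IF NOT EXISTS layer_metadata (layer_id TEXT PRIMARY KEY, title TEXT, abstract TEXT, keywords TEXT);"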

As the goal is to support all these different formats in the future, we will design an infrastructure to accommodate that, but in this first iteration we will focus on the simple use cases of creating an XML file and an SQLite data store.

The metadata contents will be passed by the metadata API. In this WP we will implement format translation, but not schema translation.

Along with these developments we will implement a user interface to allow the user to configure serialization/deserialization behavior, e.g.: in which format we should write metadata, and where.

The QGIS metadata store will be kept in sync with any changes that we apply to the metadata. The moment we export metadata to XML, those changes will be written to the XML file.

Metadata search will also be polymorphic, according to the data format. In this iteration, as a proof of concept, we will implement some simple text search, which will enable users to query their metadata.

We envision this WP to deliver the following:

  • An infrastructure to accommodate the external storage of metadata in QGIS, fully implemented for the use case of XML files.
  • Support for searching the metadata store.
  • UI for saving/loading metadata.

I will submit a proposal for these developments to the QGIS Grant Applications Programme and will be looking forward to having the support of the community to welcome the QGIS metadata store 🙂

Go On Board with a GeoNetwork Container

GeoNetwork is a FOSS catalog for geospatial information. It is used around the world by organizations such as FAO, the Dutch Kadaster or Eurostat, just to mention a few.

As with any software service, it may not be trivial to install and configure, which may put people off giving it a try. This could change with docker.


Docker, which could be defined in a nutshell as “infrastructure as code”, automates the deployment of Linux applications inside software containers. It relies on operating-system-level virtualization on Linux (originally implemented through LXC). In less than four years it has experienced massive adoption by the software community, and it has already been taken to production in many use cases.

The docker hub is a massive repository of ready-to-use images. You can find anything from web servers to databases, or even entire operating systems. With a docker pull at the tip of your fingers, you can have them running on your computer in a matter of minutes (depending on your internet connection).

Anyone can upload their docker images to docker hub, but some images are released “officially”. The sources of official images live in the docker repositories, and they are considered good to use (and reuse) because they implement docker best practices, so their code can serve as an example. They are also heavily documented according to certain standards, and they go through a security audit.

Although there are a couple of geonetwork images in the docker repositories, there is no official image yet, so I decided to create one. While the image goes through the approval process, I have published it anyway, so that anyone can benefit from it in the meantime.

These images provide the two latest releases of geonetwork (3.0.5 and 3.2.0), as well as the previous release (3.0.4). By default, geonetwork runs on a local h2 database, but I created a variant which can use a postgresql database as the backend, running either in a container or on a bare-metal server. This should make it a better fit for production.
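
As a hedged sketch of how the postgresql variant might be wired to a containerized database (the image tag, container names and linking details below are illustrative; check the docker hub page for the actual instructions):

# start a postgresql container to act as the backend
docker run --name postgres -d postgres:9.4
# run the postgresql variant of the geonetwork image, linked to the database container
docker run --name geonetwork --link postgres:postgres -d -p 8080:8080 geocat/geonetwork:postgres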

You can read more about these and other features, such as setting and persisting the data directory, on the docker hub page.

Once the official images get released, I will make an announcement here. But in the meantime, there is no excuse not to start playing with geonetwork:

docker pull geocat/geonetwork
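
And then run it, mapping the web application port to the host (the port below assumes GeoNetwork’s usual Jetty setup; see the docker hub page for the exact instructions):

docker run --name geonetwork -d -p 8080:8080 geocat/geonetwork

GeoNetwork should then be reachable at http://localhost:8080/geonetwork.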


Have fun with docker & GeoNetwork! 🙂

Watching a Server through a Container

Lately I have been working a lot with docker, the new kid on the block in cloud computing, which is winning the hearts of sysadmins as well as developers.

The main idea is to set up a Spatial Data Infrastructure, something that has been at the core of other projects such as Georchestra.

Unfortunately, having something running on a server is normally not a completely smooth experience, and this creates the need for a monitoring service.

After searching a bit, I found NewRelic, which provides monitoring as a service. I really liked the advanced functionality and the completeness of the dashboards, so it was not hard to convince myself to try it.

NewRelic provides two types of monitoring: application monitoring and server monitoring, which is what I will cover in this post. The server monitor is basically a daemon that runs on the server and collects statistics about various metrics, such as memory usage, CPU usage, bandwidth, etc. But what really caught my eye about this solution was the ability to monitor the docker daemon and the different containers that run within it.

Unfortunately this functionality appears to be broken for docker 1.11 (my current version), but with the help of the NewRelic engineers I was able to apply a workaround.

My next step was to dockerize this solution. After all, wouldn’t it be great to spin up another container in my SDI, that would monitor the other containers AND the server?

The bad news is that the existing images of NewRelic’s server monitor on docker hub do not implement the workaround. So I went and implemented my own image.

You can pull this image from the repository, with:

docker pull doublebyte/newrelic_sysmond

Then you can run it with:

docker run -d \
  --privileged=true --name nrsysmond \
  --pid=host \
  --net=host \
  -v /sys:/sys \
  -v /dev:/dev \
  --env="NRSYSMOND_license_key=REPLACE_BY_NEWRELIC_KEY" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/log:/var/log:rw \
  doublebyte/newrelic_sysmond

The privileged flag and the bindings to the host directories are necessary, because we need to be able to watch the docker daemon and collect the docker metrics.

Note that if you also want to collect memory stats from the containers, it is necessary to enable this in the kernel. The procedure is explained in the docker documentation, but it really comes down to updating the bootloader and restarting. In the case of grub, you would need to add this line to /etc/default/grub:

GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"

Then you need to update grub with:

update-grub

After a restart of the server, the docker memory statistics should be present on the server dashboard:

[Screenshot: NewRelic server dashboard showing docker memory metrics]

Spatial Data Mining

Social media streams may generate massive clouds of geolocated points, but how can we extract useful information from these sometimes huge datasets? I think machine learning and GIS can be helpful here.

My PechaKucha talk at DataBeers: “Visualizing Geolocated Tweets: a Spatial Data Mining Approach”.