JSON to GeoJSON with jq

A lot of people and institutions have already made the jump of providing data in JSON, which is great, since it is an inter-operable standard and a semi-structured form of data. However when it comes down to geographic data, standards don’t seem to be so common. I have seen many different ways of encoding geospatial information within JSON, normally involving listing an array of coordinates, with or without name fields for lat and long. Rarely there is any CRS associated to this data (which could be ok, for the case that it uses WGS84), or any mention of the geometry type.

This information is more or less useless, without some pre-processing to convert it into a “GIS-friendly” format, that we could use in QGIS, GeoServer, or even R.

Since we are dealing with JSON, the natural thing would be to convert it into GeoJSON, a structured format for geographic data. And the perfect tool for doing this is jq, a tool that I mentioned in a previous post. To make it simpler to understand I will explain what I did a specific JSON dataset, but with some knowledge of jq (and GeoJSON), you could literally apply it to any JSON dataset with geographic information within it.

My test dataset was the description of a set of roads, from the city of zaragoza,.


The description of the dataset says that it is in “Google” format, which one could erroneous interpret as spherical mercator, but the name of the file suggests WGS84, and a quick look at the coordinates can confirm that too. This is literally a list of tracks, each one containing a list of coordinates that define the geometry. Let us look at an example:

  "points": [                                                                                                                                        
      "lon": -0.8437921499884775,                                                                                                                    
      "lat": 41.6710232246183
      "lon": -0.8439686263507937,                                                                                                                    
      "lat": 41.67098172145761
      "lon": -0.8442926556112658,                                                                                                                    
      "lat": 41.670866465890654
      "lon": -0.8448464412455035,                                                                                                                    
      "lat": 41.67062949885585
      "lon": -0.8453763659750164,                                                                                                                    
      "lat": 41.67040130061031
      "lon": -0.8474617762602581,
      "lat": 41.669528132440355
      "lon": -0.8535340031154578,
      "lat": 41.66696540067222
  "id": 5

So the task here would be to convert this into a GeoJSON geometry (a linestring). For instance:

  { "type": "LineString",
    "coordinates": [ [100.0, 0.0], [101.0, 1.0] ]

In jq, we want to loop through the array of roads, and parse the lat, long coordinates of each road object. This coordinates are themselves another array. If we do something like this:

cat tramoswgs84.json | jq  '.tramos[2]| .points[].lon,.points[].lat'

We are asking for the longitude and latitude coordinates of track 2, but since jq evaluates expressions from left to right, it will gives us back the array of longitude coordinates and the array of latitude coordinates, not the pairs.

The key thing is to use map, that will run the filter for each element of the array:

cat tramoswgs84.json | jq  '.tramos[2].points| map([.lon,.lat])'

The complete jq syntax for generating one linestring object, would be:

cat tramoswgs84.json | jq  -c '.tramos[1]|  {"type": "LineString", "coordinates": .points | map([.lon,.lat])}'

The next step would be to create a GeoJSON containing the entire collection of linestrings. Since we would like to attach attributes to them (“name” and “id”), we rather generate a “feature collection“, than a “geometric collection”. The code for generating each feature would be:

cat $1 | jq  -c '.tramos[]| {"type": "Feature","geometry": {"type": "LineString", "coordinates": .points | map([.lon,.lat])},"properties":{"name": .name, "id": .id}}' >> $2

And then we need to do a few text manipulation operations, that I could not find a way of performing with jq. Basically we need to add the opening tags for the feature collection, commas between each object, and then add closing tags for the feature collection.

I did this text manipulating tasks with sed, and put everything inside a shell script, that will transform the JSON file (in the format that I described) directly into a valid GeoJSON. If you are interested, you can get it from github. The resulting file can be fed to QGIS, in order to produce pretty maps, like the one bellow 🙂



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s