OpenStreetMap: import data into a PostGIS database and incrementally update it

The problem:

Load OpenStreetMap data into a PostGIS database and keep it automatically updated in an incremental way (so that each update loads only the changes that have meanwhile happened on the OSM servers).

The solution:

There are (really) a lot of tools available to handle OSM data, and in a few cases they are just alternatives for doing the same thing. The documentation of these tools (osm2pgsql, osmupdate, osmfilter, osmconvert, osmium, osmosis, imposm, etc.) is good but not great (and often not up to date), especially when it comes to “recipes” explaining how to put them together to achieve some specific goal. The following notes -which solve the aforementioned problem- do work, but keep in mind that there is probably a much more straightforward and clean way to do it. A special mention goes to this tutorial, which was fundamental to understanding how to achieve the goal.

The solution passes through a series of steps that can be wrapped into one or more scripts and then scheduled with CRON.


Phase 1, downloading, preparing and importing the data

Step 1: get OSM data

Daily updated pbf/osm (and shapefile) datasets -clipped along the national borders of most countries- can be downloaded from the Geofabrik OSM download facility. The same page also offers daily updated datasets for each continent.

Another option is to download the full Planet file, which includes the entire (worldwide) OSM dataset.
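For example -assuming the current naming scheme of the two download sites- the Portugal extract and the full Planet can be fetched with:

$ wget https://download.geofabrik.de/europe/portugal-latest.osm.pbf
$ wget https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf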

Step 2: limit the data from the Planet to a specific area of interest

The following steps can be used to import and update the full Planet, but this will take a long time (depending also on the hardware resources of the server/computer where the operations are run). More likely, only a specific, limited geographic region is needed.

The Planet dataset can then be clipped using a polygon file (.poly), which can easily be created from a shapefile using this QGIS plugin.
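For reference, a .poly file is a plain-text list of longitude/latitude pairs in the Osmosis polygon filter format. A purely illustrative example (a rectangle roughly covering mainland Portugal, not the polygon actually used here) looks like this:

portugal
area_1
    -9.95    36.75
    -6.00    36.75
    -6.00    42.30
    -9.95    42.30
END
END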

The --complete-ways option is not used because it is not supported by osmupdate (to be used later), so ways crossing the clip boundaries are removed. For this reason it is important that the .poly file represents an extent slightly bigger than the real area of interest.

The .o5m format is used for the output because the update operation (by osmupdate) is faster compared to the same operation done with .pbf or .osm files as input.

$ osmconvert --verbose planet-latest.osm.pbf -B=portugal.poly -o=planet-portugal.o5m

Step 3: filter the dataset and leave just features with a specific OSM attribute/tag

In this specific example, only the roads tagged as “highway” are to be imported into the database. This tag name is misleading, as “highway” covers all the roads that can be used by cars, even the smallest ones. Roads/ways that cannot be used by car are tagged as “pedestrian“.

$ osmfilter --verbose planet-portugal.o5m --keep= --keep-ways="highway" --out-o5m > portugal_estradas.o5m

Step 4: remove broken references

References to nodes that have been excluded because they lie outside the geographical borders of the area of interest need to be removed with the --drop-broken-refs option.

The -b=-180,-90,180,90 option defining a global bounding box seems superfluous, but it is actually necessary to circumvent a bug in the --drop-broken-refs task that would otherwise leave only nodes in the data.

$ osmconvert --verbose portugal_estradas.o5m -b=-180,-90,180,90 --drop-broken-refs -o=portugal_estradas_nbr.o5m

Step 5: import the data into PostGIS using osm2pgsql

The important bit here is to use the --slim flag, otherwise it will not be possible to update this database incrementally later on.

$ osm2pgsql --flat-nodes flat_nodes.bin --slim --create --cache 16000 --number-processes 12 --hstore --style openstreetmap-carto.style --multi-geometry portugal_estradas_nbr.o5m -H localhost -d databasename -U username --proj 32629
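To quickly verify the import, the tables created by osm2pgsql can be queried; for example, assuming the default table names, the imported roads should now be in planet_osm_line:

$ psql -h localhost -d databasename -U username -c "SELECT highway, count(*) FROM planet_osm_line GROUP BY highway ORDER BY count(*) DESC LIMIT 10;"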

Phase 2, updating the data

When a Planet file is downloaded it is already old, because a new version is published only once a week (while changes on the OSM servers are continuous). So right after the first import, the data in the database can already be updated.

Step 6: update the dataset

With osmupdate we update the dataset obtained in Step 2:

$ osmupdate --verbose planet-portugal.o5m planet-portugal-updated.o5m -B=portugal.poly

Step 7: filter the dataset and leave just features with a specific OSM attribute/tag

$ osmfilter --verbose planet-portugal-updated.o5m --keep= --keep-ways="highway" --out-o5m > portugal_estradas_updated.o5m

Step 8: remove broken references

$ osmconvert --verbose portugal_estradas_updated.o5m -b=-180,-90,180,90 --drop-broken-refs -o=portugal_estradas_updated_nbr.o5m

Step 9: create the DIFF file

The DIFF file (in .osc format) contains only the differences between two OSM datasets (here, the filtered dataset from Step 4 and the updated one from Step 8).

$ osmconvert --verbose portugal_estradas_nbr.o5m portugal_estradas_updated_nbr.o5m --diff --fake-lonlat -o=diff.osc

Step 10: import the DIFF file

$ osm2pgsql --flat-nodes flat_nodes.bin --slim --append --cache 16000 --number-processes 12 --hstore --style openstreetmap-carto.style --multi-geometry diff.osc -H localhost -d databasename -U username --proj 32629

Final notes:

Wrap steps 6 to 10 into a script and schedule it with CRON to get a fully automatic way to keep a continuously updated PostGIS database with OSM data.
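A minimal sketch of such a script, using the file names and connection parameters assumed in the steps above (the file renames at the end are an assumption: they keep the "previous" datasets in place so that the next run can build its DIFF against the state just imported):

#!/bin/bash
# update_osm.sh - wraps steps 6 to 10 (file names/credentials as in the examples above)
set -e

# Step 6: bring the clipped Planet extract up to date
osmupdate --verbose planet-portugal.o5m planet-portugal-updated.o5m -B=portugal.poly

# Step 7: keep only the ways tagged as "highway"
osmfilter --verbose planet-portugal-updated.o5m --keep= --keep-ways="highway" --out-o5m > portugal_estradas_updated.o5m

# Step 8: remove broken references
osmconvert --verbose portugal_estradas_updated.o5m -b=-180,-90,180,90 --drop-broken-refs -o=portugal_estradas_updated_nbr.o5m

# Step 9: build the DIFF between the previously imported and the updated datasets
osmconvert --verbose portugal_estradas_nbr.o5m portugal_estradas_updated_nbr.o5m --diff --fake-lonlat -o=diff.osc

# Step 10: append the DIFF to the PostGIS database
osm2pgsql --flat-nodes flat_nodes.bin --slim --append --cache 16000 --number-processes 12 --hstore --style openstreetmap-carto.style --multi-geometry diff.osc -H localhost -d databasename -U username --proj 32629

# Rotate files so the next run compares against the state just imported (assumption, see above)
mv planet-portugal-updated.o5m planet-portugal.o5m
mv portugal_estradas_updated_nbr.o5m portugal_estradas_nbr.o5m
rm -f portugal_estradas_updated.o5m diff.osc

The script can then be scheduled, for example nightly, with a crontab entry like the following (the paths are just examples):

$ crontab -e
0 4 * * * /home/user/osm/update_osm.sh >> /home/user/osm/update.log 2>&1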

The speed of the above process gets a huge boost if it can be run on an SSD rather than an HDD.
