OpenStreetMap: import data into a PostGIS database and incrementally update it

The problem:

Load OpenStreetMap data into a PostGIS database and keep it automatically updated in an incremental way (so each update loads only the changes that have meanwhile happened on the OSM servers).

The solution:

There are (really) a lot of tools available to handle OSM data, and in a few cases they are just alternatives for doing the same thing. The documentation of these tools (osm2pgsql, osmupdate, osmfilter, osmconvert, osmium, osmosis, imposm, etc.) is good but not great (and often not up to date), especially when it comes to "recipes" explaining how to put them together to achieve some specific goal. The following notes, meant to solve the aforementioned problem, do work, but consider that there is probably a much more straightforward and clean way to solve it. Special mention goes to this tutorial, which was fundamental to understanding how to achieve the goal.

The solution passes through a series of steps that can be wrapped into one or more scripts and then scheduled with CRON.


Phase 1, downloading, preparing and importing the data

Step 1: get OSM data

Daily updated pbf/osm (and shapefile) datasets, clipped along the national borders of most countries, can be downloaded from the Geofabrik OSM download facility. The same page also offers daily updated datasets for each continent.

Another option is to download the full Planet file, which includes the entire (worldwide) OSM dataset.
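For example, with wget (the URL below is the Planet download location at the time of writing; adapt it if it changes):

$ wget https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf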

Step 2: limit the data from the Planet to a specific area of interest

The following steps can be used to import and update the full Planet, but this will take a long time (depending also on the hardware resources of the server/computer where the operations are run). More probably, only a specific, limited geographic region is needed.

The Planet dataset can then be clipped using a polygon file (.poly), which can be easily created from a shapefile using this QGIS plugin.

The --complete-ways option is not used because it is not supported by osmupdate (to be used later), so ways crossing the clip boundaries are removed. For this reason it is important that the .poly file represents an extent slightly bigger than the real area of interest.

The .o5m format is used for the output because the update operation (by osmupdate) is faster compared to the same operation done with .pbf or .osm files as input.

$ osmconvert --verbose planet-latest.osm.pbf -B=portugal.poly -o=planet-portugal.o5m

Step 3: filter the dataset and keep only the features with a specific OSM attribute/tag

In this specific example only the roads tagged as "highway" are to be imported into the database. This tag name is misleading, as "highway" means all the roads that can be used by cars, even the smallest ones. Roads/ways that cannot be used by car are tagged as "pedestrian".

$ osmfilter --verbose planet-portugal.o5m --keep= --keep-ways="highway" --out-o5m > portugal_estradas.o5m

Step 4: remove broken references

References to nodes that have been excluded because they lie outside the geographical borders of the area of interest need to be removed with the --drop-broken-refs option.

The -b=-180,-90,180,90 option, defining a global bounding box, seems superfluous, but it is actually necessary to circumvent a bug in the --drop-broken-refs task that would otherwise leave only nodes in the data.

$ osmconvert --verbose portugal_estradas.o5m -b=-180,-90,180,90 --drop-broken-refs -o=portugal_estradas_nbr.o5m

Step 5: import the data into PostGIS using osm2pgsql

The important bit here is to use the --slim flag, otherwise it will not be possible to update this database incrementally later.

$ osm2pgsql --flat-nodes flat_nodes.bin --slim --create --cache 16000 --number-processes 12 --hstore --style openstreetmap-carto.style --multi-geometry portugal_estradas_nbr.o5m -H localhost -d databasename -U username --proj 32629

Phase 2, updating the data

When a Planet file is downloaded it is already old, because a new version is published only once a week (while changes on the OSM servers are continuous). So right after the first import, the data in the database can already be updated.

Step 6: update the dataset

With osmupdate we update the dataset obtained in Step 2:

$ osmupdate --verbose planet-portugal.o5m planet-portugal-updated.o5m -B=portugal.poly

Step 7: filter the dataset and keep only the features with a specific OSM attribute/tag

$ osmfilter --verbose planet-portugal-updated.o5m --keep= --keep-ways="highway" --out-o5m > portugal_estradas_updated.o5m

Step 8: remove broken references

$ osmconvert --verbose portugal_estradas_updated.o5m -b=-180,-90,180,90 --drop-broken-refs -o=portugal_estradas_updated_nbr.o5m

Step 9: create the DIFF file

The DIFF file (in .osc format) contains only the differences between two OSM datasets, in this case between the dataset imported in Step 5 and the updated one obtained in Step 8.

$ osmconvert --verbose portugal_estradas_nbr.o5m portugal_estradas_updated_nbr.o5m --diff --fake-lonlat -o=diff.osc

Step 10: import the DIFF file

$ osm2pgsql --flat-nodes flat_nodes.bin --slim --append --cache 16000 --number-processes 12 --hstore --style openstreetmap-carto.style --multi-geometry diff.osc -H localhost -d databasename -U username --proj 32629

Final notes:

Wrap steps 6 to 10 into a shell script and schedule it with CRON to get a fully automatic way to keep a continuously updated PostGIS database with OSM data (a sketch follows).
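A minimal sketch of such a script, assembled from the commands in steps 6 to 10: the paths, database name and user are placeholders, and the final file rotation (so that the next run diffs against the dataset just imported) is an assumption about the intended workflow.

#!/bin/bash
# Sketch of an update script wrapping steps 6 to 10 (paths and credentials are placeholders to adapt).
set -e
cd /path/to/osm/data

# Step 6: update the clipped dataset against the OSM servers
osmupdate --verbose planet-portugal.o5m planet-portugal-updated.o5m -B=portugal.poly

# Step 7: keep only the ways tagged as "highway"
osmfilter --verbose planet-portugal-updated.o5m --keep= --keep-ways="highway" --out-o5m > portugal_estradas_updated.o5m

# Step 8: remove broken references
osmconvert --verbose portugal_estradas_updated.o5m -b=-180,-90,180,90 --drop-broken-refs -o=portugal_estradas_updated_nbr.o5m

# Step 9: create the DIFF file between the previously imported dataset and the updated one
osmconvert --verbose portugal_estradas_nbr.o5m portugal_estradas_updated_nbr.o5m --diff --fake-lonlat -o=diff.osc

# Step 10: append the DIFF file to the database
osm2pgsql --flat-nodes flat_nodes.bin --slim --append --cache 16000 --number-processes 12 --hstore --style openstreetmap-carto.style --multi-geometry diff.osc -H localhost -d databasename -U username --proj 32629

# Assumption: rotate the files so the next run starts from the freshly updated dataset
mv planet-portugal-updated.o5m planet-portugal.o5m
mv portugal_estradas_updated_nbr.o5m portugal_estradas_nbr.o5m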

The speed of the above process gets a huge boost if it can run on an SSD rather than an HDD.


ogr2ogr, PostGIS and Spatialite: create 3D vectors and transfer geometries Z values to attributes table

Sample data: dataset3dvectors.zip

Create a 3D shapefile from a 2D one

First check what ogrinfo says about the sample data:

$ ogrinfo -so shapefile_2d.shp shapefile_2d
INFO: Open of `shapefile_2d.shp'
      using driver `ESRI Shapefile' successful.

Layer name: shapefile_2d
Geometry: Point
Feature Count: 5
Extent: (484628.515471, 4227667.038043) - (617462.831823, 4605830.183414)
Layer SRS WKT:
PROJCS["WGS_1984_UTM_Zone_29N",
    GEOGCS["GCS_WGS_1984",
        DATUM["WGS_1984",
            SPHEROID["WGS_84",6378137,298.257223563]],
        PRIMEM["Greenwich",0],
        UNIT["Degree",0.017453292519943295]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",-9],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["Meter",1]]
z_value: Integer (10.0)

Now generate a real 3D shapefile by transferring the Z values stored in the “z_value” column in the attribute table to the output geometries:

$ ogr2ogr -f "ESRI Shapefile" shapefile_3d.shp shapefile_2d.shp -zfield z_value

Check the result with ogrinfo:

$ ogrinfo -so shapefile_3d.shp shapefile_3d
INFO: Open of `shapefile_3d.shp'
      using driver `ESRI Shapefile' successful.

Layer name: shapefile_3d
Geometry: 3D Point
Feature Count: 5
Extent: (484628.515471, 4227667.038043) - (617462.831823, 4605830.183414)
Layer SRS WKT:
PROJCS["WGS_1984_UTM_Zone_29N",
    GEOGCS["GCS_WGS_1984",
        DATUM["WGS_1984",
            SPHEROID["WGS_84",6378137,298.257223563]],
        PRIMEM["Greenwich",0],
        UNIT["Degree",0.017453292519943295]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",-9],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["Meter",1]]
z_value: Integer (10.0)

The same also applies when the input layer is a line or a polygon one.
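For example, for a hypothetical 2D line shapefile (shapefile_2d_lines.shp, again with a "z_value" column in the attribute table) the command is the same:

$ ogr2ogr -f "ESRI Shapefile" shapefile_3d_lines.shp shapefile_2d_lines.shp -zfield z_value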

Transfer Z values from a 3D shapefile to the table of attributes of a 2D one

points:

$ ogr2ogr shapefile_with_z_from_3d.shp shapefile_3d.shp -dialect sqlite -sql "SELECT *, st_z(geometry) AS z_geom FROM shapefile_3d"

To force the output as 2D, add the "-dim 2" flag:

$ ogr2ogr shapefile_with_z_from_3d.shp shapefile_3d.shp -dim 2 -dialect sqlite -sql "SELECT *, st_z(geometry) AS z_geom FROM shapefile_3d"

lines:

$ ogr2ogr shapefile_lines_with_z_from_3d.shp shapefile_3d_lines.shp -dim 2 -dialect sqlite -sql "SELECT *, st_z(st_pointn(geometry,1)) AS z_geom FROM shapefile_3d_lines"

polygons:

$ ogr2ogr shapefile_polygons_with_z_from_3d.shp shapefile_3d_polygons.shp -dim 2 -dialect sqlite -sql "SELECT *, st_z(st_pointn(ST_ExteriorRing(geometry),1)) AS z_geom FROM shapefile_3d_polygons"

Rationale: for polygons it is first necessary to get the exterior ring (as a linestring) of the geometries with the ST_ExteriorRing function; then, for both lines and polygons, the Nth point (in these examples the first one) of the linestring is obtained with the ST_PointN function. After that the Z value is extracted with ST_Z.

Import a 2D Shapefile into PostGIS as 3D vector

$ ogr2ogr -progress --config PG_USE_COPY YES -f PostgreSQL PG:"host=*** port=5432 dbname=*** password=*** user=curso" /path/to/shapefile_2d.shp shapefile_2d -lco SCHEMA=public -lco GEOMETRY_NAME=geom -lco FID=gid -nln 3d_vector -zfield z_value

Check the result with ogrinfo:

$ ogrinfo -so PG:"host=** user=** dbname=** password=***" 3d_vector
INFO: Open of `PG:host=localhost user=curso dbname=curso password=curso'
      using driver `PostgreSQL' successful.

Layer name: 3d_vector
Geometry: 3D Point
Feature Count: 5
Extent: (484628.515471, 4227667.038043) - (617462.831823, 4605830.183414)
Layer SRS WKT:
PROJCS["WGS 84 / UTM zone 29N",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4326"]],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",-9],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    AUTHORITY["EPSG","32629"],
    AXIS["Easting",EAST],
    AXIS["Northing",NORTH]]
FID Column = gid
Geometry Column = geom
z_value: Integer (10.0)

Then from the PostgreSQL console add a new empty column to the PostGIS 3D vector (the table name must be double-quoted because it starts with a digit):

ALTER TABLE "3d_vector" ADD COLUMN z_geom integer;

and then populate it with the Z values from the geometries:

UPDATE "3d_vector" SET z_geom = st_z(geom);

For Spatialite the notes are the same as for PostGIS.
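A minimal sketch of the Spatialite equivalent (the database and layer names are just examples, and the geometry column name may differ depending on the GDAL version):

$ ogr2ogr -f SQLite -dsco SPATIALITE=YES database_3d.sqlite shapefile_2d.shp -nln vector_3d -zfield z_value

Then, from the spatialite console:

ALTER TABLE vector_3d ADD COLUMN z_geom integer;
UPDATE vector_3d SET z_geom = ST_Z(GEOMETRY);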


Schedule Netgear ReadyNAS Ultra 2 backups with RSYNC over SSH

The problem:

The ReadyNAS Ultra 2 allows creating scheduled backup jobs towards remote RSYNC servers using its web-based control panel, but not over SSH, which may very well be the only available option.

The solution:

Enable root SSH access to the ReadyNAS Ultra 2 unit and (using the command line) set up a CRON job.

Step 1:

Enable root SSH on the ReadyNAS unit by installing the following extension: http://www.readynas.com/?p=4203

[Note: the “root” user password is the same as the one of the “admin” user]

Connect via SSH to the NAS

$ ssh root@IP_OF_NAS

Step 2:

Generate public and private SSH keys (use no passphrase)

$ ssh-keygen

Step 3:

Copy the public key to the remote server

$ ssh-copy-id -i ~/.ssh/id_rsa.pub remote_server_user@IP_OF_REMOTE_SERVER

Step 4:

Edit crontab and add a scheduled RSYNC task

$ crontab -e

After issuing the above command it may very well show an error like:

$ netgear-nas:~# crontab -e
 no crontab for root - using an empty one
 /bin/sh: /usr/bin/editor: No such file or directory
 crontab: "/usr/bin/editor" exited with status 127

This is because the default command line text editor is not correctly configured, so switching to "vi" is needed:

$ EDITOR=/bin/vi; export EDITOR

Now, trying again,

$ crontab -e

should work, and crontab can be edited with vi.

[Note: "vi" is hard for the less skilled user, but its basic commands are easy enough for simple edits to text files]
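For reference, the handful of vi commands needed for such an edit:

i      enter insert mode (start typing)
Esc    return to command mode
:wq    save the file and quit
:q!    quit without saving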

Step 5:

Add an RSYNC task to crontab (the -a flag makes the copy recursive and preserves attributes), save and exit

30 17 * * * rsync -a /nas/folder remote_server_user@IP_OF_REMOTE_SERVER:/path/where/place/backup/

Final note:

It is better to first test the above RSYNC command directly from the command line and check whether it still asks for the password; it should not. If it does, the likely cause is wrong permissions on the

~/.ssh/authorized_keys

file on the remote host: ~/.ssh should be 700, and ~/.ssh/authorized_keys should be 600. If it still asks for the password, check the permissions on ~ (they should be 700 too).
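On the remote host that translates, for example, to:

$ chmod 700 ~ ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys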

See:

http://unix.stackexchange.com/questions/36540/why-am-i-still-getting-a-password-prompt-with-ssh-with-public-key-authentication

and/or

http://www.daveperrett.com/articles/2010/09/14/ssh-authentication-refused/


Install WordPress under Ubuntu and enable automatic updates

Install MySQL and the PHP5 MySQL extension:

sudo apt-get install mysql-server php5-mysql

Initialize the cluster and run a script to make MySQL more secure:

sudo mysql_install_db
sudo mysql_secure_installation

Enter MySQL console as root, create the DB, create the user and grant it the necessary permissions:

mysql -u root -p
CREATE DATABASE wordpress;
CREATE USER wordpressuser@localhost IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON wordpress.* TO wordpressuser@localhost;
FLUSH PRIVILEGES;
exit

Install WordPress as usual; after that, add the following two lines to wp-config.php:

define('FS_METHOD','direct');
define( 'WP_AUTO_UPDATE_CORE', true );

and set the permissions of the WordPress folders/files as follows:

sudo find /path/to/wordpress/ -type f -exec chmod 664 {} \;
sudo find /path/to/wordpress/ -type d -exec chmod 775 {} \;
sudo chgrp -R www-data /path/to/wordpress/

To back up the MySQL database automatically, first create a .my.cnf file and then chmod it to 600. The file must include:

[client]
user=mysqluser
password=mysqlpassword

Then add a CRON job that runs:

mysqldump --defaults-file=/path/to/.my.cnf -u mysqluser databasename --add-drop-table -h host | bzip2 -c > /path/to/backupfile-`date +\%Y\%m\%d`.sql.bz2
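For example, a complete crontab entry (the daily 02:30 schedule below is just an illustrative choice) could look like:

30 2 * * * mysqldump --defaults-file=/path/to/.my.cnf -u mysqluser databasename --add-drop-table -h host | bzip2 -c > /path/to/backupfile-`date +\%Y\%m\%d`.sql.bz2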