Split lines with points, the PostGIS way

The problem:

Given a layer of lines and one of points (the latter not necessarily already over the lines one) , split the lines using the points.

Five years after a feature request on the QGIS bug tracker

https://hub.qgis.org/issues/5040

this tool is not yet available as native QGIS tool and also an (easy) alternative is missing within the other providers in the QGIS Processing toolbox.

Here following a pure SQL/PostGIS solution. Not sure is the most elegant or fast one, but seems to work as expected.

Workflow and notes:

  • As the input points are away from the lines, get the closest place on the line (likely on a segment) using the st_closestpoint function
  • Input lines are densified with the st_segmentize: this will later allow to snap the input points -now over the segments- over a node/vertex, a really close one depending on how much the lines were densified
  • Extract the nodes/vertexes of densified lines with a custom function:
CREATE OR REPLACE FUNCTION ST_AsMultiPoint(geometry) RETURNS geometry AS
'SELECT ST_Union((d).geom) FROM ST_DumpPoints($1) AS d;'
LANGUAGE sql IMMUTABLE STRICT COST 10;
  • Snap points (the ones over lines segments) over the nearest node/vertex of the densified lines using the st_snap function
  • Split the lines using snapped points thanks to the st_split function
  • The above process is necessary as points can be used to split lines only if they are exactly over a line node/vertex (st_split won’t work in the points are over a line segment)

Step1: input points on nearest place over lines segments

CREATE TABLE points_over_lines
AS SELECT a.id,ST_ClosestPoint(ST_Union(b.geom), a.geom)::geometry(POINT,3763) AS geom
FROM points a, lines b
GROUP BY a.geom,a.id;

Step2: densify lines and extract nodes as 1 unique multipoint geometry

CREATE TABLE lines_nodes_densified AS
SELECT 1 AS id, ST_Union(ST_AsMultiPoint(st_segmentize(geom,1)))::geometry(MULTIPOINT,3763) AS geom 
FROM lines;

Step3: snap points over lines nodes/vertexes

CREATE TABLE points_snapped AS
SELECT b.id, ST_snap(ST_Union(b.geom),a.geom, ST_Distance(a.geom,b.geom)*1.01)::geometry(POINT,3763) AS geom 
FROM lines_nodes_densified a, points_over_lines b
GROUP BY a.geom, b.geom, b.id;

Step4: split lines

CREATE TABLE lines_split AS
SELECT a.id, (ST_Dump(ST_split(st_segmentize(a.geom,1),ST_Union(b.geom)))).geom::geometry(LINESTRING,3763) AS geom 
FROM lines a, points_snapped b
GROUP BY a.id;

With one unique query statement:

CREATE TABLE lines_split AS
WITH 
temp_table1 AS (SELECT a.id,ST_ClosestPoint(ST_Union(b.geom), a.geom)::geometry(POINT,3763) AS geom FROM points a, lines b GROUP BY a.geom,a.id),
temp_table2 AS (SELECT 1 AS id, ST_Union(ST_AsMultiPoint(st_segmentize(geom,1)))::geometry(MULTIPOINT,3763) AS geom FROM lines),
temp_table3 AS (SELECT b.id, ST_snap(ST_Union(b.geom),a.geom, ST_Distance(a.geom,b.geom)*1.01)::geometry(POINT,3763) AS geom 
FROM temp_table2 a, temp_table1 b
GROUP BY a.geom, b.geom, b.id)
SELECT a.id, (ST_Dump(ST_split(st_segmentize(a.geom,1),ST_Union(b.geom)))).geom::geometry(LINESTRING,3763) AS geom FROM lines a, temp_table3 b
GROUP BY a.id;
Split lines with points, the PostGIS way

OpenStreetMap: import data into a PostGIS database and incrementally update it

The problem:

Load OpenStreetMap data into a PostGIS database and have it automatically updated in an incremental way (so at each update will be loaded just the changes that meanwhile had happened in the OSM servers).

The solution:

There are (really) a lot tools available to handle OSM data, and in a few cases they are just alternatives to do the same thing. The documentation of this tools (osm2pgsql, osmupdate, osmfilter, osmconvert, osmium, osmosis, imposm, etc.) is good but not great (and often not up to date), especially when it comes to “recipes” explaining how to put them together to achieve some specific goal. The following notes -to solve the aforementioned problem- work, but is to be considered that probably there is a much more straightforward and clean way to solve it. Special mention this tutorial that was fundamental to understand how to achieve the goal.

The solution passes trough a series of steps that can be wrapped into one or more scripts and then scheduled with CRON.


Phase 1, downloading, preparing and importing the data

Step 1: get OSM data

Daily updated pbf/osm (and shapefiles) datasets -clipped along national borders of most of countries- can be downloaded at Geofrabrik OSM download facility. In the same page are available daily updated datasets with the data of each continent.

Another option is to download the full Planet file, that includes the entire (world) OSM dataset.

Step 2: limit the data from the Planet to a specific area of interest

The following steps can be used to import and update the full Planet, but this will take a long time (depending also on hardware resources of the server/computer where the operations are being done). More probably it would only needed a specific, limited geographic region.

The Planet dataset can then be clipped using a polygon file (.poly) that can be easily created from a shapefile using this QGIS plugin.

The --complete-ways option is not used because is not supported by osmupdate (to be used later) so ways crossing the clip boundaries are removed. For this reason is important that the .poly file must represent an extent slightly bigger than the real area of interest.

The .o5m format is used for the output because the update operation (by osmupdate) is faster if compared to the same operation done with .pbf or .osm files as input.

$ osmconvert --verbose planet-latest.osm.pbf -B=portugal.poly -o=planet-portugal.o5m

Step 3: filter the dataset and leave just features with a specific OSM attribute/tag

In this specific example only the roads that are tagged as “highway” are the ones to be imported in the database. This tag name is misleading as “highway” means all the roads that can be used by cars, even the smallest ones. Roads/ways that is not possible to use by car are tagged as “pedestrian“.

$ osmfilter --verbose planet-portugal.o5m --keep= --keep-ways="highway" --out-o5m > portugal_estradas.o5m

Step 4: remove broken references

References to nodes which have been excluded because lying outside the geographical borders of the area of interest need to be removed with the option --drop-broken-refs.

The --b=-180,-90,180,90 option defining a global bounding box seems superfluous, but is actually necessary to circumvent a bug in the --drop-broken-refs task that would leave only nodes in the data

$ osmconvert --verbose portugal_estradas.o5m -b=-180,-90,180,90 --drop-broken-refs -o=portugal_estradas_nbr.o5m

Step 5: import the data into PostGIS using osm2pgsql

The important bit here is to use the --slim flag, otherwise later it will not be possible to update this database in an incremental way

$ osm2pgsql --flat-nodes flat_nodes.bin --slim --create --cache 16000 --number-processes 12 --hstore --style openstreetmap-carto.style --multi-geometry portugal_estradas_nbr.o5m -H localhost -d databasename -U username --proj 32629

Phase 2, updating the data

When a Planet file is downloaded it is already old because a new version is published only once a week (and changes in the OSM servers are continuous). So after the first import the data in the database can be immediately updated.

Step 6: update the dataset

With osmupdate we update the dataset obtained in Step 2

$ osmupdate --verbose planet-portugal.o5m planet-portugal-updated.o5m -B=portugal.poly

Step 7: filter the dataset and leave just features with a specific OSM attribute/tag

$ osmfilter --verbose planet-portugal-updated.o5m --keep= --keep-ways="highway" --out-o5m > portugal_estradas_updated.o5m

Step 8: remove broken references

$ osmconvert --verbose portugal_estradas_updated.o5m -b=-180,-90,180,90 --drop-broken-refs -o=portugal_estradas_updated_nbr.o5m

Step 9: create the DIFF file

The DIFF file (in .osc format) contains only the differences between two OSM datasets

$ osmconvert --verbose portugal_estradas_updated_nbr.o5m --diff --fake-lonlat -o=diff.osc

Step 10: import the DIFF file

$ osm2pgsql --flat-nodes flat_nodes.bin --slim --append --cache 16000 --number-processes 12 --hstore --style openstreetmap-carto.style --multi-geometry diff.osc -H localhost -d databasename -U username --proj 32629

Final notes:

Wrap steps 6 to 10 into a batch file and schedule it with CRON to get a fully automatic way to have a continuously updated PostGIS database with OSM data.

The speed of the above process gets a huge boost if it can be done on a SSD rather than a HDD.

OpenStreetMap: import data into a PostGIS database and incrementally update it