PostGIS: remove all points (but one) that are closest than “x” meters from the others

To solve this problem a number of possible solutions may come to mind, usually this are not very straightforward and/or very slow.

It turns out that the right (and spectacularly fast) solution is:

SELECT ST_GeometryN(unnest(ST_ClusterWithin(f.geom, 0.4)),1) AS geom FROM points AS f

Quick explanation:

  1. ST_ClusterWithin creates a cluster/GeometryCollection where each GeometryCollection represents a set of geometries separated by no more than the specified distance. All the GeometryCollection are placed into an array
  2. unnest splits the array and puts each element into a separate row
  3. ST_GeometryN extracts the Nth geometry from a GeometryCollection

In the example above the distance used for this analysis is 0.4 meters, and with ST_GeometryN we extract always the 1st geometry from the cluster/GeometryCollection of points, because for this problem a) is not important what point we keep (in the cluster) and b) the first geometry of a GeometryCollection always exist.

If is needed to know how many points are in each cluster/GeometryCollection the ST_NumGeometries can be used:

SELECT unnest(ST_ClusterWithin(f.geom, 0.4)) AS geom, ST_NumGeometries(unnest(ST_ClusterWithin(f.geom, 0.4))) AS ng
FROM dados.points AS f

A different approach, not as fast as the above, could be:

CREATE TABLE points_clusters AS
WITH temp_table AS (SELECT (ST_Dump(ST_Union(ST_Buffer(geom, 0.2, 'quad_segs=100')))).geom::geometry(POLYGON,3047) AS geom FROM points)
SELECT row_number() over() AS gid, geom AS geom FROM temp_table

then

CREATE TABLE points_without_closest AS
SELECT DISTINCT ON (geom) id_point, geom FROM (
SELECT a.gid AS id_point,b.gid AS id_buffer,a.geom AS geom, row_number() OVER (PARTITION BY b.gid ORDER BY a.gid) AS rownum FROM points a,
points_clusters b WHERE ST_Within(a.geom,b.geom))
tmp
WHERE rownum < 2;
Advertisements
PostGIS: remove all points (but one) that are closest than “x” meters from the others

Insert binaries into a PostgreSQL table as “bytea” data type

In older versions of PostgreSQL a native function to insert binaries as “bytea” data type is missing, so:

create or replace function bytea_import(p_path text, p_result out bytea)
 language plpgsql as $$
declare
 l_oid oid;
 r record;
begin
 p_result := '';
 select lo_import(p_path) into l_oid;
 for r in ( select data
 from pg_largeobject
 where loid = l_oid
 order by pageno ) loop
 p_result = p_result || r.data;
 end loop;
 perform lo_unlink(l_oid);
end;$$;

then (example of a case where a picture file name is already available in another column):

 UPDATE table SET bytea_column_name = bytea_import('/path/to/photo/' || "column_with_picture_name");
Insert binaries into a PostgreSQL table as “bytea” data type

Block/allow (specific port) access by country using UFW (Uncomplicated Firewall)

Very clear UFW tutorial:

https://www.digitalocean.com/community/tutorials/how-to-set-up-a-firewall-with-ufw-on-ubuntu-14-04

Download a text file with all the IP ranges for a given specific country (select CIDR as format):

https://www.ip2location.com/free/visitor-blocker

Save the (text) file and edit it to remove the first lines (the ones that are comments). After that issue a command like the following (the “insert 1” is to insert the new rules at the top of ufw rules list):

while read line; do sudo ufw insert 1 deny from $line to any port 22; done < bad_country.txt

 

Block/allow (specific port) access by country using UFW (Uncomplicated Firewall)

Some useful RSYNC parameter

1) Sinchronize only files younger/older than “x” days (31 ij the following example):

find /path/to/folder -type f -mtime -31 > /tmp/rsyncfiles

or

find /path/to/folder -type f -mtime +31 > /tmp/rsyncfiles

then

rsync -Ravh --files-from=/tmp/rsyncfiles . /target/folder

2) Ignore existing files

rsync --ignore-existing -raz /source/folder/ /target/folder/

3) Update the remote only if a newer version is on the local filesystem

rsync --update -raz /source/folder/ /target/folder/
Some useful RSYNC parameter

Some example of useful PostgreSQL/PostGIS queries

List database triggers:

SELECT * FROM information_schema.triggers;

List tables owned by a specific user:

SELECT *
FROM pg_tables t
WHERE t.tableowner = 'username';

Change permissions on all tables in the same schema:

REVOKE ALL ON ALL TABLES IN SCHEMA schemaname FROM username;

Loop schema names and run a whatever query:

CREATE OR REPLACE FUNCTION change_all()
 RETURNS VOID AS
 $$
 DECLARE rec RECORD;
 BEGIN
-- Get all the schemas
 FOR rec IN
 SELECT DISTINCT schemaname
 FROM pg_catalog.pg_tables
 -- You can exclude the schema which you don't want to drop by adding another condition here
 WHERE schemaname not like 'pg_catalog'
 LOOP
 EXECUTE 'GRANT ALL ON SCHEMA ' || rec.schemaname || ' TO username';
 END LOOP;
 RETURN;
 END;
 $$ LANGUAGE plpgsql;

SELECT change_all();

Add a sequence to an (editable) view:

ALTER VIEW schema.tablename ALTER gid SET DEFAULT nextval('schema.tablename_gid_seq');
Function/trigger to get/save the (database) username when doing an INSERT or UPDATE

CREATE OR REPLACE FUNCTION adduser()
 RETURNS trigger AS
 $BODY$
 BEGIN
 NEW.column_to_be_filled_with_username = session_user;
 RETURN NEW;
 END;
 $BODY$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS data_adduser ON schema.tablename;
 CREATE TRIGGER data_adduser BEFORE UPDATE ON schema.tablename
 FOR EACH ROW
 WHEN (OLD.edited_column IS DISTINCT FROM NEW.edited_column)
 EXECUTE PROCEDURE adduser();

A very simple but effective way of keeping history of changes in a table (backup table structure identical to the one being backed up, plus some other columns to save some additional info):

CREATE OR REPLACE FUNCTION backup_row()
 RETURNS trigger AS
$BODY$
 BEGIN
INSERT INTO backup_table (col1, col2, editor, col3, username, edit_data, geom) 
VALUES (OLD.col1, OLD.col2, OLD.col3, session_user, current_timestamp, OLD.geom);
 RETURN NEW;
 END;
$BODY$
 LANGUAGE plpgsql VOLATILE
 COST 100;
ALTER FUNCTION backup_row()
 OWNER TO username;


CREATE TRIGGER backup_row
 BEFORE UPDATE OR DELETE
 ON schema.tablenmae
 FOR EACH ROW
 EXECUTE PROCEDURE backup_row();

Function to add some attribute/geometry validation (see more here: http://twiav-tt.blogspot.pt/2012/07/postgis-trigger-function-retrieve.html)

CREATE OR REPLACE FUNCTION update_observation()
 RETURNS trigger AS
$BODY$
 DECLARE
 vf boolean;
 BEGIN
 -- Check the description of the observation
 IF NEW.description IS NULL THEN
 RAISE EXCEPTION 'Description cannot be empty';
 END IF;

 -- Check the geometry is completely contained within the boundaries of a polygon layer
vf := (SELECT ST_Within(NEW.geom, geom) FROM boundaries WHERE ST_Within(NEW.geom, geom));
 IF vf IS NOT TRUE THEN
 RAISE EXCEPTION 'The new geometry must be completely within the boundaries of the limits layer';
 END IF;

RETURN NEW;
 END;
$BODY$
 LANGUAGE plpgsql VOLATILE
 COST 100;
ALTER FUNCTION update_observation()
 OWNER TO username;

A more advanced example on validation and keeping history of changes: http://www.opengis.ch/2018/01/08/postgresql-back-end-solution-for-quality-assurance-and-data-archive/

Some example of useful PostgreSQL/PostGIS queries

Split lines with points, the PostGIS way

The problem:

Given a layer of lines and one of points (the latter not necessarily already over the lines one) , split the lines using the points.

Five years after a feature request on the QGIS bug tracker

https://hub.qgis.org/issues/5040

this tool is not yet available as native QGIS tool and also an (easy) alternative is missing within the other providers in the QGIS Processing toolbox.

Here following a pure SQL/PostGIS solution. Not sure is the most elegant or fast one, but seems to work as expected.

Workflow and notes:

  • As the input points are away from the lines, get the closest place on the line (likely on a segment) using the st_closestpoint function
  • Input lines are densified with the st_segmentize: this will later allow to snap the input points -now over the segments- over a node/vertex, a really close one depending on how much the lines were densified
  • Extract the nodes/vertexes of densified lines with a custom function:
CREATE OR REPLACE FUNCTION ST_AsMultiPoint(geometry) RETURNS geometry AS
'SELECT ST_Union((d).geom) FROM ST_DumpPoints($1) AS d;'
LANGUAGE sql IMMUTABLE STRICT COST 10;
  • Snap points (the ones over lines segments) over the nearest node/vertex of the densified lines using the st_snap function
  • Split the lines using snapped points thanks to the st_split function
  • The above process is necessary as points can be used to split lines only if they are exactly over a line node/vertex (st_split won’t work in the points are over a line segment)

Step1: input points on nearest place over lines segments

CREATE TABLE points_over_lines
AS SELECT a.id,ST_ClosestPoint(ST_Union(b.geom), a.geom)::geometry(POINT,3763) AS geom
FROM points a, lines b
GROUP BY a.geom,a.id;

Step2: densify lines and extract nodes as 1 unique multipoint geometry

CREATE TABLE lines_nodes_densified AS
SELECT 1 AS id, ST_Union(ST_AsMultiPoint(st_segmentize(geom,1)))::geometry(MULTIPOINT,3763) AS geom 
FROM lines;

Step3: snap points over lines nodes/vertexes

CREATE TABLE points_snapped AS
SELECT b.id, ST_snap(ST_Union(b.geom),a.geom, ST_Distance(a.geom,b.geom)*1.01)::geometry(POINT,3763) AS geom 
FROM lines_nodes_densified a, points_over_lines b
GROUP BY a.geom, b.geom, b.id;

Step4: split lines

CREATE TABLE lines_split AS
SELECT a.id, (ST_Dump(ST_split(st_segmentize(a.geom,1),ST_Union(b.geom)))).geom::geometry(LINESTRING,3763) AS geom 
FROM lines a, points_snapped b
GROUP BY a.id;

With one unique query statement:

CREATE TABLE lines_split AS
WITH 
temp_table1 AS (SELECT a.id,ST_ClosestPoint(ST_Union(b.geom), a.geom)::geometry(POINT,3763) AS geom FROM points a, lines b GROUP BY a.geom,a.id),
temp_table2 AS (SELECT 1 AS id, ST_Union(ST_AsMultiPoint(st_segmentize(geom,1)))::geometry(MULTIPOINT,3763) AS geom FROM lines),
temp_table3 AS (SELECT b.id, ST_snap(ST_Union(b.geom),a.geom, ST_Distance(a.geom,b.geom)*1.01)::geometry(POINT,3763) AS geom 
FROM temp_table2 a, temp_table1 b
GROUP BY a.geom, b.geom, b.id)
SELECT a.id, (ST_Dump(ST_split(st_segmentize(a.geom,1),ST_Union(b.geom)))).geom::geometry(LINESTRING,3763) AS geom FROM lines a, temp_table3 b
GROUP BY a.id;
Split lines with points, the PostGIS way