Malte Krupa


Fix Slow Default Index After Ogr2ogr Import - 2024-03-20

Jump to the solution

Background

The other day I created a basic OpenStreetMap related example using the phoenix framework. This example uses protomaps and maplibre-gl to draw maps in the browser without any external resources.

The example app also uses a small static GeoJSON file to draw some streets which I downloaded via overpass-turbo. This works well for small datasets.

Short video: https://youtu.be/AnjkZntIxW0

But what if I wanted to draw the streets of a complete country instead of just the city center of Frankfurt? I assume the GeoJSON file would probably be very big and create slow results.

Instead of a static file I wanted to give PostGIS a try to only return the streets in an area around a specific point.

Setup

Create a Postgres database and enable PostGIS:

CREATE DATABASE streets;
CREATE EXTENSION IF NOT EXISTS postgis;

Now we need data to import. I downloaded a slice of the world from Geofabrik.

There are many ways to import the data into the database. I choose ogr2ogr for no particular reason.

ogr2ogr -f "PostgreSQL" PG:"dbname=streets user=postgres" -sql "SELECT osm_id FROM lines WHERE highway IN ('motorway','trunk','primary','secondary','residential','living_street','tertiary')" -nln <table name> <filename>

Replace <table name> with the table name you want to use and <filename> with the name of the pbf file you just downloaded.

Once the command has finished you should have a rather boring table containing three columns.

The problem

The import with ogr2ogr2 creates an index for the geometries that looks something like this:

streets=# \d index_test
                                           Table "public.index_test"
    Column    |           Type            | Collation | Nullable |
Default
--------------+---------------------------+-----------+----------+---------------------------------------------
 ogc_fid      | integer                   |           | not null | nextval('index_test_ogc_fid_seq'::regclass)
 osm_id       | character varying         |           |          |
 wkb_geometry | geometry(LineString,4326) |           |          |
Indexes:
    "index_test_pkey" PRIMARY KEY, btree (ogc_fid)
    "index_test_wkb_geometry_geom_idx" gist (wkb_geometry)

Running queries with this index results in poor performance, because it looks like the index is just ignored.

streets_dev=# SELECT osm_id FROM index_test WHERE ST_DWithin(wkb_geometry,
'POINT(8.68417 50.11552)'::geography, 100) ORDER BY wkb_geometry <->
'POINT(8.68417 50.11552)'::geography;

  osm_id
-----------
 5069357
 4834447
 686648950
(3 rows)

Time: 417.628 ms

Adding explain (analyze true, verbose true, buffers true) in front of the query shows a rather long query plan which does not involve the index.

The solution

Luckily an answer on StackOverflow explains very well what the issue is (I replaced the column names to fit my example):

The problem is that you are mixing geometry and geography, and PostgreSQL casts
wkb_geometry to geography so that they match.

Now you have indexed wkb_geometry, but not wkb_geometry::geography, which is
something different.

Either use 'SRID=4326;POINT(-0.145872 51.509691)'::geometry as second operand or
create the GiST index on ((geom_bounding_box::geography)) (note the double
parentheses).

After dropping the old index and creating a new one, the query look much better.

streets=# DROP INDEX index_test_wkb_geometry_geom_idx;
DROP INDEX
Time: 6.262 ms

streets=# CREATE INDEX index_test_wkb_geometry_geom_idx ON index_test USING
GIST ((wkb_geometry::geography));
CREATE INDEX
Time: 1154.404 ms (00:01.154)

streets=# SELECT osm_id FROM index_test WHERE ST_DWithin(wkb_geometry,
'POINT(8.68417 50.11552)'::geography, 100) ORDER BY wkb_geometry <->
'POINT(8.68417 50.11552)'::geography;
  osm_id
-----------
 5069357
 4834447
 686648950
(3 rows)

Time: 1.509 ms

This reduced the query time from ~410ms to ~1.5ms. Yay.


Privacy Policy | Imprint