
Geospatial Data at Scale (Part 2): Why Your Spatial Queries Are So Slow

From six hours to seconds: speeding up geospatial queries


By Jared Lander and Joe Marlo



This is part two of our series on handling large geospatial data. Part 1 covered file formats and storage. Part 3 will tackle visualization. Here we focus on computation, specifically how to run spatial queries without watching your terminal for an hour.


I love taking a six-hour geospatial query and making it run in three seconds. That's not hyperbole. The difference between naive and optimized spatial computation is often several orders of magnitude.

The Buffer Benchmark

We started with a simple task: create 5km buffers around airport polygons from OpenStreetMap. About 23,000 polygons in Europe (there aren't actually that many airports; our data simply contained that many polygons). Not massive data, but enough to show the performance gap.


{sf} in R: 45 seconds for just the 1,900 polygons from two countries. The {sf} package uses S2 spherical geometry by default, which is a huge win for correctness (you can pass meters directly without projecting first), but the math is expensive. For exploratory work, fine. For production pipelines, too slow.
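With S2 enabled, the buffer call itself is one line; a minimal sketch, assuming `airports` is an sf data frame of polygons in EPSG:4326:

```r
library(sf)

# S2 spherical geometry is the default for geographic CRSs,
# so dist is interpreted in meters, no projection needed
buffered <- st_buffer(airports, dist = 5000)
```

The convenience is real, but every buffer is computed on the sphere, which is where the 45 seconds goes.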


PostGIS: About 8 seconds. You have to convert geometry to geography for spherical math, call ST_Buffer, then convert back. More SQL to write, but nearly 6x faster.
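The geometry/geography round trip in PostGIS looks roughly like this (the `airports` table and `geom` column names are ours, not from the original benchmark):

```sql
-- Cast to geography so ST_Buffer takes meters on the spheroid,
-- then cast back to geometry for downstream geometry functions
SELECT ST_Buffer(geom::geography, 5000)::geometry AS buffered_geom
FROM airports;
```

The double cast is the whole trick: `ST_Buffer` on geography does spherical math in meters, while the same call on geometry would use raw coordinate units.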


DuckDB: About 7 seconds, and that includes pulling the data from Postgres. DuckDB's spatial extension has gotten remarkably capable, and for analytical workloads it often beats PostGIS without the hosting overhead.

The Projection Gotcha

Here's where DuckDB bit us. Unlike PostGIS and {sf}, DuckDB expects buffer distances in projection units. If you're in EPSG:4326 (lat/long), it wants degrees. Nobody wants to buffer in degrees.


The obvious fix: transform to a meters-based projection, buffer, transform back. So we tried Web Mercator (EPSG:3857), which claims to be in meters globally. When we plotted the results, the DuckDB buffers were visibly smaller than the SF and PostGIS buffers. Something was wrong.


The problem is that 3857 distorts distances. It's fine for web maps where you need the whole world on one tile pyramid, but its scale factor grows with latitude (roughly 1/cos(latitude)), so away from the equator each projected "meter" covers less real ground. Buffer 5,000 projected units at European latitudes and you get a buffer noticeably smaller than 5 km on the ground, which is exactly what our plot showed.


The solution is Universal Transverse Mercator (UTM). UTM divides the world into 60 zones, each 6° of longitude wide, with separate Northern and Southern hemisphere designations. Each zone has minimal distortion within its bounds.
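Because the zones are a regular 6° grid, picking the right UTM EPSG code for a point is simple arithmetic. A minimal sketch (the function name is ours, and it ignores the Norway/Svalbard exception zones):

```python
import math

def utm_epsg(lon: float, lat: float) -> int:
    """Return the EPSG code of the standard UTM zone containing (lon, lat).

    Northern-hemisphere zones are EPSG 326xx, southern are 327xx.
    """
    zone = int(math.floor((lon + 180) / 6)) % 60 + 1
    return (32600 if lat >= 0 else 32700) + zone

# Heathrow (~0.45°W, 51.47°N) falls in zone 30 north
print(utm_epsg(-0.45, 51.47))  # 32630
```

In practice a spatial join against a UTM-zone polygon layer does the same job and also handles the handful of irregular zones correctly.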



The approach: spatially join each polygon to its UTM zone, project into that zone's CRS, compute the buffer in meters, project back to 4326. In DuckDB, this whole dance takes about 7 seconds. We replicated the same approach in R using the GEOS library and got about 12 seconds, which is competitive enough that you could parallelize it if needed.
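In DuckDB SQL the dance looks roughly like this, assuming a `utm_zones` table whose `epsg_code` column holds each zone's CRS as text (table and column names are ours):

```sql
INSTALL spatial;
LOAD spatial;

SELECT ST_Transform(
         ST_Buffer(
           ST_Transform(a.geom, 'EPSG:4326', z.epsg_code),
           5000),                          -- meters, valid inside the zone
         z.epsg_code, 'EPSG:4326') AS buffered_geom
FROM airports AS a
JOIN utm_zones AS z
  ON ST_Intersects(a.geom, z.geom);       -- pick each polygon's zone
```

Polygons that straddle a zone boundary will match twice; deduplicate or assign by centroid if that matters for your data.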


The lesson: if you're doing precise distance calculations in DuckDB, you need to handle projections yourself. The spherical geometry you get for free in {sf} and PostGIS requires manual work in DuckDB.

Spatial Joins at Speed

Once we had correct buffers, we wanted to find which points fell inside them. This is the bread and butter of spatial analytics: point-in-polygon queries. We had 300,000 points and 3,000 airport buffer polygons (cleaned down from the roughly 23,000 polygons from before).


In DuckDB, with proper spatial indexes on local tables (not views from Postgres), this took less than one second, compared to about two minutes in Postgres.
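The key is materializing the data as local DuckDB tables so the spatial extension can build an R-tree index; a sketch with hypothetical table names:

```sql
-- Pull from Postgres into local tables (a view would skip the index)
CREATE TABLE buffers AS SELECT * FROM pg.airport_buffers;
CREATE TABLE points  AS SELECT * FROM pg.observations;

CREATE INDEX buffers_rtree ON buffers USING RTREE (geom);

-- Point-in-polygon join, accelerated by the index
SELECT b.airport_id, count(*) AS n_points
FROM points AS p
JOIN buffers AS b ON ST_Contains(b.geom, p.geom)
GROUP BY b.airport_id;
```

Querying the Postgres views directly forces row-by-row transfer and no index, which is where the two minutes went.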

H3 for Aggregation

When you need to aggregate points to regions repeatedly, consider Uber's H3 hexagon system. H3 divides the world into deterministic hexagons at 16 resolution levels (0-15). Because the hexagons are pre-defined, mapping a lat/long to its containing hexagon is a fast deterministic computation rather than a spatial join. We've seen 100x speedups for aggregation tasks by converting to H3 first.


In R, the {h3jsr} package handles the conversion. You give it a geometry column, it returns H3 indices. You can then aggregate by index and convert back to polygons for visualization. The newer {h3o} package, backed by Rust, is a competitive alternative.
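A minimal {h3jsr} sketch, assuming `pts` is a hypothetical sf point layer in EPSG:4326 and resolution 8 suits your analysis scale:

```r
library(h3jsr)
library(sf)

# lat/long -> H3 cell index at resolution 8 (hexagons of ~0.7 km^2)
pts$h3 <- point_to_cell(pts, res = 8)

# aggregate by cell, then convert indices back to polygons for mapping
counts <- aggregate(list(n = rep(1, nrow(pts))),
                    by = list(h3 = st_drop_geometry(pts)$h3),
                    FUN = sum)
counts$geometry <- cell_to_polygon(counts$h3)
```

Once points carry an H3 index, repeated aggregations become plain group-bys with no geometry math at all.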

Where This Matters

Finance: A commercial real estate firm tagging properties with census tract demographics and flood zone classifications. Their overnight batch job in R became an interactive query in DuckDB.


Public Sector: A city planning department analyzing 311 complaints by council district, with buffers around transit stops to identify complaints within walking distance. PostGIS handles the buffer computation; DuckDB handles the analytical joins.


The stack that's working for us: store everything in PostGIS, run spatial queries in DuckDB (with careful attention to projections), extract results into Arrow. Part 3 will cover what happens next: getting millions of points onto an interactive map.



Jared P. Lander

Founder and Chief Data Scientist

Lander Analytics

Joe Marlo

Director of Data Science

Lander Analytics

Subscribe to our Substack, and to our monthly emails below, for practical AI strategies for your organization: what to build, what to avoid, and how to make systems reliable in the real world.


Work with us: If you want help identifying the right first workflow, building a permissioned knowledge base, or training your team to ship responsibly, reach out at info@landeranalytics.com.


About the authors: Jared P. Lander is Chief Data Scientist and founder of Lander Analytics, where he helps organizations build practical, measurable AI workflows grounded in strong data foundations. Joe Marlo is Director of Data Science at Lander Analytics, where he designs agentic workflows, statistical models, and interactive frontends that put rigorous analysis into production.


Get our latest blog posts—delivered monthly!


© 2026 Lander Analytics
