Geospatial Data at Scale (Part 3): Visualizing Millions of Points
How modern geo stacks handle massive datasets
By Jared Lander and Joe Marlo

This is the final post in our series on handling large geospatial data. Part 1 covered file formats and storage. Part 2 covered computation. Now we tackle visualization: how to render hundreds of thousands (or millions) of points on an interactive map without crashing your browser.
We've long dreamt of streaming to the map only the points that fall within the viewport. Pan left, and only the points in view load. Pan right, and new points stream in. That dream has become reality, but getting there requires knowing where each technology breaks.
The Scaling Ladder
Leaflet: Regular Leaflet hits its limits at around 4,000 points. Every point is written into the HTML document as an SVG element, and each one has to be rendered on screen. By any modern measure, 4,000 points isn't big. For a quick exploratory map, that's fine; for anything beyond it, you need something else.
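To make the ceiling concrete, here's a minimal Leaflet map in R using the built-in quakes dataset (about 1,000 rows, well within range):

```r
library(leaflet)

# Each marker becomes its own SVG node in the DOM, which is what
# caps regular Leaflet at roughly 4,000 points
leaflet(data = quakes) |>
  addTiles() |>
  addCircleMarkers(lng = ~long, lat = ~lat, radius = 3, stroke = FALSE)
```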
LeafGL: LeafGL swaps SVG for WebGL, which takes you from 4,000 points to over 400,000, a 100x improvement. One caveat: LeafGL can only handle simple polygons, not multipolygons, so you need to cast first (which breaks each multipolygon into multiple polygon rows). But even at 400,000 points, all the data still has to be serialized into JSON and loaded into the browser. That serialization is ultimately what limits LeafGL, as we learned the hard way.
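A minimal sketch of the upgrade, with randomly generated points standing in for real data (the polygon cast is shown commented out, since it assumes a hypothetical multipolygon object):

```r
library(sf)
library(leaflet)
library(leafgl)

# 400,000 random points over the New York area; WebGL keeps this
# interactive where SVG rendering would have choked long before
n <- 4e5
pts <- st_as_sf(
  data.frame(lon = runif(n, -74.3, -73.7), lat = runif(n, 40.5, 40.9)),
  coords = c("lon", "lat"), crs = 4326
)

leaflet() |>
  addTiles() |>
  addGlPoints(data = pts, fillColor = "navy")

# LeafGL draws simple polygons only, so cast multipolygons first;
# each MULTIPOLYGON becomes several POLYGON rows:
# polys <- st_cast(multi_polys_sf, "POLYGON")
# leaflet() |> addTiles() |> addGlPolygons(data = polys)
```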
deck.gl + GeoArrow: The current answer for truly large data. We've stress-tested this combination up to about four million interactive points. That's four million points on a zoomable, pannable map. The key is that deck.gl can access binary data in Arrow files directly, which is incredibly fast. Combined with Arrow streaming (a billion rows never all sit in memory; the data streams through in batches), you get performance that would have seemed impossible a few years ago.
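The browser side of this is JavaScript, but the streaming idea is easy to see from R with the arrow package. A minimal sketch, assuming a hypothetical Parquet dataset at points_parquet/: record batches are pulled through one at a time, so the full table never has to fit in memory.

```r
library(arrow)

# open_dataset() only scans metadata; nothing is read into memory yet
ds <- open_dataset("points_parquet/")

# Pull record batches through one at a time. The full table never
# materializes, which is the same property deck.gl exploits when it
# reads GeoArrow binary data directly in the browser.
reader <- as_record_batch_reader(ds)
total <- 0
while (!is.null(batch <- reader$read_next_batch())) {
  total <- total + batch$num_rows
}
total  # row count computed without ever holding all rows at once
```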
The Full Stack
Our most modern geo stack, with a code sketch after the list:
Store data in PostGIS with TimescaleDB for time-partitioned queries
Compute spatial queries in DuckDB (with careful projection handling) against the data in PostGIS
Extract results into Arrow for streaming to the browser
Visualize with deck.gl for interactive rendering of millions of points
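Here's a compressed sketch of that pipeline from R. Connection details, table names, and columns are all illustrative; the shape is what matters.

```r
library(DBI)
library(duckdb)

con <- dbConnect(duckdb())
dbExecute(con, "INSTALL postgres;")
dbExecute(con, "LOAD postgres;")

# Attach the PostGIS database (connection string is illustrative)
dbExecute(con, "ATTACH 'dbname=geo host=db.internal user=analyst' AS pg (TYPE postgres)")

# Push the spatial work down to PostGIS via postgres_query(), then let
# DuckDB take over; the time filter leans on the TimescaleDB
# partitioning mentioned above
pts <- dbGetQuery(con, "
  SELECT * FROM postgres_query('pg', '
    SELECT id, observed_at, ST_X(geom) AS lon, ST_Y(geom) AS lat
    FROM sites
    WHERE observed_at >= ''2024-01-01''
  ')
")

# Hand the result to Arrow for the deck.gl step in the browser
arrow::write_parquet(pts, "sites.parquet")

dbDisconnect(con, shutdown = TRUE)
```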
Production Deployment with Shiny
Sometimes your users don't want pre-baked content—they want to filter, aggregate, and explore data dynamically. For that, we've landed on R Shiny with mapgl and a local DuckDB cache.
The architecture is straightforward. PostGIS hosts the full dataset. When a user starts a Shiny session, DuckDB pulls their relevant subset—maybe a few million rows for a specific region or time period—into a local cache. From that point on, every UI interaction (filters, date ranges, aggregation toggles) hits the local DuckDB, not the remote database.
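In code, the session-start pull is just a CREATE TABLE AS over the attached Postgres database. A hedged sketch, with all names illustrative:

```r
library(DBI)
library(duckdb)

# One local cache file per session
cache <- dbConnect(duckdb(dbdir = "session_cache.duckdb"))
dbExecute(cache, "INSTALL postgres;")
dbExecute(cache, "LOAD postgres;")
dbExecute(cache, "ATTACH 'dbname=geo host=db.internal' AS pg (TYPE postgres)")

# Pull this user's subset out of PostGIS once, at session start
dbExecute(cache, "
  CREATE TABLE locations AS
  SELECT * FROM pg.public.locations
  WHERE region = 'northeast'
    AND observed_at BETWEEN DATE '2024-01-01' AND DATE '2024-06-30'
")

# Every subsequent UI interaction hits the local cache, not PostGIS
daily <- dbGetQuery(cache, "
  SELECT date_trunc('day', observed_at) AS day, count(*) AS n
  FROM locations
  GROUP BY 1
  ORDER BY 1
")
```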
DuckDB queries PostGIS directly via the Postgres extension for the initial pull. That can take a few seconds depending on the subset size, so we use async background loading (via mirai) and loading indicators (skeleton screens, progress bars) to keep the experience smooth while DuckDB hydrates the cache. Once cached, response times drop to sub-second. mapgl handles the rendering with WebGL, so you get deck.gl-level performance without writing JavaScript. No complex Tippecanoe setup. No CORS configuration. Just server-side R code, sketched below.
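A skeletal version of that Shiny pattern, using Shiny's ExtendedTask with mirai so the pull doesn't block the session, and mapgl for the WebGL rendering. Everything here (file names, table, columns) is illustrative; this is the shape of the app, not our production code.

```r
library(shiny)
library(mirai)
library(mapgl)

daemons(2)  # background R processes for the async pulls

ui <- fluidPage(
  maplibreOutput("map", height = "600px")
)

server <- function(input, output, session) {
  # Hydrate the local DuckDB cache off the main process; the UI
  # stays responsive (and can show a spinner) while this runs
  hydrate <- ExtendedTask$new(function() {
    mirai({
      con <- DBI::dbConnect(duckdb::duckdb(dbdir = "session_cache.duckdb"))
      on.exit(DBI::dbDisconnect(con, shutdown = TRUE))
      DBI::dbGetQuery(con, "SELECT lon, lat FROM locations")
    })
  })
  hydrate$invoke()

  output$map <- renderMaplibre({
    pts <- hydrate$result()  # re-runs automatically once the pull completes
    pts_sf <- sf::st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)
    maplibre() |>
      add_circle_layer(id = "pts", source = pts_sf, circle_radius = 2)
  })
}

shinyApp(ui, server)
```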
We've deployed this for a public sector client with billions of rows in PostGIS. Individual analysts work with millions of rows in their sessions, filtering and visualizing interactively. The DuckDB cache is what makes it fly.
From Our Projects
Public Sector - Federal: A government agency with billions of location records needed its analysts to explore them interactively. We initially built the maps with PMTiles, but the team needed dynamic filtering and aggregation, not just pre-baked tile views. We migrated to Shiny with mapgl and a local DuckDB cache, giving each analyst a responsive session handling millions of rows with all the filtering happening server-side.
Energy: An energy company mapping potential site locations alongside analytical data. LeafGL handled their dataset comfortably—the right tool when you have hundreds of thousands of points and need interactive exploration without the complexity of deck.gl or a tile server.
Public Sector - State: We worked on two projects for a state agency handling hundreds of millions of GPS points from tagged wildlife. In one, we cached relevant subsets locally so analysts could work interactively without round-tripping to the full database. In the other, we focused on rendering the sheer volume of points on a map, the kind of scale where you're choosing between PMTiles and deck.gl depending on whether the data is pre-aggregated or needs to stay interactive. In this particular case, PMTiles was the answer.
What We Tell Clients
deck.gl gives you the best scaling, but the JavaScript integration is not trivial to write. We've invested in making the geoarrow-deck.gl library play smoothly with R workflows, including filing issues and pull requests that led to upstream fixes. If your team is comfortable with JavaScript, this is the path to 4 million points.
If your team is stronger in R, the Shiny + mapgl pattern sidesteps JavaScript entirely while still giving you WebGL rendering. You trade some flexibility for a stack your team can maintain without a front-end developer.
The scaling ladder is clear: Leaflet up to about 4,000 points, LeafGL up to about 400,000, and deck.gl to about 4 million.
The stack has matured: GeoParquet for storage, PostGIS for persistent queries, DuckDB for computation and session caching, mapgl or deck.gl for visualization, Shiny to tie it together. Large-scale spatial analytics is genuinely accessible now, all with open source software.
Jared P. Lander
Founder and Chief Data Scientist
Lander Analytics
Joe Marlo
Director of Data Science
Lander Analytics
Subscribe to our Substack and, below, to our monthly emails for practical AI strategies for your organization: what to build, what to avoid, and how to make systems reliable in the real world.
Work with us: If you want help identifying the right first workflow, building a permissioned knowledge base, or training your team to ship responsibly, reach out at info@landeranalytics.com.
About the authors: Jared P. Lander is Chief Data Scientist and founder of Lander Analytics, where he helps organizations build practical, measurable AI workflows grounded in strong data foundations. Joe Marlo is Director of Data Science at Lander Analytics, where he designs agentic workflows, statistical models, and interactive frontends that put rigorous analysis into production.

