Preface

Department of Geography & Sustainability, University of Tennessee, Knoxville

Abstract

This book is a guide to geospatial programming with DuckDB. It is designed for beginners and intermediate users who want to learn how to use DuckDB for geospatial analysis and visualization.

Introduction

In an increasingly data-driven world, the ability to effectively manage and analyze spatial information has become essential. From urban planning and environmental monitoring to logistics and personalized location-based services, geospatial data forms the backbone of numerous applications that influence our daily lives. However, working with spatial data has often been viewed as a specialized and complex domain, requiring intricate tools and a steep learning curve.

Enter DuckDB.

DuckDB is an analytical database designed for efficiency and ease of use. As an in-process OLAP (Online Analytical Processing) database, it runs directly within your application, eliminating the need for separate server deployments and cumbersome configurations. This embedded nature, coupled with its column-oriented architecture and vectorized execution engine, makes DuckDB exceptionally fast for analytical queries, even with large datasets. DuckDB first gained popularity for general-purpose data processing, and its rapidly expanding ecosystem (especially its extension for spatial data) now makes it a compelling engine for geospatial work as well.

This book, “Spatial Data Management with DuckDB: From SQL Basics to Advanced Geospatial Analytics,” aims to demystify geospatial data and showcase how DuckDB empowers everyone (from data analysts and scientists to developers and GIS professionals) to leverage its capabilities with unprecedented simplicity and speed. We believe that robust spatial analytics should be accessible, not confined to costly specialized software or complex programming languages. With DuckDB, that accessibility becomes a reality.

Our journey begins with the fundamental concepts of spatial data and its representation, establishing a solid foundation in SQL for working with points, lines, and polygons. As we advance, you will discover how DuckDB’s spatial extension, whose functions follow PostGIS-style naming, enables sophisticated operations like spatial joins, buffering, and nearest neighbor searches through concise SQL queries. We will explore various real-world applications, demonstrating how to load, transform, analyze, and visualize spatial datasets, empowering you to extract meaningful insights from geographic information.

Whether you’re looking to integrate spatial analysis into your data pipelines, perform quick ad-hoc geospatial queries, or develop interactive location-aware applications, this book will serve as your comprehensive guide. We will cover topics ranging from setting up your DuckDB environment and importing diverse spatial file formats (like Shapefiles, GeoJSON, and GeoParquet) to executing complex analytical tasks and integrating with visualization tools.

Our aim is not just to teach you syntax but to cultivate an understanding of why these tools and techniques are powerful. By the end of this book, you will be proficient in using DuckDB as your go-to engine for spatial data management and analysis, unlocking new possibilities for your projects and empowering you to make informed, spatially-aware decisions.

Join us as we delve into the exciting intersection of DuckDB’s analytical capabilities and the rich world of geospatial data. The future of accessible spatial analytics is here, and it runs on DuckDB.

Who This Book Is For

This book is designed for anyone grappling with the complexities of modern spatial data analysis. If you’ve ever spent hours waiting for a spatial join to finish, struggled to load large geographic datasets into memory, or wished for a more straightforward way to combine SQL’s power with spatial operations, this book is for you.

You’ll Find the Most Value If You Are

A GIS Professional frustrated by the limitations of desktop software when handling large datasets. You’re familiar with QGIS or ArcGIS, but you need to analyze millions of features, process extensive GPS tracks, or integrate spatial analysis into automated workflows.

A Data Scientist or Analyst who frequently encounters location data. You’re comfortable with Python and pandas, but spatial data often feels like a mystery. You want to incorporate geographic dimensions into your analyses without diving into complex GIS software.

A Software Developer building applications that incorporate spatial features. You need fast spatial queries, wish to avoid heavy database infrastructure, and prefer working with familiar SQL over specialized spatial libraries.

A Researcher or Academic in fields like geography, environmental science, or urban planning. Your research involves large spatial datasets, and you require reproducible, scalable analysis methods that can adapt to growing data volumes.

A Business Intelligence Professional dealing with location-based business data. Whether it’s store locations, delivery routes, customer territories, or real estate portfolios, you need to merge business metrics with spatial insights.

Essential Prerequisites

You should be comfortable with:

Helpful Background (But Not Required)

If You’re New to Python Programming

If you’re new to geospatial Python programming, the following book provides an excellent introduction to both foundational GIS concepts and Python programming:

Wu, Q. (2025). Introduction to GIS Programming: A Practical Python Guide to Open Source Geospatial Tools. Independently published. ISBN 979-8286979455. https://www.amazon.com/dp/B0FFW34LL3

What This Book Covers

This book offers a structured journey from SQL basics to advanced geospatial analytics, equipping you with practical skills through real-world examples. Each chapter progresses from simple queries to complex spatial analyses, building your expertise in modern geospatial data management.

Part I: DuckDB Foundations (Chapters 1-3)

Master the essential concepts that underpin all subsequent content:

By the end of Part I, you’ll confidently query spatial datasets and integrate DuckDB into any Python-based analysis pipeline.

Part II: Spatial Data Operations (Chapters 4-10)

Dive into the core spatial toolkit, covering everything from data loading to advanced analytics:

By the end of Part II, you’ll be adept at managing any spatial data format, executing complex operations, and creating professional visualizations.

Part III: Real-World Geospatial Analytics (Chapters 11-14)

Explore four comprehensive case studies using large-scale, real datasets:

By the end of Part III, you’ll have portfolio-worthy projects showcasing your advanced spatial analysis capabilities.

Cross-Cutting Themes Throughout

What Makes This Book Different

Unlike theoretical discussions or tool-specific tutorials, this book emphasizes solving real problems. Each technique is rooted in an actual analytical challenge, demonstrated with real datasets, and accompanied by a clear explanation of when and why to use it.

Getting the Most Out of This Book

To maximize your learning experience with this book, consider the following recommendations:

Set Up a Proper Development Environment: Install Python and the required libraries as described in Chapter 1. A well-configured environment will save you time and frustration throughout your learning journey. Consider using conda or uv to manage your Python packages, as this simplifies the installation of geospatial libraries.

Follow Along with Code Examples: This book is designed to be interactive. Don’t just read the code; type it out, run it, and experiment with modifications. Understanding comes through practice, and each example builds skills you’ll need later.

Work Through the Exercises: Each chapter includes exercises designed to reinforce the concepts you’ve learned. These are not optional extras; they are an integral part of the learning process. Start with the guided exercises, then challenge yourself with your own projects.

Use Real Data: While the book provides datasets for examples and exercises, try applying the techniques to data from your own field or interests. This will help you understand how the concepts apply to real-world scenarios and build confidence in your abilities.

Build Projects: As you progress through the book, consider working on a personal project that interests you. This could be analyzing data from your research, creating maps for your community, or solving a problem you’ve encountered in your work.

Be Patient with Yourself: Programming can be frustrating, especially when you’re learning. Expect to encounter errors, spend time debugging, and occasionally feel stuck. This is normal and part of the learning process. Take breaks when needed, and remember that expertise develops gradually through consistent practice. If you get stuck, don’t hesitate to ask for help on the book’s GitHub repository.

Keep Practicing: The skills in this book require regular practice to maintain and develop. Set aside time regularly to work on geospatial programming projects, even if they’re small ones.

Conventions Used in This Book

This book uses several conventions to help you navigate the content and understand the code examples:

Code Formatting: All Python and SQL code appears in monospaced font within code blocks. When code appears within regular text, it is formatted like this. File and directory names are also formatted in monospaced font.

Code Examples: Most code examples are complete and runnable. They include comments explaining the key concepts and techniques being demonstrated. Line numbers may be included for reference in the accompanying text.

# This is an example of a code block
import leafmap
m = leafmap.Map()
m.add_basemap("OpenTopoMap") # add a basemap to the map
m

SQL Style Guide: For consistency and readability, SQL examples follow these patterns:

SELECT name, ST_Area(geometry) AS area
FROM neighborhoods
WHERE borough = 'Manhattan'
ORDER BY area DESC;

Command Line Instructions: Commands to be entered at the command line or terminal are shown with a $ prompt (don’t type the $ symbol itself):

$ pip install leafmap
$ python script.py

Downloading the Code Examples

All code examples, datasets, and supplementary materials for this book are freely available on GitHub:

https://github.com/giswqs/duckdb-spatial

To download the materials, you can use one of the following methods:

The repository is regularly updated with corrections, improvements, and additional examples. Check back periodically for updates, or watch the repository on GitHub to be notified of changes.

If you find errors in the code or have suggestions for improvements, please open an issue or submit a pull request on GitHub. Community contributions help make this resource better for everyone.

Video Tutorials and Supplementary Resources

Complementing the written content, this book is supported by a comprehensive series of video tutorials that walk through key concepts and provide additional examples:

https://tinyurl.com/duckdb-spatial-videos

The videos are designed to complement, not replace, the written material. They’re particularly helpful for:

The playlist is organized to follow the book’s structure. You can watch them in order as you progress through the book, or jump to specific topics as needed.

The videos were created in Fall 2023 when I was teaching the Spatial Data Management [1] course at the University of Tennessee. Although the course has concluded, the videos remain relevant and can be used as references for the book. Additional videos will be added in the future.

Community and Feedback

I welcome feedback, questions, and suggestions from readers. Your input helps improve the book and makes it more useful for the geospatial programming community.

For book-related questions and discussions:

Types of feedback that are particularly helpful:

About the Author

Dr. Qiusheng Wu is an Associate Professor in the Department of Geography & Sustainability at the University of Tennessee, Knoxville. He is also an Amazon Scholar. Dr. Wu’s research focuses on advancing open-source geospatial analytics through cloud computing and GeoAI. He is the creator and maintainer of several widely used open-source Python packages, including Geemap [2], Leafmap [3], SAMGeo [4], and GeoAI [5], which integrate cloud-based geospatial platforms with AI-powered analysis and visualization. Dr. Wu’s work bridges remote sensing, Earth observation, and artificial intelligence to make large-scale geospatial data more accessible, reproducible, and intelligent for researchers, educators, and practitioners worldwide. His open-source projects can be found on GitHub at https://github.com/opengeos.

This book embraces the principles of open science and open education. To support transparency, learning, and reuse, the code examples in this book are released under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means you are free to copy, modify, and distribute the code, even for commercial purposes, as long as appropriate credit is given.

Please attribute code usage by citing the book or linking to the GitHub repository:

Wu, Q. (2025). Spatial Data Management with DuckDB: From SQL Basics to Advanced Geospatial Analytics. Independently published. PDF edition ISBN 979-8993859705; Print edition ISBN 979-8274710572. https://duckdb.gishub.org

While the code is freely available, the text, figures, and images in this book are copyrighted by the author and may not be reproduced, redistributed, or modified without explicit permission. This includes all written content, custom diagrams, and embedded visualizations unless otherwise noted.

If you wish to reuse or adapt any non-code material from the book (for example, for teaching, presentations, or publications), please contact the author to request permission.

This dual licensing approach helps balance open access to learning materials with the protection of original creative work. Thank you for respecting these terms and supporting the open-source geospatial community.

Acknowledgments

Thank you to my family and friends for their support and encouragement.

Footnotes