Preface#

Introduction#

In an increasingly data-driven world, the ability to effectively manage and analyze spatial information has become essential. From urban planning and environmental monitoring to logistics and personalized location-based services, geospatial data forms the backbone of numerous applications that influence our daily lives. However, working with spatial data has often been viewed as a specialized and complex domain, requiring intricate tools and a steep learning curve.

Enter DuckDB.

DuckDB is an innovative analytical database designed for efficiency and user-friendliness. As an in-process OLAP (Online Analytical Processing) database, it operates directly within your application, eliminating the need for separate server deployments and cumbersome configurations. This embedded nature, coupled with its column-oriented architecture and vectorized execution engine, makes DuckDB exceptionally fast for analytical queries, even with large datasets. Initially gaining popularity for its general-purpose data processing capabilities, DuckDB’s rapidly expanding ecosystem (especially its extensions for spatial data) represents a transformative shift.

This book, “Spatial Data Management with DuckDB: From SQL Basics to Advanced Geospatial Analytics,” aims to demystify geospatial data and showcase how DuckDB empowers everyone (from data analysts and scientists to developers and GIS professionals) to leverage its capabilities with unprecedented simplicity and speed. We believe that robust spatial analytics should be accessible, not confined to costly specialized software or complex programming languages. With DuckDB, that accessibility becomes a reality.

Our journey begins with the fundamental concepts of spatial data and its representation, establishing a solid foundation in SQL for working with points, lines, and polygons. As we advance, you will discover how DuckDB’s native geospatial features (enhanced by its PostGIS-compatible extension) enable sophisticated operations like spatial joins, buffering, and nearest neighbor searches through elegant SQL queries. We will explore various real-world applications, demonstrating how to load, transform, analyze, and visualize spatial datasets, empowering you to extract meaningful insights from geographic information.

Whether you’re looking to integrate spatial analysis into your data pipelines, perform quick ad-hoc geospatial queries, or develop interactive location-aware applications, this book will serve as your comprehensive guide. We will cover topics ranging from setting up your DuckDB environment and importing diverse spatial file formats (like Shapefiles, GeoJSON, and GeoParquet) to executing complex analytical tasks and integrating with visualization tools.

Our aim is not just to teach you syntax but to cultivate an understanding of why these tools and techniques are powerful. By the end of this book, you will be proficient in using DuckDB as your go-to engine for spatial data management and analysis, unlocking new possibilities for your projects and empowering you to make informed, spatially-aware decisions.

Join us as we delve into the exciting intersection of DuckDB’s analytical capabilities and the rich world of geospatial data. The future of accessible spatial analytics is here, and it runs on DuckDB.

Who This Book Is For#

This book is designed for anyone grappling with the complexities of modern spatial data analysis. If you’ve ever spent hours waiting for a spatial join to finish, struggled to load large geographic datasets into memory, or wished for a more straightforward way to combine SQL’s power with spatial operations, this book is for you.

You’ll Find the Most Value If You Are#

A GIS Professional frustrated by the limitations of desktop software when handling large datasets. You’re familiar with QGIS or ArcGIS, but you need to analyze millions of features, process extensive GPS tracks, or integrate spatial analysis into automated workflows.

A Data Scientist or Analyst who frequently encounters location data. You’re comfortable with Python and pandas, but spatial data often feels like a mystery. You want to incorporate geographic dimensions into your analyses without diving into complex GIS software.

A Software Developer building applications that incorporate spatial features. You need fast spatial queries, wish to avoid heavy database infrastructure, and prefer working with familiar SQL over specialized spatial libraries.

A Researcher or Academic in fields like geography, environmental science, or urban planning. Your research involves large spatial datasets, and you require reproducible, scalable analysis methods that can adapt to growing data volumes.

A Business Intelligence Professional dealing with location-based business data. Whether it’s store locations, delivery routes, customer territories, or real estate portfolios, you need to merge business metrics with spatial insights.

Essential Prerequisites#

You should be comfortable with:

  • Python programming: understanding variables, functions, and how to import libraries (expertise is not required)

  • Data analysis concepts: filtering records, aggregating data, and joining tables

  • SQL fundamentals: familiarity with SELECT, WHERE, and GROUP BY clauses (we’ll cover the spatial aspects)

  • Basics of spatial data: understanding that data has a location (latitude/longitude, projections)

Helpful Background (But Not Required)#

  • Experience with pandas, GeoPandas, or Jupyter notebooks

  • Prior exposure to databases or data warehouses

  • Familiarity with GIS software (QGIS, ArcGIS, PostGIS)

  • Knowledge of spatial file formats (GeoJSON, Shapefiles, Parquet)

If You’re New to Python Programming#

If you’re new to geospatial Python programming, the following book provides an excellent introduction to both foundational GIS concepts and Python programming:

Wu, Q. (2025). Introduction to GIS Programming: A Practical Python Guide to Open Source Geospatial Tools. Independently published. ISBN 979-8286979455. https://www.amazon.com/dp/B0FFW34LL3

What This Book Covers#

This book offers a structured journey from SQL basics to advanced geospatial analytics, equipping you with practical skills through real-world examples. Each chapter progresses from simple queries to complex spatial analyses, building your expertise in modern geospatial data management.

Part I: DuckDB Foundations (Chapters 1-3)#

Master the essential concepts that underpin all subsequent content:

  • Chapter 1: Getting Started with DuckDB: Installation, initial queries, and insights into how DuckDB revolutionizes spatial analytics.

  • Chapter 2: Essential SQL for Spatial Analysis: Key SQL patterns for filtering, aggregating, joining, and optimizing queries tailored for spatial data.

  • Chapter 3: DuckDB Python Integration: Combine SQL’s power with pandas’ flexibility to craft a seamless spatial analysis workflow.

By the end of Part I, you’ll confidently query spatial datasets and integrate DuckDB into any Python-based analysis pipeline.

Part II: Spatial Data Operations (Chapters 4-10)#

Dive into the core spatial toolkit, covering everything from data loading to advanced analytics:

  • Chapter 4: Loading Spatial Data Formats: Import various formats including CSV coordinates, GeoJSON from APIs, massive Shapefiles, and cloud-hosted GeoParquet.

  • Chapter 5: Exporting and Converting Spatial Data: Transform your results into any format your stakeholders require.

  • Chapter 6: Geometry Operations and Functions: Create, measure, and transform spatial features utilizing SQL functions.

  • Chapter 7: Spatial Queries and Relationships: Master spatial equivalents of table joins: containment, intersection, and proximity.

  • Chapter 8: Advanced Spatial Joins: Combine datasets by location rather than IDs, which is the essence of spatial analysis.

  • Chapter 9: Interactive Data Visualization: Generate compelling maps and charts that effectively communicate your data’s spatial narratives.

  • Chapter 10: Working with Vector Tiles and PMTiles: Deploy interactive maps capable of handling millions of features smoothly.

By the end of Part II, you’ll be adept at managing any spatial data format, executing complex operations, and creating professional visualizations.

Part III: Real-World Geospatial Analytics (Chapters 11-14)#

Explore four comprehensive case studies using large-scale, real datasets:

  • Chapter 11: Analyzing the US National Wetlands Inventory: Conduct environmental analyses across all 50 states, processing millions of wetland polygons.

  • Chapter 12: Analyzing Global Building Footprints: Analyze urban data using global building footprints from Overture Maps.

  • Chapter 13: Analyzing NYC Taxi Trip Data: Uncover spatiotemporal patterns from hundreds of millions of taxi trips, revealing insights into urban mobility.

  • Chapter 14: Developing Interactive Dashboards with Streamlit and Solara: Build and deploy web applications that make your analyses accessible to stakeholders.

By the end of Part III, you’ll have portfolio-worthy projects showcasing your advanced spatial analysis capabilities.

Cross-Cutting Themes Throughout#

  • Performance at Scale: Techniques that are effective whether handling thousands or millions of spatial features.

  • Cloud-Native Workflows: Process data directly from S3 and integrate seamlessly with modern data stacks.

  • Reproducible Analysis: Shareable code and methods that can be version-controlled and deployed in production.

  • Real-World Data Challenges: Tackle projection issues, missing values, and data quality concerns.

  • Integration Patterns: Combine DuckDB with the broader Python geospatial ecosystem for enhanced functionality.

What Makes This Book Different#

Unlike theoretical discussions or tool-specific tutorials, this book emphasizes solving real problems. Each technique is rooted in actual analytical challenges, demonstrated with real datasets, and explained in clear terms of when and why to use it.

Getting the Most Out of This Book#

To maximize your learning experience with this book, consider the following recommendations:

Set Up a Proper Development Environment: Install Python and the required libraries as described in Chapter 1. A well-configured environment will save you time and frustration throughout your learning journey. Consider using conda or uv to manage your Python packages, as this simplifies the installation of geospatial libraries.

Follow Along with Code Examples: This book is designed to be interactive. Don’t just read the code; type it out, run it, and experiment with modifications. Understanding comes through practice, and each example builds skills you’ll need later.

Work Through the Exercises: Each chapter includes exercises designed to reinforce the concepts you’ve learned. These are not optional extras; they are an integral part of the learning process. Start with the guided exercises, then challenge yourself with your own projects.

Use Real Data: While the book provides datasets for examples and exercises, try applying the techniques to data from your own field or interests. This will help you understand how the concepts apply to real-world scenarios and build confidence in your abilities.

Build Projects: As you progress through the book, consider working on a personal project that interests you. This could be analyzing data from your research, creating maps for your community, or solving a problem you’ve encountered in your work.

Be Patient with Yourself: Programming can be frustrating, especially when you’re learning. Expect to encounter errors, spend time debugging, and occasionally feel stuck. This is normal and part of the learning process. Take breaks when needed, and remember that expertise develops gradually through consistent practice. If you get stuck, don’t hesitate to ask for help on the book’s GitHub repository.

Keep Practicing: The skills in this book require regular practice to maintain and develop. Set aside time regularly to work on geospatial programming projects, even if they’re small ones.

Conventions Used in This Book#

This book uses several conventions to help you navigate the content and understand the code examples:

Code Formatting: All Python code appears in monospaced font within code blocks. When code appears within regular text, it is formatted like this. File and directory names are also formatted in monospaced font.

Code Examples: Most code examples are complete and runnable. They include comments explaining the key concepts and techniques being demonstrated. Line numbers may be included for reference in the accompanying text.

# This is an example of a code block
import leafmap

m = leafmap.Map()
m.add_basemap("OpenTopoMap")  # add a basemap to the map
m

SQL Style Guide: For consistency and readability, SQL examples follow these patterns:

  • Keywords in UPPERCASE: SELECT, FROM, WHERE, JOIN

  • Function names preserve case: ST_Area(), read_csv_auto()

  • Table and column names in lowercase: cities, population

  • Indentation for readability: Multi-line queries are formatted for clarity

SELECT name, ST_Area(geometry) as area
FROM neighborhoods
WHERE borough = 'Manhattan'
ORDER BY area DESC;

Command Line Instructions: Commands to be entered at the command line or terminal are shown with a $ prompt (don’t type the $ symbol itself):

$ pip install leafmap
$ python script.py

Downloading the Code Examples#

All code examples, datasets, and supplementary materials for this book are freely available on GitHub:

giswqs/duckdb-spatial

To download the materials, you can use one of the following methods:

  • Clone the repository (if you have Git installed):

    $ git clone https://github.com/giswqs/duckdb-spatial.git
    
  • Download as ZIP (if you prefer not to use Git):

    • Visit the GitHub repository page

    • Click the green Code button

    • Select Download ZIP

    • Extract the files to your preferred location

  • Browse individual files online through the GitHub interface if you only need specific examples

The repository is regularly updated with corrections, improvements, and additional examples. Check back periodically for updates, or watch the repository on GitHub to be notified of changes.

If you find errors in the code or have suggestions for improvements, please open an issue or submit a pull request on GitHub. Community contributions help make this resource better for everyone.

Video Tutorials and Supplementary Resources#

Complementing the written content, this book is supported by a comprehensive series of video tutorials that walk through key concepts and provide additional examples:

https://tinyurl.com/duckdb-spatial-videos

The videos are designed to complement, not replace, the written material. They’re particularly helpful for:

  • Visual learners who benefit from seeing code being written and executed

  • Understanding complex concepts through multiple explanations

  • Learning about the development workflow and best practices

  • Seeing how to approach problems and debug issues

The playlist is organized to follow the book’s structure. You can watch them in order as you progress through the book, or jump to specific topics as needed.

The videos were created in Fall 2023 when I was teaching the Spatial Data Management [1] course at the University of Tennessee. Although the course has concluded, the videos remain relevant and can be used as references for the book. Additional videos will be added in the future.

Community and Feedback#

I welcome feedback, questions, and suggestions from readers. Your input helps improve the book and makes it more useful for the geospatial programming community.

For book-related questions and discussions:

Types of feedback that are particularly helpful:

  • Errors or unclear explanations in the text or code

  • Suggestions for additional examples or use cases

  • Ideas for new topics or chapters

  • Reports of compatibility issues with different operating systems or library versions

  • Success stories of how you’ve applied the techniques from the book

Acknowledgments#

This book exists thanks to the contributions of many individuals and the broader open-source geospatial community.

First, I want to thank the DuckDB development team for creating such an exceptional database system. Their commitment to performance, simplicity, and open-source principles has made spatial analytics accessible to a much broader audience. Special recognition goes to the team members who’ve contributed to DuckDB’s spatial capabilities.

I’m grateful to my colleagues and students at the University of Tennessee who have provided feedback, tested examples, and helped refine the content through the Spatial Data Management course. Their questions and insights have made this book much stronger.

The open-source geospatial community deserves particular recognition. Projects like GDAL, GeoPandas, Shapely, and countless others form the foundation that makes modern spatial analysis possible. The collaborative spirit of this community continues to inspire my work.

Thanks to the early readers who provided feedback on draft chapters and helped identify areas that needed clarification or improvement. Their diverse perspectives (from seasoned GIS professionals to data science newcomers) helped ensure this book serves its intended audience.

I want to acknowledge my family for their patience and support during the many evenings and weekends spent writing. Their understanding and encouragement made this project possible.

Finally, thanks to you, the reader, for your interest in spatial data analysis and open-source tools. It’s practitioners like you who drive innovation and make the geospatial field such an exciting place to work.

If this book helps you solve spatial problems more effectively, then all the effort has been worthwhile.

About the Author#

Dr. Qiusheng Wu is an Associate Professor in the Department of Geography & Sustainability at the University of Tennessee, Knoxville. He is also an Amazon Scholar. Dr. Wu’s research focuses on advancing open-source geospatial analytics through cloud computing and GeoAI. He is the creator and maintainer of several widely used open-source Python packages, including Geemap [2], Leafmap [3], SAMGeo [4], and GeoAI [5], which integrate cloud-based geospatial platforms with AI-powered analysis and visualization. Dr. Wu’s work bridges remote sensing, Earth observation, and artificial intelligence to make large-scale geospatial data more accessible, reproducible, and intelligent for researchers, educators, and practitioners worldwide. His open-source projects can be found on GitHub at opengeos.