Loading Spatial Data Formats

Contents

4. Loading Spatial Data Formats

4.1. Introduction

4.2. Learning Objectives

4.3. Sample Datasets

4.4. Installation and Setup

4.4.1. Library Import and Initial Setup

4.5. Installing and Loading Extensions

4.6. Downloading Sample Data

4.7. Loading CSV Files with Coordinates

4.7.1. Basic CSV Loading with Automatic Detection

4.7.2. Overriding Automatic Detection

4.7.3. Parallel CSV Reading for Large Files

4.7.4. Querying CSV Files Directly in SQL

4.7.5. Performance Considerations and Best Practices

4.8. Loading JSON Files

4.8.1. Reading JSON with Automatic Schema Detection

4.8.2. Understanding JSON Format Variants

4.8.3. Working with Nested JSON Structures

4.8.4. Using the JSON Extension for Advanced Operations

4.9. Querying Pandas DataFrames Directly

4.9.1. Loading Data into Pandas and Querying with SQL

4.9.2. When to Use DataFrame Querying

4.10. Loading Parquet Files for Performance

4.10.1. Reading Parquet Files in Python

4.10.2. Querying Parquet Files Directly in SQL

4.10.3. Cloud Storage and Remote Parquet Files

4.10.4. When to Convert to Parquet

4.11. Loading GeoJSON Files with Spatial Geometries

4.11.1. Discovering Available Spatial Formats

4.11.2. Loading GeoJSON with ST_Read()

4.11.3. DuckDB’s Shorthand SQL Syntax

4.11.4. Creating Persistent Spatial Tables

4.11.5. Querying the Spatial Table

4.11.6. Understanding GDAL Dependencies

4.12. Loading Shapefiles into Modern Workflows

4.12.1. The Shapefile Multi-File Structure

4.12.2. Loading Shapefiles with ST_Read()

4.12.3. Creating Tables from Shapefiles

4.12.4. Shapefile Limitations and Modern Alternatives

4.13. Loading GeoParquet for Cloud-Native Spatial Analysis

4.13.1. Reading Local GeoParquet Files

4.13.2. Converting WKB Geometries to DuckDB’s Spatial Type

4.13.3. Loading GeoParquet from Cloud Storage

4.14. Data Loading Performance Strategies

4.15. Troubleshooting Common Data Loading Issues

4.16. Key Takeaways

4.17. Exercises

4.17.1. Exercise 1: CSV Loading and Schema Inspection

4.17.2. Exercise 2: Format Conversion for Performance

4.17.3. Exercise 3: Loading GeoJSON with Spatial Geometries

4.17.4. Exercise 4: Working with Shapefiles

4.17.5. Exercise 5: Querying Pandas DataFrames with SQL

4.17.6. Exercise 6: Cloud Data Access with GeoParquet

4.17.7. Exercise 7: Format Comparison with Your Own Data