DuckDB Python Integration

Contents

3. DuckDB Python Integration

3.1. Introduction

3.2. Learning Objectives

3.3. Sample Datasets

3.4. Installation and Setup

3.4.1. Library Import and Initial Setup

3.5. Installing and Loading Extensions

3.6. Reading Data from Multiple Sources

3.6.1. Verifying the Connection

3.6.2. Reading CSV Files from URLs

3.6.3. Understanding DuckDB Relations

3.7. Seamless Integration with Pandas DataFrames

3.7.1. Querying DataFrames with SQL

3.7.2. The Bidirectional Workflow

3.7.3. Performance Considerations

3.8. Polars Interoperability

3.9. Result Conversion and Output Formats

3.9.1. Converting to Python Objects

3.9.2. Converting to Pandas DataFrames

3.9.3. Converting to NumPy Arrays

3.9.4. Converting to Apache Arrow Tables

3.9.5. Choosing the Right Format

3.10. Writing Data to Disk

3.11. Persistent Storage and Database Files

3.11.1. Creating a Persistent Database

3.11.2. Connecting to Existing Databases

3.11.3. Connection Management and Cleanup

3.11.4. Use Cases for Persistent vs In-Memory Databases

3.12. Prepared Statements and Parameters

3.13. Key Takeaways

3.14. Exercises

3.14.1. Exercise 1: Installation and Basic Queries

3.14.2. Exercise 2: Loading Remote Data

3.14.3. Exercise 3: SQL to DataFrame Conversion

3.14.4. Exercise 4: Querying DataFrames with SQL

3.14.5. Exercise 5: Bidirectional Workflows

3.14.6. Exercise 6: Result Format Conversion

3.14.7. Exercise 7: Persistent Storage

3.14.8. Exercise 8: Joining SQL and Python Logic

3.14.9. Exercise 9: Writing Results to Files

3.14.10. Exercise 10: Practical Integration Challenge