I love DuckDB. My usual workflow is:

1. initially read my data from whatever source (CSV, a relational database somewhere, whatever)
2. write it to one or more Parquet files in a directory
3. tell DuckDB that the directory is my data source
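Roughly, steps 1 and 2 look something like this in Python. This is just a minimal sketch; the paths (`raw/events.csv`, `data/events.parquet`) are made up for illustration:

```python
import duckdb

# In-memory connection; the same calls work against a file-backed database.
con = duckdb.connect()

# Steps 1 and 2: read the raw CSV and write it back out as Parquet.
# The input and output paths here are hypothetical.
con.execute("""
    COPY (SELECT * FROM read_csv_auto('raw/events.csv'))
    TO 'data/events.parquet' (FORMAT PARQUET)
""")
```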
Then DuckDB treats the directory just like a database that you can build indexes on, and since they're Parquet files they're hella small and statically typed. My workflow was already pretty fast and efficient, and DuckDB has sped up my data wrangling and analysis a ton.
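And step 3 is just a glob over the Parquet files, e.g. wrapped in a view so the directory queries like a single table (the `events` view name is again made up):

```python
# Step 3: expose every Parquet file in the directory as one logical table.
con.execute("""
    CREATE VIEW events AS
    SELECT * FROM read_parquet('data/*.parquet')
""")

# From here on it queries like any other table.
print(con.execute("SELECT count(*) FROM events").fetchone())
```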