[Avg. reading time: 9 minutes]

DuckDB

DuckDB is a lightweight analytical database designed to run locally with no external dependencies.

  • Single-file database
  • Zero setup
  • Optimized for analytics
  • Excellent support for modern data formats like Parquet

DuckDB is often called the SQLite for analytics.


Why DuckDB Is Useful Here

DuckDB helps us experience the impact of data formats.

It allows us to:

  • Query CSV and Parquet directly
  • See why columnar formats are faster
  • Run analytical queries without Spark or a cluster

DuckDB is a tool for learning, not the topic itself.


Key Capabilities (High Level)

  • Automatic parallel query execution
  • Fast analytical SQL engine
  • Native Parquet support
  • Reads files directly without loading them into tables
  • Works well with Python and data science workflows

Download the CLI Client

DuckDB in Data Engineering

Download orders.parquet

Open Command Prompt or Terminal

./duckdb

or

duckdb.exe
# Create / Open a database

.open ordersdb

Duckdb allows you to read the contents of orders.parquet as is without needing a table. Double quotes around the file name orders.parquet is essential.

describe table  "orders.parquet"
select * from "orders.parquet" limit 3;
show tables;

create table orders  as select * from "orders.parquet";

select count(*) from orders;

DuckDB supports parallel query processing, and queries run fast.

This table has 1.5 million rows, and aggregation happens in less than a second.

select now(); select o_orderpriority,count(*) cnt from orders group by o_orderpriority; select now();

DuckDB also helps to convert parquet files to CSV in a snap. It also supports converting CSV to Parquet.

COPY "orders.parquet" to 'orders.csv'  (FORMAT "CSV", HEADER 1);Select * from "orders.csv" limit 3;

It also supports exporting existing Tables to Parquet files.

COPY "orders" to  'neworder.parquet' (FORMAT "PARQUET");

DuckDB supports Programming languages such as Python, R, JAVA, node.js, C/C++.

DuckDB ably supports Higher-level SQL programming such as Macros, Sequences, Window Functions.

Get sample data from Yellow Cab

https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page

Copy yellow cabs data into yellowcabs folder

create table taxi_trips as select * from "yellowcabs/*.parquet";
SELECT
    PULocationID,
    EXTRACT(HOUR FROM tpep_pickup_datetime) AS hour_of_day,
    AVG(fare_amount) AS avg_fare
FROM
    taxi_trips
GROUP BY
    PULocationID,
    hour_of_day;

Extensions

https://duckdb.org/docs/extensions/overview

INSTALL json;
LOAD json;
select * from demo.json;
describe demo.json;

Load directly from HTTP location

select * from 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv'

#duckdb #singlefiledatabase #parquet #tools #cliVer 6.0.18

Last change: 2026-03-03