YAML
YAML stands for YAML Ain’t Markup Language.
- Human-readable data serialization format
- Designed for configuration, not large datasets
- Structure is defined by indentation
- Whitespace matters

Core Data Structures
Key–Value (Map / Dictionary)
app: analytics
version: 1.0
List (Sequence / Array)
ports:
- 8080
- 9090
Nested structures
database:
  host: localhost
  port: 5432
  credentials:
    user: admin
    password: secret
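A minimal sketch of how indentation drives nesting, assuming Python with the third-party PyYAML package (these notes do not name a parser, so that choice is an assumption). It loads the nested structure, reads values one level at a time, and then shows what happens when the same keys lose their indentation.

```python
import yaml  # PyYAML (third-party, assumed installed: pip install pyyaml)

nested = """
database:
  host: localhost
  port: 5432
  credentials:
    user: admin
    password: secret
"""

# Same keys as the start of the document above, with the indentation stripped.
flat = """
database:
host: localhost
port: 5432
"""

config = yaml.safe_load(nested)
host = config["database"]["host"]                  # one key per nesting level
user = config["database"]["credentials"]["user"]

# No error is raised here; the parser simply builds a different structure.
broken = yaml.safe_load(flat)

print(host, user)
print(broken)
```

In the flattened version, `database` ends up empty and `host` and `port` become top-level keys, which is why whitespace mistakes in YAML often go unnoticed until something reads the wrong value.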
Scalars
- string, int, float, bool, null
- true, false, null are native types
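To see scalar type inference in action, here is a small sketch, again assuming Python with PyYAML. The key names are made up for illustration. Note that an unquoted `1.0` is read as a float, while the quoted form stays a string.

```python
import yaml  # PyYAML (third-party, assumed installed)

doc = """
name: analytics
workers: 4
ratio: 0.75
enabled: true
owner: null
version: 1.0
build: "1.0"
"""

values = yaml.safe_load(doc)

# Map each key to the name of the Python type the parser inferred.
types = {key: type(value).__name__ for key, value in values.items()}
print(types)
```

Quoting is the usual way to keep version numbers, IDs, and similar values as strings when the inferred type would be wrong.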
YAML vs JSON
- YAML is a superset of JSON, so a YAML parser can also read JSON syntax.
- No braces, no commas
- Comments are allowed
- Types inferred, not enforced
- Easier diffs in git
- Easier to break with bad indentation
The tradeoff is real: YAML is readable but fragile.
{"id":1,"name":"event","tags":["click","mobile"]}
id: 1
name: event
tags:
- click
- mobile
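The two documents above describe the same data. A minimal check, assuming Python with PyYAML: the JSON text is parsed once with the standard `json` module and once with the YAML parser, and both are compared with the block-style YAML version.

```python
import json
import yaml  # PyYAML (third-party, assumed installed)

json_text = '{"id":1,"name":"event","tags":["click","mobile"]}'
yaml_text = """
id: 1
name: event
tags:
- click
- mobile
"""

from_json = json.loads(json_text)
json_via_yaml = yaml.safe_load(json_text)  # JSON text fed to a YAML parser
from_yaml = yaml.safe_load(yaml_text)

print(from_json == json_via_yaml == from_yaml)
```

This holds for this document; parsers differ in which YAML version they implement, so unusual JSON inputs may not behave identically everywhere.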
Real-world use cases
Here are some popular use cases in data engineering:
- CI/CD
- Terraform
- Docker
- Airflow
JSON is for data; YAML is for config.
YAML is a bad choice for data if:
- The dataset is large
- Write frequency is high
- The data is streaming or continuous
- The system is schema-critical
Because:
- YAML is slower to parse than JSON
- It is hard to validate strictly
- It has no native indexing
- YAML parsers typically build the whole document as a tree in memory
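For contrast, here is a sketch of how line-oriented JSON handles frequent writes and streaming reads, using only the Python standard library. The one-record-per-line layout (often called JSON Lines) and the event fields are assumptions chosen for illustration, not something these notes prescribe.

```python
import io
import json

# In-memory stand-in for an append-only event log (hypothetical data).
log = io.StringIO()

events = (
    {"id": 1, "name": "click"},
    {"id": 2, "name": "scroll"},
    {"id": 3, "name": "click"},
)

# Frequent writes: each record is one appended line; earlier lines are never rewritten.
for event in events:
    log.write(json.dumps(event) + "\n")

# Streaming read: only one record is parsed at a time.
log.seek(0)
click_ids = [record["id"] for record in map(json.loads, log) if record["name"] == "click"]
print(click_ids)
```

Each line is a complete document on its own, so a reader never has to hold the whole file as one tree.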
yq: a lightweight, portable command-line YAML processor
https://mikefarah.gitbook.io/yq/