YAML

YAML stands for YAML Ain’t Markup Language.

  • Human-readable data serialization format
  • Designed for configuration, not large datasets
  • Structure is defined by indentation
  • Whitespace matters


Core Data Structures

Key–Value (Map / Dictionary)

app: analytics
version: 1.0

List (Sequence / Array)

ports:
  - 8080
  - 9090

Nested structures

database:
  host: localhost
  port: 5432
  credentials:
    user: admin
    password: secret
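
Because nesting comes only from indentation, moving a key by two spaces silently changes the structure. The sketch below uses a toy reader written for this illustration (`parse_mapping` is a hypothetical helper, not a library function). It assumes space-indented `key: value` lines only, keeps every value as a string, and ignores lists, quoting, and comments. A real project would use a YAML library instead.

```python
def parse_mapping(text):
    """Toy reader: nested `key: value` mappings, structure taken from indentation."""
    root = {}
    stack = [(-1, root)]  # (indent of the key that opened it, mapping)
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        indent = len(raw) - len(raw.lstrip(" "))
        key, _, value = raw.strip().partition(":")
        # Close every mapping opened at the same or a deeper indent.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        value = value.strip()
        if value:
            parent[key] = value  # plain scalar, kept as a string
        else:
            parent[key] = {}  # `key:` with nothing after it opens a nested mapping
            stack.append((indent, parent[key]))
    return root


intended = "database:\n  host: localhost\n  port: 5432\n"
slipped = "database:\n  host: localhost\nport: 5432\n"  # `port` lost its indent

print(parse_mapping(intended))  # {'database': {'host': 'localhost', 'port': '5432'}}
print(parse_mapping(slipped))   # {'database': {'host': 'localhost'}, 'port': '5432'}
```

Both inputs are accepted without error. Only the result shows that `port` is no longer under `database`.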

Scalars

  • string, int, float, bool, null
  • true, false, and null are native values, not strings
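
Scalar types are inferred from how a value is written. A rough sketch of that idea in Python follows; `infer_scalar` is a hypothetical helper with deliberately simplified rules, and a real YAML parser recognises more forms and differs at the edges.

```python
def infer_scalar(text):
    """Guess a Python value for a YAML-style scalar (simplified rules)."""
    # Quoted scalars are always strings.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        return text[1:-1]
    if text in ("null", "~", ""):
        return None
    if text in ("true", "false"):
        return text == "true"
    # Try the numeric types in order: int first, then float.
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # anything else stays a string


for raw in ["analytics", "5432", "1.0", "1.10", '"1.10"', "true", "null"]:
    value = infer_scalar(raw)
    print(f"{raw!r:10} -> {value!r} ({type(value).__name__})")
```

Note how an unquoted `1.10` is read as the float `1.1`. Quote version numbers when they must stay strings.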

YAML vs JSON

  • YAML (since version 1.2) is a superset of JSON, so a YAML parser can read JSON syntax
  • Block style needs no braces and no commas
  • Comments are allowed
  • Types inferred, not enforced
  • Easier diffs in git
  • Easier to break with bad indentation

The tradeoff is real: YAML is readable but fragile.

One record in JSON:

{"id":1,"name":"event","tags":["click","mobile"]}

The same record in YAML:

id: 1
name: event
tags:
  - click
  - mobile
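
The two documents above carry the same data. The sketch below parses the JSON with Python's standard `json` module and prints it in block style. `to_yaml_lines` and `scalar` are hypothetical helpers written for this example, not a real YAML library. They assume values are scalars, nested mappings, or lists of scalars, and that strings need no quoting.

```python
import json


def scalar(value):
    # Strings are written bare (assumes no quoting is needed);
    # numbers, booleans, and null reuse their JSON spelling.
    return value if isinstance(value, str) else json.dumps(value)


def to_yaml_lines(mapping, indent=0):
    """Render a mapping as block-style lines, nesting by two spaces."""
    pad = " " * indent
    lines = []
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml_lines(value, indent + 2))
        elif isinstance(value, list):
            lines.append(f"{pad}{key}:")
            lines.extend(f"{pad}  - {scalar(item)}" for item in value)
        else:
            lines.append(f"{pad}{key}: {scalar(value)}")
    return lines


record = json.loads('{"id":1,"name":"event","tags":["click","mobile"]}')
yaml_text = "\n".join(to_yaml_lines(record))
print(yaml_text)
```

The printed text matches the YAML version shown above: the braces, quotes, and commas are gone, and indentation alone carries the structure.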

Real-world use cases

Some popular use cases in data engineering:

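  • CI/CD pipeline definitions
  • Terraform (native syntax is HCL; YAML inputs can be read with yamldecode)
  • Docker (Compose files)
  • Airflow (config-driven DAGs and deployment settings; DAGs themselves are Python)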

JSON is for data; YAML is for config.

YAML is a bad choice for data when:

  • The dataset is large
  • Write frequency is high
  • Data is streaming or continuous
  • The system is schema-critical

Because:

  • YAML is slower to parse than JSON
  • It is hard to validate strictly
  • It has no native indexing
  • Most YAML parsers build the whole document tree in memory
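
For large or continuous data, a line-delimited JSON layout lets a reader handle one record at a time instead of building one big tree. A minimal sketch using only Python's standard library follows; the records and the `count_tags` helper are made up for illustration.

```python
import io
import json

# Stand-in for a large file or socket: one JSON record per line.
stream = io.StringIO(
    '{"id": 1, "name": "event", "tags": ["click", "mobile"]}\n'
    '{"id": 2, "name": "event", "tags": ["view"]}\n'
)


def count_tags(lines):
    """Consume records one at a time; only the running counts stay in memory."""
    counts = {}
    for line in lines:
        record = json.loads(line)  # parse a single record, then let it go
        for tag in record["tags"]:
            counts[tag] = counts.get(tag, 0) + 1
    return counts


tag_counts = count_tags(stream)
print(tag_counts)  # {'click': 1, 'mobile': 1, 'view': 1}
```

Each line is a complete document, so a writer can append records and a reader can resume mid-file, which suits high write frequency and streaming workloads.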

yq: a lightweight, portable command-line YAML processor

https://mikefarah.gitbook.io/yq/

#dataformat #yaml #yq

Ver 6.0.18

Last change: 2026-03-03