
Output Sinks

An output defines where a pipeline writes its results. Unlike inputs and transformations, outputs do not create new assets — they are sinks that reference an existing asset and persist its dataset to external storage.

Anatomy of an Output

```yaml
output:
  - name: enrichedSales
    format: parquet
    mode: overwrite
    path: "data/output/enriched_sales"
```

The name field is the key concept: it must match the name of an existing input or transformation. The output takes whatever dataset that asset produces and writes it to the specified path.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `name` | AssetRef | Yes | — | Must match the name of an input or transformation asset. |
| `format` | string | Yes | — | Output data format (same values as input formats). |
| `path` | string | Yes | — | Destination path or URI. |
| `mode` | string | No | `"error"` | Write mode (see below). |
| `options` | Map | No | `{}` | Format-specific key-value options. |
| `description` | string | No | — | Human-readable description. |
| `tags` | List[string] | No | `[]` | Classification labels. |
| `meta` | Map | No | `{}` | Open key-value metadata. |
| `freshness` | string | No | — | Expected update frequency (ISO 8601 duration). |
| `maturity` | string | No | — | Data maturity level. |

Outputs Reference Existing Assets

This is the most important rule about outputs: the name field does not create a new asset. It points to one that already exists.

```yaml
input:
  - name: raw_orders
    format: csv
    path: "data/orders.csv"
    options:
      header: true

transformation:
  - name: clean_orders
    select:
      from: raw_orders
      columns: [order_id, customer_id, amount]

output:
  - name: clean_orders # <-- references the transformation above
    format: parquet
    mode: overwrite
    path: "data/output/clean_orders/"
```

Other transformations cannot reference an output by name. Outputs are terminal nodes in the DAG.

Note: Multiple outputs can reference the same asset. Each writes independently, so you can write the same dataset to different formats or locations.
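As an illustration, the `clean_orders` transformation above could be written twice, once as Parquet for downstream jobs and once as a CSV export (the export path here is hypothetical):

```yaml
output:
  - name: clean_orders            # primary copy for downstream jobs
    format: parquet
    mode: overwrite
    path: "data/output/clean_orders/"

  - name: clean_orders            # same dataset, exported as CSV
    format: csv
    mode: overwrite
    path: "data/export/clean_orders.csv"
    options:
      header: true
```

Both entries reference the same asset; each write runs independently with its own format, mode, and options.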

Write Modes

The mode field controls what happens when the destination already contains data.

| Mode | Behavior |
|---|---|
| `error` | Fail if the destination already exists. This is the default. |
| `overwrite` | Replace any existing data at the destination. |
| `append` | Add new data to the existing destination. |
| `ignore` | Do nothing if the destination already exists. The write is silently skipped. |

Choose the mode that matches your pipeline's semantics:

```yaml
# Overwrite a daily snapshot
output:
  - name: daily_summary
    format: parquet
    mode: overwrite
    path: "data/output/daily_summary/"
```

```yaml
# Append to an event log
output:
  - name: processed_events
    format: parquet
    mode: append
    path: "data/output/event_log/"
```

```yaml
# Write only if the destination is empty
output:
  - name: initial_load
    format: csv
    mode: ignore
    path: "data/output/bootstrap.csv"
    options:
      header: true
```

Note: If you omit mode, it defaults to "error". This is the safest default — it prevents accidental overwrites.

Metadata Fields

Outputs support two metadata fields that inputs do not have: freshness and maturity. These document the operational expectations for the output dataset.

Freshness

The freshness field declares how often this output is expected to be updated, using ISO 8601 duration syntax.

```yaml
output:
  - name: daily_report
    format: parquet
    mode: overwrite
    path: "data/output/daily_report/"
    freshness: "PT24H" # expected every 24 hours
```

Common duration values:

| Duration | Meaning |
|---|---|
| `PT1H` | Every hour |
| `PT24H` | Every 24 hours (daily) |
| `P7D` | Every 7 days (weekly) |
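A weekly cadence looks the same, just with a day-based duration (the asset name below is hypothetical):

```yaml
output:
  - name: weekly_rollup
    format: parquet
    mode: overwrite
    path: "data/output/weekly_rollup/"
    freshness: "P7D" # expected every 7 days
```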

Maturity

The maturity field indicates the reliability and stability of the output data. Valid values are:

| Value | Meaning |
|---|---|
| `high` | Production-grade, well-tested, stable schema. |
| `medium` | Functional but may change. |
| `low` | Experimental or under development. |
| `deprecated` | Scheduled for removal. Consumers should migrate away. |

```yaml
output:
  - name: fact_sales
    format: delta
    mode: overwrite
    path: "s3://lake/gold/fact_sales"
    maturity: "high"
    freshness: "PT24H"
    description: "Aggregated daily sales facts for BI dashboards."
    tags: ["gold", "sales", "production"]
```

A Complete Example

Putting it all together — a pipeline that reads two sources, transforms them, and writes two outputs:

```yaml
version: "2.0"

input:
  - name: orders
    format: csv
    path: "data/orders.csv"
    options:
      header: true

  - name: products
    format: parquet
    path: "data/products.parquet"

transformation:
  - name: enriched_orders
    join:
      left: orders
      right: products
      on: "orders.product_id = products.id"
      type: left

output:
  - name: enriched_orders
    format: parquet
    mode: overwrite
    path: "data/output/enriched_orders/"
    description: "Orders enriched with product details."
    freshness: "PT24H"
    maturity: "high"
    tags: ["silver", "orders"]

  - name: orders
    format: csv
    mode: append
    path: "data/backup/raw_orders/"
    options:
      header: true
    description: "Raw backup of incoming orders."
    maturity: "low"
```

Semantics

  • Writing an empty dataset (zero rows) is valid. The behavior depends on the format — Parquet writes an empty file with schema, CSV with header: true writes a header-only file.
  • If an output references a name that does not exist as an input or transformation, the pipeline fails with error E-REF-001.
  • Outputs are atomic: if the pipeline fails, partial results are not written.
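To make the reference rule concrete, here is a minimal sketch of a pipeline that would fail with `E-REF-001`: the output name matches no declared input or transformation (all names and paths below are illustrative):

```yaml
input:
  - name: raw_orders
    format: csv
    path: "data/orders.csv"

output:
  - name: cleaned_orders   # no asset named "cleaned_orders" exists
    format: parquet
    path: "data/output/cleaned_orders/"
# fails with E-REF-001: output must reference an existing input or transformation
```

Renaming the output to `raw_orders` (or adding a `cleaned_orders` transformation) would resolve the error.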