
Column Operations

This section covers transformations that add, remove, rename, or change the type of columns in a dataset.


Add Columns (8.11)

Adds one or more computed columns to a dataset.

Schema:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| from | AssetRef | Yes | Source asset. |
| columns | NonEmptyList[ColumnDef] | Yes | Columns to add. |

ColumnDef object:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Name of the new column. |
| expression | Expression | Yes | Expression to compute the column value. |

Example — derived columns:

transformation:
  - name: withDerived
    addColumns:
      from: orders
      columns:
        - name: total
          expression: "quantity * unit_price"
        - name: processed_at
          expression: "current_timestamp()"
        - name: category
          expression: "case when amount > 1000 then 'high' else 'low' end"

Key behaviors:

  • If a column with the same name already exists, it is replaced (overwritten).
  • Expressions can reference existing columns and previously added columns in the same addColumns list (evaluated in order).

Tip: Use addColumns to build derived fields step by step: define an intermediate column first, then reference it in later entries of the same list.
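
The ordering and overwrite rules above can be modeled in a few lines of Python. This is only an illustrative sketch, not the runtime's implementation: the `add_columns` helper and the `ColumnDef` tuple are hypothetical, and expressions are represented as Python callables because the expression language itself is not specified here.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
# Hypothetical stand-in for a ColumnDef: (name, expression), with the
# expression modeled as a Python callable over the row built so far.
ColumnDef = Tuple[str, Callable[[Row], Any]]


def add_columns(row: Row, columns: List[ColumnDef]) -> Row:
    """Apply column definitions in list order to a copy of the row."""
    result = dict(row)
    for name, expression in columns:
        # Each expression sees the original columns plus every column
        # added earlier in the list; an existing name is overwritten.
        result[name] = expression(result)
    return result


order = {"quantity": 3, "unit_price": 400.0, "total": None}
derived = add_columns(
    order,
    [
        # Replaces the existing "total" column.
        ("total", lambda r: r["quantity"] * r["unit_price"]),
        # References "total", which was defined one entry earlier.
        ("category", lambda r: "high" if r["total"] > 1000 else "low"),
    ],
)
print(derived)
```

Swapping the two entries would make `category` read the stale `total` value, which is why the order of the list matters.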


Drop Columns (8.12)

Removes columns from a dataset.

Schema:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| from | AssetRef | Yes | Source asset. |
| columns | NonEmptyList[Column] | Yes | Columns to remove. |

Example — remove sensitive fields before output:

transformation:
  - name: sanitized
    dropColumns:
      from: customers
      columns:
        - ssn
        - credit_card_number
        - date_of_birth

Constraints:

  • If a named column does not exist, the runtime raises E-COL-001.
  • Dropping all columns is an error (E-SCHEMA-002) — at least one column must remain.
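
As a rough model of the two constraints above, the following Python sketch validates a drop request against a list of column names. The `drop_columns` function and the `TransformError` class are hypothetical names used for illustration; the error codes come from the constraints above, but the order in which the runtime checks them is an assumption.

```python
from typing import List


class TransformError(Exception):
    """Hypothetical error type carrying a documented error code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def drop_columns(schema: List[str], columns: List[str]) -> List[str]:
    """Return the schema without the dropped columns, or raise."""
    # Assumption: unknown columns are reported before the empty-schema check.
    missing = [c for c in columns if c not in schema]
    if missing:
        raise TransformError("E-COL-001", f"unknown column(s): {missing}")
    remaining = [c for c in schema if c not in columns]
    if not remaining:
        raise TransformError("E-SCHEMA-002", "at least one column must remain")
    return remaining


customers = ["id", "name", "ssn", "credit_card_number", "date_of_birth"]
sanitized = drop_columns(customers, ["ssn", "credit_card_number", "date_of_birth"])
print(sanitized)
```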

Rename Columns (8.13)

Renames columns using a mapping of old names to new names.

Schema:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| from | AssetRef | Yes | Source asset. |
| mappings | Map[Column, string] | Yes | Old name to new name mapping. At least one entry. |

Example — standardize column names:

transformation:
  - name: renamed
    renameColumns:
      from: rawData
      mappings:
        first_name: firstName
        last_name: lastName
        e_mail: email
        phone_number: phone

Constraints:

  • If an old name does not exist, the runtime raises E-COL-001.
  • If a new name collides with an existing (unrenamed) column, the runtime raises E-NAME-003.

Tip: Rename is useful after joins to resolve qualified column names like employees.name into cleaner names.
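
The rename constraints can be sketched in Python as follows. The `rename_columns` function and the `TransformError` class are hypothetical; the sketch reads "existing (unrenamed) column" as a column that is not itself a key in the mapping. How the runtime treats two old names mapped to the same new name is not covered above, so the sketch leaves that case unchecked.

```python
from typing import Dict, List


class TransformError(Exception):
    """Hypothetical error type carrying a documented error code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def rename_columns(schema: List[str], mappings: Dict[str, str]) -> List[str]:
    """Return the schema with old names replaced by new names, or raise."""
    missing = [old for old in mappings if old not in schema]
    if missing:
        raise TransformError("E-COL-001", f"unknown column(s): {missing}")
    # Columns that keep their name; a new name must not collide with these.
    untouched = {c for c in schema if c not in mappings}
    clashes = sorted(new for new in mappings.values() if new in untouched)
    if clashes:
        raise TransformError("E-NAME-003", f"name collision(s): {clashes}")
    # Column order is preserved; only the names change.
    return [mappings.get(c, c) for c in schema]


raw = ["first_name", "last_name", "e_mail", "phone_number", "city"]
renamed = rename_columns(
    raw,
    {"first_name": "firstName", "last_name": "lastName",
     "e_mail": "email", "phone_number": "phone"},
)
print(renamed)
```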


Cast Columns (8.14)

Changes the data type of one or more columns.

Schema:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| from | AssetRef | Yes | Source asset. |
| columns | NonEmptyList[CastDef] | Yes | Cast definitions. |

CastDef object:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | Column | Yes | Column to cast. |
| targetType | string | Yes | Target data type (see Data Types section). |

Example — cast string columns to proper types:

transformation:
  - name: typed
    castColumns:
      from: csvImport
      columns:
        - name: age
          targetType: integer
        - name: salary
          targetType: double
        - name: hire_date
          targetType: date
        - name: is_active
          targetType: boolean

Key behaviors:

  • If a value cannot be cast (e.g., "abc" to integer), the value becomes NULL. The pipeline does not fail for individual cast failures.
  • Type names are case-insensitive: Integer, INTEGER, and integer are all equivalent.

Tip: To enforce strict casting where invalid values should fail the pipeline, follow castColumns with an assertion transformation that checks for unexpected NULLs.
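
To make the lenient cast behavior concrete, here is a small Python sketch. It is not the runtime's implementation: the `cast_value` helper and the `PARSERS` table are hypothetical, and the input formats accepted for each type (ISO dates, "true"/"false" booleans) are assumptions. The last lines show the kind of check a follow-up assertion could perform, counting NULLs that the cast introduced rather than NULLs that were already present.

```python
from datetime import date
from typing import Any, Callable, Dict, Optional


def _to_boolean(text: str) -> bool:
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(f"not a boolean: {text!r}")


# Hypothetical parsers for the four types used in the example above; the
# accepted input formats are assumptions, not the runtime's actual rules.
PARSERS: Dict[str, Callable[[str], Any]] = {
    "integer": int,
    "double": float,
    "date": date.fromisoformat,
    "boolean": _to_boolean,
}


def cast_value(value: Optional[str], target_type: str) -> Any:
    """Lenient cast: an unparseable value becomes None (NULL)."""
    parser = PARSERS[target_type.lower()]  # type names are case-insensitive
    if value is None:
        return None
    try:
        return parser(value)
    except ValueError:
        return None  # the individual failure does not abort the pipeline


ages = ["42", "abc", None]
cast_ages = [cast_value(v, "INTEGER") for v in ages]
# NULLs introduced by the cast itself, as opposed to NULLs already present.
new_nulls = sum(
    1 for before, after in zip(ages, cast_ages)
    if before is not None and after is None
)
print(cast_ages, new_nulls)
```

A strict pipeline would fail when `new_nulls` is greater than zero; comparing against the pre-cast values avoids flagging rows that were NULL before the cast.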