
Changelog

v3.0 (2026-03-29)

Full alignment with Apache Spark Connect protocol (expressions.proto, types.proto, relations.proto).

Expression Language

  • Lambda expressions for higher-order functions: x -> x + 1, (acc, x) -> acc + x.
  • Star/wildcard expressions: *, asset.*.
  • Nested value access: struct.field, map['key'], array[0]. Accessors compose to arbitrary depth.
  • Inline window expressions: func() OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN ...).
  • TRY_CAST: returns NULL on failure instead of raising an error.
  • String concatenation operator: ||.
  • Null-safe equality operator: <=>.
  • RLIKE for regex matching.
  • Named function arguments: func(key => value).
  • Typed literals: DATE '2025-01-01', TIMESTAMP '...', TIMESTAMP_NTZ '...', X'hex'.
  • Complex literals: ARRAY(...), MAP(...), STRUCT(...), NAMED_STRUCT(...).
  • INTERVAL literals: INTERVAL 30 DAYS, INTERVAL 1 YEAR 6 MONTHS.
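
The difference between ordinary equality and the null-safe operator <=> can be sketched in Python, with None standing in for NULL. This is only an illustration of the usual SQL behaviour; the function names sql_eq and null_safe_eq are hypothetical and not part of the specification.

```python
from typing import Any, Optional


def sql_eq(a: Any, b: Any) -> Optional[bool]:
    """Ordinary `=`: the result is unknown (None) when either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a: Any, b: Any) -> bool:
    """`<=>`: always yields True or False; two NULLs compare as equal."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


if __name__ == "__main__":
    for left, right in [(1, 1), (1, 2), (1, None), (None, None)]:
        print(left, right, sql_eq(left, right), null_safe_eq(left, right))
```

In practice <=> is useful in join and filter conditions where NULL keys should match each other rather than being dropped.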

Breaking Change

CAST now follows ANSI SQL semantics and raises an error on failure. Pipelines that rely on CAST for lenient parsing should migrate to TRY_CAST.
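
The behavioural difference can be sketched in Python for a string-to-integer conversion. This is a minimal illustration, not the engine's implementation: cast_int and try_cast_int are hypothetical names, and Python's int() parser stands in for the real conversion rules, which may accept or reject different inputs.

```python
from typing import Optional


def cast_int(value: Optional[str]) -> Optional[int]:
    """Strict CAST: NULL stays NULL, malformed input raises an error."""
    if value is None:
        return None
    return int(value)  # raises ValueError on malformed input


def try_cast_int(value: Optional[str]) -> Optional[int]:
    """Lenient TRY_CAST: malformed input becomes NULL instead of an error."""
    try:
        return cast_int(value)
    except ValueError:
        return None


if __name__ == "__main__":
    print(try_cast_int("42"), try_cast_int("n/a"))
    try:
        cast_int("n/a")
    except ValueError as err:
        print("CAST failed:", err)
```

A pipeline that previously relied on CAST silently producing NULL for bad rows would now fail on the first malformed value, which is why the lenient form has to be requested explicitly.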

Data Types

Type                      Description
byte / tinyint            8-bit signed integer
short / smallint          16-bit signed integer
bigint                    Alias for long
char(n)                   Fixed-length string
varchar(n)                Variable-length string with max length
timestamp_ntz             Timestamp without timezone
time / time(p)            Time of day
interval year to month    Year-month interval
interval day to second    Day-time interval
variant                   Semi-structured data

Core Functions (7 new categories, 90+ functions)

Category               Count    Examples
Collection             24       size, array_contains, explode, map_keys, array_sort
Higher-order           9        transform, filter, aggregate, exists, forall
Struct                 4        struct, named_struct, with_field, drop_fields
JSON                   6        from_json, to_json, get_json_object, parse_json
Interval / temporal    13       make_interval, extract, date_trunc, current_time
Statistical            15       stddev, corr, percentile, skewness, grouping
Variant                6        variant_get, try_variant_get, is_variant_null
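
The higher-order functions take lambda expressions such as x -> x + 1 or (acc, x) -> acc + x as arguments. A rough Python analogue of five of them is sketched below. It assumes plain lists with no NULL elements and does not model NULL handling; filter_ carries a trailing underscore only to avoid shadowing Python's built-in.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def transform(xs: List[T], f: Callable[[T], U]) -> List[U]:
    """Apply f to every element, like transform(xs, x -> ...)."""
    return [f(x) for x in xs]


def filter_(xs: List[T], pred: Callable[[T], bool]) -> List[T]:
    """Keep the elements for which pred holds."""
    return [x for x in xs if pred(x)]


def aggregate(xs: List[T], start: U, merge: Callable[[U, T], U]) -> U:
    """Fold the list into one value, like aggregate(xs, start, (acc, x) -> ...)."""
    return reduce(merge, xs, start)


def exists(xs: List[T], pred: Callable[[T], bool]) -> bool:
    """True if pred holds for at least one element."""
    return any(pred(x) for x in xs)


def forall(xs: List[T], pred: Callable[[T], bool]) -> bool:
    """True if pred holds for every element."""
    return all(pred(x) for x in xs)


if __name__ == "__main__":
    values = [1, 2, 3]
    print(transform(values, lambda x: x + 1))
    print(aggregate(values, 0, lambda acc, x: acc + x))
```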

New Transformations (14)

Transform       Section    Description
offset          8.32       Skip first N rows
tail            8.33       Return last N rows
fillNa          8.34       Replace NULL values
dropNa          8.35       Drop rows with NULLs
replace         8.36       Replace specific values
merge           8.37       MERGE INTO / upsert
parse           8.38       Parse JSON/CSV columns
asOfJoin        8.39       Temporal join
lateralJoin     8.40       Correlated join
transpose       8.41       Swap rows and columns
groupingSets    8.42       Arbitrary grouping sets
describe        8.43       Descriptive statistics
crosstab        8.44       Frequency cross-tabulation
hint            8.45       Optimizer hints
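
The row-level effect of the first four transformations can be sketched in Python over a list of dictionaries, with None standing in for NULL. The function signatures are hypothetical, since this changelog does not list the transformations' parameters, and drop_na assumes a row is dropped when any of the listed columns is NULL.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def offset(rows: List[Row], n: int) -> List[Row]:
    """Skip the first n rows."""
    return rows[n:]


def tail(rows: List[Row], n: int) -> List[Row]:
    """Return the last n rows; the guard avoids rows[-0:] returning everything."""
    return rows[-n:] if n > 0 else []


def fill_na(rows: List[Row], defaults: Row) -> List[Row]:
    """Replace NULLs with a per-column default where one is given."""
    return [
        {col: defaults[col] if val is None and col in defaults else val
         for col, val in row.items()}
        for row in rows
    ]


def drop_na(rows: List[Row], cols: List[str]) -> List[Row]:
    """Drop rows in which any of the listed columns is NULL (assumed policy)."""
    return [row for row in rows if all(row.get(c) is not None for c in cols)]


if __name__ == "__main__":
    data = [{"id": 1, "v": None}, {"id": 2, "v": 5}, {"id": 3, "v": 7}]
    print(offset(data, 1))
    print(tail(data, 2))
    print(fill_na(data, {"v": 0}))
    print(drop_na(data, ["v"]))
```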

Enhanced Existing Transformations

  • union / intersect / except: byName (match columns by name) and allowMissingColumns (fill missing with NULL).
  • sample: lowerBound / upperBound (range-based sampling) and deterministicOrder.
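
How byName and allowMissingColumns interact on a union can be sketched in Python over lists of dictionaries. This is an assumption-laden illustration: union_by_name and columns_of are hypothetical helpers, None stands in for NULL, and raising an error on mismatched columns when allowMissingColumns is off is assumed rather than taken from the specification.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def columns_of(rows: List[Row]) -> List[str]:
    """Collect column names in first-seen order."""
    cols: List[str] = []
    for row in rows:
        for name in row:
            if name not in cols:
                cols.append(name)
    return cols


def union_by_name(left: List[Row], right: List[Row],
                  allow_missing_columns: bool = False) -> List[Row]:
    """Union rows by column name rather than by position."""
    left_cols, right_cols = columns_of(left), columns_of(right)
    if set(left_cols) != set(right_cols) and not allow_missing_columns:
        raise ValueError("column sets differ and allowMissingColumns is off")
    # Output schema: left columns first, then columns only the right side has.
    out_cols = left_cols + [c for c in right_cols if c not in left_cols]
    # row.get(c) yields None (NULL) for a column the row does not have.
    return [{c: row.get(c) for c in out_cols} for row in left + right]


if __name__ == "__main__":
    a = [{"id": 1, "name": "a"}]
    b = [{"name": "b", "id": 2}]  # same columns, different order
    c = [{"id": 3}]               # "name" is missing
    print(union_by_name(a, b))
    print(union_by_name(a, c, allow_missing_columns=True))
```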

v2.0 (2026-03-27)

Initial formal specification. Highlights:

  • RFC 2119 requirement levels, EBNF grammar, operator precedence.
  • 31 transformation types covering core SQL operations, reshaping, quality, and custom components.
  • SQL-like expression language with CASE, CAST, IS NULL, IN, BETWEEN, LIKE.
  • SQL three-valued null semantics.
  • Data quality suites with 10 check types and severity escalation.
  • Pipeline metadata, asset metadata, column-level metadata.
  • Exposures for downstream consumer declarations.
  • Variables (${VAR}), secrets ({{secrets.alias}}), config merging.
  • Streaming inputs/outputs with trigger and watermark support.
  • Lifecycle hooks (pre/post execution).
  • Error catalog with 30+ standardized error codes.
  • x- extension mechanism for forward compatibility.
  • JSON Schema (draft 2020-12) for document validation.
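
The three-valued null semantics mentioned above can be sketched in Python, with None standing in for the unknown (NULL) truth value. The function names are hypothetical; the truth tables are those of standard SQL.

```python
from typing import Optional

Tri = Optional[bool]  # True, False, or None for unknown


def sql_not(a: Tri) -> Tri:
    """NOT unknown is still unknown."""
    return None if a is None else not a


def sql_and(a: Tri, b: Tri) -> Tri:
    """False dominates AND; otherwise any unknown makes the result unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_or(a: Tri, b: Tri) -> Tri:
    """True dominates OR; otherwise any unknown makes the result unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


if __name__ == "__main__":
    for a in (True, False, None):
        for b in (True, False, None):
            print(a, b, sql_and(a, b), sql_or(a, b))
```

A filter keeps a row only when its predicate evaluates to true, so rows for which the predicate is unknown are excluded just like rows for which it is false.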