Version: Next

Changelog

v3.0 (2026-03-29)

Full alignment with Apache Spark Connect protocol (expressions.proto, types.proto, relations.proto).

Lambda expressions for higher-order functions: x -> x + 1, (acc, x) -> acc + x.
Star/wildcard expressions: *, asset.*.
Nested value access: struct.field, map['key'], array[0] — composable to arbitrary depth.
Inline window expressions: func() OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN ...).
TRY_CAST: returns NULL on failure instead of error.
String concatenation operator: ||.
Null-safe equality operator: <=>.
RLIKE for regex matching.
Named function arguments: func(key => value).
Typed literals: DATE '2025-01-01', TIMESTAMP '...', TIMESTAMP_NTZ '...', X'hex'.
Complex literals: ARRAY(...), MAP(...), STRUCT(...), NAMED_STRUCT(...).
INTERVAL literals: INTERVAL 30 DAYS, INTERVAL 1 YEAR 6 MONTHS.

Breaking Change

CAST now follows ANSI SQL — it raises an error on failure. Pipelines using CAST for lenient parsing should migrate to TRY_CAST.

Type	Description
`byte` / `tinyint`	8-bit signed integer
`short` / `smallint`	16-bit signed integer
`bigint`	Alias for `long`
`char(n)`	Fixed-length string
`varchar(n)`	Variable-length string with max length
`timestamp_ntz`	Timestamp without timezone
`time` / `time(p)`	Time of day
`interval year to month`	Year-month interval
`interval day to second`	Day-time interval
`variant`	Semi-structured data

Category	Count	Examples
Collection	24	`size`, `array_contains`, `explode`, `map_keys`, `array_sort`
Higher-order	9	`transform`, `filter`, `aggregate`, `exists`, `forall`
Struct	4	`struct`, `named_struct`, `with_field`, `drop_fields`
JSON	6	`from_json`, `to_json`, `get_json_object`, `parse_json`
Interval / temporal	13	`make_interval`, `extract`, `date_trunc`, `current_time`
Statistical	15	`stddev`, `corr`, `percentile`, `skewness`, `grouping`
Variant	6	`variant_get`, `try_variant_get`, `is_variant_null`

union / intersect / except: byName (match columns by name) and allowMissingColumns (fill missing with NULL).
sample: lowerBound / upperBound (range-based sampling) and deterministicOrder.

Initial formal specification. Highlights:

RFC 2119 requirement levels, EBNF grammar, operator precedence.
31 transformation types covering core SQL operations, reshaping, quality, and custom components.
SQL-like expression language with CASE, CAST, IS NULL, IN, BETWEEN, LIKE.
SQL three-valued null semantics.
Data quality suites with 10 check types and severity escalation.
Pipeline metadata, asset metadata, column-level metadata.
Exposures for downstream consumer declarations.
Variables (${VAR}), secrets ({{secrets.alias}}), config merging.
Streaming inputs/outputs with trigger and watermark support.
Lifecycle hooks (pre/post execution).
Error catalog with 30+ standardized error codes.
x- extension mechanism for forward compatibility.
JSON Schema (draft 2020-12) for document validation.