Backends Overview

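Teckel ships with four backends, each suited to a different deployment scenario. All implement the same Backend trait, so the same pipeline YAML runs on any of them, provided it uses only transforms that the chosen backend supports (see the Transform Support Matrix).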
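To illustrate how one trait can sit in front of several execution engines, here is a minimal sketch. The real Teckel `Backend` trait is not shown on this page, so every name and signature below (`Step`, `Table`, `EagerBackend`, `PlanBackend`, `run_pipeline`) is an assumption made for illustration only. The associated `Frame` type mirrors the "DataFrame type" row of the comparison table: each backend keeps its own representation while the runner stays generic.

```rust
// Hypothetical sketch: none of these names come from Teckel itself.

/// A tiny stand-in for a pipeline step parsed from YAML.
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    Select(Vec<String>),
    Limit(usize),
}

/// Each backend picks its own frame representation via an associated type.
pub trait Backend {
    type Frame;
    fn name(&self) -> &'static str;
    fn apply(&self, frame: Self::Frame, step: &Step) -> Result<Self::Frame, String>;
}

/// Toy in-process table: integer cells only, to keep the sketch short.
#[derive(Debug, Clone, PartialEq)]
pub struct Table {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<i64>>,
}

/// Executes every step immediately, like an in-process engine.
pub struct EagerBackend;

impl Backend for EagerBackend {
    type Frame = Table;

    fn name(&self) -> &'static str {
        "eager"
    }

    fn apply(&self, frame: Table, step: &Step) -> Result<Table, String> {
        match step {
            Step::Select(cols) => {
                // Resolve each requested column to its index; fail on unknown names.
                let mut idx = Vec::with_capacity(cols.len());
                for c in cols {
                    let i = frame
                        .columns
                        .iter()
                        .position(|x| x == c)
                        .ok_or_else(|| format!("unknown column: {c}"))?;
                    idx.push(i);
                }
                let rows: Vec<Vec<i64>> = frame
                    .rows
                    .iter()
                    .map(|r| idx.iter().map(|&i| r[i]).collect())
                    .collect();
                Ok(Table { columns: cols.clone(), rows })
            }
            Step::Limit(n) => {
                let mut frame = frame;
                frame.rows.truncate(*n);
                Ok(frame)
            }
        }
    }
}

/// Only records what it would run, like a handle to a plan executed elsewhere.
pub struct PlanBackend;

impl Backend for PlanBackend {
    type Frame = Vec<String>;

    fn name(&self) -> &'static str {
        "plan"
    }

    fn apply(&self, mut plan: Vec<String>, step: &Step) -> Result<Vec<String>, String> {
        plan.push(match step {
            Step::Select(cols) => format!("SELECT {}", cols.join(", ")),
            Step::Limit(n) => format!("LIMIT {n}"),
        });
        Ok(plan)
    }
}

/// The runner is written once and works with any backend.
pub fn run_pipeline<B: Backend>(
    backend: &B,
    input: B::Frame,
    steps: &[Step],
) -> Result<B::Frame, String> {
    steps
        .iter()
        .try_fold(input, |frame, step| backend.apply(frame, step))
}
```

Calling `run_pipeline(&EagerBackend, table, &steps)` and `run_pipeline(&PlanBackend, Vec::new(), &steps)` with the same `steps` slice exercises both toy backends through the same code path, which is the property the shared trait is meant to provide.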

Architecture

Backend Comparison

| Feature | DataFusion | Polars | Spark Connect | Remote |
|---|---|---|---|---|
| Execution model | In-process | In-process | Distributed | Delegated (gRPC) |
| DataFrame type | Arrow-native | Polars-native | Spark handle | Opaque handle |
| Best for | Dev/test, single-machine | Small-medium local | Production clusters | Multi-worker |
| SQL support | Full (via SessionContext) | Via SQLContext | Full (via SparkSession) | Depends on worker |
| Lazy evaluation | Yes (logical plans) | Yes (LazyFrame) | Yes (unresolved plans) | N/A |
| Install overhead | None (pure Rust) | None (pure Rust) | Requires Spark cluster | Requires teckel-worker |

Transform Support Matrix

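All 45 Teckel v3.0 transforms are supported, though not on every backend. The table below highlights where the DataFusion, Polars, and Spark Connect backends diverge, including the few transforms that a backend does not implement: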

| Transform | DataFusion | Polars | Spark Connect |
|---|---|---|---|
| Select | Native API | Lazy + expr | df.select() |
| Where | Native API | Lazy + expr | df.filter() |
| GroupBy | Native API | Lazy + agg | SQL via temp view |
| OrderBy | Native sort | Lazy sort | df.sort() |
| Join | Filter-based | SQL (SQLContext) | SQL via temp views |
| Union | Native API | concat() | df.union() / df.union_all() |
| Intersect | Native API | SQL | df.intersect() |
| Except | Native API | SQL | df.except_all() |
| Distinct | Native API | unique_stable() | df.distinct() |
| Limit | Native API | head() | df.limit() |
| AddColumns | with_column() | Lazy with_column() | df.with_columns() |
| DropColumns | drop_columns() | drop_many() | df.drop() |
| RenameColumns | with_column_renamed() | rename() | with_columns_renamed() |
| CastColumns | SQL CAST | Lazy cast() | SQL CAST |
| Window | SQL window functions | SQL (SQLContext) | SQL window functions |
| Pivot | Conditional aggregation | SQL (SQLContext) | Spark PIVOT SQL |
| Unpivot | SQL UNION ALL | SQL UNION ALL | df.unpivot() native |
| Flatten | SQL + unnest | N/A | SQL struct access |
| Sample | Random filter | sample_n_literal() | df.sample() native |
| Conditional | CASE WHEN expr | SQL CASE WHEN | SQL CASE WHEN |
| Sql | Full SQL | SQLContext | Full SQL |
| Rollup | SQL GROUP BY ROLLUP | SQL | SQL GROUP BY ROLLUP |
| Cube | SQL GROUP BY CUBE | SQL | SQL GROUP BY CUBE |
| Scd2 | Complex SQL | Complex SQL | Complex SQL |
| Repartition | No-op (single machine) | No-op | df.repartition() |
| Coalesce | No-op (single machine) | No-op | df.coalesce() |
| Merge | Not supported | Not supported | Spark MERGE INTO |
| Parse | SQL-based | Not supported | SQL-based |

Note: Merge is supported only on the Spark Connect backend because it requires Delta Lake or a similar table format with MERGE INTO semantics. DataFusion and Polars operate on file-based DataFrames, where in-place mutation is not possible.
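Because a few transforms are unavailable on some backends, it can help to validate a pipeline against the chosen backend before running it. The following is a minimal sketch of such a check, not Teckel's actual API: the `BackendKind` and `Transform` enums and both functions are hypothetical, only a handful of transforms are modeled, and the gaps encoded are the ones listed in the matrix (treating the "N/A" entry for Flatten on Polars as unsupported is an assumption).

```rust
// Hypothetical sketch: these types are illustrative, not part of Teckel.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendKind {
    DataFusion,
    Polars,
    SparkConnect,
}

/// A small subset of the transforms, enough to show the gaps in the matrix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transform {
    Select,
    Flatten,
    Merge,
    Parse,
}

/// Encodes the unsupported cells of the matrix; everything else is allowed.
pub fn is_supported(backend: BackendKind, transform: Transform) -> bool {
    use BackendKind::*;
    use Transform::*;
    match (backend, transform) {
        // Merge needs MERGE INTO semantics, available only via Spark Connect.
        (DataFusion, Merge) | (Polars, Merge) => false,
        // Assumption: "N/A" for Flatten on Polars means unsupported.
        (Polars, Flatten) | (Polars, Parse) => false,
        _ => true,
    }
}

/// Returns an error naming the first transform the backend cannot run.
pub fn check_pipeline(backend: BackendKind, transforms: &[Transform]) -> Result<(), String> {
    match transforms.iter().find(|t| !is_supported(backend, **t)) {
        Some(t) => Err(format!("{t:?} is not supported on {backend:?}")),
        None => Ok(()),
    }
}
```

A check like this fails fast with a readable message, for example when a pipeline containing Merge is pointed at Polars, instead of surfacing the problem midway through execution.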

Choosing a Backend

  • Development and testing: Use DataFusion (default). Zero setup, fast compilation, good SQL support.
  • Small datasets with complex transforms: Use Polars for its efficient lazy evaluation and memory management.
  • Production at scale: Use Spark Connect to execute on an existing Spark cluster.
  • Multi-worker deployment: Use Remote to distribute work across teckel-worker instances.
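The guidance above can be written down as a small lookup. This is an illustrative sketch only: `BackendChoice`, `Scenario`, and `recommend` are hypothetical names, not Teckel types, and the `Default` implementation simply reflects the statement that DataFusion is the default.

```rust
// Hypothetical sketch: illustrative names, not part of Teckel.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendChoice {
    DataFusion,
    Polars,
    SparkConnect,
    Remote,
}

impl Default for BackendChoice {
    /// DataFusion is described as the default backend.
    fn default() -> Self {
        BackendChoice::DataFusion
    }
}

/// The four deployment scenarios from the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scenario {
    DevelopmentOrTesting,
    SmallDataComplexTransforms,
    ProductionAtScale,
    MultiWorker,
}

/// Maps each scenario to the backend recommended for it.
pub fn recommend(scenario: Scenario) -> BackendChoice {
    match scenario {
        Scenario::DevelopmentOrTesting => BackendChoice::DataFusion,
        Scenario::SmallDataComplexTransforms => BackendChoice::Polars,
        Scenario::ProductionAtScale => BackendChoice::SparkConnect,
        Scenario::MultiWorker => BackendChoice::Remote,
    }
}
```

Because the `match` is exhaustive, adding a new scenario to the enum forces the mapping to be updated at compile time.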