Skip to main content

Architecture Overview

The Teckel Engine is organized into a layered architecture that separates pipeline orchestration from execution. This design enables pluggable backends: the same pipeline definition can run on DataFusion locally or on a Spark cluster remotely.

Architecture

Design Principles

1. Backend Trait for Pluggable Execution

The core abstraction is the Backend trait in teckel-engine. It defines three operations -- read_input, write_output, and apply -- with an associated DataFrame type that is opaque to the engine. Each backend chooses its own representation:

  • DataFusion: datafusion::DataFrame (Arrow-native, in-process)
  • Polars: polars::DataFrame (lazy evaluation, in-process)
  • Spark Connect: spark_connect_rs::DataFrame (remote, distributed)
  • Remote: RemoteRef (opaque handle ID on a gRPC worker)

2. Engine-Agnostic DAG Execution

The PipelineExecutor in teckel-engine walks the DAG without knowing which backend is in use. It receives a Backend as a generic parameter and delegates all data operations to it. This means adding a new backend requires only implementing the trait -- no changes to the executor.

3. Wave-Parallel Scheduling

The PipelineDag computes execution waves: groups of assets that have no mutual dependencies. Assets within a wave are executed concurrently using tokio::spawn, while waves are processed sequentially. This achieves maximal parallelism while respecting data dependencies.

4. Shared DataFrame Cache

Computed DataFrames are cached in an Arc<RwLock<BTreeMap<String, B::DataFrame>>> so that shared dependencies are only computed once. When asset C depends on both A and B, and A and B are in the same wave, C will find both results in the cache when its wave runs.

Module Layout

teckel-engine

FilePurpose
backend.rsBackend trait with read_input, write_output, apply
dag.rsPipelineDag using petgraph -- topological sort and wave scheduling
executor.rsPipelineExecutor<B> -- wave-parallel DAG walker
dry_run.rsexplain() -- generates human-readable plan without executing

teckel-api

High-level entry points that wire together the parser and engine:

  • etl(yaml, variables) -- parse and execute with default DataFusion backend
  • etl_with(yaml, variables, backend) -- parse and execute with any backend
  • explain(yaml, variables) -- parse and generate execution plan
  • validate(yaml, variables) -- parse only, check for errors