Skip to main content
Version: Next

Check Types

Teckel defines 10 check types covering different quality dimensions. Each check has a type field that determines how validation is performed.


Schema

Validates the structural shape of the dataset: which columns must exist, which must not, and expected data types.

- type: schema
columns:
required: [id, name, email]
forbidden: [ssn, credit_card]
types:
id: integer
amount: double
FieldTypeRequiredDescription
columns.requiredList[string]NoColumns that MUST be present in the dataset.
columns.forbiddenList[string]NoColumns that MUST NOT be present. Supports glob patterns (e.g., pii_*).
typesMap[string, string]NoExpected data types per column.

Completeness

Validates that required values are present (non-null) in a column.

- type: completeness
column: email
threshold: 0.95
FieldTypeRequiredDefaultDescription
columnColumnYesColumn to check.
thresholddoubleNo1.0Minimum fraction of non-null values (0.0 to 1.0).

A threshold of 1.0 means all values must be non-null. A threshold of 0.95 allows up to 5% nulls.


Uniqueness

Validates that values are unique across the dataset. Supports composite keys.

- type: uniqueness
columns: [customer_id]
# Composite uniqueness
- type: uniqueness
columns: [order_id, line_item_id]
threshold: 1.0
FieldTypeRequiredDefaultDescription
columnsNonEmptyList[Column]YesColumns forming the uniqueness key.
thresholddoubleNo1.0Minimum fraction of unique values.

Validity

Validates that values conform to expected formats, ranges, or sets. This is the most versatile check type, supporting five validation modes.

Accepted Values

Restrict a column to a fixed set of allowed values.

- type: validity
column: status
acceptedValues: [active, inactive, pending]

Range

Enforce numeric bounds. Use strictMin and strictMax for exclusive bounds.

- type: validity
column: amount
range:
min: 0
max: 1000000
strictMin: true # > 0, not >= 0

RangeSpec fields:

FieldTypeDefaultDescription
minnumberMinimum value (inclusive by default).
maxnumberMaximum value (inclusive by default).
strictMinbooleanfalseUse exclusive lower bound (> instead of >=).
strictMaxbooleanfalseUse exclusive upper bound (< instead of <=).

Pattern

Validate values against a regular expression.

- type: validity
column: email
pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"

Format

Validate values against a built-in format validator.

- type: validity
column: phone
format: phone

Length

Enforce string length bounds (inclusive).

- type: validity
column: code
lengthBetween: [3, 10]

Validity Fields Summary

FieldTypeRequiredDefaultDescription
columnColumnYesColumn to validate.
acceptedValuesList[string]NoAllowed values (enumeration).
rangeRangeSpecNoNumeric range constraint.
patternstringNoRegular expression pattern.
formatstringNoBuilt-in format name.
lengthBetween[integer, integer]No[min, max] string length (inclusive).
thresholddoubleNo1.0Minimum fraction of valid values.

Built-in Format Validators

Implementations MUST support these format validators:

FormatDescription
emailRFC 5322 email address.
uuidUUID (any version).
urlValid URL with scheme.
ipv4IPv4 address.
ipv6IPv6 address.
dateISO 8601 date (YYYY-MM-DD).
timestampISO 8601 timestamp.
phoneInternational phone number format.

Statistical

Validates statistical properties of numeric columns: mean, standard deviation, min, max, sum, and quantiles.

- type: statistical
column: salary
mean:
between: [40000, 120000]
stdev:
max: 50000
quantiles:
0.25: { min: 30000 }
0.50: { between: [45000, 80000] }
0.75: { max: 150000 }
FieldTypeRequiredDescription
columnColumnYesNumeric column to analyze.
meanBoundSpecNoExpected mean bounds.
minBoundSpecNoExpected minimum value bounds.
maxBoundSpecNoExpected maximum value bounds.
sumBoundSpecNoExpected sum bounds.
stdevBoundSpecNoExpected standard deviation bounds.
quantilesMap[double, BoundSpec]NoQuantile bounds. Key is the percentile (0.0 to 1.0).

BoundSpec fields:

FieldTypeDescription
minnumberMinimum expected value.
maxnumberMaximum expected value.
between[number, number][min, max] range (inclusive).

Volume

Validates the size of the dataset by row count and/or column count.

- type: volume
rowCount:
between: [1000, 100000]
FieldTypeRequiredDescription
rowCountBoundSpecNoExpected row count bounds.
columnCountBoundSpecNoExpected column count bounds.

Freshness

Validates data recency by checking a timestamp column against the current time.

- type: freshness
column: updated_at
maxAge: "PT24H"
FieldTypeRequiredDescription
columnColumnYesTimestamp column to check.
maxAgestringYesISO 8601 duration. The maximum value in the column must be within this duration of the current time.

Common durations:

  • PT1H — 1 hour
  • PT24H — 24 hours
  • P1D — 1 day
  • P7D — 7 days

Referential

Validates referential integrity — that values in a column exist in another asset (foreign key relationship).

- type: referential
column: customer_id
reference:
asset: customers
column: id
threshold: 1.0
FieldTypeRequiredDefaultDescription
columnColumnYesColumn in the target asset.
reference.assetAssetRefYesThe referenced asset.
reference.columnColumnYesColumn in the referenced asset.
thresholddoubleNo1.0Minimum fraction of values that must exist in the reference.

Cross-Column

Validates relationships between columns within the same dataset.

- type: crossColumn
condition: "start_date <= end_date"
description: "Start date must not be after end date"
FieldTypeRequiredDefaultDescription
conditionConditionYesBoolean expression referencing columns in the target asset.
descriptionstringNoHuman-readable description.
thresholddoubleNo1.0Minimum fraction of rows satisfying the condition.

Custom

Validates using an arbitrary boolean expression evaluated per row.

- type: custom
condition: "amount > 0 AND currency IS NOT NULL"
description: "All transactions must have positive amount and currency"
FieldTypeRequiredDefaultDescription
conditionConditionYesBoolean expression evaluated per row.
descriptionstringNoHuman-readable description.
thresholddoubleNo1.0Minimum fraction of rows passing.

Check Result Model

Each check execution produces a result object with the following fields:

FieldTypeDescription
suitestringSuite name.
check_typestringCheck type identifier.
targetstringAsset name.
columnstring?Column name (if applicable).
statusstringOutcome: "pass", "warn", "fail", or "error".
severitystringConfigured severity level.
observed_valueanyComputed metric value.
expectedstringThreshold or assertion description.
failing_rowsinteger?Number of rows failing (for row-level checks).
failing_percentdouble?Percentage of rows failing.
descriptionstring?Check description.
timestampstringISO 8601 execution timestamp.

Comprehensive Example

The following suite combines multiple check types to validate an orders dataset end-to-end.

quality:
- suite: orders-full-validation
description: "End-to-end quality validation for the orders dataset"
target: orders
severity: error
checks:
# 1. Schema: enforce required columns and types
- type: schema
columns:
required: [order_id, customer_id, amount, status, created_at]
forbidden: [internal_id, debug_flag]
types:
order_id: string
amount: double
created_at: timestamp

# 2. Completeness: critical fields must not be null
- type: completeness
column: customer_id
threshold: 1.0

- type: completeness
column: email
threshold: 0.95
severity: warn

# 3. Uniqueness: order_id is the primary key
- type: uniqueness
columns: [order_id]

# 4. Validity: status must be from known values
- type: validity
column: status
acceptedValues: [pending, confirmed, shipped, delivered, cancelled]

# 5. Validity: amount must be positive
- type: validity
column: amount
range:
min: 0
strictMin: true

# 6. Validity: email format
- type: validity
column: email
format: email
threshold: 0.99
severity: warn

# 7. Statistical: amount distribution sanity
- type: statistical
column: amount
mean:
between: [50, 500]
quantiles:
0.99: { max: 10000 }

# 8. Volume: expected dataset size
- type: volume
rowCount:
between: [100, 10000000]

# 9. Freshness: data should be recent
- type: freshness
column: created_at
maxAge: "PT48H"

# 10. Referential: customer_id must exist in customers
- type: referential
column: customer_id
reference:
asset: customers
column: id

# 11. Cross-column: business logic
- type: crossColumn
condition: "shipped_at IS NULL OR shipped_at >= created_at"
description: "Shipping date must be after order creation"

# 12. Custom: composite business rule
- type: custom
condition: "amount > 0 AND currency IS NOT NULL"
description: "All transactions must have positive amount and currency"
threshold: 0.99
severity: warn