Data Engineering · Intermediate · ⏱ 4 hours

Data Pipeline Agent Blueprint

Extracts, transforms, validates, and loads pipeline data, with a complete audit trail for every data access and transformation.

What This Blueprint Does

The Data Pipeline Agent manages ETL workflows: extracting data from source systems, transforming it, validating the output, and loading it into target stores. Every data access (every read, transform, and write) is logged through Sentrely.

This matters for compliance. When an auditor asks “who accessed customer records on March 15th and what happened to the data,” you have a timestamped, attributable answer.

Architecture (Four Isolated Stages)

Extractor — Read-only from source S3 buckets or databases
Transformer — Operates on staging area; no access to source or target
Validator — Checks data quality, flags issues for human review
Loader — Write-only to processed prefix; no source access

The Gateway enforces isolation between stages. The extractor cannot write to the target, and the loader cannot read from the source. This prevents one stage from corrupting data it does not own and establishes a clear chain of custody.
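The isolation rules can be pictured as a small access table. The sketch below is only an illustrative model, not Sentrely's API: the `STAGE_ACCESS` table and `is_allowed` function are hypothetical names, and the validator's read access to staging is an assumption, since the blueprint does not list its permissions.

```python
# Illustrative model of stage isolation; not the Gateway's actual implementation.
# Each stage maps to the (zone, operation) pairs it may perform.
STAGE_ACCESS = {
    "extractor": {("source", "read")},
    "transformer": {("staging", "read"), ("staging", "write")},
    "validator": {("staging", "read")},  # assumption: not stated in the blueprint
    "loader": {("staging", "read"), ("target", "write")},
}


def is_allowed(stage: str, zone: str, operation: str) -> bool:
    """Default-deny check: anything not listed for the stage is refused."""
    return (zone, operation) in STAGE_ACCESS.get(stage, set())


print(is_allowed("extractor", "target", "write"))  # False
print(is_allowed("loader", "source", "read"))      # False
print(is_allowed("transformer", "staging", "write"))  # True
```

Because the check is default-deny, an unknown stage or a misspelled zone is refused rather than silently permitted.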

Policy Configuration

project: acme-data
agent: data-pipeline

policies:
  # Extractor: read-only on raw data
  - aws:s3:GetObject on arn:aws:s3:::acme-data-lake/raw/*
  - aws:s3:ListBucket on arn:aws:s3:::acme-data-lake

  # Transformer: read/write staging only
  - aws:s3:GetObject on arn:aws:s3:::acme-data-lake/staging/*
  - aws:s3:PutObject on arn:aws:s3:::acme-data-lake/staging/*

  # Loader: read staging, write processed prefix only
  - aws:s3:PutObject on arn:aws:s3:::acme-data-lake/processed/*
  - aws:s3:GetObject on arn:aws:s3:::acme-data-lake/staging/*

  # Quality failures require human review
  - data:quality:override
    requires_approval: true
    approval_channel: slack:#data-quality

audit:
  include_row_counts: true
  include_schema_changes: true

notifications:
  on_complete: slack:#data-pipelines
  on_quality_issue: slack:#data-quality
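To see how rules of the form `<action> on <resource>` might be evaluated, here is a minimal sketch. It assumes that `*` behaves as a simple wildcard and uses Python's `fnmatch` as a stand-in; how the Gateway actually matches resources is not described here. `parse_rule` and `is_permitted` are hypothetical helper names. The `data:quality:override` rule is left out because it names no resource.

```python
from fnmatch import fnmatchcase

# Rule strings copied from the policy configuration above.
RULES = [
    "aws:s3:GetObject on arn:aws:s3:::acme-data-lake/raw/*",
    "aws:s3:ListBucket on arn:aws:s3:::acme-data-lake",
    "aws:s3:GetObject on arn:aws:s3:::acme-data-lake/staging/*",
    "aws:s3:PutObject on arn:aws:s3:::acme-data-lake/staging/*",
    "aws:s3:PutObject on arn:aws:s3:::acme-data-lake/processed/*",
]


def parse_rule(rule):
    """Split '<action> on <resource>' into an (action, resource pattern) pair."""
    action, _, resource = rule.partition(" on ")
    return action.strip(), resource.strip()


def is_permitted(action, resource, rules=RULES):
    """Return True if any rule grants `action` on `resource` (default deny)."""
    for rule in rules:
        allowed_action, pattern = parse_rule(rule)
        if action == allowed_action and fnmatchcase(resource, pattern):
            return True
    return False


# Hypothetical object keys, used only to exercise the rules.
raw_key = "arn:aws:s3:::acme-data-lake/raw/orders.csv"
print(is_permitted("aws:s3:GetObject", raw_key))  # True
print(is_permitted("aws:s3:PutObject", raw_key))  # False
```

Note that nothing in this rule set allows a write under `raw/` or a read under `processed/`, which is what keeps source data immutable from the pipeline's point of view.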

Data Quality Gates

The validator checks every batch:

Schema conformance — Columns match expected schema, types are correct
Completeness — Required fields populated, null rates within bounds
Referential integrity — Foreign keys resolve, no orphan records
Anomaly detection — Volume and field distributions within normal bounds
Business rules — Custom checks specific to your domain

On failure, the pipeline pauses and posts a quality report to Slack with approve/reject/investigate options.
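The gates above could look roughly like the following in plain Python. This is a sketch under stated assumptions: the schema, the 5% null-rate bound, the volume tolerance, and the negative-amount business rule are all invented for illustration, and `validate_batch` and `gate` are hypothetical names rather than part of the blueprint. Anomaly detection is reduced to a row-count check.

```python
# Hypothetical expectations for one batch; replace with your own.
EXPECTED_SCHEMA = {"order_id": int, "customer_id": int, "amount": float}
MAX_NULL_RATE = 0.05


def validate_batch(rows, known_customer_ids, expected_rows, volume_tolerance=0.5):
    """Return a list of failure messages; an empty list means the batch passes."""
    failures = []

    # Schema conformance: column names and value types.
    for i, row in enumerate(rows):
        if set(row) != set(EXPECTED_SCHEMA):
            failures.append(f"schema: row {i} has columns {sorted(row)}")
            continue
        for col, typ in EXPECTED_SCHEMA.items():
            if row[col] is not None and not isinstance(row[col], typ):
                failures.append(f"schema: row {i} column {col!r} is not {typ.__name__}")

    # Completeness: null rate per column must stay within bounds.
    for col in EXPECTED_SCHEMA:
        nulls = sum(1 for row in rows if row.get(col) is None)
        if rows and nulls / len(rows) > MAX_NULL_RATE:
            failures.append(f"completeness: {col!r} null rate {nulls / len(rows):.0%}")

    # Referential integrity: every customer_id must resolve.
    for i, row in enumerate(rows):
        cid = row.get("customer_id")
        if cid is not None and cid not in known_customer_ids:
            failures.append(f"integrity: row {i} references unknown customer {cid}")

    # Anomaly detection, reduced here to a volume check.
    if abs(len(rows) - expected_rows) > volume_tolerance * expected_rows:
        failures.append(f"anomaly: {len(rows)} rows, expected about {expected_rows}")

    # Business rule (hypothetical): amounts must not be negative.
    for i, row in enumerate(rows):
        amount = row.get("amount")
        if isinstance(amount, float) and amount < 0:
            failures.append(f"business: row {i} has negative amount")

    return failures


def gate(rows, known_customer_ids, expected_rows):
    """Pause the pipeline when any check fails; otherwise let the batch through."""
    failures = validate_batch(rows, known_customer_ids, expected_rows)
    return ("paused", failures) if failures else ("passed", [])
```

In this sketch a `"paused"` result carries the failure list, which is the material a quality report for human review would be built from.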

Compliance Value

Every operation is logged with a timestamp, agent identity, operation, the specific files accessed, row counts, schema version, and outcome. This record addresses the data access logging requirements of SOC 2, HIPAA, and GDPR.
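As a rough illustration of why these fields matter, the sketch below models one audit entry and answers the auditor's question of who touched customer records on a given day. The `AuditRecord` class, the `accesses_on` helper, and the sample entries are all hypothetical; Sentrely's actual log format and query interface are not shown in this blueprint.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One logged operation, using the fields listed above (names are assumed)."""
    timestamp: datetime
    agent: str
    operation: str
    files: tuple[str, ...]
    row_count: int
    schema_version: str
    outcome: str


def accesses_on(records, day, path_fragment):
    """Return records from `day` that touched a file whose path contains `path_fragment`."""
    return [
        r for r in records
        if r.timestamp.date() == day
        and any(path_fragment in f for f in r.files)
    ]


# Invented sample entries, for illustration only.
log = [
    AuditRecord(datetime(2024, 3, 15, 9, 30, tzinfo=timezone.utc), "data-pipeline",
                "aws:s3:GetObject", ("raw/customers/part-0001.csv",), 1200, "v3", "success"),
    AuditRecord(datetime(2024, 3, 16, 9, 30, tzinfo=timezone.utc), "data-pipeline",
                "aws:s3:GetObject", ("raw/orders/part-0001.csv",), 800, "v3", "success"),
]

for record in accesses_on(log, date(2024, 3, 15), "customers"):
    print(record.timestamp.isoformat(), record.agent, record.operation, record.outcome)
```

Keeping the record immutable (`frozen=True`) mirrors the intent of an audit trail: entries are appended and queried, never edited.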


Deploy this blueprint

Get this agent running in 4 hours with Sentrely's managed control plane.
