I build cloud-native data platforms: ingestion → warehouse → transformation → activation, with infrastructure as code, keyless CI/CD, and observability baked in from day one.
This is a four-part GCP data-engineering portfolio of escalating complexity. Each project is a standalone, tested, Terraform-deployed repo with green CI, using only public or mock data.

| Project | What it shows | Level |
|---|---|---|
| SkyCast | Scheduled API → BigQuery → dbt ELT (Cloud Functions, Cloud Run, Workflows) | Beginner |
| PulseStream | Real-time event streaming — Pub/Sub adapter/mapper/redrive with dead-letter handling | Intermediate |
| TripLake | Cost-efficient BigQuery lakehouse over NYC taxi data — external tables + incremental dbt MERGE | Intermediate |
| OmniPipe | End-to-end governed platform — Datastream CDC, Cloud Workflows, PII governance, reverse-sync, multi-env Terraform | Advanced |

Together they cover scheduled, streaming, batch, and CDC ingestion; dbt transformation; data governance; reverse-ETL; reusable IaC + CI/CD; and observability.

- Cloud: Google Cloud (BigQuery, Cloud Run, Cloud Functions, Pub/Sub, Cloud Workflows, Datastream, Cloud SQL, Cloud Storage, Secret Manager, Data Catalog)
- Data: dbt, SQL, incremental models, partitioning/clustering, PII governance
- Languages: Python (ruff, pytest), SQL
- Platform: Terraform, GitHub Actions (Workload Identity Federation), Docker
