Senior AI Engineer building cost-aware AI evaluation tools, production LLM/RAG systems, and privacy-first data products.
I work across model development, evaluation, backend services, MLOps/LLMOps, Kubernetes deployment, monitoring, and business-impact optimization. My recent professional work includes ESA and EUMETSAT-related AI initiatives for satellite operations, multi-agent LLM workflows, synthetic QA generation for RAG evaluation, MLflow-based monitoring, and on-prem Kubernetes deployments with GitOps.
My current public work centers on two projects: Pangolin Eval, an open-source toolkit for LLM/RAG/agent evaluation, and Health Passport, a privacy-first product for wearable health data continuity.
- Cost-aware LLM, RAG, and agent evaluation
- AI quality, latency, reliability, and cost gates
- Production LLM systems, observability, and LLMOps
- Privacy-first product engineering for sensitive data
- Open-source tools for AI product teams, startup CTOs, and senior engineers
- Built and published pangolin-eval, an open-source Python CLI/library for evaluating LLM, RAG, and agent workloads on cost, latency, quality, and reliability, with pass/fail gates, RAG diagnostics, TraceCards, and OTel-style exports.
- Building Health Passport, a privacy-first iOS product for importing Fitbit/Google wearable data, preserving it locally, and writing supported records back to Apple Health with user permission.
- Improved synthetic QA generation quality from about 65% to about 80% by redesigning OCR integration, chunking, and LLM prompting workflows.
- Reduced token usage and latency by restructuring multi-step LLM generation pipelines and context management.
- Contributed to ESA and EUMETSAT AI initiatives across satellite health forecasting, telemetry anomaly detection, AI validation, and mission operations support.
- Reduced monthly cloud expenditure by about 35% (more than $32K) in a previous data science role through cloud and model infrastructure optimization.
Open-source toolkit for measuring and comparing LLM, RAG, and agent workloads across cost, latency, quality, and reliability.
Current scope:
- prompt/model comparison reports
- weighted evaluators and configurable token counters
- budget, quality, latency, and reliability gates
- synthetic RAG evaluation with context diagnostics
- local agent/workflow TraceCards
- Markdown, JSON, static HTML, and OTel-style exports
- OpenAI-compatible, LiteLLM, Ollama, and vLLM gateway examples
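To make the idea of weighted evaluators and gates concrete, here is a minimal, generic sketch in plain Python. It is not the pangolin-eval API: the names `RunMetrics`, `Gates`, `weighted_quality`, and `check_gates`, and all threshold values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Aggregated results for one evaluated prompt/model run (hypothetical shape)."""
    cost_usd: float
    p95_latency_s: float
    quality: float      # weighted score in [0, 1]
    error_rate: float   # fraction of failed calls in [0, 1]


@dataclass
class Gates:
    """Illustrative thresholds; a run passes only if every gate holds."""
    max_cost_usd: float
    max_p95_latency_s: float
    min_quality: float
    max_error_rate: float


def weighted_quality(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-evaluator scores into one score using normalized weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * w for name, w in weights.items()) / total


def check_gates(metrics: RunMetrics, gates: Gates) -> list[str]:
    """Return a human-readable message for each failed gate (empty list = pass)."""
    failures = []
    if metrics.cost_usd > gates.max_cost_usd:
        failures.append(f"cost {metrics.cost_usd:.2f} USD exceeds {gates.max_cost_usd:.2f} USD")
    if metrics.p95_latency_s > gates.max_p95_latency_s:
        failures.append(f"p95 latency {metrics.p95_latency_s:.2f}s exceeds {gates.max_p95_latency_s:.2f}s")
    if metrics.quality < gates.min_quality:
        failures.append(f"quality {metrics.quality:.3f} below {gates.min_quality:.3f}")
    if metrics.error_rate > gates.max_error_rate:
        failures.append(f"error rate {metrics.error_rate:.3f} exceeds {gates.max_error_rate:.3f}")
    return failures


if __name__ == "__main__":
    quality = weighted_quality(
        {"exact_match": 1.0, "faithfulness": 0.5},
        {"exact_match": 1.0, "faithfulness": 3.0},
    )
    run = RunMetrics(cost_usd=0.12, p95_latency_s=2.5, quality=quality, error_rate=0.0)
    limits = Gates(max_cost_usd=0.10, max_p95_latency_s=3.0, min_quality=0.6, max_error_rate=0.01)
    for message in check_gates(run, limits):
        print("GATE FAILED:", message)
```

In a CI setting, a non-empty failure list would typically map to a non-zero exit code so that a regression in cost, latency, quality, or reliability blocks the change.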
Privacy-first continuity layer for wearable health data. The iOS-first product imports supported Fitbit/Google wearable data, stores normalized records locally, and writes clean supported samples back to Apple Health with explicit permission.
Current scope:
- native SwiftUI iOS app with HealthKit integration
- shared TypeScript normalization, dedupe, and receipt rules
- local-first vault and sync receipt model
- backend skeleton for Pro accounts, encrypted backup, and AI relay boundaries
- public architecture and Xcode setup documentation
Product-style macOS maintenance CLI with dry-run-first safety, local memory, rules, profiles, hooks, and scriptable output.
RAG and LLM application experiments, including a PDF chat application using LangChain, FAISS, and OpenAI embeddings.
Python, FastAPI, LangChain, LlamaIndex, MLflow, Kubernetes, Docker, GitOps, Flux, Airflow, OpenTelemetry, AWS, GCP, PostgreSQL, MongoDB, Weaviate, PyTorch, TensorFlow, scikit-learn, Spark, Swift, SwiftUI, HealthKit, TypeScript, Node.js.
- Website: aidenerdogan.github.io
- LinkedIn: linkedin.com/in/aiden-erdogan
- GitHub: github.com/aidenerdogan

