84 changes: 84 additions & 0 deletions CODE_OF_CONDUCT.md
@@ -0,0 +1,84 @@
# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.

Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement through a private GitHub Security Advisory on this repository, or directly to the maintainers (@d3v07, @Frex22). All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series of actions.

**Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within the community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.1, available at [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].

Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder][Mozilla CoC].

For answers to common questions about this code of conduct, see the FAQ at [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at [https://www.contributor-covenant.org/translations][translations].

[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations

41 changes: 41 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,41 @@
# Contributing

Thanks for your interest in the Evidence-Driven DBRE Agent.

## Development setup

```bash
uv sync --dev
cp .env.example .env # fill MongoDB + (prod) Vertex values
cd dashboard && npm install
```

## Workflow

- **Every change starts as an issue**, then a branch (`feat/…`, `fix/…`, `docs/…`), then a PR.
No direct commits to `main`.
- **Tests must pass before a PR merges**, and code review is required on every PR.
- Commits reference issues where applicable: `feat: description (Closes #N)`.

## Checks (run before opening a PR)

```bash
uv run ruff format --check . && uv run ruff check . # Python format + lint
uv run pytest -q # unit + contract (live tests auto-skip with no DB)
cd dashboard && npm run lint && npm run build # dashboard typecheck + build
```

## Conventions

- Python: PEP 8, type annotations, `ruff` (line length 100). TypeScript: immutable updates,
complete effect dependency arrays.
- **`EvidencePack` v1 is frozen** (`contracts/`) — additive-only, and only with explicit review.
- **Agents stay read-only.** Mutation happens only in the deterministic controller, after a
hash-bound human approval.
- **No secrets in code** — `.env` locally, Secret Manager in production. Never commit `.env` or keys.

## Tests

- Unit + contract tests run offline. Live integration tests are gated on a MongoDB connection
string and skip without one.
- New behavior needs a test; security-critical code (auth) is held to 100% coverage.
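
The gating described above can be sketched with the standard library alone. This is a minimal illustration, not the repository's actual test setup: the environment variable name `MONGODB_URI`, the helper `live_uri`, and the test class are all hypothetical.

```python
import os
import unittest
from typing import Mapping, Optional

# Hypothetical variable name; the project's real setting may differ.
LIVE_URI_ENV = "MONGODB_URI"


def live_uri(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the connection string, or None when it is unset or blank."""
    value = env.get(LIVE_URI_ENV, "").strip()
    return value or None


@unittest.skipUnless(live_uri(), f"{LIVE_URI_ENV} not set; skipping live integration tests")
class LiveConnectionStringTest(unittest.TestCase):
    """Runs only when a connection string is present in the environment."""

    def test_uri_has_mongodb_scheme(self) -> None:
        uri = live_uri()
        self.assertIsNotNone(uri)
        self.assertTrue(uri.startswith(("mongodb://", "mongodb+srv://")))
```

`pytest` collects `unittest.TestCase` classes and honors their skip decorators, so a class like this would be reported as skipped when no connection string is set. A pytest-native `pytest.mark.skipif` marker would be an equivalent way to express the same gate.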
200 changes: 157 additions & 43 deletions README.md
@@ -1,41 +1,107 @@
# Evidence-Driven DBRE Agent

Two personas, one MongoDB performance loop:

- **Users** run real query workloads against a live Atlas collection from a guided console. Each
query's real `explain` evidence is captured and attributed to whoever ran it.
- A **DBRE** triages the *actual* slowest captured queries — ranked by explain evidence (blocking
sort, collection scan, over-scan ratio), not wall-clock — diagnoses one, and approves an
ESR-correct index fix. The controller applies it behind a hash-bound human gate, then verifies it.
> A MongoDB reliability agent that watches real query workloads, catches the slow ones with
> hard `explain` evidence, and won't touch an index until a human approves the exact hash it
> reviewed — then proves the fix worked.

![Python](https://img.shields.io/badge/Python-3.12-3776AB?logo=python&logoColor=white)
![FastAPI](https://img.shields.io/badge/FastAPI-read%20API-009688?logo=fastapi&logoColor=white)
![Next.js](https://img.shields.io/badge/Next.js-console-000000?logo=nextdotjs&logoColor=white)
![Gemini](https://img.shields.io/badge/Gemini-Vertex%20AI-4285F4?logo=googlecloud&logoColor=white)
![Agent Engine](https://img.shields.io/badge/Vertex%20AI-Agent%20Engine-4285F4?logo=googlecloud&logoColor=white)
![MongoDB](https://img.shields.io/badge/MongoDB-Atlas-47A248?logo=mongodb&logoColor=white)
![MCP](https://img.shields.io/badge/MCP-mongodb--mcp--server-FF6D00)
![Cloud Run](https://img.shields.io/badge/Cloud%20Run-deploy-4285F4?logo=googlecloud&logoColor=white)
![License](https://img.shields.io/badge/license-Apache--2.0-D22128)

Two personas, one performance loop: **users** run real queries against a live collection; a
**DBRE** triages the *actual* slowest ones, and a Gemini-powered agent proposes an ESR-correct
index that a deterministic controller applies and verifies — behind a hash-bound human gate.

## Table of contents

- [The idea](#the-idea)
- [What makes it different](#what-makes-it-different)
- [The two-persona flow](#the-two-persona-flow)
- [Architecture](#architecture)
- [The remediation lifecycle](#the-remediation-lifecycle)
- [Partner integration — MongoDB via MCP](#partner-integration--mongodb-via-mcp)
- [Safety model](#safety-model)
- [Tech stack](#tech-stack)
- [Repo layout](#repo-layout)
- [Getting started](#getting-started)
- [The demo walk](#the-demo-walk)
- [License](#license)

---

## The idea

Slow MongoDB queries are usually a missing or mis-ordered index. The fix is risky (a wrong index
makes it worse), the diagnosis is fiddly (Equality → Sort → Range key order), and the audit trail
is thin (who approved what, and did it actually help?). This project turns that into a safe,
evidence-driven loop: **agents recommend, deterministic code decides, humans approve, and
verification proves.**

## What makes it different

| Most "AI DBA" demos | This |
|---|---|
| One canned query, forever | Real user workload, captured and attributed per user |
| The LLM applies the change | The LLM only *recommends*; deterministic code decides and mutates |
| "Trust the model" | A **hash-bound human approval** is required before any index is created |
| "It looks faster" | A re-`explain` proves the blocking SORT is gone and docs-examined dropped |
| A chat box | A multi-step task: detect → diagnose → approve → apply → verify |

## The two-persona flow

```mermaid
flowchart LR
U["User (Dev / Aakash)"] -->|guided query| C[Workload Console]
C -->|"explain (read-only)"| API[(Read API · Cloud Run)]
API -->|capture, attributed| QL[(query_log)]
D[DBRE] -->|triage| Q["Slow-Query Queue<br/>ranked by evidence"]
Q -->|Diagnose| AE["Agent Engine roles<br/>+ deterministic ESR"]
AE --> PACK[DIAGNOSED EvidencePack]
PACK -->|hash-bound Approve| APV[apply index + re-explain]
APV --> V[VERIFIED]
```

There is no hardcoded demo query: the queries the DBRE fixes are the ones users really ran.
A user runs a guided, read-only query; its real `explain` evidence is captured to `query_log`,
attributed to them. The DBRE sees the slowest captured queries — ranked by **evidence** (blocking
sort, collection scan, docs-examined-to-returned ratio), not wall-clock — picks one, and drives it
through diagnosis to a verified fix.
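
One way the evidence-based ranking could look is sketched below. The README names the three signals but not how they are combined, so the priority order used here (blocking sort, then collection scan, then over-scan ratio) is an assumption, as are the `CapturedQuery` type and its field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapturedQuery:
    """Hypothetical shape of one captured query; field names are assumptions."""

    query_id: str
    has_blocking_sort: bool
    is_collection_scan: bool
    docs_examined: int
    docs_returned: int


def over_scan_ratio(q: CapturedQuery) -> float:
    # Guard against division by zero when a query returns no documents.
    return q.docs_examined / max(q.docs_returned, 1)


def evidence_rank_key(q: CapturedQuery) -> tuple[bool, bool, float]:
    # Tuples compare left to right, so earlier signals dominate later ones.
    return (q.has_blocking_sort, q.is_collection_scan, over_scan_ratio(q))


def rank_by_evidence(queries: list[CapturedQuery]) -> list[CapturedQuery]:
    """Worst evidence first; wall-clock time is deliberately not an input."""
    return sorted(queries, key=evidence_rank_key, reverse=True)
```

Ranking on plan evidence rather than elapsed time keeps the queue stable across noisy runs: a query with a blocking sort ranks high even if one particular execution happened to be fast.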

## Flow
## Architecture

```text
USER ─ login ─> Workload Console ─ guided query ─> read API ─ explain ─> Atlas
└─ capture (attributed) ─> query_log
DBRE ─ login ─> Slow-Query Queue (ranked by evidence)
└─ Diagnose ─> deterministic ESR diagnosis ─> DIAGNOSED EvidencePack
└─ hash-bound Approve ─> apply index + re-explain ─> VERIFIED
```mermaid
flowchart TD
B["Browser · Next.js console<br/>(role-based login)"] --> CR[Read API · FastAPI on Cloud Run]
CR --> AE["3 read-only Vertex AI<br/>Agent Engine roles · Gemini"]
CR --> DC["Deterministic controller<br/>validate · apply · verify"]
AE -. "read-only tools" .-> M[(MongoDB Atlas)]
DC --> M
DC --> L["Evidence ledger<br/>query_log · evidence_packs · approvals · …"]
```

## Architecture

- **Dashboard** — Next.js (App Router). Seeded role-based login backed by an httpOnly session
cookie; the user persona is confined to the workload console, the DBRE to the triage + review
planes. The read API is the security authority — it re-verifies the session bearer on every data
call, and the approver identity always comes from the verified session, never the browser.
- **Read API** — FastAPI on Cloud Run. Guided, validated, read-only workload queries; evidence
capture; the evidence-ranked queue; and the DIAGNOSE → (human APPROVE) → VERIFY remediation flow.
Index mutation happens only after a matching hash-bound approval.
- **Diagnosis** — a pure, deterministic ESR analyzer derives the correct index key order
(Equality → Sort → Range) from each query's own structure. In production three read-only Vertex AI
Agent Engine roles narrate the diagnosis; locally the controller runs deterministically.
- **State** — MongoDB Atlas. `dbre_state` holds `users`, `query_log`, `evidence_packs`, and the
internal ledger collections; the demo workload runs against `sample_supplies.sales_agent_demo`.

The dashboard reads only `EvidencePack` v1 JSON; that contract is frozen in `contracts/`.
- **Dashboard** — Next.js (App Router). Seeded role-based login on an httpOnly session cookie; the
user persona is confined to the workload console, the DBRE to the triage + review planes. The
read API re-verifies the session bearer on every call and derives the approver from it.
- **Read API** — FastAPI on Cloud Run. Validated read-only workload queries, evidence capture, the
evidence-ranked queue, and the DIAGNOSE → (human APPROVE) → VERIFY flow.
- **Diagnosis** — a pure, deterministic ESR analyzer derives the correct index key order from each
query's own equality/sort/range structure. In production, three read-only Vertex AI Agent Engine
roles (Gemini) narrate it; locally the controller runs deterministically.
- **State** — MongoDB Atlas: `users`, `query_log`, `evidence_packs`, and the internal ledger
collections. The dashboard reads only `EvidencePack` v1 JSON, frozen in `contracts/`.
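
The Equality → Sort → Range ordering mentioned in the Diagnosis bullet can be sketched as a pure function. This is an illustration under assumptions, not the project's analyzer: `esr_index_keys` is a hypothetical name, the query is assumed to be already split into its equality, sort, and range fields, and the example field names are placeholders.

```python
from typing import Iterable

SortSpec = tuple[str, int]  # (field name, 1 for ascending or -1 for descending)


def esr_index_keys(
    equality: Iterable[str],
    sort: Iterable[SortSpec],
    range_fields: Iterable[str],
) -> list[SortSpec]:
    """Order compound-index keys as Equality, then Sort, then Range."""
    keys: list[SortSpec] = []
    seen: set[str] = set()

    def add(field: str, direction: int) -> None:
        # A field appears at most once in an index; the first role wins.
        if field not in seen:
            seen.add(field)
            keys.append((field, direction))

    for field in equality:
        add(field, 1)
    for field, direction in sort:
        add(field, direction)  # keep the query's own sort direction
    for field in range_fields:
        add(field, 1)
    return keys


# Placeholder field names, for illustration only.
example = esr_index_keys(
    equality=["storeLocation"],
    sort=[("saleDate", -1)],
    range_fields=["customer.age"],
)
```

The result is a list of `(field, direction)` pairs, which is the shape pymongo's `create_index` accepts for compound indexes. Because the function has no I/O, it can be unit tested offline.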

## The remediation lifecycle

`DIAGNOSE` (read-only) → `APPROVE` (human, hash-bound) → `VERIFY`. A pack is marked **VERIFIED**
only if a strict three-check rail passes on the re-`explain`: the blocking **SORT is removed**,
the **recommended index is the one used**, and **at least one metric improves**. Otherwise it
stays `APPROVED` (applied but not proven). The `evidence_hash` binds the before-evidence to the
recommendation, so an approval can only apply the exact fix the human saw.
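
The three-check rail can be sketched as follows. The `ExplainSummary` type, its fields, and the choice of metrics compared are assumptions made for illustration; the README states the three checks but not which metrics count as an improvement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExplainSummary:
    """Hypothetical digest of an explain result; field names are assumptions."""

    stages: tuple[str, ...]
    index_used: Optional[str]
    docs_examined: int
    keys_examined: int
    execution_ms: int


def verify_fix(before: ExplainSummary, after: ExplainSummary, recommended_index: str) -> str:
    """Return "VERIFIED" only if all three checks pass, otherwise "APPROVED"."""
    # Check 1: the blocking SORT stage no longer appears in the plan.
    sort_removed = "SORT" not in after.stages
    # Check 2: the plan uses the index that was recommended, not some other one.
    right_index = after.index_used == recommended_index
    # Check 3: at least one metric strictly improves (assumed metric set).
    improved = (
        after.docs_examined < before.docs_examined
        or after.keys_examined < before.keys_examined
        or after.execution_ms < before.execution_ms
    )
    return "VERIFIED" if (sort_removed and right_index and improved) else "APPROVED"
```

Requiring all three checks means an index that was applied but not picked up by the planner leaves the pack at `APPROVED` rather than being reported as a success.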

## Partner integration — MongoDB via MCP

@@ -49,11 +115,48 @@ uv run --with python-dotenv python -m agents.run

The MCP wiring lives in `agents/agent.py` (`build_mcp_toolset`, ADK `MCPToolset`) and
`agents/run.py` (raw stdio JSON-RPC). In the managed Vertex AI Agent Engine runtime the same
read-only operations run as native Python tools (`agents/native_mongo_tools.py`), because MCP's
read-only operations run as native Python tools (`agents/native_mongo_tools.py`) because MCP's
stdio transport can't run inside that sandbox — so the MCP integration is demonstrated locally /
controller-side and the production agent uses the native equivalents.

## Quickstart
## Safety model

| Guarantee | How |
|---|---|
| Agents are read-only | Tool allowlist; only the deterministic controller mutates |
| Mutation is gated | A one-time, hash-bound human approval ticket is required to apply |
| Approver is authoritative | Derived from the verified DBRE session — never the browser |
| Queries are safe | Guided builder, allowlist-validated, capped limit + `maxTimeMS`, no raw filters |
| No secrets in code | `.env` (local) / Secret Manager (prod); nothing committed |
| Everything is audited | Each phase writes to the ledger collections |
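
A one-time, hash-bound approval can be sketched as below. This is a sketch under assumptions: the README does not say how `evidence_hash` is computed, so SHA-256 over canonical JSON is assumed, and `ApprovalTickets` is a hypothetical in-memory stand-in for whatever the ledger actually stores.

```python
import hashlib
import json
from typing import Optional


def evidence_hash(before_evidence: dict, recommendation: dict) -> str:
    """Bind the before-evidence to the recommendation (assumed: SHA-256 of canonical JSON)."""
    payload = json.dumps(
        {"before": before_evidence, "recommendation": recommendation},
        sort_keys=True,  # stable key order so equal content gives an equal hash
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ApprovalTickets:
    """Hypothetical in-memory ticket store; each ticket can be used once."""

    def __init__(self) -> None:
        self._pending: dict[str, str] = {}

    def approve(self, pack_hash: str, approver: str) -> None:
        # The approver should come from the verified session, never from the request body.
        self._pending[pack_hash] = approver

    def consume(self, pack_hash: str) -> Optional[str]:
        # pop() makes the ticket single-use: a second apply finds nothing.
        return self._pending.pop(pack_hash, None)
```

If the recommendation changes after review, its hash changes too, so `consume` returns `None` and the controller would have nothing to apply.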

## Tech stack

| Layer | Technology |
|---|---|
| Agent | Gemini · Vertex AI Agent Engine (3 read-only roles) |
| Partner tool | MongoDB MCP server (`mongodb-mcp-server`) |
| Backend | FastAPI · Python 3.12 · pymongo |
| Frontend | Next.js (App Router) · TypeScript |
| Data | MongoDB Atlas |
| Hosting | Google Cloud Run · Secret Manager · Cloud Build |
| Contract | EvidencePack v1 (frozen JSON schema) |

## Repo layout

```text
.
├── api/ # FastAPI read API: routes, auth, workload, Agent Engine wiring
├── controller/ # deterministic core: ESR diagnosis, orchestrator, EvidencePack, ledger
├── agents/ # Vertex AI Agent Engine roles + MongoDB MCP integration
├── dashboard/ # Next.js operator console (login, workload, DBRE queue, run review)
├── seed/ # demo data + workload-baseline + account seeding
├── contracts/ # frozen EvidencePack v1 JSON schema
├── deploy/ # Cloud Run deploy script + runbook
└── tests/ # unit, contract, and live integration tests
```

## Getting started

```bash
uv sync --dev
@@ -62,9 +165,9 @@ cp .env.example .env # fill MongoDB + (prod) Vertex values; set SESSION_SECR
# one-time data + accounts (against your Atlas cluster)
uv run python seed/seed_demo_fixture.py seed # 300k demo docs
uv run python seed/seed_workload.py verify # baseline indexes + prove the trap presets
uv run python seed/seed_users.py # Dev Trivedi, Aakash Singh, DBRE — prints passwords once
uv run python seed/seed_users.py # seed the demo accounts — prints passwords once

uv run pytest -q # unit + contract (live integration auto-skips with no conn)
uv run pytest -q # unit + contract (live integration auto-skips)
```

Run the full stack locally (deterministic controller, no Vertex needed):
@@ -79,15 +182,26 @@ cd dashboard && npm install && \
```

Re-run `seed/seed_workload.py reset` between demos — an approved fix removes the trap for a whole
store/method class.

## Safety

- Agents and tools are read-only; only the deterministic controller mutates, and only after a
matching hash-bound approval.
- The approver identity comes from the verified DBRE session — never from the browser.
- Secrets live in `.env` (local) / Secret Manager (prod); none are committed.
store/method class. Cloud Run deploy: `deploy/cloudrun.md` + `dashboard/DEPLOY.md`.

## The demo walk

1. **Dev Trivedi** logs in to the workload console.
2. Runs a few guided queries against the live 300k-doc collection.
3. **Aakash Singh** logs in and runs a few more — each capture is attributed to its user.
4. **DBRE** logs in to the Slow-Query Queue, ranked by evidence.
5. Picks the worst query and clicks **Diagnose**.
6. Reviews the `EvidencePack` — finding, ESR recommendation, and the evidence hash.
7. **Approves** the hash-bound fix.
8. The backend applies the recommended index.
9. Verification re-`explain`s: **SORT removed**, docs-examined collapses (e.g. 100,073 → 25).
10. The system map shows the path: browser → Cloud Run API → Agent Engine roles → deterministic
controller → MongoDB + ledger.

## License

Apache-2.0 — see [`LICENSE`](LICENSE).

---

_Built for the Google Cloud Agents challenge · MongoDB partner track._