✨ catalogd graphql shift to file-based cache#2732
Conversation
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
This PR shifts catalogd’s GraphQL implementation from building/caching schemas off in-memory catalog metas to a file-based approach: schema metadata is discovered during Store() and persisted to disk, while query execution loads requested objects from catalog.jsonl on-demand using byte offsets from index.json. It also introduces a query complexity validation helper and updates handlers/tests to use the new GraphQL service interface.
Changes:
- Persist GraphQL schema metadata to
graphql-schema.jsonduring catalog storage and load it from disk for schema building. - Switch query execution to disk-backed object loading using index-based byte offsets (reducing RSS growth from caching parsed objects).
- Add a GraphQL query complexity validation helper and expand concurrency/singleflight tests around cache misses/builds.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/catalogd/storage/localdir.go | Writes graphql-schema.json on store, adds disk-backed object loader, and adjusts GraphQL pre-warm behavior. |
| internal/catalogd/storage/index.go | Exposes schema-section byte ranges for disk-backed GraphQL pagination. |
| internal/catalogd/service/graphql_service.go | Refactors service to use a storage-provided data provider and adds query validation/timeout. |
| internal/catalogd/service/graphql_service_test.go | Updates tests to use the new provider-based GraphQL service and adds singleflight coverage via provider counting. |
| internal/catalogd/server/handlers.go | Updates GraphQL handler to execute queries without needing a catalog FS. |
| internal/catalogd/server/handlers_test.go | Updates mocks/tests for the new GraphQL service interface and error behavior. |
| internal/catalogd/graphql/validation.go | Adds AST-based query complexity validation (depth/aliases/fields). |
| internal/catalogd/graphql/graphql.go | Adds schema serialization/deserialization and switches to loader-based query-time object retrieval. |
| hack/demo/graphql-demo-server/main.go | Updates demo server to build schema once and serve GraphQL directly from the generated schema. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| buf := make([]byte, sec.Length) | ||
| if _, err := f.ReadAt(buf, sec.Offset); err != nil { | ||
| return nil, fmt.Errorf("error reading section at offset %d: %w", sec.Offset, err) | ||
| } |
| return func(schemaName string, offset, limit int) ([]map[string]interface{}, error) { | ||
| sections := idx.GetSchemaSections(schemaName) | ||
| if sections == nil { | ||
| return nil, nil | ||
| } |
| s.graphqlSvc.InvalidateCache(catalog) | ||
|
|
||
| if _, err := s.graphqlSvc.GetSchema(context.Background(), catalog); err != nil { | ||
| // Schema build failed — remove the catalog to maintain consistency. |
| func serializableToFieldInfo(sfi *serializableFieldInfo) *FieldInfo { | ||
| k := stringToKind(sfi.JSONType) | ||
| fi := &FieldInfo{ | ||
| Name: sfi.Name, | ||
| OriginalName: sfi.OriginalName, | ||
| JSONType: k, | ||
| IsArray: sfi.IsArray, | ||
| GraphQLType: jsonTypeToGraphQL(k, sfi.IsArray, nil), | ||
| } |
| for _, sel := range ss.Selections { | ||
| switch s := sel.(type) { | ||
| case *ast.Field: | ||
| c.fields++ | ||
| if c.fields > MaxQueryFields { | ||
| return fmt.Errorf("query exceeds maximum field count of %d", MaxQueryFields) | ||
| } | ||
| if s.Alias != nil && s.Alias.Value != "" { | ||
| c.aliases++ | ||
| if c.aliases > MaxQueryAliases { | ||
| return fmt.Errorf("query exceeds maximum alias count of %d", MaxQueryAliases) | ||
| } | ||
| } | ||
| if err := c.walkSelectionSet(s.SelectionSet, depth+1); err != nil { | ||
| return err | ||
| } | ||
| case *ast.InlineFragment: | ||
| if err := c.walkSelectionSet(s.SelectionSet, depth+1); err != nil { | ||
| return err | ||
| } | ||
| case *ast.FragmentSpread: | ||
| c.fields++ | ||
| } |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2732 +/- ##
==========================================
- Coverage 67.07% 66.64% -0.43%
==========================================
Files 149 150 +1
Lines 11339 11565 +226
==========================================
+ Hits 7606 7708 +102
- Misses 3179 3291 +112
- Partials 554 566 +12
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Description
This PR proposes a shift in the graphql service endpoint from in-memory to on-disk caching which leverages common architectural conventions like fan-out catalog index generation (graphql-schema.json) which is then accessed to fulfill queries to eliminate RSS increases from the feature initial implementation.
Included is a structured graphql validation package as a focus for future limits enforcement and adds several cold-cache/cache-miss concurrency tests.
Reviewer Checklist