-
-
Notifications
You must be signed in to change notification settings - Fork 2
Copilot: Fix memory leaks causing OOM crashes in production #85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,10 +4,10 @@ import { sequence } from '@sveltejs/kit/hooks'; | |
| import { building } from '$app/environment'; | ||
| import OTEL from '$lib/otel'; | ||
| import { tryVerifyAPIToken, tryVerifyCookie } from '$lib/server/auth'; | ||
| import { QueueConnected, getQueues } from '$lib/server/bullmq'; | ||
| import { QueueConnected, closeAllConnections, getQueues } from '$lib/server/bullmq'; | ||
| import { bullboardHandle } from '$lib/server/bullmq/BullBoard'; | ||
| import { allWorkers } from '$lib/server/bullmq/BullMQ'; | ||
| import { DatabaseConnected } from '$lib/server/prisma'; | ||
| import { DatabaseConnected, closeDatabaseConnection } from '$lib/server/prisma'; | ||
|
|
||
| const handleAPIRoute: Handle = async ({ event, resolve }) => { | ||
| if (event.route.id === '/(api)/health') return resolve(event); | ||
|
|
@@ -35,10 +35,52 @@ if (!building) { | |
| // Likewise, initialize the Prisma connection heartbeat | ||
| DatabaseConnected(); | ||
|
|
||
| // Graceful shutdown | ||
| process.on('sveltekit:shutdown', async () => { | ||
| OTEL.instance.logger.info('Shutting down gracefully...'); | ||
| await Promise.all(allWorkers.map((worker) => worker.worker?.close())); | ||
| // Graceful shutdown handler | ||
| const shutdown = async (signal: string) => { | ||
| OTEL.instance.logger.info(`Received ${signal}, shutting down gracefully...`); | ||
| try { | ||
| // Close all workers first | ||
| await Promise.all(allWorkers.map((worker) => worker.worker?.close())); | ||
| OTEL.instance.logger.info('All workers closed'); | ||
|
|
||
| // Close all queue and Redis connections | ||
| await closeAllConnections(); | ||
| OTEL.instance.logger.info('All connections closed'); | ||
|
|
||
| // Close database connection | ||
| await closeDatabaseConnection(); | ||
| OTEL.instance.logger.info('Database connection closed'); | ||
|
|
||
| process.exit(0); | ||
| } catch (error) { | ||
| OTEL.instance.logger.error('Error during shutdown', { | ||
| error: error instanceof Error ? error.message : String(error) | ||
| }); | ||
| process.exit(1); | ||
| } | ||
| }; | ||
|
Comment on lines
+38
to
+61
|
||
|
|
||
| // Register shutdown handlers | ||
| process.on('sveltekit:shutdown', () => shutdown('sveltekit:shutdown')); | ||
| process.on('SIGTERM', () => shutdown('SIGTERM')); | ||
| process.on('SIGINT', () => shutdown('SIGINT')); | ||
|
|
||
| // Handle uncaught errors | ||
| process.on('uncaughtException', async (error) => { | ||
| OTEL.instance.logger.error('Uncaught exception', { | ||
| error: error.message, | ||
| stack: error.stack | ||
| }); | ||
| console.error('Uncaught exception:', error); | ||
| await shutdown('uncaughtException'); | ||
| }); | ||
|
|
||
| process.on('unhandledRejection', async (reason) => { | ||
| OTEL.instance.logger.error('Unhandled rejection', { | ||
| reason: reason instanceof Error ? reason.message : String(reason) | ||
| }); | ||
| console.error('Unhandled rejection:', reason); | ||
| await shutdown('unhandledRejection'); | ||
| }); | ||
|
Comment on lines
+63
to
84
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🩺 Stability & Availability | 🟠 Major 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
# Locate and print the relevant section of src/hooks.server.ts (with line numbers)
FILE="src/hooks.server.ts"
if [ ! -f "$FILE" ]; then
echo "Missing: $FILE"
exit 1
fi
echo "---- $FILE (lines 1-140) ----"
nl -ba "$FILE" | sed -n '1,140p'Repository: sillsdev/appbuilder-buildengine-api Length of output: 185 🏁 Script executed: #!/bin/bash
set -euo pipefail
# Find any existing shutdown definition/usages to see whether it is already guarded/idempotent.
rg -n --hidden --no-ignore-vcs "shutdown\s*[:=]|function\s+shutdown|const\s+shutdown|shutdown\(" src/hooks.server.ts src -S || trueRepository: sillsdev/appbuilder-buildengine-api Length of output: 967 🏁 Script executed: #!/bin/bash
set -euo pipefail
# Check for other shutdown/idempotency patterns and where process.on/once handlers are registered in the repo.
rg -n --hidden --no-ignore-vcs "process\.on\(|process\.once\(|uncaughtException|unhandledRejection|sveltekit:shutdown|SIGTERM|SIGINT" src -S || trueRepository: sillsdev/appbuilder-buildengine-api Length of output: 607 🏁 Script executed: #!/bin/bash
set -euo pipefail
FILE="src/hooks.server.ts"
# Print shutdown implementation area with line numbers (using awk since nl isn't available)
awk 'NR>=25 && NR<=110 {printf "%d:%s\n", NR, $0}' "$FILE" | sed -n '1,120p'Repository: sillsdev/appbuilder-buildengine-api Length of output: 3091 🏁 Script executed: #!/bin/bash
set -euo pipefail
FILE="src/hooks.server.ts"
# Also quickly show where shutdown is defined/called to confirm there isn't a different wrapper elsewhere in the file
rg -n --hidden --no-ignore-vcs "shutdown\s*[:=]|const\s+shutdown|process\.on\('sveltekit:shutdown'|process\.on\('SIGTERM'|process\.on\('SIGINT'|uncaughtException|unhandledRejection" "$FILE"Repository: sillsdev/appbuilder-buildengine-api Length of output: 517 🏁 Script executed: #!/bin/bash
set -euo pipefail
# Find the implementations/usages of closeAllConnections and closeDatabaseConnection
rg -n --hidden --no-ignore-vcs "function\s+closeAllConnections|const\s+closeAllConnections|closeAllConnections\s*=" src -S
rg -n --hidden --no-ignore-vcs "function\s+closeDatabaseConnection|const\s+closeDatabaseConnection|closeDatabaseConnection\s*=" src -S
rg -n --hidden --no-ignore-vcs "allWorkers" src/hooks.server.ts src -SRepository: sillsdev/appbuilder-buildengine-api Length of output: 664 🏁 Script executed: #!/bin/bash
set -euo pipefail
FILE="src/lib/server/bullmq/queues.ts"
# Print around closeAllConnections
awk 'NR>=130 && NR<=240 {printf "%d:%s\n", NR, $0}' "$FILE"Repository: sillsdev/appbuilder-buildengine-api Length of output: 2095 🏁 Script executed: #!/bin/bash
set -euo pipefail
FILE="src/lib/server/prisma.ts"
# Print around closeDatabaseConnection
awk 'NR>=1 && NR<=140 {printf "%d:%s\n", NR, $0}' "$FILE"Repository: sillsdev/appbuilder-buildengine-api Length of output: 2410 🏁 Script executed: #!/bin/bash
set -euo pipefail
FILE="src/lib/server/bullmq/BullMQ.ts"
# Print around allWorkers definition
awk 'NR>=1 && NR<=120 {printf "%d:%s\n", NR, $0}' "$FILE"Repository: sillsdev/appbuilder-buildengine-api Length of output: 433 🏁 Script executed: #!/bin/bash
set -euo pipefail
# Inspect the top portion of queues.ts for shared state (_queues, connections)
FILE="src/lib/server/bullmq/queues.ts"
awk 'NR>=1 && NR<=140 {printf "%d:%s\n", NR, $0}' "$FILE"Repository: sillsdev/appbuilder-buildengine-api Length of output: 5067 🏁 Script executed: #!/bin/bash
set -euo pipefail
# Find BullWorker implementations and inspect close behavior
rg -n --hidden --no-ignore-vcs "class\s+|export class|close\(" src/lib/server/bullmq/BullWorker* src/lib/server/bullmq -SRepository: sillsdev/appbuilder-buildengine-api Length of output: 2560 🏁 Script executed: #!/bin/bash
set -euo pipefail
# Show BullWorker files that exist
fd -t f "BullWorker*" src/lib/server/bullmqRepository: sillsdev/appbuilder-buildengine-api Length of output: 115 🏁 Script executed: #!/bin/bash
set -euo pipefail
FILE="src/lib/server/bullmq/BullWorker.ts"
awk 'NR>=1 && NR<=240 {printf "%d:%s\n", NR, $0}' "$FILE"Repository: sillsdev/appbuilder-buildengine-api Length of output: 5872 🏁 Script executed: #!/bin/bash
set -euo pipefail
# Look for any guards like "if (this.worker)" or "closed" in BullWorker
rg -n "close\(\)|closed|closing|if\s*\(this\.|worker\?" src/lib/server/bullmq/BullWorker.tsRepository: sillsdev/appbuilder-buildengine-api Length of output: 252 Make
Suggested patch+ let shutdownPromise: Promise<void> | null = null;
+
- const shutdown = async (signal: string) => {
+ const shutdown = (signal: string) => {
+ if (shutdownPromise) return shutdownPromise;
+ shutdownPromise = (async () => {
OTEL.instance.logger.info(`Received ${signal}, shutting down gracefully...`);
- ...
- };
+ ...
+ })();
+ return shutdownPromise;
+ };
- process.on('sveltekit:shutdown', () => shutdown('sveltekit:shutdown'));
- process.on('SIGTERM', () => shutdown('SIGTERM'));
- process.on('SIGINT', () => shutdown('SIGINT'));
+ process.once('sveltekit:shutdown', () => void shutdown('sveltekit:shutdown'));
+ process.once('SIGTERM', () => void shutdown('SIGTERM'));
+ process.once('SIGINT', () => void shutdown('SIGINT'));🤖 Prompt for AI Agents |
||
| } | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,9 @@ | ||
| export { QueueConnected, getQueueConfig, getQueues, getWorkerConfig } from './queues'; | ||
| export { | ||
| QueueConnected, | ||
| closeAllConnections, | ||
| closeAllQueues, | ||
| getQueueConfig, | ||
| getQueues, | ||
| getWorkerConfig | ||
| } from './queues'; | ||
| export * as BullMQ from './types'; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,6 +9,7 @@ import OTEL from '$lib/otel'; | |
| class Connection { | ||
| private conn: Redis; | ||
| private connected: boolean; | ||
| private heartbeatInterval: NodeJS.Timeout | null = null; | ||
| constructor(isQueueConnection = false, keyPrefix?: string) { | ||
| this.conn = new Redis({ | ||
| host: process.env.NODE_ENV === 'development' ? 'localhost' : process.env.VALKEY_HOST, | ||
|
|
@@ -41,7 +42,7 @@ class Connection { | |
| console.error('Valkey connection error', err); | ||
| } | ||
| }); | ||
| setInterval(() => { | ||
| this.heartbeatInterval = setInterval(() => { | ||
| if (this.connected) { | ||
| this.conn | ||
| .ping() | ||
|
|
@@ -60,7 +61,9 @@ class Connection { | |
| } | ||
| }); | ||
| } | ||
| }, 10000).unref(); // Check every 10 seconds | ||
| }, 10000); | ||
| // Ensure the interval doesn't prevent Node from exiting | ||
| this.heartbeatInterval.unref(); | ||
| } | ||
| public IsConnected() { | ||
| return this.connected; | ||
|
|
@@ -69,6 +72,15 @@ class Connection { | |
| public connection() { | ||
| return this.conn; | ||
| } | ||
|
|
||
| public async close() { | ||
| if (this.heartbeatInterval) { | ||
| clearInterval(this.heartbeatInterval); | ||
| this.heartbeatInterval = null; | ||
| } | ||
| this.connected = false; | ||
| await this.conn.quit(); | ||
| } | ||
|
Comment on lines
+79
to
+83
|
||
| } | ||
|
|
||
| let _workerConnection: Connection | undefined = undefined; | ||
|
|
@@ -144,3 +156,33 @@ export function getQueues() { | |
| } | ||
| return _queues; | ||
| } | ||
|
|
||
| export async function closeAllQueues() { | ||
| if (_queues) { | ||
| await Promise.all([ | ||
| _queues.Builds.close(), | ||
| _queues.S3.close(), | ||
| _queues.Releases.close(), | ||
| _queues.Polling.close(), | ||
| _queues.SystemStartup.close(), | ||
| _queues.SystemRecurring.close() | ||
| ]); | ||
| _queues = undefined; | ||
| } | ||
| } | ||
|
Comment on lines
+160
to
+172
|
||
|
|
||
| export async function closeAllConnections() { | ||
| await closeAllQueues(); | ||
| if (_workerConnection) { | ||
| await _workerConnection.close(); | ||
| _workerConnection = undefined; | ||
| } | ||
| if (_queueConnection) { | ||
| await _queueConnection.close(); | ||
| _queueConnection = undefined; | ||
| } | ||
| if (_authConnection) { | ||
| await _authConnection.close(); | ||
| _authConnection = undefined; | ||
| } | ||
| } | ||
|
Comment on lines
+160
to
+188
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🩺 Stability & Availability | 🟠 Major 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
file="src/lib/server/bullmq/queues.ts"
# Basic sanity: file exists and print the relevant region with line numbers
ls -l "$file"
echo "---- closeAllQueues / closeAllConnections region ----"
nl -ba "$file" | sed -n '130,220p'
echo "---- find singleton declarations/initialization ----"
rg -n "let _queues|_queues\s*=|_workerConnection|_queueConnection|_authConnection" "$file" || true
echo "---- find usage of closeAllQueues/closeAllConnections ----"
rg -n "closeAllQueues\(|closeAllConnections\(" -S . || true
echo "---- find where queues/connections are created (likely _queues/worker/queue/auth assignment) ----"
rg -n "_queues\s*=\s*|_workerConnection\s*=\s*|_queueConnection\s*=\s*|_authConnection\s*=\s*" "$file" || trueRepository: sillsdev/appbuilder-buildengine-api Length of output: 275 🏁 Script executed: #!/bin/bash
set -euo pipefail
file="src/lib/server/bullmq/queues.ts"
echo "---- file exists ----"
ls -l "$file"
echo "---- locate functions ----"
rg -n "export async function closeAllQueues|export async function closeAllConnections" "$file"
echo "---- print relevant lines (approx 145-205) ----"
cat -n "$file" | sed -n '140,210p'
echo "---- singleton declarations/initialization in this file ----"
rg -n "(_queues\b|_workerConnection\b|_queueConnection\b|_authConnection\b)" "$file"
echo "---- find where closeAll* are called in repo ----"
rg -n "closeAllQueues\(|closeAllConnections\(" -S . || trueRepository: sillsdev/appbuilder-buildengine-api Length of output: 3802 🏁 Script executed: #!/bin/bash
set -euo pipefail
file="src/lib/server/bullmq/queues.ts"
echo "---- file header/imports ----"
sed -n '1,140p' "$file"
echo "---- inspect closeAllQueues/closeAllConnections call site ----"
sed -n '1,120p' src/hooks.server.tsRepository: sillsdev/appbuilder-buildengine-api Length of output: 8871 Make shutdown teardown best-effort and always clear cached BullMQ/Redis singletons.
Update teardown to continue closing everything even when one close fails (e.g., 🤖 Prompt for AI Agents |
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
Repository: sillsdev/appbuilder-buildengine-api
Length of output: 12970
🏁 Script executed:
Repository: sillsdev/appbuilder-buildengine-api
Length of output: 6471
🏁 Script executed:
Repository: sillsdev/appbuilder-buildengine-api
Length of output: 615
🏁 Script executed:
Repository: sillsdev/appbuilder-buildengine-api
Length of output: 967
🏁 Script executed:
Repository: sillsdev/appbuilder-buildengine-api
Length of output: 9636
Don’t abort shutdown on the first cleanup error.
src/hooks.server.tsusesawait Promise.all(...)for worker closing; if any worker close rejects, thecatchcallsprocess.exit(1)and skipscloseAllConnections()+closeDatabaseConnection().Suggested patch
const shutdown = async (signal: string) => { OTEL.instance.logger.info(`Received ${signal}, shutting down gracefully...`); - try { - // Close all workers first - await Promise.all(allWorkers.map((worker) => worker.worker?.close())); - OTEL.instance.logger.info('All workers closed'); - - // Close all queue and Redis connections - await closeAllConnections(); - OTEL.instance.logger.info('All connections closed'); - - // Close database connection - await closeDatabaseConnection(); - OTEL.instance.logger.info('Database connection closed'); - - process.exit(0); - } catch (error) { - OTEL.instance.logger.error('Error during shutdown', { - error: error instanceof Error ? error.message : String(error) - }); - process.exit(1); - } + const failures: unknown[] = []; + + const workerResults = await Promise.allSettled(allWorkers.map((worker) => worker.worker?.close())); + for (const result of workerResults) { + if (result.status === 'rejected') failures.push(result.reason); + } + + try { + await closeAllConnections(); + } catch (error) { + failures.push(error); + } + + try { + await closeDatabaseConnection(); + } catch (error) { + failures.push(error); + } + + if (failures.length > 0) { + OTEL.instance.logger.error('Error during shutdown', { failureCount: failures.length }); + process.exit(1); + } + process.exit(0); };Make
shutdownidempotent to avoid overlapping closes.shutdown()is wired to multiple independentprocess.on(...)events with noisShuttingDown/promise guard, so repeated signals/errors can trigger concurrent runs and double-close behavior.🤖 Prompt for AI Agents