System health monitoring with auto-remediation and alerting
Watchdog is Oshun's immune system. It monitors the database, Redis, email, Chatwoot, OAuth tokens, and voice infrastructure, and either fixes issues automatically or sends an alert via Telegram. It runs hourly checks, daily summaries, and a weekly analysis.
External service: none. Watchdog is an internal system with no third-party API dependency. Alerts are delivered via Telegram (the same bot used for Karen).
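The "fix automatically or alert" behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual Watchdog code: `runChecks`, the shape of the check objects, and the `sendAlert` callback are all hypothetical names, and the real Telegram delivery is left to whatever function the caller passes in.

```javascript
// Hypothetical sketch: none of these names are taken from the Oshun codebase.
// Each check is { name, check, remediate? } where check() resolves to true
// when the component is healthy and remediate() attempts an automatic fix.
async function runChecks(checks, sendAlert) {
  const results = [];
  for (const { name, check, remediate } of checks) {
    let healthy = false;
    let error = null;
    try {
      healthy = await check();
    } catch (err) {
      error = err; // a throwing check counts as unhealthy
    }
    if (healthy) {
      results.push({ name, status: 'ok' });
      continue;
    }

    // Try the automatic fix (if one exists), then re-run the check.
    let fixed = false;
    if (remediate) {
      try {
        await remediate();
        fixed = await check();
      } catch (err) {
        error = err;
      }
    }

    if (fixed) {
      results.push({ name, status: 'remediated' });
    } else {
      // No fix available, or the fix did not help: escalate to a human.
      const detail = error ? `: ${error.message}` : '';
      await sendAlert(`[watchdog] ${name} is unhealthy${detail}`);
      results.push({ name, status: 'alerted' });
    }
  }
  return results;
}
```

A caller would pass an array of checks and an alert function, for example one that posts the message to a Telegram chat. Injecting `sendAlert` keeps the loop testable without network access.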
Watchdog underpins every journey by keeping the infrastructure healthy.
src/watchdog.js - Orchestrator. Wires all sub-modules together and starts the monitoring loop.
src/watchdog_checks.js - Core health checks. Key exports: checkInfrastructure() (line 18), checkMemorySystems() (line 91), checkTenantSystems() (line 217).
src/watchdog_runners.js - Scheduled check runners. Key exports: runHourlyCheck(), runDailySummary().
src/watchdog_formatters.js - Alert and report formatting. Key exports: formatUrgentAlert(), formatDailySummary().
src/watchdog_remediation.js - Auto-fix logic for common issues. Key exports: autoRemediateBullMQ(), autoRemediateListmonk().
src/watchdog_voice.js - Voice bridge monitoring and call transcript sync. Key exports: runVoiceBridgeMonitor(), runCallTranscriptSync().

0 * * * * - Hourly infrastructure + tenant health check
0 13 * * * - Daily summary (8 AM EST)
0 */3 * * * - Audit sweep (every 3 hours)
0 14 * * 1 - Weekly feedback analysis (Monday, 9 AM EST)
0 1 * * * - Evening check (8 PM EST)
0 14 * * * - Voice bridge monitor (9 AM EST)
0 15 * * * - GBP review (10 AM EST)

None: the watchdog catches everyone else's issues.
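The EST times quoted in the cron schedule suggest the expressions are evaluated in UTC (hour 13 is labelled 8 AM EST, for example). Assuming that is the case, the sketch below holds the schedule as data and derives the EST label from the hour field. `SCHEDULE`, `EST_OFFSET_HOURS`, and `estLabel` are illustrative names, not part of the Watchdog modules, and the fixed UTC-5 offset ignores daylight saving time.

```javascript
// Hypothetical sketch: the schedule from this document, expressed as data.
// Assumption: cron expressions are evaluated in UTC.
const SCHEDULE = [
  { cron: '0 * * * *', job: 'Hourly infrastructure + tenant health check' },
  { cron: '0 13 * * *', job: 'Daily summary' },
  { cron: '0 */3 * * *', job: 'Audit sweep' },
  { cron: '0 14 * * 1', job: 'Weekly feedback analysis' },
  { cron: '0 1 * * *', job: 'Evening check' },
  { cron: '0 14 * * *', job: 'Voice bridge monitor' },
  { cron: '0 15 * * *', job: 'GBP review' },
];

// EST is UTC-5. This fixed offset does not account for daylight saving (EDT).
const EST_OFFSET_HOURS = -5;

// Returns a label such as "8 AM EST" for entries with a fixed hour,
// or null when the hour field is a wildcard or step (no single time of day).
function estLabel(cron) {
  const hourField = cron.trim().split(/\s+/)[1];
  if (!/^\d+$/.test(hourField)) return null;
  const estHour = (Number(hourField) + EST_OFFSET_HOURS + 24) % 24;
  const suffix = estHour < 12 ? 'AM' : 'PM';
  const hour12 = estHour % 12 === 0 ? 12 : estHour % 12;
  return `${hour12} ${suffix} EST`;
}

// Print each job with its derived local time, where one exists.
for (const { cron, job } of SCHEDULE) {
  const label = estLabel(cron);
  console.log(label ? `${cron}  ${job} (${label})` : `${cron}  ${job}`);
}
```

Keeping the schedule in one table like this makes it easier to spot a mismatch between a cron expression and the local time written next to it.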