Troubleshooting Workflow¶
A structured approach to debugging when you've been stuck for more than 30 minutes.
When to Use¶
- Stuck >30 minutes on the same issue
- Tests failing unexpectedly
- Build broken and unclear why
- Security issue discovered
- Performance degraded significantly
How to Invoke¶
@.claude/skills/troubleshooting/SKILL.md
Been stuck on "Connection pool exhausted" error for 45 minutes
The Process¶
Step 1: STOP¶
Do not keep trying random solutions.
Stop Immediately
After 30 minutes of the same issue, random attempts waste time and may make things worse.
Step 2: Document¶
What have you tried?
## Attempted Solutions
1. Restarted the server - still fails
2. Increased pool size to 50 - same error
3. Added connection timeout - no change
Step 3: Simplify¶
Can you reproduce in isolation?
- Create minimal test case
- Remove complexity
- Isolate the failing component
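As an illustration of a minimal test case, the sketch below reproduces the running "Connection pool exhausted" symptom without a database. `TinyPool`, `leakyHandler`, and `callsUntilFailure` are hypothetical names invented for this example. `TinyPool` is an in-memory stand-in, not the `pg` API, and only models a fixed number of connections that must be released.

```javascript
// Hypothetical in-memory stand-in for a connection pool (not the pg API).
class TinyPool {
  constructor(max) {
    this.max = max;
    this.inUse = 0;
  }

  connect() {
    if (this.inUse >= this.max) {
      throw new Error("Connection pool exhausted");
    }
    this.inUse += 1;
    let released = false;
    return {
      // Releasing twice is ignored so the counter cannot go negative.
      release: () => {
        if (!released) {
          released = true;
          this.inUse -= 1;
        }
      },
    };
  }
}

// Suspect code path: takes a connection and never releases it.
function leakyHandler(pool) {
  pool.connect();
}

// Calls the handler until it throws; returns how many calls succeeded.
function callsUntilFailure(pool, handler, limit = 1000) {
  for (let i = 0; i < limit; i++) {
    try {
      handler(pool);
    } catch (err) {
      return i;
    }
  }
  return limit;
}
```

With `new TinyPool(10)` the leaky handler succeeds 10 times and then fails. With `new TinyPool(50)` it succeeds 50 times and then fails the same way, which suggests that pool size is not the real cause.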
Step 4: Check Fundamentals¶
Common culprits:
- Dependencies installed and correct version?
- Configuration correct?
- Environment variables set?
- File permissions correct?
- Right version running? (restart server, clear cache)
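Some of these checks can be scripted. The following is a minimal sketch, assuming a Node.js project. `checkFundamentals` and its parameters are hypothetical, and the variable name `DATABASE_URL` in the usage note is only a placeholder.

```javascript
// Returns a list of human-readable problems; an empty list means these basics look fine.
// `requiredVars` and `minNodeMajor` are illustrative inputs, not real project settings.
function checkFundamentals({ env, nodeVersion, requiredVars, minNodeMajor }) {
  const problems = [];

  for (const name of requiredVars) {
    if (!env[name]) {
      problems.push(`Missing environment variable: ${name}`);
    }
  }

  // process.version looks like "v20.11.0"; extract the major number.
  const major = Number.parseInt(nodeVersion.replace(/^v/, "").split(".")[0], 10);
  if (Number.isNaN(major) || major < minNodeMajor) {
    problems.push(`Node.js ${nodeVersion} is older than required major ${minNodeMajor}`);
  }

  return problems;
}
```

A possible call is `checkFundamentals({ env: process.env, nodeVersion: process.version, requiredVars: ["DATABASE_URL"], minNodeMajor: 20 })`. Printing the returned list at startup makes a missing variable visible before any deeper debugging starts.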
Step 5: Search¶
- Google the exact error message
- Check GitHub issues for dependencies
- Search Stack Overflow
- Review `.claude/memory/` for similar problems
Step 6: Ask for Help¶
Present a clear problem statement:
## Problem
Connection pool exhausted after ~100 requests
## What I'm Trying To Do
Handle concurrent API requests
## What Happens Instead
Error: "Connection pool exhausted" after 100 requests
## What I've Tried
1. Increased pool size (10 → 50)
2. Added connection timeout (30s)
3. Restarted server
## Environment
- Node.js 20
- PostgreSQL 16
- pg library 8.11
Step 7: Document Solution¶
Once resolved, create .claude/memory/YYYY-MM-DD-issue-name.md:
# Issue: Connection Pool Exhausted
**Date**: 2025-01-15
**Resolved**: Yes
## Problem
Connection pool exhausted after ~100 concurrent requests.
## Root Cause
Connections not being released after query completion.
Missing `finally` block to release connection.
## Solution
```javascript
const client = await pool.connect();
try {
  const result = await client.query(sql);
  return result;
} finally {
  client.release(); // Always release!
}
```
## Prevention
- Add linter rule for unreleased connections
- Add connection monitoring
- Review similar code for same pattern
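One way to keep the same pattern from recurring is to route every checkout through a single helper. This is a sketch only. `withClient` is a hypothetical helper name, and it assumes a pool whose `connect()` resolves to a client with `release()`, as in the solution recorded in this example.

```javascript
// Runs `work` with a checked-out client and always releases the client afterwards.
// Assumption: `pool.connect()` resolves to an object with a `release()` method.
async function withClient(pool, work) {
  const client = await pool.connect();
  try {
    // `return await` keeps the client checked out until `work` has settled.
    return await work(client);
  } finally {
    client.release();
  }
}
```

A call site then looks like `const result = await withClient(pool, (client) => client.query(sql));`, and a code review only needs to flag direct `pool.connect()` calls outside the helper.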
Common Issues¶
Tests Breaking Unexpectedly¶
Diagnosis Steps:
1. When did it break?
```bash
git diff HEAD~1
```
2. Isolate the failing test:
```bash
npm test -- --grep "specific test"
```
3. Check for flakiness (run 10 times)
4. Check for test interdependence
Recovery:
```bash
# Revert to last working state
# Caution: --hard discards uncommitted changes and drops the last commit
# from this branch; save anything you need first
git reset --hard HEAD~1
# Re-apply changes incrementally
# Test after each change
```
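The flakiness check from the diagnosis steps (run the test 10 times) can be automated. The sketch below is one possible approach, assuming the test command signals failure with a non-zero exit code. `countFailures` is a hypothetical helper.

```javascript
const { spawnSync } = require("node:child_process");

// Runs a command `runs` times and counts the runs that did not exit with code 0.
// A run killed by a signal or one that fails to start also counts as a failure.
function countFailures(command, args, runs = 10) {
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    const result = spawnSync(command, args, { stdio: "ignore" });
    if (result.status !== 0) {
      failures += 1;
    }
  }
  return failures;
}
```

For example, `countFailures("npm", ["test", "--", "--grep", "specific test"])` returns the number of failing runs out of 10. A result strictly between 0 and 10 points to a flaky test rather than a deterministic break.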
Build Failing¶
Common Causes:
- Dependency version mismatch
- Missing environment variable
- TypeScript/compiler error
- Circular dependency
Quick Fixes:
# Clear and reinstall dependencies
rm -rf node_modules package-lock.json
npm install
# Clear build cache
rm -rf dist .next .cache
npm run build
Security Issue Found¶
PRIORITY: CRITICAL
Security issues take priority over all other work.
Immediate Actions:
- DO NOT commit vulnerable code
- Fix immediately - Security takes priority
- Add regression test - Prevent reintroduction
- Review similar code - Same pattern elsewhere?
- Document in `.claude/memory/`
Common Security Fixes:
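One widely applicable fix, offered here as an assumed example rather than a complete list, is replacing string-built SQL with a parameterized query. The sketch uses the `client.query(text, values)` form with `$1` placeholders. The `users` table, its columns, and `findUserByEmail` are hypothetical.

```javascript
// Vulnerable pattern (do not use): user input concatenated into the SQL text.
//   client.query("SELECT id, email FROM users WHERE email = '" + email + "'");

// Safer pattern: the SQL text is constant and the value travels separately.
// `users`, `id`, and `email` are hypothetical table and column names.
function findUserByEmail(client, email) {
  return client.query("SELECT id, email FROM users WHERE email = $1", [email]);
}
```

Because the input never becomes part of the SQL text, a value such as `x' OR '1'='1` is treated as a literal string to compare against, not as SQL.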
Performance Degraded¶
Diagnosis:
- Measure - Don't guess
- Identify bottleneck
  - N+1 queries?
  - Large data in memory?
  - Missing index?
  - Blocking I/O?
- Fix the biggest issue first
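For the "measure" step, a small timing wrapper is often enough to compare suspects before reaching for a profiler. This is a minimal sketch. `timed` is a hypothetical helper, and it measures wall-clock time only.

```javascript
// Times an async function; returns its result and the elapsed wall-clock milliseconds.
async function timed(fn) {
  const start = process.hrtime.bigint();
  const result = await fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { result, ms };
}
```

Wrapping each candidate, for example `const { ms } = await timed(() => client.query(sql));`, gives numbers to rank the bottlenecks by, so the biggest one gets fixed first.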
Common Fixes:
- Add database indexes
- Use pagination/streaming
- Add caching
- Fix N+1 queries with eager loading
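To make the N+1 fix concrete, the sketch below contrasts a per-row lookup with a single batched query, which is roughly what ORM eager loading does on your behalf. The `authors` table, the `authorId` field, and both function names are hypothetical. The `ANY($1::int[])` form assumes PostgreSQL with integer ids.

```javascript
// N+1 pattern: one query per post, so 100 posts mean 100 round trips.
async function loadAuthorsNPlusOne(client, posts) {
  const authors = [];
  for (const post of posts) {
    const res = await client.query(
      "SELECT id, name FROM authors WHERE id = $1",
      [post.authorId]
    );
    authors.push(res.rows[0]);
  }
  return authors;
}

// Batched alternative: collect the distinct ids and fetch them in one round trip.
async function loadAuthorsBatched(client, posts) {
  const ids = [...new Set(posts.map((post) => post.authorId))];
  const res = await client.query(
    "SELECT id, name FROM authors WHERE id = ANY($1::int[])",
    [ids]
  );
  // Index by id so callers can look up each post's author without another query.
  return new Map(res.rows.map((row) => [row.id, row]));
}
```

The batched version issues one query regardless of how many posts are passed in.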
Red Flags¶
Stop and reassess if you see:
Code Red Flags
- Same error after 3 different fixes
- Solution getting more complex
- "It just works now" without understanding why
- Touching >10 files for a "simple" fix
- Breaking tests to make new code work
Process Red Flags
- Skipping tests ("I'll add them later")
- Committing commented-out code
- Ignoring linter errors
- Using `any` to "make TypeScript happy"
- Copying code without understanding it
When you see red flags:
- STOP adding code
- Revert to last working state
- Apply COMPLEX mode methodology
- Ask for guidance
Recovery Checklist¶
After resolving any major issue:
- Tests passing
- Guardrails validated
- Root cause understood (not just symptom fixed)
- Similar code reviewed for same issue
- Solution documented in `.claude/memory/`
- Prevention added (test, linter rule, etc.)
Getting Help¶
From AI¶
Provide clear context:
@.claude/skills/troubleshooting/SKILL.md
Problem: API returns 500 on user creation
Tried: Checking logs, validating input, restarting server
Error: "Cannot read property 'id' of undefined"
From Documentation¶
- Guardrails - Check if you're violating rules
- Language Guides - Language-specific debugging
From Humans¶
Prepare:
- Minimal reproduction case
- Error messages (full text)
- What you've tried
- Environment details
Related¶
- Guardrails - Rules that help prevent issues.
- Methodology - Structured approach to development.