Building a Reliable Webhook System for Your SaaS: The Complete Guide
Your SaaS doesn't exist in a vacuum. The moment customers start integrating your product into their workflows, they'll need one thing above all: real-time event notifications. That's where webhooks come in.
A webhook system sounds simple — send an HTTP POST when something happens. But building one that's reliable, secure, and scalable is an engineering challenge that trips up many SaaS teams. Dropped events, duplicate deliveries, security vulnerabilities, and debugging nightmares are all common pitfalls.
This guide walks you through building a production-grade webhook system from the ground up.
Why Webhooks Matter for SaaS
Webhooks are the standard for event-driven integrations. Unlike polling (where clients repeatedly ask "anything new?"), webhooks push events to subscribers the moment they happen. Benefits include:
- Real-time updates — customers get notified instantly
- Reduced API load — no more polling every 30 seconds
- Better integrations — Zapier, Make, and n8n all rely on webhooks
- Customer stickiness — deep integrations make your product harder to replace
If you're building a B2B SaaS, a solid webhook system isn't optional — it's expected.
Architecture Overview
A reliable webhook system has five core components:
┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│  Your App   │────▶│ Event Queue  │────▶│   Delivery    │
│  (events)   │     │ (Redis/SQS)  │     │    Worker     │
└─────────────┘     └──────────────┘     └───────┬───────┘
                                                 │
                                         ┌───────▼───────┐
                                         │   Customer    │
                                         │   Endpoint    │
                                         └───────┬───────┘
                                                 │
                                         ┌───────▼───────┐
                                         │  Retry Queue  │
                                         │ + Dead Letter │
                                         └───────────────┘
Never send webhooks synchronously from your main application flow. Always queue the event and let a background worker handle delivery. This prevents slow or failing customer endpoints from affecting your application's performance.
Step 1: Design Your Event Schema
Consistency is king. Every webhook event should follow the same structure:
{
  "id": "evt_2xK9mPqR4sT7vW1y",
  "type": "invoice.paid",
  "apiVersion": "2026-03-01",
  "createdAt": "2026-03-15T08:00:00Z",
  "data": {
    "id": "inv_8nM3kL6jH2fD",
    "amount": 9900,
    "currency": "eur",
    "customerId": "cus_4rT7yU2iO9pA"
  }
}
Key design decisions:
- Unique event ID — essential for idempotency on the receiver side
- Namespaced event types — use resource.action format (e.g., subscription.created, invoice.paid)
- API versioning — lets you evolve payloads without breaking existing integrations
- Timestamp — always include when the event occurred
- Full resource in data — include the complete object, not just an ID (saves customers an extra API call)
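The envelope above can be captured as a type so that every producer emits the same shape. The following is a minimal sketch rather than a prescribed implementation: WebhookEvent, createEvent, and makeIdempotentHandler are hypothetical names, the ID scheme only imitates the evt_ prefix from the example, and the in-memory Set stands in for whatever durable store a real receiver would use.

```typescript
import { randomUUID } from "node:crypto";

// Envelope shared by every event; T is the full resource carried in `data`.
interface WebhookEvent<T> {
  id: string;         // unique per event, used by receivers to deduplicate
  type: string;       // namespaced as resource.action, e.g. "invoice.paid"
  apiVersion: string; // payload version the event was rendered with
  createdAt: string;  // ISO 8601 timestamp of when the event occurred
  data: T;
}

// Lowercase resource.action, e.g. "subscription.created".
const EVENT_TYPE_PATTERN = /^[a-z_]+\.[a-z_]+$/;

function createEvent<T>(
  type: string,
  data: T,
  apiVersion: string = "2026-03-01",
): WebhookEvent<T> {
  if (!EVENT_TYPE_PATTERN.test(type)) {
    throw new Error(`Event type must look like resource.action, got "${type}"`);
  }
  return {
    id: `evt_${randomUUID().replace(/-/g, "")}`, // assumed ID scheme
    type,
    apiVersion,
    createdAt: new Date().toISOString(),
    data,
  };
}

// Receiver side: run the handler at most once per event ID.
function makeIdempotentHandler<T>(handle: (event: WebhookEvent<T>) => void) {
  const seen = new Set<string>(); // assumption: swap for a durable store
  return (event: WebhookEvent<T>): boolean => {
    if (seen.has(event.id)) return false; // duplicate delivery, skip it
    seen.add(event.id);
    handle(event);
    return true;
  };
}
```

Because retries can deliver the same event more than once, the wrapped handler returns false for a repeated ID instead of running the business logic a second time.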
Step 2: Subscription Management
Let customers register webhook endpoints through your API and dashboard:
// Webhook endpoint registration
interface WebhookEndpoint {
  id: string;
  url: string;        // Customer's HTTPS endpoint
  events: string[];   // ["invoice.paid", "subscription.*"]
  secret: string;     // For signature verification
  active: boolean;
  createdAt: Date;
  metadata?: Record<string, string>;
}
// Database schema (Prisma)
model WebhookEndpoint {
  id         String            @id @default(cuid())
  tenantId   String
  url        String
  events     String[]          // Subscribed event types
  secret     String            // HMAC signing secret
  active     Boolean           @default(true)
  createdAt  DateTime          @default(now())
  updatedAt  DateTime          @updatedAt
  tenant     Tenant            @relation(fields: [tenantId], references: [id])
  deliveries WebhookDelivery[]
}
Important considerations:
- Only allow HTTPS URLs — never send webhook payloads over unencrypted connections
- Support wildcard subscriptions — invoice.* subscribes to all invoice events
- Generate a unique secret per endpoint — used for HMAC signature verification
- Validate URLs on registration — send a test event or verification challenge
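The registration checks above might look like the sketch below. The function names and the whsec_ prefix are illustrative assumptions, not part of any particular library. A production check would likely also refuse hosts that resolve to private or loopback addresses, which this sketch does not attempt.

```typescript
import { randomBytes } from "node:crypto";

// Accept only absolute https:// URLs; anything unparsable is rejected.
function isAllowedWebhookUrl(raw: string): boolean {
  try {
    return new URL(raw).protocol === "https:";
  } catch {
    return false;
  }
}

// True when `eventType` is covered by any subscribed pattern.
// Supported patterns: an exact type ("invoice.paid") or a resource
// wildcard ("invoice.*").
function isSubscribed(patterns: string[], eventType: string): boolean {
  const resource = eventType.split(".")[0];
  return patterns.some((p) => p === eventType || p === `${resource}.*`);
}

// 32 random bytes, hex-encoded, with an illustrative prefix.
function generateEndpointSecret(): string {
  return `whsec_${randomBytes(32).toString("hex")}`;
}
```

Keeping the wildcard rule in one helper means the registration API, the dashboard, and the event fan-out can all agree on what invoice.* covers.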
Step 3: Secure Signing with HMAC
Every webhook delivery must be signed so the receiver can verify it came from you:
import crypto from 'crypto';

function signWebhookPayload(
  payload: string,
  secret: string,
  timestamp: number
): string {
  const signedContent = `${timestamp}.${payload}`;
  return crypto
    .createHmac('sha256', secret)
    .update(signedContent)
    .digest('hex');
}

function buildWebhookHeaders(
  payload: string,
  secret: string
): Record<string, string> {
  const timestamp = Math.floor(Date.now() / 1000);
  const signature = signWebhookPayload(payload, secret, timestamp);
  return {
    'Content-Type': 'application/json',
    'X-Webhook-Id': crypto.randomUUID(),
    'X-Webhook-Timestamp': timestamp.toString(),
    'X-Webhook-Signature': `v1=${signature}`,
  };
}
Including the timestamp in the signed content limits replay attacks: receivers should reject any request whose timestamp is more than 5 minutes old, so a captured request stops being usable after that window.
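On the receiving side, verification mirrors the signing code. The sketch below assumes the header format shown above (a v1= prefix and a hex digest over timestamp.payload); verifyWebhookSignature and its parameters are hypothetical names, and the clock is injectable only to make the function easy to test.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TOLERANCE_SECONDS = 5 * 60;

function verifyWebhookSignature(
  rawBody: string,          // the exact bytes received, before JSON parsing
  secret: string,
  timestampHeader: string,  // value of X-Webhook-Timestamp
  signatureHeader: string,  // value of X-Webhook-Signature
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  // Reject stale or malformed timestamps first.
  const timestamp = Number(timestampHeader);
  if (!Number.isInteger(timestamp)) return false;
  if (Math.abs(nowSeconds - timestamp) > TOLERANCE_SECONDS) return false;

  if (!signatureHeader.startsWith("v1=")) return false;
  const received = Buffer.from(signatureHeader.slice(3), "hex");
  const expected = createHmac("sha256", secret)
    .update(`${timestampHeader}.${rawBody}`)
    .digest();

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The comparison uses timingSafeEqual rather than === so that the time taken does not reveal how many leading characters of a forged signature were correct. Receivers need the raw request body for this; re-serializing parsed JSON can change the bytes and break the signature.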
Step 4: The Delivery Worker
This is the heart of your system. The worker pulls events from the queue and delivers them:
import { Queue, Worker } from 'bullmq';

// Assumes `redis` (connection options), `db` (Prisma client) and
// `generateId()` are defined elsewhere in your application.
const webhookQueue = new Queue('webhooks', {
  connection: redis
});

// Enqueue an event
async function emitWebhookEvent(
  tenantId: string,
  eventType: string,
  data: unknown
) {
  const event = {
    id: `evt_${generateId()}`,
    type: eventType,
    apiVersion: '2026-03-01',
    createdAt: new Date().toISOString(),
    data,
  };

  // Find all matching endpoints for this tenant
  // (exact type, or the resource wildcard such as "invoice.*")
  const endpoints = await db.webhookEndpoint.findMany({
    where: {
      tenantId,
      active: true,
      events: { hasSome: [eventType, eventType.split('.')[0] + '.*'] },
    },
  });

  // Queue a delivery job for each endpoint
  for (const endpoint of endpoints) {
    await webhookQueue.add('deliver', {
      event,
      endpointId: endpoint.id,
      url: endpoint.url,
      secret: endpoint.secret,
    }, {
      attempts: 8,
      // Built-in exponential backoff doubles the delay each time
      // (1, 2, 4, ... minutes); the schedule in Step 5 needs a custom strategy
      backoff: { type: 'exponential', delay: 60_000 },
      removeOnComplete: 1000,
      removeOnFail: 5000,
    });
  }
}
// The delivery worker
const worker = new Worker('webhooks', async (job) => {
  const { event, endpointId, url, secret } = job.data;
  const payload = JSON.stringify(event);
  const headers = buildWebhookHeaders(payload, secret);

  // Abort the request if the endpoint takes longer than 10 seconds
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 10_000);

  let statusCode: number | null = null;
  try {
    const response = await fetch(url, {
      method: 'POST',
      headers,
      body: payload,
      signal: controller.signal,
    });
    statusCode = response.status;
  } finally {
    clearTimeout(timeout);
    // Log every attempt, including timeouts and network errors
    // (statusCode stays null when no response was received)
    await db.webhookDelivery.create({
      data: {
        endpointId,
        eventId: event.id,
        eventType: event.type,
        statusCode,
        success: statusCode !== null && statusCode >= 200 && statusCode < 300,
        attemptNumber: job.attemptsMade + 1,
      },
    });
  }

  // Treat non-2xx as failure so the job is retried
  if (statusCode === null || statusCode < 200 || statusCode >= 300) {
    throw new Error(`Endpoint returned ${statusCode}`);
  }
}, { connection: redis, concurrency: 20 });
Step 5: Retry Strategy with Exponential Backoff
Failed deliveries should be retried with increasing delays. A common pattern:
| Attempt | Delay | Total elapsed |
|---|---|---|
| 1 | Immediate | 0 |
| 2 | 1 minute | 1 min |
| 3 | 5 minutes | 6 min |
| 4 | 30 minutes | 36 min |
| 5 | 2 hours | ~2.5 hours |
| 6 | 8 hours | ~10.5 hours |
| 7 | 24 hours | ~34.5 hours |
| 8 | 48 hours | ~82.5 hours |
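The schedule in the table grows faster than simple doubling, so a backoff that doubles a 1-minute base delay (1, 2, 4, 8 minutes and so on) would not reproduce it. One way to get these exact delays is a lookup table, sketched below. retryDelayMs and the -1 "give up" convention are assumptions made for this example.

```typescript
const MINUTE = 60_000;
const HOUR = 60 * MINUTE;

// Delay before attempts 2 through 8, matching the table above.
const RETRY_DELAYS_MS: readonly number[] = [
  1 * MINUTE,
  5 * MINUTE,
  30 * MINUTE,
  2 * HOUR,
  8 * HOUR,
  24 * HOUR,
  48 * HOUR,
];

// attemptsMade = number of attempts that have already failed
// (1 after the first failure). Returns the wait before the next
// attempt in milliseconds, or -1 when no retries remain.
function retryDelayMs(attemptsMade: number): number {
  if (!Number.isInteger(attemptsMade) || attemptsMade < 1) {
    throw new RangeError("attemptsMade must be a positive integer");
  }
  return RETRY_DELAYS_MS[attemptsMade - 1] ?? -1;
}
```

BullMQ documents a custom backoff hook (backoff: { type: 'custom' } on the job plus a settings.backoffStrategy function on the worker), and a function like this could be plugged in there; check the exact signature against the BullMQ version you run. Adding a small random jitter to each delay is also worth considering so that many failed deliveries do not retry at the same instant.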
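After all retries are exhausted, keep the failed job as your dead letter record (BullMQ leaves it in the failed set, capped by the removeOnFail setting above), disable the endpoint, and notify the customer:
worker.on('failed', async (job, err) => {
  // job can be undefined if it was removed before this event fired
  if (!job) return;
  const maxAttempts = job.opts.attempts ?? 1;
  if (job.attemptsMade >= maxAttempts) {
    // All retries exhausted: disable the endpoint and load its owner
    const endpoint = await db.webhookEndpoint.update({
      where: { id: job.data.endpointId },
      data: { active: false },
      include: { tenant: true },
    });
    // Notify customer via email (assumes Tenant has an email field
    // and that sendEmail is your own mail helper)
    await sendEmail({
      to: endpoint.tenant.email,
      subject: 'Webhook endpoint disabled',
      body: `Your endpoint ${job.data.url} has been disabled after repeated failures (last error: ${err.message}).`,
    });
  }
});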
Step 6: Delivery Logs and Debugging
Give your customers a webhook delivery log in your dashboard. This is crucial for debugging:
model WebhookDelivery {
  id            String          @id @default(cuid())
  endpointId    String
  eventId       String
  eventType     String
  statusCode    Int?
  success       Boolean
  attemptNumber Int
  requestBody   String?         // Store for debugging
  responseBody  String?         // First 1KB of response
  duration      Int?            // ms
  createdAt     DateTime        @default(now())
  endpoint      WebhookEndpoint @relation(fields: [endpointId], references: [id])
}
Essential dashboard features:
- Recent deliveries with status, response code, and timing
- Retry button for failed deliveries
- Test endpoint button that sends a sample event
- Event type filter to find specific events quickly
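To feed that log, each attempt has to be turned into a row matching the WebhookDelivery model. The sketch below shows one way to do it; AttemptResult and buildDeliveryLog are hypothetical names, and "first 1KB" is approximated as the first 1024 characters of the response.

```typescript
// What the worker knows after one delivery attempt.
interface AttemptResult {
  endpointId: string;
  eventId: string;
  eventType: string;
  attemptNumber: number;
  requestBody: string;
  startedAtMs: number;
  finishedAtMs: number;
  statusCode?: number;   // absent on timeout or network error
  responseBody?: string;
}

// Shape of a row to insert, mirroring the WebhookDelivery model.
interface DeliveryLogEntry {
  endpointId: string;
  eventId: string;
  eventType: string;
  statusCode: number | null;
  success: boolean;
  attemptNumber: number;
  requestBody: string;
  responseBody: string | null;
  duration: number; // ms
}

const MAX_RESPONSE_CHARS = 1024;

function buildDeliveryLog(r: AttemptResult): DeliveryLogEntry {
  const statusCode = r.statusCode ?? null;
  return {
    endpointId: r.endpointId,
    eventId: r.eventId,
    eventType: r.eventType,
    statusCode,
    success: statusCode !== null && statusCode >= 200 && statusCode < 300,
    attemptNumber: r.attemptNumber,
    requestBody: r.requestBody,
    // Truncate so one chatty endpoint cannot bloat the log table.
    responseBody:
      r.responseBody === undefined ? null : r.responseBody.slice(0, MAX_RESPONSE_CHARS),
    duration: Math.max(0, r.finishedAtMs - r.startedAtMs),
  };
}
```

Recording attempts that never received a response (a null status code) matters as much as recording 4xx and 5xx replies, since timeouts are often the failures customers most need to see.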
Step 7: Rate Limiting and Circuit Breaking
Protect both yourself and your customers' endpoints:
import { RateLimiterRedis } from 'rate-limiter-flexible';

// Per-endpoint rate limiter: max 100 deliveries per minute
const endpointLimiter = new RateLimiterRedis({
  storeClient: redis,
  keyPrefix: 'webhook_rl',
  points: 100,
  duration: 60,
});

// Circuit breaker: pause delivery once an endpoint has 5 or more
// failed attempts in the last 5 minutes (not necessarily consecutive)
async function checkCircuitBreaker(endpointId: string): Promise<boolean> {
  const recentFailures = await db.webhookDelivery.count({
    where: {
      endpointId,
      success: false,
      createdAt: { gte: new Date(Date.now() - 300_000) }, // Last 5 min
    },
  });
  return recentFailures < 5; // true = circuit closed (OK to send)
}
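If you want the breaker to react to consecutive failures specifically, the state can be tracked per endpoint instead of counted from the delivery log. The class below is an in-memory sketch under that assumption; EndpointCircuitBreaker is a hypothetical name, and with several worker processes the counters would need to live in shared storage such as Redis.

```typescript
class EndpointCircuitBreaker {
  private consecutiveFailures = 0;
  private openedAtMs: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number = 5,
    cooldownMs: number = 5 * 60_000,
    now: () => number = () => Date.now(), // injectable clock for tests
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True when a delivery may be attempted.
  canSend(): boolean {
    if (this.openedAtMs === null) return true;
    // After the cooldown, allow a trial delivery (half-open behaviour).
    return this.now() - this.openedAtMs >= this.cooldownMs;
  }

  // Any success closes the circuit and resets the streak.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAtMs = null;
  }

  // Open (or re-open) the circuit once the streak reaches the threshold.
  recordFailure(): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.failureThreshold) {
      this.openedAtMs = this.now();
    }
  }
}
```

A worker would call canSend() before each delivery and delay the job when it returns false, then report the outcome with recordSuccess() or recordFailure().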
Step 8: Monitoring and Alerting
Track these metrics for your webhook system:
- Delivery success rate — should be >99% for healthy endpoints
- Queue depth — growing queue means workers can't keep up
- P95 delivery latency — how long from event to delivery
- Retry rate — high retries indicate problematic endpoints
- Dead letter queue size — failed events that need attention
// Prometheus metrics example
import { Counter, Histogram, Gauge } from 'prom-client';

const webhookDeliveries = new Counter({
  name: 'webhook_deliveries_total',
  help: 'Total webhook delivery attempts',
  labelNames: ['status', 'event_type'],
});

const deliveryDuration = new Histogram({
  name: 'webhook_delivery_duration_seconds',
  help: 'Webhook delivery duration',
  buckets: [0.1, 0.5, 1, 2, 5, 10],
});

const queueDepth = new Gauge({
  name: 'webhook_queue_depth',
  help: 'Number of pending webhook deliveries',
});
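Prometheus would normally derive P95 latency from the histogram buckets at query time. To make the definitions of success rate and P95 concrete, the plain functions below compute both from raw samples. This is an illustrative sketch: DeliverySample, successRate, and percentile are hypothetical names, and the percentile uses the nearest-rank method, which is only one of several common definitions.

```typescript
interface DeliverySample {
  success: boolean;
  latencyMs: number;
}

// Fraction of attempts that succeeded, between 0 and 1.
function successRate(samples: DeliverySample[]): number {
  if (samples.length === 0) return 1; // no traffic means nothing failed
  return samples.filter((s) => s.success).length / samples.length;
}

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new RangeError("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...values].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] as number;
}
```

For example, percentile(latencies, 95) over one endpoint's recent deliveries gives the latency that 95% of its deliveries stayed at or under, a reasonable value to alert on.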
Production Checklist
Before launching your webhook system, verify:
- HTTPS only — reject HTTP URLs on registration
- HMAC signatures — every delivery is signed
- Timestamp in signature — prevents replay attacks
- 10-second timeout — don't wait forever for slow endpoints
- Exponential backoff — at least 5 retry attempts
- Dead letter queue — don't silently drop events
- Idempotency keys — unique event IDs for deduplication
- Delivery logs — customers can see what was sent
- Rate limiting — protect endpoints from event storms
- IP allowlist documentation — publish your outbound IPs
- Test mode — customers can send test events from the dashboard
- Event catalog — document every event type and its payload
Conclusion
A webhook system is one of those features that seems simple until you build it for real. The difference between a toy implementation and a production-grade system comes down to reliability — retries, monitoring, security, and debugging tools.
Invest the time to build it right. Your customers' integrations depend on it, and every missed webhook is a broken workflow somewhere. The good news: once you have a solid foundation, it scales beautifully and becomes one of your product's strongest selling points.