
Building a Reliable Webhook System for Your SaaS: The Complete Guide

By SaaS Masters | 15 March 2026 | 9 min read

Your SaaS doesn't exist in a vacuum. The moment customers start integrating your product into their workflows, they'll need one thing above all: real-time event notifications. That's where webhooks come in.

A webhook system sounds simple — send an HTTP POST when something happens. But building one that's reliable, secure, and scalable is an engineering challenge that trips up many SaaS teams. Dropped events, duplicate deliveries, security vulnerabilities, and debugging nightmares are all common pitfalls.

This guide walks you through building a production-grade webhook system from the ground up.

Why Webhooks Matter for SaaS

Webhooks are the standard for event-driven integrations. Unlike polling (where clients repeatedly ask "anything new?"), webhooks push events to subscribers the moment they happen. Benefits include:

  • Real-time updates — customers get notified instantly
  • Reduced API load — no more polling every 30 seconds
  • Better integrations — Zapier, Make, and n8n all rely on webhooks
  • Customer stickiness — deep integrations make your product harder to replace

If you're building a B2B SaaS, a solid webhook system isn't optional — it's expected.

Architecture Overview

A reliable webhook system has five core components:

┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│  Your App   │────▶│  Event Queue │────▶│  Delivery     │
│  (events)   │     │  (Redis/SQS) │     │  Worker       │
└─────────────┘     └──────────────┘     └───────┬───────┘
                                                  │
                                          ┌───────▼───────┐
                                          │  Customer      │
                                          │  Endpoint      │
                                          └───────┬───────┘
                                                  │
                                          ┌───────▼───────┐
                                          │  Retry Queue   │
                                          │  + Dead Letter │
                                          └───────────────┘

Never send webhooks synchronously from your main application flow. Always queue the event and let a background worker handle delivery. This prevents slow or failing customer endpoints from affecting your application's performance.

Step 1: Design Your Event Schema

Consistency is king. Every webhook event should follow the same structure:

{
  "id": "evt_2xK9mPqR4sT7vW1y",
  "type": "invoice.paid",
  "apiVersion": "2026-03-01",
  "createdAt": "2026-03-15T08:00:00Z",
  "data": {
    "id": "inv_8nM3kL6jH2fD",
    "amount": 9900,
    "currency": "eur",
    "customerId": "cus_4rT7yU2iO9pA"
  }
}

Key design decisions:

  • Unique event ID — essential for idempotency on the receiver side
  • Namespaced event types — use resource.action format (e.g., subscription.created, invoice.paid)
  • API versioning — lets you evolve payloads without breaking existing integrations
  • Timestamp — always include when the event occurred
  • Full resource in data — include the complete object, not just an ID (saves customers an extra API call)
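
To make the envelope concrete, here is a minimal sketch of how it could be typed and how a receiver might use the event ID for idempotency. The names WebhookEvent, InvoicePaidData, and createDeduplicator are illustrative and not part of any library, and the in-memory Set stands in for whatever persistent store a real receiver would use.

```typescript
// Sketch only: field names mirror the example payload above.
interface WebhookEvent<T> {
  id: string;         // unique per event, e.g. "evt_..."
  type: string;       // resource.action, e.g. "invoice.paid"
  apiVersion: string; // payload version the event was rendered with
  createdAt: string;  // ISO 8601 timestamp
  data: T;            // full resource snapshot
}

interface InvoicePaidData {
  id: string;
  amount: number;     // assumption: minor units (cents)
  currency: string;
  customerId: string;
}

// Returns a function that reports whether an event ID is seen for the first time.
// Assumption: an in-memory Set is enough for the demo; use a database in production.
function createDeduplicator(): (eventId: string) => boolean {
  const seen = new Set<string>();
  return (eventId: string): boolean => {
    if (seen.has(eventId)) return false;
    seen.add(eventId);
    return true;
  };
}

const sampleEvent: WebhookEvent<InvoicePaidData> = {
  id: 'evt_2xK9mPqR4sT7vW1y',
  type: 'invoice.paid',
  apiVersion: '2026-03-01',
  createdAt: '2026-03-15T08:00:00Z',
  data: { id: 'inv_8nM3kL6jH2fD', amount: 9900, currency: 'eur', customerId: 'cus_4rT7yU2iO9pA' },
};

const isFirstSighting = createDeduplicator();
const firstDelivery = isFirstSighting(sampleEvent.id);   // processed
const retriedDelivery = isFirstSighting(sampleEvent.id); // skipped as a duplicate
```

Because retries can deliver the same event more than once, keying deduplication on the event ID rather than on the delivery keeps receivers safe to call repeatedly.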

Step 2: Subscription Management

Let customers register webhook endpoints through your API and dashboard:

// Webhook endpoint registration
interface WebhookEndpoint {
  id: string;
  url: string;            // Customer's HTTPS endpoint
  events: string[];       // ["invoice.paid", "subscription.*"]
  secret: string;         // For signature verification
  active: boolean;
  createdAt: Date;
  metadata?: Record<string, string>;
}

// Database schema (Prisma)
model WebhookEndpoint {
  id        String   @id @default(cuid())
  tenantId  String
  url       String
  events    String[] // Subscribed event types
  secret    String   // HMAC signing secret
  active    Boolean  @default(true)
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt

  tenant    Tenant   @relation(fields: [tenantId], references: [id])
  deliveries WebhookDelivery[]
}

Important considerations:

  • Only allow HTTPS URLs — never send webhook payloads over unencrypted connections
  • Support wildcard subscriptions: invoice.* subscribes to all invoice events
  • Generate a unique secret per endpoint — used for HMAC signature verification
  • Validate URLs on registration — send a test event or verification challenge
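
The registration checks above could look roughly like this. It is a sketch under assumptions: the helper names and the "whsec_" prefix are made up for illustration, and the wildcard matcher only supports a trailing ".*".

```typescript
import { randomBytes } from 'node:crypto';

// Hypothetical helper: accept only absolute https:// URLs.
function isValidWebhookUrl(raw: string): boolean {
  try {
    return new URL(raw).protocol === 'https:';
  } catch {
    return false; // not a parseable absolute URL
  }
}

// Hypothetical helper: 32 random bytes, hex-encoded, with an assumed prefix.
function generateEndpointSecret(): string {
  return `whsec_${randomBytes(32).toString('hex')}`;
}

// Hypothetical helper: true when a subscription pattern covers an event type.
// Supports exact matches and a trailing ".*" after the resource name.
function matchesSubscription(pattern: string, eventType: string): boolean {
  if (pattern === eventType) return true;
  if (pattern.endsWith('.*')) {
    // "invoice.*" becomes the prefix "invoice." (the dot is kept on purpose)
    return eventType.startsWith(pattern.slice(0, -1));
  }
  return false;
}
```

Keeping the dot in the prefix matters: without it, invoice.* would also match a hypothetical invoices.created event.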

Step 3: Secure Signing with HMAC

Every webhook delivery must be signed so the receiver can verify it came from you:

import crypto from 'crypto';

function signWebhookPayload(
  payload: string, 
  secret: string, 
  timestamp: number
): string {
  const signedContent = `${timestamp}.${payload}`;
  return crypto
    .createHmac('sha256', secret)
    .update(signedContent)
    .digest('hex');
}

function buildWebhookHeaders(
  payload: string, 
  secret: string
): Record<string, string> {
  const timestamp = Math.floor(Date.now() / 1000);
  const signature = signWebhookPayload(payload, secret, timestamp);
  
  return {
    'Content-Type': 'application/json',
    'X-Webhook-Id': crypto.randomUUID(),
    'X-Webhook-Timestamp': timestamp.toString(),
    'X-Webhook-Signature': `v1=${signature}`,
  };
}

Including the timestamp in the signed content lets receivers detect replays: they should recompute the signature over the received timestamp and raw body, and reject any request whose timestamp is more than 5 minutes old.
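
On the receiving side, verification might look like the sketch below. It assumes the header format produced by buildWebhookHeaders above ("v1=" followed by a hex digest); verifyWebhookSignature and its parameter names are illustrative.

```typescript
import { Buffer } from 'node:buffer';
import { createHmac, timingSafeEqual } from 'node:crypto';

const TOLERANCE_SECONDS = 300; // 5 minutes, matching the guidance above

// Sketch of receiver-side verification. rawBody must be the exact bytes received,
// not a re-serialized JSON object, or the digest will not match.
function verifyWebhookSignature(
  rawBody: string,
  secret: string,
  timestampHeader: string,
  signatureHeader: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const timestamp = Number(timestampHeader);
  if (!Number.isInteger(timestamp)) return false;
  if (Math.abs(nowSeconds - timestamp) > TOLERANCE_SECONDS) return false;

  if (!signatureHeader.startsWith('v1=')) return false;
  const received = Buffer.from(signatureHeader.slice(3), 'hex');
  const expected = createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`)
    .digest();

  // timingSafeEqual throws on length mismatch, so compare lengths first
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Usage with made-up values
const sampleBody = '{"id":"evt_test"}';
const sampleSecret = 'whsec_test';
const sentAt = 1_700_000_000;
const sampleSignature =
  'v1=' + createHmac('sha256', sampleSecret).update(`${sentAt}.${sampleBody}`).digest('hex');

const freshResult = verifyWebhookSignature(sampleBody, sampleSecret, String(sentAt), sampleSignature, sentAt);
const staleResult = verifyWebhookSignature(sampleBody, sampleSecret, String(sentAt), sampleSignature, sentAt + 301);
const tamperedResult = verifyWebhookSignature(sampleBody + ' ', sampleSecret, String(sentAt), sampleSignature, sentAt);
```

The constant-time comparison avoids leaking how many leading bytes of a forged signature were correct.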

Step 4: The Delivery Worker

This is the heart of your system. The worker pulls events from the queue and delivers them:

import { Queue, Worker } from 'bullmq';

const webhookQueue = new Queue('webhooks', { 
  connection: redis 
});

// Enqueue an event
async function emitWebhookEvent(
  tenantId: string, 
  eventType: string, 
  data: any
) {
  const event = {
    id: `evt_${generateId()}`,
    type: eventType,
    apiVersion: '2026-03-01',
    createdAt: new Date().toISOString(),
    data,
  };

  // Find all matching endpoints for this tenant
  const endpoints = await db.webhookEndpoint.findMany({
    where: {
      tenantId,
      active: true,
      events: { hasSome: [eventType, eventType.split('.')[0] + '.*'] },
    },
  });

  // Queue a delivery job for each endpoint
  for (const endpoint of endpoints) {
    await webhookQueue.add('deliver', {
      event,
      endpointId: endpoint.id,
      url: endpoint.url,
      secret: endpoint.secret,
    }, {
      attempts: 8,
      backoff: { type: 'exponential', delay: 60_000 },
      removeOnComplete: 1000,
      removeOnFail: 5000,
    });
  }
}

// The delivery worker
const worker = new Worker('webhooks', async (job) => {
  const { event, endpointId, url, secret } = job.data;
  const payload = JSON.stringify(event);
  const headers = buildWebhookHeaders(payload, secret);

  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 10_000);

  // Stays null when the request times out or the connection fails
  let statusCode: number | null = null;

  try {
    const response = await fetch(url, {
      method: 'POST',
      headers,
      body: payload,
      signal: controller.signal,
    });
    statusCode = response.status;

    // Treat non-2xx as failure so the queue schedules a retry
    if (!response.ok) {
      throw new Error(`Endpoint returned ${response.status}`);
    }
  } finally {
    clearTimeout(timeout);

    // Log every attempt, including timeouts and network errors
    await db.webhookDelivery.create({
      data: {
        endpointId,
        eventId: event.id,
        eventType: event.type,
        statusCode,
        success: statusCode !== null && statusCode >= 200 && statusCode < 300,
        attemptNumber: job.attemptsMade + 1,
      },
    });
  }
}, { connection: redis, concurrency: 20 });

Step 5: Retry Strategy with Exponential Backoff

Failed deliveries should be retried with increasing delays. A common pattern:

Attempt   Delay        Total elapsed
1         Immediate    0
2         1 minute     1 min
3         5 minutes    6 min
4         30 minutes   36 min
5         2 hours      ~2.5 hours
6         8 hours      ~10.5 hours
7         24 hours     ~34.5 hours
8         48 hours     ~82.5 hours
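
The schedule above is not a plain doubling sequence, so the built-in exponential backoff from Step 4 (which, as far as I know, doubles the base delay on each attempt) will not reproduce it. One way to encode the table is a lookup like the sketch below; retryDelayMs and totalElapsedMs are illustrative names, and wiring the lookup into your queue's custom backoff hook is left as an assumption about your setup.

```typescript
const MINUTE_MS = 60_000;
const HOUR_MS = 60 * MINUTE_MS;

// Delay before each attempt, copied from the table above (index 0 = attempt 1).
const RETRY_DELAYS_MS: number[] = [
  0,
  1 * MINUTE_MS,
  5 * MINUTE_MS,
  30 * MINUTE_MS,
  2 * HOUR_MS,
  8 * HOUR_MS,
  24 * HOUR_MS,
  48 * HOUR_MS,
];

// attempt is 1-based; returns undefined once the schedule is exhausted.
function retryDelayMs(attempt: number): number | undefined {
  return RETRY_DELAYS_MS[attempt - 1];
}

// Time between the first attempt and the given attempt, in milliseconds.
function totalElapsedMs(attempt: number): number {
  return RETRY_DELAYS_MS.slice(0, attempt).reduce((sum, delay) => sum + delay, 0);
}
```

Summing the delays this way gives 36 minutes by attempt 4 and about 82.6 hours by attempt 8, in line with the rounded totals in the table.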

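After all retries are exhausted, the job stays in the queue's failed set (capped at 5,000 entries by removeOnFail in Step 4), which serves as the dead letter queue. At that point, disable the endpoint and notify the customer:

worker.on('failed', async (job, err) => {
  // job can be undefined if it was removed before this handler ran
  if (!job) return;

  const maxAttempts = job.opts.attempts ?? 1;
  if (job.attemptsMade >= maxAttempts) {
    // All retries exhausted: disable the endpoint and load its tenant
    const endpoint = await db.webhookEndpoint.update({
      where: { id: job.data.endpointId },
      data: { active: false },
      include: { tenant: true },
    });

    // Notify customer via email.
    // Assumes the Tenant model has a contact email field; adjust to your schema.
    await sendEmail({
      to: endpoint.tenant.email,
      subject: 'Webhook endpoint disabled',
      body: `Your endpoint ${job.data.url} has been disabled after repeated failures.`,
    });
  }
});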
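
If you would rather keep exhausted events in a dedicated store than rely on the queue's failed set, the record might be shaped like the sketch below. FailedJob and DeadLetterEntry are hypothetical types that list only the fields this example reads; where the entry is persisted is up to you.

```typescript
// Hypothetical: the subset of a failed job that this sketch needs.
interface FailedJob {
  attemptsMade: number;
  opts: { attempts?: number };
  data: { event: { id: string; type: string }; endpointId: string; url: string };
}

// Hypothetical dead letter record, suitable for a table or a separate queue.
interface DeadLetterEntry {
  eventId: string;
  eventType: string;
  endpointId: string;
  url: string;
  attempts: number;
  lastError: string;
  failedAt: string; // ISO 8601
}

// A job is exhausted once it has used every configured attempt.
function isExhausted(job: FailedJob): boolean {
  return job.attemptsMade >= (job.opts.attempts ?? 1);
}

// Captures what an operator needs to inspect or replay the event later.
function toDeadLetterEntry(job: FailedJob, err: Error, now: Date = new Date()): DeadLetterEntry {
  return {
    eventId: job.data.event.id,
    eventType: job.data.event.type,
    endpointId: job.data.endpointId,
    url: job.data.url,
    attempts: job.attemptsMade,
    lastError: err.message,
    failedAt: now.toISOString(),
  };
}
```

Storing the last error message alongside the event makes the "retry" button in a dashboard far more useful, since the customer can see why delivery stopped.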

Step 6: Delivery Logs and Debugging

Give your customers a webhook delivery log in your dashboard. This is crucial for debugging:

model WebhookDelivery {
  id            String   @id @default(cuid())
  endpointId    String
  eventId       String
  eventType     String
  statusCode    Int?
  success       Boolean
  attemptNumber Int
  requestBody   String?  // Store for debugging
  responseBody  String?  // First 1KB of response
  duration      Int?     // ms
  createdAt     DateTime @default(now())

  endpoint WebhookEndpoint @relation(fields: [endpointId], references: [id])
}

Essential dashboard features:

  • Recent deliveries with status, response code, and timing
  • Retry button for failed deliveries
  • Test endpoint button that sends a sample event
  • Event type filter to find specific events quickly
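
The responseBody and duration columns in the schema above need a little care when you fill them in. A rough sketch, with truncateUtf8 and toLogFields as made-up helper names: cap the stored response at 1 KB without splitting a multi-byte character, and record the duration in whole milliseconds.

```typescript
import { Buffer } from 'node:buffer';

const MAX_RESPONSE_BYTES = 1024;

// Keeps at most maxBytes of UTF-8 without cutting a character in half.
function truncateUtf8(text: string, maxBytes: number = MAX_RESPONSE_BYTES): string {
  const bytes = Buffer.from(text, 'utf8');
  if (bytes.length <= maxBytes) return text;
  let end = maxBytes;
  // Step back while the cut would land on a continuation byte (bit pattern 10xxxxxx)
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;
  return bytes.subarray(0, end).toString('utf8');
}

// Builds the log fields from a response body and start/finish times in epoch ms.
function toLogFields(
  responseText: string,
  startedAtMs: number,
  finishedAtMs: number,
): { responseBody: string; duration: number } {
  return {
    responseBody: truncateUtf8(responseText),
    duration: Math.round(finishedAtMs - startedAtMs),
  };
}
```

Truncating on a byte budget rather than a character count keeps row sizes predictable even when an endpoint returns non-ASCII error pages.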

Step 7: Rate Limiting and Circuit Breaking

Protect both yourself and your customers' endpoints:

import { RateLimiterRedis } from 'rate-limiter-flexible';

// Per-endpoint rate limiter: max 100 deliveries per minute
const endpointLimiter = new RateLimiterRedis({
  storeClient: redis,
  keyPrefix: 'webhook_rl',
  points: 100,
  duration: 60,
});

// Circuit breaker: pause delivery after 5 failures within the last 5 minutes
async function checkCircuitBreaker(endpointId: string): Promise<boolean> {
  const recentFailures = await db.webhookDelivery.count({
    where: {
      endpointId,
      success: false,
      createdAt: { gte: new Date(Date.now() - 300_000) }, // Last 5 min
    },
  });
  return recentFailures < 5; // true = circuit closed (OK to send)
}
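
The check above counts failures inside a five-minute window. If you want a breaker that reacts to consecutive failures and reopens after a cooldown, an in-memory version could look like this sketch. EndpointCircuitBreaker is an illustrative class, and with more than one worker process the state would need to live in shared storage such as Redis.

```typescript
interface BreakerState {
  consecutiveFailures: number;
  openedAt: number | null; // epoch ms when the circuit opened, null while closed
}

// Sketch: per-endpoint breaker held in process memory (an assumption for the demo).
class EndpointCircuitBreaker {
  private readonly states = new Map<string, BreakerState>();
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number = 5, cooldownMs: number = 300_000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // True when it is OK to send: circuit closed, or cooldown elapsed (trial request).
  canSend(endpointId: string, now: number = Date.now()): boolean {
    const state = this.states.get(endpointId);
    if (!state || state.openedAt === null) return true;
    return now - state.openedAt >= this.cooldownMs;
  }

  // Any success resets the failure streak and closes the circuit.
  recordSuccess(endpointId: string): void {
    this.states.delete(endpointId);
  }

  // Opens (or re-opens) the circuit once the streak reaches the threshold.
  recordFailure(endpointId: string, now: number = Date.now()): void {
    const state: BreakerState =
      this.states.get(endpointId) ?? { consecutiveFailures: 0, openedAt: null };
    state.consecutiveFailures += 1;
    if (state.consecutiveFailures >= this.threshold) state.openedAt = now;
    this.states.set(endpointId, state);
  }
}
```

A worker would call canSend before each delivery and then recordSuccess or recordFailure afterwards; a failed trial request after the cooldown pushes the circuit open again for another cooldown period.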

Step 8: Monitoring and Alerting

Track these metrics for your webhook system:

  • Delivery success rate — should be >99% for healthy endpoints
  • Queue depth — growing queue means workers can't keep up
  • P95 delivery latency — how long from event to delivery
  • Retry rate — high retries indicate problematic endpoints
  • Dead letter queue size: failed events that need attention

// Prometheus metrics example
import { Counter, Histogram, Gauge } from 'prom-client';

const webhookDeliveries = new Counter({
  name: 'webhook_deliveries_total',
  help: 'Total webhook delivery attempts',
  labelNames: ['status', 'event_type'],
});

const deliveryDuration = new Histogram({
  name: 'webhook_delivery_duration_seconds',
  help: 'Webhook delivery duration',
  buckets: [0.1, 0.5, 1, 2, 5, 10],
});

const queueDepth = new Gauge({
  name: 'webhook_queue_depth',
  help: 'Number of pending webhook deliveries',
});
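
For dashboards or ad hoc checks outside Prometheus, the success rate and P95 latency can also be computed directly from delivery rows. The sketch below uses the nearest-rank method for the percentile; DeliverySample, successRate, and percentile are illustrative names.

```typescript
interface DeliverySample {
  success: boolean;
  latencyMs: number;
}

// Fraction of successful deliveries, or undefined when there is no data.
function successRate(samples: DeliverySample[]): number | undefined {
  if (samples.length === 0) return undefined;
  const succeeded = samples.filter((sample) => sample.success).length;
  return succeeded / samples.length;
}

// Nearest-rank percentile: the smallest value with at least `percent` % of
// samples at or below it. `percent` is a whole number such as 95.
function percentile(values: number[], percent: number): number | undefined {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((percent * sorted.length) / 100));
  return sorted[rank - 1];
}
```

Returning undefined for empty input avoids reporting a misleading 0% success rate for an endpoint that simply had no traffic.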

Production Checklist

Before launching your webhook system, verify:

  • HTTPS only — reject HTTP URLs on registration
  • HMAC signatures — every delivery is signed
  • Timestamp in signature — prevents replay attacks
  • 10-second timeout — don't wait forever for slow endpoints
  • Exponential backoff — at least 5 retry attempts
  • Dead letter queue — don't silently drop events
  • Idempotency keys — unique event IDs for deduplication
  • Delivery logs — customers can see what was sent
  • Rate limiting — protect endpoints from event storms
  • IP allowlist documentation — publish your outbound IPs
  • Test mode — customers can send test events from the dashboard
  • Event catalog — document every event type and its payload

Conclusion

A webhook system is one of those features that seems simple until you build it for real. The difference between a toy implementation and a production-grade system comes down to reliability — retries, monitoring, security, and debugging tools.

Invest the time to build it right. Your customers' integrations depend on it, and every missed webhook is a broken workflow somewhere. The good news: once you have a solid foundation, it scales beautifully and becomes one of your product's strongest selling points.
