โ† Back to blog

Background Jobs and Queue Architecture for Your SaaS: The Complete Guide

By SaaS Masters · March 10, 2026 · 6 min read

Every successful SaaS reaches a point where not everything can be handled within an HTTP request. Sending emails, generating PDFs, importing data, calling AI models: if you do this synchronously, your users will suffer through painful load times. The solution? Background jobs and a solid queue architecture.

In this article, we'll show you how to set up a robust system for background tasks, which patterns work, and which pitfalls to avoid.

Why background jobs are essential

Imagine: a user uploads a CSV with 10,000 contacts. Without background jobs, that user has to wait until everything is processed, which could take minutes. With a queue system, you give immediate feedback ("Your import is being processed") and handle the work in the background.

Typical use cases in SaaS:

  • 📧 Transactional emails and notifications
  • 📊 Report and export generation
  • 🔄 Data synchronization with external systems
  • 🤖 AI/ML processing (embeddings, classification)
  • 💳 Payment provider webhook processing
  • 📁 File processing (resize, conversion, virus scanning)
  • 🧹 Periodic cleanup and maintenance
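
Whatever queue you end up choosing, it helps to give each of these job types a typed payload from day one, so producers and workers can't drift apart. A minimal sketch in TypeScript (the job names and fields here are illustrative, not a fixed API):

```typescript
// Illustrative job-name → payload map; adjust names and fields to your domain.
type JobPayloads = {
  'send-email': { to: string; template: string };
  'generate-report': { userId: string; month: string };
  'process-webhook': { eventId: string; provider: string };
};

type JobName = keyof JobPayloads;

// A tiny typed wrapper: the compiler rejects a payload that
// doesn't match the job name it is paired with.
function buildJob<N extends JobName>(name: N, payload: JobPayloads[N]) {
  return { name, payload };
}

const job = buildJob('send-email', { to: 'user@example.com', template: 'welcome' });
```

The same map can later type your `queue.add` and worker handlers, so a renamed field fails at compile time instead of in production.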

The anatomy of a queue system

A queue system consists of three core components:

1. Producer (the sender)

The code that places a task on the queue:

import { queue } from './lib/queue';

export async function handleCSVUpload(userId: string, fileUrl: string) {
  const importRecord = await db.import.create({
    data: { userId, status: 'processing', fileUrl }
  });

  await queue.add('process-csv-import', {
    userId,
    fileUrl,
    importId: importRecord.id,
  }, {
    attempts: 3,
    backoff: { type: 'exponential', delay: 5000 },
    timeout: 300_000, // 5 minutes max
  });

  return { status: 'processing', importId: importRecord.id };
}

2. Queue (the buffer)

The buffer between producer and consumer. Popular options:

| Technology          | Pros                   | Cons                           |
| ------------------- | ---------------------- | ------------------------------ |
| Redis + BullMQ      | Fast, mature, great DX | Need to manage/host Redis      |
| PostgreSQL + pgBoss | No extra infra needed  | Less performant at high volume |
| AWS SQS             | Managed, scalable      | Vendor lock-in, higher latency |
| RabbitMQ            | Feature-rich, routing  | More complex to manage         |

Our recommendation for most SaaS projects: BullMQ with Redis. It offers the best balance of features, performance, and developer experience. For smaller projects, pgBoss is a smart choice โ€” you don't need extra infrastructure.

3. Worker (the processor)

The process that picks up tasks from the queue and executes them:

import { Worker } from 'bullmq';
import { redis } from './lib/redis';

const worker = new Worker('process-csv-import', async (job) => {
  const { userId, fileUrl, importId } = job.data;
  
  const records = await downloadAndParseCSV(fileUrl);
  
  const batchSize = 100;
  for (let i = 0; i < records.length; i += batchSize) {
    const batch = records.slice(i, i + batchSize);
    await processContactBatch(userId, batch);
    
    // Report progress based on rows completed so far (reaches 100 on the last batch).
    await job.updateProgress(
      Math.round((Math.min(i + batchSize, records.length) / records.length) * 100)
    );
  }
  
  await db.import.update({
    where: { id: importId },
    data: { status: 'completed', processedCount: records.length }
  });
  
  await sendNotification(userId, 
    'Your import of ' + records.length + ' contacts is complete!'
  );
}, { connection: redis, concurrency: 5 });

worker.on('failed', (job, err) => {
  console.error('Job ' + job?.id + ' failed:', err);
});

Essential patterns for production

Retry with exponential backoff

External services go down sometimes. Always build in retry logic:

await queue.add('send-email', emailData, {
  attempts: 5,
  backoff: {
    type: 'exponential',
    delay: 2000, // retries wait 2s, 4s, 8s, 16s
  },
});
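
To sanity-check a backoff config before shipping it, you can compute the delays it will produce. With `attempts: 5`, the first try runs immediately and only the retries are delayed. The helper below is a sketch of ours, not a library function:

```typescript
// Computes the retry delays for an exponential backoff policy:
// retry k (1-based) waits baseDelayMs * 2^(k - 1).
function backoffSchedule(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  // The first attempt runs immediately; only the (attempts - 1) retries wait.
  for (let k = 1; k < attempts; k++) {
    delays.push(baseDelayMs * 2 ** (k - 1));
  }
  return delays;
}

backoffSchedule(5, 2000); // delays before each of the 4 retries
```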

Dead Letter Queue (DLQ)

Jobs that fail after all retries need to go somewhere, not silently disappear:

worker.on('failed', async (job, err) => {
  if (job && job.attemptsMade >= (job.opts.attempts ?? 1)) {
    await deadLetterQueue.add('failed-job', {
      originalQueue: 'send-email',
      jobData: job.data,
      error: err.message,
      failedAt: new Date().toISOString(),
    });
    
    await alertOps('DLQ: Job ' + job.id + ' permanently failed');
  }
});

Idempotency: the golden rule

A job can be executed multiple times (due to retries, crashes, deploys). Ensure duplicate execution causes no harm:

async function processPaymentWebhook(job) {
  const { eventId, paymentId } = job.data;
  
  const existing = await db.processedEvent.findUnique({
    where: { eventId }
  });
  
  if (existing) {
    console.log('Event ' + eventId + ' already processed, skipping');
    return;
  }
  
  await db.$transaction([
    db.processedEvent.create({ data: { eventId } }),
    db.subscription.update({
      where: { paymentId },
      data: { status: 'active' }
    }),
  ]);
}
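
The same check-then-record pattern can be demonstrated without a database. In this sketch an in-memory Set stands in for the processedEvent table, and a counter stands in for the side effect we must not duplicate:

```typescript
// In-memory stand-in for the processedEvent table.
const processedEvents = new Set<string>();
let activations = 0; // the side effect we must not apply twice

function processPaymentEvent(eventId: string): boolean {
  // Idempotency check: skip events we have already handled.
  if (processedEvents.has(eventId)) return false;
  // Record and apply together (in production: one transaction, as above).
  processedEvents.add(eventId);
  activations += 1;
  return true;
}

processPaymentEvent('evt_123'); // first delivery: applied
processPaymentEvent('evt_123'); // retry: skipped, no double activation
```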

Priorities and separate queues

Not every task is equally urgent. Separate your queues based on priority:

// High priority: payment-related
await queue.add('process-payment', data, { priority: 1 });

// Normal priority: email
await queue.add('send-email', data, { priority: 5 });

// Low priority: analytics
await queue.add('update-analytics', data, { priority: 10 });

Or use separate queues with dedicated workers, so a flood of analytics jobs doesn't slow down your payment processing.
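
One way to keep these routing decisions in a single place is a small dispatch table. A sketch (the queue names and priority values are illustrative):

```typescript
// Maps a job type to its queue and priority (lower number = more urgent).
const routing = {
  'process-payment': { queue: 'critical', priority: 1 },
  'send-email': { queue: 'default', priority: 5 },
  'update-analytics': { queue: 'bulk', priority: 10 },
} as const;

type JobType = keyof typeof routing;

// Producers call this instead of hard-coding queue names everywhere.
function routeJob(type: JobType) {
  return routing[type];
}
```

Centralizing the table makes it trivial to move a job type to its own queue later without hunting through producer code.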

Scheduled jobs and cron tasks

Besides event-driven jobs, you also need periodic tasks:

await queue.add('daily-report', {}, {
  repeat: { pattern: '0 8 * * *' }, // every day at 08:00
  jobId: 'daily-report', // stable jobId prevents duplicate schedules
});

await queue.add('trial-expiry-check', {}, {
  repeat: { pattern: '0 */4 * * *' }, // every 4 hours, on the hour
  jobId: 'trial-expiry-check',
});

await queue.add('cleanup-old-data', {}, {
  repeat: { pattern: '0 3 * * 0' }, // Sundays at 03:00
  jobId: 'cleanup-old-data',
});

Monitoring and observability

A queue system without monitoring is a ticking time bomb. Implement dashboards for:

  • Queue depth: how many jobs are waiting?
  • Processing time: how long does an average job take?
  • Failure rate: what percentage of jobs fail?
  • DLQ size: how many jobs have permanently failed?

async function collectQueueMetrics() {
  const counts = await queue.getJobCounts(
    'active', 'completed', 'failed', 'delayed', 'waiting'
  );
  
  await metrics.gauge('queue.waiting', counts.waiting);
  await metrics.gauge('queue.active', counts.active);
  await metrics.gauge('queue.failed', counts.failed);
  
  if (counts.waiting > 1000) {
    await alertOps('Queue depth > 1000, check workers!');
  }
}

Deployment and scaling

Workers as a separate process

Do not run workers in your web server process. They deserve their own deployment:

FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --production
COPY dist/ ./dist/
CMD ["node", "dist/worker.js"]
services:
  web:
    build: .
    ports: ["3000:3000"]
  
  worker:
    build:
      dockerfile: Dockerfile.worker
    deploy:
      replicas: 3
    depends_on:
      - redis
  
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data

Graceful shutdown

During deploys, running jobs must be completed properly:

async function gracefulShutdown() {
  console.log('Shutdown initiated, finishing current jobs...');
  await worker.close();
  await redis.quit();
  process.exit(0);
}

process.on('SIGTERM', gracefulShutdown);
process.on('SIGINT', gracefulShutdown);

Common mistakes

โŒ Too much data in the job payload

// Wrong: putting entire file in the job
await queue.add('process', { csvData: entireFileContent }); // Too large

// Right: store a reference, worker fetches data
await queue.add('process', { fileUrl: 's3://bucket/file.csv' }); // Compact
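
A small guard at enqueue time catches oversized payloads before they ever hit the queue. A sketch (the 32 KB limit and the helper name are our own choices, not a queue-library feature):

```typescript
const MAX_PAYLOAD_BYTES = 32 * 1024; // arbitrary example limit

// Rejects payloads that should have been a reference (URL, ID) instead.
function assertCompactPayload(payload: unknown): void {
  const size = Buffer.byteLength(JSON.stringify(payload), 'utf8');
  if (size > MAX_PAYLOAD_BYTES) {
    throw new Error(`Job payload is ${size} bytes; pass a reference instead`);
  }
}

assertCompactPayload({ fileUrl: 's3://bucket/file.csv' }); // compact: passes
```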

โŒ No timeout configured

A hanging job can block your entire queue. Always set a timeout.
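
If your queue library doesn't enforce a timeout for you, you can apply one inside the handler itself. A sketch using Promise.race (the helper name is ours):

```typescript
// Rejects if the wrapped work does not settle within timeoutMs.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Job timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  // Clear the timer either way so it can't keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Inside a worker you would wrap the slow part, e.g. `await withTimeout(processContactBatch(userId, batch), 30_000)`. Note that this rejects the job; if the underlying work can't be cancelled, it may still run to completion in the background.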

โŒ Keeping queue state in memory

If your worker crashes, everything is lost. Always use a persistent backing store (Redis, PostgreSQL).

โŒ No monitoring

"We'll notice when users complain" is not a strategy.

Conclusion

Background jobs aren't a nice-to-have; they're a fundamental part of every scalable SaaS. Start simple with BullMQ or pgBoss, implement the basic patterns (retry, idempotency, monitoring), and expand as you grow.

The investment in a good queue system pays off in multiple ways: better user experience, higher reliability, and an architecture that's ready to scale.

Need help setting up robust background processing in your SaaS? Get in touch; we'd love to help.