Every successful SaaS reaches a point where not everything can be handled within an HTTP request. Sending emails, generating PDFs, importing data, calling AI models: if you do this synchronously, your users will suffer through painful load times. The solution? Background jobs and a solid queue architecture.
In this article, we'll show you how to set up a robust system for background tasks, which patterns work, and which pitfalls to avoid.
## Why background jobs are essential
Imagine: a user uploads a CSV with 10,000 contacts. Without background jobs, that user has to wait until everything is processed, which could take minutes. With a queue system, you give immediate feedback ("Your import is being processed") and handle the work in the background.
Typical use cases in SaaS:
- Transactional emails and notifications
- Report and export generation
- Data synchronization with external systems
- AI/ML processing (embeddings, classification)
- Payment provider webhook processing
- File processing (resize, conversion, virus scanning)
- Periodic cleanup and maintenance
## The anatomy of a queue system
A queue system consists of three core components:
### 1. Producer (the sender)
The code that places a task on the queue:
```typescript
import { queue } from './lib/queue';
import { db } from './lib/db';

export async function handleCSVUpload(userId: string, fileUrl: string) {
  const importRecord = await db.import.create({
    data: { userId, status: 'processing', fileUrl }
  });

  await queue.add('process-csv-import', {
    userId,
    fileUrl,
    importId: importRecord.id,
  }, {
    attempts: 3,
    backoff: { type: 'exponential', delay: 5000 },
    timeout: 300_000, // 5 minutes max
  });

  return { status: 'processing', importId: importRecord.id };
}
```
### 2. Queue (the buffer)
The buffer between producer and consumer. Popular options:
| Technology | Pros | Cons |
|---|---|---|
| Redis + BullMQ | Fast, mature, great DX | Need to manage/host Redis |
| PostgreSQL + pgBoss | No extra infra needed | Less performant at high volume |
| AWS SQS | Managed, scalable | Vendor lock-in, higher latency |
| RabbitMQ | Feature-rich, routing | More complex to manage |
Our recommendation for most SaaS projects: BullMQ with Redis. It offers the best balance of features, performance, and developer experience. For smaller projects, pgBoss is a smart choice: you don't need extra infrastructure.
### 3. Worker (the processor)
The process that picks up tasks from the queue and executes them:
```typescript
import { Worker } from 'bullmq';
import { redis } from './lib/redis';
import { db } from './lib/db';

const worker = new Worker('process-csv-import', async (job) => {
  const { userId, fileUrl, importId } = job.data;

  const records = await downloadAndParseCSV(fileUrl);
  const batchSize = 100;

  for (let i = 0; i < records.length; i += batchSize) {
    const batch = records.slice(i, i + batchSize);
    await processContactBatch(userId, batch);

    // Report progress based on rows completed so far,
    // so the final batch reports 100%.
    await job.updateProgress(
      Math.round(((i + batch.length) / records.length) * 100)
    );
  }

  await db.import.update({
    where: { id: importId },
    data: { status: 'completed', processedCount: records.length }
  });

  await sendNotification(userId,
    `Your import of ${records.length} contacts is complete!`
  );
}, { connection: redis, concurrency: 5 });

worker.on('failed', (job, err) => {
  console.error(`Job ${job?.id} failed:`, err);
});
```
## Essential patterns for production

### Retry with exponential backoff
External services go down sometimes. Always build in retry logic:
```typescript
await queue.add('send-email', emailData, {
  attempts: 5,
  backoff: {
    type: 'exponential',
    delay: 2000, // 2s, 4s, 8s, 16s, 32s
  },
});
```
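With an exponential backoff, each retry waits twice as long as the previous one, starting at the base delay. A tiny standalone helper (illustration only, not a BullMQ API) makes the resulting schedule explicit:

```typescript
// Sketch: the retry schedule produced by exponential backoff
// with a given base delay. Illustration only, not a BullMQ API.
function retryDelays(baseDelayMs: number, attempts: number): number[] {
  // retry n (0-indexed) waits baseDelayMs * 2^n
  return Array.from({ length: attempts }, (_, n) => baseDelayMs * 2 ** n);
}

console.log(retryDelays(2000, 5)); // [2000, 4000, 8000, 16000, 32000]
```

Five attempts at a 2-second base delay spread roughly a minute of retries, which is usually enough to ride out a brief outage of an external service.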
### Dead Letter Queue (DLQ)

Jobs that fail after all retries need to go somewhere; they must not silently disappear:
```typescript
worker.on('failed', async (job, err) => {
  // attempts may be unset on the job options; treat that as a single attempt
  if (job && job.attemptsMade >= (job.opts.attempts ?? 1)) {
    await deadLetterQueue.add('failed-job', {
      originalQueue: 'send-email',
      jobData: job.data,
      error: err.message,
      failedAt: new Date().toISOString(),
    });
    await alertOps(`DLQ: Job ${job.id} permanently failed`);
  }
});
```
### Idempotency: the golden rule
A job can be executed multiple times (due to retries, crashes, deploys). Ensure duplicate execution causes no harm:
```typescript
async function processPaymentWebhook(job) {
  const { eventId, paymentId } = job.data;

  const existing = await db.processedEvent.findUnique({
    where: { eventId }
  });
  if (existing) {
    console.log(`Event ${eventId} already processed, skipping`);
    return;
  }

  await db.$transaction([
    db.processedEvent.create({ data: { eventId } }),
    db.subscription.update({
      where: { paymentId },
      data: { status: 'active' }
    }),
  ]);
}
```
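The same guard can be sketched in isolation, with an in-memory set standing in for the `processedEvent` table (names hypothetical; in production the check must live in the database so it survives restarts and works across workers):

```typescript
// Sketch: idempotent execution guarded by a dedupe set.
// In production, replace the Set with a unique-keyed database row.
const processedEvents = new Set<string>();

function processOnce(eventId: string, handler: () => void): boolean {
  if (processedEvents.has(eventId)) {
    return false; // already processed: skip, no harm done
  }
  processedEvents.add(eventId);
  handler();
  return true;
}

// A retried delivery of the same event runs the handler only once.
let activations = 0;
processOnce('evt_123', () => { activations += 1; });
processOnce('evt_123', () => { activations += 1; });
console.log(activations); // 1
```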
### Priorities and separate queues
Not every task is equally urgent. Separate your queues based on priority:
```typescript
// High priority: payment-related
await queue.add('process-payment', data, { priority: 1 });

// Normal priority: email
await queue.add('send-email', data, { priority: 5 });

// Low priority: analytics
await queue.add('update-analytics', data, { priority: 10 });
```
Or use separate queues with dedicated workers, so a flood of analytics jobs doesn't slow down your payment processing.
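That routing decision can be sketched as a simple mapping from job type to queue name (queue and job names here are hypothetical examples), with each queue getting its own worker pool:

```typescript
// Sketch: route job types to dedicated queues so a flood of
// low-priority work cannot starve critical work. Names are examples.
const QUEUE_FOR_JOB: Record<string, string> = {
  'process-payment': 'critical', // own workers, always responsive
  'send-email': 'default',
  'update-analytics': 'bulk',    // own workers, may lag without harm
};

function queueFor(jobName: string): string {
  return QUEUE_FOR_JOB[jobName] ?? 'default';
}

console.log(queueFor('process-payment')); // 'critical'
console.log(queueFor('unknown-job'));     // 'default'
```

With BullMQ this translates to one `Queue`/`Worker` pair per name, so the `bulk` queue can back up by thousands of jobs without `critical` ever noticing.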
## Scheduled jobs and cron tasks
Besides event-driven jobs, you also need periodic tasks:
```typescript
// Every day at 08:00
await queue.add('daily-report', {}, {
  repeat: { pattern: '0 8 * * *' },
  jobId: 'daily-report', // stable ID prevents duplicate schedules
});

// Every 4 hours
await queue.add('trial-expiry-check', {}, {
  repeat: { pattern: '0 */4 * * *' },
  jobId: 'trial-expiry-check',
});

// Every Sunday at 03:00
await queue.add('cleanup-old-data', {}, {
  repeat: { pattern: '0 3 * * 0' },
  jobId: 'cleanup-old-data',
});
```
## Monitoring and observability
A queue system without monitoring is a ticking time bomb. Implement dashboards for:
- Queue depth: how many jobs are waiting?
- Processing time: how long does an average job take?
- Failure rate: what percentage of jobs fail?
- DLQ size: how many jobs have permanently failed?
```typescript
async function collectQueueMetrics() {
  const counts = await queue.getJobCounts(
    'active', 'completed', 'failed', 'delayed', 'waiting'
  );

  await metrics.gauge('queue.waiting', counts.waiting);
  await metrics.gauge('queue.active', counts.active);
  await metrics.gauge('queue.failed', counts.failed);

  if (counts.waiting > 1000) {
    await alertOps('Queue depth > 1000, check workers!');
  }
}
```
## Deployment and scaling

### Workers as a separate process
Do not run workers in your web server process. They deserve their own deployment:
```dockerfile
# Dockerfile.worker
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --production
COPY dist/ ./dist/
CMD ["node", "dist/worker.js"]
```

```yaml
# docker-compose.yml
services:
  web:
    build: .
    ports: ["3000:3000"]
  worker:
    build:
      context: .
      dockerfile: Dockerfile.worker
    deploy:
      replicas: 3
    depends_on:
      - redis
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data

volumes:
  redis-data:
```
### Graceful shutdown
During deploys, running jobs must be completed properly:
```typescript
async function gracefulShutdown() {
  console.log('Shutdown initiated, finishing current jobs...');
  await worker.close(); // waits for active jobs to finish
  await redis.quit();
  process.exit(0);
}

process.on('SIGTERM', gracefulShutdown);
process.on('SIGINT', gracefulShutdown);
```
## Common mistakes

### Too much data in the job payload
```typescript
// Wrong: putting the entire file in the job
await queue.add('process', { csvData: entireFileContent }); // too large

// Right: store a reference; the worker fetches the data
await queue.add('process', { fileUrl: 's3://bucket/file.csv' }); // compact
```
### No timeout configured

A hanging job ties up a worker slot indefinitely and can eventually stall your entire queue. Always set a timeout.
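If your queue library doesn't enforce job timeouts for you, you can wrap the handler yourself. A minimal sketch (`withTimeout` is a hypothetical helper, not a BullMQ API):

```typescript
// Sketch: fail a handler that takes too long instead of letting it hang.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Job timed out after ${ms}ms`)),
      ms
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}

// Usage inside a worker handler:
// await withTimeout(processJob(job), 300_000);
```

Note that this rejects the promise but does not cancel the underlying work; for true cancellation the handler itself must check an abort signal.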
### Keeping queue state in memory
If your worker crashes, everything is lost. Always use a persistent backing store (Redis, PostgreSQL).
### No monitoring
"We'll notice when users complain" is not a strategy.
## Conclusion

Background jobs aren't a nice-to-have; they're a fundamental part of every scalable SaaS. Start simple with BullMQ or pgBoss, implement the basic patterns (retry, idempotency, monitoring), and expand as you grow.
The investment in a good queue system pays for itself many times over: better user experience, higher reliability, and an architecture that's ready to scale.
Need help setting up robust background processing in your SaaS? Get in touch; we'd love to help.