Every successful SaaS application revolves around data. How you store, structure, and evolve that data alongside your product determines whether your platform scales smoothly or grinds to a halt at the worst possible moment. In this guide, we take a deep dive into database design and migration strategies specifically relevant to SaaS platforms.
Why Database Choices Matter in SaaS
With a traditional web app, you make a database choice once and live with it. SaaS is different: data volume grows with every new customer, your schema must evolve without downtime, and you need to think about data isolation between tenants.
A wrong choice early on can cost you months to recover from. Let's walk through the key decisions.
SQL vs. NoSQL: The Nuanced Reality
When to Choose SQL (PostgreSQL, MySQL)
SQL databases are the default choice for most SaaS applications, and for good reason:
- ACID compliance: transactions are reliable, crucial for billing and user management
- Complex queries: joins, aggregations, and reporting are first-class features
- Schema validation: the database enforces data structure
- Ecosystem: ORMs like Prisma, Drizzle, and TypeORM work excellently with SQL
```sql
-- Example: multi-tenant table structure with Row-Level Security
CREATE TABLE organizations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT NOT NULL,
  plan TEXT NOT NULL DEFAULT 'free',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE projects (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  org_id UUID NOT NULL REFERENCES organizations(id),
  name TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Row-Level Security for tenant isolation
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON projects
  USING (org_id = current_setting('app.current_org')::UUID);
```
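The policy only works if the application sets `app.current_org` at the start of every transaction. A minimal sketch of how that could look at the application layer; the `withTenant` helper is a hypothetical illustration, not part of any library. `set_config(..., true)` makes the setting transaction-local, which matters when connections are pooled:

```typescript
// Hypothetical helper: wrap a query in a transaction that first sets the
// tenant id the RLS policy reads via current_setting('app.current_org').
// The third argument `true` scopes the setting to this transaction only,
// so a pooled connection cannot leak it to the next request.
function withTenant(orgId: string, query: string): string[] {
  return [
    "BEGIN",
    `SELECT set_config('app.current_org', '${orgId}', true)`,
    query,
    "COMMIT",
  ];
}
```

You would execute these statements on one connection; with a real driver, pass `orgId` as a bound parameter rather than interpolating it.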
When to Choose NoSQL (MongoDB, DynamoDB)
NoSQL isn't a replacement for SQL, but a complement for specific use cases:
- Unstructured data: logs, events, user activity
- High write volumes: IoT data, analytics events
- Flexible schemas: when each tenant needs different fields
- Key-value lookups: caching, sessions, feature flags
The pragmatic approach: use PostgreSQL as your primary database and add Redis or DynamoDB for specific workloads.
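The "key-value lookups" bullet can be made concrete with a tiny in-process sketch. The `TtlCache` class below is illustrative only; it mimics the GET / SET-with-expiry shape you would get from Redis for sessions and feature flags:

```typescript
// Illustrative in-process key-value cache with TTL, mimicking the shape
// of Redis GET / SET EX for feature flags and sessions.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  set(key: string, value: V, ttlMs: number): void {
    this.store.set(key, { value, expiresAt: Date.now() + ttlMs });
  }

  get(key: string): V | undefined {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (Date.now() > hit.expiresAt) {
      this.store.delete(key); // lazy expiry on read
      return undefined;
    }
    return hit.value;
  }
}
```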
Schema Design for Multi-Tenancy
There are three main patterns for multi-tenant databases. Each has trade-offs:
1. Shared Database, Shared Schema (Most Common)
All tenants share the same tables, distinguished by an org_id column.
```prisma
// Prisma schema example
model Project {
  id        String       @id @default(cuid())
  orgId     String
  org       Organization @relation(fields: [orgId], references: [id])
  name      String
  data      Json?
  createdAt DateTime     @default(now())

  @@index([orgId])
}
```
Pros: simple, cost-effective, easy to maintain
Cons: risk of data leakage without proper filtering, "noisy neighbor" problem
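One hedge against that leakage risk is to route every where-clause through a single helper that injects the tenant id, so a query can never be built without it. The `scopedWhere` function below is a hypothetical sketch of the idea, not part of Prisma or any library:

```typescript
// Sketch: force every query filter through one choke point that injects
// the tenant id, guarding against queries that forget the orgId filter.
type Where = Record<string, unknown>;

function scopedWhere(orgId: string, where: Where = {}): Where {
  if (!orgId) {
    throw new Error("orgId is required for every tenant-scoped query");
  }
  // orgId is spread last so a caller-supplied value can never override it
  return { ...where, orgId };
}
```

With Prisma you would then write something like `findMany({ where: scopedWhere(orgId, { name: "demo" }) })` instead of assembling filters ad hoc.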
2. Shared Database, Separate Schemas
Each tenant gets their own PostgreSQL schema within the same database.
```sql
-- Create tenant-specific schema
CREATE SCHEMA tenant_acme;
CREATE TABLE tenant_acme.projects (LIKE public.projects INCLUDING ALL);
```
Pros: better isolation, easier per-tenant backups
Cons: more complex migration management, higher operational overhead
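Provisioning in this pattern usually happens from application code. Here is a hedged sketch (the `tenantSchemaDDL` helper and its slug whitelist are assumptions, not an established API) that generates the DDL shown above. The whitelist matters because schema names cannot be bound as query parameters, so the slug is interpolated into raw SQL:

```typescript
// Sketch: generate provisioning DDL for the schema-per-tenant pattern.
// The slug is whitelisted to a safe identifier before interpolation,
// since identifiers cannot be passed as bound parameters.
function tenantSchemaDDL(slug: string): string[] {
  if (!/^[a-z][a-z0-9_]{0,40}$/.test(slug)) {
    throw new Error(`invalid tenant slug: ${slug}`);
  }
  const schema = `tenant_${slug}`;
  return [
    `CREATE SCHEMA ${schema}`,
    `CREATE TABLE ${schema}.projects (LIKE public.projects INCLUDING ALL)`,
  ];
}
```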
3. Database per Tenant
Each tenant gets their own database instance.
Pros: maximum isolation, independent scaling
Cons: high cost, complex management, not realistic for 1000+ tenants
Our recommendation: start with option 1 (shared schema with org_id) and implement Row-Level Security in PostgreSQL. This gives you the simplicity of a shared schema with the safety of isolation.
Database Migrations Without Downtime
Migrations are the most nerve-wracking part of database management in production. One bad migration can bring your entire platform down. Here are proven strategies:
The Expand-Contract Pattern
Instead of destructive changes in one step, split them up:
Step 1 — Expand: add the new structure alongside the old
```sql
-- Add new column (non-blocking)
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT false;
```
Step 2 — Migrate data: populate the new structure with existing data
```sql
-- Backfill in batches to avoid overloading the database
-- (run repeatedly until zero rows are updated)
UPDATE users SET email_verified = true
WHERE id IN (
  SELECT id FROM users
  WHERE verified_at IS NOT NULL AND email_verified = false
  LIMIT 1000
);
```
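The batching in step 2 is usually driven by a loop in application code. A sketch under the assumption of a `runBatch` callback that executes one bounded UPDATE and returns the affected row count; the loop stops when a batch touches zero rows:

```typescript
// Sketch: generic batched-backfill loop. runBatch executes one bounded
// UPDATE (e.g. the statement above) and returns its affected row count.
async function backfill(
  runBatch: () => Promise<number>,
  pauseMs = 100,
): Promise<number> {
  let total = 0;
  for (;;) {
    const affected = await runBatch();
    if (affected === 0) return total; // nothing left to migrate
    total += affected;
    // brief pause so the backfill doesn't starve production traffic
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
}
```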
Step 3 — Contract: remove the old structure after validation
```sql
-- Only after full validation
ALTER TABLE users DROP COLUMN verified_at;
```
Tooling for Safe Migrations
Using Prisma migrations with a CI/CD check, the deployment pipeline looks like this:

1. Run the migration against a shadow database
2. Verify that the migration is reversible
3. Measure its duration on a copy of production data
4. Only then deploy to production

```yaml
# Example GitHub Actions step
- name: Validate migration
  run: |
    npx prisma migrate diff \
      --from-schema-datasource prisma/schema.prisma \
      --to-migrations prisma/migrations \
      --shadow-database-url $SHADOW_DB_URL
```
Migrating Large Tables
With tables containing millions of rows, you need to be smart:
```sql
-- BAD: this rewrites and locks the entire table
ALTER TABLE events ALTER COLUMN payload TYPE JSONB USING payload::JSONB;

-- GOOD: create a new column and migrate gradually
ALTER TABLE events ADD COLUMN payload_v2 JSONB;

-- Trigger to keep new data in sync
CREATE OR REPLACE FUNCTION sync_payload() RETURNS TRIGGER AS $$
BEGIN
  NEW.payload_v2 := NEW.payload::JSONB;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_payload_trigger
  BEFORE INSERT OR UPDATE ON events
  FOR EACH ROW EXECUTE FUNCTION sync_payload();

-- Backfill old data in batches of 10,000
DO $$
DECLARE
  batch_size INT := 10000;
  affected INT;
BEGIN
  LOOP
    UPDATE events
    SET payload_v2 = payload::JSONB
    WHERE id IN (
      SELECT id FROM events WHERE payload_v2 IS NULL LIMIT batch_size
    );
    GET DIAGNOSTICS affected = ROW_COUNT;
    EXIT WHEN affected = 0;
    PERFORM pg_sleep(0.1); -- let the database breathe
  END LOOP;
END $$;
```
Indexing: The 80/20 Rule
Most performance issues in SaaS databases boil down to missing or incorrect indexes.
Essential Indexes for SaaS
```sql
-- 1. Always index on tenant ID
CREATE INDEX idx_projects_org ON projects(org_id);

-- 2. Composite indexes for frequently used queries
CREATE INDEX idx_projects_org_created ON projects(org_id, created_at DESC);

-- 3. Partial indexes for active records
CREATE INDEX idx_users_active ON users(email)
  WHERE deleted_at IS NULL;

-- 4. GIN index for JSONB searches
CREATE INDEX idx_settings_data ON settings USING GIN(data);
```
Monitoring Indexes
```sql
-- Find unused indexes
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE '%pkey%'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Find missing indexes (slow queries)
SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
```
Connection Pooling: A Must for SaaS
Every tenant generates database connections. Without pooling, you'll quickly hit limits.
```ini
; PgBouncer configuration (pgbouncer.ini)
[databases]
myapp = host=localhost dbname=myapp

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
min_pool_size = 5
```

In your Next.js app with Prisma, route the connection string through PgBouncer:

```
DATABASE_URL="postgresql://user:pass@localhost:6432/myapp?pgbouncer=true"
```
For serverless environments (Vercel, AWS Lambda), a managed pooler like Neon's connection pooler or Supabase's Supavisor is essential. Without pooling, each serverless invocation creates a new connection — and that doesn't scale.
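The usual workaround in application code is to cache a single client on `globalThis` so warm invocations reuse it instead of opening a fresh connection each time. A minimal sketch; the `makeClient` factory stands in for something like `new PrismaClient()`:

```typescript
// Sketch of the serverless singleton pattern: stash one client on
// globalThis so warm invocations reuse it. Client is a stand-in type;
// in a real app it would be e.g. PrismaClient.
type Client = { id: number };

function getClient(makeClient: () => Client): Client {
  const g = globalThis as typeof globalThis & { __dbClient?: Client };
  if (!g.__dbClient) {
    g.__dbClient = makeClient(); // only runs on a cold start
  }
  return g.__dbClient;
}
```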
Backup and Disaster Recovery Strategy
A solid backup plan isn't optional. It's part of your product.
The 3-2-1 Rule
- 3 copies of your data
- 2 different storage media
- 1 off-site backup
```bash
#!/bin/bash
# Automated daily backup with pg_dump
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="backup_${TIMESTAMP}.sql.gz"

pg_dump "$DATABASE_URL" | gzip > "/backups/$BACKUP_FILE"

# Upload to S3
aws s3 cp "/backups/$BACKUP_FILE" s3://my-saas-backups/daily/

# Delete local backups older than 7 days
find /backups -name "*.sql.gz" -mtime +7 -delete
```
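The retention rule in the script is easy to get wrong. Expressed as a pure function (an illustrative sketch, not part of the script), the cutoff logic can be unit-tested before it deletes anything:

```typescript
// Sketch: the "older than N days" retention rule as a pure function,
// so the cutoff arithmetic can be tested before anything is deleted.
function expiredBackups(
  files: { name: string; mtimeMs: number }[],
  nowMs: number,
  retentionDays = 7,
): string[] {
  const cutoff = nowMs - retentionDays * 24 * 60 * 60 * 1000;
  return files.filter((f) => f.mtimeMs < cutoff).map((f) => f.name);
}
```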
Point-in-Time Recovery
Managed databases like Neon, Supabase, and AWS RDS offer point-in-time recovery (PITR). This means you can restore your database to any moment in the past X days. Make sure this is enabled.
Checklist: Production-Ready Database
Before going live, run through this checklist:
- Tenant isolation: every query filters on `org_id` (or RLS is active)
- Indexes: all foreign keys and frequently used WHERE clauses are indexed
- Connection pooling: PgBouncer or a managed pooler is configured
- Migration strategy: the expand-contract pattern is documented
- Backups: daily backups + PITR enabled
- Monitoring: slow query log and `pg_stat_statements` active
- Data encryption: at-rest and in-transit encryption enabled
- Soft deletes: important data is never hard-deleted
Conclusion
Database design for SaaS isn't a one-time choice — it's an ongoing process that grows with your product. Start simple with a shared schema and PostgreSQL, implement Row-Level Security for tenant isolation, and use the expand-contract pattern for safe migrations.
The most important advice: measure everything. Use pg_stat_statements, monitor your query times, and only optimize when the data tells you where the bottleneck is. Premature optimization is the root of all evil — but ignoring database performance is the root of all churn.
Need help setting up a scalable database architecture for your SaaS? Get in touch for a free consultation.