Every time I tell another engineer that GetHook runs its delivery queue on Postgres without Redis or Kafka, I get some variation of: "Really? Is that production-grade?"
Yes. Here's why it works, how it works, and when you should use a different approach.
## The Postgres Job Queue Pattern

The core idea is simple. You have an `events` table with a `status` column and a `next_attempt_at` timestamp. Your worker polls for events that are ready to process:
```sql
-- Worker claims a batch of events atomically
SELECT id, payload, destination_id, attempt_number
FROM events
WHERE status IN ('queued', 'retry_scheduled')
  AND next_attempt_at <= NOW()
ORDER BY next_attempt_at ASC
LIMIT 10
FOR UPDATE SKIP LOCKED;
```
`FOR UPDATE SKIP LOCKED` is the magic. It:

- Locks the selected rows for the duration of the current transaction
- Skips any rows already locked by another transaction
This means you can run 10 worker processes on the same table, and they'll each pick up different events. No coordination required, no message broker, no Redis.
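The claim-and-skip behavior is easy to picture without a database. In this toy Python sketch (illustrative only, not GetHook's code), each "row" carries a try-lock, and a worker that can't acquire a row's lock simply moves on — exactly what `SKIP LOCKED` does:

```python
import threading

class Row:
    """A stand-in for a table row; the lock plays the role of a row lock."""
    def __init__(self, rid):
        self.id = rid
        self.lock = threading.Lock()

def claim_batch(rows, limit):
    """Claim up to `limit` rows, skipping any row another worker has locked."""
    claimed = []
    for row in rows:
        if len(claimed) >= limit:
            break
        # acquire(blocking=False) returns False instead of waiting: "skip locked"
        if row.lock.acquire(blocking=False):
            claimed.append(row)
    return claimed

rows = [Row(i) for i in range(6)]
batch_a = claim_batch(rows, 3)  # first worker claims rows 0-2
batch_b = claim_batch(rows, 3)  # second worker skips those, claims 3-5
```

The two batches never overlap, which is the whole point: workers coordinate through row locks alone.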
After claiming a batch, the worker marks each event `'delivering'`, attempts delivery, and on success updates the status:
```sql
UPDATE events
SET status = 'delivered',
    delivered_at = NOW()
WHERE id = $1;
```
If delivery fails, schedule a retry:
```sql
UPDATE events
SET status = 'retry_scheduled',
    next_attempt_at = NOW() + INTERVAL '30 seconds',
    attempt_number = attempt_number + 1
WHERE id = $1;
```
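The SQL above uses a flat 30-second delay. A common refinement — not shown in GetHook's queries, so treat this as a sketch — is exponential backoff, where each attempt waits roughly twice as long as the last, capped at a maximum:

```python
def retry_delay_seconds(attempt_number: int, base: int = 30, cap: int = 3600) -> int:
    """Delay before the next attempt: 30s, 60s, 120s, ... capped at 1 hour."""
    return min(base * (2 ** attempt_number), cap)
```

The computed value would replace the fixed `INTERVAL '30 seconds'` when scheduling the retry; adding random jitter on top helps avoid thundering-herd retries against a recovering destination.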
## The Complete Schema
Here's the schema GetHook uses for events and delivery tracking:
```sql
CREATE TYPE event_status AS ENUM (
    'received',
    'queued',
    'delivering',
    'delivered',
    'retry_scheduled',
    'dead_letter',
    'replayed'
);

CREATE TABLE events (
    id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    account_id        UUID NOT NULL REFERENCES accounts(id),
    source_id         UUID REFERENCES sources(id),
    destination_id    UUID REFERENCES destinations(id),
    original_event_id UUID REFERENCES events(id),  -- set when an event is replayed
    event_type        TEXT,
    payload           JSONB NOT NULL,
    status            event_status NOT NULL DEFAULT 'received',
    attempt_number    INTEGER NOT NULL DEFAULT 0,
    max_attempts      INTEGER NOT NULL DEFAULT 5,
    next_attempt_at   TIMESTAMPTZ DEFAULT NOW(),
    created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),  -- used by the crash-cleanup job
    delivered_at      TIMESTAMPTZ
);

-- Critical indexes for the worker poll query
CREATE INDEX idx_events_queue
    ON events (status, next_attempt_at)
    WHERE status IN ('queued', 'retry_scheduled');

CREATE INDEX idx_events_account
    ON events (account_id, created_at DESC);
```
The partial index `WHERE status IN ('queued', 'retry_scheduled')` is critical. Without it, the worker scans the entire `events` table on every poll. With it, the index only covers undelivered events — typically a small fraction of the total.
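The `event_status` enum implies a state machine. The exact set of legal transitions is my inference from this post, not GetHook's documented rules, but writing it down makes the lifecycle explicit and cheap to enforce in application code:

```python
# Legal status transitions, inferred from the lifecycle described in the post.
TRANSITIONS = {
    "received":        {"queued"},
    "queued":          {"delivering"},
    "delivering":      {"delivered", "retry_scheduled", "dead_letter", "queued"},
    # 'delivering' -> 'queued' happens via the crash-cleanup job
    "retry_scheduled": {"delivering"},
    "delivered":       set(),        # terminal
    "dead_letter":     {"replayed"}, # terminal until a manual replay
    "replayed":        set(),        # terminal; the replay is a new row
}

def can_transition(old: str, new: str) -> bool:
    """Return True if moving an event from `old` to `new` status is allowed."""
    return new in TRANSITIONS.get(old, set())
```

Guarding status updates with a check like this catches worker bugs (say, delivering an already-delivered event) before they reach the database.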
## Performance Benchmarks
Here's what to expect from a Postgres job queue on typical cloud hardware:
| Events / Second | Workers | DB CPU | Latency (p50) | Notes |
|---|---|---|---|---|
| 10 | 1 | < 5% | < 100ms | Single process |
| 100 | 5 | ~15% | < 200ms | Comfortable range |
| 500 | 20 | ~40% | < 500ms | Works well |
| 1,000 | 50 | ~65% | ~1s | Approaching limits |
| 2,000 | 100 | ~85% | ~3s | Consider partitioning |
| 5,000+ | — | — | — | Upgrade to Kafka |
For most SaaS companies (< 1,000 events/second ≈ 86M events/day), Postgres is more than sufficient and dramatically simpler than Kafka.
## Why Not Redis?
Redis is often the first thing people reach for when they need a queue. `LPUSH` and `BRPOP` are fast, simple, and well-understood.
| Concern | PostgreSQL | Redis |
|---|---|---|
| Durability | WAL-based, fully durable | Depends on appendonly config |
| Atomic claim semantics | Built in via `FOR UPDATE SKIP LOCKED` | Needs `LMOVE` + Lua scripting |
| Dead-letter queue | Just another row status | Separate list, manual management |
| Event history / replay | Rows persist, query anytime | Keys expire, no history |
| Visibility into queue | SQL query | Separate tooling (RedisInsight) |
| Additional infrastructure | Not needed (you have Postgres) | Another service to run + monitor |
| Cost | $0 additional | $50–$500/month for managed Redis |
The key insight: you already have Postgres. Webhook events need to be stored anyway (for replay, delivery logs, audit trails). Storing them in the same Postgres instance that serves as the queue means you don't need a separate system.
## Why Not Kafka?
Kafka is excellent for very high-throughput event streaming (millions of events/second) with multiple consumer groups. It's also:
- Complex to operate (ZooKeeper or KRaft, brokers, partitions, consumer groups)
- Expensive in managed form ($200–$2,000/month on Confluent Cloud)
- Overkill for < 10K events/second
For GetHook's target use case (SMB to mid-market SaaS, up to ~1M events/month), Postgres is the right tool. We can always add Kafka later if a customer's scale demands it. Starting with Kafka means paying Kafka's operational complexity from day one.
## Handling Worker Crashes: The Heartbeat Problem
One subtlety: if a worker crashes while holding a lock (between claiming an event and updating its status), the lock releases automatically when the database connection closes.
But here's the edge case: what if the worker successfully delivers the event to the destination but crashes before updating the database? The event is stuck in `'delivering'` status forever.
GetHook's fix: A cleanup job runs every 5 minutes:
```sql
-- Reset events stuck in 'delivering' for more than 2 minutes
UPDATE events
SET status = 'queued',
    next_attempt_at = NOW()
WHERE status = 'delivering'
  AND updated_at < NOW() - INTERVAL '2 minutes';
```
This might cause a duplicate delivery (the destination gets the same event twice), but that's preferable to losing the event. Combined with idempotency keys, this is safe.
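On the receiving end, idempotency is straightforward if each delivery carries the event's unique ID and the consumer records the IDs it has already processed. A minimal sketch (the function and storage here are illustrative, not prescribed by GetHook):

```python
# In production this would be a database table or a cache with a TTL,
# not an in-process set.
processed_ids = set()

def handle_webhook(event_id: str, payload: dict) -> str:
    """Process a delivery at most once, even if the sender retries it."""
    if event_id in processed_ids:
        return "duplicate-ignored"  # retry of an event we already handled
    processed_ids.add(event_id)
    # ... real business logic goes here ...
    return "processed"

first = handle_webhook("evt-123", {"type": "order.paid"})
second = handle_webhook("evt-123", {"type": "order.paid"})  # duplicate delivery
```

With this in place, the occasional duplicate from the cleanup job is harmless: the consumer simply ignores it.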
## Dead-Letter Queue Management

When an event reaches `max_attempts`, it moves to `'dead_letter'` status:
```sql
UPDATE events
SET status = 'dead_letter'
WHERE id = $1
  AND attempt_number >= max_attempts;
```
Dead-letter events are visible in the GetHook dashboard. You can inspect the delivery attempts, see the error responses, fix the underlying issue, and then replay the event:
```sql
-- Replay a dead-letter event (both statements in one transaction,
-- so the copy and the status flip succeed or fail together)
BEGIN;

INSERT INTO events (
    account_id, source_id, destination_id, event_type,
    payload, status, original_event_id
)
SELECT account_id, source_id, destination_id, event_type,
       payload, 'queued', id
FROM events
WHERE id = $1;

UPDATE events SET status = 'replayed' WHERE id = $1;

COMMIT;
```
## Scaling Beyond Postgres
If you outgrow the Postgres queue (typically > 5,000 events/second sustained), the migration path is:
- Partition the `events` table by `account_id` or `created_at` — extends Postgres throughput significantly
- Upgrade to Amazon SQS — managed, scales to any throughput, but no replay/history
- Upgrade to Kafka or Redpanda — best for multi-consumer scenarios at extreme scale
For GetHook, we've designed the delivery layer with an abstraction so we can swap the queue backend without changing the rest of the system. But for the vast majority of customers, Postgres is the right choice and we don't plan to change it.
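The post doesn't show that abstraction, but its shape is roughly this: a small interface the delivery loop codes against, with the Postgres implementation hidden behind it. All names here are illustrative, not GetHook's actual code:

```python
from typing import Any, List, Protocol

class QueueBackend(Protocol):
    """The minimal surface the delivery loop depends on."""
    def claim(self, limit: int) -> List[Any]: ...
    def mark_delivered(self, event_id: Any) -> None: ...
    def schedule_retry(self, event_id: Any, delay_seconds: int) -> None: ...

class InMemoryBackend:
    """Toy implementation for tests; a Postgres backend would issue the
    SKIP LOCKED / UPDATE statements shown earlier, an SQS backend would
    call the SQS API, and the delivery loop would never know the difference."""
    def __init__(self, events):
        self.pending = list(events)
        self.delivered = []

    def claim(self, limit):
        batch, self.pending = self.pending[:limit], self.pending[limit:]
        return batch

    def mark_delivered(self, event_id):
        self.delivered.append(event_id)

    def schedule_retry(self, event_id, delay_seconds):
        self.pending.append(event_id)

backend = InMemoryBackend(["e1", "e2", "e3"])
for ev in backend.claim(2):
    backend.mark_delivered(ev)
```

Swapping the queue backend then means writing one new class, not rewriting the delivery pipeline.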
## The Underrated Advantage: Observability
With Postgres as your queue, every event is a row in a table. You can:
```sql
-- Events in dead-letter for an account
SELECT * FROM events
WHERE account_id = $1
  AND status = 'dead_letter'
ORDER BY created_at DESC;

-- Delivery success rate over the last 24 hours
SELECT
    destination_id,
    COUNT(*) AS total,
    COUNT(*) FILTER (WHERE status = 'delivered') AS delivered,
    ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'delivered') / COUNT(*), 2) AS success_rate
FROM events
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY destination_id;

-- Average retry count for dead-lettered events
SELECT AVG(attempt_number) FROM events WHERE status = 'dead_letter';
```
With Kafka or Redis, you'd need a separate data store for this kind of historical analysis. With Postgres, it's just SQL.
## Conclusion

The Postgres job queue pattern is underrated. For event volumes into the tens of millions per day, it provides:
- Full durability (WAL-backed)
- Concurrent worker support (`FOR UPDATE SKIP LOCKED`)
- Event history and replay (just rows in a table)
- Zero additional infrastructure
- SQL-based observability
Don't reach for Redis or Kafka until your volume genuinely demands it. Start simple, instrument well, and scale when you have data that says you need to.
This is the pattern GetHook uses in production. If you want to see the full implementation, the code is on GitHub.