postgresql · architecture · job queue · engineering

PostgreSQL as a Job Queue: No Redis, No Kafka, No Problem

We use PostgreSQL as GetHook's delivery queue using FOR UPDATE SKIP LOCKED. Here's why this pattern is underrated, when it breaks down, and the exact schema we use in production.

Lena Hartmann
Infrastructure Engineer
October 28, 2025
10 min read

Every time I tell another engineer that GetHook runs its delivery queue on Postgres without Redis or Kafka, I get some variation of: "Really? Is that production-grade?"

Yes. Here's why it works, how it works, and when you should use a different approach.


The Postgres Job Queue Pattern

The core idea is simple. You have an events table with a status column and a next_attempt_at timestamp. Your worker polls for events that are ready to process:

```sql
-- Worker claims a batch of events atomically and marks them in-flight
WITH claimed AS (
  SELECT id
  FROM events
  WHERE status IN ('queued', 'retry_scheduled')
    AND next_attempt_at <= NOW()
  ORDER BY next_attempt_at ASC
  LIMIT 10
  FOR UPDATE SKIP LOCKED
)
UPDATE events
SET status = 'delivering'
FROM claimed
WHERE events.id = claimed.id
RETURNING events.id, events.payload, events.destination_id, events.attempt_number;
```

FOR UPDATE SKIP LOCKED is the magic. It:

  1. Locks the rows for the current transaction
  2. Skips rows that are already locked by another transaction

This means you can run 10 worker processes on the same table, and they'll each pick up different events. No coordination required, no message broker, no Redis.
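The no-coordination claim is easy to demonstrate with a toy simulation. This is not GetHook's worker code — a Python mutex stands in for Postgres row locks, and a list of dicts stands in for the events table — but it shows the invariant SKIP LOCKED gives you: concurrent workers always claim disjoint batches.

```python
import threading

# Toy stand-in for the events table: 100 queued events.
events = [{"id": i, "status": "queued"} for i in range(100)]
table_lock = threading.Lock()          # plays the role of Postgres row locks
claims = {}                            # worker name -> claimed event ids

def claim_batch(worker, limit=10):
    # Equivalent in spirit to SELECT ... LIMIT 10 FOR UPDATE SKIP LOCKED:
    # atomically grab up to `limit` unclaimed rows and mark them in-flight.
    with table_lock:
        batch = [e for e in events if e["status"] == "queued"][:limit]
        for e in batch:
            e["status"] = "delivering"
        claims[worker] = [e["id"] for e in batch]

workers = [threading.Thread(target=claim_batch, args=(f"w{n}",)) for n in range(10)]
for t in workers:
    t.start()
for t in workers:
    t.join()

# 10 workers x 10 events each: every event claimed exactly once.
claimed_ids = [i for ids in claims.values() for i in ids]
assert sorted(claimed_ids) == list(range(100))
```

The real mechanism is stronger than this sketch: Postgres enforces the locking per row inside the database, so the workers can live on different machines with no shared memory at all.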

After claiming events, the worker delivers them and updates the status:

```sql
UPDATE events
SET status = 'delivered',
    delivered_at = NOW()
WHERE id = $1;
```

If delivery fails, schedule a retry:

```sql
UPDATE events
SET status = 'retry_scheduled',
    next_attempt_at = NOW() + INTERVAL '30 seconds',
    attempt_number = attempt_number + 1
WHERE id = $1;
```
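The example above uses a fixed 30-second delay. A common refinement is exponential backoff, doubling the delay on each attempt; the base and cap values below are illustrative assumptions, not GetHook's actual schedule:

```python
def retry_delay(attempt_number: int, base: int = 30, cap: int = 3600) -> int:
    """Seconds to wait before the given attempt: 30s, 60s, 120s, ... capped at 1 hour."""
    return min(base * 2 ** (attempt_number - 1), cap)

# First five attempts: 30s, 60s, 120s, 240s, 480s.
assert [retry_delay(n) for n in range(1, 6)] == [30, 60, 120, 240, 480]
```

On the SQL side the worker would interpolate the computed delay, e.g. `next_attempt_at = NOW() + make_interval(secs => $2)`.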

The Complete Schema

Here's the schema GetHook uses for events and delivery tracking:

```sql
CREATE TYPE event_status AS ENUM (
  'received',
  'queued',
  'delivering',
  'delivered',
  'retry_scheduled',
  'dead_letter',
  'replayed'
);

CREATE TABLE events (
  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  account_id        UUID NOT NULL REFERENCES accounts(id),
  source_id         UUID REFERENCES sources(id),
  destination_id    UUID REFERENCES destinations(id),
  event_type        TEXT,
  payload           JSONB NOT NULL,
  status            event_status NOT NULL DEFAULT 'received',
  attempt_number    INTEGER NOT NULL DEFAULT 0,
  max_attempts      INTEGER NOT NULL DEFAULT 5,
  next_attempt_at   TIMESTAMPTZ DEFAULT NOW(),
  original_event_id UUID REFERENCES events(id),       -- set when an event is replayed
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(), -- touched on every status change
  delivered_at      TIMESTAMPTZ
);

-- Critical indexes for the worker poll query
CREATE INDEX idx_events_queue
  ON events (status, next_attempt_at)
  WHERE status IN ('queued', 'retry_scheduled');

CREATE INDEX idx_events_account
  ON events (account_id, created_at DESC);
```

The partial index WHERE status IN ('queued', 'retry_scheduled') is critical. Without it, the worker scans the entire events table on every poll. With it, the index only includes undelivered events — typically a small fraction of the total.


Performance Benchmarks

Here's what to expect from a Postgres job queue on typical cloud hardware:

| Events / Second | Workers | DB CPU | Latency (p50) | Notes |
|---|---|---|---|---|
| 10 | 1 | < 5% | < 100ms | Single process |
| 100 | 5 | ~15% | < 200ms | Comfortable range |
| 500 | 20 | ~40% | < 500ms | Works well |
| 1,000 | 50 | ~65% | ~1s | Approaching limits |
| 2,000 | 100 | ~85% | ~3s | Consider partitioning |
| 5,000+ | — | — | — | Upgrade to Kafka |

For most SaaS companies (< 1,000 events/second = < 86M events/day), Postgres is more than sufficient and dramatically simpler than Kafka.


Why Not Redis?

Redis is often the first thing people reach for when they need a queue. LPUSH and BRPOP are fast, simple, and well-understood.

| Concern | PostgreSQL | Redis |
|---|---|---|
| Durability | WAL-based, fully durable | Depends on appendonly config |
| Exactly-once semantics | With FOR UPDATE SKIP LOCKED | Needs GETDEL + Lua scripting |
| Dead-letter queue | Just another row status | Separate list, manual management |
| Event history / replay | Rows persist, query anytime | Keys expire, no history |
| Visibility into queue | SQL query | Separate tooling (RedisInsight) |
| Additional infrastructure | Not needed (you have Postgres) | Another service to run + monitor |
| Cost | $0 additional | $50–$500/month for managed Redis |

The key insight: you already have Postgres. Webhook events need to be stored anyway (for replay, delivery logs, audit trails). Storing them in the same Postgres instance that serves as the queue means you don't need a separate system.


Why Not Kafka?

Kafka is excellent for very high-throughput event streaming (millions of events/second) with multiple consumer groups. It's also:

  • Complex to operate (ZooKeeper or KRaft, brokers, partitions, consumer groups)
  • Expensive in managed form ($200–$2,000/month on Confluent Cloud)
  • Overkill for < 10K events/second

For GetHook's target use case (SMB to mid-market SaaS, up to ~1M events/month), Postgres is the right tool. We can always add Kafka later if a customer's scale demands it. Starting with Kafka means paying Kafka's operational complexity from day one.


Handling Worker Crashes: The Heartbeat Problem

One subtlety: if a worker crashes mid-transaction (between claiming an event and updating its status), Postgres releases the row locks automatically when the connection drops and rolls the transaction back.

But here's the edge case: what if the worker successfully delivers the event to the destination but crashes before updating the database? The event is stuck in delivering status forever.

GetHook's fix: A cleanup job runs every 5 minutes:

```sql
-- Reset events stuck in 'delivering' for more than 2 minutes
UPDATE events
SET status = 'queued',
    next_attempt_at = NOW()
WHERE status = 'delivering'
  AND updated_at < NOW() - INTERVAL '2 minutes';
```

This might cause a duplicate delivery (the destination gets the same event twice), but that's preferable to losing the event. Combined with idempotency keys, this is safe.
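The receiver-side half of that bargain is a dedupe on an idempotency key that stays stable across redeliveries (the event id works well). A minimal sketch — the function shape and key names here are illustrative, not GetHook's API:

```python
processed = set()  # in production: a unique index in the receiver's own database

def handle_webhook(idempotency_key: str, payload: dict) -> str:
    """Process an event once; acknowledge and skip any redelivery of the same key."""
    if idempotency_key in processed:
        return "duplicate"      # already handled: ack so the sender stops retrying
    processed.add(idempotency_key)
    # ... real side effects (charge the card, send the email) go here ...
    return "processed"

assert handle_webhook("evt_123", {"amount": 5}) == "processed"
assert handle_webhook("evt_123", {"amount": 5}) == "duplicate"   # redelivery is a no-op
```

The important property is that the second delivery is acknowledged without repeating the side effect, which is exactly what makes at-least-once delivery safe.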


Dead-Letter Queue Management

When an event reaches max_attempts, it moves to dead_letter status:

```sql
UPDATE events
SET status = 'dead_letter'
WHERE id = $1
  AND attempt_number >= max_attempts;
```

Dead-letter events are visible in the GetHook dashboard. You can inspect the delivery attempts, see the error responses, fix the underlying issue, and then replay the event:

```sql
-- Replay a dead-letter event (run both statements in one transaction)
BEGIN;

INSERT INTO events (
  account_id, source_id, destination_id, event_type,
  payload, status, original_event_id
)
SELECT account_id, source_id, destination_id, event_type,
       payload, 'queued', id
FROM events
WHERE id = $1;

UPDATE events SET status = 'replayed' WHERE id = $1;

COMMIT;
```

Scaling Beyond Postgres

If you outgrow the Postgres queue (typically > 5,000 events/second sustained), the migration path is:

  1. Partition the events table by account_id or created_at — extends Postgres throughput significantly
  2. Upgrade to Amazon SQS — managed, scales to any throughput, but no replay/history
  3. Upgrade to Kafka or Redpanda — best for multi-consumer scenarios at extreme scale

For GetHook, we've designed the delivery layer with an abstraction so we can swap the queue backend without changing the rest of the system. But for the vast majority of customers, Postgres is the right choice and we don't plan to change it.
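That abstraction might look roughly like the interface below — a sketch of the idea, not GetHook's actual code. An in-memory backend stands in for the Postgres one so the example runs without a database; a Kafka or SQS implementation would satisfy the same protocol.

```python
from typing import Optional, Protocol

class DeliveryQueue(Protocol):
    """Minimal queue contract the delivery layer codes against."""
    def enqueue(self, event: dict) -> None: ...
    def claim(self) -> Optional[dict]: ...        # next event to deliver, or None
    def ack(self, event_id: str) -> None: ...     # mark delivered

class InMemoryQueue:
    # Toy backend; the Postgres version would run the SQL shown in this post.
    def __init__(self) -> None:
        self._events: dict = {}

    def enqueue(self, event: dict) -> None:
        self._events[event["id"]] = {**event, "status": "queued"}

    def claim(self) -> Optional[dict]:
        for e in self._events.values():
            if e["status"] == "queued":
                e["status"] = "delivering"
                return e
        return None

    def ack(self, event_id: str) -> None:
        self._events[event_id]["status"] = "delivered"

queue: DeliveryQueue = InMemoryQueue()
queue.enqueue({"id": "evt_1", "payload": {"hello": "world"}})
event = queue.claim()
queue.ack(event["id"])
```

Coding the workers against the protocol rather than the SQL is what makes the later migration a backend swap instead of a rewrite.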


The Underrated Advantage: Observability

With Postgres as your queue, every event is a row in a table. You can:

```sql
-- Events in dead-letter for an account
SELECT * FROM events
WHERE account_id = $1
  AND status = 'dead_letter'
ORDER BY created_at DESC;

-- Delivery success rate, last 24 hours
SELECT
  destination_id,
  COUNT(*) AS total,
  COUNT(*) FILTER (WHERE status = 'delivered') AS delivered,
  ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'delivered') / COUNT(*), 2) AS success_rate
FROM events
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY destination_id;

-- Average retry count for failed events
SELECT AVG(attempt_number) FROM events WHERE status = 'dead_letter';
```

With Kafka or Redis, you'd need a separate data store for this kind of historical analysis. With Postgres, it's just SQL.


Conclusion

The Postgres job queue pattern is underrated. For event volumes up to a few thousand events per second, it provides:

  • Full durability (WAL-backed)
  • Concurrent worker support (FOR UPDATE SKIP LOCKED)
  • Event history and replay (just rows in a table)
  • Zero additional infrastructure
  • SQL-based observability

Don't reach for Redis or Kafka until your volume genuinely demands it. Start simple, instrument well, and scale when you have data that says you need to.

This is the pattern GetHook uses in production. If you want to see the full implementation, the code is on GitHub.

Stop losing webhook events.

GetHook gives you reliable delivery, automatic retry, and full observability — in minutes.