From Zero to Production: Webhook Infrastructure for Startups

Webhook infrastructure is one of those things every SaaS company needs but few founders anticipate. You're three weeks from launch, Stripe is set up, and then you realize: you need to handle payment_intent.succeeded, customer.subscription.deleted, invoice.payment_failed, and twenty other event types, all of them reliably.

This guide walks through webhook infrastructure decisions at three stages: MVP, growth, and scale.

Stage 1: MVP (< 10K events/month)

At the MVP stage, you have one integration, a few dozen customers, and limited engineering bandwidth. The right approach is to keep it simple and move fast.

What You Need

›A webhook endpoint that accepts events
›Basic signature verification
›A simple async handler
›Logging so you can debug failures

What You Don't Need Yet

›Retry infrastructure
›Fan-out routing
›Dead-letter queues
›Event replay
›Multi-tenant isolation

The MVP Architecture

Stripe ──→ POST /webhooks/stripe ──→ SQS/Background job ──→ Your handler

Or even simpler for a true MVP:

Stripe ──→ POST /webhooks/stripe ──→ Your handler (synchronous, 5 second timeout)

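Yes, synchronous is acceptable for an MVP: Stripe retries failed deliveries automatically, and your handler is simple. Just make sure to:

›Verify the signature
›Respond 200 as soon as the essential work is done, and defer anything expensive (emails, third-party API calls) to a background job
›Keep handlers under 3 seconds, well inside the 5 second timeout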
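
Signature verification is the step most worth getting right on day one. Here is a minimal sketch using only the Python standard library. It assumes Stripe's documented scheme: a Stripe-Signature header with `t=` and `v1=` fields, and an HMAC-SHA256 computed over the timestamp, a dot, and the raw request body. The function name and the 300-second tolerance are illustrative choices, and in a real app the verification helper in Stripe's official library is the safer option.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(payload: bytes, header: str, secret: str,
                            tolerance_seconds: int = 300,
                            now: Optional[float] = None) -> bool:
    """Return True if the Stripe-Signature header matches the raw request body."""
    timestamp = None
    candidates = []
    # The header looks like "t=1700000000,v1=<hex digest>[,v1=<hex digest>...]".
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    # Reject stale timestamps to limit replay of captured requests.
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance_seconds:
        return False

    # Sign "<timestamp>.<raw body>" and compare in constant time.
    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)
```

Pass the raw request bytes, not a re-serialized JSON object: any change in whitespace or key order changes the digest and the check fails.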

Recommended Stack

›Receiver: The Stripe CLI for local development, or expose your local server with ngrok or Cloudflare Tunnel
›Processing: Simple function call or background job queue (Sidekiq, Celery, BullMQ)
›Database: Just write to your existing app DB
›Monitoring: Stripe Dashboard → Webhooks tab (shows delivery attempts for free)

Cost

$0 additional. Stripe's webhook delivery is free. Your existing web server handles it. You don't need GetHook yet.

When to move to Stage 2: When you have more than 3 webhook integrations, start seeing delivery failures, need fan-out routing, or want to replay missed events.

Stage 2: Growth (10K–500K events/month)

You've shipped the MVP. Customers are using it. You've added Stripe, GitHub, Shopify, and maybe Twilio. Event failures are causing occasional customer complaints.

This is when webhook reliability becomes a real investment.

What Goes Wrong at This Stage

Problem	Frequency	Impact
Provider webhook bursts (traffic spike)	1–2×/week	Queue backup, delayed processing
Destination restart during deploy	3–5×/day	Missed events for 30s windows
Multi-provider signature format differences	Ongoing	Developer confusion, bugs
Customer asks "why didn't I get this event?"	1–5×/week	Support burden
Need to backfill a new destination	1×/quarter	Manual work

The Growth Architecture

                                    ┌─── Billing Service
Stripe ──┐                         │
GitHub ──┼──→ GetHook Gateway ─────┼─── Fulfillment Service
Shopify ─┤    (ingest + queue)     │
Twilio ──┘                         └─── Analytics / Reporting

Each provider's events are accepted at the GetHook ingest layer, verified, queued, and fanned out to the right destinations based on event type patterns.

Key Capabilities You Now Need

1. Fan-out routing

Different event types go to different destinations:

payment.succeeded    → billing-service, email-service, analytics
payment.failed       → billing-service, alerting
order.shipped        → fulfillment-service, email-service, sms-service
user.signup          → crm-service, onboarding-service
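
To make the routing idea concrete, here is a small sketch of pattern-based fan-out in Python. It uses shell-style wildcards from the standard library's `fnmatch` module; the route table format and the `destinations_for` name are assumptions for illustration, not GetHook's actual configuration syntax.

```python
from fnmatch import fnmatchcase

# (pattern, destinations) pairs; an event goes to every route whose pattern matches.
ROUTES = [
    ("payment.*", ["billing-service"]),
    ("payment.succeeded", ["email-service", "analytics"]),
    ("payment.failed", ["alerting"]),
    ("order.shipped", ["fulfillment-service", "email-service", "sms-service"]),
    ("user.signup", ["crm-service", "onboarding-service"]),
]


def destinations_for(event_type, routes=ROUTES):
    """Return the de-duplicated destinations for an event type, in route order."""
    matched = []
    for pattern, destinations in routes:
        if fnmatchcase(event_type, pattern):
            for destination in destinations:
                if destination not in matched:
                    matched.append(destination)
    return matched
```

With this table, `destinations_for("payment.failed")` yields billing-service and alerting, and an event type with no matching route yields an empty list, which is worth logging so unrouted events don't disappear quietly.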

2. Independent retry per destination

If your email service is down, fulfillment shouldn't be blocked. Each destination has its own retry queue.
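
A rough in-memory sketch of what "its own retry queue" means, assuming exponential backoff and a fixed attempt limit. The class name, the 5-second base delay, and the `send` callback are all hypothetical; a real system would persist this state rather than hold it in memory.

```python
from dataclasses import dataclass, field


def backoff_delay(attempt, base=5.0, cap=3600.0):
    """Seconds to wait after a failed attempt: 5, 10, 20, ... capped at one hour."""
    return min(cap, base * (2 ** (attempt - 1)))


@dataclass
class DestinationQueue:
    name: str
    max_attempts: int = 5
    pending: list = field(default_factory=list)      # (event_id, attempt, not_before)
    dead_letter: list = field(default_factory=list)  # event IDs that ran out of attempts

    def enqueue(self, event_id, now):
        self.pending.append((event_id, 1, now))

    def run_due(self, send, now):
        """Try every due event once; reschedule failures or dead-letter them."""
        still_pending = []
        for event_id, attempt, not_before in self.pending:
            if not_before > now:
                still_pending.append((event_id, attempt, not_before))
            elif send(self.name, event_id):
                continue  # delivered, nothing left to track
            elif attempt >= self.max_attempts:
                self.dead_letter.append(event_id)
            else:
                retry_at = now + backoff_delay(attempt)
                still_pending.append((event_id, attempt + 1, retry_at))
        self.pending = still_pending
```

Because each destination owns its queue, a failing email-service only grows its own backlog; fulfillment-service keeps draining at full speed.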

3. Event replay

When you deploy a new service, you need to replay 30 days of events. When a bug causes incorrect processing, you need to re-process specific events.
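
Replay boils down to two operations: select stored events by time window and type, then deliver them again. A minimal sketch, assuming events are stored as dicts with hypothetical `id`, `type`, and `received_at` (Unix seconds) fields and that `deliver` is whatever function sends one event to the target destination:

```python
from fnmatch import fnmatchcase


def select_for_replay(events, since, until, type_pattern="*"):
    """Pick stored events received in [since, until) whose type matches the pattern."""
    return [
        event for event in events
        if since <= event["received_at"] < until
        and fnmatchcase(event["type"], type_pattern)
    ]


def replay(events, deliver):
    """Re-deliver events oldest first and return the IDs that failed again."""
    failed = []
    for event in sorted(events, key=lambda e: e["received_at"]):
        if not deliver(event):
            failed.append(event["id"])
    return failed
```

Replay is only safe when the receiving handlers are idempotent, since some of the replayed events may already have been processed.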

4. Delivery observability

"Did webhook X reach service Y?" should be answerable in 30 seconds with a timestamp and HTTP response code.
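
One way to make that question cheap to answer is to record every delivery attempt in an append-only table and query it by event and destination. The sketch below uses SQLite from the Python standard library; the column layout is an assumption for illustration, not GetHook's actual schema.

```python
import sqlite3


def open_log():
    """Create an in-memory attempts log (a real deployment would use a durable DB)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE delivery_attempts (
               event_id     TEXT NOT NULL,
               destination  TEXT NOT NULL,
               attempted_at TEXT NOT NULL,   -- ISO 8601 UTC timestamp
               status_code  INTEGER          -- NULL if no HTTP response arrived
           )"""
    )
    return conn


def record_attempt(conn, event_id, destination, attempted_at, status_code):
    """Append one attempt; rows are never updated or deleted."""
    with conn:
        conn.execute(
            "INSERT INTO delivery_attempts VALUES (?, ?, ?, ?)",
            (event_id, destination, attempted_at, status_code),
        )


def delivery_history(conn, event_id, destination):
    """Return (attempted_at, status_code) rows for one event and destination, oldest first."""
    return conn.execute(
        "SELECT attempted_at, status_code FROM delivery_attempts "
        "WHERE event_id = ? AND destination = ? ORDER BY attempted_at",
        (event_id, destination),
    ).fetchall()
```

A support engineer can then answer "did evt_1 reach billing-service?" with a single lookup that shows each attempt's timestamp and response code.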

Provider-Specific Considerations

At this stage, you're integrating with multiple providers. Each has its own signature format and its own event naming scheme:

Provider	Events typically needed
Stripe	`payment_intent.*`, `customer.subscription.*`, `invoice.*`
GitHub	`push`, `pull_request`, `deployment`
Shopify	`orders/*`, `products/*`, `fulfillments/*`
Twilio	`message-status.*`, `call.*`

Using GetHook abstracts the signature format differences — you configure the verification preset per source, and your handlers receive pre-verified events with a consistent format.

Cost Benchmark (Stage 2)

Approach	Monthly Cost	Engineering Time
Build in-house	$300–$600/month infra + 40h/month maintenance	8–12h/month ongoing
GetHook Growth plan	$49/month	~2h integration, ~0 ongoing

Stage 3: Scale (500K–10M events/month)

You've raised a Series A or B, and your engineering team is 10–30 people. Webhooks are now serious infrastructure: downtime has a direct revenue impact measured in thousands of dollars per hour.

What Changes at Scale

Multi-tenancy becomes critical. You're now both a consumer (receiving from providers) and a producer (delivering to your customers' endpoints). Your platform needs per-customer:

›Separate signing secrets
›Independent retry and dead-letter
›Delivery logs visible to customers via your own dashboard
›Custom domains for white-labeled delivery

Compliance and audit requirements arrive. SOC 2, PCI-DSS, and enterprise customer security reviews start asking about event audit trails, data retention policies, and encryption at rest.

Outbound webhooks become a product feature. Your largest customers want to configure webhooks from your platform to their own systems. This is 3–6 months of engineering work if built in-house.

The Scale Architecture

                                  ┌──────────────────────────────┐
External Providers                │          GetHook             │
Stripe, GitHub, ─────────────────→│ Ingest → Queue → Fan-out    │
Shopify, etc.                     │                              │
                                  │ Per-source HMAC verification │
                                  │ Per-destination retry        │
Your Platform                     │ Per-tenant isolation         │
(your app) ─────────────────────→│ Outbound delivery            │
                                  │                              │
                                  └──────────────────┬───────────┘
                                                     │
                                  ┌──────────────────┼───────────┐
                                  │  Your customers  │          │
                                  │  Customer A ─────┘          │
                                  │  Customer B ────────────────│
                                  │  Customer C ────────────────│
                                  └────────────────────────────┘

White-Labeling for Outbound

When your customers configure webhook endpoints in your product, they receive signed events from a domain like webhooks.yourapp.com (not webhooks.gethook.to). GetHook's custom domain support makes this transparent.

Each customer has:

›A unique signing secret (used to verify events you send them)
›A custom domain for the webhook portal
›Independent delivery logs and retry controls
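
To illustrate per-customer signing on the outbound side, here is a sketch that signs each delivery with the receiving customer's own secret. The header name `X-Webhook-Signature`, the `t=...,v1=...` layout, and the in-memory secret store are assumptions made for the example; they are not a description of GetHook's wire format.

```python
import hashlib
import hmac
import json
import secrets


def new_signing_secret():
    """Generate a random 256-bit secret for one customer."""
    return secrets.token_hex(32)


# Hypothetical store: one secret per customer, never shared between them.
CUSTOMER_SECRETS = {
    "customer_a": new_signing_secret(),
    "customer_b": new_signing_secret(),
}


def sign_outbound(secret, body, timestamp):
    """HMAC-SHA256 over "<timestamp>.<body>", returned as a header value."""
    mac = hmac.new(secret.encode(), f"{timestamp}.".encode() + body, hashlib.sha256)
    return f"t={timestamp},v1={mac.hexdigest()}"


def build_delivery(customer_id, event, timestamp):
    """Serialize an event once and sign those exact bytes with the customer's secret."""
    body = json.dumps(event, separators=(",", ":"), sort_keys=True).encode()
    headers = {
        "Content-Type": "application/json",
        "X-Webhook-Signature": sign_outbound(CUSTOMER_SECRETS[customer_id], body, timestamp),
    }
    return headers, body
```

The same event sent to two customers carries two different signatures, so a secret leaked by one customer cannot be used to forge deliveries to another.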

Compliance at Scale

Requirement	GetHook Feature
Data at rest encryption	AES-256-GCM for secrets, Postgres-level encryption for payloads
API key audit trail	Key prefix + creation time logged, full keys never stored
Data retention controls	Configurable retention period, automatic cleanup
Tenant data isolation	account_id filtering enforced on all queries
Immutable delivery logs	delivery_attempts table is append-only

Choosing Between Build vs. Buy at Each Stage

Stage	Events/month	Recommendation	Reason
MVP	< 10K	Build basic	Too early to invest
Early growth	10K–100K	Use GetHook	Provider complexity, fan-out needs
Growth	100K–1M	Use GetHook	Reliability SLA, multi-tenant needs
Scale	1M–10M	Use GetHook (enterprise)	Compliance, white-labeling, outbound
Hyper-scale	> 10M	Evaluate options	May need custom infrastructure

Common Mistakes Startup Founders Make

1. Building retry before building idempotency

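Retries redeliver events, so a handler that isn't idempotent repeats its side effects: duplicate charges, duplicate emails, duplicate records. Always implement idempotency first, for example by recording the IDs of events you have already processed.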
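
A common way to get idempotency is to record each event ID under a uniqueness constraint and skip any event already recorded. A minimal sketch with SQLite from the Python standard library; the table and function names are made up for the example:

```python
import sqlite3


def make_store():
    """In-memory store of processed event IDs (use your app database in practice)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    return conn


def process_once(conn, event_id, handler):
    """Run handler only if event_id is new; return True if the handler ran."""
    try:
        # One transaction: if the handler raises, the insert is rolled back,
        # so a later retry of the same event can still run it.
        with conn:
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
            handler()
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: the event ID was already recorded
    return True
```

The rollback only covers writes made through the same database connection. Side effects outside it, such as a call to a payment API, need their own idempotency key.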

2. Using the same webhook secret for all customers

A leaked secret from one customer compromises all of them. Per-customer secrets are non-negotiable from Stage 3 onward.

3. Not monitoring the dead-letter queue

Dead-letter events accumulate silently. Alert when the DLQ grows, and review its contents weekly.

4. Logging raw webhook bodies

Webhook bodies often contain PII and sensitive data. Log event IDs and types, not bodies.
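
In practice that means extracting a few safe fields before anything reaches the logger. A small sketch, assuming the payload is a JSON object with top-level `id` and `type` fields (true of Stripe events; other providers may name them differently):

```python
import json
import logging

logger = logging.getLogger("webhooks")


def log_summary(raw_body, source):
    """Log and return only non-sensitive metadata about a webhook."""
    try:
        event = json.loads(raw_body)
    except ValueError:
        event = {}  # unparseable body: still log that something arrived
    if not isinstance(event, dict):
        event = {}
    summary = {
        "source": source,
        "event_id": event.get("id"),
        "event_type": event.get("type"),
        "body_bytes": len(raw_body),  # size only, never the content
    }
    logger.info("webhook received %s", summary)
    return summary
```

If you need the full body for debugging, store it in access-controlled storage with a retention limit and log only the event ID that points to it.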

5. Treating webhook infrastructure as a "later" problem

Retrofitting reliability onto an unreliable system usually costs more than building it in once you need it. If you're at 10K+ events/month and still using a simple HTTP handler with no retry, upgrade now.

Quick-Start Checklist

MVP → Growth transition:

› Set up GetHook account (10 minutes)
› Configure one source per provider (Stripe, GitHub, etc.)
› Set up destinations for each internal service
› Create routes with event type patterns
› Test delivery end-to-end
› Set up dead-letter queue alerting

Growth → Scale transition:

› Enable per-customer signing secrets
› Configure custom domain for outbound delivery
› Set up brand settings for white-labeled portal
› Review data retention policies
› Enable delivery logs for customer-facing observability
› Test replay from dead-letter queue

Conclusion

Webhook infrastructure isn't glamorous, but it's load-bearing. Get the foundation right at the MVP stage (verify signatures, don't lose events), invest in reliability at the growth stage (retry, fan-out, observability), and build for multi-tenancy at the scale stage (per-customer secrets, white-labeling, compliance).

GetHook is designed to grow with you through all three stages without changing your integration code.
