Webhook infrastructure is one of those things every SaaS company needs but few founders anticipate. You're three weeks from launch, Stripe is set up, and then you realize you need to handle payment_intent.succeeded, customer.subscription.deleted, invoice.payment_failed, and twenty other event types, and you need to handle them reliably.
This guide walks through webhook infrastructure decisions at three stages: MVP, growth, and scale.
Stage 1: MVP (< 10K events/month)
At the MVP stage, you have one integration, a few dozen customers, and limited engineering bandwidth. The right approach is to keep it simple and move fast.
What You Need
- A webhook endpoint that accepts events
- Basic signature verification
- A simple async handler
- Logging so you can debug failures
What You Don't Need Yet
- Retry infrastructure
- Fan-out routing
- Dead-letter queues
- Event replay
- Multi-tenant isolation
The MVP Architecture
Stripe ──→ POST /webhooks/stripe ──→ SQS/Background job ──→ Your handler
Or even simpler for a true MVP:
Stripe ──→ POST /webhooks/stripe ──→ Your handler (synchronous, 5 second timeout)
Yes, synchronous is acceptable for MVP. Stripe has built-in retry. Your handler is simple. Just make sure to:
- Verify the signature
- Respond 200 before doing anything expensive
- Keep handlers under 3 seconds
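The signature check in the first bullet can be sketched with the standard library alone. This follows Stripe's documented scheme (a `Stripe-Signature` header carrying `t=<timestamp>` and `v1=<signature>`, where the signature is an HMAC-SHA256 over `<timestamp>.<raw body>`). The function name, the `now` parameter, and the 300-second tolerance default are assumptions made for illustration; in production the official Stripe library's verification helper is usually the safer choice.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(payload: bytes, header: str, secret: str,
                            tolerance: int = 300,
                            now: Optional[float] = None) -> bool:
    """Return True if `header` carries a valid v1 signature for `payload`."""
    timestamp = None
    candidates = []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    # Reject stale timestamps to limit replay of captured requests.
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance:
        return False

    # The signed payload is "<timestamp>.<raw request body>".
    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload,
                        hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing.
    return any(hmac.compare_digest(expected.encode(), c.encode())
               for c in candidates)
```

Verify against the raw request bytes, not a re-serialized JSON object; many frameworks parse the body before your handler sees it, and re-encoding it changes the bytes that were signed.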
Recommended Stack
- Receiver: Stripe's built-in webhook testing tool for local dev; expose your local endpoint with ngrok or Cloudflare Tunnel when you need a public URL
- Processing: Simple function call or background job queue (Sidekiq, Celery, BullMQ)
- Database: Just write to your existing app DB
- Monitoring: Stripe Dashboard → Webhooks tab (shows delivery attempts for free)
Cost
$0 additional. Stripe's webhook delivery is free. Your existing web server handles it. You don't need GetHook yet.
When to move to Stage 2: When you have more than 3 webhook integrations, start seeing delivery failures, need fan-out routing, or want to replay missed events.
Stage 2: Growth (10K–500K events/month)
You've shipped the MVP. Customers are using it. You've added Stripe, GitHub, Shopify, and maybe Twilio. Event failures are causing occasional customer complaints.
This is when webhook reliability becomes a real investment.
What Goes Wrong at This Stage
| Problem | Frequency | Impact |
|---|---|---|
| Provider webhook bursts (traffic spike) | 1–2×/week | Queue backup, delayed processing |
| Destination restart during deploy | 3–5×/day | Missed events for 30s windows |
| Multi-provider signature format differences | Ongoing | Developer confusion, bugs |
| Customer asks "why didn't I get this event?" | 1–5×/week | Support burden |
| Need to backfill a new destination | 1×/quarter | Manual work |
The Growth Architecture
                                   ┌─── Billing Service
Stripe ──┐                         │
GitHub ──┼──→ GetHook Gateway ─────┼─── Fulfillment Service
Shopify ─┤    (ingest + queue)     │
Twilio ──┘                         └─── Analytics / Reporting
Each provider's events are accepted at the GetHook ingest layer, verified, queued, and fanned out to the right destinations based on event type patterns.
Key Capabilities You Now Need
1. Fan-out routing
Different event types go to different destinations:
payment.succeeded → billing-service, email-service, analytics
payment.failed → billing-service, alerting
order.shipped → fulfillment-service, email-service, sms-service
user.signup → crm-service, onboarding-service
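A routing table like the one above can be sketched as a list of patterns matched against the event type. This is a minimal illustration, not GetHook's routing engine: the `ROUTES` structure, the `destinations_for` function, and the wildcard `audit-log` route are all hypothetical, and the glob matching comes from Python's `fnmatch` module.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence, Tuple

Route = Tuple[str, List[str]]

ROUTES: List[Route] = [
    ("payment.succeeded", ["billing-service", "email-service", "analytics"]),
    ("payment.failed", ["billing-service", "alerting"]),
    ("order.shipped", ["fulfillment-service", "email-service", "sms-service"]),
    ("user.signup", ["crm-service", "onboarding-service"]),
    # Hypothetical wildcard route, added to show pattern matching.
    ("payment.*", ["audit-log"]),
]


def destinations_for(event_type: str,
                     routes: Sequence[Route] = ROUTES) -> List[str]:
    """Return every destination whose pattern matches, without duplicates."""
    matched: List[str] = []
    for pattern, destinations in routes:
        if fnmatchcase(event_type, pattern):
            for destination in destinations:
                if destination not in matched:
                    matched.append(destination)
    return matched
```

An event type that matches no route returns an empty list; whether that should be dropped silently or sent to a catch-all destination is a design decision worth making explicitly.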
2. Independent retry per destination
If your email service is down, fulfillment shouldn't be blocked. Each destination has its own retry queue.
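One way to picture per-destination retry: each destination gets its own queue of pending retries, so a failure in one never delays delivery to another. The sketch below is an in-memory illustration under stated assumptions. The `FanOut` class, the `PendingRetry` record, and the 30-second base backoff with a one-hour cap are hypothetical choices, and a real system would persist the queues.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


def backoff_seconds(attempt: int, base: int = 30, cap: int = 3600) -> int:
    """Exponential backoff: 30s, 60s, 120s, ... capped at one hour."""
    return min(cap, base * 2 ** (attempt - 1))


@dataclass
class PendingRetry:
    event_id: str
    attempt: int      # the attempt number that will run next
    retry_at: float   # earliest time (epoch seconds) to try again


class FanOut:
    def __init__(self, senders: Dict[str, Callable[[dict], bool]]):
        self.senders = senders
        # One independent retry queue per destination name.
        self.retry_queues: Dict[str, List[PendingRetry]] = defaultdict(list)

    def deliver(self, event: dict, now: float) -> Dict[str, bool]:
        results: Dict[str, bool] = {}
        for name, send in self.senders.items():
            try:
                ok = bool(send(event))
            except Exception:
                ok = False  # a failing destination must not stop the loop
            results[name] = ok
            if not ok:
                self.retry_queues[name].append(
                    PendingRetry(event["id"], attempt=2,
                                 retry_at=now + backoff_seconds(1)))
        return results
```

With an email sender that raises and a fulfillment sender that succeeds, `deliver` reports the fulfillment delivery as successful and schedules only the email delivery for retry.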
3. Event replay
When you deploy a new service, you need to replay 30 days of events. When a bug causes incorrect processing, you need to re-process specific events.
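Both replay cases (backfilling a time window and re-processing specific events) reduce to filtering stored events and delivering them again in order. The sketch below assumes events are stored as dicts with `id` and `received_at` fields; that shape and the `replay` function are hypothetical, not a GetHook API.

```python
from datetime import datetime
from typing import Callable, Iterable, List, Optional, Set


def replay(stored_events: Iterable[dict],
           deliver: Callable[[dict], None],
           since: datetime,
           until: Optional[datetime] = None,
           event_ids: Optional[Set[str]] = None) -> List[str]:
    """Re-deliver stored events in arrival order; return the replayed IDs."""
    replayed: List[str] = []
    for event in sorted(stored_events, key=lambda e: e["received_at"]):
        if event["received_at"] < since:
            continue
        if until is not None and event["received_at"] > until:
            continue
        # Optional allow-list for re-processing specific events after a bug.
        if event_ids is not None and event["id"] not in event_ids:
            continue
        deliver(event)
        replayed.append(event["id"])
    return replayed
```

Replay sends events the destination may already have seen, so it is only safe when the receiving handlers are idempotent.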
4. Delivery observability
"Did webhook X reach service Y?" should be answerable in 30 seconds with a timestamp and HTTP response code.
Provider-Specific Considerations
At this stage, you're integrating with multiple providers. Each has its own signature format and its own event naming scheme. The events you'll typically need from each:
| Provider | Events typically needed |
|---|---|
| Stripe | payment_intent.*, customer.subscription.*, invoice.* |
| GitHub | push, pull_request, deployment |
| Shopify | orders/*, products/*, fulfillments/* |
| Twilio | message-status.*, call.* |
Using GetHook abstracts the signature format differences — you configure the verification preset per source, and your handlers receive pre-verified events with a consistent format.
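To make the format differences concrete, here is a sketch of what a per-source verification preset might look like for two providers. GitHub sends a hex HMAC-SHA256 of the body in `X-Hub-Signature-256` with a `sha256=` prefix, while Shopify sends a base64-encoded HMAC-SHA256 in `X-Shopify-Hmac-Sha256`. The `PRESETS` table and `verify` function are hypothetical and are not GetHook's configuration format.

```python
import base64
import hashlib
import hmac
from typing import Callable, Dict, Mapping

Verifier = Callable[[bytes, Mapping[str, str], bytes], bool]


def _verify_github(body: bytes, headers: Mapping[str, str],
                   secret: bytes) -> bool:
    # GitHub: "sha256=" followed by the hex digest of the raw body.
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = headers.get("x-hub-signature-256", "")
    return hmac.compare_digest(expected.encode(), received.encode())


def _verify_shopify(body: bytes, headers: Mapping[str, str],
                    secret: bytes) -> bool:
    # Shopify: base64 of the raw HMAC-SHA256 digest of the raw body.
    expected = base64.b64encode(
        hmac.new(secret, body, hashlib.sha256).digest())
    received = headers.get("x-shopify-hmac-sha256", "")
    return hmac.compare_digest(expected, received.encode())


PRESETS: Dict[str, Verifier] = {
    "github": _verify_github,
    "shopify": _verify_shopify,
}


def verify(source: str, body: bytes, headers: Mapping[str, str],
           secret: bytes) -> bool:
    """Dispatch to the preset for `source`; header names are case-insensitive."""
    verifier = PRESETS.get(source)
    if verifier is None:
        raise ValueError(f"no verification preset for source {source!r}")
    normalized = {k.lower(): v for k, v in headers.items()}
    return verifier(body, normalized, secret)
```

Stripe and Twilio use different schemes again (a timestamped payload and a URL-plus-parameters signature, respectively), which is why a single shared verifier rarely works across providers.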
Cost Benchmark (Stage 2)
| Approach | Monthly Cost | Engineering Time |
|---|---|---|
| Build in-house | $300–$600/month infra + 40h/month maintenance | 8–12h/month ongoing |
| GetHook Growth plan | $49/month | ~2h integration, ~0 ongoing |
Stage 3: Scale (500K–10M events/month)
You've raised a Series A or B. Engineering team is 10–30 people. Webhooks are serious infrastructure — downtime has direct revenue impact measured in thousands of dollars per hour.
What Changes at Scale
Multi-tenancy becomes critical. You're now both a consumer (receiving from providers) and a producer (delivering to your customers' endpoints). Your platform needs per-customer:
- Separate signing secrets
- Independent retry and dead-letter
- Delivery logs visible to customers via your own dashboard
- Custom domains for white-labeled delivery
Compliance and audit requirements arrive. SOC 2, PCI-DSS, and enterprise customer security reviews start asking about event audit trails, data retention policies, and encryption at rest.
Outbound webhooks become a product feature. Your largest customers want to configure webhooks from your platform to their own systems. This is 3–6 months of engineering work if built in-house.
The Scale Architecture
                                  ┌──────────────────────────────┐
External Providers                │           GetHook            │
Stripe, GitHub, ─────────────────→│ Ingest → Queue → Fan-out     │
Shopify, etc.                     │                              │
                                  │ Per-source HMAC verification │
                                  │ Per-destination retry        │
Your Platform                     │ Per-tenant isolation         │
(your app) ──────────────────────→│ Outbound delivery            │
                                  │                              │
                                  └──────────────────┬───────────┘
                                                     │
                                  ┌──────────────────┼───────────┐
                                  │ Your customers   │           │
                                  │ Customer A ──────┘           │
                                  │ Customer B ──────────────────│
                                  │ Customer C ──────────────────│
                                  └──────────────────────────────┘
White-Labeling for Outbound
When your customers configure webhook endpoints in your product, they receive signed events from a domain like webhooks.yourapp.com (not webhooks.gethook.to). GetHook's custom domain support makes this transparent.
Each customer has:
- A unique signing secret (used to verify events you send them)
- A custom domain for the webhook portal
- Independent delivery logs and retry controls
Compliance at Scale
| Requirement | GetHook Feature |
|---|---|
| Data at rest encryption | AES-256-GCM for secrets, Postgres-level encryption for payloads |
| API key audit trail | Key prefix + creation time logged, full keys never stored |
| Data retention controls | Configurable retention period, automatic cleanup |
| Tenant data isolation | account_id filtering enforced on all queries |
| Immutable delivery logs | delivery_attempts table is append-only |
Choosing Between Build vs. Buy at Each Stage
| Stage | Events/month | Recommendation | Reason |
|---|---|---|---|
| MVP | < 10K | Build basic | Too early to invest |
| Early growth | 10K–100K | Use GetHook | Provider complexity, fan-out needs |
| Growth | 100K–1M | Use GetHook | Reliability SLA, multi-tenant needs |
| Scale | 1M–10M | Use GetHook (enterprise) | Compliance, white-labeling, outbound |
| Hyper-scale | > 10M | Evaluate options | May need custom infrastructure |
Common Mistakes Startup Founders Make
1. Building retry before building idempotency
Retry without idempotency = duplicate charges. Always implement idempotency first.
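A common way to get idempotency is to record each event ID under a uniqueness constraint and skip any event whose ID is already recorded. The sketch below uses an in-memory SQLite database; the `processed_events` table and the `handle_once` function are hypothetical names chosen for illustration.

```python
import sqlite3
from typing import Callable


def make_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    return conn


def handle_once(conn: sqlite3.Connection, event: dict,
                handler: Callable[[dict], None]) -> bool:
    """Run `handler` at most once per event ID.

    Returns True if the handler ran, False if the event was a duplicate.
    """
    try:
        # `with conn` commits on success and rolls back on any exception,
        # so a handler failure leaves the event ID unrecorded and a later
        # retry will process the event again.
        with conn:
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)",
                (event["id"],))
            handler(event)
    except sqlite3.IntegrityError:
        return False  # primary key violation: already processed
    return True
```

The guarantee is strongest when the handler's own writes happen in the same database transaction as the insert. If the handler calls an external API (such as a payment provider), pass the event ID along as that API's idempotency key as well.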
2. Using the same webhook secret for all customers
A leaked secret from one customer compromises all of them. Per-customer secrets are non-negotiable at Stage 3+.
3. Not monitoring the dead-letter queue
Dead-letter events accumulate silently. Alert when the DLQ grows, and review its contents weekly.
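A DLQ alert can be as simple as comparing periodic depth samples. The sketch below is illustrative only: the `dlq_alerts` function and the threshold of 50 are assumptions, and in practice the samples would come from your queue's metrics.

```python
from typing import List, Sequence


def dlq_alerts(depth_samples: Sequence[int], max_depth: int = 50) -> List[str]:
    """Check DLQ depth samples (oldest first) and return alert messages."""
    alerts: List[str] = []
    if not depth_samples:
        return alerts
    first, latest = depth_samples[0], depth_samples[-1]
    if latest > max_depth:
        alerts.append(f"DLQ depth {latest} exceeds threshold {max_depth}")
    # Growth across the window means failures are arriving faster than
    # they are being resolved, even if the absolute depth is still small.
    if len(depth_samples) >= 2 and latest > first:
        alerts.append(f"DLQ grew from {first} to {latest}")
    return alerts
```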
4. Logging raw webhook bodies
Webhook bodies often contain PII and sensitive data. Log event IDs and types, not bodies.
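In practice this means extracting a small, safe summary before anything reaches the log. The sketch below assumes the event carries top-level `id` and `type` fields, as Stripe events do; other providers put these elsewhere (often in headers), and the `log_summary` function is a hypothetical helper.

```python
import json
import logging
from typing import Any, Dict

logger = logging.getLogger("webhooks")


def log_summary(raw_body: bytes, source: str) -> Dict[str, Any]:
    """Log the event ID, type, and size, but never the body itself."""
    try:
        event = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        event = {}
    if not isinstance(event, dict):
        event = {}
    summary = {
        "source": source,
        "event_id": event.get("id"),
        "event_type": event.get("type"),
        "size_bytes": len(raw_body),
    }
    logger.info("webhook received %s", summary)
    return summary
```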
5. Treating webhook infrastructure as a "later" problem
The cost of retrofitting reliability onto an unreliable system is always higher than building it right the first time. If you're at 10K+ events/month and still using a simple HTTP handler with no retry, upgrade now.
Quick-Start Checklist
MVP → Growth transition:
- Set up GetHook account (10 minutes)
- Configure one source per provider (Stripe, GitHub, etc.)
- Set up destinations for each internal service
- Create routes with event type patterns
- Test delivery end-to-end
- Set up dead-letter queue alerting
Growth → Scale transition:
- Enable per-customer signing secrets
- Configure custom domain for outbound delivery
- Set up brand settings for white-labeled portal
- Review data retention policies
- Enable delivery logs for customer-facing observability
- Test replay from dead-letter queue
Conclusion
Webhook infrastructure isn't glamorous, but it's load-bearing. Get the foundation right at the MVP stage (verify signatures, don't lose events), invest in reliability at the growth stage (retry, fan-out, observability), and build for multi-tenancy at the scale stage (per-customer secrets, white-labeling, compliance).
GetHook is designed to grow with you through all three stages without changing your integration code.