Charity Donation App

Reading guide

This article is written for backend engineers preparing for system design interviews or building real-world, payment-heavy systems.
By the end, you should be able to:

Explain why a charity donation platform needs an intent-based payment model on top of Stripe PaymentIntents

Sketch the end-to-end flow from “Click Donate” to eventual stats update

Reason about idempotency, webhooks, reconciliation, and event-driven analytics under high load (10M+ donations in 3 days)

1. Overview

Summary
A high-volume charity donation platform using Stripe PaymentIntent and an internal intent-based state machine, with RabbitMQ for events and a reconciliation worker to guarantee no duplicate charges, no lost payments, and eventual consistency under load.
Scale assumptions
Total volume: ~10M donations
Duration: 3-day event
Peak load: hundreds of QPS
Design assumptions: horizontal scaling of the Donation API, Stripe handling payment throughput, and queue-based decoupling for stats and notifications
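As a rough sanity check on these numbers, and assuming (unrealistically) that donations were spread evenly over the three days, the average rate works out to:

```latex
\[
\frac{10^{7}\ \text{donations}}{3 \times 86\,400\ \text{s}}
  = \frac{10^{7}}{259\,200\ \text{s}}
  \approx 38.6\ \text{donations/s}
\]
```

A peak in the hundreds of QPS is therefore well above the average rate, which is why the design sizes for bursts rather than for the mean.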
Design goals
No duplicate charges: At most one successful charge per donation intent. Enforced via Stripe idempotency keys and an internal state machine.
No lost payments: Every intent reaches a terminal state (SUCCEEDED or FAILED). Webhooks plus a reconciliation worker resolve PENDING/UNKNOWN against Stripe.
Eventual consistency: The system converges to a consistent state under failures or delayed webhooks, driven by the state machine, webhooks, and the reconciliation worker.
Minimal PCI exposure: Card data never touches our servers. Client-side tokenization (Stripe Elements) keeps PCI scope minimal.
Near real-time charity statistics: Per-charity totals are updated and exposed at high QPS through event-driven updates via RabbitMQ and a Redis cache.
Functional requirements
Core business flows supported by the system.
Create donation intent: User selects charity, amount, and payment method; the system creates an internal intent and a Stripe PaymentIntent. Returns the intent ID and client secret to the client.
Confirm payment: User completes payment on the client (Stripe Elements). Stripe handles tokenization, 3DS, and authorization; the client receives success, failure, or pending.
Webhook handling: The system receives Stripe webhooks (payment_intent.succeeded / payment_intent.payment_failed), updates the intent status, and publishes events for stats and notifications.
Charity stats: Per-charity totals (amount, count) are updated eventually and exposed via API. The read path is cached (e.g. Redis) for high QPS.
Receipt / notification: The user receives a receipt (e.g. email) after successful payment. Sending is asynchronous and idempotent.
Reconciliation: The system periodically resolves intents stuck in PENDING/UNKNOWN by querying Stripe, and publishes reconciled events so stats and side effects stay consistent.
Multi-charity: Multiple charities are supported; each donation is tied to a charity. Stats are isolated per charity.
Non-functional requirements
System guarantees under expected and peak load.
No duplicate charges: At most one successful charge per donation intent, enforced by Stripe idempotency keys and the internal state machine.
No lost payments: Every intent eventually reaches SUCCEEDED or FAILED. Webhooks plus the reconciliation worker cover missed or delayed webhooks.
Scalability: The system handles ~10M donations over 3 days. The Donation API and consumers scale horizontally; payment throughput is delegated to Stripe.
Latency: Intent creation and redirect/confirmation stay within acceptable latency. Heavy work (stats, email) is kept off the hot path.
PCI / security: Card data never touches our servers. Client-side tokenization (Stripe Elements) keeps PCI scope minimal.
Observability: Monitoring and alerts for success rate, failure rate, UNKNOWN rate, webhook/queue lag, and reconciliation backlog.
Availability: Payment and intent creation remain available under expected load. Dependencies (DB, Stripe, queue) have clear failure modes and mitigations.

2. System Architecture

Core components and their responsibilities in the donation workflow.

Component	Type	Responsibility
Client (Web / iOS / Android)	Client	Collects donation details and payment method; confirms payment via Stripe SDK. No card data on our servers.
Donation API Service	Core Service	Creates and tracks donation intents; issues Stripe PaymentIntents; handles webhooks; publishes events. Central orchestration; horizontal scaling.
Stripe	External	Payment provider: PaymentIntents, tokenization, webhooks, idempotency.
Relational Database	Storage	Primary source of truth for donation_intents, webhook_events, charity_stats. Single source of truth; scales with Donation API.
RabbitMQ	Async	Message broker / event bus for donation.succeeded and downstream consumers. Decouples payment path from stats and notifications.
Notification Consumer	Async	Consumes events; sends receipts (e.g. email) with idempotency.
Reconciliation Worker	Async	Scans PENDING/UNKNOWN intents; queries Stripe to finalize state. Publishes reconciled events so stats and side effects stay consistent.
Redis	Storage	Stats cache for high-QPS charity totals. Read path for per-charity totals.

[Figure: high-level architecture diagram]

3. Payment Model Overview (Stripe)

Capabilities provided by Stripe

PaymentIntents API: Create and confirm payment intents; a single primitive for the charge lifecycle.
Client-side tokenization (Stripe Elements): Collect card details securely without touching our servers.
Webhooks: Real-time events such as payment_intent.succeeded and payment_intent.payment_failed.
Idempotency keys: At most one charge per key; we pass one key per donation intent.
Transaction lookup APIs: Query Stripe to resolve PENDING/UNKNOWN and reconcile state.

How this system uses them

Stripe
Stripe Elements: Minimizes PCI exposure; card data never touches our servers.
Stripe PaymentIntent: External payment execution primitive; the server creates it via the API and the client confirms it with Stripe.js.
Our system
Internal donation_intents table: Business-level intent state anchor; the state machine lives here.

High-level idea

Money flow: Stripe PaymentIntent is the source of truth.
Business flow: The internal donation_intents table is the state machine.
Webhooks plus a reconciliation worker bridge the two worlds and guarantee eventual consistency.

4. End-to-End Payment Flow

4.1 Sequence Diagram

Create Intent
1. Client creates donation intent
2. Server creates Stripe PaymentIntent
Payment Confirmation
3. Client confirms payment
Webhook & Publish
4. Stripe sends webhook
5. System updates state and publishes event
Reconcile
6. Reconciliation worker resolves pending/unknown states

System guarantees
No duplicate charges: Idempotency keys and webhook deduplication ensure at-most-once charge per intent.
No lost payments: Webhooks plus a reconciliation worker cover failures and long-running payments.
Eventual consistency: Stripe and internal state stay in sync via webhooks and reconciliation.

4.2 Detailed Flow (Stripe-based)

Phases: Intent Creation, Payment Confirmation, Webhook Processing, Reconciliation / Convergence

Intent Creation (SYNC)
Flow: Client → API → DB → Stripe → Client
Main flow:
- Client sends charityId, amount_cents, currency, and email, plus an optional request_id for idempotency.
- Server validates input and uses Redis SETNX to lock request_id (5–10s) to prevent duplicate intent creation.
- Insert a row into donation_intents with status = CREATED; obtain intentId.
- Create a Stripe PaymentIntent with amount, currency, and metadata: { intentId }.
- Return intentId and client_secret to the client.
Guarantee: at-most-once intent creation
- The same request_id always maps to the same intentId and client_secret (idempotent response).
- The Redis SETNX lock prevents duplicate inserts for the same request_id.
API / Webhook involved: POST /v1/donations/intent
DB writes: donation_intents (intent_id, status=CREATED, charity_id, amount_cents, stripe_payment_intent_id)
Failure & retry:
- Client retries with the same request_id; server returns the existing intent.
- Stripe PaymentIntent creation is retried with backoff, reusing the same idempotency key so a retry cannot create a second PaymentIntent.
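The "same request_id returns the same intent" rule can be sketched in a few lines. This is a minimal in-memory illustration, not the article's implementation: `IntentCreationSketch`, `CreatedIntent`, and the placeholder client secret are hypothetical names, the map stands in for a durable request_id-to-intent mapping (for example a unique constraint in the database), and the sketch requires a request_id even though the API treats it as optional.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a durable request_id -> intent mapping.
final class IntentCreationSketch {

    // What the API would return to the client.
    static final class CreatedIntent {
        final String intentId;
        final String clientSecret;

        CreatedIntent(String intentId, String clientSecret) {
            this.intentId = intentId;
            this.clientSecret = clientSecret;
        }
    }

    private final Map<String, CreatedIntent> byRequestId = new ConcurrentHashMap<>();

    // Returns the existing intent for a repeated request_id, otherwise creates one.
    CreatedIntent createIntent(String requestId, String charityId, long amountCents) {
        if (requestId == null || charityId == null || amountCents <= 0) {
            throw new IllegalArgumentException("requestId, charityId and a positive amount are required");
        }
        // computeIfAbsent runs the creation function at most once per key,
        // so concurrent retries with the same request_id observe the same result.
        return byRequestId.computeIfAbsent(requestId, id -> {
            String intentId = UUID.randomUUID().toString();
            // Placeholder only: the real client_secret is issued by Stripe.
            String clientSecret = "placeholder_secret_" + intentId;
            return new CreatedIntent(intentId, clientSecret);
        });
    }
}
```

The short-lived Redis lock described above narrows the race window between concurrent requests; a durable mapping like the one simulated here is what lets a retry that arrives after the lock has expired still receive the original intent.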
Payment Confirmation (SYNC)
Flow: Client → Stripe (no server in path)
Main flow:
- User enters card details in Stripe-hosted fields (PCI-safe); the client calls stripe.confirmCardPayment(client_secret).
- Stripe handles tokenization, 3DS (SCA), authorization, and capture.
- Client receives immediate success/failure or "processing" (finalized later via webhook).
- No server call in this step; the Donation Service does not see card data.
Guarantee: no duplicate confirmations
- A PaymentIntent can succeed at most once, so re-confirming an already-succeeded PaymentIntent does not create a second charge.
- Client can send an optional request_id for idempotent UI retries.
API / Webhook involved: Stripe.js confirmCardPayment (client-side)
DB writes: None in this phase (state is updated in Webhook Processing).
Failure & retry:
- Client may retry confirmCardPayment; the PaymentIntent still produces at most one charge.
- For "processing", wait for the webhook; there is no server retry in this phase.
Webhook Processing (ASYNC)
Flow: Stripe → API → DB + RabbitMQ
Main flow:
- Stripe sends a POST to /v1/webhooks/stripe: payment_intent.succeeded, payment_intent.payment_failed, payment_intent.processing.
- Deduplicate by Stripe event_id (stored in webhook_events or Redis); take a Redis SETNX lock per payment_intent (5–10s).
- Read intentId from the PaymentIntent metadata; update donation_intents to SUCCEEDED or FAILED.
- Publish a domain event to RabbitMQ (e.g. donation.succeeded) for the stats and email consumers.
Guarantee: at-most-once webhook processing
- event_id is stored, so each event is processed at most once.
- The Redis lock per payment_intent (5–10s) prevents duplicate handler runs.
API / Webhook involved: POST /v1/webhooks/stripe (payment_intent.succeeded / payment_intent.payment_failed / payment_intent.processing)
DB writes: webhook_events (event_id); donation_intents (status); outbox if used.
Failure & retry:
- Stripe retries the webhook with exponential backoff.
- Return 200 only after processing has been committed, so Stripe stops retrying; any duplicate delivery is absorbed by the event_id check.
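The event_id deduplication step can be illustrated with a small in-memory sketch. It assumes the webhook_events primary key is what rejects a second insert of the same Stripe event; `WebhookDedupSketch` and its counter are hypothetical names, and the concurrent set stands in for that table.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the webhook_events table, whose primary
// key on stripe_event_id rejects a second insert of the same event.
final class WebhookDedupSketch {

    private final Set<String> seenEventIds = ConcurrentHashMap.newKeySet();
    private int appliedCount = 0;

    // Returns true if the event was applied, false if it was a duplicate delivery.
    synchronized boolean handle(String stripeEventId) {
        // Set.add returns false when the id is already present, mirroring a
        // unique-key violation on webhook_events.
        if (!seenEventIds.add(stripeEventId)) {
            return false; // already processed: acknowledge without side effects
        }
        appliedCount++; // placeholder for "update intent status and publish event"
        return true;
    }

    synchronized int appliedCount() {
        return appliedCount;
    }
}
```

In a real handler the insert into webhook_events and the status update would share one database transaction, so a crash between the two cannot leave an event marked as seen but not applied.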
Reconciliation / Convergence (RECONCILE)
Flow: Worker → Stripe API → DB + RabbitMQ
Main flow:
- The reconciliation worker periodically scans donation_intents with status PENDING or UNKNOWN.
- It queries the Stripe API for each PaymentIntent to get its current status (for example succeeded, canceled, or still processing) and maps it to a final internal state where possible.
- It updates donation_intents to SUCCEEDED / FAILED / EXPIRED and publishes reconciled events to RabbitMQ.
- Downstream consumers (stats, email) process events idempotently; DB and Redis converge.
Guarantee: read-only convergence, no duplicate charges
- The worker only reads from Stripe, and updates the internal row only if it is still PENDING/UNKNOWN.
- Consumers use event_id or intent_id for dedupe; reconciliation never creates a charge.
API / Webhook involved: Stripe API: retrieve PaymentIntent (server-side)
DB writes: donation_intents (status); charity_stats; Redis cache (per-charity totals).
Failure & retry:
- The worker retries on its next run; Stripe API calls are retried with backoff.
- Reconciliation is read-only toward Stripe, so retries cannot cause duplicate charges.
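One way the worker might translate a retrieved PaymentIntent status into an internal state is sketched below. The status strings are Stripe's documented PaymentIntent statuses; the mapping itself is an assumption made for illustration (in particular, treating everything other than succeeded and canceled as still pending), and `StripeStatusMapper` is a hypothetical name.

```java
// Hypothetical mapping from Stripe PaymentIntent status strings to internal states.
final class StripeStatusMapper {

    enum IntentStatus { PAYMENT_PENDING, SUCCEEDED, FAILED }

    static IntentStatus toInternal(String stripeStatus) {
        switch (stripeStatus) {
            case "succeeded":
                return IntentStatus.SUCCEEDED;
            case "canceled":
                return IntentStatus.FAILED;
            default:
                // Assumption: processing, requires_action, requires_payment_method,
                // requires_confirmation and requires_capture are all non-terminal
                // here and are left to the backoff and TTL policy.
                return IntentStatus.PAYMENT_PENDING;
        }
    }
}
```

Keeping the default branch non-terminal means an unrecognized or new status never finalizes an intent by accident; expiry is decided separately by the TTL rule.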
Guarantee mapping (which phases enforce each guarantee)
No duplicate charges: Intent Creation (request_id + Redis lock), Webhook Processing (event_id + Redis lock per payment_intent).
No lost payments: Webhook Processing (Stripe retries); Reconciliation (worker resolves PENDING/UNKNOWN and long-running processing).
Eventual consistency: Webhook Processing (updates DB + publish); Reconciliation (worker converges state); consumers (idempotent stats + Redis).

5. Payment State Machine

Visual flow
PENDING → PROCESSING → SUCCEEDED / FAILED
State definitions
PENDING: Intent created; payment not yet attempted. Created when the client requests an intent with charityId, amount, and optional request_id. A Redis SETNX lock provides idempotency.
PROCESSING: Payment in progress; waiting for Stripe and the webhook. The intent stays here until Stripe confirms success, failure, or timeout. The webhook or reconciliation will transition it to a final state.
SUCCEEDED: Payment succeeded. Final state. Stats and side effects (e.g. notifications) are driven by events. No further transitions allowed.
FAILED: Payment failed or expired. Final state. failure_reason is recorded. No further charge attempts.
Note on naming: the status column in §8.1 uses a finer-grained vocabulary (CREATED, PAYMENT_PENDING, SUCCEEDED, FAILED, UNKNOWN, EXPIRED). PENDING and PROCESSING here appear to correspond to CREATED and PAYMENT_PENDING; UNKNOWN and EXPIRED are the reconciliation states described in §6.
Transition table
Initial state	Event	Target state	Action
PENDING	Client confirms payment	PROCESSING	Call Stripe; create/update PaymentIntent
PROCESSING	Webhook: payment_intent.succeeded	SUCCEEDED	Update intent; publish donation.succeeded
PROCESSING	Webhook: payment_intent.payment_failed	FAILED	Update intent; set failure_reason
PROCESSING	Reconciliation (timeout)	FAILED	Resolve via Stripe API
Logic highlights
Transition guard: A transition to SUCCEEDED or FAILED is accepted only when the current state is PROCESSING (or UNKNOWN, for reconciliation). Use WHERE status IN ('PROCESSING', 'UNKNOWN') in an atomic update so only one transition wins.
Idempotency: The same request_id returns the same intent_id and client_secret. The Redis SETNX lock prevents duplicate intent creation. Once an intent is SUCCEEDED or FAILED, no further charge is possible.
Concurrency handling: The webhook and reconciliation may both try to update the same intent. Use atomic updates, and publish events only after the DB commit, for eventual consistency.
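The "only one transition wins" guard can be sketched with a compare-and-set loop. This is an in-memory illustration that assumes a single intent; `GuardedTransitionSketch` is a hypothetical name, and the `AtomicReference` stands in for the row that a conditional `UPDATE ... WHERE status IN ('PROCESSING', 'UNKNOWN')` would protect in the database.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical single-intent model of the guarded status update.
final class GuardedTransitionSketch {

    enum Status { PENDING, PROCESSING, UNKNOWN, SUCCEEDED, FAILED }

    private final AtomicReference<Status> status = new AtomicReference<>(Status.PROCESSING);

    // Returns true only for the caller whose update actually changed the state.
    boolean finalizeAs(Status target) {
        if (target != Status.SUCCEEDED && target != Status.FAILED) {
            throw new IllegalArgumentException("target must be a final state");
        }
        while (true) {
            Status current = status.get();
            // Guard: equivalent to WHERE status IN ('PROCESSING', 'UNKNOWN').
            if (current != Status.PROCESSING && current != Status.UNKNOWN) {
                return false; // someone else already finalized this intent
            }
            // compareAndSet succeeds for exactly one concurrent caller.
            if (status.compareAndSet(current, target)) {
                return true;
            }
        }
    }

    Status current() {
        return status.get();
    }
}
```

A caller that gets `false` back should skip publishing its event, since the winning transition is the one responsible for the downstream message.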


6. Reconciliation Flow

Reconciliation is responsible for cleaning up long-lived PENDING / UNKNOWN intents when:

Webhooks are not received
Stripe API calls time out
PAYMENT_PENDING has lasted longer than a safe threshold


Pending/Unknown intents → Reconciliation Worker → Query Stripe API → Stripe response → Update donation_intents → Publish reconciled event → Final state (SUCCEEDED / FAILED / EXPIRED)

When Stripe returns processing, the worker only updates next_reconcile_at and retry_count (no event published); see §6.2.

6.1 Reconciliation Sequence (Stripe + RabbitMQ)

6.2 When Stripe Returns `processing` (Long-Running Payment)

Principle
When Stripe returns `processing`, do **not** mark the intent as failed. Use **exponential backoff** to reschedule checks and a **final TTL** (e.g. 24h); only then mark as **EXPIRED** if still unresolved.

If the Reconciliation Worker queries Stripe and the PaymentIntent is still processing (in API terms, neither succeeded nor failed yet), the payment is in a long-running state. Examples: async payment methods (wire transfer, Sofort, SEPA), bank fraud checks, or 3DS opened but not completed (Stripe reports this last case as requires_action rather than processing, but it is equally non-terminal). The charge may complete minutes or hours later.

Standard handling: exponential backoff + final TTL

Step 1 — Preserve non-terminal state

Keep local status as PAYMENT_PENDING (or UNKNOWN). Only move to SUCCEEDED, FAILED, or EXPIRED when appropriate.

Step 2 — Exponential backoff reschedule

Set a next check time with increasing intervals (e.g. 5 min → 15 min → 30 min → 1 h → 4 h). Store this in next_reconcile_at on donation_intents. The worker only picks intents where next_reconcile_at <= NOW().

Step 3 — Retry counter

Increment retry_count each time you re-query and still get processing. Use it to compute the next interval and to cap retries or alert if abnormally high.

Step 4 — Final TTL expiration

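Define a maximum wait (e.g. 24 hours), and choose it to exceed the slowest payment method you accept: async methods such as SEPA can take days, so a 24-hour TTL only suits card-style payments. If Stripe still returns processing after the TTL, mark the intent as EXPIRED. In practice Stripe usually resolves card payments within 24h; for large amounts you may want to notify the user or support before expiring.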

Example logic (Java):

Java
import java.time.Duration;
import java.time.Instant;

// Method excerpt from the reconciliation worker; handles the case where
// Stripe still reports "processing" for an intent.
// See Stripe docs: https://stripe.com/docs/payments/payment-intents/lifecycle
public void reconcileProcessingIntent(DonationIntent intent, PaymentIntent stripeIntent) {
    if (!"processing".equals(stripeIntent.getStatus())) {
        return; // other statuses are not handled by this method
    }

    // Check the final TTL first (e.g. 24 hours since the intent was created).
    boolean isExpired = intent.getCreatedAt()
        .plus(Duration.ofHours(MAX_TTL_HOURS))
        .isBefore(Instant.now());

    if (isExpired) {
        // TODO: Mark as EXPIRED or verify one last time
        donationIntentRepository.updateStatus(intent.getId(), IntentStatus.EXPIRED);
        return;
    }

    // Still within the TTL: reschedule with exponential backoff (e.g. 5m, 15m, 1h, 4h...).
    Instant nextCheckTime = calculateNextRetry(intent.getRetryCount());
    donationIntentRepository.update(intent.getId(), UpdateIntent.builder()
        .nextReconcileAt(nextCheckTime)
        .retryCount(intent.getRetryCount() + 1)
        .lastStripeStatus("processing")
        .build());
}
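The `calculateNextRetry` helper is left as a TODO in the example above. A minimal sketch of it, assuming the fixed schedule from Step 2 (5 min, 15 min, 30 min, 1 h, 4 h) and capping at the last interval, could look like this. `ReconcileBackoff` and `delayFor` are hypothetical names, and unlike the excerpt this version takes the current time as a parameter so it is easy to test.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical backoff helper using the schedule quoted in Step 2.
final class ReconcileBackoff {

    private static final Duration[] SCHEDULE = {
        Duration.ofMinutes(5),
        Duration.ofMinutes(15),
        Duration.ofMinutes(30),
        Duration.ofHours(1),
        Duration.ofHours(4)
    };

    // Delay before the next check; retry counts past the table reuse the last interval.
    static Duration delayFor(int retryCount) {
        int index = Math.min(Math.max(retryCount, 0), SCHEDULE.length - 1);
        return SCHEDULE[index];
    }

    // Value to store in next_reconcile_at.
    static Instant calculateNextRetry(Instant now, int retryCount) {
        return now.plus(delayFor(retryCount));
    }
}
```

A table-driven schedule is easier to tune than a pure doubling formula, and the cap keeps the worker checking at least every four hours until the TTL is reached.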

Why processing happens

Async payment methods (wire, Sofort, SEPA, etc.) can take 1–3 business days.
Bank delays (extra fraud or compliance checks).
3DS pending (user opened verification but did not complete or close).

Index for the worker

Use a composite index on (status, next_reconcile_at) so each run can efficiently select intents that are due for a check:

SQL
SELECT intent_id FROM donation_intents
WHERE status IN ('PAYMENT_PENDING', 'UNKNOWN') AND next_reconcile_at <= NOW();

This avoids full table scans and keeps reconciliation bounded to intents that are due for a check. The schema in §8.1 includes retry_count, next_reconcile_at, and last_stripe_status for this flow.

7. Event-Driven Stats Update

After a donation succeeds, downstream updates (stats, notifications, audit) should be event-driven.

Consumer responsibilities

Deduplicate events (e.g. by intentId + SUCCEEDED).
Retry on transient failures with backoff.
Maintain eventual consistency between DB, cache, and side effects.

7.1 Summary Table (`charity_stats`): Real-Time vs Consistency

Principle
Use a dedicated summary table for per-charity totals; populate it with a strategy that balances real-time needs and consistency (e.g. async consumer + atomic updates + periodic calibration).

Why a separate table?

Although you could compute totals with SELECT SUM(amount_cents) FROM donation_intents WHERE charity_id = 1 AND status = 'SUCCEEDED', running this aggregate on every dashboard or API read gets slower as the table grows toward the millions of rows expected here (~10M) and can overload the database. A pre-aggregated charity_stats table (see §8.3) gives fast reads for dashboards and APIs.

Schema (reference): charity_id (PK), total_amount_cents, donation_count, updated_at.

How the table is populated — three common approaches:

Approach	Description	Pros	Cons
A. In-transaction sync update	In the same DB transaction that writes the donation/transaction, run `UPDATE charity_stats SET total_amount_cents = total_amount_cents + :amount WHERE charity_id = :id`.	Strong consistency; DB totals are always correct.	Row lock contention. A hot charity with many donations per second serializes on that single row; throughput drops.
B. Async message-queue driven (recommended)	Webhook only persists the transaction and publishes an event; a Stats Worker (RabbitMQ consumer) runs `UPDATE charity_stats ...`.	Peak smoothing. Bursts of donations are processed in the background; the webhook path stays fast.	Slight delay (e.g. hundreds of ms), acceptable for dashboards.
C. Redis increment + periodic flush	All real-time increments go to Redis (e.g. `HINCRBY`); a worker every N minutes flushes Redis deltas into `charity_stats`.	Very low DB write load; good for extreme peaks (e.g. thousands of donations/sec).	More moving parts; still need a flush worker and calibration.

Best practice: incremental atomic update

Always update the summary table with atomic in-place addition, not read-then-write.

Wrong (race-prone):

SQL
-- Read then write: another request can overwrite your update
SELECT total_amount_cents FROM charity_stats WHERE charity_id = 1;
-- application: new_total = old + 50
UPDATE charity_stats SET total_amount_cents = 150 WHERE charity_id = 1;

Correct (atomic):

SQL
UPDATE charity_stats
SET total_amount_cents = total_amount_cents + :amount_cents,
    donation_count = donation_count + 1,
    updated_at = NOW()
WHERE charity_id = :charity_id;

Eventually consistent safety net (reconciliation as calibrator)

No matter which approach you use, the summary table can drift from the transaction table (e.g. due to bugs or partial failures). A Reconciliation Worker should act as a calibrator:

Periodically (e.g. daily) run:
SELECT charity_id, SUM(amount_cents) AS total, COUNT(*) AS donation_count FROM donation_intents WHERE status = 'SUCCEEDED' GROUP BY charity_id;
Compare the result with charity_stats and UPDATE charity_stats with the computed totals and counts to correct drift.

This keeps the summary table eventually consistent with the source of truth (the transaction/intent table).
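The comparison step of the calibrator can be sketched as a pure function over two maps. This is an illustration under stated assumptions: `StatsCalibrationSketch` and `findCorrections` are hypothetical names, the first map represents the per-charity totals recomputed from donation_intents, and the second the totals currently stored in charity_stats.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical drift check between recomputed totals and the summary table.
final class StatsCalibrationSketch {

    // Returns charity_id -> corrected total for every charity whose stored total is wrong.
    static Map<String, Long> findCorrections(Map<String, Long> recomputed, Map<String, Long> stored) {
        Map<String, Long> corrections = new HashMap<>();
        // Charities with succeeded donations: the stored total must match the recomputed one.
        for (Map.Entry<String, Long> e : recomputed.entrySet()) {
            if (!e.getValue().equals(stored.get(e.getKey()))) {
                corrections.put(e.getKey(), e.getValue());
            }
        }
        // Charities with no succeeded donations should have a stored total of zero.
        for (Map.Entry<String, Long> e : stored.entrySet()) {
            if (!recomputed.containsKey(e.getKey()) && e.getValue() != 0L) {
                corrections.put(e.getKey(), 0L);
            }
        }
        return corrections;
    }
}
```

Returning only the rows that differ keeps the corrective UPDATE small and gives a natural metric to alert on: a non-empty result means drift was detected.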

Recommendation (summary)

Structure: Keep a charity_stats table (§8.3).
Population: Have the RabbitMQ consumer (Stats Worker) that already handles donation events asynchronously update charity_stats after updating Redis (approach B).
Safety: Use atomic SET total_amount_cents = total_amount_cents + :inc (and same for donation_count).
Calibration: Run a daily (or periodic) full SUM from the transaction/intent table and correct charity_stats to handle any residual drift.

7.2 Where to Update Redis: In the Consumer, Not the Webhook

Principle
Redis totals are updated in the message consumer, not in the Webhook. This gives a single writer for stats and eventual consistency without overloading the Webhook path.

Why not update Redis in the Webhook?

Distributed transaction risk: If the Webhook updates the DB and then Redis, and Redis fails (e.g. network blip) after you have already returned 200 OK to Stripe, the Redis total is short by that donation until a reconciliation or calibration run. You cannot roll back the 200.
Latency and throughput: Stripe holds the HTTP connection until the Webhook responds. Any extra work (Redis, push, etc.) in the Webhook slows the response and limits how many Webhooks you can handle per second.
Heavy logic: Even with locks, doing stats and push in the Webhook makes the hot path complex and increases the chance of timeouts and retries.

Standard flow: Webhook as producer, consumer as single writer

Step	Webhook (producer)	Stats consumer
1	Verify signature; persist intent status SUCCEEDED in DB.	—
2	Publish one message to RabbitMQ (e.g. `charity_id`, `amount_cents`, `intent_id`).	—
3	Return 200 OK to Stripe immediately.	—
4	—	Update Redis (e.g. `HINCRBY charity:total:{charity_id} amount_cents {amount_cents}`).
5	—	Optionally trigger push (WebSocket / batch notification).
6	—	ACK the message only after Redis (and DB if applicable) update succeeds.

If the consumer does not ACK (e.g. Redis is down), the broker will redeliver the message. No 200 has been sent for that work, so there is no “already committed” response to Stripe.

Failure handling in the consumer

Redis down: Do not ACK. The queue will retry (e.g. after 5 seconds). After N failed attempts, send the message to a dead-letter queue (DLQ) and alert; a compensation job or operator can replay or fix once Redis is healthy.
Duplicate delivery: If the consumer updates Redis but crashes before ACK, the same message can be processed again. Use idempotency: e.g. in Redis, SET processed:intent:{intent_id} 1 NX EX 86400 (SETNX itself takes no TTL, so use SET with NX and EX for a 24-hour expiry). If the key already exists, treat the message as already processed: skip the increment and ACK to avoid double-counting.
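The duplicate-delivery rule can be sketched with in-memory stand-ins for Redis. `StatsConsumerSketch` and its method names are hypothetical; the set plays the role of the processed:intent:{intent_id} keys and the map plays the role of the per-charity totals.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of an idempotent stats consumer.
final class StatsConsumerSketch {

    // Stand-in for the processed:intent:{intent_id} marker keys.
    private final Set<String> processedIntents = ConcurrentHashMap.newKeySet();
    // Stand-in for the per-charity totals kept in Redis.
    private final Map<String, Long> totals = new ConcurrentHashMap<>();

    // Applies a donation.succeeded message at most once per intent.
    void onDonationSucceeded(String intentId, String charityId, long amountCents) {
        // add() returns false if the marker already exists, like SET ... NX failing.
        if (!processedIntents.add(intentId)) {
            return; // duplicate delivery: skip the increment, the caller still ACKs
        }
        totals.merge(charityId, amountCents, Long::sum);
    }

    long totalFor(String charityId) {
        return totals.getOrDefault(charityId, 0L);
    }
}
```

Note that the marker and the increment are two separate steps here, as they would be with two Redis commands; a crash between them can drop an increment, which is one reason the periodic calibration in §7.1 remains necessary.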

Optional: batch aggregation in the consumer

To reduce Redis and push load during bursts:

Batch consume: Pull multiple messages at once (e.g. 50).
Aggregate in memory: Sum amount_cents per charity_id for that batch.
One Redis update per charity: e.g. HINCRBY charity:total:{charity_id} amount_cents {sum} once per charity in the batch.
One push per charity: Send a single WebSocket or notification per charity for the batch.
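The in-memory aggregation step might look like the following sketch. `BatchAggregationSketch` and `DonationEvent` are hypothetical names for illustration; the result map holds one sum per charity, which is what a single increment per charity would then apply.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical batch aggregation: one sum per charity for a batch of messages.
final class BatchAggregationSketch {

    // Minimal shape of a donation.succeeded message.
    static final class DonationEvent {
        final String charityId;
        final long amountCents;

        DonationEvent(String charityId, long amountCents) {
            this.charityId = charityId;
            this.amountCents = amountCents;
        }
    }

    // Sums amount_cents per charity_id so each charity needs only one update.
    static Map<String, Long> sumPerCharity(List<DonationEvent> batch) {
        Map<String, Long> sums = new HashMap<>();
        for (DonationEvent event : batch) {
            sums.merge(event.charityId, event.amountCents, Long::sum);
        }
        return sums;
    }
}
```

With batching, the messages should be acknowledged only after the aggregated updates for the whole batch have succeeded, otherwise a failure mid-batch can lose increments.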

Summary

Keep the Webhook thin: verify, persist, publish, return 200. All stats (Redis and DB summary table) and optional push run in the Stats Consumer, with retries, DLQ, and idempotency so the system stays consistent and scalable.

8. Data Model

The data model keeps intent state, webhook deduplication, and read-optimized aggregates separated:

8.1 `donation_intents`

Table: donation_intents
One row per user donation intent, used as the durable business anchor for Stripe PaymentIntents.
Column	Type	Description
intent_id	UUID, PK	Internal business identifier for the donation intent (also used in URLs and logs).
stripe_payment_intent_id	VARCHAR(255), FK	Foreign reference to the Stripe PaymentIntent (`pi_...`).
email	VARCHAR(255)	Donor email address, used for receipts and communication.
charity_id	VARCHAR(255), FK	Identifier of the target charity receiving this donation.
amount_cents	INTEGER	Donation amount in the smallest currency unit (e.g. cents).
status	VARCHAR(32)	High-level state (CREATED, PAYMENT_PENDING, SUCCEEDED, FAILED, UNKNOWN, EXPIRED).
failure_reason	VARCHAR(255), NULL	Optional machine-readable failure reason when the payment does not succeed.
retry_count	INTEGER	Number of reconciliation attempts for this intent (used when Stripe returns processing).
next_reconcile_at	TIMESTAMP, NULL	When the reconciliation worker should re-check this intent (exponential backoff).
last_stripe_status	VARCHAR(255), NULL	Last status returned from Stripe (e.g. processing) for debugging and backoff logic.
created_at	TIMESTAMP	When the intent row was first created.
updated_at	TIMESTAMP	Last time the intent row was updated.
Indexes: (status, next_reconcile_at), (status, updated_at), (charity_id, status)

8.2 `webhook_events`

Table: webhook_events
Stores Stripe webhook deliveries for idempotent processing and auditability.
Column	Type	Description
stripe_event_id	VARCHAR(255), PK	Unique Stripe event id, used to deduplicate webhook deliveries.
intent_id	UUID, FK	Foreign key back to `donation_intents.intent_id`.
received_at	TIMESTAMP	When this webhook was first received by the system.
Indexes: (stripe_event_id), (intent_id)

8.3 `charity_stats`

Table: charity_stats
Aggregated donation statistics per charity for fast reads.
Column	Type	Description
charity_id	VARCHAR(255), PK	Identifier of the charity (matches the primary key in the charities table).
total_amount_cents	BIGINT	Total donated amount for this charity in cents. BIGINT rather than INTEGER, because a 32-bit INTEGER overflows at 2,147,483,647 cents (about 21 million in major currency units).
donation_count	INTEGER	Number of successful donations recorded for this charity.
updated_at	TIMESTAMP	Last time the aggregated stats row was updated.
Indexes: (charity_id)

9. Idempotency Strategy

9.1 Confirm Idempotency

Gateway-level: Stripe PaymentIntent prevents duplicate charges at the provider boundary.
Business-level: internal donation_intents transitions ensure each intent is finalized at most once.

9.2 Webhook Idempotency

Storage guard: webhook_events enforces unique stripe_event_id, so the same webhook cannot be applied twice.

9.3 Consumer Idempotency

Message guard: consumers deduplicate by business key (e.g. intentId + SUCCEEDED).
Replay safety: event replay is supported without double-counting stats.

10. Alternatives Considered

Direct Charge Without Internal Intent (Rejected)
Why:
- Hard to reconcile against Stripe without an internal durable anchor
- No single place to reason about business-level state
Synchronous Stats Update (Rejected)
Why:
- Slows down the confirmation endpoint
- Couples the hot payment path to aggregation and reporting
No Reconciliation Worker (Rejected)
Why:
- Webhooks are not guaranteed to be delivered
- Timeouts and network partitions are inevitable; you need an explicit repair loop

11. Risks & Mitigations

Duplicate Charges

Risk: Users might be charged twice in edge cases.
Mitigation: rely on Stripe PaymentIntent idempotency + enforce a single successful transition per intent.

Lost Webhooks

Risk: Missing webhooks leave intents stuck in PAYMENT_PENDING / UNKNOWN.
Mitigation: Reconciliation Worker periodically re-queries Stripe and finalizes intent state.

Stats Inconsistency

Risk: charity_stats and Redis totals can diverge.
Mitigation: consumer-side deduplication + replay-safe repair path.

12. Monitoring & Metrics

Payment Metrics

Success rate
Failure rate
UNKNOWN rate
Stripe latency p95 / p99

System Metrics

RabbitMQ queue depth / consumer lag
Reconcile backlog
DB lock wait time

Alerts

UNKNOWN ratio > 2%
Spike in Webhook processing errors
Reconciliation backlog growing over time

13. SLOs

Use explicit SLOs so failure handling and scaling decisions stay measurable:

Payment success availability: >= 99.9% successful intent processing (excluding external issuer declines).
Webhook acknowledgment latency: p95 < 2s, p99 < 5s.
UNKNOWN ratio: < 2% over rolling 15 minutes.
Reconciliation completion window: intents in UNKNOWN/PAYMENT_PENDING are revisited within configured backoff + TTL policy.
Stats freshness: dashboard totals converge within an agreed lag window (e.g. <= 60s under normal load).

14. Scaling Strategy

API tier: horizontally scale Donation API instances.
Payment execution tier: delegate charge-path scaling to Stripe.
Queue tier: shard/route RabbitMQ by intentId (routing key or hash).
Consumer tier: auto-scale workers from queue depth + processing lag.
Read tier: front high-QPS stats endpoints with Redis.
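Routing by intentId can be sketched as a stable hash onto a fixed number of queues. This is an illustration only: `IntentRouter`, `shardFor`, and the idea of numbered shards are assumptions, and a real deployment could equally use a broker-side hashing exchange instead of computing the shard in application code.

```java
// Hypothetical router: maps an intentId to one of shardCount queues.
final class IntentRouter {

    static int shardFor(String intentId, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        // floorMod keeps the result in [0, shardCount) even when hashCode() is negative.
        return Math.floorMod(intentId.hashCode(), shardCount);
    }
}
```

Because the same intentId always lands on the same shard, all events for one intent are consumed in order by a single queue, while different intents spread across consumers.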

15. Summary

This design:

Uses Stripe PaymentIntent in the intended, safe way to avoid duplicate charges.
Separates business intents from payment execution; internal state machine is the anchor.
Treats Stripe as the authoritative source of payment state, with webhooks as the primary signal and reconciliation as the fallback.
Uses a Reconciliation Worker to handle UNKNOWN states and lost webhooks.
Decouples stats and side effects (emails, notifications) via events.
Maintains strong end-to-end idempotency guarantees.

16. Interview-Oriented Discussion Questions

If you are using this document to prepare for system design interviews, here are some follow-up questions to challenge your understanding:

Hot path vs. cold path
- Where is the true hot path in this system?
- If you had to cut 50% of the complexity, what could you remove from the cold path without violating the SLOs?
Failure modes and trade-offs
- What happens if Stripe Webhooks are delayed by 30 minutes during peak traffic?
- How would you surface “eventual success” vs “final failure” back to the user and to internal operations?
Backpressure and rate limiting
- How would you protect the Donation API from sudden traffic spikes (e.g., a celebrity tweet) without losing valid donations?
- Where would you place rate limiting and circuit breakers (client, API gateway, Donation Service, Stripe)?
Multi-tenant and per-charity isolation
- How would you isolate noisy or misbehaving charities so that they do not affect others?
- Would you shard donation_intents / charity_stats by charity, region, or something else?
Schema and evolution
- If you later add recurring donations or refunds, how would you extend the current data model and state machine?
- Which parts of the design are most fragile under such changes?
Cost and observability
- Which components are likely to dominate your cloud bill (Stripe fees, DB, RabbitMQ, Redis, compute)?
- What metrics and dashboards would you build first to catch regressions in payment success rate?

Try to answer these questions using the flows, state machine, and data model in this article. In a real interview, you can treat this system as a “pattern” and adapt it to any payment-heavy or intent-based workflow.
