Scalability patterns
The Asset Tokenization Kit sustains roughly 1,000 asset operations per second with sub-2-second latency through layered optimization and horizontal scaling. This page explains the scalability architecture from blockchain gas optimization through API caching to frontend code splitting, with metrics showing how the platform maintains performance under production load.
Performance requirements drive architecture decisions
ATK targets three scalability benchmarks: 1,000 processed asset operations per second of sustained throughput (achieved by batching many operations into each on-chain transaction, since mainnet itself settles only about 15 transactions per second), sub-2-second P95 latency from user action to confirmation, and operational cost below $0.50 per transaction including gas fees. These requirements shape every layer of the stack—from smart contract batch operations through API caching strategies to frontend lazy loading.
The unified DALP architecture (Data Access and Logic Platform) enables scalability by consolidating asset logic in a single smart contract system rather than fragmenting across multiple protocols. When you process 10,000 subscription requests, the platform executes a single batch transaction instead of 10,000 individual calls. When investors query portfolio balances, the API serves from cache instead of hitting the blockchain 1,000 times per second. When compliance officers load dashboards, the frontend loads only the modules they need instead of shipping the entire application.
Scalability emerges from three principles: stateless horizontal scaling at the API layer, batch optimization at the blockchain layer, and progressive loading at the frontend layer. The observability stack validates these strategies—the transaction dashboard shows whether you're meeting throughput targets, the latency panel surfaces bottlenecks, and the cost tracking dashboard reveals whether gas optimization is working.
Blockchain layer optimization reduces gas costs and increases throughput
Smart contracts represent the most expensive and slowest layer of the stack. A
single transfer operation costs 50,000 gas (approximately $0.10 at 2 gwei),
and Ethereum mainnet processes 15 transactions per second. ATK optimizes
blockchain interaction through batching, multicall aggregation, and strategic
use of events.
Batch operations consolidate multiple actions into single transactions. When
an issuer distributes coupon payments to 1,000 bondholders, the Yield addon
processes the entire batch in one transaction using batchTransfer. The gas
cost increases from 50,000 for a single transfer to roughly 1,250,000 for the batch, but the per-holder cost drops from
$0.10 to $0.0025—a 40x reduction. The transaction dashboard shows batch
operation success rates and average gas consumption per item.
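A minimal sketch of how such a batch distribution might be submitted with viem, assuming a `batchTransfer(address[], uint256[])` entry point on the token; the actual Yield addon interface, chain configuration, and key management in ATK may differ.

```typescript
import { createWalletClient, http, parseAbi } from "viem";
import { privateKeyToAccount } from "viem/accounts";
import { mainnet } from "viem/chains";

// Hypothetical ABI fragment -- substitute the real Yield addon interface.
const batchAbi = parseAbi([
  "function batchTransfer(address[] recipients, uint256[] amounts)",
]);

const account = privateKeyToAccount(process.env.ISSUER_KEY as `0x${string}`);
const wallet = createWalletClient({ account, chain: mainnet, transport: http() });

// One transaction settles the coupon payment for every holder in the batch,
// amortizing the fixed per-transaction overhead across all entries.
export async function payCoupons(
  token: `0x${string}`,
  holders: `0x${string}`[],
  amounts: bigint[],
) {
  return wallet.writeContract({
    address: token,
    abi: batchAbi,
    functionName: "batchTransfer",
    args: [holders, amounts],
  });
}
```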
Multicall aggregation reduces RPC round trips. When the dApp loads an asset details page, it needs token metadata, compliance status, current supply, and user balance. Without multicall, this requires four separate RPC calls with 200ms latency each (800ms total). With multicall, all four queries bundle into a single call (200ms total). The API latency panel shows multicall effectiveness through reduced P95 response times.
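One way to express the aggregated read is viem's `multicall` action, which bundles the individual `eth_call`s into a single RPC request. The compliance-status read is omitted in this sketch because its contract interface is specific to ATK.

```typescript
import { createPublicClient, erc20Abi, http } from "viem";
import { mainnet } from "viem/chains";

const client = createPublicClient({ chain: mainnet, transport: http() });

// Three of the four reads the asset details page needs, bundled into one
// round trip; a compliance-status call would be added with its own ABI.
export async function loadAssetDetails(token: `0x${string}`, user: `0x${string}`) {
  const [name, totalSupply, balance] = await client.multicall({
    allowFailure: false,
    contracts: [
      { address: token, abi: erc20Abi, functionName: "name" },
      { address: token, abi: erc20Abi, functionName: "totalSupply" },
      { address: token, abi: erc20Abi, functionName: "balanceOf", args: [user] },
    ],
  });
  return { name, totalSupply, balance };
}
```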
Event-driven architecture offloads computation from mainnet. Instead of storing historical data on-chain (expensive storage slots), ATK emits events and indexes them through TheGraph subgraph. When compliance officers query "all transfers in the last 30 days," the subgraph serves the data in 50ms instead of scanning roughly 216,000 mainnet blocks (30 days at 12-second block times), which would take minutes and cost significant RPC provider fees. The subgraph monitoring dashboard tracks indexing lag—currently less than 5 seconds behind the chain head.
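A sketch of the 30-day transfer query against the subgraph; the endpoint URL and entity/field names are illustrative, since the deployed ATK subgraph schema defines the actual names.

```typescript
// Illustrative endpoint and schema -- adjust to the deployed ATK subgraph.
const SUBGRAPH_URL = "https://api.thegraph.com/subgraphs/name/example/atk";

const TRANSFERS_QUERY = `
  query RecentTransfers($since: BigInt!) {
    transfers(where: { timestamp_gte: $since }, orderBy: timestamp, orderDirection: desc) {
      id
      from
      to
      value
      timestamp
    }
  }
`;

export async function transfersLast30Days() {
  const since = Math.floor(Date.now() / 1000) - 30 * 24 * 60 * 60;
  const response = await fetch(SUBGRAPH_URL, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ query: TRANSFERS_QUERY, variables: { since: since.toString() } }),
  });
  const { data } = await response.json();
  return data.transfers; // served from the index in tens of milliseconds
}
```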
RPC endpoint redundancy provides failover and load distribution. The platform configures three RPC providers (Alchemy, Infura, QuickNode) with automatic failover. When one provider rate-limits or experiences downtime, viem retries against the next endpoint. The blockchain connectivity panel shows which provider currently serves traffic and displays error rates per endpoint.
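One way to express the failover behaviour is viem's `fallback` transport, which tries each endpoint in order and can re-rank endpoints by responsiveness; the provider URLs below are placeholders.

```typescript
import { createPublicClient, fallback, http } from "viem";
import { mainnet } from "viem/chains";

// Placeholder provider URLs -- substitute real Alchemy/Infura/QuickNode keys.
export const publicClient = createPublicClient({
  chain: mainnet,
  transport: fallback(
    [
      http("https://eth-mainnet.g.alchemy.com/v2/<api-key>"),
      http("https://mainnet.infura.io/v3/<api-key>"),
      http("https://example.quiknode.pro/<api-key>"),
    ],
    { rank: true }, // periodically re-rank endpoints by latency and stability
  ),
});
```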
The cost tracking dashboard reveals the impact of these optimizations. Before batch operations, processing 1,000 subscriptions cost $100 in gas fees. After optimization, the same workload costs $25—a 75% reduction while maintaining sub-2-second confirmation times.
API layer caching and rate limiting handle request spikes
The ORPC API layer scales horizontally by deploying stateless instances behind a load balancer. Each instance runs in a container with 2 CPU cores and 2GB memory, handling approximately 1,000 requests per second at 70% CPU utilization. When traffic increases, Kubernetes automatically spawns additional pods within 30 seconds.
Redis caching reduces database load by 80%. Frequently accessed data—asset metadata, user identity claims, token balances—is cached in Redis with 5-minute TTLs. When 10,000 investors check their portfolios simultaneously, Redis serves 8,000 requests from cache while only 2,000 hit PostgreSQL. The cache hit rate panel shows current effectiveness (target: >85%) and the eviction rate (which indicates whether cache memory is sufficient).
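A minimal cache-aside sketch using the node-redis client and the 5-minute TTL described above; the key naming and the database loader are illustrative.

```typescript
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Cache-aside read: serve from Redis when possible, otherwise load from
// PostgreSQL (via the Drizzle query passed in) and cache the result.
export async function getPortfolio<T>(
  userId: string,
  loadFromDb: () => Promise<T>,
): Promise<T> {
  const key = `portfolio:${userId}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached) as T;

  const fresh = await loadFromDb();
  await redis.set(key, JSON.stringify(fresh), { EX: 300 }); // 5-minute TTL
  return fresh;
}
```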
Rate limiting prevents abuse and maintains quality of service. Each
authenticated user receives 100 requests per minute per endpoint.
Unauthenticated requests (public asset listings) are limited to 20 requests per minute
per IP address. When a client exceeds limits, the API returns
429 Too Many Requests with a Retry-After header. The rate limiting dashboard
shows top consumers and blocked request counts—compliance officers use this to
detect scraping attempts.
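A fixed-window sketch of the per-user, per-endpoint limit backed by Redis; the real ORPC middleware may use a different algorithm (for example, a sliding window or token bucket), so treat this as an illustration of the 100-requests-per-minute policy and the `Retry-After` response.

```typescript
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Returns 0 if the request is allowed, otherwise the number of seconds the
// client should wait (used to populate the Retry-After header on a 429).
export async function checkRateLimit(
  userId: string,
  endpoint: string,
  limit = 100, // authenticated default; 20/min per IP for anonymous traffic
): Promise<number> {
  const window = Math.floor(Date.now() / 60_000); // one-minute fixed window
  const key = `ratelimit:${userId}:${endpoint}:${window}`;

  const count = await redis.incr(key);
  if (count === 1) await redis.expire(key, 60); // expire with the window

  if (count <= limit) return 0;
  return Math.max(await redis.ttl(key), 1); // seconds until the window resets
}
```

When a non-zero value comes back, the handler responds with 429 Too Many Requests and sets Retry-After to that number of seconds.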
Connection pooling prevents database saturation. Each API instance maintains
a pool of 20 PostgreSQL connections managed by Drizzle ORM. With 10 API
instances running, the system maintains 200 application connections. PgBouncer
sits between the application and database, multiplexing those 200 connections
into 20 actual database connections. This prevents the max_connections=100
limit from becoming a bottleneck.
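A sketch of the per-instance pool with Drizzle over postgres-js, pointed at PgBouncer rather than PostgreSQL directly; the option values mirror the figures above, and `prepare: false` reflects the usual requirement when PgBouncer runs in transaction-pooling mode.

```typescript
import { drizzle } from "drizzle-orm/postgres-js";
import postgres from "postgres";

// DATABASE_URL should point at PgBouncer, which multiplexes the combined
// application pools (10 instances x 20 connections) onto ~20 real backends.
const sql = postgres(process.env.DATABASE_URL!, {
  max: 20,          // application-side pool size per API instance
  idle_timeout: 30, // release idle connections after 30 seconds
  prepare: false,   // prepared statements don't survive transaction pooling
});

export const db = drizzle(sql);
```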
Database read replicas scale read-heavy workloads. Analytics queries, compliance reports, and portfolio aggregations execute against read replicas instead of the primary database. The primary handles writes (subscriptions, transfers, compliance updates), while two replicas handle reads. Replication lag stays below 500ms—the database monitoring panel alerts if lag exceeds 1 second, indicating the primary is overloaded.
Load testing demonstrates API scaling effectiveness. Under sustained 10,000 requests per second load with 10 API instances, the platform maintains P95 latency of 180ms and zero error rate. When load spikes to 15,000 requests per second, auto-scaling adds 5 instances within 45 seconds. During scale-up, P95 latency increases temporarily to 450ms before returning to 180ms once new pods stabilize.
Frontend layer code splitting and lazy loading reduce initial load time
The TanStack Start dApp ships as a server-side rendered application with progressive enhancement. Initial HTML renders in 50ms, critical JavaScript loads in 200ms, and secondary features load on demand. The performance monitoring dashboard tracks Core Web Vitals across user sessions.
Route-based code splitting prevents loading unused code. The investor portal, issuer admin panel, and compliance dashboard load as separate bundles. When an investor logs in, their browser downloads 180KB of JavaScript (compressed). An issuer downloads 220KB. A compliance officer downloads 190KB. Before code splitting, every user downloaded 450KB regardless of role. The bundle size panel shows size per route and detects regressions when new features ship.
Component-level lazy loading defers non-critical features. The investor dashboard initially renders portfolio balances and recent transactions (critical path). The portfolio analytics chart (secondary feature) loads when the user scrolls to that section. This reduces Time to Interactive from 1.2 seconds to 0.6 seconds. The Core Web Vitals panel shows real user metrics—TBT (Total Blocking Time) dropped 50% after implementing lazy loading.
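A sketch of deferring the analytics chart with `React.lazy` and `Suspense`; the component module paths are placeholders, and in the real dashboard the chart subtree would additionally be mounted only once it scrolls into view (for example via an intersection observer).

```tsx
import { lazy, Suspense } from "react";
// Placeholder module paths -- substitute the actual dashboard components.
import { PortfolioBalances } from "./PortfolioBalances";
import { RecentTransactions } from "./RecentTransactions";

// The chart ships as a separate chunk that is fetched only when rendered.
const PortfolioAnalytics = lazy(() => import("./PortfolioAnalytics"));

export function InvestorDashboard() {
  return (
    <>
      {/* Critical path: rendered immediately from the main bundle. */}
      <PortfolioBalances />
      <RecentTransactions />

      {/* Secondary feature: downloads its chunk on demand. */}
      <Suspense fallback={<p>Loading analytics…</p>}>
        <PortfolioAnalytics />
      </Suspense>
    </>
  );
}
```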
Asset prefetching anticipates user navigation. When an investor views their portfolio, the dApp prefetches the top 3 asset detail pages in the background. If they click through to an asset, the page renders instantly from cache. TanStack Router handles prefetch logic—the navigation panel tracks prefetch hit rate (currently 73%, meaning users navigate to prefetched pages 73% of the time).
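One way to get this behaviour is TanStack Router's intent-based preloading, configured when the router is created: route loaders and code chunks are fetched as soon as the user hovers or touches a link. The delay value is illustrative.

```typescript
import { createRouter } from "@tanstack/react-router";
// Generated by TanStack Router's file-based routing.
import { routeTree } from "./routeTree.gen";

// Preload a route's loader and chunk when the user shows intent on a link,
// so the asset detail page is usually cached by the time the click lands.
export const router = createRouter({
  routeTree,
  defaultPreload: "intent",
  defaultPreloadDelay: 50, // ms of hover before the prefetch fires
});
```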
Image optimization reduces bandwidth and improves LCP. Asset logos and document previews serve from CDN in modern formats (WebP with JPEG fallback). The CDN resizes images based on viewport—mobile users download 50KB versions while desktop users get 150KB versions. The image performance panel shows cumulative layout shift (CLS) and largest contentful paint (LCP) per page.
Geographic distribution reduces cross-continent latency. The dApp deploys to four regions (us-east-1, eu-west-1, ap-southeast-1, sa-east-1). Users route to the nearest region via GeoDNS. An investor in Singapore experiences 40ms latency instead of 180ms (if serving from us-east-1 only). The geographic latency panel shows P95 latency per region—Asia-Pacific recently increased from 35ms to 55ms, prompting investigation into ap-southeast-1 network issues.
Frontend scaling follows a different pattern than API scaling because the dApp runs as containers rather than serverless functions. Each region runs 3 dApp instances behind a load balancer. Auto-scaling adds instances when CPU exceeds 70%—usually triggered when hundreds of users simultaneously load dashboards during market events (bond coupon payments, voting deadlines).
Load testing validates scalability under realistic conditions
ATK runs three load test scenarios weekly: steady-state baseline (1,000 concurrent users), spike load (10,000 concurrent users for 5 minutes), and sustained growth (ramping from 1,000 to 5,000 users over 30 minutes). Results feed into the performance operations dashboard.
Steady-state baseline establishes normal operating conditions. With 1,000 concurrent users performing mixed operations (70% reads, 30% writes), the platform maintains:
| Metric | Target | Actual | Status |
|---|---|---|---|
| API P95 latency | <300ms | 185ms | ✓ |
| Blockchain confirmation | <2s | 1.4s | ✓ |
| Cache hit rate | >85% | 91% | ✓ |
| Database connection pool | <80% | 62% | ✓ |
| Error rate | <0.1% | 0.02% | ✓ |
| Cost per transaction | <$0.50 | $0.31 | ✓ |
The steady-state test runs for 1 hour with randomized user behavior (subscription purchases, portfolio checks, document downloads, compliance queries). The test framework generates realistic data—bonds with various maturity dates, users with different identity claim combinations, transfers requiring multi-hop compliance checks.
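A k6 sketch of the steady-state scenario, with the 70/30 read/write mix and the latency and error-rate targets from the table encoded as thresholds; the endpoint paths and payloads are placeholders, not ATK's real ORPC routes.

```typescript
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  vus: 1000,      // 1,000 concurrent virtual users
  duration: "1h", // steady-state run length
  thresholds: {
    http_req_duration: ["p(95)<300"], // API P95 latency target
    http_req_failed: ["rate<0.001"],  // error-rate target
  },
};

const BASE = "https://api.example.com"; // placeholder base URL

export default function () {
  if (Math.random() < 0.7) {
    // Read path: portfolio check.
    const res = http.get(`${BASE}/portfolio`);
    check(res, { "portfolio ok": (r) => r.status === 200 });
  } else {
    // Write path: subscription purchase.
    const res = http.post(
      `${BASE}/subscriptions`,
      JSON.stringify({ assetId: "bond-2030", amount: 100 }),
      { headers: { "Content-Type": "application/json" } },
    );
    check(res, { "subscription ok": (r) => r.status === 200 });
  }
  sleep(1); // think time between iterations
}
```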
Spike load tests validate auto-scaling responsiveness. At T+0, the load generator increases from 1,000 to 10,000 concurrent users within 10 seconds. The API layer initially saturates—P95 latency spikes to 850ms as existing instances hit 95% CPU. At T+30s, Kubernetes spawns 15 additional API pods. By T+90s, latency returns to baseline (210ms) as new instances stabilize. The auto-scaling timeline panel visualizes this recovery.
During spike tests, the platform implements graceful degradation:
- Cache TTLs extend from 5 minutes to 15 minutes, serving slightly stale data to reduce database load
- Rate limiting tightens from 100 to 50 requests per minute, prioritizing authenticated users over anonymous traffic
- Background jobs (analytics aggregation, report generation) pause, freeing worker capacity for user-facing operations
- Non-critical features (portfolio analytics, historical charts) disable temporarily, reducing JavaScript bundle size and API calls
The spike test also reveals bottlenecks: the database connection pool saturated when 10,000 users simultaneously queried portfolios. The remediation plan increases pool size from 20 to 30 connections per instance and adds a third read replica.
Sustained growth tests validate capacity planning assumptions. The test ramps from 1,000 to 5,000 users over 30 minutes (steady 133 users/minute growth). Auto-scaling keeps pace—adding 2 API pods every 5 minutes. The database monitoring panel shows read replica lag staying under 400ms throughout the test. The cost tracking dashboard reveals linear cost scaling: 1,000 users = $15/hour infrastructure cost, 5,000 users = $68/hour (4.5x increase for 5x traffic).
The sustained growth test exposes subtle issues that spike tests miss. Redis
memory utilization grew linearly with user count, reaching 85% at 5,000 users.
Extrapolating to 10,000 users would trigger evictions (currently configured with
allkeys-lru policy). The remediation plan added a third Redis cluster node,
increasing capacity from 8GB to 12GB.
Cost optimization balances performance and expense
Infrastructure costs scale with usage, but optimization strategies reduce the cost-per-transaction. The cost tracking dashboard shows hourly expenditure broken down by layer: blockchain gas, RPC provider fees, database compute, cache memory, CDN bandwidth, and compute instances.
Blockchain gas represents 60% of operational cost. At 2 gwei base fee and $3,000 ETH price, a batch transfer of 1,000 items costs $7.50. Individual transfers would cost $300. The gas optimization panel compares actual vs theoretical costs:
| Operation Type | Individual Cost | Batch Cost | Savings | Volume/Day | Daily Savings |
|---|---|---|---|---|---|
| Coupon payments | $0.10 | $0.0025 | 97.5% | 5,000 | $487.50 |
| Subscription batches | $0.10 | $0.015 | 85% | 2,000 | $170.00 |
| Compliance updates | $0.12 | $0.008 | 93% | 1,000 | $112.00 |
| Dividend distribution | $0.10 | $0.003 | 97% | 3,000 | $291.00 |
Annual gas savings from batching come to roughly $387,000 at current volumes ($1,060.50 per day across the four operation types). The operations team monitors the gas price alert—when the base fee exceeds 20 gwei, non-urgent batch operations are delayed until fees drop.
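As a quick worked check of those figures, the per-item savings and daily volumes from the table roll up like this (values copied from the table above):

```typescript
// Per-item savings x daily volume for each batched operation type.
const operations = [
  { name: "Coupon payments",       individual: 0.10, batch: 0.0025, perDay: 5_000 },
  { name: "Subscription batches",  individual: 0.10, batch: 0.015,  perDay: 2_000 },
  { name: "Compliance updates",    individual: 0.12, batch: 0.008,  perDay: 1_000 },
  { name: "Dividend distribution", individual: 0.10, batch: 0.003,  perDay: 3_000 },
];

const dailySavings = operations.reduce(
  (sum, op) => sum + (op.individual - op.batch) * op.perDay,
  0,
); // ≈ $1,060.50 per day

console.log(`Daily: $${dailySavings.toFixed(2)}  Annual: $${(dailySavings * 365).toFixed(0)}`);
// ≈ $387,000 per year at current volumes
```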
RPC provider costs scale with query volume. Alchemy charges $0.0001 per request above the free tier (10M requests/month). With 50M requests/month (1.65M/day), the monthly bill is $4,000. Multicall aggregation reduced request volume by 35%—from 77M to 50M requests/month—saving $2,700/month. The RPC usage panel tracks requests per endpoint and forecasts when you'll exceed free tier limits.
Database costs scale vertically and horizontally. The PostgreSQL primary runs on a db.r6g.2xlarge instance ($0.504/hour = $362/month). Two read replicas cost $724/month. Adding a third replica would cost $362/month but enable handling 50% more read traffic. The database cost-benefit panel compares replica cost vs API autoscaling cost—currently more economical to add replicas than scale API instances (which would increase database connection pressure).
Cache memory costs scale with eviction rate. The 3-node Redis cluster uses 12GB total memory at $0.023/GB-hour ($198/month). When eviction rate exceeds 1,000 keys/minute, the system likely needs additional capacity. The cache cost panel shows a tradeoff: adding a fourth node ($66/month) vs reducing TTLs (which increases database load and potentially requires another read replica at $362/month). Current configuration favors longer TTLs and three nodes.
CDN bandwidth costs $0.02/GB. The dApp serves 500GB/month (approximately 17GB/day) of assets, fonts, and images. Cost: $10/month. Image optimization reduced bandwidth by 40%—from 833GB to 500GB—saving $6.60/month. Not a major cost driver, but the CDN usage panel tracks bandwidth per region to detect anomalies (unusual traffic spikes often indicate scraping attempts).
Layer 2 migration would reduce gas costs by 95% but requires infrastructure changes. A feasibility analysis shows:
- Gas cost savings: $7.50 batch transfer → $0.38 on Arbitrum or Optimism
- Bridge liquidity requirements: $50,000 stablecoin liquidity for investor on/off-ramps
- Subgraph deployment: TheGraph supports L2 networks, no architectural change needed
- RPC provider costs: Slightly lower on L2 due to lower block production rate
- Migration effort: 3-4 weeks to test contracts on L2, update frontend, migrate data
The L2 migration decision depends on transaction volume. At 10,000 transactions/day, mainnet costs $775/day ($23,250/month) while L2 costs $38/day ($1,140/month)—a $22,110/month savings that justifies migration effort. At current 3,000 transactions/day, mainnet costs $232/day ($6,960/month) while L2 would cost $11/day ($330/month)—a $6,630/month savings that's meaningful but not urgent.
Monitoring validates scalability improvements
The observability stack tracks four categories of scalability metrics: throughput (operations per second), latency (P50/P95/P99 response times), cost (dollars per operation), and reliability (error rate, uptime). Each category has dedicated dashboards with threshold-based alerts.
The throughput dashboard tracks operations per second across all layers: blockchain transaction rate (current: 12 tx/s, target: 15 tx/s), API request rate (current: 3,200 req/s, capacity: 10,000 req/s), database query rate (current: 8,500 queries/s, capacity: 15,000 queries/s), and cache operation rate (current: 45,000 ops/s, capacity: 150,000 ops/s). The dashboard highlights bottlenecks—currently, the blockchain layer is the constraint.
The latency dashboard breaks down P95 response times by component. End-to-end investor subscription flow: 1,850ms total (blockchain confirmation: 1,400ms, API processing: 280ms, frontend rendering: 170ms). The critical path analysis panel shows which components contribute most to latency—blockchain dominates, so optimization efforts focus there (batching, L2 migration).
The cost dashboard aggregates hourly expenditure by layer. Current hourly cost at 3,000 tx/day load: $9.67 (blockchain gas: $5.80, RPC fees: $1.33, database: $1.51, cache: $0.27, CDN: $0.01, compute: $0.75). The cost projection panel forecasts monthly expense based on current trends—trending toward $7,000/month at steady growth rate.
The reliability dashboard tracks error rates and success rates per operation. Subscription success rate: 99.8% (12 failures in 6,000 attempts this week), transfer success rate: 99.9%, compliance check success rate: 99.7%. The error breakdown panel categorizes failures—insufficient vault balance (8 failures), identity claim missing (3 failures), network timeout (1 failure). This guides operational improvements: vault monitoring alerts and automated claim validation.
Alerts fire when metrics exceed thresholds:
- Critical: API P95 > 500ms for 2 minutes → pages on-call engineer
- Warning: Cache hit rate < 80% for 10 minutes → Slack notification to ops channel
- Info: Database connection pool > 75% for 5 minutes → email to database team
The auto-scaling events panel visualizes capacity adjustments. Last week saw 47 scaling events (34 scale-ups, 13 scale-downs). The panel correlates scaling triggers with business events—spike at 14:00 UTC Wednesday coincided with a bond coupon payment deadline (1,200 redemption requests within 5 minutes).
Capacity planning uses historical trend analysis from the observability stack. The trend panel shows 15% month-over-month growth in transaction volume over the last 6 months. Extrapolating forward, the platform will exceed current API capacity (10,000 req/s) in 4 months. The recommendation: add the third database read replica next month ($362/month) and budget for compute cluster expansion in 3 months ($800/month increase).
Graceful degradation maintains service during overload
When load exceeds scaling capacity—either because auto-scaling can't add instances fast enough or because you've reached infrastructure limits—the platform degrades gracefully rather than failing catastrophically. The degradation strategy prioritizes core operations (identity verification, compliance checks, asset transfers) over secondary features (analytics, historical reports).
Stage 1 degradation activates at 80% capacity utilization. Cache TTLs extend from 5 minutes to 10 minutes, rate limits tighten from 100 to 75 requests per minute, and background job processing pauses (analytics aggregation, scheduled reports). The degradation status panel shows current stage and affected features.
Stage 2 degradation activates at 90% capacity utilization. Non-critical API
endpoints (historical transaction search, advanced filtering) return
503 Service Unavailable with a Retry-After: 300 header. The frontend
disables portfolio analytics charts and document preview features. Cache TTLs
extend to 15 minutes. Rate limits drop to 50 requests per minute.
Stage 3 degradation activates at 95% capacity utilization. Only identity verification, compliance checks, and urgent transfers remain available. All other operations queue for delayed processing. The frontend displays a banner: "Platform experiencing high load. Some features temporarily unavailable." Queue depth monitoring shows pending operations—currently empty, but stage 3 hasn't triggered in production yet.
The degradation panel tracks how often each stage activates. Last month saw 3 stage 1 events (total 47 minutes), 0 stage 2 events, and 0 stage 3 events. The operations team reviews degradation triggers to understand whether they represent genuine capacity issues or transient spikes that auto-scaling handles.
Recovery from degradation follows automatic hysteresis: after load drops below 70% capacity, the platform waits 5 minutes before restoring normal operation. This prevents thrashing (rapidly enabling and disabling features as load fluctuates near the threshold).
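A sketch of the staged controller described above, using the documented entry thresholds (80/90/95%) and the 70%-plus-5-minute recovery hysteresis; how utilization is measured (for example, averaged CPU across API pods) is an assumption.

```typescript
type Stage = 0 | 1 | 2 | 3;

const ENTER = [0, 0.8, 0.9, 0.95]; // utilization required to enter stages 1-3
const EXIT_BELOW = 0.7;            // load must fall below this to recover
const HOLD_MS = 5 * 60 * 1000;     // hysteresis: wait 5 minutes before restoring

let stage: Stage = 0;
let belowSince: number | null = null;

// Called periodically with current capacity utilization (0..1).
export function evaluateDegradation(utilization: number, now = Date.now()): Stage {
  // Escalate immediately to the highest stage whose threshold is exceeded.
  for (let s = 3; s > stage; s--) {
    if (utilization >= ENTER[s]) {
      stage = s as Stage;
      belowSince = null;
      return stage;
    }
  }

  // Restore normal operation only after 5 continuous minutes below 70%.
  if (stage > 0 && utilization < EXIT_BELOW) {
    belowSince ??= now;
    if (now - belowSince >= HOLD_MS) {
      stage = 0;
      belowSince = null;
    }
  } else {
    belowSince = null;
  }
  return stage;
}
```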
Horizontal scaling patterns enable linear capacity growth
ATK's stateless architecture enables linear scaling—doubling instances doubles capacity. This simplicity comes from architectural decisions that eliminate stateful bottlenecks.
API instances hold no session state. User authentication relies on JWT tokens validated against Redis (session data) and PostgreSQL (user profile). Any API instance can handle any request. When auto-scaling adds 5 instances, throughput increases from 5,000 req/s to 10,000 req/s proportionally. Load balancers use round-robin distribution to spread traffic evenly.
Database read replicas handle read-heavy workloads. Portfolio queries, compliance reports, and analytics execute against replicas. Writes (subscriptions, transfers, compliance updates) execute against the primary. With 80% of queries being reads, adding replicas scales the system effectively. The read/write ratio panel tracks this proportion—if write percentage increases above 30%, vertical scaling of the primary becomes necessary.
Redis cluster mode shards data across nodes. The keyspace is divided into 16,384 hash slots (assigned by CRC16 of the key), and slot ranges are distributed across the nodes. Adding a fourth node (expanding from 3 to 4) increases capacity by 33% and throughput proportionally. During cluster expansion, Redis migrates slots to the new node (taking approximately 10 minutes with zero downtime).
Frontend instances run identically in all regions. Each deployment uses the same container image, environment configuration, and database connection string. Adding a fifth region (us-west-2) involves deploying the existing configuration to the new region and updating GeoDNS routing—no code changes required.
The scaling limits panel identifies maximum capacity per component:
- API layer: 100 instances (limited by Kubernetes cluster size), 100,000 req/s theoretical capacity
- Database layer: 1 primary + 10 replicas (configured replication ceiling), 50,000 queries/s capacity
- Cache layer: 12 nodes (configured cluster ceiling), 600,000 ops/s capacity
- Blockchain layer: 15 tx/s (Ethereum mainnet limit), requires L2 migration to exceed
Currently operating at 10% of scaling limits across all layers. The runway analysis panel forecasts that at 15% month-over-month growth, the blockchain layer will saturate in 18 months—making L2 migration a strategic priority for 2026.
See also
- Frontend and API optimization - Bundle size reduction, lazy loading, API response caching strategies
- Infrastructure performance - Database query optimization, Redis caching patterns, smart contract gas reduction
- Performance operations - Load testing procedures, monitoring dashboard setup, performance regression detection