Methodology

How CapPulse measures latency, error rates, and reliability for AI providers. This page is updated as our measurement pipeline evolves.

Last updated: 2025-12-30 07:20:21 UTC
Data window: public (last 24h)

What we measure

Latency (p50, p95): median and tail response times for successful requests.
Error rate: share of requests returning 5xx or network errors.
Rate-limit rate: share of requests returning 429 or provider throttling signals.
Availability: percent of successful responses across probes.
Throughput: estimated requests per second observed by probes.
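
As a rough illustration of how the metrics above could be derived from raw probe samples, here is a minimal Python sketch. It is not CapPulse's pipeline: the `Sample` record, the nearest-rank percentile method, and the choice to treat any 2xx status as "successful" are all assumptions made for the example. Throughput is left out because it depends on probe scheduling details this page does not specify.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One probe result (hypothetical record shape)."""
    status: Optional[int]  # HTTP status code, or None for a network error
    latency_ms: float


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]; returns None for empty input."""
    if not values:
        return None
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(samples):
    """Compute latency percentiles and rates over a list of Sample objects."""
    total = len(samples)
    if total == 0:
        raise ValueError("no samples in window")
    # Assumption: a 2xx status counts as a successful response.
    ok = [s for s in samples if s.status is not None and 200 <= s.status < 300]
    # Errors are 5xx responses or network failures, as described above.
    errors = [s for s in samples if s.status is None or s.status >= 500]
    # Only HTTP 429 is counted here; other throttling signals are provider-specific.
    limited = [s for s in samples if s.status == 429]
    latencies = [s.latency_ms for s in ok]  # latency uses successful requests only
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "error_rate": len(errors) / total,
        "rate_limit_rate": len(limited) / total,
        "availability": len(ok) / total,
    }
```

Note that in this sketch other 4xx responses count against availability but are not classified as errors or rate limits; a real pipeline would need an explicit rule for them.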

How we measure

Distributed probes in North America, Europe, and APAC.
Sampling every 60 seconds across chat, embeddings, image, and audio endpoints.
Requests use consistent prompts and payloads to reduce variance.
Measurements are normalized to remove client-side overhead.
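
A single probe measurement might look something like the sketch below. This is an assumption-laden illustration, not the actual probe code: `FIXED_PAYLOADS`, `run_probe`, the `send` callable, and the `overhead_ms` correction are hypothetical names, and subtracting a fixed overhead is only one simple way to approximate "removing client-side overhead".

```python
import time
from typing import Callable

# Hypothetical fixed payloads, reused on every probe to reduce variance.
FIXED_PAYLOADS = {
    "chat": {"prompt": "ping"},
    "embeddings": {"input": "ping"},
}


def run_probe(send: Callable[[dict], int], payload: dict,
              overhead_ms: float = 0.0) -> dict:
    """Time one request.

    `send` is any callable that performs the request and returns an HTTP
    status code, raising OSError (or a subclass) on network failure.
    `overhead_ms` is an assumed, separately measured client-side cost.
    """
    start = time.perf_counter()
    try:
        status = send(payload)
    except OSError:
        status = None  # recorded as a network error
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Clamp at zero so an overestimated overhead never yields negative latency.
    return {"status": status, "latency_ms": max(elapsed_ms - overhead_ms, 0.0)}
```

A scheduler would then call `run_probe` once per region and endpoint every 60 seconds, passing the same payload each time.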

Reliability score

Reliability is a weighted blend of availability, error rate, latency stability, and rate limits. We publish weight ranges rather than exact weights to discourage gaming and to keep the score stable as measurement improves.

Typical weighting ranges: availability (40-50%), error rate (25-35%), latency stability (15-25%), rate limits (5-10%).

Latency stability is normalized using p95 thresholds to keep scores consistent across providers.
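
To make the blend concrete, here is a minimal sketch of one way such a score could be computed. The exact weights are not published, so the values in `WEIGHTS` are hypothetical picks from inside the stated ranges, and the linear `latency_stability` curve is likewise an assumption about how a p95 threshold might be applied.

```python
# Hypothetical weights chosen inside the published ranges; they sum to 1.0.
WEIGHTS = {
    "availability": 0.45,
    "error_rate": 0.30,
    "latency_stability": 0.18,
    "rate_limits": 0.07,
}


def latency_stability(p95_ms, threshold_ms):
    """Assumed curve: 1.0 at or below the threshold, falling linearly
    to 0.0 once p95 reaches twice the threshold."""
    if threshold_ms <= 0:
        raise ValueError("threshold_ms must be positive")
    excess = max(p95_ms - threshold_ms, 0.0) / threshold_ms
    return max(1.0 - excess, 0.0)


def reliability_score(availability, error_rate, rate_limit_rate,
                      p95_ms, p95_threshold_ms):
    """Blend components (each in [0, 1], higher is better) into a 0-100 score."""
    components = {
        "availability": availability,
        "error_rate": 1.0 - error_rate,
        "latency_stability": latency_stability(p95_ms, p95_threshold_ms),
        "rate_limits": 1.0 - rate_limit_rate,
    }
    score = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return round(100.0 * score, 1)
```

Rates are inverted before weighting so that every component points the same way: a provider with no errors and no throttling contributes the full weight for those terms.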

Data freshness and retention

Rollups update every 5 minutes. Freshness badges show on-time, delayed, or stale data.
Public dashboards show the last 24 hours of data.
Paid plans unlock 7d, 30d, and 90d retention windows.
Incident timelines include evidence snapshots for verification.
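
The freshness badge logic can be sketched as a simple age check against the rollup interval. Only the 5-minute interval and the three badge names come from this page; the cutoffs `DELAYED_AFTER` and `STALE_AFTER` below are illustrative assumptions, not published thresholds.

```python
from datetime import datetime, timedelta, timezone

ROLLUP_INTERVAL = timedelta(minutes=5)   # stated on this page
DELAYED_AFTER = 2 * ROLLUP_INTERVAL      # assumption: one missed rollup
STALE_AFTER = 6 * ROLLUP_INTERVAL        # assumption: 30 minutes without data


def freshness_badge(last_rollup: datetime, now: datetime) -> str:
    """Classify data freshness from the age of the most recent rollup."""
    age = now - last_rollup
    if age < DELAYED_AFTER:
        return "on-time"
    if age < STALE_AFTER:
        return "delayed"
    return "stale"


# Example: a rollup that finished 12 minutes before `now` is "delayed".
now = datetime(2025, 12, 30, 7, 20, tzinfo=timezone.utc)
badge = freshness_badge(now - timedelta(minutes=12), now)
```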

What we do not claim

CapPulse is not an official provider status page.
We do not guarantee uptime or availability for any provider.
Metrics are not financial advice and should not be used for trading decisions.

Data quality and confidence

Confidence indicators are based on sample count and regional coverage. Low confidence means fewer samples or limited regions.

High confidence typically requires 5,000+ samples and three or more regions in the selected window.
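
The high-confidence rule above translates into a short check. This sketch uses only the two stated criteria (5,000+ samples, three or more regions); the function name and the two-level output are assumptions, and any intermediate tier CapPulse may use is not described here.

```python
HIGH_MIN_SAMPLES = 5000  # from the text: "5,000+ samples"
HIGH_MIN_REGIONS = 3     # from the text: "three or more regions"


def confidence(sample_count, regions):
    """Return "high" when both stated criteria are met, otherwise "low".

    `regions` is an iterable of region names seen in the selected window;
    duplicates are ignored.
    """
    region_count = len(set(regions))
    if sample_count >= HIGH_MIN_SAMPLES and region_count >= HIGH_MIN_REGIONS:
        return "high"
    return "low"
```

For example, 8,000 samples drawn from only two regions would still be rated "low" in this sketch, because both conditions must hold.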

Methodology change log

2025-12-27 - Added freshness badges and 5-minute rollup windows.

2025-12-01 - Added rate-limit classification and endpoint normalization.

2025-10-15 - Expanded probe coverage to APAC and EU.

2025-08-02 - Introduced reliability scoring framework.

Want deeper data?

Access raw metrics, longer retention, and webhook events through the CapPulse API.

