Question 1

What is the HH Score?

Accepted Answer

A 0–100 composite of three normalized signals — quality, price-performance (cost) and latency — weighted by what you choose to optimize for. On the Balanced default it leads with quality, so the cheapest option never wins just for being cheap. Pick Cost, Latency or Quality to re-weight; the score and ranking update live.
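A minimal sketch of how a composite like this could be computed, assuming each signal has already been normalized to the range 0 to 1 with higher meaning better. The weight values and the names `PRESETS` and `hh_score` are illustrative assumptions; this FAQ does not publish the real weights.

```python
# Hypothetical weights per optimization preset; each row sums to 1.0.
# These numbers are assumptions for illustration, not the site's real weights.
PRESETS = {
    "balanced": {"quality": 0.50, "cost": 0.30, "latency": 0.20},
    "quality":  {"quality": 0.70, "cost": 0.15, "latency": 0.15},
    "cost":     {"quality": 0.25, "cost": 0.60, "latency": 0.15},
    "latency":  {"quality": 0.25, "cost": 0.15, "latency": 0.60},
}


def hh_score(signals: dict[str, float], preset: str = "balanced") -> float:
    """Return a 0-100 weighted composite of normalized signals (0..1, higher is better)."""
    weights = PRESETS[preset]
    total = sum(weights[name] * signals[name] for name in weights)
    return round(100 * total, 1)


# Two made-up options: one very cheap but weak, one stronger but pricier.
cheap_but_weak = {"quality": 0.40, "cost": 1.00, "latency": 0.70}
strong_but_pricier = {"quality": 0.90, "cost": 0.50, "latency": 0.70}

for preset in ("balanced", "cost"):
    print(preset, hh_score(cheap_but_weak, preset), hh_score(strong_but_pricier, preset))
```

With these assumed weights, the stronger option scores higher on the balanced preset and the cheaper option scores higher on the cost preset, which is the re-weighting behaviour the answer describes.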

Question 2

Why is the cheapest option not always on top?

Accepted Answer

Because a 60%-accuracy model at half the price is usually the wrong call. We rank on performance-led value by default and surface Cheapest / Fastest as badges, not as the sort order. You can still optimize purely for cost in one click.
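One way to picture "badges, not sort order" is the sketch below. The `Option` type, its field names, and `rank_with_badges` are hypothetical, and the sample numbers are made up; the point is only that the cheapest and fastest options get labelled while the list is ordered by score.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    name: str
    score: float          # composite score, 0-100
    monthly_cost: float   # USD per month
    latency_ms: float     # lower is faster
    badges: list[str] = field(default_factory=list)


def rank_with_badges(options: list[Option]) -> list[Option]:
    """Sort by score (highest first) and tag the cheapest and fastest options."""
    if not options:
        return []
    for option in options:
        option.badges = []  # reset so repeated calls do not duplicate badges
    min(options, key=lambda o: o.monthly_cost).badges.append("Cheapest")
    min(options, key=lambda o: o.latency_ms).badges.append("Fastest")
    return sorted(options, key=lambda o: o.score, reverse=True)


ranked = rank_with_badges([
    Option("budget-model", score=64.0, monthly_cost=100.0, latency_ms=400.0),
    Option("strong-model", score=74.0, monthly_cost=200.0, latency_ms=600.0),
])
for option in ranked:
    print(option.name, option.score, option.badges)
```

Here the budget option carries both badges but still ranks second, because the sort key is the score alone.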

Question 3

How honest are the cost numbers?

Accepted Answer

Managed costs use real per-token list prices at your stated volume; committed-use, batch and enterprise contracts will differ, and we say so. Self-host costs use representative cloud GPU $/hr and first-order VRAM + memory-bandwidth math; treat them as accurate to within roughly ±10–20% and validate with a real benchmark.
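The first-order arithmetic can be sketched as follows. This is a rough illustration under stated assumptions, not the site's actual calculator: every function name is hypothetical, all prices and hardware figures are made up, FP16 storage (2 bytes per value) is assumed, and decode speed is approximated as memory bandwidth divided by the bytes of weights read per generated token.

```python
def managed_monthly_cost(input_tokens: float, output_tokens: float,
                         usd_per_m_input: float, usd_per_m_output: float) -> float:
    """List-price cost: tokens (in millions) times the per-million-token price."""
    return input_tokens / 1e6 * usd_per_m_input + output_tokens / 1e6 * usd_per_m_output


def weight_memory_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Memory for model weights; 1e9 params times bytes per param, in GB (1e9 bytes)."""
    return params_billion * bytes_per_param


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: float = 2.0) -> float:
    """KV cache for one sequence; the factor 2 covers keys and values."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 1e9


def decode_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound upper estimate: each generated token reads all weights once."""
    return bandwidth_gb_s / weights_gb


def self_host_monthly_cost(gpu_usd_per_hour: float, gpu_count: int = 1,
                           hours: float = 730.0) -> float:
    """Always-on GPU rental; 730 is an approximate number of hours in a month."""
    return gpu_usd_per_hour * gpu_count * hours


# Made-up example: an 8B-parameter model with 32 layers, 8 KV heads, head dim 128.
weights = weight_memory_gb(8)
vram_needed = weights + kv_cache_gb(32, 8, 128, context_tokens=8192)
print(f"VRAM estimate: {vram_needed:.1f} GB")
print(f"Decode estimate: {decode_tokens_per_sec(1000, weights):.1f} tokens/s")
print(f"Managed: ${managed_monthly_cost(100e6, 20e6, 0.50, 1.50):.2f}/month")
print(f"Self-host: ${self_host_monthly_cost(2.00):.2f}/month")
```

Estimates of this kind ignore batching, activation memory, and utilization, which is one reason a real benchmark is still needed.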

Question 4

Do referral links change the ranking?

Accepted Answer

No. Ranking is computed only from your inputs and real numbers. Some managed links are affiliate links, always disclosed. A provider appears only when it genuinely fits, and within a model we always list every provider cheapest-first — nothing hidden.
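As a small illustration of that rule, the sketch below orders provider offers by price only. The `Offer` type and `list_offers` are hypothetical names and the prices are invented; the affiliate flag is carried along for disclosure but never enters the sort key.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    provider: str
    usd_per_m_tokens: float
    affiliate: bool  # shown to the user as a disclosure, never used for ordering


def list_offers(offers: list[Offer]) -> list[Offer]:
    """Return every offer for a model, cheapest first; nothing is filtered out."""
    return sorted(offers, key=lambda o: o.usd_per_m_tokens)


offers = [
    Offer("provider-a", 0.90, affiliate=True),
    Offer("provider-b", 0.60, affiliate=False),
    Offer("provider-c", 0.75, affiliate=True),
]
for offer in list_offers(offers):
    label = " (affiliate link)" if offer.affiliate else ""
    print(f"{offer.provider}: ${offer.usd_per_m_tokens:.2f}{label}")
```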

Question 5

How do you keep this fresh?

Accepted Answer

Model pricing is refetched on every build. Benchmarks, GPU specs and GPU prices are refreshed by a scripted update bot (scripts/check_for_updates.sh) that re-reads the sources listed below and proposes changes for review.

Source	What it feeds	Freshness
models.dev	The catalog itself — every chat model and which providers serve it (managed + self-host, open and proprietary), with live per-token pricing.	Fetched every build.
Artificial Analysis	Intelligence Index (quality), Coding Index (code lane), and measured output speed + time-to-first-token (latency).	Live via API (per build).
LMArena (Chatbot Arena)	Crowd-voted quality Elo — cross-checks the fallback quality estimates.	Snapshot in quality.json.
HuggingFace model configs	Architecture facts (layers, KV heads, head dim, MoE) for VRAM and throughput math on open models.	Per model addition.
TechPowerUp GPU DB + vendor sheets	GPU VRAM, memory bandwidth, FP16 throughput.	Refreshed via the data bot.
RunPod · Vast.ai · Lambda	Representative on-demand cloud GPU $/hr for self-host cost.	Refreshed via the data bot.
