Methodology & sources
The Advisor only earns trust if you can audit it. Here is exactly how the HH Score works and every source feeding it. No black boxes, no hidden ranking.
291 models (162 open weights, 129 proprietary) · 81 self-hostable · 132 benchmark-scored · 17 GPUs · pricing refreshed Jun 1, 2026.
Where the catalog comes from
The model list is not hand-picked. We treat models.dev as the source of truth: every chat model offered by a vetted provider (first-party labs, the hyperscalers and the major inference clouds), open weights and proprietary alike. The same model served by several providers is collapsed into one entry with a cheapest-first price stack. We then enrich each model with two layers we maintain ourselves: a quality score and, for open models with known internals, the architecture data needed for self-host math.
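As a rough illustration of the collapse step, the Python sketch below groups provider listings by model and orders each group's price stack cheapest-first. It assumes listings have already been matched to a shared `model_id` (the hard part in practice), and it ranks providers by a blended price at an assumed 3:1 input:output token mix. `Offering`, `collapse` and the sample models and providers are hypothetical, not the Advisor's actual code or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    """One provider's listing of a model (hypothetical shape)."""
    model_id: str           # assumed already normalized across providers
    provider: str
    input_per_mtok: float   # USD per 1M input tokens
    output_per_mtok: float  # USD per 1M output tokens

    def blended(self, input_share: float = 0.75) -> float:
        """Blended USD per 1M tokens at an assumed 3:1 input:output mix."""
        return input_share * self.input_per_mtok + (1 - input_share) * self.output_per_mtok


def collapse(offerings: list[Offering]) -> dict[str, list[Offering]]:
    """One entry per model, each with its providers sorted cheapest-first."""
    grouped: dict[str, list[Offering]] = defaultdict(list)
    for offering in offerings:
        grouped[offering.model_id].append(offering)
    # Tie-break on provider name so the order is deterministic.
    return {
        model_id: sorted(stack, key=lambda o: (o.blended(), o.provider))
        for model_id, stack in grouped.items()
    }


catalog = collapse([
    Offering("model-a", "provider-a", 0.60, 0.80),
    Offering("model-a", "provider-b", 0.20, 0.40),
    Offering("model-b", "provider-c", 2.00, 8.00),
])
print([o.provider for o in catalog["model-a"]])  # ['provider-b', 'provider-a']
```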
The HH Score
Every model-and-deployment pair is scored on three normalized axes, which are combined into a single 0–100 number weighted by what you optimize for:
- Quality: a 0–100 capability score. The primary source is the Artificial Analysis Intelligence Index (independent, held-out benchmarks), mapped to 0–100; the code use case uses AA's Coding Index instead. Models AA doesn't cover yet fall back to a curated estimate or a conservative heuristic, flagged with an asterisk in the UI. The score is then adjusted for use-case fit, plus a mild recency nudge (all else equal, a 2024 model ranks below a 2026 one).
- Price-performance: real cost at your volume, on a log scale so that genuinely cheap options separate cleanly.
- Latency: for managed models we use Artificial Analysis's measured output speed (tokens/sec) and time-to-first-token where available; for self-host we model decode throughput from memory bandwidth. The result is then reshaped by your SLA (a real-time SLA penalizes anything that can't keep up).
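The bandwidth-based latency model can be approximated with a first-order rule: at batch size 1, decoding each token reads every active weight from VRAM once, so throughput is capped near memory bandwidth divided by the bytes of active weights. The sketch below is a hedged illustration, not the Advisor's formula; the function name, the 0.7 efficiency factor and the sample GPU bandwidth are assumptions. For MoE models, pass the active (not total) parameter count.

```python
def decode_tokens_per_sec(
    bandwidth_gb_s: float,
    active_params_b: float,
    bytes_per_param: float = 2.0,  # FP16/BF16 weights
    efficiency: float = 0.7,       # assumed fraction of peak bandwidth achieved
) -> float:
    """Rough single-stream decode speed for a memory-bandwidth-bound model.

    Each generated token streams every active weight from VRAM once, so
    throughput is capped near bandwidth / bytes-of-active-weights.
    KV-cache reads, kernel overhead and batching are ignored.
    """
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return efficiency * bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical: an 8B dense model in FP16 on a GPU with ~3,350 GB/s of bandwidth.
print(round(decode_tokens_per_sec(3350, 8), 1))  # 146.6
```

Halving bytes per parameter (for example, 8-bit weights) roughly doubles the estimate, which is why quantization matters so much for self-host latency.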
On the Balanced default we enforce a quality floor, so a cheap but weak model can never win on price alone. Switch to Cost, Latency or Quality to re-weight; the score and order recompute live. Cheapest and Fastest are badges, never the sort order.
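To make the weighting concrete, here is a minimal sketch of a composite score in the spirit described above. Every number in it is a placeholder: the profile weights, the quality floor of 40, and the $1 to $100,000/month range of the log-scaled cost axis are assumptions, and the floor is modeled as outright exclusion, which may not match how the Advisor applies it. Each profile's weights sum to 1 so the result stays on a 0–100 scale.

```python
import math
from typing import Optional

# Placeholder weights per optimization profile (each row sums to 1.0).
PROFILES = {
    "balanced": {"quality": 0.50, "cost": 0.25, "latency": 0.25},
    "cost":     {"quality": 0.20, "cost": 0.60, "latency": 0.20},
    "latency":  {"quality": 0.20, "cost": 0.20, "latency": 0.60},
    "quality":  {"quality": 0.70, "cost": 0.15, "latency": 0.15},
}
QUALITY_FLOOR = 40.0  # assumed; enforced only on the Balanced default


def cost_axis(monthly_usd: float, lo: float = 1.0, hi: float = 100_000.0) -> float:
    """Map monthly cost onto 0-100 on a log scale; cheaper scores higher."""
    clamped = min(max(monthly_usd, lo), hi)
    frac = (math.log10(clamped) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))
    return 100.0 * (1.0 - frac)


def hh_score(quality: float, monthly_usd: float, latency: float,
             profile: str = "balanced") -> Optional[float]:
    """Weighted composite of three 0-100 axes, or None if the floor excludes it."""
    if profile == "balanced" and quality < QUALITY_FLOOR:
        return None
    w = PROFILES[profile]
    return (w["quality"] * quality
            + w["cost"] * cost_axis(monthly_usd)
            + w["latency"] * latency)


print(hh_score(80, 100, 50))  # 67.5
print(hh_score(30, 10, 90))   # None: below the Balanced floor
```

With these placeholder numbers, a model scoring 30 on quality is dropped under Balanced but can still score well under the Cost profile, which mirrors the re-weighting behavior described above.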
The deployment spectrum
The Advisor ranks the whole spectrum, not just hosted APIs: Managed API → Serverless GPU → IaaS (rent a GPU, run open weights) → DIY (your own hardware). At higher volume the open-weights + IaaS path is often several times cheaper than a managed API — use the Self-host tab to see the crossover.
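The managed-versus-self-host crossover comes down to simple arithmetic: managed cost grows linearly with tokens, while an always-on GPU rental is a flat monthly cost up to the GPU's capacity. The sketch below assumes a 730-hour month, a single blended per-token price and an aggregate (batched) throughput figure; the function names are illustrative, and the prices and throughput in the usage lines are made-up inputs, not Advisor data.

```python
from typing import Optional

HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def managed_monthly_usd(tokens_per_month: float, usd_per_mtok: float) -> float:
    """Managed API cost scales linearly with volume."""
    return tokens_per_month / 1e6 * usd_per_mtok


def selfhost_monthly_usd(gpu_usd_per_hr: float, num_gpus: int = 1) -> float:
    """Always-on GPU rental: a flat monthly cost regardless of volume."""
    return gpu_usd_per_hr * num_gpus * HOURS_PER_MONTH


def selfhost_capacity_tokens(tokens_per_sec: float, num_gpus: int = 1) -> float:
    """Maximum tokens the rented GPUs can produce in a month at full load."""
    return tokens_per_sec * num_gpus * 3600 * HOURS_PER_MONTH


def crossover_tokens_per_month(usd_per_mtok: float, gpu_usd_per_hr: float,
                               tokens_per_sec: float,
                               num_gpus: int = 1) -> Optional[float]:
    """Monthly volume above which self-hosting is cheaper, or None if the
    GPUs would saturate before reaching that volume."""
    breakeven = selfhost_monthly_usd(gpu_usd_per_hr, num_gpus) / usd_per_mtok * 1e6
    capacity = selfhost_capacity_tokens(tokens_per_sec, num_gpus)
    return breakeven if breakeven <= capacity else None


# Made-up inputs: $1.00 per 1M tokens managed vs. one $2.00/hr GPU.
print(crossover_tokens_per_month(1.0, 2.0, tokens_per_sec=1000))  # 1460000000.0
print(crossover_tokens_per_month(1.0, 2.0, tokens_per_sec=100))   # None
```

A `None` result means one GPU cannot serve enough volume to break even; adding GPUs raises both capacity and the flat cost, so the crossover has to be recomputed.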
Our sources — in the open
| Source | What it feeds | Freshness |
|---|---|---|
| models.dev | The catalog itself — every chat model and which providers serve it (managed + self-host, open and proprietary), with live per-token pricing. | Fetched every build. |
| Artificial Analysis | Intelligence Index (quality), Coding Index (code lane), and measured output speed + time-to-first-token (latency). | Live via API (per build). |
| LMArena (Chatbot Arena) | Crowd-voted quality Elo — cross-checks the fallback quality estimates. | Snapshot in quality.json. |
| Hugging Face model configs | Architecture facts (layers, KV heads, head dim, MoE) for VRAM and throughput math on open models. | Per model addition. |
| TechPowerUp GPU DB + vendor sheets | GPU VRAM, memory bandwidth, FP16 throughput. | Refreshed via the data bot. |
| RunPod · Vast.ai · Lambda | Representative on-demand cloud GPU $/hr for self-host cost. | Refreshed via the data bot. |
Architecture internals are informed by published model cards and Sebastian Raschka's architecture gallery; we cite the gallery as a reference rather than sending you off to it.
Frequently asked questions
What is the HH Score?
A 0–100 composite of three normalized signals — quality, price-performance (cost) and latency — weighted by what you choose to optimize for. On the Balanced default it leads with quality, so the cheapest option never wins just for being cheap. Pick Cost, Latency or Quality to re-weight; the score and ranking update live.
Why is the cheapest option not always on top?
Because a 60%-accuracy model at half the price is usually the wrong call. We rank on performance-led value by default and surface Cheapest / Fastest as badges, not as the sort order. You can still optimize purely for cost in one click.
How honest are the cost numbers?
Managed costs use real per-token list prices at your stated volume; committed-use, batch and enterprise contracts will differ, and we say so. Self-host costs use representative cloud GPU $/hr and first-order VRAM + memory-bandwidth math, so treat them as accurate to roughly ±10–20% and validate with a real benchmark.
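For the self-host figures, a first-order VRAM estimate is resident weights plus KV cache, where the KV cache holds one key and one value vector per layer, per KV head, per token. A minimal sketch, assuming FP16 KV cache entries and a hypothetical 10% overhead factor for activations and runtime buffers; `ArchSpec` and the 8B configuration below are illustrative. For MoE models, all experts typically have to be resident, so use the total parameter count here even though decode speed depends on active parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchSpec:
    """Architecture facts of the kind read from a model config (hypothetical shape)."""
    total_params_b: float  # billions; every weight must fit, even for MoE
    layers: int
    kv_heads: int
    head_dim: int


def kv_cache_bytes(arch: ArchSpec, context_tokens: int, batch: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Bytes for cached keys and values (the leading 2 = one K and one V)."""
    return 2 * arch.layers * arch.kv_heads * arch.head_dim * context_tokens * batch * bytes_per_elem


def vram_gb(arch: ArchSpec, context_tokens: int, batch: int = 1,
            bytes_per_param: float = 2.0, overhead: float = 1.1) -> float:
    """Weights + FP16 KV cache, scaled by an assumed overhead factor, in GB (1e9 bytes)."""
    weights = arch.total_params_b * 1e9 * bytes_per_param
    kv = kv_cache_bytes(arch, context_tokens, batch)
    return overhead * (weights + kv) / 1e9


arch = ArchSpec(total_params_b=8.0, layers=32, kv_heads=8, head_dim=128)
print(kv_cache_bytes(arch, 8192))        # 1073741824 (1 GiB at 8K context)
print(f"{vram_gb(arch, 8192):.1f} GB")   # 18.8 GB
```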
Do referral links change the ranking?
No. Ranking is computed only from your inputs and real numbers. Some managed links are affiliate links, always disclosed. A provider appears only when it genuinely fits, and within a model we always list every provider cheapest-first — nothing hidden.
How do you keep this fresh?
Model pricing refetches on every build. Benchmarks, GPU specs and GPU prices are refreshed by a scripted update bot (scripts/check_for_updates.sh) that re-reads the sources above and proposes changes for review.
Found a better public source, or a number that looks off? Tell us — accuracy is the product.