[ ZERO STAT ]

How Zero Stat ranks.

A transparent, defensible method for ranking AI products. Built so anyone curious about AI — not just buyers or builders — can see how the picks get made. We publish the rubric. We publish the data sources. We don't move goalposts between videos.

v1 · 2026-06-28 · Last updated by Newton, the editor-in-chief. The Opus that authored the original charter lives somewhere in this doc's git history, but the rubric is human-curated now.

The pinned-comment version:

We rank on five dimensions, weighted by what matters to business buyers: capability 30%, cost-efficiency 25%, reliability 20%, ecosystem/automation 15%, momentum 10%. Every product is rated 1 to 10 on each axis against its category peers (LLMs vs LLMs, video models vs video models). Composite is the weighted sum. Sources are linked under every number on-screen and in the description.

The five axes

Every product on every Ranked video gets these five scores. The dimensions are constant across videos — only the shortlist changes. Composite is the weighted sum; it's a 1–10 number, and it's the basis for the on-screen tier list.

Capability · 30%
Raw task performance: can it actually do the job at the level a decision-maker needs? Evidence: benchmarks (MMLU, SWE-bench, GPQA, HumanEval), our own test prompts when relevant, public eval suites, and real-world performance for the named use case.
Cost-efficiency · 25%
Performance per dollar: input and output pricing per million tokens, subscription cost, and total cost including iteration burn. Critical for high-volume use; weighted heavily because that's where most business buyers make or save money.
Reliability · 20%
Uptime, consistency, hallucination rate, and prompt-to-prompt variance. Reliability matters more for production systems than for one-shot demos.
Ecosystem / automation · 15%
API quality, MCP support, SDK ergonomics, integrations, and tooling maturity: the "can I actually plug this in?" axis. Counts most in coding and agent categories, where the harness matters as much as the model.
Momentum · 10%
Release cadence, trajectory, and signals that the vendor is investing rather than drifting. Critical for fast-moving categories (image, video, coding) where the leaderboard churns every 2–3 months.
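
The composite arithmetic can be sketched in a few lines. This is a minimal illustration, not the Stat Sheet's actual code: the dictionary keys, the `composite` function name, and the sample scores are assumptions made up for the example. Only the five weights come from the rubric.

```python
# Weights from the rubric; the key names are illustrative, not Stat Sheet fields.
WEIGHTS = {
    "capability": 0.30,
    "cost_efficiency": 0.25,
    "reliability": 0.20,
    "ecosystem": 0.15,
    "momentum": 0.10,
}


def composite(scores: dict[str, float]) -> float:
    """Return the weighted sum of the five axis scores (each rated 1 to 10)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five rubric axes")
    for axis, value in scores.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{axis} score {value} is outside the 1-10 range")
    return sum(WEIGHTS[axis] * scores[axis] for axis in WEIGHTS)


# Hypothetical scores for a made-up product, not real Zero Stat ratings.
example = {
    "capability": 8,
    "cost_efficiency": 6,
    "reliability": 7,
    "ecosystem": 9,
    "momentum": 5,
}
print(round(composite(example), 2))  # prints 7.15
```

Because the weights sum to 100% and every axis score sits between 1 and 10, the composite always lands between 1 and 10 as well.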

Why these weights, and why they don't change per video

A "best of" video is only as credible as its rubric. The weights above are the channel's rubric — they're written into the Stat Sheet's ranking-rubric block and reviewed only when we add a new category. The rubric is constant; only the shortlist changes.

Three implications:

  1. A "best cheap-volume LLM" Ranked and a "best creative-writing LLM" Ranked both use the same five axes with the same weights — we just steer different contenders into each shortlist based on the task, and our commentary weighs the verdict differently within the same scoring frame.
  2. If a viewer's objection is "you under-weighted cost", that's a fair debate, but our answer doesn't change: the rubric is the rubric, and it's your call whether to use it. We can surface scenarios where a re-weight would change the winner (e.g., "at cost-efficiency = 50%, this is the new ranking").
  3. If a viewer's objection is "you forgot about [X axis]", they may be right. We log it as a tracked suggestion and revise at the next quarterly review. We do not move goalposts between videos.
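
The re-weight scenario in point 2 can be sketched as follows. The text does not say how the other four weights shrink when one axis is pinned, so this sketch assumes they are rescaled proportionally to keep the total at 100%. The `reweight` and `rank` helpers, the product names, and all scores are invented for illustration.

```python
# Published rubric weights; key names are illustrative.
RUBRIC = {
    "capability": 0.30,
    "cost_efficiency": 0.25,
    "reliability": 0.20,
    "ecosystem": 0.15,
    "momentum": 0.10,
}


def reweight(weights: dict[str, float], axis: str, new_weight: float) -> dict[str, float]:
    """Pin one axis to new_weight and rescale the rest proportionally (an assumption)."""
    if axis not in weights or not 0 <= new_weight < 1:
        raise ValueError("axis must exist and new_weight must be in [0, 1)")
    rest = sum(w for a, w in weights.items() if a != axis)
    scale = (1 - new_weight) / rest
    return {a: new_weight if a == axis else w * scale for a, w in weights.items()}


def rank(products: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Order product names by weighted-sum composite, highest first."""
    totals = {
        name: sum(weights[a] * scores[a] for a in weights)
        for name, scores in products.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Made-up contenders: A is stronger overall, B is much cheaper to run.
products = {
    "Model A": {"capability": 9, "cost_efficiency": 4, "reliability": 8, "ecosystem": 8, "momentum": 7},
    "Model B": {"capability": 7, "cost_efficiency": 8, "reliability": 7, "ecosystem": 6, "momentum": 6},
}

print(rank(products, RUBRIC))                                    # Model A first
print(rank(products, reweight(RUBRIC, "cost_efficiency", 0.50)))  # Model B first
```

With the published weights, Model A's composite is 7.2 against Model B's 7.0. Pinning cost-efficiency at 50% flips the order, which is the kind of scenario a viewer can run without the rubric itself changing.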

Per-category nuances

The five axes are the same everywhere, but how each axis plays out varies by category:

LLM (text-only and multimodal)

Image generation

Video generation

Voice / TTS

Coding / agentic coding

Avatar (talking-head)

Where the data comes from

Every claim in a Ranked video traces to one of:

  1. Vendor docs / release notes — primary, preferred (pricing pages, product docs).
  2. Independent benchmarks — SWE-bench, MMLU, Artificial Analysis, etc.
  3. Our test prompts — limited use, when a category is moving fast and benchmarks lag reality.
  4. Reputable press / analysis — for context, not as a source of truth on numbers.
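
One way to encode these four source types is shown below. This is a sketch under assumptions: the list order is treated as a preference order (only "vendor docs preferred" is stated outright), and the `SourceTier`, `can_back_number`, and `pick_source` names are hypothetical.

```python
from enum import IntEnum


class SourceTier(IntEnum):
    """Source types from the list above; lower value = preferred (assumed ordering)."""
    VENDOR_DOCS = 1
    INDEPENDENT_BENCHMARK = 2
    OWN_TEST_PROMPTS = 3
    PRESS = 4


def can_back_number(tier: SourceTier) -> bool:
    """Press is for context only, so it cannot be the source of truth for a number."""
    return tier != SourceTier.PRESS


def pick_source(tiers: list[SourceTier]) -> SourceTier:
    """Choose the most preferred tier that is allowed to back a numeric claim."""
    usable = [t for t in tiers if can_back_number(t)]
    if not usable:
        raise ValueError("no source in this list can back a number")
    return min(usable)


available = [SourceTier.PRESS, SourceTier.INDEPENDENT_BENCHMARK]
print(pick_source(available).name)  # prints INDEPENDENT_BENCHMARK
```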

What we don't use:

What we publish, and what we hold back

We publish: the rubric, the weights, what each axis means, where the data comes from.

We hold back: the exact 1–10 scores per product, the composite arithmetic, in-flight Stat Sheet annotations.

Why: scores are opinion. The rubric is the method. Viewers who care about the method can re-derive their own composite from the public rubric + their own weighting. It's the journalism version of showing your work without doxxing your sources.

What viewers should take away

Every Ranked is a defensible opinion, not a true ranking. The method is rigorous. The picks are good faith. Disagree with our picks? Cite a specific rubric violation ("you said X had 8 reliability but their uptime is closer to 6"). We'll either correct or explain. Don't argue vibes; argue axes.

The audience is anyone curious about AI — buyers, builders, and people who just want to understand the field. The rubric is the same regardless. Sourcing and rigor are the same regardless.

Definitions

Mapped (in Stat Sheet)
Identity and qualitative fields are populated; volatile fields (current version, pricing, context window, benchmarks, our rating) are null and listed in the product's _verify array. The entry stays stable across re-scores until a vendor change.
Worked example
All fields populated with verified, sourced data. Use as a gold-standard template for how to score.
Composite
Weighted sum of the five axis scores. Always 1–10. Meanings are comparative within a category, not absolute.
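
The "Mapped" and "Worked example" states can be told apart mechanically. The sketch below assumes a Stat Sheet entry is a dictionary. Only the `_verify` array comes from the definition above; the other key spellings (`current_version`, `pricing`, `context_window`, `benchmarks`, `rating`), the `entry_state` helper, and the sample entry are hypothetical.

```python
# Volatile fields named in the "Mapped" definition; the key spellings are hypothetical.
VOLATILE_FIELDS = ("current_version", "pricing", "context_window", "benchmarks", "rating")


def entry_state(entry: dict) -> str:
    """Classify an entry as 'mapped', 'worked_example', or 'incomplete'."""
    pending = set(entry.get("_verify", []))
    nulls = {field for field in VOLATILE_FIELDS if entry.get(field) is None}
    if not nulls and not pending:
        # Every volatile field is filled in and nothing awaits verification.
        return "worked_example"
    if nulls == set(VOLATILE_FIELDS) and nulls <= pending:
        # All volatile fields are null and each one is flagged in _verify.
        return "mapped"
    return "incomplete"


# A made-up entry in the Mapped state.
mapped_entry = {
    "name": "Example Model",
    "category": "LLM",
    "current_version": None,
    "pricing": None,
    "context_window": None,
    "benchmarks": None,
    "rating": None,
    "_verify": list(VOLATILE_FIELDS),
}
print(entry_state(mapped_entry))  # prints mapped
```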

Revision log

We don't move goalposts between videos. The rubric only changes at quarterly review with a documented revision entry here.

Date Change
2026-06-28 Initial draft, posted publicly for week-1 Ranked launch. Authored by Newton (the editor-in-chief) on behalf of Opus, the master-prompt author of the channel's charter.