CageMetrics

The CageMetrics Elo

A modified Elo rating updates every fighter's number after every UFC fight. The math underneath is the same system Arpad Elo wrote for chess in the 1960s. Our modifications are about making the rating respond appropriately to the different kinds of UFC wins. A 30-second knockout should not move the rating by the same amount as a five-round split decision.

Live snapshot

Fighters rated: 2701 (643 clear the 3-fight, 2.0-year active gate)
K-factor: 64 (96 for a fighter's first 5 bouts)
Decisions parsed: 3956 of 4071 completed (the rest are draws / NCs)
Finish multiplier: ×1.80 (K-factor scaling on a clean KO/TKO/Sub)

What Elo measures

Elo assigns every player in a competitive pool a single number representing their skill relative to the rest of the pool. Two ratings, passed through a logistic curve, give you the predicted probability that one player beats the other. After each match both ratings update. Winners gain, losers lose. The size of the swing depends on how surprising the result was. Beating a much higher-rated opponent moves your rating more than beating someone you were already expected to beat.

The math:

expected_a = 1 / (1 + 10^((rating_b - rating_a) / 400))
rating_a  += K × (score_a - expected_a)

Here score_a is 1 for a win, 0.5 for a draw, 0 for a loss. K is the step size that controls how aggressively the rating responds to new results.
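
As a minimal sketch, the two formulas above translate directly into Python. The function names are illustrative rather than CageMetrics' own code, and applying one shared K to both fighters is an assumption made here for simplicity.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Predicted probability that A beats B under the logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float) -> tuple[float, float]:
    """Return both new ratings after one bout. score_a is 1, 0.5, or 0."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Assumption: the same K applies to both fighters, so the update is zero-sum.
    return rating_a + delta, rating_b - delta


# An upset: a 1500-rated fighter beats a 1700-rated opponent with K = 64.
new_a, new_b = update(1500.0, 1700.0, 1.0, 64.0)
```

With these inputs the underdog was expected to win about 24% of the time, so the win is worth roughly 48.6 points, and the favourite loses the same amount.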

Why classical Elo fits MMA poorly

Chess Elo looks only at the result of a game, not at how it was won. MMA results carry more than that. A 30-second KO and a razor-thin split decision both count as wins, but they tell you very different things about the gap between the two fighters. Treating them identically throws away information the bout actually produced.

Using the chess version on MMA runs into three problems.

First, margin of victory carries real information. A 30-27 / 30-27 / 30-27 win is clearly more decisive than a 29-28 / 29-28 / 28-29 win. A rating system that updates them by the same amount is leaving signal on the table.

Second, new fighters need to converge to their true rating quickly. A debutant whose actual skill is around 1,700 should not need 20 fights to climb out of the default 1,500.

Third, MMA is high variance. Even with margin-of-victory scaling, single-bout noise stays high. The rating system has to handle that gracefully.

What we changed

1. Provisional K-factor for newcomers

A fighter's first 5 UFC bouts use K = 96 instead of the standard 64. Their early rating is the least trustworthy number we have, so larger updates let it converge faster to whatever it should be. After 5 fights the K-factor drops to the standard value and the rating stabilises.
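
The switch between the two K values can be sketched as a small lookup. The constant and function names are hypothetical, and counting "first 5 bouts" as bouts with 0 to 4 prior UFC results is an assumption about where the boundary falls.

```python
BASE_K = 64.0
PROVISIONAL_K = 96.0
PROVISIONAL_BOUTS = 5


def k_factor(prior_ufc_bouts: int) -> float:
    """K for a fighter's next bout, given how many UFC bouts they have completed."""
    # Assumption: bouts 1 through 5 (0 to 4 prior results) use the provisional K.
    return PROVISIONAL_K if prior_ufc_bouts < PROVISIONAL_BOUTS else BASE_K
```

A debutant (0 prior bouts) updates with K = 96, and a fighter entering their sixth bout updates with K = 64.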

2. Margin-of-victory scaling on K

Rather than a single K for every result, K is multiplied by a factor that depends on how the fight was won.

The multipliers came from a parameter sweep over every UFC fight in our database. The sweep scores each combination on accuracy, log-loss, and Brier.

3. Continuous decision-MOV via judge rounds-won

This is the recent change worth dwelling on. It also makes our rating behave differently from other public MMA Elos.

A decision is a vote. The three judges score the fight round by round under the 10-point must system, and whoever has the higher total on two of three cards wins. ufcstats publishes those totals as part of every decision result. In the Reyes vs Walker bout in March 2024, for example, the scorecards read: Sal D'Amato 28 - 29, Solimar Miranda 29 - 28, Junichiro Kamijo 28 - 29.

Under the 10-point must system, the winner of each round scores 10 and the loser scores 9 (or 8, or 7, when the round is dominant enough). That lets us recover the number of rounds each fighter won on each judge's card directly from the totals:

rounds_won = max(0, score − 9 × scored_rounds)

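The Reyes split decoded (the loser's score is listed first by ufcstats convention, then the winner's), with rounds_won = score − 27 for a three-round bout:

Judge | Card (loser - winner) | Loser rounds | Winner rounds (Reyes)
Sal D'Amato | 28 - 29 | 1 | 2
Solimar Miranda | 29 - 28 | 2 | 1
Junichiro Kamijo | 28 - 29 | 1 | 2
Total | | 4 of 9 | 5 of 9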

That is a fair read of a razor-thin split. Reyes won because two of three judges had him ahead on points, but the judge-round count was essentially a coin flip: 5 of 9 against 4 of 9. Across the 3956 parsed decisions in our database, decisive wins land at 7 to 9 of 9 rounds. The closest splits push down toward 5 of 9.
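
The decoding can be sketched in a few lines of Python. The function names and the list-of-tuples layout are assumptions for illustration; the card values are the ones quoted above, in ufcstats' loser-first order.

```python
def rounds_won(score: int, scored_rounds: int = 3) -> int:
    """Rounds a fighter won on one judge's card, assuming lost rounds score 9."""
    return max(0, score - 9 * scored_rounds)


def winner_share(cards: list[tuple[int, int]], scored_rounds: int = 3) -> tuple[int, int, float]:
    """cards holds (loser_total, winner_total) per judge, in ufcstats order."""
    winner_rounds = sum(rounds_won(w, scored_rounds) for _, w in cards)
    loser_rounds = sum(rounds_won(l, scored_rounds) for l, _ in cards)
    total_judge_rounds = len(cards) * scored_rounds
    return winner_rounds, loser_rounds, winner_rounds / total_judge_rounds


# Reyes vs Walker: D'Amato 28 - 29, Miranda 29 - 28, Kamijo 28 - 29
reyes_cards = [(28, 29), (29, 28), (28, 29)]
winner_rounds, loser_rounds, share = winner_share(reyes_cards)
# winner_rounds == 5, loser_rounds == 4, share == 5 / 9
```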

The fraction (winner-rounds-won divided by total-judge-rounds) drives the continuous decision multiplier. We interpolate linearly between 0.50 at fraction 0.5 (a hypothetical even split on the rounds) and 1.80 at fraction 1.0 (a 30-27 / 30-27 / 30-27 shutout). The shape:

Winner's share of judge-rounds | Decision multiplier
50% | ×0.50
56% | ×0.64
78% | ×1.22
100% | ×1.80

A 30-27 sweep on all three cards (9 of 9 rounds) moves the rating by K × 1.80, the same as a clean finish. A 29-28 / 29-28 / 28-29 squeaker (5 of 9 rounds) moves it by about K × 0.64, close to the 0.50 floor and half of what a unanimous decision earned under the old flat ×1.30 band. The rating now reflects how dominant the win actually was instead of treating every decision as identical.
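
The linear interpolation between the two endpoints can be sketched as follows. The names are hypothetical, and clamping shares outside the 0.5 to 1.0 range is an assumption, since the text only defines the curve between those two points.

```python
CLOSE_END = 0.50   # multiplier at a winner's share of 0.5
SWEEP_END = 1.80   # multiplier at a winner's share of 1.0


def decision_multiplier(share: float) -> float:
    """Linearly interpolate the K multiplier from the winner's share of judge-rounds."""
    # Assumption: shares outside [0.5, 1.0] are clamped to the nearest endpoint.
    s = min(1.0, max(0.5, share))
    return CLOSE_END + (s - 0.5) / 0.5 * (SWEEP_END - CLOSE_END)


squeaker = decision_multiplier(5 / 9)   # about 0.64
clear = decision_multiplier(7 / 9)      # about 1.22
sweep = decision_multiplier(9 / 9)      # 1.80
```

Each extra judge-round out of nine adds about 0.29 to the multiplier, which is why a 5-of-9 split and a 7-of-9 win land so far apart.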

A subtle gotcha

ufcstats lists the judge totals as "loser's score - winner's score" rather than "fighter A - fighter B". We figured this out by cross-checking against bouts where the eventual winner appeared on either side. The scorecards are oriented per fight using the recorded winner.
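
A minimal sketch of that orientation step, assuming the card arrives as a string such as "28 - 29" and that the bout record names the winner. The function names and the exact string format are assumptions, not a description of the ufcstats markup.

```python
import re


def parse_card(text: str) -> tuple[int, int]:
    """Parse '28 - 29' into (loser_total, winner_total), per the ufcstats ordering."""
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised scorecard: {text!r}")
    return int(match.group(1)), int(match.group(2))


def orient_card(text: str, fighter_a: str, fighter_b: str, winner: str) -> dict[str, int]:
    """Map one judge's totals onto the two named fighters using the recorded winner."""
    loser_total, winner_total = parse_card(text)
    if winner == fighter_a:
        return {fighter_a: winner_total, fighter_b: loser_total}
    if winner == fighter_b:
        return {fighter_a: loser_total, fighter_b: winner_total}
    raise ValueError("winner must be one of the two fighters")
```

The result is the same whichever side of the bout listing the winner appears on: `orient_card("28 - 29", "Walker", "Reyes", "Reyes")` assigns 29 to Reyes and 28 to Walker.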

How the multipliers were tuned

The parameter sweep runs the full Elo system over every completed UFC fight chronologically for each combination of multipliers. It reports four numbers per combination.

The continuous decision-MOV beat the old flat unanimous-×1.30 / split-×0.65 bands by about 0.21 percentage points of accuracy. It was also better on log-loss and Brier on the most recent backtest pass (8,700+ completed fights, 4,800+ established). We picked the 0.50 / 1.80 endpoints because they were the calibrated peak in the search. Pushing further (close 0.30, sweep 2.50) gained another 0.04 percentage points of accuracy but hurt log-loss. The probabilities got worse even though the binary picks improved. We chose the row that improved every metric.
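
For readers unfamiliar with the three metrics, here is a sketch of how they are conventionally computed from a list of predicted win probabilities. This is the textbook definition, not the CageMetrics sweep code; the function name, the binary 0/1 outcome encoding, and the tie rule at exactly 0.5 are assumptions.

```python
import math


def score_predictions(probs: list[float], outcomes: list[int]) -> dict[str, float]:
    """probs: predicted P(fighter A wins). outcomes: 1 if A won, 0 if B won."""
    n = len(probs)
    eps = 1e-12  # keeps log() finite if a probability is exactly 0 or 1
    # Assumption: a prediction of exactly 0.5 counts as a pick for fighter B.
    correct = sum((p > 0.5) == (y == 1) for p, y in zip(probs, outcomes))
    log_loss = -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
        for p, y in zip(probs, outcomes)
    ) / n
    brier = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / n
    return {"accuracy": correct / n, "log_loss": log_loss, "brier": brier}


metrics = score_predictions([0.8, 0.3, 0.6], [1, 0, 0])
```

Accuracy only checks which side of 0.5 a prediction falls on, while log-loss and Brier punish overconfident probabilities. That is how a configuration can gain accuracy and still lose on the other two. For reference, a constant 0.5 forecast scores a log-loss of about 0.6931 and a Brier score of 0.25.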

Vs vanilla Elo

The natural question is how much these modifications actually buy you. We ran the same backtest on both systems. One was the production CageMetrics weights. The other was a vanilla Elo with a flat K-factor of 40, no provisional bump, and no margin-of-victory scaling. Both ran over the same 8,700-plus UFC fights chronologically and were scored on the 4,877 bouts where both fighters had at least two prior UFC results.

Metric | CageMetrics Elo | Vanilla Elo (K=40)
Accuracy on established fights | 58.38% | 56.04%
Log-loss | 0.6778 | 0.6822
Brier score | 0.2419 | 0.2446

The modifications gain 2.34 percentage points of accuracy and tighten both calibration metrics. In practical terms the modified system gets about 58 of every 100 picks right where the vanilla version gets about 56. The provisional K-factor does most of its work in the first half-dozen fights of a newcomer's career. The MOV multipliers do theirs across the board, with the continuous decision scaling contributing most of the late-stage gain.

The full parameter set

Parameter | Value | Role
Starting rating | 1500 | Every new fighter enters with this rating
Base K-factor | 64 | Step size for established fighters
Provisional K-factor | 96 | Larger step for a fighter's first few bouts
Provisional bouts | 5 | Fights of provisional K before downshifting
Title-fight multiplier | 1.00 | Extra weight for title fights (currently a no-op; tested, didn't help)
Finish multiplier | 1.80 | K scaling on clean finishes
Decision: close end | 0.50 | K scaling at fraction 0.5 (razor-thin)
Decision: sweep end | 1.80 | K scaling at fraction 1.0 (every judge-round)
Unanimous fallback | 1.30 | Used for decisions with no parsed scorecard
Split/majority fallback | 0.65 | Used for split/majority decisions with no scorecard
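
One way to picture how these parameters fit together is a single effective-K calculation. This is a sketch under stated assumptions: the field names, the method labels ("finish", "unanimous", "split", "majority"), and the choice to combine the factors by straight multiplication are all hypothetical, and draws are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EloParams:
    start_rating: float = 1500.0
    base_k: float = 64.0
    provisional_k: float = 96.0
    provisional_bouts: int = 5
    title_mult: float = 1.00
    finish_mult: float = 1.80
    decision_close: float = 0.50
    decision_sweep: float = 1.80
    unanimous_fallback: float = 1.30
    split_fallback: float = 0.65


def mov_multiplier(p: EloParams, method: str, share: Optional[float] = None) -> float:
    """Margin-of-victory factor. share is the winner's share of judge-rounds, if parsed."""
    if method == "finish":
        return p.finish_mult
    if share is not None:
        s = min(1.0, max(0.5, share))  # assumption: clamp to the defined range
        return p.decision_close + (s - 0.5) / 0.5 * (p.decision_sweep - p.decision_close)
    # No parsed scorecard: fall back to the flat bands.
    return p.unanimous_fallback if method == "unanimous" else p.split_fallback


def effective_k(p: EloParams, prior_bouts: int, method: str,
                share: Optional[float] = None, title_fight: bool = False) -> float:
    """Assumption: base K, title factor, and MOV factor combine multiplicatively."""
    k = p.provisional_k if prior_bouts < p.provisional_bouts else p.base_k
    title = p.title_mult if title_fight else 1.0
    return k * title * mov_multiplier(p, method, share)


params = EloParams()
veteran_finish = effective_k(params, prior_bouts=10, method="finish")   # 64 × 1.80
newcomer_split = effective_k(params, prior_bouts=2, method="split")     # 96 × 0.65
```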

The current top of the board

To anchor what these numbers look like on real fighters, here is the active pound-for-pound top:

# | Fighter | Division | Elo | Bouts
1 | Jon Jones | Heavyweight | 2269 | 24
2 | Islam Makhachev | Welterweight | 2241 | 18
3 | Charles Oliveira | Lightweight | 2123 | 37
4 | Ciryl Gane | Heavyweight | 2103 | 14
5 | Justin Gaethje | Lightweight | 2082 | 16
6 | Kamaru Usman | Welterweight | 2082 | 19
7 | Alexander Volkanovski | Featherweight | 2072 | 18
8 | Ilia Topuria | Lightweight | 2069 | 10

The full board, with division-by-division leaderboards alongside the pound-for-pound view, is at /rankings/.

What we tried and didn't keep

A title-fight bonus. We tested multiplying K by 1.10, 1.25, and 1.50 for championship fights. None of them helped accuracy or log-loss. The bonus is still in the parameter set at ×1.00 in case a future tune wants to revisit it.

An inactivity decay on Elo. Bleeding a fighter's rating toward 1,500 after a year off backtested at +1.66 percentage points of accuracy. It also penalised every successful comeback the sport has produced. Jon Jones would have walked into the Gane fight having shed 427 points of Elo and then won the bout from a 1,717 rating. We reverted the feature: the average lift was not worth the systematic bias against returning fighters.

An aggressive accuracy-peak MOV. A higher finish multiplier (×2.00) with a lower split multiplier (×0.50) earned 0.06 percentage points more accuracy than what we ship. It hurt log-loss and Brier, so the implied probabilities got worse even though the binary picks edged up. We chose the calibrated peak instead.

What it can't see

Strength of schedule beyond opponent Elo. Elo compresses everything about an opponent into one number. A grinder who beats other wrestlers but loses to strikers carries the same single rating into both kinds of matchup. Head-to-head style data is not in the model.

Weight class transitions. A fighter moving up two weight classes carries their old Elo with them. There is no penalty for the size disadvantage. These cases are rare enough that we have not corrected for them.

Draws and no-contests. Both get scored 0.5 because the source data does not distinguish them. Rare results, small impact.

10-8 and 10-7 rounds. The rounds-won formula assumes every lost round is scored 9. A 10-8 or 10-7 round takes extra points off the total of the fighter who lost it, so the formula slightly under-counts the rounds that fighter won elsewhere on the card. The qualitative ordering is preserved and the rounds-won count is clamped at 0, so the practical impact is small. We accept the approximation rather than try to reverse-engineer the individual rounds from the totals.
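
The size of that approximation is easy to see on a made-up card. The round scores below are hypothetical, chosen only to show the effect; `rounds_won` is the formula from earlier, repeated so the snippet runs on its own.

```python
def rounds_won(score: int, scored_rounds: int = 3) -> int:
    """Rounds won on one card, assuming every lost round scores 9."""
    return max(0, score - 9 * scored_rounds)


# Hypothetical card: A wins round 1 by 10-8, wins round 2 by 10-9, loses round 3 by 9-10.
a_total = 10 + 10 + 9   # 29
b_total = 8 + 9 + 10    # 27

a_rounds = rounds_won(a_total)   # 2, which matches what happened
b_rounds = rounds_won(b_total)   # 0, although B actually won one round

# A 30-26 sweep would give the loser 26 - 27 = -1 without the clamp.
swept_rounds = rounds_won(26)    # 0
```

The winner's count is unaffected here. Only the fighter on the receiving end of the 10-8 round loses a counted round, which is why the ordering of results is preserved.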

The next layer (Elo combined with rolling form, age, and a handful of other features in a gradient-boosted classifier) closes about two-thirds of the remaining gap to the betting market. That is described on the predictor methodology page.