How, summing (not averaging) to 58 of 1000 possible points (0-100 in each of ten...

NitpickLawyer · 2025-10-26T19:25:51

It's confusing. Each of the 10 tracks is worth 10%, and the total is the sum of the percentages from every track. So in the first table, 10% on math basically means "perfect" math, not 10% of the math track.
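
To make the arithmetic concrete, here is a minimal sketch of that reading of the table. The track names and fractions are made up for illustration and are not taken from the actual results; only the "10 points per track, summed" rule comes from the description above.

```python
# Each of the ten tracks is worth 10 percentage points of the total.
WEIGHT_PER_TRACK = 10.0

# Hypothetical per-track completion fractions (0.0 = nothing, 1.0 = perfect).
# Only three tracks are listed here to keep the example short.
track_fraction = {"math": 1.0, "reading": 0.5, "speed": 0.0}


def track_points(fraction):
    """Convert a within-track fraction into the points shown in the table."""
    return fraction * WEIGHT_PER_TRACK


def total_score(fractions):
    """Sum (not average) the per-track points."""
    return sum(track_points(f) for f in fractions.values())


print(track_points(track_fraction["math"]))  # a perfect math track shows as 10
print(total_score(track_fraction))
```

Under this reading, a table cell of 10 for math is the maximum for that track, and the headline number is just the sum of the cells.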

alexwebb2 · 2025-10-26T19:21:12

0-10 in each domain. It’s a weird table.

jagrsw · 2025-10-27T09:14:25

The simple additive scoring is sus here. It means a model that's perfect on 9/10 axes but scores 0% on Speed (i.e., takes effectively infinite time to produce a result) would be considered "90% AGI".

By this logic, a vast parallel search running on Commodore 64s that produces an answer after BeaverNumber(100) years would be almost AGI, which doesn't pass the sniff test.

A more meaningful metric would be more multiplicative in nature.
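
As a rough illustration of the difference, the sketch below compares the additive score with one possible multiplicative alternative, a geometric mean. The geometric mean is my choice for the example, not something the benchmark proposes, and the function names are hypothetical.

```python
import math


def additive_score(fractions):
    """Equal-weight additive score as a percentage (the scheme criticized above)."""
    return 100.0 * sum(fractions) / len(fractions)


def geometric_score(fractions):
    """Geometric mean as a percentage; a zero on any axis zeroes the total."""
    if any(f <= 0.0 for f in fractions):
        return 0.0
    mean_log = sum(math.log(f) for f in fractions) / len(fractions)
    return 100.0 * math.exp(mean_log)


# Perfect on nine axes, 0% on the tenth (e.g. Speed).
perfect_but_slow = [1.0] * 9 + [0.0]

print(additive_score(perfect_but_slow))   # 90.0
print(geometric_score(perfect_but_slow))  # 0.0
```

With the additive rule, the infinitely slow system still gets 90; with the geometric mean, a zero on any single axis pulls the whole score to zero, which matches the intuition in the comment.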