Lord Kelvin said it best: “When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.” Congrats to @BrendanFoody and @mercor_ai on delivering this for AI models.
AI has its PhD and now it’s on the job market.

Introducing the AI Productivity Index (APEX), a benchmark that measures how well we’ve automated the most valuable industries in the world.

Most benchmarks study abstract capabilities. APEX evaluates model performance on real deliverables across law, finance, consulting, and medicine.

The models most capable of doing work today, according to APEX:
🥇 GPT 5
🥈 Grok 4
🥉 Gemini 2.5 Flash

Other findings:
- GPT 5 demonstrates the strongest performance across all 4 domains
- Some cheaper models outperform more expensive models from the same provider (e.g. Gemini 2.5 Flash vs. Gemini 2.5 Pro)
- The best open source model, Qwen (7th), performs only 2% behind Grok 4 overall