Mandy's World
I Went Looking for GPT-5 and Found Something Even More Interesting

Exploring LMArena

Mandy Alhorn
July 19, 2025

So yesterday I fell down one of those YouTube rabbit holes that completely derailed my evening (in the best way possible). A video claimed that a leaked model, possibly GPT-5, was hiding in plain sight on LMArena as an "Anonymous Chatbot," and naturally, I had to investigate.

Because when has curiosity ever led me astray?

The YouTube Claims: Anonymous Chatbot Dominating LMArena

The video showed footage of LMArena.ai - this fascinating platform where you can test AI models against each other blindly. You throw a prompt at two anonymous models, pick the better response, and only then find out which AIs you were actually talking to.

According to the video, there's been an "Anonymous Chatbot" absolutely crushing everything else on there, generating:

Complete Space Invaders games from a single prompt
Complex SVG applications that actually work
Code with architectural reasoning and zero bugs

The YouTuber claimed they'd found metadata traces pointing to "O3 Alpha" with OpenAI as the provider and a June 17, 2025 timestamp. Could this be our first taste of GPT-5?

I had to find out for myself.

My First Test: Going Beyond the Code Hype

Most people in the video comments were testing coding tasks, but that felt too obvious. If this really is next-level AI, it should excel at the subtle stuff too, right?

So I decided to test something completely different:

My prompt: "Explain why some people are terrified of change but also deeply unhappy with their current life. Make it simple enough for a 12-year-old to understand, but don't oversimplify the psychology."

Why this prompt? Because I'm genuinely curious about this psychological paradox, and it requires balancing simplicity with nuance. Plus, it's the kind of real-world question that reveals how an AI actually thinks.

First Results: A Clear Winner Emerges

After a few attempts, I finally got two drastically different responses:

Model A: Standard, predictable answer about comfort zones. Technically correct but felt very algorithmic.

Model B: Started with this vivid metaphor: "Imagine you're stuck in a room that's uncomfortable - maybe it's too cold, the chair hurts your back, and there's a weird smell. You're not happy there at all. But outside that room is a hallway that's completely dark."

Then it naturally built on that image, explaining how our brains work without ever feeling like a psychology textbook. The heavy backpack analogy at the end? Chef's kiss.

It wasn't just better writing. It felt like talking to someone who actually understood the psychology and cared about making it genuinely clear.

I voted for Model B, and the reveal was...

Plot Twist #1: It Was Claude, Not GPT-5

Claude 3.7 Sonnet?

That completely flipped my detective work on its head! I went hunting for leaked GPT-5 and instead discovered that one of Anthropic's Claude models, and not even the newest one, had quietly won my vote on LMArena.

But this was too interesting to stop. If there really was a leaked OpenAI model floating around, maybe I just hadn't caught it yet.

Second Test: Slight Variation, Big Discovery

I decided to test again with a slightly modified prompt to see what else might be lurking:

My prompt: "Explain why some people are terrified of starting a new job but also deeply unhappy with their current life. Make it simple enough for an 18-year-old to understand, but don't oversimplify the psychology."

This time I got another fascinating comparison between two "Anonymous Chatbots":

Assistant A: Gave me this incredibly detailed response with vivid metaphors about "gross blankets" and "cold dark pools." Much longer, with a more psychological, almost stream-of-consciousness style that felt distinctly different from typical AI responses.

Assistant B: More structured and organized, with numbered points in a familiar style that felt Claude-like.

Both were high quality, but Assistant A felt notably more... human? Less structured but more emotionally intelligent than typical AI.

The Final Reveal: I Found Both Mystery Models

When the results came in, I finally got the full picture:

BINGO!

"Octopus" (the verbose, metaphor-rich response) and "Qwen3-30b-a3b" (Alibaba's Qwen model) - completely different models both hiding under anonymous labels.

What I Actually Discovered: The Real AI Landscape

So here's what my little investigation uncovered:

The YouTube video was partially right - there ARE exceptional models performing well on LMArena, but they're not necessarily what we expected
"Anonymous Chatbot" isn't one model - it's apparently a label for multiple preview/test models from different companies
The mystery models I actually caught were not a "GPT-5" leak at all - they were different companies' models being tested at the same time
Claude 3.7 Sonnet was genuinely impressive and hiding in plain sight
Nexa AI's Octopus models are specialized for on-device applications - a completely different approach from traditional LLMs
LMArena serves as a testing ground for companies across the AI space, not just the big names we usually hear about

Why This Was More Interesting Than Finding GPT-5

Honestly? This turned out way more fascinating than just finding one leaked model. I accidentally documented:

Multiple cutting-edge AI companies advancing simultaneously, not just the usual suspects
How assumptions can mislead even systematic investigation (I assumed "Octopus" meant OpenAI!)
The diverse AI ecosystem where specialized companies like Nexa AI are innovating in specific niches
How different AI approaches (on-device vs cloud, function calling vs general chat) can produce surprisingly good results
A real-time view of the AI development happening across the industry right now

Plus, there's something genuinely thrilling about this kind of AI detective work. Each test revealed something unexpected, and the systematic approach of varying prompts and comparing responses felt like actual scientific investigation.

Want to Do Your Own AI Detective Work?

If you're curious (and you should be!), here's how to potentially catch these mystery models:

Go to lmarena.ai
Use the Battle mode to test two anonymous models against each other
Create thoughtful prompts - something that requires both creativity and reasoning
Watch for unusual response patterns - longer generation times, distinctive writing styles
Vote honestly - that's how you'll find out which models responded
Try different times of day - the anonymous models aren't always available

Pro tip: Test the same concept with slight variations (like I did with age ranges) to see if you catch different models with different capabilities.
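
If you want to be a bit more systematic about that pro tip, here's a minimal sketch of how the variations could be generated. The template is modeled on my two prompts from this post; the slot names (`fear`, `audience`) and the helper `prompt_variations` are just names I made up for illustration, and you'd still paste each prompt into Battle mode by hand.

```python
from itertools import product

# Template modeled on the two prompts used in this post.
# The slot names "fear" and "audience" are hypothetical, chosen for this sketch.
TEMPLATE = (
    "Explain why some people are terrified of {fear} but also deeply unhappy "
    "with their current life. Make it simple enough for {audience} to "
    "understand, but don't oversimplify the psychology."
)


def prompt_variations(fears, audiences):
    """Return one prompt for every (fear, audience) combination."""
    return [
        TEMPLATE.format(fear=fear, audience=audience)
        for fear, audience in product(fears, audiences)
    ]


variations = prompt_variations(
    fears=["change", "starting a new job"],
    audiences=["a 12-year-old", "an 18-year-old"],
)

# Print a numbered list to paste into Battle mode one prompt at a time.
for number, prompt in enumerate(variations, start=1):
    print(f"{number}. {prompt}")
```

Changing only one slot at a time makes it easier to tell whether a different answer comes from the prompt or from a different model showing up.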

The Real Takeaway

This detective adventure taught me something more valuable than finding a leaked model: the AI landscape is much more diverse and interesting than the headline-grabbing announcements suggest.

While everyone's focused on the race between OpenAI, Google, and Anthropic, companies like Nexa AI are quietly innovating in specialized areas like on-device AI. The exceptional responses I experienced came from completely different approaches to AI development - some focused on general reasoning, others on efficient edge computing.

For anyone keeping track of AI progress: The most interesting developments might not always come from the companies making the biggest announcements. Sometimes you have to dig a little deeper and test things yourself to discover what's actually pushing the boundaries.

Have you noticed any surprisingly good AI responses lately that made you wonder what model you were talking to? I'm even more curious now about what other mystery models might be hiding in plain sight.

P.S. I went looking for GPT-5 and ended up mapping a much more diverse AI landscape than I expected. Sometimes the best discoveries happen when your initial assumptions get completely challenged! ✨

All testing, screenshots, and observations are completely genuine - this was real detective work that anyone can replicate! 😊

What I listened to today:

Not a Vagabond

Jonasclean · Not a Vagabond · Song · 2025

open.spotify.com/track/7jKJFeYBmoxOrQjlNFoCii?si=32ug0XTDTEWNUKU-lykuoA

Mala

Dubplates · Mala · Song · 2024

open.spotify.com/intl-de/track/64jJvn76kowvNf0qwJAD4X?si=69039107c85a4c80

What I liked today:

If you can do the job on a computer, it's time is limited
— David Shapiro ⏩ (@DaveShapi)
7:19 PM • Jul 17, 2025

'Promptin' Ain't Easy' MV
— Round AI Media (@Round_AI_Media)
1:43 PM • Jul 18, 2025

“Promptin’ aint easy” 🎧🎶 That song is so catchy, it's been stuck in my head all day! A masterpiece 🙌🏼

What I learned today:

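Discovered “ExplainableAI” and LIME (Local Interpretable Model-agnostic Explanations) today - a fascinating approach to making AI transparent: it generates artificial "neighbors" around a single prediction by perturbing the input, then fits a simple, interpretable model to the black box's outputs on those neighbors to learn which features matter most for that prediction. The visual explanations (heatmaps, feature importance) make complex AI decisions surprisingly intuitive.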
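
To make the "neighbors" idea concrete for myself, I put together a rough sketch of the core loop in plain Python. This is not the real `lime` library, just my simplified take on the idea: the `black_box` model, the Gaussian noise, the exponential proximity kernel and all parameter values are assumptions I picked for illustration.

```python
import math
import random


def black_box(x):
    """Stand-in for the model being explained (an assumption for this sketch):
    leans heavily on feature 0, slightly on feature 1, ignores feature 2."""
    return 1.0 / (1.0 + math.exp(-(3.0 * x[0] - 0.5 * x[1])))


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def lime_style_weights(predict, x, n_samples=2000, scale=0.5,
                       kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around x.

    Returns one slope per feature; a larger magnitude means the feature
    matters more for the prediction near x.
    """
    rng = random.Random(seed)
    d = len(x) + 1  # one extra column for the intercept
    ata = [[0.0] * d for _ in range(d)]
    aty = [0.0] * d
    for _ in range(n_samples):
        # 1. Create an artificial neighbor by adding Gaussian noise to x.
        offset = [rng.gauss(0.0, scale) for _ in x]
        neighbor = [xi + oi for xi, oi in zip(x, offset)]
        # 2. Ask the black box what it predicts for that neighbor.
        y = predict(neighbor)
        # 3. Weight the neighbor by how close it is to x.
        w = math.exp(-sum(o * o for o in offset) / kernel_width ** 2)
        # 4. Accumulate the weighted least-squares normal equations.
        row = [1.0] + offset
        for i in range(d):
            aty[i] += w * row[i] * y
            for j in range(d):
                ata[i][j] += w * row[i] * row[j]
    return solve(ata, aty)[1:]  # drop the intercept


x = [0.2, -0.1, 1.0]
weights = lime_style_weights(black_box, x)
ranking = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
for i in ranking:
    print(f"feature {i}: {weights[i]:+.3f}")
```

The slopes of the little linear model play the role of the feature-importance bars: feature 0 should come out on top, and the feature the stand-in model ignores should land near zero.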

That’s it for today! ☺️

Disclaimer:

This blog reflects my personal learning journey and experiments with technology. These are my own experiences and observations as I explore the fascinating world of tech and AI.

Developed with research, image generation and writing assistance using AI.
