Musk's xAI has quietly released Grok 4.1, which tops the LMArena leaderboard with 1483 points and takes the top two spots on the EQ-Bench3 emotional intelligence test. The new model delivers a qualitative leap in creativity, emotional interaction, and collaborative interaction, was preferred by users 64.78% of the time, shows a significantly lower hallucination rate, and is now fully available on the X platform and in the mobile apps.
- This summary was generated by AI from the article's content and is for reference only.
Without warning, Musk's xAI quietly released its latest large model, Grok 4.1. There was no grand launch event and no publicity blitz, much like a master who shuns the spotlight and lets the results speak.
Grok 4.1 is now fully available on the Grok website, the X platform, and in the iOS and Android apps. This seemingly low-key release has started a quiet revolution in the AI space.

Real-world capabilities: not just parameters, but experience
The most surprising thing about Grok 4.1 is not its underlying performance but how it behaves in real-world scenarios. The xAI team emphasized in its announcement that the new model delivers a qualitative leap in creativity, emotional interaction, and collaborative interaction.
The model is significantly better at perceiving subtle intent, its dialog is more natural and fluid, and its overall personality is more coherent. Most notably, these emotional improvements do not come at the expense of the intelligence and reliability of the predecessor model.
Behind this is xAI's further optimization of the model's style, personality, helpfulness, and alignment, built on the same large-scale reinforcement learning infrastructure that underpins Grok 4. Because these dimensions are difficult to assess quantitatively, xAI developed new methods that use frontier agentic reasoning models as reward models, allowing responses to be evaluated and iterated on autonomously at scale.
Real-world data shows that, in comparative evaluations against the previous production model, users preferred Grok 4.1 64.78% of the time. This is not a numbers game played in the lab but the result of real users voting.
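To put the 64.78% preference rate in perspective, it can be converted into an equivalent rating gap. The sketch below assumes the standard Elo logistic formula with a 400-point scale; the function name is hypothetical, and the result is only an illustration, since xAI's preference evaluation is not reported on the LMArena scale.

```python
import math


def elo_gap_from_win_rate(win_rate: float, scale: float = 400.0) -> float:
    """Rating gap implied by a head-to-head win rate.

    Assumes the standard Elo logistic model:
        win_rate = 1 / (1 + 10 ** (-gap / scale))
    solved here for gap.
    """
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return scale * math.log10(win_rate / (1.0 - win_rate))


# Preference rate quoted in the article (Grok 4.1 vs. the previous production model).
PREFERENCE_RATE = 0.6478

gap = elo_gap_from_win_rate(PREFERENCE_RATE)
print(f"Implied rating gap: {gap:.0f} points")
```

Under these assumptions, a 64.78% preference rate corresponds to a gap of a little over 100 rating points between the two models.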

General capability at the top: state-of-the-art scores across the board
On LMArena's Text Arena leaderboard, Grok 4.1's reasoning model topped the overall list with an Elo score of 1483, a full 31 points ahead of the highest-ranked non-xAI model. In leaderboard terms, that is a substantial gap at the top.
The non-reasoning model of Grok 4.1 is not far behind, ranking second with an Elo score of 1465. This means that even without reasoning enabled, Grok 4.1 outperforms every other model running at full strength.
Grok 4 previously ranked only 33rd, so Grok 4.1 represents a dramatic jump. This is not an incremental improvement but a breakthrough.
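The Elo scores above can be translated into expected head-to-head preference rates. This is a minimal sketch assuming the standard Elo logistic formula with a 400-point scale; LMArena's actual fitting procedure may differ, the function name is hypothetical, and the rival's score of 1452 is derived from the 31-point lead quoted above rather than stated directly.

```python
def expected_win_probability(rating_a: float, rating_b: float,
                             scale: float = 400.0) -> float:
    """Probability that model A is preferred over model B (standard Elo model)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Scores quoted in the article.
GROK_41_REASONING = 1483
GROK_41_NON_REASONING = 1465
# Derived: the article says the top non-xAI model trails by 31 points.
TOP_NON_XAI = GROK_41_REASONING - 31

p_vs_rival = expected_win_probability(GROK_41_REASONING, TOP_NON_XAI)
p_vs_non_reasoning = expected_win_probability(GROK_41_REASONING,
                                              GROK_41_NON_REASONING)

print(f"Reasoning vs. top non-xAI model: {p_vs_rival:.1%}")
print(f"Reasoning vs. non-reasoning:     {p_vs_non_reasoning:.1%}")
```

Under these assumptions, a 31-point lead means the reasoning model would be preferred in roughly 54% of direct comparisons with the top non-xAI model.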

The emotional intelligence revolution: reading hearts and minds, not just understanding words
xAI measured Grok 4.1's emotional intelligence on the EQ-Bench3 benchmark. This test, judged by a large language model, specifically assesses active emotional intelligence, including emotional understanding, insight, empathy, and interpersonal skills.
The test consists of 45 challenging role-playing scenarios, most of which involve three turns of pre-written dialog prompts. The results show that Grok 4.1's reasoning and non-reasoning modes took the top two spots on the leaderboard.

This means that Grok 4.1 not only understands what you say, but also senses why you say it, and even captures the unspoken emotions between the lines. This is especially valuable when you need a listener, not just an answerer.

Creative writing: from cold tool to warm collaborator
On the Creative Writing v3 benchmark, Grok 4.1 also demonstrated strong creative writing ability. Across 32 different writing prompts, its reasoning and non-reasoning modes placed second and third, respectively, only slightly behind the earlier GPT 5.1.

This creative ability is not simple template filling; it blends a genuine understanding of context, style, and emotion. Whether you are drafting a passage of a novel or writing marketing copy, Grok 4.1 adds a distinctly human touch while maintaining professional standards.

Reducing hallucinations: a more reliable AI assistant
For an AI assistant in everyday use, accuracy is critical. During Grok 4.1's post-training, xAI focused specifically on reducing factual errors in responses to information-seeking prompts.
Test results show that Grok 4.1 has a significantly lower hallucination rate on a sample of information-seeking prompts drawn from production traffic. Grok 4.1 also performed well on the FActScore benchmark, which contains 500 biography-style questions about different people.

Why is this release so low-key?
Interestingly, in contrast to xAI's usual high-profile style, the release of Grok 4.1 was unusually low-key. This may reflect new thinking from Musk about AI development: technological breakthroughs should win users over through actual performance rather than marketing alone.
At a time when the AI race keeps heating up, xAI has chosen to let the product speak and the user experience be the judge. If anything, this pragmatic attitude underscores its confidence in Grok 4.1's performance.

How to experience Grok 4.1
Grok 4.1 is now fully available:
- Visit Grok's official website
- Use the Grok functionality integrated into the X platform
- Download the iOS or Android app
- Select Grok 4.1 manually in the model selector
In Auto mode, Grok 4.1 is rolled out automatically to give users the best experience. Whether you are a creative professional, a researcher, or an everyday user, you should be able to find a use case that fits.

Technical resources:
- Model card: https://data.x.ai/2025-11-17-grok-4-1-model-card.pdf
- Official blog: https://x.ai/news/grok-4-1

