
On Sunday, a team of nine researchers at Sina Weibo, the Chinese social media giant better known for its microblogging platform than for cutting-edge artificial intelligence, quietly posted a 14-page technical report to arXiv that sent shockwaves through the AI research community. Their claim: a language model with just 3 billion parameters can match or exceed the reasoning performance of flagship systems from Google DeepMind, OpenAI, Anthropic, and DeepSeek that are hundreds of times larger.

The model, called VibeThinker-3B, scored 94.3 on AIME 2026 (the American Invitational Mathematics Examination, one of the most demanding standardized math competitions in the world). That figure places it alongside DeepSeek V3.2, a model with 671 billion parameters, and ahead of Gemini 3 Pro, Google's high-performance flagship reasoning system, which scored 91.7. With a test-time scaling technique the team calls Claim-Level Reliability Assessment, the score climbs to 97.1, edging past virtually every system in the public record.

Within hours of publication, the paper had drawn 62 upvotes on Hugging Face's daily papers feed, the model repository had accumulated 130 likes, and the GitHub repository had reached 685 stars. But the reaction on social media was not uniformly celebratory. It was, in many cases, deeply skeptical.

"WHAT THE HELL is happening in AI?" wrote the user @orcus108 on X, in a post that accumulated over 161,000 views. "A 3B parameter model just put up coding benchmark scores in the same league as Claude Opus 4.5… I genuinely don't know if this is a breakthrough or if the benchmarks are broken."

That tension, between genuine scientific advancement and the growing suspicion that AI benchmarks have become gameable to the point of meaninglessness, sits at the heart of the VibeThinker-3B story.
And the answer matters enormously, not just for academic bragging rights, but for the multibillion-dollar question of whether the AI industry's relentless push toward ever-larger models is the only path to intelligence.

Benchmark scores that defy the scaling laws of modern AI

The results reported in the technical report are, by any conventional standard, extraordinary. On the mathematics side, VibeThinker-3B achieved 91.4 on AIME 2025, 94.3 on AIME 2026, 89.3 on HMMT 2025 (the Harvard-MIT Mathematics Tournament), 93.8 on BruMO 2025 (the Brown University Math Olympiad), and 76.4 on IMO-AnswerBench, a benchmark comprising 400 problems at the level of the International Mathematical Olympiad. In coding, it posted an 80.2 Pass@1 on LiveCodeBench v6, a benchmark designed to test executable code generation, and achieved a 96.1 percent acceptance rate on unseen LeetCode weekly and biweekly contests from late April through late May 2026. On instruction following, it scored 93.4 on IFEval.

To put the parameter disparity in perspective: DeepSeek V3.2 has 671 billion parameters, roughly 224 times the size of VibeThinker-3B. GLM-5, from Zhipu AI, has 744 billion parameters. Kimi K2.5, from Moonshot AI, exceeds 1 trillion. VibeThinker-3B's 3 billion parameters could run on a consumer laptop.

The researchers frame this result not as an anomaly but as evidence for a broader theoretical claim. They introduce what they call the "Parametric Compression-Coverage Hypothesis," which argues that different types of AI capability have fundamentally different relationships to model size. Verifiable reasoning (the kind tested by math competitions and coding challenges, where answers can be definitively checked) is what the paper calls a "parameter-dense" capability: one that can be compressed into a compact core. Open-domain knowledge, by contrast, is "parameter-expansive," requiring broad coverage across facts, concepts, and edge cases that inherently demands more parameters.
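As a rough check on the size comparison above, the sketch below recomputes the parameter ratios and estimates the memory needed just to hold each model's weights. The parameter counts are the ones quoted in the article (Kimi K2.5 is entered as exactly 1 trillion, although the article only says it exceeds that). The figure of 2 bytes per parameter assumes 16-bit weights; that is an assumption, not something the report states, and the estimate ignores activations and the KV cache.

```python
# Parameter counts as quoted in the article, in billions.
# Kimi K2.5 is only described as exceeding 1 trillion, so 1000 is a lower bound.
PARAMS_BILLIONS = {
    "VibeThinker-3B": 3,
    "DeepSeek V3.2": 671,
    "GLM-5": 744,
    "Kimi K2.5": 1000,
}

BYTES_PER_PARAM = 2  # assumption: 16-bit (fp16/bf16) weights


def size_ratio(model: str, baseline: str = "VibeThinker-3B") -> float:
    """How many times larger `model` is than `baseline`, by parameter count."""
    return PARAMS_BILLIONS[model] / PARAMS_BILLIONS[baseline]


def weight_memory_gb(model: str) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    # (billions of params * 1e9) * bytes per param / 1e9 bytes per GB
    return PARAMS_BILLIONS[model] * BYTES_PER_PARAM


if __name__ == "__main__":
    for name in PARAMS_BILLIONS:
        print(f"{name}: {size_ratio(name):.0f}x baseline, "
              f"~{weight_memory_gb(name):.0f} GB of weights")
```

Under that assumption the 3B model's weights occupy about 6 GB, which is consistent with the article's remark about consumer laptops, while the 671B model's weights alone come to well over a terabyte.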
The paper acknowledges this distinction directly. On GPQA-Diamond, a graduate-level science knowledge benchmark, VibeThinker-3B scored just 70.2, well behind the 91.9 achieved by Gemini 3 Pro and the 87.0 scored by Claude Opus 4.5. The authors write that this gap "is consistent with our claim rather than a contradiction to it: the main finding is not that a 3B model has fully replaced leading general-purpose models, but that a small model can reach first-tier performance on many verifiable reasoning tasks."

Inside the four-stage training pipeline that powers a tiny reasoning engine

VibeThinker-3B is not built from scratch. It is post-trained on top of Qwen2.5-Coder-3B, a compact foundation model from Alibaba's Qwen team, through what the Weibo AI researchers call the "Spectrum-to-Signal Principle," a multi-stage pipeline first introduced in the team's earlier VibeThinker-1.5B work in November 2025.

The training unfolds in four major phases. The first is a two-stage supervised fine-tuning process that uses curriculum learning: the model first trains on a broad mixture of math, code, STEM reasoning, general dialogue, and instruction-following data, then shifts to a curated subset of harder, longer-horizon reasoning problems. In the second of these two stages, samples with reasoning traces shorter than 5,000 tokens are discarded, and problems that VibeThinker-1.5B can solve more than 75 percent of the time are filtered out, forcing the model to focus on genuinely difficult challenges.

The second phase applies reinforcement learning across multiple domains (mathematics, code, and STEM) using the team's MaxEnt-Guided Policy Optimization algorithm, or MGPO, which prioritizes training on problems at the model's current capability boundary rather than problems it already solves easily or finds impossible. Notably, the team found that a strategy that worked well at the 1.5B scale (progressively expanding the context window during RL training) actually hurt performance at 3B.
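The stage-two data filter and the idea of prioritizing problems at the capability boundary can be sketched in a few lines. The two thresholds below come from the article; everything else is hypothetical, including the `Sample` record, its field names, and both function names. In particular, `boundary_weight` is not the MGPO formula, which the article does not give. It is only one simple way to express "most weight where the model succeeds about half the time," using binary entropy.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical record for one candidate training problem."""
    problem_id: str
    trace_tokens: int             # length of the reference reasoning trace
    small_model_pass_rate: float  # fraction of attempts VibeThinker-1.5B solves


MIN_TRACE_TOKENS = 5_000  # from the article: shorter traces are discarded
MAX_PASS_RATE = 0.75      # from the article: easier problems are filtered out


def keep_for_stage_two(sample: Sample) -> bool:
    """Keep long traces on problems the 1.5B model does not solve too often."""
    return (sample.trace_tokens >= MIN_TRACE_TOKENS
            and sample.small_model_pass_rate <= MAX_PASS_RATE)


def boundary_weight(pass_rate: float) -> float:
    """Illustrative weight (an assumption, not the MGPO formula).

    Binary entropy in bits: 1.0 when the model solves a problem half the
    time, 0.0 when it always or never solves it.
    """
    if pass_rate <= 0.0 or pass_rate >= 1.0:
        return 0.0
    return -(pass_rate * math.log2(pass_rate)
             + (1.0 - pass_rate) * math.log2(1.0 - pass_rate))


if __name__ == "__main__":
    pool = [
        Sample("short-trace", 1_200, 0.10),
        Sample("too-easy", 9_000, 0.90),
        Sample("hard-and-long", 12_000, 0.30),
    ]
    print([s.problem_id for s in pool if keep_for_stage_two(s)])
    print([round(boundary_weight(p), 3) for p in (0.0, 0.25, 0.5, 0.9, 1.0)])
```

A weight of this shape gives no training signal to problems that are always or never solved, which matches the article's description of skipping what the model "already solves easily or finds impossible."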
The researchers hypothesize that, at the 3B scale, the stronger starting checkpoint meant that truncating reasoning traces during the context-window warm-up was no longer removing noise but disrupting valid reasoning patterns. The solution was to train with a single 64,000-token context window throughout.

Within the math RL phase, the team also introduces what it calls "Long2Short Math RL," a secondary optimization stage that redistributes rewards to favor shorter correct solutions over longer ones, reducing verbosity without sacrificing accuracy. The technique uses a zero-sum reward redistribution that avoids biasing the overall reward signal while nudging the model toward more efficient reasoning.

The third phase extracts high-quality reasoning trajectories from the RL-trained checkpoints and distills them back into a unified model through supervised fine-tuning. The team uses
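The zero-sum redistribution described for Long2Short Math RL can be illustrated with a small sketch. The article does not give the actual formula, so the scheme below is an assumption: the function name, the `alpha` parameter, and the normalization by the largest deviation are all hypothetical. It only demonstrates the stated property that shorter correct solutions gain reward, longer correct ones lose it, and the total reward across the group stays the same.

```python
from typing import List


def long2short_rewards(correct: List[bool], lengths: List[int],
                       alpha: float = 0.2) -> List[float]:
    """Hypothetical length-aware rewards for one group of rollouts.

    Base reward is 1.0 for a correct rollout and 0.0 otherwise. Correct
    rollouts then get an adjustment that is positive when they are shorter
    than the mean correct length and negative when they are longer. The
    adjustments sum to zero, so the group's total reward is unchanged.
    """
    rewards = [1.0 if ok else 0.0 for ok in correct]
    correct_lengths = [n for ok, n in zip(correct, lengths) if ok]
    if len(correct_lengths) < 2:
        return rewards  # nothing to redistribute
    mean_len = sum(correct_lengths) / len(correct_lengths)
    spread = max(abs(n - mean_len) for n in correct_lengths)
    if spread == 0:
        return rewards  # all correct rollouts have the same length
    for i, (ok, n) in enumerate(zip(correct, lengths)):
        if ok:
            # Deviations from the mean sum to zero over the correct rollouts.
            rewards[i] += alpha * (mean_len - n) / spread
    return rewards


if __name__ == "__main__":
    group_correct = [True, True, False, True]
    group_lengths = [1000, 3000, 500, 2000]
    print(long2short_rewards(group_correct, group_lengths))
```

Because the adjustment is bounded by `alpha`, keeping `alpha` below 1 ensures that any correct rollout still scores higher than any incorrect one in this sketch, so brevity is never rewarded over accuracy.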

© 2025 All rights reserved to Handelsblatt