Attacking Hard Open Problems
From Axiom's Mathematical Discovery Team
Discovery in Today’s AI4Math Landscape
Mathematical discovery is the art of finding what doesn't yet exist – the treasure hunt for the exceptional graph, the perfect matrix, the function that solves a century-old problem. A construction shows what's possible; a proof shows what's necessary. They're two sides of mathematical progress.
In our previous blog, Axiom laid out our vision for an AI mathematician that brings unprecedented rigor via verification. Today, we introduce the complementary effort from our discovery team: solving specialized open problems by generating novel mathematical constructions, translating problems into solutions while harvesting intuitions along the way, and building mathematical world models trained on special classes of mathematical objects. By finding hidden patterns and synthesizing the observations into intuitions, we eventually create the constructions that make proofs possible. A discovery.
The transformation is real, but it's not what you think
The way we do mathematics is changing. Informal large language models can be a good tool to streamline literature searches, explain complex concepts, and even sharpen one’s mathematical arguments. At the same time, we at Axiom are building a prover system empowered by reinforcement learning and formal verification that automates large portions of proofs, handling technical lemmas that once took hours of a mathematician's time. This synergy marks a true and powerful revolution in the way mathematical discovery is done.
Yet it’s still day zero. “Math AGI Achieved” headlines overlook one fact: we've been here before. When symbolic notation emerged in the 17th century, it revolutionized mathematical thinking. The arrival of computers turned once-promising broad research areas – such as calculating precise approximations of pi or solving specialized systems by hand – into niche pursuits between mathematics and computer science. Stunningly, the four-color theorem and Kepler's conjecture became provable because we could verify millions of cases. Every time, naysayers claimed the subject would become obsolete. Every time, they were wrong.
Will chatbots "doom" mathematics? Are we headed toward a future where machines turn us into passive spectators of LLMs’ magic, amateur solvers of AI brainteasers, or data labellers? Some advocates of large language models seem to believe so.
At Axiom, we see a richer future, demanding a diverse approach that’s not one-size-fits-all. Our prover team builds rigorous verification systems. Today, we're introducing our discovery branch: a new kind of science that generates novel mathematical objects, and connects them back to proofs.
Why AI for math discovery differs from everything else
Thomas Kuhn's profound influence runs through every text on the history of science. In 1962, he distinguished between "normal science" – the steady, incremental progress built on previous knowledge – and major paradigm shifts that completely change the rules. Mathematics thrives on these revolutionary changes.
Consider this: 16th-century mathematicians found formulas for the roots of cubic and quartic equations, showcasing remarkable symmetry. Then Galois appeared and decisively demonstrated that no such formulas exist for degree five and higher. But he didn't just show a negative result. He introduced transformations that preserve the relations among an equation's roots – planting the seeds of group theory. More recently, Grothendieck didn't just contribute to algebraic geometry – he revolutionized it entirely with schemes, fundamentally transforming the field.
These works are more than mere refinements. They opened new universes of mathematics.
Mathematical objects are purely abstract, with no physical reason to favor one over another. Despite this, mathematicians have a remarkable talent for choosing the right abstractions – those that are both beautiful and occasionally useful. Eugene Wigner called it "the unreasonable effectiveness of mathematics," where these abstract toys keep showing up as the exact tools physics requires to describe reality.
How does this magic happen? Through discovery. Not literature searching. Discovery means conjuring genuinely new mathematical objects and exploring their potential.
Why LLMs won't discover the next Galois theory, and that's okay
Large language models are pattern synthesis machines. They've read everything, and they're excellent at remixing existing knowledge. Need to adapt a proof technique from topology to number theory? They'll suggest three methods. Want a summary of current research on elliptic curves? They'll write you a survey paper.
But here's their Achilles' heel: they're trained to synthesize, not to leap. Their extensive pre-training on existing text and their autoregressive generation mode bias them toward interpolating within the known distribution. They're the ultimate "normal science" accelerators. Paradigm shifts require out-of-distribution thinking – exactly what transformers struggle with when they're pre-trained on everything humanity has written.
Provers, an important part of our system, can sometimes face the opposite challenge. As AI mathematicians, their personalities are too uptight. They never hallucinate, their logic is airtight, but they require a level of rigor that makes mathematical discovery feel like drafting legal contracts. Every statement must be precise down to the last quantifier. That's great for verification but disastrous for exploration, where you often prefer a "roughly correct" wild idea over a precisely formulated, mundane one.
By combining discovery with formal mathematics, we enable the model to think outside the box. Let’s add some mad scientist vibes.
Our Specialized Toolkits for Specialized Problems
Axiom’s discovery team takes a completely different approach from the way large language models do math. Instead of training on everything, we focus on specialized models for specific problem areas. Instead of natural language, we use precise symbolic representations (but without the pedantry of a proof assistant). The goal isn't to reproduce existing knowledge – it's to create solutions for problems we don't yet know how to solve.
Here are the three branches of our work at Axiom:
Generative models (PatternBoost): When good examples have something in common
Many mathematical problems are really optimization or constraint satisfaction puzzles: maximize some quantity under conditions, or find exceptional objects satisfying many criteria simultaneously. Find the largest graph with no four-cycle. Build Hadamard matrices – all 1s and -1s – where any two columns agree in exactly half their entries, perfectly balanced. These aren't proofs – they're constructions. And the search space explodes quickly.
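The balanced-columns condition is easy to state in code. Below is a small illustrative sketch (the names `sylvester` and `is_hadamard` are ours, not from any library): Sylvester's classical doubling construction produces Hadamard matrices of order 2^k, and the checker verifies that any two distinct columns agree in exactly half their entries, i.e. are orthogonal.

```python
import itertools

def sylvester(k):
    # Sylvester's doubling: H_{2n} = [[H, H], [H, -H]], starting from [[1]].
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

def is_hadamard(H):
    # All entries +/-1, and any two distinct columns agree in exactly half
    # their entries -- equivalently, their dot product is zero.
    n = len(H)
    if any(abs(x) != 1 for row in H for x in row):
        return False
    return all(
        sum(H[r][i] * H[r][j] for r in range(n)) == 0
        for i, j in itertools.combinations(range(n), 2)
    )
```

For example, `is_hadamard(sylvester(3))` confirms the property for the order-8 Sylvester matrix.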
We developed PatternBoost to attack these problems. The idea is simple but powerful: generate lots of random solutions, keep the best ones, train a model to recognize what makes them good, and then use that model to generate better solutions. Repeat and break records.
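The loop above can be sketched in a few dozen lines. This is a toy stand-in, not the actual PatternBoost code: the transformer is replaced by per-edge inclusion probabilities, and the target is a small four-cycle-free graph on 8 vertices.

```python
import itertools
import random

def c4_violations(edges, n):
    # A four-cycle exists iff some vertex pair has >= 2 common neighbors;
    # we count the violating pairs (zero means the graph is C4-free).
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(
        1
        for u, v in itertools.combinations(range(n), 2)
        if len(adj[u] & adj[v]) >= 2
    )

def patternboost_toy(n=8, pop=300, elite=30, iters=25, seed=0):
    rng = random.Random(seed)
    slots = list(itertools.combinations(range(n), 2))
    probs = [0.3] * len(slots)  # the stand-in "model": per-edge probabilities
    best, best_score = [], -1
    for _ in range(iters):
        scored = []
        for _ in range(pop):
            # Generate: sample a candidate graph from the current model.
            cand = [e for e, p in zip(slots, probs) if rng.random() < p]
            bad = c4_violations(cand, n)
            scored.append((len(cand) - 5 * bad, cand))
            if bad == 0 and len(cand) > best_score:
                best_score, best = len(cand), cand
        # Filter: keep the best solutions found so far.
        scored.sort(key=lambda t: t[0], reverse=True)
        top = [set(c) for _, c in scored[:elite]]
        # "Retrain": how often does each edge appear among the elites?
        probs = [0.05 + 0.9 * sum(e in t for t in top) / elite for e in slots]
    return best, best_score
```

The real system trains a transformer on the elite solutions instead of estimating edge frequencies, but the generate–filter–learn–regenerate rhythm is the same.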
We found counterexamples this way that disproved a 30-year-old conjecture in graph theory. Since then, by applying PatternBoost, we've generated hard-to-compute objects in high-energy physics – collision events with many jets – that physicists struggle to simulate. We've also built special triangulations of four-dimensional polytopes for string theorists classifying Calabi-Yau manifolds.
The magic ingredient of these approaches is that good solutions really do have "something in common." Even when we can't articulate what that is, a model can learn it.
Translation models (Int2Int): From problems to solutions, and deriving intuitions
For many mathematical problems, no general solution method exists. We can solve particular instances – the easy cases, the small-sized examples – but each new problem requires ingenuity and trial-and-error. What if we could train models to translate problems directly into solutions?
The approach is to train on problem-solution pairs. Sometimes we work forward from problems we already know how to solve. Often we work backward – pick a random solution, then construct a problem it solves, a task that's frequently simpler than the reverse. Fine-tune on the special cases where we have techniques.
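Backward generation is easiest to see on a deliberately simple stand-in task (illustrative only; the real pipelines work with far richer objects): sample the solution first, then build a problem it solves. For linear systems, computing b = Ax is trivial even though solving for x is not.

```python
import random

def backward_pair(n, rng):
    # Backward data generation: sample the *solution* x first, then a
    # random matrix A, and compute b = A x. The pair ((A, b) -> x) can
    # train a model to solve linear systems without ever running a solver.
    x = [rng.randint(-5, 5) for _ in range(n)]
    A = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
    b = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return (A, b), x
```

One caveat of the backward direction: if A happens to be singular, x need not be the unique solution of (A, b), so real pipelines filter or normalize the generated pairs.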
We applied this to a 130-year-old challenge: discovering Lyapunov functions for dynamical systems. Since 1892, we've known these functions control global stability – they guarantee that solutions stay bounded rather than diverging to infinity. But no general method exists to find them for arbitrary systems. Applying this translation method, we discovered new Lyapunov functions and beat state-of-the-art methods by a wide margin. The work automated mathematical creativity in a domain where we've been stuck for over a century.
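For readers unfamiliar with the object being discovered: for a system \(\dot x = f(x)\) with an equilibrium at the origin, a function \(V\) certifies stability when it is positive away from the equilibrium and non-increasing along trajectories:

```latex
V(0) = 0, \qquad V(x) > 0 \ \text{for } x \neq 0, \qquad
\dot V(x) = \nabla V(x) \cdot f(x) \le 0 .
```

Verifying these conditions for a candidate \(V\) is routine; the hard, creative step – the one being automated – is producing the candidate at all.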
These translation tools can also drive discovery in a different way: sometimes they help find intuitions. Train multiple models with slightly different parameters on the same problem and compare their performance; the comparison may uncover unexpected properties of the underlying problem.
We’ve done this with bootstrapped amplitudes – fundamental objects in theoretical physics. By training models to predict one half of an amplitude's symbol from the other, we found surprising regularities in their performance that pointed to new physical intuitions.
We've also applied this to predicting the Frobenius coefficients of elliptic curves. Experiments point to yet unknown relations between the parity of successive coefficients. These experiments are easy to automate – we've previously released an open-source framework called Int2Int for exactly this purpose.
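For small primes, the coefficient a_p is cheap to compute by direct point counting, which makes such experiments easy to reproduce (a naive sketch; `frobenius_ap` is our own name, and the method is only practical for small p):

```python
def frobenius_ap(a, b, p):
    # a_p = p + 1 - #E(F_p) for the curve y^2 = x^3 + a*x + b over F_p
    # (assumed nonsingular mod p), counted by brute force.
    squares = {}  # histogram: value -> number of y with y^2 = value (mod p)
    for y in range(p):
        s = y * y % p
        squares[s] = squares.get(s, 0) + 1
    count = 1  # the point at infinity
    for x in range(p):
        count += squares.get((x * x * x + a * x + b) % p, 0)
    return p + 1 - count
```

For the curve y² = x³ − x, this gives a_5 = −2 and a_7 = 0, and the values respect the Hasse bound |a_p| ≤ 2√p.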
What's Next for Axiom: Mathematical world models
Pre-training is a common technique for building language models. Models are first trained, unsupervised, on vast amounts of text. In this initial phase, they acquire language proficiency and learn general information about the world. Then, the model is fine-tuned on specific tasks. It is generally accepted that pre-training is key to emergent properties of LLMs: models become capable of performing tasks they would not have learned if trained on the final tasks alone. This is a consequence of models learning deep representations of language during the pre-training phase.
What if we did the same for specific families of mathematical objects? We call this approach "mathematical world models." Train a model on multiple tasks that all take the same mathematical object as input – say, a graph. Use a shared encoder to build a common representation, but task-specific decoders for different outputs. Unlike language model pre-training, this would be supervised: the model learns from labeled examples across many tasks simultaneously.
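The intended dataflow – one shared encoder, several task-specific decoders reading the same embedding – can be sketched as follows. This is an untrained toy in plain Python standing in for a real deep-learning stack; all class and task names are illustrative.

```python
import random

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class SharedEncoder:
    # Toy stand-in for a learned encoder: maps a graph's flattened
    # adjacency matrix to a d-dimensional shared embedding.
    def __init__(self, n, d, rng):
        self.W = [[rng.gauss(0, 0.1) for _ in range(n * n)] for _ in range(d)]

    def __call__(self, adj):
        flat = [float(x) for row in adj for x in row]
        return [max(0.0, h) for h in matvec(self.W, flat)]  # ReLU

class TaskDecoder:
    # One linear head per task; every head reads the same embedding.
    def __init__(self, d, rng):
        self.w = [rng.gauss(0, 0.1) for _ in range(d)]

    def __call__(self, z):
        return sum(wi * zi for wi, zi in zip(self.w, z))

rng = random.Random(0)
enc = SharedEncoder(n=5, d=16, rng=rng)
heads = {"num_triangles": TaskDecoder(16, rng),
         "is_connected": TaskDecoder(16, rng)}
```

Training all heads jointly against labeled examples is what would push the shared embedding toward a genuinely universal representation of the graph.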
The goal is to learn a universal representation of the mathematical object – say, characteristics of the graph – that makes it easier to learn harder tasks later. These encoders could be built for different mathematical objects and then reused and composed like building blocks.
Understanding what representations these models learn may provide new insights about the mathematical objects themselves. A special bonus.
The real revolution: proving + conjecturing + discovery
The future of mathematics lies not in choosing between discovery and verification, but in their convergence. Mathematical bounds often need both a construction (what's possible) and a proof (what's impossible). Constructions aren't just endpoints – they're the building blocks that guide proof strategies, the counterexamples that stress-test lemmas, and sometimes they are the proof itself.
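A classic instance, for the four-cycle-free graph problem mentioned earlier (constants stated from memory and worth double-checking): a counting argument of Kővári, Sós and Turán gives the upper bound, while polarity-graph constructions of Erdős and Rényi nearly match it from below:

```latex
\mathrm{ex}(n;\,C_4) \;\le\; \tfrac{1}{4}\bigl(1+\sqrt{4n-3}\bigr)\,n,
\qquad
\mathrm{ex}(q^2+q+1;\,C_4) \;\ge\; \tfrac{1}{2}\,q\,(q+1)^2 \quad (q \text{ a prime power}).
```

Both sides grow like \(\tfrac{1}{2}n^{3/2}\): the proof without the construction, or the construction without the proof, tells only half the story.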
This is why Axiom's teams form a unified system. An informal-formal hybrid prover verifies details, explores casework, and handles logical deduction. A conjecturer proposes new problems but also formulates backward and forward conjectures essential for complex multi-step proofs. Discovery tools generate the concrete objects and patterns that fuel mathematical insight. Together, they create a feedback loop where wild creativity meets mathematical rigor.
The change isn't about replacing human mathematicians – it's about amplification. We're building AI that generates and derives, that combines linguistic and symbolic reasoning, that utilizes the strength of machines while maintaining the precision mathematics demands. We need both the careful derivation steps of proof and the wild leaps of creation.
The universe will always reward genuinely new ideas. Join Axiom to build the system to discover them.
Building The Reasoning Engine at Axiom
How hierarchical planning, verification, and self-play converge to mathematical superintelligence
For centuries, mathematics has been humanity's most powerful sandbox – a place where we construct logical frameworks to understand reality's most complex systems. Why do grid cells fire in a hexagonal pattern? Why do quantum energy levels align with the spacing of primes?
Yet mathematical progress has always been shackled by a cruel bottleneck: the scarcity of extraordinary minds. When Évariste Galois revolutionized algebra in a fevered night before his fatal duel at twenty, he left behind ideas so far ahead of their time that decades passed before his contemporaries could grasp them. Srinivasa Ramanujan, before he succumbed to starvation and ill health, channeled thousands of formulas from his dreams into notebooks – results so profound that mathematicians spent nearly a century proving what he had intuited.
Even when breakthroughs occur as mathematicians' legacy, the extreme fragmentation of modern mathematics means experts in different subfields often cannot understand each other's work, causing vital connections between domains to remain hidden for decades. This combination of scarce genius and siloed knowledge creates an extraordinarily long pipeline from discovery to application – fundamental theorems discovered today might take generations before their full implications reshape technology and society – a delay that even Hardy, in A Mathematician's Apology, failed to anticipate.
We're on a moonshot mission to change that. Axiom is building a reasoning engine capable of mathematical discoveries at scales and speed previously unimaginable – an AI mathematician.
The Timing
Three trends are colliding for the first time in history.
First, neural networks have stepped beyond pattern matching into scalable reasoning, improving as compute, model size, and data grow.
Second, mathematical formalization has come of age through languages like Lean: by the Curry–Howard correspondence, proofs become executable programs, and programming languages are no longer just tools for producing outputs but instruments for certifying properties of abstract objects.
And lastly, LLMs have crossed a critical threshold in code generation, reliably producing high-quality code across many languages – including formal specification languages – and serving as a strong prior over the otherwise infinite action space of mathematics.
This synergy creates an unprecedented opportunity: reasoning engines that can conjecture and prove an endless stream of theorems with zero human involvement.
The Convergence
Autoformalization is the Natural Language Compiler
Our data factory
Imagine you were a programmer in the 1950s. Your day-to-day was punching machine code into cards.
In the 1970s, you were wrestling with FORTRAN. By the 1990s, maybe C++. Today? You're basically just talking to the computer in English. Turing's childhood dream is now your reality with coding agents.
Each generation got to think a little less about the machine and a little more about the fun problem they were actually trying to solve.
Modern compilers are one-way pipelines. They take Python and transform it down the stack through multiple representations until it becomes machine code. There's some upward flow – as you type in your IDE, a partial compilation happens via the Language Server Protocol, feeding information back up to create those squiggly lines and suggestions. But compilers never go fully back up the abstraction ladder. Disassembly exists, but it doesn't reconstruct your original high-level intent.
Mathematics needs something compilers never achieved: a true bidirectional cycle. For thousands of years, mathematicians have thought in high-level, intuitive leaps, not formal logic. Yet math is undergoing a modernization. Proofs now span hundreds of pages and are often riddled with hidden errors. Time and again, when a proof stubbornly resists being formally proven, the informal human source turns out to be wrong – just recently, mistakes were fixed during the formalization effort for Fermat's Last Theorem. The bottom formal layer catches what high-level intuition misses.
Meanwhile, autoformalization – the process of converting natural-language proofs to Lean code – is a form of hierarchical planning, bridging the abstraction layers.
Going down the stack: Autoformalization translates intuitive mathematical reasoning into formal proof – the compiler's traditional direction.
Going up the stack: Autoinformalization translates formal proofs back into human intuition – something compilers never truly do.
When combined, these create an interactive prover in natural language, freeing mathematicians to explore dozens of strategies and rapidly prototype ideas while the machine handles formalization.
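As a minimal illustration of the down-the-stack direction (a toy example, not Axiom's system; exact tactic availability depends on the Lean toolchain and Mathlib version): the informal statement "the sum of two even numbers is even" and one possible Lean 4 formalization.

```lean
-- Informal: the sum of two even numbers is even.
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k := by
  obtain ⟨m, hm⟩ := ha
  obtain ⟨n, hn⟩ := hb
  exact ⟨m + n, by omega⟩
```

The formal statement pins down every quantifier that the English sentence leaves implicit – which is exactly what makes the round trip between the two layers valuable.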
But here's the most powerful part: machine discoveries at the formal level feed back up. The machine finds patterns, lemmas, and connections in the formal space that humans never intuited, then surfaces them as new high-level insights. The mathematician's intuition becomes augmented by discoveries happening in a space they couldn't naturally explore.
The compiler revolution took decades; the mathematical compiler revolution is happening now.
Formal Verification Guides Mathematical World Modeling
Our algorithmic core
You are a gold prospector in 1849. Everyone brings you shiny rocks claiming they've struck it rich.
Experienced prospectors examine them: "Looks like gold to me."
But when confronted with exotic ores, even experts disagree. Their pattern matching fails on things they've never seen.
Then someone brings an assayer's scale. The metal either dissolves in acid or it doesn't. Binary truth.
When you write a proof, it's either correct or it's not. Formal verifiers like Lean provide perfect ground truth, while model judges are pattern matchers that fail when pushed to assess genuinely novel proofs. From an engineering angle, verification gives us efficiently scalable rewards for learning.
And here's the philosophical perspective on why we need formal verification: our self-consistent, observer-supporting universe follows rules that can be captured mathematically – from laws of physics to probability theory. Mathematics is the consistent language of our consistent universe, and formal languages like Lean let us consistently operate in the mathematical world model.
We are training systems in mathematics as reality's minimal simulation – by learning to navigate the world of mathematics one grounded step at a time, the hope is that the AI has learned some fundamental rules that our reality has to follow. Video generation models learn physics too. Sometimes one ponders … where do abstract reasoning and spatial reasoning join?
Conjecturing-Proving Loop Realizes Self-Improving AI
Our discovery engine
We can now test whether the gold is real, but finding new veins is harder: working in concert with verification, we enter the chapter of scientific discovery. Imagine you're in the middle of an ocean. Sailing towards new lands, of course, you start daydreaming about mathematics:
Your map is the Knowledge Base – showing where you've been. The entire corpus of mathematics indexed into a knowledge graph: definitions, theorems, and proofs. Formalized mathematics as a searchable, Internet-scale dataset.
Your ship is the Conjecturer – navigating uncharted territories. It spots distant landmasses through fog: "something valuable three days west." Built for open-ended exploration beyond known results, it samples out of distribution and generalizes with leaps guided by intrinsic motivations.
But when you spot an unknown island on the horizon, how do you know if it's India or the West Indies? The shape looks right, the distance seems plausible, but an educated guess isn't certainty. You ask the experienced captain for wisdom that you trust – that is the Prover. Successful proofs extend the knowledge base. Failed attempts provide signals for improving both the Conjecturer and Prover. While formal verification turns "might be true" into "is true," counterexample construction shows "is false." Both grow the library.
The loop is self-reinforcing. More verified theorems mean a richer knowledge base. A richer knowledge base enables more sophisticated conjectures. More proof attempts (successful and failed) train better models. Better models generate more interesting conjectures and find proofs faster.
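The loop can be caricatured in a few lines of code. In this toy (entirely our own, for illustration), conjectures are bounded divisibility claims of the form "(n^a − n) is divisible by b for all n < 200", and the "prover" is exhaustive checking – which genuinely decides such bounded statements.

```python
import random

def holds(a, b, N=200):
    # "Prover": decides the bounded claim "(n**a - n) % b == 0 for all
    # n < N" by exhaustion -- a real decision procedure for this fragment.
    return all((n ** a - n) % b == 0 for n in range(N))

def conjecture_prove_loop(rounds=200, seed=0):
    rng = random.Random(seed)
    knowledge = set()
    for _ in range(rounds):
        # "Conjecturer": samples a candidate claim (a, b).
        a, b = rng.randint(2, 7), rng.randint(2, 7)
        if (a, b) in knowledge:
            continue
        if holds(a, b):
            knowledge.add((a, b))  # verified conjectures extend the base
    return knowledge
```

Even this caricature recovers Fermat-flavored facts such as (3, 3) and (5, 5), and a smarter conjecturer would learn from the verified set to propose better candidates – which is exactly the self-reinforcement described above.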
Axiom is building the AlphaGo for mathematics, but with infinite branching.
The Path Forward
The implications extend far beyond pure mathematics. Every complex system humans want to understand – from protein folding to quantum field theory and economic models – ultimately reduces to mathematical structures. A reasoning engine that can autonomously explore mathematical space and generate new theories doesn't just solve math problems; it provides a general-purpose tool for understanding reality.
Our founding team brings together precisely the expertise needed for this revolution. We were among the first to apply AI to compilers, bringing deep experience in programming languages and compiler technology. Our work spans from AI for mathematical discovery to pioneering self-improving systems. We're building reasoning engines that can operate in the mathematical world model at superhuman scale to tackle our most complex challenges.
The mathematical renaissance isn't coming. It's here.
Axiom is here.