#3918 Matrix Multiplication Simplification

Describe how AI recently developed a more efficient matrix multiplication

This recent breakthrough is primarily attributed to Google DeepMind, specifically their AI agent AlphaTensor (introduced in late 2022) and its successor AlphaEvolve (May 2025).

For over 50 years, mathematicians relied on an algorithm (Strassen’s algorithm) that was considered the efficiency standard. AI managed to break this record not by "calculating" faster, but by discovering entirely new mathematical formulas that humans had missed.

Here is the breakdown of how the AI achieved this.

1. The Core Innovation: Gamification

DeepMind’s key insight was to translate the abstract math problem of matrix multiplication into a 3D board game called TensorGame.

  • The "Board": The math problem was represented as a massive 3D grid of numbers (a tensor).

  • The "Moves": The AI could select different numerical factors to subtract from the tensor.

  • The Goal: The objective was to reduce all numbers on the 3D grid to zero in the fewest possible moves. Each "move" represented one multiplication operation in the final algorithm. (A minimal code sketch of this setup follows this list.)
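To make the "board" concrete, here is a minimal Python sketch of the tensor that TensorGame is played on. This is my own illustration, not DeepMind's code, and it uses 2x2 matrices so the cube stays readable:

```python
import numpy as np

n = 2  # 2x2 matrices keep the cube small; AlphaTensor played the 4x4 version

# Build the matrix multiplication tensor T, the "board" of TensorGame:
# T[a_index, b_index, c_index] = 1 whenever the product a_ik * b_kj
# contributes to the output entry c_ij.
T = np.zeros((n * n, n * n, n * n), dtype=int)
for i in range(n):
    for j in range(n):
        for k in range(n):
            T[i * n + k, k * n + j, i * n + j] = 1

# One "move": pick vectors u, v, w and subtract the rank-1 tensor they span.
def play_move(T, u, v, w):
    return T - np.einsum("i,j,k->ijk", u, v, w)

# The game is won when every entry of T is zero; the number of moves taken
# equals the number of multiplications in the resulting algorithm.
print(int(T.sum()))  # 8 ones: the 8 products of the naive 2x2 method
```

Strassen's algorithm corresponds to zeroing this 4x4x4 cube in 7 moves; the cube for 4x4 matrices is 16x16x16, and that is where the AI found a 47-move solution.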

2. Reinforcement Learning (Trial and Error)

DeepMind used a reinforcement learning technique similar to AlphaZero (which mastered Chess and Go). The AI played this "TensorGame" millions of times against itself.

  • Exploration: Initially, it played randomly.

  • Feedback Loop: It received a "reward" for finishing the game and a "penalty" for every step it took. This incentivized it to find the shortest path (fewest multiplications) to zero; a toy version of this scoring appears after this list.

  • Result: Over time, it learned strategies that human mathematicians had never intuitively grasped, effectively "discovering" new algorithms.
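As a toy illustration of that feedback loop, the scoring might look like the function below. This is a guess at the shape of the reward, not AlphaTensor's actual implementation, which couples the reward to a learned network and tree search:

```python
# Toy scoring in the spirit of TensorGame (illustrative only).
def episode_reward(num_moves, solved, failure_penalty=-50):
    reward = -num_moves  # every move (one multiplication) costs -1,
                         # so shorter algorithms score higher
    if not solved:
        reward += failure_penalty  # never reaching the zero tensor is worse
    return reward
```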

3. The Specific Breakthroughs

The AI found shortcuts for specific matrix sizes that reduce the computational load:

  • AlphaTensor (2022): For 4x4 matrices (in binary, mod-2 arithmetic), it found a way to do the calculation in 47 multiplications, beating Strassen's record of 49. For the rectangular case of multiplying a 4x5 matrix by a 5x5 matrix, it reduced the required multiplications from 80 to 76.

  • AlphaEvolve (2025): In a more recent update, this newer agent solved a harder version of the problem for complex-valued matrices (used in physics and engineering). It found a way to multiply 4x4 matrices using 48 scalar multiplications, breaking a record that had stood since Strassen's work 56 years earlier.

4. Hardware-Specific Optimization

Beyond just theoretical math, the AI was given a "bonus" reward if the algorithm it found ran faster on specific physical chips (like an NVIDIA V100 GPU or a Google TPU).

  • The AI successfully found "custom" algorithms tailored to the physical architecture of these chips.

  • These custom algorithms were 10-20% faster than standard libraries for large matrix multiplications, showing that an AI can adapt an algorithm to a specific chip's quirks in ways general-purpose libraries do not. (A hypothetical sketch of this benchmark-as-reward idea follows this list.)
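The sketch below is my own guess at that idea, not DeepMind's code; the function name and setup are hypothetical. The point is only that the bonus comes from measured wall-clock time on the real device:

```python
import timeit
import numpy as np

# Hypothetical reward term: time a candidate algorithm on the actual chip
# and pay a bonus proportional to its speedup over the standard baseline.
def benchmarked_bonus(candidate_matmul, size=1024, trials=5):
    A = np.random.rand(size, size).astype(np.float32)
    B = np.random.rand(size, size).astype(np.float32)
    baseline = min(timeit.repeat(lambda: A @ B, number=1, repeat=trials))
    candidate = min(timeit.repeat(lambda: candidate_matmul(A, B),
                                  number=1, repeat=trials))
    return (baseline - candidate) / baseline  # positive when faster
```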

Why It Matters

Matrix multiplication is the fundamental "engine" behind almost all modern computing—including graphics rendering, physics simulations, and training AI models themselves. Because these calculations are performed trillions of times per day globally, even a small efficiency gain (like reducing 49 steps to 47) compounds into massive savings in computing time and energy.

Next Step

Would you like me to explain how these new algorithms are being applied to speed up the training of current LLMs (Large Language Models)?

Video: How AI Discovered a Faster Matrix Multiplication Algorithm. A mathematician explains how AlphaTensor's 2022 discovery works and how it even helped human mathematicians subsequently improve upon the AI's results.


I would have thought this would have already been done by mathematicians evaluating the existing matrix multiplication equations, looking for ways to simplify them

You are completely right to expect that humans would have solved this. Matrix multiplication is arguably the most studied algorithm in computer science, and thousands of brilliant mathematicians have spent decades staring at these equations.

However, they hit a "mathematical wall" that AI was uniquely suited to break. Here is why humans couldn't "just simplify" it further, and why AI succeeded.

1. The "Haystack" was too big (Combinatorial Explosion)

You might imagine simplifying an equation is like untangling a knot—you just keep pulling loops until it gets smaller. But for matrix multiplication, it’s not a knot; it’s a search for a specific combination of numbers.

  • The Scale: To find a better algorithm for a 4x4 matrix, you aren't just rearranging x and y. You are searching through a space of roughly 10^30 possible combinations.

  • The Human Limit: Humans are good at spotting patterns (symmetry, repetition). Strassen's original 1969 algorithm was discovered because he looked for a specific kind of symmetrical breakdown.

  • The AI Advantage: The efficient algorithms the AI found look "ugly" to a human. They lack obvious symmetry or logic. No human mathematician would ever guess to try them because they look like random noise, but they work mathematically.

2. The "Recursion Trap" (The Technical Blocker)

This is the specific reason human attempts failed where AI succeeded.

Mathematicians did find faster ways to multiply 4x4 matrices years ago (e.g., the Winograd 48-step method), but they had a fatal flaw: they only worked for numbers, not matrices.

  • The Problem: To make an algorithm faster for huge matrices, you have to use a "divide and conquer" strategy (recursion). You split a giant matrix into smaller 4x4 blocks, multiply those, and so on.

  • The Catch: When you multiply simple numbers (scalars), A x B is the same as B x A. This is called commutativity.

  • The Failure: Many human-discovered algorithms relied on this rule (A x B = B x A) to work. But matrices are non-commutative (A x B ≠ B x A); the quick check after this list demonstrates it.

  • The Breakthrough: The AI (AlphaEvolve) found a 48-step algorithm that does not assume commutativity. This means it works for matrices-within-matrices, so it can be applied recursively to arbitrarily large problems, which previous human "simplifications" could not do.
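That failure mode is easy to demonstrate yourself; the check below shows two matrix blocks refusing to commute:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

# Scalars commute, matrix blocks do not: any algebraic trick that quietly
# uses x * y == y * x breaks the moment x and y become sub-matrices.
print(np.array_equal(A @ B, B @ A))  # False
```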

3. We didn't know what we were looking for

For 50 years, researchers assumed the best algorithms would be elegant and structured, with the kind of symmetry a human could spot and verify by hand, and they searched accordingly. The AI discovered that the fastest algorithms, especially on real hardware, sometimes violate these aesthetic rules of thumb. It optimized for actual speed and correctness, not mathematical elegance.

Summary

Mathematicians were looking for patterns and rules (symmetry, commutativity).

The AI was just playing a game where "ugly" and "messy" moves were allowed, as long as they resulted in the correct answer. It turned out the most efficient math is messier than we thought.


What was the original "equation" for 4x4 matrix multiplication?

The "original equation" is actually a set of 64 separate multiplication operations (plus additions) that we have taught in algebra classes for centuries.

If you were to write out the "original" instructions for multiplying two 4 x 4 matrices (let's call them Matrix A and Matrix B) to get a result (Matrix C), it would look like the standard "Row x Column" method.

The Standard Method (The "Original" Equation)

For every single number in the resulting matrix, you take a row from the first matrix and multiply it by a column from the second.

To get just one number in the answer (e.g., the top-left number, C_1,1), you do this:

C_1,1 = (A_1,1 x B_1,1) + (A_1,2 x B_2,1) + (A_1,3 x B_3,1) + (A_1,4 x B_4,1)

The Math Count

To fill the entire 4 x 4 answer grid, you have to repeat that process 16 times (once for each spot in the grid).

  • Multiplications per cell: 4

  • Total cells: 16

  • Total Multiplications: 4 x 16 = 64

For decades, 64 multiplications was considered the absolute hard limit. It was the "speed of light" for 4 x 4 matrices.
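In code, that "original equation" is the familiar triple loop; here is a minimal version instrumented to count its own multiplications:

```python
# The textbook row-times-column method for 4x4 matrices,
# counting scalar multiplications as it goes.
def naive_matmul_4x4(A, B):
    n, mults = 4, 0
    C = [[0] * n for _ in range(n)]
    for i in range(n):          # each row of A...
        for j in range(n):      # ...against each column of B
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                mults += 1
    return C, mults  # mults is always 4 * 16 = 64
```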

The Human Record (Strassen's Algorithm)

In 1969, a mathematician named Volker Strassen found a clever trick. He realized that if you performed some extra additions (which are cheap for computers), you could eliminate some of the multiplications (which are expensive).

He found a way to do it in 49 multiplications (applied recursively); a full sketch of his seven-product trick appears after the comparison below.

  • Standard: 64 multiplications

  • Strassen (Human Best): 49 multiplications

  • DeepMind's AI: 47 multiplications

Why "47" is such a big deal

It might sound like the AI only saved 2 multiplications (49 down to 47), but this record had resisted attack since 1969. The equation it found is not a neat "row times column" pattern anymore; it is a tangled web of operations that somehow, magically, cancels out to the correct answer with fewer steps. And because the algorithm is applied recursively, the small saving compounds at every level (see the back-of-the-envelope math below).
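A quick way to see the compounding, assuming the base algorithm is applied recursively on 4x4 blocks (note that the 47-step recipe holds in mod-2 arithmetic, so the last exponent applies in that setting):

```latex
% Recursive cost of an R-multiplication base algorithm on 4x4 blocks:
T(n) = R \cdot T(n/4) + O(n^2)
   \quad\Longrightarrow\quad
T(n) = O\big(n^{\log_4 R}\big)

% Naive:        \log_4 64 = 3
% Strassen:     \log_4 49 \approx 2.807
% AlphaTensor:  \log_4 47 \approx 2.777
```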


What is the improved method?

The "improved method" is not a single simple equation like E=mc^2. Instead, it is a recipe involving 47 specific steps (for the modulo 2 case) that looks completely random to a human eye but works out mathematically.

While Strassen's method was based on logical algebraic grouping (like factoring a polynomial), the AI's method is based on Tensor Decomposition.

Here is the breakdown of how this improved method actually works and what the "equations" look like.

1. The Method: "Low-Rank Decomposition"

The AI didn't "do math" in the traditional sense. It solved a 3D puzzle.

  • The Puzzle: Imagine a 16x16x16 grid of numbers (a tensor) that represents the entire operation of 4x4 matrix multiplication.

  • The Goal: Break this cube apart into the fewest possible "rank-1" pieces.

  • The Result: Each "piece" it breaks off becomes one multiplication step in the final algorithm.

    • Standard Method: Breaks the cube into 64 pieces.

    • Strassen (Human): Breaks it into 49 pieces.

    • AlphaTensor (AI): Broke it into 47 pieces (for binary arithmetic).

2. What the "Equations" Look Like

In the standard method, you multiply specific rows by specific columns. In the AI's method, you multiply weighted mixtures of the input numbers.

If you were to write out just one of the 47 steps the AI discovered, it would look something like this (conceptually):

M_1 = (a_1,1 + a_2,3 - a_4,4) x (b_1,2 + b_3,4 - b_4,1)

  • Step A: It takes a random-looking sum of numbers from the first matrix (a).

  • Step B: It takes a random-looking sum of numbers from the second matrix (b).

  • Step C: It multiplies those two weird sums together to get a temporary number (M_1).

  • Step D: It repeats this 47 times to get M_1 through M_47.

  • Final Step: It adds and subtracts these 47 temporary numbers in a specific pattern to reconstruct the final answer matrix. (A runnable sketch of this mix-multiply-unmix pattern follows this list.)
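Every algorithm in this family (standard, Strassen, or AlphaTensor's) has exactly the mix-multiply-unmix shape just described. Below is a runnable sketch using Strassen's published rank-7 factors for 2x2 matrices; AlphaTensor's 47-step recipe has the same form, just with 47 rows in each factor matrix:

```python
import numpy as np

# Any rank-R decomposition (U, V, W) of the matmul tensor yields an
# R-multiplication algorithm: mix the inputs, multiply, then un-mix.
def bilinear_matmul(A, B, U, V, W):
    m = (U @ A.flatten()) * (V @ B.flatten())  # R products of weighted mixes
    return (W.T @ m).reshape(A.shape)          # un-mix into the answer

# Strassen's rank-7 factors for 2x2 matrices (entries flattened row-major):
U = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]])
V = np.array([[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
W = np.array([[1, 0, 0, 1], [0, 0, 1, -1], [0, 1, 0, 1], [1, 0, 1, 0],
              [-1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]])

A = np.random.randint(-5, 5, (2, 2))
B = np.random.randint(-5, 5, (2, 2))
assert np.array_equal(bilinear_matmul(A, B, U, V, W), A @ B)  # 7 multiplies
```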

Why is this "improved"?

Even though preparing the sums (Step A & B) takes extra addition work, additions are incredibly fast for computers. The "expensive" part is the multiplication (Step C). By doing this weird mixing, the AI tricks the math into needing fewer multiplications total.

3. The "Hardware-Aware" Twist9

The most practical improvement wasn't just the "47 steps" record. The AI developed different "recipes" for different physical chips.

  • For an NVIDIA GPU: It found an algorithm that was not necessarily the shortest on paper, but whose specific "mixing" aligned with how data flows through that particular GPU's circuits.

  • The Result: This tailored method ran 10-20% faster than the standard one, not because the math was "shorter," but because it was "smoother" for the electricity flowing through the chip.

Summary

The improved method essentially treats the input numbers like ingredients in a blender. It mixes them into weighted "smoothies", multiplies 47 pairs of those smoothies together, and then un-mixes the results to get the exact answer. No human would think to mix a_1,1 with a_4,4 because it makes no logical sense, but the AI found that it creates a mathematical shortcut.

Video: Discovering new matrix multiplication algorithms with AlphaTensor. A presentation by a DeepMind researcher explaining how they turned matrix multiplication into a "single-player game" to discover these new algorithms.


From <https://gemini.google.com/app/>  Google Gemini 3.0 Pro (Thinking)
