As generative AI moves from model-building to model-serving, the economics of compute are rapidly changing. Tensordyne Co-Founder & Chief Product Officer RK Anand argues that this shift to inference is moving AI from a cost center to a profit center and opening the door for new system architectures that aren’t burdened by “training-first” design tradeoffs. We spoke with RK about how Tensordyne is rethinking algorithmic math from the ground up, why scale-up networking is becoming table stakes for large models, and what it takes for a hardware startup to deliver with pace and precision for hyperscaler customers.

RK: Tensordyne is an AI inference systems company. We’re building systems for generative AI inference, purpose-built for running AI models efficiently in production. We’re a global, multi-disciplinary team of more than 100 people, based primarily in Silicon Valley and Munich, Germany, building AI math, chips, hardware, and software from first principles.
RK: AI, at its core, is matrix math: multiplications and additions. Every time you generate a token, which is a tiny unit of text that models use to understand and generate language, you’re doing an enormous number of operations inside the chip and system.
Traditionally, those operations are done with floating point arithmetic. That’s why engineers reference FLOPs: floating point operations per second. But floating point math is demanding. It burns energy, it takes significant silicon real estate, and it drives up system cost.
Our thesis was: do the core mathematics more simply. If we can reduce how often the hardware has to do true multiplication, we can dramatically improve efficiency.
So we represent numbers in the logarithmic domain, often log base 2 because that maps naturally to digital hardware. In that representation, multiplication turns into addition: A × B becomes log(A) + log(B). Practically, that means what used to require a bulky multiplier circuit can often be handled by a much simpler adder circuit, which is smaller and more energy-efficient.
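The identity RK describes can be sketched in a few lines. This is purely illustrative, not Tensordyne’s implementation: it encodes positive values as base-2 logarithms so that a multiply becomes a single addition, then decodes the result.

```python
import math

def log2_encode(x: float) -> float:
    """Represent a positive value by its base-2 logarithm."""
    return math.log2(x)

def log2_decode(e: float) -> float:
    """Recover the linear-domain value from its log representation."""
    return 2.0 ** e

def log_domain_multiply(a: float, b: float) -> float:
    """Multiply two positive numbers using only an addition
    in the log domain: A * B = 2^(log2(A) + log2(B))."""
    return log2_decode(log2_encode(a) + log2_encode(b))

# 6 * 7 computed without a multiplier, up to floating-point rounding
print(log_domain_multiply(6.0, 7.0))  # ≈ 42.0
```

In real hardware the encode/decode steps are handled by the number format itself rather than computed per operation; the point is that the datapath’s inner loop needs an adder, not a multiplier.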
RK: Think about doing math on a whiteboard. Adding two numbers is quick, but multiplication takes more steps even for simple calculations. The same analogy applies inside a chip.
Addition takes less area and fewer steps; multiplication takes more area, more steps, and more power. Silicon area correlates directly with compute density, cost, and energy consumption, and those factors compound when you’re scaling inference.
By shifting to the logarithmic domain, we can compact the hardware, increase density, and reduce power. The result is: we can deliver materially better compute density and energy consumption for the same workload.

RK: At the highest level, the opportunity is that training is a cost center, while inference is the revenue and profit center. Training creates the model, but inference is where it gets monetized—every token served, every agent run, every workload delivered. As AI consumption accelerates, the focus shifts from “How fast can we train?” to “How efficiently can we serve models at scale?”
For the last 36 months, the industry understandably prioritized training. Building frontier models required enormous datasets and tens of thousands of GPUs operating as one machine for months. That drove a training-first infrastructure stack and a huge concentration of engineering effort.
But training hardware isn’t the answer for the inference era. If you run inference on training-optimized systems, you’re paying a “training tax”: carrying capabilities and architectural tradeoffs that are essential for training but inessential for inference.
Inference is judged by different economics: cost per token, watts per token, throughput under latency constraints, and the ability to run many models efficiently and reliably. Purpose-built inference systems optimize for those realities instead of inheriting training’s compromises.
Because training is the more mature market, its incumbents have created strong moats. Software ecosystems like CUDA are high walls. In inference, though, those walls are lower. That’s why the shift to inference opens the window for new entrants to compete on what ultimately matters: production efficiency and economics.
Training is a cost center. Inference is the revenue and profit center.
RK: Models keep getting larger because quality requirements are rising: robustness, fewer hallucinations, better reasoning, better outputs for agents and code generation. Model builders keep pushing quality up with new techniques across pre-training, post-training, reinforcement learning, and test-time methods.
As models get bigger, they can’t run on a single device. They need to run across many devices. That introduces sharding: you chop the model into chunks, distribute the work, and continuously exchange information across shards to complete each step.
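The sharding idea can be shown with a toy tensor-parallel matrix multiply. This is a hedged sketch using NumPy arrays to stand in for devices; the shard count and sizes are arbitrary. Each “device” holds a column slice of the weight matrix, computes its partial result independently, and the partials are gathered over the network to form the full output.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 1024))     # activation for one token
W = rng.standard_normal((1024, 4096))  # one large weight matrix

# Shard the weight matrix column-wise across 4 "devices".
shards = np.split(W, 4, axis=1)

# Each device computes its partial output independently...
partials = [x @ shard for shard in shards]

# ...then the results are exchanged across the scale-up network
# (an all-gather) to reassemble the full output.
y_sharded = np.concatenate(partials, axis=1)

assert np.allclose(y_sharded, x @ W)
```

Every layer of every decoding step repeats this compute-then-exchange pattern, which is why the exchange step's latency and bandwidth end up on the critical path.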
This makes scale-up networking essential. The chips can’t be islands. You need a network that is low-latency, high-bandwidth, any-to-any, and resilient, so the whole system behaves like unitary compute.
NVIDIA’s approach, systems that string together large numbers of accelerators, proved the point. Tensordyne will be another vendor delivering a scale-up network as part of the system, so customers can run larger models and multiple copies of larger models simultaneously.
RK: If you want to be a system vendor in AI, you need robust software infrastructure. Customers start with a trained model, with weights and parameters, and they want to map it onto your hardware. Before they “open the spigot” to users, they need tooling that can take the model, quantize it where appropriate, compile it, and generate the code artifacts that run efficiently on your system.
We’re doing extensive work on software, arguably as much as the work on chips and systems, so the developer experience is seamless and fast. The goal is straightforward: customers should be able to extract value quickly and easily from a system they’re spending serious money on, and then deliver value to their end users just as seamlessly.
RK: This is where partnerships matter. I spent nearly 17 years at Juniper Networks, and we learned early that to ship systems at scale you need supply chain, silicon partners, and manufacturing partners that can execute under demand volatility.
On silicon, you need partners who can deliver robust, qualified silicon, and deliver volume when demand spikes. In manufacturing, you need contract manufacturers that can build and test systems at scale and ship globally.
We’re also leveraging our partnership with HPE Juniper Networking. Their ecosystem, subsystems, and manufacturing relationships help us scale as a startup.
The broader point is: forecasting and operations rigor matter. Chips take months. Customers need predictable delivery so they can rack, provision, and turn systems into revenue-generating services.
RK: In the late 90s, the internet was doubling every six to nine months. Cisco was dominant, but demand was so insatiable that the market wanted multiple vendors. Within a few years, Juniper gained meaningful share by delivering something differentiated.
Today, the scale is much larger, AI is an enormous market, and the leading incumbent is far bigger than Cisco was then. That makes the challenge harder. But the same market dynamic holds: buyers want multiple suppliers. They don’t want single-vendor dependency, and they will evaluate alternatives, provided the alternative is truly differentiated and proves real operational and economic value.
RK: A real pivot has continuity. It can’t be a 90-degree turn where everything you did gets thrown away. Our first foray was inference chips and systems for computer vision. But the fundamental innovation, the log-domain math approach, was the core building block. We retained and expanded those building blocks as we developed systems aimed at generative AI inference for hyperscaler and cloud data centers.
But we also looked at market realities. Automotive has much longer qualification cycles and, in our view, a smaller near-term opportunity relative to what generative AI created after the ChatGPT moment.
We formed a “tiger team” to test applicability, identify blind spots, and understand competency gaps. Once we had conviction, we brought laser focus, because a startup can’t straddle two radically different markets. We aligned with the board and investors, and the team doubled down, because our mission became more consequential and our opportunity larger than ever before.
RK: Conviction and patience. This is a long game. In software, you can get quick signals. In hardware systems, you’re competing against companies with effectively no capital constraint. It’s an uneven fight.
Our investors have been supportive, and we keep educating them on what we’re building, the market dynamics, and the competitive landscape. But the key is alignment: understanding the timeline, and staying the course as we turn differentiation into real, measurable customer value.