Emanuel Maceira

The inference inflection point section nails something that doesn't get enough attention: the edge deployment gap between what's technically possible and what's actually shippable.

From the IoT connectivity side, the companies you mention (WebAI, FemtoAI, PolarGrid, etc.) are solving the compute layer, but there's a parallel infrastructure problem nobody has mapped well yet: the connectivity layer for distributed edge inference. When you push models onto gateways, sensors, and industrial controllers, the model update pipeline (OTA weights, quantized checkpoints, distilled configs) becomes a connectivity engineering problem as much as an ML one. You need reliable, low-latency uplinks that work in environments where Wi-Fi doesn't exist: warehouses, agricultural fields, oil rigs, contested defense environments.

This is where multi-IMSI eSIM gateways and hybrid connectivity stacks (cellular + LoRa + satellite fallback) become critical edge AI infrastructure. The model is only as good as your ability to update it and monitor its drift in production.
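To make the fallback idea concrete, here is a minimal sketch of link selection in a cellular + LoRa + satellite stack. It assumes each link reports whether it is currently up and carries a rough payload cap, so a multi-megabyte quantized checkpoint skips LoRa while a small drift heartbeat can use it. `Link`, `pick_link`, and every number below are hypothetical illustrations, not carrier or LoRaWAN limits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Link:
    """One uplink in a hypothetical hybrid connectivity stack."""
    name: str
    available: bool          # current health/reachability of the link
    max_payload_bytes: int   # assumed cap on what this link should carry


def pick_link(links: list[Link], payload_bytes: int) -> str | None:
    """Return the first healthy link (in priority order) that can carry the payload.

    Returns None when no link fits, so the caller can queue the transfer.
    """
    for link in links:
        if link.available and payload_bytes <= link.max_payload_bytes:
            return link.name
    return None


# Priority order follows the comment: cellular first, LoRa and satellite as fallbacks.
# The payload caps are illustrative assumptions only.
fleet_links = [
    Link("cellular", available=False, max_payload_bytes=10**9),
    Link("lora", available=True, max_payload_bytes=200),
    Link("satellite", available=True, max_payload_bytes=50_000_000),
]

print(pick_link(fleet_links, 40_000_000))  # quantized checkpoint -> satellite
print(pick_link(fleet_links, 150))         # drift heartbeat -> lora
```

Ordering the list by preference and filtering on payload size keeps the policy declarative: changing fallback order is a list reorder, not new branching logic.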

I'd argue there's a missing sixth frontier here: the MLOps-for-edge stack. Monitoring thousands of tiny models deployed across heterogeneous hardware, detecting drift without cloud round-trips, and managing rollbacks at the device level. That's a fundamentally different problem from cloud MLOps, and nobody has won it yet.
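As a rough illustration of "detecting drift without cloud round-trips, and managing rollbacks at the device level," here is a minimal sketch that runs entirely on the device. It swaps in the Population Stability Index (PSI) as the drift measure, compares a rolling window of a scalar the model already emits (for example a confidence score) against reference bins shipped with the model, and swaps back to the previous model when PSI crosses a threshold. `EdgeModelSlot`, the bin layout, the window size, and the 0.2 threshold (a common PSI rule of thumb, not a universal constant) are all assumptions.

```python
import math
from collections import deque


def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions (fractions summing to 1)."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total


class EdgeModelSlot:
    """Hypothetical device-side holder for the active model plus one rollback target."""

    def __init__(self, active: str, previous: str, reference_bins: list[float],
                 bin_edges: list[float], window: int = 500, threshold: float = 0.2):
        self.active, self.previous = active, previous
        self.reference = reference_bins   # expected fraction per bin, shipped with the model
        self.edges = bin_edges            # interior cut points, len == n_bins - 1
        self.scores = deque(maxlen=window)
        self.threshold = threshold

    def _bin(self, x: float) -> int:
        # Number of cut points at or below x gives the bin index.
        return sum(x >= edge for edge in self.edges)

    def observe(self, score: float) -> None:
        self.scores.append(score)

    def check_and_maybe_rollback(self) -> bool:
        """Compute PSI over a full window; roll back locally if it exceeds the threshold."""
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough local data yet
        counts = [0] * len(self.reference)
        for s in self.scores:
            counts[self._bin(s)] += 1
        actual = [c / len(self.scores) for c in counts]
        if psi(self.reference, actual) > self.threshold:
            self.active, self.previous = self.previous, self.active
            self.scores.clear()  # start a fresh window for the restored model
            return True
        return False
```

Everything here is a few counters and a logarithm per bin, which is the point: the decision can be made on a gateway or controller, and only the outcome needs to be reported upstream when a link is available.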

Naveen Rao

"World models are not a vertical-specific kind of tool — they’re a new substrate for machine intelligence, analogous to what LLMs did for text-based reasoning. The industries that build on top of them early will have a significant head start on deploying agents that work in the real world. We’re excited about companies building the architectures and simulators that make world models possible across industries."

Where do you see "neuroAI" in all of this evolution? My best guess is a niche within the segment you describe under World Models, with the "physical world" in this case being human cognition, built on biological and behavioral data stacks. Thanks for writing this, and for any light you can shed on this area.

Dorian Innes

Great insight, Janelle. It mirrors exactly why we held off building our consumer platform until recently. We needed memory and context layers, as well as continual and reinforcement learning, to become more production-ready. This time last year, when we ran some internal tests, it wasn't even close. We tried again last summer, and things were better, but still not there. Now it feels like we're there, or close enough that we'll be there within weeks, or months at most. It's a really exciting time because now, after all these years, we can finally build our dream platform.

Koen The AI Plumber

This is phenomenal work, Janelle — it perfectly captures what I’ve been seeing on the ground with enterprise AI programs in Europe. The shift you describe from “training-first” to an inference-centered, always-on AI stack is exactly where the real operational pain (and value) now lives.

Rohan Jaiswal

The 78% invisible failures statistic is provocative, because it implies the metric exists despite the failure being invisible by definition. The 93% persistence rate is the more useful finding. It says model upgrades don't fix the underlying pattern. Tracking AI infrastructure at theaifounder.substack.com, I see most teams measure visible failures and assume the rest are rare. How does the Bigspin methodology detect invisible failure, and what proxy stands in for something that escapes traditional monitoring?
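For readers wondering what a "persistence rate" would measure, here is one plausible reading, sketched as set arithmetic: the share of failure patterns observed before a model upgrade that still occur after it. This is an assumption about the definition, not necessarily how Bigspin computes it, and it deliberately says nothing about how invisible failures are detected in the first place. `persistence_rate` and the pattern IDs are hypothetical.

```python
def persistence_rate(before: set[str], after: set[str]) -> float:
    """Fraction of failure patterns seen before a model upgrade that still occur after it.

    `before` and `after` hold hypothetical failure-pattern IDs produced by
    whatever detector is in use; this function only does the set arithmetic.
    """
    if not before:
        raise ValueError("no pre-upgrade failure patterns to compare against")
    return len(before & after) / len(before)


# Illustrative numbers only: 93 of 100 pre-upgrade patterns reappear after the upgrade.
pre = {f"pattern-{i}" for i in range(100)}
post = {f"pattern-{i}" for i in range(7, 110)}
print(persistence_rate(pre, post))  # 0.93
```

Note that new patterns introduced by the upgrade (IDs 100 to 109 here) do not affect the rate; under this reading, the metric only asks whether old failures went away.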

Emanuel Maceira

The edge and on-device inference section is where this roadmap gets most interesting for physical AI deployment. Companies like WebAI, FemtoAI, and Aizip are building the compute layer, but the connectivity orchestration between edge nodes and the cloud remains the biggest unsolved gap. When you're deploying inference on industrial robots, autonomous vehicles, or IoT sensor fleets, you need eSIM-based multi-carrier failover, OTA model update pipelines, and deterministic data routing, not just efficient silicon. The next frontier isn't just inference optimization; it's the fleet-scale operations layer that keeps thousands of heterogeneous edge devices connected, governed, and continuously updated.
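One small piece of that fleet-scale operations layer is deciding which devices receive a new model first. A common technique is deterministic hash bucketing: each device hashes to a stable bucket, and an OTA rollout advances by raising a percentage, with no per-device state kept on the server. The sketch below assumes SHA-256 over a `model_version:device_id` string and 100 buckets; `rollout_bucket`, `should_update`, and the ID formats are hypothetical.

```python
import hashlib


def rollout_bucket(device_id: str, model_version: str) -> int:
    """Map a device to a stable bucket in [0, 100) for a given model version."""
    digest = hashlib.sha256(f"{model_version}:{device_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def should_update(device_id: str, model_version: str, rollout_percent: int) -> bool:
    """True if this device falls inside the current rollout percentage."""
    return rollout_bucket(device_id, model_version) < rollout_percent


# Staged rollout of a hypothetical "v2" model: 10% canary, then widen.
fleet = [f"dev-{i}" for i in range(1000)]
canary = [d for d in fleet if should_update(d, "v2", 10)]
print(len(canary), "devices in the 10% canary")
```

Salting the hash with the model version reshuffles which devices go first on each release, so the same units are not always the canaries, while raising the percentage within one release only ever adds devices.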

David Wilkens

Head spinning. More and more, memory seems like the key anchor point of inference. Forgive me if I'm not fully understanding. It seems that a performance ceiling will be hit unless software-defined memory tools such as Weka, Supermemory, Letta and Mem0 get more deeply integrated with the underlying hardware, and the hardware, for that matter, gets ever more integrated into a unified, scaled system. This is probably where NVDA is trying to go, but their inference tooling is not necessarily "the way." Is this another $4T company in the making? Perhaps that is Google with their TPUs, but do they tool their memory differently? Really it's a coordination issue across the whole memory and storage stack, from on-chip SRAM to the HBM KV cache and the SSD further down. Do you think inference and memory need their own integration of software and hardware? Does memory work that way? Does it need to?
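A back-of-the-envelope sketch helps show why memory becomes the anchor. For a decoder-only transformer, the KV cache stores one key and one value vector per layer, per KV head, per token, so its size is 2 × layers × KV heads × head dim × tokens × batch × bytes per element. The model configuration below (32 layers, 8 KV heads, head dim 128, 16-bit values) is a hypothetical example, not any specific product.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    """Bytes needed to hold keys and values for every layer, KV head, and token."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical config with 16-bit (2-byte) cache entries.
per_token = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=1)
full = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=131_072)
print(per_token // 1024, "KiB per token")           # 128 KiB per token
print(full / 2**30, "GiB per 128k-token sequence")  # 16.0 GiB per 128k-token sequence
```

Because the cache grows linearly with both context length and the number of concurrent sequences, a handful of long-context sessions plus the model weights can exhaust the tens of gigabytes of HBM on a single accelerator, which is what pushes cache placement down the SRAM, HBM, and SSD hierarchy the comment describes.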