AMD shipped Nvidia's new AI laptop over a year ago, and the software is finally catching up

Following Nvidia’s reveal of its RTX Spark laptops, I attended an AMD and HP roundtable at Computex. A fellow reporter asked Rahul Tikoo of AMD and Jim Nottingham of HP, both vice presidents at their respective companies, whether they welcomed the new competitor. After all, both companies have been pitching small machines that run big AI models locally for quite some time now, and Nvidia had just tossed its own hat into the ring.

In response, Tikoo stood and gestured toward the HP-made, AMD-powered products scattered across the table in front of us. He picked up the HP Strix Halo mini PC, held it out for the room to see, and turned to Nottingham to ask a simple question: “Jim, when did you launch this system?”

Nottingham’s answer was short, though accompanied by a slight grin: “CES 2025.” Tikoo, still standing, slowly repeated Nottingham’s answer before looking around and grabbing an HP-made laptop from the table. Turning back to Nottingham, Tikoo was now mirroring his grin: “And this product, Jim, when did you launch it?”

At this stage it was pretty clear what Tikoo’s point was going to be.

“Two months later. February or March 2025,” came the reply.

Tikoo handed the laptop to us at the table before returning to his seat, then circled back to the original question and answered it with a smile. “We have 35 products with Strix Halo in market,” he said. “Welcome, Nvidia, to the modern compute journey.”

AMD, clearly, has confidence, and it isn’t hard to see why from the outside looking in. After all, we don’t have pricing details for the RTX Spark, hands-on experiences have been guided, and the chip itself won’t ship until later this year. Tikoo, to his credit, also said he’s genuinely curious to see what Nvidia has specifically worked on and what it has achieved, but he was confident that Gorgon Halo (a Strix Halo refresh) would be a better product when it arrives in Q3.

The same reporter who asked whether AMD welcomed a new competitor also asked about the one thing most would say Nvidia has a clear advantage in: the software stack. After all, CUDA is the reason “just buy Nvidia” is a genuinely successful strategy, and ROCm, AMD’s alternative, has historically lacked many of the features developers need for local AI deployment and development. AMD has come a long way, and while ROCm still isn’t CUDA, saying that you can’t do local AI on AMD would be inaccurate these days. The response to that stack question was more nuanced, but the short version is this: AMD is working on it, and as someone who’s been running ROCm for a while on a 7900 XTX, I can say the old assumptions about AMD and local AI are quickly going stale.

AMD has been selling this class of machine since early 2025

Nvidia is late to the party

Nvidia’s RTX Spark is the GB10 essentially relaunched for laptops and small Windows PCs. The chip already powers the DGX Spark desktop, something company CEO Jensen Huang himself confirmed when he tied the consumer N1 and N1X chips to the same design. It pairs a 20-core Arm “Grace” CPU with a Blackwell GPU carrying 6,144 CUDA cores, scales from 16GB up to 128GB of unified memory, and quotes up to 300 GB/s of bandwidth with a claimed 1 PFLOP of FP4 compute. It was announced at Computex, it ships in the fall, and Nvidia hasn’t put a price on it beyond saying it targets the premium end.

AMD’s equivalent has been on sale for over a year. The Ryzen AI Max+ 395, codename Strix Halo, packs 16 Zen 5 cores and 32 threads alongside a 40-CU RDNA 3.5 iGPU and up to 128GB of unified memory, with as much as 96GB of that addressable as VRAM. It turned up in laptops like HP’s ZBook Ultra G1a in early 2025 and in a wave of mini-PCs not long after. That is the lineup Tikoo was pointing to when he asked Nottingham about those products.

At this year’s Computex, AMD also launched a turnkey rival to Nvidia’s DGX Spark, the Ryzen AI Halo developer mini-PC. It’s a Strix Halo machine with 128GB of memory that runs models of up to 200 billion parameters, boots both Windows 11 and Linux, and opens for pre-orders this month at a price of $3,999. The price is targeted as well, since it’s the same figure Nvidia charged for the DGX Spark before the company later raised it to $4,699.

By AMD’s own telling, the hardware isn’t really the point of the Halo. Tikoo describes it as an exercise in making the software layer disappear, with ROCm, PyTorch, and a handful of models preinstalled and held in what he called a best-known configuration. AMD ships it as a Ryzen AI Developer Center, with AI playbooks on the machine and available for download, validated model packages so things run on first launch, and a commitment to re-qualify the whole stack every month so it keeps working as the underlying pieces shift. Setting up something like OpenClaw can eat a whole weekend even if you know what you’re doing, Tikoo said, and the box exists to give you that weekend back.

The spec sheets are close

Mostly the same specs on paper

On the spec sheet the two sit close. Both top out at 128GB of unified memory, and Nvidia’s 20-core Arm CPU lines up against AMD’s 16-core, 32-thread x86 part. AMD’s next step, Gorgon Halo, will push the memory ceiling to 192GB and support 300-billion-parameter models in the third quarter. The bottom of Nvidia’s range is also relevant here, as its configurations will start at 16GB of RAM. When asked whether AMD had anything for customers who don’t want a $4,000 machine, the answer was pretty simple: you can already buy that today, thanks to the cheaper Ryzen AI 300 and 400 chips that cover the lighter end.
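To put the memory and model-size figures side by side, here is a rough back-of-envelope sketch of how many bits per parameter each configuration leaves room for. It assumes 1 GB is 10^9 bytes and that the entire pool is spent on weights, ignoring KV cache, activations, and the operating system; neither assumption comes from AMD, and the function name is made up for illustration.

```python
# Back-of-envelope: memory budget per parameter implied by the quoted figures.
# Assumptions (not from AMD): 1 GB = 1e9 bytes, and the whole pool holds
# weights only, with nothing reserved for KV cache, activations, or the OS.

def bits_per_parameter(memory_gb: float, params_billion: float) -> float:
    """Return the bits available per parameter for a given memory pool."""
    bytes_per_param = (memory_gb * 1e9) / (params_billion * 1e9)
    return bytes_per_param * 8


# (memory in GB, parameters in billions), taken from the figures quoted above.
configs = {
    "Strix Halo, full 128GB pool, 200B params": (128, 200),
    "Strix Halo, 96GB as VRAM, 200B params": (96, 200),
    "Gorgon Halo, full 192GB pool, 300B params": (192, 300),
}

budgets = {
    name: bits_per_parameter(mem, params)
    for name, (mem, params) in configs.items()
}

for name, bits in budgets.items():
    print(f"{name}: {bits:.2f} bits per parameter")
```

Under those assumptions every case lands at roughly 4 to 5 bits per parameter, which suggests the headline model sizes assume quantized weights. The same models at 16 bits per parameter would need 400GB and 600GB respectively.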

Nvidia does hold some paper advantages. Its 300 GB/s of memory bandwidth edges past AMD’s 256 GB/s theoretical ceiling. The Blackwell GPU, with its FP4 tensor hardware, should also give it a lead on raw compute and prompt processing, which is the part of inference that decides how long you wait for the first token. There’s a caveat to that 300 GB/s number, by the way: the GB10 was also said to offer 300 GB/s of memory bandwidth at Hot Chips last year, but it actually ended up at 273 GB/s.
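The size of that bandwidth lead is easy to check. This short sketch uses only the three figures quoted above; whether the RTX Spark will land at 300 GB/s or closer to the GB10's 273 GB/s is unknown, so the second result is a what-if and not a prediction.

```python
# Memory bandwidth figures quoted in the article, in GB/s.
AMD_STRIX_HALO = 256       # theoretical ceiling
NVIDIA_QUOTED = 300        # RTX Spark figure as announced
NVIDIA_GB10_ACTUAL = 273   # where the GB10 actually ended up


def lead_percent(nvidia_gbps: float, amd_gbps: float) -> float:
    """Return Nvidia's bandwidth lead over AMD as a percentage."""
    return (nvidia_gbps / amd_gbps - 1) * 100


quoted_lead = lead_percent(NVIDIA_QUOTED, AMD_STRIX_HALO)
what_if_lead = lead_percent(NVIDIA_GB10_ACTUAL, AMD_STRIX_HALO)

print(f"Lead at the quoted 300 GB/s: {quoted_lead:.1f}%")
print(f"Lead if it lands at 273 GB/s: {what_if_lead:.1f}%")
```

On the quoted number the lead is about 17%. If the laptop part repeats the GB10's outcome, it shrinks to under 7%.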

On the desktop side, the math already favors AMD before the laptop fight even starts. The Ryzen AI Halo dev box is expensive (though currently cheaper than the DGX Spark), but a 128GB Strix Halo mini-PC like GMKtec’s EVO-X2 sells for around $3,300, and Framework’s Desktop starts around the same, too. The roughly $700 premium pays for the validated, qualified software stack rather than the silicon, which is exactly the friction Tikoo says that box exists to remove.

ROCm has quietly become usable for the software people actually run

PyTorch and co work now

An image showing an AMD Radeon RX 7900 XTX GPU that's installed on a test bench.

Tikoo answered a question on AMD’s stack with a proclamation that ROCm has come a long way, that open source is AMD’s “bet,” and that “open source is the way to go.” Tikoo is right on that point, as ROCm is no longer a simple project bolted onto the side of the AI ecosystem. PyTorch is probably the best demonstration of that. PyTorch 2.9 brought ROCm into its experimental wheel-variant work, making installation less awkward with compatible tooling, but the more important parts are what’s happened since. Version 2.11 added device-side assertions and TopK/radix-select optimizations for AMD GPUs, and 2.12 added expandable memory segments, rocSHMEM symmetric memory collectives, and FlexAttention pipelining. The framework that sits under a huge amount of modern AI work now treats AMD as an actual accelerator target, rather than a community workaround.

HIP is what closes some of the remaining distance. On a ROCm build of PyTorch, much of the familiar torch.cuda API still exists, but those calls execute through HIP on AMD GPUs rather than through CUDA on Nvidia hardware. In day-to-day use, that means a lot of PyTorch code written with CUDA assumptions can run on AMD hardware without being rewritten around a separate AMD-specific API.
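A minimal sketch of what that looks like in practice: on a ROCm build, `torch.cuda.is_available()` still reports the GPU and the device string is still "cuda", while `torch.version.hip` is set where `torch.version.cuda` would be on an Nvidia build. The helper names below are made up for illustration, and `fake_build` is a hypothetical stand-in for the real `torch` module (with placeholder version strings) so the logic runs on a machine without PyTorch or a GPU.

```python
from types import SimpleNamespace


def describe_backend(torch_module) -> str:
    """Return "rocm", "cuda", or "cpu" for a PyTorch-like module."""
    if not torch_module.cuda.is_available():
        return "cpu"
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    if getattr(torch_module.version, "hip", None):
        return "rocm"
    return "cuda"


def pick_device(torch_module) -> str:
    """Pick a device string; it stays "cuda" even when HIP does the work."""
    return "cpu" if describe_backend(torch_module) == "cpu" else "cuda"


def fake_build(hip, cuda, available):
    """Hypothetical stand-in for the torch module, for GPU-free demonstration."""
    return SimpleNamespace(
        cuda=SimpleNamespace(is_available=lambda: available),
        version=SimpleNamespace(hip=hip, cuda=cuda),
    )


# Placeholder version string; real builds report their own.
rocm_build = fake_build(hip="7.2.1", cuda=None, available=True)
print(describe_backend(rocm_build), pick_device(rocm_build))
```

With PyTorch installed, passing the real module (`describe_backend(torch)`) should behave the same way. The point is that code written against `torch.device("cuda")` does not need a separate AMD branch.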

The rest of the local inference toolkit has followed. llama.cpp has both Vulkan and ROCm backends, and Ollama, LM Studio, and ComfyUI all run on AMD now. On Strix Halo, Vulkan is often the easiest path and, in many llama.cpp-style setups, can be faster for token generation, while ROCm tends to matter more for prompt processing, long-context behavior, Flash Attention, rocWMMA, and anything that benefits from AMD’s HIP compute stack. Which backend wins depends on the model, context length, quantization, and app, but the important part is that there’s now a choice. Two years ago, this was a weekend of repeated compilations, testing, measurements, and hope. Now, much of it just installs and works.
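Because the winner can differ between prompt processing and token generation, it helps to record both numbers per backend and compare them phase by phase. The sketch below shows one way to do that; the `Run` type and `best_per_phase` helper are hypothetical, and the throughput values are made-up placeholders, not benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    backend: str           # e.g. "vulkan" or "rocm"
    prompt_tps: float      # prompt-processing throughput, tokens per second
    generation_tps: float  # token-generation throughput, tokens per second


def best_per_phase(runs: list[Run]) -> dict[str, str]:
    """Return the fastest backend for each inference phase."""
    if not runs:
        raise ValueError("need at least one run to compare")
    return {
        "prompt_processing": max(runs, key=lambda r: r.prompt_tps).backend,
        "token_generation": max(runs, key=lambda r: r.generation_tps).backend,
    }


# Placeholder values for illustration only. Replace them with numbers
# measured on your own model, context length, and quantization.
placeholder_runs = [
    Run(backend="vulkan", prompt_tps=100.0, generation_tps=20.0),
    Run(backend="rocm", prompt_tps=150.0, generation_tps=18.0),
]

winners = best_per_phase(placeholder_runs)
print(winners)
```

Splitting the comparison this way avoids declaring one backend the winner when it only leads in one phase.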

The version situation has improved too. ROCm 7.2.4 is the latest Linux-side quality release focused on inference performance and stability for Instinct GPUs, while AMD’s Radeon and Ryzen documentation separately lists ROCm 7.2.1 support for Radeon RX 9000, select RX 7000 cards, and Ryzen AI Max, AI 300, and select AI 400 APUs. Windows is now part of AMD’s ROCm story for consumer hardware, especially around PyTorch, even if Linux remains the broader and more mature target.

I’ve been running ROCm long enough to remember when a 7900 XTX meant manually building half the stack and accepting that some of it would never work. That card and the hardware around it sit on the official support list today. “You need CUDA” is becoming less relevant as time goes on, and that’s fantastic to see.

ROCm still trails CUDA in specific ways

But it’s less of a problem than you might think

None of that makes ROCm equal to CUDA, but AMD is refreshingly aware of that. When I asked Tikoo what gaps still needed filling, the first thing he named was sandboxing for new agentic use cases. “That’s one of the things we want to deal with quickly,” he said.

The bigger one, though, was what he called getting “day-zero on the endpoints.” The goal is to make ROCm development for AMD’s data-center Instinct cards “100% usable” on endpoint integrated-graphics solutions, meaning the iGPUs in machines like Strix Halo. “So we keep growing that library,” he said. It’s a candid answer, because it gets at the gap AMD still has to close: ROCm is much better than it was, but AMD still needs more of that data-center software work to carry cleanly down to consumer and integrated graphics.

The NPU has the same problem, just with a different software stack. “When you want performance you go to the GPU, but when you want efficiency you go to the NPU,” Tikoo said. That means AMD needs an ISV library for NPU use cases too, and he called that “a huge focus.”

LM Studio running with the gpt-oss-120b model loaded.

Historically, AMD has struggled with that kind of software parity. Flash Attention on ROCm has often lagged behind, and Strix Halo has already exposed the kind of edge case that still makes ROCm feel less mature, with PyTorch Flash Attention failing on gfx1151 in some builds. Quantization libraries like bitsandbytes are no longer CUDA-only in the way they once were, but they’re still a good example of the problem: CUDA is the default path, and AMD support tends to arrive later, with more caveats, and after more work from AMD, maintainers, and users.

Training is the weakest story of all, although that’s changing too. Zyphra’s ZAYA1-8B was trained on AMD Instinct MI300X clusters, which is an important milestone for AMD to cross. The company describes it as the first large-scale MoE foundation model trained entirely on AMD Instinct MI300X GPUs, AMD Pensando networking, and ROCm. It’s impressive, but read between the lines: we’re several years into the LLM craze, and if that kind of run can still be newsworthy, it tells you how CUDA-first the training world remains.

The broader ecosystem also moves CUDA-first. New models? CUDA first. AI software packages? CUDA first. ROCm eventually arrives, but usually with a delay, and often with more rough edges than the Nvidia path. Ollama, for example, still has a habit on Strix Halo of timing out while it hunts for the GPU and quietly dropping to the CPU. None of this is a dealbreaker if you don’t mind tinkering, but you’ll hit problems a CUDA user never would.

The hardware itself can be another hurdle for AMD. Nvidia’s Tensor Cores and CUDA libraries remain the reference point for the low-precision matrix math these workloads lean on, especially once you get into FP8, FP4, and the fast prompt-processing path reviewers will be watching closely. If Nvidia’s laptop turns out to be faster at prompt processing once consumers get hold of it by the end of the year, I won’t be too surprised.

With all of that said, ROCm is good enough to stop being the reason not to buy AMD. It’s not good enough to call it CUDA, but for an enthusiast running local LLMs on a Strix Halo machine, PyTorch with llama.cpp, Ollama, or LM Studio is a supported, working path. So are many of the other local AI workloads you’d want to run, such as ComfyUI. The software gap that used to justify ignoring AMD has shrunk to a list much smaller than you’d think. While Nvidia may well ship something excellent in the fall, you can buy AMD’s version today and run it, which is more than you can say for Nvidia’s machine, trapped behind glass with no promises when it comes to pricing.