Q: After NVIDIA (NVDA) recently surged to a high of $1,000 (before the 2024 split), there has been intense debate between bulls and bears, with many comparing it to Cisco’s peak during the dot-com bubble. What are your thoughts?

A: In most cases, using old data to predict the stock market in the short term is not very meaningful: there are too many variables, and the probability of a situation repeating exactly is extremely low. However, analyzing tech stocks requires examining their development logic over the coming decades, and this logic is consistent.

Q: But few people can clearly explain NVIDIA’s logic. At least, I haven’t come across anyone who has.

A: Understanding NVIDIA’s logic indeed has a relatively high threshold; it has at least three layers:

  1. Is the total AI market demand (TAM) large enough? How many years can this business be profitable?
  2. Will NVIDIA’s currently booming business be short-lived? Will major companies stop purchasing once they complete their hardware arms race?
  3. Is NVIDIA’s moat deep enough? How much market share can future competitors take away?

Q: I know that most tech giants have something called a “moat.” But what exactly is NVIDIA’s moat? There are all sorts of opinions out there, and it’s hard to know whom to trust.

A: I’ve always believed that the moat of enduring tech companies is mostly a combination of software and hardware, and that this is what lets them last for half a century or more. There are certainly successful pure-software or pure-hardware companies, but they don’t change my core view.

Q: Can you provide some examples?

A: I’ll pick an example from each decade over the past 50 years to illustrate.

In the 1970s, the most important software technology was the relational database, which we’ll shorthand as SQL. SQL kicked off enterprise informatization, turning handwritten ledgers into electronic ones and creating the first trillion-dollar market. The software-plus-hardware company to mention here is IBM. Put simply, IBM’s job was to get enterprises running SQL on IBM mainframes. In the 1990s everyone heard about the miraculous CEO who taught the IBM elephant to dance, but that was largely self-promotion by professional managers. The so-called dance was cutting hardware to pivot to services, yet in reality the mainframe was IBM’s true moat and was never cut, and the “services” business of deploying databases grew out of explosive market demand rather than being invented by the managers. IBM’s mainframe business has endured for 50 years, and most of our bank deposits are still processed on IBM mainframes. Many people think IBM is finished, yet its stock price is actually at an all-time high.

In the 1980s, the most important software technology was the graphical user interface, the GUI. The GUI drove the explosive growth of the PC, turning it into a tool for everyone. The representative player here is “Wintel.” Microsoft and Intel are of course two companies, but as the de facto standard setters for the PC they were deeply bound together in software and hardware. Their success continues to this day; although Intel has run into plenty of trouble in recent years, it remains a leader in the PC field.

In the 1990s, the most important software technology was the World Wide Web, the WWW. The most important software-plus-hardware company of the WWW era was Cisco (CSCO). From then until now, Cisco has been the most important company in the internet’s plumbing. If you exclude the peak of the 2000 bubble, Cisco’s stock price has actually risen steadily. Like IBM, Cisco’s core business has remained profitable ever since, but it is not as large as people imagined, and network equipment is replaced relatively infrequently. Cisco did make one huge mistake: its Linksys WRT54G routers ran Linux, and the GPL forced the company to release the firmware source code, which let every other company build commodity routers. Had Cisco built on FreeBSD, as Apple did with macOS, that combination of software and hardware might have made far more money.

In the 2000s, the most important software technology was virtualization, i.e., the hypervisor. The most famous company here is VMware, whose core product is essentially a bare-metal hypervisor, an operating system that runs directly on the hardware. VMware’s software is excellent, but without deep hardware integration it could never become a super giant and was instead bought and sold by hardware companies. The companies that successfully applied virtualization in a true software-plus-hardware combination were Amazon, Google, and Microsoft, whose clouds turned the internet into essential infrastructure for work and life, delivering all kinds of information and products.

In the 2010s, the most important software technology was mobile operating systems, iOS and Android. Apple achieved a combination of software and hardware with iOS, capturing 90% of the profits in the mobile industry, which doesn’t need much elaboration.

In the 2020s, the most important software technology is clearly large language models, which we’ll call LLM.

Q: I admit that the software-hardware combinations you listed all made big money for something like 50 years. But can the LLM really be compared to those predecessors?

A: The answer is yes. This is the first layer of the bull case for NVIDIA, and I believe Wall Street has already reached consensus on it, which is why they are chasing the stock so frantically.

Hinton has said that generative AI (the LLM) marks the victory of brain-inspired, connectionist intelligence over symbolic, logic-based intelligence in their long competition. That victory has suddenly clarified the roadmap for AI to take over almost every human task, and it is even becoming clear how machines will do things humans cannot.

Jensen Huang has said that the human DNA sequence is also a language. We don’t yet know what it means or what all the proteins it encodes actually do, but LLMs may well be able to tell us in the future. That opens a huge door for medicine.

In simple terms, within a few years the LLM will become an indispensable personal companion for everyone, rendering meaningless the enormous cost of spending a decade learning foreign languages and the sciences.

Q: Stop! You’ve been praising LLMs, but NVIDIA doesn’t make LLMs. Didn’t you say the point was software-hardware integration? Besides, plenty of companies can make GPUs for training LLMs. And NVIDIA’s better-than-expected results in recent quarters come from the big companies rushing to buy AI platforms to catch up with the likes of OpenAI. Once they’ve finished buying and the data centers are built, won’t NVIDIA’s results decline?

A: Indeed, the AI field is currently in an arms race, with companies scrambling to acquire equipment. According to analysts’ data, the top two buyers account for by far the largest procurement volumes, and everyone else still lags well behind. There is no way to conclude that demand has peaked, and judging from NVIDIA’s guidance, delivery is still the bottleneck. This arms race will likely run for at least another year, and the demand for inference, which is driving massive data-center rebuilds, will persist for years. Once this round of build-out is complete, it will be time for upgrades: today’s hardware still has serious performance limits; training something on the scale of GPT-4, for instance, takes over a year. By Jensen Huang’s estimate, AI computing power will increase a million-fold over the next decade. That is the interesting part, because it forces the big companies to keep upgrading. This is what Huang means when he says NVIDIA’s real competition is with itself.
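
A quick back-of-the-envelope on that million-fold figure (my arithmetic, not Huang’s): spread over ten years it implies roughly a 4x improvement every single year, far beyond what transistor scaling alone has ever delivered, which is why the gains have to come from new architectures, interconnects, and software as well, i.e., from buying whole new platforms:

$$
(10^{6})^{1/10} = 10^{0.6} \approx 3.98\times \text{ per year} \qquad \text{vs.} \qquad 2^{1/2} \approx 1.41\times \text{ per year for a classic two-year doubling}
$$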

Q: I understand the second layer, let’s look at the third layer. The growth in computing power isn’t necessarily monopolized by NVIDIA. How deep is NVIDIA’s moat? I’ve heard CUDA is highly praised, but isn’t it just a software library? Competitors have alternatives too. AMD has ROCm, and Intel has oneAPI.

A: Have you noticed how hard it is to find genuine comparative reviews online? Why is that? Because the gap between them and NVIDIA is much larger than you might think.

Q: I saw Intel CEO Pat Gelsinger say, “We think CUDA’s moat is shallow and small.” Silicon legend Jim Keller said, “CUDA is a swamp, not a moat.” These big names clearly don’t think much of CUDA.

A: I admit that after reading those comments you might feel CUDA is no big deal. In reality, though, they are using vague language to create that impression deliberately. Gelsinger actually added a qualification: CUDA, he argued, matters mainly for training, and inference can be done without it on Intel’s AI processors. Jim Keller never seriously explained what he meant by “swamp,” but he calls x86 a swamp too. In fact, it is precisely this swamp, accumulated over more than a decade, that competitors cannot replicate. You know how to pave an asphalt road; you do not know how to recreate an identical swamp. It’s like Microsoft Office: a mess of design and code, yet a swamp that keeps backward compatibility intact.

Q: You still haven’t made this clear. AMD already has strong GPGPU capability, so why doesn’t it build its own standard library instead of copying CUDA, like paving a brand-new asphalt road?

A: This brings us to what NVIDIA’s hardware-software integration really means. About 15 years ago the manufacturers did come together to create a common computing framework, OpenCL. But with a small market and conflicting goals, bugs went unfixed for ages and the project has been left half-dead. AMD’s CUDA competitor, ROCm, has been around for more than seven years, but chronic under-investment and assorted problems keep driving users away. NVIDIA, by contrast, which Jensen Huang insists is a software company, employs more software engineers than hardware engineers.

Q: So CUDA doesn’t have any serious competitors?

A: Intel, seeing that both OpenCL and ROCm were stuck in the mud, decided to pave a new road: oneAPI. Objectively speaking, oneAPI is genuinely ambitious, attempting to pull GPUs, CPUs, FPGAs, and more, from every vendor, under a single high-level abstraction layer.

Q: I don’t quite understand. If AMD can’t manage even one kind of hardware, how can Intel manage every kind?

A: By analogy, what Intel is doing is somewhat like what Google did with Android and Java: run on different hardware from many manufacturers. CUDA, by contrast, is like iOS: it runs only on NVIDIA GPUs, but with the best possible performance. Intel acquired a company called Codeplay, aiming to deliver cross-platform, portable libraries through the SYCL language. The difficulty is that SYCL is nowhere near as popular as Java was back then, and there are far fewer programmers to draw on.
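
To make the comparison concrete, here is roughly what the CUDA programming model looks like. This is a minimal vector-add sketch of my own, not vendor sample code; the point is that this model, plus the libraries and tools layered on top of it, is what SYCL and ROCm have to reproduce portably.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread adds one element: the grid of threads maps directly onto the data.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffers.
    float *ha = (float*)malloc(bytes), *hb = (float*)malloc(bytes), *hc = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device buffers and explicit copies: the programmer manages the memory hierarchy.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch: 256 threads per block, enough blocks to cover all n elements.
    vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", hc[0]);  // expect 3.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

A SYCL version expresses the same idea in different syntax; the hard part is not a kernel like this, it is matching the decade of drivers, profilers, and tuned libraries that surround it.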

Q: Got it, so CUDA has met a challenger?

A: Not quite yet. High-performance computing demands end-to-end integration: the hardware itself, the drivers, the cluster fabric, the low-level libraries, and the applications on top (PyTorch, compilers, and so on). CUDA has no weak link in that chain. Its competitors may offer comparable raw GPU silicon, but they lag far behind everywhere else, buggy drivers being just one example. AMD’s MI300 is strong on single-machine benchmarks, but that alone does not mean much in practice. This is what Jensen Huang is getting at when he says that even if competitors’ hardware were free, building LLMs on it would still cost more than building on NVIDIA, because the time lost to errors is the real expense.
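
As one concrete illustration of the “low-level libraries” layer in that chain: when a framework such as PyTorch multiplies two matrices on an NVIDIA GPU, the work ultimately lands in a tuned vendor library like cuBLAS. A simplified host-side sketch of my own of such a call is below; the years of per-architecture tuning live inside cublasSgemm, not in the caller’s code, and that is exactly the part competitors have to rebuild.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Multiply two 512x512 matrices with cuBLAS; the heavy optimization is inside the library call.
int main() {
    const int n = 512;
    const size_t bytes = n * n * sizeof(float);
    std::vector<float> hA(n * n, 1.0f), hB(n * n, 2.0f), hC(n * n, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C = alpha * A * B + beta * C, in the column-major layout BLAS expects.
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    printf("C[0] = %f\n", hC[0]);  // every entry should be 1024.0 (= 512 * 1.0 * 2.0)

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Build it with nvcc and link against -lcublas; swapping in a different vendor means swapping the library, the driver stack, and all the tooling underneath it.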

Q: What do you mean by clusters?

A: Training super-large LLMs requires thousands or even tens of thousands of GPUs working in concert. At that scale, NVIDIA is the only vendor with a dedicated software and hardware ecosystem for LLM clusters, and even for small clusters of a handful of GPUs it is far ahead. You can see this in the Hopper generation: NVIDIA’s proprietary NVLink and NVSwitch connect the GPUs to one another, while AMD’s reliance on PCIe cannot keep up, and CUDA’s parallel acceleration is specially optimized for that fabric. The H200, the first GPU to adopt HBM3e, delivers a staggering 4.8 TB/s of memory bandwidth to push back the memory wall. In a few days NVIDIA will hold GTC, and who knows what else they’ll unveil to surprise us.
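
To see why that bandwidth number matters, here is a rough, illustrative calculation of my own of the memory wall, using the H200’s published 141 GB of HBM3e and 4.8 TB/s. During token-by-token generation, essentially all of a model’s weights must be streamed from memory for every token, so for a hypothetical model whose weights fill the card (ignoring batching, KV caches, and multi-GPU sharding), bandwidth rather than raw FLOPs sets the ceiling:

$$
t_{\text{per token}} \gtrsim \frac{141\ \text{GB}}{4.8\ \text{TB/s}} \approx 29\ \text{ms} \quad\Longrightarrow\quad \lesssim 34\ \text{tokens/s per GPU}
$$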

Q: You’re really a fan of NVIDIA. Doesn’t it have any weaknesses that can be exploited?

A: In the U.S., under the H-1B visa lottery regime, programmers are a scarce resource, and a booming internet industry combined with weak foundational education makes the shortage worse. Low-glamour, low-payoff work like drivers and computing libraries is not what most American programmers want to do.

In stark contrast stands China, where foundational education is brutally competitive and programmer talent is abundant. Given the high-tech decoupling between China and the U.S., China is bound to go all-in on developing its “new productive forces.” The libraries that AMD and Intel are struggling with are open source; with enough investment and focused effort, we can significantly narrow the gap with NVIDIA’s CUDA.