AMD launches another attack on Intel server chips

In recent years, AMD first talked about its plans to re-enter the server processor market and give Intel some real, much-needed, and very direct competition, and then delivered again and again on its processor roadmap. In doing so, AMD has steadily proved that it is serious about the X86 computing market that Intel dominates.

With the introduction of the third-generation “Milan” Epyc 7003 processors, choosing AMD will become even easier, although customers wish these processors had been delivered years ago.

But make no mistake. Easier does not mean easy. As the latest quarterly financial results from Intel’s data center business unit make clear, the Epyc comeback is not going as easily as the Opteron offensive did a decade and a half ago.

Enthusiasm for AMD’s X86 server processors is tempered by many factors, the most important of which is that Intel is a much stronger force in compute, networking, and storage in 2021 than it was when AMD launched the Opterons in the early 2000s.

Although Intel has botched its roadmap and manufacturing over the past few years, that is nowhere near as bad as its decision to build Itanium, which was not really compatible with Xeon.

So it was not surprising when AMD captured 20% or more of the market in certain segments of the X86 server space fairly quickly.

With the third-generation Epycs now launching and the fourth-generation “Genoa” Epyc 7004 series expected in 2022, AMD’s market share gains have been slower this time.

But quarterly server shipments in the current era are about 50% higher than they were in the mid-2000s, and some of the customers (such as the hyperscalers and cloud builders) are absolutely huge. We believe that this time the Epyc server chip business is on a better and more sustainable growth path, and that it will cause Intel a lot of pain in the next few years. That is as it should be, because every IT customer deserves fierce and direct competition. Intel has not really had any in server processors for more than a decade, and the gross margins of its data center group during that period of hegemony prove that it benefited far more than it deserved.

Indirect competition from IBM’s Power processors and from the short-lived members of the Arm camp was not enough to dent Intel’s armor. With the Epyc processors, AMD’s resurgence will make life harder for Intel.

Under president and CEO Lisa Su, AMD has been climbing rapidly. With Intel stumbling down the castle steps because of its 10-nanometer manufacturing missteps, AMD has been able to put some dents in Intel’s armor.

The upcoming “Ice Lake” Xeon SP processors will help Intel withstand the Milan Epyc 7003 attack, which really began when both AMD and Intel started delivering their chips to hyperscalers and cloud builders in the fourth quarter. But the fact remains that Ice Lake should have been competing against the second-generation “Rome” Epyc 7002s, and it is not. Intel will get better with Ice Lake and its “Sapphire Rapids” follow-on, which is based on a refined 10-nanometer manufacturing process and is due later this year or early next year. However, Intel’s fabs did not deliver 10-nanometer manufacturing on time, or even a little late; it has been severely delayed.

So be it. This is the chip business, and this is how the chips sometimes fall. Everyone, and we mean everyone, runs into problems at the chip foundry sooner or later, whether manufacturing capacity limits or delays in a future process leap. Everyone takes a turn in the penalty box, especially as Moore’s Law has gone from slowing down over the past few years to being on life support. As far as we know, 10 nanometers and 7 nanometers were difficult for everyone, and 5 nanometers will be harder still. We do not hold out much hope that anything will be easy in the 3-nanometer cycle. Chiplets everywhere! And AMD already knows how to do them better than Intel.

Against this backdrop, we introduce AMD’s new Milan lineup here and will keep studying the new processors in the Epyc 7003 series. This story gives an overview of the new Milan chips and compares them with previous generations of Opteron and Epyc processors. Later stories will discuss in depth the architecture, the competitive position of these CPUs in the server space, and the competitive response from Intel and the other server CPU vendors, as well as the OEMs and ODMs that consume them.

There is a feedback loop between PC design and server design. RISC/Unix server vendors used to be able to use this loop to amortize design costs over a broader base and thereby earn more profit. But currently only the X86 CPU makers Intel and AMD and the GPU makers Nvidia and AMD can still do this for their compute engines. One day there may be an Arm supplier that serves both clients and servers; it may be Nvidia or Apple. Intel also hopes to supply GPUs for both clients and servers. AMD’s Ryzen chips for clients and its Epyc chips for servers share the same architecture, and the Milan server chip is based on the Zen3 core, which has been shipping in PC CPUs for many months.

With the Milan chips, the memory and I/O hub chip at the heart of the architecture remains basically the same. The exceptions are some tweaks to support nested paging of main memory and to run the Infinity Fabric interconnect, which links the Zen3 cores to the memory and I/O hub chip (and therefore to each other), at the same 1.6 GHz clock speed as the main memory clock. (That memory clock is double pumped, so main memory runs at an effective 3.2 GHz.) In the past, the two clocks were not synchronized, and this synchronization is one factor in the performance improvement from Rome to Milan. In applications that are sensitive to memory bandwidth and latency, synchronizing the clocks yields 3% to 5% more performance than Rome processors, which did not run the two clocks at the same speed.
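The clock relationship described here can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything from AMD: the function names are our own, and the only inputs taken from the text are the 1.6 GHz (1,600 MHz) clock and the double pumping.

```python
def effective_transfer_rate_mts(memory_clock_mhz: int, pumps_per_clock: int = 2) -> int:
    """Effective memory transfer rate in mega-transfers per second.

    DDR memory moves data on both edges of the clock, so the default is
    two transfers ("pumps") per clock cycle.
    """
    return memory_clock_mhz * pumps_per_clock


def clocks_synchronized(fabric_clock_mhz: int, memory_clock_mhz: int) -> bool:
    """True when the fabric and the memory clock run at the same speed."""
    return fabric_clock_mhz == memory_clock_mhz


# Values from the text: both clocks at 1,600 MHz on Milan.
MILAN_FABRIC_MHZ = 1600
MILAN_MEMORY_MHZ = 1600

print(effective_transfer_rate_mts(MILAN_MEMORY_MHZ))            # effective rate in MT/s
print(clocks_synchronized(MILAN_FABRIC_MHZ, MILAN_MEMORY_MHZ))  # 1:1 ratio or not
```

Keeping the two clocks at a 1:1 ratio means a memory request does not have to cross between two clock domains running at different speeds, which is the source of the latency benefit claimed above.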

The following are the general feeds and speeds of the three generations of Epyc processors:


As you can see, the number of cores and threads has not changed much between the Rome and Milan generations, and both chips use the 7-nanometer process from Taiwan Semiconductor Manufacturing Company. AMD still provides simultaneous multithreading (SMT) with two virtual threads per physical core, rather than pushing it to four or eight threads per core as IBM does with its Power8 and Power9 chips.

The memory and I/O systems are basically the same: each Epyc socket has eight memory controllers and 128 lanes of PCI-Express 4.0 I/O. The thermal design points of the processors are the same as well.
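To put those counts in perspective, here is a back-of-the-envelope sketch of the theoretical peak bandwidth per socket. It rests on several assumptions that are ours, not AMD's published figures: one 64-bit (8-byte) DDR4 channel per memory controller, memory running at 3,200 MT/s, and the standard PCI-Express 4.0 signaling rate of 16 GT/s per lane with 128b/130b encoding. Real sustained bandwidth will be lower.

```python
DDR4_BYTES_PER_TRANSFER = 8      # assumption: 64-bit data bus per channel, ECC bits excluded
PCIE4_GT_PER_S = 16.0            # PCI-Express 4.0 raw signaling rate per lane
PCIE4_ENCODING = 128.0 / 130.0   # 128b/130b line encoding overhead


def peak_memory_bandwidth_gbs(channels: int, transfers_mts: int) -> float:
    """Theoretical peak memory bandwidth in GB/s for one socket."""
    return channels * transfers_mts * DDR4_BYTES_PER_TRANSFER / 1000.0


def peak_pcie4_bandwidth_gbs(lanes: int) -> float:
    """Theoretical peak PCI-Express 4.0 bandwidth in GB/s, per direction."""
    gigabytes_per_lane = PCIE4_GT_PER_S * PCIE4_ENCODING / 8.0  # bits to bytes
    return lanes * gigabytes_per_lane


# Counts from the text: eight memory controllers and 128 lanes per socket.
print(f"memory: {peak_memory_bandwidth_gbs(8, 3200):.1f} GB/s")
print(f"PCIe:   {peak_pcie4_bandwidth_gbs(128):.1f} GB/s per direction")
```

Because these figures are unchanged from Rome, any generational gain has to come from the cores, the caches, and the fabric rather than from raw memory or I/O bandwidth.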

There is a good reason for this: Milan chips have to stay socket compatible with Rome chips, otherwise motherboard and system makers would give AMD a great deal of grief. Any performance improvement had to come within all of these constraints, and that is precisely what AMD has delivered with Milan. Across a set of representative workloads, the average number of instructions per clock (IPC) is 19% higher than with Rome.

That 19% increase in performance per socket is far better than the 5% to 10% IPC improvement per generation that Intel has shown. Frankly, it may be much better than many people expected from AMD.

You cannot do everything at once, or you end up getting nothing done at all. In practice, Milan had to wait until the Ryzen PC chip market needed a fatter core complex before it could do some flattening of the NUMA domains. Those domains are all lashed together through the memory and I/O hub chip to create something that looks, more or less, like a monolithic socket to the operating system and its applications.


Specifically, the Rome core complex has four Zen2 cores, each with its own L2 cache, hanging off a shared 16 MB L3 cache. Two of these complexes are etched onto one chiplet, which is essentially a baby Ryzen PC chip, and eight of those chiplets are interconnected with Infinity Fabric in the socket to create a 64-core Rome chip. Incidentally, both Rome and Milan use Infinity Fabric Gen 2.0 (x-GMI-2 in the picture above) to link the core complexes to the memory and I/O chip at the center of the package.

In the Milan design, the core complex is unified. All eight Zen3 cores have dedicated L2 caches, share a single 32 MB L3 cache, and are implemented as one chiplet. Eight of these chiplets deliver the same maximum of 64 cores, but the number of NUMA domains in the socket is cut in half, so the operating system and virtual machines see larger blocks of raw compute and cache. In fact, the full 32 MB of L3 cache can be allocated to a single core, and that is what happens in some SKUs of the Milan product family, particularly the very high performance ones.

So, for example, in the Epyc 75F3 only four of the eight cores on each chiplet are turned on, for a total of 32 cores. Each group of four cores has the full 32 MB of shared L3 cache, and all eight DDR4 memory controllers are active, for a maximum capacity of 4 TB per socket using 256 GB memory sticks. On the eight-core Epyc 72F3, which is the extreme case in the Milan product line, only one of the eight cores on each chiplet is activated, and it runs at 3.7 GHz, which is close to its 4 GHz turbo speed. Each core therefore has 32 MB of L3 cache to itself, which is a great deal. Compared with its Rome predecessors, the combination of core count, clock speed, and IPC improvement can boost the performance of some applications by far more than expected.
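The cache arithmetic in the last few paragraphs is easy to lose track of, so here is a small Python sketch that tabulates it. The class and field names are our own invention, and the layouts are simply the ones described in the text (eight chiplets per package in every case); this is not a vendor data sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageLayout:
    name: str
    chiplets: int
    complexes_per_chiplet: int      # core complexes sharing one L3 cache each
    active_cores_per_complex: int
    l3_mb_per_complex: int

    @property
    def cores(self) -> int:
        return self.chiplets * self.complexes_per_chiplet * self.active_cores_per_complex

    @property
    def l3_domains(self) -> int:
        """Number of separate shared-L3 islands in the socket."""
        return self.chiplets * self.complexes_per_chiplet

    @property
    def l3_mb_per_core(self) -> float:
        return self.l3_mb_per_complex / self.active_cores_per_complex


LAYOUTS = [
    PackageLayout("Rome, 64 cores", 8, 2, 4, 16),
    PackageLayout("Milan, 64 cores", 8, 1, 8, 32),
    PackageLayout("Epyc 75F3", 8, 1, 4, 32),
    PackageLayout("Epyc 72F3", 8, 1, 1, 32),
]

for layout in LAYOUTS:
    print(f"{layout.name:16} cores={layout.cores:2d} "
          f"L3 domains={layout.l3_domains:2d} "
          f"L3 per core={layout.l3_mb_per_core:4.1f} MB")
```

The full 64-core parts end up with the same amount of L3 per core in both generations; what changes with Milan is that the cache comes in eight larger pools rather than sixteen smaller ones, and the low-core F parts leave far more of each pool to every active core.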

AMD offers a total of 19 Milan Epyc 7003 processors, which are divided into three categories, as follows:


As in the past, the F models are optimized for the highest core clock speeds, which is only possible with a relatively small number of cores and which inevitably results in a higher ratio of L3 cache to cores. There are four of these models, with 8, 16, 24, and 32 cores. Another set of five Milan chips has very high core density and therefore high thread counts. They are aimed at server virtualization and database workloads, both of which like lots of cores and threads to drive throughput. Then there are ten Milan processors that are “balanced and optimized” to strike a balance between relatively high performance and low total cost of ownership. As with the Naples and Rome processors, some Epyc chips carry a P designation.

Like the previous two generations of Epyc chips, the third generation does not support NUMA machines with more than two sockets. AMD is leaving the market for machines with four or eight sockets to Intel and IBM.

As we said, we will dig into the details of the Milan processors in follow-on stories. For now, we just want to give you the data on the new chips and show how they compare with each other and with previous generations of Opteron and Epyc processors. So, without further ado, here are the Milan SKUs:


The high-performance F models are shown in bold italics, and the P single-processor chips are highlighted in gray. As is our custom with the Epyc series, we have calculated a raw performance index for the Milan line based on core count and clock speed, and then created a relative performance index that combines this with the IPC improvements made over time. The relative performance index is pegged to the quad-core “Shanghai” Opteron 2387 running at 2.8 GHz, which has a relative performance of 1.0 and costs US$873 per unit of relative performance. Pricing is the unit price for customers who buy processors in lots of 1,000, which is the standard for Intel and AMD pricing.
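For readers who want to reproduce this kind of index, a minimal Python sketch follows. It assumes the raw index is simply cores multiplied by clock speed, and it takes the cumulative IPC gain over the Shanghai baseline as a parameter, because the per-generation IPC factors behind our tables are not spelled out in this story. The function names and the sample chip are hypothetical.

```python
# Baseline from the text: quad-core "Shanghai" Opteron 2387 at 2.8 GHz,
# relative performance 1.0, US$873 per unit of relative performance.
BASELINE_CORES = 4
BASELINE_GHZ = 2.8
BASELINE_PRICE_USD = 873.0


def raw_index(cores: int, clock_ghz: float) -> float:
    """Raw throughput proxy: cores times clock speed (assumed form)."""
    return cores * clock_ghz


def relative_performance(cores: int, clock_ghz: float, ipc_factor: float) -> float:
    """Performance relative to the baseline chip.

    ipc_factor is the cumulative IPC gain versus the baseline core
    (1.0 means the same IPC); the caller must supply it.
    """
    return raw_index(cores, clock_ghz) * ipc_factor / raw_index(BASELINE_CORES, BASELINE_GHZ)


def dollars_per_unit(price_usd: float, rel_perf: float) -> float:
    """Price divided by relative performance; lower is better."""
    return price_usd / rel_perf


# The baseline scores 1.0 and US$873 by construction.
base = relative_performance(BASELINE_CORES, BASELINE_GHZ, ipc_factor=1.0)
print(base, dollars_per_unit(BASELINE_PRICE_USD, base))

# A made-up chip for illustration only: 16 cores, 2.8 GHz, 1.5x IPC, US$1,200.
example = relative_performance(16, 2.8, ipc_factor=1.5)
print(round(example, 2), round(dollars_per_unit(1200.0, example), 2))
```

An index like this ignores memory bandwidth, cache size, and turbo behavior, so it is best read as a rough ranking tool rather than a benchmark.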

Here are the feeds and speeds of the Naples and Rome Epyc chips and the Shanghai Opteron 2300s:


The relative performance of the Milan chips ranges from just under 6 for the eight-core Epyc 72F3 to 31.6 for the Epyc 7763, and the cost ranges from a low of US$94 to a high of US$414 per unit of relative performance. The 16-core Epyc 7313P and the 24-core Epyc 7443P offer the best price/performance. Interestingly, the eight-core Epyc 72F3, with its low core count, high clock speed, and large L3 cache, costs US$414 per unit of performance, which is only slightly less than half the cost of the Shanghai Opteron processor from early 2009 that serves as our benchmark for performance and value. That may seem crazy, but it just shows you that Dennard scaling really did stop a long time ago.

It is hard to generalize across product lines whose SKUs do not match up exactly between generations, but it looks like AMD is generally, although of course not in every case, delivering higher performance and better value for money with the jump from Rome to Milan. Compare the 48-core Epyc 7643 running at 2.3 GHz with the 48-core Epyc 7642, also running at 2.3 GHz. The IPC improvement alone boosts performance by 19%, and although AMD has also raised the price from US$4,775 for the Rome chip to US$4,995 for the Milan chip, that still works out to a significant 10% improvement in price/performance.
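A simplified version of this comparison can be sketched in Python. It assumes identical core counts and clock speeds, so that the 19% IPC gain is the only performance difference, and it uses only the two list prices quoted above; the function name is our own. Under those assumptions the simple arithmetic lands at roughly 12%, a little above the 10% figure from our tables, which are built on our own performance index and need not match this shortcut exactly.

```python
def price_performance_improvement(old_price: float, old_perf: float,
                                  new_price: float, new_perf: float) -> float:
    """Fractional drop in cost per unit of performance (positive is better)."""
    old_cost = old_price / old_perf
    new_cost = new_price / new_perf
    return 1.0 - new_cost / old_cost


# Epyc 7642 (Rome) versus Epyc 7643 (Milan): same cores and clock,
# so performance is normalized to 1.0 and 1.19 respectively.
gain = price_performance_improvement(
    old_price=4775.0, old_perf=1.00,
    new_price=4995.0, new_perf=1.19,
)
print(f"cost per unit of performance falls by {gain:.1%}")
```

The same function works for any pair of SKUs once a performance figure is chosen for each, which makes it easy to rerun the comparison with a different index.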

It comes down to the specific case, which is why we created the tables above. You can make your own comparisons to your heart’s content.



Author: Yoyokuo