Arm processors for servers

Arm CPUs To Take A Bite Out Of The HPC Market


Speaking at an HPC community event hosted by Dell, Arm’s senior director for the HPC business, Brent Gorda, said that the company is “really driving hard in the HPC community” and highlighted its partnership with companies such as Nvidia, SiPearl, and Fujitsu to develop Arm-based silicon to drive HPC and AI applications.

But few HPC sites can design their own chip from scratch. Fortunately, Arm’s business model also lets partners take a ready-made core design and add custom modules to it, Gorda explained.

“There’s something called a core license whereby you can license Arm Neoverse, which is our IP. And that gives you the core building blocks, the logic itself, around which you customize and build the chip that you want to build,” he said.

Surrounding all this is the Arm ServerReady compliance program, which certifies that a specific chip meets compatibility requirements for the Arm server ecosystem.

“Once you pass this certification, the software world is available to you. It guarantees functionality for the software, and you can then pay for supported OS releases like Red Hat.”

This ability to customize the chip for a specific application or set of applications is where Arm has an advantage, Gorda claimed, especially given where HPC and AI appear to be heading. Customers can take the Arm core engine plus the on-chip network, and add custom accelerators for their target workload.

One of the issues that has hindered Arm in the server market is software support, with many key software packages developed for X86 processor platforms. When asked if all the pieces are now in place to deploy HPC on Arm, Gorda said that in general, the answer is yes.

Gorda also pointed out that Arm acquired Allinea Software, a leading provider of software tools for HPC, about five years ago, in order to bolster Arm’s HPC software ecosystem support.
https://www.nextplatform.com/2022/02/04/arm-cpus-to-take-a-bite-out-of-the-hpc-market/
 
Nvidia Grace:


A module can hold 2 Grace chips linked via NVLink, for 144 cores in total and 1 TB/s of bandwidth to LPDDR5X memory with ECC. Almost 400 MB of SRAM in total.
It's not on the slide, but with 1 TB of RAM it will have a TDP of 500 W.



Since the module links one Grace to another Grace via NVLink, the second chip can be a Hopper GPU instead of a second Grace.



Several combinations of Grace CPUs and Hopper GPUs are possible.



Systems can also include the ConnectX-7, which is already on the market and is a 400 Gbit/s InfiniBand card. They can also include the card Nvidia showed that combines the ConnectX-7 with Hopper in a "Smart NIC".
 

Now in preview: Azure Virtual Machines with Ampere Altra Arm-based processors

The Dpsv5 and Epsv5 Azure VM-series feature the Ampere Altra Arm-based processor operating at up to 3.0GHz. The new VMs provide up to 64 vCPUs and include VM sizes with 2GiB, 4GiB, and 8GiB per vCPU memory configurations, up to 40 Gbps networking, and optional high-performance local SSD storage.

The VMs currently in preview support Canonical Ubuntu Linux, CentOS, and Windows 11 Professional and Enterprise Edition on Arm. Support for additional operating systems including Red Hat Enterprise Linux, SUSE Linux Enterprise Server, Debian, AlmaLinux, and Flatcar is on the way.
https://azure.microsoft.com/en-us/b...chines-with-ampere-altra-armbased-processors/
 

ADLINK’s Ampere® Altra® Developer Platform with Arm SystemReady certification




COM HPC Ampere Altra Developer Platform key features:
  • Ampere® Altra® SoC
  • Arm Neoverse N1-based platform
  • Scalable, from 32 to 80 Ampere Altra cores (65W to 150W TDP)
  • Up to 768 GB DDR4 with 6 individual memory channels for demanding workloads
  • 3 x16 slots and 4 x4 slots PCIe Gen4
  • Open Source Firmware (EDKII bootloader with TianoCore / UEFI)
  • Arm SystemReady SR certified
  • Gigabit Ethernet support: 4x 10GbE and 1x GbE (optional)
Arm SystemReady is a compliance certification program based on a set of hardware and firmware standards: Base System Architecture (BSA) and Base Boot Requirements (BBR) specifications. The certification should give developers confidence that most standard Linux distributions can be installed out of the box, as you would expect from a consumer-based workstation. This ensures that subsequent layers of software also ‘just work’. SystemReady SR certification includes verification testing of Ubuntu Server 20.04.3, Windows PE (10.0.22000.1), VMware ESXi-Arm Fling v1.8, Fedora Server 35, FreeBSD 13.0-RELEASE, CentOS stream 9, and Debian 11.2.
https://www.adlinktech.com/en/news/com-hpc-ampere-altra-developer-platform-arm-sr

Available in several color options 😏

 


This Intel 8360Y has 36 cores. In this dual-socket configuration that makes 72 cores, which is exactly half of the 144 cores in the module with 2 of these Nvidia Grace chips.
By the way, this Nvidia Grace doesn't seem to use a "custom Arm core". It appears to use Arm's N2 cores.
 
Maybe it's a matter of semantics: it uses the standard Arm N2 cores and it is the SoC integration that is custom. But that could be anything, even NVLink or Nvidia's proprietary interconnects/controllers, instead of the "CoreLink" parts that Arm uses (used?) and that are equally licensable.

They talked about a new interconnect, the CMN-700, in the same presentation as the V1 and the N2, already with CCIX and CXL support.


https://www.servethehome.com/arm-neoverse-n2-and-v1-at-arm-tech-day-2021/4/



By the way... the folks at RIKEN decided to simulate what stacking L2 cache would look like, in the style of the stacked L3 cache on the Epyc 7003X.

Stacking Up L2 Cache, RIKEN Shows 10X Speedup For A64FX By 2028

Inspired by AMD’s “Milan-X” Epyc 7003 processors with their 3D V-Cache stacked L3 cache memory, and then propelled by actual benchmark tests pitting regular Milan CPUs against Milan-X processors using real-world and synthetic HPC applications, researchers at RIKEN Lab in Japan, home of the “Fugaku” supercomputer based on Fujitsu’s impressive A64FX vectorized Arm server chip, have fired up a simulation of a hypothetical A64FX follow-on that could, in theory, be built in 2028 and provide nearly an order of magnitude more performance than the current A64FX.
Believe it or not, RIKEN did not have the floorplan of the A64FX processor, and had to estimate it from die shots and other specs, but it did that as the starting point for the Gem5 simulator, which is open source and used by a lot of tech companies. (There is no way this estimate was done without Fujitsu’s approval and review, however unofficial, and therefore we think the floorplan used for the A64FX is absolutely accurate.)

Here is the important thing after all that simulating. Across a wide suite of HPC benchmarks, including real applications running at RIKEN and other supercomputer centers as well as a slew of HPC benchmarks we all know, the LARC CMG was able to deliver around 2X more performance on average and it was as high as 4.5X for some workloads. Couple this with a quadrupling of CMGs and you are looking at a CPU socket that could be on the order of 4.9X to 18.6X more powerful. The geometric mean of performance improvements between A64FX and LARC for those applications that are sensitive to L2 cache is 9.8X. By the way:

The Gem5 simulations were run at the CMG level because the Gem5 simulator could not handle the full LARC socket with sixteen CMGs, and RIKEN has had to make some assumptions about how that scale would work out within the socket.
https://www.nextplatform.com/2022/0...he-riken-shows-10x-speedup-for-a64fx-by-2028/
 

Arm server chip startup Ampere is headed for an IPO

Oracle-backed server processor startup Ampere Computing said Monday that it plans to go public, filing initial confidential paperwork with the SEC.

A public listing would give Ampere an infusion of cash and potential access to more investment further down the line via public markets after Oracle has quietly invested $426 million in Ampere.
Oracle’s latest quarterly earnings report implied that it had taken a 20% to 50% stake in Ampere, based on accounting rules. [Ampere CEO Renee] James sits on Oracle’s board but it stopped treating her as an independent member after Oracle first took a stake in the Silicon Valley company.
Bloomberg News reported in October that Oracle was in talks with SoftBank for a stake that would have valued Ampere at about $8 billion, but that the company didn’t need to raise money at the time.
https://www.protocol.com/bulletins/ampere-files-for-ipo

 
For anyone who doesn't know yet: Plesk has an Arm version, supported on Ubuntu 20.

Personally I haven't tried it yet, but it must be something out of this world.
 
Ampere Adds "Ampere1" CPU Core Support To LLVM

However, as noted last year, Ampere has begun working on their own core designs, slated for introduction later in 2022.


"Ampere Next-Generation" last year was confirmed to be 5nm based and have an Arm ISA compliant design and next-generation memory (DDR5) and storage capabilities. Details, however, remain light for this Ampere Altra / Altra Max successor that will usher in their own core designs. Ampere's 2022 design has also been referenced by the "Siryn" codename.

Thus I was excited to see this morning that being mainlined into LLVM was "Ampere1". Initial compiler support for the "ampere1" target is added and is compliant with the Armv8.6-A ISA. This at least confirms Armv8.6-A usage for this initial in-house Ampere core design rather than Armv9 but already a significant improvement over Armv8.2 with the Neoverse N1 cores.
https://www.phoronix.com/scan.php?page=news_item&px=Ampere1-LLVM-Clang-Compiler
 

Gigabyte G492-PD0 is the Ampere Altra Max Arm NVIDIA A100 Server



The server itself has a single Ampere Altra Max processor for up to 128 cores, which is notable. The x86 systems usually have two processors because a lot of AI training workloads have pre-processing steps that use high-performance cores. Still, this is an Arm-based high-end server that leans heavily on accelerators.

https://www.servethehome.com/gigabyte-g492-pd0-is-the-ampere-altra-max-arm-nvidia-a100-server/
 
Odd choice. Beyond the question of processor performance, they connect 64 PCIe lanes from the processor to 288 PCIe lanes of devices (8 GPUs, 8 PCIe slots and 8 NVMe drives), which looks quite unbalanced. The use of so many PCIe switches also raises the question of whether a second socket wouldn't make the platform cheaper.

It almost looks like a product built to order for a specific customer, which they later decided to sell to the public.
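
To make the imbalance concrete, here is a rough sketch of the lane arithmetic. The per-device widths (x16 per GPU and per slot, x4 per NVMe drive) are my assumption, chosen because they add up to the 288 lanes mentioned above; they are not read off the block diagram.

```python
# Rough PCIe lane budget for the configuration described in the post.
# ASSUMPTION: x16 per GPU, x16 per PCIe slot, x4 per NVMe drive. These
# widths are hypothetical, picked only because they sum to 288 lanes.
CPU_LANES = 64  # lanes the post says come from the processor

devices = {
    "GPU": (8, 16),         # (count, assumed lanes per device)
    "PCIe slot": (8, 16),
    "NVMe drive": (8, 4),
}


def downstream_lanes(devs):
    """Sum count * width over every device group."""
    return sum(count * width for count, width in devs.values())


total = downstream_lanes(devices)
oversubscription = total / CPU_LANES

print(f"Downstream lanes: {total}")
print(f"Oversubscription vs. {CPU_LANES} CPU lanes: {oversubscription:.1f}:1")
```

Under those assumptions the switches fan 64 upstream lanes out to 288 downstream lanes, a 4.5:1 ratio. That only matters for traffic that actually goes through the CPU; GPU-to-GPU traffic behind the same switch would not be limited by it.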
 
I don't know where this guy got this from, because I haven't seen it anywhere else...


Amazon Graviton 3 Uses Chiplets & Advanced Packaging To Commoditize High Performance CPUs | The First PCIe 5.0 And DDR5 Server CPU



https://semianalysis.substack.com/p/amazon-graviton-3-uses-chiplets-and

On top of that, custom SSD and NIC





EDIT: I think I found the source


C7g is Graviton 3, C6g is Graviton 2, and the rest are Intel




And they're already available

AWS Graviton3 Hits GA with 3 Sockets Per Motherboard Designs



One interesting note here is that the maximum vCPUs is only 64 so it does not seem like AWS is focusing on delivering instances that span multiple Graviton3’s at the moment. The Graviton3’s may be on the same motherboard, but it appears as though they are being treated as separate nodes. If they were all in the same node, we would likely have >64 core instances.

AWS is also spending resources to bring its managed services to Graviton as it increases vendor lock-in by having proprietary hardware and software solutions. AWS is following the path largely abandoned by companies like IBM and Sun-Oracle in previous eras.

https://www.servethehome.com/amazon...ckets-per-motherboard-designs-tri-socket-arm/


Michael has already been testing it

Amazon Graviton3 Benchmarks - Nice Performance Uplift With AWS EC2 C7g

Amazon EC2 C7g are the first instances using the new Graviton3 processors. Not only are the Graviton3 processors 25% higher performance, 2x the FP and crypto performance, and 50% faster memory, but Amazon also says they use up to 60% less energy for the same performance as comparable EC2 instances.

Unfortunately I didn't have pre-launch access to C7g but since yesterday I have been carrying out many benchmarks on the new C7g Graviton3 instances. In today's article are some initial benchmarks looking at the Graviton3 Linux performance.

This initial round of Graviton3 benchmarking is looking at the c7g.4xlarge instance type in relation to the now prior-generation Graviton2-based c6g.4xlarge instance. During the on-demand testing the c6g.4xlarge was priced at $0.544 USD per hour and the c7g.4xlarge at $0.58 USD per hour.



When taking the geometric mean across all the raw performance benchmarks carried out on both the Graviton2 and Graviton3 instances, the c7g.4xlarge came out to being about 42% faster than the c6g.4xlarge instance type.

From my testing thus far the initial benchmarks are showing Graviton3 to offer terrific generational improvements over Graviton2. It will be especially interesting though to see how the Graviton3 performance stacks up to the Intel and AMD instance types, so stay tuned for that follow-up comparison in the coming days. That comparison will also feature an expanded selection of Linux benchmarks.
https://www.phoronix.com/scan.php?page=article&item=aws-graviton3-c7g&num=1
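
A quick back-of-the-envelope on the numbers quoted above (on-demand prices and the 42% geometric-mean uplift). This is only a sketch: it assumes the 42% figure carries over to your workload, and the function name is made up for illustration.

```python
# Price/performance estimate from the figures quoted in the article.
PRICE_C6G = 0.544  # USD per hour, c6g.4xlarge (Graviton2)
PRICE_C7G = 0.58   # USD per hour, c7g.4xlarge (Graviton3)
SPEEDUP = 1.42     # geometric-mean speedup of c7g over c6g


def perf_per_dollar_gain(speedup, old_price, new_price):
    """Relative performance-per-dollar of the new instance vs. the old one."""
    return speedup / (new_price / old_price)


price_ratio = PRICE_C7G / PRICE_C6G
gain = perf_per_dollar_gain(SPEEDUP, PRICE_C6G, PRICE_C7G)

print(f"Price increase: {(price_ratio - 1) * 100:.1f}%")
print(f"Performance per dollar gain: {(gain - 1) * 100:.1f}%")
```

So the c7g costs about 6.6% more per hour but, taking the geometric mean at face value, delivers roughly 33% more performance per dollar.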
 
Amazon Graviton3 vs. Intel Xeon vs. AMD EPYC Performance

For today's article the following Amazon EC2 instances were benchmarked for better gauging the Graviton3 performance potential and also the price-performance in Amazon's cloud:

a1.4xlarge - The original Graviton processors using Cortex-A72 cores. The a1.4xlarge instance type was priced on-demand at $0.408 USD per hour.
c6g.4xlarge - The now prior-generation Graviton2 instance type using Neoverse-N1 cores. The on-demand c6g.4xlarge pricing was $0.544 USD per hour.
c6a.4xlarge - The AMD EPYC 7003 "Milan" instance type powered by an AMD EPYC 7R13 processor. The c6a.4xlarge instance was priced on-demand at $0.612 USD per hour.
c6i.4xlarge - The Intel Xeon Scalable "Ice Lake" instance type using a Xeon Platinum 8375C processor. The c6i.4xlarge instance type was priced on-demand at $0.68 USD per hour.
c7g.4xlarge - The new Graviton3 instance type with Neoverse-V1 cores. The c7g.4xlarge on-demand pricing is currently at $0.58 USD per hour.
https://www.phoronix.com/scan.php?page=article&item=graviton3-amd-intel&num=1

Graviton 3 wins... but the only instances with 16 real cores are the Arm ones; the x86 ones are 8C/16T
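
To put a number on that, a small sketch of the hourly price per physical core, using the on-demand prices listed in the article. Assumptions: every 4xlarge size exposes 16 vCPUs, the Arm instances map one vCPU to one core, and the x86 instances run 2 threads per core (8C/16T, as noted above).

```python
# Hourly price per physical core for the instances in the comparison.
# ASSUMPTIONS: 16 vCPUs per 4xlarge; Graviton = 1 thread per core;
# x86 (EPYC / Xeon) = 2 threads per core, i.e. 8 cores / 16 threads.
VCPUS = 16

instances = {
    # name: (on-demand USD per hour from the article, threads per core)
    "a1.4xlarge": (0.408, 1),
    "c6g.4xlarge": (0.544, 1),
    "c6a.4xlarge": (0.612, 2),
    "c6i.4xlarge": (0.68, 2),
    "c7g.4xlarge": (0.58, 1),
}


def physical_cores(vcpus, threads_per_core):
    """Number of physical cores behind a given vCPU count."""
    return vcpus // threads_per_core


def price_per_core(price, vcpus, threads_per_core):
    """Hourly price divided by the number of physical cores."""
    return price / physical_cores(vcpus, threads_per_core)


for name, (price, tpc) in instances.items():
    cores = physical_cores(VCPUS, tpc)
    per_core = price_per_core(price, VCPUS, tpc)
    print(f"{name}: {cores} cores, ${per_core:.4f} per core-hour")
```

This says nothing about per-core performance; it just shows that at the same instance size you are paying for half as many physical cores on the x86 side.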
 

Cancelled the K12 CPU? Big mistake. Huge, says Jim Keller


Now it has emerged that Jim Keller, a key architect who worked on Arm development at AMD, reckons the chipmaker was wrong to halt the project after he left the company in 2016.

In the talk, available online via YouTube, Keller discusses how when planning the Zen 3 core – now at the heart of AMD's "Milan" Epyc processor chips – he and other engineers realized that much of the architecture was very similar for Arm and X86 "because all modern computers are actually RISC machines inside," and hence according to Keller, "the only blocks you have to change are the [instruction] decoders, so we were looking to build a computer that could do either, although they stupidly cancelled that project."

That project was apparently the K12, which was planned to be AMD's first custom microarchitecture based on the 64-bit ARMv8-A instruction set, and would have led to chips that would follow on after the Opteron A1100 series chips, which were based on Arm's Cortex-A57 core designs.
https://www.theregister.com/2022/06/20/jim_keller_arm_cpu/


This isn't very surprising news. It was more or less obvious that this would be Jim Keller's opinion.
 