Intel held a virtual Architecture Day presentation, disclosing details of the engineering behind several upcoming products in the consumer and data centre spaces. While exact specifications of CPUs and GPUs will have to wait until they actually launch, we now have a better idea of the building blocks Intel is using to put them together. Raja Koduri, Intel SVP and GM of the Accelerated Computing Systems and Graphics group, led the presentation, during which multiple senior Intel engineers appeared.
The 12th Gen Core CPU lineup, codenamed ‘Alder Lake’, is expected to launch within the next few months, starting with desktop models. These will be the first mainstream Intel CPUs to feature a mix of high-performance and low-power cores, an arrangement that is common in mobile SoCs today. This follows the experimental ‘Lakefield’ CPU, which has had only a limited release so far. Alder Lake will use a more modular approach than before, with different combinations of logic blocks for different product segments.
Intel will use the terms Performance core and Efficient core, often shortened to P core and E core. For Alder Lake, the E cores are based on the ‘Gracemont’ architecture while the P cores use the ‘Golden Cove’ design. With Gracemont, Intel optimised for physical silicon area and throughput efficiency, aiming for strong multi-threaded performance across a large number of individual cores. These cores run at low voltage and will primarily handle simpler, less demanding processes.
The Golden Cove-based P cores are designed for speed and low latency. Intel calls this the highest-performing core it has ever built. New with this generation is support for Advanced Matrix Extensions for accelerating deep learning training and inference.
Together, the P and E cores make the Alder Lake architecture highly scalable, from 9W to 125W, which covers most of today’s mobile and desktop categories. It will be manufactured using the newly announced Intel 7 process, a rebranding of the 10nm ‘Enhanced SuperFin’ process. Different implementations will integrate different combinations of DDR5, PCIe Gen5, Thunderbolt 4, and Wi-Fi 6E.
The desktop implementation will use a new LGA1700 socket with up to eight performance cores (two threads each), eight efficient cores (single-threaded), and 30MB of last-level cache memory. The integrated GPU will have up to 32 execution units for basic display output and graphics capabilities. It will not have integrated Thunderbolt or an image processing block, but it will support 16 lanes of PCIe Gen5 plus another four lanes of PCIe Gen4. The matching platform controllers for motherboards will have up to 12 more PCIe Gen4 and 16 PCIe Gen3 lanes.
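The thread and lane totals implied by these desktop figures can be worked out directly. The short Python sketch below only restates the maximum configuration described above; the constant and function names are illustrative, and individual SKUs may well ship with fewer cores enabled.

```python
# Maximum desktop Alder Lake configuration, as described above.
P_CORES, P_THREADS_PER_CORE = 8, 2   # Golden Cove P cores run two threads each
E_CORES, E_THREADS_PER_CORE = 8, 1   # Gracemont E cores are single-threaded

# PCIe lanes on the CPU itself and on the matching platform controller.
CPU_LANES = {"Gen5": 16, "Gen4": 4}
CHIPSET_LANES = {"Gen4": 12, "Gen3": 16}

def total_threads() -> int:
    """Hardware threads across both core types."""
    return P_CORES * P_THREADS_PER_CORE + E_CORES * E_THREADS_PER_CORE

def total_lanes(lanes: dict[str, int]) -> int:
    """Sum PCIe lanes regardless of generation."""
    return sum(lanes.values())

print(total_threads())             # 24
print(total_lanes(CPU_LANES))      # 20
print(total_lanes(CHIPSET_LANES))  # 28
```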
Two mobile versions of Alder Lake were also discussed – a more mainstream die with six P cores and eight E cores, and an ultracompact die with two P cores and eight E cores. Both will have GPUs with 96 execution units as well as image processing units and integrated Thunderbolt controllers, and will be aimed at devices that won’t have discrete GPUs.
All Alder Lake CPUs are built from modular logic blocks: the CPU cores, GPU, memory controller, IO, and more. They will support up to DDR5-4800, LPDDR5-5200, DDR4-3200 and LPDDR4X-4266 RAM, and it will be up to motherboard and laptop OEMs to decide which to implement. The modular blocks of each CPU will be connected through three fabrics: Compute, Memory, and IO. Intel describes 100GBps of compute fabric bandwidth per P core or per cluster of four E cores, for a total of 1000GBps across 10 such units (eight P cores plus two E-core clusters on the desktop die). Last-level cache can be dynamically switched between inclusive and exclusive operation depending on load.
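The 1000GBps figure follows from counting fabric "units", where one P core or one cluster of four E cores counts as a single unit. The Python sketch below is just that arithmetic under Intel's stated 100GBps-per-unit figure; the function names are hypothetical, and it models nothing about how the fabric is actually implemented.

```python
# Compute-fabric bandwidth as described by Intel: 100GBps per "unit",
# where a unit is either one P core or one cluster of four E cores.
GBPS_PER_UNIT = 100
E_CORES_PER_CLUSTER = 4

def fabric_units(p_cores: int, e_cores: int) -> int:
    """One unit per P core plus one per full cluster of four E cores."""
    if e_cores % E_CORES_PER_CLUSTER:
        raise ValueError("E cores are grouped in clusters of four")
    return p_cores + e_cores // E_CORES_PER_CLUSTER

def fabric_bandwidth_gbps(p_cores: int, e_cores: int) -> int:
    """Aggregate compute-fabric bandwidth across all units."""
    return fabric_units(p_cores, e_cores) * GBPS_PER_UNIT

# Desktop die: 8 P cores + 8 E cores -> 8 + 2 = 10 units -> 1000GBps
print(fabric_units(8, 8))           # 10
print(fabric_bandwidth_gbps(8, 8))  # 1000
```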
We now also have some information about how workloads will be balanced between P and E cores. Intel is announcing a new hardware scheduler called Thread Director, which will be completely transparent to applications and will work with the OS scheduler to assign threads to different cores based on urgency and real-time conditions. Designed to scale across mobile and desktop CPUs, Thread Director will be able to adapt to thermal and power conditions and migrate threads from one type of core to another, as well as manage multi-threading on the P cores, with “nanosecond precision”.
Full Thread Director support requires Windows 11, so Alder Lake will perform best under the upcoming OS, although Windows 10, Linux, and other OSes will still work. With Thread Director, the OS scheduler can understand what kinds of resources each thread requires, and can prioritise latency, power saving, or other parameters depending on operating conditions.
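Intel has not published how Thread Director makes its decisions, so the following is only a toy model of the general idea: a per-thread hint (here a hypothetical `latency_sensitive` flag) steers foreground work to P cores and background work to E cores, with spill-over when a pool is full and migration when a P core frees up. All class and method names are invented for illustration, and real scheduling also weighs thermals, power and P-core multi-threading, which this sketch ignores.

```python
from dataclasses import dataclass, field

# Toy model only: Intel has not published Thread Director's algorithm.
# Thread, ToyScheduler, latency_sensitive, assign(), release() and
# rebalance() are hypothetical names invented for this illustration.

@dataclass
class Thread:
    name: str
    latency_sensitive: bool  # stand-in for the hints Thread Director gives the OS

@dataclass
class ToyScheduler:
    free: dict[str, int]  # idle slots per core type, e.g. {"P": 1, "E": 2}
    placement: dict[str, str] = field(default_factory=dict)

    def assign(self, t: Thread) -> str:
        # Latency-sensitive threads prefer P cores; background threads prefer E cores.
        preference = ("P", "E") if t.latency_sensitive else ("E", "P")
        for core_type in preference:
            if self.free.get(core_type, 0) > 0:
                self.free[core_type] -= 1
                self.placement[t.name] = core_type
                return core_type
        self.placement[t.name] = "waiting"  # no idle core of either type
        return "waiting"

    def release(self, name: str) -> None:
        # A thread finishes and gives its core back.
        core_type = self.placement.pop(name)
        if core_type != "waiting":
            self.free[core_type] += 1

    def rebalance(self, threads: list[Thread]) -> None:
        # Migrate latency-sensitive threads from E cores to any P core that is now idle.
        for t in threads:
            if (t.latency_sensitive and self.placement.get(t.name) == "E"
                    and self.free.get("P", 0) > 0):
                self.free["P"] -= 1
                self.free["E"] += 1
                self.placement[t.name] = "P"

sched = ToyScheduler(free={"P": 1, "E": 2})
game, indexer = Thread("game", True), Thread("indexer", False)
browser, backup = Thread("browser", True), Thread("backup", False)
print(sched.assign(game))     # P
print(sched.assign(indexer))  # E
print(sched.assign(browser))  # E (no P core left, so it spills over)
print(sched.assign(backup))   # waiting
sched.release("game")         # the game's P core becomes idle...
sched.rebalance([browser])    # ...so the browser migrates onto it
print(sched.placement["browser"])  # P
```

Keeping the preference order as a simple tuple makes the spill-over behaviour explicit; a real scheduler would replace the boolean hint with richer, continuously updated feedback.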
Intel has been teasing its first high-end gaming GPU for a while now, and is ramping up hype with the recent announcement of a new Intel Arc brand for GPU hardware, software and services. The first-generation product is codenamed ‘Alchemist’, and will launch in early 2022. This is a tier of the Xe architecture product stack known as Xe-HPG, or High Performance Gaming. Alchemist will be manufactured by TSMC on its N6 node. It will support hardware ray tracing as well as DirectX 12 Ultimate features such as mesh shading and variable rate shading.
Each first-gen Xe-HPG core will have 16 vector engines and 16 matrix engines plus caches, allowing for common GPU workloads as well as AI acceleration. Four such cores, plus four ray tracing units and other rendering hardware, make up a “slice”. Each Alchemist GPU can have up to eight such slices.
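Multiplying these per-core and per-slice figures gives the largest configuration the description allows. The Python snippet below is only that arithmetic; the names are illustrative, and since no Alchemist product configurations had been announced, smaller slice counts are equally possible.

```python
# Per-core and per-slice figures for first-gen Xe-HPG, as described above.
VECTOR_ENGINES_PER_XE_CORE = 16
MATRIX_ENGINES_PER_XE_CORE = 16
XE_CORES_PER_SLICE = 4
RT_UNITS_PER_SLICE = 4
MAX_SLICES = 8

def alchemist_totals(slices: int) -> dict[str, int]:
    """Totals for a GPU built from `slices` identical slices."""
    xe_cores = slices * XE_CORES_PER_SLICE
    return {
        "xe_cores": xe_cores,
        "vector_engines": xe_cores * VECTOR_ENGINES_PER_XE_CORE,
        "matrix_engines": xe_cores * MATRIX_ENGINES_PER_XE_CORE,
        "ray_tracing_units": slices * RT_UNITS_PER_SLICE,
    }

print(alchemist_totals(MAX_SLICES))
# {'xe_cores': 32, 'vector_engines': 512, 'matrix_engines': 512, 'ray_tracing_units': 32}
```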
Now, we also know that Intel will roll out its own version of AI upscaling, called XeSS (Xe Super Sampling), to take on Nvidia’s DLSS and AMD’s FSR. XeSS is an AI-based upscaling method that combines information from previous frames. Intel is claiming up to 2X better performance by rendering at lower resolutions and then upscaling to the target resolution. XeSS will run even on Xe LP integrated GPUs, and multiple game developers are on board to support it.
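The reason rendering at a lower resolution helps is straightforward arithmetic: the GPU shades far fewer pixels per frame and leaves the rest to the upscaler. The Python sketch below compares pixel counts for two resolution pairs chosen purely for illustration; they are not Intel's XeSS quality modes. The upscaling pass itself costs GPU time, and some rendering work does not scale with resolution, so a reduction in shaded pixels does not translate one-for-one into higher frame rates.

```python
# Illustrative only: the resolution pairs are examples, not XeSS modes.
def pixel_count(width: int, height: int) -> int:
    return width * height

def shading_reduction(render: tuple[int, int], target: tuple[int, int]) -> float:
    """How many times fewer pixels are shaded at `render` than at `target`."""
    return pixel_count(*target) / pixel_count(*render)

print(shading_reduction((1920, 1080), (3840, 2160)))  # 4.0
print(shading_reduction((2560, 1440), (3840, 2160)))  # 2.25
```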
While we don’t have any GPU specifications yet, Intel did say it has worked on delivering “leadership” performance per Watt. We’re sure to find out more as the launch draws nearer.
Intel also made several announcements related to its server and data centre businesses during Architecture Day, including a demonstration of the upcoming Ponte Vecchio architecture for big data, which will be the basis of the Aurora exascale supercomputer. Other highlights were the modular ‘Sapphire Rapids’ Xeon Scalable platform, the oneAPI software stack, and an emerging product category: Infrastructure Processing Units (IPUs), designed to separate infrastructure overheads from client data and processing requirements in cloud-centric data centres.