12:09PM EDT - Our first talk of the day is from Intel, about its next-generation Ice Lake Xeon Scalable processor.

12:10PM EDT - We're 20 minutes from the Intel talk starting, but Hot Chips will commence with a 15-minute intro talk to the conference, which we'll cover here

12:10PM EDT - This is the first 'Virtual' Hot Chips, due to COVID. Last year's attendance was 1200-1400 or so (I'm still waiting on exact numbers)

12:10PM EDT - With the conference going virtual, they cut prices, which means there has been an uptick in signups I'm told

12:11PM EDT - Highest cost for the conference and tutorials was $160. Bargain

12:11PM EDT - Tutorials were yesterday, whereas the main conference starts today

12:12PM EDT - Today there's a lot of talks on CPU and GPU. Intel, IBM, AMD, more Intel, then NVIDIA A100, Intel Xe, and Xbox Series X to finish around 6pm PT

12:18PM EDT - And here we go with the intro to the conference

12:19PM EDT - Record registration numbers. 2100+ as of this morning, still growing

12:20PM EDT - Intel is the Rhodium sponsor

12:20PM EDT - That paid for some of the equipment for streaming, and provided the studio for the event

12:20PM EDT - Platinum sponsor is AMD

12:21PM EDT - Now going through some of the attendee info - links to help with logins and such

12:23PM EDT - Presentations and recordings are usually made public by end-of-year

12:29PM EDT - Two keynotes, one from Raja

12:32PM EDT - Questions through slack through the event

12:32PM EDT - And now the first session begins

12:33PM EDT - First up is Intel Ice Lake Xeon

12:34PM EDT - Speaker was lead on Nehalem-EX, and featured in Sandy, Ice

12:34PM EDT - 10+ process

12:34PM EDT - New 2-socket whitley

12:34PM EDT - Uses Sunny Cove

12:35PM EDT - New ISA

12:35PM EDT - 384 OoO window, 128+72 in flight loads/stores

12:35PM EDT - vs cascade

12:35PM EDT - 48 kB L1D

12:36PM EDT - 1.25 MB L2 cache

12:36PM EDT - ~18% IPC over Cascade

12:36PM EDT - second FMA

12:37PM EDT - New instructions

12:37PM EDT - AVX-512 IFMA, VPMADD52

12:37PM EDT - Vector AES, GFNI, SHA-NI

12:37PM EDT - VBMI, VPOPCNT*

12:38PM EDT - (not much more detail than what's on the slides)

12:38PM EDT - Updating current software to boost perfomance

12:40PM EDT - New infrastructure architecture

12:40PM EDT - New control structure

12:40PM EDT - Distributed control and telemetry fabric

12:41PM EDT - One new fabric dedicated for power, one for other

12:41PM EDT - P-Unit for power

12:41PM EDT - Communication streamlined

12:42PM EDT - Control is IP independent

12:42PM EDT - Building new SoCs becomes easier

12:43PM EDT - Migration from Cascade to Ice

12:43PM EDT - 28 core to 28 core

12:43PM EDT - Move from 6x3 ring to 7x3 ring

12:43PM EDT - Memory is now 2 channels per segment, not 3

12:43PM EDT - So 8 memory channels total

12:44PM EDT - IOs on north and south of die

12:44PM EDT - PCIe Gen 4 (x64?)

12:45PM EDT - New IO virtualization implementation, up to 3x bw scaling

12:45PM EDT - larger TLBs and large page sizes

12:45PM EDT - 3 UPI links, independently clocked

12:45PM EDT - Doesn't say if 10.2 GT/s

12:46PM EDT - Each UPI agent has its own fabric stop for better comms to other sockets

12:46PM EDT - New memory controller design with optimizations - built from ground up, built with efficiency in mind

12:47PM EDT - Best efficiency across all frequencies. Supports top DDR4 speeds (3200 at 2DPC?)

12:47PM EDT - TME using AES-XTS 128-bit, enabled by BIOS

12:47PM EDT - When enabled, entire memory is encrypted. Key is not accessible from BIOS or software. HW generated key

12:47PM EDT - Overhead is a few percent perf impact

12:48PM EDT - Support for Optane-200 DCPMM

12:48PM EDT - At top DDR4 speed? DDR4-3200? I thought 200 was 2666 only

12:48PM EDT - New mechnaisms for latency and coherence

12:49PM EDT - Dynamic prefetch throttling - modulates prefetching under memory bandwidth to enable faster speeds rather than overloading the prefetchers

12:50PM EDT - Non-Temporal Write optimization helps low core count writes by not waiting for snoop responses - pull data from core early

12:52PM EDT - OSB - opportunitistic snoop broadcast updated, support for new opcodes to reduce latency for socket cache-to-cache by ~70ns

12:54PM EDT - Bandwidth increases compared to Cascade

12:54PM EDT - Now power management latency

12:55PM EDT - P-state and C-state transition latency were hurting performance

12:55PM EDT - New PLL design allows for not locking

12:55PM EDT - Allows transitions almost not-visible

12:56PM EDT - Latency spikes disappear when P-states change

12:56PM EDT - Also new Fabric frequency change - used to drain buffers and restart clocks. Now no longer needed, reduces latency by 3x

12:56PM EDT - Latencies on bottom right of slide

12:57PM EDT - AVX512 frequency is low compared to SSE - now some improvements

12:57PM EDT - Better power analysis of specific AVX512 instructions

12:57PM EDT - AVX512 now has smarter mapping between instructions and maps

12:57PM EDT - 3 new power levels for AVX512

12:58PM EDT - For specific instructions, end up with better frequency for 256-bit and 512-bit instructions

12:58PM EDT - Provides software writers more incentive to use AVX-512

12:59PM EDT - Speed Select Features

12:59PM EDT - SST-PP: Performance Profile

12:59PM EDT - SST-BF: Base Frequency

12:59PM EDT - SST-CP: Core Power

01:00PM EDT - SST-TF: Turbo Frequency

01:00PM EDT - Select Ice Lake SKUs will have Intel SST enabled, allowing customers to change the performance profile of the CPU based on cooling or requirements

01:00PM EDT - Dynamically adjusted at runtime

01:02PM EDT - Wrap up - Sunny Cove in Xeon on 10nm. Better infrastructure and fabric control

01:03PM EDT - Ice Lake: A Balanced CPU for All Server Usages

01:04PM EDT - Now Q&A

01:04PM EDT - Q: What is the perf impact when TME enabled? A: Target was to be less than 5%. We are seeing 1-2% on pre-prod samples. Not more than that.

01:05PM EDT - Q: How will base frequency scale for AVX-512. Only turbo in presentation A: Similar improvements will apply. Less loss of freq for similar instructions

01:06PM EDT - Q: Support additional crypto? A: Reach out to Intel if you want additional algorithms

01:06PM EDT - Q: What change in PCIe for VM improvement? A: New Virtualization engine design. Increased TLB. VT-D IOMMU running at double speed. Large page support for translation requests as well. All new, that's how 2x

01:07PM EDT - Q: 18% IPC at iso-core. How does it compare with Cascade/Cooper A: They were the same arch, cascade/cooper. No comment on SoC level performance. We will see substantial improvements at SoC level.

01:08PM EDT - That's a wrap. Next talk is IBM, head on over to that live blog

Comments Locked

24 Comments

View All Comments

  • Spunjji - Tuesday, August 18, 2020 - link

    I'm genuinely interested to see whether this ends up being one of their "shipped for revenue" releases that they stop talking about once a successor rolls around, or whether it actually gets out there in volume.
  • JayNor - Tuesday, August 18, 2020 - link

    tomshardware picked up on this tidbit on IF type features added in Ice Lake Server's fabric:

    "Intel redesigned the chip to support two new sideband fabrics, one controlling power management and the other used for general-purpose management traffic. These provide telemetry data and control to the various IP blocks, like execution cores, memory controllers, PCIe/UPI controllers, and the like. This is akin to AMD's Infinity Fabric, which also features a sideband telemetry/control mechanism for SoC structures."

    https://www.tomshardware.com/news/intel-10nm-xeon-...
  • JayNor - Tuesday, August 18, 2020 - link

    there was also this tidbit in the toms article ... 3x fabric bandwidth seems signifcant:

    "The die includes a separate peer-to-peer (P2P) fabric to improve bandwidth between cores, and the I/O subsystem was also virtualized, which Intel says offers up to three times the fabric bandwidth compared to Cascade Lake."
  • TomWomack - Tuesday, August 25, 2020 - link

    Silly question, but how wide are the FP registers into which the unit is renaming? Are they full 256-bit SIMD with other instructions using a power-gated bottom quarter, or are SIMD instructions renaming into multiple registers?

Log in

Don't have an account? Sign up now