Hot Chips 2020 Live Blog: Next Gen Intel Xeon, Ice Lake-SP (9:30am PT)
by Dr. Ian Cutress on August 17, 2020 11:15 AM EST- Posted in
- CPUs
- Intel
- Xeon
- Enterprise CPUs
- 10nm
- Live Blog
- Ice Lake
- Ice Lake-SP
- Hot Chips 32
12:09PM EDT - Our first talk of the day is from Intel, about its next-generation Ice Lake Xeon Scalable processor.
12:10PM EDT - We're 20 minutes from the Intel talk starting, but Hot Chips will commence with a 15-minute intro talk to the conference, which we'll cover here
12:10PM EDT - This is the first 'Virtual' Hot Chips, due to COVID. Last year's attendance was 1200-1400 or so (I'm still waiting on exact numbers)
12:10PM EDT - With the conference going virtual, they cut prices, which means there has been an uptick in signups I'm told
12:11PM EDT - Highest cost for the conference and tutorials was $160. Bargain
12:11PM EDT - Tutorials were yesterday, whereas the main conference starts today
12:12PM EDT - Today there's a lot of talks on CPU and GPU. Intel, IBM, AMD, more Intel, then NVIDIA A100, Intel Xe, and Xbox Series X to finish around 6pm PT
12:18PM EDT - And here we go with the intro to the conference
12:19PM EDT - Record registration numbers. 2100+ as of this morning, still growing
12:20PM EDT - Intel is the Rhodium sponsor
12:20PM EDT - That paid for some of the equipment for streaming, and provided the studio for the event
12:20PM EDT - Platinum sponsor is AMD
12:21PM EDT - Now going through some of the attendee info - links to help with logins and such
12:23PM EDT - Presentations and recordings are usually made public by end-of-year
12:29PM EDT - Two keynotes, one from Raja
12:32PM EDT - Questions go through Slack throughout the event
12:32PM EDT - And now the first session begins
12:33PM EDT - First up is Intel Ice Lake Xeon
12:34PM EDT - Speaker was lead on Nehalem-EX, and featured in Sandy, Ice
12:34PM EDT - 10+ process
12:34PM EDT - New 2-socket Whitley platform
12:34PM EDT - Uses Sunny Cove
12:35PM EDT - New ISA
12:35PM EDT - 384 OoO window, 128+72 in flight loads/stores
12:35PM EDT - vs Cascade Lake
12:35PM EDT - 48 kB L1D
12:36PM EDT - 1.25 MB L2 cache
12:36PM EDT - ~18% IPC over Cascade
12:36PM EDT - second FMA
12:37PM EDT - New instructions
12:37PM EDT - AVX-512 IFMA, VPMADD52
12:37PM EDT - Vector AES, GFNI, SHA-NI
12:37PM EDT - VBMI, VPOPCNT*
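For anyone wondering what the IFMA pair actually computes, here is a rough scalar model in Python. It follows the documented behaviour of VPMADD52LUQ/VPMADD52HUQ (multiply the low 52 bits of each 64-bit lane, then add the low or high 52 bits of the 104-bit product into the accumulator). The function names and the list-of-integers lane representation are illustrative assumptions, not anything shown in the talk.

```python
# Scalar model of the AVX-512 IFMA instructions; each list entry is one 64-bit lane.
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1

def vpmadd52luq(acc, b, c):
    """acc + low 52 bits of (b[51:0] * c[51:0]), wrapped to 64 bits per lane."""
    out = []
    for a, x, y in zip(acc, b, c):
        product = (x & MASK52) * (y & MASK52)  # up to 104 bits wide
        out.append((a + (product & MASK52)) & MASK64)
    return out

def vpmadd52huq(acc, b, c):
    """acc + high 52 bits of (b[51:0] * c[51:0]), wrapped to 64 bits per lane."""
    out = []
    for a, x, y in zip(acc, b, c):
        product = (x & MASK52) * (y & MASK52)
        out.append((a + (product >> 52)) & MASK64)
    return out

if __name__ == "__main__":
    x, y = (1 << 51) + 3, (1 << 50) + 5
    lo = vpmadd52luq([0], [x], [y])[0]
    hi = vpmadd52huq([0], [x], [y])[0]
    # The two halves reassemble the full 104-bit product.
    print(lo + (hi << 52) == x * y)
```

Splitting the product into 52-bit halves is what makes these instructions useful for big-number arithmetic such as RSA-style multiplication, where limbs are kept below 64 bits to leave headroom for carries.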
12:38PM EDT - (not much more detail than what's on the slides)
12:38PM EDT - Updating current software to boost performance
12:40PM EDT - New infrastructure architecture
12:40PM EDT - New control structure
12:40PM EDT - Distributed control and telemetry fabric
12:41PM EDT - One new fabric dedicated for power, one for other
12:41PM EDT - P-Unit for power
12:41PM EDT - Communication streamlined
12:42PM EDT - Control is IP independent
12:42PM EDT - Building new SoCs becomes easier
12:43PM EDT - Migration from Cascade to Ice
12:43PM EDT - 28 core to 28 core
12:43PM EDT - Move from 6x3 mesh to 7x3 mesh
12:43PM EDT - Memory is now 2 channels per segment, not 3
12:43PM EDT - So 8 memory channels total
12:44PM EDT - IOs on north and south of die
12:44PM EDT - PCIe Gen 4 (x64?)
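A quick back-of-envelope for what that would mean in raw bandwidth, as a Python sketch. The PCIe 4.0 figures (16 GT/s per lane, 128b/130b encoding) are standard; the x64 lane count is only my guess from the slide, and the function name is made up for illustration. Packet and protocol overhead are ignored, so real throughput will be lower.

```python
def pcie_gbytes_per_s(lanes, gt_per_s=16.0, encoding=128 / 130):
    """Peak one-direction bandwidth in GB/s, before protocol overhead."""
    # One transfer carries one bit per lane; the line encoding eats 2 of every 130 bits.
    return lanes * gt_per_s * encoding / 8

if __name__ == "__main__":
    for lanes in (16, 64):  # x64 is an assumption, not confirmed in the talk
        print(f"x{lanes}: {pcie_gbytes_per_s(lanes):.1f} GB/s per direction")
```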
12:45PM EDT - New IO virtualization implementation, up to 3x bw scaling
12:45PM EDT - larger TLBs and large page sizes
12:45PM EDT - 3 UPI links, independently clocked
12:45PM EDT - Doesn't say if 10.2 GT/s
12:46PM EDT - Each UPI agent has its own fabric stop for better comms to other sockets
12:46PM EDT - New memory controller design with optimizations - built from ground up, built with efficiency in mind
12:47PM EDT - Best efficiency across all frequencies. Supports top DDR4 speeds (3200 at 2DPC?)
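To put the move to 8 channels in perspective, here is a small Python sketch of theoretical peak memory bandwidth per socket. It assumes a 64-bit (8-byte) data path per DDR4 channel, DDR4-3200 for Ice Lake (my guess above, not confirmed in the talk) and 6 channels of DDR4-2933 for Cascade Lake. The function name is illustrative, and sustained bandwidth will be lower than these peak numbers.

```python
def ddr_peak_gbytes_per_s(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000

if __name__ == "__main__":
    ice = ddr_peak_gbytes_per_s(8, 3200)      # assumed DDR4-3200 on all 8 channels
    cascade = ddr_peak_gbytes_per_s(6, 2933)  # assumed DDR4-2933 on 6 channels
    print(f"Ice Lake-SP: {ice:.1f} GB/s, Cascade Lake: {cascade:.1f} GB/s")
    print(f"Ratio: {ice / cascade:.2f}x")
```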
12:47PM EDT - TME using AES-XTS 128-bit, enabled by BIOS
12:47PM EDT - When enabled, entire memory is encrypted. Key is not accessible from BIOS or software. HW generated key
12:47PM EDT - Overhead is a few percent perf impact
12:48PM EDT - Support for Optane-200 DCPMM
12:48PM EDT - At top DDR4 speed? DDR4-3200? I thought 200 was 2666 only
12:48PM EDT - New mechanisms for latency and coherence
12:49PM EDT - Dynamic prefetch throttling - modulates prefetching under memory bandwidth to enable faster speeds rather than overloading the prefetchers
12:50PM EDT - Non-Temporal Write optimization helps low core count writes by not waiting for snoop responses - pull data from core early
12:52PM EDT - OSB - opportunistic snoop broadcast updated, support for new opcodes to reduce latency for socket cache-to-cache by ~70ns
12:54PM EDT - Bandwidth increases compared to Cascade
12:54PM EDT - Now power management latency
12:55PM EDT - P-state and C-state transition latency were hurting performance
12:55PM EDT - New PLL design allows for not locking
12:55PM EDT - Allows transitions almost not-visible
12:56PM EDT - Latency spikes disappear when P-states change
12:56PM EDT - Also new Fabric frequency change - used to drain buffers and restart clocks. Now no longer needed, reduces latency by 3x
12:56PM EDT - Latencies on bottom right of slide
12:57PM EDT - AVX512 frequency is low compared to SSE - now some improvements
12:57PM EDT - Better power analysis of specific AVX512 instructions
12:57PM EDT - AVX512 now has smarter mapping between instructions and maps
12:57PM EDT - 3 new power levels for AVX512
12:58PM EDT - For specific instructions, end up with better frequency for 256-bit and 512-bit instructions
12:58PM EDT - Provides software writers more incentive to use AVX-512
12:59PM EDT - Speed Select Features
12:59PM EDT - SST-PP: Performance Profile
12:59PM EDT - SST-BF: Base Frequency
12:59PM EDT - SST-CP: Core Power
01:00PM EDT - SST-TF: Turbo Frequency
01:00PM EDT - Select Ice Lake SKUs will have Intel SST enabled, allowing customers to change the performance profile of the CPU based on cooling or requirements
01:00PM EDT - Dynamically adjusted at runtime
01:02PM EDT - Wrap up - Sunny Cove in Xeon on 10nm. Better infrastructure and fabric control
01:03PM EDT - Ice Lake: A Balanced CPU for All Server Usages
01:04PM EDT - Now Q&A
01:04PM EDT - Q: What is the perf impact when TME enabled? A: Target was to be less than 5%. We are seeing 1-2% on pre-prod samples. Not more than that.
01:05PM EDT - Q: How will base frequency scale for AVX-512? Only turbo was in the presentation. A: Similar improvements will apply. Less loss of freq for similar instructions
01:06PM EDT - Q: Support additional crypto? A: Reach out to Intel if you want additional algorithms
01:06PM EDT - Q: What change in PCIe for VM improvement? A: New Virtualization engine design. Increased TLB. VT-D IOMMU running at double speed. Large page support for translation requests as well. All new, that's how 2x
01:07PM EDT - Q: 18% IPC at iso-core. How does it compare with Cascade/Cooper? A: They were the same arch, Cascade/Cooper. No comment on SoC level performance. We will see substantial improvements at SoC level.
01:08PM EDT - That's a wrap. Next talk is IBM, head on over to that live blog
24 Comments
Spunjji - Tuesday, August 18, 2020
It's pretty similar to how AMD do Desktop/Server first, then Mobile with tweaks - only in reverse!
anonomouse - Monday, August 17, 2020
Considering that Willow Cove is basically more or less the same as Sunny Cove, I kinda doubt Sapphire Rapids would bother to "upgrade" to Willow Cove. It'd be more likely that it's a bit later, but with Golden Cove.
Rudde - Saturday, August 22, 2020
Willow Cove is basically Sunny Cove adapted to higher frequencies (SuperFin / 10nm+). Considering Ice Lake SP is already on 10nm+, I don't see any reason to use Willow Cove.
JayNor - Saturday, September 12, 2020
I don't recall seeing in any presentation a mention that Ice Lake Server has been updated to SuperFin. I think they would have been explicit, if this were so.
AntonErtl - Monday, August 17, 2020
Willow Cove has 1.25MB L2 (and a non-inclusive L3), like this server Sunny Cove (and the server Skylake). This server Sunny Cove also has an extra FMA unit. So microarchitecturally Willow Cove is between client and server Sunny Cove, as far as I gather from the reporting. I guess there are improvements in Willow Cove at lower levels that were not ready in time for server Sunny Cove (server parts have longer lead times); or maybe the server team is not as keen as others to have a separate name for the core.
One interesting development is that the OoO Window size is given as 384, while I had that number as 352 earlier (but don't remember from where).
Ian Cutress - Monday, August 17, 2020
Development cycle. The Xeon chip takes longer to optimize and bring to market than a mobile chip. That and the process delays ofc
anonomouse - Monday, August 17, 2020
They said it's more or less the same core microarchitecturally, so there's not really a big difference. At that point, it's probably more to do with just what fabrication technology they are able to use in "volume".
DigitalFreak - Monday, August 17, 2020
Intel has some literal "hot chips" to talk about this year.
Eulytaur - Monday, August 17, 2020
Disappointed that Intel didn't release any SKUs yet, I hope we get some soon because this talk about improvements with no actual SKUs is very worrying.
Ian Cutress - Monday, August 17, 2020
Full launch later this year. General Availability, who knows.