Managing Power

When designing a processor, power delivery is as important as the microarchitecture. There are many ways to provide power, with the options typically balancing design effort, die area, efficiency, and simplicity.

The best way to look at it is to start with a basic implementation: supply one power rail, at one set voltage, to all of the cores, a separate rail to the graphics, and a separate rail to the memory controller. The motherboard manages the input voltages into the chip, and controlling each large segment is easy. Now consider separating each core into its own voltage island and controlling them all individually. This design is more complex and requires more control, but ultimately it can save a lot of power: there is no need to fire up all the cores to 4.0 GHz when only one is active. The downside comes if the design needs a separate power rail for each core coming into the processor, which makes the motherboard side of the power regulation very complex and expensive, and potentially inefficient.

The next step in the design would be to create a single voltage rail going into the processor, and sort out the voltages inside the processor with regulators. It sounds complex to do in silicon because it is, but it offers the best payoffs. This last option is the one AMD has chosen.

The biggest upside to such a design is sending power where it needs to go while also keeping costs down and efficiency high. The extra effort in the design phase makes for a better platform once it is in the field. In this slide, AMD shows how the processor keeps one core half-fed and another core ticking over during the graphics phase of 3DMark, while the cores issue kernel commands and do some basic physics; when the physics phase kicks in, the cores can each turbo individually while the GPU comes back down. Add in some fine-grained control and, as long as it reacts quickly enough, it should offer a power-efficient implementation.

AMD calls the first part of what it has done ‘Synergistic Power Rail Sharing’, which basically means one power rail going into the processor. If that sounds familiar, Intel did it with Broadwell and currently does it on its high-end processors. Where Intel used a FIVR, or fully integrated voltage regulator, with massive inductors (remember the cut-out required on Broadwell motherboards for those inductors?), AMD is using a split VDD package rail and per-island linear low-dropout regulators (LDOs) for each of the cores and each of the compute units. Every voltage island gets an LDO optimized for its purpose, which doubles as a power gate when that portion of the processor can be turned off. This implementation allows the motherboard to be simplified (lower cost) and gives the processor better control, at the expense of extra control circuitry.

When Intel introduced its FIVR implementation, the company said that it found better efficiency using the big inductors and decided against linear LDO regulators because they were inefficient at low power. We put that to Sam Naffziger, AMD’s top guy on power, and he responded that yes, as a percentage, the power efficiency at idle might be lower than expected, but the power consumption of an idle core while another is loaded is still a very tiny proportion of the total. Sam stated that when the LDO is in complete power gate mode, it can be considered off and any residual power consumption is minimal, regardless of its actual efficiency. He said that AMD still worked hard on the LDO implementation for power efficiency anyway, to make sure everything still worked. Overall, total current requirements were down 36%, which reduces the motherboard-side power regulation, leading to smaller, lighter, and potentially cooler designs.
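Naffziger's argument can be illustrated with the textbook model of a linear regulator, in which the voltage difference between the rail and the island is dissipated as heat, so efficiency is roughly the output voltage divided by the input voltage (quiescent current ignored). The Python sketch below uses that model only; the rail voltage, island voltages, and currents are made-up numbers chosen for illustration, not AMD figures.

```python
def ldo_efficiency(v_in: float, v_out: float) -> float:
    """Ideal linear-regulator efficiency: V_out / V_in (quiescent current ignored)."""
    if v_in <= 0 or not 0 <= v_out <= v_in:
        raise ValueError("expected v_in > 0 and 0 <= v_out <= v_in")
    return v_out / v_in


def ldo_loss_watts(v_in: float, v_out: float, load_current_a: float) -> float:
    """Power dissipated in the regulator: the dropped voltage times the load current."""
    return (v_in - v_out) * load_current_a


# Hypothetical operating points on a shared 1.0 V rail (illustrative only).
RAIL_V = 1.0
busy = {"v_out": 0.95, "amps": 5.0}  # loaded core, close to the rail voltage
idle = {"v_out": 0.60, "amps": 0.1}  # lightly clocked core, far below the rail

for name, core in (("busy", busy), ("idle", idle)):
    eff = ldo_efficiency(RAIL_V, core["v_out"])
    loss = ldo_loss_watts(RAIL_V, core["v_out"], core["amps"])
    print(f"{name}: efficiency {eff:.0%}, regulator loss {loss:.2f} W")
```

With these assumed numbers the idle core's regulator looks much worse as a percentage, yet it wastes far less power in absolute terms than the busy core's regulator, which is the distinction being drawn in the response above.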

AMD stated that Intel’s new 8th Generation Kaby Lake-R mobile processors, which increase the core count at the same TDP, require the system to drive more current, especially to hit the higher PL2 power state, which has more than doubled over the 7th generation parts. The downside to having a single rail implementation, at least from a reviewer’s perspective, is that it now becomes harder to separate the CPU and the GPU for power monitoring.

With per-core voltage access, AMD is able to fine-tune the dynamic voltage/frequency scaling algorithms for each core as well as the GPU based on the external sensors, current loading, and available power. As long as threads are not jumping from core to core, AMD is able to identify the cores that are churning through the most work (e.g. during a game) and direct power to those cores using frequency arbiters for each core.
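AMD does not describe the arbiter algorithm here, so the following is only a toy Python sketch of the general idea of directing a fixed power budget toward the busiest cores. The proportional-share policy, the function name `share_power_budget`, and all of the numbers are assumptions made for illustration.

```python
def share_power_budget(core_load: list[float], budget_w: float) -> list[float]:
    """Split a package power budget across cores in proportion to their load.

    Hypothetical policy for illustration; not AMD's arbiter algorithm.
    """
    if not core_load:
        return []
    total = sum(core_load)
    if total == 0:
        # No core is busy: fall back to an even split.
        return [budget_w / len(core_load)] * len(core_load)
    return [budget_w * load / total for load in core_load]


# Four cores, one doing most of the work (load in percent, values made up).
print(share_power_budget([90, 10, 0, 0], budget_w=10.0))
```

A real arbiter would also have to respect thermal limits, minimum voltages, and the GPU's share of the budget; the sketch ignores all of that and only shows why stable thread placement matters, since the split is only as good as the per-core load estimate.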

Race to Sleep

In the past we had the race to idle: the notion that if you applied extra power to finish a workload quicker, less energy was used overall once you accounted for both the static (always there) and dynamic (on demand) energy of the system. Now we have a race to sleep: how quickly can the parts of the chip get in and out of sleep states in order to save power? If one element of the silicon gets a request every 50ms that takes 25ms to process, it has a 25ms window in which to potentially sleep. If it can’t get in and out of sleep in under 10ms, then there is no point turning it off.
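The break-even reasoning can be sketched in a few lines of Python. The model below assumes that entering and leaving sleep costs extra power for the duration of the transition; the power figures are invented for illustration and were picked so that the break-even lands close to the 10ms figure above. They are not measurements of any real part.

```python
def worth_sleeping(idle_window_ms: int, round_trip_ms: int,
                   idle_mw: int, sleep_mw: int, transition_mw: int) -> bool:
    """Return True if sleeping through the idle window uses less energy than idling.

    Energy is in microjoules (milliwatts x milliseconds). Simplified model:
    the block burns transition_mw while entering/exiting sleep and sleep_mw
    for whatever is left of the window.
    """
    if round_trip_ms >= idle_window_ms:
        return False  # the window closes before the block is back awake
    stay_awake_uj = idle_mw * idle_window_ms
    sleep_uj = (transition_mw * round_trip_ms
                + sleep_mw * (idle_window_ms - round_trip_ms))
    return sleep_uj < stay_awake_uj


PERIOD_MS, BUSY_MS = 50, 25          # from the example in the text
WINDOW_MS = PERIOD_MS - BUSY_MS      # 25 ms of potential sleep

# Hypothetical power levels in milliwatts (assumptions, not real data).
POWER = dict(idle_mw=100, sleep_mw=5, transition_mw=250)

for round_trip in (5, 10, 30):
    verdict = worth_sleeping(WINDOW_MS, round_trip, **POWER)
    print(f"round trip {round_trip} ms -> sleep pays off: {verdict}")
```

Under these assumptions a 5ms round trip saves energy, while a 10ms round trip already costs more than simply idling through the window.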

The race to sleep is usually countered by offering a series of sleep states, with the nearest sleep states being quicker to enter/exit but offering less of a power reduction. With Ryzen Mobile, AMD is adding extra sleep states due to the use of the linear LDO regulators we discussed in the previous section.

With each core now in its own power island with its own LDO, each core can enter sleep states independently. In this case, AMD’s CC6 state powers off most of the core but keeps the L3 cache active in case another core uses it; it only takes 100 microseconds to enter/exit this CC6 state. When all the cores are in CC6, the regulators can also disable the L3 cache altogether for a CPUOFF state, giving better power reductions, but now the entry/exit latency is around 1.5ms.

The same goes for the graphics: the LDO regulators can effectively power gate 95% of the GPU, including the compute units, the fixed function encoders/decoders, and potentially parts of the display pipeline. The uncore is still active, however, in case other parts of the GPU need to use it. When certain criteria are met, the graphics can enter a GFXOFF state, saving most of the power.

When CPUOFF and GFXOFF are both enabled, the system can fully implement VDDOFF, which disables most of the processor entirely. This sounds like a complete system shutdown, but enough of the display pipeline is active to still have a powered on state. AMD is quoting that when a system has a static Windows screen with nothing firing up the cores, the chip could be in this VDDOFF state up to 99% of the time.
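The way these states nest can be written down as a small decision function. This Python sketch encodes only the relationships described above (CPUOFF requires every core in CC6, VDDOFF requires both CPUOFF and GFXOFF); the function name and the "ACTIVE" label for everything else are placeholders of our own, and the criteria for entering GFXOFF are reduced to a single boolean because they are not spelled out.

```python
def package_state(cores_in_cc6: list[bool], gfx_off: bool) -> str:
    """Return the deepest package-level state the described rules allow.

    cores_in_cc6: one flag per core, True if that core is in CC6.
    gfx_off:      True if the graphics have met the criteria for GFXOFF.
    """
    # CPUOFF (L3 cache disabled) is only possible once every core is in CC6.
    cpu_off = bool(cores_in_cc6) and all(cores_in_cc6)
    if cpu_off and gfx_off:
        return "VDDOFF"   # most of the processor is disabled
    if cpu_off:
        return "CPUOFF"
    if gfx_off:
        return "GFXOFF"
    return "ACTIVE"       # placeholder label: no package-level state reached


print(package_state([True, True, True, True], gfx_off=True))
print(package_state([True, True, True, False], gfx_off=True))
```

A single core that is still awake is enough to keep the L3 cache powered and hold the package out of VDDOFF, which is why the 99% figure applies to a static screen with nothing firing up the cores.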

Some of this power gating control comes through the Infinity Fabric, which consists of both data control and system control elements. As some of the regions of the processor still need to remain on to keep the system alive, even in VDDOFF mode, AMD has used the Infinity Fabric to separate the core into two different sorts of regions:

Type A: Can remain off during display refresh
Type B: Can become briefly active for display refresh

Because a 60 Hz panel will refresh every 16.6ms, certain parts of the SoC still need to wake to make sure the frame buffer has data and is kept active. Obviously, if the frame buffer needs updating then a lot more of the processor needs to fire up to do so, but this case is more concerned with static images on the display. Overall it is an interesting approach, and one we more typically see in smartphone/tablet-focused processors.

With the two region types, the fewer Type B regions there are, the more power you can save by keeping the Type A regions turned off during display refreshes. In this case AMD uses a state machine to control the display buffer and keep control of the different regions, and the slide shows that only the memory controller, display controller, and multimedia hub are Type B for display refreshes, while the rest of the processor can remain in the lower power states.
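As a rough illustration of the refresh arithmetic, the Python sketch below computes the refresh interval for a panel and the fraction of time the Type B regions would be awake. The three Type B regions are the ones named above; the 1ms wake time per refresh and the names used for the Type A regions are assumptions for the sake of the example.

```python
# Regions that briefly wake for each display refresh, as named in the text.
TYPE_B_REGIONS = {"memory controller", "display controller", "multimedia hub"}


def refresh_interval_ms(refresh_hz: float) -> float:
    """Time between display refreshes in milliseconds."""
    return 1000.0 / refresh_hz


def type_b_duty_cycle(refresh_hz: float, active_ms_per_refresh: float) -> float:
    """Fraction of time the Type B regions are awake while showing a static image."""
    interval = refresh_interval_ms(refresh_hz)
    if not 0 <= active_ms_per_refresh <= interval:
        raise ValueError("active time must fit inside one refresh interval")
    return active_ms_per_refresh / interval


def regions_woken_for_refresh(regions: list[str]) -> list[str]:
    """Keep only the regions that must wake for a refresh (Type B)."""
    return [r for r in regions if r in TYPE_B_REGIONS]


# Region names for the Type A side are placeholders for "the rest of the processor".
soc = ["CPU cores", "GPU compute units", "memory controller",
       "display controller", "multimedia hub"]

print(f"refresh interval at 60 Hz: {refresh_interval_ms(60):.2f} ms")
print("woken per refresh:", regions_woken_for_refresh(soc))
# Assumed 1 ms of activity per refresh; the real figure is not given.
print(f"Type B duty cycle: {type_b_duty_cycle(60, 1.0):.0%}")
```

The shorter the Type B list and the shorter each wake, the closer the chip gets to staying in its lowest power state for the whole interval between refreshes.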

This ultimately saves more power for laptops when it comes to battery life: how much of the time does a user spend on a laptop just looking at or reading a static screen? It is a very common use case.

Ultimately AMD is saying that with all the new power enhancements, it expects good improvements in battery life. In this slide, VP9 playback time is doubled (because the GPU now has a VP9 decoder), while something more comparable like 1080p H264 playback is boosted by 15%. That doesn’t sound like much, but it can mean an extra few minutes when you are running low and trying to get something done in a high-pressure situation.


140 Comments


  • xemone - Thursday, October 26, 2017 - link

    This is impressive and I'm glad to see AMD chips that can finally compete with Intel in the low TDP range. I am however, disappointed LPDDR4 compatibility isn't included in the initial parts.

    But these are only the first two and there are more to come, so I'm hopeful we'll see chips that support power-sipping memory. Any 15W TDP chip intended for the ultrathin mobile market should at least allow for LP-DRAM. Let's not forget Intel has opened up Thunderbolt 3 and made it royalty-free. Adding these two technologies to AMD's Infinity Fabric "interconnect" onboard Raven Ridge would allow manufacturers to build sleeker devices. Board space is at a serious premium and that's often why it's hard to find low power AMD chips in these premium thin and lights.

    Things are about to change!
  • sonichedgehog360@yahoo.com - Thursday, October 26, 2017 - link

    “If we look at processors from Intel that are 4C/8T, like the 35W Core i7-7700T, this scores 777 in our testing, which kind of drives away from AMD’s point here. AMD succeeds in touting that it has ‘desktop-class performance’ in a small power package, attempting to redefine its status as high performance. Part of me thinks at this level, it could be said that all the mobile processors in this range have ‘desktop-class performance’, so this is a case of AMD now catching up to the competition.“

    You just said that in Cinebench R15, AMD’s Ryzen 7 2700U achieved 707 at 15W and compare it to a 35W Intel product that achieved 777. But you call this catching up; I would call that blowing past the competition! That score is nearly double the performance per watt, considering that you just compared AMD’s product with 15W TDP with an Intel product with a 35W TDP.
  • sonichedgehog360@yahoo.com - Thursday, October 26, 2017 - link

    Looking more closely, a 15W Ryzen 7 2700U appears to fall right in line with an Intel Skull Canyon NUC’s 45W Intel Core i7-6700HQ in CPU performance and slightly outperforms it in GPU performance. Per the official AnandTech review, the Skull Canyon NUC got a Cinebench R15 ST/MT score of 148.24/711.04. Per NotebookCheck, its Iris Pro Graphics 580 achieves a score of 3510 in 3DMark 11 - Performance.
  • SaturnusDK - Thursday, October 26, 2017 - link

    I was similarly perplexed by the wording used here. How is more than double the performance per watt "catching up"? The examples have the i7-7700T score 22.2 points per watt while the R7 2700U completely annihilates that with 47.1 points per watt. Seems to me that it is Intel that has a lot of catching up to do.
  • sonichedgehog360@yahoo.com - Thursday, October 26, 2017 - link

    It could be a combination of years of Intel having a lion’s share of the media mindshare (before Ryzen, for the longest time, the fact of the matter was that Intel was far and away the superior architecture) combined with the fact that there may have been very limited time given between receipt date and embargo time, giving way to more errors cropping up in a highly rushed journalism process.
  • extide - Friday, October 27, 2017 - link

    Yeah, but that 35W part could sustain that performance for a much longer time, if not indefinitely. The 15W AMD part (and likewise 15W Intel parts) will throttle down a fair bit after sustained use. According to AMD the R7 2700U drops to ~550 on cinebench after a 5-min loop. (Last slide on page 3)
  • SaturnusDK - Friday, October 27, 2017 - link

    It's possible it could sustain it for longer. We don't know that though. And even 550 points is still a massive performance per watt advantage to the AMD part. 770 points at 35W is 22 points per watt while 550 points (sustained) at 15W is 36.67 points per watt. A whopping 66.7% performance per watt advantage.
  • lilmoe - Thursday, October 26, 2017 - link

    I probably missed this, but any word on bulk pricing in comparison with Intel U series?

    Other than that, I'm pretty damn sure the 14nm LPP will shine at the 15w and lower power envelopes. This is where the power per watt comparisons matter for consumers. Same should be applicable to mobile Vega. I wonder if pairing the APU with a discrete mobile Vega part would have any advantages over an Intel/nVidia pair. Hopefully it would have better harmony and better switching drivers.

    I would also love to see benchmarks emphasising latency vs Intel speed shift. I just hate to admit Intel might have an advantage there.

    Too early to tell, but boy am I excited since what feels like ages.
  • Kamen75 - Thursday, October 26, 2017 - link

    Samsung's 14nm LPP process being leased by GF just doesn't do Ryzen and Vega much justice on high performance desktop parts. Given 14nm LPP's smartphone SoC heritage it sure does let these low power AMD designs shine though. I'm also anticipating great things from IBM's 7nm process so long as it isn't delayed for an extra year. Bring on the 4.5 - 6 watt fanless APU's.

    I too am excited for the first time in years.
  • lilmoe - Thursday, October 26, 2017 - link

    Fingers crossed for 6 or even 8 cores Zen2 and 14-16 CUs at 7nm, with higher max clocks for ST. HBM would be the icing on the cake. Throw a dGPU with twice or thrice the CUs and put your hand in my pocket and help yourself to my wallet AMD.
