Google's Tensor inside of Pixel 6, Pixel 6 Pro: A Look into Performance & Efficiency
by Andrei Frumusanu on November 2, 2021 8:00 AM EST- Posted in
- Mobile
- Smartphones
- SoCs
- Pixel 6
- Pixel 6 Pro
- Google Tensor
CPU Performance & Power
On the CPU side of things, the Tensor SoC, as we discussed, does have some larger configuration differences to what we’ve seen on the Exynos 2100, and is actually more similar to the Snapdragon 888 in that regard, at least from the view of a single Cortex-X1 cores. Having double the L2 cache, however being clocked 3.7%, or 110MHz lower, the Tensor and the Exynos should perform somewhat similarly, but dependent on the workload. The Snapdragon 888 showcases much better memory latency, so let’s also see if that actually plays out as such in the workloads.
In the individual subtests in the SPEC suite, the Tensor fares well and at first glance isn’t all too different from the other two competitor SoCs, albeit there are changes, and there are some oddities in the performance metrics.
Pure memory latency workloads as expected seem to be a weakness of the chip (within what one call weakness given the small differences between the different chips). 505.mcf_r falls behind the Exynos 2100 by a small amount, the doubled L2 cache should have made more of a difference here in my expectations, also 502.gcc_r should have seen larger benefits but they fail to materialise. 519.lbm_r is bandwidth hungry and here it seems the chip does have a slight advantage, but power is still extremely high and pretty much in line with the Exynos 2100, quite higher than the Snapdragon 888.
531.deepsjeng is extremely low – I’ve seen this behaviour in another SoC, the Dimensity 1200 inside the Xiaomi 11T, and this was due to the memory controllers and DRAM running slower than intended. I think we’re seeing the same characteristic here with the Tensor as its way of controlling the memory controller frequency via CPU memory stall counters doesn’t seem to be working well in this workload. 557.xz_r is also below expectations, being 18% slower than the Snapdragon 888, and ending up using also more energy than both Exynos and Snapdragon. I remember ex-Arm’s Mike Filippo once saying that every single clock cycle the core is wasting on waiting on memory has bad effects on performance and efficiency and it seems that’s what’s happening here with the Tensor and the way it controls memory.
In more execution bound workloads, in the int suite the Tensor does well in 525.x264 which I think is due to the larger L2. On the FP suite, we’re seeing some weird results, especially on the power side. 511.povray appears to be using a non-significant amount lesser power than the Exynos 2100 even though performance is identical. 538.imagick also shows much less power usage on the part of the Tensor, at similar performance. Povray might benefit from the larger L2 and lower operating frequency (less voltage, more efficiency), but I can’t really explain the imagick result – in general the Tensor SoC uses quite less power in all the FP workloads compared to the Exynos, while this difference isn’t as great in the INT workloads. Possibly the X1 cores have some better physical implementation on the Tensor chip which reduces the FP power.
In the aggregate scores, the Tensor / GS101 lands slightly worse in performance than the Exynos 2100, and lags behind the Snapdragon 888 by a more notable 12.2% margin, all whilst consuming 13.8% more energy due to completing the task slower. The performance deficit against the Snapdragon should really only be 1.4% - or a 40MHz difference, so I’m attributing the loss here just to the way Google runs their memory, or maybe also to possible real latency disadvantages of the SoC fabric. In SPECfp, which is more memory bandwidth sensitive (at least in the full suite, less so in our C/C++ subset), the Tensor SoC roughly matches the Snapdragon and Exynos in performance, while power and efficiency is closer to the Snapdragon, using 11.5% less power than the Exynos, and thus being more efficient here.
One issue that I encountered with the Tensor, that marks it being extremely similar in behaviour to the Exynos 2100, is throttling on the X1 cores. Notably, the Exynos chip had issues running its cores at their peak freq in active cooling under room temperature (~23°C) – the Snapdragon 888 had no such issues. I’m seeing similar behaviour on the Google Tensor’s X1 cores, albeit not as severe. The phone notably required sub-ambient cooling (I tested at 11°C) to get sustained peak frequencies, scoring 5-9% better, particularly on the FP subtests.
I’m skipping over the detailed A76 and A55 subscores of the Tensor as it’s not that interesting, however the aggregate scores are something we must discuss. As alluded to in the introduction, Google’s choice of using an A76 in the chip seemed extremely hard to justify, and the practical results we’re seeing the testing pretty much confirm our bad expectations of this CPU. The Tensor is running the A76 at 2.25GHz. The most similar data-point in the chart is the 2.5GHz A76 cores of the Exynos 990 – we have to remember this was an 7LPP SoC while the Tensor is a 5LPE design like the Eynos 2100 and Snapdraogn 888.
The Tensor’s A76 ends up more efficient than the Exynos 990’s – would would hope this to be the case, however when looking at the Snapdragon 888’s A78 cores which perform a whopping 46% better while using less energy to do so, it makes the Tensor’s A76 mid-cores look extremely bad. The IPC difference between the two chips is indeed around 34%, which is in line with the microarchitectural gap between the A76 and A78. The Tensor’s cores use a little bit less absolute power, but if this was Google top priority, they could have simply clocked a hypothetical A78 lower as well, and still ended up with a more performant and more efficient CPU setup. All in all, we didn’t understand why Google chose A76’s, as all the results end up expectedly bad, with the only explanation simply being that maybe Google just didn’t have a choice here, and just took whatever Samsung could implement.
On the side of the Cortex-A55 cores, things also aren’t looking fantastic for the Tensor SoC. The cores do end up performing the equally clocked A55’s of the Snapdragon 888 by 11% - maybe due to the faster L3 access, or access to the chip’s SLC, however efficiency here just isn’t good, as it uses almost double the power, and is more characteristic of the higher power levels of the Exynos chips’ A55 cores. It’s here where I come back to say that what makes a SoC from one vendor different to the SoC from another is the very foundations and fabric design - for the low-power A55 cores of the Tensor, the architecture of the SoC encounters the same issues of being overshadowed in system power, same as we see on Exynos chips, ending up in power efficiency that’s actually quite worse than the same chips own A76 cores, and much worse than the Snapdragon 888. MediaTek’s Dimensity 1200 even goes further in operating their chip in seemingly the most efficient way for their A55 cores, not to mention Apple’s SoCs.
While we don’t run multi-threaded SPEC on phones, we can revert back to GeekBench 5 which serves the purpose very well.
Although the Google Tensor has double as many X1 cores as the other Android SoCs, the fact that the Cortex-A76 cores underperform by such a larger degree the middle cores of the competition, means that the total sum of MT performance of the chip is lesser than that of the competition.
Overall, the Google Tensor’s CPU setup, performance, and efficiency is a mixed bag. The two X1 cores of the chip end up slightly slower than the competition, and efficiency is most of the time in line with the Exynos 2100’s X1 cores – sometimes keeping up with the Snapdragon 888 in some workloads. The Cortex-A76 middle cores of the chip in my view make no sense, as their performance and energy efficiency just aren’t up to date with 2021 designs. Finally, the A55 behavioural characteristic showcases that this chip is very much related to Samsung’s Exynos SoCs, falling behind in efficiency compared to how Qualcomm or MediaTek are able to operate their SoCs.
108 Comments
View All Comments
Alistair - Tuesday, November 2, 2021 - link
It's very irritating how slow Android SOCs are. I'll just keep on waiting. Won't give up my existing Android phone until actual performance improvements arrive. Hopefully Samsung x AMD will make a difference next year.Speedfriend - Thursday, November 4, 2021 - link
Looking at the excellent battery life of the iPhone 13 (which I am currently waiting for as my work phone) does iPhone till kill suspend background tasks. When I used to day trade, my iPhone would stop prices updating in the background, very annoying when I would flick to the app to check prices and unwittingly see prices hours old.ksec - Tuesday, November 2, 2021 - link
Av1 hardware decoder having design problem again?Where have I heard of this before?
Peskarik - Tuesday, November 2, 2021 - link
preplanned obsolescencetuxRoller - Tuesday, November 2, 2021 - link
I wonder if Google is using the panfrost open source driver for Mali? That might account for some of the performance issues.TheinsanegamerN - Tuesday, November 2, 2021 - link
Seems to me based on thermals that the pixel 6/pro suffer from thermal throttling, and thus have power power budgets, then they should have given the internal hardware, leading to poor results.Makes me wonder what one of these chips could do in a better designed chassis.
name99 - Tuesday, November 2, 2021 - link
I'd like to ask a question that's not rooted in any particular company, whether it's x86, Google, or Apple, namely: how different *really* are all these AI acceleration tools, and what sort of timelines can we expect for what?Here are the kinda use cases I'm aware of:
For vision we have
- various photo improvement stuff (deblur, bokeh, night vision etc). Works at a level people consider OK, getting better every year.
Presumably the next step is similar improvement applied to video.
- recognition. Objects, OCR. I'd say the Apple stuff is "acceptable". The OCR is genuinely useful (eg search for "covid" will bring up a scan of my covid card without me ever having tagged it or whatever), and the object recognition gets better every year. Basics like "cat" or person recognition work well, the newest stuff (like recognizing plant species) seems to be accurate, but the current UI is idiotic and needs to be fixed (irrelevant for our purposes).
On the one hand, you can say Google has had this for years. On the other hand my practical experience with Google Lens and recognition is that the app has been through so many rounds of "it's on iOS, no it isn't; it's available in the browser, no it isn't" that I've lost all interest in trying to figure out where it now lives when I want that sort of functionality. So I've no idea whether it's better than Apple along any important dimensions.
For audio we have
- speech recognition, and speech synth. Both of these have been moved over the years from Apple servers to Apple HW, and honestly both are now remarkably good. The only time speech recognition serves me poorly is when there is a mic issue (like my watch is covered by something, or I'm using the mic associated with my car head unit, not the iPhone mic).
You only realize how impressive this is when you hear voice synth from older platforms, like the last time I used Tesla maybe 3 yrs ago the voice synth was noticeably more grating and "synthetic" than Apple. I assume Google is at essentially Apple level -- less HW and worse mics to throw at the problem, but probably better models.
- maybe there's some AI now powering Shazam? Regardless it always worked well, but gets better and faster every year.
For misc we have
- various pose/motion recognition stuff. Apple does this for recognizing types of exercises, or handwashing, and it works fine. I don't know if Google does anything similar. It does need a watch. Not clear how much further this can go. You can fantasize about weird gesture UIs, but I'm not sure the world cares.
- AI-powered keyboards. In the case of Apple this seems an utter disaster. They've been at it for years, it seems no better now with 100x the HW than it was five years ago, and I think everyone hates it. Not sure what's going on here.
Maybe it's just a bad UI for indicating that the "recognition" is tentative and may be revised as you go further?
Maybe the model is (not quite, but almost entirely) single-word based rather than grammar and semantic based?
Maybe the model simply does not learn, ever, from how I write?
Maybe the model is too much trained by the actual writing of cretins and illiterates, and tries to force my language down to that level?
Regardless, it's just terrible.
What's this like in Google world? no "AI"-powered keyboards?, or they exist and are hated? or they exist and work really well?
Finally we have language.
Translation seems to have crossed into "good enough" territory. I just compared Chinese->English for both Apple and Google and while both were good enough, neither was yet at fluent level. (Honestly I was impressed at the Apple quality which I rate as notably better than Google -- not what I expected!)
I've not yet had occasion to test Apple in translating images; when I tried this with Google, last time maybe 4 yrs ago, it worked but pretty terribly. The translation itself kept changing, like there was no intelligence being applied to use the "persistence" fact that the image was always of the same sign or item in a shop or whatever; and the presentation of the image, trying to overlay the original text and match font/size/style was so hit or miss as to be distracting.
Beyond translation we have semantic tasks (most obviously in the form of asking Siri/Google "knowledge" questions). I'm not interested in "which is a more useful assistant" type comparisons, rather which does a better job of faking semantic knowledge. Anecdotally Google is far ahead here, Alexa somewhat behind, and Apple even worse than Alexa; but I'm not sure those "rate the assistant" tests really get at what I am after. I'm more interested in the sorts of tests where you feed the AI a little story then ask it "common sense" questions, or related tasks like smart text summarization. At this level of language sophistication, everybody seems to be hopeless apart from huge experimental models.
So to recalibrate:
Google (and Apple, and QC) are putting lots of AI compute onto their SoCs. Where is it used, and how does it help?
Vision and video are, I think clear answers and we know what's happening there.
Audio (recognition and synth) are less clear because it's not as clear what's done locally and what's shipped off to a server. But quality has clearly become a lot better, and at least some of that I think happens locally.
Translation I'm extremely unclear how much happens locally vs remotely.
And semantics/content/language (even at just the basic smart secretary level) seems hopeless, nothing like intelligent summaries of piles of text, or actually useful understanding of my interests. Recommendation systems, for example, seem utterly hopeless, no matter the field or the company.
So, eg, we have Tensor with the ability to run a small BERT-style model at higher performance than anyone else. Do we have ways today in which that is used? Ways in which it will be used in future that aren't gimmicks? (For example there was supposed to be that thing with Google answering the phone and taking orders or whatever it was doing, but that seems to have vanished without a trace.)
As I said, none of this is supposed to be confrontational. I just want a feel for various aspects of the landscape today -- who's good at what? are certain skills limited by lack of inference or by model size? what are surprising successes and failures?
dotjaz - Tuesday, November 2, 2021 - link
" but I do think it’s likely that at the time of design of the chip, Samsung didn’t have newer IP ready for integration"Come on. Even A77 was ready wayyyy before G78 and X1, how is it even remotely possible to have A76 not by choice?
Andrei Frumusanu - Wednesday, November 3, 2021 - link
Samsung never used A77.anonym - Sunday, November 7, 2021 - link
Exynos 980 uses Cortex-A77