Lucid's Virtu Enables Simultaneous Integrated/Discrete GPU on Sandy Bridge Platforms
by Anand Lal Shimpi on February 28, 2011 10:38 PM EST
We first met LucidLogix (now just Lucid) 2.5 years ago at IDF. The promise was vendor-agnostic multi-GPU setups with perfect performance scaling. The technology was announced at a very important time: Intel and NVIDIA were battling over SLI support on Nehalem motherboards. NVIDIA didn't want SLI enabled on any non-NVIDIA chipsets, and Intel wasn't about to let NVIDIA build any chipsets for Nehalem. Lucid's Hydra technology seemed to be exactly what we needed to get around the legal holdup that kept Nehalem users from enjoying SLI.
Three things made Lucid's technology less interesting as time went on: Hydra took two years to come to market, NVIDIA enabled SLI on Intel platforms, and single-GPU performance got really, really good.
What made Lucid's Hydra tech possible was a software layer that intercepted OpenGL and DirectX calls from the CPU and directed them to a GPU of Lucid's choosing. While Hydra saw limited success, parts of the technology had another application.
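Lucid hasn't published how its interception layer is built, but the general pattern of catching API calls and forwarding each one to a GPU the layer picks can be sketched as a proxy object. In this minimal sketch, `Backend`, `InterceptLayer`, the call name `DrawIndexed` and the round-robin policy are all hypothetical stand-ins, not Lucid's code or a real driver API:

```python
from itertools import cycle
from typing import Callable, Dict, List, Tuple


class Backend:
    """Hypothetical stand-in for a GPU driver; it only records what it receives."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.calls: List[Tuple[str, tuple]] = []

    def submit(self, call: str, *args) -> str:
        # A real driver would turn this into GPU commands; here we just log it.
        self.calls.append((call, args))
        return f"{self.name}:{call}"


class InterceptLayer:
    """Sits between the application and the drivers. Every API call the
    application makes is caught here and forwarded to the backend that the
    policy function selects."""

    def __init__(self, backends: Dict[str, Backend], choose: Callable[[str], str]) -> None:
        self.backends = backends
        self.choose = choose  # policy: API call name -> backend key

    def __getattr__(self, call: str) -> Callable[..., str]:
        # Only invoked for names not defined on the object, so any
        # "API function" the app calls (layer.DrawIndexed, ...) lands here.
        def forward(*args) -> str:
            return self.backends[self.choose(call)].submit(call, *args)
        return forward


# Toy Hydra-style policy (an assumption): alternate calls between two GPUs.
gpus = {"gpu0": Backend("gpu0"), "gpu1": Backend("gpu1")}
turn = cycle(["gpu0", "gpu1"])
layer = InterceptLayer(gpus, lambda call: next(turn))
first = layer.DrawIndexed(36)   # forwarded to gpu0
second = layer.DrawIndexed(12)  # forwarded to gpu1
```

The application only ever talks to `layer`; swapping the `choose` policy is what turns a load balancer like Hydra into a task router like Virtu.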
Sandy Bridge's Platform Issues
Although we came away impressed by Intel's Sandy Bridge CPU and GPU, it was the platform that really let us down. SATA controller errata aside, Intel's 6-series chipset lineup had a huge problem. At launch, P67 was the only chipset that supported CPU overclocking, but P67 doesn't support SNB's on-die GPU. Enter the H67 chipset, which does support processor graphics but doesn't support overclocking. It gets worse.
One of the biggest features Sandy Bridge has to offer is support for hardware-assisted video transcoding (Quick Sync). In our review we found Intel's Quick Sync to be the absolute best way to transcode video for use on portable devices. There's just one issue: Quick Sync only works when the on-die GPU is active.
If you pair Sandy Bridge with a discrete GPU on the desktop, you lose the ability to use one of the CPU's biggest features.
Intel will address the overclocking/processor graphics exclusion with the upcoming Z68 chipset; however, that doesn't solve the problem of losing Quick Sync when a discrete GPU is installed. Intel originally suggested using two monitors, one hooked up to the motherboard's video out and the other to your discrete GPU, to maintain Quick Sync support, but that's hardly elegant. At CES this year we were shown a better alternative from none other than Lucid.
Remember the basis of how Hydra worked: intercept API calls and dynamically load balance them across multiple GPUs. With Sandy Bridge we don't need load balancing - we just need to send games to the discrete GPU and video decoding/encoding to the processor's GPU. This is what Lucid's latest technology, Virtu, does.
The name Virtu is short for GPU Virtualization and the setup is pretty simple at a high level.
Start with a platform that supports Sandy Bridge's processor graphics (H6x or Z68) and connect your display to the motherboard's video out. Add a supported discrete GPU and connect its power, but don't connect your monitor to it.
Virtu behaves a lot like Hydra. It intercepts API calls and passes them along to a GPU of its choosing. Unlike Hydra however, the goal here isn't to spread the load across multiple GPUs. Instead, Virtu aims to match each task with the GPU best suited to it.
Video output is handled by SNB's GPU; rendered frames are simply copied from the dGPU's frame buffer to the iGPU's frame buffer for output. There should be some overhead in this process, though Lucid claims it's minimal.
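To put a rough number on that copy, here is a back-of-the-envelope sketch. It assumes 32-bit (4-byte) pixels and exactly one full-frame copy per rendered frame; Lucid hasn't described how the copy is actually performed, and `copy_bandwidth_mb_s` is a hypothetical helper:

```python
def copy_bandwidth_mb_s(width: int, height: int, bytes_per_pixel: int, fps: float) -> float:
    """Data moved per second if every finished frame is copied once from the
    dGPU's frame buffer to the iGPU's, in MB/s (1 MB = 10**6 bytes)."""
    frame_bytes = width * height * bytes_per_pixel  # size of one frame
    return frame_bytes * fps / 1e6


# 1920 x 1200 at 4 bytes per pixel, 60 frames per second
print(f"{copy_bandwidth_mb_s(1920, 1200, 4, 60):.1f} MB/s")  # 553.0 MB/s
```

That is about 9.2 MB per frame, or roughly 553 MB/s at 60 fps, well under the roughly 8 GB/s per direction of a PCIe 2.0 x16 slot. Raw bandwidth alone shouldn't be the bottleneck; the arithmetic says nothing about added latency or synchronization cost.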
What we end up with is a system that should run all 3D games on your discrete GPU and all video decoding and encoding on SNB's GPU. Since this isn't switchable graphics but rather a form of GPU virtualization, you can actually run iGPU and dGPU applications at the same time (e.g. you can watch a movie in one window on the iGPU and play a game in another on the dGPU).
Virtu relies on profiles and hard-coded GPU support. Currently around 100 games/benchmarks are supported. Eventually you'll be able to manually add your own titles, but for now we have to rely on what Lucid has validated and enabled. GPU support is broad but limited to the AMD Radeon HD 4xxx, 5xxx and 6xxx series and the NVIDIA GeForce 2xx, 4xx and 5xx series. Lucid pledges to always ensure the top games are tested/supported, as well as the previous two generations of AMD and NVIDIA GPUs.
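Virtu's actual decision logic isn't public. The sketch below only illustrates profile-driven routing as described here: a profiled 3D title on a supported card goes to the dGPU, and everything else stays on Sandy Bridge's GPU. Only the supported series come from the article; the executable names, the model-number parsing and the fallback for unprofiled titles are hypothetical:

```python
import re

# Supported series from the article, keyed by the model number's leading digit.
SUPPORTED_SERIES = {"AMD": {4, 5, 6}, "NVIDIA": {2, 4, 5}}
MODEL_DIGITS = {"AMD": 4, "NVIDIA": 3}  # e.g. Radeon HD 6970, GeForce GTX 460
# Hypothetical profile list; Lucid's real list covers around 100 titles.
GAME_PROFILES = {"civilizationv.exe", "dirt2.exe", "metro2033.exe", "wow.exe"}


def dgpu_supported(vendor: str, model: str) -> bool:
    """True if the card's model number falls in a supported series."""
    match = re.search(r"\d{3,4}", model)
    if match is None or vendor not in SUPPORTED_SERIES:
        return False
    digits = match.group()
    if len(digits) != MODEL_DIGITS[vendor]:
        return False  # e.g. an older GeForce 9800 GT
    return int(digits[0]) in SUPPORTED_SERIES[vendor]


def pick_gpu(exe: str, task: str, vendor: str, model: str) -> str:
    """Return "dGPU" or "iGPU" for a process (hypothetical policy)."""
    if task == "3d" and exe.lower() in GAME_PROFILES and dgpu_supported(vendor, model):
        return "dGPU"
    # Video decode/encode, the desktop and (by assumption) unprofiled titles.
    return "iGPU"
```

For example, `pick_gpu("Metro2033.exe", "3d", "AMD", "Radeon HD 6970")` returns `"dGPU"`, while a transcode job returns `"iGPU"` regardless of the installed card.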
The Virtu software will be bundled with motherboards. The business arrangements are between the motherboard manufacturers and Lucid, so the end user shouldn't have to worry about licensing the software.
Lucid gave us a copy of the software it shared with motherboard manufacturers: a Virtu release candidate. The software isn't final yet and there are some limitations (e.g. we can't define our own game profiles, and a Virtu logo is plastered randomly on the screen while gaming), but it's enough to give us a brief look at the technology.
Installing Virtu was very simple. Just go through the installer application, reboot and you're good to go. The only requirements are that you're using a compatible video card and that your display is connected to the SNB video out and not the discrete GPU.
Once Virtu was loaded, the first thing I noticed was that AMD's Catalyst Control Center and NVIDIA's control panel refused to load. As far as they were concerned, I was running an Intel HD 3000 GPU and they weren't needed. The appropriate AMD and NVIDIA drivers did load, however.
Other than the irate control panels, the rest of the experience was completely seamless. I ran games, browsed the web and even transcoded a video - each application behaved as if the only GPU available was the one best suited for the task. Quick Sync even came up as an option under Arcsoft's Media Converter 7.
I measured performance in four games, both with Virtu and natively on the dGPU, to see how much overhead the frame buffer copying and Virtu's API interception add:
AMD Lucid Virtu Performance Impact - 1920 x 1200, 4X AA, High Quality

| GPU | Civilization V | DiRT 2 | Metro 2033 | World of Warcraft |
|---|---|---|---|---|
| AMD Radeon HD 6970 | 39.6 fps | 76.4 fps | 34.7 fps | 111.5 fps |
| AMD Radeon HD 6970 (Virtu) | 36.5 fps | 74.4 fps | 32.3 fps | 102.8 fps |
NVIDIA Lucid Virtu Performance Impact - 1920 x 1200, 4X AA, High Quality

| GPU | Civilization V | DiRT 2 | Metro 2033 | World of Warcraft |
|---|---|---|---|---|
| NVIDIA GeForce GTX 460 | 38.8 fps | 69.4 fps | 18.7 fps | 85.4 fps |
| NVIDIA GeForce GTX 460 (Virtu) | 35.8 fps | 48.0 fps | 18.0 fps | 79.7 fps |
I generally saw a 2 - 8% drop in performance compared to a standalone discrete GPU without Virtu. The only exception was a big 30% drop on the GeForce GTX 460 in the DiRT 2 benchmark. Given the relatively consistent results everywhere else, I'm guessing this is an artifact of early software rather than a normal occurrence.
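The per-game drops can be recomputed directly from the two performance tables. This short script uses only the fps figures already published there; `pct_drop` is an illustrative helper name:

```python
# (native fps, Virtu fps) from the AMD and NVIDIA performance tables
RESULTS = {
    "Radeon HD 6970": {
        "Civilization V": (39.6, 36.5),
        "DiRT 2": (76.4, 74.4),
        "Metro 2033": (34.7, 32.3),
        "World of Warcraft": (111.5, 102.8),
    },
    "GeForce GTX 460": {
        "Civilization V": (38.8, 35.8),
        "DiRT 2": (69.4, 48.0),
        "Metro 2033": (18.7, 18.0),
        "World of Warcraft": (85.4, 79.7),
    },
}


def pct_drop(native: float, virtu: float) -> float:
    """Performance lost under Virtu, as a percentage of native fps."""
    return (native - virtu) / native * 100


for gpu, games in RESULTS.items():
    for game, (native, virtu) in games.items():
        print(f"{gpu:16} {game:18} {pct_drop(native, virtu):5.1f}%")
```

Outside the GTX 460 DiRT 2 result (about 30.8%), the drops range from about 2.6% to 7.8%.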
I also ran a Quick Sync test both with and without a discrete GPU attached - performance remained unchanged:
Lucid Virtu Performance Impact

| Configuration | Quick Sync: Nikon D7000 (1080p24) to iPhone 4 |
|---|---|
| AMD Radeon HD 6970 + Intel HD Graphics 3000 (Virtu) | 199.3 fps |
| Intel HD Graphics 3000 | 199.3 fps |
Next, I ran a Quick Sync test while running our Metro 2033 benchmark to see how two tasks, each on its own GPU, affect each other:
Lucid Virtu Performance Impact (Metro 2033 + Quick Sync)

| Configuration | Quick Sync: Nikon D7000 (1080p24) to iPhone 4 | Metro 2033 Benchmark |
|---|---|---|
| Peak Theoretical Performance | 199.3 fps | 36.5 fps |
| AMD Radeon HD 6970 + Intel HD Graphics 3000 (Virtu) | 72.0 fps | 32.1 fps |
While Metro didn't lose much performance, the Quick Sync task ran considerably slower. Remember that the Quick Sync engine shares resources with the Sandy Bridge CPU cores (mainly the ring bus and L3 cache), so having the CPU busy feeding vertex data to the dGPU impacts Quick Sync performance.
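Expressing the concurrent run as a share of each task's standalone figure makes the imbalance concrete. The numbers come from the Metro 2033 + Quick Sync table; `retained_pct` is an illustrative helper:

```python
def retained_pct(alone: float, shared: float) -> float:
    """Throughput kept while both tasks run, as a percentage of standalone fps."""
    return shared / alone * 100


# (standalone fps, fps while both tasks run) from the table
concurrent = {
    "Quick Sync transcode": (199.3, 72.0),
    "Metro 2033": (36.5, 32.1),
}
for task, (alone, shared) in concurrent.items():
    print(f"{task:22} keeps {retained_pct(alone, shared):5.1f}% of its standalone rate")
```

Quick Sync keeps only about 36.1% of its standalone rate, while Metro 2033 keeps about 87.9%.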
Finally, I measured power consumption:
Lucid Virtu Power Consumption

| Configuration | Idle | Load (Metro 2033) |
|---|---|---|
| Intel HD Graphics 3000 | 34.7 W | N/A |
| AMD Radeon HD 6970 (Virtu) | 126 W | 265 W |
| NVIDIA GeForce GTX 460 (Virtu) | 52.0 W | 191 W |
Here we see that there are still some kinks to work out. With the Radeon HD 6970, system idle power is still quite high even though the dGPU is sitting idle. The GeForce GTX 460 paints a different picture, as Lucid manages to mostly power down the NVIDIA GPU when it's not in use. Note that even in this case there's a power penalty over a purely integrated setup - the dGPU is still active to some extent.
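The idle penalty is simply the difference from the integrated-only system in the power table. This sketch subtracts the measured figures; `idle_penalty_w` is an illustrative helper:

```python
IGPU_IDLE_W = 34.7  # Intel HD Graphics 3000 alone, idle (from the table)
VIRTU_IDLE_W = {"Radeon HD 6970": 126.0, "GeForce GTX 460": 52.0}


def idle_penalty_w(system_idle_w: float) -> float:
    """Extra idle draw, in watts, compared to the integrated-only system."""
    return system_idle_w - IGPU_IDLE_W


for gpu, watts in VIRTU_IDLE_W.items():
    print(f"{gpu:16} idle penalty: {idle_penalty_w(watts):5.1f} W")
```

That works out to roughly 91.3 W extra with the Radeon HD 6970 versus 17.3 W with the GeForce GTX 460.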
Intel is slowly correcting the issues with the Sandy Bridge platform. The first B3-stepping 6-series chipsets are now in the hands of OEMs and motherboard manufacturers, and Z68 boards are coming next quarter. Lucid's Virtu is a key part of the strategy, however, at least on the desktop. In mobile it's a non-issue, as everyone supports some form of switchable graphics there, but desktops need a universal solution. While the Virtu release candidate still needs some work, it's far more polished than I expected it to be.
Once set up, no user intervention is necessary - the software just works. Fire up a game and it'll run on your discrete GPU. Visit YouTube or transcode a video and your discrete GPU powers down, leaving Sandy Bridge's on-die graphics to handle the workload.
There is definite overhead to Virtu: I generally measured a 2 - 8% drop, although I did see a 30% drop in DiRT 2 on NVIDIA hardware. I'd expect the performance hit to be less than 10% in most cases.
Board makers and OEMs should have the Virtu RC in hand now, meaning we should see it show up in motherboard boxes in the not-too-distant future. Of course, this still doesn't help users who wish to overclock their CPU, pair it with a discrete GPU and use Quick Sync as well. We'll have to wait until Z68 for that. Even then, Lucid's Virtu will likely still play a role in those systems.
DanNeely - Tuesday, March 1, 2011
This is being pitched as something to be bundled by the mobo vendor. Does this mean people who bought SNB prior to this coming out will be sunk unless they buy a new board?
haplo602 - Tuesday, March 1, 2011
So finally somebody reimplemented what Voodoo Graphics chips (1 and 2) were doing back then (and TV tuner cards): copying the frame buffer to a specified region where it was displayed by the primary card.
What's the big deal? I mean, Microsoft should have this figured out already and implemented in the OS. Why does it take an ISV to kick the big players into implementing usable technologies?
Intel made a big mistake with their QuickSync implementation (only usable on laptops and very low-end desktops). But given the SB issues on Linux, I guess Intel did not think things through.
I am waiting for a homogeneous implementation of various computation units with a NUMA-like architecture. I hope AMD will make this possible with their Llano parts and the integrated GPU will simply be another coprocessor when not hooked to a display.
Stargozer - Tuesday, March 1, 2011
In addition to the reduced frame rates, is there any latency added by using Virtu?
How are the minimum frame rates affected (it seems possible that all the shuffling might be more of a limitation in a high-stress scenario)?
Figaro56 - Tuesday, March 1, 2011
If Intel is so great, why don't they fix this ridiculous situation? For the marginal performance benefit you get from Sandy Bridge, I think I'll just stick with my AMD chipset and motherboard. I don't have to mess around with workarounds like this to benefit.
fic2 - Tuesday, March 1, 2011
Wouldn't it be easier just to make a dongle that plugged into the iGPU video port and acted like it was a monitor?
hechacker1 - Tuesday, March 1, 2011
What happens if you are interested in DXVA? Which GPU does the decoding?
I'm also curious if there are any latency issues for gaming and video playback, with the frame buffer being copied from the discrete GPU to the Intel GPU.
billythefisherman - Tuesday, March 1, 2011
If Virtu is simply copying the frame buffer over, what exactly is turning off the iGPU normally? If it's simply the port, then surely there's a simple way to turn the SB iGPU on to aid in DirectCompute or OpenCL acceleration. Surely this can be controlled by a very simple firmware update provided by a mobo manufacturer?
jmunjr - Tuesday, March 1, 2011
So Virtu's only real purpose is to allow Quick Sync on systems with discrete GPUs? This is moronic...
fic2 - Wednesday, March 2, 2011
Hence my previous suggestion that it would be easier just to create a "monitor" dongle that plugged into the iGPU port and acted like a monitor. This would enable the Quick Sync feature without actually having a monitor.
Of course, the obvious solution is for Intel to fix their d@mn implementation so that there doesn't have to be a monitor plugged in.
strikeback03 - Wednesday, March 2, 2011
Supposedly it could also power down the dGPU when not needed, saving energy. That said, it seems Intel should be able to enable the iGPU portion of the chip when desired for transcoding, maybe via a firmware update.