HD 6990 4-way CrossFire (QuadFire) BeamNG benchmarks and quirks!

Discussion in 'Computer Hardware' started by EvilMcSheep, Mar 2, 2020.

  1. EvilMcSheep


    Joined:
    Jan 5, 2016
    Messages:
    130
    Released in early 2011 as the world's fastest graphics card at an MSRP of $699, the AMD HD 6990 was certainly quite the beast - not only in performance, but also power consumption and noise! :D

    About the card
Unusually by modern standards (though this approach dominated the top end, and was even seen at the mid-range back then), the card gets its power from having two GPUs on a single PCB working in CrossFire - effectively combining two graphics cards into one, for (theoretically) almost double the performance of the best single-GPU solutions!


    (8 more 256MB GDDR5 chips on the other side of the PCB, cooled by the backplate)

    TeraScale 3 architecture
(The final version of TeraScale, an architecture first seen in the ATI HD 2000 series in 2007. It was replaced by "Graphics Core Next" - GCN - with the HD 7000 series in 2012, and GCN is still in use to this day, with parts of it still present in the latest RDNA (RX 5000 series) GPUs.
Driver support for TeraScale ended in 2015, while it's still ongoing for GCN)


    1536 Stream Processors@830/880MHz
(2.55/2.7 TFLOPS per GPU, similar to an RX 560 (or an RX 570 for the dual-GPU total), though this does NOT imply similar real-world performance, due to architectural and driver differences - but it's not necessarily TOO far off in all cases either. For how these numbers are derived, see the quick sketch below the spec list!)

    2GB of GDDR5@1250MHz over a 256-bit bus
(The card is often advertised as "4GB", which is technically accurate as the total amount of VRAM, but as CrossFire mirrors resources across both GPUs rather than pooling them, it's effectively a 2GB card for gaming)

    389mm² die, based on the TSMC 40nm process, with 2.64B transistors
    (For reference, a modern 7nm GPU, the RX 5700 (XT) fits almost 4x the transistors (10.3B) within only 251mm², a 6x improvement in density!)

    TDP of ~187/225W
    (Again, per GPU! The HD 6990 is a 375/450W card! :D
Note that manufacturer-given TDP values have no industry-standard formula, so only think of them as vague guides, and look for independent tests to find the actual power consumption! For example, Nvidia advertises the GTX 590 (their answer to the 6990) as a 365W card, despite it drawing slightly more power than the 6990 (at overall similar performance, depending on the game).
According to testing by Guru3D in FurMark, the GTX 590 and HD 6990 are 340W and 331W cards respectively - which is kind of disappointing, really)
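
By the way, those theoretical figures all fall straight out of the specs with simple arithmetic - here's a quick sanity-check sketch in Python (per-GPU values, clocks as listed above):

```python
# Theoretical throughput/density check for the specs above.
# Each stream processor does 1 FMA = 2 FLOPs per cycle.

def fp32_tflops(stream_processors, clock_mhz):
    return stream_processors * 2 * clock_mhz * 1e6 / 1e12

for clock_mhz in (830, 880):          # stock / Uber BIOS core clocks
    fp32 = fp32_tflops(1536, clock_mhz)
    fp64 = fp32 / 4                   # Cayman runs FP64 at a 1:4 rate
    print(f"{clock_mhz}MHz: {fp32:.2f} TFLOPS FP32, {fp64 * 1000:.0f} GFLOPS FP64")

# Transistor density, per die: HD 6990 (Cayman, 40nm) vs RX 5700 XT (Navi 10, 7nm)
print(f"{2.64e9 / 389 / 1e6:.1f}M vs {10.3e9 / 251 / 1e6:.1f}M transistors/mm^2")
```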

Overall, it has the same specs as the HD 6970, just with lower stock core and memory clocks:
    2x 1536 Stream Processors
    (830/880MHz)

    2x 2GB of GDDR5
    (1250MHz, 256bit bus)

    40nm, 2x 187/225W

    TeraScale 3 architecture
    (Driver support ended 2015)
    Full specs (and image) here:
    https://www.techpowerup.com/gpu-specs/radeon-hd-6990.c275
    (Assuming that you looked at either of the Spoilers, let's continue!)

    AUSUM!

    What's with the slashes under Specs? Bragging about an overclock?
    No no no no no!

    If you go back up and look to the right of the CrossFire finger (top-left connector on the PCB), you'll see a little switch!

    As you might expect, this is a BIOS switch!
Out of the factory, the card comes configured for 375W, which (while still exceeding the 300W maximum of the PCIe spec) is right within the power limits of its connectors (75W from the slot + 2x150W from dual 8-pin).

By flicking the "Antilles Unlocking Switch for Uber Mode", you enable 'violate-all-spec' mode, allowing power consumption of up to 450W and raising the core clock and voltage to match those of the HD 6970.

While the card was certainly designed with this in mind, doing so voided the warranty, and the cards came with a sticker warning the user about this.

    Mildly interesting trivia!
Unlike earlier dual-GPU designs, this card uses a central blower fan with opposing heatsinks, as opposed to sequential ones;

While on earlier cards, such as the HD 3870x2, you could see the 2nd GPU running 15+°C warmer than the first due to already-heated air from the other GPU, both GPUs on the HD 6990 get fresh air!

Unfortunately, this partially defeats the purpose of blower-style coolers: half the heat is no longer dumped straight out of the case, and the 2nd GPU's exhaust goes straight against the direction of typical case airflow - you can see how this would be inconvenient, and unappealing for system builders.

    My solution for this was to cut holes in the case for two stacked 180mm side intakes, and to use reversed case airflow (including reversing the CPU cooler, obviously), with front now being the exhaust.
While the unfiltered 180s certainly suck in a lot of dust, it does actually work quite well! I can easily feel the nice, hot air coming from the lower front exhaust :)

    Also, side intake best intake, fight me!
Unlike on the card's direct competition, the GTX 590, the HD 6990's full-size block of a shroud means that if you were to place another 6990 in the slot below, it would very much block off airflow to the fan;

    AMD's solution was to add two cheap, thick rubber bumpers on the top of the card, to prevent two HD 6990s from getting TOO friendly, and running out of breath.
Peak double-precision compute performance of this card, at a 1:4 ratio, is very good indeed: 637 GFLOPS per GPU (more than a 5700 XT!). Of course, that simply does not universally translate to real-world performance - but, as unlikely as it is, if your workload just happens to take good advantage of the card's old architecture (seriously, good luck with that lol), the HD 6990 might just prove to be a great deal!

    The story/my experience
For 1/7th of the original price of such a setup (~$200, instead of $1400), I recently bought two of these - largely because of being stuck in the past, and not because it was remotely a good idea :D
    2020-03-01183922.jpg
    (Excuse the potato-grade photography, I put in quite a lot of effort into staying period-correct, you see!)
    For full context for why the CPU cooler is reversed, and why the rear exhaust port is taped, see the "On the blower cooler design..." Spoiler above - in short, it's to go with stacked 180mm side intakes.
    Don't worry, it's a cheap thin-steel box, and nothing of real value has been hurt to make this work :D
    The random cable with the Molex connectors is there to deliver power to the 180s tied to the side panel, through the rear of the case. Yes, this could be gotten out of the way by using a simple cable extension. ...could.
I see it as similar to a classic car - you don't buy it because it's practical, nor even necessarily because it's exciting in its driving characteristics, but more likely because you have some soft spot for it!

These cards are only "classic" for their briefly-held status as THE FASTEST, but really, they're merely 'old' right now.

    Really, I just find the 6990s to be wonderfully ridiculous, and MUCH more interesting than the modern single-GPU alternatives at similar price (Vega 56, GTX 1070...).

"But... performance is so bad!"
    Yes! And I don't care :D (ok, just a little)
    BeamNG is by far the most intensive game that I play, and I barely follow new releases, as I have a massive backlog of great older games on Steam to get through - more raw performance is fairly low priority.

While both cards do exhibit coil whine depending on load/FPS, the noise gets almost completely absorbed by headphones - fairly non-isolating ones, at that!

The cards stay quiet while not under major load, and when the fans do spin up to their 5500RPM, the noise soon fades into the background behind headphones and game sounds/music!

The only real problem is the power bill, which certainly has grown, with the system idling at ~250W and gaming at ~750W, in the case of Beam :D

    Holy hell, that's 1500 words and nothing about BeamNG benchmarking yet!
    Typing walls of text with clicky switches is too fun! :D

    The fun part!
    First though...
    The semi-scientific test system!

    FX-9590@4.7GHz, 1.53V
(Yes, this could definitely affect benchmark results, with its weak single-threaded performance!

The 9590 definitely fits right in with the ridiculously hungry hardware this topic is about :D
If it makes you more comfortable with something so irrational, think of it as an FX-8350 overclocked to 4.7 - I got the 9590 for not much more than one of those would've cost)


    Arctic Freezer A32
(A little $30 heatsink! With a 3000RPM fan and liquid metal! :D Originally bought it for my FX-6100, but later found that it's perfectly adequate for the 9590 under common/gaming loads - though it can reach 85°C while rendering, where all is good again after clocking down to 4.6GHz)

    2x4GB HyperX Fury@1600MHz CL10

(On the slow side, but FX CPUs see relatively minimal benefit from faster RAM, so it's fine.
8GB is not enough for Italy, but that's basically it - I see no other need to upgrade)

    ASUS 970 Pro/Gaming Aura
    (inb4 "REEEEEE 9590 on 970 board":
    The VRM is solid enough to push 400W through the CPU, it's fine lol.
The 970 chipset does bring a slight concern though - with both PCIe x16 slots used, it only provides 8 lanes per slot, which might have SOME negative effect on performance. That said, trying dual-GPU Beam with and without the 2nd card installed, I could notice/measure no difference whatsoever)

    ADATA S40G 256GB NVMe drive
    (Similar in performance to a Samsung 960 Evo 250GB NVMe, though running at half-speed here due to the platform only supporting PCIe 2.0 (which is not relevant for graphics card testing, as those don't support anything higher)
    BeamNG-related files are all on this drive, while the OS and most other software is on other SSDs - drive performance should in no way hurt the benchmark results!)

    Windows 7
(I'd rather be unsupported but stable than use an OS under perpetual beta)

    Super Flower Leadex II Gold 1200W
(Same platform as EVGA's SuperNOVA G3 line - this is the OEM's own product!
These still offer some of the cleanest power on the market, and are overall great units, though I might end up replacing the stock 135mm ball-bearing fan with one from be quiet!, as it can get on the loud side, largely due to a fairly stupid fan controller)


    and finally...
Two HD 6990s on the Uber BIOS, undervolted to 1.1V from 1.175V, on Catalyst 15.7.1.
    (If you've been reading this ... well, article :D, you know enough about these now!
Though something else mildly interesting to point out: one of the cards is from Dell, and the BIOS switch on it was non-functional (maybe intentionally, to avoid the need for the warning sticker and general customer troubles), so I had to apply the Uber BIOS by flashing it.
While setting the values manually probably would've given the exact same results, the other BIOS likely allows for a higher power limit. Anyway, flashing the BIOS is better than relying on software for overclocking/voltage.
    See, I can't shut up about these, they're awesome! :D)
Keeping the resolution at 1600x900, as that's the max where my monitor is happy to run "overclocked" at 72Hz - and ultimately, I did the testing mainly for personal use.
    Note that multi-GPU scaling is usually best with higher resolutions, so don't expect 1080p results to be much worse!

    While there isn't much difference between CrossFire modes when only using 2 GPUs, Quad got unhappy with settings other than "Optimize 1x1", so using that for all!

I also tried SuperTile via RadeonPro, which is kind of nice, making motion look smoother!
Unfortunately, it also amplifies stutters, so I'm not that keen on it.
Interestingly, despite the visibly different frame delivery, the benchmark data showed no difference between the modes - it all averaged out!
The difference only becomes apparent in a frametime plot.

    I did a registry change to set the Flip Queue Size to 0, disabling pre-rendered frames, losing a measurable amount of FPS (<10%), and some consistency too.
    Why?
    Even though the driver default of 3 had been seemingly fine for many years, after trying out 0, I just can't go back, the added latency is too much!
    After first trying Audiosurf with it at 0, it felt like ... well, negative latency! :D It was magical!
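
For anyone wanting to try the same, here's roughly what that registry change looks like - a minimal sketch, assuming a Catalyst-era driver that reads FlipQueueSize as a string value under the display adapter's class key (the "0000" subkey index, and even the value name, can differ per system/driver version, so check your own registry first!):

```python
# Sketch: set AMD's Flip Queue Size to 0 on a Catalyst-era driver.
# ASSUMPTIONS: the adapter lives at subkey "0000" of the standard display
# class GUID, and the driver reads a REG_SZ named "FlipQueueSize" -
# verify both in regedit before running! Needs admin rights, Windows only.
import winreg

KEY_PATH = (r"SYSTEM\CurrentControlSet\Control\Class"
            r"\{4d36e968-e325-11ce-bfc1-08002be10318}\0000")

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                    winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "FlipQueueSize", 0, winreg.REG_SZ, "0")
# Reboot (or at least restart the driver) for the change to apply.
```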

Frame Pacing is in a similar boat: while it definitely makes motion look smoother, it adds a disgusting amount of latency, so I'm keeping it off.
High on most settings, as there wasn't much of a performance difference from going lower;

    Low textures due to VRAM limitation on the more complex maps;
    Low meshes for a big decrease in stutter;

Dynamic reflections have a massively varying performance impact depending on the map, so I just kept them off for this. Note that to avoid flickering with multi-GPU setups, you need at least 2 faces per update!

    UI GPU acceleration disabled, due to flicker.
    PointLights as seen in the tube of Grid Map cause massive lag with multi-GPU enabled, so they were deleted on all maps, just to be safe.

Similarly, bodies of water can cause major stutter/loss of FPS too, so all bodies of water were deleted on the tested maps. Though it's not all water that does this - not sure why, maybe something related to one of the settings. Could look into this more if anyone's interested!
    All testing was done using MSI Afterburner's Benchmark feature!

    For every map and every configuration, I redrove the same route three times and calculated an 'average' run from the data, to use for the bars;

While more accurate results might have been obtained by using Beam's Replays, driving by hand was more fun and, as you'll see, didn't bring any real inconsistency to the results.

You can see the range of the data by looking for the little blue bars - these start at the lowest result and end at the highest!
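
In code terms, each bar and its range come from something like this (a simplified sketch with made-up numbers, rather than the real logs and my actual script):

```python
# Sketch: turn three runs' results into a bar + range, as in the charts below.
runs = [52.1, 53.4, 51.8]         # hypothetical per-run average FPS

bar = sum(runs) / len(runs)       # bar height: mean of the three runs
low, high = min(runs), max(runs)  # little blue range bar: lowest to highest
print(f"bar: {bar:.1f} FPS, range: {low:.1f}-{high:.1f}")
```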
    While testing of lows has become much more common than it used to be, a lot of people still don't understand them, so let's try to explain!

    If you were to take an average of the worst 1% of all data, that would be your 1% low result!
    Same idea with 0.1%, where it becomes much more sensitive.

    The idea with these is to detect stutter and overall inconsistent frame delivery;
For example, you could be seeing an average of well over 60FPS, while the experience is perceived as much less smooth, due to said inconsistency;

While somewhat lower than what one might say the game 'feels like', the 1% low figure will generally land much closer to the actual 'feel' of the game than the average FPS could!

    Being more sensitive, the 0.1% value will be hurt much more by any visible stutters - if this value is very low, you can expect there to have been visible stuttering during the run.
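
To make that concrete, here's how the averages and lows could be computed from a run's frametime log - a sketch of the usual method (tools differ in details, e.g. whether they average the worst frames or just take a percentile cut-off):

```python
# Sketch: average FPS and 1% / 0.1% lows from a list of frametimes (in ms).
def low_fps(frametimes_ms, percent):
    """Average FPS across the worst `percent`% of frames (slowest frametimes)."""
    worst = sorted(frametimes_ms, reverse=True)    # slowest frames first
    n = max(1, int(len(worst) * percent / 100))    # e.g. the worst 1% of frames
    return 1000 / (sum(worst[:n]) / n)             # mean frametime -> FPS

frametimes = [16.7] * 980 + [40.0] * 15 + [120.0] * 5   # hypothetical run
print(f"average:  {1000 / (sum(frametimes) / len(frametimes)):.1f} FPS")  # ~56.9
print(f"1% low:   {low_fps(frametimes, 1):.1f} FPS")    # ~12.5 - stutter shows!
print(f"0.1% low: {low_fps(frametimes, 0.1):.1f} FPS")  # ~8.3 - even more so
```

Note how the hypothetical run above averages a healthy ~57FPS, while the lows immediately expose the handful of 40ms and 120ms stutter frames hiding in it!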

    Benching time! WOOOO!
    Let's start with something simple: Endurodrome - a simple-looking, but very well designed map of gradual destruction! (Meshes High for this one!)
    Simply doing a lap with the D15 Marauder, with a standing start, and ending the bench after crossing the line. (135-140s)
    Enduro.png

Pretty good scaling here! Feels very nice with quad GPU, and there's certainly space to further increase the graphics settings there.

As in all these tests, the worst of the data came from moments with smoke/mud particles on screen, so expect a better experience with cleaner driving!

I'd be curious to see how a triple-GPU setup would do with these, as that's often said to be the multi-GPU sweet spot. But as the AMD drivers only allow either disabling CrossFire on the 2nd card entirely, or using just one GPU, that's not convenient to test with this setup of dual-GPU cards - though likely not impossible.

Interestingly, you can see over 100% average FPS scaling with dual GPU, and this is not the only place you'll see this behavior with this setup!
My HD 3870x2 and 4870x2 used to scale like this too, and I don't have a confident theory for why it behaves this way - guess it's just some quirk of T3D? Any ideas?


    Alright, next up: JRI!
This used to be one of the more intensive maps in the game, but now it seems easy to run - as long as you have 4 GPUs, at least. Why not add that to the minimum requirements? :D

    Using the Moonhawk V8 Sport manual here, a car that's very fun and easy to drive consistently.
    The route here goes around the volcano in the center of the map, one of the more intensive areas (80-85s)
    JRINormal.png
    Again, we see the over 100% scaling...

Here, the need for testing FPS lows becomes obvious - while the dual-GPU average certainly looks very impressive compared to single, it doesn't actually feel that much smoother to play.

Meanwhile, going quad improves the experience massively, with the occasional stutter still visible - but certainly nothing to ruin the experience!


    And now, to finish off: WCUSA!
Again using the Moonhawk, and doing a run along the side of the map: starting from the very north-east corner, then following the roads until reaching the bus stops near the police station, right after the jump, in the urban area.
    (all runs very close to 120s, got very consistent at this one! :))
    WCUSA.png
    Here, the multi-GPU setup kind of falls apart, with very minimal improvement to lows;

    While quad GPU feels very smooth indeed for brief moments, the game generally doesn't look much smoother than on single GPU;

YES, it may display around 80FPS in the corner of the screen in just about any part of the map, but I honestly think that a consistently delivered 30FPS would be much more pleasant to play with;

In general, it really does feel like ~24FPS no matter the configuration on this map, even though the shiny red bars imply excellent scaling indeed - perfect for marketing! :D


    Thanks for the read!
    Hehe, maybe got a little carried away with this... it was fun! :D
     
    #1 EvilMcSheep, Mar 2, 2020
    Last edited: Mar 2, 2020
• Like x 2
  2. PriusRepellent


    Joined:
    Mar 19, 2018
    Messages:
    353
This makes me wonder how well my 5700 XT would do if I added a second one and enabled CrossFire. As for you doing a triple - you can't. Each 6990 is already running CrossFire across two GPUs, and no more than 4 GPUs are supported in a CrossFire configuration.

Also, my single 5700 XT is delivering a pretty smooth experience at 4K. Your resolution is a lot lower. Those GPUs are REALLY showing their age, and I couldn't recommend them over something newer for any build.

As for building a PC for vintage gaming, the HD 6990 is too new. For that, I'd rather use something that has Windows 98 SE support (for maximum DOS compatibility). Voodoo cards come to mind.
     
    #2 PriusRepellent, Mar 6, 2020
    Last edited: Mar 6, 2020
• Like x 1
  3. EvilMcSheep


    Joined:
    Jan 5, 2016
    Messages:
    130
    I don't think you can, AMD dropped CrossFire with Navi.
    (There's mGPU though, for DX12/Vulkan!)
    I meant disabling one of the 4 GPUs and seeing where the performance ends up, not adding another card :D You physically couldn't connect another one, as these cards only have one CF finger.
Again, I don't think performance would be much worse at 1080p - going from 720p to 900p only cost ~15% performance, despite ~56% more pixels. I'm only using this resolution to get a higher refresh rate (72Hz) out of my monitor, so it is somewhat misleading.
    I could do some runs at 1080 to test this, if you're interested!
    For sure! While messing with the hardware itself is very fun, this is absolutely not a practical system! :D
    Yeah, again, these are 'just old' at this point, though I'd definitely consider them to be future classics.
    I have quite the Steam library of games released ~2007-2013 to catch up on (still haven't played Crysis! Madness!), and that's about as "vintage" as I'm trying to be with this lol
     
    #3 EvilMcSheep, Mar 7, 2020
    Last edited: Mar 7, 2020
    • Like Like x 1
  4. PriusRepellent


    Joined:
    Mar 19, 2018
    Messages:
    353
That makes me want to do it even more. Some 5700s have been reported to work with CrossFire despite not having official support from AMD. Even if not, I can always use the extra card for compute.
     
• Like x 1
  5. EvilMcSheep


    Joined:
    Jan 5, 2016
    Messages:
    130
    Interesting! Would like to see how modern cards scale too! :)
     