Roger Wilco Jr wrote: I have a Windows 7 system playing Horizons 64-bit. I haven't upgraded anything, other than the game, in months. 4790K, 16GB, 2x GTX970 SLI, SSD, TrackIR, Voice Attack. It usually happens when I'm in a RES, but I'm in a RES most of the time. The screen will suddenly freeze and the sound will continue. Then the screen usually goes black and I can't do anything but hit the reset button.
That sounds like it could be I/O wait, which I've seen over the years, usually from disk writes. Either it recovers in time for the GPU buffers not to go stale, or it doesn't and the screen goes blank. But only the event logs will confirm it, and capturing them properly isn't a default setup.
I've run into it when doing 1080p@60Hz nVidia ShadowPlay capture to my 4-disk Intel RAID-5. It's slow at writes because mainboard RAID is software RAID: the CPU does the XOR operations for parity (SIMD helps), but every stripe has to be loaded into and stored back from the CPU over the system interconnect, and that path is not "disk-speed." In my case it eventually recovered before the GPU buffers went stale. Switching to a NAND SSD solved my issue, although I'm using a 1TB device with only 200GiB committed -- i.e., a 5x undercommit.
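To make the parity cost concrete, here's a minimal sketch (function names are mine, purely illustrative) of the work software RAID-5 hands to the CPU for every stripe -- and note that every byte below also has to cross the interconnect twice:

```python
# Minimal illustration of RAID-5 parity: the parity block is the
# byte-wise XOR of the data blocks in a stripe. With software RAID,
# the CPU does this for every stripe written.

def parity(blocks):
    """XOR all data blocks together to produce the parity block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def rebuild(surviving_blocks, parity_block):
    """Recover a lost block: XOR parity with the surviving blocks."""
    return parity(surviving_blocks + [parity_block])

# Toy 2-byte stripe across 3 data disks:
stripe = [b"\x0f\x0f", b"\xf0\xf0", b"\xff\x00"]
p = parity(stripe)
# Losing disk 1 and rebuilding it from the rest:
assert rebuild([stripe[0], stripe[2]], p) == stripe[1]
```

Real implementations do the XORs in wide SIMD registers instead of byte-by-byte, which is why the XOR itself is cheap; it's the data movement that hurts.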
In your case, you have a NAND SSD. Understand that as a commodity NAND device ages, it remaps more and more of its 32-512KiB erase blocks, slowing things down until the DRAM buffer included on the device can no longer hide the performance impact. NAND cells themselves are extremely slow at writes (on the order of a platter doing sequential writes), while extremely fast at reads. If you are heavily committed to your NAND device -- e.g., over 70% full (let alone over 85%) -- expect it to slow down sooner.
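A quick back-of-envelope for the commitment ratio described above (the 70%/85% thresholds just echo the rules of thumb in this post, not any datasheet):

```python
# The fuller a NAND device is, the less spare area the controller
# has for remapping worn erase blocks, and the sooner writes slow
# down. Thresholds below are the rough rules of thumb from the post.

def commit_ratio(used_gib, capacity_gib):
    """Fraction of the device committed (0.0 - 1.0)."""
    return used_gib / capacity_gib

def undercommit(used_gib, capacity_gib):
    """E.g. 200 GiB used on a 1000 GiB device ~= 5x undercommit."""
    return capacity_gib / used_gib

def slowdown_risk(used_gib, capacity_gib):
    c = commit_ratio(used_gib, capacity_gib)
    if c > 0.85:
        return "high"
    if c > 0.70:
        return "elevated"
    return "low"

print(undercommit(200, 1000))  # 5.0 -- the headroom described above
print(slowdown_risk(900, 1000))  # high
```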
Roger Wilco Jr wrote: Eh, I'm not much of a Windows guru. The most I could make out was a critical error about Kernel-Power, apparently related to me using the reset button.
So we're interested in the logs just before that.
I.e., you should export the Event Log and send it to one of us. Some of us are long-standing MCPs (from the original NT 3.1 days) and know NT internals inside and out.
I wish Windows were set up, by default, to have Performance Monitor record continuously -- let alone had an easy way to capture and snapshot that data, especially "sanitized" so it doesn't include site-specific information. By default you can only watch it in real time; it doesn't work like sysstat on most POSIX systems (although even there, advanced storage I/O statistics have to be configured after install), let alone report the default load averages (which include I/O wait). And there's no included equivalent of sosreport that hands us everything we need to figure things out, because Microsoft has its ecosystem of 3rd-party VARs and their tools (with plenty of markup).
AndyB wrote: Btw, do you know if it's true that Windows Update may try to force a change to Windows 10? I think I read that on this board, and shut off updates.
In Windows Update, disable/hide KB3035583. If it's already been installed (the stupid Upgrade applet), uninstall it, then tell Windows Update to hide it so it doesn't come back. The main complaint with KB3035583 is that, once installed, it downloads the 6GiB of Windows 10 upgrade files without telling you, even if you have "Download updates but let me choose" selected. That is wholly irresponsible of Microsoft.
I ran into this issue myself ... over a tether, and it ate up my entire download allowance for the month. Utterly pissed me off, since I don't run Windows much at all, but I had to boot into it to edit something in MS Office Pro 2013 from a client. We MCPs openly bitched to Microsoft, and Microsoft shut off the "logic" for Enterprise and Government customers, for a time. Then they silently re-enabled it for those customers and stuck their fingers in their ears.
But it's also par for the course with Microsoft.
My favorite in the last few years was them blacklisting unmaintained drivers in XP, causing systems to blue screen and forcing people to upgrade. I caught them red-handed on this and posted to the forums. Dell, Gateway, and others confirmed it too, especially after I dissected the internals, including a full reproducer (after 18 hours of debugging), and their techs confirmed it.
I.e., entire Dell and Gateway model lines became unusable when people updated their XP systems, and the OEMs were fielding the calls.
At one point, we had a mob of fellow MCPs and even some MVPs demanding the re-inclusion of the unmaintained driver. Microsoft responded by deleting the thread. Subsequent tickets were closed "won't fix" and "will charge" (for the support call). But I'm used to them doing this to their own partners and professionals, ever since SQL Slammer.
MrSandman wrote: I had something similar; it was a failing PSU. Have you tried taking one of the gfx cards out and running a solo card for a while? I'm guessing your PSU is man enough to run those 2 cards, but that would be a pretty big power drain.
As always ... run the mainboard OEM's monitoring tools (assuming the mainboard exposes its standard LM-style sensors via WMI), and Alt-Tab out to watch the voltage on the +12V rails. If it's dropped from before you started the game, you're pulling enough current to cause droop, and that means your supply is inadequate.
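As a quick sanity check on whatever readings the OEM tool gives you: the ATX12V spec allows roughly ±5% tolerance on the +12V rail, so anything below about 11.4V is out of spec. A trivial sketch (function names are mine, illustrative only):

```python
# Check a +12V rail reading against the rough +/-5% ATX tolerance,
# and quantify the idle-to-load droop described in the post above.

ATX_TOLERANCE = 0.05  # roughly +/-5% per the ATX12V design guide

def rail_ok(nominal, reading, tolerance=ATX_TOLERANCE):
    """True if the reading is within tolerance of the nominal voltage."""
    return abs(reading - nominal) <= nominal * tolerance

def droop_pct(idle_reading, load_reading):
    """Percentage drop from idle to in-game load."""
    return 100.0 * (idle_reading - load_reading) / idle_reading

print(rail_ok(12.0, 11.98))  # True  -- healthy
print(rail_ok(12.0, 11.30))  # False -- out of spec under load
print(round(droop_pct(12.05, 11.30), 1))  # 6.2 -- big droop, PSU inadequate
```

A healthy supply barely moves between idle and load; a rail that sags several percent the moment the game spins up both GPUs is the classic sign of an inadequate or failing PSU.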
Oh ... BTW ... we're now starting to see similar issues on +3.3V/+5V rails with ATX PSUs. Why?
2.5" bay, 30x50mm mSATA, and newer 22x80mm (M.2 2280) NAND SSD devices use more power than a 2.5" platter drive, and they are approaching 3.5" platters in power consumption. But the difference is that a 3.5" drive runs its spindle off the +12V rails, while NAND SSDs draw from the +3.3V/+5V rails, which are far more limited in ATX PSUs.
More and more high-end NAND devices are starting to use the +12V rails for this reason, as they begin to draw 10-20W. Again, the "cost" in NAND devices isn't the NAND, but the DRAM used for buffering writes (since NAND is quite slow at writes). DRAM is leaky and power-hungry, especially now that we're seeing 1GiB of DRAM for every 1TiB of NAND in these devices.
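To see how fast that adds up on the limited rails, here's an illustrative budget check. Every number here is a made-up example, not from any PSU label or SSD datasheet:

```python
# Illustrative +5V rail budget: many ATX PSUs rate the +3.3V/+5V
# rails for far less combined power than the +12V rail, so a stack
# of write-heavy SSDs eats into the headroom faster than expected.

RAIL_5V_AMPS = 20.0  # hypothetical +5V rating from a PSU label

def rail_headroom(amp_rating, device_watts):
    """Watts left on a 5V rail after the listed device draws."""
    watts_available = 5.0 * amp_rating
    return watts_available - sum(device_watts)

# Four hypothetical SSDs drawing ~8W each under sustained writes:
print(rail_headroom(RAIL_5V_AMPS, [8, 8, 8, 8]))  # 68.0 W left
```

Remember the same rail also feeds USB ports, fans, and parts of the mainboard, so the real headroom is smaller than the raw arithmetic suggests.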
Roger Wilco Jr wrote:Oddly, when running on the lowest gfx settings, the cards were running at the same temperature, about 42C and 64C. There has always been a spread, but I'm not sure it was that wide in the past. I wonder if maybe one of those fans is failing. I don't think 64C is unreasonable, but I would like to get it a bit lower.
Your outer card significantly inhibits cool air from reaching the inner card, and a 20C difference is very, very typical. But you're right, 64C is reasonable.
Now you understand why I use a single GPU, Mini-ITX configuration.
The PC layout is still tied to the legacy of the original IBM PC, thanks to Intel. In a vertical tower the cards even face the wrong way heat-wise, component-side down instead of up. Intel tried to correct this with BTX (CPU at the front, cards on the opposite side facing up), but it never found adoption outside of OEMs. Alternatively, getting a tower that flips the ATX orientation -- cards at the top, CPU at the bottom -- also helps.
Again, I use a horizontal Mini-ITX case to solve this long-standing problem: the GPU has its own side panel to pull cool air directly from, the CPU does the same on the other side, and the drives at the front get theirs. Fewer fans, so it's quieter; less internal volume means less airflow is required, and less heat exchange happens inside the case rather than outside ... Thermo-Fluid Transfer 101. Adding magnetic filters to the intake sides of the case solves the dust problem too.