Roger Wilco Jr wrote: I have a Windows 7 system playing Horizons 64-bit. I haven't upgraded anything, other than the game, in months. 4790K, 16GB, 2x GTX970 SLI, SSD, TrackIR, Voice Attack. It usually happens when I'm in a RES, but I'm in a RES most of the time. The screen will suddenly freeze and the sound will continue. Then the screen usually goes black and I can't do anything but hit the reset button.
That sounds like it could be I/O wait, which I've seen over the years, usually from disk writes. Either it recovers in time for the GPU buffers not to go stale, or it doesn't and the screen goes blank. But only the event logs will confirm it, and capturing them properly isn't a default setup.
I've run into it when doing 1080p@60Hz nVidia ShadowPlay capture to my 4-disk Intel RAID-5. It's slow at writes because mainboard RAID is software RAID: the CPU does the XOR operations for parity (SIMD helps), but every stripe has to be loaded into and stored back from the CPU over the system interconnect, and that path is not "disk-speed." In my case it eventually recovered before the GPU buffers went stale. Switching to a NAND SSD solved my issue, although I'm using a 1TB device with only 200GiB committed -- i.e., a 5x undercommit.
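To make the parity cost concrete, here's a minimal sketch (function names are mine, purely illustrative) of the work software RAID-5 hands to the CPU for every stripe -- and note that every byte below also has to cross the interconnect twice:

```python
# Minimal illustration of RAID-5 parity: the parity block is the
# byte-wise XOR of the data blocks in a stripe. With software RAID,
# the CPU does this for every stripe written.

def parity(blocks):
    """XOR all data blocks together to produce the parity block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def rebuild(surviving_blocks, parity_block):
    """Recover a lost block: XOR parity with the surviving blocks."""
    return parity(surviving_blocks + [parity_block])

# Toy 2-byte stripe across 3 data disks:
stripe = [b"\x0f\x0f", b"\xf0\xf0", b"\xff\x00"]
p = parity(stripe)
# Losing disk 1 and rebuilding it from the rest:
assert rebuild([stripe[0], stripe[2]], p) == stripe[1]
```

Real implementations do the XORs in wide SIMD registers instead of byte-by-byte, which is why the XOR itself is cheap; it's the data movement that hurts.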
In your case, you have a NAND SSD. Understand that as a commodity NAND device ages, it remaps more and more of its 32-512KiB erase blocks, slowing things down until the DRAM buffer included on the device can no longer hide the performance impact. NAND cells themselves are extremely slow at writes (on the order of a platter doing sequential writes), while extremely fast at reads. If you are heavily committed to your NAND device -- e.g., over 70% full (let alone over 85%) -- expect it to slow down sooner.
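A quick back-of-envelope for the commitment ratio described above (the 70%/85% thresholds just echo the rules of thumb in this post, not any datasheet):

```python
# The fuller a NAND device is, the less spare area the controller
# has for remapping worn erase blocks, and the sooner writes slow
# down. Thresholds below are the rough rules of thumb from the post.

def commit_ratio(used_gib, capacity_gib):
    """Fraction of the device committed (0.0 - 1.0)."""
    return used_gib / capacity_gib

def undercommit(used_gib, capacity_gib):
    """E.g. 200 GiB used on a 1000 GiB device ~= 5x undercommit."""
    return capacity_gib / used_gib

def slowdown_risk(used_gib, capacity_gib):
    c = commit_ratio(used_gib, capacity_gib)
    if c > 0.85:
        return "high"
    if c > 0.70:
        return "elevated"
    return "low"

print(undercommit(200, 1000))  # 5.0 -- the headroom described above
print(slowdown_risk(900, 1000))  # high
```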
Roger Wilco Jr wrote: Eh, I'm not much of a Windows guru. The most I could make out was a critical error about Kernel-Power, apparently related to me using the reset button.
So we're interested in the logs just before that.
I.e., you should export the Event Log and send it to one of us. Some of us are long-standing MCPs (from the original NT 3.1 days) and know NT internals inside and out.
I wish Windows were set up, by default, to have Performance Monitor record continuously -- let alone had an easy way to capture and snapshot that data, especially "sanitized" so it doesn't include site-specific information. By default you can only watch it in real time; it doesn't work like sysstat on most POSIX systems (although even there, advanced storage I/O statistics have to be configured after install), let alone report the default load averages (which include I/O wait). And there's no included equivalent of sosreport that hands us everything we need to figure things out, because Microsoft has its ecosystem of 3rd-party VARs and their tools (with plenty of markup).
AndyB wrote: Btw, do you know if it's true that Windows Update may try to force a change to Windows 10? I think I read that on this board, and shut off updates.
In Windows Update, disable/hide KB3035583. If it's already been installed (the stupid Upgrade applet), uninstall it, then tell Windows Update to hide it so it doesn't come back. The main complaint with KB3035583 is that, once installed, it downloads the 6GiB of Windows 10 upgrade files without telling you, even if you have "Download updates but let me choose" selected. That is wholly irresponsible of Microsoft.
I ran into this issue myself ... over a tether, and it ate up my entire download allowance for the month. Utterly pissed me off, since I don't run Windows much at all, but I had to boot into it to edit something in MS Office Pro 2013 from a client. We MCPs openly bitched to Microsoft, and Microsoft shut off the "logic" for Enterprise and Government customers, for a time. Then they silently re-enabled it for those customers and stuck their fingers in their ears.
But it's also par for the course with Microsoft.
My favorite in the last few years was them blacklisting unmaintained drivers in XP, causing systems to blue screen and forcing people to upgrade. I caught them red-handed on this and posted to the forums. Dell, Gateway, and others confirmed it too, especially after I dissected the internals, including a full reproducer (after 18 hours of debugging), and their techs confirmed it.
I.e., entire Dell and Gateway model lines became unusable when people updated their XP systems, and the OEMs were fielding the calls.
At one point, we had a mob of fellow MCPs and even some MVPs demanding the re-inclusion of the unmaintained driver. Microsoft responded by deleting the thread. Subsequent tickets were closed "won't fix" and "will charge" (for the support call). But I'm used to them doing this to their own partners and professionals, ever since SQL Slammer.
MrSandman wrote: I had something similar; it was a failing PSU. Have you tried taking one of the gfx cards out and running a solo card for a while? I'm guessing your PSU is man enough to run those 2 cards, but that would be a pretty big power drain.
As always ... run the mainboard OEM's monitoring tools (assuming the mainboard exposes its standard LM-style sensors via WMI), and Alt-Tab out to watch the voltage on the +12V rails. If it's dropped from before you started the game, you're pulling enough current to cause droop, and that means your supply is inadequate.
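As a quick sanity check on whatever readings the OEM tool gives you: the ATX12V spec allows roughly ±5% tolerance on the +12V rail, so anything below about 11.4V is out of spec. A trivial sketch (function names are mine, illustrative only):

```python
# Check a +12V rail reading against the rough +/-5% ATX tolerance,
# and quantify the idle-to-load droop described in the post above.

ATX_TOLERANCE = 0.05  # roughly +/-5% per the ATX12V design guide

def rail_ok(nominal, reading, tolerance=ATX_TOLERANCE):
    """True if the reading is within tolerance of the nominal voltage."""
    return abs(reading - nominal) <= nominal * tolerance

def droop_pct(idle_reading, load_reading):
    """Percentage drop from idle to in-game load."""
    return 100.0 * (idle_reading - load_reading) / idle_reading

print(rail_ok(12.0, 11.98))  # True  -- healthy
print(rail_ok(12.0, 11.30))  # False -- out of spec under load
print(round(droop_pct(12.05, 11.30), 1))  # 6.2 -- big droop, PSU inadequate
```

A healthy supply barely moves between idle and load; a rail that sags several percent the moment the game spins up both GPUs is the classic sign of an inadequate or failing PSU.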
Oh ... BTW ... we're now starting to see similar issues on +3.3V/+5V rails with ATX PSUs. Why?
2.5" bay, 30x50mm mSATA, and newer 22x80mm (M.2 2280) NAND SSD devices use more power than a 2.5" platter drive, and they are approaching 3.5" platters in power consumption. But the difference is that a 3.5" drive runs its spindle off the +12V rails, while NAND SSDs draw from the +3.3V/+5V rails, which are far more limited in ATX PSUs.
More and more high-end NAND devices are starting to use the +12V rails for this reason, as they begin to draw 10-20W. Again, the "cost" in NAND devices isn't the NAND, but the DRAM used for buffering writes (since NAND is quite slow at writes). DRAM is leaky and power-hungry, especially now that we're seeing 1GiB of DRAM for every 1TiB of NAND in these devices.
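To see how fast that adds up on the limited rails, here's an illustrative budget check. Every number here is a made-up example, not from any PSU label or SSD datasheet:

```python
# Illustrative +5V rail budget: many ATX PSUs rate the +3.3V/+5V
# rails for far less combined power than the +12V rail, so a stack
# of write-heavy SSDs eats into the headroom faster than expected.

RAIL_5V_AMPS = 20.0  # hypothetical +5V rating from a PSU label

def rail_headroom(amp_rating, device_watts):
    """Watts left on a 5V rail after the listed device draws."""
    watts_available = 5.0 * amp_rating
    return watts_available - sum(device_watts)

# Four hypothetical SSDs drawing ~8W each under sustained writes:
print(rail_headroom(RAIL_5V_AMPS, [8, 8, 8, 8]))  # 68.0 W left
```

Remember the same rail also feeds USB ports, fans, and parts of the mainboard, so the real headroom is smaller than the raw arithmetic suggests.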
Roger Wilco Jr wrote:Oddly, when running on the lowest gfx settings, the cards were running at the same temperature, about 42C and 64C. There has always been a spread, but I'm not sure it was that wide in the past. I wonder if maybe one of those fans is failing. I don't think 64C is unreasonable, but I would like to get it a bit lower.
Your outer card significantly inhibits cool air from reaching the inner card, and a 20C difference is very, very typical. But you're right, 64C is reasonable.
Now you understand why I use a single GPU, Mini-ITX configuration.
The PC layout is still tied to the legacy of the original IBM PC, thanks to Intel. In a vertical tower the cards even face the wrong way heat-wise, component-side down instead of up. Intel tried to correct this with BTX (CPU at the front, cards on the opposite side facing up), but it never found adoption outside of OEMs. Alternatively, getting a tower that flips the ATX orientation -- cards at the top, CPU at the bottom -- also helps.
Again, I use a horizontal Mini-ITX case to solve this long-standing problem: the GPU has its own side panel to pull cool air directly from, the CPU does the same on the other side, and the drives at the front get theirs. Fewer fans, so it's quieter; less internal volume means less airflow is required, and less heat exchange happens inside the case rather than outside ... Thermo-Fluid Transfer 101. Adding magnetic filters to the intake sides of the case solves the dust problem too.