Collatz is BSODing my computer



Maxwell
02-19-11, 06:07 PM
Ok. So, my computer (Silver Hammer) used to run Collatz quite well. After the DNETC scare, I ran DNETC up to 100M, changing none of the settings on my system. Today, I stopped DNETC and started Collatz back up. Nothing but a string of BSODs.

This is Win7x64, 2x5970, CCC 10.10, BOINC 6.10.58. Running (now) at stock settings.

I have restarted the machine, reset the project, detached/reattached to Collatz. I uninstalled the driver, ran Driver Sweeper, reinstalled the driver (10.10 has historically been the most stable on my system). I ran chkdsk, and I keep getting the same thing over and over.

Any ideas as to what I can try? I want to go on a Collatz push, but need working cards to do that...

Fire$torm
02-19-11, 06:34 PM
I would try the following:
Detach Collatz and shut down BM.

Locate your BOINC data\projects folder and make sure the "boinc.thesonntags.com_collatz" folder and the "boinc.thesonntags.com_collatzsymbols" folder have been deleted.

Run CCleaner (http://download.cnet.com/ccleaner/?tag=mncol;1) on the registry for orphaned entries and the HDD for orphaned files.

Run Free Registry DeFrag (http://download.cnet.com/Free-Registry-Defrag/3000-2094_4-10553700.html?tag=mncol;3) (will not really help the problem but it will help speed bootup so why not.) DeFrag will request reboot when it is finished, say yes.

Now go to the Collatz website and make sure ONLY GPU work is selected. Note: Collatz now has a "Mini Collatz" app. Try the normal app first. Try the mini app if the normal app craps out your system.

Reattach Collatz. (Keeping fingers crossed)

Crazybob
02-19-11, 06:36 PM
I pulled some Collatz today to make up for MW being down and did notice 1 WU ran for 1 hour and 11 minutes.

Maxwell
02-19-11, 08:47 PM
Slicker recently upped the size of the Collatz WUs, so that's not surprising. They are now 4x longer than they used to be.

@F$: The only part of that I hadn't tried was running CC cleaner. But I did just follow your directions and got the same "screen goes oddly pixelated, blacks out, BSOD, shut down" sequence.

Crap. Other thoughts? My computer is being naughty right now...

joker
02-19-11, 09:01 PM
I am not a computer guru in any manner but I did have a case of the BSOD's awhile back and it ended up being my memory. I just laxed my timings a little and it solved my problems. Good luck.

Fire$torm
02-20-11, 05:54 AM
Not really sure, although joker's comment jogged a memory. I think it would be a good idea to "re-seat" all your computer parts, cards, plugs and screws. And while you're at it, try swapping the positions of any components that have twins: video cards, memory sticks. One last thing, err, two last things. Run an anti-malware utility like Malwarebytes and unplug all USB devices sans mouse and KB.

Mumps
02-20-11, 10:27 AM
And, also, have you tried running only mini-Collatz? That might help identify if the cards simply don't like the longer WU's.

Maxwell
02-20-11, 11:45 AM
So as it turns out, it is not Collatz specifically, but any load on the cards now. I tested Mumps' idea the opposite way - I started to run a DNETC WU on there (which was fine until yesterday), and got the same BSOD problem.

When I get a chance to dig into the computer, I'll reseat every thing. Thanks for the ideas, folks.

Fire$torm
02-20-11, 12:33 PM
Oh boy...... Maxwell, what you are describing now sounds like a faulty hardware problem. :(
Super simple test. Pull out one of the 5970s and run Collatz, then switch cards and test again. If it can now crunch then it is either a bad card, the one you pulled, or your PSU is going bye bye.

If nothing above works then it will be necessary to try the cards in a different system and/or try a different PSU in the errant system and run the super simple test again.

Shadow
02-20-11, 06:50 PM
My suggestions along with a buck and a half might get you a cold cup of coffee but here goes. First, if you've done any updating or changing of CCC, do the control panel>uninstall a program and look for Microsoft Visual C++ (any year) redistributable, and get rid of it. Most people uninstall their old version of catalyst but forget to uninstall that nasty little bugger. You may even find multiple occurrences of that file.

http://support.amd.com/us/kbarticles/Pages/GPU57RemoveOldGraphicsDrivers.aspx

If that doesn't do it then I'd say to try your cards one at a time to see if one is going bad, and give them a good cleaning while you have them out.

Maxwell
02-22-11, 02:25 AM
So, I have not tried everything people have suggested, but I've gone through most stuff. Next up is trying individual components, but I didn't get there tonight. I do have a couple updates, though...

- I caught the error on the BSOD before the box restarted: something about not successfully recovering from a driver crash. It seems that when I do something crazy like run an ATI WU, it crashes the driver (both 10.10 and 11.2).
- Interestingly, when I'm in CCC with stock settings, and hit the "Test Custom Clocks" button, the clocks fail. With stock settings. Ugh.

I did switch cards around, and things work the same no matter which card is in the first slot. I'm beginning to think this is a PSU issue (though that is little more than a blind guess at this stage). This is a bit disappointing, since I have this PSU (http://www.newegg.com/Product/Product.aspx?Item=N82E16817341028), which is supposed to be good.

Other thoughts/suggestions before I begin to cry myself to sleep?

Fire$torm
02-22-11, 08:32 AM
Damn, not looking good :(

Have you tried just a single 5970?
This would be a pain but you could try a different OS like Ubuntu or Windows XP. With Ubuntu you could use a USB thumb drive with persistence. This would avoid having to mess with the HDD. Another thumb drive option would be Dotsch/UX (http://www.dotsch.de/boinc/Dotsch_UX.html)

More info on Linux USB live installs at PenDriveLinux.com. (http://www.pendrivelinux.com/)

Edit: BTW, try the single card test in each of the PCIe slots.

Maxwell
02-22-11, 01:53 PM
Ok. Testing. Using the 11.2 driver, and everything worked fine on the machine for all tests until I used DNETC WUs to put loads on the card:

1. PowerColor card, x16 slot: failure. Restart.
2. PowerColor card, x8 slot: failure. Driver crapped out, but managed to stop DNETC before the computer restarted.
3. Sapphire card, x16 slot: failure. Driver crapped out, but managed to stop DNETC before the computer restarted.
4. Sapphire card, x8 slot: failure. Driver crapped out, but managed to stop DNETC before the computer restarted.

I'm skeptical of the probability of having multiple failures simultaneously (not impossible, I know, but unlikely). Which makes me think the PSU is not giving out enough power... does that sound reasonable?

And it is not just a BOINC thing. Even with one card in the system at a time, I can get the driver to crash after using the "Test Custom Clocks" link within CCC using stock settings still.

There goes my morning...
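
One way to read the four results above is as a small pass/fail matrix. The sketch below is only an illustration in Python (the function name `isolate_fault` and the category strings are made up for this example). It encodes the usual swap-test reasoning: a fault that follows one card points at that card, a fault that follows one slot points at that slot, and a fault that shows up everywhere points at something every combination shares.

```python
def isolate_fault(results):
    """Guess where a fault lies from {(card, slot): passed} swap-test results."""
    cards = sorted({card for card, _ in results})
    slots = sorted({slot for _, slot in results})

    if all(results.values()):
        return "no fault reproduced"
    if not any(results.values()):
        return "shared component (PSU, motherboard, RAM, driver, or OS)"

    # A card (or slot) is suspect if it failed in every position it was tried in.
    bad_cards = [c for c in cards
                 if not any(ok for (card, _), ok in results.items() if card == c)]
    bad_slots = [s for s in slots
                 if not any(ok for (_, slot), ok in results.items() if slot == s)]
    if bad_cards and not bad_slots:
        return "card: " + ", ".join(bad_cards)
    if bad_slots and not bad_cards:
        return "slot: " + ", ".join(bad_slots)
    return "inconclusive (possibly intermittent)"


# The four runs reported in this post: every combination failed under load.
reported = {
    ("PowerColor", "x16"): False,
    ("PowerColor", "x8"): False,
    ("Sapphire", "x16"): False,
    ("Sapphire", "x8"): False,
}
verdict = isolate_fault(reported)
print(verdict)
```

With the results reported here it lands in the "shared component" case. That fits the PSU suspicion, but it does not single out the PSU over the motherboard, RAM, driver, or OS.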

Fire$torm
02-22-11, 02:11 PM
Hahaha, welcome to TroubleshootingVille........ Yeppers, it can burn hours away like nobody's business x-(

OK, three possibilities: PSU, motherboard or OS. I was going to suggest trying a different OS first but do this instead. See if one of those IT guys where you work can "loan" you a PSU tester!!! They have to have one. If they don't then suggest they buy one and let you test it for them :)

Harley
02-22-11, 08:22 PM
Max, good news! It's a driver issue. With MW down today (I was not having problems with MW WUs) I switched over to DNETC. When I switched back to MW I had the same problems that you described, as if I had pushed the OC too far. I was able to run MW for only a short time before it crashed. I was checking everything. I had the card OC'd at 925/1000, which is low for me. CCC and GPU-Z both said everything was fine. Then I checked a MW WU. The task info said the GPUs were clocked at 1000/1500!!! No wonder it was crashing. I removed CCC 11.2 and used Driver Sweeper. Then I installed CCC 11.1 and now everything seems to be fine. I'll keep you updated.

Edit: I should have said it's a driver and DNETC problem. Try running MW and check the task info. Do not run DNETC.

Harley
02-22-11, 08:30 PM
It's been a half hour now with no problems.

Fire$torm
02-22-11, 09:14 PM
@Harley Thx for the info

@Maxwell: Hmmmm, if your problem is the same as Harley's then......

Uninstall BOINC. Uninstall CCC and reboot. Run Driver Sweeper and reboot. Run CCleaner for Registry & HDD, then Auslogics Defrag, then Free Registry Defrag and reboot.

Now you have driver choices, old or new? I would suggest CCC v11.2 (Win7-x64). (http://sites.amd.com/us/game/downloads/Pages/radeon_win7-64.aspx)

Install BOINC and test. If it fails then get the gas can from your car and......
Joke aside, if it does fail then it's still a driver issue but the fault is in the OS. Maybe something in the .NET Framework is corrupted, since CCC has .NET dependencies.

So the alternative to the above? Swap out the HDD with a spare and install Win7 on the temp drive and try crunching just Collatz and/or DNETC.

Just remembered that Windows does a horrible job of reporting bad hard drive sectors and other faults. Any recent Ubuntu LiveCD distro will report a faulty HDD immediately after booting to the Desktop.

Maxwell
02-22-11, 10:12 PM
I swear to freaking god...

I almost started shouting from the roof tops. I took Harley's suggestion, uninstalled CCC, driver swept, reinstalled 11.1. Tested a couple MW WUs, and they completed successfully. Woohoo!

Decided to test a couple Collatz WUs next, and the problem came back - driver death.

Did I mention I reinstalled Winders today? Bah!

Ok. Going to try the Fire$torm laundry list of activities now, adding in uninstalling .NET and reinstalling it. And punching a baby. That goes on the list too...

joker
02-22-11, 10:14 PM
Be sure to kick your dog in the balls.

Harley
02-22-11, 10:18 PM
What clock speeds did you use? Did you overvolt? I'll give Collatz a try at your settings and see what happens.

zombie67
02-22-11, 10:35 PM
This is a dedicated cruncher? And you are sure it is not the HW? Just wipe the HD and reinstall the OS from the ground up. Like nuking the planet, it's the only way to be sure.

Fire$torm
02-22-11, 11:01 PM
+++1

BTW: That line (It's the only way to be sure) was used in Aliens, the Alien sequel :-bd

Maxwell
02-22-11, 11:13 PM
@Harley: The only overclocking I have ever done was through CCC - nothing crazy. And I was ostensibly running Collatz at stock settings...

@zombie: Not a dedicated cruncher, which is why I am leery to do a complete wipe. I'm not sure it's not HW, just REALLY hoping (since I don't have spares or alternate systems to work with).

Still going through F$'s process - hopefully that will get me somewhere. Otherwise, I may do the full wipe.

Harley
02-22-11, 11:45 PM
I ran a mini Collatz wu at 925/1000 and had no problem.
http://boinc.thesonntags.com/collatz/workunit.php?wuid=33256506

This is a MW wu I ran when I was having all the problems.
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=319317257
The card's clocks were set at 925/1000. The task's details show the clocks at 1000/1500.
All this happened after running DNETC. Damn... I hate DNETC. I hope it turns out to be a software/driver issue for you and not hardware.

Maxwell
02-23-11, 01:33 AM
@Harley - can you check a normal Collatz WU? And what CPU project were you running along with the GPU projects in your uber system?

Update: I either have a ghost or something f***ed up within BOINC. After going through the F$ procedure, I went back to the bedroom to start setting up a wall mount for a TV. I came back out about 30 min later, and the computer had restarted. Not surprising, given how bitchy it's been the last couple of days. Enter the ghost...

After signing in to the computer, I see that my CPU project (AlmereGrid) has disappeared (detached). No clue why or how. I didn't do it.

The machine seems to crunch MW just fine for a bit (10-15min) then BSOD's - the last error was "Video Scheduler has encountered an unexpected error". Working on fixes now. Just finished punching a baby and kicking the dog...

Harley
02-23-11, 11:16 AM
I ran a normal Collatz WU with no problem.
http://boinc.thesonntags.com/collatz/result.php?resultid=75940348
For the CPU I'm running AndrOINC. However, I'm only running 6 of 8 threads.

When I was having the problems my box shut down on me 23 times! According to Windows it was caused by the ATI driver. Everything was running fine before I ran DNETC and now everything is back to normal.

Harley
02-23-11, 01:28 PM
Max;
Sorry for jumping back and forth between PM'ing and this thread, but I want to put this out there for anyone else who might be having this problem. This is one of your MW WUs.
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=319520490
Check the clocks! 1000/1500, the same as mine showed. Whatever the problem is, it's the same for both of us. I have XFX cards and use their overvolting tool. I believe the tool is brand and card (Black Edition) specific. You can try MSI's Afterburner. I think you can use that for overvolting.

Maxwell
02-23-11, 02:44 PM
Damn it! You're right! Why you can find the problem with my machine better than I can, I have no idea, but you're right.

Unfortunately, the fix you recommended earlier didn't work for me. Despite uninstalling CCC, driver sweeping, and reinstalling 11.1, that didn't reset clocks to normal.

So do I use Afterburner to clock them back to stock, or do I try and work that from within CCC after installing it?

Harley
02-23-11, 03:06 PM
Ummmm, I guess I have too much time on my hands.
Here's what I suggest.

1. Use CCC to restore factory defaults. Not just the Default tab but from preferences.
2. Use MSI Afterburner to change the voltage. Then reset to normal.

At this point you can try running MW, or you can go the whole nine yards and remove CCC and use Driver Sweeper. If you do this, after using Driver Sweeper go to Control Panel > Device Manager and make sure the display adapter driver is gone. If not, uninstall it and reboot. This will force Windows to reinstall the drivers. Wait until Windows has finished. Then reinstall CCC. Good luck!

Maxwell
02-23-11, 06:39 PM
Ok. So I did the whole "Restore Defaults" thing like Harley suggested, and checked Afterburner, and everything appeared to be at stock, including voltages. I tried the "test custom clocks" with everything at stock settings after restoring defaults, and it failed (it shouldn't). Then I decided to go a little loco...

I backed the memory clock down to 500, and the GPU clock down to 575 in CCC. This is from stock settings of 1000 and 725, respectively. Tested these custom clocks, and it passed. Put them on MW for about 45min, then emptied the cache. All went well.

Uninstalled the driver, swept, reinstalled the driver, to see if that changed something internal to the card. It didn't because when I tried the "test custom clocks" thing again with default settings, it failed again.

I'm now uninstalling completely again, and I'm going to switch the location of the two cards, and see if that can get something going...

Fire$torm
02-23-11, 07:08 PM
Maxwell, hold your horses please. Something is really wrong here. Like it takes a rocket scientist to figure that out...... Where is Dr.Von Braun when you need him?

I strongly suggest you do the HDD swap with a fresh install of Win7. Remember you do not need to activate it for 30 days.

Damn, new thought leads to new question: Did you just recently install any new M$ updates? I just remembered SP1 for Win7 came out like 2 or 3 days ago. This could be your culprit. So do the HDD swap but do NOT update. Install CCC 10.10 without APP, just to see what happens. Install BOINC and crunch. You can use the BlackOps group account so it won't mess with your CPID. I will add Collatz, MW & DNETC to it.

If it works then you can install all the M$ updates dated before the first of the year and crunch some more. Then update CCC to 11.0 and test again. You need to find the point of failure. Just keep adding sets of updates until it BSODs. This will give you a target.
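
The procedure above is an incremental search for the point of failure: start from a known-clean base, add one set of changes at a time, and stop at the first set that brings the BSOD back. A minimal Python sketch of that loop follows. The function name `find_first_failing_step`, the `pretend_is_stable` stand-in, and the last entry in the step list are all made up for illustration; in practice "is it stable?" means crunching for a while and watching for a crash, not calling a function.

```python
def find_first_failing_step(steps, is_stable):
    """Apply steps one at a time; return the first one that makes the system unstable.

    steps     -- ordered list of changes to install
    is_stable -- callable taking the list of changes applied so far and
                 returning True if the system still crunches without crashing
    """
    applied = []
    for step in steps:
        applied.append(step)
        if not is_stable(applied):
            return step
    return None  # every step applied and still stable


steps = [
    "CCC 10.10 without APP",
    "M$ updates dated before the first of the year",
    "CCC 11.0",
    "remaining M$ updates",  # hypothetical final step, not from the post
]


def pretend_is_stable(applied):
    # Stand-in for a manual crunch test: pretend the box breaks once CCC 11.0 is on it.
    return "CCC 11.0" not in applied


culprit = find_first_failing_step(steps, pretend_is_stable)
print(culprit)  # CCC 11.0
```

The point of going one set at a time is that the last thing installed before the crash is the target, which is exactly what installing everything at once cannot tell you.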

Maxwell
02-23-11, 08:07 PM
I don't have a spare drive. Frankly, since I've screwed with this Windows install enough, I'm just formatting the disk and starting over. I have backed up what I need, so I'll go from there...

Maxwell
02-23-11, 11:09 PM
So after reinstalling Winders, I ran some Collatz WUs (which would only run after downclocking my systems to GPU = 575, Mem = 500, at least as they are displayed in CCC). They completed, but then my computer went into fits and the driver crashed again. Something is clearly wrong here...

Fire$torm
02-24-11, 02:53 AM
Questions...

What were the CPU and GPUs temps just before the fits started?

When was the last time you cleared the dust buildup in the GPUs?

You have that Corsair H50 CPU cooler. Have you checked the radiator for overzealous dust-bunny Tribble wannabes?

Maxwell
02-24-11, 09:29 AM
Temps are fine. Low 60s on the GPU (at worst), and low 40s on the CPU.

Blew the dust out when this started happening, so two days ago.

I'm running this machine on air - the H50 is in another machine. Blew the dust out of this cooler a couple days ago, too...

zombie67
02-24-11, 08:34 PM
Load means two things. Heat and power consumption. Maybe it's not heat. Maybe it's the PSU.

Fire$torm
02-25-11, 01:47 AM
Yeah, I was pretty sure you had but it crossed my mind so I threw it out there. No insult intended.

Maxwell
02-25-11, 06:47 PM
So far, I have punched 7 babies, and uppercutted 3 toddlers. In my head, at least...

Harley was able to duplicate the problem on his machine, then solve it, then pass along the procedure for how to fix it to me. This, of course, makes him a badass and earns him my eternal gratitude.

Long story short: DNETC forced the clocks too high. When you move to another project, the card is then clocked too high, and is a pain to reset and get working again. One key symptom: BOINC has the card rated about 35% higher than it should be based on stock clocks. I followed Harley's procedure, fixed the overclocking issue, and got my machine to the same status Harley's was at when it started working again. But as soon as I tried to run a Collatz WU, I BSOD'd.
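
As a rough sanity check on that "about 35%" figure, the clocks quoted earlier in the thread can be compared directly. The Python sketch below assumes the "1000/1500" from the MW task details is core/memory in MHz, uses the stock 725/1000 quoted from CCC, and assumes BOINC's rating scales roughly linearly with core clock. The helper name `percent_over` is made up for this example.

```python
def percent_over(reported_mhz, stock_mhz):
    """How far above stock a reported clock is, in percent."""
    return (reported_mhz / stock_mhz - 1.0) * 100.0


# Values quoted in the thread (assumed to be MHz, core then memory).
core_over = percent_over(1000, 725)   # reported core clock vs. stock core clock
mem_over = percent_over(1500, 1000)   # reported memory clock vs. stock memory clock

print(f"core: {core_over:.1f}% over stock")
print(f"memory: {mem_over:.1f}% over stock")
```

On those numbers the core clock comes out a little under 38% over stock, which is in the same ballpark as the inflated rating, and the memory clock is 50% over.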

Next, I pulled out an old 9500GT card I had sitting here. Pulled the 5970s out of the machine. Installed the 9500GT. Apparently, my monitor is too good for the card, as it kept crashing the driver. I plugged an old CRT monitor into the card. It seems to work (after, of course, reinstalling the driver). However, it does not seem to be working perfectly.

I have this weird blinking on my display. It looks like the driver is starting to crap out on me. The card is actually completing DistRTgen WUs successfully, so it seems the card is working and the driver is at least functional.

Now - I'm pretty sure the cards are ok. I'm certain it is not a Winders issue, since I have reinstalled Winders. My current working hypothesis (for whatever good that is right now) is that two things went wrong at the same time. DNETC screwed up the clocks hardcore (which Harley has helped me fix). This, in turn, caused some hardware issue. I'm guessing either PSU or MoBo.

Is there any way to test these without spending money (I was blindsided by about $1000 in car maintenance yesterday), keeping in mind I don't really have any alternate systems to play with?

Maxwell
02-26-11, 08:53 PM
I am not a computer guru in any manner but I did have a case of the BSOD's awhile back and it ended up being my memory. I just laxed my timings a little and it solved my problems. Good luck.
So this is the one thing I haven't tried...

I have done just about everything else suggested in this thread, save for swapping out hardware/trying things in a new system (both of which aren't possible for me).

After running through y'all's suggestions and a crapton of googling the symptoms, I've seen quite a few mentions of it potentially being a RAM issue. I have run Win7's Memtest, and it showed that everything was fine. However, I did find several folks online who noted it really was a RAM problem, even though the RAM passed the Memtest. They solved the issue by doing what joker did - lax the timings.

So how do I do that?

Harley
02-26-11, 09:02 PM
It's done through the MB BIOS setup.

Maxwell
02-26-11, 09:06 PM
Sorry for asking stupid questions, but this is not something I've ever played with. What do I look for, and what should I change it to?

joker
02-26-11, 09:30 PM
First question is: what kind of memory are you running? A link would be good if you have one.

And I am curious as to your MB model number too.

Maxwell
02-26-11, 09:44 PM
I have this RAM (http://www.newegg.com/Product/Product.aspx?Item=N82E16820231275) x2. And this is my MoBo (http://www.newegg.com/Product/Product.aspx?Item=N82E16813128416).

I'd love to hear what you did to fix your issue...

Harley
02-26-11, 10:01 PM
The stock timings are, ummm... well... I don't know. The overview has it listed as 8-8-8-21. The details page has it listed as 8-8-8-24. Use CPU-Z and check the Memory tab; it will tell you the current timings. Then check the SPD tab and it will tell you the timings which are supported.

joker
02-26-11, 10:07 PM
Ok, I too have a Gigabyte MB so I am hoping that we have pretty much the same BIOS. Boot up the computer and get into the BIOS. There should be something labeled MB Intelligent Tweaker. Enter that. Then there should be something labeled DRAM configuration. Enter that. There should be something labeled Timings and it is probably set to Auto. Switch it to Manual. Now below that should be a field labeled CAS timings. If it shows a 7, change it to an 8. If it shows 8, change it to a 9........ F10 and reboot. I hope this made sense to you. If not, ask more questions and we will continue to work on it. Good luck.

zombie67
02-27-11, 12:55 AM
Memtest: Not sure what win7 uses to do that. A good memory test can be run from a CD, and will usually take a day or two depending on the amount you have.

http://www.memtest.org/

Fire$torm
02-27-11, 01:02 AM
It's the same test.

zombie67
02-27-11, 01:10 AM
Well then there is no way it was completed that fast.

Maxwell
02-27-11, 02:40 AM
Ok - with 8 Gigs of DDR3, the Memtest embedded with windows took 4+ hours.

Something is telling me that joker is either right, or on the right track...

I went and changed the memory settings in BIOS, and I'm at the most stable system I've had in the last week. I have both 5970s in the machine in Crossfire mode, running at stock settings.

With memory clocked back to 1066, at 7-7-7-19, I've been running MW for the last couple hours without incident. I tried putting it on Collatz at these settings, and it crapped out - but MW is running well so far.

Any thoughts on the next settings I should try or anything like that?
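
For anyone comparing settings: timings are counted in memory clock cycles, so the same CAS number means a different delay at a different memory speed. The Python sketch below converts CAS latency to nanoseconds. The 1066 / CL7 pair is from this post; DDR3-1600 at CL8 is only an assumed rated speed for comparison (the thread gives the rated timings as 8-8-8-21 or 8-8-8-24 but never states the rated speed), and the function name `cas_latency_ns` is made up for this example.

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """CAS latency in nanoseconds for DDR memory.

    data_rate_mts -- effective data rate in MT/s (e.g. 1066 for DDR3-1066)
    cas_cycles    -- CAS latency in clock cycles (the first timing number)
    """
    clock_mhz = data_rate_mts / 2.0   # DDR transfers data twice per clock
    cycle_ns = 1000.0 / clock_mhz     # length of one clock cycle
    return cas_cycles * cycle_ns


current = cas_latency_ns(1066, 7)     # settings from this post
assumed_rated = cas_latency_ns(1600, 8)  # ASSUMPTION: rated speed not given in thread

print(f"1066 at CL7: {current:.2f} ns")
print(f"1600 at CL8 (assumed): {assumed_rated:.2f} ns")
```

If the rated speed really is higher than 1066, then 7-7-7-19 at 1066 is looser in absolute time than the rated 8 would be at full speed, which would be consistent with "relaxing" the memory even though the numbers went down.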

joker
02-27-11, 03:37 AM
My only last thought is to clock it back to 8 since that is what it is rated at and see what happens.

Maxwell
02-27-11, 07:39 AM
Holy crap - I woke up after going to bed, and found my computer... running! And crunching!

joker
02-27-11, 04:18 PM
Did you mess with it again or did it just "fix" itself?

Maxwell
02-27-11, 05:14 PM
Haven't touched it again. After the computer has run without erroring for about 18 hours now, I'm so giddy I'm afraid to touch anything. I might get back to this tonight or tomorrow, depending on how things go.

But this clearly seems to have been some RAM timing issue. No clue how or why it happened.

joker
02-27-11, 05:48 PM
If it is working then, as my doctor keeps telling me............ DON'T PLAY WITH IT!

Maxwell
02-28-11, 06:18 PM
The Silver Hammer passed all tests except one, and the resulting error is the name of the thread. It ran on MW for a day and a half without erroring out - that made me happy. I had backed the RAM speed down to 1066, with timings of 7, 7, 7, 19. When I put it on Collatz, crash. Put RAM back to stock settings. Crash.

So now I need help (again) - what should I try next? I'm a little scared of changing things willy-nilly in BIOS, and need some suggestions of where to go...

Again, I'm rocking this RAM (http://www.newegg.com/Product/Product.aspx?Item=N82E16820231275). Current timings and SPD page from CPU-Z below.
http://i1194.photobucket.com/albums/aa369/mborneman82/SH-Memory.png
http://i1194.photobucket.com/albums/aa369/mborneman82/SH-SPD.jpg

joker
02-28-11, 07:37 PM
This is the last thing I can think of as far as memory goes. Manually set your Command Rate down to 2T.

Maxwell
02-28-11, 11:41 PM
This is the last thing I can think of as far as memory goes. Manually set your Command Rate down to 2T.
Done. Didn't fix it. Bah.

So is there a way to adjust the GPU RAM settings, aside from the RAM clock? I ask because I can get my machine stable on MW, but not on Collatz. MW loads heavier on the GPU than Collatz, so I don't think load is the issue. But since Collatz uses the GPU RAM so differently than the other ATI projects, is there some setting I should adjust there?

Note again, that I've got everything running at stock settings...

Fire$torm
03-01-11, 12:35 AM
Errrr..... Hmmmmmmm, I have an idea.

You have 2x sticks of RAM. Pull one out and run Collatz. Your system will probably run a little slower. After the test, pass or fail, replace the installed stick with the one you pulled. Test again.

Reasoning: It is rare to have all the RAM sticks go bad. And since it is fairly easy and painless to install/remove RAM, this will be a fast way of physically testing for bad RAM. So one of the sticks should be able to run Collatz. That would be the "Good Stick".

The "Bad Stick" is therefore possessed. Call for a Priest and have the silicon exorcised. Then burn it at the stake just to be sure.......

joker
03-01-11, 12:51 AM
And punch a few more children.......

Harley
03-01-11, 01:08 AM
This is the last thing I can think of as far as memory goes. Manually set your Command Rate down to 2T.



You are also running a little over your XMP values. Your RAM is running at 1339.2. Try adjusting your RAM ratio down in the BIOS. I don't have a Gigabyte MB. Maybe Joker can walk you through it.


Oh yeah then kick the dog.

Maxwell
03-01-11, 02:08 AM
Errrr..... Hmmmmmmm, I have an idea.

You have 2x sticks of RAM. Pull one out and run Collatz. Your system will probably run a little slower. After the test, pass or fail, replace the installed stick with the one you pulled. Test again.

Reasoning: It is rare to have all the RAM sticks go bad. And since it is fairly easy and painless to install/remove RAM, this will be a fast way of physically testing for bad RAM. So one of the sticks should be able to run Collatz. That would be the "Good Stick".

The "Bad Stick" is therefore possessed. Call for a Priest and have the silicon exorcised. Then burn it at the stake just to be sure.......
You know, I had a not dissimilar thought, but didn't know particularly how to test it...

I actually have 4 sticks - 4x2GB sticks. The Mobo has two sets of two "matching slots" (technical term ;) ). Can I just pull one stick at a time and run with it? And since all four sticks are identical, is the empty RAM slot an arbitrary decision, or is there one (or two) slots I need to make sure are filled?

@Harley - My BIOS is telling me it's running at 1333, but CPU-Z is picking it up at 1339.2. What do I change to fix that? What is a RAM ratio?

And yeah, kicking something is not far off. Maybe a child petting a dog... :rolleyes:;)
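
On the 1333 vs 1339.2 question: as far as I understand it, the memory speed is the board's base clock multiplied by a memory ratio (multiplier), so BIOS can show the nominal speed while CPU-Z shows what is actually measured. A quick sketch of the arithmetic in Python follows. The ratio of 10 is purely an assumption for illustration (I don't know what this board actually uses), and the function names are made up.

```python
NOMINAL_MTS = 4000 / 3  # DDR3-1333 nominal data rate, about 1333.33 MT/s


def deviation_pct(measured_mts: float, nominal_mts: float = NOMINAL_MTS) -> float:
    """How far the measured memory speed is from nominal, in percent."""
    return (measured_mts - nominal_mts) / nominal_mts * 100


def implied_base_clock(measured_mts: float, ratio: float) -> float:
    """Base clock implied by a measured speed, given a memory ratio."""
    return measured_mts / ratio


measured = 1339.2   # value reported by CPU-Z in this thread
ASSUMED_RATIO = 10  # hypothetical memory ratio, NOT taken from this system

print(f"Deviation from nominal: {deviation_pct(measured):.2f}%")
print(f"Implied base clock at ratio {ASSUMED_RATIO}: "
      f"{implied_base_clock(measured, ASSUMED_RATIO):.2f} MHz")
```

If the numbers come out like this, the RAM is running well under 1% above nominal, which would come from the base clock sitting slightly high rather than from the ratio itself.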

Fire$torm
03-01-11, 09:23 AM
You know, I had a not dissimilar thought, but didn't know particularly how to test it...

I actually have 4 sticks - 4x2GB sticks. The Mobo has two sets of two "matching slots" (technical term ;) ). Can I just pull one stick at a time and run with it? And since all four sticks are identical, is the empty RAM slot an arbitrary decision, or is there one (or two) slots I need to make sure are filled?

@Harley - My BIOS is telling me it's running at 1333, but CPU-Z is picking it up at 1339.2. What do I change to fix that? What is a RAM ratio?

And yeah, kicking something is not far off. Maybe a child petting a dog... :rolleyes:;)
OK, here is the deal.....

I decided I needed to read the manual for a change :)

My idea was to have only one stick of RAM on the MB, which your MB does support. The caveat is that the memory will run a LOT slower because it's dual-channel memory.

Since you have 4 matching sticks you can use a modified version of the test.
It would be best to put a small piece of masking tape on each stick to label them as 1 through 4 or A through D.

Now, using only memory slots #1 and #2, test the sticks in pairs. 1+2, 3+4, 1+3, 2+4, 1+4 and 2+3.
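
If you want a checklist to tick off, those six pairings are just every way of choosing 2 sticks out of 4. A small Python sketch that prints the list (the stick labels are whatever you wrote on the masking tape):

```python
from itertools import combinations

# Labels from the masking tape; use "A".."D" instead if you prefer.
sticks = [1, 2, 3, 4]

# Every unordered pair of sticks: 4 choose 2 = 6 pairings.
pairs = list(combinations(sticks, 2))

for a, b in pairs:
    print(f"Sticks {a}+{b} in slots #1 and #2 -> pass / fail: ____")
```

It generates the same six pairings as listed above, just in a different order.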

Shadow
03-01-11, 10:53 AM
Seems weird that whatever did this messed with so much. RAM timings, GPU clocks, etc. But since everything else is so out of whack, you might want to check your voltage settings for the RAM too.

Mumps
03-02-11, 08:41 PM
Since you have 4 matching sticks you can use a modified version of the test.
It would be best to put a small piece of masking tape on each stick to label them as 1 through 4 or A through D.

Now, using only memory slots #1 and #2, test the sticks in pairs. 1+2, 3+4, 1+3, 2+4, 1+4 and 2+3.

Actually, do a binary test. Theoretically one pair should work OK. So test 1+2, then 3+4. Record if either pair works OK. If either pair fails, you now swap in one from the other pair to determine which one is at fault. So three tests total rather than doing all 6 combinations. :)
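
The logic of that can be sketched in Python as below. It assumes at most one stick is bad. `pair_passes` is a hypothetical stand-in for "install these two sticks, run Collatz, and see whether it survives", not a real tool.

```python
def find_bad_stick(pair_passes, sticks=(1, 2, 3, 4)):
    """Locate a single bad stick using at most three pair tests.

    pair_passes(a, b) is a hypothetical callback that returns True when the
    machine runs cleanly with sticks a and b installed.
    Returns the bad stick's label, or None if both pairs pass.
    """
    a, b, c, d = sticks
    first_ok = pair_passes(a, b)    # test 1
    second_ok = pair_passes(c, d)   # test 2

    if first_ok and second_ok:
        return None  # nothing failed
    if not first_ok and not second_ok:
        # Breaks the "only one bad stick" assumption.
        raise ValueError("both pairs failed; the fault is probably not one stick")

    # Take the failing pair and borrow a known-good stick from the other pair.
    if first_ok:
        suspects, known_good = (c, d), a
    else:
        suspects, known_good = (a, b), c

    x, y = suspects
    # Test 3: if x fails next to a known-good stick, x is bad; otherwise y is.
    return x if not pair_passes(x, known_good) else y


# Simulated example: pretend stick 3 is the bad one.
print(find_bad_stick(lambda a, b: 3 not in (a, b)))
```

If both pairs fail, the single-bad-stick assumption does not hold, and it is worth looking somewhere other than the RAM.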

Maxwell
03-02-11, 10:36 PM
Actually, do a binary test. Theoretically one pair should work OK. So test 1+2, then 3+4. Record if either pair works OK. If either pair fails, you now swap in one from the other pair to determine which one is at fault. So three tests total rather than doing all 6 combinations. :)
Well, that makes sense. But having run every combo, futzed with clocks/timings, and still seeing the same issues, I'm off RAM now. I've run several RAM stressing programs, and can't get an error. I'm working on VRAM right now...

joker
03-02-11, 11:27 PM
Time for a 1000W PSU!

Maxwell
03-02-11, 11:34 PM
Time for a 1000W PSU!
He says to someone with a 1000W Gold Rated PSU (http://www.newegg.com/Product/Product.aspx?Item=N82E16817341028) already in there...

And the PSU is running MW right now, so I don't think it's a power issue. ;) Good suggestion, though...

joker
03-02-11, 11:57 PM
You never know when your PCI bus has gone bad.

zombie67
03-03-11, 12:32 AM
I have a similar machine. Similar, in that it is a PITA. It's an i7 with dual 5870s, and 6x1GB RAM. It's about a year old now. For the first 6 months, it did what yours is doing: constantly crashing with some projects, not others. I tried changing GPUs around, tried swapping DIMMs around, replacing DIMMs, fewer, more, different combos. Nothing made any difference. Then one day, it was stable, so long as I didn't put it on a heavy RAM project. 6 months later, I have been able to slowly add heavy RAM projects, and no problems so far. The lesson learned? There is no god damn rhyme or reason to this stuff sometimes. God hates you, and you have to put up with it for as long as he has his eye on you. Either that or life is completely random, which is even more unnerving. ~X(

joker
03-03-11, 01:57 AM
Yes, Sauron is after your a$$!

zombie67
03-03-11, 02:03 AM
Yes, Sauron is after your a$$!

Heh. Sauron (http://en.wikipedia.org/wiki/Sauron_(comics)) or Sauron (http://en.wikipedia.org/wiki/Sauron)? Either way it's a problem!

Fire$torm
03-03-11, 08:55 AM
Heh. Sauron (http://en.wikipedia.org/wiki/Sauron_(comics)) or Sauron (http://en.wikipedia.org/wiki/Sauron)? Either way it's a problem!

Yep, but you would be much more "screwed" with Tolkien's over Marvel's.

Maxwell
03-09-11, 10:53 PM
And, I'm back to this. Woohoo.

I think I have narrowed it down to GPU RAM (yes, I've said this before...). The reason I say this is that my system is now rock solid stable, except when attempting to run Collatz and/or upping my RAM clocks. I'm currently running at 825/600 on MW (stock being 725/1000). Stable as hell. Temps great, no hiccups or lags or anything. The second I crunch a Collatz WU, or just a few seconds after upping the GPU RAM clocks, I get this:
http://i1194.photobucket.com/albums/aa369/mborneman82/2011-03-08223851.jpg
http://i1194.photobucket.com/albums/aa369/mborneman82/2011-03-08223732.jpg

When I run the "Test Custom Clocks" function from CCC (even with stock settings this happens), I get this:
http://i1194.photobucket.com/albums/aa369/mborneman82/2011-03-08224428.jpg

I have used ATI OverVolt to adjust RAM voltages, but that has a minimal effect, and resets itself after a computer restart anyway.

Any ideas, or should I continue punching myself in the nuts?

joker
03-10-11, 12:53 AM
I hate to be a naysayer but it could be a PCI-E bus problem. Your MB might be at fault here.

Fire$torm
03-10-11, 09:38 AM
Joker has a good point there. The bottom line is you need another system for testing or at least an equivalent GPU and PSU (Known to be good) to swap into your system for comparison. What about taking the funky system or one of the GPUs to work with you and use one of the lab boxes for a test bed?

Maxwell
03-13-11, 07:29 PM
I hate to be a naysayer but it could be a PCI-E bus problem. Your MB might be at fault here.

Joker has a good point there. The bottom line is you need another system for testing or at least an equivalent GPU and PSU (Known to be good) to swap into your system for comparison. What about taking the funky system or one of the GPUs to work with you and use one of the lab boxes for a test bed?
Good news: It's not the motherboard/PCIe slots.
Bad news: It's the video cards.

I took the cards in to work, and tried them out there. Same error, same artifacts, same issues. Happened in combination and when I only put one card in at a time.

Harley has been a champ, and has been PMing me a ton of ideas on how to fix the cards, but I've had no luck so far. Am I at RMA stage, or are there other things I can try? It's weird to me, because these cards work great on MW, but freak out on Collatz.

joker
03-13-11, 07:42 PM
This is now out of my league. Good luck. %%-

Shadow
03-13-11, 09:58 PM
I'd take the RMA route if the option is there. I hope they're not Sapphire's, I had an awful time with them recently on an RMA. I think you would have the same problem with MW if their work units ran longer.
Oh yeah, I forgot. You've punched puppies, kicked children petting puppies, and punched yourself in the nuts, but have you kicked a kitten yet?

joker
03-13-11, 10:11 PM
You might have to go to your local petting zoo and kick some goats and sheep in the nuts. X_X

Harley
03-13-11, 10:24 PM
I think it's time to grab the Bull by the horns, the Tiger by the tail and kick them both in the nuts.

Maxwell
03-13-11, 10:49 PM
New plan:

1. Grab bull by the horns, and make him immobile.
2. Grab tiger by the tail and put it on top of the bull.
3. Grab child, throw it on top of the tiger.
4. Grab puppy, put it on child.
5. Grab kitten, toss it on the puppy.
6. Chug about 15 "5 hour energy" drinks.
7. Epic kick - get the bull, tiger, child, puppy, and kitten all in the nuts in one kick.
8. RMA the cards once I hit my 1M in MW.
9. Punt a Chihuahua, since no one would think that's a bad idea. And it would be funny.

Stupid cards.

Thanks for all your help, everyone. If anyone has any bright ideas in the next week, I'd love to hear them, since I have a bit before I'm going to send the cards in...

Fire$torm
03-14-11, 06:16 PM
Thanks for all your help, everyone. If anyone has any bright ideas in the next week, I'd love to hear them, since I have a bit before I'm going to send the cards in...

Find the largest living Man-Eating-Shark in the world and kick him in the balls?

joker
03-14-11, 06:27 PM
http://www.youtube.com/watch?v=V9gRIzyQICc&feature=related