BOINC tasks failures



zombie67
03-20-15, 10:01 AM
Over the past several days, two of my machines have started spitting out errors, "196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED (http://boincfaq.mundayweb.com/index.php)". These are my two i7-3930K machines. Both have plenty of disk space available, hundreds of gigs free.

Here are a few examples from different projects:

http://www.vdwnumbers.org/vdwnumbers/result.php?resultid=904881
http://setiathome.berkeley.edu/result.php?resultid=4043259649
http://escatter11.fullerton.edu/nfs/result.php?resultid=42179400

Any idea what's going on here? It can't be just a coincidence that both of my i7-3930K machines started doing this at the same time, while none of my other Windows machines or VMs are affected.

Maxwell
03-20-15, 10:51 AM
Two things to check, as I've run into this error a few times:

1. Check the BOINC settings. Make sure it's not one of those "forehead slap" moments where your machine has hundreds of Gigs of HDD space, but you're only allowing BOINC 10GB or something like that.
2. Check the disk usage of each project. I've had a runaway project (one of the QCN Continuals, IIRC) that errored out and was eating several hundred gigs of space. Detaching and reattaching fixed that one.
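
One way to do the per-project check from item 2 without clicking through the manager is to total the size of each folder under the BOINC data directory. What follows is only a rough sketch: the data directory path is an assumption (a common default on Windows; yours may differ), and the function names are made up for illustration. It sums file sizes under each subfolder of `projects` and `slots` and lists the largest first.

```python
import os
from pathlib import Path


def dir_size_bytes(path):
    """Sum the sizes of all files under path (directory symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # File vanished or is unreadable; skip it rather than abort.
                pass
    return total


def usage_by_subdir(parent):
    """Return [(subfolder name, bytes)] for parent, largest first."""
    parent = Path(parent)
    if not parent.is_dir():
        return []
    sizes = [(child.name, dir_size_bytes(child))
             for child in parent.iterdir() if child.is_dir()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # ASSUMPTION: default Windows data directory; change this for your install.
    data_dir = Path(r"C:\ProgramData\BOINC")
    for sub in ("projects", "slots"):
        print(sub)
        for name, size in usage_by_subdir(data_dir / sub):
            print(f"  {name}: {size / 1e9:.2f} GB")
```

A runaway project of the kind described above should show up as one folder far larger than the rest.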

zombie67
03-20-15, 11:53 AM
Maxwell wrote:
> Two things to check, as I've run into this error a few times:
>
> 1. Check the BOINC settings. Make sure it's not one of those "forehead slap" moments where your machine has hundreds of Gigs of HDD space, but you're only allowing BOINC 10GB or something like that.

Yeah, I checked the settings. Nothing abnormal.

Leave at least GB free = .5
Use no more than % of total = 99%

And if that was the problem, it would be hitting all my other machines too. I cleared local settings just in case, with no change.


Maxwell wrote:
> 2. Check the disk usage of each project. I've had a runaway project (one of the QCN Continuals, IIRC) that errored out and was eating several hundred gigs of space. Detaching and reattaching fixed that one.

I still have 800 GB free on the HDs. If it was a runaway project, the HD would be full. So I don't think that is the problem.

FWIW, Windows says the drives have no problems.

Any more ideas or suggestions?

Bryan
03-20-15, 12:15 PM
My suggestion would have been the same as Maxwell's. I had 6 3930s on NFS and didn't have that error happen so I'm inclined to think it is a setting.

Maxwell
03-20-15, 01:19 PM
What is the setting for "Use at most XXX Gigabytes of disk space"? BOINC seems to have some redundant settings that conflict with each other...

zombie67
03-20-15, 01:29 PM
Maxwell wrote:
> What is the setting for "Use at most XXX Gigabytes of disk space"? BOINC seems to have some redundant settings that conflict with each other...

Use no more than --- GB

It is blank, which means unlimited. But again, if there was a problem with the settings, it would be impacting all my machines.
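
To see why these three settings should not be the bottleneck here, the arithmetic can be sketched as below. This is a simplified model that assumes the most restrictive of the three limits wins; it is not taken from the BOINC source. The 0.5 GB, 99%, blank limit and roughly 800 GB free come from this thread, while the 1000 GB total disk size and 20 GB of current BOINC usage are hypothetical numbers chosen only to make the example concrete.

```python
def boinc_disk_allowance_gb(total_gb, free_gb, boinc_used_gb,
                            max_used_gb=None, min_free_gb=0.0,
                            max_used_pct=100.0):
    """Estimate how many more GB BOINC may use (never negative).

    Simplified model (an assumption): the tightest limit applies.
    """
    limits = [
        # "Use no more than X% of total"
        total_gb * max_used_pct / 100.0 - boinc_used_gb,
        # "Leave at least X GB free"
        free_gb - min_free_gb,
    ]
    # "Use no more than X GB"; blank (None) means no limit from this setting.
    if max_used_gb is not None:
        limits.append(max_used_gb - boinc_used_gb)
    return max(0.0, min(limits))


if __name__ == "__main__":
    # Settings from the thread; disk size and BOINC usage are hypothetical.
    room = boinc_disk_allowance_gb(total_gb=1000, free_gb=800, boinc_used_gb=20,
                                   max_used_gb=None, min_free_gb=0.5,
                                   max_used_pct=99)
    print(f"Room left for BOINC: {room} GB")

    # The "forehead slap" case: a 10 GB cap that BOINC has already passed.
    capped = boinc_disk_allowance_gb(total_gb=1000, free_gb=800, boinc_used_gb=20,
                                     max_used_gb=10, min_free_gb=0.5,
                                     max_used_pct=99)
    print(f"With a 10 GB cap: {capped} GB")
```

Under these assumptions the settings leave hundreds of GB of headroom, which fits the point that the preferences alone do not explain the errors.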

Maxwell
03-20-15, 01:33 PM
You're not making this easy, zombie. I was hoping for a quick solution. Oy.

Try clicking over on the "Disk" tab of the BOINC manager. On that left pie chart, what does it say for "free, available to BOINC:" and "used by BOINC:"?

zombie67
03-20-15, 01:41 PM
https://dl.dropboxusercontent.com/u/55884901/duskusage.png

Maxwell
03-20-15, 01:55 PM
OK. I googled the error, and every indication is that it has to be some setting somewhere. You've clearly checked those, so I'm going to start throwing out Hail Marys...

1. Have you restarted the computers? (and are these dedicated crunchers?)
2. Are there VMs on this machine that could be reserving large chunks of the HDD without using it?
3. Do you have enough RAM in here that it's reserving "virtual RAM" that's claiming a huge chunk of the HDD?
4. Do you have high-GPU-RAM cards in there that are reserving huge chunks of virtual RAM?
5. Some combination of 2, 3, and 4?

I seem to recall that Bryan "lost" some ginormous chunk of his HDD, and it took him a while to track down the culprit (but I can't recall what the culprit was).

zombie67
03-20-15, 02:03 PM
I have rebooted the machines several times now, and they each have 32 GB of RAM.

I am beginning to suspect it is one of the projects I am running. And frankly, I suspect SETI. I am running it on ATI cards with the optimized SETI@home app, only on those two machines. My other Windows machine running SETI@home on the GPU uses the NVIDIA-optimized app. All the other machines running SETI are Macs, using the stock app, and only Astropulse.

Maxwell
03-20-15, 02:10 PM
zombie67 wrote:
> I am beginning to suspect it is one of the projects I am running. And frankly, I suspect SETI. I am running it on ATI cards with the optimized SETI@home app, only on those two machines. My other Windows machine running SETI@home on the GPU uses the NVIDIA-optimized app. All the other machines running SETI are Macs, using the stock app, and only Astropulse.
Oh fun. Might be worth posting the app_info file to see if we can spot the setting in there...

zombie67
03-20-15, 02:26 PM
Update: It has started happening on my i7-3770K Windows machine with the NVIDIA card. I have stopped SETI work on all three now, and things seem to be working properly. Stay tuned.

It is not any of the config files, as this just started happening, and I haven't changed anything since August. I am thinking maybe one of the recent Windows updates clashes with this app, or the drivers, or something. Nothing else has changed on the machines.

zombie67
03-20-15, 08:16 PM
It came back after a while. Another couple of things the three machines have in common:

- only these three are running BU, 3x rbox 32 each
- only these three are running the new LHC VM. Maybe there is something different about the Windows version? I know there are bugs specific to the Mac and Linux versions of the app.

zombie67
03-21-15, 11:32 AM
Well, I am completely perplexed and frustrated. But at least I have it solved for now. I had to go back to an old version of BOINC, 7.2.42. No more errors. The only problem is that this version predates the BU modifications. So each of the BU tasks is back to taking a full CPU thread. Ugh.

myshortpencil
03-21-15, 05:13 PM
Were you running 7.4.36 before the reversion?

zombie67
03-21-15, 05:25 PM
Yeah. Started at 7.4.36, then the latest, then back to 7.2.42.