View Full Version : Collatz is down



EmSti
01-19-14, 11:38 AM
Looks like things stopped about 20:35 Eastern.

Slicker
01-20-14, 10:33 AM
Naturally, I'm away for the weekend taking advantage of the MLK holiday. Looks like it is server related, as I can VPN in to my home network but there's no response to ping or rsh to the Collatz server. That's the good news. The bad news is that the off-site backups don't appear to have worked for a while, so I hope the backup (a.k.a. previous Collatz server) has a good copy from the nightly job and/or that I can get the current server back up and running w/o losing much data. I'll know more after I get back home later this evening.

Duke of Buckingham
01-20-14, 10:41 AM
I have 2 Collatz tasks that are now outdated. Should I abort them or let them go? :confused:

They were finished on time, but Collatz has been down for so long that they have passed their expiration date. :((

shiva
01-20-14, 11:43 AM
I have 2 Collatz tasks that are now outdated. Should I abort them or let them go? :confused:

They were finished on time, but Collatz has been down for so long that they have passed their expiration date. :((

I would keep them. I think you will get credit for them.

Slicker
01-20-14, 12:18 PM
The server is two hundred miles away and 100% unresponsive. I don't have a magic crystal ball with me to read the future so I have no idea whether it can be repaired or whether I'll need to set up a new BOINC server from scratch. Until I figure that out, it would be like me trying to guess how much Obama will raise my taxes next year. I just really don't know.

Duke of Buckingham
01-20-14, 12:37 PM
The server is two hundred miles away and 100% unresponsive. I don't have a magic crystal ball with me to read the future so I have no idea whether it can be repaired or whether I'll need to set up a new BOINC server from scratch. Until I figure that out, it would be like me trying to guess how much Obama will raise my taxes next year. I just really don't know.

I hope computers are more trustworthy than politicians, or have they been lying to me all these years?

The computers, of course, because about the politicians I have no doubt that they are lying. They never learned how to speak the truth anyway.

EmSti
01-20-14, 12:45 PM
I know the feeling - how long how long how long....

Once asked a boss to step out into the hallway and away from where we were troubleshooting so we could talk. Then closed the door when he went through the doorway and we went back to work, in peace. To his credit, he took the hint and left us be.

Good luck with it, hope it turns out well for us, if not oh well. It has only been a few days.

Slicker
01-20-14, 10:55 PM
The server drives are pretty much toast. Only one is recognized at boot, and that doesn't help much when it is in a RAID 5 array. Having just replaced two of them several months back, I'm thinking it may also be controller related, so the controller and all drives are being replaced. I'll probably add a couple of hot spares this time just to be safe. Unfortunately, my 31 year old furnace is being replaced tomorrow, so I can't run over to MicroCenter, TigerDirect, or CDW to pick up the new hardware until that's done. Also, the DOCSIS 2 to DOCSIS 3 upgrade should take place between now and the time the server is ready to go back online, so the network speed should be better when it does. Regardless, it will probably be a couple of weeks to rebuild the server and get all the BOINC and Collatz software rebuilt/re-installed.
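
For readers wondering why one surviving drive is not enough in RAID 5: each stripe stores one parity block, the XOR of its data blocks, so any single missing block can be rebuilt from the others, but two missing blocks cannot. The sketch below is a simplified illustration only. It assumes a hypothetical 4-drive array and a single stripe, ignores how real controllers rotate parity across drives, and says nothing about the actual Collatz server layout.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-drive array: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# One drive lost (index 1): XOR of the three survivors rebuilds its block.
rebuilt = xor_blocks(stripe[:1] + stripe[2:])

# Two drives lost (indexes 0 and 1): the survivors only give the XOR of the
# two missing blocks, which is not enough to recover either one.
mixed = xor_blocks(stripe[2:])
```

With three of four drives unreadable, as described above, even less information survives, which is why the restore depends on backups rather than on the array itself.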

shiva
01-20-14, 11:19 PM
ooffff well, one can only do what one can. Good luck with the rebuild Slicker!

EmSti
01-20-14, 11:46 PM
I have 130 gpu wus left to crunch, should I suspend them now or crunch them until done?

I have 1850+ gpu wus completed and waiting to upload. I will just let them sit until servers are back online and see what happens.

shiva
01-21-14, 12:24 AM
I'm just going to let mine run, don't know how many I have but Slicker will do everything he can to get everyone their credit.

Slicker
01-21-14, 09:12 AM
My suggestion would be to abort any WUs that aren't already completed. I should know within the next day or two what to do with the completed ones. That all hinges on how recent a restore I can do which isn't looking real good right now. Looks like the disk corruption occurred over several days getting worse and worse so the most recent backups may not be any good. Now it comes down to whether I can read the workunit or results tables from the bad drive and whether the user and host records associated with the workunit and results can also be read. If I were a betting man, I'd bet on something else.

shiva
01-21-14, 01:13 PM
Well, I think I will take the advice of the master :) at this point. I will be wishing you well through your current challenges.

STE\/E
01-21-14, 01:21 PM
Just give us all a Billion Credits to make up for any loss of Credits ... :D

Bryan
01-21-14, 02:12 PM
Just give us all a Billion Credits to make up for any loss of Credits ... :D

+1 I like the way you think and I'm sure I had about that amount in my cache =))

conf
01-23-14, 12:43 AM
I have about 100 WUs completed. They all keep trying to upload, and the deadline is near.
Maybe we can get credits for the WUs sent from Collatz before it went down.
I really want to get them out of my uploads because I don't see anything else in the list and it slows the system.

EmSti
01-23-14, 08:14 AM
Posted on the Collatz site: "The Collatz server is currently down due to a hard drive controller failure. New hardware should arrive in the next day or two. A complete rebuild of the operating system, BOINC, and the Collatz software and project setup will then need to be done prior to the project coming back online. As this will be a complete rebuild, there is no sense hanging on to any completed workunits at this time. So, go ahead and abort any workunits you have whether they are complete or not. I expect the project will remain down for the next 10-15 days while I set it all back up and test it. Sorry for any inconvenience. And yes, I will make up for the lost credits once everything is back up and running."

Fire$torm
01-23-14, 11:38 AM
Hey,

FYI: Some of our newer members may not realize that our team member Slicker is the project admin of Collatz. He always posts here on the forums regarding any issues with Collatz, when RL allows.

Duke of Buckingham
01-23-14, 12:28 PM
Slicker ... Collatz ??? :confused:

Sorry for all the bad words I have posted, I am new in here ...:o)

Joking of course. I really hope you can solve this problem as soon as possible so we can get back to our usual crunching at Collatz, Slicker. :o

Every day I go to the project site to see if it has come back. I had aborted some tasks when the project had problems. Those were my own computer's problems, maybe my hard drive drawing too much from the PSU and bringing the whole system down. I now have a new PSU and hope to get a decent hard drive as soon as I understand what is the best I can get. :p

STE\/E
02-17-14, 02:25 AM
Project is down again, must have been down for 6 Hr's now, I've only got about an Hr left to run then will have to switch back over to Moo on that Box ...

STE\/E
02-17-14, 11:34 AM
Back up again ... :)

Fire$torm
02-17-14, 04:48 PM
I think Slicker did an update to the apps based on that survey he had posted on the Collatz forums. Hence the reason for shutting down the server.

Slicker
02-19-14, 01:55 AM
I think Slicker did an update to the apps based on that survey he had posted on the Collatz forums. Hence the reason for shutting down the server.

It also does nightly backups, but that shouldn't take more than a few minutes. Every few weeks I export the results and re-optimize the database tables, as MySQL pretty much sucks as soon as it grows to a size greater than the physical RAM. (If it has to reside in RAM, then to me it isn't a real database. A real database should be able to be 10-50 times the size of physical RAM and still run OK. Oracle and SQL Server can both do that. MySQL doesn't even come close - at least not if you have more than 3 concurrent users.)
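
As a rough illustration of the "re-optimize the database tables" step, MySQL provides an `OPTIMIZE TABLE` statement that accepts a comma-separated list of tables. The helper below only builds that statement as a string. The function name and the table names (`result`, `workunit`) are assumptions for the example, not Slicker's actual maintenance script.

```python
import re

# Conservative check so only plain identifiers end up in the statement.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def build_optimize_statement(tables):
    """Return an OPTIMIZE TABLE statement for the given table names."""
    if not tables:
        raise ValueError("at least one table name is required")
    for name in tables:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected table name: {name!r}")
    quoted = ", ".join(f"`{name}`" for name in tables)
    return f"OPTIMIZE TABLE {quoted};"

# Hypothetical table names; adjust to the real schema.
statement = build_optimize_statement(["result", "workunit"])
```

The resulting string would then be run through whatever MySQL client the server already uses, typically while the project is in maintenance mode.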

Fire$torm
02-20-14, 09:40 AM
I'm having a problem, can't get any nVidia work on any of my boxes.... :((

Never Mind... PEBKAC

EmSti
02-20-14, 09:54 AM
I am in the same boat. My problem started right after the preferences saving was fixed. After the fix, Collatz sent AMD work instead of Nvidia (there is an AMD card, but it is doing DiRT). Fixed the preferences from all to Nvidia/cuda, aborted the AMD wus and nothing since. I have waited over 24 hours now, thought I may have pissed off the servers with the abort. Attempting reattach now.

EmSti
02-20-14, 09:55 AM
Reattach didn't work.

1027 2/20/2014 9:50:41 AM Fetching configuration file from http://boinc.thesonntags.com/collatz/get_project_config.php
1028 http://boinc.thesonntags.com/collatz/ 2/20/2014 9:50:49 AM Master file download succeeded
1029 http://boinc.thesonntags.com/collatz/ 2/20/2014 9:50:54 AM Sending scheduler request: Project initialization.
1030 http://boinc.thesonntags.com/collatz/ 2/20/2014 9:50:54 AM Requesting new tasks for CPU and NVIDIA and ATI
1031 Collatz Conjecture 2/20/2014 9:50:56 AM Scheduler request completed: got 0 new tasks
1032 Collatz Conjecture 2/20/2014 9:50:56 AM No tasks sent
1033 Collatz Conjecture 2/20/2014 9:50:56 AM Tasks for CPU are available, but your preferences are set to not accept them
1034 Collatz Conjecture 2/20/2014 9:50:56 AM Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
1035 Collatz Conjecture 2/20/2014 9:50:56 AM Tasks for Intel GPU are available, but your preferences are set to not accept them
1036 Collatz Conjecture 2/20/2014 9:50:56 AM New computer location: school

FourOh
02-20-14, 09:57 AM
I'm having a problem, can't get any nVidia work on any of my boxes.... :((

Can't get AMD work either boooooo. Guess I'll work on Milkyway.

Al
02-20-14, 10:02 AM
I'm getting no task available too. Fortunately I have 2 days on board.

EmSti
02-20-14, 10:09 AM
hhhmmmm collatz server status is all green with tasks to send.

Fire$torm
02-20-14, 10:22 AM
My solution was to allow all forms of work and to enable "If no work for selected applications is available, accept work from other applications?" All cards have plenty of work atm.

Edit: Btw, It's all OpenCL work.

EmSti
02-20-14, 10:23 AM
Problem solved: I spotted this comment from Slicker - "I found a bug that when multiple venues existed, if any venue was set to ignore a plan class, then all venues would ignore the plan class. A fix has been released. Let me know if you run into any issues."

That comment came in after my last update on preferences. I deleted all my venues, changed the default to Nvidia/cuda only, changed agents to default, did a project update and I got wus again. I would recommend deleting and resetting your venues.
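
The bug Slicker describes can be pictured with a small model. This is only a sketch of the behaviour in the quoted comment, not the real server code: the preference layout, the venue names, and both function names are hypothetical.

```python
# Hypothetical model: each venue maps to the set of plan classes it ignores.
prefs = {
    "default": set(),
    "school": {"cuda"},
}

def ignores_buggy(prefs, venue, plan_class):
    """Bug as described: an ignore set in ANY venue applies to every venue."""
    return any(plan_class in ignored for ignored in prefs.values())

def ignores_fixed(prefs, venue, plan_class):
    """Intended behaviour: only the host's own venue is consulted."""
    return plan_class in prefs.get(venue, set())

# A host in the "default" venue never asked to skip cuda work.
buggy_result = ignores_buggy(prefs, "default", "cuda")
fixed_result = ignores_fixed(prefs, "default", "cuda")
```

In this model the buggy check refuses cuda work for the "default" host just because "school" ignores it, which would match getting no tasks until the extra venues were deleted.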

Fire$torm
02-20-14, 10:30 AM
Problem solved: I spotted this comment from Slicker - "I found a bug that when multiple venues existed, if any venue was set to ignore a plan class, then all venues would ignore the plan class. A fix has been released. Let me know if you run into any issues."

That comment came in after my last update on preferences. I deleted all my venues, changed the default to Nvidia/cuda only, changed agents to default, did a project update and I got wus again. I would recommend deleting and resetting your venues.

Ah, good to know. Thx for the info.

Fire$torm
03-10-14, 04:47 PM
Damn, server is down. Slicker isn't having much luck with that puppy as of late. Hope it's not serious.

cineon_lut
03-11-14, 01:52 AM
Yeah, bummer. App wise it's one of the (if not THE) best written, but the server ain't giving much love this month.


Vic (mobile)

zombie67
04-09-14, 09:55 AM
Looks like collatz has been down since sometime yesterday.

Slicker
04-09-14, 02:18 PM
Looks like collatz has been down since sometime yesterday.

Something is going on with the overnight backups that causes MySQL to not allow connections after it is done. The backup completes, but I have to stop and restart mysql in order for it to allow multiple connections. Because of that, the BOINC daemons don't start, and since they don't start, the project stays in maintenance mode. This appears to be my year for having every computer I own give me multiple problems. I just got done rebuilding my i7 desktop (hard drive failure) and didn't even have all the software loaded when my i7 laptop screen died. The replacement arrived today and is the wrong fracking one! No backlight, and while the resolution is supposed to be HD (and it might be), the frequency is screwed up such that the screen shows up 6 times, each about 2" tall. Now the Collatz server is adding to my woes. Maybe I should start mowing lawns for a living. Oh wait. All three of my lawn mowers are currently broken. At least that isn't an issue yet due to the non-stop polar vortex we've had for the last 4 months.

Bok
04-09-14, 02:42 PM
How are you doing the backups? And what if anything does the mysqld.log say? Are you using persistent connections?