Collatz Database Corrupt



Slicker
07-12-13, 12:35 PM
While I was working on getting the new Collatz server configured, I realized that the current server was down again. I rebooted it but couldn't connect because the MySQL database is corrupt. How corrupt? MySQL won't start, and if it won't start, you can't repair it. Nice. The only option is to wipe out all the MySQL data and restore from backups. The good news is that the backup ran less than 15 minutes before the server went down, so very little data will be lost. But, given the issues lately, I'm very tempted to just restore it to the new server instead. That will take a little longer, as I have to do a bunch of other configuration as well, but it will be more stable in the end. Assume it will be down all weekend. If I can't get it running by Monday, it may be down for another whole week (family reunion, and we are the lucky ones hosting it this year).

Slicker
07-13-13, 10:58 PM
Doh! I thought the database had been backed up 15 minutes prior to the crash. Not so. The backup happened around 2 a.m. and the system crashed around 6 a.m. That means the database and the file system are 4 hours out of sync. Given 55K WUs completed per day, that means thousands of files out of whack. Once I get it all configured, I'll have to write some custom one-time code to compare the files in the WU folders to the records in the database and somehow create the WU records for the ones which aren't in the database. Then I'll need to do the same for any result files. It could be a very long week.
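
A minimal sketch of the file-versus-database comparison described above, not the actual one-time code. It assumes the WU file names on disk and the WU names recorded in the database can each be loaded into a set, and that a file name matches a record name one-to-one. The directory layout and the function names (`list_wu_files`, `reconcile`) are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def list_wu_files(wu_dir: Path) -> set[str]:
    """Return the names of all regular files under wu_dir, searched recursively."""
    return {p.name for p in wu_dir.rglob("*") if p.is_file()}


def reconcile(files_on_disk: Iterable[str], names_in_db: Iterable[str]) -> dict[str, list[str]]:
    """Compare file names with database record names.

    on_disk_not_in_db: files with no record, i.e. records that would need
                       to be recreated after restoring an older backup.
    in_db_not_on_disk: records whose files are gone, e.g. WUs that were
                       completed and deleted after the backup was taken.
    """
    disk = set(files_on_disk)
    db = set(names_in_db)
    return {
        "on_disk_not_in_db": sorted(disk - db),
        "in_db_not_on_disk": sorted(db - disk),
    }


if __name__ == "__main__":
    # Made-up names, purely for illustration.
    report = reconcile({"wu_1", "wu_2", "wu_3"}, {"wu_2", "wu_4"})
    for category, names in report.items():
        print(category, names)
```

In practice the second argument would come from a query against the restored database, and the same comparison could be repeated for the result files.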

zombie67
07-14-13, 08:50 AM
Ouch! Good luck!

Slicker
07-14-13, 07:45 PM
From the looks of things, there are 4500 WUs that were reported, validated, and deleted that the server (from the backup file) thinks are missing. That probably means there were another 4500 WUs created and sent out that the server doesn't know about and, worse, doesn't know who it sent them to. This is going to be fun to try and figure out...
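
For a sense of scale, here is a rough back-of-the-envelope check using only the figures quoted in this thread: 55K WUs completed per day, a gap from about 2 a.m. to about 6 a.m., and 4500 WUs the restored server thinks are missing. The even-rate assumption is a simplification, and the names below are made up for illustration.

```python
WUS_PER_DAY = 55_000      # completion rate quoted in the thread
GAP_HOURS = 4             # backup around 2 a.m., crash around 6 a.m.
OBSERVED_MISSING = 4_500  # completed WUs the restored server lacks


def expected_in_gap(per_day: int, hours: float) -> float:
    """WUs completed in `hours`, assuming a perfectly even rate over 24 hours."""
    return per_day * hours / 24


expected = expected_in_gap(WUS_PER_DAY, GAP_HOURS)
print(f"expected at an even rate: {expected:.0f}")                # 9167
print(f"observed / expected: {OBSERVED_MISSING / expected:.2f}")  # 0.49
```

The observed count is roughly half of the even-rate estimate. The arithmetic alone does not say why.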

Duke of Buckingham
07-14-13, 07:47 PM
http://manicdepressiveblog.files.wordpress.com/2011/05/confusion_11.jpg