We had a few hours of downtime over the last weekend, and I thought a write-up of what happened might help others resolve similar errors in their own systems.
So, taking the clock back to just after midday Saturday, our warning monitors alert us that the main website has gone down. A quick look at the status terminal shows that we have had a kernel panic on the main website server, and the machine is locked. We put in a request for a reboot with the hosting company, and within minutes the system has been power cycled, and everything is running again.
This is where the work begins!
The big problem, we note, is that our backup slave MySQL database, on a different server in a different building, is not receiving replication updates. It just sits there, and stopping and starting the slave achieves nothing. Checking the logs, we find the following:
090328 14:37:53 [Note] Slave: connected to master 'repl@87.117.204.74:3306',replication resumed in log 'mysql-bin.000057' at position 263780655
090328 14:37:53 [ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236)
Hrm, doesn’t look good.
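For anyone wanting to poke at the same thing, the checks described above amounted to roughly the following (a minimal sketch; the error log path is a guess at a common default and will vary between distros):

# On the slave: restart the replication threads and inspect their state
mysql -e "STOP SLAVE;"
mysql -e "START SLAVE;"
mysql -e "SHOW SLAVE STATUS\G"
# The Last_Error field in that output, and the mysqld error log itself,
# both showed the message quoted above
tail -n 50 /var/log/mysql/mysql.err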
A little bit of information about MySQL replication: the master stores binary logs, which are logs of the commands it has executed, so that a slave can execute the same commands and end up with a database in the same state. The commands are sent over a TCP/IP connection from master to slave; the slave keeps track of where it has got to, and the master sends it the commands sequentially.
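As a very rough sketch of how that is wired up (server IDs, hostnames and credentials here are illustrative, not ours; the file and position are the ones from the log above):

# Master my.cnf (excerpt): give the server an ID and enable the binary log
#   [mysqld]
#   server-id = 1
#   log-bin   = mysql-bin
# Slave: point replication at the master, at a known binary log position
mysql -e "CHANGE MASTER TO
            MASTER_HOST='master.example.com',
            MASTER_USER='repl',
            MASTER_PASSWORD='secret',
            MASTER_LOG_FILE='mysql-bin.000057',
            MASTER_LOG_POS=263780655;
          START SLAVE;"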
Now, when you power cycle a machine (and in this case, with the kernel panicked and not responding to anything typed at it, there isn't much to do other than power cycle it), you do not always end up with everything saved to disc. To improve speed, systems cache data in memory instead of writing it to disc straight away, then write it out in chunks when it is efficient to do so. The problem is that any data still held in memory, waiting to be written to disc, is lost the moment you cut the power. The machine cannot magically write it out; it has no electricity to do so. Data will be lost. There are ways to stop this happening, but that is a topic for another day. The fact is that on any modern Linux distro, this will happen.
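(As a small taster for that other day: MySQL itself has settings that force a flush to disc at every binary log write and transaction commit, at a real cost in speed. Something like the following, assuming InnoDB tables; the drive's own write cache is a separate question again.)

# Make MySQL fsync the binary log after every write, and InnoDB flush its
# log at every commit -- slower, but much more crash-safe
mysql -e "SET GLOBAL sync_binlog = 1;"
mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit = 1;"
# Put the same values in my.cnf so they survive a restart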
And so, back to the MySQL replication: the slave is requesting to start from a position that the master says does not exist. This is simply because the tail of the binary log was never written to the physical disc; before the reboot, the kernel was still holding that data in its cache, so the file on disc now ends short of the position the slave has recorded.
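You can see this for yourself by asking the master where it thinks that binary log now ends; in our case the file on disc simply stops short of the position the slave wants. Roughly (run on the master; the binlog path is the usual default and may differ on your box):

# Where does the master think its current binary log ends?
mysql -e "SHOW MASTER STATUS;"
# What is the last event actually present in the file the slave was reading?
mysqlbinlog /var/lib/mysql/mysql-bin.000057 | tail -n 20
# If the last end_log_pos printed is smaller than the position the slave is
# asking for, the tail of the log was lost when the power went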
So, MySQL replication becomes a victim of efficiency. When this happens, there are two ways to go about fixing it. First, we could just tell the slave to start from the next valid position in the binary logs. This would get us up and running, but it carries a risk of data loss that we deemed unacceptable in a backup database: any transactions that were made but never written to the binary log because of the power outage would simply never reach the slave. The second option, and the one we choose to take, is to create an entire dump of the database and start replication again from a known good position.
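For completeness, the first option would have been a one-liner on the slave along these lines (the next log file name is illustrative; position 4 is simply where every binary log file starts):

# Option one (not taken): skip to the start of the next binary log file,
# accepting that anything lost from the old one is gone for good
mysql -e "STOP SLAVE;
          CHANGE MASTER TO
            MASTER_LOG_FILE='mysql-bin.000058',
            MASTER_LOG_POS=4;
          START SLAVE;"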
So, we do this. Taking a dump of the database requires our web and email to be down for an hour, which bites, but it is really the only option. We dump the database, restart the services, and that part is done. All that is left is to rebuild the slave from the master's data and start replication running again.
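In outline, the rebuild looks something like this (file names and hosts illustrative; --master-data records the matching binary log coordinates inside the dump, so the slave knows exactly where to resume):

# On the master, with the services stopped so nothing is writing:
mysqldump -u root -p --all-databases --master-data > full-dump.sql
# Copy the dump over to the slave machine
scp full-dump.sql slave.example.com:/tmp/
# On the slave: load it, then let replication catch up
mysql -u root -p < /tmp/full-dump.sql
mysql -e "START SLAVE;"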
Or that is how it should have gone.
We transfer the dump across, and run a rebuild of the database. All goes well for a few hours (a rebuild can take a LONG time as it rebuilds indexes on huge tables), until suddenly….
ERROR at line 1963239: Unknown command '\"'.
This was obviously not what we expected. Importing the dump again revealed the same problem, and we realised there was a problem we hadn’t encountered before.
We did all the usual checks: making sure the database versions were the same, making sure the copy of the dump on the slave matched the one on the master using md5sum, and everything seemed fine, except that the dump would not import.
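Those checks were nothing exotic, roughly:

# Same server version at both ends?
mysql --version            # run on both master and slave and compare
# Did the dump survive the transfer to the slave intact?
md5sum full-dump.sql       # run on both ends; the hashes matched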
We were left wondering if something was wrong with mysqldump.
To be sure, we did another dump. The site had been running for a few more hours, so the database would be a little different from the last dump. We stopped the services for another hour and repeated the process. The second time we got a different error message, but a fatal one nonetheless. And the same again a third time.
We were becoming convinced that the database was dumping invalid data, and if that were the case we were in a bit of trouble, because you cannot realistically go through a dump in excess of 20GB by hand looking for errors. The line numbers in the error messages were worse than useless, with some individual lines being many MB in size.
A process of methodical deduction finally solved the problem. We spotted that /var/log/messages was showing a number of ATA errors while we were doing the dump. Not encouraging! So we tried dumping to another hard drive: once onto the location we had been writing to all afternoon, and at the same time onto a location on a different drive. When we diffed the two files, the copy on the drive we had been using all afternoon differed significantly from the copy on the other drive. And on loading the dump from the new drive into the database, suddenly we had a working slave again.
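The decisive experiment was roughly the following (paths illustrative; /mnt/other sits on the second drive). Writing the same byte stream to both drives at once means any difference between the two files can only come from the drives themselves:

# Write one dump to both locations simultaneously
mysqldump -u root -p --all-databases | tee /var/backups/dump-suspect.sql > /mnt/other/dump-clean.sql
# The two files should be identical; ours were not
cmp /var/backups/dump-suspect.sql /mnt/other/dump-clean.sql
# Meanwhile, watch the kernel log for ATA errors
tail -f /var/log/messages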
So, the mystery was solved. The dump was being written onto a hard drive that was silently flipping random bits in the files written to it. Never a good thing. We were left thinking that we would have had a MUCH easier time if MySQL checked the integrity of the files it dumped.
So now this leaves us back up and running, but with some things to look at. We feel that MySQL desperately needs a way to check the integrity of its files on disc, to be sure that a dump, whether from mysqldump or the binary logs, is valid. Without that, a dump you cannot trust is close to pointless.
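Until something like that exists in MySQL itself, a rough workaround is to checksum the dump as it is produced and then verify what actually landed on the disc:

# Checksum the byte stream as mysqldump produces it...
mysqldump -u root -p --all-databases | tee full-dump.sql | md5sum
# ...then checksum what is actually sitting on the disc
md5sum full-dump.sql
# The two hashes should match; if they do not, the disc did not store what
# was written (to be thorough, drop the page cache first so the second
# md5sum really reads the drive: echo 3 > /proc/sys/vm/drop_caches, as root)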
So, for one of our next contributions to the open source community, we will look at either adding this integrity checking ourselves, or setting up a bounty for someone else to do it; we feel it is vital to MySQL. More information to follow once we have worked out some of the details!