TRUNCATE making replicated MySQL table inconsistent

Here at TIM Group we make use of MySQL’s statement-based replication. This means that some functions, like UUID and LOAD_FILE, cannot be used when we write code or do manual maintenance because they break the consistency of the slaves. We now know we have to add the TRUNCATE statement to the list of things not to use. This is how it went…

One of the applications we have is used for background jobs: you send it an HTTP request to have a job started and then you look at its database for the results.

We needed the result of running one of these jobs for some analysis. The instructions looked like:

1) TRUNCATE table job_results
2) send request to /jobs/new
3) extract content of job_results table

We started following the instructions but after a few minutes of staring at the terminal waiting for the request to complete we thought that the request had a problem. We hit Ctrl-C, killing the request, ran TRUNCATE job_results again and restarted the request.

Our second attempt was faster to respond: after a few minutes it returned an error.

And after a few more minutes the two slave databases started reporting an error with statement-based replication. They could not insert a record in the table job_results because the primary key was duplicate.

The error looked very odd since the primary key was not specified in the query.

We decided to exclude this table and restart the replication process and then proceed with an investigation on why the system had failed.

The key tool for the investigation was mysqlbinlog that can be passed a binlog file and will print the statements contained in it.

In theory the binlog on the master contains all statements that were executed so executing the same statements again, starting from the same state, will result in the slave having the same data but this time we were in for a few surprises…

The error reported from the slaves (both, more or less at the same time) looked like this:

Error ‘Duplicate entry ‘1945442’ for key ‘PRIMARY’ on query. Default database: ‘analysis’.
Query: ‘INSERT INTO job_results (`date`,`value`) VALUES
(‘2014-10-02′,-0.007722666880906233367143014953252257)’

How is this even possible? The primary key (an INT NOT NULL AUTO_INCREMENT) does not have a value specified in the query. Is the auto_increment counter wrong in the table?

By looking at the statements in the binlog on the master we found this (this is cleaned up a bit from what mysqlbinlog would actually output):

SET INSERT_ID=1945442;
INSERT INTO job_results (`date`,`value`) VALUES
('2011-04-26',-0.006699993100483214013976571543498656);

and later on also this:

SET INSERT_ID=1945442;
INSERT INTO job_results (`date`,`value`) VALUES
('2014-10-02',-0.007722666880906233367143014953252257);

The statement “SET INSERT_ID” (for which we could not find any documentation on the official site) overrides the auto_increment assignment. We can exclude the counter being wrong on the slave but we are left with an even bigger question mark: why does the binary log contain an operation that would have raised an error if it happened? Isn’t that the log of what actually happened on the master?

We suspected the problem had to do with the TRUNCATE command we had issued so we checked and we found no instances of that between the two statements but we did find one right before the first INSERT.

We had a pretty good hypothesis at that point and we verified it with an experiment:

Preparation:

1) enable binary logging (by setting log_bin in my.cnf)
2) create table test (id int not null auto_increment, value varchar(10) not null, primary key(id));

Open two mysql consoles, on the first one do:

begin;
insert into test (value) values ('A');
insert into test (value) values ('B');
commit;

begin;
insert into test (value) values (‘C’);

In the second one:

truncate test;

And back in the first one:

insert into test (value) values ('D');
commit;

Looking at the table now you would find:

mysql> select * from test;
+----+-------+
| id | value |
+----+-------+
|  1 |     D |
+----+-------+

But what is saved in the binary log?

create table test (id int not null auto_increment, value varchar(10) not null, primary key(id))

BEGIN
SET INSERT_ID=1;
insert into test (value) values (‘A’)
SET INSERT_ID=2;
insert into test (value) values (‘B’)
COMMIT;

BEGIN
truncate test
COMMIT;

BEGIN
SET INSERT_ID=3;
insert into test (value) values (‘C’)
SET INSERT_ID=1;
insert into test (value) values (‘D’)
COMMIT;

Which once executed makes for an additional record:

mysql> select * from test;
+----+-------+
| id | value |
+----+-------+
|  3 |     C |
|  1 |     D |
+----+-------+

The record (3, C) exists only on the slave so it will break replication as soon as a line with the same id is inserted (in addition to breaking the consistency of the replica) and is caused by the TRUNCATE statement being recorded as having happened between the transactions.

So running TRUNCATE while some INSERT are happening will cause this problem but hadn’t we stopped the process with Ctrl+C?

Well, we did stop one process but unfortunately it was just the curl command creating the request, the process on the server was still running when we ran TRUNCATE.

Lessons learned (or at least rehashed):

  • Do not use TRUNCATE on replicated tables
  • Experimenting with mysqlbinlog was easy compared to understanding the relevant documentation.
  • Providing feedback for long running operations is a must!

Leave a Reply

Your email address will not be published. Required fields are marked *