This worklog has been replaced with mariadb.org/jira

This site is here for historical purposes only. Do not add or edit tasks here!

 
 
 


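 Binlog event checksums
Title: Binlog event checksums
Task ID: 180
Queue:
Version: Server-5.3
Status: Complete
Priority:
Copies to: Knielsen, Sergei
Created by: Mdcallag, 13 Mar 2011
Supervisor: Knielsen
Lead Architect: Knielsen
Architecture Review: Knielsen
Implementor:
Code Review: Knielsen
QA: Pstoev
Documentation: Knielsen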
 High-Level Description
We want a checksum attribute per binlog event that can be used to detect a corrupt 
event. At some point the slave should validate the checksum, for example before 
applying the event in the SQL thread. It might also be useful to have the IO thread 
confirm the checksum before writing the event to the relay log. mysqlbinlog should 
also be able to verify checksums.

See:
http://forge.mysql.com/worklog/task.php?id=2540
http://bugs.mysql.com/bug.php?id=25737

MySQL WL#2540 has been pushed to mysql-trunk for inclusion in MySQL 5.6. The
plan for this MWL#180 is to back-port the MySQL 5.6 implementation into
MariaDB.
 High-Level Specification
Here is suggested content for documentation for this worklog.

=======================================================================
MariaDB 5.3 adds support for a checksum in each binlog event. This is
a backport of the same feature in MySQL 5.6.

Checksums are enabled with the option binlog_checksum:

 * Variable Name: binlog_checksum
 * Scope: Global
 * Access Type: Can be changed dynamically
 * Data Type: bool
 * Default Value: OFF

The variable can be changed dynamically without restarting the
server. Setting the variable in any way (even to the existing value) forces a
rotation of the binary log (the intention is to avoid having a single binlog
where some events are checksummed and others are not).
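
The rotation rule can be illustrated with a toy model. This is only a sketch
of the behaviour described above, not server code: the class BinlogModel, its
methods and the file names binlog.NNNNNN are hypothetical names chosen for the
example.

```python
class BinlogModel:
    """Toy model in which every binlog file has exactly one checksum setting."""

    def __init__(self, checksum: bool = False):
        self.checksum = checksum
        self.files = [{"name": "binlog.000001", "checksum": checksum, "events": []}]

    def set_binlog_checksum(self, value: bool) -> None:
        # Any assignment starts a new file, even if the value is unchanged,
        # so no single file can mix checksummed and plain events.
        self.checksum = value
        name = f"binlog.{len(self.files) + 1:06d}"
        self.files.append({"name": name, "checksum": value, "events": []})

    def write(self, event: str) -> None:
        # Events always go to the newest file.
        self.files[-1]["events"].append(event)


log = BinlogModel()
log.write("event 1")
log.set_binlog_checksum(True)   # rotates to binlog.000002
log.write("event 2")
log.set_binlog_checksum(True)   # same value, still rotates to binlog.000003
for f in log.files:
    print(f["name"], f["checksum"], f["events"])
```

Because the setting is fixed per file in this model, a reader only needs to
learn once per file whether the events in it carry checksums.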

When checksums are enabled, replication slaves will check events received over
the network for checksum errors, and will stop with an error if a corrupt
event is detected.
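
The check a slave performs on a received event can be sketched as follows.
This is a minimal illustration, assuming the checksum is a CRC32 stored in the
last 4 bytes of the event in little-endian order, as in the MySQL 5.6
implementation this feature is backported from. The function names
add_checksum and verify_checksum are hypothetical, and the event here is just
a byte string rather than a real binlog event with a header.

```python
import struct
import zlib

CHECKSUM_LEN = 4  # assumed size of the CRC32 trailer


def add_checksum(event: bytes) -> bytes:
    """Append a little-endian CRC32 of the event bytes."""
    return event + struct.pack("<I", zlib.crc32(event) & 0xFFFFFFFF)


def verify_checksum(event_with_crc: bytes) -> bool:
    """Return True if the trailing CRC32 matches the preceding bytes."""
    if len(event_with_crc) < CHECKSUM_LEN:
        return False
    body = event_with_crc[:-CHECKSUM_LEN]
    (stored,) = struct.unpack("<I", event_with_crc[-CHECKSUM_LEN:])
    return stored == (zlib.crc32(body) & 0xFFFFFFFF)


event = add_checksum(b"INSERT INTO t1 VALUES (1)")
corrupt = bytes([event[0] ^ 0x01]) + event[1:]  # flip one bit in the first byte

print(verify_checksum(event))    # True
print(verify_checksum(corrupt))  # False
```

A slave following this scheme would stop with an error as soon as
verify_checksum returns False, instead of applying the damaged event.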

In addition, the server can be configured to verify checksums in two other
places.

One is when reading events from the binlog on the master, for example
when sending events to a slave or for something like SHOW BINLOG EVENTS. This
is controlled by option master_verify_checksum, and is thus used to detect
file system corruption of the binlog files.

The other is when the slave SQL thread reads events from the relay log. This
is controlled by option slave_sql_verify_checksum, and is used to detect file
system corruption of slave relay log files.

 * Variable Name: master_verify_checksum
 * Scope: Global
 * Access Type: Can be changed dynamically
 * Data Type: bool
 * Default Value: OFF

 * Variable Name: slave_sql_verify_checksum
 * Scope: Global
 * Access Type: Can be changed dynamically
 * Data Type: bool
 * Default Value: ON

The mysqlbinlog client program by default does not verify checksums when
reading a binlog file. It can be instructed to do so with the option
verify-binlog-checksum:

 * Variable Name: verify-binlog-checksum
 * Data Type: bool
 * Default Value: OFF


Interoperability with older versions of MySQL and MariaDB
---------------------------------------------------------

The introduction of checksums on binlog events changes the format in which events
are stored in binlog files and sent over the network to slaves. This raises
the question of what happens when replicating between different versions of
the server, where one server is a newer version that has the binlog checksum
feature implemented, while the other server is an older version that does not
know about binlog checksums.

When checksums are disabled on the master (or the master has the old version
with no checksums implemented), there is no problem. In this case the binlog
format is backwards compatible, and replication works fine.

When the master is a newer version with checksums enabled in the binlog, but
the slave is an old version that does not understand checksums, replication
will fail. The master will disconnect the slave with an error, and also log a
warning in its own error log. This prevents sending events to the slave that
it will be unable to interpret correctly, but means that binlog checksums can
not be used with older slaves. (With the recommended upgrade path, where
slaves are upgraded before masters, this is not a problem of course).

Replicating from a new MySQL master with checksums enabled to a new MariaDB
slave, which also understands checksums, works, and the MariaDB slave will
verify checksums on replicated events.

There is however a problem when a newer MySQL slave replicates against a newer
MariaDB master with checksums enabled. The slave server looks at the master
server version to know whether events include checksums or not, and MySQL has
not yet been updated to learn that MariaDB does this already from version
5.3.0 (as of the time of writing, MySQL 5.6.2). Thus, if a MariaDB version of
at least 5.3.0 but less than 5.6.1 is used as a master with binlog checksums
enabled, a MySQL slave will interpret the received events incorrectly, as it
does not realise that the last part of each event is the checksum. Replication
will then fail with an error about corrupt events, or in unlucky cases even
silently corrupt the replicated data. Fixing this requires changes to the MySQL
server.
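
The version-based decision that causes this problem can be sketched as
follows. This is a simplified illustration of the rule described above, not
the actual MySQL code: the functions parse_version and assumes_checksums and
the example version strings are assumptions made for the example.

```python
import re


def parse_version(version_string: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '5.3.0-MariaDB-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version: {version_string!r}")
    return tuple(int(part) for part in match.groups())


def assumes_checksums(master_version: str) -> bool:
    """Version-only rule: only masters from 5.6.1 on are expected to checksum."""
    return parse_version(master_version) >= (5, 6, 1)


print(assumes_checksums("5.6.2-m5-log"))       # True
print(assumes_checksums("5.3.0-MariaDB-log"))  # False, although checksums may be present
```

A slave applying this rule to a MariaDB 5.3.0 master concludes that no
checksums are present, and so treats the trailing checksum bytes as part of
the event data.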

Here is a summary table of the status of replication between different
combinations of master and slave servers and checksum enabled/disabled:


    OLD:       MySQL < 5.6.1 or MariaDB < 5.3.0 with no checksum capabilities
    NEW-MARIA: MariaDB >= 5.3.0 with checksum capabilities
    NEW-MYSQL: MySQL >= 5.6.1 with checksum capabilities


    Master      Slave /       Checksums   Status
                mysqlbinlog   enabled?
    OLD         OLD           -           Ok
    OLD         NEW-MARIA     -           Ok
    OLD         NEW-MYSQL     -           Ok
    NEW-MARIA   OLD           No          Ok
    NEW-MARIA   OLD           Yes         Master will refuse with error
    NEW-MARIA   NEW-MARIA     Yes/No      Ok
    NEW-MARIA   NEW-MYSQL     No          Ok
    NEW-MARIA   NEW-MYSQL     Yes         Fail. Requires changes in MySQL,
                                          otherwise it will not realise
                                          MariaDB < 5.6.1 does checksums
                                          and will be confused.
    NEW-MYSQL   OLD           No          Ok
    NEW-MYSQL   OLD           Yes         Master will refuse with error
    NEW-MYSQL   NEW-MARIA     Yes/No      Ok
    NEW-MYSQL   NEW-MYSQL     Yes/No      Ok
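
The table above can also be expressed as a small lookup function, which may
be convenient when checking a planned topology. This is only a restatement of
the table for replication between servers; the function name
replication_status and the string constants are hypothetical.

```python
OLD, NEW_MARIA, NEW_MYSQL = "OLD", "NEW-MARIA", "NEW-MYSQL"


def replication_status(master: str, slave: str, checksums_enabled: bool) -> str:
    """Return the table's status for one master/slave combination."""
    if master == OLD or not checksums_enabled:
        # An OLD master cannot write checksums, and without checksums the
        # binlog format is backwards compatible.
        return "Ok"
    if slave == OLD:
        return "Master will refuse with error"
    if master == NEW_MARIA and slave == NEW_MYSQL:
        return "Fail"
    return "Ok"


print(replication_status(NEW_MARIA, OLD, True))        # Master will refuse with error
print(replication_status(NEW_MARIA, NEW_MYSQL, True))  # Fail
print(replication_status(NEW_MYSQL, NEW_MARIA, True))  # Ok
```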


For using the mysqlbinlog client program, there are similar issues.

A version of mysqlbinlog that understands checksums can read binlog files from
either old or new servers of course, with or without checksums enabled.

An old version of mysqlbinlog can read binlog files produced by a new server
version, if checksums were disabled when the log was produced. However, old
versions of mysqlbinlog reading a new binlog file containing checksums will be
confused, and output will be garbled, with the added checksums being
interpreted as extra garbage at the end of query strings and the like. No
error will be reported in this case, just wrong output.
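
The garbling can be illustrated with a much simplified event. The sketch
below assumes an event body that consists only of the query text followed by
a 4-byte CRC32 trailer; real events also carry a header and other fields, and
the reader functions read_query_old and read_query_new are hypothetical.

```python
import struct
import zlib

query = b"INSERT INTO t1 VALUES (1)"
# Simplified event body: query text followed by a 4-byte CRC32 trailer.
event_body = query + struct.pack("<I", zlib.crc32(query) & 0xFFFFFFFF)


def read_query_old(body: bytes) -> bytes:
    """Reader unaware of checksums: treats the whole body as the query."""
    return body


def read_query_new(body: bytes) -> bytes:
    """Reader aware of checksums: strips the 4-byte trailer first."""
    return body[:-4]


print(read_query_new(event_body))  # the original query
print(read_query_old(event_body))  # the query plus 4 bytes of trailing garbage
```

The old reader has no way to notice the problem, which is why no error is
reported.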

A version of mysqlbinlog from MySQL >= 5.6.1 will have problems similar to those
of a MySQL slave until this is fixed in MySQL. When reading a binlog file with
checksums produced by MariaDB >= 5.3.0 but < 5.6.1, it will not realise that
checksums are included, and will produce garbled output just like an old version
of mysqlbinlog. However, the MariaDB version of mysqlbinlog can read binlog files
produced by either MySQL or MariaDB just fine.
 Low-Level Design
Carefully checking the bzr history of mysql-trunk, this seems to extract the
patch for MySQL WL#2540:

bzr diff
'-rrevid:alfranio.correia@oracle.com-20101121143257-se3vpqus73l4mum0
..revid:luis.soares@oracle.com-20101124111752-9b8260bd1qak87hr'
--old=lp:mysql-server --new=lp:mysql-server
 User Comments
(Mdcallag - 2011-03-13 09:22:04
    https://code.launchpad.net/~jtolmer/mysql-server/global-trx-ids

(Mdcallag - 2011-03-13 09:22:30
    Code is in https://code.launchpad.net/~jtolmer/mysql-server/global-trx-ids

(Mdcallag - 2011-03-18 20:50:28
    Code might also be in 5.6 per http://bugs.mysql.com/bug.php?id=25737
 Time Estimates
Name      Hours Worked  Last Updated
Knielsen  57            09 May 2011
Total     57
           Hrs Worked  Progress  Current  Original
This Task  57                    0        40
Total      57                    0        40
 
 Funding and Votes
Votes: 0: 0%

Funding: 0 offers, total 0 Euro
 Progress Reports
(Knielsen - Mon, 09 May 2011, 05:53
    
Work out a way to interoperate, and implement it.
Do a complete review of the merged patch, fixing some minor things found.
Do some additional testing on top of mysql-test-run:
 - Run RQG replication tests with checksum checks enabled.
 - Carefully test interoperability between different versions.
Fix all failures seen in Buildbot.
Write up content for documentation for the new feature, including
interoperability matrix.
Push to main MariaDB 5.3.
Worked 19 hours and estimate 0 hours remain (original estimate increased by 8 hours).

(Knielsen - Sun, 08 May 2011, 05:11
    
Version updated.
--- /tmp/wklog.180.old.18682	2011-05-08 05:11:54.000000000 +0000
+++ /tmp/wklog.180.new.18682	2011-05-08 05:11:54.000000000 +0000
@@ -1,2 +1,2 @@
-9.x
+Server-5.3
 

(Knielsen - Sun, 08 May 2011, 05:11
    
Status updated.
--- /tmp/wklog.180.old.18682	2011-05-08 05:11:54.000000000 +0000
+++ /tmp/wklog.180.new.18682	2011-05-08 05:11:54.000000000 +0000
@@ -1,2 +1,2 @@
-Assigned
+Complete
 

(Knielsen - Sun, 08 May 2011, 05:11
    
Supervisor updated:  -> Knielsen
Lead Architect updated:  -> Knielsen
Architecture Review updated:  -> Knielsen
Code Review updated:  -> Knielsen
QA updated:  -> Pstoev
Documentation updated:  -> Knielsen

(Knielsen - Fri, 06 May 2011, 08:56
    
High-Level Specification modified.

(Knielsen - Mon, 02 May 2011, 06:39
    
Finished with backporting the test suite, all tests pass now.

Started working on getting decent interoperability with other versions
(e.g. different MariaDB versions, as well as MySQL and other forks). Still
working on this. The problem is that the MySQL code is based on parsing the
version string stored in the format_description event and knowing on the slave
what certain server versions do or do not do with respect to checksums; it
becomes complicated to maintain complex version<->capability mapping in the
slave code (and impossible to change in older versions of course). But still
hopeful that I can find a simple solution that does something reasonable in
the most important cases.
Worked 14 hours and estimate 11 hours remain (original estimate unchanged).

(Knielsen - Wed, 27 Apr 2011, 14:49
    
Finished backporting the code, backported the test suite part.
The interaction with WL#2687/WL#5072/BUG#40278/BUG#47175, which is not backported, was somewhat
tricky, but I found a decent solution.
Started fixing .result file differences / test failures.
Worked 15 hours and estimate 25 hours remain (original estimate unchanged).

(Knielsen - Wed, 27 Apr 2011, 14:47
    
Low Level Design modified.
--- /tmp/wklog.180.old.2839	2011-04-27 14:47:31.000000000 +0000
+++ /tmp/wklog.180.new.2839	2011-04-27 14:47:31.000000000 +0000
@@ -2,6 +2,8 @@
 patch for MySQL WL#2540:
 
 bzr diff
-'-rrevid:alfranio.correia@oracle.com-20101121143257-se3vpqus73l4mum0..revid:luis.soares@oracle.com-20101124111752-9b8260bd1qak87hr'
+'-rrevid:alfranio.correia@oracle.com-20101121143257-se3vpqus73l4mum0
+..revid:luis.soares@oracle.com-20101124111752-9b8260bd1qak87hr'
 --old=lp:mysql-server --new=lp:mysql-server
 
+

(Knielsen - Mon, 18 Apr 2011, 10:34
    
Worked on back-porting the patch to MariaDB-5.2-rpl.
Worked 8 hours and estimate 40 hours remain (original estimate increased by 48 hours).

(Knielsen - Mon, 11 Apr 2011, 06:46
    
Low Level Design modified.
--- /tmp/wklog.180.old.14706	2011-04-11 06:46:32.000000000 +0000
+++ /tmp/wklog.180.new.14706	2011-04-11 06:46:32.000000000 +0000
@@ -1,2 +1,7 @@
+Carefully checking the bzr history of mysql-trunk, this seems to extract the
+patch for MySQL WL#2540:
 
+bzr diff
+'-rrevid:alfranio.correia@oracle.com-20101121143257-se3vpqus73l4mum0..revid:luis.soares@oracle.com-20101124111752-9b8260bd1qak87hr'
+--old=lp:mysql-server --new=lp:mysql-server
 

WorkLog v4.0.0
  © 2010  Sergei Golubchik and Monty Program AB
  © 2004  Andrew Sweger <yDNA@perlocity.org> and Addnorya
  © 2003  Matt Wagner <matt@mysql.com> and MySQL AB