This worklog has been replaced with mariadb.org/jira

This site is here for historical purposes only. Do not add or edit tasks here!

 
 
 

WorkLog Frontpage Log in / Register
High-Level Description | Task Dependencies | High-Level Specification | Low-Level Design | File Attachments | User Comments | Time Estimates | Funding and Votes | Progress Reports

 mysqlbinlog: remove undesired effects of format description binlog statement
Title
Task ID50
Queue
Version
Status
Priority
Copies to
Created byPsergey17 Aug 2009Done
Supervisor   
Lead Architect    
Architecture Review  
Implementor  
Code Review  
QA  
Documentation  
 High-Level Description
According to complaints in BUG#46640:

1. mysqlbinlog will emit a 5.0-incompatible "format description binlog 
statement" even when reading a binary log that was produced by 5.0. 
This will cause an error when one tires to apply mysqlbinlog output to a
5.0.x server.

2. When one executes a BINLOG statement (containing "format description binlog
statement" or other events), this will cause subsequent statements run in
the same session to be binlogged with the server id of the last event
executed, not with the real server id. This can cause problems like infinite
recursive application in a circular replication topology.

This WL is to fix these issues.
 Task Dependencies
Others waiting for Task 50Task 50 is waiting forGraph
 
 High-Level Specification
First item
----------
AFAIU what needs to be done is:
1. record a source server version (it is in the first binlog event). 
2. don't emit a BINLOG statement if the recorded version number is 5.0.x or
  below.

and we'll get a 5.0-applicable binlog.

Second item
-----------
One question that one needs to sort out before disabling server_id change is 
why it was put there in the first place?  Should it be always removed?

First, the problem is not the format description log event. In fact all log
events have their own value of server_id, and every event in the BINLOG
statement is executed with its own id as server_id.

It seems it was introduced deliberately in the patch for Bug#32407. This patch
makes sql thread and BINLOG statement share code path for executing the binary
event, even though the bug is really about something different (outputting
format description event to allow proper execution of other BINLOG
statements).

Nevertheless, I think using the server_id in the events of a BINLOG statement
is wrong:

1. It is different behaviour from executing normal statement-based events not
written using BINLOG.

2. It causes the possibility for infinite cycle of replication in a cyclic
replication setup, if the server ids in BINLOG are different from all other
server ids in the cycle.

3. The functionality of applying events with original server id from the event
is still available by pointing the slave thread to the binlog. The
functionality of applying row-based binlog events with server id of the server
executing the BINLOG statements is not otherwise available.

In fact most of the code from the slave thread that is now also run when
executing BINLOG statements is redundant, or does the wrong thing (like
updating binlog position).

Maybe there is a confusion on the purpose of the BINLOG statement. Is it to be
able to apply row-based events manually to a server (mysqlbinlog|mysql)? Or is
it to be able to simulate the effect of the slave sql thread? I think the
former is the most important one.
 Low-Level Design
I think the fix for point 2 is to replace the call to
apply_event_and_update_pos() in mysql_client_binlog_statement() with a direct
call to ev->apply_event(). However, I need to check two things to be sure this
is correct:

1. The existing code does

    if (!ev->when) ev->when= my_time(0)

Need to understand if this is needed/correct or not.

2. The existing code does ev->update_pos(). I think this is redundant for
BINLOG statement (as it uses a fake rli structure), but I need to check to
make sure.

Once this is done, point 1 may no longer be needed. The user can use
--base64-output=never when applying the output to a 5.0 server, and omit that
option when applying the output to a 5.1 server. There should be no need to
omit the format description event in the output when other BINLOG statements
for row events are present, as it no longer changes the server id. In fact the
format description event is required to be able to execute other BINLOG
statements, as its purpose is to define the binary format of the events
contained in these statements.

Alternatively, we could implement that if the format description event of the
source binlog has server version < 5.1, and --base64-output=auto (default),
then the format description event is omitted (and should any BINLOG statement
be needed, unlikely as it is from a 5.0 server, we will need to throw an
error).

The binlog format version for 5.0 and 5.1 is actually the same, hence the need
to look at server version to guess if the format description event can be
omitted.
 File Attachments
 NameTypeSizeByDate
 User Comments
 Time Estimates
NameHours WorkedLast Updated
Knielsen1605 Oct 2009
Total16 
 Hrs WorkedProgressCurrentOriginal
This Task16410
Total16410
 
 Funding and Votes
Votes: 1: 0%
 Change vote: Useless    Nice to have    Important    Very important    

Funding: 0 offers, total 0 Euro
 Progress Reports
(Guest - Thu, 17 Jun 2010, 00:39
    
Dependency deleted: WL#39 no longer depends on WL#50

(Guest - Mon, 21 Dec 2009, 15:40
    
Status updated.
--- /tmp/wklog.50.old.26984	2009-12-21 13:40:53.000000000 +0000
+++ /tmp/wklog.50.new.26984	2009-12-21 13:40:53.000000000 +0000
@@ -1 +1 @@
-Code-Review
+Complete

(Knielsen - Mon, 05 Oct 2009, 14:38
    
Follow-up with MySQL developers on patch
Worked 2 hours and estimate 4 hours remain (original estimate unchanged).

(Knielsen - Tue, 29 Sep 2009, 10:12
    
Category updated.
--- /tmp/wklog.50.old.3293	2009-09-29 10:12:26.000000000 +0300
+++ /tmp/wklog.50.new.3293	2009-09-29 10:12:26.000000000 +0300
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-Sprint

(Knielsen - Tue, 29 Sep 2009, 10:12
    
Status updated.
--- /tmp/wklog.50.old.3293	2009-09-29 10:12:26.000000000 +0300
+++ /tmp/wklog.50.new.3293	2009-09-29 10:12:26.000000000 +0300
@@ -1 +1 @@
-Un-Assigned
+Code-Review

(Knielsen - Mon, 28 Sep 2009, 16:00
    
Updated patch against MySQL 5.1.39
Fixes a couple of issues with the patch.
Discussions with MySQL developer.
Worked 4 hours and estimate 6 hours remain (original estimate unchanged).

(Knielsen - Wed, 09 Sep 2009, 18:31
    
Discussed with MySQL developers on Bug#46640.

Committed a suggested fix:

    https://lists.launchpad.net/maria-developers/msg00926.html
Worked 8 hours and estimate 10 hours remain (original estimate increased by 18 hours).

(Guest - Wed, 09 Sep 2009, 11:50
    
High Level Description modified.
--- /tmp/wklog.50.old.11584	2009-09-09 11:50:12.000000000 +0300
+++ /tmp/wklog.50.new.11584	2009-09-09 11:50:12.000000000 +0300
@@ -5,7 +5,11 @@
 This will cause an error when one tires to apply mysqlbinlog output to a
 5.0.x server.
 
-2. When one applies "format description binlog statement" at the slave, it 
-will change the slave's server_id when applied. 
+2. When one executes a BINLOG statement (containing "format description binlog
+statement" or other events), this will cause subsequent statements run in
+the same session to be binlogged with the server id of the last event
+executed, not with the real server id. This can cause problems like infinite
+recursive application in a circular replication topology.
 
 This WL is to fix these issues.
+

(Knielsen - Tue, 08 Sep 2009, 15:07
    
Low Level Design modified.
--- /tmp/wklog.50.old.19208	2009-09-08 15:07:09.000000000 +0300
+++ /tmp/wklog.50.new.19208	2009-09-08 15:07:09.000000000 +0300
@@ -1 +1,34 @@
+I think the fix for point 2 is to replace the call to
+apply_event_and_update_pos() in mysql_client_binlog_statement() with a direct
+call to ev->apply_event(). However, I need to check two things to be sure this
+is correct:
+
+1. The existing code does
+
+    if (!ev->when) ev->when= my_time(0)
+
+Need to understand if this is needed/correct or not.
+
+2. The existing code does ev->update_pos(). I think this is redundant for
+BINLOG statement (as it uses a fake rli structure), but I need to check to
+make sure.
+
+Once this is done, point 1 may no longer be needed. The user can use
+--base64-output=never when applying the output to a 5.0 server, and omit that
+option when applying the output to a 5.1 server. There should be no need to
+omit the format description event in the output when other BINLOG statements
+for row events are present, as it no longer changes the server id. In fact the
+format description event is required to be able to execute other BINLOG
+statements, as its purpose is to define the binary format of the events
+contained in these statements.
+
+Alternatively, we could implement that if the format description event of the
+source binlog has server version < 5.1, and --base64-output=auto (default),
+then the format description event is omitted (and should any BINLOG statement
+be needed, unlikely as it is from a 5.0 server, we will need to throw an
+error).
+
+The binlog format version for 5.0 and 5.1 is actually the same, hence the need
+to look at server version to guess if the format description event can be
+omitted.
 

(Knielsen - Tue, 08 Sep 2009, 14:29
    
High-Level Specification modified.
--- /tmp/wklog.50.old.17531	2009-09-08 14:29:30.000000000 +0300
+++ /tmp/wklog.50.new.17531	2009-09-08 14:29:30.000000000 +0300
@@ -12,3 +12,37 @@
 One question that one needs to sort out before disabling server_id change is 
 why it was put there in the first place?  Should it be always removed?
 
+First, the problem is not the format description log event. In fact all log
+events have their own value of server_id, and every event in the BINLOG
+statement is executed with its own id as server_id.
+
+It seems it was introduced deliberately in the patch for Bug#32407. This patch
+makes sql thread and BINLOG statement share code path for executing the binary
+event, even though the bug is really about something different (outputting
+format description event to allow proper execution of other BINLOG
+statements).
+
+Nevertheless, I think using the server_id in the events of a BINLOG statement
+is wrong:
+
+1. It is different behaviour from executing normal statement-based events not
+written using BINLOG.
+
+2. It causes the possibility for infinite cycle of replication in a cyclic
+replication setup, if the server ids in BINLOG are different from all other
+server ids in the cycle.
+
+3. The functionality of applying events with original server id from the event
+is still available by pointing the slave thread to the binlog. The
+functionality of applying row-based binlog events with server id of the server
+executing the BINLOG statements is not otherwise available.
+
+In fact most of the code from the slave thread that is now also run when
+executing BINLOG statements is redundant, or does the wrong thing (like
+updating binlog position).
+
+Maybe there is a confusion on the purpose of the BINLOG statement. Is it to be
+able to apply row-based events manually to a server (mysqlbinlog|mysql)? Or is
+it to be able to simulate the effect of the slave sql thread? I think the
+former is the most important one.
+
-- View All Progress Notes (13 total) --


Report Generator:
 
Saved Reports:

WorkLog v4.0.0
  © 2010  Sergei Golubchik and Monty Program AB
  © 2004  Andrew Sweger <yDNA@perlocity.org> and Addnorya
  © 2003  Matt Wagner <matt@mysql.com> and MySQL AB