This worklog has been replaced with mariadb.org/jira

This site is here for historical purposes only. Do not add or edit tasks here!

 
 
 

WorkLog Frontpage Log in / Register
High-Level Description | Task Dependencies | High-Level Specification | Low-Level Design | File Attachments | User Comments | Time Estimates | Funding and Votes | Progress Reports

 Implement UNION ALL without usage of a temporary table
Title
Task ID44
Queue
Version N/A
Status
PriorityN/A
Copies toMonty
Psergey

Created byIgor14 Aug 2009Done
Supervisor N/A  
Lead Architect    
Architecture Review  
Implementor  
Code Review  
QA  
Documentation  
 High-Level Description
Moved to JIRA: https://mariadb.atlassian.net/browse/MDEV-334
 Task Dependencies
Others waiting for Task 44Task 44 is waiting forGraph
 
 High-Level Specification
Moved to JIRA: https://mariadb.atlassian.net/browse/MDEV-334
 Low-Level Design
 File Attachments
 NameTypeSizeByDate
 User Comments
 Time Estimates
NameHours WorkedLast Updated
Guest2014 Aug 2009
Total20 
 Hrs WorkedProgressCurrentOriginal
This Task2000
Total2000
 
 Funding and Votes
Votes: 1: 0%
 Change vote: Useless    Nice to have    Important    Very important    

Funding: 0 offers, total 0 Euro
 Progress Reports
(Rasmus - Tue, 12 Jun 2012, 08:12
    
High Level Description modified.
--- /tmp/wklog.44.old.25527	2012-06-12 08:12:20.000000000 +0000
+++ /tmp/wklog.44.new.25527	2012-06-12 08:12:20.000000000 +0000
@@ -1,7 +1,2 @@
-Currently when any union operation is executed the rows received from its
-operands are always sent to a temporary table. Meanwhile for a UNION ALL
-operation that is used at the top level of a query without an ORDER BY clause it
-is not necessary. In this case the rows could be sent directly to the client.
-The goal of this task  is to provide such an implementation of UNION ALL
-operation that would not use temporary table at all in certain, most usable cases. 
+Moved to JIRA: https://mariadb.atlassian.net/browse/MDEV-334
 

(Rasmus - Tue, 12 Jun 2012, 08:12
    
High-Level Specification modified.
--- /tmp/wklog.44.old.25518	2012-06-12 08:12:05.000000000 +0000
+++ /tmp/wklog.44.new.25518	2012-06-12 08:12:05.000000000 +0000
@@ -1,207 +1,2 @@
-<contents>
-1. Handling union operations in MySQL Server
-  1.1. Specifics of MySQL union operations
-  1.2 Validation of union units
-  1.3 Execution of union units
-2. Optimizations improving performance of  UNION ALL operations 
-  2.1 Execution of UNION ALL without temporary table
-  2.2. Avoiding unnecessary copying
-  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
-3. Other possible optimizations for union units
-</contents>
-
-1. Handling union operations in MySQL Server
-============================================
-
-1.1. Specifics of MySQL union operations
-----------------------------------------
-
-UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
-allows us to use  these operations in a sequence, one after another. For example
-the following queries are accepted by the MySQL Server:
-  (select a1,b1,c1 from t1 where a1=b1) union
-    (select a2,b2,c2 from t2 where a2!=b2) union 
-      (select a3,b3,c3 from t3 where a3>b3);                        (1)
-  (select a1,b1,c1 from t1 where a1=b1) union all
-    (select a2,b2,c2 from t2 where a2!=b2) union all 
-      (select a3,b3,c3 from t3 where a3>b3);                        (2)
-Any mix of UNION and UNION ALL is also acceptable:
-  (select a1,b1,c3 from t1 where a1=b1) union
-    (select a2,b2,c3 from t2 where a2!=b2) union all 
-      (select a3,b3,c3 from t3 where a3>b3);                        (3)
-  (select a1,b1,c1 from t1 where a1=b1) union all
-    (select a2,b2,c2 from t2 where a2!=b2) union  
-      (select a3,b3,c3 from t3 where a3>b3);                        (4)
-
-It should be noted that query (4) is equivalent to query (1). At the same time
-query (3) is not equivalent to any of the queries (1),(2),(4).
-In general any UNION ALL in a sequence of union operations can be equivalently
-substituted for UNION if there occur another UNION further in the sequence.
-MySQL does not accept nested unions. For example the following valid SQL query
-is considered by MySQL Server as erroneous:
- ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))  
- union all 
- ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
-  
-A sequence of select constructs separated by UNION/UNION ALL is called 'union
-unit' if it s not a part of another such sequence.
-A union unit can be executed as a query. It also can be used as a subquery.
-A union unit can be optionally appended by an ORDER BY  and/or LIMIT construct.
-In this case it cannot be used as a subquery. 
-
-1.2 Validation of union units
------------------------------
-
-When the parser stage is over the further processing of a union unit is
-performed by the function mysql_union.
-The function first validate the unit in the method  SELECT_LEX_UNIT::prepare.
-The method first validates each of the select constructs of the unit and then it
-checks that all select are compatible. The method checks that the selects return
-the same number of columns and for each set of columns with the same number k
-there is a type to which the types of the columns can be coerced. This type is
-considered as the type of column k of the result set returned by the union unit. 
-For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
-bigint and double respectively then the second column of the union unit will be
-of the type double. If the types of the columns c1,c2,c3 are specified as
-varchar(10), varchar(20), varchar(10) then the type of the corresponding column
-of the result set will be varchar(20). If the columns have different collations
-then a collation from which all these collations can be derived is looked for
-and it is assigned as the
-collation of the third column in the result set.
-After compatibility of the corresponding select columns has been checked and the
-types of the columns  from of the result set have been determined the method 
-SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
-result set for the union unit. Currently rows returned by the selects from the
-union unit are always written into a temporary table. To force selects to send 
-rows to this temporary table  SELECT_LEX_UNIT::prepare creates JOIN objects for
-the selects such that the JOIN::result field refers to an object of the class
-select_union. All selects from a union unit share the same select_union object.
-
-1.3 Execution of union units
-----------------------------
-
-After  SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
-created a temporary table as a container for rows from the result sets returned
-by the selects of the unit, and has prepared all data structures needed for
-execution, the function mysql_union invokes  SELECT_LEX_UNIT::exec.
-The method  SELECT_LEX_UNIT::exec processes the selects from the union unit one
-by one.
-Each select first is optimized with JOIN::optimize(), then it's executed with
-JOIN::exec().The result rows from each select are sent to a temporary table.
-This table accumulates all rows that are to be returned by the union unit. For
-UNION operations duplicate rows are not added, for UNION ALL operations all
-records are added. It is achieved by enabling and disabling usage of the unique
-index defined on all fields of the temporary  table. The index is never used if
-only UINION ALL operation occurs in the unit. Otherwise it is enabled  before
-the first select is executed and disabled after the last UNION operation.
-To send rows to the temporary table the method select_union::send_data is used.
-For a row it receives from the currently executed select the method first stores
-the fields of the row in in the fields of the record buffer of the temporary
-table. To do this the method calls function fill_record. All needed type
-conversions of the field values are performed when they are stored the record
-buffer. After this the method select_union::send_data  calls the ha_write_row
-handler function to write the record from the buffer to the temporary table. A
-possible error on duplicate key that occurs with an attempt to write a duplicate
-row is ignored.
-After all rows received from all selects have been placed into the temporary
-table the  method    SELECT_LEX_UNIT::exec calls mysql_select that reads rows
-from the temporary table and sends them to the output stream (to the client). If
-there is an ORDER BY clause to be applied to result of the union unit then the
-rows read from the temporary table have to be sorted first.
-
-2. Optimizations improving performance of  UNION ALL operations 
-===============================================================
-
-The following three optimizations are proposed to be implemented in the
-framework of this task.
-
-2.1 Execution of UNION ALL without temporary table
---------------------------------------------------
-
-If a union unit with only UNION ALL operations is used at the top level of the
-query (in other words it's not used as a subquery) and is not appended with an
-ORDER BY clause then it does not make sense to send rows received from selects
-to a temporary table at all. After all needed type conversions have been done
-the row fields could be sent directly into the output stream. It would improve
-the performance of UNION ALL operations since writing to the temporary table and
-reading from it would not be needed anymore. In the cases when the result set is
-big enough and the temporary table cannot be allocated in the main memory the
-performance gains would be significant. Besides, the client could get the first
-result rows at once as it would not have to wait until all selects have been
-executed.
-To make an UNION ALL operation not to send rows to a temporary table we could
-provide the JOIN objects created for the selects from the union unit with an
-interceptor object that differs from the one  they use now. In the current code
-they use an object of the class select_union derived from the
-select_result_interceptor class. The new interceptor object of the class that
-we'll call select_union_send (by analogy with the class select_send) shall
-inherit  from the select_union and shall have its own implementations of the
-virtual methods send_data, send_fields, and send_eof.
-The method send_data shall send  fields received from selects to the record
-buffer of the temporary table and then from this buffer to the output stream.
-The method send_fields shall send the format of the rows to the client before it
-starts getting records from the first select , while the method send_eof shall
-signal about the end of the rows after the last select finishes sending records.
-The method  create_result_table  of the class select_union shall be re-defined
-as virtual. The implementation of this method for the class select_union_send
-shall call  select_union::create_result_table and then shall build internal
-structures needed for select_unionsend::send_data. So, the definition of the
-class select_union_send should look like this:
-  class select_union_send :public select_union
-  {
-    ...                  // private structures
-  public:
-    select_union_send() :select_union(), ...{...}
-    bool send_data(List<Item> &items);
-    bool send_fields(List<Item> &list, uint flags);
-    bool create_result_table(THD *thd, List<Item> *column_types,
-                                   bool is_distinct, ulonglong options,
-                                   const char *alias); 
-  };
-
-2.2. Avoiding unnecessary copying
----------------------------------
-
-If a field does not need type conversion it does not make sense to send it to a
-record buffer. It can be sent directly to the output stream. Different selects
-can require type conversions for different columns. 
-Let's provide each select from the union unit with a data structure (e.g. a
-bitmap) that says what fields require conversions, and what don't . Before
-execution of a select this data structure must be passed to the
-select_union_send object shared by all selects from the unit. The info in this
-structure will tell  select_union_send::send_data what fields should be sent to
-the record buffer for type conversion and what can be sent directly to the
-output stream. In this case another variant of the fill_record procedure is
-needed that would take as parameter the info that says what fields are to be
-stored in the record buffer.
-
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
-----------------------------------------------------------------------
-
-If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
-used at the top level of a query then any UNION ALL operation after the last
-UNION operation can be executed in more efficient way than it's done in the
-current implementation. More exactly, the rows from any select that follows
-after the second operand of the last UNION operations could be sent directly to
-the output stream. In this case two interceptor objects have to be created: one,
-of the type select_union, is shared by the selects for which UNION operations
-are performed, another, of the type select_union_send, is shared by the the
-remaining selects. For this optimization the method  SELECT_LEX_UNIT::exec is to
-undergo a serious re-work.
-
-
-3. Other possible optimizations for union units
-===============================================
-
-The following optimizations are not supposed to be implemented in the framework
-this task.
-1. For a union unit containing only UNION ALL with an ORDER BY send rows from
-selects directly to the sorting procedure.
-2. For a union unit at the top level of the query without ORDER BY clause send
-any row received from an operand of a UNION operation directly to the output
-stream as soon as it has been checked by  a lookup in the temporary table that
-it's not a duplicate.
-3. Not to use temporary table for any union unit used in EXIST or IN subquery.
-
-
+Moved to JIRA: https://mariadb.atlassian.net/browse/MDEV-334
 

(Sergei - Thu, 01 Jul 2010, 05:48
    
Version updated.
--- /tmp/wklog.44.old.27918	2010-07-01 05:48:26.000000000 +0000
+++ /tmp/wklog.44.new.27918	2010-07-01 05:48:26.000000000 +0000
@@ -1 +1 @@
-Server-9.x
+9.x

(Sergei - Thu, 01 Jul 2010, 05:48
    
Category updated.
--- /tmp/wklog.44.old.27918	2010-07-01 05:48:26.000000000 +0000
+++ /tmp/wklog.44.new.27918	2010-07-01 05:48:26.000000000 +0000
@@ -1 +1 @@
-Client-BackLog
+Server-RawIdeaBin

(Guest - Fri, 14 Aug 2009, 09:13
    
2009-8-10: spent 3.5 hrs for analysis of the current implementation of UNION/UNION ALL
           came up with the idea how to bypass temporary table when executing UNION ALL

2009-8-11: spent 6.5 hrs to prepare a hack that executed UNION ALL without temporary table

2009-8-12: spent 4 hrs more to investigate in debugger different cases with usage of union operations 
           (in subqueries, in queries that do not use tables)

2009-8-13: spent 6 hrs to put together and to publish an HLS document for the task
Worked 20 hours and estimate 0 hours remain (original estimate increased by 20 hours).

(Guest - Fri, 14 Aug 2009, 08:52
    
Supervisor updated.
--- /tmp/wklog.44.old.22769	2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769	2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Bothorsen
+Monty

(Guest - Fri, 14 Aug 2009, 08:52
    
Version updated.
--- /tmp/wklog.44.old.22769	2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769	2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x

(Guest - Fri, 14 Aug 2009, 08:52
    
Privacy level updated.
--- /tmp/wklog.44.old.22769	2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769	2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-y
+n

(Guest - Fri, 14 Aug 2009, 08:50
    
High-Level Specification modified.
--- /tmp/wklog.44.old.22656	2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656	2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
 UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
 allows us to use  these operations in a sequence, one after another. For example
 the following queries are accepted by the MySQL Server:
-  (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
-a2!=b2) union 
+  (select a1,b1,c1 from t1 where a1=b1) union
+    (select a2,b2,c2 from t2 where a2!=b2) union 
     (select a3,b3,c3 from t3 where a3>b3);                        (1)
-  (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union all 
+  (select a1,b1,c1 from t1 where a1=b1) union all
+    (select a2,b2,c2 from t2 where a2!=b2) union all 
     (select a3,b3,c3 from t3 where a3>b3);                        (2)
 Any mix of UNION and UNION ALL is also acceptable:
-   (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
-a2!=b2) union all 
+  (select a1,b1,c3 from t1 where a1=b1) union
+    (select a2,b2,c3 from t2 where a2!=b2) union all 
     (select a3,b3,c3 from t3 where a3>b3);                        (3)
-  (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union  
+  (select a1,b1,c1 from t1 where a1=b1) union all
+    (select a2,b2,c2 from t2 where a2!=b2) union  
     (select a3,b3,c3 from t3 where a3>b3);                        (4)
+
 It should be noted that query (4) is equivalent to query (1). At the same time
 query (3) is not equivalent to any of the queries (1),(2),(4).
 In general any UNION ALL in a sequence of union operations can be equivalently
 substituted for UNION if there occur another UNION further in the sequence.
-MySQL does not accept nested unions. For example the following valid query is
-considered by MySQL Server as erroneous:
-  ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
-) union all 
-  ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+MySQL does not accept nested unions. For example the following valid SQL query
+is considered by MySQL Server as erroneous:
+ ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))  
+ union all 
+ ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
   
 A sequence of select constructs separated by UNION/UNION ALL is called 'union
 unit' if it s not a part of another such sequence.

(Guest - Fri, 14 Aug 2009, 08:45
    
High-Level Specification modified.
--- /tmp/wklog.44.old.22406	2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406	2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
 2. Optimizations improving performance of  UNION ALL operations 
   2.1 Execution of UNION ALL without temporary table
   2.2. Avoiding unnecessary copying
-  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
 3. Other possible optimizations for union units
 </contents>
 
 1. Handling union operations in MySQL Server
-==================================
+============================================
 
 1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
 
 UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
 allows us to use  these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
 In this case it cannot be used as a subquery. 
 
 1.2 Validation of union units
-----------------------------------
+-----------------------------
 
 When the parser stage is over the further processing of a union unit is
 performed by the function mysql_union.
@@ -77,7 +77,7 @@
 select_union. All selects from a union unit share the same select_union object.
 
 1.3 Execution of union units
-----------------------------------
+----------------------------
 
 After  SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
 created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
 rows read from the temporary table have to be sorted first.
 
 2. Optimizations improving performance of  UNION ALL operations 
-=================================================
+===============================================================
 
 The following three optimizations are proposed to be implemented in the
 framework of this task.
 
 2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
 
 If a union unit with only UNION ALL operations is used at the top level of the
 query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
   };
 
 2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
 
 If a field does not need type conversion it does not make sense to send it to a
 record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
 needed that would take as parameter the info that says what fields are to be
 stored in the record buffer.
 
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
 
 If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
 used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
 
 
 3. Other possible optimizations for union units
-=================================
+===============================================
 
 The following optimizations are not supposed to be implemented in the framework
 this task.
-- View All Progress Notes (11 total) --


Report Generator:
 
Saved Reports:

WorkLog v4.0.0
  © 2010  Sergei Golubchik and Monty Program AB
  © 2004  Andrew Sweger <yDNA@perlocity.org> and Addnorya
  © 2003  Matt Wagner <matt@mysql.com> and MySQL AB