This worklog has been replaced with mariadb.org/jira

This site is here for historical purposes only. Do not add or edit tasks here!

 
 
 

WorkLog Frontpage Log in / Register
High-Level Description | Task Dependencies | High-Level Specification | Low-Level Design | File Attachments | User Comments | Time Estimates | Funding and Votes | Progress Reports

 Make DS-MRR sort the ranges before scanning the index
Title
Task ID125
Queue
Version
Status
Priority
Copies to
Created byPsergey03 Jul 2010Done
Supervisor   
Lead Architect    
Architecture Review  
Implementor  
Code Review  
QA  
Documentation  
 High-Level Description
After MWL#121 is implemented, we have this state:
- Regular DS-MRR sorts rowids and reads table records in disk order
- CPK variant sorts lookup keys and makes index lookups in index(=disk) order

It is easy to get an idea that we could make regular DS-MRR to do both: first,
sort lookup keys, then do index lookups in index order, then sort the rowids 
and do record reads in disk order.

The advantages of sorting lookup keys are:
- lookups on identical keys are grouped together (TODO: so should this WL code
  avoid making lookups on the same key values or we can rely on storage engine 
  internals to make the repeat identical lookups inexpensive?)
- TODO what else? is the above really enough to justify doing this WL?

(depending on the nature of the advantages, we might want to do index-ordered 
reads for "index only" scans).
 Task Dependencies
Others waiting for Task 125Task 125 is waiting forGraph
91 MariaDB 5.3
121 Add DS-MRR support for clustered primary keys 
123 DS-MRR for clustered PKs: more efficient buffer use 
124 DS-MRR for clustered PKs: cost function 
 
 High-Level Specification
1. Buffer usage
---------------

1.1 Efficient buffer space use problem
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Suppose we're given a sequence of $N ranges, and $SIZE kbytes of memory. The
buffer will be used for
1. storage/sorting of lookup keys
2. storage/sorting of rowids

If we take a simple approach, we divide the buffer into two parts, one for
lookup keys and another for rowids. The problem is that we don't know for 
certain how many rowids we will get for each lookup key, so we will either
not be able use the full buffer or will run out of buffer space and will
have to do two disk-ordered reads instead of one.

We could re-use the space occupied by lookup keys to store rowids (actually,
<rowid, range_id> pairs) but that doesn't solve the problem of not being able
to tell how many lookup keys we should have in one sort+sweep-read step.

1.2 Solution
~~~~~~~~~~~~
The buffer should be divided in proportion of expected sizes: lookup keys 
occupy 

   (sizeof(key) + sizeof(range_id)) * n_keys                      (1)

rowids will occupy

   (sizeof(rowid) + sizeof(range_id)) * rec_per_key * n_keys      (2)

(TODO: what to do if rec_per_key value is unknown?)
We can divide both (1) and (2) by n_keys and that will give the needed
proportion.


We will take advantage of the fact that {key, range_id} pairs for which we got 
{rowid, range_id} pairs are not needed anymore. This will be done as follows:

- Store {key, range_id} pairs from the start of the buffer
- sort them in reverse order
- start index-ordered read (that would require traversing the sorted array
  backwards)
- keep track of how much more space becomes free after we've done lookup for
  the next {key, range_id} pair.
- let the accumulated array of {rowid, range_id} pairs to grow from the end of
  the buffer.

Using this scheme, one array will be at start of the buffer and gradually
shrinking and the other at end of the buffer and gradually growing.  Overflow
detection code should make sure that arrays never have any intersection with
one another.

2. Grouping identical keys 
--------------------------
BKA could possibly supply multiple identical keys, that is, MRR implementation
will get many pairs like:
   
   {keyX, rangeY1}, {keyX, rangeY2}, {keyX, rangeY3}, ...

since we do index reads in key order, we'll be able to detect identical reads
with little extra cost (compare current key with the next/previous) and avoid
making multiple lookups with the same value of keyX.

This will be implemented in this WL entry.

3. User control
---------------
Introduce an @@optimizer_switch flag to control the behavior.
Flag name:      mrr_sort_keys
Default value:  ON

4. Interplay with MWL#123
-------------------------

MWL#125 code will support MWL#123's optimization of storing/sorting not keys but
rather pointers to them  (TODO: right?)
 Low-Level Design
 File Attachments
 NameTypeSizeByDate
 User Comments
 Time Estimates
NameHours WorkedLast Updated
All Sub Tasks0 
Total0 
 Hrs WorkedProgressCurrentOriginal
Total000
 
 Funding and Votes
Votes: 0: 0%
 Make vote: Useless    Nice to have    Important    Very important    

Funding: 0 offers, total 0 Euro
 Progress Reports
(Psergey - Sat, 09 Oct 2010, 10:58
    
Dependency created: WL#125 now depends on WL#124

(Psergey - Sat, 09 Oct 2010, 10:58
    
Dependency created: WL#125 now depends on WL#123

(Psergey - Sat, 09 Oct 2010, 10:58
    
Dependency created: WL#125 now depends on WL#121

(Psergey - Sat, 09 Oct 2010, 10:56
    
Dependency created: WL#91 now depends on WL#125

(Psergey - Wed, 14 Jul 2010, 17:30
    
High-Level Specification modified.
--- /tmp/wklog.125.old.22513	2010-07-14 17:30:33.000000000 +0000
+++ /tmp/wklog.125.new.22513	2010-07-14 17:30:33.000000000 +0000
@@ -58,13 +58,11 @@
    
    {keyX, rangeY1}, {keyX, rangeY2}, {keyX, rangeY3}, ...
 
-since we do index reads in key order, we'll be able to detect this case 
-(compare current key with the next/previous) and avoid making multiple lookups
-with the same value of keyX.
-
-This, however, will not be implemented within scope of this WL entry. We'll
-rely on the fact that storage engine makes multiple repetitive index lookups to
-be sufficiently fast.
+since we do index reads in key order, we'll be able to detect identical reads
+with little extra cost (compare current key with the next/previous) and avoid
+making multiple lookups with the same value of keyX.
+
+This will be implemented in this WL entry.
 
 3. User control
 ---------------

(Psergey - Tue, 13 Jul 2010, 22:46
    
High-Level Specification modified.
--- /tmp/wklog.125.old.4453	2010-07-13 22:46:30.000000000 +0000
+++ /tmp/wklog.125.new.4453	2010-07-13 22:46:30.000000000 +0000
@@ -1,4 +1,3 @@
-
 1. Buffer usage
 ---------------
 
@@ -73,3 +72,9 @@
 Flag name:      mrr_sort_keys
 Default value:  ON
 
+4. Interplay with MWL#123
+-------------------------
+
+MWL#125 code will support MWL#123's optimization of storing/sorting not keys but
+rather pointers to them  (TODO: right?)
+

(Psergey - Tue, 13 Jul 2010, 22:38
    
High-Level Specification modified.
--- /tmp/wklog.125.old.4298	2010-07-13 22:38:59.000000000 +0000
+++ /tmp/wklog.125.new.4298	2010-07-13 22:38:59.000000000 +0000
@@ -1,5 +1,9 @@
-Efficient buffer space use problem
-----------------------------------
+
+1. Buffer usage
+---------------
+
+1.1 Efficient buffer space use problem
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Suppose we're given a sequence of $N ranges, and $SIZE kbytes of memory. The
 buffer will be used for
 1. storage/sorting of lookup keys
@@ -15,4 +19,57 @@
 <rowid, range_id> pairs) but that doesn't solve the problem of not being able
 to tell how many lookup keys we should have in one sort+sweep-read step.
 
+1.2 Solution
+~~~~~~~~~~~~
+The buffer should be divided in proportion of expected sizes: lookup keys 
+occupy 
+
+   (sizeof(key) + sizeof(range_id)) * n_keys                      (1)
+
+rowids will occupy
+
+   (sizeof(rowid) + sizeof(range_id)) * rec_per_key * n_keys      (2)
+
+(TODO: what to do if rec_per_key value is unknown?)
+We can divide both (1) and (2) by n_keys and that will give the needed
+proportion.
+
+
+We will take advantage of the fact that {key, range_id} pairs for which we got 
+{rowid, range_id} pairs are not needed anymore. This will be done as follows:
+
+- Store {key, range_id} pairs from the start of the buffer
+- sort them in reverse order
+- start index-ordered read (that would require traversing the sorted array
+  backwards)
+- keep track of how much more space becomes free after we've done lookup for
+  the next {key, range_id} pair.
+- let the accumulated array of {rowid, range_id} pairs to grow from the end of
+  the buffer.
+
+Using this scheme, one array will be at start of the buffer and gradually
+shrinking and the other at end of the buffer and gradually growing.  Overflow
+detection code should make sure that arrays never have any intersection with
+one another.
+
+2. Grouping identical keys 
+--------------------------
+BKA could possibly supply multiple identical keys, that is, MRR implementation
+will get many pairs like:
+   
+   {keyX, rangeY1}, {keyX, rangeY2}, {keyX, rangeY3}, ...
+
+since we do index reads in key order, we'll be able to detect this case 
+(compare current key with the next/previous) and avoid making multiple lookups
+with the same value of keyX.
+
+This, however, will not be implemented within scope of this WL entry. We'll
+rely on the fact that storage engine makes multiple repetitive index lookups to
+be sufficiently fast.
+
+3. User control
+---------------
+Introduce an @@optimizer_switch flag to control the behavior.
+Flag name:      mrr_sort_keys
+Default value:  ON
 

(Psergey - Mon, 05 Jul 2010, 00:14
    
Supervisor updated.
--- /tmp/wklog.125.old.8195	2010-07-05 00:14:08.000000000 +0000
+++ /tmp/wklog.125.new.8195	2010-07-05 00:14:08.000000000 +0000
@@ -1 +1 @@
-
+Bothorsen

(Psergey - Mon, 05 Jul 2010, 00:14
    
Version updated.
--- /tmp/wklog.125.old.8195	2010-07-05 00:14:08.000000000 +0000
+++ /tmp/wklog.125.new.8195	2010-07-05 00:14:08.000000000 +0000
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x

(Psergey - Mon, 05 Jul 2010, 00:14
    
Privacy level updated.
--- /tmp/wklog.125.old.8195	2010-07-05 00:14:08.000000000 +0000
+++ /tmp/wklog.125.new.8195	2010-07-05 00:14:08.000000000 +0000
@@ -1 +1 @@
-y
+n
-- View All Progress Notes (11 total) --


Report Generator:
 
Saved Reports:

WorkLog v4.0.0
  © 2010  Sergei Golubchik and Monty Program AB
  © 2004  Andrew Sweger <yDNA@perlocity.org> and Addnorya
  © 2003  Matt Wagner <matt@mysql.com> and MySQL AB