This worklog has been replaced with mariadb.org/jira

This site is here for historical purposes only. Do not add or edit tasks here!

 
 
 

WorkLog Frontpage Log in / Register
High-Level Description | Task Dependencies | High-Level Specification | Low-Level Design | File Attachments | User Comments | Time Estimates | Funding and Votes | Progress Reports

 Add DS-MRR support for clustered primary keys
Title
Task ID121
Queue
Version N/A
Status
PriorityN/A
Copies to
Created byPsergey12 Jun 2010Done
Supervisor N/A  
Lead Architect    
Architecture Review  
Implementor  
Code Review  
QA  
Documentation  
 High-Level Description
Currently, DS-MRR doesn't allow to do MRR scans over clustered primary keys. The
reason for this is that
 - Clustered primary keys are stored in disk order, so, if the sequence of
   scanned ranges is ordered, the reads will automatically happen in disk
   order, and DS-MRR's step of re-ordering reads is redundant.
 - Within DS-MRR implementation, the "get rowids from keys" step is not
   necessary when using clustered primary key, because in InnoDB/XtraDB
   clustered primary key values are the rowids.

However, when MRR calls are made by BKA, there are cases where DS-MRR over
clustered primary key is beneficial:

* BKA may provide lookup keys that have duplicates and/or are in arbitrary
  order. In that case, DS-MRR implementation may sort the key values and 
  order them, so that it hits the disk in key(=disk) order.

* When running multi-table join with high @@join_cache_level value (and so,
  linked join buffers), lack of MRR implementation causes the chain of linked
  join buffers to break. (TODO and so what? Is that really a problem?)

* TODO anything else?

This WL entry is about addressing the above by adding support of DS-MRR over
clustered primary key that would work according to this strategy:
1. Sort incoming keys 
2. Use the sorted keys to do a disk-ordered retrieval.

 Task Dependencies
Others waiting for Task 121Task 121 is waiting forGraph
91 MariaDB 5.3
125 Make DS-MRR sort the ranges before scanning the index
 
 High-Level Specification
1. Choices to be made
---------------------

1.1 Handling of complex ranges
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The "sort incoming keys" part is easy when we have only equality ranges.
If we allow ranges of arbitrary form (including ranges with one endpoint
being infinity and/or ranges overlapping with one another), sorting becomes
non-trivial. Do we need to support this case or support only equality ranges?

Decision: the new code should handle only the case with equality ranges.
For non-equality ranges, the execution will proceed as before.

1.2 Handling index prefix scans
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
What do we do if asked to do a scan on a prefix of clustered PK? 

Decision: handle this if the ranges are equality ranges. The difference from
scan on full primary key is that 
- we will have to use read_range_XXX() or index_read()/index_next_same() 
  functions, while for full primary key value we could have used rnd_pos().
- One equality range can produce multiple matching records.

1.3 Use of knowledge that primary_key==rowid
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Can/should we use the fact rowid=={clustered PK value} for InnoDB's clustered
PKs? 
Decision: don't make this assumption.


2. Code-level changes overview
------------------------------

DsMrr_impl::choose_mrr_impl():
Enable MRR when 
  - ihis is a clustered primary key
  - incoming ranges are single-point (HA_MRR_SINGLE_POINT is set)
     - will need to make the SQL layer to set this flag
  - incoming ranges are not already sorted (HA_MRR_SORTED is not set)

(TODO do we need new cost formula?)

DsMrr_impl::dsmrr_init() 
 - different elem_size (not rowid length but key tuple length)
 - don't create the secondary handler object, we won't need it.

DsMrr_impl::dsmrr_fill_buffer():
 - need a variant of this function that would not access the index but just
   fill and sort the array.

DsMrr_impl::dsmrr_next():
 - should abstract-out:
   - buffer element size
   - rnd_pos/index_read call.
      - Also for CPK prefix scans there can be multi
 Low-Level Design
 File Attachments
 NameTypeSizeByDate
 User Comments
 Time Estimates
NameHours WorkedLast Updated
Total0 
 Hrs WorkedProgressCurrentOriginal
Total000
 
 Funding and Votes
Votes: 1: 100%
 Change vote: Useless    Nice to have    Important    Very important    

Funding: 0 offers, total 0 Euro
 Progress Reports
(Psergey - Sat, 09 Oct 2010, 10:58
    
Dependency created: WL#125 now depends on WL#121

(Psergey - Sat, 09 Oct 2010, 10:55
    
Dependency created: WL#91 now depends on WL#121

(Sergei - Thu, 01 Jul 2010, 05:48
    
Category updated.
--- /tmp/wklog.121.old.27911	2010-07-01 05:48:16.000000000 +0000
+++ /tmp/wklog.121.new.27911	2010-07-01 05:48:16.000000000 +0000
@@ -1 +1 @@
-Client-BackLog
+Server-RawIdeaBin

(Sergei - Thu, 01 Jul 2010, 05:48
    
Version updated.
--- /tmp/wklog.121.old.27911	2010-07-01 05:48:16.000000000 +0000
+++ /tmp/wklog.121.new.27911	2010-07-01 05:48:16.000000000 +0000
@@ -1 +1 @@
-Benchmarks-3.0
+9.x

(Sergei - Thu, 01 Jul 2010, 05:48
    
Version updated.
--- /tmp/wklog.121.old.27911	2010-07-01 05:48:16.000000000 +0000
+++ /tmp/wklog.121.new.27911	2010-07-01 05:48:16.000000000 +0000
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x

(Psergey - Sun, 13 Jun 2010, 11:56
    
High-Level Specification modified.
--- /tmp/wklog.121.old.9396	2010-06-13 11:56:34.000000000 +0000
+++ /tmp/wklog.121.new.9396	2010-06-13 11:56:34.000000000 +0000
@@ -1,17 +1,55 @@
-Basic idea: DS-MRR scan should be done as follows:
+1. Choices to be made
+---------------------
 
-1. Sort incoming keys 
-2. Use the sorted keys to do a disk-ordered retrieval
+1.1 Handling of complex ranges
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The "sort incoming keys" part is easy when we have only equality ranges.
+If we allow ranges of arbitrary form (including ranges with one endpoint
+being infinity and/or ranges overlapping with one another), sorting becomes
+non-trivial. Do we need to support this case or support only equality ranges?
 
-Unresolved questions:
+Decision: the new code should handle only the case with equality ranges.
+For non-equality ranges, the execution will proceed as before.
 
-* The "sort incoming keys" part is trivial when we have only equality ranges.
-  If we allow ranges of arbitrary form ( including ranges with one endpoint 
-  being infinity or ranges overlapping with one another), sorting becomes
-  non-trivial. Do we need to support this case or support only equality ranges?
+1.2 Handling index prefix scans
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+What do we do if asked to do a scan on a prefix of clustered PK? 
 
-* Can/should we use the fact rowid=={clustered PK value} for InnoDB's clustered
-  PKs? (current decision: No)
+Decision: handle this if the ranges are equality ranges. The difference from
+scan on full primary key is that 
+- we will have to use read_range_XXX() or index_read()/index_next_same() 
+  functions, while for full primary key value we could have used rnd_pos().
+- One equality range can produce multiple matching records.
 
-* Do we support scanning on a prefix of clustered PK? (Yes but scan on prefix 
-  of clustered PK will not be in disk order. We need to run it with regular mode)
+1.3 Use of knowledge that primary_key==rowid
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Can/should we use the fact rowid=={clustered PK value} for InnoDB's clustered
+PKs? 
+Decision: don't make this assumption.
+
+
+2. Code-level changes overview
+------------------------------
+
+DsMrr_impl::choose_mrr_impl():
+Enable MRR when 
+  - ihis is a clustered primary key
+  - incoming ranges are single-point (HA_MRR_SINGLE_POINT is set)
+     - will need to make the SQL layer to set this flag
+  - incoming ranges are not already sorted (HA_MRR_SORTED is not set)
+
+(TODO do we need new cost formula?)
+
+DsMrr_impl::dsmrr_init() 
+ - different elem_size (not rowid length but key tuple length)
+ - don't create the secondary handler object, we won't need it.
+
+DsMrr_impl::dsmrr_fill_buffer():
+ - need a variant of this function that would not access the index but just
+   fill and sort the array.
+
+DsMrr_impl::dsmrr_next():
+ - should abstract-out:
+   - buffer element size
+   - rnd_pos/index_read call.
+      - Also for CPK prefix scans there can be multi

(Psergey - Sun, 13 Jun 2010, 11:55
    
High Level Description modified.
--- /tmp/wklog.121.old.9380	2010-06-13 11:55:42.000000000 +0000
+++ /tmp/wklog.121.new.9380	2010-06-13 11:55:42.000000000 +0000
@@ -1,18 +1,18 @@
-Currently, DS-MRR doesn't support operation over clustered primary keys. The
-reason for this was that
- - Clustered primary keys are stored in disk order and so, if the sequence of 
-   ranges is ordered, the reads will already go in disk order (and so DS-MRR's
-   step of re-ordering reads is not necessary).
+Currently, DS-MRR doesn't allow to do MRR scans over clustered primary keys. The
+reason for this is that
+ - Clustered primary keys are stored in disk order, so, if the sequence of
+   scanned ranges is ordered, the reads will automatically happen in disk
+   order, and DS-MRR's step of re-ordering reads is redundant.
  - Within DS-MRR implementation, the "get rowids from keys" step is not
    necessary when using clustered primary key, because in InnoDB/XtraDB 
    clustered primary key values are the rowids.
 
-However, with BKA making the MRR calls, there are cases where DS-MRR over
+However, when MRR calls are made by BKA, there are cases where DS-MRR over
 clustered primary key is beneficial:
 
 * BKA may provide lookup keys that have duplicates and/or are in arbitrary
   order. In that case, DS-MRR implementation may sort the key values and 
-  order them, so that it hits the disk in key order.
+  order them, so that it hits the disk in key(=disk) order.
 
 * When running multi-table join with high @@join_cache_level value (and so,
   linked join buffers), lack of MRR implementation causes the chain of linked
@@ -20,3 +20,9 @@
 
 * TODO anything else?
 
+This WL entry is about addressing the above by adding support of DS-MRR over
+clustered primary key that would work according to this strategy:
+1. Sort incoming keys 
+2. Use the sorted keys to do a disk-ordered retrieval.
+
+

(Psergey - Sun, 13 Jun 2010, 09:42
    
High-Level Specification modified.
--- /tmp/wklog.121.old.5009	2010-06-13 09:42:38.000000000 +0000
+++ /tmp/wklog.121.new.5009	2010-06-13 09:42:38.000000000 +0000
@@ -6,11 +6,12 @@
 Unresolved questions:
 
 * The "sort incoming keys" part is trivial when we have only equality ranges.
-  If we allow ranges of arbitrary form ( ncluding ranges with one endpoint 
+  If we allow ranges of arbitrary form ( including ranges with one endpoint 
   being infinity or ranges overlapping with one another), sorting becomes
-  non-trival. Do we need to support this case or support only equality ranges?
+  non-trivial. Do we need to support this case or support only equality ranges?
 
 * Can/should we use the fact rowid=={clustered PK value} for InnoDB's clustered
-  PKs?
+  PKs? (current decision: No)
 
-* Do we support scanning on a prefix of clustered PK? (seems to be yes?)
+* Do we support scanning on a prefix of clustered PK? (Yes but scan on prefix 
+  of clustered PK will not be in disk order. We need to run it with regular mode)

(Psergey - Sat, 12 Jun 2010, 08:39
    
High-Level Specification modified.
--- /tmp/wklog.121.old.538	2010-06-12 08:39:46.000000000 +0000
+++ /tmp/wklog.121.new.538	2010-06-12 08:39:46.000000000 +0000
@@ -1 +1,16 @@
+Basic idea: DS-MRR scan should be done as follows:
 
+1. Sort incoming keys 
+2. Use the sorted keys to do a disk-ordered retrieval
+
+Unresolved questions:
+
+* The "sort incoming keys" part is trivial when we have only equality ranges.
+  If we allow ranges of arbitrary form ( ncluding ranges with one endpoint 
+  being infinity or ranges overlapping with one another), sorting becomes
+  non-trival. Do we need to support this case or support only equality ranges?
+
+* Can/should we use the fact rowid=={clustered PK value} for InnoDB's clustered
+  PKs?
+
+* Do we support scanning on a prefix of clustered PK? (seems to be yes?)


Report Generator:
 
Saved Reports:

WorkLog v4.0.0
  © 2010  Sergei Golubchik and Monty Program AB
  © 2004  Andrew Sweger <yDNA@perlocity.org> and Addnorya
  © 2003  Matt Wagner <matt@mysql.com> and MySQL AB