This worklog has been replaced with mariadb.org/jira

This site is here for historical purposes only. Do not add or edit tasks here!


Title:                Threadpool
Task ID:              246
Queue:
Version:              Server-5.5
Status:               Done
Priority:
Copies to:            Sergei
Created by:           Wlad, 09 Dec 2011
Supervisor:           Monty
Lead Architect:
Architecture Review:
Implementor:
Code Review:
QA:
Documentation:
 High-Level Description
Implement a dynamic thread pool that provides good performance (superior to
thread-per-connection) for short queries and many threads, and does not
block on long queries.
 Task Dependencies
 High-Level Specification
High-level overview
===================

MariaDB 5.5 will implement a dynamic threadpool. A threadpool has existed
since 5.1; the new one is a complete reimplementation, and its most
distinguishing feature is that it adapts itself to the load: the pool grows
when there is work and shrinks when there is none.
For short queries and many concurrent clients, the number of CPU-active
threads will usually be smaller than with the one-thread-per-connection
method, while still aiming to load the CPUs at 100%. The benefits of fewer
concurrently running threads are reduced lock contention inside the
database server and reduced context switching.

High-level implementation
=========================
Ideally, every operating system we support would provide thread pooling
facilities, including asynchronous IO integration. We could then rely on
the OS kernel to size the pool - it knows best how many threads should run
at any given point, when a thread is running and when it is sleeping, and
whether it makes sense to create or activate another thread from the pool.
Such a facility makes the implementation a lot smaller and more robust. In
practice, however, only Windows and recent versions of OSX provide an
integrated threadpool. The first version in MariaDB will therefore have two
different threadpool implementations - one based on the Windows
threadpooling API, and one generic implementation for Unix systems that
provide scalable IO multiplexing primitives (that is, Linux with epoll,
BSD/OSX with kevent, and Solaris with event ports). It would be great to
have an implementation based on OSX's native libdispatch, but we'll have to
delay this until we get a solid Mac system to test it on.
Why Windows gets its own implementation:
- Most importantly, it aligns with the goal of providing the best
implementation for each platform.
- It would be rather awkward to port Windows asynchronous IO to the
edge-triggered, poll-like Unix APIs, and awkward to impossible to port
shared-memory connections to those APIs (named pipes could be a bit
simpler). Shared memory is easy with the Windows threadpool, since it
provides a way to wait asynchronously for anything - not just IO
completions, but also, for example, for event handles to be signaled.

What needs to be implemented?
=============================
We implement a thread scheduler. There is already an existing framework  for 
it, there are couple of callback functions to implement - scheduler 
initialization and destruction, add_connection() callback when new  connection 
is about to login, wait_begin() and wait_end() callbacks  introduced in MySQL 
5.5 that are called whenever thread has to wait and couple of other optional 
callbacks that we won’t implement at all.
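
For orientation, the callback table looks roughly like the sketch below. It
is a simplified rendering of MySQL 5.5's scheduler_functions in
sql/scheduler.h; exact fields and their order may differ.

  class THD;                                 /* server connection context */

  struct scheduler_functions
  {
    unsigned int max_threads;
    bool (*init)(void);                        /* scheduler initialization */
    void (*add_connection)(THD *thd);          /* connection about to login */
    void (*thd_wait_begin)(THD *thd, int wait_type); /* thread begins a wait */
    void (*thd_wait_end)(THD *thd);            /* thread resumes            */
    void (*post_kill_notification)(THD *thd);  /* KILL support (optional)   */
    void (*end)(void);                         /* scheduler destruction     */
  };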

Functionality common to both threadpool implementations will be implemented
in sql/threadpool_common.cc. We need just 3 functions here:
- threadpool_add_connection(THD*) - connection login (authentication
handshake, initialization of connection structures)
- threadpool_remove_connection(THD*) - called on any error, or whenever the
client logs out (sends QUIT); closes the connection
- threadpool_process_request(THD*) - processes a single query
Each of these functions additionally takes care of setting/resetting the
thread-local storage variables of the processing thread (thd->mysys_var,
PSI thread, DBUG structures).
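
A minimal sketch of the per-query function follows. do_command() is the
server's "read and execute one command" routine; thread_attach(),
thread_detach() and start_io() are illustrative placeholders, not names
taken from this spec.

  /* Sketch only: process one client request on a pool worker thread. */
  int threadpool_process_request(THD *thd)
  {
    int error= 0;
    thread_attach(thd);         /* set thd->mysys_var, PSI thread, DBUG   */
    if (do_command(thd))
      error= 1;                 /* client error or QUIT: caller closes    */
    else
      start_io(thd);            /* re-arm async read on the client socket */
    thread_detach(thd);         /* reset thread-local state               */
    return error;
  }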


Functionality specific to each threadpool implementation:
- "Posting work" to the threadpool. Currently used to offload connection
login to the worker threads (see below).
- Starting an asynchronous read, or an asynchronous wait for socket read
readiness. This is used whenever we wait for the client to send queries.

The life of a connection, high-level perspective
================================================
- Client logs in. The "acceptor" thread accepts the connection and calls
the add_connection scheduler callback.

add_connection() posts the connection to the thread pool, and a worker
thread calls threadpool_add_connection(). The purpose of offloading
authentication and thread initialization to a worker is to avoid blocking
the single acceptor thread - if a client is unresponsive, it would
otherwise block the acceptor. After login is finished, the threadpool
starts an asynchronous read on the connection's socket.

- Client issues a query, or the client dies, or connectivity is lost. The
async IO completes (or read readiness is signaled) and one of the worker
threads handles the query. Once the query is finished and the result has
been sent back to the client, the threadpool again issues an asynchronous
read - unless a client shutdown or a communication error was detected, in
which case threadpool_remove_connection() is called to close the connection
and free the resources associated with the client.


How KILL CONNECTION is handled
==============================
Killing connections reliably while the client is idle was already tricky
without a threadpool (http://bugs.mysql.com/bug.php?id=37780), and it is
even trickier with one. The old method, used prior to MySQL 5.5, was either
to close the socket on Windows (which makes recv() come back with an
error), or to issue pthread_kill() on Unix to interrupt the recv(). Neither
works well in a threadpool environment. pthread_kill() on Unix will not
work because there is no thread stuck in recv(); there is, possibly, a
thread sitting in one of the poll() variations, polling for multiple
clients at once, and when that polling thread is interrupted it is not
immediately clear which client should be killed. Closing the socket does
not work either: neither epoll_wait() nor kevent() nor port_get() receives
a close() notification, so if only close(socket) is done, the client
remains listed in the processlist and still consumes resources. Besides,
closing the connection creates race conditions (mentioned by Davi here:
http://lists.mysql.com/commits/63632). Note, however, that since 5.5 MySQL
uses the "closesocket" method on all platforms, despite the race condition.

A better solution on Unix is not close() but shutdown(2); it does exactly
what is needed: the socket read (or write) is interrupted without closing
the socket. shutdown(SHUT_RD) breaks a recv() in progress without closing
the socket. Better yet, it also works with the different poll() variations
- the socket comes back with the EOF flag set, and a subsequent recv() will
return 0, indicating EOF.

Based on the above, we will implement a new function, vio_shutdown(), using
shutdown(2) on Unixes (see the notes about Windows below).
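
A minimal sketch of the Unix side, assuming the Vio structure exposes its
socket descriptor as 'sd' (the member name here is illustrative):

  #include <sys/socket.h>

  /* Sketch: interrupt a pending read/write without invalidating the
     descriptor; poll-style waiters see EOF on the socket. */
  int vio_shutdown(Vio *vio, int how)   /* how: SHUT_RD/SHUT_WR/SHUT_RDWR */
  {
    return shutdown(vio->sd, how);
  }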

We will use vio_shutdown() in KILL to interrupt IO in progress. This has
the effect of naturally emulating a client error: the connection behaves as
if the async IO came back and the client died. The big plus of this
solution is that no additional inter-thread synchronization is required. A
nice side effect is that it fixes the existing race condition in KILL on
Windows, even when the threadpool is not used.

vio_shutdown() on Windows.
Unfortunately, shutdown() does not interrupt IO on Windows, but there are
other ways to achieve the same effect - CancelIoEx() on Vista and later, or
a more involved method on XP: queueing an APC to the reading thread and
issuing CancelIo() inside the APC.
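
A hedged sketch of the Vista+ path (the XP/APC variant is not shown):

  /* Sketch: cancel all outstanding IO on the connection's socket.
     CancelIoEx() exists on Vista and later only. */
  static BOOL cancel_socket_io(SOCKET sock)
  {
    return CancelIoEx((HANDLE) sock, NULL);  /* NULL = cancel all requests */
  }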


How client inactivity timeout (wait_timeout) is handled
=======================================================
Neither the Windows threadpool nor any of the Unix poll variations
automatically supports a client inactivity timeout, so we need to implement
this ourselves. On Windows, we'll use a lightweight timer queue per client
(part of the Windows threadpool API). If the timer fires and there was no
activity on the connection for too long, the connection is killed.
On Unix, we will recalculate the next wait deadline for the current
connection whenever a query finishes, and store it in the connection
structure. We also maintain a global variable containing the minimum of all
next-deadline values. Once the current time exceeds this minimum, the
periodically running timer scans through all THDs looking for connections
that have been inactive for too long, and kills them. next_min_deadline is
also recomputed inside the same timer routine.
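
In pseudo-C++, the Unix check could look like the sketch below; all
identifiers except THD are illustrative, not the actual MariaDB names.

  /* Illustrative sketch of the periodic timeout scan. */
  void timeout_check(pool_timer_t *timer)
  {
    ulonglong now= current_microtime();      /* hypothetical clock helper */
    ulonglong next_min= ULONGLONG_MAX;

    mysql_mutex_lock(&LOCK_thread_count);
    I_List_iterator<THD> it(threads);
    while (THD *thd= it++)
    {
      connection_t *c= (connection_t*) thd->scheduler_data; /* hypothetical */
      if (!c)
        continue;
      if (c->abs_wait_timeout < now)
        kill_connection(thd);            /* idle longer than wait_timeout */
      else if (c->abs_wait_timeout < next_min)
        next_min= c->abs_wait_timeout;   /* track the new global minimum  */
    }
    mysql_mutex_unlock(&LOCK_thread_count);
    timer->next_timeout_check= next_min;     /* next_min_deadline */
  }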

Threadpool implementation on Unix
=================================
In the first version we will use a design similar to that of the Oracle
MySQL Enterprise threadpool. That closed-source implementation was
described in great detail in Mikael Ronström's blog series:
http://mikaelronstrom.blogspot.com/2011_10_01_archive.html
The thread pool is partitioned into groups. Each group is in itself a
complete small pool, with a listener thread (the one waiting for network
events), a work queue and worker threads. A group has the responsibility of
keeping one thread running whenever there is work to be done. More than one
thread in a group can be running, depending on circumstances (more about
this later).
Clients are assigned to groups in round-robin fashion, which keeps
(statistically) about the same number of clients per group.
Listener and worker roles are assigned dynamically. The listener can become
a worker: after waiting for network events, it can pick an event and handle
it, thus becoming a worker. Vice versa, once a worker finishes processing a
query, it can become the listener.

The benefit of dynamic listener assignment, compared to a traditional
Leader/Follower model with statically assigned roles, is that it spares a
thread wakeup and context switch, and lets a thread use its full time
quantum.
When a worker finishes processing a query, it first checks whether the
queue contains new requests. If the queue is not empty, it picks a request
from the queue and handles it. If the queue is empty, it checks whether the
group has a listener and, if not, registers itself as the listener and
waits for network events. Otherwise (the queue is empty and there already
is a listener), the worker tries to pick a single request from the network
and handle it, issuing a non-blocking epoll_wait/kevent/port_get with a
timeout value of 0. This is yet another optimization designed to keep a
running thread running (preventing the switch of the thread state from
running to sleeping, and a subsequent wakeup).
However, a worker will not pick a request from the queue or from the
network if the group is already "overcommitted" - i.e. there are too many
(4+) active threads. In this case the worker will either become the
listener, if there is none currently, or add itself to the sleep queue and
sleep.
Finally, if there is definitely no work to be done, the thread registers
itself in the LIFO list of waiting threads and sleeps on its
per-worker-thread condition variable, waiting for a wakeup. If no wakeup
comes within the timeout (default 1 minute), meaning there is no work to be
done, the worker thread exits. The whole event-fetch logic is sketched
below.
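
Putting these rules together, a worker's fetch-next-work loop might look
roughly like this sketch; all identifiers are illustrative, and the
overcommit threshold of 4 is the one named above.

  /* Illustrative sketch: fetch the next connection with work to do,
     or NULL if the worker should retire. */
  connection_t *get_event(worker_t *worker, group_t *group)
  {
    for (;;)
    {
      connection_t *c;
      mysql_mutex_lock(&group->mutex);

      if (group->active_thread_count < 4)       /* not overcommitted */
      {
        if ((c= queue_get(group)))              /* 1. drain the work queue */
        {
          mysql_mutex_unlock(&group->mutex);
          return c;
        }
        if (!group->listener)                   /* 2. become the listener */
        {
          group->listener= worker;
          mysql_mutex_unlock(&group->mutex);
          c= wait_for_event(group);     /* blocking epoll/kevent/port_get */
          if (c)
            return c;
          continue;
        }
        mysql_mutex_unlock(&group->mutex);
        if ((c= poll_event_nowait(group)))      /* 3. non-blocking peek,  */
          return c;                             /*    timeout of 0        */
        mysql_mutex_lock(&group->mutex);
      }

      /* 4. No work, or group overcommitted: sleep in LIFO order. */
      lifo_push(&group->waiting_threads, worker);
      int err= worker_timedwait(worker, group); /* ~1 minute idle timeout */
      mysql_mutex_unlock(&group->mutex);
      if (err == ETIMEDOUT)
        return NULL;                            /* retire: thread exits */
    }
  }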

Sleeping threads are woken, or new workers created, in one of the
following cases:
- The listener wakes a sleeping thread when it populates the work queue and
decides not to pick an event itself.
- A worker thread is executing a request and needs to wait. The scheduler's
wait_begin() callback decrements the number of currently active threads in
the group; if it drops to zero, it either wakes a sleeping thread or
creates a new one.
- Threads can be woken (or created) inside a periodic timer routine that
checks whether the currently active thread has spent too much time
executing work while the group was left without a listener, or while
requests have been sitting in the queue for too long.
New thread creation is throttled, i.e. a new thread is not always created
at exactly the moment we detect a shortage. Only while the number of
threads in the group is small are new ones created without delay. Once the
group grows, we create threads with a short delay, to prevent having too
many threads in the pool. The more threads are currently in the group, the
longer the throttling period becomes.
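
As an illustration only - the thresholds below are made up, not taken from
this spec - the throttling delay could scale with group size like this:

  /* Illustrative only: delay before allowing one more thread. */
  static ulonglong creation_throttle_usec(int threads_in_group)
  {
    if (threads_in_group < 4)   return 0;           /* create immediately */
    if (threads_in_group < 8)   return 50 * 1000;   /* 50 ms  */
    if (threads_in_group < 16)  return 100 * 1000;  /* 100 ms */
    return 200 * 1000;                              /* 200 ms */
  }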

Threadpool implementation on Windows
====================================
The Windows implementation is simpler than the Unix one, since it is the OS
that takes over the tasks of creating and waking threads, queueing work,
and managing the threadpool size. There are a couple of tweaks, however,
that are worth noting.
Every MySQL thread that uses mysys functions needs some thread-local
storage structures set up upon thread creation: my_thread_init() is called
whenever a new thread is started, and my_thread_end() does the cleanup
prior to thread exit. The Windows threadpool creates threads transparently
to the user, so we will use a fiber-local storage variable (basically TLS
with a destructor) to ensure that every thread used calls my_thread_init()
exactly once. The FLS destructor will call my_thread_end(), ensuring proper
cleanup once the Windows threadpool decides to shut down a thread.
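
A minimal sketch of the FLS trick, with error handling omitted
(my_thread_init()/my_thread_end() are the mysys calls; everything else is
illustrative):

  #include <windows.h>

  static DWORD fls_key;

  /* Runs when the Windows threadpool retires a thread. */
  static VOID WINAPI thread_destructor(PVOID data)
  {
    if (data)
      my_thread_end();
  }

  void tp_init_fls()                 /* called once, at scheduler init */
  {
    fls_key= FlsAlloc(thread_destructor);
  }

  void tp_ensure_thread_init()       /* called on entry to every callback */
  {
    if (!FlsGetValue(fls_key))
    {
      my_thread_init();
      FlsSetValue(fls_key, (PVOID) 1);   /* mark thread as initialized */
    }
  }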
It is worth noting that there will be no threadpool on XP/2003, since the
new threadpool API does not exist on XP. However, we still need to ensure
that the server starts on XP even without the threadpool, and this means we
cannot use the threadpool API directly - otherwise the executable would
have unresolved symbols and refuse to start. Therefore, all required
functions are loaded at runtime using
GetProcAddress(GetModuleHandle("kernel32"), function_name). A similar
technique is already used in several other places (e.g. to use Vista
condition variables and reader-writer locks).
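
The loading pattern, sketched here with CreateThreadpoolWork() as an
example import (the wrapper names are illustrative):

  /* Sketch: resolve Vista threadpool entry points at runtime, so the
     binary still loads on XP/2003 (where the lookup returns NULL). */
  typedef PTP_WORK (WINAPI *CreateThreadpoolWork_t)
      (PTP_WORK_CALLBACK, PVOID, PTP_CALLBACK_ENVIRON);

  static CreateThreadpoolWork_t my_CreateThreadpoolWork;

  BOOL tp_load_api()
  {
    my_CreateThreadpoolWork= (CreateThreadpoolWork_t)
      GetProcAddress(GetModuleHandleA("kernel32"), "CreateThreadpoolWork");
    return my_CreateThreadpoolWork != NULL;      /* FALSE on XP/2003 */
  }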

Using threadpool
================
Setting the scheduler.
thread_handling=pool-of-threads
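
For example, in the server configuration file:

  [mysqld]
  thread_handling=pool-of-threads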


We provide several variables that tweak threadpool behavior (generally,
however, there should be no need to tweak them, except perhaps
thread_pool_size on Unix):
- thread_pool_size (Unix only) - number of thread groups.
Minimum value is 1, maximum value is 128, and the default is the number of
CPUs on the machine as the OS reports it (which can count cores and
hyperthreads). thread_pool_size is roughly equivalent to the maximum number
of threads that should be actively running at the same time. However,
various factors can make the actual number of threads running at the same
time a little higher or lower than this value.
- thread_pool_stall_limit (Unix only) - the interval, in milliseconds, for
the periodic timer that checks for thread stalls (and possibly wakes or
creates new threads if a stall is detected). Default value is 500.
- thread_pool_idle_timeout (Unix only) - maximum time, in seconds, after
which a sleeping worker thread will retire, i.e. exit.
- thread_pool_min_threads (Windows only) - minimum number of threads in the
pool. Default is 0. The actual work is done by SetThreadpoolThreadMinimum():
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686268(v=vs.85).aspx
- thread_pool_max_threads (Windows and Unix) - maximum number of threads in
the threadpool. New threads are not created once this limit is reached. On
Windows, the actual work is done by SetThreadpoolThreadMaximum():
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686266(v=vs.85).aspx

Status variables
================
There are 2 status variables introduced by the thread pool:
Threadpool_threads - total number of threads in the threadpool.
Threadpool_idle_threads - number of threads in the sleep queues. On
Windows it is always 0.

Running benchmarks
==================
When running sysbench, or other benchmarks that create many threads, on the
same machine as the server, it is advisable to run the benchmark driver and
the server on different CPUs to get realistic results. Running many driver
threads and only a few server threads on the same CPUs has the effect that
the OS scheduler will run the benchmark driver threads with much higher
probability than the server threads - that is, the driver will preempt the
server.
Use "taskset -pc" on Linux, and "start /affinity" on Windows, to separate
the benchmark driver and server CPUs.
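
For example, on a hypothetical 16-CPU Linux machine, one could pin the
running server to CPUs 0-11 and the sysbench driver to CPUs 12-15:

  taskset -pc 0-11 $(pidof mysqld)   # move the server to CPUs 0-11
  taskset -c 12-15 sysbench ...      # start the driver on CPUs 12-15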

Comparison with MySQL enterprise threadpool
===========================================
Here are the downsides of Oracle's implementation:
- MySQL/Oracle uses the efficient epoll() on Linux, and the inefficient
poll() everywhere else.
- Client login: login just blocks the acceptor thread, and network IO is
done synchronously, which means a slow client will prevent logins of other
clients.
- wait_timeout does not work with the MySQL Enterprise threadpool.
- KILL seems to have a high overhead and to require careful locking, due to
an inefficient implementation.


Here are features Oracle has and we don't:
- Restricting the number of concurrent transactions
(http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-limiting-number-
of_21.html). The implementation lowers the priority of a connection once a
transaction starts (after the "BEGIN" statement). I'm not sure what exact
problem this method solves, and there is no scientific explanation; perhaps
it was the kernel_mutex problem, solved in subsequent versions of MySQL. So
far I have to assume this was merely a trick to get a better sysbench graph.

- There is an information_schema table for the threadpool, with extensive
information about the internals.

Here are features that we have and Oracle does not:
- Restricting the maximum number of pool threads.

Threadpool parameter differences
================================
Available in MySQL, but not in MariaDB:

thread_pool_prio_kickup_timer - related to restricting the concurrent
transaction count
thread_pool_high_priority_connection - related to restricting the
concurrent transaction count
thread_pool_max_unused_threads - specific to the MySQL Enterprise
Threadpool thread retirement logic; in MariaDB, use thread_pool_idle_timeout
thread_pool_algorithm - magic pixie dust that could likely be used to
improve the published sysbench results. No explanation is provided of what
it actually does.

Available in MariaDB, but not in the MySQL Enterprise Threadpool:

thread_pool_idle_timeout - specific to the MariaDB thread retirement logic
thread_pool_max_threads - maximum number of threadpool worker threads

Works differently in MySQL and MariaDB:

thread_pool_stall_limit - in the MySQL Enterprise threadpool, the unit of
measurement is 10 ms, so that a value of 6 means 60 ms. In MariaDB, the
unit of measurement is 1 ms (for no other reason than using common units of
measurement).
 Low-Level Design
 File Attachments
 User Comments
 Time Estimates
Name        Hours Worked   Last Updated
Total       0

            Hrs Worked   Progress   Current   Original
This Task   0                       300       300
Total       0                       300       300
 
 Funding and Votes
Votes: 1: 0%
Funding: 0 offers, total 0 Euro
 Progress Reports
(Wlad - Sat, 14 Jan 2012, 03:13)

High-Level Specification modified.

(Wlad - Sat, 14 Jan 2012, 03:11)

High-Level Specification modified.

(Sergei - Sat, 10 Dec 2011, 08:48)

Observers changed: Sergei

(Wlad - Fri, 09 Dec 2011, 22:33)

High-Level Specification modified.

(Wlad - Fri, 09 Dec 2011, 22:32)

Version updated: Maria-2.0 -> Server-5.5

(Wlad - Fri, 09 Dec 2011, 22:32)

Supervisor updated: -> Monty

(Wlad - Fri, 09 Dec 2011, 22:31)

High-Level Specification modified.

