In MySQL 8.0 we are improving the Performance Schema to support indexes. This increases the flexibility and ease of use of using performance_schema, as a number of monitoring queries will see a dramatic speed up in performance.
Background
Since it’s initial launch in MySQL 5.5, the Performance Schema has grown to 93 tables, with some of these exposing a lot of data. To use an example; consider a busy system with 1000 sessions connected (either working or idle) and a DBA performing a mysqldump or restore operation. A typical way to monitor this would be to measure the table I/O that this activity generates:
This can generate a massive amount of data! In MySQL 5.7, this query would return 300K rows to the optimizer, which would apply the WHERE clause to discard all rows except one.
As any DBA knows, this access pattern is inefficient. You need indexes!
Changes in MySQL 8.0
In MySQL 8.0, performance_schema tables are now indexed to speed up data retrieval.
For this above, instead of inspecting 1,000 threads and extract statistics for 300 instruments for every single thread, the performance schema will now search for the target thread (among the 1,000 present), and search only for the target wait statistic (wait/io/table/sql/handler here) in this thread, avoiding the need to evaluate everything else, to return only the row needed to the optimizer.
This is just an example that uses the following index:
A total of 115 indexes have been added in the performance schema in MySQL 8.0.0, to support better data access patterns in general.
This work was possible because the performance schema design, from its creation way back in MySQL 5.5, choose to expose the data as a storage engine instead of temporary tables, unlike the INFORMATION_SCHEMA. Only storage engines can expose indexes, that the optimizer can use.
Additionally, as data access to the performance schema uses the same interface as data access to regular tables, it is able to benefit from future improvements to the query optimizer (better query plans and access methods), and reduces our maintenance burden. This is again the key difference with the INFORMATION_SCHEMA implementation where dedicated code (sql/sql_show.cc) tends not to get the same level of attention, and does not benefit from general server improvements.
What does it cost?
Some readers might also have noted that the performance_schema tables are highly volatile, with data written to it continuously, while read only a few times, typically once in a while by a monitoring application. A table used in a workload which is write intensive with only a few reads is a red flag when it comes to add an index, as the cost of maintaining the index is order of magnitudes higher that the benefits.
This raises the obvious question:
Question: How much overhead was just added by this new feature?
Answer: Absolutely zero
Answer: Absolutely zero
How is that even possible!??
Well, here is the trick – The performance schema plainly lies.
Well, here is the trick – The performance schema plainly lies.
It does — not — maintain a physical index internally, be it on file or memory.
It does, however, — pretend — to the optimizer that it has indexes, so that the optimizer is coerced into using the most efficient access pattern.
It does, however, — pretend — to the optimizer that it has indexes, so that the optimizer is coerced into using the most efficient access pattern.
The keyword here is “access pattern”: the optimizer provides more information about what to look for (namely, which index and key value to use), which allows the performance schema implementation to narrow down the search for data, and improve efficiency.
Practical example using sys.session
The performance improvements from indexes can be very easily seen in many of the sys schema queries. With 1000 idle threads, the query SELECT * FROM sys.session drops from 34.70 seconds down to 1.01 seconds (a 30x improvement!):
As you can see in the output of EXPLAIN, with the addition of indexes the execution plan also changes:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
|
mysql 8.0> EXPLAIN SELECT * FROM sys.session;
+----+-------------+------------------+-+--------+-+-------------+-+---------------------------------------------+--------------+-+
| id | select_type | table | | type | | key | | ref | rows | |
+----+-------------+------------------+-+--------+-+-------------+-+---------------------------------------------+--------------+-+
| 1 | PRIMARY | <derived3> | | ALL | | NULL | | NULL | 627572736000 | |
| 3 | DERIVED | pps | | ALL | | NULL | | NULL | 1280 | |
| 3 | DERIVED | ewc | | ref | | PRIMARY | | performance_schema.pps.THREAD_ID | 76 | |
| 3 | DERIVED | estc | | ref | | PRIMARY | | performance_schema.pps.THREAD_ID | 12 | |
| 3 | DERIVED | esc | | ref | | PRIMARY | | performance_schema.pps.THREAD_ID | 128 | |
| 3 | DERIVED | etc | | ref | | PRIMARY | | performance_schema.pps.THREAD_ID | 12 | |
| 3 | DERIVED | <derived4> | | ref | | <auto_key0> | | performance_schema.pps.THREAD_ID | 350 | |
| 3 | DERIVED | conattr_pid | | eq_ref | | PRIMARY | | performance_schema.pps.PROCESSLIST_ID,const | 1 | |
| 3 | DERIVED | conattr_progname | | eq_ref | | PRIMARY | | performance_schema.pps.PROCESSLIST_ID,const | 1 | |
| 4 | DERIVED | mt | | ALL | | NULL | | NULL | 448000 | |
| 4 | DERIVED | t | | eq_ref | | PRIMARY | | performance_schema.mt.THREAD_ID | 1 | |
+----+-------------+------------------+-+--------+-+-------------+-+---------------------------------------------+--------------+-+
11 rows in set, 1 warning (0.00 sec)
|
Conclusion
We look forward to you trying out the new support for indexes in Performance Schema available in MySQL 8.0.0. Thank you for using MySQL!
No comments:
Post a Comment