Tuesday, 17 September 2019

Mysqldump​ vs Mysqlpump vs Mydumper

It is very important for every DBA’s to backup their data frequently so that they can recover data if any problems occur such as Hardware failure, System crashes, Human mistakes. There are several ways to backup MySQL data.
They are
1) Physical Backup
2) Logical Backup
Physical Backup:
              It is also called Raw Backup. It copies the directories and its files that has database contents.
Logical Backup:
            It is also called Text Backup. It converts all the available rows into single/multiple inserts and also contains create statements to create databases, tables, triggers and so on,
This blog is to discuss logical backups in MySQL and its available utilities.
List of utilities available for logical backup
  1. Mysqldump
  2. Mysqlpump
  3. Mydumper
  4.  
Mysqldump:
The definition for mysqldump given in manual page
“The mysqldump client utility performs logical backups, producing a set of SQL statements that can be executed to reproduce the original database object definitions and table data. It dumps one or more MySQL databases for backup or transfers to another SQL server. The mysqldump command can also generate output in CSV, other delimited text, or XML format.”
It is very easy to use and syntax is also very simple. When we execute this, it converts all rows to SQL queries and print that in standard output. We can redirect this output to an outfile(>) or using an option — result-file.
For Example, I have installed MySQL 5.7 in the local machine(512MB RAM and 2 core) with standard configuration and loaded around 10 million records using Sysbench. Let’s see how long will it take to take backup and restore using dump. I have used time command in Linux to calculate real execution process for the command to complete.
[root@ansible ~]# time mysqldump -uaakash -p --all-databases > dump.sql
Enter password:
real 1m39.233s
user 0m34.673s
sys 0m10.288s
It took nearly 1 minute 39 seconds to take a dump of total databases size 2GB. You can say it is less than 2 minutes and it is very fast, But just consider the same for a database with 100GB data /more and just consider you have only way to go with logical backup at that time.
By default, dump does not dump information_schema tables by default and it never takes performance_schema
The major drawback of mysqldump is it uses only one thread while taking backup and restore. (Even your server has 64 cores). We will see how long it takes for restoration.
[root@ansible1 ~]# time mysql -uaakash -p < dump.sql
Enter password:
real 4m24.302s
user 0m13.845s
sys 0m11.962s
REPORT THIS AD

It states that restoration takes more time than taking a backup (To restore faster, we can go for physical backup). It took more than 4 minutes for restoring 2GB data.
Mysqlpump:
It is another utility which was introduced in MySQL 5.7 with a few more features when compared with mysqldump
The definition for mysqlpump given in manual page
“The mysqlpump client utility performs logical backups, producing a set of SQL statements that can be executed to reproduce the original database object definitions and table data. It dumps one or more MySQL databases for backup or transfers to another SQL server.”
The major features are
  •  Parallel processing (multi threaded) to speed up the backup
  •  Progress Indicator
  • Dumping of user accounts as (create, grant statements not as inserts for MySQL system database)
By Default, mysqlpump does not take backup of system databases such as information schema, performance schema, and some MySQL system tables unless it mentioned explicitly. It also have options –include-database–exclude-database, –include-table–exclude-table with pattern matching(%). These options are more convenient for users who want to backup only a few objects from an entire dataset. In general, mysqlpump divides its work into chunks and each is assigned to a multi-threaded queue. This multithreaded queue is processed by N threads (By default it uses 2 threads). We can define a number of threads by using this option –default-parallelism and –parallel-schemas.
For Example:
[root@mydbops ~]# time mysqlpump --user=aakash --password --default-parallelism=2 > pump.sql
Enter password:
Dump progress: 1/7 tables, 0/3295 rows
Dump progress: 19/119 tables, 182212/9867560 rows
Dump progress: 23/119 tables, 422462/9867560 rows
.
..
...
Dump progress: 117/119 tables, 9841712/9867560 rows
Dump progress: 118/119 tables, 9925712/9867560 rows
Dump completed in 44981 milliseconds
real 0m48.493s
user 0m31.997s
sys 0m8.400s
REPORT THIS AD

It took a total of 48 seconds to take a dump of the entire database(same data as used for mysqldump) and also it shows its progress which will be really helpful to know how much backup completes and it is giving time elapsed to take backup at end.
As I said earlier, default parallelism depends on the number of cores in the server. Roughly increasing the value won’t be helpful. (My machine’s core is 2)
[root@ansible ~]# time mysqlpump --user=aakash --password --default-parallelism=4 > pump.sql
Enter password:
Dump progress: 1/6 tables, 0/2531 rows
Dump progress: 19/119 tables, 204462/9867560 rows
.
..
...
Dump progress: 117/119 tables, 9886712/9867560 rows
Dump completed in 47271 milliseconds
real 0m51.241s
user 0m32.629s
sys 0m9.621s
Pump lacks a synchronization point during the initial release (–single-transaction) before MySQL 5.7.11 while restoration, it uses a single thread which is the same as mysqldump , it lacks parallel restore.
[root@mydbops ~]# time mysql -uaakash -p < /home/vagrant/pump.sql
Enter password:
real 5m32.835s
user 0m11.700s
sys 0m12.872s
Mydumper/Myloader:


This is the utility which will be more useful to take backup/restore very big tables. Here I am using it to take backup of 2GB dataset just to show how it differs from mysqldump and mysqlpump. Generally, it will be useful for very big tables not for the smaller ones.Mydumper/Myloader is not available by default. We need to install it on our own.
By default, dumper will split its work by creating a parallel process. It also depends on the number of cores on the machine (same like mysqlpump). Here we can use –rows options to take backup of very big tables. It will split the big table into smaller chunks and write each chunk to a separate file which makes it process faster.
It also has an option –exclude-database and –trx-consistency-only to support regular expressions and less locking of InnoDB tables respectively. We have –verbose option to know what dumper is doing.
Example
[root@ansible mydumper-0.9.1]# time ./mydumper -t 2 --rows=50000 -v 3 --outputdir /root/backup/ --log file /root/mydumper.log
real 0m50.926s
user 0m9.156s
sys 0m4.744s
Some of the files in the backup folder
[root@ansible backup]# ls -lrth | head -n 10
total 1.9G
-rw-r--r--. 1 root root   66 Mar 13 19:31 mysql-schema-create.sql
-rw-r--r--. 1 root root   67 Mar 13 19:31 sbtest-schema-create.sql
-rw-r--r--. 1 root root   68 Mar 13 19:31 sbtest1-schema-create.sql
-rw-r--r--. 1 root root   68 Mar 13 19:31 sbtest2-schema-create.sql
-rw-r--r--. 1 root root   68 Mar 13 19:31 sbtest3-schema-create.sql
-rw-r--r--. 1 root root   68 Mar 13 19:31 sbtest4-schema-create.sql
-rw-r--r--. 1 root root   68 Mar 13 19:31 sbtest5-schema-create.sql
-rw-r--r--. 1 root root   68 Mar 13 19:31 sbtest6-schema-create.sql
-rw-r--r--. 1 root root   68 Mar 13 19:31 sbtest7-schema-create.sql

Metadata file:
When the backup is running
[root@ansible backup]# cat metadata.partial
Started dump at: 2019-03-13 19:31:02
When Backup completes
[root@ansible backup]# cat metadata
Started dump at: 2019-03-13 19:31:02
Finished dump at: 2019-03-13 19:31:53
And logfile representing work done by each thread
2019-03-13 19:40:13 [INFO] - Thread 2 dumping data for `sbtest`.`sbtest4` where `id` IS NULL OR (`id` >= 1 AND `id` < 100001)

2019-03-13 19:40:13 [INFO] - Thread 1 dumping data for `sbtest`.`sbtest5` where `id` IS NULL OR (`id` >= 1 AND `id` < 100001)

2019-03-13 19:40:13 [INFO] - Thread 2 dumping data for `sbtest`.`sbtest6` where `id` IS NULL OR (`id` >= 1 AND `id` < 100001)

2019-03-13 19:40:13 [INFO] - Thread 1 dumping data for `sbtest`.`sbtest7` where `id` IS NULL OR (`id` >= 1 AND `id` < 100001)
If the option –logfile is not mentioned, it will throw the stats to the standard output.
When compared with other tools, dumper gives more info and it has some more options to make the backup faster. The utilities like mysqldump/mysqlpump use a single thread at the time of data restoration ( backup recovery). But mydumper is different from others with restoration. It has an additional toolset called myloader and it comes along with mydumper package ( No additional installation required). Myloader uses multiple threads to speed up the data restoration/recovery process.
[root@ansible1 mydumper-0.9.1]# time ./myloader --threads=2 -d /home/vagrant/backup/ -q 100 -o -v 3

** Message: 19:50:25.682: 2 threads created

** Message: 19:50:25.683: Dropping table or view (if exists) `mysql`.`columns_priv`

** Message: 19:50:25.697: Creating table `mysql`.`columns_priv`

** Message: 19:50:25.706: Dropping table or view (if exists) `mysql`.`db`

** Message: 19:50:25.709: Creating table `mysql`.`db`

** Message: 19:50:25.715: Dropping table or view (if exists) `mysql`.`engine_cost`

.

..



** Message: 19:52:41.846: Dropping table or view (if exists) `sys`.`x$waits_by_user_by_latency`

** Message: 19:52:41.847: Creating table `sys`.`x$waits_by_user_by_latency`

real 2m16.568s

user 0m2.128s

sys 0m6.756s
It took only 2 minutes and 16 seconds to restore 2GB data. It is 2X better when compared with mysqldump / mysqlpump restoration process.
ToolBackup TimeRestoration Time
Mysqldump1m 39.233 s4m 24.302
Mysqlpump48.493 s5m 32.835s
Mydumper/Myloader50.926 s2m 16.568s
Each has its own pros and cons. We need to choose our tool for logical backup wisely based on our needs our production environment and time for recovery. 

No comments:

Post a Comment

The complete list of new features in MySQL 8.0

There are over 250 new features in MySQL 8.0. The MySQL Manual is very good, but verbose. This is a list of new features in short bullet f...