Your website is only as fast as its slowest component. In a WordPress server infrastructure, that slowest component is often the database server, because databases rely on comparatively slow hard disks rather than fast server memory to fetch data.
In high traffic WordPress websites, database servers are known to suffer intermittent downtime and performance bottlenecks. Today we’ll see how a Percona XtraDB cluster can be used to deliver WordPress high availability hosting.
WordPress stores all its posts, pages, comments, settings and plugin information in its database. So, for each piece of information displayed on the website, several database queries need to be made.
While we’ve been able to minimize database requests using caching systems (such as Varnish, APC, Memcached, etc.), about 40% of site visits still required access to the database server.
In these websites, these are the top causes we’ve noted for database performance issues:
- Unoptimized database tables – Data gets written to and deleted from tables all the time. Over time, this leaves empty spaces in database files and bloats the database size, which results in slower query execution.
- Unoptimized load distribution – On busy websites, there can be many simultaneous users on the website. If all visitors are served using a single database server, the queries will be put on a queue, leading to slow access speeds.
- Heavy WordPress plugins – Some plugins such as “dynamic price listing” or “related posts” execute several database queries for a single page. This increases the database load, and slows down page loads.
- Unoptimized database configuration – Depending on the number and complexity of database queries, several settings such as heap table size, sort buffer size, etc. need to be adjusted. If not, queries will take longer to execute.
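As a quick illustration of the first cause, the unused (reclaimable) space in each table can be checked from `information_schema`. This is a sketch; `wordpress_db` is a placeholder schema name:

```sql
-- Show tables with reclaimable space (data_free) in a WordPress database
SELECT table_name,
       ROUND(data_length / 1024 / 1024, 2) AS data_mb,
       ROUND(data_free   / 1024 / 1024, 2) AS free_mb
FROM   information_schema.tables
WHERE  table_schema = 'wordpress_db'
ORDER  BY data_free DESC;
```

A large `free_mb` relative to `data_mb` usually means the table is fragmented and worth optimizing.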
How to make the WordPress database faster?
Every website visit leaves a record in the WordPress database. Even a small business website issues thousands of database operations per day.
So, it is natural that the database size changes, its memory requirements change, and its processing overhead increases. If left unattended, these factors can quickly add up to form a performance bottleneck.
To prevent the formation of a bottleneck, we use several methods:
- Periodically audit the database for performance issues. We manually check the optimization levels of each table, check if the database settings (such as query_cache size) are adequate for the changed table sizes, and see if there are any anomalies in operations (such as slow queries, aborted queries, etc.). Any noted issues are fixed on the spot.
- Optimize the tables regularly to minimize database size. A large database quickly brings down the site performance. We audit database table size at least once a week, clean out logs and session tables that consume space. The tables are then optimized to compress data, and minimize file size.
- Monitor the database for serious performance issues. We keep an eye on several critical database performance metrics (such as query_count, query_time, queue_wait, etc.) that give us an idea about database health. If we detect an anomaly, we quickly find the root cause and fix the issue before it can affect website performance.
- Identify and fix heavy site code that’s present in themes, plugins, etc. When we note a slow database query, we backtrace it to the WordPress function that sent in the query. We then work with the site developers or our WordPress developers to streamline the code.
- Change WordPress server architecture as the site grows. The performance and uptime needs of a small site are different from those of a large one. As the site traffic increases, it might be necessary to split the database load across multiple servers. A great way to spread the load is to use database clusters. We’ve found Percona XtraDB cluster to be a great option for this purpose.
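A few of the routine audit and optimization steps above boil down to plain MySQL statements. A sketch (the table names are the standard WordPress ones; which tables need optimizing varies per site):

```sql
-- Spot slow-query buildup and connection anomalies
SHOW GLOBAL STATUS LIKE 'Slow_queries';
SHOW GLOBAL STATUS LIKE 'Aborted_connects';

-- Check whether the query cache is sized sensibly for current table sizes
SHOW VARIABLES LIKE 'query_cache_size';

-- Reclaim space and defragment bloated tables
OPTIMIZE TABLE wp_options, wp_postmeta;
```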
Using Percona XtraDB cluster for WordPress high availability
As we discussed above, unoptimized database architecture is a major reason for slow speeds in high traffic WordPress sites. If all visitors to a busy website are served by a single database server, it’s only a matter of time before it starts responding slowly to queries.
An effective solution is to distribute the site visitors evenly to multiple database servers. This is called database load balancing. The issue then is to synchronize the data.
What if one visitor leaves a comment on one database server, and another visitor on a different one? How can the two be kept in sync? The solution is master-master replication.
Out of the many replication systems we’ve tried (like MySQL replication, Galera, NDB, etc.), we’ve found Percona’s implementation of Galera, known as XtraDB Cluster, to be very fast, and it is the system we recommend for WordPress load balancing.
To demonstrate, let me give you an example of a recent WordPress infrastructure we implemented using Percona XtraDB cluster. Shown here is the website architecture.
The WordPress site was served using two web servers that sit behind an Nginx load balancer. The web servers connect to a database load balancer implemented using HAProxy.
This load balancer distributes the database queries to 3 XtraDB servers evenly. To make sure the site always remains online, a database fail-over was configured for the load balancer using a high availability router called “KeepAliveD”.
Now, let’s get to the details of how each of these components were configured and the performance benefits we noted.
Setting up Percona XtraDB multi-master cluster
Percona XtraDB combines the advantages of MySQL’s fast InnoDB engine and Galera’s multi-master replication to create a high performance database cluster.
In comparison to MySQL’s multi-master replication or NDB cluster, XtraDB needs less memory and fewer disk accesses to keep the cluster in sync – which translates to higher performance.
The design of the XtraDB cluster involved many performance, stability and security considerations. Some of them are:
1. Using 3-node cluster instead of 2-node cluster to avoid database corruption
To split the database load, we could use either a 2-server cluster or a 3-server cluster. We chose the 3-server cluster because it gave a degree of protection against the “split-brain” condition – where nodes lose contact with each other, one node falls out of sync, and both sides believe their data is the latest. That can lead to database inconsistency. With a 3-node solution, a majority of 2 nodes can always agree on which data is the “latest”.
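A minimal sketch of the Galera-related settings on one node of such a 3-node cluster – the IP addresses, cluster name and file paths here are placeholders, not the production values:

```ini
# /etc/my.cnf excerpt on node 1 (sketch; IPs and names are placeholders)
[mysqld]
wsrep_provider           = /usr/lib64/libgalera_smm.so
wsrep_cluster_name       = wp-cluster
wsrep_cluster_address    = gcomm://10.0.0.1,10.0.0.2,10.0.0.3
wsrep_node_address       = 10.0.0.1
wsrep_sst_method         = xtrabackup-v2
binlog_format            = ROW
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2
```

The last three settings are requirements of Galera replication itself: row-based binlogs, InnoDB tables and interleaved auto-increment locking.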
2. Enabling binary logging for easy fault recovery
To make the cluster more robust against failures, we set up a system called “binary logging”. This makes rolling back a change (like dropping a table, deleting important data, etc.) much easier. It is not enabled by default in Galera clusters, but we’ve seen from experience that accidents like this do happen on customized WordPress websites.
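Enabling binary logs is a couple of lines in my.cnf – the path and retention period shown here are illustrative:

```ini
[mysqld]
log_bin          = /var/lib/mysql/mysql-bin   # enable binary logging
expire_logs_days = 7                          # keep a week of logs for point-in-time recovery
```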
3. Custom firewall configuration to disallow external access
An important consideration in setting up the cluster was security configuration. Galera does not work well with SELinux and the default iptables configuration. We custom configured iptables so that all nodes and load balancers could talk to each other, but no external IPs could connect to the database nodes.
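The ruleset looked roughly like this. The subnet is a placeholder standing in for the cluster and load balancer IPs; the ports are the standard Galera set – 3306 for MySQL, 4567 for group communication, 4568 for incremental state transfer and 4444 for full state transfer:

```
# Sketch of per-node iptables rules; 10.0.0.0/29 stands for the cluster + load balancer subnet
iptables -A INPUT -s 10.0.0.0/29 -p tcp -m multiport --dports 3306,4567,4568,4444 -j ACCEPT
iptables -A INPUT -p tcp -m multiport --dports 3306,4567,4568,4444 -j DROP
```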
4. Setting up auto-commit retries to fix update conflicts
In a multi-master cluster, there are situations where the same database row is updated by two users at the same time (known as parallel updates). In these cases, XtraDB commits the first query, while the second is rejected with a “deadlock error”. We custom configured the cluster settings so that any query hitting this deadlock status is automatically re-attempted multiple times. This was done using the parameter “wsrep_retry_autocommit”.
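The retry count itself is a one-line setting – the value 4 here is an example, not necessarily what every site needs:

```ini
[mysqld]
# Retry an autocommit query up to 4 times if it fails with a replication conflict
wsrep_retry_autocommit = 4
```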
By distributing the database load over 3 fast servers, and configuring database caching systems such as query_cache, we were able to bring a 66% improvement (1 sec response in place of 3 secs) in loading speed during traffic spikes.
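The query cache settings were along these lines – sizes are illustrative and should be tuned to the site’s working set, so treat this as a sketch:

```ini
[mysqld]
query_cache_type = 1      # cache SELECT results
query_cache_size = 128M   # illustrative size; tune to the working set
```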
Setting up load balancing using HAProxy
To evenly distribute database queries across the 3 nodes, we needed a database load balancer. We chose HAProxy for this (instead of Nginx or Ultramonkey) as it provided the most stable integration (i.e. fewest operational errors) with the Percona cluster.
HAProxy was customized to meet the WordPress database operational requirements. The top changes were:
1. Using “Least connected” over “Round robin” for even load balancing
The default configuration of HAProxy was to use a load distribution method called “round robin”, in which queries are sent to each node in turn. However, some queries were small while others were complex, so some servers took longer to finish their queries while others sat under-utilized.
So, we used a method called “least connected”, which tells HAProxy to send each query to the node with the fewest active connections.
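In haproxy.cfg this is a one-word change in the backend definition – the backend name, server names and IPs below are placeholders:

```
backend percona-cluster
    mode    tcp
    balance leastconn          # send each connection to the least-loaded node
    server  db1 10.0.0.1:3306 check
    server  db2 10.0.0.2:3306 check
    server  db3 10.0.0.3:3306 check
```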
2. Using “clustercheck” to isolate out-of-sync cluster nodes
A node in the XtraDB cluster can go out of sync (that is, its data is outdated) due to network errors, hard disk issues, etc. Continuing to send queries to that node risks lost or stale data. So, a utility called “clustercheck” was set up on each node, and HAProxy was configured to poll it periodically. If “clustercheck” reported that a node was out of sync, HAProxy stopped sending queries to that node.
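clustercheck normally runs under xinetd on port 9200 and returns HTTP 200 only while the node is synced, which HAProxy can use as a health check. A sketch of the relevant lines (IPs are placeholders):

```
backend percona-cluster
    mode   tcp
    option httpchk                           # treat clustercheck's HTTP response as node health
    server db1 10.0.0.1:3306 check port 9200
    server db2 10.0.0.2:3306 check port 9200
    server db3 10.0.0.3:3306 check port 9200
```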
3. Setting up cluster error alerts for emergency support
24/7 server monitoring and emergency support is an important part of our infrastructure management plans. In this infrastructure, we hooked HAProxy into the Zabbix monitoring system, so that our support engineers instantly get an alert if one of the database nodes develops an issue.
Our support engineers can then immediately login to the server, analyze the Percona debug logs, and fix the error before the website’s performance is affected.
Keeping HAProxy highly available with KeepAliveD
With a 3-node database solution, the website can be protected against downtime even if 2 of the servers go down. But if the load balancer went down, all 3 nodes would become inaccessible.
To negate this single-point-failure, we setup a “backup” HAProxy load balancer, which will be brought online by a high availability routing software called “KeepAliveD”.
The IPs of the HAProxy load balancers (both “main” and “backup”) were configured to be controlled by KeepAliveD. The KeepAliveD daemons on the two load balancer servers keep in touch with each other.
If the KeepAliveD program in “backup” HAProxy detects that the “main” server is down, it’ll assume the ownership of the IP, and bring it online in the backup server.
This concept is called “Floating IP” – in that the IP can “float” to another server when the main server goes down. By employing this technique, we were able to achieve virtually 100% uptime for database servers.
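A minimal sketch of the KeepAliveD configuration on the “main” load balancer – the floating IP, interface name and priorities are placeholders; the “backup” server uses `state BACKUP` and a lower priority:

```
vrrp_instance haproxy_vip {
    state       MASTER        # BACKUP on the standby load balancer
    interface   eth0
    virtual_router_id 51
    priority    101           # the standby uses a lower value, e.g. 100
    virtual_ipaddress {
        10.0.0.100            # the floating IP the web servers connect to
    }
}
```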
Business websites need an infrastructure that is cost effective, while capable of fast performance. In this solution we explained how we used Percona XtraDB to resolve database level performance bottlenecks and to ensure high availability for high traffic WordPress sites.