Load testing is fun. Breaking things is fun. Breaking WordPress with load testing is even more fun. But in the era of highly scaleable WordPress hosting solutions, can we even break WordPress anymore? Oh yes, yes we can. The Crucible Challenge can.
The Crucible WordPress Performance challenge is a deceptively simple test inspired by the poor ops teams that have to handle traffic from Super Bowl advertisements. Given a WordPress site with consistent content and URL mappings:
Handle 50,000 (@ 500 per second ramp up) concurrent authenticated users for 2 hours with load test generators in New York, London, Amsterdam, Singapore, Bangalore, San Francisco, Toronto and Frankfurt.
Have an error rate below 0.1%.
Average response time should be below 800ms.
Median response time should be below 700ms.
99th percentile response time should be below 800ms.
Half way through the test, you must flush your cache.
If you are just starting with Cloudways it can be tough to figure out where your money is best spent! The prices are different for each platform and how they are sized isn’t them same either. This review gathers data using Kernl’s WordPress Load Testing tool and then massages that data into someone easy to digest.
Google Cloud – A “small” instance out of their Northern Virginia data center. $39.36 / month.
Amazon Web Services – A “small” instance out of their Northern Virginia data center. $36.51 / month.
Vultr – “2GB” plan out of their New Jersey data center. $23 / month
Linode – “2GB” plan from their New Jersey data center. $24 / month
Digital Ocean – “2GB” plan from one of the New York City data centers. $22 / month.
And we had the following load testing setup:
Content was an export of this blog
1000 concurrent users
30 minute duration
3 users per second ramp up
Load generating machines were in the San Francisco #2 Digital Ocean data center.
Overall Cloudways Performance and Value
Each provider performed extremely well in our tests. No errors were reported and they all reached over 700 requests / s sustained traffic. However not all hosts are created equal.
Costs / Month (cents)
$ / 1000 req (cents)
Given that each load test was exactly the same, you can see that some hosts performed better than others.
AWS and GCP are a bit more costly than Linode, Vultr, and Digital Ocean and they also performed worse than the lower cost providers.
Cloudways Provider Response Times
While cost per requests is a good metric we also care about the quality of those requests. Quality can be measured in a number of ways, but response time is often a “good enough” measure of how well a particular host performs. For the providers that you can get through Cloudways, the variance in request quality is extremely interesting.
The chart below shows the 99th percentile response time performance for each provider. It means that 99% of all requests during the load test finished at or below this time.
You can see that lower cost providers once again out-perform the higher cost AWS and GCP in pretty significant ways. The really interesting result here is that Vultr responded to 99% of requests in 240ms or less! Way to go Vultr!
If you aren’t sure which provider is the best value on Cloudways, from a raw cents per requests perspective you can’t go wrong with Linode, Vultr, and Digital Ocean. It’s the same story when you consider the quality (re: response time) of service: Vultr, Linode, and Digital Ocean are the clear winners here.
In the world of WordPress there are a lot of different plugins you can install to extend its functionality. One of the most popular plugins is Wordfence, a security plugin developed by the fine folks over at Defiant.
Most WordPress developers understand that the more plugins you add to your site the slower it goes, but exactly how much slower isn’t something that is often measured. More importantly, nobody has bothered to figure out the performance implications of installing many of the most popular plugins available to WordPress users today.
This all changes now with this series of blog posts exploring the performance implications of different WordPress plugins. In this series of posts we’ll test each plugin in isolation and then with caching enabled using Kernl’s WordPress load testing service. First up, is Wordfence.
The test machine for these tests was a $5 Digital Ocean droplet in their SFO2 (San Francisco) data center. The machine has 1GB of RAM, 1 vCPU, and a 25GB hard disk.
The software installed on the machine is as follows:
Ubuntu 19.04 with all updates installed.
For tests that required caching we used our favorite caching plugin, W3 Total Cache, with memcached as the data store.
The theme that was used was TwentyTwenty with no modifications and the content was an export of this blog.
All traffic was generated out of Digital Ocean’s NYC3 (New York City) data center.
A series of 4 tests were run to test the performance implications of installing Wordfence. They were:
Each test was with 200 concurrent users for 1 hour.
Maximum Requests per Second
The first metric that we looked at was the maximum requests per second that the site was able to handle under each situation outlined above.
As you can see the difference between having Wordfence enabled and having Wordfence disabled is huge. The non-cached site with no plugins enabled handled a maximum of 26 requests per second. With Wordfence enabled it could only handle 12.
More interesting though was that with caching enabled (W3 Total Cache) the site could handle 165 requests per second, but only 67 requests per second with Wordfence enabled.
These results were so surprising that we ran the tests twice. The results were the same (within 1%-2%) each time.
First Error Occurrence
The next metric we looked at was when did the first error occur during our load tests.
Once again we see that adding Wordfence took a pretty serious toll on our performance. For the baseline test (no plugins, no cache) we see our first error at around 26 requests / second. In the Wordfence test with no cache we saw it at 12 requests / second.
With our caching enabled test, we never saw any errors when Wordfence wasn’t enabled. However with Wordfence enabled we saw our first error at 56 requests / second.
Average Response Time
Now that we’ve looked at server capacity metrics (max requests, first error), let’s take a look at how your end user experience changes with Wordfence enabled.
These results were fairly interesting and not at all what was expected. The average response time for the baseline test was around 4.6 seconds, while the average response with Wordfence enabled was about 2.5 seconds. So why might that be? Looking through the data it appears that the baseline test had far more successful requests and that with Wordfence enabled the requests seemed to fail faster. In short, baseline test had higher throughput but slower response time. The Wordfence test had lower throughput and faster response time.
Turning our attention to the caching scenarios, we can see that enabling Wordfence is extremely problematic at scale. With caching enabled, our baseline test had an average response time of 146ms. With caching + Wordfence that time ballooned over 10x to 1874ms.
99th Percentile Response Time
Our final metric was looking at the 99th percentile response times for our different scenarios. What does that mean? It answers the question “How long does it take for 99% of requests to finish?”. This intentionally leaves out the last 1% because those are often outliers.
As you can see above, the un-cached 99th percentile hovers right around 5s when Wordfence is enabled or disabled. Since this is the 99th percentile such a high response time isn’t too surprising in both scenarios.
The more interesting scenario is when caching is enabled. For WordPress with only a caching plugin running, the 99th percentile is 370ms. That means 99% of all requests finished in 370ms. With Wordfence also enabled (Wordfence + caching), that number jumped to 2400ms. That’s a ~7X increase.
From these tests we came to a few conclusions:
Caching does not solve all of your performance problems. If your cache strategy relies on requests making it all the way through to WordPress, then you are very likely to still take a performance hit from other plugins. Things like Cloudflare, Varnish, Litespeed, or Nginx can help alleviate this problem.
Running Wordfence is expensive from a performance standpoint. The data suggests that by simply enabling Wordfence you lose about 50% of your maximum capacity and can increase your response time between 2x-7x.
Wordfence is still worth itfor a lot of people. If you’ve operated a WordPress site for any length of time you know how often they get attacked. Wordfence does a great job of reducing attack surface area and making it hard for people to attack you.
WordPress requires that you use a MySQL compatible database for its database backend. It used to be that you could confidently choose MySQL and go on with life, but in 2019 the choice isn’t quite that simple. With the MySQL, MariaDB, and Percona as the attractive options, how do you know which to choose?
Choosing a database isn’t always about performance, but for the sake of this article it will be 🙂
For this series of tests we tested database performance out-of-the-box with no special tweaks. It is quite possible that a professional database administrator could make each database run more performant, but most people hosting WordPress aren’t DBAs. With regards to caching, none was enabled. We wanted to test database performance, not cache performance.
What was tested?
The WordPress host machine was a DigitalOcean CPU Optimized droplet with 16 vCPUs (dedicated hyper-threads) and 32 GB of RAM. This monster of a machine was chosen so that we could be certain that Nginx + PHP-FPM weren’t the cause of any bottlenecks. A minor tweak to the PHP-FPM config allowed for full use of all 16 vCPUs.
The database host machines were DigitalOcean CPU Optimized droplets with 4 vCPUs (dedicated hyper-threads) and 8GB of RAM. CPU Optimized droplets were chosen because we didn’t want our tests to be at the mercy of shared CPU resources.
Each deployment was in DigitalOcean’s SFO2 region with the WordPress server communicating with the database over the internal private network. The traffic producing nodes were deployed in DigitalOcean’s NYC3 data center and communicated via the public internet.
For each database that was tested, we ran a load test with the following parameters:
500 concurrent users
2 req/s ramp up
30 minute duration
The goal here wasn’t to bring the database to it’s knees but instead see how it performed under sustained heavy load, but not so heavy that it falls over.
MariaDB WordPress Performance
During the MySQL acquisition of Oracle in 2009 there was a lot of concern amongst the core developers that Oracle would eventually close off MySQL to the world (similar to Oracle’s business model). Before that could take place, a GPL fork of MySQL was created named MariaDB.
MariaDB is open source and in active development. But how does it stand up to our WordPress load tests? Let’s find out.
First we’ll take a look at the requests and failures per second.
As you can see from the results above the performance scales up very well over time eventually peaking at ~379 req/s. We see periodic database-related errors (the vCPUs on the database were almost completely saturated) but nothing too crazy. Next, let’s see how the response times look.
The median response time of WordPress when backed by MariaDB under heavy load stays remarkably consistent. You can see that the average is slowly creeping up by the end of the test, but nothing that would be noticeable by customers yet.
The response time distribution is a little more interesting than the median response time. 90% of all requests finish in under 300ms and 99% of all requests finish in less than 500ms. Overall, the performance of MariaDB out of the box with no configuration is quite good.
Percona WordPress Performance
Another open-source fork of MySQL, Percona was started in 2006 and has been steadily delivering value-added features and enterprise support on top of MySQL for more than 12 years. With all that accumulated experience, how does Percona measure up?
Overall the Percona WordPress performance was pretty good. Not quite a good as MariaDB but nothing to scoff at. The one interesting thing was that the error rate was consistent across most of the test once the request volume passed ~200 requests/s. Lets see if the response time graph adds to the story.
The data here is actually pretty interesting. The response time for Percona was comparable to MariaDB right up until the time it started getting consistent errors. After that, it more than doubled and stayed that way for the rest of the test.
The response time distribution tells similar story to the response time chart. The difference between the 50th percentile and the 99th percentile shows that performance was very consistent across the entire test, it just wasn’t as good as MariaDB.
MySQL WordPress Performance
Last but not least (well….) in our test is MySQL. It’s been around since 1994 and is now owned by Oracle. In it’s latest releases there has been lot of great features such as window functions and more JSON features to compete with PostgreSQL. So let’s see how it did.
MySQL had a similar trajectory to Percona: It did pretty good across the board with a small but consistent series of errors after it started to go past 200 req/s. Also note that it never achieved higher than 300 req/s, which both MariaDB and Percona did.
The shape of the MySQL response time graph looks nearly identical to Percona, just about 50ms slower once the vCPUs started to get saturated. The detailed look in the response time distribution tells a similar story.
The response time distribution is where you can see Percona and MySQL diverge a bit. At the 99th percentile MySQL is returning at 875ms, while Percona was more like 550ms. In general, the distribution matches what one would expect given how the response time graph looks
The out of the box numbers for MariaDB make it look like the clear winner here. And compared to both Percona and MySQL it is in the raw performance department. This isn’t to say that you couldn’t tune Percona or MySQL to out-perform MariaDB, but only that you get more performance with zero configuration changes.
Back in June Vultr announced the general availability of their new “High Frequency” servers. Reading through the announcement I was intrigued by their claim of using “3+GHZ processors and blazing fast NVMe storage!” and immediately wondered what WordPress performance would look like versus their regular Cloud Compute offering.
What was tested?
To get a better idea of the performance characteristics of the Vultr High Frequency Compute servers, I ran several types of load tests with all Vultr servers sitting in their Silicon Valley data center:
200 concurrent users from New York City (Digital Ocean NYC3)
2000 concurrent users from New York City (Digital Ocean NYC3) & London (Digital Ocean LON1)
All servers with the exception of the “performance tuned” ones were built using the pre-built WordPress image that Vultr offers with no caching or performance tuning done. The “performance tuned” servers were created using Nginx, PHP-FPM, MariaDB, Memcached, and W3 Total Cache.
Starting with the Vultr High Frequency boxes, lets break down performance.
$6 32GB NVMe
The $6 Vultr High Frequency(HF) machine performed well against the roughly equivalent $5 Vultr Cloud Compute(CC) instance. If we take a look at the requests/failures per second charts you can see that the HF machine out-performed the CC instance by quite a lot.
As you can see the HF server was able to handle roughly twice as many requests per second as the CC server without any errors. Let’s check out the average and median response times as well as the response time distribution.
You can see that as the test progressed response times got steadily worse until leveling out at around 2.5s. The response time distribution shows that 99% of requests finished in 3s or under, with the 100th percentile outlier coming in at just under 6s. This is honestly pretty good performance for no tuning at all.
The response times for the $5 cloud compute instance were about 2 seconds worse than the high frequency instance, with the average coming in at around 4.5. The response time distribution was worse from a performance perspective, but better from a consistency perspective with the spread between the 50th percentile and the 100th percentile only being ~1.3s.
At a glance, it looks like the Vultr High Frequency server out-performs the cloud compute in a significant way for only $1 extra. However, our next tests show that it might not be so simple.
$24 128GB NVMe (Performance Tuned)
The next set of tests that were performed were 2000 concurrent users on a performance tuned setup. The traffic originated from a cluster of servers in both New York and London, with the host server being in Vultr’s Silicon Valley data center.
First, lets take a look at the requests per second handled by the $24 High Frequency server and the $20 Cloud Compute server.
Now, 2000 concurrent users is a lot. I think that the HF machine did quite well overall, but I confess that I did expect a bit more out of it. It topped out at around 1250 requests per second, with errors hovering in the ~175 requests per second range. If left to run for another 30 minutes I think it would have reached closer to 1350 requests per second.
This is where things start to get interesting. The Vultr marketing team bills the high frequency machines as heads and shoulders above the regular cloud compute machines. Maybe it is for some workloads. But for this test it doesn’t seem much better at all. Looking at both graphs you can see that the HF machine topped out a bit higher, and had fewer errors, but not enough to warrant the extra cash (in my opinion). Lets see what sort of story the response times tell.
On the performance tuned box you can see that the requests that did complete successfully were all quite fast. The average response time was between 100ms and 200ms throughout the entire test. The response time distribution was solid as well, with 99% of requests finishing in under 500ms. Not bad for 2000 concurrent users.
Now let’s take a look at how the comparable cloud compute server did.
For the Vultr Cloud Compute instance response times also hovered between 100ms-200ms, although just a hair higher than the high frequency servers. The response time distribution was excellent as well, with 99% of requests coming in under 500ms. The main difference here is that there was a 100th percentile outlier here at the high frequency server didn’t have.
Thoughts & Conclusions
From what I can tell, the Vultr High Frequency servers are better than the Cloud Compute servers, but maybe only for certain types of workloads. You can see in the $5/$6 test that they easily out-performed the cloud compute instances, but in the more expensive $20/$24 test the results weren’t so cut and dry.
If we look at it from a requests per dollar standpoint, the cloud compute instance is actually a better value for this type of workload.
I suggest running extensive load tests against the Vultr High Frequency machines before making any effort to switch over. They might be better for you. They might perform the same for more money.
Hello everyone! Kernl got some pretty interesting updates this month, so let’s dive in!
Average & Median Response Times for Load Tests – Kernl’s WordPress load testing services has always had this data available, we just never surfaced it in a way that was easy to consume. You’ll now see a new tab when you run a load test with information about average and median response times during the course of your test.
Progress Bar for Load Test Initialization – When you create a load test you will now see a progress bar that indicates how far in the process of instantiating your infrastructure we are. Prior to this update it was easy to think that the process had stalled out or broke.
Bug Fixes & Other
Load test site ownership verification would fail to certain types of HTML minification. This has been resolve.
Our internal analytics services has been refactors to be a singleton. This reduced each app server’s memory footprint by 5MB.
The public license validate endpoint now has a 10 second cache on it. This helps us deal with any sort of burst traffic from license validations.
All packages have been update on our servers.
Thats it for June! If you have any questions reach out to firstname.lastname@example.org.
Over the past 5 months I’ve been writing a lot of different articles testing WordPress performance when under heavy load. One of the comments that I often receive is “Yes, but how reliable is the host over time?”. To determine that answer I made some changes to Kernl that would allow customers to do long duration tests against their providers with a steady load. Given my affinity for Digital Ocean, I figured that would be a great first host to test.
What Was Tested & How I Tested It
Digital Ocean has several data centers across the globe and I figured that I should test each of these data centers to see how reliable they were. For this test I ran a single load test against the following data centers for 30 hours with a 25 concurrent users:
New York City (NYC3)
All requests were made from the Digital Ocean data center in San Francisco (SFO2). The target of each load test was a simple $5 / month droplet with the WordPress image from the Digital Ocean marketplace installed on it.
The table below summarizes the results for all of the long duration load tests. Click the region to see more details of the load test.
As you can see from the results above Digital Ocean’s reliability is excellent across the entire testing period. Even the data centers with the highest error rate (Frankfurt, London) had an incredibly small error rate. I’m not going to add the response time distribution results here because they were uniformly excellent.
The Bangalore (BLR1) test averaged only 20 req / s. Even though the geographic distance is far, I expected the response times to go up but to have a similar throughput.
The Toronto (TOR1) load test averaged 22 req / s for 28 hours, then jumped up to 23 req / s for that last two hours of the test. Maybe a noisy neighbor went quiet?
Digital Ocean’s Frankfurt (FRA1) and London (LON1) data centers had an order of magnitude more errors than the other data centers I tested.
As a whole, Digital Ocean performed very well in all of their data centers with a moderate amount of sustained traffic to a WordPress instance. In the future I would like to try running all of these tests twice with a different origin for each test run. It’s also worth noting that I haven’t done this type of test on any other platform yet. I hope that as I test more providers I’ll find out whether or not Digital Ocean performed as well as I think it did.
For quite awhile now Kernl has had the ability to throw some serious load at your WordPress site, but never a great way to share the results. Today that changes with the introduction of load test result sharing!
Why would I share my load test results?
The main use case for sharing your load test results is showing your clients that their new WordPress site can handle the traffic load that they expect. For someone like a YouTube or Twitter celebrity this can be a very real problem. Larger organizations also would enjoy this peace of mind.
How do I get started?
Easy! Just click the share button next to the load test that you want to share. You’ll be presented with your sharing URL that can be copy and pasted into Facebook, Twitter, Slack, or Email!
Load test your own WordPress site with Kernl! Getting started is free!
In the world of cloud computing there are a lot of different options to choose from. Normally you only need to choose how big your instance will be (2 vCPUs or 4, 2GB RAM or 6), but some cloud compute providers are upping their game and providing an even wider array of options and instance types for you to choose from.
Cloud Compute – You get your own virtual server, but it is sharing hardware resources with lots of friends. Noisy neighbors can definitely be a problem.
Dedicated – Dedicated servers, but virtualized. I (think) it is possible to run in to noisy neighbor problems in this situation.
Bare Metal – Dedicated servers and hardware. No hypervisor and no noisy neighbors taking up your resources.
In this article we’re going to see how a very basic WordPress install performs on the different types of Vultr compute instances. We’ll do so using Kernl’s WordPress Load Testing service.
As per usual with Kernlloadtests I imported this blog’s content into each load testing environment. The load test skews extremely read heavy. If you have a site that is write heavy or a mix you may see different results.
Each test was performed for 1 hour with 2000 concurrent users generating load from London and New York to Vultr’s data center in New Jersey.
For this test I used Vultr’s pre-built WordPress image with no caching. A lot of readers might say “But you can get much better performance using X or Y!”, and they would be right! But I’m not testing Apache vs Nginx performance, or W3 Total Cache vs WP Rocket, I’m testing Vultr hardware under load in a real world scenario. I simply want to know at the end of this article if Vultr Cloud Compute, Dedicated, or Bare Metal is better for WordPress hosting.
Test 1: Vultr Cloud Compute $10 / Month
The first test I performed was against the $10 per month Vultr Cloud Compute offering. As expected of a $10/month VPS performance wasn’t awesome, but it also wasn’t terrible.
As you can see, lots of failed requests and only maintaining throughput of 16 req/s. Not unexpected with a single core and 1 GB of RAM. After all, I was throwing 2000 concurrent requests per second at the server. The response time distribution was similarly bad.
Overall, the results for the $10 VPS were as expected. This isn’t really an apples to apples comparison (we’ll get to that later), but I wanted to give you an idea of what basic VPS instance performance looks like.
Test 2: Vultr Cloud Compute $80 / Month
With this test we’re starting to get closer to the cost of bare metal and dedicated instances. This server had 6 CPUs and 16GB of RAM. Considerably more robust than the $10 server.
This graph tells a much different story than the previous test. Performance peaked at 169 req/s and then leveled off at 100 req/s. We still saw a lot of errors, but once again this isn’t unexpected. Honestly if you started to get this much traffic you would likely start breaking up WordPress into its components (file system, PHP + Nginx, MySQL) and start scaling horizontally.
The response time distribution was much better for this server as well. The upper end was just as bad as the cheaper box, but the 90% and below ranges were pretty solid for the amount of traffic that was being received.
Test 3: Vultr Bare Metal $120 / Month
The Vultr Bare Metal server was the instance I was most excited about testing. I’ve always had a soft spot for hardware and getting access to a bare metal server is pretty cool. For $120 per month (on sale, price will rise to $300/month eventually) you get 8 CPUs and 32GB of RAM. This is a pretty serious server.
Lots of blue on this graph but also the expected amount of red. You can see that throwing 2 more non-virtual CPUs and 2X the RAM made a pretty big difference. We peaked at 200 req/s and then leveled out at 125 req/s. For reference that is 17.2 million requests per day.
The lower end of the response time distribution was solid, but the upper end wasn’t great at all. With all of those errors it isn’t surprising that this is the case.
Test 4: Vultr Dedicated $120 / Month
I honestly had a tough time figuring out why Vultr priced the bare metal and dedicated instances so close to each other. Dedicated is clearly inferior (far fewer CPUs and RAM) so why would anyone choose it? Anyway, let’s take a look at the graph.
This test peaked at 100 req/s and then leveled off at around 70. I really would expect a lot better performance for this sort of money.
Response time distribution was similar to the other boxes. With all the failures it tends to skew pretty hard in the wrong direction. I’m sure that there is a use case for these dedicated Vultr instances, but it definitely isn’t hosting a WordPress site.
With all of this data it was pretty easy to graph which of these is the best value.
Value was calculated by taking the cost per month and dividing it by the maximum number of requests. Based on the performance we saw above the Vultr Cloud Compute instances seem like your best value for WordPress hosting. For WordPress hosting it looks like Vultr Bare Metal and Dedicated instances aren’t a great choice. As mentioned above, there are likely use cases where they are a good choice though (maybe workloads that require very consistent performance).
As with all of these tests, your mileage may vary! I highly recommend that you run load tests on any new host that you use to get an idea of what sort of performance you can expect.
Load test your own WordPress site with Kernl! Getting started is free!