If you follow the Kernl Blog, you’ll know that recently I’ve been writing about load testing different managed WordPress cloud providers. Half of the reason for doing this is to shake out any bugs in Kernl’s WordPress load testing platform and the other half is to learn what’s out there in terms of managed WordPress hosting.
As I went through the first round of tests I kept thinking: “I wonder how they achieve that level of performance with WordPress?” This blog post and the post that will follow it are a chronicle of my attempts to scale WordPress, in an economical fashion, to the levels that these managed cloud providers are achieving.
Having done a handful of load tests against other cloud providers I figured that I should hold myself to the same tests. The scale I’m going to try and achieve is:
- 200 concurrent users for 10 minutes.
- 2000 concurrent users for 2 hours.
- 20000 concurrent users for 1 hour.
The first test is just to shake out bugs in the load test, but I have seen some providers start to throw errors at that level. The second test checks sustained load, and the third simulates a heavy traffic spike.
I started on a $5/month Digital Ocean droplet with these specs:
- 1 CPU
- 1GB RAM
- 1000GB data transfer
- Ubuntu 18.10
I chose to use the LEMP stack instead of the LAMP stack mostly because I’m more familiar with tuning Nginx for performance. I followed the guide at https://www.digitalocean.com/community/tutorials/how-to-install-linux-nginx-mysql-php-lemp-stack-ubuntu-18-04 to get things running. The software specs:
- PHP 7.2
- Nginx 1.15.5
- MySQL 5.7.24
The first test went really well. I didn’t performance tune anything and didn’t have any sort of cache enabled. After 10 minutes we had settled into 35 requests / second and didn’t see any failures at all.
For 90% of people this is probably more performance than they would ever need. The response time distribution was awesome too: 100% of requests finished within ~500ms.
And Then The Wheels Fell Off
After my early success with the basic 200 user load test I thought it was time to throw some serious load at my WordPress install. This time I did the 2000 concurrent users for 2 hours test. At this point there still wasn’t any caching plugin installed.
As you can see, things didn’t go well. We peaked at around 40 requests/s but then our failure rate started to increase in a really bad way. You can also see that we sorta stopped fielding requests after a while. Looking at the system load information, you can see why things went poorly. The $5 droplet just couldn’t handle any more.
As you would expect in this situation, the response time distribution was pretty dismal. In fact, this is the worst response time distribution that I’ve seen in all the load testing that I’ve performed 🙂
After reaching the max capacity of the $5 droplet with no tuning, it was time to try and scale.
WP Super Cache Me
WP Super Cache is a caching plugin that generates static HTML files of your WordPress site. For read-heavy sites it’s tough to beat in terms of performance. The blog that I’m load testing with definitely falls into this category so it was the right choice for this test.
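To make the idea concrete, here is a minimal Python sketch of static page caching. It is not how WP Super Cache is actually implemented; `StaticPageCache` and `slow_render` are hypothetical names I made up, with `slow_render` standing in for the PHP and MySQL work WordPress does on every uncached request.

```python
import tempfile
from pathlib import Path


class StaticPageCache:
    """Toy file-backed page cache: render a page once, then serve the saved HTML."""

    def __init__(self, cache_dir, render):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical stand-in for the expensive dynamic page build.
        self.render = render

    def _path_for(self, url_path):
        # Map "/" to "index.html" and "/a/b/" to "a_b.html".
        name = url_path.strip("/").replace("/", "_") or "index"
        return self.cache_dir / f"{name}.html"

    def get(self, url_path):
        cached = self._path_for(url_path)
        if cached.exists():
            # Cache hit: just read the file, no rendering at all.
            return cached.read_text(encoding="utf-8")
        # Cache miss: render once and save the result for later requests.
        html = self.render(url_path)
        cached.write_text(html, encoding="utf-8")
        return html


render_calls = []


def slow_render(url_path):
    # Record each call so we can see how often the "expensive" path runs.
    render_calls.append(url_path)
    return f"<html><body>Page for {url_path}</body></html>"


cache = StaticPageCache(tempfile.mkdtemp(), slow_render)
first = cache.get("/hello-world/")
second = cache.get("/hello-world/")
print(len(render_calls))
```

The second request never touches the renderer, which is why a read-heavy blog benefits so much: most requests turn into plain file reads.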
This test was simply a repeat of the last test (2000 users, 2 hours, etc) but with caching enabled. The results were pretty great.
With WP Super Cache enabled on the $5 droplet we were able to field around 135 req/s. However, you can see that our error rate was elevated during much of the test. If you expect to see this sort of traffic on a regular basis then this isn’t a great outcome, but it’s still pretty respectable for $5/month. The response time distribution tells a different story though:
What’s the point of serving 135 req/s if it takes more than 10s per request for 33% of your users? People are just going to close the tab after 1 second so we obviously have some more work to do.
Scale Me Up
When scaling any website you have 2 options (and they aren’t mutually exclusive):
- Scale up (vertically)
- Scale out (horizontally)
Scaling up is usually the easiest thing to do because you’re basically throwing more hardware at the problem. Digital Ocean makes scaling up really easy so I decided to give that a go first. This test was once again just a repeat of the 2000 users for 2 hours test but with better hardware. I upgraded from 1 CPU to 3 CPUs which seemed like the right choice given that it didn’t appear that memory was the problem in my previous tests.
So how did it go? Real good actually. Once all the load test users were sending requests we settled in at 344 requests / second. If that rate continued all day, that comes out to roughly 29.7 million requests. Not bad for $15/month.
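For anyone who wants to check the math, here is the back-of-the-envelope calculation as a short Python sketch. The rates are the ones measured in this post; the labels and the `requests_per_day` helper are just names I picked for illustration, and it assumes the rate holds steady for a full 24 hours.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_day(requests_per_second):
    """Requests served in 24 hours if the rate stays constant."""
    return requests_per_second * SECONDS_PER_DAY


# Sustained rates reported in this post.
measured_rates = {
    "1 CPU, no cache (200 users)": 35,
    "1 CPU, WP Super Cache (2000 users)": 135,
    "3 CPUs, WP Super Cache (2000 users)": 344,
}

for label, rate in measured_rates.items():
    print(f"{label}: {requests_per_day(rate):,} requests/day")
```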
We’re still seeing some failures, but relative to the number of requests it is much lower than the previous test. We can do better but that will likely take some more vertical or horizontal scaling. But what about the response times? Turns out adding more CPUs helped out quite a bit.
100% of our requests finished in under 1.6s. While not SUPER fast it is still a respectable showing for the sort of load that this box was receiving. Even more impressive is that 90% of requests finished in under 100ms and some of that could be attributed to latency. The droplet was spun up in NYC3 and the load test generators were in Toronto, Canada.
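If you haven’t worked with response time distributions before, a statement like “90% of requests finished in under 100ms” is just a percentile over all the recorded response times. Below is a small sketch using the nearest-rank method. The sample latencies are made up for illustration and are not data from this load test, and I’m not claiming this is how Kernl computes its charts.

```python
import math


def percentile(latencies_ms, pct):
    """Nearest-rank percentile: the smallest sample that at least pct% of samples do not exceed."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(latencies_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up sample: mostly fast responses plus one slow outlier.
samples_ms = [40, 45, 50, 55, 60, 70, 80, 90, 95, 1600]

distribution = {pct: percentile(samples_ms, pct) for pct in (50, 90, 100)}
for pct, value in distribution.items():
    print(f"{pct}% of requests finished within {value}ms")
```

Note how a single slow request sets the 100% number while leaving the 90% number untouched, which is why it’s worth looking at both.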
The biggest selling point (for me) with WordPress is that it’s easy. With very little configuration or effort I was able to get a WordPress installation serving > 300 req/s. Sure it wasn’t perfect. I am still getting elevated error rates and vertical scaling can only take us so far. But this is likely good enough for almost anyone.
In part 2 of this series I’ll attempt to scale WordPress horizontally by using shared block storage to host the WordPress file system, a dedicated MySQL machine, and a bunch of application servers running behind a load balancer. The goal is to serve 20,000 (or more!) concurrent users for 1 hour without any errors and with response times below 1 second. Follow @kernl_ on Twitter to be notified when part 2 is published!