Adventures in Scalable WordPress Hosting: Part 2

Interested in testing your WordPress scalability? Check out the Kernl WordPress Load Testing beta program!

In part 1 of this series I explored scaling WordPress using WP Super Cache and by throwing more expensive hardware at the problem. In part 2 we’ll go on an adventure in horizontal scalability using load balancers, NFS, Memcached, and an externally hosted MySQL database.

The Plan

Horizontally scaling any app is an exercise in breaking things apart as much as possible. In the case of WordPress there are a few shared components that I wanted to break out:

  • File System – The file system is the most problematic part of scaling WordPress. Unless you change how WordPress stores plugins, themes, media, and other assets, you need a shared file system that every node in the cluster can access. There are likely other solutions here, but a shared file system provides a lot of flexibility.
  • MySQL – In many WordPress installs MySQL lives on the same machine as WordPress. That doesn’t work for a horizontally scaled cluster, so we need an external MySQL instance.
  • Memcached – It was brought to my attention that using WP Super Cache to generate static pages in part 1 of this series was sort of cheating. In the spirit of making this harder for myself, I’ll be using W3 Total Cache instead, with an external Memcached instance as the shared cache.

Now that the basic why and what are out of the way, let’s talk about the how. I’m a huge fan of Digital Ocean. I use them for everything except file storage, so I’m going to use them for this WordPress cluster as well. Here’s how it’s going down:

  1. Create a droplet that will act as the file system for our cluster. Using NFS, all droplets in the cluster will be able to mount it and use it for WordPress. I’m also going to use this droplet for Memcached since NFS doesn’t take up many resources.
  2. Create a base droplet with Nginx and PHP7.2-FPM installed. There is a little bit of boilerplate configuration here, but in general the install is typical. The only change to the Nginx configuration is setting the root directory to the NFS mount (see the sketch below the diagram). Use this base droplet to configure WordPress’ database settings.
  3. Use Compose.io to create a MySQL database. I wanted something well configured that I didn’t have to think about. Totally worth the $27 / month.
  4. Once the above is done, take a snapshot of the base droplet and use it to create more droplets. If all goes well you shouldn’t need to do any further configuration.
  5. Using Digital Ocean’s load balancer service add your droplets to the load balancer.
  6. Voilà! That’s it.
Ugly architecture diagram
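
To make steps 1 and 2 concrete, here’s a minimal sketch of the NFS and Nginx wiring, assuming Ubuntu droplets on a shared private network. The IP addresses, network range, and paths are placeholders rather than the exact values from my cluster.

```bash
# On the file-server droplet: export a directory over NFS to the private
# network. (10.0.0.0/16 is a placeholder private range.)
sudo apt-get install -y nfs-kernel-server
sudo mkdir -p /var/www/wordpress
echo '/var/www/wordpress 10.0.0.0/16(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra

# On each app droplet: mount the export where Nginx will look for WordPress.
# (10.0.0.2 is a placeholder for the file server's private IP.)
sudo apt-get install -y nfs-common
sudo mkdir -p /var/www/wordpress
sudo mount -t nfs 10.0.0.2:/var/www/wordpress /var/www/wordpress

# The only non-standard Nginx change: the document root is the NFS mount.
sudo tee /etc/nginx/sites-available/wordpress >/dev/null <<'EOF'
server {
    listen 80;
    root /var/www/wordpress;  # the NFS mount shared by every node
    index index.php;

    location / {
        try_files $uri $uri/ /index.php?$args;
    }

    location ~ \.php$ {
        include snippets/fastcgi-php.conf;
        fastcgi_pass unix:/run/php/php7.2-fpm.sock;
    }
}
EOF
sudo ln -sf /etc/nginx/sites-available/wordpress /etc/nginx/sites-enabled/wordpress
sudo systemctl reload nginx
```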

No Cache Smoke Test

200 users, 10 minutes, 2 users/sec ramp up, from London

As with every load test that I do, the first test is always just to shake out any bugs in the load test itself. For this test I didn’t have any caching enabled and only a single app server behind the load balancer. It was effectively the same as the first load test I did during part 1 of this blog series.

As you can see from the graph below, performance was what you would expect from this setup. We settled in at 21 requests / second with no errors.

As Expected.

The response time distribution wasn’t great. 90% of requests finished in under 5 seconds, but that’s still a very long time. Generally if I saw this response time distribution I would think it’s time to add caching or scale up/out.

Not bad. Not great.

So. Many. Failures.

2000 users, 120 minutes, 2 users/sec ramp up, from London

The next test I decided to run was the sustained heavy load test. This is generally where I start to see failures from managed WordPress hosting providers. Given that I hadn’t added any more app servers to the load balancer and had no caching, things went as poorly as you would expect.

All the failures of failure land.

Everything was fine up until ~25 req/s and then the wheels fell off. The response time distribution was bad too. No surprises here.

50% of requests in 5 seconds, 100% in…33 seconds 🙁

Looks like it’s time to scale.

Horizontal Scalability

2000 users, 120 minutes, 2 users/sec ramp up, from London

Before adding Memcached to the setup I wanted to see how it scaled without it. That means adding more hardware. For this test I added four more application servers (Nginx + PHP) to the load balancer and ran the test again.
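
For reference, spinning up the extra app servers from the base snapshot is a single doctl command. This is just a sketch; the snapshot ID, droplet names, size, and region are placeholders, not my exact values.

```bash
# Create four more app droplets from the base snapshot (the image ID is a
# placeholder; find yours with `doctl compute snapshot list`).
doctl compute droplet create app-2 app-3 app-4 app-5 \
  --image 12345678 \
  --size s-1vcpu-1gb \
  --region lon1 \
  --enable-private-networking
```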

Linear Growth

As you can see from the request/failure graph, we experienced roughly linear growth in our maximum requests/second. Given that we originally maxed out at ~20 req/s on one machine, maxing out at ~100 req/s with five machines is exactly the sort of result I would expect to see. The response time distribution also started to look better:

Not perfect, but better.

Obviously a 90th-percentile response time of 4 seconds isn’t awesome, but it is a lot better than the previous test. I did make a tiny tweak to the load balancer configuration that may have helped, though: I used the ‘least connections’ option instead of ‘round robin’. ‘Least connections’ tells the load balancer to send traffic to the app server with the fewest active connections, which should help prevent dog-piling on a server stuck with a few slow connections.
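
If you’d rather script the load balancer than click through the UI, the algorithm is just a flag at creation time. Here’s a sketch with doctl, assuming your doctl version supports the --algorithm flag; the droplet IDs are placeholders.

```bash
# Create a load balancer using 'least connections' and attach the app
# droplets (the IDs are placeholders).
doctl compute load-balancer create \
  --name wp-cluster-lb \
  --region lon1 \
  --algorithm least_connections \
  --forwarding-rules entry_protocol:http,entry_port:80,target_protocol:http,target_port:80 \
  --droplet-ids 111,222,333,444,555
```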

Given the results above, we can safely assume linear growth tied to the number of app servers for quite some time: for each app server I add, I can expect to handle an additional ~20 req/s. With that in mind, I wanted to see what would happen if I enabled some caching on this cluster.

Gotta Go Fast

In my previous test of vertical scaling I used WP Super Cache to make things go quick. WP Super Cache generates static HTML pages for your site and then serves those; the benefit is that static pages are extremely fast to serve. In this test I wanted to try a more dynamic approach using Memcached and W3 Total Cache. W3 Total Cache takes a very different approach to caching by storing pages, objects, and database queries in Memcached. In general this caching model is more flexible, but possibly a bit slower. I installed Memcached on the same server as the NFS mount because that server was underutilized. In a real production scenario I wouldn’t violate this separation of concerns.
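
Roughly, the Memcached side of this looks like the sketch below, assuming Ubuntu and the stock Memcached config. The private IP is a placeholder, and the actual W3 Total Cache engine settings (page, object, and database cache pointed at Memcached) were chosen through the wp-admin UI rather than on the command line.

```bash
# On the NFS droplet: run Memcached too, listening on the private interface.
# (10.0.0.2 is a placeholder for the droplet's private IP.)
sudo apt-get install -y memcached
sudo sed -i 's/^-l .*/-l 10.0.0.2/' /etc/memcached.conf
sudo systemctl restart memcached

# On each app server: PHP needs the memcached extension for W3 Total Cache.
sudo apt-get install -y php-memcached
sudo systemctl restart php7.2-fpm

# Install and activate the plugin; its cache engines are then pointed at
# 10.0.0.2:11211 in the W3 Total Cache settings screens.
wp plugin install w3-total-cache --activate --path=/var/www/wordpress
```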

Once I enabled W3 Total Cache and re-ran the last test I got some pretty great results.

Boom.

With W3 Total Cache enabled and 5 app servers we settled in at ~370 requests/second. More impressive is that we saw only 5 failures during the entire test. For perspective, Kernl pushed 1,329,470 requests at the WordPress cluster I created. That’s a failure rate of roughly 0.0004%.

My favorite part of this test was the response time distribution. Without having to wait on MySQL for queries, the response times became crazy good.

The “bad” outlier is only 2.5s.

99% of requests finished in 29ms, and the outlier at 100% was only 2.5 seconds. Not bad for WordPress.

Going Further

Being the good software developer that I am, I wanted to push this setup to its limits. So I decided to try a test that is an order of magnitude more difficult:

20,000 users, 10 users/sec ramp up, for 60 minutes, from London

Things didn’t go great, but not because of WordPress. I won’t show any graphs of this test because I started to get limited by the network card on the NFS/Memcached machine. Digital Ocean says that I can expect around 30MB/sec out of a given droplet, and with this test I was starting to bump into that limit. If I wanted to push further I would have had to load balance Memcached, which felt a little outside of scope. In a real production scenario I would likely pay for a hosted Memcached service to deal with this for me.

Conclusions

With Kernl I’m always weighing the build versus buy question when it comes to infrastructure and services. Given how much effort I had to put into making this setup horizontally scalable, and how much more it would take to make it reproducible and manageable, it hardly seems worth creating and managing my own infrastructure.

Aside from my time, the hardware also wasn’t cheap.

  • Load Balancer – $10 / month
  • MySQL Database – $27 / month
  • Memcached (if separate from NFS) – $5 / month
  • NFS Mount (if separate from Memcached) – $5 / month
  • Application Servers – $25 / month ($5 / month * 5 servers)
  • Total – $72 / month

At $72 / month I could easily have any of the managed WordPress hosting companies (GoDaddy, SiteGround, WP Engine, etc.) run my setup and handle updates, security, and so on. The only potential hiccup is the traffic limits they place on your account. This setup can handle millions of requests per day, and while their setups can too, they’ll charge you a hefty fee for it.

As with any decision about hardware and scaling, the choice varies from person to person and organization to organization. If you have a dedicated ops team and existing hardware, maybe scaling on your own hardware makes sense. If you’re a WordPress freelancer and don’t want to worry about it, maybe it doesn’t. IMHO, I wouldn’t scale WordPress on my own. I’d rather leave it to the professionals.

Interested in testing your WordPress scalability? Check out the Kernl WordPress Load Testing beta program!

1 comment

  1. Great writeup. For GoDaddy’s Managed WordPress, the visitor/traffic specs are only a recommendation; it doesn’t limit traffic or charge overages. WP Engine, on the other hand, gets quite expensive very quickly, as you mentioned.

    In general, it’s hard to beat most Managed WordPress offerings since they include things like SSL, CDN, backups, etc.
