Adventures in Scalable WordPress Hosting: Part 2

Interested in testing your WordPress scalability? Check out the Kernl WordPress Load Testing beta program!

In part 1 of this series I explored scaling WordPress using WP Super Cache and by throwing more expensive hardware at the problem. In part 2 of this series we’ll go on an adventure in horizontal scalability using load balancers, NFS, Memcached, and an externally hosted MySQL database.

The Plan

Horizontally scaling any app is an exercise in breaking things apart as much as possible. In the case of WordPress, there are a few shared components that I wanted to break up:

  • File System – The file system is the most problematic part of scaling WordPress. Unless you change how WordPress stores plugins, themes, media, and other files, you need a shared file system that all nodes in your cluster can access. There are likely other solutions here, but this one provides a lot of flexibility.
  • MySQL – In many WordPress installs MySQL lives on the same machine as WordPress. For a horizontally scaled cluster this doesn’t work, so we need a MySQL instance that lives externally.
  • Memcached – It was brought to my attention that using WP Super Cache to generate static pages during part 1 of this series was sort of cheating. In the spirit of making this harder for myself, I introduced W3 Total Cache instead and will be using an external Memcached instance as the shared cache.

Now that the basic why and what is out of the way, let’s talk about the how. I’m a huge fan of Digital Ocean. I use them for everything except file storage, so I’m going to use them for this WordPress cluster as well. Here’s how it’s going down:

  1. Create a droplet that will act as the file system for our cluster. Using NFS, all droplets in the cluster will be able to mount it and use it for WordPress. I’m also going to use this droplet for Memcached since NFS doesn’t take up many resources.
  2. Create a base droplet that has Nginx and PHP7.2-FPM installed on it. There is a little bit of boilerplate configuration here, but in general the install is typical. The only change to the Nginx configuration is setting the root directory to the NFS mount. Use this base droplet to configure the WordPress database settings.
  3. Use Compose.io to create a MySQL database. I wanted something well configured that I didn’t have to think about. Totally worth the $27 / month.
  4. Once the above are done, take a snapshot of the base droplet and use it to create more droplets. If all goes well you shouldn’t need to do any configuration.
  5. Using Digital Ocean’s load balancer service, add your droplets to the load balancer.
  6. Voila! That’s it. (A quick sanity-check sketch follows the architecture diagram below.)
Ugly architecture diagram
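
Once the snapshots are deployed, it’s worth confirming that every node really is serving the same WordPress install off the shared NFS mount. Here’s a minimal sketch of such a check in TypeScript (Node 18+ for the global fetch); the IP addresses are placeholders, not the actual droplets:

```typescript
// Hypothetical sanity check for step 4: every droplet cloned from the snapshot should
// serve the same WordPress install off the shared NFS mount. The IP addresses below
// are placeholders. Requires Node 18+ for the global fetch.
const appServers = ["10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5", "10.0.0.6"];

async function checkCluster(): Promise<void> {
  for (const ip of appServers) {
    const res = await fetch(`http://${ip}/`);
    const body = await res.text();
    // A node with a bad NFS mount or stale config will stand out by status or size.
    console.log(`${ip}: HTTP ${res.status}, ${body.length} bytes`);
  }
}

checkCluster();
```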

No Cache Smoke Test

200 users, 10 minutes, 2 users/sec ramp up, from London

As with every load test that I do, the first test is always just to shake out any bugs in the load test itself. For this test I didn’t have any caching enabled and had only a single app server behind the load balancer. It was effectively the same as the first load test I did during part 1 of this blog series.
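
Kernl drives the actual test, but to make the test shape concrete, here’s a rough sketch of what “200 users with a 2 users/sec ramp-up” looks like as a plain Node/TypeScript loop. TARGET_URL is a placeholder, and this is in no way Kernl’s implementation:

```typescript
// Rough sketch of the test shape: ramp up 2 virtual users per second until 200 are
// running, each fetching the target page in a loop for 10 minutes. Illustration only.
const TARGET_URL = "https://example.com/";
const MAX_USERS = 200;
const RAMP_PER_SECOND = 2;
const TEST_DURATION_MS = 10 * 60 * 1000;

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function virtualUser(stopAt: number): Promise<void> {
  while (Date.now() < stopAt) {
    try {
      await fetch(TARGET_URL); // one "page view"
    } catch {
      // a real harness would record this as a failure
    }
    await sleep(1000); // crude think time between page views
  }
}

async function main(): Promise<void> {
  const stopAt = Date.now() + TEST_DURATION_MS;
  const users: Promise<void>[] = [];
  while (users.length < MAX_USERS && Date.now() < stopAt) {
    for (let i = 0; i < RAMP_PER_SECOND && users.length < MAX_USERS; i++) {
      users.push(virtualUser(stopAt)); // add 2 users every second
    }
    await sleep(1000);
  }
  await Promise.all(users);
}

main();
```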

As you can see from the graph below, performance was what we would expect from the setup that I used. We settled in at 21 requests/second with no errors.

As Expected.

The response time distribution wasn’t great. 90% of requests finished in under 5 seconds, but that’s still a very long time. Generally, if I saw this response time distribution I would think it’s time to add caching or to scale up/out.

Not bad. Not great.

So. Many. Failures.

2000 users, 120 minutes, 2 users/sec ramp up, from London

The next test I decided to run was the sustained heavy load test. This is generally where I start to see failures from managed WordPress hosting providers. Given that I didn’t add any more app servers to the load balancer and had no caching, things went as poorly as you would expect.

All the failures of failure land.

Everything was fine up until ~25 req/s and then the wheels fell off. The response time distribution was bad too. No surprises here.

50% of requests in 5 seconds, 100% in…33 seconds 🙁

Looks like it’s time to scale.

Horizontal Scalability

2000 users, 120 minutes, 2 users/sec ramp up, from London

Before adding Memcached to the setup I wanted to see how it scaled without it. That meant adding more hardware. For this test I added four more application servers (Nginx + PHP) to the load balancer and ran the test again.

Linear Growth

As you can see from the request/failure graph, we experienced roughly linear growth in our maximum requests/second. Given that we originally maxed out at ~20 req/s on one machine, maxing out at ~100 req/s with five machines is exactly the sort of result I would expect to see. The response time distribution also started to look better:

Not perfect, but better.

Obviously a 90% score of 4 seconds isn’t awesome, but it is a lot better than the previous test. I did make a tiny tweak to the load balancer configuration that may have helped, though: I decided to use the ‘least connections’ option instead of ‘round robin’. ‘Least connections’ tells the load balancer to send traffic to the app server with the fewest active connections, which should help prevent dog-piling on a server that’s stuck with a few slower connections.
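
For illustration, here’s a toy sketch of the difference: a ‘least connections’ balancer simply picks the backend with the fewest in-flight requests. This is not Digital Ocean’s implementation, and the hosts and counters below are made up:

```typescript
// Toy illustration of 'least connections' vs 'round robin': always hand the next
// request to the backend with the fewest in-flight connections.
interface Backend {
  host: string;
  activeConnections: number;
}

const backends: Backend[] = [
  { host: "10.0.0.2", activeConnections: 4 },
  { host: "10.0.0.3", activeConnections: 1 },
  { host: "10.0.0.4", activeConnections: 7 },
];

function leastConnections(pool: Backend[]): Backend {
  // A server bogged down by a few slow requests naturally stops receiving new traffic.
  return pool.reduce((best, candidate) =>
    candidate.activeConnections < best.activeConnections ? candidate : best
  );
}

console.log(leastConnections(backends).host); // => "10.0.0.3"
```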

Given the results above, we can safely assume roughly linear growth tied to the number of app servers for quite some time, meaning each app server I add should handle an additional ~20 req/s. With that in mind, I wanted to see what would happen if I enabled some caching on this cluster.
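
As a quick aside, that linear assumption makes capacity planning a one-line calculation. A sketch, using the ~20 req/s per uncached server figure from above:

```typescript
// Back-of-envelope capacity planning based on the ~20 req/s per uncached app server
// observed above. The 500 req/s target below is just a hypothetical example.
function serversNeeded(targetReqPerSec: number, perServerReqPerSec = 20): number {
  return Math.ceil(targetReqPerSec / perServerReqPerSec);
}

console.log(serversNeeded(100)); // => 5, matching the five-server result above
console.log(serversNeeded(500)); // => 25 servers for a hypothetical 500 req/s target
```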

Gotta Go Fast

In my previous test of vertical scaling I used WP Super Cache to make things go quick. WP Super Cache generates static HTML pages for your site and then serves those, the benefit being that static pages are extremely fast to serve. In this test I wanted to try a more dynamic approach using Memcached and W3 Total Cache. W3 Total Cache takes a very different approach to caching by storing pages, objects, and database queries in Memcached. In general this caching model is more flexible, but possibly a bit slower. I installed Memcached on the same server as the NFS mount because that machine was underutilized; in a real production scenario I wouldn’t violate this separation of concerns.
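
For anyone unfamiliar with this style of caching, it boils down to the cache-aside pattern: check the cache first, do the expensive work only on a miss, then store the result with a TTL. Here’s a rough sketch; the in-memory Map stands in for a real Memcached client, and fetchPageFromDatabase is a hypothetical placeholder for the WordPress/MySQL work:

```typescript
// Rough sketch of the cache-aside model: serve from cache when possible, fall back to
// the database on a miss, then cache the result with a TTL.
type CacheEntry = { value: string; expiresAt: number };
const cache = new Map<string, CacheEntry>(); // stand-in for Memcached

async function fetchPageFromDatabase(slug: string): Promise<string> {
  return `<html><!-- rendered page for ${slug} --></html>`; // pretend this hits MySQL
}

async function getPage(slug: string, ttlSeconds = 300): Promise<string> {
  const hit = cache.get(slug);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value; // served without touching MySQL
  }
  const html = await fetchPageFromDatabase(slug); // cache miss: do the slow work once
  cache.set(slug, { value: html, expiresAt: Date.now() + ttlSeconds * 1000 });
  return html;
}

getPage("/hello-world").then(() => getPage("/hello-world")); // second call is a cache hit
```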

Once I enabled W3 Total Cache and re-ran the last test I got some pretty great results.

Boom.

With W3 Total Cache enabled and 5 app servers we settled in at ~370 requests/second. More impressive is that we saw only 5 failures during the entire test. For perspective, Kernl pushed 1,329,470 requests at the WordPress cluster I created. That’s a failure rate of roughly 0.0004%.

My favorite part of this test was the response time distribution. Without having to wait on MySQL for queries, the response times became crazy good.

The “bad” outlier is only 2.5s.

99% of requests finished in 29ms. And the outlier at 100% was only 2.5 seconds. Not bad for WordPress.

Going Further

Being the good software developer that I am, I wanted to push this setup to its limits. So I decided to try a test that is an order of magnitude more difficult:

20,000 users, 10 users/sec ramp up, for 60 minutes, from London

Things didn’t go great, but not because of WordPress. I won’t show any graphs of this test because I started to get limited by the network card on the NFS/Memcached machine. Digital Ocean says that I can expect around 30MB/sec out of a given droplet, and with this test I was starting to bump into that limit. If I wanted to test further I would have had to load balance Memcached, which felt a little bit outside of scope. In a real production scenario I would likely pay for a hosted Memcached service to deal with this for me.
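
To put that NIC limit in perspective, here’s a hedged back-of-envelope calculation. The 30MB/sec figure comes from Digital Ocean’s guidance above; the average cached-page size is purely an assumption for illustration:

```typescript
// Hedged back-of-envelope for the cache box's network ceiling.
const nicBytesPerSecond = 30_000_000; // ~30 MB/s per droplet (Digital Ocean guidance)
const assumedPageBytes = 80_000;      // hypothetical average cached page payload

const pagesPerSecond = nicBytesPerSecond / assumedPageBytes;
console.log(`~${Math.round(pagesPerSecond)} cached pages/second before the NIC saturates`);
// Under these assumptions, the single cache box tops out around ~375 pages/second,
// no matter how many app servers sit behind the load balancer.
```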

Conclusions

With Kernl I’m always weighing the build versus buy question when it comes to infrastructure and services. Given how much effort I had to put into making this setup horizontally scalable, and how much more it would take to make it reproducible and manageable, it hardly seems worth creating and managing my own infrastructure.

Aside from my time, the cost of the hardware was also not cheap.

  • Load Balancer – $10 / month
  • MySQL Database – $27 / month
  • Memcached (if separate from NFS) – $5 / month
  • NFS Mount (if separate from Memcached) – $5 / month
  • Application Servers – $25 / month ($5 / month * 5 servers)
  • Total – $72 / month

At $72 / month I could easily have any of the managed WordPress hosting companies (GoDaddy, SiteGround, WPEngine, etc.) run my setup and handle updates, security, and so on. The only potential hiccup is the traffic limits they place on your account. This setup can handle millions of requests per day, and while their setups can too, they’ll charge you a hefty fee for it.

As with any decision about hardware and scaling, the choice varies from person to person and organization to organization. If you have a dedicated Ops team and existing hardware, maybe scaling on your own hardware makes sense. If you’re a WordPress freelancer and don’t want to worry about it, maybe it doesn’t. IMHO I wouldn’t scale WordPress on my own; I’d rather leave it to the professionals.

Interested in testing your WordPress scalability? Check out the Kernl WordPress Load Testing beta program!

Building & Scaling Kernl Analytics

Over the past 3 years I’ve often received requests from new and existing Kernl customers for some form of analytics on their plugin/theme. I avoided doing this for a long time because I wasn’t sure I could do it economically at the scale Kernl operates at, but I eventually decided to give Kernl Analytics a whirl and see where things ended up.

Product Versions Graph

Concerns

After deciding to give the analytics offering a try, I had to figure out how to build it. When I first set out to build Kernl Analytics I had 3 main concerns:

  • Cost – I’ve never created a web service from scratch that needs to INSERT data at 75 rows per second with peaks of up to 500 rows per second. I wanted to be sure that running this service wouldn’t be prohibitively expensive.
  • Scale – How much would I need to distribute the load? This is tightly coupled to cost.
  • Speed – This project is going to generate a LOT of data by my standards. Can I query it in a performant manner?

As development progressed I realized that cost and scale were non-issues. The database that I chose to use (PostgreSQL) can easily withstand this sort of traffic with no tweaking, and I was able to get things started on a $5 Digital Ocean droplet.

Kernl Analytics Architecture & Technology

Kernl Analytics was created to be its own micro-service with no public access to the outside world. All access to it is behind a firewall so that only Kernl’s Node.js servers can send requests to it. For data storage, PostgreSQL was chosen for a few reasons:

  1. Open Source
  2. The data is highly relational
  3. Performance

The application that captures the data, queries it, and runs periodic tasks is a Node.js application written in TypeScript. I chose TypeScript mostly because I’m familiar with it and wanted type safety so I wouldn’t need to write as many tests.
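
To give a sense of what that looks like, here’s a minimal sketch of the capture path using the node-postgres (pg) driver. The table and column names are hypothetical; Kernl’s actual schema and endpoints aren’t public:

```typescript
// Minimal sketch of the capture path, assuming the node-postgres (pg) driver and a
// hypothetical update_events table.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

interface UpdateCheckEvent {
  productId: string;
  version: string;
  domain: string;
}

export async function recordUpdateCheck(event: UpdateCheckEvent): Promise<void> {
  // One parameterized INSERT per plugin/theme update check coming off Kernl's servers.
  await pool.query(
    `INSERT INTO update_events (product_id, version, domain, checked_at)
     VALUES ($1, $2, $3, NOW())`,
    [event.productId, event.version, event.domain]
  );
}
```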

kernl analytics and typescript
TypeScript FTW!

With regard to the size of the instance that Kernl Analytics runs on, I currently pay $15/month for a 3-core Digital Ocean droplet. I upgraded to 3 cores so that Postgres can easily handle both writes and multiple read requests at the same time. So far this setup has worked out well!

Pain Points

Overall things went well while implementing Kernl Analytics. In fact they went far better than expected. But that doesn’t mean there weren’t a few pain points along the way.

  • Write Volume – Kernl’s scale is just large enough to cause some scaling and performance pains when creating an analytics service. Kernl averages 25 req/s, which translates to roughly 75 INSERTs per second into Postgres. Kernl also has peaks of 150 req/s, which scales up to about 450 INSERTs per second. Postgres can easily handle this sort of load, but doing it on a $5 Digital Ocean droplet was taxing to say the least.
  • Hardware Upgrade – I tried to keep costs down as much as possible with Kernl Analytics, but in the end I had to increase the size of the droplet I was using to a $15 / 3-core droplet. I ended up doing that so one or two cores could be dedicated to writes while leaving a single core available for read requests. Postgres determines what actions are executed where, but adding more cores has led to a lot less resource contention.
  • Aggregation – Initially the data wasn’t aggregated at all. This caused some pain because even with some indexing, plucking data out of a table with > 2.5 million rows can be sort of slow. It also didn’t help that I was constantly writing data to the table, which further slowed things down. Recently I solved this by doing daily aggregations for Kernl Analytics charts and domain data (a rough sketch of the rollup follows below). This has improved speed significantly.
  • Backups & High Availability – To keep costs down the analytics service is not highly available. This is definitely one of those “take on some tech debt” items that will need to be addressed at a later date. Backups also happen only on a daily basis, so it’s possible to lose a day of data if something serious goes wrong.
Yay for affordable hosting
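
Here’s the rough shape of the daily rollup mentioned in the aggregation point above: collapse the raw rows into one row per product/version/day so the charts read from a small table instead of scanning millions of rows. The node-postgres (pg) driver is assumed, and the table and column names are hypothetical:

```typescript
// Rough sketch of the daily rollup: aggregate yesterday's raw events into a small
// per-product/version/day table that the charts can query cheaply.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function rollupYesterday(): Promise<void> {
  await pool.query(`
    INSERT INTO daily_version_counts (day, product_id, version, checks)
    SELECT checked_at::date, product_id, version, COUNT(*)
    FROM update_events
    WHERE checked_at >= CURRENT_DATE - INTERVAL '1 day'
      AND checked_at < CURRENT_DATE
    GROUP BY checked_at::date, product_id, version
    ON CONFLICT (day, product_id, version)
      DO UPDATE SET checks = EXCLUDED.checks
  `);
}
```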

Future Plans

Kernl Analytics is a work in progress and there is always room to improve. Future plans for the architecture side of analytics are:

  • Optimize Indexes – I feel that more speed can be coaxed out of Postgres with some better indexing strategies.
  • Writes -vs- Reads – Once I have a highly available setup for Postgres I plan to split responsibilities for writing and reading: writes will go to the primary and reads will go to the secondary (a rough sketch follows this list).
  • API – Right now the analytics API is completely private and firewalled off. Eventually I’d like to expose it to customers so that they can use it to do neat things.
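
To make the planned read/write split concrete, here’s a rough sketch of what it could look like with the node-postgres (pg) driver and two connection pools. Nothing here is built yet; the connection strings and helper names are hypothetical:

```typescript
// Sketch of the planned read/write split: one pool pointed at the primary for writes,
// one pointed at a read replica for dashboard queries.
import { Pool } from "pg";

const primary = new Pool({ connectionString: process.env.PRIMARY_DATABASE_URL });
const replica = new Pool({ connectionString: process.env.REPLICA_DATABASE_URL });

// Event capture and rollups write to the primary...
export function write(text: string, values: unknown[] = []) {
  return primary.query(text, values);
}

// ...while chart and dashboard reads hit the replica, keeping them out of the write path.
export function read(text: string, values: unknown[] = []) {
  return replica.query(text, values);
}
```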