Welcome to the new year! January was a great month for Kernl, with lots of new features, tweaks, and bug fixes to make your experience even better. Let’s dive in.
Plugin & Theme Tile View – The original Kernl list views for both plugins and themes were tables. Over time these tables became difficult to understand and didn’t convey a lot of information. The new tile view shows more information about your plugins and themes while being a lot friendlier to new users.
Code Widget – On the plugin and theme detail pages there is now a widget at the top that shows you the code you need to integrate Kernl with your product. This is part of a broader plan to make Kernl more friendly to first-time users.
Version Table Actions – The plugin/theme version table had 5 different action buttons on it. This was overwhelming for people, so the buttons have been collapsed into a drop-down menu instead.
http://status.kernl.us – We now have a full-featured status page powered by Pingdom. You can use this site to check the health of our service.
GitLab OAuth – GitLab authentication used to be powered by authentication tokens that you generated on GitLab and then copied into our system. You can now authenticate with GitLab via OAuth, which is a much easier flow for customers to manage.
Feature Flag Wizard – Feature flags can be a little daunting if you aren’t already familiar with them. To make it easier for customers to get started, we created a feature flag wizard. If you don’t already have any feature flags, I encourage you to check it out.
Load Test Site Verification – You are now required to verify site ownership before running load tests. This is accomplished via a simple WordPress plugin. WordPress load testing is still in closed beta, but please reply to this email if you would like to take part in testing.
Minor Features & Bug Fixes
Unsubscribe links were broken when we whitelisted our domain with SendGrid. This has been resolved.
Our application servers were upgraded to Node.js 10.15.0.
Available RAM was increased from 1GB to 2GB on our application servers.
A loading spinner has been added when plugins and themes are loading.
Load testing response time distribution is now blue instead of grey.
To horizontally scale any app is an exercise in breaking things apart as much as possible. In the case of WordPress there are a few shared components that I wanted to break up:
File System – The file system is the most problematic part of scaling WordPress. Unless you change how WordPress stores plugins, themes, media, and other things you need to have a shared file system that all nodes in your cluster can access. There are likely some other solutions here, but this one provides a lot of flexibility.
MySQL – In many WordPress installs MySQL lives on the same machine as WordPress. For a horizontally scaled cluster this doesn’t work so we need a MySQL that is external.
Memcached – It was brought to my attention that during part 1 of this series using WP Super Cache to generate static pages was sort of cheating. In the spirit of making this harder for myself I introduced W3 Total Cache instead and will be using an external Memcached instance as the shared cache.
Now that the basic why and what are out of the way, let’s talk about the how. I’m a huge fan of Digital Ocean. I use them for everything except file storage, so I’m going to use them for this WordPress cluster as well. Here’s how it’s going down:
Create a droplet that will act as the file system for our cluster. Using NFS all droplets in the cluster will be able to mount it and use it for WordPress. I’m also going to use this for Memcached since NFS doesn’t take up many resources.
Create a base droplet that has Nginx and PHP7.2-FPM installed on it. There is a little bit of boilerplate configuration here, but in general the install is typical. The only change to the Nginx configuration is where I set the root directory to be the NFS mount. Use this base droplet to configure WordPress database settings.
Use Compose.io to create a MySQL database. I wanted something that was configured well that I didn’t have to think about. Totally worth the $27 / month.
Once the above are done take a snapshot of the base droplet and use it to create more droplets. If all goes well you shouldn’t need to do any configuration.
Using Digital Ocean’s load balancer service add your droplets to the load balancer.
Voila! That’s it.
No Cache Smoke Test
200 users, 10 minutes, 2 users/sec ramp up, from London
As with every load test that I do, the first test is always just to shake out any bugs in the load test itself. For this test I didn’t have any caching enabled and only a single app server behind the load balancer. It was effectively the same as the first load test I did during part 1 of this blog series.
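The ramp-up rate in the test parameters determines how long the test takes to reach full load. A minimal sketch of that arithmetic follows; the function name `ramp_up_seconds` is hypothetical and only for illustration.

```python
def ramp_up_seconds(total_users, users_per_sec):
    """Seconds needed to reach full load when users are added at a steady rate."""
    if users_per_sec <= 0:
        raise ValueError("users_per_sec must be positive")
    return total_users / users_per_sec


# Smoke test parameters from above: 200 users at 2 users/sec.
smoke_test_ramp = ramp_up_seconds(200, 2)
print(f"Full load reached after {smoke_test_ramp:.0f} seconds")
```

With these parameters the ramp-up takes up only a small slice of the 10 minute test, so most of the run happens at full load.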
As you can see from the graph below, performance was what we would expect from the setup that I used. We settled in to 21 requests / second with no errors.
The response time distribution wasn’t very great. 90% of requests finished in under 5 seconds, but that’s still a very long time. Generally if I saw this response time distribution I would think that it’s time to add caching or scale up/out.
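For readers unfamiliar with response time distributions, a figure like "90% of requests finished in under 5 seconds" is a percentile. The sketch below uses the nearest-rank method, which is one common definition and not necessarily the one the load testing tool uses. The sample data is made up for illustration and is not taken from the test.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in seconds (NOT real data from this test).
response_times = [0.8, 1.2, 1.9, 2.4, 2.7, 3.1, 3.6, 4.2, 4.9, 9.5]

p90 = percentile(response_times, 90)
p100 = percentile(response_times, 100)
print(f"90% of requests finished within {p90}s; the slowest took {p100}s")
```

The gap between the 90th percentile and the maximum is why a single slow outlier says little about the typical experience.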
So. Many. Failures.
2000 users, 120 minutes, 2 users/sec ramp up, from London
The next test I decided to run was the sustained heavy load test. This is generally where I start to see failures from managed WordPress hosting providers. Given that I didn’t add any more app servers to the load balancer and had no caching, things went as poorly as you would expect.
Everything was fine up until ~25 req/s and then the wheels fell off. The response time distribution was bad too. No surprises here.
Looks like it’s time to scale.
2000 users, 120 minutes, 2 users/sec ramp up, from London
Before adding Memcached to the setup I wanted to see how it scaled without it. That means adding more hardware. For this test I added four more application servers (Nginx + PHP) to the load balancer and ran the test again.
As you can see from the request/failure graph, we experienced roughly linear growth in our maximum requests/second. Given we originally maxed out at ~20 req/s on one machine, maxing out at ~100 req/s with five machines seems like exactly the sort of result that I would expect to see. The response time distribution also started to look better:
Obviously a 90% score of 4 seconds isn’t awesome, but it is a lot better than the previous test. I did make a tiny tweak to the load balancer configuration that may have helped though. I decided to use the ‘least connections’ option instead of ‘round robin’. ‘Least connections’ tells the load balancer to send traffic to the app server with the fewest active connections. This should help with dog piling on a server with a few slower connections.
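To make the difference between the two strategies concrete, here is a minimal sketch of how each one picks a server. It is a toy model, not Digital Ocean's implementation; the server names and connection counts are hypothetical.

```python
from itertools import cycle


def pick_round_robin(rotation):
    """Return the next server in a fixed rotation, ignoring current load."""
    return next(rotation)


def pick_least_connections(active):
    """Return the server with the fewest active connections.
    Ties go to whichever server appears first in the dict."""
    return min(active, key=active.get)


# Hypothetical snapshot: app-2 is stuck with several slow requests.
active = {"app-1": 3, "app-2": 12, "app-3": 1}

# Round robin visits every server in turn, including the overloaded app-2.
rotation = cycle(list(active))
round_robin_picks = [pick_round_robin(rotation) for _ in range(3)]

# Least connections keeps steering new requests away from app-2.
least_conn_picks = []
for _ in range(3):
    server = pick_least_connections(active)
    active[server] += 1  # the new request opens a connection
    least_conn_picks.append(server)

print("round robin:      ", round_robin_picks)
print("least connections:", least_conn_picks)
```

In this toy run round robin sends one of three requests to the busy server, while least connections avoids it entirely until the other servers catch up.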
Given the results above we can safely assume linear growth tied to the number of app servers that we have for quite some time. Meaning for each app server that I add I can expect to handle an additional ~20 req/s. With that in mind, I wanted to see what would happen if I enabled some caching on this cluster.
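As a back-of-the-envelope sketch of that linear model, assuming the ~20 req/s per uncached app server figure holds (it will stop holding once something shared, such as MySQL or NFS, becomes the bottleneck):

```python
import math

# Approximate uncached throughput per app server, taken from the tests above.
REQ_PER_SEC_PER_SERVER = 20


def estimated_capacity(servers, per_server=REQ_PER_SEC_PER_SERVER):
    """Estimated max requests/second under the linear scaling assumption."""
    return servers * per_server


def servers_needed(target_req_per_sec, per_server=REQ_PER_SEC_PER_SERVER):
    """Smallest number of app servers that covers the target throughput."""
    return math.ceil(target_req_per_sec / per_server)


print(estimated_capacity(5))   # five app servers, as in the test above
print(servers_needed(250))     # hypothetical target of 250 req/s
```

The target of 250 req/s is an arbitrary example value, not a number from the tests.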
Gotta Go Fast
In my previous test of vertical scaling I used WP Super Cache to make things go quick. WP Super Cache generates static HTML pages for your site and then serves those. The benefit is that static pages are extremely fast to serve. In this test I wanted to try a more dynamic approach using Memcached and W3 Total Cache. W3 Total Cache takes a very different approach to caching by storing pages, objects, and database queries in Memcached. In general this caching model is more flexible, but possibly a bit slower. I installed Memcached on the same server as the NFS mount because it was underutilized. In a real production scenario I wouldn’t violate this separation of concerns.
Once I enabled W3 Total Cache and re-ran the last test I got some pretty great results.
With W3 Total Cache enabled and 5 app servers we settled in at ~370 requests/second. More impressive is that we only saw 5 failures during the entire test. For perspective, Kernl pushed 1,329,470 requests at the WordPress cluster I created. That’s a failure rate of roughly 0.0004%.
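The failure rate follows directly from the two numbers above. As a quick check of the arithmetic (the function name is just for illustration):

```python
def failure_rate_percent(failures, total_requests):
    """Failures as a percentage of all requests sent."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return failures / total_requests * 100


rate = failure_rate_percent(5, 1_329_470)
print(f"{rate:.5f}%")
```

That works out to about 0.00038%, or fewer than 4 failures per million requests.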
My favorite part of this test was the response time distribution. Without having to wait on MySQL for queries the response times became crazy good.
99% of requests finished in 29ms. And the outlier at 100% was only 2.5 seconds. Not bad for WordPress.
Being the good software developer that I am, I wanted to push this setup to its limits. So I decided to try a test that is an order of magnitude more difficult:
20,000 users, 10 users/sec ramp up, for 60 minutes, from London
Things didn’t go great, but not because of WordPress. I won’t show any graphs of this test, but I started to get limited by the network card on the NFS/Memcached machine. Digital Ocean says that I can expect around 30MB/sec out of a given droplet, and with this test I was starting to bump into that limit. If I wanted to test it further I would have had to load balance Memcached, which felt a little bit outside of scope. In a real production scenario I would likely pay for a hosted Memcached service to deal with this for me.
With Kernl I’m always weighing the build versus buy question when it comes to infrastructure and services. Given how much effort I had to put into making this setup horizontally scalable and how much effort it would take to make it reproducible and manageable, it hardly seems worth creating and managing my own infrastructure.
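A rough way to see how a network cap turns into a request ceiling is to divide the available bandwidth by the data served per request. The sketch below uses the ~30MB/sec figure quoted above; the 50 KB per request is a purely hypothetical value, since the actual payload size was not measured here.

```python
def max_requests_per_sec(link_mb_per_sec, avg_kb_per_request):
    """Upper bound on requests/second a link can carry (1 MB = 1000 KB)."""
    if avg_kb_per_request <= 0:
        raise ValueError("avg_kb_per_request must be positive")
    return link_mb_per_sec * 1000 / avg_kb_per_request


LINK_MB_PER_SEC = 30         # droplet throughput figure quoted above
ASSUMED_KB_PER_REQUEST = 50  # hypothetical, NOT measured in this test

ceiling = max_requests_per_sec(LINK_MB_PER_SEC, ASSUMED_KB_PER_REQUEST)
print(f"network-bound ceiling: ~{ceiling:.0f} req/s")
```

Whatever the real payload size is, the point stands: once every app server pulls its files and cached pages from one machine, that machine's network link caps the whole cluster no matter how many app servers are added.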
Aside from my time the cost of the hardware was also not cheap.
Load Balancer – $10 / month
MySQL Database – $27 / month
Memcached (if separate from NFS) – $5 / month
NFS Mount (if separate from Memcached) – $5 / month
At $72 / month I could easily have any of the managed WordPress hosting companies (GoDaddy, SiteGround, WPEngine, etc.) run my setup, handle updates, security, etc. The only potential hiccup is the traffic limits they place on your account. This setup can handle millions of requests per day, and while their setups can too, they’ll charge you a hefty fee for it.
As with any decision about hardware and scaling the choice varies from person to person and organization to organization. If you have a dedicated Ops team and existing hardware, maybe scaling on your own hardware makes sense. If you’re a WordPress freelancer and don’t want to worry about it, maybe it doesn’t. IMHO I wouldn’t scale WordPress on my own. I’d rather leave it to the professionals.