What’s New With Kernl – February 2020

I hope everyone had a great February! We didn’t do too much feature development this month, but there were a lot of bug fixes and performance improvements, so let’s dive in!

Features, Bugs, and Performance

  • Node.js – Kernl is now on Node.js 12.16.1. This release was all about security fixes.
  • Load Testing Machine Provisioning – We weren’t calculating the correct number of machines to provision on DigitalOcean. This led to some serious over-provisioning when running load tests. This has been resolved, which means more customers can run more load tests at the same time.
  • Load Testing Secondary Node Behavior – Kernl uses Locust under the covers to run our WordPress Load Testing service. The Locust primary node has an argument called “--expect-slaves”. It tells Locust “Don’t start the load test until at least this many secondary nodes have connected.” We weren’t calculating this number correctly, which led to some weird behavior. This is now resolved, so load tests should start in every situation.
  • Easy Digital Downloads Domains – Kernl wasn’t passing the domain along to EDD. We now do this, which allows you to restrict updates to specific domains while using EDD.
  • Load Testing Snapshots – Kernl used to build up each load testing machine from the ground up every time a load test was started. We now start from a snapshot that gets us 50% of the way there. This has improved load test start times (especially on large tests) by an average of 30%.
  • GitHub Authorization Changes – The GitHub API is changing how it handles authorization headers. We’ve updated Kernl to handle this change, so we’ll be good going forward when GitHub deprecates the old method.
  • High Traffic Endpoint Audit – We did an audit of our high-traffic API endpoints and cleaned some things up. Slight performance improvements were had (1%-2%), but mostly the improvements have been in code readability and comprehension.
  • GitLab Deployment Issues – In a recent release of GitLab they changed the required fields when asking for an access token via a refresh token. This broke all GitLab deployments for Kernl for a few days while we tracked down the issue. This has since been resolved.
  • Load Testing Unit & Integration Tests – When our load testing service was launched we weren’t sure if it was going to be successful. We’ve proven that it is a worthwhile feature, so now we’re focusing on reliability. We’re in the process of adding a suite of unit and integration tests around this functionality.
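As a rough illustration of the two load testing fixes above, the sketch below derives a machine count from a target user count and reuses that number for the Locust primary's `--expect-slaves` flag. The per-machine capacity (`USERS_PER_MACHINE`) and both function names are hypothetical; this post doesn't describe Kernl's actual calculation.

```python
import math
from typing import List

# Hypothetical capacity figure: how many simulated users one load
# generator machine can drive. Not a number published by Kernl.
USERS_PER_MACHINE = 500


def machines_needed(concurrent_users: int,
                    users_per_machine: int = USERS_PER_MACHINE) -> int:
    """Smallest number of machines that can cover the requested users."""
    if concurrent_users <= 0:
        raise ValueError("concurrent_users must be positive")
    # Round up so a partial machine's worth of users still gets a machine,
    # without provisioning more than one extra.
    return math.ceil(concurrent_users / users_per_machine)


def primary_args(concurrent_users: int) -> List[str]:
    """Build the primary node's flag so it waits for every secondary."""
    secondaries = machines_needed(concurrent_users)
    return [f"--expect-slaves={secondaries}"]
```

Deriving both values from one function is the point of the sketch: if the provisioned count and the expected count come from different calculations, the primary can end up waiting for secondaries that were never created.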

That’s it for this month! See you in March.

Should I use Memcached or Redis for WordPress caching?

Choosing between Memcached or Redis for your WordPress cache is a tough decision. Not because they have vastly different performance profiles (they don’t), but because either choice is a good one depending on your needs. In this post we’re going to explore the differences between Redis and Memcached, how they perform for WordPress, and a lot of different non-performance things you should consider when making your choice.

What is Memcached?

Memcached is an open-source, high-performance, distributed memory object caching system. What does that mean? It means you can store a bunch of strings in memory and access them really fast. From a WordPress perspective, it means that with a caching plugin like W3 Total Cache, we can store the results of the complicated SQL queries that WordPress does in memory and have them available instantly.
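To make the idea concrete, here is a minimal sketch of that "check the cache, otherwise run the query and store the result" pattern. It is not how W3 Total Cache is implemented (that plugin is written in PHP); a plain dictionary stands in for Memcached, and `cached_query` and `run_query` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict

# Stand-in for a Memcached client. A real deployment would talk to the
# Memcached server over the network instead of using a local dict.
cache: Dict[str, str] = {}


def cache_key(sql: str) -> str:
    """Derive a short, fixed-length key from the query text."""
    return "query:" + hashlib.sha256(sql.encode("utf-8")).hexdigest()


def cached_query(sql: str, run_query: Callable[[str], str]) -> str:
    """Return the cached result if present, else run the query and store it."""
    key = cache_key(sql)
    if key in cache:
        return cache[key]      # cache hit: no database work at all
    result = run_query(sql)    # cache miss: do the expensive work once
    cache[key] = result
    return result
```

The first request for a given query pays the full database cost; every later request for the same query is served from memory.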

What is Redis?

Redis is an open source in-memory store that can be used as a cache or a message broker. It’s a bit different than Memcached because you get a lot more out of the box with it. For example, Redis has built-in replication, transactions, disk persistence, and provides high availability and partitioning. All those features mean that managing Redis can be a little harder to do, but not much harder. Especially if you just need to use it as a cache.

Performance

Both Redis and Memcached have excellent performance. They’re both used by some of the largest websites in the world and are fully ingrained in the Fortune 500. Given that all things are not created equal, let’s see how they perform with a read-heavy WordPress site (this blog).

The Setup

The load tests are performed against the DigitalOcean WordPress Marketplace image with either Redis or Memcached installed alongside of it. The machines have 2 vCPUs, 2 GB RAM and live in DigitalOcean’s SFO2 (San Francisco) data center.

The load test configuration:

  • 500 concurrent users
  • 2 users / second ramp up
  • 45-minute test (run twice)
  • Traffic comes from DigitalOcean’s NYC3 data center.

The content of the load test is a copy of this blog.
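For a sense of scale, the configuration above implies a fairly short ramp-up relative to the test length. The arithmetic below is a small sketch; it assumes the ramp-up counts as part of the 45 minutes, which this post doesn't state.

```python
def ramp_up_seconds(concurrent_users: int, users_per_second: float) -> float:
    """Time needed to reach the full user count at a constant ramp rate."""
    return concurrent_users / users_per_second


ramp = ramp_up_seconds(500, 2)   # 250.0 seconds, a little over 4 minutes
test_length = 45 * 60            # 2700 seconds

# Share of the test spent at the full 500 users, assuming the ramp-up
# is included in the 45 minutes.
full_load_share = (test_length - ramp) / test_length
```

Under that assumption, roughly 90% of each run happens at full load, so the averages reported below are dominated by steady-state behavior rather than the ramp.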

Baseline Performance (No Cache)

The baseline performance for WordPress with no cache isn’t great.

baseline requests/failures
50 requests / second with LOTS of failures

The response time also isn’t great. A little over 2 seconds on average.

baseline response times
~2 seconds response time on average.

Redis Performance

Once we install Redis and configure W3 Total Cache to use it, the number of requests that we can handle increases substantially.

redis requests/failures
300 requests per second

The requests remain steady at around 300 per second and no failures are recorded. The response time also improves quite a bit.

redis response times
Average ~475ms response time

475ms isn’t bad at all. That’s a response time roughly 4 times faster than without any caching at all.

Memcached Performance

With Memcached installed and W3 Total Cache configured to use it, we see some excellent performance.

memcached requests/failures
425 requests per second

In this situation, Memcached performs even better than Redis with 425 req/s versus Redis’ 300 req/s. Response time improvements are similar.

memcached response times
115ms response time

The Memcached response time is roughly 4 times faster than the Redis response time (115ms versus 475ms). In general, the results where Memcached is faster than Redis are surprising. In most benchmarks Redis is equal to or faster than Memcached, so it’s likely a configuration problem.
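The comparisons in this section are simple ratios of the figures quoted above. The sketch below works them out; the numbers are the approximate values from the graphs, with "a little over 2 seconds" rounded to 2000ms, so treat the output as ballpark only.

```python
# Approximate figures quoted in this post (not raw measurements).
results = {
    "no cache":  {"req_per_s": 50,  "avg_ms": 2000},
    "redis":     {"req_per_s": 300, "avg_ms": 475},
    "memcached": {"req_per_s": 425, "avg_ms": 115},
}


def compare(name: str, against: str) -> dict:
    """How many times more throughput and how many times faster `name` is."""
    a, b = results[name], results[against]
    return {
        "throughput_x": a["req_per_s"] / b["req_per_s"],
        "response_x": b["avg_ms"] / a["avg_ms"],
    }


redis_vs_baseline = compare("redis", "no cache")          # ~6x req/s, ~4.2x faster
memcached_vs_baseline = compare("memcached", "no cache")  # 8.5x req/s, ~17x faster
memcached_vs_redis = compare("memcached", "redis")        # ~1.4x req/s, ~4.1x faster
```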

Other Considerations

When deciding what cache to use with your WordPress setup, there are a few other considerations you should be looking at:

  • Ease of setup – As you can see from the performance results above, Memcached has better performance out of the box. Knowing what I know about Redis this is likely a configuration issue, but the fact that I could get that level of performance with no configuration from Memcached is a good data point.
  • 3rd Party Hosting – Do you really want to manage your own Redis or Memcached server? If you don’t, you’ll want to look at the landscape of 3rd party providers. Redis has a robust provider ecosystem. Memcached’s is a little less robust.
  • Persistence – Do you need your cache to survive a reboot? This is important if the cost of re-populating your cache is too high for your system. If you do need persistence, Redis is your best option.
  • High Availability – If you need high availability of your caching cluster, Redis is the clear winner here. Memcached can be made to operate this way, but Redis has it baked into the core of the application.

If you’d like to see the full results of the load testing runs on Kernl, see the links below.

The Crucible – Extreme WordPress Performance Challenge

Load testing is fun. Breaking things is fun. Breaking WordPress with load testing is even more fun. But in the era of highly scalable WordPress hosting solutions, can we even break WordPress anymore? Oh yes, yes we can. The Crucible Challenge can.

Crucible Challenge

The Crucible WordPress Performance challenge is a deceptively simple test inspired by the poor ops teams that have to handle traffic from Super Bowl advertisements. Given a WordPress site with consistent content and URL mappings:

  • Handle 50,000 (@ 500 per second ramp up) concurrent authenticated users for 2 hours with load test generators in New York, London, Amsterdam, Singapore, Bangalore, San Francisco, Toronto and Frankfurt.
  • Have an error rate below 0.1%.
  • Average response time should be below 800ms.
  • Median response time should be below 700ms.
  • 99th percentile response time should be below 800ms.
  • Halfway through the test, you must flush your cache.
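The numeric criteria above can be expressed as a simple pass/fail check. This is only a sketch: `RunStats` and `missed_criteria` are hypothetical names, not part of Kernl, and the cache flush and geographic requirements are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunStats:
    requests: int      # total requests made during the run
    failures: int      # how many of those requests failed
    avg_ms: float      # average response time
    median_ms: float   # median response time
    p99_ms: float      # 99th percentile response time


def missed_criteria(stats: RunStats) -> List[str]:
    """Return the criteria a run missed; an empty list means it passed."""
    missed = []
    # A run with no requests is treated as a total failure.
    error_rate = stats.failures / stats.requests if stats.requests else 1.0
    if error_rate >= 0.001:        # error rate must be below 0.1%
        missed.append("error rate")
    if stats.avg_ms >= 800:        # average must be below 800ms
        missed.append("average response time")
    if stats.median_ms >= 700:     # median must be below 700ms
        missed.append("median response time")
    if stats.p99_ms >= 800:        # 99th percentile must be below 800ms
        missed.append("99th percentile response time")
    return missed
```

For example, a run with 18,000 requests and 1,800 failures has a 10% error rate, so `missed_criteria` would report "error rate" no matter how good its response times were.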

Simple, yes. Easy, no. Why is this hard?

  • 50,000 is a LOT of people.
  • 500 per second ramp up does not give any time for warming your cache. It’s like your site getting hit in the face with a sledgehammer.
  • Low error rate doesn’t give you a lot of room for problems.
  • Keeping your response times that low with traffic coming from all over the world presents interesting problems.
  • Flushing your cache after the load test is already in progress shows us you have a good cache invalidation strategy and can handle dog-piling.
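Dog-piling is what happens when a cache entry disappears and every in-flight request tries to rebuild it at the same time. One common defense is to let a single request rebuild each entry while the others wait for it. The sketch below shows that idea inside one Python process; all names are hypothetical, and a real WordPress deployment would need a lock shared across its PHP workers and servers rather than an in-process one.

```python
import threading
from typing import Callable, Dict

_cache: Dict[str, str] = {}
_locks: Dict[str, threading.Lock] = {}
_locks_guard = threading.Lock()


def _lock_for(key: str) -> threading.Lock:
    """Return the one lock associated with this cache key."""
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())


def get_or_rebuild(key: str, rebuild: Callable[[], str]) -> str:
    """Serve from cache; on a miss, let exactly one caller rebuild."""
    value = _cache.get(key)
    if value is not None:
        return value
    with _lock_for(key):
        # Re-check after acquiring the lock: another thread may have
        # rebuilt the entry while this one was waiting.
        value = _cache.get(key)
        if value is None:
            value = rebuild()
            _cache[key] = value
        return value


def flush() -> None:
    """Empty the cache, as the challenge requires mid-test."""
    _cache.clear()
```

After a `flush()`, concurrent callers asking for the same key trigger one `rebuild()` call instead of one per request, which is what keeps the database from being hit by the whole crowd at once.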

Example Results

To give everyone an idea of what results might look like, I took a $160 / month CPU-Optimized DigitalOcean droplet, put OpenLiteSpeed + WordPress on it, and ran The Crucible against it.

Shows the crucible request per second graph at 18,000 requests per second.
18,000 requests per second

I only ran the test for 2 minutes, but as you can see it started to max out near the 18,000 req/s mark, with failures in the 1,800 failures/s area.

Can your service beat this? Want to find out? Drop an email to jack@kernl.us and we can get a test run scheduled.