Multitasking in Node.js With the Cluster Module

We already know that Node.js is single-threaded, which means it uses a single core of the processor even on a system with multiple cores. Even though it handles the load pretty well for a single-threaded system, there’s definitely room for optimization, and the cluster module is one way of getting it.

The cluster module was introduced to scale an application by running worker/child processes on multiple processor cores. These processes share the same server port, but they are still separate processes at the end of the day, so they’ll each have their own V8 instance, event loop, memory, and all that jazz. They use an IPC (Inter-Process Communication) channel to communicate with the parent process. We’ll look at all of these features as we move along this tutorial, so let’s just dive in.
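For example, here’s a minimal sketch of a worker talking to the parent over that IPC channel (we won’t need messaging for the rest of this tutorial; it just shows the channel in action):

const cluster = require("cluster");

if (cluster.isMaster) {
  cluster.fork();
  // The parent receives worker messages through the IPC channel
  cluster.on("message", (worker, message) => {
    console.log(`Got "${message.text}" from worker ${worker.process.pid}`);
  });
} else {
  // In a forked worker, process.send writes to the IPC channel
  process.send({ text: "hello" });
}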

There’s also a video version of this tutorial available on YouTube.

The problem that we’re trying to solve

Let’s quickly set up a Node.js project by typing in npm init -y. (I’ll be adding Express for the sake of convenience, but you don’t necessarily have to.) Install express and loadtest by typing in npm i express loadtest. We’ll be using loadtest later in this tutorial to look at the performance benefits you get from the cluster module. Once installed, create a file called nonCluster.js and copy this code (this file will have the code without the cluster module, for comparison):

const app = require("express")();
const port = 3000;

app.get("/heavy", (req, res) => {
  let counter = 0;
  while (counter < 900000000) {
    counter++;
  }
  res.end(`${counter} iterations completed! \n`);
});

app.get("/light", (req, res) => {
  res.send(`Done \n`);
});

app.listen(port, () => console.log("Listening to port 3000"));

It’s a pretty straightforward Express app with two endpoints. The /heavy endpoint does a CPU-intensive task that blocks the event loop, while the /light endpoint returns a response right away. Simple!

Now run the server with node nonCluster.js and make a request to the /heavy endpoint first, then to the /light endpoint (for example, curl http://localhost:3000/heavy in one terminal and curl http://localhost:3000/light in another). The /heavy endpoint will obviously take time, but you’ll notice that the /light endpoint takes just as long. That’s because the /heavy endpoint is busy computing its response and has blocked the event loop, so any request made after the /heavy request has to wait for it to complete.

Solution to the problem

So to avoid this blockage, let’s add the cluster module. In a new file called cluster.js, copy this code:
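Here’s a sketch of the full file, assembled from the pieces we’ll break down next; the /heavy and /light route handlers are exactly the same as in nonCluster.js:

const cluster = require("cluster"); // Note: on Node 16+, cluster.isPrimary is the preferred alias for isMaster
const os = require("os");
const app = require("express")();
const port = 3000;

if (cluster.isMaster) {
  // Spin up one worker per CPU core
  const numCPUs = os.cpus().length;
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on("online", (worker) => {
    console.log("Worker " + worker.process.pid + " is online.");
  });

  cluster.on("exit", (worker, code, signal) => {
    console.log(`${worker.process.pid} exited with code ${code}`);
    console.log("Starting a new worker");
    cluster.fork();
  });
} else {
  // Worker processes run the actual server, same endpoints as nonCluster.js
  app.get("/heavy", (req, res) => {
    let counter = 0;
    while (counter < 900000000) {
      counter++;
    }
    res.end(`${counter} iterations completed! \n`);
  });

  app.get("/light", (req, res) => {
    res.send(`Done \n`);
  });

  app.listen(port, () => console.log("Listening to port 3000"));
}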

Let’s break this down. Initially, when you had a single process, it resolved all the incoming requests. With the cluster module, there are two types of processes: a parent/master process and child processes. When the server starts, the parent spins up a cluster of child processes. From then on, any time someone makes a request to the server, the parent process directs the request to one of the child processes (mostly in a round-robin fashion), and that child process ultimately resolves the request.

if (cluster.isMaster) {
  // spin up a cluster of processes
} else {
  // resolve the request
}

Inside the if block, we check whether the current process is the parent. The cluster module has a property called isMaster which tells you whether the current process is the parent or a child. If it’s the parent process, we create the cluster of workers using the fork method, as shown in the snippet below. (We only spin up as many processes as there are cores in the system, to avoid scheduling overhead.)
If it’s not the parent, it’s a child process, and the child process is what actually resolves requests. It holds all the API endpoints and their corresponding logic.
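Concretely, the forking part of the parent branch looks like this (using os.cpus().length to get the core count):

const numCPUs = require("os").cpus().length;
for (let i = 0; i < numCPUs; i++) {
  cluster.fork();
}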

cluster.on("online", (worker) => {
console.log("Worker " + worker.process.pid + ' is online.'); })

cluster.on('exit', (worker, code, signal) => {
console.log(`${worker.process.pid} exited with code ${code});
console.log('Starting a new worker');
cluster.fork();
});

After forking, each new worker emits an online event. We’ll listen for this event in the parent to confirm that all the processes were created as expected.

When a worker dies, the cluster module emits an exit event. To keep our application from having any downtime, we fork a new process the moment one goes down. This way we’ll always have a full set of processes up and running, even if one of them goes down for any reason, intentional or not.

Alright, this looks good. Now if you run this file and make the same requests (first to the /heavy endpoint and then to the /light endpoint), you’ll see that it works as expected without blocking the event loop. So adding a cluster of processes does help. Now let’s use the loadtest package to do some relatively heavy testing.

Since you have the clustered application already running, let’s test it first. Type loadtest -n 1000 -c 100 http://localhost:3000/heavy to run the test. (If you don’t have loadtest installed globally, just add npx at the beginning of the command.)

The -n flag is the total number of requests, which I’ve set to 1000.
The -c flag is the concurrency. It simulates a real-world environment where an application gets requests from multiple clients simultaneously; in this case, 100 simultaneous clients.

These are the summarized results for the “clustered” application.

//With cluster
Total time: 7.178850510999999 s
Requests per second: 139
Mean latency: 685.8 ms

Now let’s switch to the “non-clustered” application and run the same test once again.

//Without cluster
Total time: 27.252680297999998 s
Requests per second: 37
Mean latency: 2583.1 ms

There’s clearly a significant improvement in overall performance when using clusters. But there’s a catch. Both these tests hit the /heavy endpoint, which is a CPU-intensive operation. Let’s run the same tests, but this time for the /light endpoint.

//With cluster
Total time: 0.5144123509999999 s
Requests per second: 1944
Mean latency: 48.8 ms

//Without cluster
Total time: 0.45111675100000004 s
Requests per second: 2217
Mean latency: 42 ms

Surprise, surprise. We actually get slightly worse performance when using a cluster for this endpoint. Why is that?

You see, Node.js was primarily designed for I/O operations, which for the most part don’t block the event loop. Since a single process already handles these operations well, adding extra processes and routing requests between them is ultimately overhead that most likely isn’t required. For CPU-intensive blocking operations, however, it’s better to split the requests between processes, and that’s where we do need clusters.

So it ultimately comes down to what your application is designed for. If you have a microservice architecture and a particular service deals with CPU-intensive operations, you can spin up a cluster for that specific service and let your single-threaded Node.js process handle the rest.

So ya, that’s pretty much how you’d work with clusters in your Node.js application. It has its own set of caveats, and you need to be aware of them before adding it to your codebase. This post is part of a series where we take a look at multitasking in Node.js.

There’s also a video version of this tutorial available on YouTube.

If you have any doubts or suggestions, you can put them in the comments or get in touch with me on any one of my socials. Cheers!

YouTube
LinkedIn
Twitter
GitHub