Java EE threads v/s Node.js – which is better for concurrent data processing operations

We have started using NodeJS a lot these days, typically for data processing applications that handle a large number of concurrent requests, each of which involves one or more I/O operations.

Why NodeJS? Why not Java? Java obviously has much wider acceptance in the enterprise software community, while NodeJS has yet to catch up. In fact, this post was triggered by exactly such questions posed by a customer who belongs squarely to the enterprise software community. But before we can answer this question, we must understand what the target application is trying to do.

What does the Data Processing Application do?

The Data Processing Application is built to handle a large number of data flow processes that involve a significant amount of I/O.

Each process involves making calls to one or more database engines over the network. The response to each database call is checked for success, and the received results are either processed further or returned to the calling application.

The Data Processing Application needs to execute a large number of concurrent API requests, each involving multiple network I/O operations.
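As a rough illustration of one such data flow process, here is a minimal sketch. The `queryEngine` helper and the engine names are hypothetical stand-ins (a timer simulates the network round trip); a real application would call the client library of each database engine instead.

```javascript
// Hypothetical stand-in for a network call to a database engine.
// A timer simulates the round trip; no real database is contacted.
function queryEngine(engineName, query, delayMs = 20) {
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve({ ok: true, engine: engineName, rows: [{ query }] });
    }, delayMs);
  });
}

// One data flow process: call an engine, check the response,
// then make the next call and return the combined result.
async function runDataFlow(requestId) {
  const lookup = await queryEngine('engine-a', `lookup ${requestId}`);
  if (!lookup.ok) {
    throw new Error(`lookup failed for request ${requestId}`);
  }

  const update = await queryEngine('engine-b', `update ${requestId}`);
  if (!update.ok) {
    throw new Error(`update failed for request ${requestId}`);
  }

  // Return the processed result to the calling application.
  return { requestId, rowCount: lookup.rows.length + update.rows.length };
}

runDataFlow(1).then((result) => console.log(result));
```

Each `await` marks a point where the process is only waiting on the network, which is the wait time the rest of this post is concerned with.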

Java EE v/s NodeJS

The most important differences between NodeJS and Java are the concurrency and I/O models. Java EE traditionally uses multi-threaded synchronous I/O while NodeJS uses single-threaded asynchronous I/O.

                    Java EE            NodeJS
Concurrency Model   Multi-threaded     Single-threaded
I/O Model           Synchronous I/O    Asynchronous I/O

Let us examine how these differences affect The Data Processing Application’s ability to handle a large number of concurrent requests that involve multiple I/O operations.

Concurrency Model

The following diagram illustrates the difference between the two concurrent execution models.

[Figure: Java – Multi-threaded concurrency]

[Figure: NodeJS – Single-threaded concurrency]

The above diagrams show how, in a multi-threaded environment, multiple requests can be executed concurrently, while in a single-threaded environment multiple requests are executed sequentially. Of course, the multi-threaded environment utilizes more resources.

It may seem obvious that a multi-threaded environment will be able to handle more requests per second than a single-threaded environment. This is generally true for compute-intensive requests, which use the allocated resources extensively.

However, in cases where requests involve a lot of I/O such as database or web service calls, each request needs to wait for the external engine to respond to the calls made, and hence the allocated CPU and memory resources are not used during this wait time.
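The compute-intensive case can be sketched as follows. This is only an illustration under assumed numbers: `busyWork` is a hypothetical helper that spins the CPU to mimic a compute-bound request, and the 25 ms duration is arbitrary. On the single JavaScript thread the four requests run strictly one after another, so the total time is at least the sum of the individual times.

```javascript
// Simulates a compute-intensive request by spinning the CPU for `ms` milliseconds.
function busyWork(ms) {
  const end = Date.now() + ms;
  let iterations = 0;
  while (Date.now() < end) {
    iterations += 1;
  }
  return iterations;
}

// Handles `count` compute-bound requests on the single JavaScript thread
// and returns the total elapsed time in milliseconds.
function handleComputeRequests(count, msEach) {
  const start = Date.now();
  for (let i = 0; i < count; i += 1) {
    busyWork(msEach);
  }
  return Date.now() - start;
}

const elapsed = handleComputeRequests(4, 25);
console.log(`4 compute-bound requests took about ${elapsed} ms`);
```

Nothing else can run on that thread while `busyWork` is spinning, which is why a multi-threaded server has the edge for this kind of load.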

Synchronous v/s Asynchronous I/O

The following diagrams illustrate this scenario where requests involve I/O wait times:

[Figure: Java – Synchronous I/O]

[Figure: NodeJS – Asynchronous I/O]

It is clear from this diagram that even though the single threaded concurrency model takes longer to execute the four requests, the amount of delay is much smaller than in the previous diagrams where there was no I/O wait time.

But what is more interesting is that the consumption of resources is still significantly smaller than in the multi-threaded model. Hence, to handle the same 4 requests, the Java EE multi-threaded synchronous I/O model would require significantly more CPU and memory resources.

Hence, in data processing applications where the server has to handle a large number of simultaneous requests, each of which involves I/O wait times, the single-threaded asynchronous I/O model of NodeJS provides a significant advantage over the multi-threaded synchronous I/O model of Java EE.
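The effect of overlapping I/O waits on a single thread can be seen in a small sketch. Here `simulatedIo` is a hypothetical placeholder for a database or web service call (a timer stands in for the network), and the 100 ms wait is an assumed figure. All four requests are started before any of them completes, so the total time is close to one wait rather than four.

```javascript
// Hypothetical I/O call: resolves after `waitMs`, standing in for a
// database or web service round trip.
function simulatedIo(requestId, waitMs) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`response ${requestId}`), waitMs);
  });
}

// Starts all requests without waiting for any of them to finish;
// the event loop is free while the I/O waits are pending.
async function handleIoRequests(count, waitMs) {
  const start = Date.now();
  const pending = [];
  for (let id = 1; id <= count; id += 1) {
    pending.push(simulatedIo(id, waitMs));
  }
  const responses = await Promise.all(pending);
  return { responses, elapsedMs: Date.now() - start };
}

handleIoRequests(4, 100).then(({ responses, elapsedMs }) => {
  console.log(responses.length, 'responses in about', elapsedMs, 'ms');
});
```

A thread-per-request server reaches a similar elapsed time, but only by keeping four threads parked for the duration of the wait.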

Scaling out to multiple CPUs

What we saw so far holds true for a single CPU. However most servers today have multiple CPU cores. Hence it is also important to ensure that we utilize all the cores efficiently. The Node Cluster module helps in creating and managing multiple worker processes that can share the same ports and can communicate with the parent via IPC and pass server handles back and forth.
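A minimal sketch of this arrangement with the built-in `cluster` module might look like the following. It assumes Node.js 16 or later (for `cluster.isPrimary`); the function name `startClusteredServer` and the response text are illustrative choices, not part of the application described here.

```javascript
const cluster = require('node:cluster');
const http = require('node:http');

// Runs in every process: the primary forks workers, and each worker
// (which re-runs this same script) starts an HTTP server on the shared port.
function startClusteredServer(port, workerCount) {
  if (cluster.isPrimary) {
    for (let i = 0; i < workerCount; i += 1) {
      cluster.fork();
    }
    // Replace any worker that exits so capacity stays constant.
    cluster.on('exit', (worker) => {
      console.log(`worker ${worker.process.pid} exited, starting a new one`);
      cluster.fork();
    });
  } else {
    http
      .createServer((req, res) => {
        res.end(`handled by worker ${process.pid}\n`);
      })
      .listen(port);
  }
}
```

To try it, one could add `startClusteredServer(8000, require('node:os').cpus().length);` as the last line of the script, where port 8000 is an arbitrary choice. Because every worker re-runs the same script, that single call covers both the primary and the worker branches.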

Node Cluster, however, only works on a single server. To scale out a NodeJS application across a cluster of multiple servers, we run instances of NodeJS on each server and coordinate execution between them using network-capable messaging such as 0MQ (ZeroMQ) or Socket.IO. This not only allows the Data Processing Engine to use all available CPU cores on a single server, it also provides a consistent mechanism to scale out the load across multiple servers.

As a result, the Data Processing Engine can be scaled out into a cluster so as to handle an increasingly larger load of data processing requests.

Conclusion

Combining the asynchronous I/O model of NodeJS with inter-process communication in a cluster of processes provides very high concurrent processing capacity to data processing applications, which can then be scaled out across a cluster of multiple servers.

  • An interesting presentation by Gabriele Lana to explain why NodeJS works well for IO intensive concurrent operations:

    http://www.slideshare.net/gabriele.lana/nodejs-explained-with-examples

  • Binh Thanh Nguyen

    Thanks. Nice post

  • A nice presentation about nodejs. Java however is not limited to synchronous operation only. For instance, the current servlet 3.0 API supports asynchronous request processing. I briefly tried that earlier this year and it works in practice.

    • Thanks for your comment Vladimir, you’re absolutely right… Java EE 6 introduces asynchronous servlets where a response to an http request need not be handled by the same thread – the request can be parked in a NIO queue and another thread from the pool can be assigned when the response is ready.

      If the nature of concurrency required in your application is handling a larger number of http requests such as multiple Ajax calls, servlet 3 should be sufficient. In the applications we are addressing, we need to process a much larger number of concurrent operations and hence node.js’ native async i/o with an event queue works better.

  • rahul

    We can leverage the advantages of non-blocking I/O (NIO) and asynchronous request with Servlet 3.1 release.

    https://weblogs.java.net/blog/swchan2/archive/2013/04/16/non-blocking-io-servlet-31-example

    http://docs.oracle.com/javaee/7/tutorial/doc/servlets013.htm#BEIHICDH

    It would be interesting to compare NodeJS with JavaEE async./NIO calls.

    In such a case, would it still be more beneficial to use NodeJS over Java EE? Let me know your thoughts on this.

    • Hi Rahul, thanks for your comment. Vladimir had made a similar comment earlier. I have yet to really try NIO extensively. We have done some initial experiments and it surely looks promising. We’re hoping to try NIO in some actual use cases and evaluate the performance. There are a lot of references on the net comparing NIO and Node.js, but I think the decision to use either is likely to be purely subjective.

      Bottom line is that using asynchronous I/O in cases where we have a significant amount of network IO will surely allow higher concurrency than synchronous IO. But within async IO whether NIO performs better than Node or vice versa may be quite subjective to other decision variables. But I’ll keep you posted in case we do any benchmark tests.

  • JF

    If you have many child processes reading/updating the database, how do you manage consistency in a non-blocking manner?

    • Hi JF, let me try and understand your question. You’re asking about how to maintain database consistency with multiple concurrent requests? If that’s your question, then the answer would be that the database guarantees the consistency, and the client application is not really expected to bother about it. But I’m not sure if you meant to ask that question. Please clarify in case I’ve misunderstood your question.

      • JF

        You are right! Thank you

        • You’re welcome! Would be great to know how you’re using Node if you are.

  • Shameer Kunjumohamed

    I think Java world is catching up in this area, first with Vert.x[ http://vertx.io/ ] and then Servlet 3.1. Now, undertow[ http://undertow.io ], from JBoss, which is the default webserver inside WildFly [ http://www.wildfly.org/ ], is a great alternative for asynchronous/non-blocking I/O servers. Tomcat 8 is also following the trend.

    Independent performance benchmarks show that undertow and Vert.x are now far ahead of Node.js. See this:
    http://www.techempower.com/benchmarks/#section=data-r8&hw=ec2&test=query

    • Thanks Shameer, you’re right… a lot of new developments in Java are making it possible to use async I/O, and all the better for us developers… though the title may indicate otherwise, this post is more about comparing performance of async/non-blocking I/O v/s blocking I/O and not really node v/s java… so thanks again for pointing out these developments and more so for providing the relevant links!

      • Shameer Kunjumohamed

        It’s really interesting to see all these developments in the web programming world. Feel lucky to live in this age 🙂

  • Bibby Bilguun

    Great article. I’m thinking about developing image processing application using NodeJS and OpenCV. But I’m not sure whether it’s a good choice or not. What do you think? I’ve seen CloudCV Image Processing Platform built using nodejs and opencv. http://computer-vision-talks.com/2013/09/cloudcv/

    • Hi Bibby, I think this is a good combination, primarily because all the CPU intensive work will be done by OpenCV, and thereby allowing nodeJS to handle the front-end events. However, depending on the number of operations you expect OpenCV to perform, it may be required to spread out the compute load in a cluster and have nodeJS co-ordinate the execution. This kind of clustered setup can be put together with either an AMQP back-end like RabbitMQ or if you need closed-loop control, then Socket.IO. Do keep me informed about how your project progresses… would be keen to know!

      • Bibby Bilguun

        thanx for the reply. I’ll keep you informed. 🙂

      • Bibby Bilguun

        for the OpenCV performance I’ll use GPU card ( Cuda ). My application is not a big one. So no need to worry about clustering.

  • soro

    Node.js: Small machines, scale horizontally -> resilience + speed: all you need

  • Pingback: A comparison of the differences between Node.js and Java in concurrent data processing applications | G. T. Wang

  • really worthy article!

  • Adrian

    Hi, I have a hard time understanding this critical part:

    “It is clear from this diagram that even though the single threaded concurrency model takes longer to execute the four requests, the amount of delay is much smaller than in the previous diagrams where there was no I/O wait time.“

    I mean both diagrams have same length green I/O waiting bars so the delay is the same – how do you measure it just looking at the diagrams?

    • The difference is that in the first diagram, we have 4 threads processing each request, and each is waiting for the corresponding I/O wait, while in the latter, there is a single process that processes all 4 requests and hence, instead of keeping the process waiting during the I/O wait, processes another request while one is waiting. Hope that clarifies the diagrams.