Thursday, August 30, 2007

Jump In The Pool, The Threads Are Fine

example source code, overview, java part I

10,000 soldiers may be more powerful than one, but if they march in a straight line toward enemy fire, they are no more effective than one.

Threads are the same: marching them in regiments rather than in a single line adds both power and efficiency.


Thread Pooling
Java 5.0 (JRE 1.5) introduced the java.util.concurrent package, which provides an API for concurrent processing and efficient thread pooling. For details on the API, refer to Sun's publications on JSR 166. The Distributed Computing Laboratory at Emory University offers further documentation, as well as a backport of the package for older JVMs.

To apply the concurrent/thread pool model to the ArrayOfThreads object created in the previous post, an ExecutorService is used to invoke Callable objects (Callable was introduced in the concurrent package). AThread is a Callable rather than a simple subclass of Thread, whose run() method returns nothing. I used Callable even in the earlier examples because the end goal is to gather data back from each execution.
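To make the Callable/Thread distinction concrete, here is a minimal sketch assuming Java 8 or later. `SquareTask` and `CallableDemo` are hypothetical names; SquareTask stands in for AThread (the real class is in the downloadable source). Its call() returns a value and may throw a checked exception, so the result can travel back through the ExecutorService, which a void run() method cannot do.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for AThread: a Callable that hands a result back
class SquareTask implements Callable<Integer> {
    private final int index;

    SquareTask(int index) {
        this.index = index;
    }

    @Override
    public Integer call() throws Exception {
        Thread.sleep(10); // simulate some processing time
        return index * index; // a void run() method could not return this
    }
}

class CallableDemo {
    // Submits one Callable to an ExecutorService and waits for its result
    static int runOne(int index) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> future = pool.submit(new SquareTask(index));
            return future.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runOne(7)); // prints 49
    }
}
```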

After the tasks are submitted, the concurrency framework provides a Future object through which each task's returned data can be retrieved. The major difference this makes is that the Future handles waiting for and gathering each thread's data, so the main application thread that spawns the new tasks does not have to wait for each one to complete before moving on.
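The sketch below, assuming Java 8 or later, shows that submit() hands back a Future immediately. The task is held on a CountDownLatch (an illustration device, not part of the original code) so the Future is guaranteed to be unfinished when the main thread checks it; get() is the point where the main thread actually waits. `FutureDemo` and `Outcome` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureDemo {
    // Records whether the task had finished when submit() returned, plus its result
    static final class Outcome {
        final boolean doneRightAfterSubmit;
        final String result;

        Outcome(boolean doneRightAfterSubmit, String result) {
            this.doneRightAfterSubmit = doneRightAfterSubmit;
            this.result = result;
        }
    }

    static Outcome submitThenCollect() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch release = new CountDownLatch(1);
        try {
            Future<String> future = pool.submit(() -> {
                release.await(); // the task cannot finish until the main thread allows it
                return "lead time result";
            });
            // submit() has already returned: the main thread is free to keep working
            boolean done = future.isDone();
            release.countDown();
            // get() is where the main thread finally waits for and gathers the data
            return new Outcome(done, future.get());
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Outcome o = submitThenCollect();
        System.out.println(o.doneRightAfterSubmit + " / " + o.result); // prints false / lead time result
    }
}
```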


...

/* code snippet from ArrayOfThreadsPooled.java
* @see java.util.concurrent.Executors
* @see java.util.concurrent.ExecutorService
* @see java.util.concurrent.Future
*/
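int NUM_THRDS = 100, NUM_THRDS_POOLED = 100;
// Fixed-size pool: at most NUM_THRDS_POOLED tasks run at the same time
ExecutorService tpes = Executors.newFixedThreadPool(NUM_THRDS_POOLED);
// One Future per submitted Callable, used later to retrieve each result
Future futures[] = new Future[NUM_THRDS];
AThread calculators[] = new AThread[NUM_THRDS];
Object status[] = new Object[NUM_THRDS];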

...

Later, a for loop instantiates each AThread and submits it to the pool (i is the loop index):

for (int i = 0; i < NUM_THRDS; i++) {
    calculators[i] = new AThread(i);
    futures[i] = tpes.submit(calculators[i]); // returns at once with a Future
}

In this first implementation, ArrayOfThreadsPooled removes the impracticality of running the threads one after another: total processing time drops to roughly a hundredth of the serial time, because all 100 threads execute simultaneously (see output example 1).
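Where the speed-up comes from can be sketched with simulated work. Assuming Java 8 or later, `PooledTiming` (a hypothetical name) runs 100 tasks that each sleep 100 ms, standing in for AThread's processing, on a pool of 100 threads. Run one after another they would need about 10 seconds; pooled, they should finish in little more than the time of a single task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class PooledTiming {
    // Runs numTasks simulated AThreads, each sleeping taskMillis, on a pool of
    // poolSize threads and returns the elapsed wall-clock time in milliseconds
    static long runPooled(int numTasks, int poolSize, long taskMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        long start = System.nanoTime();
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < numTasks; i++) {
                final int id = i;
                Callable<Integer> task = () -> {
                    Thread.sleep(taskMillis); // simulated processing time
                    return id;
                };
                futures.add(pool.submit(task));
            }
            for (Future<Integer> f : futures) {
                f.get(); // wait until every task has finished
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Run one after another, these 100 x 100 ms tasks would need about 10 s
        System.out.println("pooled: " + runPooled(100, 100, 100) + " ms");
    }
}
```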


Don't Be A Resource Hog
However, scaling is still an issue: the business systems involved in processing, and the application server itself, will probably not appreciate the execution of 10,000 threads. To remedy this per user, ArrayOfThreadsPooledv2 illustrates that the fixed thread pool does not need to be the same size as the number of Callables, Futures, et al. The NUM_THRDS_POOLED value can therefore be set lower or higher than 100.
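A rough sketch of the bounded pool, assuming Java 8 or later: 100 Callables are submitted to a pool of 15 threads, and an AtomicInteger counter (added only for illustration) records how many run at once. Tasks beyond the pool size wait in the executor's queue rather than getting their own thread, so the peak should never exceed 15. `BoundedPoolDemo` is a hypothetical name.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPoolDemo {
    // Submits numTasks Callables to a pool of poolSize threads and returns
    // the largest number of tasks observed running at the same moment
    static int peakConcurrency(int numTasks, int poolSize) {
        // Only poolSize threads exist; the other tasks wait in the pool's queue
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        Future<?>[] futures = new Future<?>[numTasks];
        try {
            for (int i = 0; i < numTasks; i++) {
                Callable<Void> task = () -> {
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    try {
                        Thread.sleep(20); // simulated processing time
                    } finally {
                        running.decrementAndGet();
                    }
                    return null;
                };
                futures[i] = pool.submit(task);
            }
            for (Future<?> f : futures) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak = " + peakConcurrency(100, 15));
    }
}
```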

For example, I found that my application worked well with 15 pooled threads running at a time (see output example 2). To better show the benefit, the AThread class provided supports variable processing times. With varying execution times, the thread pool starts the next queued task as soon as one of its threads frees up, adding even more efficiency compared with running x threads at a time with identical run times (see output example 3).
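The variable-time effect can be sketched by comparing two schedules on a pool of two threads, assuming Java 8 or later and hypothetical sleep-based tasks. `runInBatches` mimics running x threads at a time, waiting for each group's slowest task; `runPooled` submits everything and lets a freed thread pick up the next task. With one 600 ms task and six 100 ms tasks, batching should take roughly 900 ms and pooling roughly 600 ms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class VariableTimesDemo {
    // Hypothetical AThread stand-in with a configurable processing time
    static Callable<Long> work(long millis) {
        return () -> {
            Thread.sleep(millis);
            return millis;
        };
    }

    // "x threads at a time": each batch must finish before the next one starts
    static long runInBatches(long[] durations, int size) {
        ExecutorService pool = Executors.newFixedThreadPool(size);
        long start = System.nanoTime();
        try {
            for (int from = 0; from < durations.length; from += size) {
                List<Future<Long>> batch = new ArrayList<>();
                int to = Math.min(from + size, durations.length);
                for (int i = from; i < to; i++) {
                    batch.add(pool.submit(work(durations[i])));
                }
                for (Future<Long> f : batch) {
                    f.get(); // the slowest task holds up the whole batch
                }
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Thread pool: submit everything; a freed thread takes the next task at once
    static long runPooled(long[] durations, int size) {
        ExecutorService pool = Executors.newFixedThreadPool(size);
        long start = System.nanoTime();
        try {
            List<Future<Long>> all = new ArrayList<>();
            for (long d : durations) {
                all.add(pool.submit(work(d)));
            }
            for (Future<Long> f : all) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long[] durations = {600, 100, 100, 100, 100, 100, 100};
        System.out.println("batches: " + runInBatches(durations, 2) + " ms");
        System.out.println("pooled:  " + runPooled(durations, 2) + " ms");
    }
}
```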


Back 2 The Future
Calling the Future's get() method, wrapped in exception handling, retrieves the object returned by the Callable once it is available.
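A sketch of that exception handling, assuming Java 8 or later (`FutureGetDemo` is a hypothetical name). Future.get() declares InterruptedException and ExecutionException; an exception thrown inside call() arrives wrapped in an ExecutionException, and getCause() returns the original.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureGetDemo {
    // Retrieves the Callable's result, handling the checked exceptions get() declares
    static String resultOrError(Callable<String> task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(task);
            return future.get();
        } catch (ExecutionException e) {
            // Anything thrown inside call() arrives wrapped; getCause() is the original
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return "interrupted";
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(resultOrError(() -> "50 from stock today"));
        System.out.println(resultOrError(() -> {
            throw new IllegalStateException("back end unavailable");
        })); // prints failed: back end unavailable
    }
}
```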

Because the Futures take care of collecting the data, you get the benefit of all the threads starting at once while still being able to tell which response belongs to which request: futures[i] holds the result for calculators[i]. With this information, you can then sort, analyze, manipulate or otherwise report the data however you see fit.
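A minimal sketch of that request-to-response mapping, assuming Java 8 or later. `IndexedResultsDemo` and its `leadTimeDays` rule are hypothetical, standing in for the real availability call; because the i-th Future is stored at index i, results line up with their requests no matter which task finishes first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IndexedResultsDemo {
    // Hypothetical lead-time rule standing in for the real availability call
    static int leadTimeDays(int quantity) {
        return quantity <= 50 ? 0 : quantity <= 300 ? 7 : 60;
    }

    // futures.get(i) always belongs to quantities[i], whatever order tasks finish in
    static int[] leadTimes(int[] quantities) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int q : quantities) {
                Callable<Integer> task = () -> leadTimeDays(q);
                futures.add(pool.submit(task));
            }
            int[] days = new int[quantities.length];
            for (int i = 0; i < days.length; i++) {
                days[i] = futures.get(i).get();
            }
            return days;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(leadTimes(new int[] {500, 1, 100}))); // prints [60, 0, 7]
    }
}
```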

Further enhancements would be to find the optimum number of threads needed to support all requests from your client base, and to create the ExecutorService once, as a single static instance shared by the whole application.

// One pool shared by the whole application; size it for your back-end systems
public static final int NUM_THRDS_CONCURRENT = 60;
public static final ExecutorService tpes =
    Executors.newFixedThreadPool(NUM_THRDS_CONCURRENT);

Each running instance of ArrayOfThreadsPooledv2 would then submit the Callable objects it needs into the shared pool.

futures[i] = ArrayOfThreadsPooledv2.tpes.submit(calculators[i]);

If no other users are active, the response may be as quick as with the first version of ArrayOfThreadsPooled, while keeping the stability and respect for resources of the second implementation.
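A sketch of the shared pool, assuming Java 8 or later. `SharedPool` and `PooledRequest` are hypothetical names, and the pool size of 4 is a placeholder rather than a recommendation. Each request keeps its own list of Futures, so results from different users never mix even though they share the same worker threads. The threads are marked as daemons so the sketch can exit without an explicit shutdown; in an application you would instead call shutdown() when the application stops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedPool {
    // One application-wide pool (size is a placeholder; tune it for your back end)
    static final int NUM_THRDS_CONCURRENT = 4;
    static final ExecutorService TPES = Executors.newFixedThreadPool(NUM_THRDS_CONCURRENT, r -> {
        Thread t = new Thread(r, "shared-availability-pool");
        t.setDaemon(true); // idle workers should not keep the JVM alive
        return t;
    });
}

class PooledRequest {
    // One user's request: submits its Callables into the shared pool and
    // keeps its own Futures, so its results never mix with another user's
    static List<Integer> run(int numCalcs, int firstQuantity) {
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < numCalcs; i++) {
            final int quantity = firstQuantity + i;
            Callable<Integer> calc = () -> {
                Thread.sleep(10); // simulated availability lookup
                return quantity;
            };
            futures.add(SharedPool.TPES.submit(calc));
        }
        List<Integer> results = new ArrayList<>();
        try {
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        // Two users at once: both requests share the same four worker threads
        Thread userA = new Thread(() -> System.out.println("A: " + run(5, 1)));
        Thread userB = new Thread(() -> System.out.println("B: " + run(5, 100)));
        userA.start();
        userB.start();
        userA.join();
        userB.join();
    }
}
```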

With the source code and details so far, I hope those of you working toward a multi-threaded Java application have found enough useful information to use the full power of this great addition to Java. Keep in mind that JRE 1.6 adds further enhancements.

Keep evolving development!

10,000 Threads Are Better Than One

example source code, overview
In this post we follow the business logic for using multi-threading and concurrency as we expand the availability service presented in the previous post from its basic form into one extended for more practical use.

[basic availability service] an application that communicates with business systems (enterprise resource planning, inventory management in particular) to read current stock information, including current demand allocations; compare it with a new single demand; and, ultimately, display the lead time to fulfill the customer requirement, whether from stock, by assembling from stocked components on a bill of material, or by purchasing/fabricating the item through some portion, if not all, of its cumulative lead time...

Since the business world does not exist in a parallel universe where customer demand and actual supply are always in harmony, the basic availability implementation above is insufficient as a final answer to a demand with differing priorities (i.e. fulfilling two or more needs at once). To clarify, re-read the following excerpt:

[extended availability service] As a customer, you may have an immediate need for five light bulbs or else you will be in darkness for the entire day, but you get cost efficiencies purchasing light bulbs in quantities of 1,000. The purpose of this extension to the availability service is to allow you as the customer to understand that you can get 50 from stock today...

The first priority is to restore service: get the lights on! The second is to restore service in a way that benefits the bottom line long term and lessens the likelihood of future outages. In its most basic form, the response to this variation in need could involve two calls to the availability code: one returning the lead time for the quantity available from stock (e.g. same day); the other, the lead time through procurement/manufacturing processes (e.g. 60 days).

However, if you add usage and more complex items and scenarios to your thought process beyond my simple light bulb shortage, the application requirement gradually becomes (or at least it did in my case) one that asks: "what is the individual lead time of each quantity of the total 1,000 light bulbs requested?"
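One way to picture that question, as a sketch assuming Java 8 or later: ask for the lead time of every quantity from 1 to the total, then collapse consecutive quantities with the same answer into ranges. `PerQuantityLeadTime` and its `leadTimeDays` rule are hypothetical, using the light bulb numbers from this series (50 from stock, 250 more built, the rest purchased) with assumed lead times of 0, 7 and 60 days. In the real service each call would be an availability lookup, which is what makes 1,000 of them expensive.

```java
import java.util.ArrayList;
import java.util.List;

class PerQuantityLeadTime {
    // Hypothetical supply picture borrowed from the light bulb example:
    // 50 in stock (0 days), 250 more buildable (7 days), the rest bought (60 days)
    static int leadTimeDays(int quantity) {
        if (quantity <= 50) {
            return 0;
        }
        if (quantity <= 300) {
            return 7;
        }
        return 60;
    }

    // Asks "what is the lead time for quantity q?" for every q from 1 to total,
    // then collapses consecutive quantities with the same answer into ranges
    static List<String> ranges(int total) {
        List<String> out = new ArrayList<>();
        int start = 1;
        for (int q = 2; q <= total + 1; q++) {
            if (q > total || leadTimeDays(q) != leadTimeDays(start)) {
                out.add(start + "-" + (q - 1) + ": " + leadTimeDays(start) + " days");
                start = q;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ranges(1000));
        // prints [1-50: 0 days, 51-300: 7 days, 301-1000: 60 days]
    }
}
```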


An Array of Threads
Following the thought above, to answer this question a user of the availability tool would have to run sequential requests for quantities from 1 to 1,000. The first step is to automate the requests rather than making 1,000 separate ones: the user would like to query the system once and get 1,000 responses. With the potential for thousands or tens of thousands of requests, the first logical conclusion I reached was "10,000 threads are better than one."

The example thread AThread does a simple wait(x milliseconds) to simulate some processing time, but it can easily be modified to execute the logic needed to determine availability for a specific item and required quantity, inside a synchronized block if your locking requirements call for one. For example:


...

/* call() method for AThread.java
* @see java.util.concurrent.Callable#call()
*/
// Shared lock object: synchronized(this) would lock only this AThread
// instance, so it would never stop other threads entering the block
private static final Object LOCK = new Object();

public Object call() throws Exception {
    // Define your own result object and availability processor
    AvailabilityProcessor process = new AvailabilityProcessor();
    LeadTimeResultObject ltro;
    // Synchronize only if the back-end logic is not thread safe
    synchronized (LOCK) {
        // Insert application logic to get lead time
        ltro = process.getLeadTimeResult("some item id", "some quantity");
    }
    return ltro;
}

...
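A caution about synchronized(this) inside a Callable: each AThread is a separate instance, so locking on `this` never makes one thread wait for another. The sketch below, assuming Java 8 or later and using hypothetical names (`LockScopeDemo`, `Task`), counts how many tasks are inside the guarded block at once; with a shared static lock the count stays at 1, while with `this` several tasks usually overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class LockScopeDemo {
    private static final Object SHARED_LOCK = new Object();
    private static final AtomicInteger inside = new AtomicInteger();
    private static final AtomicInteger peak = new AtomicInteger();

    // Hypothetical AThread stand-in; every task is a separate instance
    static final class Task implements Callable<Void> {
        private final boolean useSharedLock;

        Task(boolean useSharedLock) {
            this.useSharedLock = useSharedLock;
        }

        @Override
        public Void call() throws Exception {
            // "this" differs per task, so synchronized(this) never makes tasks wait
            Object lock = useSharedLock ? SHARED_LOCK : this;
            synchronized (lock) {
                peak.accumulateAndGet(inside.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(20); // simulated non-thread-safe back-end call
                } finally {
                    inside.decrementAndGet();
                }
            }
            return null;
        }
    }

    // Returns the most tasks that were inside the synchronized block at once
    static int peakInside(boolean useSharedLock, int numTasks) {
        inside.set(0);
        peak.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(numTasks);
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (int i = 0; i < numTasks; i++) {
                futures.add(pool.submit(new Task(useSharedLock)));
            }
            for (Future<Void> f : futures) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("synchronized(this): " + peakInside(false, 8)); // usually more than 1
        System.out.println("shared lock:        " + peakInside(true, 8));  // always 1
    }
}
```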

The ArrayOfThreads class illustrates how to implement calling the lead time availability code through use of an AThread array.

AThread calculators[] = new AThread[100];

Using a for loop over array indices 0 to 99, you can pass the data that makes each calculation unique into the AThread constructor (e.g. the item id and a quantity from 1 to 100, i.e. array index + 1). Although this accomplishes the first task of automating the requests, each thread must complete before the next one runs, so overall performance is exactly that of the main thread executing the availability process 100 times (see output example 1: processing times go in sequence, so total processing time is long).
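The serial behavior can be approximated as follows, assuming Java 8 or later. Exactly how ArrayOfThreads runs its array is in the downloadable source; in this hypothetical `SerialArrayDemo`, each task (a sleep-based stand-in for AThread) is started on its own Thread and joined before the next begins, which is enough to make total time the sum of the individual times.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

class SerialArrayDemo {
    // Hypothetical AThread stand-in: quantity is array index + 1
    static Callable<Integer> calc(int quantity, long millis) {
        return () -> {
            Thread.sleep(millis); // simulated availability processing
            return quantity;
        };
    }

    // Each thread is started and joined before the next one begins,
    // so the total time is roughly the sum of all processing times
    static long runSerially(int numThreads, long millis) {
        long start = System.nanoTime();
        try {
            for (int i = 0; i < numThreads; i++) {
                FutureTask<Integer> task = new FutureTask<>(calc(i + 1, millis));
                Thread thread = new Thread(task);
                thread.start();
                thread.join(); // waiting here is what serializes the array
                task.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // 10 threads of 50 ms each take at least about 500 ms one after another
        System.out.println("serial: " + runSerially(10, 50) + " ms");
    }
}
```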

In addition, this will most likely crash your application and/or back-end systems as user quantity requirements scale upward.

I know this is not very practical for the use case we are exploring, but another component of the application I wrote kept statistics on availability requests. These statistics were needed for later analysis and did not need to be sent back to the user. For background processing that must happen at some point, but not before responding, an array of threads is a perfect fit: the user leaves the site, or at least is satisfied by a response, while additional business logic is applied as more data is gathered and stored in a business intelligence/reporting back end.
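A sketch of that fire-and-forget pattern, assuming Java 8 or later. `BackgroundStatsDemo` and its in-memory STATS list are hypothetical stand-ins for the statistics component and its reporting store: the main thread builds the user's responses while an array of background threads records the statistics. `awaitAll` exists only so a demo or test can confirm the writes happened; the request path itself never waits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class BackgroundStatsDemo {
    // Hypothetical statistics store; a real one would write to a reporting back end
    static final List<String> STATS = new CopyOnWriteArrayList<>();

    // Builds the user's responses on the calling thread and hands statistics
    // recording to an array of background threads it does not wait for
    static Thread[] handle(String[] itemIds, List<String> responses) {
        Thread[] statWriters = new Thread[itemIds.length];
        for (int i = 0; i < itemIds.length; i++) {
            final String item = itemIds[i];
            statWriters[i] = new Thread(() -> STATS.add("availability requested: " + item));
            statWriters[i].start(); // fire and forget
            responses.add("lead time for " + item);
        }
        return statWriters;
    }

    // Only for demos and tests: waits until the background writers are done
    static void awaitAll(Thread[] threads) {
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<String> responses = new ArrayList<>();
        Thread[] writers = handle(new String[] {"bulb-60w", "bulb-100w"}, responses);
        System.out.println(responses); // available without waiting on STATS
        awaitAll(writers);
        System.out.println(STATS.size() + " statistics recorded");
    }
}
```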

Thread pooling, or running all the threads simultaneously, would address both the automation of the user process and the run-time efficiency concern. See the next post in the concurrent threading series for details on the ArrayOfThreadsPooled object included in the source code available for download above.

The Adventure Begins (Concurrent Threading Series)

Multi-threading background processes is very cool for spawning large calculations that write to statistical/business intelligence data stores while the main thread quickly returns other data to the user; however, when you need to run medium- to long-running calculations multiple times and return a sorted/combined result set to your user without feeding the attention deficit disorder inside all of us, multi-threaded background processes become even cooler.


Background
A practical, real-life example is an application that communicates with business systems (enterprise resource planning, inventory management in particular) to read current stock information, including current demand allocations; compare it with a new single demand; and, ultimately, display the lead time to fulfill the customer requirement, whether from stock, by assembling from stocked components on a bill of material, or by purchasing/fabricating the item through some portion, if not all, of its cumulative lead time.

Furthermore, this application was delivered to clients via the web, where a few seconds of processing time is a long wait for many users, so response time had to be considered in any change.

Now consider a secondary request interface for this application that needs to determine separate responses for the same demand above. It was used by Customer Service/Sales Representatives who needed to know the breakdown of lead time(s) from satisfying a quantity of one up to the total requested. As a customer, you may have an immediate need for five light bulbs or else you will be in darkness for the entire day, but you get cost efficiencies purchasing light bulbs in quantities of 1,000. The purpose of this extension to the availability service is to let you as the customer understand that you can get 50 from stock today, 250 built within a week, and 700 available two months from now. This is much more useful for satisfying both business needs than a single response indicating that a complete order of 1,000 would take two months.
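The breakdown in this example can be sketched as a simple allocation, assuming Java 8 or later: fill the demand from the fastest source first and move on when a source runs out. `DemandSplitDemo` and its tiers are hypothetical: 50 in stock today, 250 buildable (assumed 7 days for "within a week"), and purchasing for the rest (assumed 60 days for "two months"). The real availability service would read these figures from the ERP system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DemandSplitDemo {
    // Hypothetical supply source: how many units it can cover and how fast
    static final class Tier {
        final String source;
        final int available;
        final int leadTimeDays;

        Tier(String source, int available, int leadTimeDays) {
            this.source = source;
            this.available = available;
            this.leadTimeDays = leadTimeDays;
        }
    }

    // Illustrative numbers from the light bulb example (lead times assumed)
    static List<Tier> lightBulbTiers() {
        return Arrays.asList(
            new Tier("stock", 50, 0),
            new Tier("build", 250, 7),
            new Tier("purchase", Integer.MAX_VALUE, 60));
    }

    // Fills the demand from the fastest source first (tiers sorted by lead time)
    static List<String> split(int demand, List<Tier> tiers) {
        List<String> plan = new ArrayList<>();
        int remaining = demand;
        for (Tier tier : tiers) {
            int take = Math.min(remaining, tier.available);
            if (take > 0) {
                plan.add(take + " via " + tier.source + " in " + tier.leadTimeDays + " days");
                remaining -= take;
            }
        }
        if (remaining > 0) {
            plan.add(remaining + " cannot be covered");
        }
        return plan;
    }

    public static void main(String[] args) {
        System.out.println(split(1000, lightBulbTiers()));
        // prints [50 via stock in 0 days, 250 via build in 7 days, 700 via purchase in 60 days]
    }
}
```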


The Problem
Several months ago, during a migration from one business-enabling system to another, I needed to re-architect this concurrent processing, originally written in Java, as a .NET application. My initial thought was to reuse as much of the existing Java code as possible, both to shorten the development cycle and to keep the logic as close to the original as possible, then improve from there once integration was established. What did this mean? Microsoft J# to the rescue! Or at least so I hoped.

Microsoft’s J# does not support Java 5.0 because it is based on an earlier Java version. This was a major roadblock, since it was not clear to me whether Microsoft .NET could even perform the same concurrent processing, or how; and so began the adventure to acquire knowledge.


The Adventure
The following posts walk through the thought process behind the multi-threading in the original Java implementation and, hopefully, arrive at the most efficient way to convert it to .NET. Whether or not my conversion from Java to .NET succeeds, I figured this information may help developers who are new (or at least new to threading) and looking for ways to accomplish what I have done in either language.

So buckle up and let’s explore together the migration of concurrent processing code from Java technology to J# and .NET in general.