Wednesday, September 05, 2007

Calling All Threads

example source code, overview

There is always room for improvement, so I won't say we are at journey's end; however, this is at least in the right vicinity. The original intent of this series was to show concurrent programming in .NET as it was done in a real life Java application. There are components of this transition that are fairly straightforward as some objects conceptually map directly to that of original implementation, despite syntactical differences. My hope is that the information presented below will be useful in bringing the business and programming concepts together as well as point out places where the implementation is not as obvious.

Firstly, let's begin with the creation of the "thread" class. Microsoft .NET does not allow the inheritance of the Thread class or have Runnable or Callable interfaces to implement. Instead, using an additional class or member function within the main object, threads are created through use of a delegate function. Threads are instanced by calling the constructor of System.Threading.Thread, passing it a reference to the delegate function.


Dim Thread1 as New System.Threading.Thread(AddressOf delegateFunction)
Thread1.Start()

However, as we discussed in previous posts, we want to use pooled threads versus multiple threads called in succession. With that said, the format to spawn a thread in .NET into a pool requires the submission of a WaitCallback delegate, which has a signature of delegateFunction(Arg As Object), to the shared System.Threading.ThreadPool object that can be likened (hopefully without ridicule) to the fixed thread pool ExecutorService we utilized in Java. The object argument passed to the delegate function holds both the input parameters and variable(s) for output after thread processing has completed. See the AStateObject class in the linked source files as an example.

The general setup of a delegate function and use of thread pooling is shown below:

...
Sub BeginTask(ByVal thrdNum As Integer, ByVal isVarCalcTime As Boolean)
state = New AStateObject(thrdNum, isVarCalcTime)
'ThreadPool is a shared/static system object
'that can be likened to the Java ExecutorService object
ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf Task), state)
End Sub

Function EndTask() As Object
semaphore.WaitOne()
Return state.Status
End Function

Sub Task(ByVal StateObj As Object)
Dim varCalcTime As Long = AStateObject.DEFAULT_VAR_CALC_TIME
Dim thrdnum As Integer = AStateObject.DEFAULT_THRD_NUM
Dim isVarCalcTime As Boolean = AStateObject.DEFAULT_IS_VAR_CALC_TIME
' Use the state object fields as arguments.
Dim StObj As AStateObject = CType(StateObj, AStateObject)
isVarCalcTime = StObj.IsVarCalcTime
thrdnum = StObj.ThrdNum
If isVarCalcTime Then varCalcTime = AStateObject.getRandomCalcTime()

'Can use SyncLock or other synchronization techniques
'for real life calc that may hit backend systems
'Data that may be touched by multiple tasks simultaneously
Thread.Sleep(2000L + varCalcTime)
'SyncLock StObj
StObj.Status = Date.Now() 'Return value
'End SyncLock
semaphore.Set() ' Signal that the thread is done.
End Sub

...

The Task subroutine is the delegate that does the processing we want. This can be coded to perform business logic directly or call a web service or other API that does the processing/calculations for us. The object passed to the delegate is really done by reference despite the method signature as what is passed by value is the address of the object itself; therefore, interactions with this object like setting Status which is the return data object set values in the original instance of AStateObject in our case. The thread pool is then passed a user work item through WaitCallback delegate and the AStateObject associated.

As with the Java implementation, it is most efficient to submit all the threads to the pool for processing then circle back and retrieve results once each thread is ready to yield it. If we wait for thread to finish to get result at same time we add to user work item queue, then we may as well submit threads sequentially or allow the main process to iterate task without overhead of creating more threads. Assuming we want the efficiency, what you will see in the class ATask is that the invocation/queuing of task is done in a BeginTask and then the response is queried through calling EndTask.

The calls from the main process occur in this phased fashion as well, very similar to how it was implemented in Java as we also used three sections there: create threads; read results/wait; display results.

...
Dim NUM_THRDS As Integer = 100, I As Integer
Dim tasks(NUM_THRDS) As ATask
Dim status(NUM_THRDS) As Object

'Start threads
For I = 0 To (NUM_THRDS - 1)
tasks(I) = New ATask()
tasks(I).BeginTask(I, True)
Next

'Get the result of the threads
For I = 0 To (NUM_THRDS - 1)
Try
'Not using any processing that will throw an error,
'but just wanted to show as an example
status(I) = tasks(I).EndTask()
Catch e As Exception
status(I) = Nothing
End Try
Next

For I = (NUM_THRDS - 1) To 0 Step -1
If status(I) = Nothing Then status(I) = "Empty"
Console.WriteLine("Thread #" + I.ToString + ": " + status(I).ToString())
Next

'Clean-up objects used
status = Nothing
tasks = Nothing

...

The significance to this approach is that after the process has been put into a thread pool, the results cannot be extracted until after the process has fully completed. Therefore, the EndTask function waits on the thread making the middle block of code very much sequential, but since the threads run concurrently before hand it adds minimal if any processing overhead as you should never wait any longer than the longest running thread in the bunch. Since "run time" in the context of longest running process includes the amount of time a thread must sit in the queue, now may be a good time to show the syntax for increasing the thread pool size.

ThreadPool.SetMaxThreads(workerThreads, completionPortThreads)

Both parameters are integers representing the quantity of each type of thread available in the pool. Since .NET employs a shared thread pool, this follows the principal of finding the right number of threads for all concurrent users of the application. As the application is idle, users will have a faster response, then will slow down as load grows to ensure stability of system as too many threads can be as lethal as not enough.

Putting it all together, we have an application that emulates our Java example and yields pretty equal results (see example output). Hopefully this was successful in meeting my intent, which was to place links and thoughts around multi-threading in .NET (for Java developers) especially around concurrency. In trying to stick closely to the layout/concepts in the Java implementation, I am sure I have put together some odd .NET constructs, so all you VB/.NET purists out there must forgive me.

10,000 threads are still better than one if used tactically, especially as we move to a world of even faster response times with asynchronous interfaces. A follow-up to this exercise may be to show how to improve the user interface portion of the application to display some of the results immediately to browser/screen like how many items are in stock today as the program processes the results for items that need to be built for example in our item availability service. We'll see if I can wrap my brain around using the IAsyncResult process in-line with the other concepts here, but, until then...

keep evolving development

No comments: