Thursday, September 06, 2007

Calling All Threads Again (Async Revisited)


In the last post, we illustrated how to recreate Java-style parallel processing code in Visual Basic .NET. One design feature of that implementation, used to retrieve data from a task submitted to the thread pool's work item queue, was a wait object that ensured the thread had completed before we tried to collect the results.


Well, that is the point of asynchronous processing, right: invoke a thread and then continue doing other work on the main thread until the background work is complete. In our case, the additional processing we want to do is start more threads, or at least submit them to the queue, and only then settle in to wait for the results. Here is a look at the .NET asynchronous programming model accomplishing what my implementation of ATask did through its hand-rolled BeginTask and EndTask routines. This should look very similar, because asynchronous methods in .NET employ the same logic of beginning/invoking the action with one method and ending/collecting the result with another. The difference is that .NET creates all of these methods for us, versus the manually coded ones in the previous post.

...
Delegate Function TaskDelegate(ByVal x As String) As String

Sub Main()
    Dim str As String = "test"
    Dim t As New TaskDelegate(AddressOf ATask)

    ' Blocking style: EndInvoke waits until the task has finished.
    Dim ar As IAsyncResult = t.BeginInvoke(str, Nothing, Nothing)
    Console.WriteLine(t.EndInvoke(ar))

    ' Fire-and-forget style: ACallback collects the result when ready.
    ar = t.BeginInvoke(str & " second run!", New AsyncCallback(AddressOf ACallback), t)
    System.Threading.Thread.Sleep(5000) ' keep Main alive so the callback can run
End Sub

Function ATask(ByVal x As String) As String
    Return x.Trim()
End Function

Sub ACallback(ByVal ar As IAsyncResult)
    ' The delegate was passed as the state object, so cast it back
    ' and call EndInvoke to retrieve the result.
    Dim t As TaskDelegate = DirectCast(ar.AsyncState, TaskDelegate)
    Console.WriteLine(t.EndInvoke(ar))
End Sub

...

The first piece of this code is the creation of a worker function/class, like the ATask.Task method from my last post. Above, this is represented by my highly complicated method ATask(String). Second, a delegate is declared with the same signature as the function/subroutine created to do the work.

As stated earlier, .NET takes care of creating the BeginInvoke and EndInvoke methods for the delegate. The first call to t.BeginInvoke starts processing our task with the parameter data contained in the variable str. t.EndInvoke returns a String, as we specified through the method's signature, so in our simple example the result is fed straight to the console.

For more of a "fire and forget" implementation, the remaining parameters, which we initially ignored in t.BeginInvoke by passing Nothing (null), come into play. The first of the two specifies a callback method with a signature like ACallback's; this method handles calling t.EndInvoke. The second passes some state along to the invocation, here the delegate object itself. As you can see in the second call to t.BeginInvoke, we are able to start the work and let the callback method collect the results when they are ready.
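There is also a middle ground between blocking immediately on EndInvoke and handing everything to a callback: the IAsyncResult gives us IsCompleted and an AsyncWaitHandle, so the main thread can keep busy and check in periodically. Here is a minimal sketch of that pattern (PollDemo and the console messages are my own illustrative names, reusing the same TaskDelegate/ATask shape as the listing above):

```vb
Imports System.Threading

Module PollDemo
    Delegate Function TaskDelegate(ByVal x As String) As String

    Sub Main()
        Dim t As New TaskDelegate(AddressOf ATask)
        Dim ar As IAsyncResult = t.BeginInvoke("  poll me  ", Nothing, Nothing)

        ' Do other work on the main thread until the task reports done.
        While Not ar.IsCompleted
            Console.WriteLine("Main thread doing other work...")
            ' Wait up to 100 ms on the wait handle before checking again.
            ar.AsyncWaitHandle.WaitOne(100, False)
        End While

        Console.WriteLine(t.EndInvoke(ar)) ' safe: the task has finished
    End Sub

    Function ATask(ByVal x As String) As String
        Return x.Trim()
    End Function
End Module
```

Because the task has already completed when we reach EndInvoke, the call returns immediately instead of blocking.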


Where's The Pool?
If you have been following along and aren't an expert at this, you may be wondering what happened to the thread pool, since we have been promoting pooling because our worker thread count can swing from one to 10,000. A fixed thread pool is great for keeping the system/application stable, so why eliminate it? Well, from what I have gathered, asynchronous delegate calls always run on the shared thread pool provided by the Common Language Runtime (CLR). Therefore, my understanding is that we do not need to submit these tasks to the user work item queue ourselves.
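A quick way to convince yourself of this is that ThreadPool.GetAvailableThreads reports the shared pool's free worker threads, and both BeginInvoke and QueueUserWorkItem draw from the same supply. The sketch below (PoolDemo and DoWork are my own illustrative names) just shows the two routes into the one pool:

```vb
Imports System.Threading

Module PoolDemo
    Sub Main()
        Dim workers, ioThreads As Integer
        ThreadPool.GetAvailableThreads(workers, ioThreads)
        Console.WriteLine("Available worker threads: " & workers)

        ' Explicitly queue a user work item to the shared CLR pool...
        ThreadPool.QueueUserWorkItem(AddressOf DoWork, "queued item")
        ' ...which is the same pool an asynchronous delegate's
        ' BeginInvoke uses behind the scenes.

        Thread.Sleep(1000) ' give the pooled work a chance to run
    End Sub

    Sub DoWork(ByVal state As Object)
        Console.WriteLine("Pooled work: " & DirectCast(state, String))
    End Sub
End Module
```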

With everything swimming around in one shared thread pool, there is a potential that managed code or other long-running processes may be using threads needed by this specific application, or vice versa, so one thought is to use custom thread pooling. Custom thread pool development is not covered in this post, but I have found many references to a fully implemented C# version from Mike Woodring. It would allow processes using the shared thread pool, which we can't control, to have adequate resources while ensuring that our application does not create an exorbitant number of threads.

One interesting thought I had was to see whether, using standard threads, I could code a mechanism that launches a finite number of threads (e.g. 15) and then monitors for the completion of any of them, so that another iteration of the task can be spawned in that slot until all the tasks are complete. This is the concept behind a thread pool; I am just wondering if it can be done without all the code associated with a full-blown one. It intrigues me because I can think of applications where it may be useful to spawn a set of processes within a thread pool and then, to make them run faster, spawn numerous non-pooled background threads. That would eliminate some possibilities of deadlocks or issues with the shared pool, as our threads would not need to stay in the pool long; they would return rapidly. The pool would just queue the iterations of the main working process, while the background threads worked on finite tasks within a working process instance.
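One minimal way to get that "finite number of threads" behavior without a full pool implementation is a counting semaphore: acquire a slot before starting each thread, release it when the work finishes. This is only a throttle sketch, not Mike Woodring's pool; ThrottleDemo, RunTask, and the 100-task loop are my own illustrative choices, and the 15-slot cap matches the example count above:

```vb
Imports System.Threading

Module ThrottleDemo
    ' A semaphore with 15 slots caps how many tasks run at once.
    Private ReadOnly gate As New Semaphore(15, 15)

    Sub Main()
        For i As Integer = 1 To 100
            gate.WaitOne() ' block until one of the 15 slots frees up
            Dim worker As New Thread(AddressOf RunTask)
            worker.IsBackground = True
            worker.Start(i)
        Next
    End Sub

    Sub RunTask(ByVal state As Object)
        Try
            Console.WriteLine("Task " & CInt(state) & " running")
            Thread.Sleep(50) ' stand-in for real work
        Finally
            gate.Release() ' free the slot so the next task can start
        End Try
    End Sub
End Module
```

Note that because the workers are background threads, Main may exit before the last few finish; a real implementation would want to wait for them.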

Why not do this all with background threads if we can control their execution to be efficient?

We will see how my crazy mind develops this thought. It may be easier just to grab the custom thread pool from Mike Woodring's site and reference a compiled DLL that already has this work done. Why re-invent the wheel, right? That is probably why I found other sites referencing DevelopMentor.ThreadPool; they decided to copy his too.

The adventure continues...
