Friday, September 05, 2008

Range Searching Using Visual Basic .NET

The other day, I was asked a question about comparing dates in an array to a date range. The basic answer is to iterate through the array and compare each date to the beginning and ending of the range; however, with some curiosity exacerbated by another question planting the seed of binary searches in my head, I was determined to write a binary search algorithm for determining a value in a range. Along the way, I was enlightened/awoken to the concept of predicates in VB.NET and the System.Array.FindAll method. So this post will try to provide an overview of how to perform some range searching in .NET as I took research a step further and abstracted functionality to be more generic than with Date types.

With that said, here is the basic comparison to any range of objects that implement the IComparable interface.

Function MatchRange(Of T As IComparable)(ByVal c As T, ByVal rangeStart As T, ByVal rangeEnd As T) As Boolean
If c.CompareTo(rangeStart) >= 0 And c.CompareTo(rangeEnd) < 1 Then
Return True
End If

Return False
End Function
As long as you are comfortable with generics, the above is pretty straight forward (for more details on generics, see MSDN link in references) as we can now pass in any class that implements IComparable and take advantage of the public interface method CompareTo(Object) which will return one the following values: -1 when first object is smaller than the one being compared to; 0, the two equal; 1, first is larger. An easy demonstration of how this is used is MatchRange(5, 1, 10) will return true. So for back to original scenario of needing to determine all the values of a list that fall within a range, we get code like the following:

Dim lst As New List(Of String)()

For Each s As String In arrayOfObjects
If Ranges.MatchRange(s, RANGE_START, RANGE_END) Then
lst.Add(s)
End If
Next
Where Ranges is the name of utility class I placed my Public Shared Function MatchRange and RANGE_START and RANGE_END are string constants representing the beginning and ending value of the range we are searching within.

Speaking of the Ranges class, here is its full listing to dive right into the next step which was to implement a better range search using binary search algorithm:

Option Explicit On
Option Strict On

Namespace CrossedLogic.BlogSport.RangeSearching
Public Class Ranges

''' <summary>
''' Binary search through a sorted array to determine first index
''' greater or equal to start of range.
''' </summary>
''' <typeparam name="T"></typeparam>
''' <param name="array"></param>
''' <param name="rangePoint"></param>
''' <returns>integer index or (-1) when not found</returns>
''' <remarks></remarks>
Public Shared Function BinaryStartOfRangeSearch(Of T As IComparable)(ByVal array As T(), ByVal rangePoint As T) As Integer
Dim low As Integer = 0
Dim high As Integer = array.Length - 1
Dim mid As Integer
Dim startComparedToHigh As Integer = rangePoint.CompareTo(array(high))

If startComparedToHigh = 0 Then
Return high
End If

If startComparedToHigh < 0 Then
If rangePoint.CompareTo(array(low)) <= 0 Then
Return low
End If

While low <> high
mid = CInt(Math.Floor((low + high) / 2))

If rangePoint.CompareTo(array(mid)) <= 0 Then
If rangePoint.CompareTo(array(mid - 1)) <= 0 Then
high = mid - 1
Else
Return mid
End If
Else
If rangePoint.CompareTo(array(mid + 1)) <= 0 Then
Return mid + 1
Else
low = mid + 1
End If
End If
End While
End If

Return -1
End Function

''' <summary>
''' Binary search through a sorted array to determine last index
''' lesser or equal to end of range.
''' </summary>
''' <typeparam name="T"></typeparam>
''' <param name="array"></param>
''' <param name="rangePoint"></param>
''' <returns>integer index or (-1) when not found</returns>
''' <remarks></remarks>
Public Shared Function BinaryEndOfRangeSearch(Of T As IComparable)(ByVal array As T(), ByVal rangePoint As T) As Integer
Dim low As Integer = 0
Dim high As Integer = array.Length - 1
Dim mid As Integer
Dim endComparedToLow As Integer = rangePoint.CompareTo(array(low))

If endComparedToLow = 0 Then
Return low
End If

If endComparedToLow > 0 Then
If rangePoint.CompareTo(array(high)) >= 0 Then
Return high
End If

While low <> high
mid = CInt(Math.Ceiling((low + high) / 2))

If rangePoint.CompareTo(array(mid)) >= 0 Then
If rangePoint.CompareTo(array(mid + 1)) >= 0 Then
low = mid + 1
Else
Return mid
End If
Else
If rangePoint.CompareTo(array(mid - 1)) >= 0 Then
Return mid - 1
Else
high = mid - 1
End If
End If
End While
End If

Return -1
End Function

''' <summary>
''' Check if an object falls within a specified range.
''' </summary>
''' <typeparam name="T"></typeparam>
''' <param name="c"></param>
''' <param name="rangeStart"></param>
''' <param name="rangeEnd"></param>
''' <returns></returns>
''' <remarks></remarks>
Public Shared Function MatchRange(Of T As IComparable)(ByVal c As T, ByVal rangeStart As T, ByVal rangeEnd As T) As Boolean
If c.CompareTo(rangeStart) >= 0 And c.CompareTo(rangeEnd) < 1 Then
Return True
End If

Return False
End Function
End Class
End Namespace
And its usage:

Dim lst As New List(Of String)()

System.Array.Sort(arrayOfObjects)

Dim idxStart As Integer = Ranges.BinaryStartOfRangeSearch(arrayOfObjects, RANGE_START)
Dim idxEnd As Integer = Ranges.BinaryEndOfRangeSearch(arrayOfObjects, RANGE_END)

If idxStart >= 0 Then
If idxEnd < idxStart Then idxEnd = arrayOfObjects.Length - 1
For i As Integer = idxStart To idxEnd
lst.Add(arrayOfObjects(i))
Next
End If
First difference, you should see is that like regular binary search, the array must be sorted. Once we have a sorted array, we can use our binary search functions specific to beginning and ending of the range to get the respective index. Using the indices, we can add just those objects to the list we created to store results. Well a bit more code, but well worth it for the end game of faster searches of large object arrays like the example 13,149 string array used in the research for this post. Consequently, though, to my surprise, this code did not run faster mostly because of the requirement for sorting. With the original date question, even the initial solution above was sorted using System.Array.Sort; however, since that solution will yield the same list just unordered, for the sake of performance measurement I left array unsorted and then sorted as part of the binary search method.

Here are some of the results to illustrate difference:

---------------------------------------------------
Start processing by going through each item
End processing: 110ms - 9311
---------------------------------------------------
Start processing by using FindAll.
End processing: 13ms - 9311
---------------------------------------------------
Start processing by binary range search.
End processing: 59ms - 9311
---------------------------------------------------
Start processing by going through each item
End processing: 6ms - 9311
---------------------------------------------------
Start processing by using FindAll.
End processing: 14ms - 9311
---------------------------------------------------
Start processing by binary range search.
End processing: 47ms - 9311
---------------------------------------------------
Start processing by going through each item
End processing: 10ms - 9312
---------------------------------------------------
Start processing by using FindAll.
End processing: 8ms - 9312
---------------------------------------------------
Start processing by binary range search.
End processing: 50ms - 9312
---------------------------------------------------
Start processing by going through each item
End processing: 6ms - 9313
---------------------------------------------------
Start processing by using FindAll.
End processing: 8ms - 9313
---------------------------------------------------
Start processing by binary range search.
End processing: 51ms - 9313
---------------------------------------------------
Start processing by going through each item
End processing: 6ms - 9314
---------------------------------------------------
Start processing by using FindAll.
End processing: 8ms - 9314
---------------------------------------------------
Start processing by binary range search.
End processing: 55ms - 9314
---------------------------------------------------

As you will see above, the find by simple iteration of the array usually came in lower than the binary search method, although they both fluctuate, especially the simple loop method as it at times takes 100+ milliseconds (ms) while at others only 10 ms or below. Subsequently, you will notice that the FindAll although a few ms slower at times is very consistent and I found gives similar results regardless of the type of object in array whereas the other two vary as previously stated. So what is this FindAll method and how do I use it?

Well the FindAll method is a public shared function within System.Array. It's usage is not only the most consistent, but as you will see the most streamlined aside from delegate function you have to create. Here is a listing assuming availability of previously posted code:

lst.AddRange(System.Array.FindAll(arrayOfObjects, New Predicate(Of String)(AddressOf MatchRange)))

'Delegate implementation for Predicate needed in System.Array.FindAll
Function MatchRange(ByVal n As String) As Boolean
Return Ranges.MatchRange(n, RANGE_START, RANGE_END)
End Function
Yes! One line is all, you are not reading that incorrectly. Taking advantage of the List's AddRange capabilities that will add all elements of an array to list and the FindAll implementation that will return an array of elements from the array argument that match our range through use of the Predicate(Of T) delegate implemented here by a local MatchRange function that takes type T which is String in this case and does logic to respond true or false for a match. Since I have already gone through the work of coding a functioning match range routine, our Predicate simply becomes a call to our shared MatchRange function.

The one caveat to note with this solution is that since it only takes in object to be matched, the criteria and variables needed have to exist globally or just within this function which you can observe by my having to use RANGE_START and RANGE_END constants in my match. Moreover, if you are already using List(Of T) versus and array, you can simply utilize FindAll function of List to get a subset List(Of T). See references for more on the List.FindAll method.

Well hopefully you have enjoyed this journey.

In summary, you can most definitely use binary search method for primitives and possibly some types, but for Dates and Strings the System.Array.FindAll/List.FindAll are good utilities for large collections and the simple but effective for each loop and comparison.

Happy coding!


Source code: Download CrossedLogic.BlogSpot.RangeSearching.zip

References:

No comments: