Saturday, September 06, 2008

Range Searching Using Java

Welcome! Intrigued by my findings in my last post regarding Range Searching Using Visual Basic .NET, I decided to try to reproduce the FindAll methodology as well as my binary search based on range for grins and again was surprised by my findings. Binary search worked the fastest/consistently within Java code where all three methods were faster which was not too much of a surprise other than the ported FindAll since it and the concept of Predicate match was not native (at least not to my knowledge) at the time of writing this post.

Background summary for those that haven't read the original article in VB, this started as research based on a question of finding the dates within a list that match a selected date range. The intent was to go beyond the simple iteration and match method and find a faster means through binary search only to find the VB FindAll and Predicate implementation eluded to above and slower performance from binary range functions created.

With that said, onward and upward!

The journey in Java begins with replicating our utility class Ranges that handles the simple match method utilized by looping through array as well as our binary search implementation based on a start and end of a range.

The code listing is below:



package com.blogspot.crossedlogic.rangesearching;





public class Ranges {

private static final int NOT_FOUND = -1;



/**

* Binary search through a sorted array to determine first index

* greater or equal to start of range.

*

* @param array

* @param rangePoint

* @return integer index or (-1) when not found

*/

public static <T extends Comparable<T>> int binaryStartOfRangeSearch(T[] array, T rangePoint) {

int low = 0;

int high = array.length - 1;

int mid;

int comparedToHigh = rangePoint.compareTo(array[high]);



if (comparedToHigh == 0) { return high; }



if (comparedToHigh < 0) {

if (rangePoint.compareTo(array[low]) <= 0) { return low; }



while (low != high) {

mid = (int)Math.floor((low + high) / 2);



if (rangePoint.compareTo(array[mid]) <= 0) {

if (rangePoint.compareTo(array[mid - 1]) <= 0) {

high = --mid;

} else {

return mid;

}

} else {

if (rangePoint.compareTo(array[mid + 1]) <= 0) {

return ++mid;

} else {

low = ++mid;

}

}

}

}



return NOT_FOUND;

}



/**

* Binary search through a sorted array to determine last index

* lesser or equal to end of range.

*

* @param array

* @param rangePoint

* @return integer index or (-1) when not found

*/

public static <T extends Comparable<T>> int binaryEndOfRangeSearch(T[] array, T rangePoint) {

int low = 0;

int high = array.length - 1;

int mid;

int comparedToLow = rangePoint.compareTo(array[low]);



if (comparedToLow == 0) { return low; }



if (comparedToLow > 0) {

if (rangePoint.compareTo(array[high]) >= 0) { return high; }



while (low != high) {

mid = (int)Math.ceil((low + high) / 2);



if (rangePoint.compareTo(array[mid]) >= 0) {

if (rangePoint.compareTo(array[mid + 1]) >= 0) {

low = ++mid;

} else {

return mid;

}

} else {

if (rangePoint.compareTo(array[mid - 1]) >= 0) {

return --mid;

} else {

high = --mid;

}

}

}

}



return NOT_FOUND;

}



/**

* Check if an object falls within a specified range.

*

* @param c

* @param rangeStart

* @param rangeEnd

* @return

*/

public static <T extends Comparable<T>> boolean matchRange(T c, T rangeStart, T rangeEnd) {

if (c.compareTo(rangeStart) >= 0 && c.compareTo(rangeEnd) < 1) { return true; }



return false;

}

}

As stated in my last post, the use of generics expanded my research beyond Date types and provided a framework for searching different types as long as their range principals were the same as implemented and they implemented the Comparable interface. For more information on Generics, please see the reference(s) below; however, the matchRange function reads like this for example: taking in any type T denoted by the parameter c which implements the Comparable interface (T extends Comparable) and two other instances of type T representing a start and end of a range, return a true or false flag indicating existence within range.

Comparable interface is used the same here as the compareTo function has the same usage and return values (at least in typical implementations of primitive/standard types): -1 when first object is smaller than the one being compared to; 0, the two equal; 1, first is larger.

As you will see, the usage of these functions are virtually the same from VB implementation:


// simple loop and match method

for (String s : arrayOfObjects) {

if (Ranges.matchRange(s, RANGE_START, RANGE_END)) {

lst.add(s);

}

}



// binary range search method

Arrays.sort(arrayOfObjects);



int idxStart = Ranges.binaryEndOfRangeSearch(

arrayOfObjects, RANGE_START);

int idxEnd = Ranges.binaryEndOfRangeSearch(

arrayOfObjects, RANGE_END);



if (idxStart >= 0 && idxEnd >= 0) {

for (int i = idxStart; i <= idxEnd; i++) {

lst.add(arrayOfObjects[i]);

}

}

What will be a little different is the FindAll implementation as this was done using some non-native interfaces similar to methodology discussed in the reference section from faqs{dot}com. To get the same functionality, an interface for Predicate had to be added. In addition, a Predicates class was created that implemented a FindAll equivalent that requires an implemented Predicate to return a matching List of objects. Then using Java's anonymous class syntax, we implement Predicate's required match function as we would with delegate.

/**

* Predicate interface used to emulate .NET <code>System.Predicate(Of T)</code>.

*/

package com.blogspot.crossedlogic.rangesearching;



public interface Predicate<T> {



/**

* @param object to test against predicate criteria.

* @return boolean flag indicating a match (true) or no match (false).

*/

public boolean isMatch(T obj);

}



/**

* Predicates class containing static implementation of FindAll.

*/

package com.blogspot.crossedlogic.rangesearching;



import java.util.ArrayList;

import java.util.List;





public class Predicates {



/**

* Replicate .NET findAll and <code>Predicate</code>.

*

* @param <T>

* @param array

* @param match implementation of the Predicate interface.

* @return

* @see com.blogspot.crossedlogic.rangesearching.Predicate

*/

public static <T> List<T> findAll(T[] array, Predicate<T> match) {

List<T> lst = new ArrayList<T>();



for (T obj : array) {

if (match.isMatch(obj)) { lst.add(obj); }

}



return lst;

}

}



// Example usage of FindAll for range searching.

lst.addAll(Predicates.findAll(arrayOfObjects,

new Predicate<String>() {



@Override

public boolean isMatch(String s) {

return Ranges.matchRange(s, RANGE_START, RANGE_END);

}

}));

Not one line, but it is just one code statement as the solution is to utilize the range add capabilities of the List interface couples with the List response of the findAll function using a Predicate which again we implement as a simple call to our static matchRange function coupled with constants for start and end of range.

Well, I have kept you waiting long enough, here is how the results stack up:

--------------------------------------------------

Start processing by going through each item

End processing: 37ms - 9311

--------------------------------------------------

Start processing by using FindAll.

End processing: 36ms - 9311

--------------------------------------------------

Start processing by binary range search.

End processing: 14ms - 9311

--------------------------------------------------

Start processing by going through each item

End processing: 1ms - 9311

--------------------------------------------------

Start processing by using FindAll.

End processing: 3ms - 9311

--------------------------------------------------

Start processing by binary range search.

End processing: 2ms - 9311

--------------------------------------------------

Start processing by going through each item

End processing: 1ms - 9312

--------------------------------------------------

Start processing by using FindAll.

End processing: 3ms - 9312

--------------------------------------------------

Start processing by binary range search.

End processing: 2ms - 9312

--------------------------------------------------

Start processing by going through each item

End processing: 1ms - 9313

--------------------------------------------------

Start processing by using FindAll.

End processing: 3ms - 9313

--------------------------------------------------

Start processing by binary range search.

End processing: 3ms - 9313

--------------------------------------------------

Start processing by going through each item

End processing: 2ms - 9314

--------------------------------------------------

Start processing by using FindAll.

End processing: 2ms - 9314

--------------------------------------------------

Start processing by binary range search.

End processing: 3ms - 9314

--------------------------------------------------

Aside from the first iteration where the static objects are created for the first time, you will see 1-3 millisecond (ms) response time from each of the three methodologies giving us two other tools in the toolkit for searching arrays/lists when dealing with large number of elements and ranges.

I am glad I took this journey on the VB.NET side as it inspired me to want to explore this in Java and I am very pleased with the results. Hopefully, it will be useful for you in your travels as well and thanks for reading. We covered a lot of different ground in Generics, Predicates, sorting Arrays, binary search algorithm, and so on so please use links for more information on things I didn't go into details on including previous post as I may have glossed over something I discussed further there.



Thanks and remember to keep learning and/or keep smiling; you never want to end up a sad fool!


Source code: Download com.blogspot.crossedlogic.rangesearching.zip

References: