Tuesday, November 11, 2008

Java Custom Comparators: Sorting Text and Numbers Independently

Comparing strings and sorting lists of strings in Java is normally very simple as the API provides a nice means of comparing two strings as text. However, the need occasionally arises when we have values stored that can be either numeric or text or a combination thereof.




String[] inputStringArray = { "1", "19", "2", "3pt1", "3", "5pt1", "6pt2", "7pt3", "7pt3", "10", "10x1", "15", "7", "A7", "B87", "10pt5" };

// The input is a String array. Make that an ArrayList:
java.util.List<String> inputList = new java.util.ArrayList<String>(java.util.Arrays.asList(inputStringArray));

Say we have the above input array of string values that we have converted to an ArrayList as shown and now want to sort. In steps custom comparators. The easiest way to implement one is using an anonymous nested class at the point of use which will allow the flexibility of sorting differently each time Collections.sort() is called. If you find you need this more often, then convert this to a standalone class that implements the Comparator interface.




public static java.util.List<String> sort(java.util.List<String> list) {
java.util.Collections.sort(list, new java.util.Comparator<String>() {
/**
* Custom compare to sort numbers as numbers.
* Strings as strings, with numbers ordered before strings.
*
* @param o1
* @param o2
* @return
*/
@Override
public int compare(String o1, String o2) {
boolean isFirstNumeric, isSecondNumeric;

isFirstNumeric = o1.matches("\\d+");
isSecondNumeric = o2.matches("\\d+");

if (isFirstNumeric) {
if (isSecondNumeric) {
return Integer.valueOf(o1).compareTo(Integer.valueOf(o2));
} else {
return -1; // numbers always smaller than letters
}
} else {
if (isSecondNumeric) {
return 1; // numbers always smaller than letters
} else {
isFirstNumeric = o1.split("[^0-9]")[0].matches("\\d+");
isSecondNumeric = o2.split("[^0-9]")[0].matches("\\d+");

if (isFirstNumeric) {
if (isSecondNumeric) {
int intCompare = Integer.valueOf(o1.split("[^0-9]")[0]).compareTo(Integer.valueOf(o2.split("[^0-9]")[0]));
if (intCompare == 0) {
return o1.compareToIgnoreCase(o2);
}
return intCompare;
} else {
return -1; // numbers always smaller than letters
}
} else {
if (isSecondNumeric) {
return 1; // numbers always smaller than letters
} else {
return o1.compareToIgnoreCase(o2);
}
}
}
}
}
});
return list;
}

The above does the following:


  • Determines if the strings being compared are numeric values or not, using regular expression to match on \d which signifies a numeric digit.
  • If both values are numeric, then we compare as integers using its already implemented compareTo function.
  • If only one of the values is numeric, then it is the smallest as we want the numbers to sort first.
  • If both are non-numeric, then we simply compare as strings. However, to show extended support for our third case of blended numbers and text, we go through same logic as above on the two string values: if both begin with numbers, then compare as integers with difference that equal numbers then need to sort by text; if one begins with a number, then it is smaller; if both are text, then just compare as strings.

Hopefully that is clear, but to test the results, you can use the following code:


java.util.List<String> sortedList = sort(inputList);

System.out.println("Numbers and text sorted!");
for (String s : sortedList) {
System.out.println(s);
}

Hope that helps.

Happy coding!


References:

3 comments:

SH said...

Hi,

Try to use this inputStringArray:
"String[] inputStringArray = {"1", "19", "2", "3pt1", "3", "5pt1", "26", "pt3", "script", "10", "10x1", "15", "7", "A7", "B87", "10pt5"};"

You will see that your code is incorrect. In fact, you are splitting with number and taking the first array index without checking that this array may be empty. Which is happening when you have a full string without numbers.

Best regards,

SH said...

The solution would be that :
String[] array1 = o1.split("[^0-9]");
String[] array2 = o2.split("[^0-9]");
isFirstNumeric = array1.length != 0 && array1[0].matches("\\d+");
isSecondNumeric = array2.length != 0 && array2[0].matches("\\d+");

kccjr2 said...

Thank you for the input! I originally created this as an example of Comparator for another programmer who was to fill in the remaining code, including error handling and verification. I likely did not test for that case, so I appreciate you clarify for readers who may try to use this code "as-is" versus the intended use, which was as a learning tool for how comparators work. Anyway, thank you for reading and commenting.