Tuesday, November 11, 2008

Java Custom Comparators: Sorting Text and Numbers Independently

Comparing strings and sorting lists of strings in Java is normally very simple as the API provides a nice means of comparing two strings as text. However, the need occasionally arises when we have values stored that can be either numeric or text or a combination thereof.




String[] inputStringArray = { "1", "19", "2", "3pt1", "3", "5pt1", "6pt2", "7pt3", "7pt3", "10", "10x1", "15", "7", "A7", "B87", "10pt5" };

// The input is a String array. Make that an ArrayList:
java.util.List<String> inputList = new java.util.ArrayList<String>(java.util.Arrays.asList(inputStringArray));

Say we have the above input array of string values that we have converted to an ArrayList as shown and now want to sort. In steps custom comparators. The easiest way to implement one is using an anonymous nested class at the point of use which will allow the flexibility of sorting differently each time Collections.sort() is called. If you find you need this more often, then convert this to a standalone class that implements the Comparator interface.




public static java.util.List<String> sort(java.util.List<String> list) {
java.util.Collections.sort(list, new java.util.Comparator<String>() {
/**
* Custom compare to sort numbers as numbers.
* Strings as strings, with numbers ordered before strings.
*
* @param o1
* @param o2
* @return
*/
@Override
public int compare(String o1, String o2) {
boolean isFirstNumeric, isSecondNumeric;

isFirstNumeric = o1.matches("\\d+");
isSecondNumeric = o2.matches("\\d+");

if (isFirstNumeric) {
if (isSecondNumeric) {
return Integer.valueOf(o1).compareTo(Integer.valueOf(o2));
} else {
return -1; // numbers always smaller than letters
}
} else {
if (isSecondNumeric) {
return 1; // numbers always smaller than letters
} else {
isFirstNumeric = o1.split("[^0-9]")[0].matches("\\d+");
isSecondNumeric = o2.split("[^0-9]")[0].matches("\\d+");

if (isFirstNumeric) {
if (isSecondNumeric) {
int intCompare = Integer.valueOf(o1.split("[^0-9]")[0]).compareTo(Integer.valueOf(o2.split("[^0-9]")[0]));
if (intCompare == 0) {
return o1.compareToIgnoreCase(o2);
}
return intCompare;
} else {
return -1; // numbers always smaller than letters
}
} else {
if (isSecondNumeric) {
return 1; // numbers always smaller than letters
} else {
return o1.compareToIgnoreCase(o2);
}
}
}
}
}
});
return list;
}

The above does the following:


  • Determines if the strings being compared are numeric values or not, using regular expression to match on \d which signifies a numeric digit.
  • If both values are numeric, then we compare as integers using its already implemented compareTo function.
  • If only one of the values is numeric, then it is the smallest as we want the numbers to sort first.
  • If both are non-numeric, then we simply compare as strings. However, to show extended support for our third case of blended numbers and text, we go through same logic as above on the two string values: if both begin with numbers, then compare as integers with difference that equal numbers then need to sort by text; if one begins with a number, then it is smaller; if both are text, then just compare as strings.

Hopefully that is clear, but to test the results, you can use the following code:


java.util.List<String> sortedList = sort(inputList);

System.out.println("Numbers and text sorted!");
for (String s : sortedList) {
System.out.println(s);
}

Hope that helps.

Happy coding!


References:

Sunday, October 12, 2008

String to Value Algorithm

Well continuing on the thought of conversion routines, here is an example of taking a string value in Transact-SQL on Microsoft SQL Server (may work on other platforms that support same functions) and creating a unique integer value.

Code listing:

DECLARE @text nvarchar(100), @value bigint
SET @text = 'Smith'
SET @value = 0
WHILE LEN(@text) > 0
BEGIN
SET @value = @value + ASCII(LEFT(@text, 1)) * square(LEN(@text))
SET @text = RIGHT(@text, LEN(@text) - 1)
END
SELECT @value

Very simple hopefully. The basis is using the length of the string, create a large integer value that is multiplied against the ascii value of each letter in the string. I was asked this as a question, so I have not explored all of the uses of a function like this; however, as a quick hash it seems to work fine.

Again, just posting for the learning of some of the tools available in SQL like the ASCII and SQUARE functions. Hopefully making the life of some other DBA or programmer a little easier.


References:


Sunday, September 21, 2008

Converting Between String, ASCII, And Hex In .NET

Welcome again to my twisted mind. This post is just a quick example on converting between string, ascii and hexadecimal values using the .NET, specifically visual basic .NET programming.


Dim strOrig As String = "Hello"
Dim decimals As New List(Of Decimal)()
Dim hexStrs As New List(Of String)()
Dim intCurrent As Double

For Each c As Char In strOrig.ToCharArray()
intCurrent = Strings.Asc(c)
decimals.Add(intCurrent)
hexStrs.Add(Conversion.Hex(intCurrent))
Next

If hexStrs.Count > 0 Then
Console.Write("Hex: " & hexStrs(0))

For i As Integer = 1 To (hexStrs.Count - 1)
Console.Write(" " & hexStrs(i))
Next

Console.WriteLine("")
End If

If decimals.Count > 0 Then
Console.Write("Decimals: " & decimals(0))

For i As Integer = 1 To (decimals.Count - 1)
Console.Write(" " & decimals(i))
Next

Console.WriteLine("")
End If

Console.ReadLine()
As you can see above, .NET provides some nice conversion utilities for ascii value of a character and converting a decimal value to hexidecimal making the solution very straightforward as we just need to create a character array from string value and then convert each character to its ascii equivalent. Furthermore, we can quickly get the hexidecimal value for the character.


This is not rocket science by any stretch, but like posting things like this even as simple as it is so I for one don't forget it and to help another programmer get started on something more complex.


Anyway, for now that is it.


Happy programming.


References:


Friday, September 19, 2008

Using SQL To Find Work Days In Date Range II

In Using SQL To Find Work Days In Date Range, we created our fn_GetWorkDaysInRange user defined function in Microsoft SQL Server 2005. However, through our research into VB.NET and other simpler algorithms if you have been reading along, if we can do it better why not learn how.

So not to leave well enough alone, here is the code listing our fn_GetWorkDaysInRange revisited:

ALTER FUNCTION [dbo].[fn_GetWorkDaysInRange]
(
@startDate datetime -- first datetime in range
, @endDate datetime -- last datetime in range (can be in past)
, @includeStartDate bit -- flag to include start date as a work day
, @firstWkndDay int -- first day of weekend (e.g. Day(datetime))
, @lastWkndDay int -- last day of weekend (e.g. Day(datetime))
)
RETURNS int
AS
BEGIN
-- variables used in processing
DECLARE @workDays int, @sign int

-- parse input and calculate direction of date range
SET @firstWkndDay = Coalesce(@firstWkndDay, 0)
SET @lastWkndDay = Coalesce(@lastWkndDay, @firstWkndDay)
SET @startDate = Coalesce(@startDate, getdate())
SET @endDate = Coalesce(@endDate, @startDate)
SET @sign = Sign(DateDiff(dd, @startdate, @enddate))

-- set initial value of work days result based on include start date value
-- we use sign so that end dates older than start return negative work days
-- for work days to come up as positive value no matter what, sign usage can be replace by 1
SET @workDays = CASE @includeStartDate WHEN 0 THEN 0 ELSE
CASE
WHEN DatePart(dw, @startDate) IN (@firstWkndDay, @lastWkndDay)
THEN 0
ELSE Case @sign When 0 Then 1 Else @sign End
END
END

-- while end date is not equal to start date add sign (-1/1) number of days
-- and add to work days total if not a weekend
WHILE DateDiff(dd, @startDate, @endDate) <> 0
BEGIN
SET @startDate = DateAdd(dd, @sign, @startDate)
SET @workDays = @workDays +
CASE
WHEN DatePart(dw, @startDate) IN (@firstWkndDay, @lastWkndDay)
THEN 0
ELSE @sign
END
END

-- return working days result to caller
RETURN @workDays
END
The number of lines look very similar, but if you closely inspect this new version there are many changes/simplifications. The table variable is no longer needed, removing further dependence on new version(s) of SQL. The logic is reduced to while loop with two execution lines: increment/decrement date value; add to work days total if the new date value is a working day. The extra code is for readability of case logic.

To have this reflect how clean it really is, we can abstract out the logic for weekend which gets rid of the parameters and set statements for weekend day along with extensive case logic. We could then extend that separate function named something like fn_IsNotWeekDay to include logic to check date against our holiday table returning a bit flagging weekend/holiday. With CLR based user defined functions like we explored using VB.NET, this logic can be as complex as we are capable of coding in either SQL or .NET.

Until the next learning adventure.

Keep the code alive!


Related Articles/References:

Wednesday, September 17, 2008

Converting Between Calendar And Working Days In VB.NET

With all the fun of creating functionality similar to this in SQL, made some optimizations within original VB.NET functions for converting between calendar and working/business days and got to thinking it would be good to start taking advantage of the fact that SQL 2005 supports CLR assemblies. Instead of having to do overly complex SQL functions/queries/procedures, use SQL CLR as it will make your life much easier and you will look cool doing it.

Furthermore, is too easy not to. To convert my utility class already containing public shared functions for dealing with going back and forth between working and calendar days, I simply added the two imports shown in the code listing below; Partial identifier to my class; <SqlFunction()> to each method I want exposed to CLR. Even if I went the next step to convet to SqlTypes defined in the first import, that is not too difficult.

Well let's dive into the code listing:


Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server

Partial Public Class MyFirstUDFn
''' <summary>
''' Get calendar days from today equivalent to working days specified.
''' </summary>
''' <param name="wrkDays"></param>
''' <returns></returns>
''' <remarks></remarks>
<SqlFunction()> _
Public Shared Function CalendarDaysFromWorkDays(ByVal wrkDays As Integer) As Integer

'Using integer division, get the whole number of work weeks
Dim calDays As Integer = (wrkDays \ 5) * 7
calDays += (wrkDays Mod 5) 'Add back the remaining work days

'Adjust result to fall on working day as originally intended
While IsWeekend(DateAdd(DateInterval.Day, calDays, Date.Now()))
calDays += Math.Sign(wrkDays)
End While

Return calDays
End Function

'Create list of weekend days - for US this is Saturday (7) and Sunday (1)
Private Shared ReadOnly WEEKEND_DAYS As New List(Of Integer)(New Integer() {7, 1})
''' <summary>
''' Returns flag indicating if date is a weekend day or not.
''' </summary>
''' <param name="calDate"></param>
''' <returns></returns>
''' <remarks></remarks>
<SqlFunction()> _
Public Shared Function IsWeekend(ByVal calDate As DateTime) As Boolean
Return WEEKEND_DAYS.Contains(Weekday(calDate))
End Function

''' <summary>
''' Get working days from today equivalent to calendar days specified.
''' </summary>
''' <param name="calDays"></param>
''' <returns></returns>
''' <remarks></remarks>
<SqlFunction()> _
Public Shared Function WorkDaysFromCalendarDays(ByVal calDays As Integer) As Integer
Dim wrkDays As Integer = 0

If calDays > 0 Then
For i As Integer = 1 To calDays
If Not IsWeekend(DateAdd(DateInterval.Day, i, Date.Now())) Then
wrkDays += 1
End If
Next
ElseIf calDays < 0 Then
For i As Integer = calDays To (-1)
If Not IsWeekend(DateAdd(DateInterval.Day, i, Date.Now())) Then
wrkDays -= 1
End If
Next
End If

Return wrkDays
End Function
End Class
As you have noticed by now, the function logic look incredibly similar to the code we explored in our SQL adventure. As stated, the SQL code originated from the CalendarDaysFromWorkDays function and its counterpart; however, the code for WorkDaysFromCalendarDays is more like our SQL solution that its original just simplier form since these functions are used for a set number of days from current date versus a date range; however, the principles are the same.

One of the additions, is extending this to work with different weekend days and the bonus IsWeekend function that came as a result of implementing that. We could easily extend this to query for holidays and have a function for IsHoliday as well. See the reference to Using conn As New SqlConnection("context connection=true") within the Microsoft Solutions Developer Network (MSDN) link below.

Anyway, where do we go from here. Well the simple step is to create a SQL Server Project! For those of you using Visual Studio 2005/2008 Express as I am on one machine, you are probably asking yourself "What SQL Server Project?". Exactly. This is available in the full version of Visual Studio, so here is what I did to develop this in VS Express.

Create a new Class Library project in VS and ensure that you go into the properties of the project and remove the root namespace (you can leave this as-is or change to another namespace to have structured classes, but just take note of that for later as using methods within assemblies are of the format AssemblyName.ClassName.MethodName so if you have a class in a long namespace it must be declared like AssemblyName.[NamespaceName.ClassName].MethodName). Add a class item to your project and code away. Above code functions as-is and is a good example of all that is needed to make a normal VB function to SqlFunction or sub to SqlStoredProcedure.

Once compiled to a DLL, the do the following on the SQL Server:

sp_configure N'clr enabled', 1
go
reconfigure
go
CREATE ASSEMBLY MyAssembly FROM 'C:\SQLCLRProject.dll' WITH PERMISSION_SET = SAFE
Once you have the assembly created and CLR enabled, you can issue this SQL to test:

-- creates the function
CREATE FUNCTION udf_IsWeekend(@calDate datetime) RETURNS bit
AS
EXTERNAL NAME SQLUserDefinedFunctions.MyFirstUDFn.IsWeekend
GO

-- example usage
SELECT dbo.udf_IsWeekend(DateAdd(dd, 2, getdate()))
Now you are using your original .NET code in SQL server and no need to pull your here out finding SQL equivalents to pre-existing .NET functions. Although, that is fun in and of itself, sometimes you just need to save time and practicing code resuse is always a good thing.

Hope that helps!


Related Articles/References:


Converting Working Days To Calendar Days Using SQL

In the Using SQL To Find Work Days In Date Range post I mentioned that I original thought it would be an easy conversion from calendar days to work days as using simple mathematical formula in SQL, you can get calendar days or a calendar date from working days. It was my thought that the opposite of this formula would work; however, that is another post you can read at your leisure. In this post, I figured I would share my simply formula. It is so straight forward that hopefully it will save you from wasting time going for a more complex approach.

Here is the code listing:


DECLARE @workDays int, @calDays int
SET @workDays = 3

-- using integer division to convert work weeks to calendar weeks
-- AND modulus division to get partial week's days
SELECT @calDays = @workDays / 5 * 7 + @workDays % 5
-- just double check that end result is not on a weekend
WHILE DatePart(dw, DateAdd(dd, @calDays, getdate())) IN (7, 1)
SET @calDays = @calDays + 1
-- select away you have your calendar days and date if you would like
SELECT DateAdd(dd, @calDays, getdate())
As I said, pretty straight forward. Since SQL, at least in Microsoft SQL Server, will treat division of int datatypes as integer division, you will get the whole number amount of weeks represented by work days (i.e. 8/5 = 1). For other platforms, just use FLOOR function or its equivalent in your code (e.g. Floor(@workDays / 5) or the integer division operator to achieve the same results. Subsequently, modulus is taken of working days to get the partial week days involved which should be 0 - 4 that we can add to calendar days we calculated from multiplying the number of weeks by 7.

You can just run in query window as above, but to complement our function/procedure to get working days from a set of calendar dates you can put this in a stored procedure or user defined function the returns the calendar days themselves or the resulting date.

Keep evolving development!

Using SQL To Find Work Days In Date Range

I had a question come up today for Microsoft SQL Server 2005 on how to calculate the number of working/business days between two dates with a requirement that the answer must function in countries where weekend can be variable two days or even one. My first thought was that is simple. Famous last words!

Eight hours of research later, this article covers the calculation of working days in the date range which appears to work quite nicely. Here is a code listing:

CREATE FUNCTION [dbo].[fn_GetWorkDaysInRange]
(
@startDate datetime -- first datetime in range
, @endDate datetime -- last datetime in range (can be in past)
, @includeStartDate bit -- flag to include start date as a work day
, @firstWkndDay int -- first day of weekend (e.g. Day(datetime))
, @lastWkndDay int -- last day of weekend (e.g. Day(datetime))
)
RETURNS int
AS
BEGIN
-- variables used in processing
DECLARE @workDays int, @sign int
DECLARE @table table (calendarDate datetime, isWorkDay bit)

-- parse input and calculate direction of date range
SET @firstWkndDay = Coalesce(@firstWkndDay, 0)
SET @lastWkndDay = Coalesce(@lastWkndDay, @firstWkndDay)
SET @startDate = Coalesce(@startDate, getdate())
SET @endDate = Coalesce(@endDate, @startDate)
SET @sign = Sign(DateDiff(dd, @startdate, @enddate))

-- insert our starting date
INSERT INTO @table
VALUES (@startDate, Case @includeStartDate When 0 Then 0 Else NULL End)

-- add dates into table from start to end date
IF @sign > 0 -- use sign of date difference to determine direction
BEGIN
WHILE (SELECT MAX(calendarDate) FROM @table) < @enddate
INSERT INTO @table
SELECT DateAdd(dd, 1, MAX(calendarDate)), NULL FROM @table
END
ELSE
BEGIN
WHILE (SELECT MIN(calendarDate) FROM @table) > @enddate
INSERT INTO @table SELECT DateAdd(dd, -1, MIN(calendarDate)), NULL FROM @table
END

-- update table to tag work days
UPDATE @table
SET isWorkDay = CASE
WHEN DatePart(dw, calendarDate) IN (@firstWkndDay, @lastWkndDay)
THEN 0
ELSE 1
END
WHERE isWorkDay IS NULL

-- select the working days from our table into return variable
SELECT @workDays = COUNT(calendarDate) FROM @table WHERE isWorkDay = 1

RETURN (@workDays * @sign)
END
As stated above, this function will remove weekend days between the starting and ending range and thus return number of working days. Once we have that result, we can use a query to retrieve our holidays (or alternatively modify the above to take in a country code and lookup the weekend days and holidays returning the net business days).

Here is an example of this function's usage

-- create holidays table for testing data
CREATE TABLE [dbo].[Holidays](
[day] [datetime],
[holiday] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[country] [nvarchar](3) COLLATE SQL_Latin1_General_CP1_CI_AS
) ON [PRIMARY]

INSERT INTO Holidays
SELECT '12/25/2008', 'Christmas', 'USA'
UNION SELECT '12/26/2008', 'Day After Christmas', 'USA'
-- end creation of table for testing data

DECLARE @workDays int, @holidays int
DECLARE @startDate datetime, @endDate datetime

SET @startDate = '12/15/2008'
SET @endDate = '12/29/2008'

SELECT @workDays = dbo.fn_GetWorkDaysInRange(@startDate, @endDate, 0, 7, 1)

SELECT @holiDays = COUNT([day])
FROM [Holidays]
WHERE [country] = 'USA' AND DatePart(dw, [day]) NOT IN (7, 1)
AND [day] BETWEEN @startDate AND @endDate

PRINT (@workDays)
PRINT (@holidays)
PRINT (@workDays - @holidays)

Results come out 10, 2, and 8 for each of the three print statements, respectively. Exactly what we wanted! It is a joy when it all works.

Hopefully this post will save you as long journey, but leave enough uncharted territory to have a little fun with in customizing to your own environment. I have even played with this myself to replace some logic I was using for determining shop working days, so enjoy. For those of you not on Microsoft SQL Server 2005, please keep in mind that other versions of Microsoft SQL Server that support user defined functions should work. Consequently, for other platforms or versions, the structure of this code can probably be manipulated to work in a stored procedure and/or using temporary table instead of a table variable and likewise for other features used not present in your system. The principles should be the same.

Hope this helps and happy coding!

Monday, September 15, 2008

DateSerial In Microsoft SQL Server 2005

Well, in Group By Time: TimeSerial Makes A Return To Microsoft SQL 2005 we explored bringing TimeSerial functionality to SQL Server, but of course we can't stop there as the DateSerial function is pretty useful too.

There are probably a number of different methods to achieve this, but here is what I came up with.

Code listing:


CREATE FUNCTION dbo.DateSerial(@year int, @month int, @day bigint)
RETURNS datetime
AS
BEGIN
DECLARE @date datetime

-- catch invalid year entries and default appropriately
SET @year = CASE
WHEN @year < 1900 Then 1900
When @year > 9999 Then year(getdate())
Else @year End

-- convert date by adding together like yyyymmdd
SET @date = Cast(Cast(@year * 10000 + 101 As varchar) As datetime)
-- Alternative method of parsing year into base date
-- SET @date = Cast('1/1/' + Cast(@year As varchar) As datetime)

-- Add to date the proper months subtracting 1 since we used 1 as start instead of zero.
SET @date = DateAdd(mm, @month - 1, @date)
-- Add to date the proper days subtracting 1 since we used 1 as start instead of zero.
SET @date = DateAdd(dd, @day - 1, @date)

RETURN @date
END
First line is to avoid errors in incorrect starting year value, but can be adjusted according to your own needs. The months and days are added in through simple DateAdd which allows for positive/negative numbers in addition to not being bound by 12 or 31 respectively making this like our TimeSerial function in that it can be used to simply convert a year, month, and day into date or to do some date math on the fly.

Usage:

SELECT dbo.DateSerial(YEAR(GETDATE()), MONTH(GETDATE()), 1 - 35) AS dateSerialized
This will return the date 35 days prior to the first day of the current month in the current year. Moreover, since this solution takes advantage of user defined function only, implementing in SQL 2000 should not be an issue.

So there you have it, DateSerial in SQL Server.

Happy coding!


References:


Sunday, September 07, 2008

Group By Time: TimeSerial Makes A Return To Microsoft SQL 2005

Last time we looked at grouping by time for our customer calls statistics by the half an hour. We ended with this code based on SQL 2005 common table expression syntax.

WITH TableByTime As (
SELECT Cast(DateName(hh, CallDateTime) As Int) As [Sort],
Right('0' + Case When DateName(hh, CallDateTime) = 0 Then '12'
When DateName(hh, CallDateTime) <= 12 Then DateName(hh, CallDateTime)
Else DateName(hh, DateAdd(hh, -12, CallDateTime)) End, 2) + ':' +
Case When DateName(n, CallDateTime) >= 30 Then '30' Else '00' End +
Case When DateName(hh, CallDateTime) < 12 Then 'AM' Else 'PM' End As [Hour],
Customer
FROM CustomerCalls (NoLock)
)
SELECT Hour, Count(*)
FROM TableByTime
GROUP BY Sort, Hour
ORDER BY Sort, Hour
We can be satisfied with this, but why waste a perfectly good opportunity to keep learning. In all seriousness, breakdowns by date/time are statistics I often have to get, so if you are anything like me it would be good to explore making this more reusable and streamlined. Since what we have above is very clean compared to the starting point, what we have left to do is create a function to emulate the TimeSerial function from Microsoft Access that does what our case statements are doing and more.

In summary, TimeSerial, takes in hour in military notation (i.e. 14 for 2PM), minutes, and seconds and translates to appropriate time in format hh:mm:ss with AM/PM indicator. In addition to straight time display, it could do calculations for you based on varying inputs and use of negatives. If you want more information on how TimeSerial functions, see reference for function in Microsoft Access below.

So diving in, we can write a function like this that adds TimeSerial to Microsoft SQL Server using a user defined function.

CREATE FUNCTION dbo.TimeSerial (@hrs int, @min int, @sec bigint)

RETURNS nvarchar(10)

AS

BEGIN

DECLARE @result nvarchar(10), @total bigint, @AMorPM nvarchar(2)

DECLARE @hours int, @minutes int, @seconds int



-- convert everything to seconds handling null params with isnull or coalesce

SET @total = IsNull(@sec,0) + IsNull(@min,0) * 60 + IsNull(@hrs,0) * 3600

SET @total = 86400 + @total % 86400 -- handle negative time relative to midnight



-- calculate the hour portion

SET @hours = 0

IF (@total >= 3600)

BEGIN

SET @hours = floor(@total/3600) % 24

SET @total = @total % 3600

END



-- set am/pm based on hours in HH format

SET @AMorPM = 'PM'

IF @hours < 12

BEGIN

SET @AMorPM = 'AM'

END



-- adjust hours to non-military time

IF @hours > 12 OR @hours = 0

BEGIN

SET @hours = abs(@hours - 12)

END



-- calculate the minutes and seconds portion

SET @minutes = 0

IF (@total >= 60)

BEGIN

SET @minutes = floor(@total/60)

SET @total = @total % 60

END



-- set seconds to remainder

SET @seconds = @total



SET @result = Cast(@hours As nvarchar(2)) + ':' + RIGHT('0'+Cast(@minutes As nvarchar(2)), 2)

SET @result = @result + ':' + RIGHT('0'+Cast(@seconds As nvarchar(2)),2) + @AMorPM



RETURN @result

END
As you will see in the code above, which hopefully speaks for itself as to what it is doing, we can do a little more than just format our time so we abstract out the need for extensive case when logic and give ourselves a handy utility function for our SQL toolkit.

Putting it in place with our original query, we can immediately simplify our syntax to this.

WITH TableByTime As (

SELECT Cast(DateName(hh, CallDateTime) As Int) As [Hr],

FLOOR(Cast(DateName(n, CallDateTime) As Int)/30) * 30 As [Mi],

Customer

FROM CustomerCalls (NoLock)

)

SELECT dbo.TimeSerial(Hr, Mi, 0) As [Hour],

Count(*)

FROM TableByTime

GROUP BY Hr, Mi

ORDER BY Hr, Mi
Aside from the TimeSerial function addition to the code, you will notice a better algorithm to get to group time in 30 minute buckets using floor which gets the lowest integer count of 30 in the number of minutes currently in our time. Since Microsoft SQL will typically do integer division on two numbers that are int datatypes this is probably unnecessary, but I like to be very deliberate in code for nothing else than I will remember what I was thinking when I look at it a year later. The premise here is that we will only ever get 0 or 1 from the division and then multiplying by 30 will give us the correct bucket.

Well we are close to being golden, but we still have a little fluff just to extract the hour and minute portions of time as we have to get as string then cast, so we could write functions for those as well; however, with Microsoft SQL Server, you can utilize ODBC canonical functions for { fn HOUR() } and { fn MINUTE() }.



dbo.TimeSerial({ fn HOUR(GETDATE()) }, FLOOR({ fn MINUTE(GETDATE()) } / 30) * 30, 0) AS HourBucket

As you see above, in combination with our user defined function for TimeSerial the ODBC canonical functions for HOUR and MINUTE make it very streamlined to get our hour bucket in one statement. For sorting purposes, it will probably still be a good idea to use the { fn HOUR() } in Hr column and { fn MINUTE() } in the Mi column in previous code using the common table expression we composed, then just use TimeSerial with those column values.

In conclusion, combining the flexibility of user defined functions and some of the tools in our SQL toolkit with Microsoft SQL Server, we can keep the user community happy with snappy time based reports. The common language runtime (CLR), which we didn't discuss here in detail, is another great means of adding user defined functionality. Anyway, happy coding.


References:


Saturday, September 06, 2008

Range Searching Using Java

Welcome! Intrigued by my findings in my last post regarding Range Searching Using Visual Basic .NET, I decided to try to reproduce the FindAll methodology as well as my binary search based on range for grins and again was surprised by my findings. Binary search worked the fastest/consistently within Java code where all three methods were faster which was not too much of a surprise other than the ported FindAll since it and the concept of Predicate match was not native (at least not to my knowledge) at the time of writing this post.

Background summary for those that haven't read the original article in VB, this started as research based on a question of finding the dates within a list that match a selected date range. The intent was to go beyond the simple iteration and match method and find a faster means through binary search only to find the VB FindAll and Predicate implementation eluded to above and slower performance from binary range functions created.

With that said, onward and upward!

The journey in Java begins with replicating our utility class Ranges that handles the simple match method utilized by looping through array as well as our binary search implementation based on a start and end of a range.

The code listing is below:



package com.blogspot.crossedlogic.rangesearching;





public class Ranges {

private static final int NOT_FOUND = -1;



/**

* Binary search through a sorted array to determine first index

* greater or equal to start of range.

*

* @param array

* @param rangePoint

* @return integer index or (-1) when not found

*/

public static <T extends Comparable<T>> int binaryStartOfRangeSearch(T[] array, T rangePoint) {

int low = 0;

int high = array.length - 1;

int mid;

int comparedToHigh = rangePoint.compareTo(array[high]);



if (comparedToHigh == 0) { return high; }



if (comparedToHigh < 0) {

if (rangePoint.compareTo(array[low]) <= 0) { return low; }



while (low != high) {

mid = (int)Math.floor((low + high) / 2);



if (rangePoint.compareTo(array[mid]) <= 0) {

if (rangePoint.compareTo(array[mid - 1]) <= 0) {

high = --mid;

} else {

return mid;

}

} else {

if (rangePoint.compareTo(array[mid + 1]) <= 0) {

return ++mid;

} else {

low = ++mid;

}

}

}

}



return NOT_FOUND;

}



/**

* Binary search through a sorted array to determine last index

* lesser or equal to end of range.

*

* @param array

* @param rangePoint

* @return integer index or (-1) when not found

*/

public static <T extends Comparable<T>> int binaryEndOfRangeSearch(T[] array, T rangePoint) {

int low = 0;

int high = array.length - 1;

int mid;

int comparedToLow = rangePoint.compareTo(array[low]);



if (comparedToLow == 0) { return low; }



if (comparedToLow > 0) {

if (rangePoint.compareTo(array[high]) >= 0) { return high; }



while (low != high) {

mid = (int)Math.ceil((low + high) / 2);



if (rangePoint.compareTo(array[mid]) >= 0) {

if (rangePoint.compareTo(array[mid + 1]) >= 0) {

low = ++mid;

} else {

return mid;

}

} else {

if (rangePoint.compareTo(array[mid - 1]) >= 0) {

return --mid;

} else {

high = --mid;

}

}

}

}



return NOT_FOUND;

}



/**

* Check if an object falls within a specified range.

*

* @param c

* @param rangeStart

* @param rangeEnd

* @return

*/

public static <T extends Comparable<T>> boolean matchRange(T c, T rangeStart, T rangeEnd) {

if (c.compareTo(rangeStart) >= 0 && c.compareTo(rangeEnd) < 1) { return true; }



return false;

}

}

As stated in my last post, the use of generics expanded my research beyond Date types and provided a framework for searching different types as long as their range principals were the same as implemented and they implemented the Comparable interface. For more information on Generics, please see the reference(s) below; however, the matchRange function reads like this for example: taking in any type T denoted by the parameter c which implements the Comparable interface (T extends Comparable) and two other instances of type T representing a start and end of a range, return a true or false flag indicating existence within range.

Comparable interface is used the same here as the compareTo function has the same usage and return values (at least in typical implementations of primitive/standard types): -1 when first object is smaller than the one being compared to; 0, the two equal; 1, first is larger.

As you will see, the usage of these functions are virtually the same from VB implementation:


// simple loop and match method

for (String s : arrayOfObjects) {

if (Ranges.matchRange(s, RANGE_START, RANGE_END)) {

lst.add(s);

}

}



// binary range search method

Arrays.sort(arrayOfObjects);



int idxStart = Ranges.binaryEndOfRangeSearch(

arrayOfObjects, RANGE_START);

int idxEnd = Ranges.binaryEndOfRangeSearch(

arrayOfObjects, RANGE_END);



if (idxStart >= 0 && idxEnd >= 0) {

for (int i = idxStart; i <= idxEnd; i++) {

lst.add(arrayOfObjects[i]);

}

}

What will be a little different is the FindAll implementation as this was done using some non-native interfaces similar to methodology discussed in the reference section from faqs{dot}com. To get the same functionality, an interface for Predicate had to be added. In addition, a Predicates class was created that implemented a FindAll equivalent that requires an implemented Predicate to return a matching List of objects. Then using Java's anonymous class syntax, we implement Predicate's required match function as we would with delegate.

/**

* Predicate interface used to emulate .NET <code>System.Predicate(Of T)</code>.

*/

package com.blogspot.crossedlogic.rangesearching;



public interface Predicate<T> {



/**

* @param object to test against predicate criteria.

* @return boolean flag indicating a match (true) or no match (false).

*/

public boolean isMatch(T obj);

}



/**

* Predicates class containing static implementation of FindAll.

*/

package com.blogspot.crossedlogic.rangesearching;



import java.util.ArrayList;

import java.util.List;





public class Predicates {



/**

* Replicate .NET findAll and <code>Predicate</code>.

*

* @param <T>

* @param array

* @param match implementation of the Predicate interface.

* @return

* @see com.blogspot.crossedlogic.rangesearching.Predicate

*/

public static <T> List<T> findAll(T[] array, Predicate<T> match) {

List<T> lst = new ArrayList<T>();



for (T obj : array) {

if (match.isMatch(obj)) { lst.add(obj); }

}



return lst;

}

}



// Example usage of FindAll for range searching.

lst.addAll(Predicates.findAll(arrayOfObjects,

new Predicate<String>() {



@Override

public boolean isMatch(String s) {

return Ranges.matchRange(s, RANGE_START, RANGE_END);

}

}));

Not one line, but it is just one code statement as the solution is to utilize the range add capabilities of the List interface couples with the List response of the findAll function using a Predicate which again we implement as a simple call to our static matchRange function coupled with constants for start and end of range.

Well, I have kept you waiting long enough, here is how the results stack up:

--------------------------------------------------

Start processing by going through each item

End processing: 37ms - 9311

--------------------------------------------------

Start processing by using FindAll.

End processing: 36ms - 9311

--------------------------------------------------

Start processing by binary range search.

End processing: 14ms - 9311

--------------------------------------------------

Start processing by going through each item

End processing: 1ms - 9311

--------------------------------------------------

Start processing by using FindAll.

End processing: 3ms - 9311

--------------------------------------------------

Start processing by binary range search.

End processing: 2ms - 9311

--------------------------------------------------

Start processing by going through each item

End processing: 1ms - 9312

--------------------------------------------------

Start processing by using FindAll.

End processing: 3ms - 9312

--------------------------------------------------

Start processing by binary range search.

End processing: 2ms - 9312

--------------------------------------------------

Start processing by going through each item

End processing: 1ms - 9313

--------------------------------------------------

Start processing by using FindAll.

End processing: 3ms - 9313

--------------------------------------------------

Start processing by binary range search.

End processing: 3ms - 9313

--------------------------------------------------

Start processing by going through each item

End processing: 2ms - 9314

--------------------------------------------------

Start processing by using FindAll.

End processing: 2ms - 9314

--------------------------------------------------

Start processing by binary range search.

End processing: 3ms - 9314

--------------------------------------------------

Aside from the first iteration where the static objects are created for the first time, you will see 1-3 millisecond (ms) response time from each of the three methodologies giving us two other tools in the toolkit for searching arrays/lists when dealing with large number of elements and ranges.

I am glad I took this journey on the VB.NET side as it inspired me to want to explore this in Java and I am very pleased with the results. Hopefully, it will be useful for you in your travels as well and thanks for reading. We covered a lot of different ground in Generics, Predicates, sorting Arrays, binary search algorithm, and so on so please use links for more information on things I didn't go into details on including previous post as I may have glossed over something I discussed further there.



Thanks and remember to keep learning and/or keep smiling; you never want to end up a sad fool!


Source code: Download com.blogspot.crossedlogic.rangesearching.zip

References:


Friday, September 05, 2008

Range Searching Using Visual Basic .NET

The other day, I was asked a question about comparing dates in an array to a date range. The basic answer is to iterate through the array and compare each date to the beginning and ending of the range; however, with some curiosity exacerbated by another question planting the seed of binary searches in my head, I was determined to write a binary search algorithm for determining a value in a range. Along the way, I was enlightened/awoken to the concept of predicates in VB.NET and the System.Array.FindAll method. So this post will try to provide an overview of how to perform some range searching in .NET as I took research a step further and abstracted functionality to be more generic than with Date types.

With that said, here is the basic comparison to any range of objects that implement the IComparable interface.

Function MatchRange(Of T As IComparable)(ByVal c As T, ByVal rangeStart As T, ByVal rangeEnd As T) As Boolean
If c.CompareTo(rangeStart) >= 0 And c.CompareTo(rangeEnd) < 1 Then
Return True
End If

Return False
End Function
As long as you are comfortable with generics, the above is pretty straight forward (for more details on generics, see MSDN link in references) as we can now pass in any class that implements IComparable and take advantage of the public interface method CompareTo(Object) which will return one the following values: -1 when first object is smaller than the one being compared to; 0, the two equal; 1, first is larger. An easy demonstration of how this is used is MatchRange(5, 1, 10) will return true. So for back to original scenario of needing to determine all the values of a list that fall within a range, we get code like the following:

Dim lst As New List(Of String)()

For Each s As String In arrayOfObjects
If Ranges.MatchRange(s, RANGE_START, RANGE_END) Then
lst.Add(s)
End If
Next
Where Ranges is the name of utility class I placed my Public Shared Function MatchRange and RANGE_START and RANGE_END are string constants representing the beginning and ending value of the range we are searching within.

Speaking of the Ranges class, here is its full listing to dive right into the next step which was to implement a better range search using binary search algorithm:

Option Explicit On
Option Strict On

Namespace CrossedLogic.BlogSport.RangeSearching
Public Class Ranges

''' <summary>
''' Binary search through a sorted array to determine first index
''' greater or equal to start of range.
''' </summary>
''' <typeparam name="T"></typeparam>
''' <param name="array"></param>
''' <param name="rangePoint"></param>
''' <returns>integer index or (-1) when not found</returns>
''' <remarks></remarks>
Public Shared Function BinaryStartOfRangeSearch(Of T As IComparable)(ByVal array As T(), ByVal rangePoint As T) As Integer
Dim low As Integer = 0
Dim high As Integer = array.Length - 1
Dim mid As Integer
Dim startComparedToHigh As Integer = rangePoint.CompareTo(array(high))

If startComparedToHigh = 0 Then
Return high
End If

If startComparedToHigh < 0 Then
If rangePoint.CompareTo(array(low)) <= 0 Then
Return low
End If

While low <> high
mid = CInt(Math.Floor((low + high) / 2))

If rangePoint.CompareTo(array(mid)) <= 0 Then
If rangePoint.CompareTo(array(mid - 1)) <= 0 Then
high = mid - 1
Else
Return mid
End If
Else
If rangePoint.CompareTo(array(mid + 1)) <= 0 Then
Return mid + 1
Else
low = mid + 1
End If
End If
End While
End If

Return -1
End Function

''' <summary>
''' Binary search through a sorted array to determine last index
''' lesser or equal to end of range.
''' </summary>
''' <typeparam name="T"></typeparam>
''' <param name="array"></param>
''' <param name="rangePoint"></param>
''' <returns>integer index or (-1) when not found</returns>
''' <remarks></remarks>
Public Shared Function BinaryEndOfRangeSearch(Of T As IComparable)(ByVal array As T(), ByVal rangePoint As T) As Integer
Dim low As Integer = 0
Dim high As Integer = array.Length - 1
Dim mid As Integer
Dim endComparedToLow As Integer = rangePoint.CompareTo(array(low))

If endComparedToLow = 0 Then
Return low
End If

If endComparedToLow > 0 Then
If rangePoint.CompareTo(array(high)) >= 0 Then
Return high
End If

While low <> high
mid = CInt(Math.Ceiling((low + high) / 2))

If rangePoint.CompareTo(array(mid)) >= 0 Then
If rangePoint.CompareTo(array(mid + 1)) >= 0 Then
low = mid + 1
Else
Return mid
End If
Else
If rangePoint.CompareTo(array(mid - 1)) >= 0 Then
Return mid - 1
Else
high = mid - 1
End If
End If
End While
End If

Return -1
End Function

''' <summary>
''' Check if an object falls within a specified range.
''' </summary>
''' <typeparam name="T"></typeparam>
''' <param name="c"></param>
''' <param name="rangeStart"></param>
''' <param name="rangeEnd"></param>
''' <returns></returns>
''' <remarks></remarks>
Public Shared Function MatchRange(Of T As IComparable)(ByVal c As T, ByVal rangeStart As T, ByVal rangeEnd As T) As Boolean
If c.CompareTo(rangeStart) >= 0 And c.CompareTo(rangeEnd) < 1 Then
Return True
End If

Return False
End Function
End Class
End Namespace
And its usage:

Dim lst As New List(Of String)()

System.Array.Sort(arrayOfObjects)

Dim idxStart As Integer = Ranges.BinaryStartOfRangeSearch(arrayOfObjects, RANGE_START)
Dim idxEnd As Integer = Ranges.BinaryEndOfRangeSearch(arrayOfObjects, RANGE_END)

If idxStart >= 0 Then
If idxEnd < idxStart Then idxEnd = arrayOfObjects.Length - 1
For i As Integer = idxStart To idxEnd
lst.Add(arrayOfObjects(i))
Next
End If
First difference, you should see is that like regular binary search, the array must be sorted. Once we have a sorted array, we can use our binary search functions specific to beginning and ending of the range to get the respective index. Using the indices, we can add just those objects to the list we created to store results. Well a bit more code, but well worth it for the end game of faster searches of large object arrays like the example 13,149 string array used in the research for this post. Consequently, though, to my surprise, this code did not run faster mostly because of the requirement for sorting. With the original date question, even the initial solution above was sorted using System.Array.Sort; however, since that solution will yield the same list just unordered, for the sake of performance measurement I left array unsorted and then sorted as part of the binary search method.

Here are some of the results to illustrate difference:

---------------------------------------------------
Start processing by going through each item
End processing: 110ms - 9311
---------------------------------------------------
Start processing by using FindAll.
End processing: 13ms - 9311
---------------------------------------------------
Start processing by binary range search.
End processing: 59ms - 9311
---------------------------------------------------
Start processing by going through each item
End processing: 6ms - 9311
---------------------------------------------------
Start processing by using FindAll.
End processing: 14ms - 9311
---------------------------------------------------
Start processing by binary range search.
End processing: 47ms - 9311
---------------------------------------------------
Start processing by going through each item
End processing: 10ms - 9312
---------------------------------------------------
Start processing by using FindAll.
End processing: 8ms - 9312
---------------------------------------------------
Start processing by binary range search.
End processing: 50ms - 9312
---------------------------------------------------
Start processing by going through each item
End processing: 6ms - 9313
---------------------------------------------------
Start processing by using FindAll.
End processing: 8ms - 9313
---------------------------------------------------
Start processing by binary range search.
End processing: 51ms - 9313
---------------------------------------------------
Start processing by going through each item
End processing: 6ms - 9314
---------------------------------------------------
Start processing by using FindAll.
End processing: 8ms - 9314
---------------------------------------------------
Start processing by binary range search.
End processing: 55ms - 9314
---------------------------------------------------

As you will see above, the find by simple iteration of the array usually came in lower than the binary search method, although they both fluctuate, especially the simple loop method as it at times takes 100+ milliseconds (ms) while at others only 10 ms or below. Subsequently, you will notice that the FindAll although a few ms slower at times is very consistent and I found gives similar results regardless of the type of object in array whereas the other two vary as previously stated. So what is this FindAll method and how do I use it?

Well the FindAll method is a public shared function within System.Array. It's usage is not only the most consistent, but as you will see the most streamlined aside from delegate function you have to create. Here is a listing assuming availability of previously posted code:

lst.AddRange(System.Array.FindAll(arrayOfObjects, New Predicate(Of String)(AddressOf MatchRange)))

'Delegate implementation for Predicate needed in System.Array.FindAll
Function MatchRange(ByVal n As String) As Boolean
Return Ranges.MatchRange(n, RANGE_START, RANGE_END)
End Function
Yes! One line is all, you are not reading that incorrectly. Taking advantage of the List's AddRange capabilities that will add all elements of an array to list and the FindAll implementation that will return an array of elements from the array argument that match our range through use of the Predicate(Of T) delegate implemented here by a local MatchRange function that takes type T which is String in this case and does logic to respond true or false for a match. Since I have already gone through the work of coding a functioning match range routine, our Predicate simply becomes a call to our shared MatchRange function.

The one caveat to note with this solution is that since it only takes in object to be matched, the criteria and variables needed have to exist globally or just within this function which you can observe by my having to use RANGE_START and RANGE_END constants in my match. Moreover, if you are already using List(Of T) versus and array, you can simply utilize FindAll function of List to get a subset List(Of T). See references for more on the List.FindAll method.

Well hopefully you have enjoyed this journey.

In summary, you can most definitely use binary search method for primitives and possibly some types, but for Dates and Strings the System.Array.FindAll/List.FindAll are good utilities for large collections and the simple but effective for each loop and comparison.

Happy coding!


Source code: Download CrossedLogic.BlogSpot.RangeSearching.zip

References:

Thursday, September 04, 2008

Group By Time: The Microsoft SQL Server 2005 Way

Statistics, we all seem to need them, but it is not always easy getting them exactly the way you would like. This article is going to explore the simple task of grouping data by hour spiced up a bit by displaying results in standard time format with AM/PM instead of military time and groupings every 30 minutes.
Say you have the following data.

Table: CustomerCalls


Desired Report


Starting with the basics, we do the following:


SELECT DateName(hh, CallDateTime) As [Hour], Count(*) As [Calls]

FROM CustomerCalls

GROUP BY DateName(hh, CallDateTime)

Using the DateName function, we grab the hour portion of datetime column in our table and group rows and count. All is well and we get results like this by military time hour of the day:



Great! But now we need to put in a little bit of fancy formatting to change 0100 to 01:00AM for the non-combat oriented among us.


SELECT Right('0' + Case When DateName(hh, CallDateTime) = 0 Then '12'
When DateName(hh, CallDateTime) <= 12 Then DateName(hh, CallDateTime)
Else DateName(hh, DateAdd(hh, -12, CallDateTime)) End, 2) + ':' +
Case When DateName(n, CallDateTime) >= 30 Then '30' Else '00' End +
Case When DateName(hh, CallDateTime) < 12 Then 'AM' Else 'PM' End As [Hour],
Count(*) As [Calls]
FROM CustomerCalls(NoLock)
GROUP BY Right('0' + Case When DateName(hh, CallDateTime) = 0 Then '12'
When DateName(hh, CallDateTime) <= 12 Then DateName(hh, CallDateTime)
Else DateName(hh, DateAdd(hh, -12, CallDateTime)) End, 2) + ':' +
Case When DateName(n, CallDateTime) >= 30 Then '30' Else '00' End +
Case When DateName(hh, CallDateTime) < 12 Then 'AM' Else 'PM' End


Life is all well and good again. Well maybe. That code is crazy as we must have the entire case logic in the group by statement to not violate SQL syntax rules when grouping, so what to do next. Well good news for us there is the WITH statement in SQL Server 2005 for creating Common Table Expressions (CTE). So diving into CTE, the code all together including some added functionality to sort in correct time of day order is below:



WITH TableByTime As (
SELECT Cast(DateName(hh, CallDateTime) As Int) As [Sort],
Right('0' + Case When DateName(hh, CallDateTime) = 0 Then '12'
When DateName(hh, CallDateTime) <= 12 Then DateName(hh, CallDateTime)
Else DateName(hh, DateAdd(hh, -12, CallDateTime)) End, 2) + ':' +
Case When DateName(n, CallDateTime) >= 30 Then '30' Else '00' End +
Case When DateName(hh, CallDateTime) < 12 Then 'AM' Else 'PM' End As [Hour],
Customer
FROM CustomerCalls (NoLock)
)
SELECT Hour, Count(*)
FROM TableByTime
GROUP BY Sort, Hour
ORDER BY Sort, Hour


Wonderful! And code is not too unmanageable in my humble opinion. Probably can break it down some more, but as you see the benefits of CTE has already improved readability that I stopped here for implementation.

Let’s take a deeper look at the code since we glossed over this earlier. First column takes the normal military hour ensuring it is functioning as an integer for use as our sorting mechanism later since time in military notation goes from 0 – 23 which is nicely ordered for us already. Second column, [Hour], is our case when breaking down datetime column to standard time of day: 12:00AM – 11:30PM as we also group the times within the first and last 30 minutes of each hour. The third column (could be columns) is data we want from the table – used Customer which we didn’t really need for our aggregate which is only a count; however, if we were doing sums or other statistical analysis on numerical data in the CustomerCalls table we could add those columns here.

Now that we have our CTE, you can see the subsequent T-SQL statement where we actually implement the GROUP BY statement is drastically simplified as we have [Hour] column to call directly instead of the entire case logic. We ORDER BY our sort column and then by our hour and presto we have our results.

In conclusion, using Common Table Expressions in SQL Server 2005 adds a great deal of power and flexibility beyond my simple case here while streamlining SQL statements to be easier to maintain/read, so read up and enjoy the cool new features of Microsoft SQL Server.


References: