Normalizing a table
- by Alex
I have a legacy table, which I can't change.
The values in it can be modified from legacy application (application also can't be changed).
Due to a lot of access to the table from new application (new requirement), I'd like to create a temporary table, which would hopefully speed up the queries.
The actual requirement, is to calculate number of business days from X to Y. For example, give me all business days from Jan 1'st 2001 until Dec 24'th 2004. The table is used to mark which days are off, as different companies may have different days off - it isn't just Saturday + Sunday)
The temporary table would be created from a .NET program, each time user enters the screen for this query (user may run query multiple times, with different values, table is created once), so I'd like it to be as fast as possible. Approach below runs in under a second, but I only tested it with a small dataset, and still it takes probably close to half a second, which isn't great for UI - even though it's just the overhead for first query.
The legacy table looks like this:
CREATE TABLE [business_days](
[country_code] [char](3) ,
[state_code] [varchar](4) ,
[calendar_year] [int] ,
[calendar_month] [varchar](31) ,
[calendar_month2] [varchar](31) ,
[calendar_month3] [varchar](31) ,
[calendar_month4] [varchar](31) ,
[calendar_month5] [varchar](31) ,
[calendar_month6] [varchar](31) ,
[calendar_month7] [varchar](31) ,
[calendar_month8] [varchar](31) ,
[calendar_month9] [varchar](31) ,
[calendar_month10] [varchar](31) ,
[calendar_month11] [varchar](31) ,
[calendar_month12] [varchar](31) ,
misc.
)
Each month has 31 characters, and any day off (Saturday + Sunday + holiday) is marked with X. Each half day is marked with an 'H'. For example, if a month starts on a Thursday, than it will look like (Thursday+Friday workdays, Saturday+Sunday marked with X):
' XX XX ..'
I'd like the new table to look like so:
create table #Temp (country varchar(3), state varchar(4), date datetime, hours int)
And I'd like to only have rows for days which are off (marked with X or H from previous query)
What I ended up doing, so far is this:
Create a temporary-intermediate table, that looks like this:
create table #Temp_2 (country_code varchar(3), state_code varchar(4), calendar_year int, calendar_month varchar(31), month_code int)
To populate it, I have a union which basically unions calendar_month, calendar_month2, calendar_month3, etc.
Than I have a loop which loops through all the rows in #Temp_2, after each row is processed, it is removed from #Temp_2.
To process the row there is a loop from 1 to 31, and substring(calendar_month, counter, 1) is checked for either X or H, in which case there is an insert into #Temp table.
[edit added code]
Declare @country_code char(3)
Declare @state_code varchar(4)
Declare @calendar_year int
Declare @calendar_month varchar(31)
Declare @month_code int
Declare @calendar_date datetime
Declare @day_code int
WHILE EXISTS(SELECT * From #Temp_2) -- where processed = 0)
BEGIN
Select Top 1 @country_code = t2.country_code, @state_code = t2.state_code, @calendar_year = t2.calendar_year, @calendar_month = t2.calendar_month, @month_code = t2.month_code From #Temp_2 t2 -- where processed = 0
set @day_code = 1
while @day_code <= 31
begin
if substring(@calendar_month, @day_code, 1) = 'X'
begin
set @calendar_date = convert(datetime, (cast(@month_code as varchar) + '/' + cast(@day_code as varchar) + '/' + cast(@calendar_year as varchar)))
insert into #Temp (country, state, date, hours) values (@country_code, @state_code, @calendar_date, 8)
end
if substring(@calendar_month, @day_code, 1) = 'H'
begin
set @calendar_date = convert(datetime, (cast(@month_code as varchar) + '/' + cast(@day_code as varchar) + '/' + cast(@calendar_year as varchar)))
insert into #Temp (country, state, date, hours) values (@country_code, @state_code, @calendar_date, 4)
end
set @day_code = @day_code + 1
end
delete from #Temp_2 where @country_code = country_code AND @state_code = state_code AND @calendar_year = calendar_year AND @calendar_month = calendar_month AND @month_code = month_code
--update #Temp_2 set processed = 1 where @country_code = country_code AND @state_code = state_code AND @calendar_year = calendar_year AND @calendar_month = calendar_month AND @month_code = month_code
END
I am not an expert in SQL, so I'd like to get some input on my approach, and maybe even a much better approach suggestion.
After having the temp table, I'm planning to do (dates would be coming from a table):
select cast(convert(datetime, ('01/31/2012'), 101) -convert(datetime, ('01/17/2012'), 101) as int) - ((select sum(hours) from #Temp where date between convert(datetime, ('01/17/2012'), 101) and convert(datetime, ('01/31/2012'), 101)) / 8)
Besides the solution of normalizing the table, the other solution I implemented for now, is a function which does all this logic of getting the business days by scanning the current table. It runs pretty fast, but I'm hesitant to call a function, if I can instead add a simpler query to get result.
(I'm currently trying this on MSSQL, but I would need to do same for Sybase ASE and Oracle)