Never trust auto generated tables

No Comments

Written by Luis Miguel – Originally published on November 14, 2010.

There is a huge difference between numeric data types. Without going into the details of how much memory they use or any other technicalities that could take a whole chapter in a book, let’s just say that you can, on purpose or accidentally, use a larger than necessary numeric type For example Decimal, to store Integers, and your program will not crash and everything will “just work”.

But will it work efficiently? Could this be considered an optimized design? Not at all.

Consider this scenario

1 – You use an SSIS package to import data from Oracle into SQL Server and you tell the OLE DB Destination in the Data Flow Task to create new tables for you automatically
2 – You know the application logic and you know that certain fields will never hold a value greater than 1000. What you don’t know is that these fields have been stored in Oracle using the Number(15,0) data type and SSIS set them in the destination table as Numeric(15,0)
3 – You create a join between the two imported tables and start manipulating the data. As you run the update queries you notice that for certain logic using certain fields, updating a third table with about 20,000 records takes literally 3 seconds, while performing the same operation but using a different field takes close to 3 minutes. Obviously there is an enormous difference that at first look does not make any sense

As you start looking more deeply into the problem you notice that the query that runs very fast uses a column of the Integer data type, while the slow query uses a field set to Numeric(15,0).

Since you know that this is not the correct data type for the field you alter the table and change it to Integer. Now you run the two queries and they BOTH take 3 seconds.

Consider how this small (data type) lapse made a simple update operation run 60 times slower than it should have normally taken.

Lesson learned: “Trust but verify”. At least when performance matters, which is probably 99.99% of the time.

It sure is convenient to let SSIS create the table automatically for you. Having to manually create a table before bringing in the data could be very labor intensive, especially if the table has dozens of columns. Just go back and look at what SSIS did for you and make sure it is what you need.

Technologies used:
- Oracle 9 on Solaris 9
- SQL Server 2008 Standard on Windows Server 2008 Enterprise 64-bit
- SSIS 2005
- Microsoft Office (Excel) 2010 Professional

Powered by Netfirms