SQL SERVER – Introduction to Adaptive ETL Tool – How adaptive is your ETL?

I am often reminded of the fact that BI/data warehousing infrastructure is very brittle and not very adaptive to change. There are lots of basic use cases where data needs to be frequently loaded into SQL Server or another database. What I have found is that as long as the sources and targets stay the same, SSIS, or any other ETL tool for that matter, does a pretty good job handling these types of scenarios.

But what happens when you are faced with more challenging scenarios, where the data formats and possibly the data types of the source data change from customer to customer? Let’s examine a real-life situation where a health management company receives claims data from their customers in various source formats. Even though this company supplied all their customers with the same claims forms, they ended up building one-off ETL applications to process the claims for each customer.

Why, you ask? Well, it turned out that the claims data from the various regional hospitals they needed to process had slightly different data formats, e.g. “integer” versus “string” data field definitions. Moreover, the data itself was represented with slight nuances, e.g. “0001124” or “1124” or “0000001124” to represent a particular account number, which forced them, as I alluded to above, to build a new ETL process for each customer in order to overcome the inconsistencies in the various claims forms. As a result, they experienced a lot of redundancy in these ETL processes and quickly recognized that their system would become more difficult to maintain over time.

So imagine for a moment that you could use an ETL tool that helps you abstract the data formats so that your ETL transformation process becomes more reusable. Imagine that one claims form represents a data item as a string – acc_no(varchar) – while a second claims form represents the same data item as an integer – account_no(integer). This would break your traditional ETL process as the data mappings are hard-wired.  But in a world of abstracted definitions, all you need to do is create parallel data mappings to a common data representation used within your ETL application; that is, map both external data fields to a common attribute whose name and type remain unchanged within the application.

acc_no(varchar) is mapped to account_number(integer)

[Image: expressor Studio first claim form schema mapping]

account_no(integer) is also mapped to account_number(integer)

[Image: expressor Studio second claim form schema mapping]

All the data processing logic that follows manipulates the data as an integer value named account_number.
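
If it helps to see the parallel mappings side by side, here is a minimal sketch of that pattern in Python. The dictionary names, the form labels, and the to_common function are assumptions made up for this illustration; they are not expressor Studio’s API.

# Minimal sketch (not expressor's API): map differently named and typed
# source fields from two claims forms onto one common attribute,
# account_number (integer), so downstream logic never changes.

SCHEMA_MAPPINGS = {
    "form_a": {"acc_no": "account_number"},      # source field declared as varchar
    "form_b": {"account_no": "account_number"},  # source field declared as integer
}

COMMON_TYPES = {"account_number": int}  # the common, unchanging representation

def to_common(record: dict, form: str) -> dict:
    """Convert one raw claims record into the common representation."""
    common = {}
    for src_field, common_field in SCHEMA_MAPPINGS[form].items():
        # "0001124", "1124", and 1124 all become the integer 1124.
        common[common_field] = COMMON_TYPES[common_field](record[src_field])
    return common

print(to_common({"acc_no": "0001124"}, "form_a"))   # {'account_number': 1124}
print(to_common({"account_no": 1124}, "form_b"))    # {'account_number': 1124}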

Well, these are the kinds of problems that the expressor data integration solution automates for you. I’ve been following them since last year and encourage you to check them out by downloading their free expressor Studio ETL software.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Automated Type Conversion using Expressor Studio

Recently I had an interesting situation during a consulting project. Let me share with you how I solved the problem using Expressor Studio.

Consider a situation in which you need to read a field, such as customer_identifier, from a text file and pass that field into a database table. In the source file’s metadata structure, customer_identifier is described as a string; however, in the target database table, customer_identifier is described as an integer. Legitimately, all the source values for customer_identifier are valid numbers, such as “109380”.

To implement this in an ETL application, you probably would have hard-coded a type conversion function call, such as:

output.customer_identifier=stringToInteger(input.customer_identifier)

That wasn’t so bad, was it? For this instance, programming this hard-coded type conversion function call was relatively easy.

However, hard-coding, whether type conversion code or other business rule code, almost always means that the application containing hard-coded fields, function calls, and values is: a) specific to an instance of use; b) difficult to adapt to new situations; and c) lacking in reusable sub-parts. Therefore, in the long run, applications with hard-coded type conversion function calls don’t scale well. In addition, they increase the overall effort and difficulty of writing and maintaining ETL applications.

To get around the trappings of hard-coding type conversion function calls, developers need access to smarter typing systems. The Expressor Studio product offers exactly this feature, providing developers with a type conversion automation engine based on type abstraction.

The theory behind the engine is quite simple. A user specifies abstract data fields in the engine, and then writes applications against the abstractions (whereas in most ETL software, developers develop applications against the physical model). When a Studio-built application is run, Studio’s engine automatically converts the source type to the abstracted data field’s type and converts the abstracted data field’s type to the target type. The engine can do this because it has a couple of built-in rules for type conversions.

So, using the example above, a developer using Expressor Studio could specify customer_identifier as an abstract data field with a type of integer. Upon reading the string value from the text file, Studio’s type conversion engine automatically converts the source field from the type specified in the source’s metadata structure to the abstract field’s type. When writing the data value to the target database, the engine has no work to do because the abstract data type and the target data type are the same. Had they been different, the engine would have provided the conversion automatically.
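
As a rough mental model of that engine, here is a short Python sketch. It is not Studio’s implementation; the metadata dictionaries and the load function are hypothetical names invented for this illustration.

# Hypothetical sketch of type conversion driven by abstract field types.
# The dictionaries stand in for metadata a tool like Studio would manage.

SOURCE_TYPES   = {"customer_identifier": str}  # text file metadata
ABSTRACT_TYPES = {"customer_identifier": int}  # abstract data field
TARGET_TYPES   = {"customer_identifier": int}  # database column

def convert(value, from_type, to_type):
    """Convert only when the declared types differ; otherwise pass through."""
    return value if from_type is to_type else to_type(value)

def load(record: dict) -> dict:
    out = {}
    for field, value in record.items():
        # Source -> abstract: "109380" (str) becomes 109380 (int).
        abstract_value = convert(value, SOURCE_TYPES[field], ABSTRACT_TYPES[field])
        # Abstract -> target: the declared types match, so nothing is converted.
        out[field] = convert(abstract_value, ABSTRACT_TYPES[field], TARGET_TYPES[field])
    return out

print(load({"customer_identifier": "109380"}))  # {'customer_identifier': 109380}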

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQLAuthority News – Interesting Whitepaper – We Loaded 1TB in 30 Minutes with SSIS, and So Can You

We Loaded 1TB in 30 Minutes with SSIS, and So Can You
SQL Server Technical Article
Writers: Len Wyatt, Tim Shea, David Powell
Published: March 2009

In February 2008, Microsoft announced a record-breaking data load using Microsoft SQL Server Integration Services (SSIS): 1 TB of data in less than 30 minutes. That data load, using SQL Server Integration Services, was 30% faster than the previous best time using a commercial ETL tool. This paper outlines what it took: the software, hardware, and configuration used. We will describe what we did to achieve that result, and offer suggestions for how to relate these techniques to typical scenarios. Even for customers who don’t have needs quite like this benchmark, such efforts can teach a lot about getting optimal performance.

Read the white paper here.

Abstract courtesy Microsoft

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQLAuthority News – Best Practices for Integration Services Configurations

Best Practices for Integration Services Configurations
by Jamie Thomson

This article explains what SQL Server Integration Services configurations are used for, why you should use Integration Services configurations, and what options you have for leveraging configurations. It will also make some simple recommendations that are based on my experiences of building Integration Services packages in a real-world environment. An understanding of the terms “package”, “Business Intelligence Development Studio”, and “dtexec.exe” in the context of Integration Services is assumed.

There are five basic types of Integration Services configurations (a brief sketch of the first type follows the list):

  • XML Configuration File
  • Environment Variable Configuration
  • Parent Package Configuration
  • Registry Configuration
  • SQL Server Configuration
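
For readers who have not seen one, here is a hedged sketch of what an XML configuration file (.dtsConfig) typically contains. The connection manager name (SourceDB), server, and database below are placeholder values, and a file generated by Business Intelligence Development Studio would also carry a DTSConfigurationHeading element with package details.

<?xml version="1.0"?>
<DTSConfiguration>
  <!-- Placeholder values: overrides the ConnectionString property of a
       connection manager named SourceDB when the package loads. -->
  <Configuration ConfiguredType="Property"
                 Path="\Package.Connections[SourceDB].Properties[ConnectionString]"
                 ValueType="String">
    <ConfiguredValue>Data Source=MyServer;Initial Catalog=MyDatabase;Integrated Security=SSPI;</ConfiguredValue>
  </Configuration>
</DTSConfiguration>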

Read Best Practices for Integration Services Configurations

Abstract courtesy: Microsoft

Reference: Pinal Dave (http://blog.sqlauthority.com)