SQL SERVER – Introduction to Adaptive ETL Tool – How adaptive is your ETL?

I am often reminded of the fact that BI/data warehousing infrastructure is brittle and not very adaptive to change. There are lots of basic use cases where data needs to be loaded frequently into SQL Server or another database. What I have found is that as long as the sources and targets stay the same, SSIS, or any other ETL tool for that matter, does a pretty good job handling these types of scenarios.

But what happens when you are faced with more challenging scenarios, where the data formats and possibly the data types of the source data change from customer to customer? Let’s examine a real-life situation where a health management company receives claims data from their customers in various source formats. Even though this company supplied all their customers with the same claims forms, they ended up building one-off ETL applications to process the claims for each customer.

Why, you ask? Well, it turned out that the claims data from the various regional hospitals they needed to process had slightly different data formats, e.g. “integer” versus “string” data field definitions. Moreover, the data itself was represented with slight nuances, e.g. “0001124” or “1124” or “0000001124” to represent a particular account number, which forced them, as I alluded to above, to build a new ETL process for each customer in order to overcome the inconsistencies in the various claims forms. As a result, they ended up with a lot of redundancy across these ETL processes and quickly recognized that the system would become more difficult to maintain over time.
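To make the account-number nuance concrete, here is a minimal sketch in Python of the kind of normalization logic each of those one-off ETL processes ended up duplicating (the normalize_account_number helper is hypothetical, purely for illustration):

```python
def normalize_account_number(raw) -> int:
    """Collapse variant representations of an account number --
    "0001124", "1124", "0000001124", or the integer 1124 --
    into one canonical integer value."""
    if isinstance(raw, int):
        return raw
    # int() tolerates leading zeros, so all string variants converge.
    return int(str(raw).strip())

# Every source nuance yields the same canonical value.
assert normalize_account_number("0001124") == 1124
assert normalize_account_number("0000001124") == 1124
assert normalize_account_number(1124) == 1124
```

Multiply this by every field and every customer, and the maintenance burden the company ran into becomes obvious.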

So imagine for a moment that you could use an ETL tool that helps you abstract the data formats so that your ETL transformation process becomes more reusable. Imagine that one claims form represents a data item as a string – acc_no(varchar) – while a second claims form represents the same data item as an integer – account_no(integer). This would break a traditional ETL process, because the data mappings are hard-wired. But in a world of abstracted definitions, all you need to do is create parallel data mappings to a common data representation used within your ETL application; that is, map both external data fields to a common attribute whose name and type remain unchanged within the application.

acc_no(varchar) is mapped to account_number(integer)

[Image: expressor Studio first claim form schema mapping]

account_no(integer) is also mapped to account_number(integer)

[Image: expressor Studio second claim form schema mapping]

All the data processing logic that follows manipulates the data as an integer value named account_number.
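Outside of expressor Studio, the same idea can be sketched in a few lines of Python (the SCHEMA_MAPPINGS table and map_record function below are hypothetical, just to illustrate the pattern): each source schema carries its own mapping into the shared account_number attribute, and everything downstream sees only the common name and type.

```python
from typing import Callable, Dict, Tuple

def to_int(value) -> int:
    """Coerce a varchar or integer source field to the common int type."""
    return int(str(value).strip())

# Parallel data mappings: for each source schema, map its external
# field name onto the common attribute (account_number, int).
SCHEMA_MAPPINGS: Dict[str, Dict[str, Tuple[str, Callable]]] = {
    "claims_form_1": {"acc_no": ("account_number", to_int)},      # varchar source
    "claims_form_2": {"account_no": ("account_number", to_int)},  # integer source
}

def map_record(source: str, record: dict) -> dict:
    """Project a raw source record onto the common representation."""
    return {target: convert(record[field])
            for field, (target, convert) in SCHEMA_MAPPINGS[source].items()}

# Both claim forms produce an identical canonical record.
print(map_record("claims_form_1", {"acc_no": "0001124"}))  # {'account_number': 1124}
print(map_record("claims_form_2", {"account_no": 1124}))   # {'account_number': 1124}
```

Supporting a third claims form then means adding one more mapping entry, not building another ETL application.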

Well, these are the kinds of problems that the expressor data integration solution automates for you. I’ve been following them since last year and encourage you to check them out by downloading their free expressor Studio ETL software.

Reference: Pinal Dave (https://blog.sqlauthority.com)

Business Intelligence, ETL, SSIS

2 Comments

  • Hi Pinal,
    Interesting topic. Yes, this is a scenario one faces while working with different types of data sources coming from different vendors. The concept of having an adaptive ETL is very appealing; I will check out the tool you have mentioned.

  • Michael Waclawiczek
    February 21, 2011 10:17 pm

    Hi Pinal,

    I’d like to make your readers aware of the fact that we will be holding a live webinar on the topic of “Semantic Types: making data mapping simpler & easier to maintain” on February 23, 2011.

    Readers can register at .

    Regards,
    Michael Waclawiczek
    VP Marketing, expressor
