SQL SERVER – SSIS Execution Control Using Precedence Constraints – Notes from the Field #021

[Notes from Pinal]: Lots of people think that SSIS is all about arranging various operations together in one logical flow. That understanding is absolutely correct, but the implementation is not as easy as it seems. Many people start with plenty of enthusiasm, but when it comes to building a control flow that addresses all the necessary elements of package execution, they face issues with the order of the tasks. This is a very critical subject and it requires some expert advice.

Linchpin People are database coaches and wellness experts for a data driven world. In this 21st episode of the Notes from the Field series, database expert Tim Mitchell (partner at Linchpin People) shares a very interesting discussion of how to control the flow of execution within an SSIS package.


In this post, I’m going to review the essentials of precedence constraints in SQL Server Integration Services, and will demonstrate how these can be used to precisely control the flow of the program during the SSIS package execution.

In a control flow for SQL Server Integration Services, the “lines” we commonly use to connect tasks to each other are actually smart controls.  These precedence constraints allow the ETL developer to customize the program flow from one task to another.  The most common use of precedence constraints is to simply connect two tasks with the default constraint, which is the success constraint.  In this usage, the downstream task will be executed if and only if the execution of the upstream task is successful.  As shown in the example below, the task DFT Write data to output table will be executed if the preceding task named SQL Truncate output table is successfully executed.  Similarly, the final task in the chain, named SQL Log row count to audit tbl, will be executed only if the preceding data flow task was successful.
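For context, here is a minimal sketch of the statements the two SQL tasks in this chain (presumably Execute SQL Tasks) might run. The table, column, and variable names are assumptions for illustration only; they are not part of the original package.

-- Hypothetical body of the task "SQL Truncate output table"
TRUNCATE TABLE dbo.OutputTable;

-- Hypothetical body of the task "SQL Log row count to audit tbl";
-- the ? marker is an OLE DB parameter mapped to an SSIS variable,
-- for example one populated by a Row Count transformation in the data flow.
INSERT INTO dbo.LoadAudit (LoadName, LoadDate, RowsLoaded)
VALUES ('Output table load', GETDATE(), ?);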

The green lines connecting these tasks are the precedence constraints.  In the above example, we know that these are success precedence constraints by their color.

Using precedence constraints for success operations is not the only option we have.  Because these are smart controls, we can configure them in a variety of ways for maximum flexibility.  An example of this would be to use a failure precedence constraint to code for the possibility of failure.  In this scenario, we could create one or more tasks that would be executed in the event of a failure within a package.  As shown below, we have a snippet of a package showing a failure path from the data flow by way of a failure precedence constraint.

In this example, we connect a precedence constraint from the data flow task to the task named SQL Remove data from partial load, but instead of using the default setting, we reconfigure this as a failure constraint (shown with the “Failure” label and red color) to redirect the execution flow to that task in the event that the data flow execution fails.  Downstream from there, we have yet another precedence constraint configured as a completion constraint (indicated by the label and the black color).  This constraint indicates that the task SQL Log failure event to event log will be executed whenever the upstream task is finished executing, irrespective of whether the result of that execution was success or failure.
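As a rough sketch, the two tasks on this failure path might run statements along the following lines. The object names here are assumptions for illustration, not part of the original package.

-- Hypothetical body of "SQL Remove data from partial load":
-- discard any rows written by the failed data flow execution.
DELETE FROM dbo.OutputTable;

-- Hypothetical body of "SQL Log failure event to event log"
INSERT INTO dbo.EventLog (EventTime, EventType, EventMessage)
VALUES (GETDATE(), 'Failure', 'Data flow failed; partially loaded rows removed');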

To configure a precedence constraint to a setting other than the default value of success, simply double-click on the precedence constraint and set the appropriate setting, the options for which are shown below.

As shown, we can choose Success (the default setting), Failure, or Completion.

Note that there are a few other options as well.  Above the selection for the type of constraint, you’ll see that we can also choose the evaluation operation.  The default setting is Constraint, which means that the precedence constraint uses only the success/failure/completion results as described above.  However, we also have some other options.  We can use an SSIS expression along with, or perhaps instead of, the execution status evaluation to determine how to control the execution flow.

Additionally, we have the ability to control the behavior when multiple constraints are used.  By default, if multiple constraints are connected to a single downstream task, then all of those constraints must evaluate true for said downstream task to execute.  However, we can override that default behavior by setting this value for multiple constraints to Logical OR, indicating that the downstream task will be executed if any of the connected constraints evaluates true.

As shown in the example below, we have modified the original example, changing both the evaluation operation as well as the multiple constraint behavior.  On the constraint between the data flow task and the data deletion task, we add an SSIS expression to check the row count to confirm whether any rows were loaded as part of the failed execution (since we wouldn’t need to run the delete operation if no rows were loaded).  Further, we’ve added a completion constraint between the data flow task and the task named SQL Log failure event to event log and set the multiple constraint behavior to Logical OR to make sure we log the failure whether or not the delete operation occurred.

We get visual reminders of these settings as well.  The function indicator (which looks like an italicized fx) appears on the precedence constraint on which we are using the expression.  For the Logical OR setting, the constraints will appear as dashed lines rather than solid lines.

Precedence constraints are much more than just lines drawn between two tasks.  In fact, these are powerful and flexible decision tools that allow the ETL developer a great deal of control over the flow of the SSIS execution.

If you want me to take a look at your server and its settings, or if your server is facing any issue, we can Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)


SQL SERVER – SSIS Data Flow Troubleshooting – Part1 – Notes from the Field #019

[Note from Pinal]: This is a new episode of the Notes from the Field series. SQL Server Integration Services (SSIS) is one of the key building blocks of the entire Business Intelligence (BI) story. It is a platform for data integration and workflow applications. The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data.

In this episode of the Notes from the Field series I asked SSIS expert Andy Leonard a very crucial question: how do you troubleshoot an SSIS data flow? It is the kind of problem everyone thinks they know how to solve, yet not everyone can actually resolve. Andy Leonard, a world renowned expert, explains in this blog post how to troubleshoot an SSIS data flow.


SQL Server Integration Services (SSIS) is designed to move data. Several SSIS tasks can move data, but the Data Flow Task is arguably the one used most often for that purpose. When bad things happen to good developers, it helps to know how to troubleshoot. Writing about all the ways to troubleshoot an SSIS Data Flow Task will take more than one post. So this is Part 1.

Building a Quick Demo Package

If you are interested in working through some demos with me, create a new SSIS project named DataFlowTests. Add a Data Flow Task to the Control Flow as shown in Figure 1:

You can use any source data you desire, but I am going to use a file containing weather data captured from my weather station here in Farmville, Virginia beginning in December 2008. You can obtain this data here. Click the Data Flow tab to edit the Data Flow Task. Add a Flat File Connection Manager configured to consume the sensor1-all.csv file containing the December 2008 weather data. Add a Flat File Source adapter and link it to the Flat File Connection Manager. Your Data Flow should appear as shown in Figure 2:

To test the Flat File Source adapter, we need to connect an output to another component. Add a Union All transformation and connect the output of the Flat File Source to it, as shown in Figure 3:

You can now test-execute the SSIS package. You should see results similar to those shown in Figure 4:

This data flow isn’t doing very much. The Flat File Source adapter is coupled to the Flat File Connection Manager. The Flat File Connection Manager reads data from the sensor1-all.csv file you downloaded from andyweather.com. Weather data is read into the data flow from the file via the Flat File Connection Manager and the Flat File Source adapter. The 106 rows of weather data are then passed into the Data Flow Path connecting the Flat File Source adapter output to the first Union All input (Union All Input 1).

Insert a Warning

Let’s create an issue that will raise a warning. Open the Flat File Connection Manager Editor, click on the Advanced page, select the Min T column, and change its DataType property to "four-byte signed integer [DT_I4]" as shown in Figure 5:

When you close the Flat File Connection Manager Editor, a warning icon displays on the Flat File Source adapter. If you hover over the Flat File Source adapter with your mouse, a tooltip will display some of the warning text as shown in Figure 6:

You can see more of the warning if you execute the package and then view the Progress tab (if the package is running in the SSIS debugger) or the Execution Results tab (if the package has been executed in the debugger and the debugger has then been stopped), as shown in Figure 7:

The warning we injected by changing the DataType property of the Min T column in the Flat File Connection Manager does not cause an error. The warning stems from the fact that there is a difference between the data type of the Min T column flowing into the Flat File Source adapter from the Flat File Connection Manager and the data type of the Min T column flowing out of the Flat File Source adapter. You can correct this warning by opening the Flat File Source adapter and clicking the Columns page. When opening the Editor (or Advanced Editor) of the Flat File Source adapter, you will be prompted as shown in Figure 8:

Click the “Yes” button to synchronize the Min T column’s data type in the Flat File Source adapter’s output (the metadata of the “output column”) with the Min T column’s data type supplied from the Flat File Connection Manager (the metadata of the “external column”).

Insert an Error

One way to generate an error in our simple data flow task is to configure the Flat File Connection Manager to perform an impossible data type coercion. Every value in the Min HI column contains “—“. Let’s configure the Min HI column in the Flat File Connection Manager and the Flat File Source adapter as data type Integer.
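To see why this coercion is impossible, here is a quick T-SQL illustration of the same conversion; it is only an analogy for what the Flat File Source adapter will attempt at runtime, not what SSIS executes internally.

-- Every Min HI value is the literal string "—", which has no integer representation.
SELECT CONVERT(int, '—') AS MinHiAsInt;       -- raises a conversion error
-- TRY_CONVERT (available in SQL Server 2012 and later) returns NULL instead of failing:
SELECT TRY_CONVERT(int, '—') AS MinHiAsInt;   -- returns NULL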

First, open the Flat File Connection Manager Editor and change the Min HI DataType property to DT_I4 as shown in Figure 9:

When you click the “OK” button, we see the same warning displayed in Figure 6 – the external columns are out of synchronization with the data source columns. Right-click the Flat File Source adapter and click Show Advanced Editor as shown in Figure 10:

When you attempt to open the Advanced Editor, you will be prompted to fix the data type mismatch between the output columns and the external columns. Click the “No” button to proceed, as shown in Figure 11:

When the Advanced Editor for the Flat File Source adapter opens, click the Input and Output Properties tab. Expand the Flat File Source Output node, and then expand the Output Columns node. Click on the Min HI column and change the DataType property to "four-byte signed integer [DT_I4]" as shown in Figure 12:

Click the “OK” button to close the Advanced Editor. Note the Union All now displays a validation error as shown in Figure 13:

What has happened? The tooltip reveals an issue with a metadata mismatch, but the error is truncated. We can view the full error message in the Error List by clicking View > Error List as shown in Figure 14:

Note: The Error List option is missing from the View menu in some versions of SSDT-BI. In those environments, use the keyboard shortcut: hold down Ctrl and press \ followed by E.

The Error List window displays the full text of errors, warnings, and other informational messages, and is shown in Figure 15:

There are a couple ways to fix this error. The easy way is to delete the Union All and add a new Union All transformation in its place (the hard way is to edit the package’s XML).

Conclusion

We are going to stop here. We have, however, introduced an error in the Flat File Source adapter, and that is where we will begin in Part 2.

If you want to get started with SSIS with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – List of Article on Expressor Data Integration Platform

The ability to transform data into meaningful and actionable information is one of the most important capabilities in the current business world. With business needs growing and changing this fast, effective data integration is one of the most important ingredients of proper decision making. I have been following expressor software since November 2010, when I met the expressor team in Seattle. Here are my posts on their innovative data integration platform and expressor Studio, a free desktop ETL tool:

4 Tips for ETL Software IDE Developers

Introduction to Adaptive ETL Tool – How adaptive is your ETL?

Sharing your ETL Resources Across Applications with Ease

expressor Studio Includes Powerful Scripting Capabilities

expressor 3.2 Release Review

5 Tips for Improving Your Data with expressor Studio

As I had mentioned in some of my blog posts on them, I encourage you to download and test-drive their Studio product – it’s free.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Import CSV File into Database Table Using SSIS

It is a very frequent request to upload a CSV file to a database, or import a CSV file into a database. I have previously written an article on how one can do this using T-SQL over here: SQL SERVER – Import CSV File Into SQL Server Using Bulk Insert – Load Comma Delimited File Into SQL Server.
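For comparison, a minimal BULK INSERT sketch along the lines of that earlier T-SQL approach might look like the following; the table name, file path, and format options are assumptions and would need to match your actual CSV file.

BULK INSERT dbo.CSVTest                  -- hypothetical target table
FROM 'C:\csvtest.txt'                    -- hypothetical file path
WITH (
    FIELDTERMINATOR = ',',               -- comma-delimited columns
    ROWTERMINATOR = '\n'                 -- one record per line
);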

One of the requests I quite often encounter is how to do the same using an SSIS package and automate it. Today we will understand the same using images.

First of all, let us create the table where we want to insert the CSV file.

Here is the sample table I am creating, into which I will import my CSV.

CREATE TABLE [EmployeeImported](
    [EmployeeID] [int] IDENTITY(1,1) NOT NULL,
    [ContactID] [int] NOT NULL,
    [ManagerID] [int] NULL,
    [Title] [varchar](100) NOT NULL,
    [MaritalStatus] [char](1) NOT NULL,
    [Gender] [char](1) NOT NULL,
    [HireDate] [datetime] NOT NULL
) ON [PRIMARY]
GO

Now let us open SQL Server Business Intelligence Development Studio (Path: Start > All Programs > Microsoft SQL Server 2008 R2 > SQL Server Business Intelligence Development Studio). Select Integration Services Project, create a new project, and save it.

Click on Control Flow and drag a Data Flow Task to the right side pane.

Double-click on the Data Flow Task.

It will take you to the Data Flow pane.

Drag a Flat File Source from the Toolbox to the Data Flow pane.

Configure New Connection by clicking New.

Select the path of the file and set the column delimiter; for a CSV file the delimiter is a comma (,). Specify a text qualifier as well if your values are enclosed in quotes.

Click on Columns and adjust the OutputColumnWidth to match the width of your original data. If you do not know the width, leave it at the default (50).

Click OK on the following screen.

Now select an OLE DB Destination from the Toolbox and drag it below the Flat File Source.

Put them near each other, as demonstrated below.

Connect the green arrow of the Flat File Source to the OLE DB Destination.

Double-click on the OLE DB Destination and connect it to the database and the table created earlier in the code.

After configuring the connection, the mappings need to be adjusted as well.

Now, on the Mappings tab, connect both sides. I have not connected the very first column because it is an identity column for me.

Clicking OK will bring me to the following screen.

Now press F5 and it will execute the package in debug mode.
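Once the package completes successfully, a quick sanity check against the target table created earlier confirms that the rows arrived; the exact count will depend on your CSV file.

SELECT COUNT(*) AS ImportedRows
FROM dbo.EmployeeImported;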

Do let me know your opinion about this article.

Reference : Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Introduction to Adaptive ETL Tool – How adaptive is your ETL?

I am often reminded of the fact that BI/data warehousing infrastructure is very brittle and not very adaptive to change. There are lots of basic use cases where data needs to be frequently loaded into SQL Server or another database. What I have found is that as long as the sources and targets stay the same, SSIS or any other ETL tool for that matter does a pretty good job handling these types of scenarios.

But what happens when you are faced with more challenging scenarios, where the data formats and possibly the data types of the source data are changing from customer to customer?  Let’s examine a real life situation where a health management company receives claims data from their customers in various source formats. Even though this company supplied all their customers with the same claims forms, they ended up building one-off ETL applications to process the claims for each customer.

Why, you ask? Well, it turned out that the claims data from various regional hospitals they needed to process had slightly different data formats, e.g. "integer" versus "string" data field definitions. Moreover, the data itself was represented with slight nuances, e.g. "0001124" or "1124" or "0000001124" to represent a particular account number, which forced them, as I alluded to above, to build new ETL processes for each customer in order to overcome the inconsistencies in the various claims forms. As a result, they experienced a lot of redundancy in these ETL processes and recognized quickly that their system would become more difficult to maintain over time.

So imagine for a moment that you could use an ETL tool that helps you abstract the data formats so that your ETL transformation process becomes more reusable. Imagine that one claims form represents a data item as a string – acc_no(varchar) – while a second claims form represents the same data item as an integer – account_no(integer). This would break your traditional ETL process as the data mappings are hard-wired.  But in a world of abstracted definitions, all you need to do is create parallel data mappings to a common data representation used within your ETL application; that is, map both external data fields to a common attribute whose name and type remain unchanged within the application.

acc_no(varchar) is mapped to account_number(integer)

expressor Studio first claim form schema mapping

account_no(integer) is also mapped to account_number(integer)

expressor Studio second claim form schema mapping

All the data processing logic that follows manipulates the data as an integer value named account_number.
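The same idea can be sketched in T-SQL: once both external fields are mapped to the common integer attribute, the formatting nuances disappear. The literals below are the hypothetical account numbers from the example above.

-- Both claim form representations collapse to the same integer account_number.
SELECT CAST('0001124' AS int) AS account_number_from_form_1,   -- string with leading zeros
       CAST(1124 AS int)      AS account_number_from_form_2;   -- already an integer
-- Both columns return 1124, so the downstream processing logic sees one consistent value.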

Well, these are the kinds of problems that the expressor data integration solution automates for you. I’ve been following them since last year and encourage you to check them out by downloading their free expressor Studio ETL software.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Automated Type Conversion using Expressor Studio

Recently I had an interesting situation during a consulting project. Let me share with you how I solved the problem using expressor Studio.

Consider a situation in which you need to read a field, such as customer_identifier, from a text file and pass that field into a database table. In the source file’s metadata structure, customer_identifier is described as a string; however, in the target database table, customer_identifier is described as an integer. Legitimately, all the source values for customer_identifier are valid numbers, such as “109380”.

To implement this in an ETL application, you probably would have hard-coded a type conversion function call, such as:

output.customer_identifier=stringToInteger(input.customer_identifier)

That wasn’t so bad, was it? For this instance, programming this hard-coded type conversion function call was relatively easy.

However, hard-coding, whether type conversion code or other business rule code, almost always means that the application containing hard-coded fields, function calls, and values is: a) specific to an instance of use; b) difficult to adapt to new situations; and c) lacking in reusable sub-parts. Therefore, in the long run, applications with hard-coded type conversion function calls don’t scale well. In addition, they increase the overall level of effort and degree of difficulty to write and maintain the ETL applications.

To get around the trappings of hard-coded type conversion function calls, developers need access to smarter typing systems. The expressor Studio product offers exactly this feature, providing developers with a type conversion automation engine based on type abstraction.

The theory behind the engine is quite simple. A user specifies abstract data fields in the engine, and then writes applications against the abstractions (whereas in most ETL software, developers develop applications against the physical model). When a Studio-built application is run, Studio’s engine automatically converts the source type to the abstracted data field’s type and converts the abstracted data field’s type to the target type. The engine can do this because it has a couple of built-in rules for type conversions.

So, using the example above, a developer could specify customer_identifier as an abstract data field with a type of integer when using Expressor Studio. Upon reading the string value from the text file, Studio’s type conversion engine automatically converts the source field from the type specified in the source’s metadata structure to the abstract field’s type. At the time of writing the data value to the target database, the engine doesn’t have any work to do because the abstract data type and the target data type are just the same. Had they been different, the engine would have automatically provided the conversion.
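In T-SQL terms, the engine does the equivalent of the following hard-coded conversion for you automatically; this sketch only illustrates the conversion itself, not expressor's internals.

-- The source metadata describes customer_identifier as a string such as '109380';
-- the abstract (and target) type is integer.
SELECT CAST('109380' AS int) AS customer_identifier;           -- returns 109380
-- TRY_CONVERT (SQL Server 2012 and later) returns NULL rather than raising an error
-- when a value cannot be converted:
SELECT TRY_CONVERT(int, '109380') AS customer_identifier;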

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQLAuthority News – Interesting Whitepaper – We Loaded 1TB in 30 Minutes with SSIS, and So Can You

We Loaded 1TB in 30 Minutes with SSIS, and So Can You
SQL Server Technical Article
Writers: Len Wyatt, Tim Shea, David Powell
Published: March 2009

In February 2008, Microsoft announced a record-breaking data load using Microsoft SQL Server Integration Services (SSIS): 1 TB of data in less than 30 minutes. That data load, using SQL Server Integration Services, was 30% faster than the previous best time using a commercial ETL tool. This paper outlines what it took: the software, hardware, and configuration used. We will describe what we did to achieve that result, and offer suggestions for how to relate these techniques to typical scenarios. Even for customers who don’t have needs quite like this benchmark, such efforts can teach a lot about getting optimal performance.

Read the white paper here.

Abstract courtesy Microsoft

Reference : Pinal Dave (http://blog.SQLAuthority.com)