SQL SERVER – The Basics of the Execute Package Task – Notes from the Field #067

[Note from Pinal]: This is a new episode of the Notes from the Field series. SQL Server Integration Services (SSIS) is one of the most essential parts of the entire Business Intelligence (BI) story. It is a platform for data integration and workflow applications.


In this episode of the Notes from the Field series I asked SSIS Expert Andy Leonard a very crucial question – What are the basics of the Execute Package Task and where do we start with it? Andy was kind enough to answer the question and provide plenty of information about how a novice developer can learn SSIS from the beginning and become an expert in the technology.


Learning how to configure and use the SQL Server Integration Services (SSIS) Execute Package Task gives you a great starting point for understanding SSIS package orchestration. I advocate writing small, functional SSIS packages that perform a unit of work. That’s a great idea for many reasons, but it raises the question: “How do I string together the execution of these packages?” Remember: SSIS is a software development platform. With “SQL Server” included in the name, it is easy for people to mistake SSIS for a database tool or accessory, but Control Flow Tasks put that confusion to rest.

SSIS provides several Control Flow tasks. Here is a list that provides a good approximation of which tasks I use most, from most-used to least-used:

  • Execute SQL Task
  • Data Flow Task
  • Script Task
  • Execute Package Task
  • File System Task
  • Execute Process Task

In this article I provide a basic example of configuring the SSIS Execute Package Task, shown in Figure 1:

Figure 1: SSIS Execute Package Task

The Execute Package Task provides one way to implement an SSIS Design Pattern for SSIS package execution known as the Parent-Child pattern. When an SSIS package uses the Execute Package Task to start another SSIS package, the package with the Execute Package Task is called the Parent and the package started by the Execute Package Task is called the Child.

The Execute Package Task changed between SQL Server 2008 R2 Integration Services and SQL Server 2012 Integration Services. The changes support the SSIS 2012 (and 2014) Catalog. There is a new property called ReferenceType, which defaults to “Project Reference” in packages executing in Project Deployment Mode (the default mode for building SSIS 2012 and SSIS 2014 packages), as shown in Figure 2:

Figure 2: Options for the Execute Package Task ReferenceType Property

Project Reference is used to execute an SSIS package – a Child package – in the same SSIS project with the package that contains the Execute Package Task – the Parent package. When Project Reference is selected, the next property in the property grid is PackageNameFromProjectReference, a dropdown containing a list of all the SSIS packages in the SSIS project.

Setting the ReferenceType property to External Reference is a way to execute SSIS packages that are stored in the file system or the msdb database. When External Reference is selected the next properties in the property grid change to reflect this backwards-compatible functionality, as shown in Figure 3:

Figure 3: Setting the ReferenceType Property to External Reference

The Location and Connection properties are used to specify an OLE DB Connection (to the msdb database) or a File Connection (to the location of the dtsx file). SQL Server locations also require the name of the SSIS package; the File System option does not because the SSIS package is the only thing in the file. The External Reference ReferenceType setting is useful for importing SSIS solutions that contain earlier versions of the Execute Package Task because External Reference behaves exactly like previous versions of the Execute Package Task.
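
If you are curious which packages are stored in msdb on a given instance, a quick query against the msdb.dbo.sysssispackages catalog view will list them. This is just an illustrative sketch; the column list is trimmed and the folder hierarchy is ignored:

USE msdb
GO
-- List SSIS packages stored in msdb (legacy/package deployment storage)
SELECT name, createdate, vermajor, verminor
FROM dbo.sysssispackages
ORDER BY name;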

The remaining properties in the property grid – Password and ExecuteOutOfProcess – are shared between ReferenceType options. Password is used if the SSIS package is password-protected (if the ProtectionLevel property of the SSIS package is set to either EncryptSensitiveWithPassword or EncryptAllWithPassword). ExecuteOutOfProcess determines whether the Child package will be executed in the current process or in a new process.

When executing SSIS packages in Project Deployment Mode (using the Project Reference ReferenceType), Parameter Bindings are enabled and allow values to be passed from the Parent package to the Child package, as shown in Figure 4:

Figure 4: Binding a Parent Package Parameter to a Child Package Parameter

Parameter Bindings are disabled when the ReferenceType property is set to External Reference.

Once configuration is complete, click the OK button to close the Execute Package Task Editor. You can test execution by pressing the F5 key or selecting “Start Debugging” from the SSIS dropdown menu. A successfully executed Execute Package Task will appear as shown in Figure 5:

Figure 5: A Successful Execution!

The Execute Package Task drives a powerful data integration architecture pattern: Parent-Child execution. Using SSIS Precedence Constraints and Sequence Containers with the Execute Package Task, a data integration developer can develop SSIS “driver” packages that call Child SSIS packages in any combination of parallel and serial orders of execution.
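
As an aside, when your projects are deployed to the SSIS Catalog, a “driver” of sorts can also be written in T-SQL using the catalog stored procedures. The folder, project, and package names below are hypothetical placeholders; a minimal sketch:

DECLARE @execution_id BIGINT;

-- Create an execution for a (hypothetical) child package in the catalog
EXEC SSISDB.catalog.create_execution
      @folder_name = N'MyFolder'
    , @project_name = N'MyProject'
    , @package_name = N'Child1.dtsx'
    , @use32bitruntime = 0
    , @reference_id = NULL
    , @execution_id = @execution_id OUTPUT;

-- Start the execution (asynchronous by default)
EXEC SSISDB.catalog.start_execution @execution_id;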

If you want to get started with SSIS with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – The Basics of the SSIS Script Task – Notes from the Field #065

[Note from Pinal]: This is a new episode of the Notes from the Field series. SQL Server Integration Services (SSIS) is one of the most essential parts of the entire Business Intelligence (BI) story. It is a platform for data integration and workflow applications.


In this episode of the Notes from the Field series I asked SSIS Expert Andy Leonard a very crucial question – What are the basics of the SSIS Script Task and where do we start with it? Andy was kind enough to answer the question and provide plenty of information about how a novice developer can learn SSIS from the beginning and become an expert in the technology.


Learning how to configure and use the SQL Server Integration Services (SSIS) Script Task gives you a great starting point for learning how to use SSIS Control Flow tasks. You’ll find that SSIS Control Flow tasks provide powerful development capabilities that may surprise you. Let there be no mistake: SSIS is a software development platform. With “SQL Server” included in the name, it is easy for people to mistake SSIS for a database tool or accessory, but Control Flow Tasks put that confusion to rest.

SSIS provides several Control Flow tasks. Here is a list that provides a good approximation of which tasks I use most, from most-used to least-used:

  • Execute SQL Task
  • Data Flow Task
  • Script Task
  • Execute Package Task
  • File System Task
  • Execute Process Task

This article will focus on a basic example of configuring the SSIS Script Task, shown in Figure 1. People often think the SSIS Script Task is just for generating log messages, but let’s look at some of its more powerful capabilities.


Figure 1: SSIS Script Task

The Script Task supports Microsoft Visual C# and Microsoft Visual Basic languages. You can choose which language you will use prior to clicking the Edit Script button. Once you’ve clicked the Edit Script button, you cannot change the Script Language property.

At the heart of the Script Task is the ability to read a variable value from the SSIS package into the script and then write a message out of the Script Task. Because the Script Task can interact with SSIS variables, you can use .Net code to manipulate and respond to variable values.

For the purpose of our demonstration, we will add code to the Script Task to create a message for SSIS logging.

Messaging is a key component of enterprise architecture. Messages are generated (published, raised, produced, etc.) by Senders and received (subscribed, consumed, requested, etc.) by Listeners. Messaging is a complex topic. Entire books are dedicated to the topic.

Add variables to the ReadOnlyVariables and ReadWriteVariables properties by clicking the ellipsis inside the value textbox for each property. When you click the ellipsis, a Select Variables dialog appears, as you see in Figure 2.


Figure 2: Adding Read Only Variables to the Script Task

SSIS variables use a two-part naming convention: <Namespace>::<VariableName>. I added the SSIS variables System::TaskName and System::PackageName. The selected variables are in the System namespace and contain the name of the task and package, respectively.

Click the Edit Script button to open the script editing window as shown in Figure 3:


Figure 3: The Script Task Script Editor

Add the following code beneath the “Add your code here” comment:

Dim sPackageName As String = _
  Dts.Variables("System::PackageName").Value.ToString
Dim sMsg As String = "I am " & sPackageName
Dts.Events.FireInformation(1001, sPackageName, sMsg, "", 0, True)

Listing 1: VB.Net Code

The first line of the code in Listing 1 creates a VB.Net variable named sPackageName and initializes the value of this variable, setting it to the value of the SSIS Variable named System::PackageName. This is one way to read SSIS Variable values into a Script Task’s script.

The second line of the code in Listing 1 creates a VB.Net variable named sMsg and initializes the value of this variable with the text “I am ” plus the value contained in the sPackageName VB.Net variable.

The third line of the code shown in Listing 1 calls the FireInformation method of the Dts.Events object. FireInformation accepts six arguments:

  1. InformationCode – an integer you can use to identify custom messages sent from the SSIS package.
  2. SubComponent – a string you can use to populate the source of the message.
  3. Description – a string containing the message you wish to send.
  4. HelpFile – a string containing the location of a Help file (should a help file exist).
  5. HelpContext – an integer pointing to the help context (should a help context exist).
  6. FireAgain – a Boolean value indicating whether the event should subsequently fire.

My VB code includes a reference to the FireAgain argument. Note: This MSDN post states: “Because firing of an event may be expensive, the run-time engine provides a mechanism for suppressing events that you are not interested in. Every event firing method has a FireAgain parameter. If the value of this variable is false, after the method returns, the caller will not fire this event again for the duration of the current execution.” I understand this to mean setting FireAgain to False would suppress future FireInformation events in my Script Task, but testing indicates this is not correct. This is important for you to know because you may opt to use the FireAgain argument to manage when you raise events. I do not use the FireAgain argument for this purpose. I typically manage the decision to raise events in conditional logic surrounding the Event-raising code.

A test execution of the package reveals an Information message recorded on the Progress / Execution Results tab as shown in Figure 4.


Figure 4: the Information Message on the Progress / Execution Results Tab

Information messages generated from Script Tasks will appear in SSIS logs and the SSIS Catalog. With this information, you can monitor the progress of SSIS execution while the package is running, or view the historical status of variable values or package messages after the package execution has completed.
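
If the package was executed from the SSIS Catalog, one way to review these messages after the fact is to query the catalog views. A sketch, assuming the documented message_type value of 70 for information messages (verify against your catalog):

USE SSISDB
GO
-- Most recent information messages logged to the catalog
SELECT operation_id, message_time, message
FROM catalog.operation_messages
WHERE message_type = 70 -- 70 = Information
ORDER BY message_time DESC;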

The SSIS Script Task can accomplish much more than generating log messages. This brief introduction and basic example have demonstrated how to get started configuring and using the SSIS Script Task. As you can see, SSIS Script Tasks give you development capabilities you may not have been aware of.

If you want to get started with SSIS with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Creating the SSIS Catalog – Notes from the Field #058

[Notes from Pinal]: SSIS is a very well explored subject; however, there are so many interesting elements that every time we read about it, we learn something new. One such concept is creating the SSIS Catalog.

Linchpin People are database coaches and wellness experts for a data driven world. In this 58th episode of the Notes from the Field series, database expert Tim Mitchell (partner at Linchpin People) shares a very interesting conversation about creating the SSIS Catalog.


With the release of SQL Server 2012, Integration Services (SSIS) received a significant overhaul.  The most notable change was the addition of the SSIS catalog, a new deployment and storage mechanism that helps to centralize and standardize the storage, execution, and logging of SSIS packages.  In this post, I will briefly illustrate how to set up the SSIS catalog on a new installation of SQL Server 2012 or 2014.

The SSIS Catalog

When connecting to an instance of SQL Server using SSMS 2012 or 2014, you’ll notice that there is a new node in the tree, entitled Integration Services Catalogs.  By default, this node will be empty as shown below.

(Image: the empty Integration Services Catalogs node in SSMS Object Explorer.)

Before we can begin deploying SSIS packages to the catalog on this instance, we need to create the catalog.  It is worth noting here that, although the plural node name of Integration Services Catalogs implies that we may have more than one catalog, we can only create one SSIS catalog per SQL Server instance.
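
Since the catalog lives in a database named SSISDB, a minimal way to test whether an instance already has a catalog is simply to look for that database:

-- Check whether this instance already has an SSIS catalog
IF DB_ID(N'SSISDB') IS NOT NULL
    PRINT 'An SSIS catalog (SSISDB) already exists on this instance.';
ELSE
    PRINT 'No SSIS catalog found on this instance.';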

Creating the SSIS Catalog

To create the SSIS catalog, simply right click the Integration Services Catalogs node and choose Create Catalog.

(Image: the Create Catalog option on the Integration Services Catalogs context menu.)

Next, you’ll be presented with the Create Catalog dialog, which allows you to set the options for creating the catalog.  There are really only two things that you need to configure here:

  • Enable automatic execution of Integration Services stored procedure at SQL Server startup. This option will enable the automatic execution of a cleanup stored procedure when SQL Server starts.  This stored procedure will check for and update any packages whose status in the catalog was left in limbo due to an unexpected shutdown of SQL Server.  I don’t know of any downsides to enabling this, so I recommend that you leave this box checked.
  • Encryption password. Because you can store sensitive information (such as database or FTP passwords) in the SSIS catalog, this database requires an encryption password to properly secure those assets.

(Image: the Create Catalog dialog.)

You’ll also notice that the name of the catalog database (SSISDB) is presented in this window, but it is not editable.  In the current versions (2012/2014) of SSIS, the name of the catalog database must be set to SSISDB.  Also, the SSIS catalog requires enabling CLR integration, so this option is shown but is also not editable.
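
If you are curious whether CLR integration is already enabled on the instance, you can check with sp_configure; the Create Catalog dialog enables it for you, so this is informational only:

-- run_value = 1 indicates CLR integration is enabled
EXEC sp_configure 'clr enabled';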

Once you’ve configured the encryption password and selected the option of whether to allow the cleanup stored procedure to run, you’re ready to create the catalog.  Simply click the OK button in this window to create the catalog.  After so doing, you can browse to the Object Explorer in SSMS and see that we now have the SSISDB catalog listed under the Integration Services Catalogs node.  The catalog is now ready for use!

(Image: the SSISDB catalog listed under the Integration Services Catalogs node.)

One final note on the SSIS catalog: If you browse the list of databases after creating the catalog, you’ll notice that there is a new user database named SSISDB in the list of databases.  The SSIS catalog is contained in its own database on the SQL Server instance on which it is installed.  Unlike prior versions of SSIS which stored the SSIS assets in MSDB, the SSIS catalog uses a dedicated user database to better isolate the SSIS storage.
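
Once the catalog exists, its configuration (encryption algorithm, retention settings, and so on) can be inspected through the catalog.catalog_properties view, as in this quick sketch:

USE SSISDB
GO
-- Review the catalog's configuration properties
SELECT property_name, property_value
FROM catalog.catalog_properties
ORDER BY property_name;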

Conclusion

With a new installation of SQL Server 2012 or 2014, the SSIS catalog is not installed by default.  This brief post demonstrated how to easily and quickly set up the SSIS catalog on a new installation of SQL Server.

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – The Basics of the SSIS Data Flow Task – Notes from the Field #057

[Note from Pinal]: This is a new episode of the Notes from the Field series. SQL Server Integration Services (SSIS) is one of the most essential parts of the entire Business Intelligence (BI) story. It is a platform for data integration and workflow applications. As Wikipedia says, it features a fast and flexible data warehousing tool used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data.


In this episode of the Notes from the Field series I asked SSIS Expert Andy Leonard a very crucial question – How does one learn the SSIS Data Flow Task? Andy was kind enough to answer the question and provide plenty of information about how a novice developer can learn SSIS from the beginning and become an expert in the technology.


If you know SQL Server, you’re likely aware of SQL Server Integration Services (SSIS). What you might not realize is that SSIS is a development platform that allows you to create and perform some interesting Control Flow Tasks. In the first blog post in this series, I showed how to use the Execute SQL Task. Now, let’s look at the Data Flow Task. When developing solutions with SSIS, I use a handful of Control Flow tasks:

  • Execute SQL Task
  • Data Flow Task
  • Script Task
  • Execute Package Task
  • File System Task
  • Execute Process Task

This list is a good approximation of which tasks I use most, too – from most-used to least-used. In this article I provide a basic example of configuring the SSIS Data Flow Task, shown in Figure 1:

Figure 1: SSIS Data Flow Task

The SSIS Data Flow Task is a very special task. It is the only task to have its own tab in the Integrated Development Environment (IDE) as shown in Figure 2:

Figure 2: The Data Flow Tab

If you click on the tab, you will note a new SSIS Toolbox containing Data Flow-specific components, as shown in Figure 3:

Figure 3: Data Flow SSIS Toolbox

SSIS Data Flows are typically used to move data from one location to another. The data flow accomplishes data movement by first reading data into Data Flow Buffers. Think of a buffer as a region of memory SSIS uses to hold data rows as the rows are processed by the data flow. In Figure 4, I have configured an OLE DB Source Adapter to pump data rows into the data flow:

Figure 4: Configuring an OLE DB Source Adapter

The data is often transformed while being moved from one location to another. The SSIS data flow components that perform transforming operations are called Transformations, and they are joined to other data flow components by Data Flow Paths. An example of a transformation is the Derived Column Transformation, as shown in Figure 5:

Figure 5: Adding a Derived Column Transformation and a Data Flow Path

You can use transformations to perform many operations (e.g., you can manipulate values of columns in rows, you can remove or redirect rows based on column values, etc.) on the data as it flows through the data flow task. For example, the Derived Column Transformation permits you to manipulate (transform) existing data or to combine existing data to create new columns, as shown in Figure 6:

Figure 6: Creating a New Column with the Derived Column Transformation

I created a new column named “UpperCaseName” in the Derived Column Transformation. I used SSIS Expression Language to define the transform – “UPPER([Name])” in this case.

Note: SSIS Expression Language is very powerful and very difficult to learn. For more information, please see Linchpin People’s SSIS Expression Language series.

Now I need to land these rows in a table. I’ll use an OLE DB Destination Adapter – connected from the Derived Column Transformation via a data flow path – to accomplish loading our transformed rows into a table, as shown in Figure 7:

Figure 7: Configuring an OLE DB Destination Adapter

Once the OLE DB Destination Adapter is configured, you can execute either the package or the Data Flow Task as shown in Figure 8:

Figure 8: Test Execution Successful!

In this article, I shared an introduction to the SSIS Data Flow Task and some of its functionality. Although I barely scratched the surface of Data Flow capabilities, you should now be able to compose and test your first SSIS Data Flow!

:{>

If you want to get started with SSIS with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – The Basics of the SSIS Execute SQL Task – Notes from the Field #044

[Note from Pinal]: This is a new episode of the Notes from the Field series. SSIS is an interesting concept. There are more hidden features in SSIS than in any other product. Additionally, there are many visible features which are not known to most people. Earlier, when I mentioned to a SQL developer that they could execute SQL queries from SSIS, they looked at me like they had seen a ghost. They always thought they could only do the tasks in SSIS that are available in the toolbox. Well, the next time I get a similar question, I am going to forward them a link to this blog post. Andy has explained in very simple words how one can use the Execute SQL Task in SSIS.



With “SQL Server” included in the name of SQL Server Integration Services (SSIS), it is easy for people to assume SSIS is simply a database tool or accessory. Let there be no mistake, SSIS is a software development platform. To give you a better understanding of how to use SSIS as a development platform, I’ll write a series of blogs that step through how to use Control Flow tasks. This first article will cover the Execute SQL Task.

When developing solutions with SSIS, I use a handful of Control Flow tasks:

  • Execute SQL Task
  • Data Flow Task
  • Script Task
  • Execute Package Task
  • File System Task
  • Execute Process Task

This list is a good approximation of which tasks I use most, too – from most-used to least-used. In this article I provide a basic example of configuring the SSIS Execute SQL Task, shown in Figure 1:

Figure 1: SSIS Execute SQL Task

Three things are required to configure an Execute SQL Task:

  1. Connection Type
  2. Connection Manager
  3. SQL Statement

Connection Type

The default Connection Type is OLE DB, as shown in Figure 2:

Figure 2: Available Connection Type Property values

I configure Execute SQL Tasks to execute SQL statements like truncate a table, update or insert a single row of data, or call a stored procedure. I typically use OLE DB or ADO.NET connection types; but I occasionally use ODBC connection types. (When executing parameterized statements I find ADO.NET offers a cleaner interface. I will cover parameters in another article).
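
To make that concrete, here are the kinds of statements I might drop into an Execute SQL Task; the table and procedure names are hypothetical placeholders, and each statement would typically live in its own task:

-- Truncate a staging table before a load
TRUNCATE TABLE dbo.StagingCustomer;

-- Update a single control row
UPDATE dbo.ETLControl
SET LastLoadDate = GETDATE()
WHERE ProcessName = 'CustomerLoad';

-- Call a stored procedure
EXEC dbo.usp_PostLoadCleanup;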

After selecting a connection type, selecting the Connection Manager is next. The Execute SQL Task Editor filters the list of available connection managers, displaying only connection managers of the (connection) type configured in the previous step.

Figure 3 shows how to select a Connection Manager:

Figure 3: Selecting a Connection Manager

The last required step in configuring an SSIS Execute SQL Task is to supply an SQL Statement. There are two properties for this: SQLSourceType and SQLStatement. The SQLSourceType property specifies the source of the SQL Statement as one of the following:

  • Direct Input
  • File Connection
  • Variable

The SQL Statement can be entered manually (direct input). It can be stored in a file (file connection) or the SQL Statement can be stored in an SSIS variable (variable). In most cases you will manually enter the query as shown in Figure 4:

Figure 4: Entering the Query

Right-click the task and click “Execute Task” to test the task configuration. Or simply press the F5 key to execute the entire SSIS package, as shown in Figure 5:

Figure 5: Success!

You now know the basics of configuring an SSIS Execute SQL Task. Go code!

:{>

If you want to get started with SSIS with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Andy Defines Basic RDBMS: Isolation in Processes – Notes from the Field #038

[Note from Pinal]: This is a new episode of the Notes from the Field series. I usually give an introductory note; however, this time there is no need for one. This note is from Andy, and as we all know, he is an amazing person when we have to understand the fundamentals. He has written this blog post in such an interesting way that you must read it to understand the very basic fundamentals of Isolation.



When I think about SQL Server isolation in processes, it reminds me of eggs. Our family raises chickens for their eggs. Fresh eggs are very tasty, but there’s always the risk of a bad egg.

In the image above, I am preparing to scramble five eggs. I have cracked and opened five eggs (you can tell by the number of eggshells), but only four egg yolks are shown. “Why are there only four yolks, Andy?” I’m glad you asked.

My process for opening the eggs involves first dropping the contents of the egg into the mug, examining the contents, then – if satisfactory – adding them to the bowl for mixing. You don’t have to do this to make five scrambled eggs; you can crack the eggs right over the bowl.

(Image: five cracked eggshells, a mug, and a mixing bowl containing four egg yolks.)

But what happens when you open a bad egg? You risk ruining the entire batch of mostly good eggs. If you crack the eggs over the bowl, you have to pick five good eggs in a row to get a batch that’s ready to scramble. You may get lucky and it may only require five eggs. But the risk of a bad egg is ever present. You could get one good egg, followed by a bad egg. Now you have to throw both the good and bad egg out and begin again. The next time you may get three good eggs and then one bad egg. Now you’ve wasted three good eggs.

Isolating the eggs is a good practice. If I first empty the egg into a mug, I have constrained the process so that I only risk one egg at a time. In so doing, I have successfully mitigated risk to the least possible unit. I will only throw out the bad eggs without risking contamination of the good eggs in the bowl.

Isolation is generically defined as, “the process of separating somebody or something from others, or the fact of being alone and separated from others.”

In database terms, isolation is one of the four defining properties (i.e., atomic, consistent, isolated, durable—remember these by using the acronym ACID) of a Relational Database Management System (RDBMS). Similar to isolating bad eggs from the good, RDBMS isolation keeps individual database transactions from intermingling with each other during execution. It’s not that other transactions are bad, we just want to keep them separated so that data from one transaction doesn’t corrupt data from another transaction.

Are isolated transactions completely unaffected by each other? No, unlike the example of completely isolating a bad egg from the mix of good eggs, RDBMS isolation doesn’t prevent one transaction from influencing or impeding another transaction. An example of influence is resource contention; an example of impedance is locking. Isolation simply guarantees the results of the transaction will not be affected by concurrently executing transactions.
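
A quick way to see this guarantee in action is to run two sessions against the same table. The sketch below assumes a hypothetical dbo.Eggs table and the default READ COMMITTED isolation level (without read committed snapshot enabled):

-- Session 1: change a row inside an open transaction
BEGIN TRANSACTION;
UPDATE dbo.Eggs SET Quality = 'Bad' WHERE EggId = 5;
-- ...transaction intentionally left open...

-- Session 2: under READ COMMITTED, this read is blocked (impeded)
-- until Session 1 commits or rolls back, so it never sees the
-- uncommitted 'Bad' value
SELECT Quality FROM dbo.Eggs WHERE EggId = 5;

-- Session 1: undo the change; Session 2 then returns the original value
ROLLBACK TRANSACTION;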

You can learn more about these properties from this awesome post by my friend, Pinal Dave: SQL Server – ACID (Atomicity, Consistency, Isolation, Durability).

If you want to get started with SSIS with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – SSIS Component Error Outputs – Notes from the Field #034

[Notes from Pinal]: Errors confuse people and stop operations. Developers search online for errors more than for best practices. Lots of people think that SSIS is all about arranging various operations together in one logical flow. Well, the understanding is absolutely correct, but the implementation of the same is not as easy as it seems.

Linchpin People are database coaches and wellness experts for a data driven world. In this 34th episode of the Notes from the Field series, database expert Tim Mitchell (partner at Linchpin People) shares a very interesting conversation about understanding SSIS component error outputs.


When working in the SSIS data flow, you’ll notice that many sources and transformations and some destinations have a built-in output to handle errors.  The error output allows the SSIS developer to create a separate path through which error rows can be directed.  In this post, we’ll briefly discuss the essentials and design patterns for using error outputs on data sources and transformations.

Overview

Error outputs are a secondary path through which the data flow can send rows that do not conform to data type, length, or transformation standards defined by the ETL developer.  That’s a lengthy way to say that it’s where you can send your junk data.  In the SSIS designer, clicking on a source or transformation will often show not one but two possible outputs: the primary output (the “good” data, indicated by the blue line) and the error output (identified by the red line).  As shown on the flat file source below, when selecting a source or transformation, those that have an available error output will appear with both output connectors ready for selection.

(Image: a flat file source selected in the SSIS designer, showing both the primary output and error output connectors.)

Using this design will allow you to handle errors such as:

  • Incorrectly typed data
  • String data longer than its specification
  • Invalid transformation logic
  • Relational constraint errors at the destination

Error outputs provide a great way to handle two things at once: providing a secondary path through which bad data can be either triaged or cleaned up, and preventing the package from failing based on just a few bad rows of data.

When connecting an error output to a downstream component, you’ll be prompted with a configuration window in which you can select the error output options.  The most important thing to remember here is that the default behavior is to fail the component upon any error, even when you connect the source or transformation’s error output.  The default options are shown below.

(Image: the default error output configuration options, with all columns set to fail the component.)

As mentioned, you must explicitly set the behavior to Redirect row for the column(s) that will be handled by the error output.  You can do this on a column-by-column basis, if you want to handle error conditions on that level, but in most cases I see that all columns are set to either fail or redirect as a group.  In the screenshot below, I’ve set all columns to redirect upon error or truncation.

(Image: the error output configuration with all columns set to Redirect row for errors and truncations.)

Practical use

So what happens when we connect a component’s error output?  In most cases, any rows in error would be redirected to that output, which we can use as a secondary path for cleanup or triage.  Assuming we change the error and truncation behavior to Redirect row, errors that are captured by the error output would not result in a failure of that source.

There are several design patterns surrounding the use of error outputs on sources or transformations.  Among the most common:

Cleanse inline.  If possible, the data can be sanitized inline using other SSIS components.  If the ETL can be built to handle most of the known data deficiencies, this is usually the cleanest way to handle error rows.  Very often, data quality tools such as SQL Server Data Quality Services can be useful for inline cleansing.

Send to triage.  If the package does not have cleansing logic, or there are rows that still cannot be cleansed using that logic, the bad data can be sent to triage for manual review or cleansing.  Though this pattern does require manual intervention, it also allows us to audit and review the reasons for those failures by preserving the data.

Ignore. It is possible to redirect error rows into the bit bucket, resulting in a loss of data.  In rare cases, this is acceptable; however, in most cases, there should be some measure of data capture to improve the quality of the ETL process.

Which is the best pattern to use?  As always, it depends.  Factors such as the criticality of the data, the types of errors expected, the frequency of the ETL process, and many others must be used to decide how, if at all, to use the error paths in SSIS.

Caveats

As with any ETL function, there are a few caveats and cautions around using error outputs.  First of all, I don’t recommend adding an error output path to every component that supports it simply because it’s available.  Redirecting error rows is not always appropriate – sometimes it makes sense to fail the package rather than trying to programmatically clean up or triage error data.

Additionally, you’ll need to familiarize yourself with the different types of error outputs generated by each component.  For example, if you use the error output of the flat file source, it will not break apart the data into individual columns (even for those elements not in error for that row).  Because the error output is a relatively low-level function, what you get is a single string with all of the data in it; if you have processes that consume this data downstream of the error output, you’ll need to do some transformation on the errored rows before you can do any cleanup.

Finally, it is important to note that not all components support error outputs.  Because of the differences in the types of operations performed by each source, destination, or transformation component, some of them do not include an error output path.

Conclusion

In this post, we’ve briefly covered the error output facilities for the SSIS data flow, along with a few design patterns for implementing these in a package.  In a future post, we’ll talk more about implementing an end-to-end solution for handling error rows in an SSIS data flow.

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Parsing SSIS Catalog Messages – Notes from the Field #030

[Note from Pinal]: This is a new episode of the Notes from the Field series. SQL Server Integration Services (SSIS) is one of the most essential parts of the entire Business Intelligence (BI) story. It is a platform for data integration and workflow applications. The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data.


In this episode of the Notes from the Field series I requested SSIS Expert Andy Leonard to discuss one of the most interesting concepts of SSIS – Catalog messages. There is plenty of interesting and useful information captured in the SSIS Catalog, and we will learn together how to explore it.


The SSIS Catalog captures a lot of cool information by default. Here’s a query I use to parse messages from the catalog.operation_messages table in the SSISDB database, where the logged messages are stored.

This query is set up to parse a default message transmitted by the Lookup Transformation. It’s one of my favorite messages in the SSIS log because it gives me excellent information when I’m tuning SSIS data flows. The message reads similar to:

Data Flow Task:Information: The Lookup processed 4485 rows in the cache. The processing time was 0.015 seconds. The cache used 1376895 bytes of memory.

The query:

USE SSISDB
GO

DECLARE @MessageSourceType INT = 60
DECLARE @StartOfIDString VARCHAR(100) = 'The Lookup processed '
DECLARE @ProcessingTimeString VARCHAR(100) = 'The processing time was '
DECLARE @CacheUsedString VARCHAR(100) = 'The cache used '
DECLARE @StartOfIDSearchString VARCHAR(100) = '%' + @StartOfIDString + '%'
DECLARE @ProcessingTimeSearchString VARCHAR(100) = '%' + @ProcessingTimeString + '%'
DECLARE @CacheUsedSearchString VARCHAR(100) = '%' + @CacheUsedString + '%'

SELECT operation_id
-- Rows processed: the token following 'The Lookup processed '
, SUBSTRING(message,
    PATINDEX(@StartOfIDSearchString, message) + LEN(@StartOfIDString) + 1,
    CHARINDEX(' ', message, PATINDEX(@StartOfIDSearchString, message) + LEN(@StartOfIDString) + 1)
      - (PATINDEX(@StartOfIDSearchString, message) + LEN(@StartOfIDString) + 1)
  ) AS LookupRowsCount
-- Processing time: the token following 'The processing time was '
, SUBSTRING(message,
    PATINDEX(@ProcessingTimeSearchString, message) + LEN(@ProcessingTimeString) + 1,
    CHARINDEX(' ', message, PATINDEX(@ProcessingTimeSearchString, message) + LEN(@ProcessingTimeString) + 1)
      - (PATINDEX(@ProcessingTimeSearchString, message) + LEN(@ProcessingTimeString) + 1)
  ) AS LookupProcessingTime
-- Rows per second, guarding against a zero-second denominator
, CASE
    WHEN CONVERT(numeric(3,3), SUBSTRING(message,
           PATINDEX(@ProcessingTimeSearchString, message) + LEN(@ProcessingTimeString) + 1,
           CHARINDEX(' ', message, PATINDEX(@ProcessingTimeSearchString, message) + LEN(@ProcessingTimeString) + 1)
             - (PATINDEX(@ProcessingTimeSearchString, message) + LEN(@ProcessingTimeString) + 1))) = 0
    THEN 0
    ELSE CONVERT(bigint, SUBSTRING(message,
           PATINDEX(@StartOfIDSearchString, message) + LEN(@StartOfIDString) + 1,
           CHARINDEX(' ', message, PATINDEX(@StartOfIDSearchString, message) + LEN(@StartOfIDString) + 1)
             - (PATINDEX(@StartOfIDSearchString, message) + LEN(@StartOfIDString) + 1)))
         / CONVERT(numeric(3,3), SUBSTRING(message,
           PATINDEX(@ProcessingTimeSearchString, message) + LEN(@ProcessingTimeString) + 1,
           CHARINDEX(' ', message, PATINDEX(@ProcessingTimeSearchString, message) + LEN(@ProcessingTimeString) + 1)
             - (PATINDEX(@ProcessingTimeSearchString, message) + LEN(@ProcessingTimeString) + 1)))
  END AS LookupRowsPerSecond
-- Cache memory used: the token following 'The cache used '
, SUBSTRING(message,
    PATINDEX(@CacheUsedSearchString, message) + LEN(@CacheUsedString) + 1,
    CHARINDEX(' ', message, PATINDEX(@CacheUsedSearchString, message) + LEN(@CacheUsedString) + 1)
      - (PATINDEX(@CacheUsedSearchString, message) + LEN(@CacheUsedString) + 1)
  ) AS LookupBytesUsed
-- Bytes per row, guarding against a zero-row denominator
, CASE
    WHEN CONVERT(bigint, SUBSTRING(message,
           PATINDEX(@StartOfIDSearchString, message) + LEN(@StartOfIDString) + 1,
           CHARINDEX(' ', message, PATINDEX(@StartOfIDSearchString, message) + LEN(@StartOfIDString) + 1)
             - (PATINDEX(@StartOfIDSearchString, message) + LEN(@StartOfIDString) + 1))) = 0
    THEN 0
    ELSE CONVERT(bigint, SUBSTRING(message,
           PATINDEX(@CacheUsedSearchString, message) + LEN(@CacheUsedString) + 1,
           CHARINDEX(' ', message, PATINDEX(@CacheUsedSearchString, message) + LEN(@CacheUsedString) + 1)
             - (PATINDEX(@CacheUsedSearchString, message) + LEN(@CacheUsedString) + 1)))
         / CONVERT(bigint, SUBSTRING(message,
           PATINDEX(@StartOfIDSearchString, message) + LEN(@StartOfIDString) + 1,
           CHARINDEX(' ', message, PATINDEX(@StartOfIDSearchString, message) + LEN(@StartOfIDString) + 1)
             - (PATINDEX(@StartOfIDSearchString, message) + LEN(@StartOfIDString) + 1)))
  END AS LookupBytesPerRow
FROM [catalog].[operation_messages]
WHERE message_source_type = @MessageSourceType
AND message LIKE @StartOfIDSearchString
GO

Note that you have to set some parameter values:

  • @MessageSourceType [int] – represents the message source type value from the following results:

    Value – Description
    10 – Entry APIs, such as T-SQL and CLR Stored procedures
    20 – External process used to run package (ISServerExec.exe)
    30 – Package-level objects
    40 – Control Flow tasks
    50 – Control Flow containers
    60 – Data Flow task
    70 – Custom execution message

    Note: Taken from Reza Rad’s (excellent!) helper.MessageSourceType table found here.

  • @StartOfIDString [VarChar(100)] – use this to uniquely identify the message field value you wish to parse. In this case, the string ‘The Lookup processed ‘ identifies all the Lookup Transformation messages I desire to parse.
  • @ProcessingTimeString [VarChar(100)] – this parameter is message-specific. I use this parameter to specifically search the message field value for the beginning of the Lookup Processing Time value. For this execution, I use the string ‘The processing time was ‘.
  • @CacheUsedString [VarChar(100)] – this parameter is also message-specific. I use this parameter to specifically search the message field value for the beginning of the Lookup Cache Used value. It returns the memory used, in bytes. For this execution, I use the string ‘The cache used ‘.
  • The other parameters are built from variations of the parameters listed above.

The query parses the values into text. The string values are converted to numeric values for the ratio calculations: LookupRowsPerSecond and LookupBytesPerRow. Since ratios involve division, CASE expressions check for denominators that equal 0.

Here are the results in an SSMS grid:

(Image: the parsed Lookup message values displayed in an SSMS results grid.)

This is not the only way to retrieve this information. And much of the code lends itself to conversion to functions. If there is interest, I will share the functions in an upcoming post.
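
As a taste of that conversion, the repeated SUBSTRING/PATINDEX/CHARINDEX pattern could be wrapped in a scalar function. This is a hypothetical sketch, not a tested implementation; it mirrors the logic above, including the + 1 that compensates for LEN ignoring the prefix’s trailing space:

CREATE FUNCTION dbo.ExtractTokenAfter
(
    @Message NVARCHAR(MAX),
    @Prefix  VARCHAR(100)
)
RETURNS VARCHAR(100)
AS
BEGIN
    -- Start just past the prefix (LEN ignores the trailing space, hence + 1)
    DECLARE @Start INT = PATINDEX('%' + @Prefix + '%', @Message) + LEN(@Prefix) + 1;
    -- Return the characters up to the next space
    RETURN SUBSTRING(@Message, @Start, CHARINDEX(' ', @Message, @Start) - @Start);
END
GO

A call such as SELECT dbo.ExtractTokenAfter(message, 'The Lookup processed ') would then return the row count token from a matching message.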

If you want to get started with SSIS with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – SSIS Look Up Component – Cache Mode – Notes from the Field #028

[Notes from Pinal]: Lots of people think that SSIS is all about arranging various operations together in one logical flow. Well, the understanding is absolutely correct, but the implementation of the same is not as easy as it seems. Similarly, most people think the lookup component is just a component which looks up additional information, and they do not pay much attention to it. For that very reason, they eventually get very bad performance.

Linchpin People are database coaches and wellness experts for a data driven world. In this 28th episode of the Notes from the Field series, database expert Tim Mitchell (partner at Linchpin People) shares a very interesting conversation about how to configure the lookup component’s cache mode.


In SQL Server Integration Services, the lookup component is one of the most frequently used tools for data validation and completion.  The lookup component is provided as a means to virtually join one set of data to another to validate and/or retrieve missing values.  Properly configured, it is reliable and reasonably fast.

Among the many settings available on the lookup component, one of the most critical is the cache mode.  This selection will determine whether and how the distinct lookup values are cached during package execution.  It is critical to know how cache modes affect the result of the lookup and the performance of the package, as choosing the wrong setting can lead to poorly performing packages, and in some cases, incorrect results.

(Image: the cache mode options in the lookup transformation editor.)

Full Cache

The full cache mode setting is the default cache mode selection in the SSIS lookup transformation.  Like the name implies, full cache mode will cause the lookup transformation to retrieve and store in SSIS cache the entire set of data from the specified lookup location.  As a result, the data flow in which the lookup transformation resides will not start processing any data buffers until all of the rows from the lookup query have been cached in SSIS.

The most commonly used cache mode is the full cache setting, and for good reason.  The full cache setting has the most practical applications, and should be considered the go-to cache setting when dealing with an untested set of data. With a moderately sized set of reference data, a lookup transformation using full cache mode usually performs well.  Full cache mode does not require multiple round trips to the database, since the entire reference result set is cached prior to data flow execution.

There are a few potential gotchas to be aware of when using full cache mode.  First, you can see some performance issues – memory pressure in particular – when using full cache mode against large sets of reference data.  If the table you use for the lookup is very large (either deep or wide, or perhaps both), there’s going to be a performance cost associated with retrieving and caching all of that data.  Also, keep in mind that when doing a lookup on character data, full cache mode will always do a case-sensitive (and in some cases, space-sensitive) string comparison even if your database is set to a case-insensitive collation.  This is because the in-memory lookup uses a .NET string comparison (which is case- and space-sensitive) as opposed to a database string comparison (which may be case sensitive, depending on collation).  There’s a relatively easy workaround in which you can use the UPPER() or LOWER() function in the pipeline data and the reference data to ensure that case differences do not impact the success of your lookup operation, as sketched below.  Again, neither of these presents a reason to avoid full cache mode, but both should be considered when deciding whether full cache mode is appropriate in a given situation.
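
For example, the case-sensitivity workaround might look like the following in the lookup’s reference query, with a matching Derived Column applying UPPER() to the pipeline column before the lookup; the table and column names are hypothetical:

-- Reference query for the lookup transformation
SELECT CustomerKey
     , UPPER(EmailAddress) AS EmailAddressUpper
FROM dbo.DimCustomer;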

Full cache mode is ideally useful when one or all of the following conditions exist:

  • The size of the reference data set is small to moderately sized
  • The size of the pipeline data set (the data you are comparing to the lookup table) is large, is unknown at design time, or is unpredictable
  • Each distinct key value(s) in the pipeline data set is expected to be found multiple times in that set of data

Partial Cache

When using the partial cache setting, lookup values will still be cached, but only as each distinct value is encountered in the data flow.  Initially, each distinct value will be retrieved individually from the specified source, and then cached.  To be clear, this is a row-by-row lookup for each distinct key value(s).

This is a less frequently used cache setting because it addresses a narrower set of scenarios.  Because each distinct key value(s) combination requires a relational round trip to the lookup source, performance can be an issue, especially with a large pipeline data set to be compared to the lookup data set.  If you have, for example, a million records from your pipeline data source, you have the potential for doing a million lookup queries against your lookup data source (depending on the number of distinct values in the key column(s)).  Therefore, one has to be keenly aware of the expected row count and value distribution of the pipeline data to safely use partial cache mode.
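
Conceptually, partial cache mode issues a parameterized query like the one below once for each distinct key value that is not already in the cache (table and column names are hypothetical), which is why indexing on the lookup source matters so much:

-- Issued per distinct, uncached key value from the pipeline
SELECT ProductKey, ProductName
FROM dbo.DimProduct
WHERE ProductKey = ?;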

Using partial cache mode is ideally suited for the conditions below:

  • The size of the data in the pipeline (more specifically, the number of distinct key column values) is relatively small
  • The size of the lookup data is too large to effectively store in cache
  • The lookup source is well indexed to allow for fast retrieval of row-by-row values

No Cache

As you might guess, selecting no cache mode will not add any values to the lookup cache in SSIS.  As a result, every single row in the pipeline data set will require a query against the lookup source.  Since no data is cached, it is possible to save a small amount of overhead in SSIS memory in cases where key values are not reused.  In the real world, I don’t see a lot of use of the no cache setting, but I can imagine some edge cases where it might be useful.

As such, it’s critical to know your data before choosing this option.  Obviously, performance will be an issue with anything other than small sets of data, as the no cache setting requires row-by-row processing of all of the data in the pipeline.

I would recommend considering the no cache mode only when all of the below conditions are true:

  • The reference data set is too large to reasonably be loaded into SSIS memory
  • The pipeline data set is small and is not expected to grow
  • There are expected to be very few or no duplicates of the key value(s) in the pipeline data set (i.e., there would be no benefit from caching these values)

Conclusion

The cache mode, an often-overlooked setting on the SSIS lookup component, represents an important design decision in your SSIS data flow.  Choosing the right lookup cache mode directly impacts the fidelity of your results and the performance of package execution.  Know how this selection impacts your ETL loads, and you’ll end up with more reliable, faster packages.

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – SSIS Execution Control Using Precedence Constraints – Notes from the Field #021

[Notes from Pinal]: Lots of people think that SSIS is all about arranging various operations together in one logical flow. Well, the understanding is absolutely correct, but the implementation of the same is not as easy as it seems. Lots of people start with lots of enthusiasm, but when it comes to building a control flow which addresses all the necessary elements of package execution, they face issues with the order of the tasks. This is a very critical subject and it requires some expert advice.

Linchpin People are database coaches and wellness experts for a data driven world. In this 21st episode of the Notes from the Field series, database expert Tim Mitchell (partner at Linchpin People) shares a very interesting conversation about how to control the flow of execution using precedence constraints in SSIS packages.


In this post, I’m going to review the essentials of precedence constraints in SQL Server Integration Services, and will demonstrate how these can be used to precisely control the flow of the program during SSIS package execution.

In a control flow for SQL Server Integration Services, the “lines” we commonly use to connect tasks to each other are actually smart controls.  These precedence constraints allow the ETL developer to customize the program flow from one task to another.  The most common use of precedence constraints is to simply connect two tasks with the default constraint, which is the success constraint.  In this usage, the downstream task will be executed if and only if the execution of the upstream task is successful.  As shown in the example below, the task DFT Write data to output table will be executed if the preceding task named SQL Truncate output table is successfully executed.  Similarly, the final task in the chain, named SQL Log row count to audit tbl, will be executed only if the preceding data flow task was successful.

(Image: three tasks connected in sequence by success precedence constraints.)

The green lines connecting these tasks are the precedence constraints.  In the above example, we know that these are success precedence constraints by their color.

Using precedence constraints for success operations is not the only option we have.  Because these are smart controls, we can configure them in a variety of ways for maximum flexibility.  An example of this would be to use a failure precedence constraint to code for the possibility of failure.  In this scenario, we could create one or more tasks that would be executed in the event of a failure within a package.  As shown below, we have a snippet of a package showing a failure path from the data flow by way of a failure precedence constraint.

(Image: a failure precedence constraint connecting the data flow task to the SQL Remove data from partial load task.)

In this example, we connect a precedence constraint from the data flow task to the task named SQL Remove data from partial load, but instead of using the default setting, we reconfigure this as a failure constraint (shown with the “Failure” label and red color) to redirect the execution flow to that task in the event that the data flow execution fails.  Downstream from there, we have yet another precedence constraint configured as a completion constraint (indicated by the label and the black color).  This constraint indicates that the task SQL Log failure event to event log will be executed whenever the upstream task is finished executing, irrespective of whether the result of that execution was success or failure.

To configure a precedence constraint to a setting other than the default value of success, simply double-click on the precedence constraint and set the appropriate setting, the options for which are shown below.

(Image: the Precedence Constraint Editor dialog.)

As shown, we can choose Success (the default setting), Failure, or Completion.

Note that there are a few other options as well.  Above the selection for the type of constraint, you’ll see that we can also choose the evaluation operation.  The default setting is Constraint, which means that the precedence constraint uses only the success/failure/completion results as described above.  However, we also have some other options.  We can use an SSIS expression along with, or perhaps instead of, the execution status evaluation to determine how to control the execution flow.

(Image: the evaluation operation options in the Precedence Constraint Editor.)

Additionally, we have the ability to control the behavior when multiple constraints are used.  By default, if multiple constraints are connected to a single downstream task, then all of those constraints must evaluate true for said downstream task to execute.  However, we can override that default behavior by setting this value for multiple constraints to Logical OR, indicating that the downstream task will be executed if any of the connected constraints evaluates true.

As shown in the example below, we have modified the original example, changing both the evaluation operation as well as the multiple constraint behavior.  On the constraint between the data flow task and the data deletion task, we add an SSIS expression to check the row count to confirm whether any rows were loaded as part of the failed execution (since we wouldn’t need to run the delete operation if no rows were loaded); such an expression might look like @[User::RowsLoaded] > 0, assuming a hypothetical User::RowsLoaded variable populated during the load.  Further, we’ve added a completion constraint between the data flow task and the task named SQL Log failure event to event log and set the multiple constraint behavior to Logical OR to make sure we log the failure whether or not the delete operation occurred.

(Image: the modified example with an expression-based constraint and dashed Logical OR constraints.)

We get visual reminders of these settings as well.  The function indicator (which looks like an italicized fx) appears on the precedence constraint on which we are using the expression.  For the Logical OR setting, the constraints will appear as dashed lines rather than solid lines.

Precedence constraints are much more than just lines drawn between two tasks.  In fact, these are powerful and flexible decision tools that allow the ETL developer a great deal of control over the flow of the SSIS execution.

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)