[Notes from Pinal]: Error confuses people and stops the operations. Developer search more on error online rather than best practices. Lots of people think that SSIS is all about arranging various operations together in one logical flow. Well, the understanding is absolutely correct, but the implementation of the same is not as easy as it seems.
Linchpin People are database coaches and wellness experts for a data driven world. In this 34th episode of the Notes from the Fields series database expert Tim Mitchell (partner at Linchpin People) shares very interesting conversation related to how to understand more about SSIS component errors.
When working in the SSIS data flow, you’ll notice that many sources and transformations and some destinations have a built-in output to handle errors. The error output allows the SSIS developer to create a separate path through which error rows can be directed. In this post, we’ll briefly discuss the essentials and design patterns for using error outputs on data sources and transformations.
Error outputs are a secondary path through which the data flow can send rows that do not conform to data type, length, or transformation standards defined by the ETL developer. That’s a lengthy way to say that it’s where you can send your junk data. In the SSIS designer, clicking on a source or transformation will often show not one but two possible outputs: the primary output (the “good” data, indicated by the blue line) and the error output (identified by the red line). As shown on the flat file source below, when selecting a source or transformation, those that have an available error output will appear with both output connectors ready for selection.
Using this design will allow you to handle errors such as:
Error outputs provide a great way to handle two things at once: providing a secondary path through which bad data can be either triaged or cleaned up, and preventing the package from failing based on just a few bad rows of data.
When connecting an error output to a downstream component, you’ll be prompted with a configuration window in which you can select the error output options. The most important thing to remember here is that default behavior is to fail the component upon any error, even when you connect the source or transformation’s error output. The default options are shown below.
As mentioned, you must explicitly set the behavior to Redirect row for the column(s) that will be handled by the error output. You can do this on a column-by-column basis, if you want to handle error conditions on that level, but in most cases I see that all columns are set to either fail or redirect as a group. In the screenshot below, I’ve set all columns to redirect upon error or truncation.
So what happens when we connect a component’s error output? In most cases, any rows in error would be redirected to that output, which we can use as a secondary path for cleanup or triage. Assuming we change the error and truncation behavior to Redirect row, errors that are captured by the error output would not result in a failure of that source.
There are several design patterns surrounding the use of error outputs on sources or transformations. Among the most common:
Cleanse inline. If possible, the data can be sanitized inline using other SSIS components. If the ETL can be built to handle most of the known data deficiencies, this is usually the cleanest way to handle error rows. Very often, data quality tools such as SQL Server Data Quality Services can be useful for inline cleansing.
Send to triage. If the package does not have cleansing logic, or there are rows that still cannot be cleansed using that logic, the bad data can be sent to triage for manual review or cleansing. Though this pattern does require manual intervention, it also allows us to audit and review the reasons for those failures by preserving the data.
Ignore. It is possible to redirect error rows into the bit bucket, resulting in a loss of data. In rare cases, this is acceptable; however, in most cases, there should be some measure of data capture to improve the quality of the ETL process.
Which is the best pattern to use? As always, it depends. Factors such as the criticality of the data, the types of errors expected, the frequency of the ETL process, and many others must be used to decide how, if at all, to use the error paths in SSIS.
As with any ETL function, there are a few caveats and cautions around using error outputs. First of all, I don’t recommend adding an error output path to every component that supports it simply because it’s available. Redirecting error rows is not always appropriate – sometimes it makes sense to fail the package rather than trying to programmatically clean up or triage error data.
Additionally, you’ll need to familiarize yourself with the different types of error outputs generated by each component. For example, if you use the error output of the flat file source, it will not break apart the data into individual columns (even for those elements not in error for that row) – because the error output is a relatively low-level function, what you get is a single string with all of the data in it, and if you have processes that consume this data downstream of the error output, you’ll need to do some transformation on the errored rows before you can do any cleanup.
Finally, it is important to note that not all components support error outputs. Because of the differences in the types of operations performed by each source, destination, or transformation component, some of them do not include an error output path.
In this post, we’ve briefly covered the error output facilities for the SSIS data flow, along with a few design patterns for implementing these in a package. In a future post, we’ll talk more about implementing an end-to-end solution for handling error rows in an SSIS data flow.
If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.
Reference: Pinal Dave (http://blog.sqlauthority.com)