SQL SERVER – Rewriting Database History – Notes from the Field #094

[Note from Pinal]: This is the 94th episode of the Notes from the Field series. When I read the title of this article – Rewriting Database History – I was extremely intrigued. When I was a kid, history was my favorite subject, and even today, whenever I have to deal with history, I jump in first to read and enjoy it. When I saw this article from Kevin, I was delighted. I started reading it immediately, and I did not stop until I had finished. It was 20 minutes well spent. The journey starts with ORMs and ends at ORMs, but by the time it comes full circle there are so many things in between. Trust me, Kevin could be a great historical novel writer.

In this episode of the Notes from the Field series database expert Kevin Hazzard explains the interesting subject of rewriting database history. Read the experience of Kevin in his own words.

Way back in June of 2006, my friend Ted Neward wrote a very detailed piece called The Vietnam of Computer Science. Juxtaposing America’s involvement in Southeast Asia in the last century with the battles we fight in modern, business-driven enterprises, that article examined the unwinnable war that Object-Relational Mapping (ORM) tools have become. The title of the article is probably a bit politically incorrect these days given that Vietnam’s world standing and economy have improved vastly over the last decade. Moreover, lots of database and application developers working today don’t even remember the war in Vietnam so the allegory may fail for them. But for those of us who grew up in the 1960s and 1970s, the writing evokes plenty of painful memories that paint ORMs as technological quagmires.

ORM tools have evolved quite a bit since 2006 but they’re still generally pretty awful. The metadata-driven code generators buried within tools like Hibernate and Microsoft’s Entity Framework do a much better job of writing decently-performing queries than they did in the early days. But the cruft, wacky idioms and generally nonsensical constraints one must suffer to use these tools are overwhelming for all but a handful of gladiators who gain some odd sense of purpose in buttressing their companies for long-running conflicts. Nowhere is this more apparent than in the Ruby on Rails space where some sort of addiction to total database ignorance seems to have infected the vast majority of that community. If you want to understand why Ruby on Rails is in decline in recent years, look no further than the chatty, miserably-performing queries that the Active Record ORM generates when accessing complex, relational data structures. Any DBA or database developer who has been enlisted to debug performance problems in those sorts of melees knows there’s a better way to win hearts and minds.

From most developers who have shunned traditional ORMs, I often hear about how unwieldy, brittle or inflexible they found the tools to be. The relational abstractions that they provide are sometimes too weak, making common business operations difficult to express in languages like C# and Java. To solve that problem, the abstraction may become leaky, exposing details that create dependencies which complicate deployments or lock you into vendor or version-specific features.

For database people, ORM aversion is typically related to the naiveté of machine-generated queries. Chattiness is a big problem with full-service ORMs, for example. Mature ORMs do a better job today emitting correlated sub-queries or common table expressions than they did in the past. But that often depends on good mappings which depend in turn on good metadata. Unfortunately, perfect database metadata is quite rare. Anyone who watches the tracing of a database server suffering under the garrulous assault of an ORM-driven application understands just how little the ORM really knows about the database.

If the protestors can convince management that the ORM war is unwinnable, the response can move in several potential directions. One might rationalize that the relational representation of data is the real foe, i.e. that because data is rarely used in third normal form within applications, it shouldn’t be stored that way on disks. The pundits of the NoSQL movement say that relational storage models are a holdover from an age when storage and bandwidth were much more expensive than they are today. That is a massive oversimplification for sure, but there are some cases where a document-oriented database makes a lot of sense.

For example, I worked in a shop long ago that used a UniVerse database to manage large caches of medical claims data. As a multi-valued database much like today’s NoSQL databases, UniVerse stores data with all the data that it references in a single document. So a patient’s demographic information can be stored in line with the related medical claims, diagnostic history and lab results. For that system, there was only one index on the database which was used to find individual, semi-structured blobs of patient records. The rules engine which sat atop the database always needed all of a single patient’s information whenever it ran a rule so storing the entire document describing a patient as a single database entity was both practical and highly efficient.

In those days, UniVerse was an IBM product that competed against their own DB2 relational database. For some reason, IBM was keen on getting our company to move from UniVerse to DB2 so we invited them in for a proof of concept to build a relational model of our medical database in DB2. When they were done, they loaded several terabytes of data into the new relational database and aimed the rules engine there. After weeks of tweaking, rewriting and hand-wringing, they never could make the relational database outperform UniVerse. Moreover, it wasn’t even a close race. The UniVerse database running on much older hardware could fetch whole patient documents in a few milliseconds. The best we could get DB2 to do using its normalized implementation was measured in hundreds of milliseconds. The difference between five milliseconds and half a second may not sound like a major problem, but when you’re handling millions of queries per day and trying to satisfy a three second, end-to-end Service Level Agreement with your Point of Sale providers, those delays add up very quickly.

Of course, document-oriented databases like Mongo and multi-valued databases like UniVerse are not always viable solutions to bridging the gulf between on-disk and in-memory data representations. For 20 years, I’ve made the claim that databases are an unfortunate consequence of history. Bear with me because I don’t mean to disparage database technologies at all. Quite the opposite, I love and respect modern database systems of many types. However, it’s very true that if you could travel back in time to 1943 and give the developers of the ENIAC computer at the University of Pennsylvania a large scale memory device to attach to their invention, databases as we know them would never have evolved. Instead, all of the rich query tools and consistency features of modern transactional databases would have been developed inside of our applications instead.

In that sense, document-oriented databases and ORMs are both attempts to undo history. The former solution means to eliminate the mismatch between applications and databases altogether by making the data storage model match the common data utilization pattern. The problem is that by coercing the database to adapt to the application’s model, many of the features that evolved in transactional systems simply aren’t available. So-called eventual consistency scares developers who have come to depend on databases that offer transactions which are Atomic, Consistent, Isolated and Durable (ACID). This is most unfortunate because had the ENIAC developers had unlimited in-memory storage available to them, I believe that in-memory data handling semantics would have evolved to be ACID-compliant just as they are in modern relational databases. Document-oriented databases redo history in a way that doesn’t honor so many of the good choices that were made in real history.

Rather than attempting to undo history per se, ORMs try to hide history from us by making it seem that the underlying database’s storage and query semantics exist inside the application’s space. While this is a better way to undo history because it preserves the best parts of it, this is perhaps the hardest problem in all of computer science today. The reasons that it is so difficult are myriad. Incomplete metadata is often the culprit. The cardinality between entities, for example, is nearly impossible to get right simply by inspecting a database’s available foreign keys. Moreover, the relationships between entities in separate databases or in separate but related data integration processes often have no metadata to tie them together. The Master Data Management (MDM) structures and processes used to manage code assignments can also have a huge impact on an ORM’s query generation. Languages like C# and Java lack even the most basic Design by Contract (DbC) features so it’s no wonder that their handling of database constraints is also quite weak. For example, imagine a simple database constraint that limits an integer value to multiples of ten. Now imagine an ORM trying to convey and enforce that constraint in a language like C#. You get the picture. This is a very difficult, effectively impossible, problem to solve.
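
To make that example concrete, here is a minimal T-SQL sketch of such a constraint (the table and names are hypothetical, purely for illustration); expressing and enforcing the same rule inside an ORM's generated entity classes is exactly where the mismatch shows up:

-- Hypothetical table: BatchSize must be a multiple of ten.
CREATE TABLE dbo.Widget
(
    WidgetID  int NOT NULL PRIMARY KEY,
    BatchSize int NOT NULL
        CONSTRAINT CK_Widget_BatchSize CHECK (BatchSize % 10 = 0)
);

INSERT INTO dbo.Widget (WidgetID, BatchSize) VALUES (1, 30); -- succeeds
INSERT INTO dbo.Widget (WidgetID, BatchSize) VALUES (2, 35); -- violates the CHECK

An ORM typically discovers the second failure only as a runtime exception, because C# has no declarative way to say "multiples of ten" on an int property.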

So what’s the solution? Should ORMs be used at all? Should we abandon relational databases and move entirely over to document-oriented databases? Yes and no, not necessarily in that order. ORMs have taught us a lot over the past 20 years. The number one thing I’ve learned from them is to be pragmatic about my approach to data. I use stored procedures whenever there’s a way to reduce the chattiness and excessive round trip calls that ORMs often induce. That keeps my DBAs happy. For simple table accesses that are indexed well for my common query predicates, I’ll use an ORM just to save me some time.
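
As a sketch of that pragmatic approach (the table and procedure names below are hypothetical), a single stored procedure can return a parent row and its children as two result sets in one round trip, where a lazy-loading ORM might issue one query per child collection:

-- Hypothetical wrapper: one round trip instead of many lazy-loaded queries.
CREATE PROCEDURE dbo.GetCustomerWithOrders
    @CustomerID int
AS
BEGIN
    SET NOCOUNT ON;

    -- Result set 1: the customer row
    SELECT CustomerID, FirstName, LastName
    FROM dbo.Customer
    WHERE CustomerID = @CustomerID;

    -- Result set 2: all of that customer's orders
    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.CustomerOrder
    WHERE CustomerID = @CustomerID;
END;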

Nowadays, I rely on so-called micro-ORMs and auto-mapper tools more than traditional ORMs. Micro-ORMs are an admission that full-service ORMs are often too complex or that they unnecessarily constrain my use of data. Using less ambitious micro-ORMs like Dapper gives me the benefits I desire without all the weight and ceremony of a full-featured ORM. Lastly, when document-oriented storage seems warranted, I evaluate the criteria objectively and openly admit when NoSQL is the correct choice. However, this is less common than avid NoSQL proponents would have you believe. Relational databases are still the best choices for storage and data handling in the vast majority of cases, mostly because of their maturity, rich feature sets, high reliability and great support.

Nine years later and my friend Ted is still mostly correct. ORMs are an unwinnable war. But they’ve taught us a lot and there are other options for solving what is perhaps the hardest problem out there today: undoing history and redoing it as we think it might have been done under different circumstances. No matter which way the market goes, the technical equivalent of sober diplomacy should triumph over conflict.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Performance Monitoring for Analysis Services – Notes from the Field #093

[Note from Pinal]: This is a new episode of the Notes from the Field series. When we build any application, we build it with zeal and enthusiasm. We believe it will do well for many years, but that is not always the case. After a while, performance starts to go down and everything is no longer in the best shape. That is when we need to monitor performance and take action based on our analysis.

In this episode of the Notes from the Field series I asked SQL Expert Bill Anton a very crucial question – how do you monitor the performance of Analysis Services? Bill was very kind to answer the question and provided plenty of information about how a novice developer can work with SSAS.

When it comes to ad-hoc query performance in business intelligence solutions, very few technologies rival a well-designed Analysis Services Multidimensional cube or Tabular model. And when that cube/tabular model is performing well, as it usually does in the beginning, life is good!

Photo Credit: SuperCar-RoadTrip.fr

Over time, however, things can change. The underlying business evolves (hopefully) and information workers start asking different questions, resulting in new query patterns. The business grows and now more users are asking more questions of larger data volumes. Performance starts to deteriorate. Queries are taking longer to complete and nightly processing starts to run past the end of the maintenance window into business hours. Users are complaining, management is unhappy, and life is no longer good.

Photo Credit: Charlie

How did we not see this coming?

If this describes your personal situation, don’t worry, you’re not alone. In fact the majority of clients who bring me in to troubleshoot and resolve their Analysis Services performance problems ask the same question (or some variation of it).

In my experience, 80% of the time the reason no one sees this type of issue coming is that there wasn’t a performance monitoring solution in place. The other 20%, who have a performance monitoring solution, simply aren’t using it or reviewing the information being collected.

I don’t know why so many Analysis Services environments are neglected (I’m not a Business or IT therapist) but I’m going to tell you what steps need to be taken if you want to avoid it.

The Secret is Simple

The first step to maintaining a well running Analysis Services instance is simple. Take measurements!

Photo Credit: Official U.S. Navy Page

Taking regular measurements is the only way to know how things are performing over time. This seems ridiculously obvious, but so few companies actually do it. My hunch is that these folks just don’t know what needs to be measured.

Analysis Services can be broken down into two fundamental types of workloads:

Processing Workloads

Processing is the term used to describe how data gets loaded into the Analysis Services database. This workload is usually done at night, outside of business hours, so as not to coincide with user activity.

From a performance perspective, the primary things to keep an eye on are:

Processing Duration
This is the amount of time it takes to load data into the Analysis Services database. As long as the processing duration fits within the scheduled maintenance window, there’s not much to worry about. However, if this duration is increasing over time and you think it will eventually run past that window, then you’ll need to review the rest of the information to figure out “why” and “what to do”.

How long does it take to process the Analysis Services database?
Is the processing duration increasing over time?
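
One low-tech way to capture these measurements (the table and job step below are assumptions for the sketch, not a built-in feature) is to have the nightly processing job write start and end times per stage to a log table, then trend the durations – this also sets you up for the stage-level Pro Tip later in this section:

-- Hypothetical log table, populated by the nightly processing job.
CREATE TABLE dbo.SsasProcessingLog
(
    LogID     int IDENTITY(1,1) PRIMARY KEY,
    StageName varchar(128) NOT NULL,  -- e.g. 'Dimensions', 'FactSales partition'
    StartTime datetime2    NOT NULL,
    EndTime   datetime2    NOT NULL
);

-- Is the duration of each stage trending up week over week?
SELECT StageName,
       DATEPART(YEAR, StartTime)     AS LogYear,
       DATEPART(ISO_WEEK, StartTime) AS LogWeek,
       AVG(DATEDIFF(SECOND, StartTime, EndTime)) AS AvgDurationSec
FROM dbo.SsasProcessingLog
GROUP BY StageName, DATEPART(YEAR, StartTime), DATEPART(ISO_WEEK, StartTime)
ORDER BY StageName, LogYear, LogWeek;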

Resource Consumption
Keeping an eye on resource consumption (e.g. CPU, memory, disk, network) during processing is also a good idea. This kind of information can help shed some light on bottlenecks and provide guidance when coming up with a solution to processing related problems.

Is the processing workload requiring more and more memory? Are we maxing out CPU? How’s disk space?

There are many solutions to problems faced during processing, but without some insight into what’s happening on the system, it’s hard to know which solution is the optimal one.

For example, say we have a Multidimensional cube and notice that the processing duration for one of the measure groups is steadily increasing over time. We review the resource consumption and see that there’s plenty of CPU/Memory/IO to spare. In this case, we may consider partitioning this particular fact table and only processing the most recent partition or processing the partitions in parallel.

Pro Tip: Breaking up processing into stages will provide context to the information above and make it much easier to see which part(s) of the Analysis Services database are contributing to the increase in processing duration or memory consumption.

Query Workloads

This refers to any activity that generates queries against the SSAS database. It could be users running reports, loading dashboards, analyzing data via pivot tables, etc. Because users typically don’t run the same queries at the same time every day, this workload can be much more complicated to monitor.

The key to success is to start with the high level stuff and only go into the details if/when necessary.

The single most important thing to track for this type of workload is a log of the actual queries being executed against the Analysis Services database.

Not only will you be able to see which queries are slow, but you’ll also have the actual MDX/DAX executed. You won’t have to wait for the user to complain (telling you their report is taking too long to load) because you’ll already have the query and can start reviewing it immediately.

Some Analysis Services implementations actually have service level agreements (SLA) with criteria such as “no query should take more than 30 seconds to complete” and “the average query response time should be less than 5 seconds”. If you’re tracking every query against the Analysis Services database, not only will you know if the SLA has been violated, but you’ll know which query or queries it was that led up to the violation and can start troubleshooting immediately.
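
For illustration, assuming you land the captured queries into a table like the hypothetical one below (the table and column names are made up for this sketch), checking those SLA criteria becomes a couple of simple queries:

-- Hypothetical table holding one row per query captured from the trace:
-- dbo.SsasQueryLog (QueryText, UserName, StartTime, DurationMs)

-- Queries that violated the "no query over 30 seconds" SLA in the last day
SELECT UserName, StartTime, DurationMs, QueryText
FROM dbo.SsasQueryLog
WHERE StartTime >= DATEADD(DAY, -1, GETDATE())
  AND DurationMs > 30000
ORDER BY DurationMs DESC;

-- Average response time, for the "average under 5 seconds" criterion
SELECT AVG(DurationMs) / 1000.0 AS AvgDurationSec
FROM dbo.SsasQueryLog
WHERE StartTime >= DATEADD(DAY, -1, GETDATE());

The same table answers the "top 10 slowest queries" and user-count questions listed below.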

Tracking the number of folks using your system (and when they are using it) will prove very helpful for knowing if/when to start considering options for scaling the system up and/or out.

This information can usually be extracted from whatever mechanism you use to track queries being executed against the Analysis Services database, but it is important enough to deserve its own section.

Resource Consumption
In the same vein as the above discussion around tracking resource consumption of the processing workload, you’ll also want to track the same measures (e.g. CPU, memory, disk, network) throughout the day. This information may provide some clues as to why a query is slow.

For example, say you’re reviewing the top 10 slowest queries from the previous week and find several of the queries are now running very fast. At this point you can switch over and start looking at the state of the system last week at the time the query was slow (from a resource consumption perspective) and perhaps find some clues, such as memory pressure or CPU bottleneck caused by a spike in activity.

Here are some examples of the types of questions you should be able to answer with the above information:

What are the top 10 slowest queries each week?

Who are your top users and what is the average number of queries they execute per day/week/month?

What is the average number of users per day/week/month?

What is the max number/average number of concurrent users per day/week/month?

Pro Tip: some folks incorrectly assume the OLAPQueryLog tracks queries. However, this table only tracks parts of queries (storage engine requests). A single query executed against an Analysis Services database could potentially generate tens of records in this table. The implication is that it doesn’t give you the whole story, and you won’t always be able to determine which queries are slow.
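
If you want to see this for yourself, a query like the following against the query log table (assuming the default dbo.OlapQueryLog name and its standard columns; adjust for your configuration) shows how many storage engine requests it records rather than whole queries:

-- Each row is a storage engine request, not a user query; one MDX query
-- can generate many of these rows.
SELECT MSOLAP_User,
       COUNT(*)      AS StorageEngineRequests,
       SUM(Duration) AS TotalDurationMs
FROM dbo.OlapQueryLog
WHERE StartTime >= DATEADD(DAY, -1, GETDATE())
GROUP BY MSOLAP_User
ORDER BY StorageEngineRequests DESC;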

Next Steps

Now that you know what types of information you should be collecting and reviewing regularly, the next step is to figure out how you’re going to collect that information. The good news is that there are quite a few options available. The bad news is that you’ll have to wait until next time to find out. Here is the SQLPASS session which discusses the same concept: Analysis Services: Show Me Where It Hurts.

If you want to get started on performance monitoring with SSAS with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – A Stored Procedure for Executing SSIS Packages in the SSIS Catalog – Notes from the Field #092

[Note from Pinal]: This is a new episode of the Notes from the Field series. SQL Server Integration Services (SSIS) is one of the most essential parts of the entire Business Intelligence (BI) story. It is a platform for data integration and workflow applications.

In this episode of the Notes from the Field series I asked SSIS Expert Andy Leonard a very crucial question – how do you create a stored procedure for executing an SSIS package in the SSIS Catalog? Andy was very kind to answer the question and provided plenty of information about how a novice developer can learn SSIS from the beginning and become an expert in the technology.

The following is a snippet from Chapter 2 of the book SSIS Design Patterns co-written by Matt Masson (Blog | @mattmasson), Tim Mitchell (Blog | @Tim_Mitchell), Jessica Moss (Blog | @jessicammoss), and Michelle Ufford (Blog | @sqlfool). Earlier in the chapter there are demos that describe a simple SSIS package named Chapter2.dtsx which is part of an SSIS project named Chapter2, and which is deployed to an instance of the SSIS Catalog in a Folder named “Chapter 2”. But you can use this stored procedure to execute any SSIS package in the SSIS Catalog. That’s the whole point!

SQL Server 2014 provides a new way to manage and execute Integration Services packages: Integration Server Catalogs. We explore this method next.

Integration Server Catalogs

You can only manage SSIS projects that use the Project Deployment Model in Integration Services Catalogs. To execute a package in the catalog, use SSMS to connect to the instance of SQL Server hosting the SSISDB database. Expand the Integration Services Catalogs node, and then expand the SSISDB node. Drill into the folder containing the SSIS project and package(s). Right-click the package you wish to execute and click Execute, as shown in Figure 2-1.

Figure 2-1. Executing an SSIS Package deployed to the SSIS Catalog

The Execute Package Window displays, as shown in Figure 2-2. It allows you to override Parameter values, ConnectionString properties of Connection Managers built at design-time, or any other externalize-able property accessible from a Package Path (via the Advanced tab) for this execution instance of the SSIS package stored in the SSIS Catalog.

Figure 2-2. Execute Package Window

Integration Server Catalog Stored Procedures

Please note the Script button above the Parameters tab in Figure 2-2. This button allows you to generate Transact-SQL statements that will execute the SSIS package. For the Chapter2.dtsx package stored in the SSIS Catalog, the scripts will appear similar to that in Listing 2-1.

Listing 2-1. Transact-SQL Script Generated From the Execute Package Window

Declare @execution_id bigint
EXEC [SSISDB].[catalog].[create_execution]
  @package_name=N'Chapter2.dtsx'
 ,@execution_id=@execution_id OUTPUT
 ,@folder_name=N'Chapter 2'
 ,@project_name=N'Chapter2'
 ,@use32bitruntime=False
 ,@reference_id=Null
Select @execution_id
DECLARE @var0 smallint = 1
EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id
 ,@object_type=50
 ,@parameter_name=N'LOGGING_LEVEL'
 ,@parameter_value=@var0
EXEC [SSISDB].[catalog].[start_execution] @execution_id
GO

You can use these same stored procedures to execute SSIS Packages in the SSIS Catalog! In fact, I designed a script to create a wrapper stored procedure that will call the Transact-SQL statements executed when an SSIS Package is executed in the SSIS Catalog. You can see that script in Listing 2-2.

Listing 2-2. Script to Build a Wrapper Stored Procedure for Executing SSIS Packages in the SSIS Catalog

 /* Select the SSISDB database */
Use SSISDB
Go

 /* Create a parameter (variable) named @Sql */
Declare @Sql varchar(2000)

 /* Create the Custom schema if it does not already exist */
print 'Custom Schema'
If Not Exists(Select name
              From sys.schemas
              Where name = 'custom')
 begin
   /* Create Schema statements must occur first in a batch */
  print ' - Creating custom schema'
  Set @Sql = 'Create Schema custom'
  Exec(@Sql)
  print ' - Custom schema created'
 end
Else
 print ' - Custom Schema already exists.'
print ''

 /* Drop the Custom.execute_catalog_package Stored Procedure if it already exists */
print 'Custom.execute_catalog_package Stored Procedure'
If Exists(Select s.name + '.' + p.name
          From sys.procedures p
          Join sys.schemas s
            On s.schema_id = p.schema_id
          Where s.name = 'custom'
            And p.name = 'execute_catalog_package')
 begin
  print ' - Dropping custom.execute_catalog_package'
  Drop Procedure custom.execute_catalog_package
  print ' - Custom.execute_catalog_package dropped'
 end

 /* Create the Custom.execute_catalog_package Stored Procedure */
print ' - Creating custom.execute_catalog_package'
Go

/*
   Stored Procedure: custom.execute_catalog_package
   Author: Andy Leonard
   Date: 4 Mar 2012
   Description: Creates a wrapper around the SSISDB Catalog procedures
                used to start executing an SSIS Package. Packages in the
                SSIS Catalog are referenced by a multi-part identifier
                - or path - that consists of the following hierarchy:
      Catalog Name: Implied by the database name in Integration Server 2014
      |-Folder Name: A folder created before or at Deployment to contain the SSIS project
      |-Project Name: The name of the SSIS Project deployed
      |-Package Name: The name(s) of the SSIS Package(s) deployed

   Parameters:
      @FolderName [nvarchar(128)] {No default} -
       contains the name of the Folder that holds the SSIS Project
      @ProjectName [nvarchar(128)] {No default} -
       contains the name of the SSIS Project that holds the SSIS Package
      @PackageName [nvarchar(260)] {No default} -
       contains the name of the SSIS Package to be executed
      @ExecutionID [bigint] {Output} -
       Output parameter (variable) passed back to the caller
      @LoggingLevel [varchar(16)] {Default} -
       contains the (case-insensitive) name of the logging level
       to apply to this execution instance
      @Use32BitRunTime [bit] {Default} -
       1 == Use 32-bit run-time
       0 == Use 64-bit run-time
      @ReferenceID [bigint] {Default} -
       contains a reference to an Execution Environment
      @ObjectType [smallint] -
       contains an identifier that appears to be related to the
       SSIS PackageType property
       Guessing: @ObjectType == PackageType.ordinal (1-based-array) * 10
       Must be 20, 30, or 50 for catalog.set_execution_parameter_value
       stored procedure

   Test:
      1. Create and deploy an SSIS Package to the SSIS Catalog.
      2. Exec custom.execute_catalog_package and pass it the
         following parameters: @FolderName, @ProjectName, @PackageName,
         @ExecutionID Output. @LoggingLevel, @Use32BitRunTime,
         @ReferenceID, and @ObjectType are optional and defaulted
         parameters.

           Declare @ExecId bigint
           Exec custom.execute_catalog_package
            'Chapter2','Chapter2','Chapter2.dtsx'
           ,@ExecId Output

      3. When execution completes, an Execution_Id value should be returned.
         View the SSIS Catalog Reports to determine the status of the
         execution instance and the test.
*/
Create Procedure custom.execute_catalog_package
  @FolderName nvarchar(128)
 ,@ProjectName nvarchar(128)
 ,@PackageName nvarchar(260)
 ,@ExecutionID bigint Output
 ,@LoggingLevel varchar(16) = 'Basic'
 ,@Use32BitRunTime bit = 0
 ,@ReferenceID bigint = NULL
 ,@ObjectType smallint = 50
As
begin
  Set NoCount ON
   /* Call the catalog.create_execution stored procedure
      to initialize execution location and parameters */
  Exec catalog.create_execution
   @package_name = @PackageName
  ,@execution_id = @ExecutionID Output
  ,@folder_name = @FolderName
  ,@project_name = @ProjectName
  ,@use32bitruntime = @Use32BitRunTime
  ,@reference_id = @ReferenceID

   /* Populate the @ExecutionID parameter for OUTPUT */
  Select @ExecutionID As Execution_Id

   /* Create a parameter (variable) named @logging_level */
  Declare @logging_level smallint
   /* Decode the Logging Level */
  Select @logging_level = Case
                           When Upper(@LoggingLevel) = 'BASIC'
                            Then 1
                           When Upper(@LoggingLevel) = 'PERFORMANCE'
                            Then 2
                           When Upper(@LoggingLevel) = 'VERBOSE'
                            Then 3
                           Else 0 /* 'None' */
                          End
   /* Call the catalog.set_execution_parameter_value stored
      procedure to update the LOGGING_LEVEL parameter */
  Exec catalog.set_execution_parameter_value
    @ExecutionID
   ,@object_type = @ObjectType
   ,@parameter_name = N'LOGGING_LEVEL'
   ,@parameter_value = @logging_level

   /* Call the catalog.start_execution (self-explanatory) */
  Exec catalog.start_execution @ExecutionID
end


If you execute this script to create the custom schema and stored procedure in your instance of the SSISDB database, you can test it using the statement in Listing 2-3.

Listing 2-3. Testing the SSISDB.custom.execute_catalog_package Stored Procedure

Declare @ExecId bigint
Exec SSISDB.custom.execute_catalog_package 'Chapter2','Chapter2','Chapter2.dtsx',
@ExecId Output


This custom.execute_catalog_package stored procedure can be used to execute an SSIS package from any folder and project in the SSIS Catalog.

This brief introduction and basic example have demonstrated how to get started executing SSIS packages in the SSIS Catalog from Transact-SQL. As you can see, the SSIS Catalog stored procedures give you automation capabilities you may not have been aware of.

If you want to get started with SSIS with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – DevOps for the DBA – Notes from the Field #091

[Note from Pinal]: This is the 91st episode of the Notes from the Field series. DevOps is a new buzz word with lots of substance behind it. Not everyone understands what it means, and not everyone actually relates it to their daily job. Lots of people think that it is alright not to explore and learn about it, as after a few days or months it will be meaningless. However, the reality seems very different with DevOps, as it is one of those concepts which has just been rediscovered. I recently asked my friend Stuart what DevOps means for a DBA, and he had an informative answer to this question.

In this episode of the Notes from the Field series database expert Stuart Ainsworth explains about DevOps for the DBA. Let us read the story in his own words.

Not every problem that a DBA encounters is a technical one; sometimes, it’s a procedural one.  I’ve recently become enamored with the philosophy of DevOps, and have tried to find ways to transform those concepts and ideas into practical, actionable advice for my clients.  It’s still a work in progress, but I’ll try to summarize my thoughts today (and, they could change tomorrow).

A Brief Summary of DevOps

DevOps is a collaborative approach to software development and delivery that is influenced by Agile development methods and Lean IT principles; the idea is to decrease time-to-market by looking at (primarily enterprise) software development as a life cycle, and finding ways to automate and increase efficiency.  The general framework is often described as consisting of the Three Ways (a phrase fully explored in The Phoenix Project):

  1. The First Way: Systems Thinking. A complete software development lifecycle encompasses all the functionality involved in both development and operations (including QA and acceptance testing); the entire system is intertwined in order for the business to bring value to customers.  Development responsibility isn’t complete when software passes QA, and Operational responsibilities don’t simply begin during the install.  Both groups should be involved to some degree throughout the process.
  2. The Second Way: Amplify Feedback Loops. No system is complete without feedback, and DevOps approaches focus on increasing the immediacy of that feedback.  Bugs need to be simple to report and acknowledge; suggestions should be taken seriously.
  3. The Third Way: Culture of Continual Experimentation and Learning. Success is found when you take a risk, and DevOps is about creating an environment where risk is encouraged and rewarded.  This is often problematic for Operations, who are usually responsible for maintaining the stability of an environment, but that’s a key to this principle: Ops MUST be involved in finding a way for Dev to take risks while continuing to ensure the stability of the environment.

From Philosophy to Practice

All that’s well and good, but what does it mean for the average DBA? How do we go from thinking about processes, procedures, and philosophy to real, meaningful change?  Here are three specific practices that operational DBAs should implement as soon as possible:

  1. Respect the role of development. Consider the following scenario: some performance issue has occurred in operations, and the DBA has stepped in to make a fix (say an index change, or something more drastic like a stored proc rewrite).  The implemented fix is now different than the source of truth for development (their source control), and that’s bad.  It hampers the development effort, but sometimes emergency interventions are necessary to keep the business afloat; what do you do? First and foremost, DBAs need to limit the changes that they make to strictly emergency fixes; emergency fixes need to be reported to development as soon as possible.  Not all performance changes need to be emergency fixes; those should be reported to dev, but unless the business is grinding to a halt, the DBAs should not change the code in operations.  This practice illustrates both the First and the Second Way above.
  2. Request simple methods to supply feedback. If developers are responsible for writing code, they have to be receptive to feedback from operations.  I’ve worked in shops where only the developers and QA had access to create bugs; this kills the feedback loop.  Operational DBAs must find ways to report issues to development as soon as possible; any electronic tracking measure should make it simple to identify who reported an issue, when they reported it, and what was decided to do about it.  If the feedback is associated with an emergency change, the DBA should include the code that they changed, so that source control can be edited to include the current production version.  The key takeaway is that EVERY change to an operational system must be reported to the developers.
  3. Strive to provide operational support to development efforts. The best development environment is one that is identical to operations; that’s not always feasible or practical to implement.   DBAs need to be part of the development process by finding ways to make the dev environment as similar to production as possible; virtualization technology can be a great tool for that, whether it’s virtualizing servers (e.g., VMware, Hyper-V, VirtualBox) or databases (e.g., Delphix).  Short of that, you can just make sure that your developers have a good representation of the current production statistics.


Software development is hard; systems administration is hard.  Working together can make it easier, but you have to focus on practical changes to your software culture to make things happen.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – What are T-SQL Median? – Notes from the Field #090

[Note from Pinal]: In this episode of the Notes from the Field series database expert Kathi Kellenberger explains how to calculate a median in T-SQL. Kathi is an amazing instructor; she was the SQL author I read early in my career. One reason I love SQL Server is that her writing instilled a love for this technology in me. Today she brings a relatively interesting topic for database experts. Read the experience of Kathi in her own words.

SQL Server has a wealth of useful T-SQL functions that let you do just about anything you need to do. One function that is sorely missing is a MEDIAN function. A median is the middle value in a set of values. For example, 3 is the median of the set 1, 3, 5. If the set is even, the median is calculated or “interpolated” by averaging the two middle values. The median of this set 1, 3, 5, 7 is 4.

You will see median values used frequently even if you do not work with statistics yourself. For example, you could see a news story listing the median household income of a city by neighborhood, the median salary for developers by language, or the median age of a population.

As I mentioned, there is no MEDIAN function in T-SQL. If, however, you are fortunate enough to be running SQL Server 2012 or 2014, there is an easy workaround. In 2012, Microsoft introduced a new function called PERCENTILE_CONT, one of the T-SQL window functions. Given a percent rank and a set of values, PERCENTILE_CONT will return the value ranked at that percent. If there is not an exact value found at the rank, PERCENTILE_CONT will interpolate the answer instead. If you supply 0.5, meaning 50%, PERCENTILE_CONT will return the median value. Here is an example that calculates the median sales by month for each year:

WITH Totals AS (
SELECT YEAR(OrderDate) AS OrderYear,
MONTH(OrderDate) AS OrderMonth,
SUM(TotalDue) AS TotalSales
FROM Sales.SalesOrderHeader
GROUP BY YEAR(OrderDate), MONTH(OrderDate))
SELECT OrderYear, OrderMonth, TotalSales,
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY TotalSales)
OVER(PARTITION BY Totals.OrderYear) AS MedianSales
FROM Totals
ORDER BY OrderYear, TotalSales;

Before we take a look at the results, let’s review the query. The query has a common table expression (CTE) that calculates the sales for each month and year. In the outer query, the PERCENTILE_CONT function is used with an argument of 0.5. PERCENTILE_CONT is a T-SQL window function, but the syntax for it and a similar function, PERCENTILE_DISC, is different than the other window functions. The function name is followed by the WITHIN GROUP clause. This is where the ORDER BY expression goes, not in the OVER clause. The ORDER BY expression is used by the function to line up the values so that the ranks can be compared to the percentile rank value provided as an argument. The OVER clause supports PARTITION BY, and in this case we would like to get the median within each year, so the PARTITION BY expression is OrderYear.

In 2005, there were orders placed between July and December, six months. Six is even, so the function calculates the average of two middle months as shown in Figure 1.

Figure 1: The Median Sales for 2005

Every year in the data has an even number of months except for 2008. In 2008, there are seven months with sales, so the function can find an exact median value. Figure 2 shows the 2008 sales:

Figure 2: The Median Sales for 2008

You may be wondering what the difference is between PERCENTILE_CONT and PERCENTILE_DISC. PERCENTILE_DISC always returns an exact value found in the set, never a calculation. For calculating the median, PERCENTILE_CONT is the function to use.
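
To see the difference for yourself, run both functions over the even-sized set 1, 3, 5, 7 from the earlier example; PERCENTILE_CONT interpolates 4 while PERCENTILE_DISC returns the actual value 3:

WITH v(n) AS (SELECT n FROM (VALUES (1),(3),(5),(7)) AS t(n))
SELECT DISTINCT
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY n) OVER () AS MedianCont, -- 4.0 (interpolated)
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY n) OVER () AS MedianDisc  -- 3 (an actual value)
FROM v;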

If you would like to learn more about T-SQL window functions, be sure to check out my latest book Expert T-SQL Window Functions in SQL Server or my Pluralsight course T-SQL Window Functions.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Enhancing Reference Data in Master Data Services – Notes from the Field #089

[Note from Pinal]: This is the 89th episode of the Notes from the Field series. Master Data Services is one of the most important, but very little explored, features of SQL Server. If you have been reading this blog, you know that when it is about BI, I always take help from Linchpin People, who are BI experts. I requested Reeves from Linchpin People to help me answer this unique question.

In this episode of the Notes from the Field series database expert Reeves Smith explains how adding reference data to Master Data Services can benefit the enterprise. Read the experience of Reeves in his own words.

In a previous post, I talked about adding reference data to Master Data Services and how it can enhance your Enterprise Data Warehouse. Through the Master Data Services add-in for Excel, I added reference data into a Master Data Services model. This model was created with data types that were inferred from the given values that were imported into the model. All of the data types were of basic types like strings or numeric.

In this post, I’m going to update the data model we created in the previous posts and add domain values to the column’s data types. Domain values are a way to restrict a column’s values to a defined list. This also provides a drop down for the column that restricts any free form text.

The steps are:

  • Create two entities that will be used for the domain values
  • Update the Customers entity to use the newly created entity

All of these steps will be performed within Excel using the Master Data Services add-in for Excel.

Using the Demo model, created in the last post, we will update the MaritalStatus column with domain values.

Step 1 – Create Additional Entities

We will start with the MaritalStatus column of the Customers entity and create an additional entity called MaritalStatus. This entity will be used for the domain values in the MaritalStatus column of the Customers entity.

  1. Create an Excel worksheet with the reference data for MaritalStatus and provide a header for each of the columns in the data set. In this example we use Code and Name.

Figure 1 – MaritalStatus Excel worksheet data

  2. Select the data including the headers and click the Create Entity button from the Master Data tab in Excel.
  3. Enter the data for the following items:
    1. My data has headers: Checked
    2. Model: Demo
    3. Version: VERSION_1
    4. New entity name: MaritalStatus
    5. Code: Code
    6. Name (optional): Name

Figure 2 – Create Entity Dialog

  4. Click OK.

Step 2 – Update the Customers Entity

  1. Connect to the Customers Entity.

If you need help connecting to the Customer entity follow the below steps:

  1. From the Master Data tab click the Connect button and connect to the MDS instance with the Demo model.

Figure 3 – Connecting to MDS instance

  2. Select the Demo model and double click on the Customers Entity.

Figure 4 – The Master Data Explorer dialog

  2. Select any cell in the MaritalStatus column of the Customers Entity.
  3. Click the Attribute Properties button.
  4. Change the Attribute type to Constrained list (Domain-based) and select the MaritalStatus entity for Populate the Attribute with values from:

Figure 5 – Attribute Properties Dialog

Notice that the column has a different look with a constrained list and if a cell is selected, a drop down with the available values is now present. This keeps the users from adding values outside of the list.

Figure 6 – MaritalStatus as a Constrained List

Creating Additional Entities

Adding domain values to the other columns would require the same process.

Note: In the current dataset, Education and Occupation do not have code values. One method to solve this is to create an entity where the code and name contain the same values; see Figure 7.

Figure 7 – Education Entity


In the previous articles on Master Data Services we added reference data to a Master Data Services model. This data can be maintained through the web interface and the Excel add-in by subject matter experts (SMEs). By creating a robust model, the SMEs are better able to create and maintain quality data.

Hope this helps.

If you want to get started with BIML with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Using Project Connections in SSIS – Notes from the Field #088

[Notes from Pinal]: SSIS is a very well explored subject; however, there are so many interesting elements that when we read about them, we learn something new. A similar concept is using project connections in SSIS.

Linchpin People are database coaches and wellness experts for a data driven world. In this 88th episode of the Notes from the Field series, database expert Tim Mitchell (partner at Linchpin People) shares a very interesting discussion on how to use project connections in SSIS.

In SQL Server Integration Services, connection managers are used as gateways for most any external read and write operation. Connection managers are type- and format-specific, and in the case of relational database connection managers, they are usually specific to vendor (Oracle, DB2, etc.) as well.

In most use cases, the same connection will be used across multiple packages in the same project. In pre-2012 versions of SSIS, each package would have its own connection manager for every connection used in that package. Creating and maintaining all those connection managers could be time-consuming as the number of packages grows. In SQL Server 2012, Microsoft added project connections to SSIS, allowing for the creation of connections that were accessible across all packages in a project. Instead of having to create a copy of each connection manager in every package, developers can now simply create the connection at the project level. Project connections will automatically show up in the connection manager tray for all packages in that project.

As shown, any project connection automatically has the designation (project) prepended to the name to clearly indicate that it is a project connection. Those without this designation are package connections, and are only accessible from within that package.

Project connections will also appear in the Solution Explorer window, under the Connection Managers tab.

You can create a new project connection by right-clicking on the Connection Managers node shown above, and walking through the steps to build the connection. Similarly, you can edit or delete an existing project-level connection manager from this same window.

You can also promote an existing package connection to a project connection by right-clicking on the package connection and selecting Convert to Project Connection.

Coincidentally, you can also convert back to a package connection through a similar process. Right-clicking on a project connection will expose an option to Convert to Package Connection. However, you have to be careful when choosing this option. If you convert a project connection to a package connection, that connection will then be visible only in the package in which you are currently working. If you have used the connection in any other packages in that project, those operations will fail because the connection is no longer visible at the project level. You will get a warning message when you attempt to convert a project connection to a package connection.

Finally, if you are using project connections, you can still use dynamic properties such as expressions and parameters. Do be aware that, if you use parameters to configure a project connection, you must use project parameters rather than package parameters. The latter is not accessible beyond the scope of a single package, and therefore would not always be accessible for project connections. Fortunately, the UI for the expression builder limits you to only project parameters when configuring project connections.

In conclusion, the project connection in SSIS is an excellent tool for configuring connections at the project level to minimize the extra work required for sharing connections across multiple packages in a project.

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Why Haven’t You Disabled CPU Power Saving Yet?! – Notes from the Field #087

[Note from Pinal]: This is the 87th episode of the Notes from the Field series. Every week, I personally wait for the notes from the field from Mike because they contain life lessons which directly impact the lives of DBAs and Developers positively. Mike Walsh is an amazing person to meet and know. I am fortunate to be his friend. Every single time I meet him, I learn something as a human. He is always looking for an opportunity to improve the life of DBAs and Developers.

Mike normally talks about something that impacts a client, impacts DBAs, or a professional development topic. However, in today’s blog post Mike has something very interesting and very helpful to every DBA in the world.

Read the entire story in his own words.

Today’s post is going to be a very short public service announcement that I shouldn’t have to make, but part of my job as a consultant at Linchpin People is going to customers and performing WellDBA exams. A WellDBA exam is basically an extensive health check of a SQL Server environment. We are looking for small and simple changes to help performance, reliability or manageability. We are looking for deeper changes that could be made to help in these areas and we are helping clients see what sort of architecture decisions and foundational decisions should be addressed to scale into the future.

We have a free checklist you can use to check your own servers also. It describes some of our most common findings and helps you get on the road to having a healthy and a scalable SQL Server environment. You can download it and review your own environment to make sure you are following best practices.

Power Saving?! Hasn’t the SQL Family Talked About That A Lot?!?

Yes! There are posts everywhere about it and why it isn’t great for SQL Servers. Even still – it is something we find on at least 70% of the WellDBA exams we provide to customers. These are customers of all sorts and sizes with all manner of DBA teams or accidental DBAs at the helm.

So… Yes. A reminder about this setting.

What’s This All About?

Processors today have a Power Saving setting on them. They allow themselves to basically be told to operate at a lower performance profile to cut down on power costs. When the CPU gets really busy – the performance may be ramped up – then it ramps back down.

This sounds great in theory – but SQL Server isn’t normally just pegging the CPU – so the CPU performance may never go all the way up – and when it does, it often will go up just as the demand period is over and the boost isn’t needed.

The default on most Windows Server installations is “Balanced Mode” – this means out of the box you are allowing Windows to underclock your CPUs and only increase the clock speed as/when/if needed.

Another way of saying it – you’ve taken your SQL Server instance – which you’ve licensed by CPU core nowadays – and forced it to run on lower power CPUs than you paid for when buying the server and paid for when licensing SQL Server.

What can you do?

Three simple steps really (plus one to celebrate):

  1. See if you are running in balanced mode – Go to Start, then Run, and type in powercfg.cpl (or browse to Control Panel and then Power Options), and see if you are running in Balanced or High Performance mode. (There is also a quick T-SQL check in the sketch after this list.)
  2. If running in balanced – change it to High Performance
  3. Consider making a group policy for your SQL Server servers and make sure they always run in High Performance mode.
  4. Celebrate that you are getting the performance you expected and paid for.
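
If you would rather check the active plan from SSMS than from the console, one option – assuming xp_cmdshell is enabled on the instance, which many shops disable by policy – is a sketch like this:

-- Ask Windows for the active power plan from T-SQL.
-- The output should name "High performance", not "Balanced".
EXEC master.sys.xp_cmdshell 'powercfg /getactivescheme';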

That’s it. I could pile on more. Over at the Linchpin blog, I put up a blog post a year ago that shows the impact of this setting and how to check it. There are more details and words there – but the important truth is found in this article that you are reading.

My plea for you is that you will just check this setting yourself and fix it for yourself. You won’t see queries take 200% less time, but you could get a boost in performance to some degree, sometimes even a 10-30% increase.

If you want to get started with performance analytics and Database Healthcheck of SQL Servers with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – How Do We Find Deadlocks? – Notes from the Field #086

[Note from Pinal]: This is the 86th episode of the Notes from the Field series. Deadlocks are very annoying when they happen in our database. In my early career, my biggest frustration was that I had no idea why the deadlocks were happening, and because I did not know the root cause, I was never able to solve them. As time passed by, I developed a better understanding of this situation.

In this episode of the Notes from the Field series database expert John Sterrett (Group Principal at Linchpin People) explains a very interesting subject of how to find deadlocks in SQL Server. Linchpin People are database coaches and wellness experts for a data driven world. Read the experience of John in his own words.

For one reason or another I tend to find that DBAs in the field have a hard time detecting and understanding how deadlocks occur.  A deadlock occurs when two or more sessions are blocked on each other and one statement has to be terminated to prevent the sessions from being hung on each other.

Let’s take a quick look at locking, blocking and deadlocks with a real-world example that is included in the video below. In the simple example provided in the video there are two sessions, each with statements that update the same tables concurrently within explicit transactions that are held until the statements are committed or rolled back. For example, session one updates all the records in table one while session two updates all the records in table two. At this point we have locking: session one has a lock on table one and session two has a lock on table two, but neither session is blocking the other from doing work.  Next, while the statements in both sessions are still uncommitted, session one attempts to update the records in table two. This is known as blocking, because session two is blocking session one’s update on table two.  Next, session two attempts to update table one, which is currently locked by session one. Now we have a deadlock, because session one is blocked by session two and session two is blocked by session one. To prevent both sessions from being hung, one of them becomes the deadlock victim while the other gets to continue its work.
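
If you want to reproduce the scenario yourself, here is a minimal sketch (the tables are hypothetical stand-ins for table one and table two) that you can run step by step in two SSMS query windows:

-- One-time setup
CREATE TABLE dbo.TableOne (ID int PRIMARY KEY, Val int);
CREATE TABLE dbo.TableTwo (ID int PRIMARY KEY, Val int);
INSERT INTO dbo.TableOne VALUES (1, 0);
INSERT INTO dbo.TableTwo VALUES (1, 0);
GO

-- Step 1, Session 1: lock table one
BEGIN TRAN;
UPDATE dbo.TableOne SET Val = 1;

-- Step 2, Session 2 (separate window): lock table two
BEGIN TRAN;
UPDATE dbo.TableTwo SET Val = 2;

-- Step 3, Session 1: blocked, because session 2 holds the lock on table two
UPDATE dbo.TableTwo SET Val = 1;

-- Step 4, Session 2: deadlock; each session now waits on the other.
-- SQL Server picks a victim (error 1205) and rolls that session back.
UPDATE dbo.TableOne SET Val = 2;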

How Do We Find Deadlocks?

Before SQL Server 2008 we had to rely on configuring trace flags, Service Broker event notifications, or server-side traces to capture deadlock information so we could be proactive and take action to prevent the deadlocks from recurring. In SQL Server 2008 we were given Extended Events and could leverage the default system_health session, which is automatically running in the background, to identify deadlocks. This is a great addition because we don’t have to configure anything manually to catch the deadlocks. In SQL Server 2012 this got easier still because we can rely on the graphical user interface inside SQL Server Management Studio to filter the system_health session and focus straight on our deadlocks.
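
If you prefer a query to the GUI, a commonly used approach is to read the xml_deadlock_report events straight out of the system_health session. Here is a sketch, assuming default settings where system_health writes to a ring_buffer target (the XQuery paths follow the documented shape of that event):

-- Pull deadlock graphs from the system_health Extended Events session
SELECT XEvent.query('(event/data/value/deadlock)[1]') AS DeadlockGraph
FROM
(
    SELECT XEvent.query('.') AS XEvent
    FROM
    (
        SELECT CAST(target_data AS XML) AS TargetData
        FROM sys.dm_xe_session_targets st
        JOIN sys.dm_xe_sessions s
            ON s.address = st.event_session_address
        WHERE s.name = N'system_health'
          AND st.target_name = N'ring_buffer'
    ) AS Data
    CROSS APPLY TargetData.nodes
        ('RingBufferTarget/event[@name="xml_deadlock_report"]')
        AS XEventData (XEvent)
) AS src;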

In the video below you will create deadlocks and see how to proactively monitor them with the native tools provided with SQL Server Management Studio in SQL Server 2012 or 2014.

Are your servers running at optimal speed or are you facing any SQL Server Performance Problems? If you want to get started with the help of experts read more over here: Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Converting Standard SSMS File Results to Delimited – Notes from the Field #085

[Note from Pinal]: This is the 85th episode of the Notes from the Field series. SSMS is my favorite IDE. I work with many different databases and many different IDEs, but I am most comfortable when I am working with SSMS. Liking SSMS does not mean it is the best tool for every task I throw at it, though. Here is one such task – converting file results to comma-delimited results. My friend Kevin came to the rescue and helped me out in this situation.

Let us read how we can achieve this task of converting standard SSMS file results to a delimited resultset.


Now and again, clients send me files to load into the database that they’ve generated using the Results To File feature in SQL Server Management Studio (SSMS). If you’re unfamiliar with this feature, let me explain it. As shown in Figure 1, after right-clicking in a query window in SSMS, you may select the Results to File option from the context menu. With this option enabled, the next time the query is run, a dialog box will appear allowing you to choose the generated file’s location.


Figure 1 – An easy way to send the results of a query in SSMS to a file.

By default, the file that’s emitted will be column-aligned, otherwise known as fixed-width columns. The first row will contain the column names and a second row will have groups of hyphen characters acting as a sort of text-based underscoring. At the end of the file, a count of the emitted rows will also be included. All of these features can be seen in Figure 2, where I’ve hidden the middle rows so you can see the beginning and the end of the file. A miniature example of this layout appears just after Figure 2.


Figure 2 – A standard, column-aligned result file with trailing row count.
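
Since Figure 2 is an image, here is a hypothetical miniature of the same layout: a header row, a separator row of hyphen groups, two data rows (one with a literal NULL), and the trailing row count.

CustomerId  MiddleName
----------- ----------
1           NULL
2           A.

(2 row(s) affected)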

When my clients ask me to load these types of files using SQL Server Integration Services (SSIS), there are a few problems to overcome. First of all, fixed-width data is not simple to load. The column specification of the flat-file source has to be updated manually to include the exact width of each column which can be time-consuming and prone to errors. Secondly, the row count at the end of the file isn’t data that can be loaded so it will cause an exception or load incorrectly, forcing us to add special handling logic in the package for that. Lastly, the literal word NULL appearing throughout the file whenever there is a missing value doesn’t really convey the absence of data. Those will also have to be handled with custom code.

To address the first problem, I sometimes ask my clients to re-run their queries setting the output format to comma or tab delimited. In SSMS, this can be done by right-clicking in the query window, selecting the Query Options… item from the context menu and choosing a different Output format in the Results / Text section of the dialog box. The second problem can be solved by adding the SET NOCOUNT ON directive to the start of the query that generated the file. I may also ask my clients to include that change. The third problem, where SSMS emits literal NULL strings whenever values are missing, can also be handled with the use of ISNULL or COALESCE to do NULL detection in the original query. This is yet one more change I must ask my clients to make.
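
Taken together, those query-side changes look something like this minimal sketch (the table and column names are hypothetical):

-- Suppress the trailing "(n row(s) affected)" count
SET NOCOUNT ON;

-- Replace missing values so the file doesn't contain literal NULLs
SELECT CustomerId,
       ISNULL(MiddleName, '') AS MiddleName,
       COALESCE(Suffix, '') AS Suffix
FROM dbo.Customer;  -- hypothetical table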

As an experienced consultant, I understand that asking my clients to make so many modifications to their workflow is often not worth their time and trouble. That’s what they hire me for. It’s best to find ways to make whatever my clients provide work for them, especially if it can be automated. To that end, I wrote a C# function called ConvertColumnAlignedFileToDelimitedFile that helps to overcome all of these problems.

using System;
using System.IO;
using System.Text;

public static int ConvertColumnAlignedFileToDelimitedFile(
    string sourcePath, string targetPath, string delimiter = "\t")
{
    int lineNdx = 0;
    using (StreamWriter writer = File.CreateText(targetPath))
    {
        string header = null;
        int[] columnWidths = null;
        foreach (string line in File.ReadLines(sourcePath, Encoding.UTF8))
        {
            if (lineNdx == 0)
                header = line; // save the header for subsequent processing
            else if (lineNdx == 1)
                columnWidths = ProcessSeparatorRow(line, header,
                    delimiter, writer);
            else
            {
                // stop processing on an empty line (the row count follows it)
                if (line.Length == 0) break;
                ProcessDataRow(line, columnWidths, delimiter, writer);
            }
            lineNdx++;
        }
    }
    return lineNdx - 2; // data rows emitted, excluding header and separator
}

Figure 3 – A function that converts a column-aligned file to a delimited file.

The function takes the name of a source file, a target file and the delimiter that will be inserted between values. There’s no magic here but there is an interesting trick that takes advantage of some metadata lurking in the column-aligned output file. Look at Figure 2 again. That second row containing hyphen characters actually uncovers a key constraint that we otherwise wouldn’t know: the maximum length of each column. Each block of hyphens is separated by a space so if we count the length of each hyphen group, we’ll know how to parse the entire file. I’ve provided a helper function called ProcessSeparatorRow that reads the metadata buried in the hyphen groups, writes out the header row and returns the column widths to the calling function.

private static int[] ProcessSeparatorRow(string line,
    string header, string delimiter, StreamWriter writer)
{
    string[] columns = line.Split(' ');
    int[] columnWidths = new int[columns.Length];
    for (int columnNdx = 0, startPosition = 0;
        columnNdx < columnWidths.Length; columnNdx++)
    {
        // each hyphen group's length is the width of that column
        columnWidths[columnNdx] = columns[columnNdx].Length;
        int length =
            (startPosition + columnWidths[columnNdx] <= header.Length)
            ? columnWidths[columnNdx]
            : header.Length - startPosition;
        string name = header.Substring(startPosition, length).Trim();
        startPosition += columnWidths[columnNdx] + 1;
        writer.Write(name); // emit the column name
        writer.Write((columnNdx < columnWidths.Length - 1)
            ? delimiter : Environment.NewLine);
    }
    return columnWidths;
}

Figure 4 – A function that processes the metadata in the separator row.

For the data rows beyond the header and the separator, when an empty line is encountered, the processing just stops. That will handle the trailing row count problem discussed earlier. Writing out the non-empty rows in the remainder of the file is straightforward with the helper function called ProcessDataRow shown in Figure 5.

private static void ProcessDataRow(string line,
    int[] columnWidths, string delimiter, StreamWriter writer)
{
    for (int columnNdx = 0, startPosition = 0;
        columnNdx < columnWidths.Length; columnNdx++)
    {
        int length =
            (startPosition + columnWidths[columnNdx] <= line.Length)
            ? columnWidths[columnNdx]
            : line.Length - startPosition;
        string value = line.Substring(startPosition, length).Trim();
        // convert the literal word NULL into a genuinely empty value
        if (value.Equals("NULL", StringComparison.InvariantCultureIgnoreCase))
            value = String.Empty;
        startPosition += columnWidths[columnNdx] + 1;
        writer.Write(value);
        writer.Write((columnNdx < columnWidths.Length - 1)
            ? delimiter : Environment.NewLine);
    }
}

Figure 5 – Process a normal row of data by writing it out using the specified delimiter.

Running the conversion function with the output.rpt sample file shown in Figure 2 produces the tab-delimited file shown in Figure 6. The tab characters appear as orange-colored arrows in the output. Notice also that wherever the word NULL appeared in the original file, it has been converted to an empty string in the output. A couple of lines of C# code in the ProcessDataRow function handled that quite nicely.


Figure 6 – The converted file with tab delimiters and detected NULLs.

In closing, let me say that while the C# code shown in this article is somewhat stand-alone, it can easily be put into a Script Task or a Script Component in SSIS. I often use code like this within SSIS to do file preparation or validation before loading the data into staging tables. However, having a command-line utility to convert SSMS results files into delimited files with NULL detection is also quite handy. I’ll even admit that I’ve invoked such a utility as a job step in SQL Agent from time to time. Remember: the job of a good consultant is to make the difficult seem both easy and repeatable. Hopefully, you’ll find this code useful in a variety of situations, too.
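
For completeness, here is a hedged sketch of what that SQL Agent wiring might look like, assuming the C# code has been compiled into a hypothetical console utility (ConvertRpt.exe below) and that the job itself already exists:

-- Add a CmdExec step to an existing, hypothetical SQL Agent job
EXEC msdb.dbo.sp_add_jobstep
    @job_name  = N'Load Client Files',   -- hypothetical job name
    @step_name = N'Convert RPT to TSV',
    @subsystem = N'CmdExec',
    @command   = N'C:\Tools\ConvertRpt.exe "D:\In\output.rpt" "D:\Out\output.tsv"';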

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)