[Note from Pinal]: This is the 85th episode of the Notes from the Field series. SSMS is my favorite IDE. I work with many different databases and different IDEs, but I feel most comfortable when I am working in SSMS. That said, liking SSMS does not mean it is the best tool for every task or that it can accomplish everything I want it to do. Here is one such task: converting file results to a comma-delimited result set. Fortunately, my friend Kevin came to the rescue and helped me out in this situation.
Let us read how we can achieve this task of converting standard SSMS file results to a delimited result set.
Now and again, clients send me files to load into the database that they’ve generated using the Results To File feature in SQL Server Management Studio (SSMS). If you’re unfamiliar with this feature, let me explain it. As shown in Figure 1, after right-clicking in a query window in SSMS, you may select the Results to File option from the context menu. With this option enabled, the next time the query is run, a dialog box will appear allowing you to choose the generated file’s location.
Figure 1 – An easy way to send the results of a query in SSMS to a file.
By default, the file that’s emitted will be column-aligned, otherwise known as fixed-width. The first row will contain the column names, and the second row will have groups of hyphen characters acting as a sort of text-based underscoring. At the end of the file, a count of the emitted rows will also be included. All of these features can be seen in Figure 2, where I’ve hidden the middle rows so you can see both the beginning and the end of the file.
Figure 2 – A standard, column-aligned result file with trailing row count.
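Since Figure 2 is an image, here is a small, entirely hypothetical illustration of the shape of such a file (the column names, values, and row count are invented for illustration only):

id  name
--- -----
1   Alpha
2   NULL

(2 rows affected)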
When my clients ask me to load these types of files using SQL Server Integration Services (SSIS), there are a few problems to overcome. First of all, fixed-width data is not simple to load: the column specification of the flat-file source has to be updated manually to include the exact width of each column, which is time-consuming and prone to error. Secondly, the row count at the end of the file isn’t data that can be loaded, so it will either cause an exception or load incorrectly, forcing us to add special handling logic to the package. Lastly, the literal word NULL appearing throughout the file wherever a value is missing doesn’t really convey the absence of data. That, too, will have to be handled with custom code.
To address the first problem, I sometimes ask my clients to re-run their queries with the output format set to comma- or tab-delimited. In SSMS, this can be done by right-clicking in the query window, selecting the Query Options… item from the context menu, and choosing a different Output format in the Results / Text section of the dialog box. The second problem can be solved by adding the SET NOCOUNT ON directive to the start of the query that generated the file; I may also ask my clients to include that change. The third problem, where SSMS emits literal NULL strings whenever values are missing, can be handled with ISNULL or COALESCE to do NULL detection in the original query. This is yet one more change I must ask my clients to make.
As an experienced consultant, I understand that asking my clients to make so many modifications to their workflow is often not worth their time and trouble. That’s what they hire me for. It’s best to find ways to make whatever my clients provide work for them, especially if it can be automated. To that end, I wrote a C# function called ConvertColumnAlignedFileToDelimitedFile that helps to overcome all of these problems.
public static int ConvertColumnAlignedFileToDelimitedFile(
    string sourcePath, string targetPath, string delimiter = "\t")
{
    // requires: using System; using System.IO; using System.Text;
    int lineNdx = 0;
    using (StreamWriter writer = File.CreateText(targetPath))
    {
        string header = null;
        int[] columnWidths = null;
        foreach (string line in File.ReadLines(sourcePath, Encoding.UTF8))
        {
            if (lineNdx == 0)
                header = line; // save the header for subsequent processing
            else if (lineNdx == 1)
                columnWidths = ProcessSeparatorRow(line, header, delimiter, writer);
            else
            {
                // stop processing on an empty line (it precedes the trailing row count)
                if (line.Length == 0)
                    break;
                ProcessDataRow(line, columnWidths, delimiter, writer);
            }
            lineNdx++;
        }
    }
    // exclude the header and separator rows from the returned count of data rows
    return lineNdx - 2;
}
Figure 3 – A function that converts a column-aligned file to a delimited file.
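As a quick illustration of how the function in Figure 3 might be called, here is a minimal usage sketch; the file paths are hypothetical, and the delimiter argument is omitted so the default tab applies:

// A minimal usage sketch; the file paths below are hypothetical examples.
int rowCount = ConvertColumnAlignedFileToDelimitedFile(
    @"C:\Temp\output.rpt",   // a column-aligned file produced by Results to File
    @"C:\Temp\output.txt");  // the tab-delimited file to create
Console.WriteLine("{0} data rows were converted.", rowCount);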
The function takes the name of a source file, the name of a target file, and the delimiter that will be inserted between values. There’s no magic here, but there is an interesting trick that takes advantage of some metadata lurking in the column-aligned output file. Look at Figure 2 again. That second row of hyphen characters actually reveals something we otherwise wouldn’t know: the maximum width of each column. Each block of hyphens is separated by a single space, so if we count the length of each hyphen group, we’ll know how to parse the entire file. I’ve provided a helper function called ProcessSeparatorRow that reads the metadata buried in the hyphen groups, writes out the header row, and returns the column widths to the calling function.
private static int[] ProcessSeparatorRow(string line, string header,
    string delimiter, StreamWriter writer)
{
    // each group of hyphens is separated by a single space
    string[] columns = line.Split(' ');
    int[] columnWidths = new int[columns.Length];
    for (int columnNdx = 0, startPosition = 0; columnNdx < columnWidths.Length; columnNdx++)
    {
        // the length of each hyphen group is the width of that column
        columnWidths[columnNdx] = columns[columnNdx].Length;
        // guard against a header row that is shorter than the separator row
        int length = (startPosition + columnWidths[columnNdx] <= header.Length)
            ? columnWidths[columnNdx]
            : header.Length - startPosition;
        string name = header.Substring(startPosition, length).Trim();
        startPosition += columnWidths[columnNdx] + 1; // skip the separating space
        writer.Write(name);
        writer.Write((columnNdx < columnWidths.Length - 1) ? delimiter : Environment.NewLine);
    }
    return columnWidths;
}
Figure 4 – A function that processes the metadata in the separator row.
For the data rows beyond the header and the separator, processing simply stops when an empty line is encountered. That handles the trailing row count problem discussed earlier. Writing out the remaining non-empty rows is straightforward with the helper function called ProcessDataRow, shown in Figure 5.
private static void ProcessDataRow(string line, int[] columnWidths,
    string delimiter, StreamWriter writer)
{
    for (int columnNdx = 0, startPosition = 0; columnNdx < columnWidths.Length; columnNdx++)
    {
        // guard against a data row that is shorter than the separator row
        int length = (startPosition + columnWidths[columnNdx] <= line.Length)
            ? columnWidths[columnNdx]
            : line.Length - startPosition;
        string value = line.Substring(startPosition, length).Trim();
        // convert the literal word NULL into a true empty value
        if (value.Equals("NULL", StringComparison.InvariantCultureIgnoreCase))
            value = String.Empty;
        startPosition += columnWidths[columnNdx] + 1; // skip the separating space
        writer.Write(value);
        writer.Write((columnNdx < columnWidths.Length - 1) ? delimiter : Environment.NewLine);
    }
}
Figure 5 – Process a normal row of data by writing it out using the specified delimiter.
Running the conversion function against the output.rpt sample file shown in Figure 2 produces the tab-delimited file shown in Figure 6. The tab characters appear as orange-colored arrows in the output. Notice also that wherever the word NULL appeared in the original file, it has been converted to an empty string in the output. A couple of lines of C# code in the ProcessDataRow function handled that quite nicely.
Figure 6 – The converted file with tab delimiters and detected NULLs.
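Figure 6 is also an image, so continuing the hypothetical illustration from earlier, the converted file would look like this (with tab characters shown here as →):

id→name
1→Alpha
2→

Note that the padding around each header and value has been trimmed away and that the NULL in the second data row has become an empty string.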
In closing, let me say that while the C# code shown in this article is somewhat stand-alone, it can easily be put into a Script Task or a Script Component in SSIS. I often use code like this within SSIS to do file preparation or validation before loading the data into staging tables. However, having a command-line utility to convert SSMS results files into delimited files with NULL detection is also quite handy. I’ll even admit that I’ve invoked such a utility as a job step in SQL Agent from time to time. Remember: the job of a good consultant is to make the difficult seem both easy and repeatable. Hopefully, you’ll find this code useful in a variety of situations, too.
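To give a flavor of that command-line idea, here is a minimal sketch of a Main wrapper; the class name, usage message, and argument handling are my own assumptions, and it presumes the three functions from Figures 3 through 5 are members of the same class:

using System;

public static class ConvertRptUtility
{
    // A hypothetical entry point; assumes ConvertColumnAlignedFileToDelimitedFile
    // and its helper functions (Figures 3 through 5) are members of this class.
    public static int Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.Error.WriteLine(
                "Usage: ConvertRptUtility <sourcePath> <targetPath> [delimiter]");
            return 1;
        }
        // default to a tab delimiter when none is supplied
        string delimiter = (args.Length > 2) ? args[2] : "\t";
        int rows = ConvertColumnAlignedFileToDelimitedFile(args[0], args[1], delimiter);
        Console.WriteLine("{0} data rows were converted.", rows);
        return 0;
    }
}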
If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.
Reference: Pinal Dave (https://blog.sqlauthority.com)