SQL SERVER – DevOps for the DBA – Notes from the Field #091

[Note from Pinal]: This is the 91st episode of the Notes from the Field series. DevOps is a new buzzword with lots of substance behind it. Not everyone understands what it means, and not everyone actually relates it to their daily job. Lots of people think it is alright not to explore and learn about it, assuming that after a few days or months it will be meaningless. However, the reality of DevOps seems very different, as it is one of those concepts that has simply been rediscovered. I recently asked my friend Stuart what DevOps means for a DBA, and he had an informative answer to this question.

In this episode of the Notes from the Field series database expert Stuart Ainsworth explains about DevOps for the DBA. Let us read the story in his own words.



Not every problem that a DBA encounters is a technical one; sometimes, it’s a procedural one.  I’ve recently become enamored with the philosophy of DevOps, and have tried to find ways to transform those concepts and ideas into practical, actionable advice for my clients.  It’s still a work in progress, but I’ll try to summarize my thoughts today (and, they could change tomorrow).

A Brief Summary of DevOps

DevOps is a collaborative approach to software development and delivery that is influenced by Agile development methods and Lean IT principles; the idea is to decrease time-to-market by looking at (primarily enterprise) software development as a life cycle, and finding ways to automate and increase efficiency.  The general framework is often described as consisting of the Three Ways (a phrase fully explored in The Phoenix Project):

  1. The First Way: Systems Thinking. A complete software development lifecycle encompasses all the functionality involved in both development and operations (including QA and acceptance testing); the entire system is intertwined in order for the business to bring value to customers.  Development responsibility isn’t complete when software passes QA, and Operational responsibilities don’t simply begin during the install.  Both groups should be involved to some degree throughout the process.
  2. The Second Way: Amplify Feedback Loops. No system is complete without feedback, and DevOps approaches focus on increasing the immediacy of that feedback.  Bugs need to be simple to report and acknowledge; suggestions should be taken seriously.
  3. The Third Way: Culture of Continual Experimentation and Learning. Success is found when you take a risk, and DevOps is about creating an environment where risk is encouraged and rewarded.  This is often problematic for Operations, who are usually responsible for maintaining the stability of an environment, but that’s a key to this principle: Ops MUST be involved in finding a way for Dev to take risks while continuing to ensure the stability of the environment.

From Philosophy to Practice

All that’s well and good, but what does it mean for the average DBA? How do we go from thinking about processes, procedures, and philosophy to real, meaningful change?  Here are three specific practices that operational DBAs should implement as soon as possible:

  1. Respect the role of development. Consider the following scenario: some performance issue has occurred in operations, and the DBA has stepped in to make a fix (say an index change, or something more drastic like a stored proc rewrite).  The implemented fix is now different from development’s source of truth (their source control), and that’s bad.  It hampers the development effort, but sometimes emergency interventions are necessary to keep the business afloat; what do you do?  First and foremost, DBAs need to limit the changes that they make to strictly emergency fixes, and emergency fixes need to be reported to development as soon as possible.  Not all performance changes need to be emergency fixes; those should still be reported to dev, but unless the business is grinding to a halt, the DBAs should not change the code in operations.  This practice illustrates both the First and the Second Way above.
  2. Request simple methods to supply feedback. If developers are responsible for writing code, they have to be receptive to feedback from operations.  I’ve worked in shops where only the developers and QA had access to create bugs; this kills the feedback loop.  Operational DBAs must find ways to report issues to development as soon as possible; any electronic tracking system should make it simple to identify who reported an issue, when they reported it, and what was decided to do about it.  If the feedback is associated with an emergency change, the DBA should include the code that they changed, so that source control can be edited to include the current production version.  The key takeaway is that EVERY change to an operational system must be reported to the developers.
  3. Strive to provide operational support to development efforts. The best development environment is one that is identical to operations; that’s not always feasible or practical to implement.  DBAs need to be part of the development process by finding ways to make the dev environment as similar to production as possible; virtualization technology can be a great tool for that, whether it’s virtualizing servers (e.g., VMware, Hyper-V, VirtualBox) or databases (e.g., Delphix).  Short of that, you can at least make sure that your developers have a good representation of the current production statistics (one approach is sketched just after this list).
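
One way to give developers production-like statistics without copying production data is a schema-and-statistics-only clone. The sketch below uses DBCC CLONEDATABASE, which was added in later servicing releases of SQL Server 2014 and in SQL Server 2016; it is just one option worth knowing about rather than something prescribed here, and the database names are only placeholders.

-- Minimal sketch: create a schema + statistics-only copy that can be restored into a dev environment.
-- Requires SQL Server 2014 SP2+ / 2016 SP1+; AdventureWorks2014 is a placeholder name.
DBCC CLONEDATABASE (AdventureWorks2014, AdventureWorks2014_StatsClone);
-- The clone contains no user data, only objects and their statistics, so the optimizer in dev
-- produces plans similar to production. Back it up and restore it on the development server.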

Summary

Software development is hard; systems administration is hard.  Working together can make it easier, but you have to focus on practical changes to your software culture to make things happen.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – What are T-SQL Median? – Notes from the Field #090

[Note from Pinal]: In this episode of the Notes from the Field series, database expert Kathi Kellenberger explains how to calculate a median in T-SQL. Kathi is an amazing instructor; she was the SQL author I read early in my career. One reason I love SQL Server is that her writing instilled a love for this technology in me. Today she brings a relatively interesting topic for database experts. Read the experience of Kathi in her own words.


SQL Server has a wealth of useful T-SQL functions that let you do just about anything you need to do. One function that is sorely missing is a MEDIAN function. A median is the middle value in a set of values. For example, 3 is the median of the set 1, 3, 5. If the set has an even number of values, the median is calculated, or “interpolated,” by averaging the two middle values. The median of the set 1, 3, 5, 7 is 4.

You will see median values used frequently even if you do not work with statistics yourself. For example, you could see a news story listing the median household income of a city by neighborhood, the median salary for developers by language, or the median age of a population.

As I mentioned, there is no MEDIAN function in T-SQL. If, however, you are fortunate enough to be running SQL Server 2012 or 2014, there is an easy workaround. In 2012, Microsoft introduced a new function called PERCENTILE_CONT, one of the T-SQL window functions. Given a percent rank and a set of values, PERCENTILE_CONT will return the value ranked at that percent. If there is not an exact value found at the rank, PERCENTILE_CONT will interpolate the answer instead. If you supply 0.5, meaning 50%, PERCENTILE_CONT will return the median value. Here is an example that calculates the median sales by month for each year:

WITH Totals AS (
    SELECT YEAR(OrderDate) AS OrderYear,
           MONTH(OrderDate) AS OrderMonth,
           SUM(TotalDue) AS TotalSales
    FROM Sales.SalesOrderHeader
    GROUP BY YEAR(OrderDate),
             MONTH(OrderDate)
)
SELECT OrderYear, OrderMonth, TotalSales,
       PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY TotalSales)
           OVER(PARTITION BY Totals.OrderYear) AS MedianSales
FROM Totals
ORDER BY OrderYear, TotalSales;

Before we take a look at the results, let’s review the query. The query has a common table expression (CTE) that calculates the sales for each month and year. In the outer query, the PERCENTILE_CONT function is used with an argument of 0.5. PERCENTILE_CONT is a T-SQL window function, but the syntax for it and a similar function, PERCENTILE_DISC, is different than the other window functions. The function name is followed by the WITHIN GROUP clause. This is where the ORDER BY expression goes, not in the OVER clause. The ORDER BY expression is used by the function to line up the values so that the ranks can be compared to the percentile rank value provided as an argument. The OVER clause supports PARTITION BY, and in this case we would like to get the median within each year, so the PARTITION BY expression is OrderYear.

In 2005, there were orders placed between July and December, six months. Six is even, so the function calculates the average of two middle months as shown in Figure 1.


Figure 1: The Median Sales for 2005

Every year in the data has an even number of months except for 2008. In 2008, there are seven months with sales, so the function can find an exact median value. Figure 2 shows the 2008 sales:


Figure 2: The Median Sales for 2008

You may be wondering what the difference is between PERCENTILE_CONT and PERCENTILE_DISC. PERCENTILE_DISC always returns an exact value found in the set, never a calculation. For calculating the median, PERCENTILE_CONT is the function to use.
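
To make the distinction concrete, here is a minimal comparison on the small even-sized set from the introduction; the values are just for illustration and are not part of the original example:

WITH v AS (SELECT n FROM (VALUES (1), (3), (5), (7)) AS t(n))
SELECT DISTINCT
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY n) OVER () AS MedianCont,  -- 4: interpolated between 3 and 5
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY n) OVER () AS MedianDisc   -- 3: an actual value from the set
FROM v;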

If you would like to learn more about T-SQL window functions, be sure to check out my latest book, Expert T-SQL Window Functions in SQL Server, or my Pluralsight course, T-SQL Window Functions.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Enhancing Reference Data in Master Data Services – Notes from the Field #089

[Note from Pinal]: This is the 89th episode of the Notes from the Field series. Master Data Services is one of the most important, but least explored, features of SQL Server. If you have been reading this blog, you know that when it is about BI, I always take help from Linchpin People, who are BI experts. I requested Reeves from Linchpin People to help me answer this unique question.

In this episode of the Notes from the Field series, database expert Reeves Smith explains how to enhance reference data in Master Data Services for the enterprise. Read the experience of Reeves in his own words.



In a previous post, I talked about adding reference data to Master Data Services and how it can enhance your Enterprise Data Warehouse. Through the Master Data Services add-in for Excel, I added reference data into a Master Data Services model. This model was created with data types that were inferred from the given values that were imported into the model. All of the data types were of basic types like strings or numeric.

In this post, I’m going to update the data model we created in the previous posts and add domain values to the column’s data types. Domain values are a way to restrict a column’s values to a defined list. This also provides a drop down for the column that restricts any free form text.

The steps are:

  • Create two entities that will be used for the domain values
  • Update the Customers entity to use the newly created entity

All of these steps will be performed within Excel using the Master Data Services add-in for Excel.

Using the Demo model, created in the last post, we will update the MaritalStatus column with domain values.

Step 1 – Create Additional Entities

We will start with the MaritalStatus column of the Customers entity and create an additional entity called MaritalStatus. This entity will be used for the domain values in the MaritalStatus column of the Customers entity.

  1. Create an Excel worksheet with the reference data for MaritalStatus and provide a header for each of the columns in the data set. In this example we use Code and Name.


Figure 1 – MaritalStatus Excel worksheet data

  2. Select the data, including the headers, and click the Create Entity button from the Master Data tab in Excel.
  3. Enter the data for the following items:
    1. My data has headers: Checked
    2. Model: Demo
    3. Version: VERSION_1
    4. New entity name: MaritalStatus
    5. Code: Code
    6. Name (optional): Name


Figure 2 – Create Entity Dialog

  4. Click OK.

Step 2 – Update the Customers Entity

  1. Connect to the Customers Entity.

If you need help connecting to the Customers entity, follow the steps below:

  1. From the Master Data tab click the Connect button and connect to the MDS instance with the Demo model.


Figure 3 – Connecting to MDS instance

  2. Select the Demo model and double-click the Customers Entity.


Figure 4 – The Master Data Explorer dialog

  2. Select any cell in the MaritalStatus column of the Customers Entity.
  3. Click the Attribute Properties button.
  4. Change the Attribute type to Constrained list (Domain-based) and select the MaritalStatus entity for Populate the Attribute with values from:


Figure 5 – Attribute Properties Dialog

Notice that the column has a different look with a constrained list, and if a cell is selected, a drop-down with the available values is now present. This keeps users from adding values outside of the list.


Figure 6 – MaritalStatus as a Constrained List

Creating Additional Entities

Adding domain values to the other columns would require the same process.

Note: In the current dataset, Education and Occupation do not have code values. One method to solve this is to create an entity where the code and name contain the same values; see Figure 7.


Figure 7 – Education Entity

Summary

In the previous articles on Master Data Services, we added reference data to a Master Data Services model. This data can be maintained through the web interface and the Excel add-in by the subject matter experts (SMEs). By creating a robust model, the SMEs are better able to create and maintain quality data.

Hope this helps.

If you want to get started with BIML with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Using Project Connections in SSIS – Notes from the Field #088

[Notes from Pinal]: SSIS is a very well explored subject; however, there are so many interesting elements that when we read, we learn something new. One such concept is using Project Connections in SSIS.

Linchpin People are database coaches and wellness experts for a data driven world. In this 88th episode of the Notes from the Field series, database expert Tim Mitchell (partner at Linchpin People) shares a very interesting conversation about how to use project connections in SSIS.


In SQL Server Integration Services, connection managers are used as gateways for almost any external read and write operation. Connection managers are type- and format-specific, and in the case of relational database connection managers, they are usually specific to the vendor (Oracle, DB2, etc.) as well.

In most use cases, the same connection will be used across multiple packages in the same project. In pre-2012 versions of SSIS, each package would have its own connection manager for every connection used in that package. Creating and maintaining all those connection managers could be time-consuming as the number of packages grows. In SQL Server 2012, Microsoft added project connections to SSIS, allowing for the creation of connections that were accessible across all packages in a project. Instead of having to create a copy of each connection manager in every package, developers can now simply create the connection at the project level. Project connections will automatically show up in the connection manager tray for all packages in that project.


As shown, any project connection automatically has the designation (project) prepended to the name to clearly indicate that it is a project connection. Those without this designation are package connections, and are only accessible from within that package.

Project connections will also appear in the Solution Explorer window, under the Connection Managers tab.


You can create a new project connection by right-clicking on the Connection Managers node shown above, and walking through the steps to build the connection. Similarly, you can edit or delete an existing project-level connection manager from this same window.

You can also promote an existing package connection to a project connection by right-clicking on the package connection and selecting Convert to Project Connection.


Coincidentally, you can also convert back to a package connection through a similar process. Right-clicking on a project connection will expose an option to Convert to Package Connection. However, you have to be careful when choosing this option. If you convert a project connection to a package connection, that connection will then be visible only in the package in which you are currently working. If you have used the connection in any other packages in that project, those operations will fail because the connection is no longer visible at the project level. You will get a warning message when you attempt to convert a project connection to a package connection.


Finally, if you are using project connections, you can still use dynamic properties such as expressions and parameters. Do be aware that, if you use parameters to configure a project connection, you must use project parameters rather than package parameters. The latter is not accessible beyond the scope of a single package, and therefore would not always be accessible for project connections. Fortunately, the UI for the expression builder limits you to only project parameters when configuring project connections.


In conclusion, the project connection in SSIS is an excellent tool for configuring connections at the project level to minimize the extra work required for sharing connections across multiple packages in a project.

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Why Haven’t You Disabled CPU Power Saving Yet?! – Notes from the Field #087

[Note from Pinal]: This is the 87th episode of the Notes from the Field series. Every week, I personally wait for the notes from the field from Mike because they contain lessons of life which directly impact DBAs’ and Developers’ lives positively. Mike Walsh is an amazing person to meet and know. I am fortunate to be his friend. Every single time I meet him, I learn something as a human. He is always looking for an opportunity to improve the life of DBAs and Developers.

Mike normally talks about something that impacts a client, impacts DBAs, or a professional development topic. However, in today’s blog post Mike has something very interesting and very helpful for every DBA in the world.

Read the entire story in his own words.


Today’s post is going to be a very short public service announcement that I shouldn’t have to make, but part of my job as a consultant at Linchpin People is going to customers and performing WellDBA exams. A WellDBA exam is basically an extensive health check of a SQL Server environment. We are looking for small and simple changes to help performance, reliability or manageability. We are looking for deeper changes that could be made to help in these areas and we are helping clients see what sort of architecture decisions and foundational decisions should be addressed to scale into the future.

We have a free checklist you can use to check your own servers also. It describes some of our most common findings and helps you get on the road to having a healthy and a scalable SQL Server environment. You can download it and review your own environment to make sure you are following best practices.

Power Saving?! Hasn’t the SQL Family Talked About That A Lot?!?

Yes! There are posts everywhere about it and why it isn’t great for SQL Servers. Even still – it is something we find on at least 70% of the WellDBA exams we provide to customers. These are customers of all sorts and sizes with all manner of DBA teams or accidental DBAs at the helm.

So… Yes. A reminder about this setting.

What’s This All About?

Processors today have a power saving setting on them. They can be told to operate at a lower performance profile to cut down on power costs. When the CPU gets really busy – the performance may be ramped up – then it ramps back down.

This sounds great in theory – but SQL Server isn’t normally just pegging the CPU – so the CPU performance may never go all the way up – and when it does, it often will go up just as the demand period is over and the boost isn’t needed.

The default on most Windows Server installations is “Balanced” mode – this means out of the box you are allowing Windows to underclock your CPUs and only increase their speed as/when/if needed.

Another way of saying it – you’ve taken your SQL Server instance – which you’ve licensed by CPU core nowadays – and forced it to run on lower powered CPUs than you paid for when buying the server and paid for when licensing SQL Server.

What can you do?

Three simple steps really:

  1. See if you are running in balanced mode – you can go to Start, then Run, and type in powercfg.cpl (or browse to Control Panel and then Power Options), and see if you are running in Balanced or High Performance. (A quick T-SQL check is sketched just after this list.)
  2. If running in balanced – change it to High Performance
  3. Consider making a group policy for your SQL Server servers and make sure they always run in High Performance mode.
  4. Celebrate that you are getting the performance you expected and paid for.
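
If you would rather check from a query window than by logging on to each server, here is a minimal sketch relating to step 1. It assumes xp_cmdshell is enabled on the instance (it is off by default, and many shops intentionally leave it off), so treat it as one convenient option rather than the prescribed method:

-- Ask Windows for the active power plan from T-SQL (requires xp_cmdshell to be enabled).
EXEC xp_cmdshell 'powercfg /getactivescheme';
-- The output names the active scheme; if it says "Balanced" rather than "High performance",
-- this server is a candidate for the change described above.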

That’s it. I could pile on more. Over at the Linchpin blog, I put up a blog post a year ago that shows the impact of this setting and how to check it. There are more details and words there – but the important truth is found in this article that you are reading.

My plea for you is that you will just check this setting yourself and fix it yourself. You won’t see queries suddenly take a fraction of the time they used to, but you could get a boost in performance to some degree, sometimes even a 10-30% increase.

If you want to get started with performance analytics and Database Healthcheck of SQL Servers with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – How Do We Find Deadlocks? – Notes from the Field #086

[Note from Pinal]: This is the 86th episode of the Notes from the Field series. Deadlocks are very annoying when they happen in our database. In my early career, my biggest frustration was that I had no idea why deadlocks were happening, and because I did not know the root cause, I was never able to solve them. As time passed by, I gained a better understanding of this situation.


In this episode of the Notes from the Field series database expert John Sterrett (Group Principal at Linchpin People) explains a very interesting subject of how to find deadlocks in SQL Server. Linchpin People are database coaches and wellness experts for a data driven world. Read the experience of John in his own words.


For one reason or another, I tend to find that DBAs in the field have a hard time detecting and understanding how deadlocks occur.  A deadlock occurs when two or more sessions are blocked on each other and one statement has to be terminated to prevent the sessions from being hung on each other.

Let’s take a quick look at locking, blocking and deadlocks with a real-world example that is included in the video below. In the simple example provided in the video there are two sessions, each with statements that update the same tables concurrently inside explicit transactions that are held open until the statements are committed or rolled back. For example, session one updates all the records in table one while session two updates all the records in table two. At this point we have locking. Session one has a lock on table one and session two has a lock on table two, but neither session is blocking the other from doing work.  Next, while our statements are not committed in either session, we have session one update the same records being updated by session two. This is known as blocking, because session two is blocking session one’s update on table two.  Next we have session two attempting to update table one, which is currently locked by session one. Now we have a deadlock, because session one is blocked by session two and session two is blocked by session one. To prevent both sessions from being hung, one of them becomes the deadlock victim while the other gets to continue its work.
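
As a rough sketch of that walkthrough, the statements below can be run in two separate SSMS query windows; the table and column names are made up for illustration and are not from the video:

-- Session 1 (query window 1)
BEGIN TRAN;
UPDATE dbo.TableOne SET Amount = Amount;   -- session 1 now holds exclusive locks on TableOne

-- Session 2 (query window 2)
BEGIN TRAN;
UPDATE dbo.TableTwo SET Amount = Amount;   -- session 2 now holds exclusive locks on TableTwo

-- Session 1: blocked, waiting on session 2's locks
UPDATE dbo.TableTwo SET Amount = Amount;

-- Session 2: now blocked by session 1 as well; SQL Server detects the cycle,
-- picks one session as the deadlock victim, and rolls its statement back
UPDATE dbo.TableOne SET Amount = Amount;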

How Do We Find Deadlocks?

Before SQL Server 2008 we would have to rely on configuring trace flags, Service Broker, or server-side traces to capture deadlock information so we could be proactive and take action to prevent the deadlocks from reoccurring. In SQL Server 2008 we were given Extended Events and could leverage the default system_health Extended Events session, which is automatically running in the background, to identify deadlocks. This is a great addition because we don’t have to manually configure anything to catch the deadlocks. In SQL Server 2012 this got easier because we could rely on the graphical user interface inside of SQL Server Management Studio to filter our system_health Extended Events session and focus straight on our deadlocks.
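
If you want to pull those captured deadlocks out with a query instead of the SSMS user interface, a commonly used pattern is to read the xml_deadlock_report events from the system_health session’s ring buffer. This is a sketch of that approach, not something taken from the video:

-- Pull deadlock graphs captured by the always-on system_health Extended Events session.
SELECT XEvent.value('(@timestamp)[1]', 'datetime2') AS capture_time,
       XEvent.query('(data/value/deadlock)[1]')     AS deadlock_graph
FROM (
        SELECT CAST(st.target_data AS XML) AS target_data
        FROM sys.dm_xe_sessions AS s
        JOIN sys.dm_xe_session_targets AS st
             ON s.address = st.event_session_address
        WHERE s.name = 'system_health'
          AND st.target_name = 'ring_buffer'
     ) AS tab
CROSS APPLY target_data.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS q(XEvent);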

In the video below you will be able to create deadlocks and be able to proactively monitor them with the native tools provided with SQL Server Management Studio in SQL Server 2012 or 2014.

Are your servers running at optimal speed or are you facing any SQL Server Performance Problems? If you want to get started with the help of experts read more over here: Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Converting Standard SSMS File Results to Delimited – Notes from the Field #085

[Note from Pinal]: This is the 85th episode of the Notes from the Field series. SSMS is my favorite IDE. I work with many different databases and different IDEs. However, I get extreme comfort when I am working with SSMS. Though I like SSMS, that does not mean it is the best tool or that it can accomplish every task I want it to do. Here is one such task – converting file results to a comma-delimited result. However, my friend Kevin came to the rescue. He helped me out in this situation.

Let us read how we can achieve this task of converting standard SSMS file results to a delimited result set.



Now and again, clients send me files to load into the database that they’ve generated using the Results To File feature in SQL Server Management Studio (SSMS). If you’re unfamiliar with this feature, let me explain it. As shown in Figure 1, after right-clicking in a query window in SSMS, you may select the Results to File option from the context menu. With this option enabled, the next time the query is run, a dialog box will appear allowing you to choose the generated file’s location.


Figure 1 – An easy way to send the results of a query in SSMS to a file.

By default, the file that’s emitted will be column-aligned otherwise known as fixed-width columns. The first row will contain the column names and a second row will have groups of hyphen characters acting as a sort of text-based underscoring. At the end of the file, a count of the emitted rows will also be included. All of these features can be seen in Figure 2, where I’ve hidden the middle rows so you can see the beginning and the end of the file.


Figure 2 – A standard, column-aligned result file with trailing row count.

When my clients ask me to load these types of files using SQL Server Integration Services (SSIS), there are a few problems to overcome. First of all, fixed-width data is not simple to load. The column specification of the flat-file source has to be updated manually to include the exact width of each column which can be time-consuming and prone to errors. Secondly, the row count at the end of the file isn’t data that can be loaded so it will cause an exception or load incorrectly, forcing us to add special handling logic in the package for that. Lastly, the literal word NULL appearing throughout the file whenever there is a missing value doesn’t really convey the absence of data. Those will also have to be handled with custom code.

To address the first problem, I sometimes ask my clients to re-run their queries setting the output format to comma or tab delimited. In SSMS, this can be done by right-clicking in the query window, selecting the Query Options… item from the context menu and choosing a different Output format in the Results / Text section of the dialog box. The second problem can be solved by adding the SET NOCOUNT ON directive to the start of the query that generated the file. I may also ask my clients to include that change. The third problem, where SSMS emits literal NULL strings whenever values are missing, can also be handled with the use of ISNULL or COALESCE to do NULL detection in the original query. This is yet one more change I must ask my clients to make.
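
For reference, the second and third of those requested changes look something like the sketch below in the generating query; the table and column names here are placeholders for illustration, not the client’s actual files:

SET NOCOUNT ON;  -- suppresses the trailing "(n row(s) affected)" line in the output file

SELECT OrderID,
       ISNULL(ShipRegion, '') AS ShipRegion   -- emit an empty string instead of the literal word NULL
FROM dbo.Orders;                              -- hypothetical table for illustration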

As an experienced consultant, I understand that asking my clients to make so many modifications to their workflow is often not worth their time and trouble. That’s what they hire me for. It’s best to find ways to make whatever my clients provide work for them, especially if it can be automated. To that end, I wrote a C# function called ConvertColumnAlignedFileToDelimitedFile that helps to overcome all of these problems.


public static int ConvertColumnAlignedFileToDelimitedFile(
    string sourcePath, string targetPath, string delimiter = "\t")
{
    int lineNdx = 0;
    using (StreamWriter writer = File.CreateText(targetPath))
    {
        string header = null;
        int[] columnWidths = null;
        foreach (string line in File.ReadLines(sourcePath, Encoding.UTF8))
        {
            if (lineNdx == 0)
                header = line; // save the header for subsequent processing
            else if (lineNdx == 1)
                columnWidths = ProcessSeparatorRow(line, header,
                    delimiter, writer);
            else
            {
                // stop processing on an empty line
                if (line.Length == 0) break;
                ProcessDataRow(line, columnWidths, delimiter, writer);
            }
            lineNdx++;
        }
    }
    return lineNdx - 2;
}

Figure 3 – A function that converts a column-aligned file to a delimited file.

The function takes the name of a source file, a target file and the delimiter that will be inserted between values. There’s no magic here but there is an interesting trick that takes advantage of some metadata lurking in the column-aligned output file. Look at Figure 2 again. That second row containing hyphen characters actually uncovers a key constraint that we otherwise wouldn’t know: the maximum length of each column. Each block of hyphens is separated by a space so if we count the length of each hyphen group, we’ll know how to parse the entire file. I’ve provided a helper function called ProcessSeparatorRow that reads the metadata buried in the hyphen groups, writes out the header row and returns the column widths to the calling function.


private static int[] ProcessSeparatorRow(string line,
    string header, string delimiter, StreamWriter writer)
{
    string[] columns = line.Split(' ');
    int[] columnWidths = new int[columns.Length];
    for (int columnNdx = 0, startPosition = 0;
        columnNdx < columnWidths.Length; columnNdx++)
    {
        columnWidths[columnNdx] = columns[columnNdx].Length;
        int length =
            (startPosition + columnWidths[columnNdx] <= header.Length)
                ? columnWidths[columnNdx]
                : header.Length - startPosition;
        string name = header.Substring(startPosition, length).Trim();
        startPosition += columnWidths[columnNdx] + 1;
        writer.Write(name);
        writer.Write((columnNdx < columnWidths.Length - 1)
            ? delimiter : Environment.NewLine);
    }
    return columnWidths;
}

Figure 4 – A function that processes the metadata in the separator row.

For the data rows beyond the header and the separator, when an empty line is encountered, the processing just stops. That will handle the trailing row count problem discussed earlier. Writing out the non-empty rows in the remainder of the file is straightforward with the helper function called ProcessDataRow shown in Figure 5.


private static void ProcessDataRow(string line,
    int[] columnWidths, string delimiter, StreamWriter writer)
{
    for (int columnNdx = 0, startPosition = 0;
        columnNdx < columnWidths.Length; columnNdx++)
    {
        int length =
            (startPosition + columnWidths[columnNdx] <= line.Length)
                ? columnWidths[columnNdx]
                : line.Length - startPosition;
        string value = line.Substring(startPosition, length).Trim();
        if (value.Equals("NULL", StringComparison.InvariantCultureIgnoreCase))
            value = String.Empty;
        startPosition += columnWidths[columnNdx] + 1;
        writer.Write(value);
        writer.Write((columnNdx < columnWidths.Length - 1)
            ? delimiter : Environment.NewLine);
    }
}

Figure 5 – Process a normal row of data by writing it out using the specified delimiter.

Running the conversion function with the output.rpt sample file shown in Figure 2 produces the tab-delimited file shown in Figure 6. The tab characters appear as orange-colored arrows in the output. Notice also that wherever the word NULL appeared in the original file, it has been converted to an empty string in the output. A couple of lines of C# code in the ProcessDataRow function handled that quite nicely.


Figure 6 – The converted file with tab delimiters and detected NULLs.

In closing, let me say that while the C# code shown in this article is somewhat stand-alone, it can easily be put into a Script Task or a Script Component in SSIS. I often use code like this within SSIS to do file preparation or validation before loading the data into staging tables. However, having a command-line utility to convert SSMS results files into delimited files with NULL detection is also quite handy. I’ll even admit that I’ve invoked such a utility as a job step in SQL Agent from time to time. Remember: the job of a good consultant is to make the difficult seem both easy and repeatable. Hopefully, you’ll find this code useful in a variety of situations, too.
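
For what it’s worth, wiring a compiled version of the converter into SQL Agent can be as simple as a CmdExec job step; everything below (the job name, executable path, and file names) is hypothetical and only meant to show the shape of it:

EXEC msdb.dbo.sp_add_jobstep
     @job_name  = N'Load Client Files',                                   -- assumed existing job
     @step_name = N'Convert column-aligned file to tab-delimited',
     @subsystem = N'CmdExec',
     @command   = N'C:\Tools\AlignedToDelimited.exe "D:\Inbox\output.rpt" "D:\Inbox\output.txt"';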

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – The Basics of the Execute Process Task – Notes from the Field #084

[Note from Pinal]: This is a new episode of the Notes from the Field series. Every time I give an introductory note; however, this time there is no need for an intro note. This note is from Andy, and as we all know, he is an amazing person when we have to understand the fundamentals. He has written this blog post in such an interesting way that you must read it to understand the very basics of the Execute Process Task.



Many data integration scenarios involve executing some other process, whether starting a custom application or performing an operating system operation.

Remember: SSIS is a software development platform. With “SQL Server” included in the name, it is easy for people to mistake SSIS for a database tool or accessory, but Control Flow Tasks put that confusion to rest.

SSIS provides several Control Flow tasks. Here is a list that provides a good approximation of which tasks I use most, from most-used to least-used:

In this article I provide an example of configuring the SSIS Execute Process Task, shown in Figure 1:

Figure 1: SSIS Execute Process Task

As with the File System Task, the Execute Process Task provides yet another way to implement an SSIS Design Pattern for source file archival via file compression. When you first open the Execute Process Task Editor, you will note several properties in the property grid, as shown in Figure 2:

Figure 2: SSIS Execute Process Task Properties

An important property is the Executable property which holds the path to the application or process you wish to start with the Execute Process Task. In this case, I am going to start the 7-Zip command line executable to zip a data file. 7-Zip is a free file compression utility, and the command line utility is pretty cool. On my system, the 7-Zip command line utility is located at “C:\Program Files\7-Zip\7z.exe” so I configure the Executable property of the Execute Process Task as shown in Figure 3:

Figure 3: The Execute Process Task Editor with the Executable Property Configured

The Arguments property allows me to specify command line arguments to the executable. For 7-Zip, the “a” argument informs the application that I want to add files to a compressed file. The text following the “a” argument specifies the name of the compressed file. The argument that follows the name of the compressed file configures the file (or files) to add. My arguments property reads:

a E:\Projects\7Zip\data\archive1.7z E:\Projects\7Zip\data\test.csv

These arguments tell the 7z.exe executable that I want to add the E:\Projects\7Zip\data\test.csv file to a compressed file named E:\Projects\7Zip\data\archive1.7z, as shown in Figure 4:

Figure 4: The Execute Process Task Editor with the Arguments Property Configured

I can configure other Execute Process Task properties. For example, I choose to hide the command line window for 7-Zip when it executes. To do so, I set the WindowStyle property to Hidden, as shown in Figure 5:

Figure 5: The Execute Process Task Editor’s WindowStyle Property

The SSIS Execute Process Task is now configured to compress a file. Let’s test it! Click the OK button to close the Execute Process Task Editor. Press the F5 key or select SSIS->Start Debugging to test your work. My result is shown in Figure 6:

Figure 6: Successful Test Execution of the SSIS Execute Process Task

Viewing the source and destination directories, we see the file was successfully moved – shown in Figure 7:

Figure 7: The File, Compressed!

As I stated earlier, the SSIS Execute Process Task is powerful, flexible, and robust. This article has demonstrated another way you can use the Execute Process Task to compress files for archiving. Archiving files after loading the data they contain is a common practice in data integration.

If you want to get started with SSIS with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Basic Statistics Maintenance – Notes from the Field #083

[Note from Pinal]: This is the 83rd episode of the Notes from the Field series. Maintenance of the database is a very critical activity, and I have always seen DBAs taking it very seriously. There is only one problem – there is no single solution or guidance for how to maintain the database. Everybody has their own opinion and way to do different tasks. Statistics are one of the most important aspects of the database. The performance of the entire application can depend on statistics, as they give the SQL engine the intelligence to execute an optimal plan.

In this episode of the Notes from the Field series database expert Stuart Ainsworth explains about basic statistics maintenance.



Statistics maintenance is an important (but often overlooked) aspect of performance tuning for SQL Server.  The query optimizer relies on distribution statistics to determine how the query will be executed, with a particular emphasis on SEEKs vs SCANs as well as estimates of the effort needed (the cost threshold for parallelism).  Out-of-date stats can impact performance significantly.

Luckily, the default setting for most databases covers most database performance scenarios; SQL Server offers three basic settings for statistics maintenance:

  • Auto Create Statistics – SQL Server will create statistics during an index creation script, or when deemed necessary to satisfy a query; enabled by default.
  • Auto Update Statistics – SQL Server will update statistics when it deems them to be outdated; enabled by default.
  • Auto Update Statistics Asynchronously – When enabled, SQL Server will update statistics after the execution of a query if it determines that an update is necessary; if disabled, the statistics will be updated first, and then the query executed. This setting is disabled by default, and there’s some controversy over whether or not it’s useful.


For the most part, SQL Server does a good job of maintaining statistics if these options are left with the defaults; however, statistics can still become stale over time if the data is updated at a slow rate of change.  If your statistics are more than a year old in your database, it’s probably time to do a refresh.

But how do you tell when your stats are out of date? There’s a catalog view in SQL Server called sys.stats that will give you a lot of information about statistics, but it’s very detailed; data is collected down to the column level, which may be overwhelming if you have lots of databases.  I suggest starting at a higher level, by taking a look at how out of date statistics are across all of your databases.   I use the following query to help me quickly identify troublesome databases:

/* Checks the last updated stats date for all databases and calculates the percentage of stats
   updated within the last 24 hours; useful to determine how out-of-date statistics are.
   Also identifies whether auto update stats is on and whether asynchronous updates are enabled. */
CREATE TABLE #dbs
(
      database_name VARCHAR(100)
    , oldest_stats DATETIME
    , newest_stats DATETIME
    , percent_update_within_last_day DECIMAL(5, 2)
    , is_auto_update_stats_on BIT
    , is_auto_update_stats_async_on BIT
);

DECLARE @dynsql NVARCHAR(4000) = 'USE ?;
IF DB_ID() > 4
BEGIN
    WITH x AS ( SELECT STATS_DATE(s.object_id, stats_id) AS date_updated
                FROM   sys.stats s
                JOIN   sys.tables t ON t.object_id = s.object_id
                WHERE  t.is_ms_shipped = 0 ),
         x1 AS ( SELECT MIN(date_updated) AS oldest_stats
                      , MAX(date_updated) AS newest_stats
                 FROM   x )
    SELECT DB_NAME() AS database_name
         , oldest_stats
         , newest_stats
         , SUM(CASE WHEN DATEDIFF(d, date_updated, newest_stats) <= 1 THEN 100.00 ELSE 0.00 END) / COUNT(*) AS percent_update_within_last_day
         , d.is_auto_update_stats_on
         , d.is_auto_update_stats_async_on
    FROM   x
    CROSS JOIN x1
    CROSS JOIN sys.databases d
    WHERE  d.database_id = DB_ID()
    GROUP BY oldest_stats
           , newest_stats
           , d.is_auto_update_stats_on
           , d.is_auto_update_stats_async_on
END';

INSERT INTO #dbs
    ( database_name
    , oldest_stats
    , newest_stats
    , percent_update_within_last_day
    , is_auto_update_stats_on
    , is_auto_update_stats_async_on )
EXEC sp_MSforeachdb @dynsql;

SELECT *
FROM   #dbs d;

DROP TABLE #dbs;

Results will look similar to the following:


Looking at the results, if I see that the oldest database stats are more than a year old, it’s a pretty good indicator that statistics are not being maintained by some form of ongoing maintenance operation.  If the defaults are not being used, that’s also something that needs to be investigated.

The percent_update_within_last_day is also a good trigger for me to investigate a potential issue; if less than 20% of the statistics in a database were updated in the last 24 hours, the odds of a less-than-optimal execution plan increase significantly.  If the tables in the database are large, it may take several million rows of changed data to trigger a refresh.
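
If you want to drill into a specific table rather than look at a whole database, sys.dm_db_stats_properties (available from SQL Server 2008 R2 SP2 and 2012 SP1 onward) exposes the modification counter behind that trigger threshold; the table name below is only an example:

SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter   -- rows changed since the statistics were last updated
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('Sales.SalesOrderHeader');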

To refresh outdated statistics, I recommend Ola Hallengren’s maintenance scripts; they offer a lot of flexibility for developing a customized index and statistics maintenance plan, including the ability to update statistics on a scheduled basis, and focus on updating statistics that need to be refreshed.  Basic maintenance (such as this) can help prevent performance problems from occurring.
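
Ola’s scripts are the more complete answer, but if you just need to refresh a stale table or database once while you put proper maintenance in place, a minimal manual refresh looks like this (the table name is again just an example):

-- Refresh every statistics object on one table with a full scan of the data.
UPDATE STATISTICS Sales.SalesOrderHeader WITH FULLSCAN;

-- Or refresh statistics across the whole current database using the built-in procedure,
-- which uses default sampling and only touches statistics with row modifications.
EXEC sp_updatestats;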

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – What are T-SQL Window Functions? – Notes from the Field #082

[Note from Pinal]: In this episode of the Notes from the Field series, database expert Kathi Kellenberger explains an easy and amazing set of SQL Server functions. Kathi is an amazing instructor; she was the SQL author I read early in my career. One reason I love SQL Server is that her writing instilled a love for this technology in me. Today she brings a relatively interesting topic for database experts. Read the experience of Kathi in her own words.


The first set of T-SQL window functions has been available for 10 years, but many SQL Server professionals are not quite sure what window functions are. I am frequently asked by people I meet at conferences and customer sites, “What are window functions?” I always answer with another question, “Do you use ROW_NUMBER?” Almost every time, the answer is “Yes. I didn’t know that was a window function.”

T-SQL window functions are different from other T-SQL functions because they operate over a set, or window, of rows. Notice that I am not capitalizing “window.” They have nothing to do with the Windows operating system. They are part of the ANSI SQL 2003 standard.

T-SQL window functions give you the ability to perform a calculation over the rows that are the result of the FROM, WHERE, GROUP BY and HAVING clauses. You can do things like include a column from a different row, calculate running totals, and perform summary calculations without losing details or grouping.
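
For instance, a running total, one of the capabilities mentioned above, looks like the sketch below; it reuses the #Sales table that is created a little later in this article, so run that script first if you want to try it:

SELECT SalesYear, SalesQtr, SalesAmt,
       SUM(SalesAmt) OVER(PARTITION BY SalesYear
                          ORDER BY SalesQtr
                          ROWS UNBOUNDED PRECEDING) AS RunningTotalForYear
FROM #Sales;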

My favorite T-SQL window function is called LAG. LAG, introduced with SQL Server 2012, lets you bring in a column from a previous row. This is useful for calculating year over year performance, for example.

Here is the syntax of the LAG function:

LAG(<expression>[,offset][,default]) OVER([PARTITION BY <expression>] ORDER BY <expression>)

You must provide the column name that you wish to bring forward, and you must provide an ORDER BY expression. The ORDER BY is used to determine just which row is the previous row. By default, the value returned is from the previous row. By specifying an offset, you can go back any number of rows. You can also specify a default value to replace any NULLs that are returned.

Run the following script to create and populate a table for testing LAG.

CREATE TABLE #Sales (
    SalesYear INT NOT NULL,
    SalesQtr INT NOT NULL,
    SalesAmt MONEY
    CONSTRAINT PK_Sales PRIMARY KEY(SalesYear, SalesQtr));

INSERT INTO #Sales (SalesYear, SalesQtr, SalesAmt)
VALUES (2009,1,1400),(2009,2,2200),(2009,3,2800),(2009,4,1000),
       (2010,1,1200),(2010,2,2300),(2010,3,2850),(2010,4,900),
       (2011,1,1550),(2011,2,2600),(2011,3,2900),(2011,4,1100),
       (2012,1,1500),(2012,2,2500),(2012,3,2800),(2012,4,1000),
       (2013,1,1300),(2013,2,2800),(2013,3,2750),(2013,4,900),
       (2014,1,1550),(2014,2,2700),(2014,3,2700),(2014,4,1200);

The following query uses the LAG function with an offset value of four to return the sales from the same quarter of the previous year.

SELECT SalesYear, SalesQtr, SalesAmt,
LAG(SalesAmt,4) OVER(ORDER BY SalesYear, SalesQtr) AS PrevYearQtrSales
FROM #Sales;


Figure 1: The partial results of using LAG
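
The optional third argument of LAG mentioned earlier replaces those NULLs in the 2009 rows; a small variation of the same query shows it (this variant is mine, not part of the original example):

SELECT SalesYear, SalesQtr, SalesAmt,
       LAG(SalesAmt, 4, 0) OVER(ORDER BY SalesYear, SalesQtr) AS PrevYearQtrSales
FROM #Sales;
-- The 2009 quarters now show 0 instead of NULL because there is no prior year to look back to.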

To calculate year-over-year sales, you subtract the previous year’s sales from the current sales and divide by the previous sales. To make the calculation a bit simpler, I’ll add the query to a CTE and perform the calculation in the outer query.

WITH Sales AS (
SELECT SalesYear, SalesQtr, SalesAmt,
LAG(SalesAmt,4) OVER(ORDER BY SalesYear, SalesQtr) AS PrevYearSales
FROM #Sales)
SELECT SalesYear, SalesQtr, SalesAmt, Sales.PrevYearSales,
FORMAT((SalesAmt - PrevYearSales)/PrevYearSales,'P') AS YOY
FROM Sales;


Figure 2: Using LAG for year-over-year calculation

The LAG function is really easy to use, and it performs great. If you would like to learn more about LAG and all of the other T-SQL window functions, be sure to check out my new book: Expert T-SQL Window Functions in SQL Server.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)