SQL SERVER – Do You Know Your Data’s Classification? – Notes from the Field #050

[Notes from Pinal]: Data is a very simple word, yet it is very powerful. There is a famous saying – Know Your Data. I have quite often found that developers do not know their data; they are often confused about it and not sure how to answer questions about it. My friend Tim Radney is an amazing person who usually has an answer to each of those questions that look simple but are complicated in reality.

Linchpin People are database coaches and wellness experts for a data-driven world. In this 50th episode of the Notes from the Field series, database expert Tim Radney (partner at Linchpin People) explains data classification in very simple words. Please follow Tim on his Twitter handle at @tradney.


Do you know your data’s classification?

As data professionals, we have many responsibilities. We could be responsible for backing up and restoring data, writing reports, building queries, writing stored procedures, tuning workloads, or any of a vast number of other tasks.

If you are in the business of granting access to the data either through reports/queries or provisioning login access, you should be aware of the type of data you are granting access to. Your company likely has policies in place that should guide how access to certain classifications of data should be handled.

Some of the more common types of data that your company would have stricter controls over would be related to PII, PCI, SOX, GLBA, or HIPAA. You should know what the guidelines are within your company for access to this data and help make sure that those standards are being upheld.

These data assets may require additional auditing on who has access to view, print, export, etc. When the data is viewed, it may require watermarks or headers/footers on any reports.

Your organization may require that certain types of data, such as PCI data, be audited so that you know who is viewing particular elements of that data.
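
If your edition supports SQL Server Audit, one way to meet that kind of requirement is a database audit specification that records SELECT activity against a sensitive table. The sketch below is only a minimal illustration – the audit name, file path, database, and table are all invented for the example:

  -- Server-level audit target (the file path is an assumption)
  USE master;
  CREATE SERVER AUDIT PCI_Access_Audit
      TO FILE (FILEPATH = N'D:\SQLAudit\');
  ALTER SERVER AUDIT PCI_Access_Audit WITH (STATE = ON);

  -- Database-level specification: log every SELECT against the cardholder table
  USE SalesDB;  -- hypothetical database
  CREATE DATABASE AUDIT SPECIFICATION PCI_CardData_Select
      FOR SERVER AUDIT PCI_Access_Audit
      ADD (SELECT ON OBJECT::dbo.CardholderData BY public)
      WITH (STATE = ON);

The audit output can then be read back with sys.fn_get_audit_file to answer the question of who viewed what, and when.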

Worse, your organization may need to be doing these things but is not. If not, you should be asking why and helping to implement a data governance plan.

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Note: Tim has also written an excellent book on SQL Backup and Recovery, a must have for everyone.

Reference: Pinal Dave (http://blog.sqlauthority.com)


SQL SERVER – Tuning Queries is Sometimes a Waste of Time – Notes from the Field #049

[Note from Pinal]: This is the 49th episode of the Notes from the Field series. Every day I get a few emails asking how to tune queries so the entire server runs faster. I always end up writing a long answer to this question. The reason is simple – query tuning is indeed the need of the hour, but it is not everything. There are many more things one should do along with tuning queries. I asked the same question to Brian Moran, who deals every day with consultants and various organizations facing issues with SQL Server.

In this episode of the Notes from the Field series, database expert Brian Moran explains how query tuning is sometimes a waste of time when we are facing issues with performance and scalability. Read the experience of Brian in his own words.


Tuning queries is sometimes a waste of time when you’re trying to fix performance and scalability problems. That sounds crazy but it’s true. Designing an efficient workload is almost always more important than maintaining a laser focus on query tuning and other technical tricks of the DBA trade.

Here’s an example that helps to explain what I mean.

Simple Example

What’s the shortest distance between two points? A straight line, of course. Everyone knows that. Imagine that you need to calculate the most efficient path from Washington DC to New York. Imagine those cities on a traditional 3D globe. The straight line is simple to figure out, right? Just head north as the crow flies.

But what if you wander about and take a curvy path as you head north? DC to New York is about 204 miles as the crow flies, but Google says the distance is 226 miles if you take I95 and the New Jersey Turnpike. Google also presents some reasonable travel options that are as far as 264 miles. It’s easy to imagine you could increase the distance even more by taking scenic roads and side trips along the way. Of course, you can also draw a straight line on a globe between DC and New York that heads in a southerly direction, all the way around the world, which is over 24,000 miles. That’s over 100 times more expensive than the straight route northwards.

Now, let me map that mapping example back to database performance tuning. Sometimes database performance tuning is an exercise in making the workload more efficient, just as you might want to make your route to New York more efficient. To make the database workload more efficient, you can tune queries, add indexes, and do all sorts of other neat tricks. Query tuning is like an exercise in flattening the northerly path from DC to New York to make as straight a line as possible. You want to tune your queries so that they’re as direct as possible and don’t veer off into unnecessary detours.

But, what if you’re talking about trying to flatten a squiggly line from DC to New York that starts out by heading south 180 degrees, the wrong way?  You can make that line as straight and as efficient as you want.  But heading south from DC to get to New York is never going to be efficient no matter how straight of a line you draw. A route that goes the wrong way can be optimized, but it’s still going the wrong way.

Inefficient Workload

The same idea applies to queries. If you start out tuning an inefficient workload, you’ll end up with an inefficient workload that is tuned, but it is still inefficient.

This seems like a silly example. Everyone intuitively knows that the southerly route is inherently a poor choice. You can tune the route as much as you want, but in the best-case scenario it’s still going to be over 100X worse than going north.

Oddly enough, 25 years of database consulting tells me many DBAs will spend most of their time tuning queries that are the database equivalent of trying to make that southerly line from DC to New York as straight as possible. They’ll spend days, weeks, or months trying to shave that last 10%.

All too often, DBAs fail to take a moment to pause, understand the nature of the problem they are trying to solve, and try to envision a more efficient way to solve the problem.  These DBAs don’t recognize that changing the workload can be vastly better than tuning the workload.

Two Classic Examples

Here are two classic examples that most DBAs intuitively understand. Say you have a business need defined in a way that requires a drop-down list box to be populated with 100,000 rows that users will have to scroll through. You can try to tune the process to make it more efficient at loading 100,000 rows every time the screen is touched. But that tuning does not change the workload.

Rather than simply tuning the existing workload, a better investment of time would be to help the designers of that screen understand and accept a solution that doesn’t require loading 100,000 rows. Or perhaps you have a system that makes heavy use of database cursors in a procedural manner of some kind. Most DBAs know that architecting a set-based solution will almost always be better than trying to make the cursor-based approach as fast as possible.
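
To make that concrete, here is a deliberately simplified sketch (the table and column names are invented) of the same piece of work done row by row with a cursor and then as a single set-based statement:

  -- Row-by-row: one UPDATE per order
  DECLARE @OrderID int;
  DECLARE order_cur CURSOR LOCAL FAST_FORWARD FOR
      SELECT OrderID FROM dbo.Orders WHERE ShippedDate IS NULL;
  OPEN order_cur;
  FETCH NEXT FROM order_cur INTO @OrderID;
  WHILE @@FETCH_STATUS = 0
  BEGIN
      UPDATE dbo.Orders SET Status = N'Backordered' WHERE OrderID = @OrderID;
      FETCH NEXT FROM order_cur INTO @OrderID;
  END
  CLOSE order_cur;
  DEALLOCATE order_cur;

  -- Set-based: the same work in one statement
  UPDATE dbo.Orders
  SET    Status = N'Backordered'
  WHERE  ShippedDate IS NULL;

You can tune the loop all you like – cursor options, indexes to support the per-row lookups – but the second statement changes the workload instead of polishing it.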

Here’s a 15-year-old example that’s still a common design pattern mistake made today. I was working on an office supply site when e-commerce was first becoming big business. The application was built in a very object-oriented manner. Developers love object-oriented approaches, but databases aren’t always as fond of the approach. In this case this object-oriented design pattern led to 1000 and sometimes 2000 or more round trips to SQL Server when a single user searched for office chairs, and the search brought back just one screen of results.

This was a problem because the system needed to support hundreds or thousands of concurrent users. The existing workload might therefore need to handle hundreds of thousands or millions of batch requests per second. The client wanted to do this on a single SQL Server instance since that’s all their budget could afford.
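
Stripped way down, the shape of that kind of problem looks something like the sketch below (the object names and the exact pattern are assumptions for illustration): the application fetched a list of product IDs and then made a separate round trip for each product’s details, when one joined query could have returned the whole screen of results.

  -- Chatty pattern: one query for the IDs, then one or more queries per product (N+1 round trips)
  SELECT ProductID FROM dbo.Products WHERE Category = N'Office Chairs';

  DECLARE @ProductID int = 42;  -- in the real pattern, the application supplied this once per row
  SELECT Name, Price, StockLevel FROM dbo.Products      WHERE ProductID = @ProductID;
  SELECT ImageUrl               FROM dbo.ProductImages  WHERE ProductID = @ProductID;

  -- Changed workload: one round trip returns the whole screen
  SELECT p.ProductID, p.Name, p.Price, p.StockLevel, i.ImageUrl
  FROM   dbo.Products AS p
  LEFT JOIN dbo.ProductImages AS i ON i.ProductID = p.ProductID
  WHERE  p.Category = N'Office Chairs';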

Summary

I could have spent an infinite amount of time tuning the queries used in this approach, but the workload was never going to be efficient. I was never going to be able to handle hundreds of thousands of batch requests per second on a single instance of SQL Server 2000. Instead, I made some changes to the architecture. In other words, I changed the workload, and I was able to achieve the performance goals I had.

The examples I used here are pretty basic, and most DBAs today are aware of how to deal with such situations. But I’m willing to bet that many people reading this post are struggling with a performance problem where tuning the queries is a poor use of time compared to completely changing the essence of the workload. It’s too easy to lose sight of the forest for the trees.

Having a hard time figuring out how to tune your queries to achieve performance goals? Maybe you should take a step back and focus on changes you can make to the workload instead?

If you want to get started with performance tuning and database security with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Building a Technical Reference Library – Notes from the Field #048

[Note from Pinal]: This is the 48th episode of the Notes from the Field series. How do you build a technical reference library? In other words, when you need help, how do you create your own reference so you do not have to go out and look for that help all over again? There are so many little tips and tricks one should know, and Brian Kelley has amazing skills for explaining this simple concept in easy words.

In this episode of the Notes from the Field series, database expert Brian Kelley explains how to build your own technical reference library. Read the experience of Brian in his own words.


Do you have a technical reference library? If you’re not sure what I mean, a technical reference library is your collection of notes, code, configuration options, bugs you’ve hit that you think you’ll hit again, and anything else that you might need to retrieve again in the future related to what you do in IT. If you have a technical reference library (hereafter referred to as TRL), is it:

  • outside of email?
  • distributed across multiple locations/computers?
  • searchable?
  • fast?

With my TRL, I’m more efficient because I’m not searching the Internet again and again for the same information. I also can ensure I handle strange cases, such as unusual configurations, which we seem to get a lot of in IT. It’s in my TRL, so I don’t have to go back through a vendor’s install document or go run someone down in the organization to get the information I need. I already have it if I put it in my TRL. Because of the efficiency that TRLs provide, most top-performing IT professionals that I know have some sort of system.

Outside of Email:

I used to have a folder in email where I kept technical reference documents. Because I try to follow Inbox Zero, I do have a Reference folder, but it’s not for technical documents. My Reference folder is typically related to what that mailbox is for. For instance, my LP Reference folder is for keeping procedures related to Linchpin such as how/where to enter time, who to contact about various things, etc.

Why don’t I have my technical documents in email any longer? Let me ask a question in response to that question: What happens when email is down? When email is down, you have no access to your TRL. Email does go down. I was faced with a case where I was responsible for getting email back up and, you guessed it, my technical notes were in email. That doesn’t work.

A second question to ask: How searchable is your TRL if it’s in email?  If you keep a lot of email, especially if you don’t have a specific folder for your TRL, searching may prove to be painful. That was the other problem I started to face.

Given these two issues, I advise building your TRL outside of email.

Distributed:

If your TRL is only on a single computer, you’re going to regret it someday. That day usually occurs when the computer in question crashes and all your notes are lost. If you have a backup, anything you put into the library after the backup is gone. Given the prevalence of cloud-based solutions nowadays, having a distributed technical reference library is easy. Here are some ideas:

  • Evernote
  • Microsoft OneNote
  • Microsoft SkyDrive
  • DropBox
  • Google Docs
  • Apple iCloud

I’m partial to the first two, Evernote and OneNote, because they aren’t simply “file systems.” They are designed to capture and catalog information for quick retrieval later.

All my examples will come from Evernote, because that’s the application I typically use. In fact, here’s my setup. I have a specific notebook for my TRL:

TRL Notebook

If I know exactly what I’m looking for or if I’ve added it recently, I should be able to find any note quickly in the list of notes for the notebook:

Note: SQL 2012 Slipstream

Searchable (and Fast!):

Even if what I’m looking for isn’t right there at the top of the list, I can search in Evernote (and OneNote, if I was using it) to quickly locate the document. For instance, by typing “Slipstream,” I quickly get to the article that I want:

Search of TRL

Products like Evernote and OneNote have specifically worked on search in order to retrieve results quickly. They also provide options to search within a notebook, for instance. In my case here, since slipstream is such a specialized term compared with what else is in my Evernote notebooks, I didn’t feel the need to filter by notebook. However, I could have if I received a lot of hits back or if the search was taking too long.

Also note that I’ve not added any tags to this article. I’m finding it using a text search of the contents alone. The use of tags offers another option for locating the material I need quickly. If I had a lot of articles that came up for a particular search word or phrase, I could look at using tags to differentiate them better. It’s another reason to consider tools designed to hold information and make it quickly retrievable.

Build a System That Works for You:

Learning expert Cynthia Tobias was once helping a teacher who asked her students to keep a reference notebook for assignments and handouts in class, an academic version of the TRL I’ve described thus far. The teacher balked at one student’s notebook because it was messy. The teacher couldn’t imagine how the student could locate anything in the notebook and was going to give the student a poor score. Tobias asked the teacher, “What’s the point?” The point, the teacher indicated, was to be able to retrieve an assignment or handout quickly. Tobias challenged the teacher to check to see if the student could retrieve quickly (within a minute, for instance). If the student could, the teacher should leave the student alone. If the student couldn’t, then work with the student to improve the reference system.

That’s what you want to do. You want to develop a reference system that’s efficient for you. I’ve given you a snapshot of what works for me. It may not work for you. That’s okay. Start with something. If you’re starting from scratch, I would recommend starting with Evernote or OneNote. Put some notes in that you’ll need again. See how well you can retrieve those notes, especially as the number of notes increases. Make tweaks as you have to for performance’s sake. Most of all, build your TRL and become a better professional.

If you want to get started with performance tuning and database security with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Database Taking Long Time to Create – Notes from the Field #047

[Notes from Pinal]: There are a few things which confuse us when we encounter them for the first time; however, there are also a few concepts which confuse us even though we face them multiple times. One such subject is a database taking a long time to create. Think about it: if you see a database which takes a long time to create, your natural reaction will be to wonder how long various other operations will take with the same database. I asked Tim if he could write further on this topic and help people understand this complex subject in simple words.

Linchpin People are database coaches and wellness experts for a data-driven world. In this 47th episode of the Notes from the Field series, database expert Tim Radney (partner at Linchpin People) explains in very simple words why a database can take a long time to create. Please follow Tim on his Twitter handle at @tradney.


I recently came across a support question on a forum where the user was very concerned about the health of their environment because it was taking over 10 minutes to create a 150GB database. When I was reading the post from the user two things immediately came to mind. First was that they could be creating the data file on a slow IO subsystem and second that they probably do not have Instant File Initialization turned on.

When data and log files are created in SQL Server, they are initialized to overwrite any existing data on the disk. This is done by filling the files with zeros, a process commonly referred to as zeroing out the file.

In SQL Server there is a setting that allows for instant initialization of the data file. This feature does have a security consideration that must be understood; however, for most organizations it is not an issue. Basically, by not writing over the free space with zeros, you could potentially allow an unauthorized user to read any data previously written to the disk.

Instant file initialization is only available if the SQL Server (MSSQLSERVER) service account has been granted SE_MANAGE_VOLUME_NAME. Members of the Windows Administrator group have this right and can grant it to other users by adding them to the Perform Volume Maintenance Tasks security policy. For more information about assigning user rights, see the Windows documentation. (Reference)

In the case of the user on the forum, they were not using Instant File Initialization and decided to enable it. The user then created another database of the same size, and it was created in seconds versus over 10 minutes for the same operation before.
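
If you want to verify whether instant file initialization is actually in effect on your instance, one common check – only a sketch here, with a made-up database name and file paths – is to turn on trace flags 3004 and 3605, create a throwaway database, and look for "Zeroing" messages in the error log. If the data file shows up as being zeroed, the feature is not enabled; the log file is always zero-initialized, so it will appear either way.

  DBCC TRACEON (3004, 3605, -1);   -- write file-zeroing details to the error log

  CREATE DATABASE IFI_Test
  ON PRIMARY (NAME = IFI_Test_Data, FILENAME = N'D:\Data\IFI_Test.mdf', SIZE = 5GB)
  LOG ON     (NAME = IFI_Test_Log,  FILENAME = N'L:\Log\IFI_Test.ldf',  SIZE = 1GB);

  EXEC sys.xp_readerrorlog 0, 1, N'Zeroing';  -- which files were zeroed?

  DBCC TRACEOFF (3004, 3605, -1);
  DROP DATABASE IFI_Test;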

I created a blog and video a couple of years ago walking through the process of enabling this feature. You can view it here.

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Note: Tim has also written an excellent book on SQL Backup and Recovery, a must have for everyone.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Using the SSIS Term Extraction for Data Exploration – Notes from the Field #046

[Notes from Pinal]: SSIS is a very well-explored subject; however, there are so many interesting elements that whenever we read about it, we learn something new. One such concept is using the SSIS Term Extraction for data exploration.

Linchpin People are database coaches and wellness experts for a data-driven world. In this 46th episode of the Notes from the Field series, database expert Tim Mitchell (partner at Linchpin People) shares a very interesting discussion of how to use the SSIS Term Extraction for data exploration.


Data exploration is an essential piece of any new ETL (extraction-transformation-load) process.  Knowing the structure, type, and even the semantics of data sources can help the ETL developer or architect to make better decisions on data type and length settings, transformation logic, and error handling procedures.  Additionally, there are situations in which the exploration of the data is the principal purpose of the ETL, such as text mining and other pattern analysis.  In most scenarios, SQL Server Integration Services is used as a traditional ETL tool and not necessarily for data exploration or text mining.  However, there are some tools built into SSIS that are ideally suited for exploring a source feed to expose patterns in the data.  In this brief post, I’ll cover one such tool: the term extraction component.

The term extraction component is a data flow transformation that will aggregate the frequency of words found in a specified column supplied by an upstream data source.  This component expects a Unicode text or text stream field as an input, and calculates as an output the frequency of the nouns and/or noun phrases in the specified source column.  As shown below on the data flow surface, the term extraction component (circled) is always used as a transformation, accepting exactly one input and expecting either one or two outputs (the second being an optional error output).

Configuring the term extraction component can be a relatively simple exercise.  As mentioned, you’ll need to specify the upstream column that will be used as an input, and this column must be Unicode text (DT_WSTR) or stream (DT_NTEXT).
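
If the source column is not already Unicode, one simple option is to convert it in the source query before it reaches the transformation (a Data Conversion transformation in the data flow works just as well). A small sketch, where the table and column names are assumptions:

  -- Cast a varchar comment column to nvarchar so it arrives as DT_WSTR
  SELECT CAST(fb.CommentText AS nvarchar(4000)) AS CommentTextUnicode
  FROM   dbo.CustomerFeedback AS fb;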

Purposefully skipping over the Exclusion tab for just a moment, we’ll review the Advanced tab.  This pane exposes several of the key configuration elements of the term extraction component.  In particular, we can use this page to set the output for noun and/or noun phrase, set the score type to frequency (raw count) or TFIDF (a calculation based both on the frequency of the term and the ratio of the analyzed terms to the entire document), and set the minimum score threshold and maximum term length.  We can also set the case-sensitive flag, in case our analytics need to be case sensitive.
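
For readers unfamiliar with the two score types, the TFIDF option follows the classic term frequency–inverse document frequency idea, which can be sketched roughly as:

  TFIDF(T) = frequency of T  x  log( number of rows in the input / number of rows containing T )

so a term scores highest when it occurs often, but only in a small share of the rows, rather than appearing everywhere.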

Additionally, the exclusion list (on the Exclusion tab) can help to refine your calculation results.  This feature allows the ETL developer to specify a known list of terms that will be excluded from the output.  This is useful in cases where industry or common language terms might occur so frequently that including them in the output simply dilutes the rest of the data.

Although the term extraction component does not have a lot of moving parts, it does provide a quick way to perform data exploration and simple text analysis.

Conclusion

In this post, I have briefly demonstrated the attributes of the term extraction transformation, an SSIS data flow tool that can be used for data exploration.  This tool is a quick and easy way to perform triage analysis and mining of raw text data.

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

Hey DBA – Go Make a Proactive Investment Today! – Notes from the Field #045

[Note from Pinal]: This is the 45th episode of the Notes from the Field series. If you know Mike Walsh, he is an amazing person to meet and know. I am fortunate to be his friend. Every single time I meet him, I learn something as a human.

The last time we met, I asked him one question – if you had to give me one piece of advice, what would it be? He said, I have no advice for you, but I have a challenge for you to complete. If you need help to complete that challenge, I will work with you to achieve the goal.

Now think about this entire incident: how many times does this kind of conversation happen in our lives? Very rarely.

In this episode of the Notes from the Field series, database expert Mike Walsh gives a challenge to all of us. He explains to us why we should be proactive, and right after that he gives us one challenge which we should all take up and complete.


You can spend some time and some money today… or you can spend more later. It’s that simple.

As a SQL Server consultant, you could say I know a little bit about clients spending money. There are jokes out there about how consultants spend the money of our clients. I’ll tell you though, I’ve been playing with SQL Server for fifteen years now, the past 3-4 as a SQL Server consultant, and I’d much rather take less money up front than more money down the line.

Here’s what I mean – I enjoy seeing clients spend money proactively rather than reactively. For a few reasons:

  • It’s almost always less when you spend it on being proactive.
  • Stress levels are down when you aren’t in crisis mode.
  • Decisions are clearer and cleaner when there isn’t a “fire” going on.

And the old adage that I started this post out with is true. I’ve seen it proven true time and time again. I’ve seen it reflected in my invoices. When a client calls up front and takes care of something proactively and plans for the future, the end result is always less money spent and a more peaceful experience for all involved.

This is a simple post. It’s a quick post. I go into much more detail at an older post of mine about Planting Asparagus. And with it I have one simple challenge for you, the reader of Pinal’s wonderful blog:

Do Something Proactive Today

That’s it. Go into your SQL environment and forget the 400 e-mails in your inbox. Forget the piling-up requests in your queue. Forget the avalanche surrounding you as a DBA. Forget that stuff and do something proactive. Think about the future. Think about some what-ifs and save yourself future pain. I’m not saying you have to call us at Linchpin People and ask for a WellDBA™ Exam. I’m not saying you need to call any consultant. Just be proactive.

Here are a few places you can start. These are all places I’ve recently seen clients spend much time and money on in emergency situations because of failures at being proactive. I’m sure Pinal has some great posts about all of these areas.

  • Do a restore test… Check your backups! – I don’t just mean checking that you are taking a backup or that your system administrator says they are backing up your environment. I mean – are you sure you are backing up your environment the way your users expect and believe you are? If not, stop reading the rest and go check. (A minimal T-SQL sketch of a few of these checks follows this list.)
  • Check database consistency – I’ve spent much time lately helping clients who had a double-whammy situation: no good backups AND no proactive checks looking for corruption. This means that their users detected the corruption. It wasn’t known how long it had been there, and there was no great backup solution. That means data loss. That means expensive efforts to scrape out as much data as possible. The time you take to get this right? It pays dividends down the line.
  • Perform routine maintenance – Index maintenance, statistics maintenance. If you aren’t doing these basic things – your users could be suffering right now with performance problems. You could be wasting storage space. Things could be less efficient than needed.
  • Set up alerts and monitoring – Whether it is free – using SQL Server Agent alerts, SQL Agent operators, and notifications on jobs and errors – or one of the plethora of inexpensive SQL monitoring products out there, put something in place. If you are blind to issues, that means you could have a smoldering fire building in your environment right now. And you are totally blind to it until that fire gets a little more oxygen and explodes into a down system.
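
As a starting point, here is a minimal sketch touching the backup, consistency, and alerting bullets above. The backup path, database name, and alert name are assumptions; adjust them to your environment, and ideally restore the backup to a test server rather than only verifying the file.

  -- Verify a backup file is readable (a full test restore elsewhere is better still)
  RESTORE VERIFYONLY FROM DISK = N'D:\Backups\SalesDB_Full.bak';

  -- Run a consistency check
  DBCC CHECKDB (N'SalesDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;

  -- Create one example SQL Agent alert for severity 19 errors
  EXEC msdb.dbo.sp_add_alert
       @name = N'Severity 019 Errors',
       @severity = 19;
  -- pair it with an operator via msdb.dbo.sp_add_notification so a human actually gets notified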

I could keep going. There are a dozen basic things that you can easily do right now. The investment is a little time. Maybe some digging around on the Internet to make sure you are in a good spot to do these things. Maybe the investment is spending some money on an external review like our exams or the health checks that any number of great SQL Server consultancies do. Maybe the investment is in getting some DBA training or mentoring. Whatever the investment is, I can guarantee you from experience that it is all but guaranteed to pay off, and the dividends are huge.

Ok. Stop reading and clicking. Go make a proactive investment in your environment.

If you want to get started with performance analytics and triage of virtualized SQL Servers with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – The Basics of the SSIS Execute SQL Task – Notes from the Field #044

[Note from Pinal]: This is a new episode of the Notes from the Field series. SSIS is an interesting tool. There are more hidden features in SSIS than in most other products. Additionally, there are many visible features which are not known to most people. Earlier, when I mentioned to a SQL developer that they can execute SQL queries from SSIS, they looked at me like they had seen a ghost. They had always thought that they could only do the tasks in SSIS which are available to them in the toolbox. Well, I think the next time I get a similar question, I am going to forward them a link to this blog post. Andy has explained in very simple words how one can use the Execute SQL Task in SSIS.


With “SQL Server” included in the name of SQL Server Integration Services (SSIS), it is easy for people to assume SSIS is simply a database tool or accessory. Let there be no mistake, SSIS is a software development platform. To give you a better understanding of how to use SSIS as a development platform, I’ll write a series of blogs that step through how to use Control Flow tasks. This first article will cover the Execute SQL Task.

When developing solutions with SSIS, I use a handful of Control Flow tasks:

  • Execute SQL Task
  • Data Flow Task
  • Script Task
  • Execute Package Task
  • File System Task
  • Execute Process Task

This list is a good approximation of which tasks I use most, too – from most-used to least-used. In this article I provide a basic example of configuring the SSIS Execute SQL Task, shown in Figure 1:


Figure 1: SSIS Execute SQL Task

Three things are required to configure an Execute SQL Task:

  1. Connection Type
  2. Connection Manager
  3. SQL Statement

Connection Type

The default Connection Type is OLE DB, as shown in Figure 2:


Figure 2: Available Connection Type Property values

I configure Execute SQL Tasks to execute SQL statements that truncate a table, update or insert a single row of data, or call a stored procedure. I typically use OLE DB or ADO.NET connection types, but I occasionally use ODBC connection types. (When executing parameterized statements, I find ADO.NET offers a cleaner interface. I will cover parameters in another article.)
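
For illustration, the statements placed in such a task are typically as simple as the following (the object names here are invented for the example):

  -- Truncate a staging table before a load
  TRUNCATE TABLE dbo.StagingOrders;

  -- Insert a single logging row
  INSERT INTO dbo.PackageRunLog (PackageName, RunStart)
  VALUES (N'LoadOrders.dtsx', SYSDATETIME());

  -- Call a stored procedure
  EXEC dbo.usp_ArchiveProcessedOrders;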

After selecting a connection type, selecting the Connection Manager is next. The Execute SQL Task Editor filters the list of available connection managers, displaying only connection managers of the (connection) type configured in the previous step.

Figure 3 shows how to select a Connection Manager:


Figure 3: Selecting a Connection Manager

The last required step to configuring an SSIS Execute SQL Task is to supply an SQL Statement. There are two properties for this: SQLSourceType and SQLStatement. The SQL Source Type property specifies the source of the SQL Statement as one of the following:

  • Direct Input
  • File Connection
  • Variable

The SQL Statement can be entered manually (direct input). It can be stored in a file (file connection) or the SQL Statement can be stored in an SSIS variable (variable). In most cases you will manually enter the query as shown in Figure 4:


Figure 4: Entering the Query

Right-click the task and click “Execute Task” to test the task configuration. Or simply press the F5 key to execute the entire SSIS package, as shown in Figure 5:


Figure 5: Success!

You now know the basics of configuring an SSIS Execute SQL Task. Go code!

:{>

If you want to get started with SSIS with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)