[Note from Pinal]: This is a 63rdth episode of Notes from the Field series. Storytelling is an art very few have mastered in their life. When I requested Stuart to share his ideas about his famous blog about MARS, I really had not expected the response which I have received in this episode of Notes from the field. It is very critical for any DBA to be proactive in their operation, however are we proactive enough? Well, you need to read this entire episode of notes from the field to grasp the concept.
In this episode of the Notes from the Field series database expert Stuart Ainsworth explains about MARS in this very captivating story.
Database administration can be a complex job, particularly in large scale environments; DBA’s are responsible for a variety of tasks, and there’s a wealth of knowledge on how to administer and support SQL Server. For this post, however, I wanted to focus not on a specific code-related or configuration tip, but rather on a broader perspective of work in general. As a manager and a consultant, most SQL Server shops have a common problem: we focus too much on reacting to problems and not enough on preventing them.
In the past, I’ve blogged about what I call the MARS paradigm: Maintenance, Architecture, Research and Support. Most system administration tasks can be broadly categorized into one of these four areas, and each of these four areas can be conceptually defined in terms of Actions (Proactive\Reactive) and Results (Delayed\Immediate):
Think about any given day, and how much time you spend working in each of the 4 areas; which ones are given priority? If you’re like most shops, you probably spend the most amount of time working support cases (“my query is slow”, or “can you show me where to find this information in the database?”) instead of focusing on maintenance or architecture. A healthy IT shop will find ways to balance the need for immediate reaction to the need for proactive long-term results; however, the question that needs to be answered is: why do we react, and how do we become more proactive?
Why Database Administration Is Usually Reactive:
In part, the reactive nature of most IT departments is a cultural thing. Fighting fires can be fun; people who solve problems are usually lauded as heroes, while people who prevent problems often go unnoticed. Think about it this way; if a database is corrupted and a restore is required, who gets the credit for saving the day? The DBA who did the restore or the DBA who set up the backup plan years ago (assuming that they’re not the same DBA)? Administrators that are quick to act are viewed as being responsive, even though they’re usually built on a foundation that was established invisibly some time ago.
Furthermore, most database administrators are thorough in their work; support issues are usually an opportunity for them to dig into a problem and find the best solution. Maintenance issues, on the other hand, are addressing problems that don’t yet exist, and therefore don’t yet need a solution. However, constantly providing support pulls resources away from building a better foundation. If a DBA is spending time constantly fighting fires, they’re not spending time on monitoring the environment to avoid future problems. So how do you move from a reactive environment to a proactive environment?
Moving From a Reactive Approach to a Proactive Approach:
First, track your work efforts for some period of time (30 days). Find a way to categorize your work into one of the four broad areas defined above:
Maintenance: efforts needed to keep system performance and stability
Architecture: efforts to add new features, functionality, or capability
Research: efforts to learn “something new”
Support: efforts to sustain the existing functionality or customer requests
Ideally, your work is being tracked in some form of service desk software, but if not, then I’d suggest you start with something simple (like a whiteboard with sticky notes).
Second, figure out a metric for work effort; it could be time spent on an issue, or the number of issues, or some other measure of complexity. The goal is to have a reasonable method of comparison between categories. Based on your collection efforts, you should be able to figure out where you’re spending most of your time; if more than 50% of your time is being spent in Support or Research, it’s time to consider becoming more proactive.
Becoming more proactive can be a never-ending journey, but I have a few suggestions on where to start:
Prioritize accordingly. Not every support case deserves to be treated as your top priority; make sure that maintenance and architecture issues are included in your escalation plans.
Set it, but don’t forget it. SQL Server offers a lot of tools for automating maintenance; make sure that you allocate time to review successful executions as well as failures. Understanding when a system is performing outside of spec is nearly as critical as being notified when it fails.
Communicate with interested parties. Let your manager and your peers know about your maintenance an architecture efforts; the best metric would be a reduction in support calls and cases.
If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.
Recently one of my blog reader emailed me below question.
Need your urgent help. In recent past, we have been attacked by the hacker who was able to get in to our SQL Server via sysadmin account and made big damage to our data. To make sure it doesn’t happen in future, I have taken task to find out SQL Server password which are weak.
Do you have any suggestions for me?
This is one of the area which is always haunting all SQL DBAs. There are recommendations to use Windows Authentication to connect to SQL Server and that would save from all such problem. But it is not always feasible to use Windows Authentication. Now, if we decided to choose SQL Authentication, there is a setting which is “Enforce Password Policy” which would ensure that you are choosing a strong password.
If recommendations are not followed, you might end up in situation where SQL Logins have weak and basic passwords. SQL Server has provided a function PWDCOMPARE which can become very useful to find known password. Below are few example use of this out of box funtion:
SELECT NAME, NAME 'password' FROM sys.sql_logins WHERE Pwdcompare(NAME, password_hash) = 1 UNION
SELECT NAME, '<blank>' AS 'password' FROM sys.sql_logins WHERE Pwdcompare('', password_hash) = 1 UNION
SELECT NAME, 'password123' AS 'password' FROM sys.sql_logins WHERE Pwdcompare('password123', password_hash) = 1
In above query, we are trying to find:
Password same as user name – first query
Blank password – second query
Password = password123 – third query
These are one of the most common password used in the industry. I am sure you can extend this further by modifying it and adding more weak passwords.
Here is the sample output for the above:
Hope this blog would help you in finding weak passwords and make it more complex. Have you ever had a need to use such passwords in your environments? Do let me know.
[Note from Pinal]: This is a 61st episode of Notes from the Field series. One of the common question I receive every other day is – I have learned BI but I feel that there is no automation in the BI field, what should I do? How do I go beyond learning BI? How can I fill the gap between BIDS and SSDT? If you have been reading this blog, when it is about BI, I always take help from LinchPin People who are BI experts. I requested Reeves from LinchiPin people to help me answer this unique question.
In this episode of the Notes from the Field series database expert Reeves Smith explains what is Biml and How Can it Help Me with SQL Server BI?. Read the experience of Reeves in his own words.
In previous articles (What is Biml and The Why of Biml), I’ve discussed Business Intelligence Markup Language (Biml) and given some examples of projects that would benefit from its use. I also demonstrated a quick way to get introduced to Biml without learning a lot of C#. As you grow in your understanding of Biml and BimlScript you will want to make your Biml files more dynamic with a programmatic metadata-driven approach. This article will build on the previous articles and add looping with metadata to build more dynamic Biml projects.
Starting where the previous article left off, I’ll demonstrate how to add a little C# to make a more automated method of metadata retrieval. If you did not walk through the other articles, I would suggest that you review them before you read this one. This article builds on the Biml in Listing 1 and lets you create multiple integration services packages from one Biml file.
Listing 1 – Semi-Automated Common Source Package Biml
Example – Automated Source Packages
This example shows you how to connect to a database and select the metadata to iterate over. Then you can create multiple SQL Server Integration Services (SSIS) packages from one Biml file. The example in the previous article manually updated the metadata and reran each Biml file. This is an effective approach, but you can build on it to create a more robust solution. Take a look at the C# code in Listing 2.
Listing 2 – C# Replacement Code
Replace the first line of code from Listing 1 with the code in Listing 2, which is an excerpt from Listing 3.
The first line will add the namespace to the Biml file and enable the calling of functions without a fully qualified name.
The second line creates a string variable that contains a connection string of the database with the metadata.
The third line creates a connection object (actually an AstOleDbConnection node) that will be used by the ImportDB function.
The forth line creates a result set with the metadata from the connection object. This object contains all of the metadata from the AdventureWorks2012 database. The ImportDB function has three parameter. The first parameter is a schema filter, and in the example the filter is on the Person schema. The second parameter is a table filter and is blank in the example, so it will be ignored. The last parameter is a list of ImportOptions, and the views have been excluded. To add more import options use the “|” followed by additional options. Example of multiple options:
The example includes an object that contains all of the metadata that you will loop over. You have to place the loop within the <Packages> node because a Biml file can only have one Packages collection. The foreach loop will create multiple <Package> nodes.
A tableName string was built below the foreach loop to reduce the code that you have to alter from the example in the previous article.
Listing 3 – Complete Code
Note: The code in listing 3 has the ConnectionString truncated to save space.
If you receive the error shown in Figure 1, it means that the tables do not exist in the stage database. To check this, add the Person filter to the table parameter in the ImportDB function and return metadata for only one table. The prior articles used this table, so it should exist.
Figure 1 – Error Message
This article added some more C# to help you refine your Biml files. I hope you can start to see the power of Biml and how it can help remove the repetitive work from your next Integration Services project.
If you want to get started with BIML with the help of experts, read more over at Fix Your SQL Server.
[Note from Pinal]: This is a 59th episode of Notes from the Field series. I get over 100 emails everyday and out of 100 emails nearly 90 of them are really related to performance tuning. The emails I get is from various different IT professionals. There are quite a few application developers like (Dotnet, Java, PHP etc) who do application development everyday, but they are not always expert of SQL Server. When I read those emails I feel that the world indeed need plenty of SQL Server Performance Tuning consultants. I asked these questions to SQL Server Guru Brian Moran. He is a unique blend of person who knows the consultant’s business from technology and business perspective.
Today is Christmas Day and we have received a wonderful gift from Brian!
In this episode of the Notes from the Field series database expert Brian Moran explains in very simple words the reality of the SQL Server Consultant world. Read the experience of Brian in his own words.
December and January are times of the year when many people reflect on personal and career goals. It would be silly and naïve for me to pretend that I know what’s best for your career so take this post with a grain of salt. Maybe you want to be a fireman or an astronaut? I don’t have many words of wisdom if that is your hope.
But, I do have a lot of knowledge for data professionals who aspire to be independent consultants.
I focus more on the business side of running my consulting business these days so I’m not as technical as I used to be. But I was a SQL MVP for many years and was a pretty good tuning architect. I’ve been on the PASS Board twice and have been involved in the SQL Server space for almost 25 years. Over the years I’ve been part of the leadership team that built two successful consulting companies and I’ve coached and mentored dozens of technologists over the past two decades as they got their start in consulting.
This post is for folks who love the engine side of being a database pro and want to be an independent consultant. I’m not talking to the BI folks or the other myriad of specialties that exist in #SQLFamily. But do you love the engine? I mean really love working with the core engine? Do you want to be a consultant one day?
World class and amazing performance tuning skills are consistently the on the short list of skills that I’ve seen successful independent SQL Server consultants have over the past 20 years. Are there other skills out there that allow a consultant to have a great career? Of course. You can do well with any specialty that is hard to master and that companies absolutely must have access to in order to be successful. HA/DR, consolidation, virtualization, security, and a wide variety of other topics come to mind. But here is one element of performance tuning that that tends to create opportunities for consultants to maximize their rates and have a successful career.
Top performance tuning consultants can have a pretty quick impact on a problem and companies often need access to the skill on short notice for critical needs. Sometimes tuners can solve a problem in minutes or hours. And long term performance tuning work is often measured in days or weeks rather than weeks or months that some other types of projects require.
Here is some basic math to think about. Let’s say you want to bill $250/hr. Is that expensive? Well, like any SQL question the answer is it depends. Let’s look at it this way.
Assume you can solve a critical problem for a customer in 20 hours. $250/hr for 20 hours is $5,000.
Imagine you are competing against someone who says they can solve the problem for $125/hr.
50% less sounds like a big savings for the customer, right? But, it’s really only a net savings of $2,500.
Do you think many companies care about a difference of $2,500 when they desperately need to solve an urgent problem impacting critical business systems? Probably not. They probably care more about picking the person they believe is the best fit and most likely to solve the problem.
But now let’s imagine that you are talking to a potential customer about a project that might take 6 months to solve working full time. That’s about 1000 billable hours. Let’s make the same assumptions that you are charging $250/hr and someone else is going to offer $125.hr. Now the 50% differential is rate yields a net savings of $125,000.
Will many companies care about saving $125,000 if they can? Yep. They sure do.
I find that many folks never fully think through this side of the consulting equation early in their career. IE, within reason companies, mostly don’t care about the rate for performance tuners as long as you are awesome and can get the job done. Don’t get me wrong. It’s quite possible to charge premium rates for long term projects and you can certainly be a successful consultant while, focusing on some other aspect of SQL Server. But almost 25 years in the business have taught me that loving performance tuning and wanting to be an independent consultant can be a match made in heaven.
If you want to get started with performance tuning and database security with the help of experts, read more over at Fix Your SQL Server.
This article is the high-level overview of the SQL Server security best practices and aspects. Ensuring SQL Server security is an important task that may be successfully solved by applying best practices and proven solutions described further in this article.
SQL Server physical security aspects are often overlooked. Frequently DB admins focus on the software security and network security, while completely forgetting about the physical server itself. However, the possibility to physically reach the server, will lead to abrogation of all other security configurations that they apply to the software. As far as SQL Server is installed on a physical device, whether it would be server hardware, desktop PC, laptop, or other PC in a data center. The physical security main goal is to prevent someone from gaining access to the machine.
Shared data centers or server rooms provide a wide range of physical and environmental security to ensure that secure areas are protected by appropriate entry controls to ensure that only authorized personnel are allowed access. An appropriate, physical protection should be provided against damage from natural, or man-made disasters, such as fire, flood, explosion etc. All users are required to ensure that systems are not left open to access by intruders to buildings, or by unauthorized colleagues.
If the personnel in the data center is unable to physically lock away the server, you must ensure that all USB ports on the server are disabled. Keep in mind that security is not complete if physical security is overlooked.
The network is the way that external threads may come to attack your SQL Server.
Firewalls provide efficient means to implement security. A firewall is a separator or restrictor of network traffic, which can be configured to enforce your data security policy. If you use a firewall, you will increase securityat the operating system.
SQL Server Instance Security
Every SQL Server instance that is installed on a PC can be considered to be a security domain. Thus, at the instance level you could apply separate security considerations.
The following picture demonstrates two separate instances of SQL Server on the same computer. Each has its own specific name, the first one is MSSQLSERVER\Instance1, the second one is MSSQLSERVER\Instance2.
This picture gives us the ability to see how security applies at the instance level. SQL Server Security uses the concept of securables that will be discussed later in this article.
Securables are the objects to which the SQL Server authorization system controls access. For instance, a table is a securable. Some securables can be contained within others, creating nested hierarchies called “scopes”. The securable scopes are server, database, and schema.
An object that receives permission to a securable is called a principal. The most common principals are logins and users.
A login is a security principal, or an entity that can be authenticated by a secure system. Users need a login to connect to SQL Server. You can create a login based on a Windows principal (such as a domain user or a Windows domain group) or you can create a login that is not based on a Windows principal (such as an SQL Server login).
To control access to securables, a DB admin can grant or deny permissions, or add logins and users to roles that have access.
It is important to note that every individual instance has it’s own set of security considerations. The security folders on each instance contain the same users, application roles,(etc.) folders. However, if we expand the logins folder you can see the different number of accounts.
This proves that we can have different security settings for individual users across different instances.
It is also important to note that when you do set up an SQL Server instance, you only install the necessary features that are necessary for that instance to operate. This reduces the possibility to attack of your specific SQL Server instance, by reducing the number of services and features that are available for malicious users to gain access to.
SQL Server Securables
Securables are the resources to which the SQL Server Database Engine authorization system regulates access. Essentially these are the database objects to which you can apply security on your instance of SQL Server. Some securables are standalone and others may be contained within another securable.
Each database also has its own security folder, where we can focus on users, which again, become the logins that we assign the permissions to for our database.
We also have database level roles and application roles. Currently, there are no application roles assigned on this server. There are another components that we do not see in the Database Explorer, however they still exist at the data base level. These components are assemblies which are typically DLL files that are used in the instance of sequel server for deploying functions stored procedures, or triggers. These components are managed by CLR.
SQL Server Principals
There are three high level categories of SQL Server security principals:
Windows level (domain and local logins)
SQL Server level (SQL Logins and server roles)
Database level (database users, DB and application roles)
There are two SQL Server authentication types:
by using Windows account security token
by using an account defined inside SQL Server
You can tune the type of authentication during installation. Windows mode is the most secure, as it relies on Windows accounts as opposed to SQL accounts. The recommended setting is Windows Authentication and it is selected by default. This authentication type simplifies administration and is more secure, because it uses Windows security tokens. There is no need to remember another password, and no password transits through the local network.
You cannot disable Windows authentication, however you can extend it with SQL Server authentication. Thus, in case you need to allow access to the users outside the local network, you should use SQL Server authentication. In this case, logins are defined and managed inside SQL Server. The authentication and password validation is managed by SQL Server also.
There are also default roles created during installation as well. One of which is known as the public role.
SQL Server creates the public role as a way of assigning permissions to users who have accounts on the database, but who do not have any other currently assigned permissions. The public role serves the purpose, of providing limited permissions to user accounts on the database, until you’ve had the opportunity to assign the necessary permissions for those user accounts. SQL Server also includes a guest user account. This account is not a server level account, but exists at the database level.
If we expand the users folder within the database, we will notice a guest account, that is available for each database. The guest account is used to grant permissions to users, who might be able to access the database. But, who do not have a user account assigned, or created in the database itself.
The guest account cannot be dropped, but it can be disabled, through the revocation of the connect permission. User and application access to your SQL Server instance, will be controlled using these security principles.
Manage Logins and Users
As mentioned above in this article, SQL Server implements security through the use of securables, which are the database objects, and security principles, which are the user’s inner applications that will access the database.
If you know how to create the logins and user accounts – it is the first step in creating the authentication and authorization model for your SQL Server.
Before creating logins, you must know which authentication mode SQL Server instance is configured to use. In case of mixed mode authentication, we can create Windows accounts, or we can create local SQL Server accounts for authentication on the instance of SQL Server.
Keep in mind that SQL Server allows you to create logins through Windows accounts or SQL Server accounts at the server level. Also SQL Server permits the creation of local user accounts at the database level. If we expand the database, and expand the Security folder, we notice that SQL Server includes a Users folder at the database level.
Simply right click the Users folder, and then click New User. This allows us to choose a user type at the database level, as opposed to the server level. We can create SQL user with or without a login, what means that this would be a user name we create specifically in this database.
A good example of such an account is the guest account, which is created at the database level, but does not exist at the logins level for the server itself.
You can set the authentication mode to Mix Mode at any time after the installation. And then you can create SQL Server accounts, or use Windows accounts, depending on your choice.
Understanding each type of account and how security is applied to these accounts will help you set the proper security for user access.
Understanding and using roles
SQL Server roles allow you to group users or logins into a specific container for assigning permissions to securables on your SQL Server instance. Permissions assigned to a role are applied to any user or login that is associated with that role. SQL Server provides nine fixed server roles. You can find these server roles in the Security folder of the SQL Server instance. Outside of the nine fixed server roles, SQL Server also provides database level roles.
SQL Server Permissions
Accessing any of the secureables in SQL Server requires that the login or user have permissions. The permissions in SQL Server follow certain naming conventions. The security check algorithm used by SQL is complex. And the complexity comes from the need to evaluate group or role membership, as well as explicit and implicit permissions.
There are three core elements of SQL Server permission checking:
Security context. This is related to the user, the login or the role.
Permission space. The permission space will focus around the securable, such as the database object that the user is attempting to access. SQL Server checks the permissions that the principle has been assigned according to that securable.
Required permissions. This is an important step because some tasks require more than one permission. For instance, if a user attempts to execute a stored procedure, the execute permission would be required in the stored procedure itself.
SQL Server security is too huge topic to be discussed in a single article. However, this article provides an overview of core principles and concepts of SQL Server security.
[Note from Pinal]: This is a 55th episode of Notes from the Field series. Common sense is not as much as common as we think. I am sure you agree with it from your real world experience. However, when it is about data and its security, there has to be some rules along with the policy but common sense is extremely critical. When I read this article, I find it humorous at points and some of the examples also reminded me of my past experience. If you are in data security, you will have a great time reading these notes, but if you are not, you will still love it.
In this episode of the Notes from the Field series database expert Kevin Hazzard explains common sense data security and how we can apply in daily life in real world. Read the experience of Reeves in his own words.
There are many excellent books and articles that address the correct ways to store sensitive user information. Yet, many in IT are still failing to protect customers from loss due to data breaches. Every day, it seems that there’s another retailer or commercial web site in the news for losing passwords or credit card numbers to hackers. As an industry, why are we struggling to secure this type of information when there’s so much good intelligence and so many great tools for getting the job done? It’s a complicated subject so perhaps it’s time to step back a bit and use a bit of common sense to analyze the problem.
No matter the industry, using the right tool for the job is rule number one. Line-of-business databases are all about organizing information and getting it into the hands of people who perform transactions and make decisions with it. As a result, these databases become naturally permissive by nature, especially as they evolve to meet the demands of growing businesses. There are good access controls in modern databases but when it comes to managing ultra-secure bits of data, traditional, relational databases may not be the best fit for the job.
Lightweight Directory Access Protocol (LDAP) servers like ApacheDS, OpenLDAP and Microsoft Active Directory do a much better job of handling sensitive data with less trouble than any custom coding we might do on our own. Moreover, the built-in authentication functions of LDAP are mature and standards-based, making them safe and reusable from many different applications without custom interface development. It’s our duty as technologists and as business people to highlight the high cost of custom security solutions and the huge potential risks to our managers. In particular, when it comes to storing passwords in our line-of-business databases, just say no.
If we must manage financial instruments or personally identifying information in a database like SQL Server, there are three classes of problems to solve:
Keeping the hackers from stealing our stuff,
Detecting when breach attempts occur, and
If data is unfortunately lost, making the information useless.
Let’s think about these efforts from a common sense perspective. Problem one is all about access control. The problem with permissions in any complex system is that they are difficult to maintain over time. Even if the initial configuration and policies safeguard the sensitive data, some future administrator may fail to understand or enforce the rules correctly. We could make those future administrators’ jobs much easier if we followed one simple rule: never mix highly-sensitive data in tables containing non-privileged data.
[Note from Pinal]: This is a 54th episode of Notes from the Field series. Where do we blame for a mistake or error in the system? Well, developer blames DBA and DBA blame developers. Sometimes there is no solution to the catch 22 situation. I have been there and I am very sure that you have been there too. Well, this is an amazing and beautiful article by G. Andrew Duthie. He has attempted to demystify the problem which we all face every day.
In this episode of the Notes from the Field series database expert Andrew Duthie explains about Validation Rules and where they should be implemented. Read the experience of Andrew in his own words.
If you’re a DBA dealing with developers, you may run into the question of whether it’s better to allow the developers to write validation rules in their .NET app layer, or whether you should put your foot down and insist that the validation be implemented in stored procedures. The answer, as you might expect from a consultant, is “it depends.”
Advantages of Code-based Validation
One of the things that might inform your decision on what to use for validation is the skills possessed by the development team. If you’ve got a developer or developers who are well-versed in C# and LINQ, but don’t have a lot of experience writing stored procedures, you may want to cut them a break and let them use the tools they’re more familiar with.
Writing validation rules in code at the application layer allows developers to stay within the realm of .NET objects, which can result in faster development time.
Disadvantages of Code-based Validation
While there are probably more that could be discussed, I’ll mention just two of the significant disadvantages to writing validation rules in code.
First, if the code for the validation rules is using LINQ, particularly if the rules are complex, there’s the possibility of queries that generate sub-optimal SQL under the covers. This can be mitigated by profiling the queries to make sure that any performance hogs are caught as early as possible, but it’s certainly a valid concern.
Second, from a maintainability standpoint, having rules in the app means that adding rules requires the app to be recompiled and redeployed. For some apps and environments, this may not be a big deal, but in others, it could definitely be a deal-breaker.
Advantages of Stored Procedure-based Validation
Using stored procedures for validation provides some key advantages. One is proximity to the data. Unlike code-based validation, which may require pumping significant amounts of data over the wire from the database to the app tier, stored procedure-based validation keeps the logic on the DB tier, so performance may be significantly better.
Another advantage is that with a good execution design (for example, a master stored procedure that executes a list of validation rules in a specified order based on a configuration table), it can be relatively easy to introduce new rules with less disruption than having to recompile and redeploy an entire application.
Disadvantages of Stored Procedure-based Validation
The major disadvantage of using stored procedures for validation, speaking as an app developer, is the basic impedance mismatch between .NET code (C# or Visual Basic) and T-SQL. While it’s certainly possible for developers to master both, there’s a mental cost in switching between these environments, and a potential for mistakes when transitioning from one to the other.
The other downside of stored procedures is the mixing of application logic between the app tier and the database tier. While validation close to the data can, as noted, improve performance, if some parts of the application logic live in both the app and database tiers, this could make for more costly maintenance down the road.
Consistency is Key
One additional point I’d like to make is that it’s probably wise to choose one option or the other…not both. If you have multiple applications in development (or even in maintenance mode), having a mix of app-based or sproc-based validation will likely give you headaches at some point. So get your team together and have a discussion about how you’re currently handling things, and whether there might be a better way.
The short answer to “which is better” is really “either.” It all depends on the skills of your developers, the performance you need from your app, and the other factors I’ve discussed. Although I’m coming at this from the perspective of an app developer, I’ve recently become more comfortable with the idea of stored procedure-based validation, particularly in instances where more than one app may be targeting the same database, since this can help reduce redundancy, and centralize management of rules.
I’d love to get your feedback on how you’ve handled validation rules in your environment, so feel free to share a comment below.
If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.
[Note from Pinal]: This is a 53rdth episode of Notes from the Field series. Everyday I get 100s of emails and most of the emails have a similar request. Everyone wants to get maximum performance, but they want to make the least amount of changes in their code. Well, though both of them are contradictory requests, it is possible in most of the cases if you know the technology inside like Linchpin People do. Here in this blog post, my close friend Stuart Ainsworth explains a cool trick, which I just learned today after so many years of experience. Wow, Stuart – thanks for this amazing note from the fields – I learned something new and there will be so many who will enjoy this post.
In this episode of the Notes from the Field series database expert Stuart Ainsworth explains Using Bitwise And (&) Instead of a Junction Table.
Bitwise operations in SQL Server are not often used, but like many of the tools available to SQL Server developers, bitwise operations can provide some interesting alternatives when you’re faced with specific challenges. One of my clients has a large database that relies heavily on many-to-many relationships to identify matching interests on multiple characteristics. As a simple example, let’s assume that I’m talking about fruit baskets.
In the simplest version of a fruit basket, you’d have two database objects: the basket, and the assortment of fruit. Baskets can use different combinations of fruit, and samples of fruit may appear in more than one basket, like so:
Basket 1: Apples
Basket 2: Apples, Bananas
Basket 3: Grapes, Apples
Basket 4: Strawberries, Bananas
The traditional method of modeling this relationship would be to use a junction table, as illustrated below.
However, my client had 500,000 baskets, and roughly 50 different fruits to choose from. Assuming that every basket had at least 10 different fruits, the junction table would have at least 5,000,000 rows of data. Even though the junction table was well indexed and strongly typed, my client’s design was suffering from slow read times. The client needed an alternative. Enter the bitwise AND (&).
Setting Up a Demo
Let’s set up a demo that illustrates both the junction table method and the bitwise AND alternative. First, you’ll create the following three tables and populate them (using table valued constructors):
Baskets, which includes a column for use with the Bitwise AND
FruitID, which is set up for use with the Bitwise AND
FruitBaskets, which is a junction table
Note that primary and foreign key references are not included for the simplicity of the demo. You’ll also be adding an extra column to the Baskets table to use for the Bitwise join. Finally, note that the ID column of the Fruit table mirrors the decimal values of the binary bit positions (e.g., 1, 2, 4, 8, 16, 32, 64, 128).
Now that you’ve got your tables set up, let’s run a couple of queries. First, you’ll use a junction table (the traditional, normalized model), and then you’ll use the Bitwise AND (&). In both cases, youy’re looking for baskets that contain apples:
/*Select the fruitbaskets containing Apples using the junction table*/ SELECT BasketID, BasketName FROM dbo.Baskets b WHERE EXISTS (SELECT * FROM dbo.FruitBasketsfb JOIN dbo.Fruit f ON fb.FruitID= f.FruitID WHERE b.BasketID= fb.BasketID AND f.FruitName= 'Apples') GO /*Select the fruitbaskets containing Apples using the bithash*/ SELECT BasketID, BasketName FROM dbo.Baskets b WHERE EXISTS (SELECT * FROM dbo.Fruit f WHERE b.FruitBitHash & f.FruitID<>0 AND f.FruitName= 'Apples') GO
If you run this demo, you’ll see that you get the exact same results from the two queries. However, the first query would need to read data from 3 tables, and the second query only needs 2. If the junction table is very large, the traditional method can be significantly slower than the second method.
But how does it work? An excellent explanation can be found here, but the short answer is that when you’re using the Bitwise AND (&) to compare two different integers, any value other than 0 that is returned from that comparison means that those integers share a common base. The magic happens with this line of code:
WHERE b.FruitBitHash & f.FruitID <>0
So, why don’t we do this all the time?
There’s an old expression, “If all you have is a hammer, then everything looks like a nail.” Different tools are best suited for different problems. The limitations of using the Bitwise method to remove a junction table include:
Violation of relational integrity: The surrogate IDs in the lookup table (e.g., the Fruit table) have to have a specific order and meaning. If you make a mistake when setting up the key values, you can get wrong answers.
A limited number of bitwise values can be stored in a bigint: In SQL Server, a bigint is 8 bytes, which means that there are 64 bits. When using a single bithash column, you can only have one value per bit. (Note that you can work around this by using multiple columns, but that gets complicated.)
The benefit of the Bitwise AND method is reduced disk I\O because it eliminates a large junction table. In this case, you did notice increased CPU usage using the Bitwise method, but the increase in performance was significant. However, on faster hardware, a junction table would probably have worked as well and still maintained relational integrity. For now, Bitwise AND is a useful tool for a very specific type of problem.
If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.
[Note from Pinal]: This is a 52th episode of Notes from the Field series. I am very happy that the journey which we started one year ago is a amazing milestone. In this 52 episode in the entire year we have learned a lot of new things from industry experts of LinchPin People. They are amazing sets of people who know what they are doing on the field and in the real world. I have received so many notes from blog readers that they have benefited from the experience shared by LinchPin Team.
In this episode of the Notes from the Field series database expert Kathi Kellenberger explains Table Valued Constructors. Kathi is an amazing instructor, she was the SQL author I have read in my early career. The reason, I love SQL Server because her writing has instigated love for this technology in me. Today she brings a relatively unknown topic for database experts. Read the experience of Kathi in her own words.
Table Valued Constructors, also called Row Constructors, were introduced with SQL Server 2008, but many SQL Server professionals haven’t heard about them. They make it easy to insert multiple rows of hard-coded values into a table with just one insert statement. NOTE: In this post, I’ll abbreviate Table Valued Constructors as TVCs.
I’ll often see people using older techniques to populate sample tables when asking for T-SQL help on the forums. The following example demonstrates two techniques.
CREATE TABLE #test(Col1 INT, Col2 VARCHAR(10)); --Method 1, using UNION INSERT INTO #test(Col1, Col2) SELECT 1,'a' UNION ALL SELECT 2,'b' UNION ALL SELECT 3,'c';
--Method 2, multiple select statements INSERT INTO #test(Col1, Col2) SELECT 4, 'd'; INSERT INTO #test(Col1, Col2) VALUES( 5, 'e');
The first method takes advantage of the UNION ALL operator to combine three SELECT queries into one set of results. The second example uses a separate INSERT statement for each row to be inserted. One statement uses a SELECT with hard-coded values, while the second uses the VALUES keyword.
Beginning with the 2008 version, you can specify multiple rows with the VALUES syntax which is now called TVC. Each set of values must be within parentheses and each set must be separated by a comma. Example 2 demonstrates this technique.
The big advantage of TVCs is that you save typing. Of course, looking cool on the forums is a bonus.
While using the TVC to insert multiple rows with one insert statement is pretty fantastic, there are some other interesting uses. You can create a set of results using a TVC within a derived table. Take a look at this example:
SELECT * FROM (VALUES ('January'),('February'),
) AS Months ([Month]);
In the months example above, my TVC is part of the FROM clause. Notice that I must alias the TVC just like a derived table. I also have to provide column names right after the alias.
Another interesting use of TVCs involves the APPLY operator. A feature of APPLY is that columns from the outer query are visible inside the APPLY. Not only can you have hard-coded values as in the previous example, you can use columns from the outer query as well. The following example demonstrates this technique.
SELECT SOD.SalesOrderID, Dates.* FROM Sales.SalesOrderHeader AS SOD CROSS APPLY(VALUES('Order Date',OrderDate),
('Ship Date',ShipDate)) AS Dates(TypeOfDate,TheDate);
In this example, using the AdventureWorks database, each row from the SalesOrderHeader table is returned three times, once for each kind of date. Instead of displaying each date in its own column. This example “unpivots” the dates.
Table Valued Constructors can save you some typing and make your code look more organized. And, as you have seen here, they are also capable of some neat little tricks.
If you want to get started with BIML with the help of experts, read more over at Fix Your SQL Server.
[Note from Pinal]: This is a 51th episode of Notes from the Field series. In one of the earlier blog post we discussed about BIML and lots of people asked what is actually BIML and why one should use it. If you have been reading this blog, when it is about BI, I always take help from LinchPin People who are BI experts. I requested Reeves from LinchiPin people to help me answer this unique question.
In this episode of the Notes from the Field series database expert Reeves Smith explains The WHY of BIML. Read the experience of Reeves in his own words.
The Why of Biml
If you question whether Biml can be useful, you are not alone. After I introduce people to Biml, someone always asks, “How can writing something in an XML-based language actually help me out?” I asked this same question when I first started working with Biml. If I’m going to spend the time learning something it had better improve my development experience or bring something else of value to the tool belt. This article will show you how Biml can save you time creating packages.
As a quick refresher, Biml is an XML language used to describe Business Intelligence (BI) projects. This discussion on Biml will work within the Visual Studio (BIDS/SSDT) environment with BIDS Helper. Using Biml with BIDS and the BIDS Helper add-in allows you to create Integration Services (SSIS) projects. (Note that Biml is compatible with SQL Server Analysis Services projects but not within the BIDS Helper add-in.)
Be aware that the Biml language does not support formatting and layout. The language is built to be simple and effective with an intended design to script and automate.
Example – Source Packages
Why would you need to automate SSIS if all of your packages are different? If you look at most of the packages within your enterprise, some sections of code or complete packages are very similar in design, the only changes being metadata.
Source packages are a typical example of packages that are similar. With most source packages, you do a complete pull of all of the data needed within the tables in the source systems and load that data directly into tables in a stage environment with a similar shape. Most of the time, these packages are very basic and perform a limited number of transformations. Figure 1 illustrates a common source package.
Figure 1 – Common Source Package
If you look into the details of the above package, you will notice that the metadata from package to package is not very different. The differences within the metadata of the above example would include:
So now that you’ve seen the script needed for a common source package, you may still be asking the original question: Why would you write the above Biml to create five packages?
The best answer might be a chart showing the difference between creating packages manually and creating packages with Biml and then automating them with BimlScript. The chart in Figure 3 assumes it takes about one business day to create one package. Thus one package takes one day, two packages take two days, and so on. With a Biml and BimlScript solution, you have extra development time upfront in creating the Biml code and then adding the BimlScript needed for automation. That investment makes projects with a small number of packages not ideal.
Figure 2 – Package Development Methods
After you create a Biml file, the next step would be to add BimlScript needed for automation. BimlScript is added directly to the Biml file and is able to replace items within the XML.
BimlScript uses C# or Visual Basic as the scripting language. If automation with C# and Visual Basic sounds like another barrier to starting a Biml project, I have another suggestion to enable you to use Biml on your next project. Having to learn both Biml and BimlScript can be an overwhelming requirement to start a Biml project. What if you could defer learning almost any C# and just focus on Biml? The following example is going to demonstrate a method to do that.
If you took the Biml code shown in Listing 1 and added the line of code in Listing 2 to the <Biml> declaration in the script in Listing 1 above, you can add a variable that you could use later in your Biml file. Let’s not worry about anything except that this code creates a variable named tableName, which you’ll use later in the file.
Listing 3 – BimlScript used to create a Variable
This line alone will not accomplish anything if you do not use it within the Biml file. A good test to make sure you declared the BimlScirpt correctly would be to add the code and run Check Biml for Errors. Regenerating a package should still work and nothing will change.
However, if you add one more line of code to four places within the Biml file, you can use and replace that value with the variable declared in Listing 3. The three places within the code are defined by the three red outlines above, and one place is within the package name to create uniquely named SSIS packages.
Listing 4 – BimlScript that will use the Variable
The final code is shown below in Listing 4.
Listing 5 – Semi-Automated Common Source Package Biml
If you need to create another package, just update the variable and regenerate the package. This will enable you to create multiple packages within minutes. It is not a perfect solution but adds a great technique on the way to learning Biml.
Note that there are cut-and-paste issues within Visual Studio that will result in extra quotation marks. These are outlined in Listing 5 below. Removing the extra quotes is all you need to do, and you are able to modify the Visual Studio development environment to alleviate this issue. That is discussed here.
Listing 6 – BimlScript Cut-and-Paste Issue
This article had touched on a lot, and I hope I’ve given you a reason to consider Biml on your next project. The article used BimlScript, and scripting is the place to focus after you learn Biml. Biml with BimlScript can provide huge savings in development effort once you’ve learned them.
In a future article I will demonstrate how to add more BimlScript to the existing code to increase package automation even further.
If you want to get started with BIML with the help of experts, read more over at Fix Your SQL Server.