SQL SERVER – How to Find Weak Passwords Using T-SQL?

Recently one of my blog reader emailed me below question.

Hi Pinal,
Need your urgent help. In recent past, we have been attacked by the hacker who was able to get in to our SQL Server via sysadmin account and made big damage to our data. To make sure it doesn’t happen in future, I have taken task to find out SQL Server password which are weak.

Do you have any suggestions for me?

This is one of the area which is always haunting all SQL DBAs. There are recommendations to use Windows Authentication to connect to SQL Server and that would save from all such problem. But it is not always feasible to use Windows Authentication. Now, if we decided to choose SQL Authentication, there is a setting which is “Enforce Password Policy” which would ensure that you are choosing a strong password.

If recommendations are not followed, you might end up in situation where SQL Logins have weak and basic passwords. SQL Server has provided a function PWDCOMPARE which can become very useful to find known password. Below are few example use of this out of box funtion:

SELECT NAME,
NAME 'password'
FROM   sys.sql_logins
WHERE  Pwdcompare(NAME, password_hash) = 1
UNION
SELECT
NAME,
'<blank>' AS 'password'
FROM   sys.sql_logins
WHERE  Pwdcompare('', password_hash) = 1
UNION
SELECT
NAME,
'password123' AS 'password'
FROM   sys.sql_logins
WHERE  Pwdcompare('password123', password_hash) = 1

In above query, we are trying to find:

  • Password same as user name – first query
  • Blank password – second query
  • Password = password123 – third query

These are one of the most common password used in the industry. I am sure you can extend this further by modifying it and adding more weak passwords.

Here is the sample output for the above:

Hope this blog would help you in finding weak passwords and make it more complex. Have you ever had a need to use such passwords in your environments? Do let me know.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Iterating over Metadata with BimlScript – Notes from the Field #061

[Note from Pinal]: This is a 61st episode of Notes from the Field series. One of the common question I receive every other day is – I have learned BI but I feel that there is no automation in the BI field, what should I do? How do I go beyond learning BI? How can I fill the gap between BIDS and SSDT? If you have been reading this blog, when it is about BI, I always take help from LinchPin People who are BI experts. I requested Reeves from LinchiPin people to help me answer this unique question.

In this episode of the Notes from the Field series database expert Reeves Smith explains what is Biml and How Can it Help Me with SQL Server BI?. Read the experience of Reeves in his own words.


In previous articles (What is Biml and The Why of Biml), I’ve discussed Business Intelligence Markup Language (Biml) and given some examples of projects that would benefit from its use. I also demonstrated a quick way to get introduced to Biml without learning a lot of C#. As you grow in your understanding of Biml and BimlScript you will want to make your Biml files more dynamic with a programmatic metadata-driven approach. This article will build on the previous articles and add looping with metadata to build more dynamic Biml projects.

Overview

Starting where the previous article left off, I’ll demonstrate how to add a little C# to make a more automated method of metadata retrieval. If you did not walk through the other articles, I would suggest that you review them before you read this one. This article builds on the Biml in Listing 1 and lets you create multiple integration services packages from one Biml file.

Listing 1 – Semi-Automated Common Source Package Biml

Example – Automated Source Packages

This example shows you how to connect to a database and select the metadata to iterate over. Then you can create multiple SQL Server Integration Services (SSIS) packages from one Biml file. The example in the previous article manually updated the metadata and reran each Biml file. This is an effective approach, but you can build on it to create a more robust solution. Take a look at the C# code in Listing 2.

Listing 2 – C# Replacement Code

Replace the first line of code from Listing 1 with the code in Listing 2, which is an excerpt from Listing 3.

  • The first line will add the namespace to the Biml file and enable the calling of functions without a fully qualified name.
  • The second line creates a string variable that contains a connection string of the database with the metadata.
  • The third line creates a connection object (actually an AstOleDbConnection node) that will be used by the ImportDB function.
  • The forth line creates a result set with the metadata from the connection object. This object contains all of the metadata from the AdventureWorks2012 database. The ImportDB function has three parameter. The first parameter is a schema filter, and in the example the filter is on the Person schema. The second parameter is a table filter and is blank in the example, so it will be ignored. The last parameter is a list of ImportOptions, and the views have been excluded. To add more import options use the “|” followed by additional options. Example of multiple options:

ImportOptions.ExcludeForeignKey | ImportOptionsExcludeViews

The example includes an object that contains all of the metadata that you will loop over. You have to place the loop within the <Packages> node because a Biml file can only have one Packages collection. The foreach loop will create multiple <Package> nodes.

A tableName string was built below the foreach loop to reduce the code that you have to alter from the example in the previous article.

Listing 3 – Complete Code

Note: The code in listing 3 has the ConnectionString truncated to save space.

If you receive the error shown in Figure 1, it means that the tables do not exist in the stage database. To check this, add the Person filter to the table parameter in the ImportDB function and return metadata for only one table. The prior articles used this table, so it should exist.

Figure 1 – Error Message

Summary

This article added some more C# to help you refine your Biml files. I hope you can start to see the power of Biml and how it can help remove the repetitive work from your next Integration Services project.

If you want to get started with BIML with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Performance Tuning – Is It Really A Top Skills for a SQL Server Consultant? – Notes from the Field #059

[Note from Pinal]: This is a 59th episode of Notes from the Field series. I get over 100 emails everyday and out of 100 emails nearly 90 of them are really related to performance tuning. The emails I get is from various different IT professionals. There are quite a few application developers like (Dotnet, Java, PHP etc) who do application development everyday, but they are not always expert of SQL Server. When I read those emails I feel that the world indeed need plenty of SQL Server Performance Tuning consultants. I asked these questions to SQL Server Guru Brian Moran. He is a unique blend of person who knows the consultant’s business from technology and business perspective.

Today is Christmas Day and we have received a wonderful gift from Brian!

In this episode of the Notes from the Field series database expert Brian Moran explains in very simple words the reality of the SQL Server Consultant world. Read the experience of Brian in his own words.


December and January are times of the year when many people reflect on personal and career goals. It would be silly and naïve for me to pretend that I know what’s best for your career so take this post with a grain of salt. Maybe you want to be a fireman or an astronaut? I don’t have many words of wisdom if that is your hope.

But, I do have a lot of knowledge for data professionals who aspire to be independent consultants.

I focus more on the business side of running my consulting business these days so I’m not as technical as I used to be. But I was a SQL MVP for many years and was a pretty good tuning architect. I’ve been on the PASS Board twice and have been involved in the SQL Server space for almost 25 years. Over the years I’ve been part of the leadership team that built two successful consulting companies and I’ve coached and mentored dozens of technologists over the past two decades as they got their start in consulting.

This post is for folks who love the engine side of being a database pro and want to be an independent consultant. I’m not talking to the BI folks or the other myriad of specialties that exist in #SQLFamily. But do you love the engine? I mean really love working with the core engine? Do you want to be a consultant one day?

World class and amazing performance tuning skills are consistently the on the short list of skills that I’ve seen successful independent SQL Server consultants have over the past 20 years. Are there other skills out there that allow a consultant to have a great career? Of course. You can do well with any specialty that is hard to master and that companies absolutely must have access to in order to be successful. HA/DR, consolidation, virtualization, security, and a wide variety of other topics come to mind. But here is one element of performance tuning that that tends to create opportunities for consultants to maximize their rates and have a successful career.

Top performance tuning consultants can have a pretty quick impact on a problem and companies often need access to the skill on short notice for critical needs. Sometimes tuners can solve a problem in minutes or hours. And long term performance tuning work is often measured in days or weeks rather than weeks or months that some other types of projects require.

Here is some basic math to think about. Let’s say you want to bill $250/hr. Is that expensive? Well, like any SQL question the answer is it depends. Let’s look at it this way.

  • Assume you can solve a critical problem for a customer in 20 hours. $250/hr for 20 hours is $5,000.
  • Imagine you are competing against someone who says they can solve the problem for $125/hr.
  • 50% less sounds like a big savings for the customer, right? But, it’s really only a net savings of $2,500.

Do you think many companies care about a difference of $2,500 when they desperately need to solve an urgent problem impacting critical business systems? Probably not. They probably care more about picking the person they believe is the best fit and most likely to solve the problem.

But now let’s imagine that you are talking to a potential customer about a project that might take 6 months to solve working full time. That’s about 1000 billable hours. Let’s make the same assumptions that you are charging $250/hr and someone else is going to offer $125.hr. Now the 50% differential is rate yields a net savings of $125,000.

Will many companies care about saving $125,000 if they can? Yep. They sure do.

I find that many folks never fully think through this side of the consulting equation early in their career. IE, within reason companies, mostly don’t care about the rate for performance tuners as long as you are awesome and can get the job done. Don’t get me wrong. It’s quite possible to charge premium rates for long term projects and you can certainly be a successful consultant while, focusing on some other aspect of SQL Server. But almost 25 years in the business have taught me that loving performance tuning and wanting to be an independent consultant can be a match made in heaven.

If you want to get started with performance tuning and database security with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Beginning with SQL Server Security Aspects

SQL Server Security Aspects

This article is the high-level overview of the SQL Server security best practices and aspects. Ensuring SQL Server security is an important task that may be successfully solved by applying best practices and proven solutions described further in this article.

Physical Security

SQL Server physical security aspects are often overlooked. Frequently DB admins focus on the software security and network security, while completely forgetting about the physical server itself. However, the possibility to physically reach the server, will lead to abrogation of all other security configurations that they apply to the software. As far as SQL Server is installed on a physical device, whether it would be server hardware, desktop PC, laptop, or other PC in a data center. The physical security main goal is to prevent someone from gaining access to the machine.

Shared data centers or server rooms provide a wide range of physical and environmental security to ensure that secure areas are protected by appropriate entry controls to ensure that only authorized personnel are allowed access. An appropriate, physical protection should be provided against damage from natural, or man-made disasters, such as fire, flood, explosion etc. All users are required to ensure that systems are not left open to access by intruders to buildings, or by unauthorized colleagues.

If the personnel in the data center  is unable to physically lock away the server, you must ensure that all USB ports on the server are disabled. Keep in mind that security is not complete if physical security is overlooked.

Network  Security

The network is the way that external threads may come to attack your SQL Server.

Firewalls provide efficient means to implement security. A firewall is a separator or restrictor of network traffic, which can be configured to enforce your data security policy. If you use a firewall, you will increase security at the operating system.

SQL Server Instance Security

Every SQL Server instance that is installed on a PC can be considered to be a security domain. Thus, at the instance level you could apply separate security considerations.

I will use the dbForge Studio for SQL Server v 4.0 for the demonstration purposes as it has an excellent tool called Security Manager.

The following picture demonstrates two separate instances of SQL Server on the same computer.  Each has its own specific name, the first one is MSSQLSERVER\Instance1, the second one is MSSQLSERVER\Instance2.

This picture gives us the ability to see how security applies at the instance level. SQL Server Security uses the concept of securables that will be discussed later in this article.

Securables are the objects to which the SQL Server authorization system controls access. For instance, a table is a securable. Some securables can be contained within others, creating nested hierarchies called “scopes”. The securable scopes are server, database, and schema.

An object that receives permission to a securable is called a principal. The most common principals are logins and users.

A login is a security principal, or an entity that can be authenticated by a secure system. Users need a login to connect to SQL Server. You can create a login based on a Windows principal (such as a domain user or a Windows domain group) or you can create a login that is not based on a Windows principal (such as an SQL Server login).

To control access to securables, a DB admin can grant or deny permissions, or add logins and users to roles that have access.

It is important to note that every individual instance has it’s own set of security considerations. The security folders on each instance contain the same users, application roles,(etc.) folders. However, if we expand the logins folder you can see the different number of accounts.

This proves that we can have different security settings for individual users across different instances.

It is also important to note that when you do set up an SQL Server instance, you only install the necessary features that are necessary for that instance to operate. This reduces the possibility to attack of your specific SQL Server instance, by reducing the number of services and features that are available for malicious users to gain access to.

SQL Server Securables

Securables are the resources to which the SQL Server Database Engine authorization system regulates access. Essentially these are the database objects to which you can apply security on your instance of SQL Server. Some securables are standalone and others may be contained within another securable.

Each database also has its own security folder, where we can focus on users, which again, become the logins that we assign the permissions to for our database.

We also have database level roles and application roles. Currently, there are no application roles assigned on this server. There are another components that we do not see in the Database Explorer, however they still exist at the data base level. These components are assemblies which are typically DLL files that are used in the instance of sequel server for deploying functions stored procedures, or triggers. These components are managed by CLR.

SQL Server Principals

There are three high level categories of SQL Server security principals:

  1. Windows level (domain and local logins)
  2. SQL Server level (SQL Logins and server roles)
  3. Database level (database users, DB and application roles)

There are two SQL Server authentication types:

  • by using Windows account security token
  • by using an account defined inside SQL Server

You can tune the type of authentication during installation. Windows mode is the most secure, as it relies on Windows accounts as opposed to SQL accounts. The recommended setting is Windows Authentication and it is selected by default. This authentication type  simplifies administration and is more secure, because it uses Windows security tokens. There is no need to remember another password, and no password transits through the local network.

You cannot disable Windows authentication, however you can extend it with SQL Server authentication. Thus, in case you need to allow access to the users outside the local network, you should use SQL Server authentication. In this case, logins are defined and managed inside SQL Server. The authentication and password validation is managed by SQL Server also.

There are also default roles created during installation as well. One of which is known as the public role.

SQL Server creates the public role as a way of assigning permissions to users who have accounts on the database, but who do not have any other currently assigned permissions. The public role serves the purpose, of providing limited permissions to user accounts on the database, until you’ve had the opportunity to assign the necessary permissions for those user accounts. SQL Server also includes a guest user account. This account is not a server level account, but exists at the database level.

If we expand the users folder within the database, we will notice a guest account, that is available for each database. The guest account is used to grant permissions to users, who might be able to access the database. But, who do not have a user account assigned, or created in the database itself.

The guest account cannot be dropped, but it can be disabled, through the revocation of the connect permission. User and application access to your SQL Server instance, will be controlled using these security principles.

Manage Logins and Users

As mentioned above in this article, SQL Server implements security through the use of securables, which are the database objects, and security principles, which are the user’s inner applications that will access the database.

If you know how to create the logins and user accounts – it is the first step in creating the authentication and authorization model for your SQL Server.

Before creating logins, you must know which authentication mode SQL Server instance is configured to use. In case of mixed mode authentication, we can create Windows accounts, or we can create local SQL Server accounts for authentication on the instance of SQL Server.

Keep in mind that SQL Server allows you to create logins through Windows accounts or SQL Server accounts at the server level. Also SQL Server permits the creation of local user accounts at the database level. If we expand the database, and expand the Security folder, we notice that SQL Server includes a Users folder at the database level.

Simply right click the Users folder, and then click New User. This allows us to choose a user type at the database level, as opposed to the server level. We can create SQL user with or without a login, what means that this would be a user name we create specifically in this database.

A good example of such an account is the guest account, which is created at the database level, but does not exist at the logins level for the server itself.

You can set the authentication mode to Mix Mode at any time after the installation. And then you can create SQL Server accounts, or use Windows accounts, depending on your choice.

Understanding each type of account and how security is applied to these accounts will help you set the proper security for user access.

Understanding and using roles

SQL Server roles allow you to group users or logins into a specific container for assigning permissions to securables on your SQL Server instance. Permissions assigned to a role are applied to any user or login that is associated with that role. SQL Server provides nine fixed server roles. You can find these server roles in the Security folder of the SQL Server instance. Outside of the nine fixed server roles, SQL Server also provides database level roles.

SQL Server Permissions

Accessing any of the secureables in SQL Server requires that the login or user have permissions. The permissions in SQL Server follow certain naming conventions. The security check algorithm used by SQL is complex. And the complexity comes from the need to evaluate group or role membership, as well as explicit and implicit permissions.

There are three core elements of SQL Server permission checking:

  1. Security context. This is related to the user, the login or the role.
  2. Permission space. The permission space will focus around the securable, such as the database object that the user is attempting to access. SQL Server checks the permissions that the principle has been assigned according to that securable.
  3. Required permissions. This is an important step because some tasks require more than one permission. For instance, if a user attempts to execute a stored procedure, the execute permission would be required in the stored procedure itself.

Summary

SQL Server security is too huge topic to be discussed in a single article. However, this article provides an overview of core principles and concepts of SQL Server security.

SQL SERVER – Common Sense Data Security – Notes from the Field #055

[Note from Pinal]: This is a 55th episode of Notes from the Field series. Common sense is not as much as common as we think. I am sure you agree with it from your real world experience. However, when it is about data and its security, there has to be some rules along with the policy but common sense is extremely critical. When I read this article, I find it humorous at points and some of the examples also reminded me of my past experience. If you are in data security, you will have a great time reading these notes, but if you are not, you will still love it.

In this episode of the Notes from the Field series database expert Kevin Hazzard explains common sense data security and how we can apply in daily life in real world. Read the experience of Reeves in his own words.


There are many excellent books and articles that address the correct ways to store sensitive user information. Yet, many in IT are still failing to protect customers from loss due to data breaches. Every day, it seems that there’s another retailer or commercial web site in the news for losing passwords or credit card numbers to hackers. As an industry, why are we struggling to secure this type of information when there’s so much good intelligence and so many great tools for getting the job done? It’s a complicated subject so perhaps it’s time to step back a bit and use a bit of common sense to analyze the problem.

No matter the industry, using the right tool for the job is rule number one. Line-of-business databases are all about organizing information and getting it into the hands of people who perform transactions and make decisions with it. As a result, these databases become naturally permissive by nature, especially as they evolve to meet the demands of growing businesses. There are good access controls in modern databases but when it comes to managing ultra-secure bits of data, traditional, relational databases may not be the best fit for the job.

Lightweight Directory Access Protocol (LDAP) servers like ApacheDS, OpenLDAP and Microsoft Active Directory do a much better job of handling sensitive data with less trouble than any custom coding we might do on our own. Moreover, the built-in authentication functions of LDAP are mature and standards-based, making them safe and reusable from many different applications without custom interface development. It’s our duty as technologists and as business people to highlight the high cost of custom security solutions and the huge potential risks to our managers. In particular, when it comes to storing passwords in our line-of-business databases, just say no.

If we must manage financial instruments or personally identifying information in a database like SQL Server, there are three classes of problems to solve:

  1. Keeping the hackers from stealing our stuff,
  2. Detecting when breach attempts occur, and
  3. If data is unfortunately lost, making the information useless.

Let’s think about these efforts from a common sense perspective. Problem one is all about access control. The problem with permissions in any complex system is that they are difficult to maintain over time. Even if the initial configuration and policies safeguard the sensitive data, some future administrator may fail to understand or enforce the rules correctly. We could make those future administrators’ jobs much easier if we followed one simple rule: never mix highly-sensitive data in tables containing non-privileged data.

It’s deceptively simple-sounding but in practice, if sensitive data is always segregated into encrypted tables (http://blog.sqlauthority.com/2009/04/28/sql-server-introduction-to-sql-server-encryption-and-symmetric-key-encryption-tutorial-with-script/) and placed into a separate, secure schema requiring elevated access privileges, mistakes concerning permissions will become less likely over time. Moreover, by denying SELECT, INSERT, UPDATE and DELETE privileges on the secured tables, every query can be routed through stored procedures where problems two and three can be addressed with auditing and data obfuscation controls. Lastly, to ensure that lost data is useless, use the new Backup Encryption feature of SQL Server 2014 or invest in a third-party tool that does the same.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Validation Rules: Code or Database? – Notes from the Field #054

[Note from Pinal]: This is a 54th episode of Notes from the Field series. Where do we blame for a mistake or error in the system? Well, developer blames DBA and DBA blame developers. Sometimes there is no solution to the catch 22 situation. I have been there and I am very sure that you have been there too. Well, this is an amazing and beautiful article by G. Andrew Duthie. He has attempted to demystify the problem which we all face every day.

In this episode of the Notes from the Field series database expert Andrew Duthie explains about Validation Rules and where they should be implemented. Read the experience of Andrew in his own words.


 

If you’re a DBA dealing with developers, you may run into the question of whether it’s better to allow the developers to write validation rules in their .NET app layer, or whether you should put your foot down and insist that the validation be implemented in stored procedures. The answer, as you might expect from a consultant, is “it depends.”

Advantages of Code-based Validation

One of the things that might inform your decision on what to use for validation is the skills possessed by the development team. If you’ve got a developer or developers who are well-versed in C# and LINQ, but don’t have a lot of experience writing stored procedures, you may want to cut them a break and let them use the tools they’re more familiar with.

Writing validation rules in code at the application layer allows developers to stay within the realm of .NET objects, which can result in faster development time.

Disadvantages of Code-based Validation

While there are probably more that could be discussed, I’ll mention just two of the significant disadvantages to writing validation rules in code.

First, if the code for the validation rules is using LINQ, particularly if the rules are complex, there’s the possibility of queries that generate sub-optimal SQL under the covers. This can be mitigated by profiling the queries to make sure that any performance hogs are caught as early as possible, but it’s certainly a valid concern.

Second, from a maintainability standpoint, having rules in the app means that adding rules requires the app to be recompiled and redeployed. For some apps and environments, this may not be a big deal, but in others, it could definitely be a deal-breaker.

Advantages of Stored Procedure-based Validation

Using stored procedures for validation provides some key advantages. One is proximity to the data. Unlike code-based validation, which may require pumping significant amounts of data over the wire from the database to the app tier, stored procedure-based validation keeps the logic on the DB tier, so performance may be significantly better.

Another advantage is that with a good execution design (for example, a master stored procedure that executes a list of validation rules in a specified order based on a configuration table), it can be relatively easy to introduce new rules with less disruption than having to recompile and redeploy an entire application.

Disadvantages of Stored Procedure-based Validation

The major disadvantage of using stored procedures for validation, speaking as an app developer, is the basic impedance mismatch between .NET code (C# or Visual Basic) and T-SQL. While it’s certainly possible for developers to master both, there’s a mental cost in switching between these environments, and a potential for mistakes when transitioning from one to the other.

The other downside of stored procedures is the mixing of application logic between the app tier and the database tier. While validation close to the data can, as noted, improve performance, if some parts of the application logic live in both the app and database tiers, this could make for more costly maintenance down the road.

Consistency is Key

One additional point I’d like to make is that it’s probably wise to choose one option or the othernot both. If you have multiple applications in development (or even in maintenance mode), having a mix of app-based or sproc-based validation will likely give you headaches at some point. So get your team together and have a discussion about how you’re currently handling things, and whether there might be a better way.

Summary

The short answer to “which is better” is really “either.” It all depends on the skills of your developers, the performance you need from your app, and the other factors I’ve discussed. Although I’m coming at this from the perspective of an app developer, I’ve recently become more comfortable with the idea of stored procedure-based validation, particularly in instances where more than one app may be targeting the same database, since this can help reduce redundancy, and centralize management of rules.

I’d love to get your feedback on how you’ve handled validation rules in your environment, so feel free to share a comment below.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Using Bitwise And (&) Instead of a Junction Table – Notes from the Field #053

[Note from Pinal]: This is a 53rdth episode of Notes from the Field series. Everyday I get 100s of emails and most of the emails have a similar request. Everyone wants to get maximum performance, but they want to make the least amount of changes in their code. Well, though both of them are contradictory requests, it is possible in most of the cases if you know the technology inside like Linchpin People do. Here in this blog post, my close friend Stuart Ainsworth explains a cool trick, which I just learned today after so many years of experience. Wow, Stuart – thanks for this amazing note from the fields – I learned something new and there will be so many who will enjoy this post.

In this episode of the Notes from the Field series database expert Stuart Ainsworth explains Using Bitwise And (&) Instead of a Junction Table.


 

Bitwise operations in SQL Server are not often used, but like many of the tools available to SQL Server developers, bitwise operations can provide some interesting alternatives when you’re faced with specific challenges. One of my clients has a large database that relies heavily on many-to-many relationships to identify matching interests on multiple characteristics. As a simple example, let’s assume that I’m talking about fruit baskets.

In the simplest version of a fruit basket, you’d have two database objects: the basket, and the assortment of fruit.  Baskets can use different combinations of fruit, and samples of fruit may appear in more than one basket, like so:

Basket 1: Apples
Basket 2: Apples, Bananas
Basket 3: Grapes, Apples
Basket 4: Strawberries, Bananas

The traditional method of modeling this relationship would be to use a junction table, as illustrated below.

 

However, my client  had 500,000 baskets, and roughly 50 different fruits to choose from. Assuming that every basket had at least 10 different fruits, the junction table would have at least 5,000,000 rows of data. Even though the junction table was well indexed and strongly typed, my client’s design was suffering from slow read times.  The client needed an alternative. Enter the bitwise AND (&).

Setting Up a Demo

Let’s set up a demo that illustrates both the junction table method and the bitwise AND alternative.  First, you’ll create the following three tables and populate them (using table valued constructors):

  1. Baskets, which includes a column for use with the Bitwise AND
  2. FruitID, which is set up for use with the Bitwise AND
  3. FruitBaskets, which is a junction table

Note that primary and foreign key references are not included for the simplicity of the demo. You’ll also be adding an extra column to the Baskets table to use for the Bitwise join. Finally, note that the ID column of the Fruit table mirrors the decimal values of the binary bit positions (e.g., 1, 2, 4, 8, 16, 32, 64, 128).

CREATE TABLE Baskets
(
BasketID INT
, BasketName VARCHAR(100)
,
FruitBitHash BIGINT
)
CREATE TABLE Fruit
(
FruitID BIGINT
, FruitName VARCHAR(20)
)
CREATE TABLE FruitBaskets
(
BasketID INT
, FruitID BIGINT
)
GO
INSERT  INTO Fruit
( FruitID, FruitName)
VALUES  ( 1, 'Apples'),
(
2, 'Bananas'),
(
4, 'Grapes'),
(
8, 'Strawberries')
GO
INSERT  INTO dbo.Baskets
( BasketID, BasketName, FruitBitHash)
VALUES  ( 1, 'Apples', 1),
(
2, 'Apples, Bananas', 1 + 2),
(
3, 'Grapes, Apples', 1 + 4),
(
4, 'Strawberries, Bananas', 8 + 2)
GO
INSERT  INTO dbo.FruitBaskets
( BasketID, FruitID)
VALUES  ( 1, 1),
(
2, 1 ),
(
2, 2 ),
(
3, 1 ),
(
3, 4 ),
(
4, 8 ),
(
4, 2 )
GO

Now that you’ve got your tables set up, let’s run a couple of queries. First, you’ll use a junction table (the traditional, normalized model), and then you’ll use the Bitwise AND (&).  In both cases, youy’re looking for baskets that contain apples:

/*Select the fruitbaskets containing Apples using the junction table*/
SELECT BasketID, BasketName
FROM dbo.Baskets b
WHERE EXISTS (SELECT *
FROM dbo.FruitBaskets fb
JOIN dbo.Fruit f ON fb.FruitID = f.FruitID
WHERE b.BasketID = fb.BasketID
AND f.FruitName = 'Apples')
GO
/*Select the fruitbaskets containing Apples using the bithash*/
SELECT BasketID, BasketName
FROM dbo.Baskets b
WHERE EXISTS (SELECT *
FROM dbo.Fruit f
WHERE b.FruitBitHash &amp; f.FruitID <>0
AND f.FruitName = 'Apples')
GO

If you run this demo, you’ll see that you get the exact same results from the two queries. However, the first query would need to read data from 3 tables, and the second query only needs 2. If the junction table is very large, the traditional method can be significantly slower than the second method.

But how does it work? An excellent explanation can be found here, but the short answer is that when you’re using the Bitwise AND (&) to compare two different integers, any value other than 0 that is returned from that comparison means that those integers share a common base. The magic happens with this line of code:

WHERE b.FruitBitHash & f.FruitID <>0

So, why don’t we do this all the time?

There’s an old expression, “If all you have is a hammer, then everything looks like a nail.” Different tools are best suited for different problems. The limitations of using the Bitwise method to remove a junction table include:

  1. Violation of relational integrity: The surrogate IDs in the lookup table (e.g., the Fruit table) have to have a specific order and meaning. If you make a mistake when setting up the key values, you can get wrong answers.
  2. A limited number of bitwise values can be stored in a bigint: In SQL Server, a bigint is 8 bytes, which means that there are 64 bits. When using a single bithash column, you can only have one value per bit. (Note that you can work around this by using multiple columns, but that gets complicated.)

The benefit of the Bitwise AND method is reduced disk I\O because it eliminates a large junction table. In this case, you did notice increased CPU usage using the Bitwise method, but the increase in performance was significant. However, on faster hardware, a junction table would probably have worked as well and still maintained relational integrity. For now, Bitwise AND is a useful tool for a very specific type of problem.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)