SQL SERVER – Creating the SSIS Catalog – Notes from the Field #058

[Notes from Pinal]: SSIS is a very well explored subject; however, there are so many interesting elements that every time we read about it, we learn something new. Creating the SSIS catalog is one such concept.

Linchpin People are database coaches and wellness experts for a data driven world. In this 58th episode of the Notes from the Field series, database expert Tim Mitchell (partner at Linchpin People) shares a very interesting discussion on how to create the SSIS catalog.


With the release of SQL Server 2012, Integration Services (SSIS) received a significant overhaul.  The most notable change was the addition of the SSIS catalog, a new deployment and storage mechanism that helps to centralize and standardize the storage, execution, and logging of SSIS packages.  In this post, I will briefly illustrate how to set up the SSIS catalog on a new installation of SQL Server 2012 or 2014.

The SSIS Catalog

When connecting to an instance of SQL Server using SSMS 2012 or 2014, you’ll notice that there is a new node in the tree, entitled Integration Services Catalogs.  By default, this node will be empty as shown below.

Before we can begin deploying SSIS packages to the catalog on this instance, we need to create the catalog.  It is worth noting here that, although the plural node name of Integration Services Catalogs implies that we may have more than one catalog, we can only create one SSIS catalog per SQL Server instance.

Creating the SSIS Catalog

To create the SSIS catalog, simply right-click the Integration Services Catalogs node and choose Create Catalog.

Next, you’ll be presented with the Create Catalog dialog, which allows you to set the options for creating the catalog.  There are really only two things that you need to configure here:

  • Enable automatic execution of Integration Services stored procedure at SQL Server startup. This option will enable the automatic execution of a cleanup stored procedure when SQL Server starts.  This stored procedure will check for and update any packages whose status in the catalog was left in limbo due to an unexpected shutdown of SQL Server.  I don’t know of any downsides to enabling this, so I recommend that you leave this box checked.
  • Encryption password. Because you can store sensitive information (such as database or FTP passwords) in the SSIS catalog, this database requires an encryption password to properly secure those assets.

You’ll also notice that the name of the catalog database (SSISDB) is presented in this window, but it is not editable.  In the current versions (2012/2014) of SSIS, the name of the catalog database must be set to SSISDB.  Also, the SSIS catalog requires enabling CLR integration, so this option is shown but is also not editable.

Once you’ve configured the encryption password and decided whether to allow the cleanup stored procedure to run, you’re ready to create the catalog.  Simply click the OK button in this window to create the catalog.  After doing so, you can browse to the Object Explorer in SSMS and see that we now have the SSISDB catalog listed under the Integration Services Catalogs node.  The catalog is now ready for use!
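Although the catalog itself is created through the SSMS dialog, you can verify the result from T-SQL. A quick sanity check, using the catalog’s own properties view, looks like this:

-- The new SSISDB database exposes the catalog views; this should
-- return rows such as SCHEMA_VERSION and ENCRYPTION_ALGORITHM.
SELECT property_name, property_value
FROM SSISDB.catalog.catalog_properties;

-- CLR integration is required by the catalog; the Create Catalog
-- dialog enables it, and run_value should come back as 1.
EXEC sp_configure 'clr enabled';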

One final note on the SSIS catalog: If you browse the list of databases after creating the catalog, you’ll notice that there is a new user database named SSISDB in the list of databases.  The SSIS catalog is contained in its own database on the SQL Server instance on which it is installed.  Unlike prior versions of SSIS which stored the SSIS assets in MSDB, the SSIS catalog uses a dedicated user database to better isolate the SSIS storage.

Conclusion

With a new installation of SQL Server 2012 or 2014, the SSIS catalog is not installed by default.  This brief post demonstrated how to easily and quickly set up the SSIS catalog on a new installation of SQL Server.

If you want me to take a look at your server and its settings, or if your server is facing any issues, we can Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)


Hey DBA – Baselines and Performance Monitoring – Why? – Notes from the Field #058

[Note from Pinal]: This is the 58th episode of the Notes from the Field series. Mike Walsh is an amazing person to meet and know. I am fortunate to be his friend. Every single time I meet him, I learn something as a human being. He is always looking for opportunities to improve the lives of DBAs and Developers.

In a recent conversation with Mike, I asked him: what exactly are baselines and performance monitoring? How do we define what is normal and what is not? Where do we draw the lines when we talk about performance? Database expert Mike Walsh decided to help us answer these questions.

Read the entire story in his own words.


Last time I posted here on SQL Authority, I suggested that DBAs make a proactive investment in their environments.

I can’t stress this enough – procrastination is our enemy as technologists. Especially as DBAs. We have things we should do on a regular basis to keep our environments moving along. I gave a few suggestions of places to start in that post three months ago.  If you’ve not started down the road on those things, stop reading this and do those things first.

If you’ve started being proactive in those areas already? Keep reading and let’s talk about being proactive in another area:

Baselines and Performance Monitoring – Why?

It’s really important to know what “normal” looks like for a lot of reasons when it comes to performance in your environment.

The answers to these questions all require you to know what normal looks like:

  • When will this environment exhaust the resources available in our current setup?
  • Things are slow – what’s the cause?
  • Can we consolidate onto fewer servers?

Without having a sense of what normal looks like for server performance – you can’t accurately answer those questions. For some of them you need to watch what normal looks like over time. And normal can look different on different servers.

Looking at your performance metrics and comparing them on a regular basis is an important step in being proactive. The process will teach you about your environment and let you see trends develop over time. It will also teach you more about SQL Server performance as you dig into the various counters and understand what they mean.
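PAL and perfmon are the focus of the approach below, but as a minimal sketch of the same idea in T-SQL alone (the baseline table here is hypothetical, not part of any tool), you can snapshot a few counters on a schedule and trend the results:

-- Hypothetical baseline table; capture a snapshot every few minutes
-- (a SQL Agent job works well) and compare snapshots over time.
CREATE TABLE dbo.PerfBaseline
(
    CapturedAt   DATETIME2     NOT NULL DEFAULT SYSDATETIME(),
    CounterName  NVARCHAR(128) NOT NULL,
    InstanceName NVARCHAR(128) NULL,
    CounterValue BIGINT        NOT NULL
);
GO
INSERT INTO dbo.PerfBaseline (CounterName, InstanceName, CounterValue)
SELECT RTRIM(counter_name), RTRIM(instance_name), cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Batch Requests/sec', N'Transactions/sec',
                       N'Page life expectancy', N'User Connections')
  AND instance_name IN (N'', N'_Total');  -- skip the per-database rows
-- Note: the */sec counters are cumulative since restart, so the number
-- worth trending is the delta between two snapshots.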

Baselines and Performance Monitoring – An Approach                 

There are a lot of approaches to take here. You can run your favorite perfmon counters on a regular basis and look at the results in Excel. You can use a performance and event monitoring tool like SQL Sentry – and look at the data it collects over time. Or any number of approaches in between.

While I often advise my consulting clients to use several of these approaches, a good place for you to start learning and doing is with a free tool called PAL – Performance Analysis of Logs. This tool is available on Codeplex, and I describe how to use it in a couple of blog posts and videos on the Linchpin People blog (http://www.linchpinpeople.com/sql-server-performance-baseline-presentation-seacoast-sql/ is a presentation I give on baselines and baseline metric collection; http://www.linchpinpeople.com/how-create-sql-server-baseline-using-pal/ shows a couple of videos on how to use the PAL tool and create a perfmon template).

Rather than regurgitate what those resources, and the posts I link to in them, describe, I’ll suggest a few action items that you can follow right now. Those links will help you on your journey here. Even if your environment has no issues, the baseline can prove valuable to you in the future, and the process is guaranteed to teach you something about SQL Server performance you didn’t know before you started. I’ve been working with SQL Server for 16 years and I still learn something every time I engage in a serious baseline study.

Baselines and Performance Monitoring – Action Plan

Some suggested next steps you can take to get on the road to being proactive about performance:

  • Learn about the PAL tool – use the links here to start.
  • Collect “Fake” Performance Data – Start with a test environment if you aren’t sure, watch the impact and get comfortable with the process.
  • Collect Real Performance Data – Once you are comfortable with the approach, do the steps in production. Track some data for a day, a couple of days, or a week, depending on the sample interval you go with.
  • Analyze The Data – Run the performance metrics through the PAL tool and look for alerts. Keep in mind that not all alerts require immediate action, but look at the alerts and learn about them. Read the tips in the PAL tool, look at blogs like this one to understand what is going on and see if you have issues.
  • Rinse and Repeat – Try this again in a month or a few months. What’s changed? Are you doing more activity (maybe batches/second or transactions/sec or user count) and seeing your resources still performing well? That’s great. Do you see performance slowing down but activity not really that much higher? Dig in and see why you are falling off your baseline.

The point is – get started looking at your baselines. You may not have an urgent and immediate need for a good set of baselines today, but when that day comes, you’ll wish you had taken the time to collect that data. I can’t tell you how often we get called in to help a customer experiencing performance concerns where a baseline would have helped us at least get a sense of what changed from when things were good. We can still help get them out of their mess all the same, but with a few baselines over time we can sometimes figure out more quickly where the most beneficial changes can be made.

If you want to get started with performance analytics and triage of virtualized SQL Servers with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – The Basics of the SSIS Data Flow Task – Notes from the Field #057

[Note from Pinal]: This is a new episode of the Notes from the Field series. SQL Server Integration Services (SSIS) is one of the most essential parts of the entire Business Intelligence (BI) story. It is a platform for data integration and workflow applications. As Wikipedia says, it features a fast and flexible data warehousing tool used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data.

In this episode of the Notes from the Field series I asked SSIS expert Andy Leonard a very crucial question – how do you learn the SSIS Data Flow Task? Andy was very kind to answer the question and provided plenty of information about how a novice developer can learn SSIS from the beginning and become an expert in the technology.


If you know SQL Server, you’re likely aware of SQL Server Integration Services (SSIS). What you might not realize is that SSIS is a development platform that allows you to create and perform some interesting Control Flow Tasks. In the first blog post in this series, I showed how to use the Execute SQL Task. Now, let’s look at the Data Flow Task. When developing solutions with SSIS, I use a handful of Control Flow tasks:

  • Execute SQL Task
  • Data Flow Task
  • Script Task
  • Execute Package Task
  • File System Task
  • Execute Process Task

This list is a good approximation of which tasks I use most, too – from most-used to least-used. In this article I provide a basic example of configuring the SSIS Data Flow Task, shown in Figure 1:


Figure 1: SSIS Data Flow Task

The SSIS Data Flow Task is a very special task. It is the only task to have its own tab in the Integrated Development Environment (IDE) as shown in Figure 2:


Figure 2: The Data Flow Tab

If you click on the tab, you will note a new SSIS Toolbox containing Data Flow-specific components, as shown in Figure 3:


Figure 3: Data Flow SSIS Toolbox

SSIS Data Flows are typically used to move data from one location to another. The data flow accomplishes data movement by first reading data into Data Flow Buffers. Think of a buffer as a region of memory SSIS uses to hold data rows as the rows are processed by the data flow. In Figure 4, I have configured an OLE DB Source Adapter to pump data rows into the data flow:


Figure 4: Configuring an OLE DB Source Adapter
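Configuration is straightforward: point the adapter at a connection manager and choose a table, view, or query. If you choose the “SQL command” data access mode, the command is plain T-SQL; here is a minimal, hypothetical example (the table name is illustrative):

-- A hypothetical source query for the OLE DB Source Adapter's
-- "SQL command" data access mode. Only the selected columns enter
-- the data flow buffers, so narrow queries keep buffers lean.
SELECT Name
FROM dbo.MySourceTable;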

The data is often transformed while being moved from one location to another. The SSIS data flow components that perform transforming operations are called Transformations, and they are joined to other data flow components by Data Flow Paths. An example of a transformation is the Derived Column Transformation, as shown in Figure 5:


Figure 5: Adding a Derived Column Transformation and a Data Flow Path

You can use transformations to perform many operations (e.g., you can manipulate values of columns in rows, you can remove or redirect rows based on column values, etc.) on the data as it flows through the data flow task. For example, the Derived Column Transformation permits you to manipulate (transform) existing data or to combine existing data to create new columns, as shown in Figure 6:


Figure 6: Creating a New Column with the Derived Column Transformation

I created a new column named “UpperCaseName” in the Derived Column Transformation. I used SSIS Expression Language to define the transform – “UPPER([Name])” in this case.
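If you think in set-based terms, the transform above is roughly equivalent to the following T-SQL (again, the source table name is illustrative):

-- The set-based equivalent of the Derived Column logic above.
SELECT Name,
       UPPER(Name) AS UpperCaseName
FROM dbo.MySourceTable;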

Note: SSIS Expression Language is very powerful and very difficult to learn. For more information, please see Linchpin People’s SSIS Expression Language series.

Now I need to land these rows into a table. I’ll use an OLE DB Destination Adapter – connected from the Derived Column Transformation via data flow path – to accomplish loading our transformed rows into a table, as shown in Figure 7:


Figure 7: Configuring an OLE DB Destination Adapter

Once the OLE DB Destination Adapter is configured, you can execute either the package or the Data Flow Task as shown in Figure 8:


Figure 8: Test Execution Successful!

In this article, I shared an introduction to the SSIS Data Flow Task and some of its functionality. Although I barely scratched the surface of Data Flow capabilities, you should now be able to compose and test your first SSIS Data Flow!

:{>

If you want to get started with SSIS with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Proof of Concepts with Windows Azure – Notes from the Field #056

[Note from Pinal]: This is the 56th episode of the Notes from the Field series. If you are a DBA or Developer, there is always a situation when you want to prove yourself by building a small proof of concept of your idea. However, most of the time it is way more complicated than we think. Building a proof of concept requires many different resources and skills. Above all, there is a chance that what we have built is not up to the mark and we have to rebuild the example one more time. Trust me, tasks that look simple from the beginning are never that simple.

In this episode of the Notes from the Field series database expert John Sterrett (Group Principal at Linchpin People) explains a very common challenge DBAs and Developers face in their careers – how to build proofs of concept and how to maximize the power of Azure. Linchpin People are database coaches and wellness experts for a data driven world. Read the experience of John in his own words.


Whether you know it or not, cloud services are here, and they are changing the way we provide information technology services. For example, in many information technology shops it can take weeks if not months to get an instance of SQL Server up and running. Here are just some of the action items that must be completed before DBAs even get access to a server to install SQL Server: you have to order a physical server, your procurement team must approve the order, and the server has to be shipped. Once the server is received, it must be racked in the data center, cables must be connected, and the data center team needs to document their changes. Then the operations team needs to install and configure Windows. I could keep going, but there are a lot of things that must be done to a server before the DBA team gets its hands on it. What are you going to do if you’re a DBA and you need an instance up in 30 minutes for a proof of concept? Increasingly, the cloud is the answer.

Every time I need a server for a proof of concept, I jump to Windows Azure. I can build a Windows Azure virtual machine with SQL Server included in under 30 minutes. In this tip, I am going to walk through the steps to create your first Windows Azure machine.

1. Get a Windows Azure account. If you don’t have one, you can get a free trial. At the time of writing this tip, the free trial includes $220 of credit.
http://azure.microsoft.com/en-us/pricing/free-trial/

Are you an MSDN Subscriber who isn’t leveraging Microsoft’s cloud service? If so, you can sign up for free monthly credit.

2. Log In to Azure Portal
https://manage.windowsazure.com

3. Create New Virtual Machine

On the left hand side, click on Virtual Machines and then the add button at the bottom left of the screen. This will load the wizard for creating our first virtual machine.

Now that the wizard is loaded, as you can see below, we can select Virtual Machine and create it from the gallery. In the Gallery we will be able to select one of many images that already include SQL Server baked in.

Looking at the SQL Server images, you will see that you can access Enterprise Edition, Standard Edition and Web Edition, from SQL Server 2014 down to SQL Server 2008 R2.

 

Next you can customize your image by release date. This allows you to choose different service packs or CUs. You can also select between two different tiers and several sizes. You will have to create a user name and password, and you will want to keep this credential, as it will be your first account.

 

Next you will be able to select more machine configuration options. You will get to determine where the Azure Virtual Machine is located.  Below you will see I am using my MSDN Subscription.

 

Finally, you will get to configure additional extensions to help automate or secure your virtual machine.

You will then see your server being provisioned. What once used to take weeks or months can now be done in the cloud in minutes.
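Once provisioning completes and you can connect with SSMS, a quick smoke test confirms exactly which build and edition you were handed:

-- First-connection smoke test on the new Azure virtual machine.
SELECT @@SERVERNAME                      AS ServerName,
       SERVERPROPERTY('Edition')        AS Edition,
       SERVERPROPERTY('ProductVersion') AS ProductVersion;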

Are your servers running at optimal speed or are you facing any SQL Server Performance Problems? If you want to get started with the help of experts read more over here: Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Common Sense Data Security – Notes from the Field #055

[Note from Pinal]: This is the 55th episode of the Notes from the Field series. Common sense is not as common as we think; I am sure you agree, from your real world experience. However, when it comes to data and its security, there have to be rules along with policy, but common sense is extremely critical. When I read this article, I found it humorous at points, and some of the examples reminded me of my past experience. If you are in data security, you will have a great time reading these notes, but if you are not, you will still love it.

In this episode of the Notes from the Field series database expert Kevin Hazzard explains common sense data security and how we can apply it in daily life in the real world. Read the experience of Kevin in his own words.


There are many excellent books and articles that address the correct ways to store sensitive user information. Yet, many in IT are still failing to protect customers from loss due to data breaches. Every day, it seems that there’s another retailer or commercial web site in the news for losing passwords or credit card numbers to hackers. As an industry, why are we struggling to secure this type of information when there’s so much good intelligence and so many great tools for getting the job done? It’s a complicated subject so perhaps it’s time to step back a bit and use a bit of common sense to analyze the problem.

No matter the industry, using the right tool for the job is rule number one. Line-of-business databases are all about organizing information and getting it into the hands of people who perform transactions and make decisions with it. As a result, these databases become naturally permissive, especially as they evolve to meet the demands of growing businesses. There are good access controls in modern databases, but when it comes to managing ultra-secure bits of data, traditional relational databases may not be the best fit for the job.

Lightweight Directory Access Protocol (LDAP) servers like ApacheDS, OpenLDAP and Microsoft Active Directory do a much better job of handling sensitive data with less trouble than any custom coding we might do on our own. Moreover, the built-in authentication functions of LDAP are mature and standards-based, making them safe and reusable from many different applications without custom interface development. It’s our duty as technologists and as business people to highlight the high cost of custom security solutions and the huge potential risks to our managers. In particular, when it comes to storing passwords in our line-of-business databases, just say no.

If we must manage financial instruments or personally identifying information in a database like SQL Server, there are three classes of problems to solve:

  1. Keeping the hackers from stealing our stuff,
  2. Detecting when breach attempts occur, and
  3. If data is unfortunately lost, making the information useless.

Let’s think about these efforts from a common sense perspective. Problem one is all about access control. The problem with permissions in any complex system is that they are difficult to maintain over time. Even if the initial configuration and policies safeguard the sensitive data, some future administrator may fail to understand or enforce the rules correctly. We could make those future administrators’ jobs much easier if we followed one simple rule: never mix highly-sensitive data in tables containing non-privileged data.

It’s deceptively simple-sounding but in practice, if sensitive data is always segregated into encrypted tables (http://blog.sqlauthority.com/2009/04/28/sql-server-introduction-to-sql-server-encryption-and-symmetric-key-encryption-tutorial-with-script/) and placed into a separate, secure schema requiring elevated access privileges, mistakes concerning permissions will become less likely over time. Moreover, by denying SELECT, INSERT, UPDATE and DELETE privileges on the secured tables, every query can be routed through stored procedures where problems two and three can be addressed with auditing and data obfuscation controls. Lastly, to ensure that lost data is useless, use the new Backup Encryption feature of SQL Server 2014 or invest in a third-party tool that does the same.
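As a minimal sketch of that segregation pattern (all object names below are hypothetical, and encryption key management is omitted for brevity), the shape in T-SQL might look like this:

-- A secure schema holding only sensitive data, with no direct access.
CREATE SCHEMA Secure;
GO
CREATE TABLE Secure.CustomerCard
(
    CustomerID INT NOT NULL,
    CardNumberEncrypted VARBINARY(256) NOT NULL  -- ciphertext only, never plain text
);
CREATE TABLE Secure.AccessAudit
(
    CustomerID INT NOT NULL,
    RequestedBy SYSNAME NOT NULL,
    RequestedAt DATETIME2 NOT NULL
);
GO
-- Problem 1: nobody reads or writes the tables directly.
DENY SELECT, INSERT, UPDATE, DELETE ON SCHEMA::Secure TO public;
GO
-- Problems 2 and 3: every read is audited and returns obfuscated data.
-- Ownership chaining lets the procedure read the table even though
-- direct access is denied.
CREATE PROCEDURE Secure.usp_GetCardLastFour
    @CustomerID INT
AS
BEGIN
    INSERT INTO Secure.AccessAudit (CustomerID, RequestedBy, RequestedAt)
    VALUES (@CustomerID, ORIGINAL_LOGIN(), SYSUTCDATETIME());

    -- In real code you would open the symmetric key and decrypt before
    -- masking; masking the ciphertext keeps this sketch self-contained.
    SELECT CustomerID,
           '************' + RIGHT(CONVERT(VARCHAR(64), CardNumberEncrypted, 2), 4)
               AS MaskedCard
    FROM Secure.CustomerCard
    WHERE CustomerID = @CustomerID;
END;
GO
-- Callers get EXECUTE on the procedure only, e.g.:
-- GRANT EXECUTE ON Secure.usp_GetCardLastFour TO AppUserRole;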

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Validation Rules: Code or Database? – Notes from the Field #054

[Note from Pinal]: This is the 54th episode of the Notes from the Field series. Whom do we blame for a mistake or error in the system? Well, developers blame the DBA and DBAs blame the developers. Sometimes there is no solution to this catch-22 situation. I have been there, and I am very sure that you have been there too. Well, this is an amazing and beautiful article by G. Andrew Duthie. He has attempted to demystify a problem which we all face every day.

In this episode of the Notes from the Field series database expert Andrew Duthie explains about Validation Rules and where they should be implemented. Read the experience of Andrew in his own words.


 

If you’re a DBA dealing with developers, you may run into the question of whether it’s better to allow the developers to write validation rules in their .NET app layer, or whether you should put your foot down and insist that the validation be implemented in stored procedures. The answer, as you might expect from a consultant, is “it depends.”

Advantages of Code-based Validation

One of the things that might inform your decision on what to use for validation is the skills possessed by the development team. If you’ve got a developer or developers who are well-versed in C# and LINQ, but don’t have a lot of experience writing stored procedures, you may want to cut them a break and let them use the tools they’re more familiar with.

Writing validation rules in code at the application layer allows developers to stay within the realm of .NET objects, which can result in faster development time.

Disadvantages of Code-based Validation

While there are probably more that could be discussed, I’ll mention just two of the significant disadvantages to writing validation rules in code.

First, if the code for the validation rules is using LINQ, particularly if the rules are complex, there’s the possibility of queries that generate sub-optimal SQL under the covers. This can be mitigated by profiling the queries to make sure that any performance hogs are caught as early as possible, but it’s certainly a valid concern.
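One low-friction way to catch those hogs, sketched here with a standard plan cache query, is to watch for the most expensive statements the application sends; LINQ-generated SQL shows up here like any other ad hoc statement:

-- Top statements by average CPU from the plan cache.
SELECT TOP (10)
       qs.total_worker_time / qs.execution_count AS AvgCpuMicroseconds,
       qs.execution_count,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
           ((CASE qs.statement_end_offset
                  WHEN -1 THEN DATALENGTH(st.text)
                  ELSE qs.statement_end_offset
             END - qs.statement_start_offset) / 2) + 1) AS StatementText
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY AvgCpuMicroseconds DESC;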

Second, from a maintainability standpoint, having rules in the app means that adding rules requires the app to be recompiled and redeployed. For some apps and environments, this may not be a big deal, but in others, it could definitely be a deal-breaker.

Advantages of Stored Procedure-based Validation

Using stored procedures for validation provides some key advantages. One is proximity to the data. Unlike code-based validation, which may require pumping significant amounts of data over the wire from the database to the app tier, stored procedure-based validation keeps the logic on the DB tier, so performance may be significantly better.

Another advantage is that with a good execution design (for example, a master stored procedure that executes a list of validation rules in a specified order based on a configuration table), it can be relatively easy to introduce new rules with less disruption than having to recompile and redeploy an entire application.
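As a hedged sketch of that execution design (the table and procedure names here are hypothetical), the configuration-driven dispatcher might look like this:

-- Hypothetical configuration table: each row names a rule procedure
-- and the order in which it should run.
CREATE TABLE dbo.ValidationRule
(
    RuleID         INT IDENTITY(1,1) PRIMARY KEY,
    RuleProcedure  SYSNAME NOT NULL,  -- a proc in dbo implementing one rule
    ExecutionOrder INT NOT NULL,
    IsEnabled      BIT NOT NULL DEFAULT 1
);
GO
-- Master procedure: runs every enabled rule in configured order.
-- Adding a rule becomes an INSERT, not a recompile-and-redeploy.
CREATE PROCEDURE dbo.usp_RunValidationRules
    @OrderID INT
AS
BEGIN
    DECLARE @proc SYSNAME, @sql NVARCHAR(400);

    DECLARE rule_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT RuleProcedure
        FROM dbo.ValidationRule
        WHERE IsEnabled = 1
        ORDER BY ExecutionOrder;

    OPEN rule_cursor;
    FETCH NEXT FROM rule_cursor INTO @proc;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Each rule procedure is assumed to accept the same parameter
        -- and to raise an error when its validation fails.
        SET @sql = N'EXEC dbo.' + QUOTENAME(@proc) + N' @OrderID = @OrderID;';
        EXEC sp_executesql @sql, N'@OrderID INT', @OrderID = @OrderID;
        FETCH NEXT FROM rule_cursor INTO @proc;
    END;

    CLOSE rule_cursor;
    DEALLOCATE rule_cursor;
END;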

Disadvantages of Stored Procedure-based Validation

The major disadvantage of using stored procedures for validation, speaking as an app developer, is the basic impedance mismatch between .NET code (C# or Visual Basic) and T-SQL. While it’s certainly possible for developers to master both, there’s a mental cost in switching between these environments, and a potential for mistakes when transitioning from one to the other.

The other downside of stored procedures is the mixing of application logic between the app tier and the database tier. While validation close to the data can, as noted, improve performance, if some parts of the application logic live in both the app and database tiers, this could make for more costly maintenance down the road.

Consistency is Key

One additional point I’d like to make is that it’s probably wise to choose one option or the other, not both. If you have multiple applications in development (or even in maintenance mode), having a mix of app-based and sproc-based validation will likely give you headaches at some point. So get your team together and have a discussion about how you’re currently handling things, and whether there might be a better way.

Summary

The short answer to “which is better” is really “either.” It all depends on the skills of your developers, the performance you need from your app, and the other factors I’ve discussed. Although I’m coming at this from the perspective of an app developer, I’ve recently become more comfortable with the idea of stored procedure-based validation, particularly in instances where more than one app may be targeting the same database, since this can help reduce redundancy, and centralize management of rules.

I’d love to get your feedback on how you’ve handled validation rules in your environment, so feel free to share a comment below.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Using Bitwise And (&) Instead of a Junction Table – Notes from the Field #053

[Note from Pinal]: This is the 53rd episode of the Notes from the Field series. Every day I get hundreds of emails, and most of them have a similar request. Everyone wants maximum performance, but they want to make the least amount of changes in their code. Though the two requests are contradictory, it is possible in most cases if you know the technology inside out, like Linchpin People do. Here in this blog post, my close friend Stuart Ainsworth explains a cool trick which I just learned today after so many years of experience. Wow, Stuart – thanks for this amazing note from the field – I learned something new, and there will be so many who will enjoy this post.

In this episode of the Notes from the Field series database expert Stuart Ainsworth explains Using Bitwise And (&) Instead of a Junction Table.


 

Bitwise operations in SQL Server are not often used, but like many of the tools available to SQL Server developers, bitwise operations can provide some interesting alternatives when you’re faced with specific challenges. One of my clients has a large database that relies heavily on many-to-many relationships to identify matching interests on multiple characteristics. As a simple example, let’s assume that I’m talking about fruit baskets.

In the simplest version of a fruit basket, you’d have two database objects: the basket, and the assortment of fruit.  Baskets can use different combinations of fruit, and samples of fruit may appear in more than one basket, like so:

Basket 1: Apples
Basket 2: Apples, Bananas
Basket 3: Grapes, Apples
Basket 4: Strawberries, Bananas

The traditional method of modeling this relationship would be to use a junction table, as illustrated below.

 

However, my client  had 500,000 baskets, and roughly 50 different fruits to choose from. Assuming that every basket had at least 10 different fruits, the junction table would have at least 5,000,000 rows of data. Even though the junction table was well indexed and strongly typed, my client’s design was suffering from slow read times.  The client needed an alternative. Enter the bitwise AND (&).

Setting Up a Demo

Let’s set up a demo that illustrates both the junction table method and the bitwise AND alternative.  First, you’ll create the following three tables and populate them (using table valued constructors):

  1. Baskets, which includes a column for use with the Bitwise AND
  2. Fruit, whose FruitID values are set up for use with the Bitwise AND
  3. FruitBaskets, which is a junction table

Note that primary and foreign key references are not included for the simplicity of the demo. You’ll also be adding an extra column to the Baskets table to use for the Bitwise join. Finally, note that the ID column of the Fruit table mirrors the decimal values of the binary bit positions (e.g., 1, 2, 4, 8, 16, 32, 64, 128).

CREATE TABLE Baskets
(
    BasketID INT
    , BasketName VARCHAR(100)
    , FruitBitHash BIGINT
)
CREATE TABLE Fruit
(
    FruitID BIGINT
    , FruitName VARCHAR(20)
)
CREATE TABLE FruitBaskets
(
    BasketID INT
    , FruitID BIGINT
)
GO
INSERT INTO Fruit
    (FruitID, FruitName)
VALUES
    (1, 'Apples'),
    (2, 'Bananas'),
    (4, 'Grapes'),
    (8, 'Strawberries')
GO
INSERT INTO dbo.Baskets
    (BasketID, BasketName, FruitBitHash)
VALUES
    (1, 'Apples', 1),
    (2, 'Apples, Bananas', 1 + 2),
    (3, 'Grapes, Apples', 1 + 4),
    (4, 'Strawberries, Bananas', 8 + 2)
GO
INSERT INTO dbo.FruitBaskets
    (BasketID, FruitID)
VALUES
    (1, 1),
    (2, 1),
    (2, 2),
    (3, 1),
    (3, 4),
    (4, 8),
    (4, 2)
GO

Now that you’ve got your tables set up, let’s run a couple of queries. First, you’ll use the junction table (the traditional, normalized model), and then you’ll use the Bitwise AND (&). In both cases, you’re looking for baskets that contain apples:

/* Select the fruit baskets containing Apples using the junction table */
SELECT BasketID, BasketName
FROM dbo.Baskets b
WHERE EXISTS (SELECT *
              FROM dbo.FruitBaskets fb
              JOIN dbo.Fruit f ON fb.FruitID = f.FruitID
              WHERE b.BasketID = fb.BasketID
                AND f.FruitName = 'Apples')
GO
/* Select the fruit baskets containing Apples using the bithash */
SELECT BasketID, BasketName
FROM dbo.Baskets b
WHERE EXISTS (SELECT *
              FROM dbo.Fruit f
              WHERE b.FruitBitHash & f.FruitID <> 0
                AND f.FruitName = 'Apples')
GO

If you run this demo, you’ll see that you get the exact same results from the two queries. However, the first query would need to read data from 3 tables, and the second query only needs 2. If the junction table is very large, the traditional method can be significantly slower than the second method.

But how does it work? An excellent explanation can be found here, but the short answer is that when you use the Bitwise AND (&) to compare two integers, any nonzero result means that the two integers share at least one common bit. The magic happens with this line of code:

WHERE b.FruitBitHash & f.FruitID <> 0
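To see the arithmetic concretely, take basket 2, whose hash is 3 (Apples = 1 plus Bananas = 2):

-- Basket 2's hash is 3 (binary 011). ANDing it with each fruit ID
-- shows exactly which bits (fruits) the basket contains.
SELECT 3 & 1 AS ApplesBit,   -- 1: shared bit, the basket has apples
       3 & 2 AS BananasBit,  -- 2: shared bit, the basket has bananas
       3 & 4 AS GrapesBit;   -- 0: no shared bit, no grapes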

So, why don’t we do this all the time?

There’s an old expression, “If all you have is a hammer, then everything looks like a nail.” Different tools are best suited for different problems. The limitations of using the Bitwise method to remove a junction table include:

  1. Violation of relational integrity: The surrogate IDs in the lookup table (e.g., the Fruit table) have to have a specific order and meaning. If you make a mistake when setting up the key values, you can get wrong answers.
  2. A limited number of bitwise values can be stored in a bigint: In SQL Server, a bigint is 8 bytes, which means that there are 64 bits. When using a single bithash column, you can only have one value per bit. (Note that you can work around this by using multiple columns, but that gets complicated.)

The benefit of the Bitwise AND method is reduced disk I/O, because it eliminates a large junction table. In this case, we did notice increased CPU usage with the Bitwise method, but the increase in performance was significant. However, on faster hardware, a junction table would probably have worked as well while still maintaining relational integrity. For now, Bitwise AND is a useful tool for a very specific type of problem.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)