SQL SERVER – Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 – Introduction to DQS

September 4, 2012

Data Quality Services is a very important concept of SQL Server. I have recently started to explore the same and I am really learning some good concepts. Here are two very important blog posts which one should go over before continuing this blog post about Data Quality Services.

This article is introduction to Data Quality Services for beginners. We will be using an Excel file

Click on the image to enlarge the it.

In the first article we learned to install DQS. In this article we will see how we can learn about building Knowledge Base and using it to help us identify the quality of the data as well help correct the bad quality of the data.

Here are the two very important steps we will be learning in this tutorial.

Building a New Knowledge Base
Creating a New Data Quality Project

Let us start the building the Knowledge Base. Click on New Knowledge Base.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs1

In our project we will be using the Excel as a knowledge base. Here is the Excel which we will be using. There are two columns. One is Colors and another is Shade. They are independent columns and not related to each other. The point which I am trying to show is that in Column A there are unique data and in Column B there are duplicate records.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs3
Clicking on New Knowledge Base will bring up the following screen. Enter the name of the new knowledge base.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs4

Clicking NEXT will bring up following screen where it will allow to select the EXCE file and it will also let users select the source column. I have selected Colors and Shade both as a source column.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs5

Creating a domain is very important. Here you can create a unique domain or domain which is compositely build from Colors and Shade.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs6

As this is the first example, I will create unique domain – for Colors I will create domain Colors and for Shade I will create domain Shade.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs7

Here is the screen which will demonstrate how the screen will look after creating domains.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs8

Clicking NEXT it will bring you to following screen where you can do the data discovery. Clicking on the START will start the processing of the source data provided.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs9

Pre-processed data will show various information related to the source data. In our case it shows that Colors column have unique data whereas Shade have non-unique data and unique data rows are only two.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs10

In the next screen you can actually add more rows as well see the frequency of the data as the values are listed unique.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs11

Clicking next will publish the knowledge base which is just created.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs12

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs13

Now the knowledge base is created. We will try to take any random data and attempt to do DQS implementation over it. I am using another excel sheet here for simplicity purpose. In reality you can easily use SQL Server table for the same.
SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs131

Click on New Data Quality Project to see start DQS Project.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs14

In the next screen it will ask which knowledge base to use. We will be using our Colors knowledge base which we have recently created.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs15

In the Colors knowledge base we had two columns – 1) Colors and 2) Shade. In our case we will be using both of the mappings here. User can select one or multiple column mapping over here.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs16

Now the most important phase of the complete project. Click on Start and it will make the cleaning process and shows various results.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs17

In our case there were two columns to be processed and it completed the task with necessary information. It demonstrated that in Colors columns it has not corrected any value by itself but in Shade value there is a suggestion it has. We can train the DQS to correct values but let us keep that subject for future blog posts.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs19

Now click next and keep the domain Colors selected left side. It will demonstrate that there are two incorrect columns which it needs to be corrected. Here is the place where once corrected value will be auto-corrected in future.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs20

I manually corrected the value here and clicked on Approve radio buttons. As soon as I click on Approve buttons the rows will be disappeared from this tab and will move to Corrected Tab. If I had rejected tab it would have moved the rows to Invalid tab as well.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs21

In this screen you can see how the corrected 2 rows are demonstrated. You can click on Correct tab and see previously validated 6 rows which passed the DQS process.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs22

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs23

Now let us click on the Shade domain on the left side of the screen. This domain shows very interesting details as there DQS system guessed the correct answer as Dark with the confidence level of 77%. It is quite a high confidence level and manual observation also demonstrate that Dark is the correct answer. I clicked on Approve and the row moved to corrected tab.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs24

On the next screen DQS shows the summary of all the activities. It also demonstrates how the correction of the quality of the data was performed. The user can explore their data to a SQL Server Table, CSV file or Excel.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs25

The user also has an option to either explore data and all the associated cleansing info or data only. I will select Data only for demonstration purpose.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs26

Clicking explore will generate the files.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs27

Let us open the generated file. It will look as following and it looks pretty complete and corrected.

SQL SERVER - Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 - Introduction to DQS dqs28

Well, we have successfully completed DQS Process. The process is indeed very easy. I suggest you try this out yourself and you will find it very easy to learn. In future we will go over advanced concepts.

Are you using this feature on your production server? If yes, would you please leave a comment with your environment and business need. It will be indeed interesting to see where it is implemented.

Reference: Pinal Dave (https://blog.sqlauthority.com)

Business Intelligence, Data Quality Services, Data Warehousing, DQS, SQL Server

SQL SERVER – DQS Error – Cannot connect to server – A .NET Framework error occurred during execution

SQLAuthority News – A Quick Note on @Pluralsight Video – Call Me Maybe Developer Way

26 Comments. Leave new

Manish
September 4, 2012 8:44 am
Thanks a lot for this valuable information. Here is also some more information about DQS in SQL 2012.
Thanks
Manish
Reply
Hardy21
September 4, 2012 9:38 am
Thanks for sharing. Really useful information.
Reply
Saqib
September 4, 2012 11:17 am
Nice information,
But what are the practical use of it in production servers.
Reply
Adi
September 5, 2012 4:29 pm
Joining the question above – I’d love to hear some ‘real-world’ examples of DQS usage.
Also, is it possible to use as an automatic process, or does it always have to be ad-hoc?
Reply
mrasadmahmood
October 16, 2012 4:49 pm
A very useful introduction to DQS. Do you find that the lines between MDS and DQS have blurred and have you written on this subject at all?
Reply
Philips
January 24, 2013 8:58 pm
Very nicely introduced DQS, we are planning to implement this in production, we are a health care company and with recent health law changes we need to have an integrated view of our patients across multiple patient applications service different lines of business. In this scenario the biggest challenge is how do you match your patients across different applications and come up with one version of the patient when no two systems have the same unique identifier. To solve this problem we are planning on using DQS and train it with the rules at the end of which we are hoping to uniquely identify the patient. Do not know if there are any other better options out there if anyone knows please do comment. Option of trickling the cleansed data back to the source is not acceptable and cannot be enforced, not all source systems have SSN or Medicaid or Medicare numbers….
Reply
- Dre
  June 17, 2013 11:39 pm
  Philips,
  I understand your need for uniquely identifying individuals, but don’t quite see how this solution helps you toward your goal. While I understand how it may standardizing things like city and state names, I’m not sure how it would help to define uniqueness between patients.
  Reply
Eric
January 24, 2013 11:22 pm
Thanks for the sharing this nice introduction to DQS.
Even when this kind of projects can reach an automatic execution, they will always need the human factor to decide which values are real and which are not.
That is why it is nice to see a tool that can really be used by power users, who will be doing this kind of stuff.
Reply
xiaohui
August 14, 2013 1:00 pm
Thanks for sharing DQS knowledge.
Reply
yousef
October 25, 2013 12:32 pm
Good to know about the new features ,but it seems a bit impractical since in database we have a list of valid entries for a column and users have to opt a valid value for a column from a combo box which is actually connected to a look up table ,so this kind of error will never happen.I would like to hear about useful use of this feature.
Reply
David
November 6, 2013 6:58 pm
Nice introduction post indeed. However my question will be on the difference between DQS and MDS (Master Data Services). My other question is on the issue whereby you do not actually need to correct the source data but all you will need to do will be just map the incorrectly spelt data to the correct ones. For example if you have Jo, Joe, Johhny in the source and require all these three to just be mapped to the correct Master Data value John. Will it be possible to represent such a scenario using DQS or MDS?
Reply
neerajarao
November 17, 2013 6:55 am
Thank You Pinal. As always you bring new features to light with such simplicity and ease. Makes it more interesting to learn.
Reply
Drew
January 28, 2014 9:03 pm
Thanks, Pinal,
Clear. Simple. Appreciated.
Reply
Gaurav Pareek
May 26, 2014 6:45 am
Have you used it with hindi language(nvarchar) with really large table of say 20+ lakh record for matching?? I am finding it difficult to prepare a correct knowledge-base. Any help would be appreciated
Reply
Adit | Tyo
June 13, 2014 9:23 am
Thanks for sharing. Straight forward & clear. So appreciated.
Reply
Sudhir Pawale
February 4, 2015 12:24 pm
Thanks Pinal for sharing info with us
Its realy nice feature for data cleansing.
Can we automate this process like auto creating knowledge base ?
Reply
Discovery2012
September 9, 2015 8:17 pm
Hi Pinal,
Your blog is very description and on the topic. I am a big fan of your blog. Thanks very much.
Reply
- Pinal Dave
  September 10, 2015 8:27 pm
  I am glad you liked it.
  Reply
Syed Asad Ahmed
April 8, 2016 8:06 pm
Hi Pinal,
I have been using your blog from very long time, and I must say your articles are very helpful.
I like you this artilce as well :) I have a question for you.
Can I use this Data Quality Service for between two tables.
I mean I want to map similar columns between two tables in to one table.
Thanks.
Reply
Viktor
September 27, 2016 2:40 am
Pinal,
Thanks you for the post. It would also be good to have answers to interesting questions that have already been addressed above. Below I pointed out to the name, date and time the question was posted in this blog:
– yousef: October 25, 2013 12:32 pm
– David: November 6, 2013 6:58 pm.
– Gaurav Pareek: May 26, 2014 6:45 am
– Sudhir Pawale: February 4, 2015 12:24 pm
– Syed Asad Ahmed: April 8, 2016 8:06 pm
Reply
hejnav
January 16, 2017 1:40 am
Good thing, worked like a charm (just like I did it two years ago … :)) … But what I missed at that time as well: after doing the actual corrections, I would have expected, that this knowledge is saved and will not be asked again, if I re-run the process. Am I wrong?
Reply
- Pinal Dave
  January 16, 2017 12:28 pm
  I think it should.
  Reply
Muhammad Noman
March 15, 2017 4:55 pm
can you plz tell about some thing Master data services in sql server 2016 ?
Reply