SQL SERVER – What is Data Mining – A Simple Introductory Note

According to MacLennan et al. (2009), data mining is “the process of analyzing data to find hidden patterns using automatic methodologies.” Consider a simple example. By analyzing data on the items purchased from a supermarket, or from a chain of such stores, one can find out which products sell the most. The supply of those products can then be increased, and the supply of slow-selling products reduced. Data mining, in short, is an analytical activity that looks for hidden patterns in a large volume of data after the data has been appropriately classified and sorted.
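
The supermarket example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `sales` records, the store and product names, and the `units_by_product` helper are all invented here, and a plain tally like this is just the simplest possible "pattern"; real data mining tools look for far less obvious ones.

```python
from collections import Counter

# Hypothetical sales records: (store, product, units sold).
# All values are invented for illustration.
sales = [
    ("store_a", "milk", 30),
    ("store_a", "bread", 12),
    ("store_b", "milk", 25),
    ("store_b", "eggs", 18),
    ("store_a", "eggs", 5),
]


def units_by_product(records):
    """Total the units sold per product across all stores."""
    totals = Counter()
    for _store, product, units in records:
        totals[product] += units
    return totals


totals = units_by_product(sales)

# Products ordered from most sold to least sold.
ranking = [product for product, _units in totals.most_common()]
print(ranking)
```

Products near the top of `ranking` are candidates for increased supply, and those near the bottom for reduced supply.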

Who Is Involved in Data Mining?

Data mining is an activity, which can be programmed, that involves analyzing data and revealing the hidden patterns in it. Architects, developers, and analysts are all involved in the data mining process.

Data mining is usually carried out by an analyst. However, the analyst will not always be able to identify every hidden pattern in a data set, whatever its size. The patterns that are identified are then converted into information that is useful to the business. A developer integrates data mining into application solutions, and an architect understands the needs of the developer and the analyst and meets them accordingly.

Microsoft and Data Mining

Microsoft provides a wide range of data mining options, including collaborative solutions and ad hoc analysis (in MS Office Excel). A free plug-in is available for MS Office Excel 2007 that helps the analyst analyze data patterns. In addition to this plug-in, the Business Intelligence Development Studio (BIDS), which is included free with SQL Server, can also be used for data mining.

It should be noted that data mining does not rely on previously known data patterns or on any other additional information. Its results are generated from the data presented and not from any other source. Microsoft data mining applies mathematical techniques to the available data set to obtain models. In addition to BIDS, Microsoft also provides the .NET Framework and the Data Mining Extensions (DMX) language for custom solutions. Data mining is sometimes also referred to as machine learning.

Results of Data Mining – Data Mining Models

Microsoft data mining produces data mining models, which are statistical information that is either predictive or descriptive. A Microsoft mining model consists of three components: metadata, which is information about the data; patterns, which are mathematical formulas or rules; and bindings, which define where the data comes from. The statistical results may not be understandable from a business perspective. Hence, these results or models must be translated into useful business information. Whoever carries out the data mining is responsible for linking the resulting data model to the business problem it addresses.
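
To make the three components more concrete, here is a minimal Python sketch. It is not how Microsoft stores a mining model internally; the `MiningModelSketch` class, its field layout, the `customers` table name, and the rules themselves are assumptions made purely for illustration. The rules are also written by hand here, whereas a real model derives its patterns from the data.

```python
from dataclasses import dataclass


@dataclass
class MiningModelSketch:
    """Hypothetical stand-in for a mining model; not Microsoft's actual format."""
    metadata: dict   # information about the data, e.g. column names and types
    patterns: list   # (condition, outcome) rules; hand-written in this sketch
    bindings: dict   # where the data comes from

    def predict(self, row):
        """Return the outcome of the first rule whose condition matches the row."""
        for condition, outcome in self.patterns:
            if condition(row):
                return outcome
        return None


model = MiningModelSketch(
    metadata={"columns": {"age": "int", "visits_per_month": "int"}},
    patterns=[
        (lambda r: r["visits_per_month"] >= 4, "likely buyer"),
        (lambda r: True, "unlikely buyer"),  # fallback rule
    ],
    bindings={"source_table": "customers"},  # hypothetical table name
)

print(model.predict({"age": 30, "visits_per_month": 6}))
```

Turning an outcome such as "likely buyer" into a business decision, for example whom to target with a promotion, is the translation step described above.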

Role of Data Miner (or Analyst)

A data miner should be adequately trained in all the tools and technologies used in mining and should not be limited to the tools required by one particular organization or business. In fact, it is the responsibility of the organization to give the data mining professional this broader training. Data mining is never complete without the analyst. How well the results of data mining can be applied to a specific business depends significantly on how well the analyst understands the industry-specific objectives.

Applications of Data Mining

Data mining is used in various applications such as forecasting business and customer trends, detecting fraud (especially in the banking sector), generating customized advertisements, grouping customers on the basis of their purchasing trends, and risk analysis.
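
As one illustration of these applications, grouping customers by their purchasing trends can be sketched as follows. The `purchases` data, the customer names, and the `segment_by_top_category` helper are hypothetical, and assigning each customer to their most frequently purchased category is a deliberately simple stand-in for a real clustering algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical purchase log: (customer, product category). Invented data.
purchases = [
    ("alice", "groceries"),
    ("alice", "groceries"),
    ("alice", "electronics"),
    ("bob", "electronics"),
    ("bob", "electronics"),
    ("carol", "groceries"),
]


def segment_by_top_category(records):
    """Group customers by the category they purchase most often."""
    per_customer = defaultdict(Counter)
    for customer, category in records:
        per_customer[customer][category] += 1

    segments = defaultdict(list)
    for customer, counts in per_customer.items():
        top_category, _count = counts.most_common(1)[0]
        segments[top_category].append(customer)

    # Sort the names so the output is stable and easy to read.
    return {category: sorted(names) for category, names in segments.items()}


segments = segment_by_top_category(purchases)
print(segments)
```

Each resulting segment could then receive advertising tailored to its dominant category.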

Benefits of Microsoft Data Mining

Microsoft data mining is extensible. It is licensed through SQL Server 2008 (or SQL Server 2005), and it is compatible with other technologies, which allows access to data in different formats. Microsoft data mining can also be used in business intelligence solutions, and it is scalable, unlike some other data mining products.

Reference: Pinal Dave (https://blog.sqlauthority.com)

6 Comments

  • soumyaranjan mohanty
    September 6, 2009 2:56 pm

    The explanation of data mining is excellent and amazing.

  • Obviously the best MSSQL author available today!

  • Hi,
    I am glad that you are interested in data mining!
    While doing my research I found some great books about it.
    These books intend to bring together the most recent advances and applications of data mining research in the promising areas of medicine and biology, in real-life applications, web applications, etc.
    Readers will benefit from these books and can consider them an excellent way to keep pace with the vast and diverse advances of new research efforts. This is the link where you can find them:

    They are free to download!!

  • Hi Pinal,

    I would like a suggestion from you.
    I am currently working as a software developer and have received an offer for a SQL data mining role. Can you please advise which role is better for career growth?

    Thanks & Regards
    Vinay Kumar

  • Hi Pinal. Great article indeed.
    I would like to build a linear regression model based on printing machine toner consumption. Do you recommend other algorithms? Do you have an example of pattern recognition and of analyzing the continuous consumption of a single item to forecast future demand?

    Thanks a lot
    omar Tharwat

  • I am a novice at data mining. The problem I need to solve is to predict who will buy something in the next month and which brands they will buy. The data is from an online shopping website, similar to eBay. Some of the data is given below (just a sample):

    user_id   brand_id  type  visit_datetime
    10944750  13451     0     2013/4/15
    10944750  21110     0     2013/4/17
    23221235  21134     2     2013/4/12
    ………
    ………
    In fact, there are a lot of users and brands. There are about 20 thousand records, with dates from 2013/4/15 to 2013/8/15.

    The “type” column has four possible actions: 0 is click, 1 is purchase, 2 is add to favorites, 3 is add to shopping cart.

    So, we want to predict type=1 (purchases) for September 2013. The output looks like:

    user1 -> brand1,brand2
    user2 -> brand2.
    I can only use SQL Server and Excel to do some basic analysis. I just want to know the right way to analyze this data and which algorithm or model to use. Thank you very much!
    You can also see this question on Stack Overflow: http://stackoverflow.com/questions/22635282/the-data-ming-on-users-online-shopping-records
