Feeds:
Posts
Comments

Archive for the ‘Data Warehousing’ Category

I was recently working on learning various new stuff. I just would like to share very interesting resources here today. Microsoft SQL Server Migration Assistant (SSMA) is a toolkit that dramatically cuts the effort, cost, and risk of migrating from any other data platform to SQL Server 2005, SQL Server 2008, SQL Server 2008 R2 and SQL Azure.

Here are few important resources links:

Microsoft SQL Server Migration Assistant (SSMA) Team’s Blog

One very front page of the blog, I noticed very interesting diagram – where it displays four database products. One can click on any of them to go to detail description of the same. Here is interesting Video regarding this tool.

If you have any particular question or need help regarding migration – you can always reach out to me and I will do my best to help you.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

Many of us are tend to overlook simple things even if we are capable of doing complex work. In SQL Server 2008, inline variable assignment is available. This feature exists from last 3 years, but I hardly see its utilization. One of the common arguments was that as the project migrated from the earlier version, the feature disappears. I totally accept this argument and acknowledge it. However, my point is that this new feature should be used in all the new coding – what is your opinion?

The code which we used in SQL Server 2005 and the earlier version is as follows:

DECLARE @iVariable INT, @vVariable VARCHAR(100), @dDateTime DATETIME
SET
@iVariable = 1
SET @vVariable = 'myvar'
SET @dDateTime = GETDATE()
SELECT @iVariable iVar, @vVariable vVar, @dDateTime dDT
GO

The same should be re-written as following:

DECLARE @iVariable INT = 1, @vVariable VARCHAR(100) = 'myvar', @dDateTime DATETIME = GETDATE()
SELECT @iVariable iVar, @vVariable vVar, @dDateTime dDT
GO

I have started to use this new method to assign variables as I personally find it very easy to read as well write.

Do you still use the earlier method to declare and assign variables? If yes, is there any particular reason or just an old routine? I am interested to hear about this.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

About these ads

Read Full Post »

Niraj Bhatt works as an Enterprise Architect for a Fortune 500 company and has an innate passion for building / studying software systems. He is a top rated speaker at various technical forums including Tech·Ed, MCT Summit, Developer Summit, and Virtual Tech Days, among others. Having run a successful startup for four years Niraj enjoys working on – IT innovations that can impact an enterprise bottom line, streamlining IT budgets through IT consolidation, architecture and integration of systems, performance tuning, and review of enterprise applications. He has received Microsoft MVP award for ASP.NET, Connected Systems and most recently on Windows Azure. When he is away from his laptop, you will find him taking deep dives in automobiles, pottery, rafting, photography, cooking and financial statements though not necessarily in that order. He is also a manager/speaker at BDOTNET, Asia’s largest .NET user group.

Here is the guest post by Niraj Bhatt.

As data in your applications grows it’s the database that usually becomes a bottleneck. It’s hard to scale a relational DB and the preferred approach for large scale applications is to create separate databases for writes and reads. These databases are referred as transactional database and reporting database. Though there are tools / techniques which can allow you to create snapshot of your transactional database for reporting purpose, sometimes they don’t quite fit the reporting requirements of an enterprise. These requirements typically are data analytics, effective schema (for an Information worker to self-service herself), historical data, better performance (flat data, no joins) etc. This is where a need for data warehouse or an OLAP system arises.

A Key point to remember is a data warehouse is mostly a relational database. It’s built on top of same concepts like Tables, Rows, Columns, Primary keys, Foreign Keys, etc. Before we talk about how data warehouses are typically structured let’s understand key components that can create a data flow between OLTP systems and OLAP systems. There are 3 major areas to it:

a) OLTP system should be capable of tracking its changes as all these changes should go back to data warehouse for historical recording. For e.g. if an OLTP transaction moves a customer from silver to gold category, OLTP system needs to ensure that this change is tracked and send to data warehouse for reporting purpose. A report in context could be how many customers divided by geographies moved from sliver to gold category. In data warehouse terminology this process is called Change Data Capture. There are quite a few systems that leverage database triggers to move these changes to corresponding tracking tables. There are also out of box features provided by some databases e.g. SQL Server 2008 offers Change Data Capture and Change Tracking for addressing such requirements.

b) After we make the OLTP system capable of tracking its changes we need to provision a batch process that can run periodically and takes these changes from OLTP system and dump them into data warehouse. There are many tools out there that can help you fill this gap – SQL Server Integration Services happens to be one of them.

c) So we have an OLTP system that knows how to track its changes, we have jobs that run periodically to move these changes to warehouse. The question though remains is how warehouse will record these changes? This structural change in data warehouse arena is often covered under something called Slowly Changing Dimension (SCD). While we will talk about dimensions in a while, SCD can be applied to pure relational tables too. SCD enables a database structure to capture historical data. This would create multiple records for a given entity in relational database and data warehouses prefer having their own primary key, often known as surrogate key.

As I mentioned a data warehouse is just a relational database but industry often attributes a specific schema style to data warehouses. These styles are Star Schema or Snowflake Schema. The motivation behind these styles is to create a flat database structure (as opposed to normalized one), which is easy to understand / use, easy to query and easy to slice / dice. Star schema is a database structure made up of dimensions and facts. Facts are generally the numbers (sales, quantity, etc.) that you want to slice and dice. Fact tables have these numbers and have references (foreign keys) to set of tables that provide context around those facts. E.g. if you have recorded 10,000 USD as sales that number would go in a sales fact table and could have foreign keys attached to it that refers to the sales agent responsible for sale and to time table which contains the dates between which that sale was made. These agent and time tables are called dimensions which provide context to the numbers stored in fact tables. This schema structure of fact being at center surrounded by dimensions is called Star schema. A similar structure with difference of dimension tables being normalized is called a Snowflake schema.

This relational structure of facts and dimensions serves as an input for another analysis structure called Cube. Though physically Cube is a special structure supported by commercial databases like SQL Server Analysis Services, logically it’s a multidimensional structure where dimensions define the sides of cube and facts define the content. Facts are often called as Measures inside a cube. Dimensions often tend to form a hierarchy. E.g. Product may be broken into categories and categories in turn to individual items. Category and Items are often referred as Levels and their constituents as Members with their overall structure called as Hierarchy. Measures are rolled up as per dimensional hierarchy. These rolled up measures are called Aggregates. Now this may seem like an overwhelming vocabulary to deal with but don’t worry it will sink in as you start working with Cubes and others.

Let’s see few other terms that we would run into while talking about data warehouses.

ODS or an Operational Data Store is a frequently misused term. There would be few users in your organization that want to report on most current data and can’t afford to miss a single transaction for their report. Then there is another set of users that typically don’t care how current the data is. Mostly senior level executives who are interesting in trending, mining, forecasting, strategizing, etc. don’t care for that one specific transaction. This is where an ODS can come in handy. ODS can use the same star schema and the OLAP cubes we saw earlier. The only difference is that the data inside an ODS would be short lived, i.e. for few months and ODS would sync with OLTP system every few minutes. Data warehouse can periodically sync with ODS either daily or weekly depending on business drivers.

Data marts are another frequently talked about topic in data warehousing. They are subject-specific data warehouse. Data warehouses that try to span over an enterprise are normally too big to scope, build, manage, track, etc. Hence they are often scaled down to something called Data mart that supports a specific segment of business like sales, marketing, or support. Data marts too, are often designed using star schema model discussed earlier. Industry is divided when it comes to use of data marts. Some experts prefer having data marts along with a central data warehouse. Data warehouse here acts as information staging and distribution hub with spokes being data marts connected via data feeds serving summarized data. Others eliminate the need for a centralized data warehouse citing that most users want to report on detailed data.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

Read Full Post »

http://msdn.microsoft.com/en-us/library/gg605238.aspx

I am very excited that Fast Track Data Warehouse 3.0 reference guide has been announced. As a consultant I have always enjoyed working with Fast Track Data Warehouse project as it truly expresses the potential of the SQL Server Engine. Here is few details of the enhancement of the Fast Track Data Warehouse 3.0 reference architecture.

The SQL Server Fast Track Data Warehouse initiative provides a basic methodology and concrete examples for the deployment of balanced hardware and database configuration for a data warehousing workload. Balance is measured across the key components of a SQL Server installation; storage, server, application settings, and configuration settings for each component are evaluated.

Description Note
FTDW 3.0 Architecture Basic component architecture for FT 3.0 based systems.
New Memory Guidelines Minimum and maximum tested memory configurations by server socket count.
Additional Startup Options Notes for T-834 and setting for Lock Pages in Memory.
Storage Configuration RAID1+0 now standard (RAID1 was used in FT 2.0).
Evaluating Fragmentation Query provided for evaluating logical fragmentation.
Loading Data Additional options for CI table loads.
MCR Additional detail and explanation of FTDW MCR Rating.

Read white paper on fast track data warehousing.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

 

Read Full Post »

An integral part of any BI system is the data warehouse—a central repository of data that is regularly refreshed from the source systems. The new data is transferred at regular intervals  by extract, transform, and load (ETL) processes.

This whitepaper talks about what are best practices for Data Warehousing. This whitepaper discusses ETL, Analysis, Reporting as well relational database. The main focus of this whitepaper is on mainly ‘architecture’ and ‘performance’.

Download Best Practices for Data Warehousing with SQL Server 2008 R2

Reference: Pinal Dave (http://blog.SQLAuthority.com)

Read Full Post »

The traditional form of storing digital data has been disk storage.  However, the huge advances in technology means that there has been a huge need for data storage to evolve to keep up with the fast-changing times.  Microsoft SQL Server has gone through a huge overhaul in order to keep up with the amount of data storage that is necessary, and that is where data warehousing comes into play.

For many online applications, there is a need to not only access small amount of information from disk storage, but large amounts in the forms of sets.  SQL Server allows access to these sets of data in a sequential manner that optimizes computer.  This proprietary technology makes SQL Server the go-to system to fit these computing needs.

SQL Server has also solved the problem of changing from “data marts” to data warehouses.  As companies grow, their digital storage needs change and expand as well.  Previously, transferring a system over to an SQL Server meant having to destroy old data during the transfer.  This was the source of a lot of data loss and server problems.  However, the new versions of SQL Server have built-in systems that help organizations transfer to the larger data storage system.

The costs of data storage and data warehousing are becoming a huge concerns for small businesses and corporations alike.  It is a known rule of thumb that the larger the main server, the more expensive it is.  SQL Server also addresses this problem by allowing data warehousing to be run on multiple processors – in the end rendering it much cheaper than having one main data warehousing server.  SQL Servers uses the “hub-and-spoke” system, which allows data warehousing to be broken up by multiple systems without losing any data.

A huge question is how SQL Server can store so much data.  As with any storage system, some data is compressed before it is stored.  Other system will undergo lengthy processes in order to retrieve this compressed data due to updates and possible loss of information, but SQL Server forgoes this updating process, which allows for quick and easy access as well as compression.

The issue of compression of too much data is also solved with parallel processing.  In this system, multiple processors are “enlisted” in order to help store the data.  SQL Server has the technology to divide the data appropriately, and then easily access this data on command.  Rather than slowing one server down, the data can be divided up to be easily handled.

One of the beauties of computing is that the computer can calculate probabilities much faster than a human – indeed, this was one of the main functions of computer development.  SQL Server uses this capacity to store data, using “Probability of Access” technology.  SQL Server can determine which data is “hot” and likely to be accessed the most, which is moderate, and which is cold – that is, there is a low probability of access.  With these designations, SQL Server can store the Hot data in “high performance” storage, so it can be quickly recovered, and cold data is stored elsewhere, freeing up the processors to work more efficiently.

SQL Server has not only optimized storage of “static” data, or individual points of information, but has created a technology that will also store “streaming” data.  Streaming data is a way to record multiple “events” over a period of time – like measurements of temperature thousands of times per minute for instance.  This kind of data storage would completely bog down another server, but SQL Server has developed the technology to efficiently store this data without slowing down processors.  And, of course, the data will be easily accessible because of the data warehousing systems already discussed.

Streaming data is difficult for most servers because of storage purposes.  This streaming data overwhelms the servers because they cannot store and retrieve this much data efficiently, and it bogs down the system.  Other companies have attempted to create programs that allow for access of streaming data, but it often comes at the cost of processor speed.  The usual process is to store streaming data as historical data then retrieving it as static data, which slows down all the processors.  SQL Server has created a system that allows for analysis of this data by showing it in the interactive sector.

In summary, SQL Server is at the very forefront of future data warehousing – DW 2.0.  SQL Server is able to store large amounts of data at a low cost and with maximum efficiency.  Microsoft has taken into account the need for multiple processors and an easy way to transfer old data into this new system.  The technology behind SQL Server is sure to make data access much more efficient.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

Read Full Post »

« Newer Posts - Older Posts »