This white paper explores how the Kimball approach to architecting and building a data warehouse/business intelligence (DW/BI) system works with Microsoft’s Parallel Data Warehouse, and how you would incorporate this new product as the cornerstone of your DW/BI system. For readers who are not familiar with the Kimball approach, we begin with a brief overview of the approach and its key principles. We then explore the Parallel Data Warehouse (PDW) system architecture and discuss its alignment with the Kimball approach. In the last section, we identify key best practices and pitfalls to avoid when building or migrating a large data warehouse to a Microsoft SQL Server PDW system.
This article is a walkthrough that illustrates how to build multiple related data models by using the tools that are provided with Microsoft SQL Server Integration Services. In this walkthrough, you will learn how to automatically build and process multiple data mining models based on a single mining structure, how to create predictions from all related models, and how to save the results to a relational database for further analysis. Finally, you view and compare the predictions, historical trends, and model statistics in SQL Server Reporting Services reports.
This solution also introduces the concept of ensemble models for data mining, which are sets of multiple related models. For most data mining projects, you need to create several models, analyze the differences, and compare outputs before you can select a best model to use operationally. Integration Services provides a framework within which you can easily generate and manage ensemble models.
I was recently looking for best practices for Hyper-V and SQL Server and I ended up whitepaper which was published in July earlier this year. I really wish I had come across this whitepaper earlier but any way still it is better to be late then never.
Memory is a critical resource to Microsoft SQL Server workloads, especially in a virtualized environment where resources are shared and contention for shared resources can lead to negative impact on the workload. Windows Server 2008 R2 SP1 introduced Hyper-V Dynamic Memory, which enables virtual machines to make more efficient use of physical memory resources. Hyper-V Dynamic Memory treats memory as a shared resource that can be reallocated automatically among running virtual machines. There are unique considerations that apply to virtual machines that run SQL Server workloads in such environments. This document provides insight into considerations and best practices for running SQL Server 2008 R2 in Hyper-V Dynamic Memory configurations on Windows Server 2008 R2 SP1.
he purpose of this guide is to provide a description of the technologies and best practices utilized to design a database consolidation solution; guidance will be appropriately defined throughout to prescribe configurations and considerations to implement for best results. Documentation of specific tasks will be very limited.
This white paper consider the following three potential strategies:
Using a single physical machine to host multiple virtual machines running the Microsoft SQL Server database software
Using a single machine to host multiple SQL Server instances
Using a single instance of SQL Server to host multiple databases
Note: Above abstract is from Microsoft Official documentation.
SQL Server Analysis Service (SSAS) has been always interesting subject for research. Analysis Services cubes are a very powerful tool in the hands of the business intelligence (BI) developer. They provide an easy way to expose even large data models directly to business users. Microsoft has published very informative white paper on Analysis Services Operations Guide. This white paper is authored by Thomas Kejser, John Sirmon, and Denny Lee.
In this guide you will find information on how to test and run Microsoft SQL Server Analysis Services in SQL Server 2005, SQL Server 2008, and SQL Server 2008 R2 in a production environment. The focus of this guide is how you can test, monitor, diagnose, and remove production issues on even the largest scaled cubes. This paper also provides guidance on how to configure the server for best possible performance. It is the goal of this guide to make your operations processes as painless as possible, and to have you run with the best possible performance without any additional development effort to your deployed cubes. In this guide, you will learn how to get the best out of your existing data model by making changes transparent to the data model and by making configuration changes that improve the user experience of the cube.
One of the many features of Microsoft SQL Server PowerPivot is the range of data sources that can be used to import data. Anything, from Microsoft SQL Server relational databases, Oracle databases, and Microsoft Access databases, to text documents, can be used as data sources in PowerPivot. In this paper, I explain one of the new and upcoming data sources that people are excited about – SharePoint list data in the form of Atom feeds. This white paper goes on to explain the different ways you can import SharePoint list data into PowerPivot, what types of lists are supported, various components that need to be installed to use this feature, and where to get those components.
I am very excited that Fast Track Data Warehouse 3.0 reference guide has been announced. As a consultant I have always enjoyed working with Fast Track Data Warehouse project as it truly expresses the potential of the SQL Server Engine. Here is few details of the enhancement of the Fast Track Data Warehouse 3.0 reference architecture.
The SQL Server Fast Track Data Warehouse initiative provides a basic methodology and concrete examples for the deployment of balanced hardware and database configuration for a data warehousing workload. Balance is measured across the key components of a SQL Server installation; storage, server, application settings, and configuration settings for each component are evaluated.
FTDW 3.0 Architecture
Basic component architecture for FT 3.0 based systems.
New Memory Guidelines
Minimum and maximum tested memory configurations by server socket count.
Additional Startup Options
Notes for T-834 and setting for Lock Pages in Memory.
RAID1+0 now standard (RAID1 was used in FT 2.0).
Query provided for evaluating logical fragmentation.
Additional options for CI table loads.
Additional detail and explanation of FTDW MCR Rating.
For any good system three things are vital: CPU, Memory and IO (disk). Among these three, IO is the most crucial factor of SQL Server. Looking at real-world cases, I do not see IT people upgrading CPU and Memory frequently. However, the disk is often upgraded for either improving the space, speed or throughput. Today we will look at an IO-related wait types.
From Book On-Line:
Occurs while waiting for I/O operations to complete. This wait type generally represents non-data page I/Os. Data page I/O completion waits appear as PAGEIOLATCH_* waits.
Any tasks are waiting for I/O to finish. This is a good indication that IO needs to be looked over here.
Reducing IO_COMPLETION wait:
When it is an issue concerning the IO, one should look at the following things related to IO subsystem:
Proper placing of the files is very important. We should check the file system for proper placement of files – LDF and MDF on a separate drive, TempDB on another separate drive, hot spot tables on separate filegroup (and on separate disk),etc.
Check event log and error log for any errors or warnings related to IO.
If you are using SAN (Storage Area Network), check the throughput of the SAN system as well as the configuration of the HBA Queue Depth. In one of my recent projects, the SAN was performing really badly so the SAN administrator did not accept it. After some investigations, he agreed to change the HBA Queue Depth on development (test environment) set up and as soon as we changed the HBA Queue Depth to quite a higher value, there was a sudden big improvement in the performance.
It is very possible that there are no proper indexes in the system and there are lots of table scans and heap scans. Creating proper index can reduce the IO bandwidth considerably. If SQL Server can use appropriate cover index instead of clustered index, it can effectively reduce lots of CPU, Memory and IO (considering cover index has lesser columns than cluster table and all other; it depends upon the situation). You can refer to the two articles that I wrote; they are about how to optimize indexes:
Note: The information presented here is from my experience and there is no way that I claim it to be accurate. I suggest reading Book OnLine for further clarification. All the discussions of Wait Stats in this blog are generic and vary from system to system. It is recommended that you test this on a development server before implementing it to a production server.
SQL Server Analysis Service have many features which are commonly requested and many already exists in the system. Security Data Entry is very important feature and SSAS supports writeback feature. Analysis Services is a tool for aggregating information and providing business users with the ability to analyze and support decision making in their business. By using the built-in writeback feature in Analysis Services, business users can also modify their data points to perform what-if analysis or supplement any existing data. The techniques described in this article derive from the author’s professional experience in the design and development of complex financial analysis applications used by various business groups in a large multinational company.
I recently presented session on Statistics and Best Practices in Virtual Tech Days on Nov 22, 2010. The sessions was very popular and I got many questions right after the sessions. The number question I had received was where everybody can get the further information. I am very much happy that my sessions created some curiosity for one of the most important feature of the SQL Server. Statistics are the heart of the SQL Server.
Microsoft has published a white paper on the subject how statistics are useful to Query Optimizer. Here is the abstract of the same white paper from Microsoft.
Statistics Used by the Query Optimizer in Microsoft SQL Server 2008
Writer: Eric N. Hanson and Yavor Angelov
Microsoft SQL Server 2008 collects statistical information about indexes and column data stored in the database. These statistics are used by the SQL Server query optimizer to choose the most efficient plan for retrieving or updating data. This paper describes what data is collected, where it is stored, and which commands create, update, and delete statistics. By default, SQL Server 2008 also creates and updates statistics automatically, when such an operation is considered to be useful. This paper also outlines how these defaults can be changed on different levels (column, table, and database). In addition, it presents how certain query language features, such as Transact-SQL variables, interact with use of statistics by the optimizer, and it provides guidance for using these features when writing queries so you can obtain good query performance.
Understanding and Controlling Parallel Query Processing in SQL Server
Data warehousing and general reporting applications tend to be CPU intensive because they need to read and process a large number of rows. To facilitate quick data processing for queries that touch a large amount of data, Microsoft SQL Server exploits the power of multiple logical processors to provide parallel query processing operations such as parallel scans. Through extensive testing, we have learned that, for most large queries that are executed in a parallel fashion, SQL Server can deliver linear or nearly linear response time speedup as the number of logical processors increases. However, some queries in high parallelism scenarios perform suboptimally. There are also some parallelism issues that can occur in a multi-user parallel query workload. This white paper describes parallel performance problems you might encounter when you run such queries and workloads, and it explains why these issues occur. In addition, it presents how data warehouse developers can detect these issues, and how they can work around them or mitigate them.