This white paper explores how the Kimball approach to architecting and building a data warehouse/business intelligence (DW/BI) system works with Microsoft’s Parallel Data Warehouse, and how you would incorporate this new product as the cornerstone of your DW/BI system. For readers who are not familiar with the Kimball approach, we begin with a brief overview of the approach and its key principles. We then explore the Parallel Data Warehouse (PDW) system architecture and discuss its alignment with the Kimball approach. In the last section, we identify key best practices and pitfalls to avoid when building or migrating a large data warehouse to a Microsoft SQL Server PDW system.
This article is a walkthrough that illustrates how to build multiple related data models by using the tools that are provided with Microsoft SQL Server Integration Services. In this walkthrough, you will learn how to automatically build and process multiple data mining models based on a single mining structure, how to create predictions from all related models, and how to save the results to a relational database for further analysis. Finally, you view and compare the predictions, historical trends, and model statistics in SQL Server Reporting Services reports.
This solution also introduces the concept of ensemble models for data mining, which are sets of multiple related models. For most data mining projects, you need to create several models, analyze the differences, and compare outputs before you can select a best model to use operationally. Integration Services provides a framework within which you can easily generate and manage ensemble models.
I was recently looking for best practices for Hyper-V and SQL Server, and I came across a white paper that was published in July of this year. I really wish I had found this white paper earlier, but better late than never.
Memory is a critical resource to Microsoft SQL Server workloads, especially in a virtualized environment where resources are shared and contention for shared resources can lead to negative impact on the workload. Windows Server 2008 R2 SP1 introduced Hyper-V Dynamic Memory, which enables virtual machines to make more efficient use of physical memory resources. Hyper-V Dynamic Memory treats memory as a shared resource that can be reallocated automatically among running virtual machines. There are unique considerations that apply to virtual machines that run SQL Server workloads in such environments. This document provides insight into considerations and best practices for running SQL Server 2008 R2 in Hyper-V Dynamic Memory configurations on Windows Server 2008 R2 SP1.
The purpose of this guide is to provide a description of the technologies and best practices utilized to design a database consolidation solution; guidance will be appropriately defined throughout to prescribe configurations and considerations to implement for best results. Documentation of specific tasks will be very limited.
This white paper considers the following three potential strategies:
Using a single physical machine to host multiple virtual machines running the Microsoft SQL Server database software
Using a single machine to host multiple SQL Server instances
Using a single instance of SQL Server to host multiple databases
Note: The above abstract is from official Microsoft documentation.
SQL Server Analysis Services (SSAS) has always been an interesting subject for research. Analysis Services cubes are a very powerful tool in the hands of the business intelligence (BI) developer. They provide an easy way to expose even large data models directly to business users. Microsoft has published a very informative white paper, the Analysis Services Operations Guide. This white paper is authored by Thomas Kejser, John Sirmon, and Denny Lee.
In this guide you will find information on how to test and run Microsoft SQL Server Analysis Services in SQL Server 2005, SQL Server 2008, and SQL Server 2008 R2 in a production environment. The focus of this guide is how you can test, monitor, diagnose, and remove production issues on even the largest scaled cubes. This paper also provides guidance on how to configure the server for best possible performance. It is the goal of this guide to make your operations processes as painless as possible, and to have you run with the best possible performance without any additional development effort to your deployed cubes. In this guide, you will learn how to get the best out of your existing data model by making changes transparent to the data model and by making configuration changes that improve the user experience of the cube.
One of the many features of Microsoft SQL Server PowerPivot is the range of data sources that can be used to import data. Anything, from Microsoft SQL Server relational databases, Oracle databases, and Microsoft Access databases, to text documents, can be used as data sources in PowerPivot. In this paper, I explain one of the new and upcoming data sources that people are excited about – SharePoint list data in the form of Atom feeds. This white paper goes on to explain the different ways you can import SharePoint list data into PowerPivot, what types of lists are supported, various components that need to be installed to use this feature, and where to get those components.
I am very excited that the Fast Track Data Warehouse 3.0 reference guide has been announced. As a consultant, I have always enjoyed working on Fast Track Data Warehouse projects, as they truly express the potential of the SQL Server engine. Here are a few details of the enhancements in the Fast Track Data Warehouse 3.0 reference architecture.
The SQL Server Fast Track Data Warehouse initiative provides a basic methodology and concrete examples for the deployment of balanced hardware and database configuration for a data warehousing workload. Balance is measured across the key components of a SQL Server installation; storage, server, application settings, and configuration settings for each component are evaluated.
FTDW 3.0 Architecture
Basic component architecture for FT 3.0 based systems.
New Memory Guidelines
Minimum and maximum tested memory configurations by server socket count.
Additional Startup Options
Notes on trace flag -T834 and the setting for Lock Pages in Memory.
RAID1+0 now standard (RAID1 was used in FT 2.0).
Query provided for evaluating logical fragmentation.
Additional options for clustered index (CI) table loads.
Additional detail and explanation of FTDW MCR Rating.
For any good system, three things are vital: CPU, Memory and IO (disk). Among these three, IO is the most crucial factor for SQL Server. Looking at real-world cases, I do not see IT people upgrading CPU and Memory frequently. However, the disk is often upgraded to improve space, speed or throughput. Today we will look at an IO-related wait type.
From Books Online:
Occurs while waiting for I/O operations to complete. This wait type generally represents non-data page I/Os. Data page I/O completion waits appear as PAGEIOLATCH_* waits.
With this wait type, tasks are waiting for I/O to finish. When it shows up prominently, it is a good indication that the IO subsystem needs to be looked at.
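To judge whether this wait type really matters on a given system, it helps to compare it with the other waits. Here is a minimal sketch in Python, assuming you have already exported rows with the columns wait_type, waiting_tasks_count, wait_time_ms and signal_wait_time_ms from sys.dm_os_wait_stats. The function name, the row type and the sample numbers are hypothetical and only there for illustration; they are not measurements from a real server.

```python
from typing import NamedTuple


class WaitRow(NamedTuple):
    """One exported row of wait statistics (column names follow sys.dm_os_wait_stats)."""
    wait_type: str
    waiting_tasks_count: int
    wait_time_ms: int
    signal_wait_time_ms: int


def summarize_waits(rows):
    """Rank wait types by their share of the total wait time.

    resource_wait_ms is the time spent waiting on the resource itself
    (total wait minus the signal wait spent queuing for the CPU).
    """
    total_ms = sum(r.wait_time_ms for r in rows)
    summary = []
    for r in rows:
        summary.append({
            "wait_type": r.wait_type,
            "pct_of_total": 100.0 * r.wait_time_ms / total_ms if total_ms else 0.0,
            "resource_wait_ms": r.wait_time_ms - r.signal_wait_time_ms,
            "avg_wait_ms": (r.wait_time_ms / r.waiting_tasks_count
                            if r.waiting_tasks_count else 0.0),
        })
    # Largest share of wait time first.
    return sorted(summary, key=lambda s: s["pct_of_total"], reverse=True)


# Hypothetical sample data, made up for this example.
SAMPLE_ROWS = [
    WaitRow("CXPACKET", 500, 20000, 5000),
    WaitRow("IO_COMPLETION", 1000, 50000, 2000),
    WaitRow("PAGEIOLATCH_SH", 4000, 30000, 1000),
]

if __name__ == "__main__":
    for entry in summarize_waits(SAMPLE_ROWS):
        print(entry)
```

If IO_COMPLETION sits near the top of this ranking and most of its time is resource wait rather than signal wait, the IO subsystem is the place to start looking. As always, the numbers vary from system to system.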
Reducing IO_COMPLETION wait:
When the issue concerns IO, one should look at the following things related to the IO subsystem:
Proper placement of the files is very important. We should check the file system for proper placement of files: LDF and MDF on separate drives, TempDB on another separate drive, hot spot tables on a separate filegroup (and on a separate disk), etc.
Check event log and error log for any errors or warnings related to IO.
If you are using a SAN (Storage Area Network), check the throughput of the SAN system as well as the configuration of the HBA Queue Depth. In one of my recent projects, the SAN was performing really badly, but the SAN administrator did not accept it. After some investigation, he agreed to change the HBA Queue Depth on the development (test environment) setup, and as soon as we changed the HBA Queue Depth to a considerably higher value, there was a sudden big improvement in performance.
It is very possible that there are no proper indexes in the system and there are lots of table scans and heap scans. Creating proper indexes can reduce the IO bandwidth considerably. If SQL Server can use an appropriate covering index instead of the clustered index, it can reduce CPU, Memory and IO considerably (assuming the covering index has fewer columns than the clustered index; as always, it depends upon the situation). I have written two articles about how to optimize indexes that you can refer to.
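The file placement check from the list above can be partly automated. Here is a minimal sketch in Python, assuming you have exported (database name, type_desc, physical_name) rows from sys.master_files, where type_desc is ROWS for data files and LOG for log files. The function name and the sample paths are hypothetical. The check only compares drive letters, so it knows nothing about mount points or about which physical disks sit behind a SAN volume.

```python
import ntpath
from collections import defaultdict


def find_shared_drives(files):
    """Flag drives that mix file kinds that are usually kept apart.

    files: iterable of (database_name, type_desc, physical_name) tuples.
    Returns a list of human-readable warnings, ordered by drive letter.
    """
    by_drive = defaultdict(set)
    for database_name, type_desc, physical_name in files:
        # ntpath parses Windows-style paths on any operating system.
        drive = ntpath.splitdrive(physical_name)[0].upper()
        by_drive[drive].add((database_name.lower(), type_desc.upper()))

    warnings = []
    for drive, entries in sorted(by_drive.items()):
        kinds = {type_desc for _, type_desc in entries}
        databases = {name for name, _ in entries}
        if {"ROWS", "LOG"} <= kinds:
            warnings.append(f"{drive} holds both data (ROWS) and log (LOG) files")
        if "tempdb" in databases and len(databases) > 1:
            warnings.append(f"{drive} is shared by tempdb and other databases")
    return warnings


# Hypothetical sample layout, made up for this example.
SAMPLE_FILES = [
    ("Sales", "ROWS", r"D:\Data\Sales.mdf"),
    ("Sales", "LOG", r"D:\Data\Sales_log.ldf"),
    ("tempdb", "ROWS", r"D:\Data\tempdb.mdf"),
    ("Reporting", "ROWS", r"E:\Data\Reporting.mdf"),
    ("Reporting", "LOG", r"L:\Logs\Reporting_log.ldf"),
]

if __name__ == "__main__":
    for warning in find_shared_drives(SAMPLE_FILES):
        print(warning)
```

A warning from this sketch is only a hint to look closer. Whether a shared drive is actually a problem depends on the workload and on the storage behind it.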
Note: The information presented here is from my experience, and I do not claim that it is completely accurate. I suggest reading Books Online for further clarification. All the discussions of Wait Stats in this blog are generic and vary from system to system. It is recommended that you test this on a development server before implementing it on a production server.