[Note from Pinal]: This is a third episode of Notes from the Field series. Lots of organizations do Virtualization. The purpose of the Virtualization is often consolidation, performance or just easy to use configuration. Looking at the various advantages of the Virtualization, nowadays I find lots of organization have implemented Virtualization. When they face various problems with Virtualization they often blame the technology instead of going deeper to correct the problem. Virtualization can be blessings for your server, you just have to properly configure them and tune them. Do not run away from Virtualization, rather implement them properly.
In this episode of the Notes from the Field series database expert David Klee explains a very crucial issue DBAs and Developer faces in their career – how you perform a performance related root cause analysis. Linchpin People are database coaches and wellness experts for a data driven world. Read the experience of John in his own words.
As a technologist, I primarily straddle two worlds, centered between the Microsoft SQL Server and private cloud Virtualization ecosystems. As a consultant, I spend a number of hours each week working with architecture and performance problems with virtualized SQL Server instances. After doing this for a number of years, quite a few common trends keep appearing at different projects. These trends almost always hurt performance and limit the effectiveness of the DBA to protect the core systems that their job depends on the most. Not all Virtualization infrastructures are created equal, and unless a specific focus on maintaining your SQL Server performance is present, the greatest silent performance killer is one that will silently attack and hinder your mission-critical systems. What is it, and what can you do about it?
Most virtualized environments are designed as they always have been – focused on consolidation, but not necessarily performance. Unfortunately for you, the DBA, placing your SQL Servers on an environment historically designed for consolidation is guaranteed to hurt. The answer is too many vCPUs. Why?
Everything that occurs in a virtualized environment must be scheduled within the Virtualization layer, also known as the hypervisor. This is the layer the sits directly on top of the hardware, and its job is to coordinate and prioritize the virtual machine’s requests to the physical resources. Between the time a CPU request is submitted and the time of actual execution, the request is placed in a ‘ready’ queue.
In an environment that is lightly taxed, this time spent in the queue is negligible, and the performance impact is so insignificant that usually only a benchmarking team can determine the Virtualization overhead. However, as the consolidation rations grow (meaning a greater number of virtual machines are placed on the physical machines), these times spent in the queues can grow to a much higher value. The greater the value, the more substantial the impact to performance will be. High wait times in the vCPU performance of an application that is highly parallelized, such as SQL Server, can impact the application’s performance even more than just a single threaded vCPU task getting slightly delayed.
For an example of how severe the impact of these scheduling waits can be on a SQL Server’s performance, please visit this blog post. (link)
Unfortunately, these delays are silently transparent to the applications and the operating systems inside the virtual machines. No entries are logged in any error log, and no warnings are sent to the application owner. Some applications are less sensitive to these vCPU performance impacts, such as file servers or print servers. However, resource intensive applications, such as SQL Server, are greatly impacted by these waits. All you, the DBA, knows is that last night’s ETL process ran two hours longer than usual last night, but all of the usual performance indicators are not present.
Now that I have thoroughly scared you away from virtualizing your SQL Servers, let’s add some transparency into this scenario so you can better identify and measure these wait times. Both VMware vSphere and Microsoft Hyper-V have metrics that you can sample to better gauge the time spent waiting on CPU resources. vSphere has metrics presented within the vCenter management interface, and among the performance metrics is the VMware CPU wait metric called CPU Ready Time. More details on VMware’s vCPU Ready Time can be found at this blog post. Hyper-V has metrics that are collected with SCOM and/or Perfmon, and the Perfmon metric that you are looking for is called Hyper-V Hypervisor Virtual Processor\CPU Wait Time Per Dispatch. Work with your Virtualization administrator to get access to these tools. As a DBA responsible for your business-critical SQL Servers, it is imperative that you gain access to these metrics so you can break through that black box called Virtualization and gain access to the performance statistics underneath your systems. These metrics will help you quickly troubleshoot any performance issue that you may encounter.
Sample and analyze these values very carefully. Work with your Virtualization team to determine if the current levels are acceptable and not impacting performance. Educate them on the resource consumption of SQL Server and your specific database workloads. Work with the Virtualization administrators to reduce and then maintain acceptable resource timing levels so you can maximize the performance of your SQL Servers.
If you want to get started with performance analytics and triage of virtualized SQL Servers with the help of experts, read more over at Fix Your SQL Server.
Reference: Pinal Dave (http://blog.sqlauthority.com)