Data Starvation – Balance Your SQL Server – Part 2


In Part 1 I covered how much data your SQL Server should be processing and the effects when it isn’t.  Here we check whether you’re actually achieving that speed, continuing our look at data starvation and at how Vexata can help SQL Server run faster.

Are the CPUs consistently over 90% utilized?  If not, then something is holding them back.  This could be an application or hardware issue; below we’ll attempt to rule out hardware.

First, look at % User Time in Performance Monitor.  SQL Server runs in user space, so a low % User Time can signal that the CPUs are being held back.  In the screen grab below, simulated on a server connected to a slow SAN, only about 14% of the CPUs’ 84% busy time (% Processor Time) was actually spent on SQL Server work.
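As a rough sanity check, the two PerfMon counters above can be combined to see what fraction of busy CPU time is actually user-space (i.e., SQL Server) work.  The figures below are the example numbers from the slow-SAN scenario, not measured values:

```python
# Example figures from the slow-SAN scenario above (fractions, not measurements):
processor_time = 0.84   # % Processor Time: CPUs busy 84% of the time
user_time = 0.14        # % User Time: only 14% of total time in user space

# Share of the busy CPU time that is actually user-space (SQL Server) work
sql_share_of_busy_cpu = user_time / processor_time
print(f"Only {sql_share_of_busy_cpu:.0%} of busy CPU time is user-space work")
```

Roughly five-sixths of the busy time here is kernel time and waiting, a classic data-starvation signature.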

Verify server hardware:

Transport is the highway between your CPUs and your storage and should be more than big enough for your number of cores.  Per port speeds, one way:

  • 10Gb NIC (iSCSI): ~550MB/s
  • 8Gb HBA (Fibre Channel): ~800MB/s
  • 16Gb HBA (Fibre Channel): ~1,600MB/s
  • 32Gb HBA (Fibre Channel): ~3,200MB/s
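Using the per-port figures above, a quick calculation shows how many ports a given core count needs.  The 200 MB/s-per-core target below is a placeholder assumption standing in for the per-core figure from Part 1; substitute your own:

```python
# Hedged sketch: how many storage ports does a given core count need?
# PER-PORT figures come from the list above; the per-core target is an
# assumption (substitute the figure from Part 1 of this series).
PORT_MBPS = {                      # one-way throughput per port
    "10Gb NIC (iSCSI)": 550,
    "8Gb FC HBA": 800,
    "16Gb FC HBA": 1600,
    "32Gb FC HBA": 3200,
}

def ports_needed(cores: int, per_core_mbps: int, port_type: str) -> int:
    """Minimum ports required to keep `cores` cores fed at `per_core_mbps`."""
    required_mbps = cores * per_core_mbps
    return -(-required_mbps // PORT_MBPS[port_type])   # ceiling division

# Example: 16 cores at an assumed 200 MB/s per core needs 3,200 MB/s total
print(ports_needed(16, 200, "16Gb FC HBA"))   # -> 2 ports
print(ports_needed(16, 200, "8Gb FC HBA"))    # -> 4 ports
```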

Know your workloads:

Are your mission critical applications OLAP or OLTP?

OLAP:

  • Read block size: Avg. Disk Bytes/Read
  • Write block size: Avg. Disk Bytes/Write
  • Read latency: Avg. Disk sec/Read
  • Write latency: Avg. Disk sec/Write
  • Read bandwidth: Disk Read Bytes/sec
  • Write bandwidth: Disk Write Bytes/sec

Read and write bandwidth are the most important.  The higher the bandwidth, the faster reports run, the more can run at the same time, and the faster data loads.  Latency should still be low (<2ms), though not as critically low as for the small-block I/O of OLTP.

Test:  Run a table scan on a large table and watch how high the bandwidth goes.  Use a filter that matches no rows, such as “SELECT * FROM CUSTOMERS WHERE AGE < 0”, so the scan reads every page but returns nothing over the network.  Monitor the bandwidth with PerfMon.

Analysis:  The read bandwidth should sustain around your theoretical peak based on the number of logical cores, as discussed in Part 1.  Note:  Using a table scan removes application-based issues so full data throughput can be tested.

OLTP:

  • Read block size: Avg. Disk Bytes/Read
  • Write block size: Avg. Disk Bytes/Write
  • Read latency: Avg. Disk sec/Read
  • Write latency: Avg. Disk sec/Write
  • Read IOPS: Disk Reads/sec
  • Write IOPS: Disk Writes/sec

Read and write latencies are the most important.  The faster the CPUs can get back to processing, the fewer time slices are wasted.  IOPS is a measure of scale: how many operations can be executed per second.
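Latency and IOPS are linked: with a fixed number of I/Os in flight, Little’s law says achievable IOPS is roughly outstanding I/Os divided by latency.  The queue depth of 32 below is an illustrative assumption, not a measured value:

```python
# Hedged sketch of why latency is the OLTP bottleneck.  By Little's law,
# achievable IOPS ~= outstanding I/Os / latency.  Queue depth of 32 is an
# illustrative assumption; typical latencies are used for each storage tier.
def max_iops(queue_depth: int, latency_s: float) -> float:
    return queue_depth / latency_s

qd = 32
print(f"5ms disk array:  {max_iops(qd, 0.005):,.0f} IOPS")   # 6,400
print(f"1ms flash array: {max_iops(qd, 0.001):,.0f} IOPS")   # 32,000
print(f"0.2ms NVMe:      {max_iops(qd, 0.0002):,.0f} IOPS")  # 160,000
```

Cutting latency from 5ms to 0.2ms raises the ceiling 25x without touching the application.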

Test:  Use PerfMon to record CPU utilization along with the LogicalDisk IOPS and latency counters for the data, log, and tempdb volumes.

Analysis:  During peak user workloads, is the CPU utilization consistently over 90%?  If not, what are the latencies?  Lowering latency gets the CPUs back to work faster, reduces wasted time slices, and increases CPU utilization.

Common latency ranges for 8K OLTP I/O running at full production load:

  Storage type             Read latency    Write latency
  Disk array               5–10ms          2–5ms
  SSD Flash Array (AFA)    1–2ms           1ms
  NVMe Flash Array         0.2ms           0.2ms
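The latency ranges above translate directly into per-transaction storage wait.  The workload below is an assumed example (20 synchronous reads per transaction, mid-range latencies from the table), purely to illustrate the arithmetic:

```python
# Illustrative arithmetic (assumed workload): a transaction issuing 20
# synchronous 8K reads spends roughly reads x latency waiting on storage.
# Latencies are mid-range values from the table above.
reads_per_txn = 20
for storage, latency_ms in [("Disk array", 7.5), ("AFA", 1.5), ("NVMe", 0.2)]:
    wait_ms = reads_per_txn * latency_ms
    print(f"{storage}: ~{wait_ms:g} ms of storage wait per transaction")
```

At disk-array latencies the CPU idles for ~150ms per transaction; on NVMe that drops to ~4ms, which is where the recovered utilization comes from.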

Conclusion

SQL Server licenses can be the most expensive part of a database platform.  If the CPUs are not consistently over 90% utilized, investigate.  A common issue these days is data starvation, in which the storage tier is vastly underpowered and leaves the CPUs underutilized.

Call to Action: Meanwhile, check out Vexata solutions for SQL Server.

Reference: Pinal Dave (https://blog.sqlauthority.com)
