CXPACKET has to be most popular one of all wait stats. I have commonly seen this wait stat as one of the top 5 wait stats in most of the systems with more than one CPU.
Books On-Line:
Occurs when trying to synchronize the query processor exchange iterator. You may consider lowering the degree of parallelism if contention on this wait type becomes a problem.
CXPACKET Explanation:
When a parallel operation is created for SQL Query, there are multiple threads for a single query. Each query deals with a different set of the data (or rows). Due to some reasons, one or more of the threads lag behind, creating the CXPACKET Wait Stat. There is an organizer/coordinator thread (thread 0), which takes waits for all the threads to complete and gathers result together to present on the client’s side. The organizer thread has to wait for the all the threads to finish before it can move ahead. The Wait by this organizer thread for slow threads to complete is called CXPACKET wait.
Note that not all the CXPACKET wait types are bad. You might experience a case when it totally makes sense. There might also be cases when this is unavoidable. If you remove this particular wait type for any query, then that query may run slower because the parallel operations are disabled for the query.
Reducing CXPACKET wait:
We cannot discuss about reducing the CXPACKET wait without talking about the server workload type.
OLTP: On Pure OLTP system, where the transactions are smaller and queries are not long but very quick usually, set the “Maximum Degree of Parallelism” to 1 (one). This way it makes sure that the query never goes for parallelism and does not incur more engine overhead.
EXEC sys.sp_configure N'max degree of parallelism', N'1'
GO
RECONFIGURE WITH OVERRIDE
GO
Data-warehousing / Reporting server: As queries will be running for long time, it is advised to set the “Maximum Degree of Parallelism” to 0 (zero). This way most of the queries will utilize the parallel processor, and long running queries get a boost in their performance due to multiple processors.
EXEC sys.sp_configure N'max degree of parallelism', N'0'
GO
RECONFIGURE WITH OVERRIDE
GO
Mixed System (OLTP & OLAP): Here is the challenge. The right balance has to be found. I have taken a very simple approach. I set the “Maximum Degree of Parallelism” to 2, which means the query still uses parallelism but only on 2 CPUs. However, I keep the “Cost Threshold for Parallelism” very high. This way, not all the queries will qualify for parallelism but only the query with higher cost will go for parallelism. I have found this to work best for a system that has OLTP queries and also where the reporting server is set up.
Here, I am setting ‘Cost Threshold for Parallelism’ to 25 values (which is just for illustration); you can choose any value, and you can find it out by experimenting with the system only. In the following script, I am setting the ‘Max Degree of Parallelism’ to 2, which indicates that the query that will have a higher cost (here, more than 25) will qualify for parallel query to run on 2 CPUs. This implies that regardless of the number of CPUs, the query will select any two CPUs to execute itself.
EXEC sys.sp_configure N'cost threshold for parallelism', N'25'
GO
EXEC sys.sp_configure N'max degree of parallelism', N'2'
GO
RECONFIGURE WITH OVERRIDE
GO
Read all the post in the Wait Types and Queue series. Additionally a must read comment of Jonathan Kehayias.
Note: The information presented here is from my experience and I no way claim it to be accurate. I suggest you all to read the online book for further clarification. All the discussion of Wait Stats over here is generic and it varies from system to system. It is recommended that you test this on the development server before implementing on the production server.
Reference: Pinal Dave (https://blog.sqlauthority.com)
15 Comments. Leave new
Pinal,
CXPACKET waits aren’t necessarily bad, or even a sign of a problem, they are normal and expected if queries are executing using parallelism. I wouldn’t recommend that CXPACKET alone ever be considered a reason to reduce or change the configuration of the ‘max degree of parallelism’ configuration option. Most newer servers are going to be NUMA based systems, and the best practice recommendation for this configuration option is to set it to the number of processor cores in a single NUMA node. Generally speaking this best practice is not followed, and most recommendations show changing the ‘max degree of parallelism’ configuration option to 1 or 2 whenever the topic of CXPACKET comes up.
My recommendation, especially on newer NUMA based systems is always to first set it following best practices to the number of physical cores in a single NUMA node and then monitor. You can find the number of schedulers in each NUMA node by querying the online_scheduler_count from sys.dm_os_nodes. Even for data warehouse environments, this should be set initially following this practice unless testing has shown that leaving it at 0 is actually best, which is possible.
After making the change monitor not just wait types, but also for how the system is performing. Look for queries that run under parallelism and test them manually using different levels of DOP using the OPTION(MAXDOP n) query hint to see if reducing parallelism actually improves or harms performance. You might find that reducing it for one query improves performance while the rest of the workload shows a performance decrease from that same tested reduction. In that case putting the query hint in, either as a plan guide for the individual query, or by changing the code if you have access, would yield better returns.
Hello Jonathan,
Could you please tell me what is recommended CTP and MAXDOP value on my sql 2008 X64 standard edition. Mentioned below is the o/p sys.dm_os_nodes
0 ONLINE 0x000000000BD50080 0x0000000000FC4248 0x000000000E89C1A0 1 16773120 12 11 42 5 3665920 12873728 1
1 ONLINE 0x0000000000FC6080 0x0000000000FC4578 0x0000000022BD21A0 0 4095 12 11 42 5 0 0 1
64 ONLINE DAC 0x0000000023A30080 0x0000000000FC48A8 0x0000000023A3E1A0 0 0 1 1 1 1 0 0 0
Pinal, Jonathan, anyone.
Is there any rule of thumb like in what kind of situations this is bad for the performance? I’d guess that servers where there is many INSERT/UPDATEs with simultanous SELECTs with lot of CXPACKET waits is not a good thing.
I’m asking because one of our system shows many, many rows with lastwaittype=CXPACKET when I query the sys.sysprocesses view almost any given time. The application that uses the database is sluggish and I’m 99% sure that it’s because of the database or to be precise because of the queries ran against the database.
Thanks in advance,
Marko
Marko,
In general, OLTP systems don’t benefit from parallelism, but peoples definition of what an OLTP system is differs. A lot of OLTP systems these days are not strictly used for OLTP workloads, but also support reporting as well. It is possible that the problem could be parallelism in that scenario, and it is possible and more likely that a detailed analysis of the reporting side would show that code changes, index changes, or overall implementation changes would solve the problems.
As I mentioned in my first comment, I look at the queries that are executing using parallelism and determine why. Is there a missing index, are the estimates made by the optimizer significantly different from the actual results in the execution plan? How frequently are the queries being executed? It could be that changing the ‘cost threshold for parallelism’ option to a higher value would reduce the number of queries running under parallelism, but still allow more expensive queries to utilize parallelism. The hardware being used is also a factor in what you do.
This is really a huge “It Depends” type of subject, and a lot of things come into play that have to be looked at to determine what is best. If you’d like some assistance with looking at the problem feel free to contact me.
Hi Pinal,
Very good article to understand the role of Wait Types in perfomance ……Just wanted to correct it here that in your blog you have mentioned to change ‘Max Degree Of Paralleism’ to 1 or 0 as per the system (OLAP/OLTP)’ but in your query you have used ‘cost threshold for parallelism’ parameter. please correct it to avoid any confusion.
Thanks,
Nilesh
Pinal,
I think there is a mistake in your post. For pure OLTP and OLAP systems you are giving (correctly) advice for MaxDOP, but the code states (incorrectly) :
EXEC sys.sp_configure N’cost threshold for parallelism’, N’1′
Also, I want to express some concern about the hard advice for the MAXDOP=2. Maybe Jonathans advice might give better initial value.
Is this N’max degree of parallelism’ means number of physical CPUs or logical processors ? We have a db box with 2 CPUs x 4 Cores, so, shows 8 logical processors on Task Manager, so, should I set the limit to 1 (half of physical CPUs ) or 4 ( half of 8 logical ) ? Thanks.
When you wrote this command:
EXEC sys.sp_configure N’cost threshold for parallelism’, N’1′
GO
RECONFIGURE WITH OVERRIDE
GO
Did you not mean to reference ‘max degree of parallelism’?
I have a cut over in 3 days and performance testing is under way now. I found that adjusting
EXEC sys.sp_configure N’cost threshold for parallelism’, N’25’
but leaving MaxDOP set to 0 to have the best performance. At least now I have stopped seeing CXPACKET wait types and the performance difference is marginally better.
Duane Lawrence
How do you find the queries that are executing using parallelism ? I have SQL server 2005 installed but the prod database is set at compatibility 80.
How do i find the queries that are executing using parallelism. I have SQL server 2005 installed but the prod database is set at compatibility level 80. I am see the follow wait stats using Glen Barry’s wait SQL
wait_time_s pct running_pct
CXPACKET 1280289.05 51.45 51.45
Like Jonathan has stated. Our OLTP system supports reporting, could the queries by found by using profiler and looking for queries where duration is greater than CPU? Our Maxdop is currently set at 0.
Real quickly, I develop and manage the performance on a (complex) OLTP environment. This means it’s not just crud statements, but some complex selects used in archiving, and analyzing transactions on the fly while doing the inserts, updates, and deletes. Believe it or not, I’ve found the default maxdop of zero works extremely well and just a minor adjustment to degree of par. to 10 or 15 is fine. But what was the main consideration is “CPU head-room”. Does the system max out all procs? or is there adaquate head-room for preventing bottle-necking on CPU? Just another data point to consider. And increased DOP has allowed us to have a much lower ellapsed time in queries while the total CPU times are a little higher. So it would appear we have higher cxpacket waits, but that’s perfectly fine. It’s the net result that matters.
-dp
Thank you, Pinal!
Saved our day today :)
Thank you, Pinal. Great info as always.