It has been a while since we have seen a puzzle so let us do a small puzzle today. Actually, this is a very simple puzzle if you have worked with SQL Server for a while. I got an email from one of the readers from the blog post and he wanted me to correct the following query which he had written. Well, I have been working with SQL Server for over 19 years and I often do not remember the syntax and have to look up online. Here is a very simple puzzle for you which involves Aggregate and Subquery.
Important Update: The Workshop is now over and it was one of the landmark workshops, where I shared business secrets of successful Performance Tuning Expert. You can watch the recording of the workshop for a limited period of time. |
Puzzle – Aggregate and Subquery
Here is the query when we run against the AdventureWorks database, it gives us an error. Remember AdventureWorks database is a sample database and you can download and install AdventureWorks over here.
USE AdventureWorks2014 GO SELECT pch.StandardCost, p.ProductID FROM Production.ProductCostHistory pch INNER JOIN Production.Product p ON pch.ProductID = p.ProductID WHERE pch.StandardCost > AVG(p.StandardCost) GO
Now when you run the above query, it gives us the following error:
Msg 147, Level 15, State 1, Line 17
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
Now here is your task, you need to help me to fix the above query. The hint to resolve the query is already in the blog post’s title and in the query error. You need to understand what my query is attempting to do and use the hint which is already pointed out to you. After you successfully fix the query it will produce the following results.
Rules
Well, there are no real rules, as long as you modify my query to get me above results. Please post your query in the comments section below before April 22, 2019. I will keep all the answers hidden till April 22, 2019 11:59:00 PM.
Winner
One Winner will get 1 free seat for my upcoming class, which is worth USD 1000.
SQL Server Performance Tuning Practical Workshop for EVERYONE
Date:Â April 23, 2019, Tuesday
8:00 AM Pacific Time | 11:00 AM Eastern Time
If you do not want to take part in the contest, you can directly register for the class here.
Update: April 23, 2019
The contest is closed and the winning Answer is here:
USE AdventureWorks2014 GO SELECT pch.ProductID, pch.StartDate,pch.EndDate, pch.StandardCost, pch.ModifiedDate FROM Production.ProductCostHistory pch INNER JOIN Production.Product p ON pch.ProductID = p.ProductID WHERE pch.StandardCost > p.StandardCost GO
There is no need for the AVG on the StandardCost as there will be always a single entry for the product. Even if you change the query with group by + having you will still see the query uses the simple join.
Congratulations to the winner Nicholas Månsson. He is very excited to join the SQL Server Performance Tuning Practical Workshop for EVERYONE.
Everyone who participated you got to solve this difficult puzzle so indirectly you are a winner (of knowledge) as well. This was an intense competition with over 500 comments.
Reference:Â Pinal Dave (https://blog.sqlauthority.com)
527 Comments. Leave new
SELECT p.ProductID
, pch.StartDate
, pch.EndDate
, pch.StandardCost
, pch.ModifiedDate
FROM Production.ProductCostHistory pch
INNER JOIN ( SELECT Product.ProductID
, AVG( Product.StandardCost ) AvgCost
FROM Production.Product
GROUP BY Product.ProductID ) p
ON pch.ProductID = p.ProductID
WHERE pch.StandardCost > p.AvgCost;
GO
USE AdventureWorks2014
GO
SELECT pch.ProductID, StartDate, EndDate, pch.StandardCost, pch.ModifiedDate
FROM Production.ProductCostHistory pch
INNER JOIN Production.Product p ON pch.ProductID = p.ProductID
GROUP BY pch.ProductID, StartDate, EndDate, pch.StandardCost, pch.ModifiedDate
HAVING pch.StandardCost > AVG(p.StandardCost)
GO
SELECT pch.ProductID, pch.StartDate, pch.EndDate, pch.StandardCost, pch.ModifiedDate
FROM Production.ProductCostHistory pch
INNER JOIN (select ProductID, AVG(StandardCost) StandardCost from Production.Product group by ProductID)p ON pch.ProductID = p.ProductID
WHERE pch.StandardCost > p.StandardCost
SELECT pch.ProductID,pch.StartDate,pch.EndDate, pch.StandardCost,pch.ModifiedDate
FROM Production.ProductCostHistory pch
INNER JOIN Production.Product p ON pch.ProductID = p.ProductID
group by pch.ProductID,pch.StartDate,pch.EndDate, pch.StandardCost,pch.ModifiedDate
having pch.StandardCost > AVG(p.StandardCost)
GO
I’m sure I could be doing something better, but off the cusp, this will give you the requested results:
SELECT [pch].[ProductID]
, [pch].[StartDate]
, [pch].[EndDate]
, [pch].[StandardCost]
, [pch].[ModifiedDate]
FROM [Production].[ProductCostHistory] AS [pch]
INNER JOIN
(
SELECT [ProductID]
, AVG([StandardCost]) AS [StandardCost]
FROM [Production].[Product]
GROUP BY [ProductID]
) AS [p]
ON [pch].[ProductID] = [p].[ProductID]
WHERE [pch].[StandardCost] > [p].[StandardCost];
SELECT pch.StandardCost, p.ProductID
FROM Production.ProductCostHistory pch
INNER JOIN Production.Product p ON pch.ProductID = p.ProductID
group by pch.StandardCost, p.ProductID
having pch.StandardCost > AVG(p.StandardCost).
— Fixed original query:
;WITH Product_CTE (ProductID, AVG_StandardCost) AS
(
SELECT ProductID, AVG(StandardCost)
FROM Production.Product
GROUP BY ProductID
)
SELECT pch.StandardCost, p.ProductID
FROM Production.ProductCostHistory pch INNER JOIN Product_CTE p ON pch.ProductID = p.ProductID
WHERE pch.StandardCost > AVG_StandardCost
— To provide the requared output:
;WITH Product_CTE (ProductID, AVG_StandardCost) AS
(
SELECT ProductID, AVG(StandardCost)
FROM Production.Product
GROUP BY ProductID
)
SELECT pch.ProductID, pch.StartDate, pch.EndDate, pch.StandardCost, pch.ModifiedDate
FROM Production.ProductCostHistory pch INNER JOIN Product_CTE p ON pch.ProductID = p.ProductID
WHERE pch.StandardCost > AVG_StandardCost
— other way of doing it, just for the fun of it:
SELECT pch.ProductID, pch.StartDate, pch.EndDate, pch.StandardCost, pch.ModifiedDate
FROM Production.ProductCostHistory AS pch
INNER JOIN (SELECT ProductID, AVG(StandardCost) AS ‘AVG_StandardCost’ FROM Production.Product GROUP BY ProductID) AS p ON pch.ProductID = p.ProductID
WHERE pch.StandardCost > p.AVG_StandardCost
Pinal,
I refactored my method 3 by eliminating an INNER JOIN. I believe this is the most elegant, though the execution plan is exactly the same as my method 1.
Joe
–method 4 correlated subquery optimized with no HAVING clause and eliminated 1 join
SELECT p.ProductID StartDate, EndDate, pch.StandardCost, pch.ModifiedDate
FROM Production.ProductCostHistory pch
INNER JOIN (
SELECT ProductID, AVG(StandardCost) AS StandardCost
FROM Production.Product
GROUP BY ProductID
) p
ON p.ProductID = pch.ProductID
WHERE pch.StandardCost > p.StandardCost
— Solution to Puzzle – Aggregate and Subquery
GO
SELECT ProductID, pch.StandardCost
FROM Production.ProductCostHistory pch
WHERE pch.StandardCost > (select AVG(StandardCost) from Production.Product where ProductID = pch.ProductID)
order by ProductID
GO
1. Since Production.Product has ProductID as primary key, LOGICALLY, AVG(StandardCost) is same as StandardCost; so just remove AVG(
2. If at all there is need to average,
SELECT pch.StandardCost, p.ProductID
FROM Production.ProductCostHistory pch
INNER JOIN (select AVG(p1.StandardCost) avgStdCost, ProductID FROM Production.Product p1 group by ProductID) p ON pch.ProductID = p.ProductID
WHERE pch.StandardCost > avgStdCost
Use [Adventureworks2014]
Go
Select
pch.StandardCost, p.productID
From
[Production].[ProductionCostHistory] pch INNER JOIN
(Select ProductID, AVG.(StandardCost) [AVGStandardCost] from [Production].[Product] group by productID) p
on
pch.[ProductID] = p.[ProductID]
Where
pch.StandardCost > p.[AVGStandardCost]
USE AdventureWorks2014
GO
SELECT p.ProductID, pch.StandardCost, p.AverageCost
FROM Production.ProductCostHistory pch
INNER JOIN (
SELECT ProductID, AVG(p.StandardCost) AS AverageCost
FROM Production.Product
GROUP BY ProductID) p ON pch.ProductID = p.ProductID
WHERE pch.StandardCost > p.AverageCost
GO
USE AdventureWorks2014
GO
WITH cte
as
(
SELECT pch.StandardCost, p.ProductID,
CASE WHEN pch.StandardCost > AVG(p.StandardCost) THEN ‘Greater’
ELSE ‘not’
END AS AvgCostGreater
FROM Production.ProductCostHistory pch
INNER JOIN Production.Product p ON pch.ProductID = p.ProductID
GROUP BY pch.StandardCost, p.ProductID
)
SELECT StandardCost, ProductID
FROM cte WHERE cte.AvgCostGreater=’Greater’
GO
Pinal,
Tricky! I think the author didn’t understand the grain of the tables. While ProductionCostHistory has each ProductID many times, the Product table only has each ProductID once as it is the primary key in this table. So, there is no need to AVG(p.standard_cost) if he’s then going to group the results by the ProductId. To get the same 6 rows of results that you showed, we just need to do this:
SELECT p.ProductID, pch.StartDate, pch.EndDate,
pch.StandardCost, pch.ModifiedDate
FROM Production.ProductCostHistory pch
INNER JOIN Production.Product p ON pch.ProductID = p.ProductID
WHERE pch.StandardCost > p.StandardCost
If, however, the request was to see entries from the ProductCostHistory table where the StandardCost was greater than the overall average StandardCost across the entire Product table, we could use the query below. This does NOT produce the same 6 results that you showed in your screenshot.
with cte_avg as
(select AVG(p.StandardCost) as overall_avg_standardcost
from Production.Product p)
SELECT pch.ProductID, pch.StartDate, pch.EndDate,
pch.StandardCost, cte_avg.overall_avg_standardcost
FROM Production.ProductCostHistory pch, cte_avg
WHERE pch.StandardCost > cte_avg.overall_avg_standardcost
SELECT *
INTO #test1
FROM ( SELECT 1 AS ProductId,
2500 AS ProductCost
UNION
SELECT 2 AS ProductId,
5000 AS ProductCost
UNION
SELECT 3 AS ProductId,
8000 AS ProductCost) a;
SELECT *
INTO #test2
FROM ( SELECT 1 AS ProductId,
2500 AS ProductCost
UNION
SELECT 4 AS ProductId,
5000 AS ProductCost
UNION
SELECT 3 AS ProductId,
3000 AS ProductCost) a;
SELECT * FROM #test1 AS t1
WHERE EXISTS (SELECT * FROM #test2 AS t2 WHERE t2.ProductId = t1.ProductId having t1.ProductCost > AVG(t2.ProductCost))
;WITH CTE AS
( SELECT p.ProductID, AVG(p.StandardCost) AS AvgStandardCost
FROM Production.ProductCostHistory pch
INNER JOIN Production.Product p ON pch.ProductID = p.ProductID
GROUP BY p.ProductID
)
SELECT pch.ProductID, pch.StartDate, pch.EndDate, pch.StandardCost, pch.ModifiedDate
FROM Production.ProductCostHistory pch
JOIN CTE ON CTE.ProductID = pch.ProductID
WHERE pch.StandardCost > CTE.AvgStandardCost
or
SELECT pch.ProductID, pch.StartDate, pch.EndDate, pch.StandardCost, pch.ModifiedDate
FROM Production.ProductCostHistory pch
JOIN
(
SELECT p.ProductID, AVG(p.StandardCost) AS AvgStandardCost
FROM Production.ProductCostHistory pch
INNER JOIN Production.Product p ON pch.ProductID = p.ProductID
GROUP BY p.ProductID
) sub ON sub.ProductID = pch.ProductID
WHERE pch.StandardCost > sub.AvgStandardCost
GO
USE AdventureWorks2017
GO
SELECT p.ProductID,pch.StartDate,pch.EndDate ,pch.StandardCost, pch.ModifiedDate
FROM Production.ProductCostHistory pch
INNER JOIN Production.Product p ON pch.ProductID = p.ProductID
–WHERE pch.StandardCost > AVG(p.StandardCost)
group by p.ProductID,pch.StartDate,pch.EndDate ,pch.StandardCost, pch.ModifiedDate
HAVING pch.StandardCost > AVG(p.StandardCost)
GO
I feel there is a clever solution that I don;t see. But here are two ways I could get your results.
SELECT pch.*
FROM Production.ProductCostHistory pch
INNER JOIN Production.Product p ON pch.ProductID = p.ProductID
INNER JOIN (
SELECT ip.ProductID, AVG(ip.StandardCost) AS sc
FROM Production.ProductCostHistory ipch
INNER JOIN Production.Product ip ON ipch.ProductID = ip.ProductID
GROUP BY ip.ProductID
) sub ON sub.ProductID = p.ProductID
AND pch.StandardCost > sub.sc;
WITH AggrCosts AS (
SELECT pch.*, AVG(p.StandardCost) OVER (Partition by p.ProductId) AS AvgCost
FROM Production.ProductCostHistory pch
INNER JOIN Production.Product p ON pch.ProductID = p.ProductID
)
SELECT ProductID, StartDate, EndDate, StandardCost, ModifiedDate
FROM AggrCosts
WHERE StandardCost > AvgCost
USE AdventureWorks2014
GO
SELECT p.ProductID,pch.StartDate, pch.EndDate ,pch.StandardCost, pch.ModifiedDate
FROM Production.ProductCostHistory pch
INNER JOIN Production.Product p ON pch.ProductID = p.ProductID group by P.ProductID, PCh.StandardCost, pch.StartDate, pch.EndDate, pch.ModifiedDate
having pch.StandardCost > AVG(p.StandardCost)
order by p.ProductID
GO
USE AdventureWorks2014
GO
SELECT
p.ProductID,
startdate,
enddate,
pch.StandardCost,
p.modifieddate
FROM Production.ProductCostHistory pch
INNER JOIN Production.Product p ON pch.ProductID = p.ProductID
GROUP BY pch.StandardCost,p.ProductID,startdate,enddate,p.modifieddate
HAVING pch.StandardCost > AVG(p.StandardCost)
GO