Business Intelligence (BI) plays a significant role in businesses nowadays. Moreover, the databases that deal with the queries related to BI are presently facing an increase in workload. At present, when queries are sent to very large databases, millions of rows are returned. Also the users have to go through extended query response times when joining multiple tables are involved with such queries. ‘Star Join Query Optimization’ is a new feature of SQL Server 2008 Enterprise Edition. This mechanism uses bitmap filtering for improving the performance of some types of queries by the effective retrieval of rows from fact tables.
Improved Query Response Times
In general, data warehouses employ dimensionally modeled star or snowflake schemas. These schemas have one or more than one fact tables that contain transactional data and many dimension tables, which holds information such as product data, customer information, and times and dates – all these define the fact table data. Usually, foreign keys are employed for maintaining relationships between the rows in fact tables and also between the rows in the dimension tables. Databases that contain star schemas are recognized by SQL Server 2008 Enterprise. It uses the new Star Join Query logic for processing queries against such star schemas more efficiently. Typically, on an average, data warehouse queries run faster to approximately 20 percent.
Automatically Implemented
Star Join Query Optimization is automatically implemented by the SQL Server. It does not require a special database or application configuration. The query processor will usually optimize queries with medium selectivity (this refers to the queries that retrieve approximately 10% to 75% of rows from a fact table). Such queries are usually handled using hash joins to join the dimension and fact tables by employing the foreign keys to identify the matching rows. A hash table is built for each dimension table referenced in the query in the case of hash joins; the optimization process uses these hash tables for deriving bitmap filters. The key values from each dimension table are identified by bitmap filters; these key values qualify for inclusion in the query. When the fact table is scanned, the bitmap filters are applied to it. These bitmap filters eliminate those rows of the fact table which are not qualified for inclusion in the result set. The most selective bitmap filter is applied first as it is found to eliminate the highest number of rows. Since the eliminated rows do not need further processing, the subsequent filters need not be applied to them – this way the process becomes more efficient.
Query Selectivity
The performance is enhanced in the case of medium selectivity queries while using bitmap filtering because the rows are filtered before any joins are implemented. Hence, there is a decrease in the number of rows that are processed by each join. Bitmap filtering is not applied when queries are highly selective (i.e., those queries that return less than 10% of the rows in a fact table). In such case, a nested loop join is found to be generally more efficient. Similarly, when the queries are not very selective at all (queries which return more than 75% of the rows in a fact table), bitmap filtering is not applied as there are very few rows to be filtered, and hence, there is no requirement of enhancement in performance in this case.
Integer Data Types
Star join optimization is found to give the highest efficiency when the data type of the columns used in the joins is integer. This feature enables the bitmap filter to be applied as part of the initial table or index scan rather than being used at a later stage in the query plan. Most of the queries are benefited from star join optimization since foreign key relationships are commonly implemented using integer-based alternate key columns.
Reference : Pinal Dave (https://blog.sqlauthority.com)
5 Comments. Leave new
Thank you for this informative article, I had not read about this before but this will make a strong case for using a star schema when it fits the DB utilization.
Please optimize it
****************************
SET statistics io ON;
SET statistics TIME ON;
SELECT itm.userreviewid AS itemid,
itm.reviewitem AS title,
itm.imgpath AS imgpath,
‘Barkspider’ AS [Source],
reviews.numratings,
reviews.avgrating,
5 AS ratingmax,
” AS itempath,
” AS identifier,
‘barspider-user-review.png’ AS logo,
Isnull(0, 0) AS fromprice,
Isnull(0, 0) AS toprice,
reviews.totrating,
Newid() AS ‘NewID’,
‘a’ AS bssort
FROM barkspiderblog.dbo.userreviews itm WITH (nolock)
JOIN (SELECT COUNT(1) AS numratings,
AVG(rating) AS avgrating,
SUM(rating) AS totrating,
reviewitem
FROM barkspiderblog.dbo.userreviews WITH (nolock)
GROUP BY reviewitem) reviews
ON Lower(itm.reviewitem) = Lower(reviews.reviewitem)
WHERE ( ( ( itm.reviewitemid IS NULL
OR itm.reviewitemid = 0 )
AND itm.parentid IS NOT NULL )
OR ( itm.reviewitemid IS NOT NULL
AND itm.reviewitemid = itm.userreviewid ) )
AND itm.reviewitem LIKE ‘%tv%’
AND itm.userreviewid IN (SELECT userreviewid
FROM (SELECT Rank() OVER (PARTITION BY
reviewitem
ORDER
BY createdon ASC) AS
r1,
userreviewid
FROM barkspiderblog.dbo.userreviews
WHERE reviewitem LIKE ‘%tv%’) tab
WHERE tab.r1 = 1)
UNION ALL
SELECT itm.itemid,
itm.title,
itm.imgpath,
src.DESCRIPTION AS [Source],
reviews.numratings,
reviews.avgrating,
src.ratingmax,
itm.itempath,
itm.identifier,
src.logo,
fpr.fromprice,
fpr.toprice,
reviews.totrating,
Newid() AS ‘NewID’,
‘b’ AS bssort
FROM items itm WITH (nolock)
JOIN sources src WITH (nolock)
ON itm.sourceid = src.sourceid
JOIN (SELECT COUNT(1) AS numratings,
AVG(rating) AS avgrating,
SUM(rating) AS totrating,
itemid
FROM reviews WITH (nolock)
GROUP BY itemid) reviews
ON itm.itemid = reviews.itemid
LEFT OUTER JOIN fetchitemprice fpr WITH (nolock)
ON itm.itemid = fpr.itemid
WHERE title LIKE ‘%tv%’
UNION ALL
SELECT itm.itemid,
itm.title,
itm.imgpath,
‘Barkspider’ AS [Source],
reviews.numratings,
reviews.avgrating,
5 AS ratingmax,
itm.itempath,
itm. identifier,
‘barspider-user-review.png’ AS logo,
0 AS fromprice,
0 AS toprice,
reviews.totrating,
Newid() AS newid,
‘a’ AS bssort
FROM items itm WITH (nolock)
JOIN sources src WITH (nolock)
ON itm.sourceid = src.sourceid
JOIN (SELECT COUNT(1) AS numratings,
AVG(rating) AS avgrating,
SUM(rating) AS totrating,
reviewitemid,
reviewitem
FROM barkspiderblog.dbo.userreviews WITH (nolock)
GROUP BY reviewitem,
reviewitemid) reviews
ON itm.itemid = reviews.reviewitemid
LEFT OUTER JOIN fetchitemprice fpr WITH (nolock)
ON itm.itemid = fpr.itemid
WHERE itm.itemid IN (SELECT ur1.reviewitemid
FROM barkspiderblog.dbo.userreviews ur1 WITH (nolock)
WHERE ur1.parentid IS NULL)
AND title LIKE ‘%tv%’
ORDER BY bssort,
Newid()
Hi Pinal,
Thanks for such a great article,
I have a small doubt in this, as per the star join startagy it will create hash tables. If our fact table is very huge and our temp DB can’t take up this amount of data into hash table how does it manage the query.
some quries take too much time to be executed at my distributed appliction database so I got technique is
1.Make script of entire tables and sps
2.Delete then re-create database
3-Execute script throug dos command
this works but after few weeks problem come again, I am stuck.What should I do?
Hi Pinal. When you say “Typically, on an average, data warehouse queries run faster to approximately 20 percent” do you mean ideal optimized target run time is 20% of an equivalent non optimized query or 80% of a non optimized query?