Interview Question of the Week #034 – What is the Difference Between Distinct and Group By

Question: What is the difference between DISTINCT and GROUP BY?

Answer:

DISTINCT and GROUP BY usually generate the same query plan, so performance should be the same for both query constructs. Use GROUP BY when you need to apply aggregate functions to each group. If all you need is to remove duplicates, use DISTINCT. If subqueries are involved, the execution plans can differ, so in that case check the execution plans before deciding which is faster.

Example of DISTINCT:

SELECT DISTINCT Employee, Rank
FROM Employees

Example of GROUP BY:

SELECT Employee, Rank
FROM Employees
GROUP BY Employee, Rank

Example of GROUP BY with aggregate function:

SELECT Employee, Rank, COUNT(*) EmployeeCount
FROM Employees
GROUP BY Employee, Rank

Reference: Pinal Dave (http://blog.sqlauthority.com)

Interview Question of the Week #033 – How to Invalidate Procedure Cache of SQL Server?

Question: How to invalidate all the stored procedure plans in SQL Server?

Answer: DBCC FREEPROCCACHE will invalidate all stored procedure plans that the optimizer has cached in memory and force SQL Server to compile new plans the next time those procedures are run.

Remember not to run this on a production server without proper planning: once you run this DBCC command, it invalidates all cached stored procedure plans. If you run it during peak time, many stored procedures will have to recompile at the same time, which can create a resource bottleneck.
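
For reference, here is a minimal sketch. Clearing the whole cache is a single statement; on SQL Server 2008 and later you can also evict one plan by its plan_handle, and the DMV query below (with a hypothetical procedure name) helps locate it.

-- Invalidate the entire plan cache (use with caution on production).
DBCC FREEPROCCACHE
GO
-- Locate the plan_handle of one plan ('MyProcedure' is a hypothetical name)...
SELECT cp.plan_handle, st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE '%MyProcedure%'
GO
-- ...and pass it to DBCC FREEPROCCACHE (0x...) to evict just that plan.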

Reference: Pinal Dave (http://blog.sqlauthority.com)

Interview Question of the Week #032 – Best Practices Around FileGroups

Here is another question I heard during a recent interview at a multinational company. I have observed thousands of interviews, but this is the first time I have heard this topic discussed, and I was very glad to hear it. I have listed the candidate's answer right below the question.

Question: What are the best practices for filegroups in SQL Server?

Answer:

It is suggested to place transaction logs on separate physical hard drives. In this manner, data can be recovered up to the second in the event of a media failure.

Separating tables and their non-clustered indexes into different filegroups can improve performance, because modifications can be written to both the table and the index at the same time.

If a table and its corresponding indexes are placed in different filegroups, the two filegroups must be backed up as a single unit; they cannot be backed up separately.

Set a reasonable initial size for your database and transaction log (for the log, roughly 25% of the database size).

Leave the autogrow feature ON for both the data files and the log files, with a reasonable autogrow increment.

Place the log files on different physical disk arrays than the data files to improve I/O performance.
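
As a minimal sketch of the filegroup practices above (the database name, file path, sizes, and index name are all hypothetical):

-- Add a filegroup on a separate physical drive and place a
-- non-clustered index there (names and paths are illustrative).
ALTER DATABASE SalesDB ADD FILEGROUP IndexFG
GO
ALTER DATABASE SalesDB
ADD FILE (NAME = SalesDB_Index1,
    FILENAME = 'E:\SQLData\SalesDB_Index1.ndf',
    SIZE = 500MB, FILEGROWTH = 100MB)
TO FILEGROUP IndexFG
GO
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
ON IndexFG
GO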

Reference: Pinal Dave (http://blog.SQLAuthority.com)

Interview Question of the Week #031 – How to do Case Sensitive SQL Query Search

Question – How to do a case sensitive search on any column of a table?

Answer:

If Column1 of Table1 contains the values ‘CaseSearch’, ‘casesearch’, ‘CASESEARCH’, and ‘CaSeSeArCh’, the following statement will return all four records.

SELECT Column1
FROM Table1
WHERE Column1 = 'casesearch'

To make the query case sensitive and retrieve only one record (“casesearch”) from the above query, the collation of the query needs to be changed as follows.

SELECT Column1
FROM Table1
WHERE Column1 COLLATE Latin1_General_CS_AS = 'casesearch'

Adding COLLATE Latin1_General_CS_AS makes the search case sensitive.

The default collation of a SQL Server installation, SQL_Latin1_General_CP1_CI_AS, is not case sensitive.
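
If you are unsure which collation is in effect, you can check it at the server and database level:

-- Server-level collation
SELECT SERVERPROPERTY('Collation') AS ServerCollation
-- Collation of each database
SELECT name, collation_name FROM sys.databases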

Reference: Pinal Dave (http://blog.sqlauthority.com)

Interview Question of the Week #029 – Difference Between CHARINDEX vs PATINDEX

Question: What is the difference between CHARINDEX vs PATINDEX?

Answer: We can use either CHARINDEX or PATINDEX to search a text field in SQL Server. Both the CHARINDEX and PATINDEX functions return the starting position of a pattern you specify.


Both functions take two arguments. With PATINDEX, you must include percent signs before and after the pattern, unless you are looking for the pattern at the start of the column (omit the leading %) or at the end (omit the trailing %). With CHARINDEX, the pattern cannot include wildcard characters. The second argument is a character expression, usually a column name, in which SQL Server searches for the specified pattern.

Example of CHARINDEX:
USE AdventureWorks2014;
GO
SELECT CHARINDEX('important', DocumentSummary),
DocumentSummary
FROM Production.Document
WHERE Title = 'Repair and Service Guidelines';
GO

Examples of PATINDEX:
USE AdventureWorks2014;
GO
SELECT PATINDEX('%imp_rtant%', DocumentSummary),
DocumentSummary
FROM Production.Document
WHERE Title = 'Repair and Service Guidelines';
GO

Summary:
PATINDEX is essentially CHARINDEX plus wildcard search. Which one to use depends on your business need.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

Interview Question of the Week #028 – Few SQL Best Practices

Just last week, I moderated a SQL Server interview and heard a very interesting conversation.

Question – What are the best practices you are following for SQL Server?

When I heard this question, I thought the candidate would answer many different things, but he was stunned and had nothing much to add to the conversation. That reminded me of one of my older articles, which I believe still adds quite a lot of value to this conversation. Here is the article as an answer to the question.

Answer – 

1. Know your data and business application well.
Familiarize yourself with your data and application; you must be aware of the data volume and distribution in your database.

2. Test your queries with realistic data.
A SQL statement tested with unrealistic data may behave differently when used in production. To ensure rigorous testing, the data distribution in the test environment must also closely resemble that in the production environment.

3. Write identical SQL statements in your applications.
Take full advantage of stored procedures and functions wherever possible. They bring a performance gain because they are precompiled and their plans can be reused.

4. Use indexes on the tables carefully.
Be sure to create all the necessary indexes on the tables. However, too many of them can degrade performance.

5. Make an indexed path available.
To take advantage of indexes, write your SQL in such a manner that an indexed path is available to it. Using SQL hints is one of the ways to ensure the index is used.

6. Understand the Optimizer.
Understand how the optimizer uses indexes and how it processes the WHERE, ORDER BY, and HAVING clauses.

7. Think globally when acting locally.
Any changes you make in the database to tune one SQL statement may affect the performance of other statements used by applications and users.

8. The WHERE clause is crucial.
The following kinds of WHERE clauses would not use the index access path even if an index is available:
E.g. comparing two columns of the same table (Table1.Col1 > Table1.Col2 and similar), Table1.Col1 IS (NOT) NULL, Table1.Col1 NOT IN (value1, value2), Table1.Col1 != expression, Table1.Col1 LIKE ‘%pattern%’, and NOT EXISTS subqueries.

9. Use WHERE instead of HAVING for record filtering.
Avoid using the HAVING clause along with GROUP BY on an indexed column.

10. Specify the leading index columns in WHERE clauses.
For a composite index, the query would use the index as long as the leading column of the index is specified in the WHERE clause.

11. Evaluate index scan vs. full table scan. (Index Only Searches Vs Large Table Scan, Minimize Table Passes)
If you are selecting more than about 15 percent of the rows from a table, a full table scan is usually faster than an index access path, because index access incurs multiple logical reads per row, whereas a full table scan can read all the rows in a block in one logical read. When the percentage of table rows accessed is 15 percent or less, an index access path will generally work better. Note also that an index is not used if SQL Server has to perform an implicit data conversion.

12. Use ORDER BY for index scan.
The SQL Server optimizer can use an ordered index scan if the ORDER BY clause is on an indexed column, avoiding a separate sort step. The query below illustrates this point.
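
As a minimal illustration (assuming the AdventureWorks sample database, where ProductID is the clustered primary key of Production.Product):

USE AdventureWorks
GO
-- ProductID is the clustered key, so the optimizer can return rows in
-- order directly from the index scan without a separate sort step.
SELECT ProductID, Name
FROM Production.Product
ORDER BY ProductID
GO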

13. Minimize table passes.
Usually, reducing the number of table passes in a SQL query results in better performance; fewer passes mean a faster query.

14. Join tables in the proper order.
Always perform the most restrictive search first to filter out the maximum number of rows in the early phases of a multiple table join. This way, the optimizer will have to work with fewer rows in the subsequent phases of join, improving performance.

15. Redundancy is good in where condition.
Provide as much information as possible in the WHERE clause. It will help the optimizer to clearly infer conditions.

16. Keep it simple, stupid.
Very complex SQL statements can overwhelm the optimizer; sometimes writing multiple, simpler SQL will yield better performance than a single complex SQL statement.

17. You can reach the same destination in different ways.
Each SQL may use a different access path and may perform differently.

18. Reduce network traffic and increase throughput.
Using T-SQL blocks instead of multiple individual SQL statements can achieve better performance and reduce network traffic. Stored procedures are better still, as they are stored in SQL Server and precompiled.

19. Better Hardware.
Better hardware always helps performance. SCSI drives, a RAID 10 array, multi-processor CPUs, and a 64-bit operating system can improve performance by a great amount.

20. Avoid Cursors.
Using SQL Server cursors can result in performance degradation compared with set-based statements. Try to use a correlated subquery or derived table if you need to perform row-by-row operations, as sketched below.
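
For instance, a hypothetical row-by-row cursor that recalculates each order total can usually be replaced by a single set-based statement (the table and column names are illustrative):

-- One set-based UPDATE instead of a cursor looping over every row.
UPDATE o
SET o.TotalDue = o.SubTotal + o.TaxAmt
FROM dbo.Orders o
WHERE o.TotalDue IS NULL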

Reference: Pinal Dave (http://blog.sqlauthority.com)

Interview Question of the Week #027 – Move TempDB from One Drive to Another Drive When Low Disk Space

I am often invited to attend various interviews, and once in a while I see practical questions discussed there. Here is one important interview question related to TempDB.

Question – “What will you do when your log file of TempDB is full?”

Answer – The best solution when the log file of TempDB is full is to move it to a new drive. Here are more details.

Make sure that TempDB is set to autogrow and do not set a maximum size for it. If the current drive is too full to allow autogrow events, then arrange a bigger drive, or add files to TempDB on another device (using ALTER DATABASE, as sketched below) and allow those files to autogrow.
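
A minimal sketch of adding a TempDB data file on another drive (the file name, path, and sizes are illustrative):

USE master
GO
-- Add a second data file for TempDB on a drive with free space.
ALTER DATABASE TempDB
ADD FILE (NAME = tempdev2,
    FILENAME = 'E:\SQLData\tempdb2.ndf',
    SIZE = 1GB, FILEGROWTH = 256MB)
GO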

Move TempDB from one drive to another drive. There are two major reasons why TempDB may need to move from one drive to another:
1) TempDB grows bigger and the existing drive does not have enough space.
2) Moving TempDB to another filegroup on a different physical drive helps improve database disk reads, as the drives can be read simultaneously.

Follow the directions below exactly to move the database and log from one drive (C:) to other drives (D: and E:).

Open Query Analyzer and connect to your server. Run this script to get the names of the files used for TempDB.

USE TempDB
GO
EXEC sp_helpfile
GO

Results will be something like:
name     fileid  filename                                                      filegroup  size
-------  ------  ------------------------------------------------------------  ---------  --------
tempdev  1       C:\Program Files\Microsoft SQL Server\MSSQL\data\tempdb.mdf   PRIMARY    16000 KB
templog  2       C:\Program Files\Microsoft SQL Server\MSSQL\data\templog.ldf  NULL       1024 KB

The output includes other information related to the database as well. The names of the files are usually tempdev and templog by default; these names will be used in the next statement. Run the following code to move the mdf and ldf files.

USE MASTER
GO
ALTER DATABASE TempDB MODIFY FILE
(NAME = tempdev, FILENAME = 'd:\data\tempdb.mdf')
GO
ALTER DATABASE TempDB MODIFY FILE
(NAME = templog, FILENAME = 'e:\data\templog.ldf')
GO

The definition of TempDB is now changed; however, no changes take effect until SQL Server restarts. Stop and restart SQL Server, and it will create the TempDB files in their new locations.


Reference: Pinal Dave (http://blog.sqlauthority.com)

Interview Question of the Week #026 – 64 Bit Vs 32 Bit Confusion

I often see people getting lost in conversations about 64-bit and 32-bit operating systems and applications. Plenty of individuals are still not sure what the difference between them is, or what runs on which platform. Here are a few very simple questions related to the Windows OS and SQL Server.


Question: Can you run SQL SERVER 32 bit on Windows 32 bit OS?

Answer: Yes

Question: Can you run SQL SERVER 64 bit on Windows 32 bit OS?

Answer: No

Question: Can you run SQL SERVER 32 bit on Windows 64 bit OS?

Answer: Yes

Question: Can you run SQL SERVER 64 bit on Windows 64 bit OS?

Answer: Yes

There you go. In simple words, you can run pretty much everything on a Windows 64-bit OS; however, a Windows 32-bit OS can’t run 64-bit SQL Server. If you need to confirm which build an instance is running, see the quick check below.
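
The version and edition strings report the platform:

-- The version string contains (X64) on a 64-bit build.
SELECT @@VERSION
-- The edition string also reports it, e.g. 'Developer Edition (64-bit)'.
SELECT SERVERPROPERTY('Edition')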

Reference: Pinal Dave (http://blog.sqlauthority.com)

Interview Question of the Week #025 – How to Implement Paging in SQL Server?

Here is a very popular question asked when a SQL Server interview includes a live coding example.

Question: How to implement paging in SQL Server? Explain with the help of script.

Answer: Here is a script to implement paging in SQL Server 2005.

USE AdventureWorks
GO
DECLARE @StartRow INT
DECLARE @EndRow INT
SET @StartRow = 120
SET @EndRow = 140
SELECT FirstName, LastName, EmailAddress
FROM (
    SELECT PC.FirstName, PC.LastName, PC.EmailAddress,
        ROW_NUMBER() OVER(
            ORDER BY PC.FirstName, PC.LastName, PC.ContactID) AS RowNumber
    FROM Person.Contact PC) PersonContact
WHERE RowNumber > @StartRow
    AND RowNumber < @EndRow
ORDER BY FirstName, LastName, EmailAddress
GO

The above query uses the ROW_NUMBER() function, but you can also use a CTE to accomplish the same. Here is the script for that:

USE AdventureWorks
GO
DECLARE @StartRow INT
DECLARE @EndRow INT
SET @StartRow = 120;
SET @EndRow = 140;
WITH PersonContact AS
(
    SELECT PC.FirstName, PC.LastName, PC.EmailAddress,
        ROW_NUMBER() OVER(
            ORDER BY PC.FirstName, PC.LastName, PC.ContactID) AS RowNumber
    FROM Person.Contact PC
)
SELECT FirstName, LastName, EmailAddress
FROM PersonContact
WHERE RowNumber > @StartRow
    AND RowNumber < @EndRow
ORDER BY FirstName, LastName, EmailAddress
GO
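
If you are on SQL Server 2012 or later, the OFFSET ... FETCH option of the ORDER BY clause achieves the same paging with less code. A minimal sketch against the same table (skipping 120 rows and fetching the next 19 matches the window above):

USE AdventureWorks
GO
-- OFFSET/FETCH paging, available from SQL Server 2012 onwards.
SELECT PC.FirstName, PC.LastName, PC.EmailAddress
FROM Person.Contact PC
ORDER BY PC.FirstName, PC.LastName, PC.ContactID
OFFSET 120 ROWS FETCH NEXT 19 ROWS ONLY
GO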


Reference: Pinal Dave (http://blog.SQLAuthority.com)

Interview Question of the Week #024 – What is the Best Recovery Model?

Here is one of the most popular questions I often see people asking on the internet.

Question: What is the best recovery model for my database?

Answer: Every situation is different and each situation has different needs for the recovery model.

SQL Server offers three recovery models: full, simple, and bulk-logged. The recovery model determines how much data loss is acceptable and whether and how transaction logs can be backed up.

Select Simple Recovery Model if:
* Your data is not critical.
* Losing all transactions since the last full or differential backup is not an issue.
* Data is derived from other data sources and is easily recreated.
* Data is static and does not change often.

-- Set the recovery model to Simple
ALTER DATABASE AdventureWorks SET RECOVERY SIMPLE
GO

Select Bulk-Logged Recovery Model if:
* Data is critical, but logging large data loads bogs down the system.
* Most bulk operations are done off hours and do not interfere with normal transaction processing.
* You need to be able to recover to a point in time.

-- Set the recovery model to Bulk Logged
ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED
GO

Select Full Recovery Model if:
* Data is critical and no data can be lost.
* You always need the ability to do a point-in-time recovery.
* Bulk-logged activities are intermixed with normal transaction processing.
* You are using replication and need the ability to resynchronize all databases involved in replication to a specific point in time.

-- Set the recovery model to Full
ALTER DATABASE AdventureWorks SET RECOVERY FULL
GO

You can switch from any recovery model to another, but before or after the switch you may need to take additional transaction log or full backups to ensure you have a complete backup set.
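
To check which recovery model each database currently uses:

-- Current recovery model of every database on the instance
SELECT name, recovery_model_desc
FROM sys.databases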

Reference: Pinal Dave (http://blog.sqlauthority.com)