SQL SERVER – Connecting to Azure Storage with SSMS

SQL Server Management Studio can now connect to Azure Storage.

It is very simple to connect to Azure Storage. First, open Object Explorer in SQL Server Management Studio (SSMS) and click on Connect. Clicking Connect brings up five options, as displayed in the image below.

Click on “Azure Storage” and it will bring up another screen asking for the storage account name and account key.

Now go to your Azure account and get the name of the storage account and the account key.

It will bring up the following screen.

Enter the name of the account and account key at the prompt in SSMS.

Once you enter the details, it will immediately connect you to your Azure Storage. Next, expand the node and you will see your containers.

The same containers will be visible in your Azure account online.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – SSMS: Index Usage Statistics, Physical Statistics & User Statistics

When I used to be in the consulting business, the most requested sessions and topics always revolved around performance tuning. And whenever I talk to folks about what they want covered, they often get to the point of when and how to create efficient indexes in their databases. The concept of indexes and how they function has been in the industry for more than two decades, and it still baffles me that they remain a topic of debate and interest. The core fundamentals of indexes have surely stood the test of time and are still a great starting point for building highly performant SQL Server applications, or for that matter any data-centric application. I consider creating indexes a science rather than a set of rules of thumb. SQL Server has a cost-based optimizer and does a pretty good job of identifying, evaluating and using the right index based on your query needs. Having said that, it is also up to an administrator to monitor how the indexes are being used in a given database: which are used the most and which the least. It is a fine line to walk to understand and eliminate the unwanted and duplicate indexes created by developers from time to time. I do have a starter script for duplicate indexes which is worth a look.

In this blog, we will take a quick look at three reports which are available under the Database node.

  1. Index Usage Statistics
  2. Index Physical Statistics
  3. User Statistics

Index Usage Statistics

This report shows information about index operational statistics and usage statistics. When I say usage, I mean statistics about scans and seeks, when the index was last accessed, how many user scans occurred, and so on. The queries used in the background are pretty straightforward. The Index Operational Statistics section shows data using the DMV sys.dm_db_index_operational_stats(db_id(),null,null,null), along with other catalog views to convert object ids to names. The next section, Index Usage Statistics, shows data using the DMV sys.dm_db_index_usage_stats, which is well documented in Books Online.

Using the above report, we can see which index is used by various queries and whether they are doing a seek or a scan. If it is a small table then it might be OK to see a scan, but in general I don't get a good feeling when I see a scan of an index on a huge table. Sometimes I get questions about which indexes are not being used by SQL Server. This report can only show us the indexes which are used. If an index is not shown here, it means that no query has fired which could utilize that index. So please don't drop indexes on the assumption that they are not getting used. Also, if the server was restarted recently, there wouldn't be enough statistics to play with or show in this report.
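
If you prefer T-SQL over the report, here is a minimal sketch (not the exact query the report uses) which pulls the seek, scan and lookup counts from sys.dm_db_index_usage_stats for the current database. Remember that, just like the report, these counters are reset when the SQL Server service restarts.

-- A minimal sketch (not the exact report query): seek, scan, lookup and update
-- counts per index in the current database since the last SQL Server restart.
SELECT sch.name AS [schema_name]
,      obj.name AS [table_name]
,      ix.name AS [index_name]
,      us.user_seeks
,      us.user_scans
,      us.user_lookups
,      us.user_updates
,      us.last_user_seek
,      us.last_user_scan
FROM sys.dm_db_index_usage_stats us
INNER JOIN sys.indexes ix ON (us.OBJECT_ID = ix.OBJECT_ID AND us.index_id = ix.index_id)
INNER JOIN sys.objects obj ON (us.OBJECT_ID = obj.OBJECT_ID)
INNER JOIN sys.schemas sch ON (obj.schema_id = sch.schema_id)
WHERE us.database_id = DB_ID()
AND obj.is_ms_shipped = 0
ORDER BY us.user_seeks + us.user_scans + us.user_lookups DESC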

Index Physical Statistics

As the report name says, it shows how the index is physically laid out in the data files. Here is how the report looks:

Under the hood, this report uses the query below.

SELECT t4.name AS [schema_name]
,       t3.name AS table_name
,       t2.name AS index_name
,      t1.OBJECT_ID
,      t1.index_id
,      t1.partition_number
,      t1.index_type_desc
,      t1.index_depth
,      t1.avg_fragmentation_in_percent
,      t1.fragment_count
,      t1.avg_fragment_size_in_pages
,      t1.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(),NULL,NULL,NULL,'LIMITED') t1
INNER JOIN sys.objects t3 ON (t1.OBJECT_ID = t3.OBJECT_ID)
INNER JOIN sys.schemas t4 ON (t3.schema_id = t4.schema_id)
INNER JOIN sys.indexes t2 ON (t1.OBJECT_ID = t2.OBJECT_ID AND  t1.index_id = t2.index_id)
WHERE index_type_desc <> 'HEAP'
ORDER BY t4.name,t3.name,t2.name,partition_number

If we look at the report, it gives a recommendation under “Operation Recommended” – Rebuild or Reorganize. The recommendation depends upon the fragmentation in the index. I must point out that the report does NOT consider the size of the table. For example, if I have an index which has only 10 pages, even a rebuild or reorganize might not remove the fragmentation. So you should click on the (+) symbol and see how big the table is. To demonstrate this, I have selected the first table.

Even if I rebuild the index, the report would still recommend a Rebuild. Notice that this index has just 2 pages, so we don't have to worry about fragmentation.
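
If you want to apply the same size-aware judgment in T-SQL, below is a minimal sketch. The thresholds of 1000 pages, 5 percent and 30 percent are common rules of thumb, not values taken from the report itself.

-- A minimal sketch: suggest an action only when the index is big enough for
-- fragmentation to matter. The 1000-page / 5% / 30% thresholds are rules of thumb.
SELECT OBJECT_NAME(ps.OBJECT_ID) AS table_name
,      ix.name AS index_name
,      ps.avg_fragmentation_in_percent
,      ps.page_count
,      CASE
         WHEN ps.page_count < 1000 THEN 'Ignore (too small)'
         WHEN ps.avg_fragmentation_in_percent >= 30 THEN 'Rebuild'
         WHEN ps.avg_fragmentation_in_percent >= 5 THEN 'Reorganize'
         ELSE 'None'
       END AS suggested_action
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
INNER JOIN sys.indexes ix ON (ps.OBJECT_ID = ix.OBJECT_ID AND ps.index_id = ix.index_id)
WHERE ps.index_type_desc <> 'HEAP'
ORDER BY ps.avg_fragmentation_in_percent DESC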

User Statistics

This report has no relation to indexes, but since it is a small report, I am clubbing it into this blog. It shows information about the users who are connected to the database for which the report is launched. This report would be useful in situations where multiple applications use different logins to connect to the SQL Server instance.

Here is the query used under the hood.

SELECT (dense_rank() OVER (ORDER BY login_name, nt_user_name)) % 2 AS row_num
,      (row_number() OVER (ORDER BY login_name, nt_user_name, sessions.session_id)) % 2 AS session_num
,      login_name
,      nt_user_name
,      sessions.session_id AS session_id
,      COUNT(DISTINCT connections.connection_id) AS connection_count
,      COUNT(DISTINCT CONVERT(CHAR, sessions.session_id) + '_' + CONVERT(CHAR, requests.request_id)) AS request_count
,      COUNT(DISTINCT cursors.cursor_id) AS cursor_count
,      CASE WHEN SUM(requests.open_tran) IS NULL THEN 0 ELSE SUM(requests.open_tran) END AS transaction_count
,      sessions.cpu_time + 0.0 AS cpu_time
,      sessions.memory_usage * 8 AS memory_usage
,      sessions.reads AS reads
,      sessions.writes AS writes
,      sessions.last_request_start_time AS last_request_start_time
,      sessions.last_request_end_time AS last_request_end_time
FROM sys.dm_exec_sessions sessions
LEFT JOIN sys.dm_exec_connections connections ON sessions.session_id = connections.session_id
LEFT JOIN MASTER..sysprocesses requests ON sessions.session_id = requests.spid
LEFT JOIN sys.dm_exec_cursors(NULL) cursors ON sessions.session_id = cursors.session_id
WHERE (sessions.is_user_process = 1 AND requests.dbid = DB_ID())
GROUP BY sessions.login_name
,      sessions.nt_user_name
,      sessions.session_id
,      sessions.cpu_time
,      sessions.memory_usage
,      sessions.reads
,      sessions.writes
,      sessions.last_request_start_time
,      sessions.last_request_end_time

All you need to notice is that the information is fetched from sys.dm_exec_sessions, sys.dm_exec_connections and master..sysprocesses. You can also observe that a filter has been added on DB_ID(), which gives the database id of the connection running the query. Here is the report.

In the “Active Sessions” column there is a (+) symbol on each row which can be used to get details about the sessions used by that particular login.
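
If all you need is a quick head count of sessions per login rather than the full report, a much smaller query against sys.dm_exec_sessions is enough; here is a minimal sketch:

-- A minimal sketch: number of user sessions per login on the instance.
SELECT login_name
,      nt_user_name
,      COUNT(*) AS session_count
FROM sys.dm_exec_sessions
WHERE is_user_process = 1
GROUP BY login_name, nt_user_name
ORDER BY session_count DESC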

This has been a long and exhaustive list of SSMS reports that we covered as part of the series. Let us see how this learning can be put to use in our environments, and do let me know if you have learnt something new in the series so far.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Using the SSIS Term Extraction for Data Exploration – Notes from the Field #046

[Notes from Pinal]: SSIS is a very well explored subject; however, there are so many interesting elements that every time we read about it, we learn something new. One such concept is using the SSIS Term Extraction for Data Exploration.

Linchpin People are database coaches and wellness experts for a data driven world. In this 46th episode of the Notes from the Field series, database expert Tim Mitchell (partner at Linchpin People) shares a very interesting discussion on how to use the SSIS Term Extraction for data exploration.


Data exploration is an essential piece of any new ETL (extraction-transformation-load) process.  Knowing the structure, type, and even the semantics of data sources can help the ETL developer or architect to make better decisions on data type and length settings, transformation logic, and error handling procedures.  Additionally, there are situations in which the exploration of the data is the principal purpose of the ETL, such as text mining and other pattern analysis.  In most scenarios, SQL Server Integration Services is used as a traditional ETL tool and not necessarily for data exploration or text mining.  However, there are some tools built into SSIS that are ideally suited for exploring source feed to expose patterns in the data.  In this brief post, I’ll cover one such tool: the term extraction component.

The term extraction component is a data flow transformation that will aggregate the frequency of words found in a specified column supplied by an upstream data source.  This component expects a Unicode text or text stream field as an input, and calculates as an output the frequency of the nouns and/or noun phrases in the specified source column.  As shown below on the data flow surface, the term extraction component (circled) is always used as a transformation, accepting exactly one input and expecting either one or two outputs (the second being an optional error output).

Configuring the term extraction component can be a relatively simple exercise.  As mentioned, you’ll need to specify the upstream column that will be used as an input, and this column must be Unicode text (DT_WSTR) or stream (DT_NTEXT).

Purposefully skipping over the Exclusion tab for just a moment, we’ll review the Advanced tab.  This pane exposes several of the key configuration elements of the term extraction component.  In particular, we can use this page to set the output for noun and/or noun phrase, set the score type to frequency (raw count) or TFIDF (a calculation based both on the frequency of the term and the ratio of the analyzed terms to the entire document), and set the minimum score threshold and maximum term length.  We can also set the case-sensitive flag, in case our analytics need to be case sensitive.

Additionally, the exclusion list (on the Exclusion tab) can help to refine your calculation results.  This feature allows the ETL developer to specify a known list of terms that will be excluded from the output.  This is useful in cases where industry or common language terms might occur so frequently that including them in the output simply dilutes the rest of the data.

Although the term extraction component does not have a lot of moving parts, it does provide a quick way to perform data exploration and simple text analysis.

Conclusion

In this post, I have briefly demonstrated the attributes of the term extraction transformation, an SSIS data flow tool that can be used for data exploration.  This tool is a quick and easy way to perform triage analysis and mining of raw text data.

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – SSMS: Resource Locking and Object Execution Statistics Report

As a DBA or SQL developer, the amount of time we spend working with SQL Server Management Studio can hardly be measured. For a database person, these hidden gems of SQL Server should never be underestimated. With each release the tool improves to give the next set of features. I have seen multiple blogs where the Product Managers constantly ask for feedback and look at making enhancements to the tool. If you go to the Connect site, the votes a feature request receives decide whether it will ever make it into the next release of SSMS. If you get the drift, knowing the statistics on a given request decides how the feature gets added; these statistics give the product team confidence in people's requests. Similarly, when working with SQL Server objects, it is useful to know how many times a procedure was called and how many CPU and IO cycles were spent on it.

In this part of the blog we will talk about two reports from the database-level standard reports, as shown in the diagram.

Resource Locking Statistics by Objects

This report is similar to the earlier blocking report which is available at the server level. Since this report is at the database level, it shows information about blocking within the database for which we launched the report. If there is no blocking in the database, the report would launch and show the message below:

“Currently, this graph does not have any data to show.” This is shown in the diagram below.

For demonstration purposes I have created two blocking chains, one for table t1 and another for table t2, and have launched the report again. If we compare this with the earlier report, it is clearly visible that the warning is gone and, as highlighted in the image, we can see a non-zero value under “# Waiting Transactions”. Without clicking on (+) it would not be possible to see who is blocked.

Once we expand Object No. 1, below is what we get. We can clearly see that there are 5 locks which are granted and 1 lock on an RID which is in WAIT status. The SPID is shown at the same level.

The majority of the blocking information in this report is picked from the DMVs sys.dm_tran_locks and sys.dm_exec_sessions. Other information about object metadata comes from the catalog views sys.partitions, sys.objects and sys.schemas.

Here is the list of possible locks shown in the report: METADATA, DATABASE, FILE, TABLE, HOBT, EXTENT, PAGE, KEY, RID, ALLOCATION_UNIT and APPLICATION.
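
If you would rather query the blocking information directly, here is a minimal sketch (again, not the report's exact query) which joins sys.dm_tran_locks with sys.dm_exec_sessions to list the lock requests in the current database that are waiting:

-- A minimal sketch: lock requests in the current database which are waiting,
-- along with the session that requested them.
SELECT tl.request_session_id AS spid
,      es.login_name
,      tl.resource_type   -- e.g. OBJECT, PAGE, KEY, RID
,      tl.request_mode    -- e.g. S, X, IX
,      tl.request_status  -- GRANT, WAIT or CONVERT
,      OBJECT_NAME(p.OBJECT_ID) AS object_name
FROM sys.dm_tran_locks tl
INNER JOIN sys.dm_exec_sessions es ON (tl.request_session_id = es.session_id)
LEFT JOIN sys.partitions p ON (tl.resource_associated_entity_id = p.hobt_id)
WHERE tl.resource_database_id = DB_ID()
AND tl.request_status <> 'GRANT'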

Object Execution Statistics

This is one of the more interesting reports; it shows information about the execution of objects in the database. If we launch this report for a database which is not in use, and no plans for any object (stored procedure, function, etc.) are available in the plan cache, then we would see an “empty” report like the one below.

There are five graphs available in the report. All of them are drawn from the information available in the DMV sys.dm_exec_query_stats, which contains historical information about query executions since SQL Server was started. If a plan has been evicted from the cache, then the report would not show that query, which might be a problem.

To show some data in the report, I have executed some stored procedures and functions in the AdventureWorks2014 sample database. Here is the report when we have plans available in the cache. This is what you would see on a production server.
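
If you want to reproduce a similar picture, executing a couple of objects is enough to put their plans into the cache. The procedure and function names below are the ones shipped with the AdventureWorks2014 sample; substitute your own if your copy differs.

USE AdventureWorks2014
GO
-- Run a few objects so that their plans land in the plan cache and show up
-- in the report. Any procedures and functions in your database will do.
EXEC dbo.uspGetEmployeeManagers @BusinessEntityID = 5
GO
EXEC dbo.uspGetBillOfMaterials @StartProductID = 800, @CheckDate = '2014-01-01'
GO
SELECT dbo.ufnGetStock(864) AS units_in_stock
GO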

You would notice that out of the 5 graphs only 3 are shown. That's because the values in the other graphs are not significantly high and would just be shown as zero.

Under the covers, it runs the query below. I have modified the query a little to show meaningful column names.

SELECT  CASE WHEN sch.name IS NULL THEN '' ELSE sch.name END AS schema_name
,       dense_rank() OVER (ORDER BY s2.objectid) AS SPRank
,       s3.name AS [Obj Name]
,       s3.TYPE AS [Obj Type]
,       (SELECT TOP 1 SUBSTRING(TEXT,(s1.statement_start_offset+2)/2, (CASE WHEN s1.statement_end_offset = -1  THEN LEN(CONVERT(NVARCHAR(MAX),TEXT))*2 ELSE s1.statement_end_offset END - s1.statement_start_offset) /2  ) FROM sys.dm_exec_sql_text(s1.sql_handle)) AS [SQL Statement]
,       execution_count
,       plan_generation_num
,       last_execution_time
,       ((total_worker_time+0.0)/execution_count)/1000 AS [avg_worker_time]
,       total_worker_time/1000.0 'total_worker_time'
,       last_worker_time/1000.0 'last_worker_time'
,       min_worker_time/1000.0 'min_worker_time'
,       max_worker_time/1000.0 'max_worker_time'
,       ((total_logical_reads+0.0)/execution_count) AS [avg_logical_reads]
,       total_logical_reads
,       last_logical_reads
,       min_logical_reads
,       max_logical_reads
,       ((total_logical_writes+0.0)/execution_count) AS [avg_logical_writes]
,       total_logical_writes
,       last_logical_writes
,       min_logical_writes
,       max_logical_writes
,       ((total_logical_writes+0.0)/execution_count + (total_logical_reads+0.0)/execution_count) AS [avg_logical_IO]
,       total_logical_writes + total_logical_reads 'Total IO'
,       last_logical_writes +last_logical_reads 'Last IO'
,       min_logical_writes +min_logical_reads 'Min IO'
,       max_logical_writes + max_logical_reads  'MAX IO'
FROM    sys.dm_exec_query_stats s1
CROSS apply sys.dm_exec_sql_text(sql_handle) AS  s2
INNER JOIN sys.objects s3  ON ( s2.objectid = s3.OBJECT_ID )
LEFT OUTER JOIN sys.schemas sch ON(s3.schema_id = sch.schema_id)
WHERE s2.dbid = DB_ID()
ORDER BY  s3.name;

This query gives results for the database in which it is executed. Notice that there is a filter on db_id() in the WHERE clause. In the graph, we can observe the object # assigned, which is detailed in the table shown below the graph under the heading “All Executable Objects”.

If we click on the (+) sign near the Object No column, we can see more details about the individual statements in the object. In the image below, I have clicked on the (+) symbol near object 2 and we can see the statements within the function. You would see more rows if there are more statements in the procedure.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Location of Natively Compiled Stored Procedure and Naming Convention

Yesterday I wrote about SQL SERVER – Beginning In-Memory OLTP with Sample Example. One of the questions I received right after I published the blog post was why I call the stored procedure a natively compiled stored procedure when the entire code is in T-SQL. Indeed a very good question. The answer is very simple: we call it a natively compiled stored procedure because, as soon as we execute the CREATE statement, the compiler converts the interpreted T-SQL, query plans and expressions into native code.

You can execute the following query in your SSMS and find out the location of the natively compiled stored procedure.

SELECT name,
description
FROM   sys.dm_os_loaded_modules
WHERE description = 'XTP Native DLL'
GO

To see this DMV in action execute the code from this blog post on your SQL Server.

-- Create database
CREATE DATABASE InMemory
ON PRIMARY(NAME = InMemoryData,
FILENAME = 'd:\data\InMemoryData.mdf', size=200MB),
-- Memory Optimized Data
FILEGROUP [InMem_FG] CONTAINS MEMORY_OPTIMIZED_DATA(
NAME = [InMemory_InMem_dir],
FILENAME = 'd:\data\InMemory_InMem_dir')
LOG ON (name = [InMem_demo_log], Filename='d:\data\InMemory.ldf', size=100MB)
GO

-- Create table
USE InMemory
GO
-- Create a Memory Optimized Table
CREATE TABLE DummyTable_Mem (ID INT NOT NULL,
Name VARCHAR(100) NOT NULL
CONSTRAINT ID_Clust_DummyTable_Mem PRIMARY KEY NONCLUSTERED HASH (ID) WITH (BUCKET_COUNT=1000000))
WITH (MEMORY_OPTIMIZED=ON)
GO

-- Create stored procedure
-- Inserting same 100,000 rows using InMemory Table
CREATE PROCEDURE ImMemory_Insert_test
WITH NATIVE_COMPILATION, SCHEMABINDING,EXECUTE AS OWNER
AS
BEGIN
ATOMIC WITH (TRANSACTION ISOLATION LEVEL=SNAPSHOT, LANGUAGE='english')
DECLARE @counter AS INT = 1
DECLARE @start DATETIME
SELECT
@start = GETDATE()
WHILE (@counter <= 100000)
BEGIN
INSERT INTO
dbo.DummyTable_Mem VALUES(@counter, 'SQLAuthority')
SET @counter = @counter + 1
END
SELECT
DATEDIFF(SECOND, @start, GETDATE() ) [InMemory_Insert in sec]
END
GO

Now let us execute our script as described.

Now we can see in our result that there are two different dll files. In the image above I have explained the various parts of the dll file name.

As per the image, our database id is 11, and if we check, it is the same as the database we created a few seconds ago. Similarly, the object id embedded in the name can be mapped back to the object as well.
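
If you want to confirm the mapping yourself, a small sketch like the one below translates the ids embedded in the dll name back to names; 11 is the database id from my run, and you can match the object id against the number in your own dll file name.

-- A minimal sketch: translate the ids embedded in the dll name back to names.
SELECT DB_NAME(11) AS database_name -- 11 is the database id seen in the dll name above
GO
USE InMemory
GO
SELECT OBJECT_ID(N'dbo.ImMemory_Insert_test') AS object_id_in_dll_name
,      OBJECT_NAME(OBJECT_ID(N'dbo.ImMemory_Insert_test')) AS procedure_name
GO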

If we open up the folder where we created these objects, we will see two sets of files: one for the stored procedure and one for the table.

My friend Balmukund explains this concept very well on his blog over here.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Beginning In-Memory OLTP with Sample Example

In-Memory OLTP is a wonderful new feature introduced in SQL Server 2014. My friend Balmukund Lakhani has written an amazing A-Z of In-Memory series on his blog. All serious learners should study it for a deep understanding of the subject. I will try to cover a few of the concepts in simpler words, and you may often find me referring to Balmukund's site on this subject.

Why do we need In-Memory?

Here is the paragraph from Balmukund’s blog (published with approval):

Looking at the market trends of tumbling cost of RAM (USD/MB) and performance implication of reading data from memory vs disk, its evident that people would love to keep the data in memory. With this evolution in hardware industry, softwares have to be evolved and modified so that they can take advantage and scale as much as possible. On the other hand, businesses also don’t want to compromise the durability of data – restart would clear RAM, but data should be back in the same state as it was before the failure. To meet hardware trends and durability requirements, SQL Server 2014 has introduced In-Memory OLTP which would solve them in a unique manner.

Before we start on the subject, let us see a few of the reasons why you would want to go for high-performance memory-optimized OLTP operations.

  • It naturally integrates with SQL Server relational database
  • It supports Full ACID properties
  • It helps with non-blocking multi-version optimistic concurrency control, in other words, no locks or latches

Well, let us start with a working example. In this example, we will learn a few things – please pay attention to the details.

  1. We will create a database with a file group which will contain memory optimized data
  2. We will create a table with MEMORY_OPTIMIZED set to ON
  3. We will create a stored procedure which is natively compiled

The procedure of our test is very simple. We will create two stored procedures: 1) a regular stored procedure and 2) a natively compiled one. We will compare the performance of both SPs and see which one performs better.

Let’s Start!

Step 1: Create a database which creates a file group containing memory_optimized_data

CREATE DATABASE InMemory
ON PRIMARY(NAME = InMemoryData,
FILENAME = 'd:\data\InMemoryData.mdf', size=200MB),
-- Memory Optimized Data
FILEGROUP [InMem_FG] CONTAINS MEMORY_OPTIMIZED_DATA(
NAME = [InMemory_InMem_dir],
FILENAME = 'd:\data\InMemory_InMem_dir')
LOG ON (name = [InMem_demo_log], Filename='d:\data\InMemory.ldf', size=100MB)
GO

Step 2: Create two different tables 1) Regular table and 2) Memory Optimized table

USE InMemory
GO
-- Create a Simple Table
CREATE TABLE DummyTable (ID INT NOT NULL PRIMARY KEY,
Name VARCHAR(100) NOT NULL)
GO
-- Create a Memory Optimized Table
CREATE TABLE DummyTable_Mem (ID INT NOT NULL,
Name VARCHAR(100) NOT NULL
CONSTRAINT ID_Clust_DummyTable_Mem PRIMARY KEY NONCLUSTERED HASH (ID) WITH (BUCKET_COUNT=1000000))
WITH (MEMORY_OPTIMIZED=ON)
GO

Step 3: Create two stored procedures 1) Regular SP and 2) Natively Compiled SP

Stored Procedure – Simple Insert
-- Simple table to insert 100,000 Rows
CREATE PROCEDURE Simple_Insert_test
AS
BEGIN
SET NOCOUNT ON
DECLARE
@counter AS INT = 1
DECLARE @start DATETIME
SELECT
@start = GETDATE()
WHILE (@counter <= 100000)
BEGIN
INSERT INTO
DummyTable VALUES(@counter, 'SQLAuthority')
SET @counter = @counter + 1
END
SELECT
DATEDIFF(SECOND, @start, GETDATE() ) [Simple_Insert in sec]
END
GO

Stored Procedure – InMemory Insert
-- Inserting same 100,000 rows using InMemory Table
CREATE PROCEDURE ImMemory_Insert_test
WITH NATIVE_COMPILATION, SCHEMABINDING,EXECUTE AS OWNER
AS
BEGIN
ATOMIC WITH (TRANSACTION ISOLATION LEVEL=SNAPSHOT, LANGUAGE='english')
DECLARE @counter AS INT = 1
DECLARE @start DATETIME
SELECT
@start = GETDATE()
WHILE (@counter <= 100000)
BEGIN
INSERT INTO
dbo.DummyTable_Mem VALUES(@counter, 'SQLAuthority')
SET @counter = @counter + 1
END
SELECT
DATEDIFF(SECOND, @start, GETDATE() ) [InMemory_Insert in sec]
END
GO

Step 4: Compare the performance of two SPs

Both of the stored procedures measure and print the time taken to execute them. Let us execute them and measure the time.

-- Running the test for Insert
EXEC Simple_Insert_test
GO
EXEC ImMemory_Insert_test
GO

Here is the time taken by Simple Insert: 12 seconds

Here is the time taken by the InMemory Insert: nearly 0 seconds (less than 1 second)
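
Before cleaning up, a quick sanity check (just a simple sketch) confirms that both procedures really inserted the same 100,000 rows, so the timings are comparing like with like:

-- Sanity check: both tables should contain the same 100,000 rows
SELECT COUNT(*) AS Simple_Table_Rows FROM dbo.DummyTable
GO
SELECT COUNT(*) AS InMemory_Table_Rows FROM dbo.DummyTable_Mem
GO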

Step 5: Clean up!

-- Clean up
USE MASTER
GO
DROP DATABASE InMemory
GO

Analysis of Result

It is very clear that In-Memory OLTP improves the performance of the query and stored procedure. To implement In-Memory OLTP there are a few steps the user has to follow with regard to filegroup and table creation. However, the end result is much better with the In-Memory OLTP setup.

Reference: Pinal Dave (http://blog.sqlauthority.com)

Google Drive Trick – Google Spreadsheet Formatting Dates to String

I have been using Google Drive and Google Spreadsheet for a while. I work with experts around the world and we all use different date formats. The biggest confusion when we are sharing data is often the timestamp. For some people it is natural to write dates like dd/mm/yyyy and for some it is natural to write mm/dd/yyyy. Sometimes we are simply confused looking at a date because there is no indication of the format. Finally, we all decided that we would spell out the dates. For example, for a date like 11/9/2014, to avoid confusion over whether it is November 9, 2014 or September 11, 2014, we decided to just write it out as a string.

Now here is the problem with Google Spreadsheet – when I type any datetime, like the following, it will immediately convert it to mm/dd/yyyy format and defeat our purpose of spelling it out.

Finally, we figured out how to preserve the datetime as a string in Google Spreadsheet. Just start your datetime with a single quote. For example, we now enter the datetime as follows:

'September 9, 2014 instead of September 9, 2014. The best part is that the single quote is not visible on the screen.

Well, if there is any other way to do this that I am not aware of, please let me know.

Reference: Pinal Dave (http://blog.sqlauthority.com)