SQL SERVER – Using Raw Files in SSIS – Notes from the Field #070

[Notes from Pinal]: SSIS is a very well explored subject; however, there are so many interesting elements that whenever we read about it, we learn something new. A similar concept we have covered earlier is creating the SSIS catalog.

Linchpin People are database coaches and wellness experts for a data driven world. In this 70th episode of the Notes from the Field series, database expert Tim Mitchell (partner at Linchpin People) shares a very interesting discussion on how to use raw files in SSIS.


SQL Server Integration Services is well designed for retrieving and processing data on the fly, directly in the data flow pipeline. However, there are circumstances that occasionally require the persistence of result sets in SSIS for use during package execution. For such cases, one option is to use SSIS raw files.

The raw file in SSIS is a special kind of binary file, optimized for and intended for use only through SSIS. It stores temporary result sets on disk to allow reuse of data across multiple packages and multiple executions.

Although these are not tools one would likely use on a daily basis, the raw file tools in SSIS are helpful in a number of scenarios:

  • The same result set is reused several times as part of different data flows (or even different packages)
  • Due to size or query complexity, retrieving a particular set of reference data takes a significant amount of time
  • Due to business rules, one or more interim result sets are required during the ETL process

In cases such as these, using raw files to store result sets can be a handy solution.

To create and use raw files, SSIS comes with two components – a raw file source and a raw file destination – to allow reading from and writing to these files. I’ll start by showing the destination (ordinarily I would start with the source, but since we need to create a raw file with the destination component before we can use the source component, I’m changing the typical demonstration order). As shown below, this example uses a data flow with an OleDB source – connected to AdventureWorks – for sending sales data into a raw file destination.

On the raw file destination, we’ve got a handful of configuration options, including:

  • The type of file name (either direct or from an SSIS variable)
  • A selector to choose the file name or the variable supplying that file name
  • An option to select the write method (create, overwrite, append, etc.)
  • A column pane to allow source-to-target mappings

Note that on the Write Option selection, there are some limitations with respect to whether the specified output file already exists. For example, if you choose the Create Once option, you’ll see a design-time error if the file already exists. Similarly, if you choose either the Append or the Truncate and Append option, you’ll see an error if the file does not yet exist. To make the design-time experience easier, there is a button in the UI of the destination to create the initial raw file in case you want to use either of the Append options without having previously created the file.

The raw file source, the simpler of the two raw file components, has just a couple of configuration options: the type of file name, a selector to choose the file name or the variable supplying that file name, and a data grid to allow the selection of the columns to be included from the raw file.

As shown in use below, the raw file source feeds the data flow pipeline, ultimately sending its data into the OleDB destination.

Final thoughts

One thing you’ll want to keep in mind is that there is no automatic purge process to empty or remove the raw files after use. Therefore, if there is a concern about persisting the files, either because of disk space or data sensitivity, there should be a follow-on process that truncates or deletes the raw files after they are no longer needed. Also, if you want to track how recently the raw file was loaded, you could either check the update stamp of the file (using a script task), or use a date stamp column in the process that loads the raw file to allow storage of the load date in the data within the file.

Conclusion

The raw file source and destination in SSIS can be used to address situations where temporary file system storage of result sets is required in an ETL process. Although not an everyday set of tools, the raw file source and destination are useful to know in cases where persisting interim result sets is needed.

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Database Stuck in “In Recovery” Mode After Restart

Read this blog and I am sure it will remind you of some instances from your own environments. This is one of the common issues I have observed while working with SQL Server for a long time now. In this blog we will discuss a few details about the issue and the possible actions you might take.

Whenever there is a restart of SQL Server, all databases undergo the “Recovery” process. This is the stage where the database has to come back online in a consistent state. There are three sub-phases within the process: Analysis, Roll forward and Rollback. The names are pretty self-explanatory. Let me explain for those who are interested to learn in detail:

  • Analysis: This is the phase where SQL Server would go through the LDF file and build the in-memory structures to know how much work is needed in the next two phases.
  • Roll forward (redo): During the shutdown of the database, there might be transactions which were committed but not yet written to the MDF file via a checkpoint. In this phase, those committed changes are replayed from the log into the data file.
  • Rollback (undo): If there were any transactions which were uncommitted then they have to be rolled back to bring the database to a consistent state.

When would we see a database in “In Recovery” state?

  • Restart of SQL Server.
  • Database offline and online.
  • Restore of database from backup.

All of the above invoke the “recovery” process of the database, and all databases must go through the three phases explained earlier.
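By the way, if you want to quickly check which databases are currently in this state, the below query is a simple sketch using sys.databases:

-- Databases still running recovery show the RECOVERING state
SELECT name, state_desc
FROM sys.databases
WHERE state_desc = 'RECOVERING'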

What should we do?

The very first thing I always check is the ERRORLOG. In the Errorlog, we should see the very first message for the database (TestMe is the name of the database):

Starting up database ‘TestMe’.

This means the files are opened and recovery has started. After some time, you should see phase 1.

Recovery of database ‘TestMe’ (28) is 0% complete (approximately 37 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 3% complete (approximately 36 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.

Once phase 1 is complete, it would go with Phase 2 and 3 as shown below.

Recovery of database ‘TestMe’ (28) is 3% complete (approximately 36 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 0% complete (approximately 142 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 7% complete (approximately 19 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 15% complete (approximately 26 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 21% complete (approximately 25 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 27% complete (approximately 20 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 34% complete (approximately 19 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 41% complete (approximately 16 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 48% complete (approximately 14 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 55% complete (approximately 12 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 62% complete (approximately 10 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 69% complete (approximately 9 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 75% complete (approximately 7 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 82% complete (approximately 5 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 88% complete (approximately 3 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Recovery of database ‘TestMe’ (28) is 95% complete (approximately 1 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.

And once it completes, you should see something similar.

3807 transactions rolled forward in database ‘TestMe’ (28). This is an informational message only. No user action is required.
0 transactions rolled back in database ‘TestMe’ (28). This is an informational message only. No user action is required.
Recovery is writing a checkpoint in database ‘TestMe’ (28). This is an informational message only. No user action is required.
Recovery completed for database TestMe (database ID 28) in 30 second(s) (analysis 1289 ms, redo 29343 ms, undo 72 ms.) This is an informational message only. No user action is required

The last message summarizes the three phases which I explained earlier (analysis, redo and undo).
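If you do not want to scroll through the ERRORLOG by hand, you can filter for these messages. Here is a small sketch using the undocumented (but widely used) xp_readerrorlog procedure; TestMe is the example database from above:

-- 0 = current log, 1 = SQL Server error log, then two search strings
EXEC xp_readerrorlog 0, 1, N'Recovery of database', N'TestMe'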

What are the possible causes?

  • Huge size of transaction log file.
  • SQL restarted during a long running transaction.
  • Huge number of VLFs (see the quick check after this list).
  • You might be hitting a bug which is fixed in SQL Server. I have referenced the KB articles below.
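For the VLF cause in particular, a quick check is the undocumented DBCC LOGINFO command, which returns one row per VLF, so the number of rows returned is the VLF count. A minimal sketch against the example database:

-- Each row returned is one VLF; thousands of rows indicate a VLF problem
DBCC LOGINFO ('TestMe')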

List of known issues

If you are running SQL Server 2005, 2008, 2008 R2 or SQL Server 2012, please make sure you have applied the fixes given below.

http://support.microsoft.com/kb/2455009 (FIX: Slow performance when you recover a database if there are many VLFs inside the transaction log in SQL Server 2005, in SQL Server 2008 or in SQL Server 2008 R2)

http://support.microsoft.com/kb/2524743 (FIX: Recovery takes longer than expected for a database in a SQL Server 2008 or in a SQL Server 2008 R2 environment)

The fixes would help in speeding up the phases of recovery. I hope this blog points you in the right direction when looking at this SQL Server behavior.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Security Conversations and Notes with a DBA

One of the great things about working at Pluralsight is that I get to meet a lot of people (trust me on this). I am lost for words when it comes to how my day job helps me achieve some of the coolest things, like presentations at a number of conferences. The Pluralsight booth is always a place where we see some of the great minds from the industry swing by. So in case you get to a conference and see a Pluralsight booth, feel free to swing by and take a moment to talk to someone there; I guarantee your time will be well spent.

At one conference, I vividly remember a conversation with an enthusiast who came from the banking vertical. He attended all my sessions and made sure he caught me at the booth, as I spend most of my time there when not delivering sessions. It was in between sessions that this gentleman caught me and started asking a number of questions around security. The whole discussion took close to an hour, but I was glad someone was serious enough about trying to understand these basics.

I was super excited and was talking about some of the latest and greatest security capabilities of SQL Server 2014, which I explained in my Pluralsight course. But the individual said they were running SQL Server 2008 R2. That got me into a fix because I had to rewind my mind to give him recommendations. I kept thinking hard as I started giving each of the recommendations.

In this blog, let me recollect some of the conversations I had and pen them down for everyone’s reference and, most importantly, for my own reference.

About User Accounts

SQL Server executes as a set of Windows services. SQL Server is required to run under a domain account to interact with network services, to access domain resources such as file shares, or if it uses linked server connections to other SQL Server instances running on other computers in the domain.
When choosing the service account, consider an account with the least amount of privileges needed to do the job and no more.

Using a domain user that is not a member of the Local Administrator group or of the Domain Administrators group will be the best choice for the SQL Server service account.

Some especially sensitive accounts are detailed below:

  • SQL Server Browser is a name resolution service that provides SQL Server instance connection information to clients. If the SQL Server Browser service account is compromised, an attacker can use the permissions associated with the account to expand their control over the Windows environment. It is recommended that the SQL Server Browser service be run under the NETWORK SERVICE account that has the least privileges.
  • Some other security considerations: do not use LOCAL SYSTEM, LOCAL SERVICE, or NETWORK SERVICE as the service account to run MSFTESQL. This could provide an elevation of privileges for the Full-Text Search service.

Prefer Kerberos over NTLM authentication. Kerberos as the default authentication mode for Windows connections to SQL Server is possible if client and server are joined to the same domain, or if client and server are in different domains that are configured with a two-way trust. Furthermore, Kerberos requires the appropriate SPNs to be set. Verify whether the SQL Server name (in a WSFC solution) or hostname matches the SQL Server service account. If so, this could prevent you from defining an SPN on the correct object, because you will get a User and a Computer object with the same name. Refer to Understanding Kerberos and NTLM authentication in SQL Server Connections for further information on setting up Kerberos authentication.
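To verify which scheme a given connection actually negotiated, here is a minimal sketch using a documented DMV; run it from the connection you want to inspect:

-- Returns KERBEROS or NTLM for the current connection
SELECT auth_scheme
FROM sys.dm_exec_connections
WHERE session_id = @@SPID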

As for securing SQL Server roles and permissions, consider the following:

  • The SA account is a well-known and frequent target of malicious users. Disable the account by using the ALTER LOGIN statement (see the sketch after this list). If that is not an option, consider renaming the account by using the ALTER LOGIN statement, as this action can help protect the account. Refer to the disabling SA blog.
  • Restrict the membership of the sysadmin fixed server role to logins that use Windows Authentication, assuring a greater protection of SQL Server.
  • If users without sysadmin rights need to perform certain tasks, consider creating proxy accounts. There are two types of proxy accounts, and each relates to a specific set of tasks: the xp_cmdshell proxy and the SQL Agent job proxy.
  • The configuration required by each type of proxy is different, and the way the authorities they grant are used is different. If there is a requirement for users without sysadmin rights to run xp_cmdshell, then the xp_cmdshell proxy must be created with the sp_xp_cmdshell_proxy_account system procedure. Similarly, if there is a requirement for users without sysadmin rights to own SQL Agent jobs, then one or more SQL Agent job proxy accounts must be created.
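A minimal sketch of the ALTER LOGIN and xp_cmdshell proxy pieces mentioned above; the replacement login name, Windows account, and password are placeholders, not recommendations:

-- Disable the well-known SA login (preferred), or rename it if disabling is not an option
ALTER LOGIN sa DISABLE;
-- ALTER LOGIN sa WITH NAME = [NonObviousAdminName];  -- hypothetical new name

-- Create the proxy credential used when non-sysadmin users run xp_cmdshell
EXEC sp_xp_cmdshell_proxy_account 'DOMAIN\LowPrivUser', 'StrongPasswordHere';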

The one hour of conversation spanned a lot of topics and demonstrations on my laptop, using MSA accounts and more. I think this brain dump of the conversation will become a great reference for me if someone asks me these questions again. Do let me know if you have come across these topics before in your interactions. Have you implemented these in your environments?

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Script to Convert Date to Julian Dates

There are many methods to store date values. One of them is the Julian date. It is a seven digit number where the first four digits are the year and the next three digits are the day of the year (the number of days from Jan 01).

If you want to convert this number into a valid datetime value, you can use the following simple method:

DECLARE @JULIANDATE INT
SET @JULIANDATE = 2012146
SELECT DATEADD(DAY, @JULIANDATE % 1000 - 1, DATEADD(YEAR, 0, LEFT(@JULIANDATE, 4))) AS JULIANDATE

The logic is to create Jan 01 of the year part, and then add the last three digits minus one as days, using the DATEADD function.

The result is

JULIANDATE
-----------------------
2012-05-25 00:00:00.000
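Going the other direction, from a datetime value to the seven digit Julian number, is equally simple. Here is a minimal sketch:

DECLARE @D DATETIME
SET @D = '2012-05-25'
-- Year times 1000 plus day-of-year rebuilds the seven digit number
SELECT YEAR(@D) * 1000 + DATEPART(DAYOFYEAR, @D) AS JULIANDATE
-- Returns 2012146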

Do let me know if you know any other method to convert dates to Julian dates. I will be happy to publish it on this blog.

Additional note: Not to be confused with the Julian calendar, a Julian date or day number is the number of elapsed days since the beginning of a cycle of 7,980 years invented by Joseph Scaliger in 1583. The purpose of the system is to make it easy to compute an integer difference between one calendar date and another calendar date.

Reference: Pinal Dave (http://blog.sqlauthority.com)

Interview Question of the Week #008 – Is GO T-SQL Statement?

I often get this question in the email – more frequently than I would like to get.

Question: “Is GO a T-SQL statement?”

Answer: “No, GO is not a T-SQL statement, it is a command which is recognized by SSMS or sqlcmd as a signal to send the current batch to SQL Server Engine.”

Well, here are two articles I recommend reading for more information – MSDN and Explanation SQL Command GO.

Here are a few things I would like to add related to GO.

  • I recommend a semicolon as a statement terminator after each statement; however, do not use a semicolon after GO or it will give an error.
  • ODBC or OLE DB API does not recognize the GO statement and it will give a syntax error.
  • GO goes on its own line after a T-SQL statement. If you provide an integer after GO, it will execute the preceding batch that many times (see the sketch after this list).
  • Only comments can be on the same line as GO; any other statement will cause an error.
  • You can change the batch separator from GO to any other value in SSMS by going to Tools > Options.
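Here is a minimal sketch of the integer-after-GO behavior; it is runnable in SSMS or sqlcmd (the temp table is just an example):

CREATE TABLE #demo (id INT IDENTITY)
GO
INSERT INTO #demo DEFAULT VALUES
GO 5
-- The INSERT batch above ran five times
SELECT COUNT(*) AS rows_inserted FROM #demo
GO
DROP TABLE #demo
GO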

Let me know your thoughts about the GO statement.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – FIX: SQL Server Not Starting – Msg 864, Level 16, State 1- Buffer Pool Extension Feature

Recently I got an email from one of the students who attended my SQL Server 2014 Pluralsight course last year. He was trying to use the Buffer Pool Extension feature and got into some unique trouble. Here is the email from him:

Hi Pinal,
I watched your course about SQL Server 2014 administration new features on PluralSight. It is really a great course for someone like me who is trying to do fast ramp-up on SQL Server 2014.

I was trying to experiment with the buffer pool extension feature. I enabled the buffer pool extension feature on SQL Server Standard Edition and now I am not able to start my SQL Server service. Could you please help me out? This is not a business-down situation, but I want to understand what went wrong.

Regards,

(Name Hidden)

The very first thing I asked him for was the error log information. Whenever there is a problem with SQL startup, I always start by looking at the ERRORLOG. Here is what I saw in the SQL Server Error Log.

2015-02-28 00:39:11.63 spid8s      Starting up database 'master'.

2015-02-28 00:39:11.84 spid8s      Attempting to allocate 6560086 BUF for buffer pool extension for a maximum of 6560085 page descriptors.
2015-02-28 00:39:11.84 spid8s      Error: 864, Severity: 16, State: 1.
2015-02-28 00:39:11.84 spid8s      Attempting to allocate 6560086 BUF for buffer pool extension for a maximum of 6560085 page descriptors.

The above ERRORLOG explains that SQL Server is not able to allocate the required buffers for the buffer pool extension file. If you look closely at both numbers, the difference is one buffer. This would mean that the size of the buffer pool extension was more than the maximum allowed size. I looked into the documentation about the size limitation and found the below link on MSDN.

https://msdn.microsoft.com/en-us/library/dn133176.aspx

(The buffer pool extension size can be up to 32 times the value of max_server_memory for Enterprise editions, and up to 4 times for Standard edition)

I went back and looked at the top of the error log and found the below.

Microsoft SQL Server 2014 - 12.0.2430.0 (X64)
Oct 15 2014 16:05:37
Copyright (c) Microsoft Corporation
Standard Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: ) (Hypervisor)

So it means that the size given for the BPE file was going beyond the expected limit of 4 times max server memory, because this is Standard Edition.
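A minimal sketch to check the configured max server memory and the implied BPE ceiling on Standard Edition (use 32 instead of 4 for Enterprise):

-- Max server memory and the 4x BPE ceiling that applies to Standard Edition
SELECT CAST(value_in_use AS INT) AS max_server_memory_mb,
CAST(value_in_use AS INT) * 4 AS max_bpe_size_mb
FROM sys.configurations
WHERE name = 'max server memory (MB)'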

So now we know why the error is occurring and why we are unable to start SQL Server. To fix the error, we need to start SQL Server with minimum configuration so the BPE is not initialized. We can use the startup parameter /f. Here are the detailed steps.

  1. Open command prompt. Use “Run As Administrator”, if applicable.
  2. Type net start MSSQLServer /f /mSQLCMD

Since my machine has a default instance, I am using MSSQLServer. For a named instance it would be MSSQL$<InstanceName>.

We have given an additional parameter called “/m” and passed SQLCMD to let SQL Server know that only SQLCMD can connect.

  3. Once SQL is started, connect to SQL via SQLCMD using the below at the command prompt.
SQLCMD -S(local) -E

Note that you need to provide your own server name. In the above example I am connecting to the default instance of SQL.

  4. Once the connection is made, we need to disable BPE by using the below command:
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION OFF

  5. Once it’s turned off, we can stop SQL via the net stop command as below:
Net Stop MSSQLServer

  6. Now we can start SQL normally and it should work (unless we have some other problem).

To dig further, I took a virtual machine with SQL Server Standard Edition in the Microsoft Azure cloud. My machine had the below configuration:

RAM = 14 GB
Max Server Memory = 12 GB
D drive is an SSD for the BPE file. On it I created the folder SQLAuthority_BPE.

Here are the steps to reproduce the error in SQL Server Standard Edition.

SP_CONFIGURE 'MAX SERVER MEMORY', 12000
GO
RECONFIGURE WITH OVERRIDE
GO
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON (FILENAME = 'D:\SQLAuthority_BPE\BUFFERPOOLEXT.BPE' ,SIZE = 50 GB)
GO
SELECT * FROM SYS.DM_OS_BUFFER_POOL_EXTENSION_CONFIGURATION
GO
SHUTDOWN WITH NOWAIT

Important: The above script will also shut down SQL Server because I have added the SHUTDOWN T-SQL command.

Once that was done, I was not able to start SQL without following the steps mentioned earlier. After fixing the problem, I configured BPE with 48 GB in size and it worked fine. Here is the message which we can see in the ERRORLOG.

2015-02-28 09:57:42.110 spid8s       Buffer pool extension "D:\BPE\BUFFERPOOLEXT.BPE" has been initialized successfully with size is 49152 MB.

I hope this helps in understanding the limitation and fixing an error which I had not seen before.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – ReadOnly Databases and Notes Using Snapshots

This seems to be the third week that I am writing about ReadOnly databases; this blog is a continuation of last week’s notes. When I wrote the notes last week, one of my blog readers asked if it is possible to get the ReadOnly error without marking the database as a ReadOnly database. This got me thinking and I couldn’t sleep that day. A quick call to my friend to get some ideas gave a simple answer: using snapshots.

I couldn’t control my excitement, and this triggered me to write up this special case scenario. As we execute the scripts, we will learn some of this behavior.

Let us start by creating our database which will be used for testing.

CREATE DATABASE [ReadOnlyDB]
CONTAINMENT
= NONE
ON  PRIMARY
( NAME = N'ReadOnlyDB', FILENAME = N'C:\Temp\ReadOnlyDB.mdf' , SIZE = 4024KB , FILEGROWTH = 1024KB )
LOG ON
( NAME = N'ReadOnlyDB_log', FILENAME = N'C:\Temp\ReadOnlyDB_log.ldf' , SIZE = 20480KB , FILEGROWTH = 10%)
GO
USE ReadOnlyDB
GO
-- Creating our table
CREATE TABLE tbl_SQLAuth (id INT, Longname CHAR(8000))
GO

Next let us go ahead and create a snapshot database out of our ReadOnlyDB.

-- Create a snapshot on our DB
CREATE DATABASE ReadOnlyDB_SS ON
( NAME = ReadOnlyDB, FILENAME =
'C:\Temp\ReadOnlyDB_data_1800.ss' )
AS SNAPSHOT OF ReadOnlyDB;
GO

Now that our snapshot database (ReadOnlyDB_SS) is created, it is important to know that snapshot databases are always ReadOnly; the behavior is the same as for explicitly created ReadOnly databases. To test this, let us try to insert some data into our snapshot DB.

USE ReadOnlyDB_SS
GO
INSERT INTO tbl_SQLAuth VALUES (1, 'SQLAuth');
GO

Now we will be presented with the same familiar error (3906):

Msg 3906, Level 16, State 1, Line 25

Failed to update database “ReadOnlyDB_SS” because the database is read-only.
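Writes fail, but reads from the snapshot continue to work. A quick sanity check against the table we created earlier:

USE ReadOnlyDB_SS
GO
-- Reading from the snapshot succeeds; only writes are rejected
SELECT COUNT(*) AS RowsInSnapshot FROM tbl_SQLAuth
GO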

So even though we have not explicitly marked the snapshot database as ReadOnly, writes get this error while reads succeed. I was curious to know what would happen if we tried to mark the snapshot database as ReadOnly explicitly. The error is self-explanatory.

-- Let us set the Snapshot DB as READ_ONLY
USE MASTER
GO
ALTER DATABASE [ReadOnlyDB_SS] SET READ_ONLY
GO

Msg 5093, Level 16, State 1, Line 31

The operation cannot be performed on a database snapshot.

Msg 5069, Level 16, State 1, Line 31

ALTER DATABASE statement failed.

If that was easy, what will happen if we try to set the snapshot database to read_write? The error is self-explanatory again here:

USE [master]
GO
ALTER DATABASE [ReadOnlyDB_SS] SET  READ_WRITE WITH NO_WAIT
GO

Msg 5093, Level 16, State 1, Line 37

The operation cannot be performed on a database snapshot.

Msg 5069, Level 16, State 1, Line 37

ALTER DATABASE statement failed.

That gives us two learnings: we cannot mark a snapshot database as ReadOnly, nor can we mark a snapshot database as Read_Write enabled.
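Before the cleanup, it can be handy to confirm which snapshots exist for a database. A minimal sketch using sys.databases, where source_database_id is populated only for snapshots:

-- List snapshot databases along with their source databases
SELECT name AS snapshot_name, DB_NAME(source_database_id) AS source_db
FROM sys.databases
WHERE source_database_id IS NOT NULL
GO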

With these learnings in place, let us do the cleanup. Make sure to drop the snapshot database before deleting the main database. So our cleanup script for this blog looks like this:

-- Clean up time
USE MASTER
GO
DROP DATABASE ReadOnlyDB_SS
GO
DROP DATABASE ReadOnlyDB
GO

Reference: Pinal Dave (http://blog.sqlauthority.com)