SQL SERVER – Retrieve Last Inserted Rows from Table – Question with No Answer

Some conversations are very interesting, and I would like to bring them to everyone's attention. Here is one such conversation that happened earlier this week. Let us say the question was asked by Dustin.

Dustin: Can we retrieve last 10 rows from a table?

Pinal: Sure, just get the top rows from the end of the table. For this, you will have to order the table by its identity column in descending order (see the sketch after this conversation).

Dustin: I do not have an identity column in table.

Pinal: No problem. Do you have a column which tracks the date and time for the newly inserted row or timestamp?

Dustin: No such thing in my table.

Pinal: Hmmm… That means there is no property which can identify if the rows were inserted last or first.

Dustin: I believe so. Any other thoughts?

Pinal: Well, is there any auditing on the table, any trigger, or any other mechanism where you store your data in another table, for example with an OUTPUT clause?

Dustin: No, nothing like that. Are there any administrative ways to get the last inserted rows?

Pinal: What do you mean?

Dustin: Can I read the logs and find out which rows were inserted?

Pinal: I have personally never tried to read log files and I have no clue how to do that. If you are going to do that, I suggest you take a backup of your log file and run the experiment on that backup. There is a good chance that if you use a tool which is not safe, you will put your database at very high risk.

Dustin: Do you recommend any tool?

Pinal: No, I do not recommend any tool as of now. If I find one, I will write a blog post.
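For completeness, here is a minimal sketch of the suggestions from the conversation, using a hypothetical table named dbo.Orders; the table, column, and audit-table names are assumptions for illustration only.

-- Last 10 rows by identity value (assumes OrderID is an IDENTITY column)
SELECT TOP (10) *
FROM dbo.Orders
ORDER BY OrderID DESC;

-- Last 10 rows by insert time (assumes an InsertedAt column is populated on insert)
SELECT TOP (10) *
FROM dbo.Orders
ORDER BY InsertedAt DESC;

-- A lightweight audit going forward: the OUTPUT clause copies every inserted row
-- into a hypothetical audit table with matching columns
INSERT INTO dbo.Orders (CustomerID, Amount)
OUTPUT inserted.* INTO dbo.Orders_InsertAudit
VALUES (42, 100.00);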

Note: Please do not post comments with third-party tool links; I will delete them without notice. I will be doing my research soon and will post a blog about it.

Reference: Pinal Dave (http://blog.sqlauthority.com)


SQL SERVER – Performance: Do-it-Yourself Caching with Memcached vs. Automated Caching with SafePeak

We’ve all been reading a lot of articles that suggest adding a caching layer in front of a database is more efficient than hitting the database directly. Among the different caching options are Do-It-Yourself coding using open-source tools like Memcached, Velocity (part of Microsoft AppFabric), or Ehcache (for Java developers). Alternatively, you can now find automated caching solutions like SafePeak’s dynamic caching for SQL Server.

Why use a caching solution?

Let’s quickly recap the many reasons to use data caching with SQL Server:

  • To improve data access read/write performance
  • To reduce the number of queries hitting the database
  • To reduce database load (CPU, IO, Memory)
  • To improve application page load response time (the user’s actual experience)
  • To distribute repetitive query workloads to separate cache servers
  • To share the benefit of cached data on one server across all other servers.
  • To increase operational efficiency by supporting more data and more users on smaller and fewer SQL Server machines

Another benefit of caching is its impact on database scalability. SQL Server is expensive to scale with bigger hardware, which can also trigger increased license fees. Fast flash storage systems, which can be helpful, are also very expensive. Adding a new web/caching server, on the other hand, can be much cheaper (and potentially more effective).

So, now that we know why we want to use caching with SQL SERVER, what’s the best way to go about it? Well, you’ll need to decide what’s best for you. Let’s look at three main SQL Server caching options:

  1. SQL Server Memory and IO caching
  2. Application data caching, a Do It Yourself approach using Memcached/similar APIs
  3. SafePeak automated SQL caching

By the way, I’d also like to learn from your experiences, so I look forward to your comments and thoughts on what I discuss here.

SQL Server: Doesn’t it already have a memory cache?

To start with, we need to remember that SQL Server has its own memory cache for objects in the database. When data is retrieved, SQL Server maintains its cache and will (if the page is already cached) pull the row from memory rather than hitting the disk.

So if SQL Server has its own cache, what’s the benefit of data caching layers such as Memcached, SafePeak or similar?

The answer is this: SQL Server only caches:

  1. Query plans
  2. Pages from the database files.

SQL Server does NOT cache results from a query.

This is the important distinction.

For example, imagine you have a complex query which aggregates over a lot of data (e.g., how many different countries we have in our customer database: SELECT Country FROM Customers GROUP BY Country). SQL Server will scan the WHOLE customer table, but your result-set will only be a few entries long. When you reissue the query, SQL Server will reuse the query plan and will rescan the customer table (and, if you are lucky, the pages are still in memory).
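If you want to see this for yourself, here is a minimal sketch using two well-known DMVs: the first lists cached query plans and how often they have been reused, the second shows how many data pages each database currently holds in the buffer pool. Neither of them contains query result-sets.

-- Cached query plans and their reuse counts
SELECT TOP (10) cp.usecounts, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;

-- Data pages currently held in the buffer pool, per database (8 KB pages)
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) * 8 / 1024 AS buffer_pool_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY buffer_pool_mb DESC;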

When you use an application cache, you store your result-sets in Memcached RAM and reuse them over and over again without connecting to the database server, thus offloading work from your database server.

Do-It-Yourself application data caching using Memcached / similar APIs

Application data caching is quite easy to start. However, changing (or building) code for large applications with effective data caching is challenging for a few reasons:

  1. Correctly scoping required code changes;
  2. Delivering and testing actual code changes required to handle caching of all (or most) SQL queries;
  3. Preserving data integrity: how to invalidate all relevant cache items and handle all kinds of data-change events. In real life, both READs and WRITEs can be complex, involving stored procedures, triggers, views, table joins, etc.; and
  4. Maintaining the additional code layer.

One thing to keep in mind is that because DIY and Memcached approaches need to touch app code, they can’t be used to accelerate any off-the-shelf software that relies on SQL Server.

So, while it can be complicated, the application performance benefits are worth it. Let’s dive into what you would do using Memcached.

About Memcached data caching
Memcached is a big hash table: a key/value store that lives entirely in RAM on multiple servers. It allows you to use RAM from multiple servers as a single memory caching space.

Basic data caching using Memcached

Below is a basic tutorial showing (via pseudocode) how you can get started with integrating Memcached into your application. If you’re an application developer, it isn’t something you just “turn on” and then your site goes faster. You have to pay attention. For every query you want to be cached, you have to put (set()) it into cache, get it from cache (get()) and (probably most complicated) make sure you keep the data in the cache correct by removing from cache (delete()) when the data is updated.

The following Memcached pseudocode example is written in Perl, but the same approach works in most other languages, including .NET, Java, and PHP (via their Memcached clients).

Initializing a Memcached Client with a list of your pre-configured Memcached servers:

# perl example
use Cache::Memcached;
my $memclient = Cache::Memcached->new({ servers => [ '10.0.0.10:11211', '10.0.0.11:11211' ]});

Wrapping an SQL Query
Memcached is famous for reducing load on SQL databases. Here is some pseudocode showing how to wrap database query access with a Memcached caching layer, by implementing check-in-cache (function get()) and place-into-cache (function set()):

# Define a query and use it as a key for the Memcached:
sql = "SELECT * FROM user WHERE user_id = ?"
key = 'SQL:' . user_id . ':' . md5sum(sql)
# We check if the value is 'defined' (or in cache), since '0' or 'FALSE' can be legitimate values!
if (defined result = memcli:get(key)) {
        return result
} else {
        # Query is not in Cache, Get query resultset from your database server and convert it to resultset array
        handler = run_sql(sql, user_id)
        rows_array = handler:turn_into_an_array

        # Cache it for five minutes
        memcli:set(key, rows_array, 5 * 60)
        return rows_array
}

Notice that the SQL query result-set was entered into the cache with a five-minute expiration time, as an example.

Stale data in cache and invalidation of cached items
Unless you actively invalidate the cached item, when a user (or your database) makes a change, it can take up to five minutes (or more, depending on cache item expiration time) for users to see the correct new data. This is a key issue when implementing a Do-It-Yourself caching strategy.

When a user comes along and edits data, you can keep the cache in sync in two main ways:

  1. You can update the item in cache; or
  2. Delete the old item.

Expiration
The most basic invalidation happens by defining the expiration period. Even if you're actively deleting or overwriting cached data, you'll still want to have the cache expire occasionally.

 # Cache it for five minutes
  memcli:set(key, rows_array, 5 * 60)

Expiration times can be set from 0, meaning “never expire”, up to 30 days. Any value higher than 30 days is interpreted as a Unix timestamp.

delete()
The simplest method of invalidation is to simply delete an entry and have your website re-cache the data the next time it’s fetched. For example, when a user updates her bio, you want her to see her latest info when she reloads the page:

 sql = "SELECT * FROM user WHERE user_id = ?"
 key = 'SQL:' . user_id . ':' . md5sum(sql)

 memcli:delete(key)

The next time this query is requested, it will be fetched from the database and will repopulate the cache.

set()
A more efficient idea is to actively update your cache as your data changes. When our user updates her bio, take the bio object and move it into the cache via ‘set’. You can pass the new data into the same routine that normally checks for data, or however you want to structure it.

Summary of DIY application data caching

Data caching is considered a key component of a high-performance, scalable application architecture. Fast page-load response time is one of the key metrics for user satisfaction. Memcached and similar distributed caching technologies are used today in most (if not all) well-known big websites (LinkedIn, Facebook, and others). The challenge is the extra work that custom DIY caching requires of a development team to implement, test, and maintain.

If you think I’m painting a dark picture of Do-It-Yourself, home-grown caching, you’re right. Why? Because nowadays there is an alternative: using a commercial smart caching layer. The cost of purchasing third-party caching software is much more accurately defined than the unknown cost of developing caching code yourself.

But, you need to decide for yourself if that’s the best option for your situation.

SafePeak’s “Automated Dynamic Caching”

SafePeak is pre-packaged software that is specifically designed to cache SQL Server queries. It automates many of the steps that developers would try to do manually with Memcached. And it doesn’t require any changes to application code. This makes it quick to deploy and useful to accelerate the performance of both 3rd-party applications (like SharePoint, CRM Dynamics) and your own custom applications.

In my opinion, this makes SafePeak’s approach much easier for software and database architects and developers to use: you can literally “plug in” SafePeak to production SQL Server systems. All queries are analyzed and managed by SafePeak automatically: their logic and dependent objects (like tables, views, triggers, procedures, etc.) are identified and analyzed, write commands are monitored, and cache invalidation is applied in real time.

And its application acceleration covers all of an app’s query and stored-procedure traffic immediately upon deployment.

SafePeak’s self-learning algorithms manage most of configuration; the rest is done via a dashboard tool that combines cache management with real-time monitoring and statistics. By analyzing both the database schemas and the SQL traffic, SafePeak creates optimized caching patterns specific to your applications and improves them over time. SafePeak cache supports read-write database transactions and is dynamically kept in sync with any data changes as they happen.

Here’s what the SafePeak dashboard looks like (screenshot from SafePeak’s whitepaper):

What Next?

You have to decide what approach or technology is best for your situation.

An effective DIY caching strategy, using libraries like Memcached or Velocity, has some advantages (more granular cache-invalidation control, the ability to store additional types of data besides database row-sets, and others). But it can be tricky to scope the work accurately, it is hard to code correctly, and it comes at a real cost in time and focus taken away from developing application capabilities.

SafePeak offers an easier way to accelerate SQL Server based applications, with all caching benefits without any app re-programming. In a way, SafePeak’s design reminds me of a combination of Memcached caching and an In-Memory database, but enhanced with machine learning intelligence.

Caching technology is designed to maximize operational efficiency of your existing infrastructure, so that you can scale more data and more workloads.

I like SafePeak’s “plug-and-play” approach as it gets results fastest and with minimum risk. If you’d like to check it out, they offer a free trial (www.safepeak.com/download) that includes one-on-one technical training & assistance support.

Now I’d like to hear from you. Have you deployed a SQL caching strategy? What have you done? How has it worked out?

Reference: Pinal Dave (http://blog.sqlauthority.com)

MySQL – Pattern Matching Comparison Using Regular Expressions with REGEXP

MySQL supports pattern-matching comparisons using regular expressions with the REGEXP keyword. In this post we will see how to use REGEXP in MySQL. We can use the patterns ^, $, | and square brackets to match values effectively. Note that, with the default case-insensitive collations, REGEXP comparisons are case-insensitive, which is why 'Cables' matches the pattern '^c' below.

Let us create the following table:

CREATE TABLE items(item_id INT, item_description VARCHAR(100));
INSERT INTO items VALUES (1,'Television');
INSERT INTO items VALUES (2,'Mobile');
INSERT INTO items VALUES (3,'laptop');
INSERT INTO items VALUES (4,'Cables');
INSERT INTO items VALUES (5,'Camera');
INSERT INTO items VALUES (6,'jewels');
INSERT INTO items VALUES (7,'shirt');
INSERT INTO items VALUES (8,'Cup');
INSERT INTO items VALUES (9,'Pen');
INSERT INTO items VALUES (10,'Pencil');

1. Find item_descriptions that start with the letter c
SELECT item_description FROM items
WHERE item_description regexp '^c';

Result :

Item_description
 Cables
 Camera
 Cup

2. Find item_descriptions that end with the letter s
SELECT item_description FROM items
WHERE item_description regexp 's$';

Result :

Item_description
 Cables
 jewels

3. Find item_descriptions that start with c or end with s
SELECT item_description FROM items
WHERE item_description regexp '^c|s$';

Result :

Item_description
 Cables
 Camera
 jewels
 Cup

4. Find item_descriptions that contain the letter a
SELECT item_description FROM items
WHERE item_description regexp '[a]';

Result :

Item_description
 laptop
 Cables
 Camera

5. Find item_descriptions that start with the letter c or p

SELECT item_description FROM items
WHERE item_description regexp '^[cp]';

Result :

Item_description
 Cables
 Camera
 Cup
 Pen
 Pencil

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – What is the Difference Between Latches and Locks

A very common question I often get in the email is

“What is the difference between latches and locks?”

The answer to this question is not an easy one. Here is a quick answer from the MS white paper.

A page in SQL Server is 8KB and can store multiple rows. To increase concurrency and performance, buffer latches are held only for the duration of the physical operation on the page, unlike locks which are held for the duration of the logical transaction. Latches are internal to the SQL engine and are used to provide memory consistency, whereas locks are used by SQL Server to provide logical transactional consistency.

The above answer is taken from an old but extremely relevant white paper written in 2011. It was written for SQL Server 2008 R2; however, almost all the concepts from this whitepaper are still very accurate.

If you want to learn more about latches and how to diagnose and resolve problems around latch contention, I strongly suggest the following whitepaper.

Diagnosing and Resolving Latch Contention on SQL Server
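Before diving into the whitepaper, here is a minimal sketch of how to observe both on your own instance: latch waits show up in the wait statistics, while currently held or requested locks are visible in sys.dm_tran_locks.

-- Page latch waits (internal memory/page consistency)
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH%' OR wait_type LIKE 'PAGEIOLATCH%'
ORDER BY wait_time_ms DESC;

-- Locks currently held or requested (logical transactional consistency)
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks;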

Let me know your thoughts about this subject.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Find Port SQL Server is Listening – Port SQL Server is Running

Here is one of the very common interview questions I often ask people -

Q: What is the default port SQL Server is running on?

A: 1433 for TCP/IP and 1434 for the UDP connection.

I always get the correct answer to this question. However, for the follow-up question, so far I have yet to get a single correct answer.

Q: Great, now assume that there is more than one instance of SQL Server running on the same server. What will be the port of the second SQL Server instance, as the default port is already used by the default instance?

A: (I have yet to get the correct answer for this one in interview).

For this very reason, I have decided to blog about this.

Here are different ways one can figure out on which port the second instance of SQL Server is running.

Method 1: using xp_readerrorlog

Execute the following stored procedure on the instance for which you want to find out the port on which SQL Server is listening.

USE MASTER
GO
xp_readerrorlog 0, 1, N'Server is listening on'
GO

The above query will return results something very similar to the following:

Method 2: Configuration Properties

Go to Windows >> Start >> SQL Server Program Folder >> SQL Server Configuration Manager

Now go to SQL Server Network Configuration >> Protocols for Your Server, then right-click on TCP/IP.

Now, over here, when you scroll down you will notice the port details.

It is that easy to find the port of the second instance of SQL Server.
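If you prefer a query-only approach, here is an additional sketch using a DMV; it returns rows only for sessions connected over TCP, so run it while at least one TCP connection exists.

SELECT DISTINCT local_tcp_port
FROM sys.dm_exec_connections
WHERE local_tcp_port IS NOT NULL;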

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Error – Resolution – Could not allocate space for object in database because the filegroup is full

Just a day ago, on my local box, I received the following error.

Could not allocate space for object ‘<object_name>’ in database ‘<database_name>’ because the ‘PRIMARY’ filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

I was a bit surprised, as this was not a production server or even a real server. It was my laptop, with SQL Server installed on it for testing and presentation purposes. I was a bit shocked and decided to figure out what was wrong with my SQL Server. Just like everyone, I searched online and ended up with a few solutions; however, after a while I realized what exactly I was missing. Still, I encourage all of you to read the various steps I took to reach a solution to this problem.

Solution / Fix :

Step 1: I checked whether my hard drive had space available.

If your Hard Drive is full, please empty out any unnecessary stuff. This may not directly fix your problem, but it is a necessary exercise to do before you do Step 2.
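A quick way to check free space per drive from within SSMS is the xp_fixeddrives procedure (undocumented but widely used; treat this as a convenience sketch, not an official API):

-- Free space (in MB) per fixed drive
EXEC master..xp_fixeddrives;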

Step 2: Delete unnecessary objects from your filegroup.

If your primary filegroup (in most cases) or the filegroup which generated this error is restricted to a fixed size, you may get this error. You have two workarounds here.

Step 2a: Delete objects from your filegroup which you do not need. You can move objects to another filegroup if you have multiple filegroups.

Step 2b: Increase the maximum size of the files in your primary filegroup.
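Here is a minimal sketch of both ways to give the filegroup more room; the database name, logical file names, path, and sizes are hypothetical placeholders you must adjust to your system:

-- Raise the maximum size and growth increment of an existing data file
ALTER DATABASE MyDatabase
MODIFY FILE (NAME = MyDatabase_Data, MAXSIZE = 10GB, FILEGROWTH = 256MB);

-- Or add another data file to the PRIMARY filegroup
ALTER DATABASE MyDatabase
ADD FILE (NAME = MyDatabase_Data2,
          FILENAME = 'D:\Data\MyDatabase_Data2.ndf',
          SIZE = 1GB)
TO FILEGROUP [PRIMARY];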

Step 3: Check your SQL Server version.

SQL Server Express 2008 and earlier had a maximum database size limit of 4 GB. SQL Server Express 2008 R2 and onwards have a 10 GB maximum database size.
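A quick sketch to check which edition and version your instance is running:

SELECT SERVERPROPERTY('Edition')        AS Edition,
       SERVERPROPERTY('ProductVersion') AS ProductVersion;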

In my case, I was running the SQL Server Express edition, hence I faced the above error. I moved my database to Standard Edition and everything went fine.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – SSIS Data Flow Troubleshooting – Part 1 – Notes from the Field #019

[Note from Pinal]: This is a new episode of the Notes from the Field series. SQL Server Integration Services (SSIS) is one of the most essential parts of the entire Business Intelligence (BI) story. It is a platform for data integration and workflow applications. The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data.

In this episode of the Notes from the Field series I asked SSIS expert Andy Leonard a very crucial question – how to troubleshoot SSIS data flow? It is the kind of problem everyone thinks they know how to solve, but not everyone can actually resolve. Andy Leonard, a world-renowned expert, explains in this blog post how to troubleshoot SSIS data flow.


SQL Server Integration Services (SSIS) is designed to move data. Several SSIS tasks can move data but the Data Flow Task is arguably used most to accomplish moving data. When bad things happen to good developers, it helps to know how to troubleshoot. Writing about all the ways to troubleshoot an SSIS Data Flow Task will take more than one post. So this is Part 1.

Building a Quick Demo Package

If you are interested in working through some demos with me, create a new SSIS project named DataFlowTests. Add a Data Flow Task to the Control Flow as shown in Figure 1:

You can use any source data you desire, but I am going to use a file containing weather data captured from my weather station here in Farmville, Virginia, beginning in December 2008. You can obtain this data here. Click the Data Flow tab to edit the Data Flow Task. Add a Flat File Connection Manager configured to consume the sensor1-all.csv file containing the December 2008 weather data. Add a Flat File Source adapter and link it to the Flat File Connection Manager. Your Data Flow should appear as shown in Figure 2:

To test the Flat File Source adapter, we need to connect an output to another component. Add a Union All transformation and connect the output of the Flat File Source to it, as shown in Figure 3:

You can now test-execute the SSIS package. You should see results similar to those shown in Figure 4:

This data flow isn’t doing very much. The Flat File Source adapter is coupled to the Flat File Connection Manager. The Flat File Connection Manager reads data from the sensor1-all.csv file you downloaded from andyweather.com. Weather data is read into the data flow from the file via the Flat File Connection Manager and the Flat File Source adapter. The 106 rows of weather data are then passed into the Data Flow Path connecting the Flat File Source adapter output to the first Union All input (Union All Input 1).

Insert a Warning

Let’s create an issue that will raise a warning. Open the Flat File Connection Manager Editor, click on the Advanced page, select the Min T column, and change its DataType property to “four-byte signed integer [DT_I4]” as shown in Figure 5:

When you close the Flat File Connection Manager Editor, a warning icon displays on the Flat File Source adapter. If you hover over the Flat File Source adapter with your mouse, a tooltip will display some of the warning text as shown in Figure 6:

You can see more of the warning if you execute the package and then view the Progress tab (if the package is running in the SSIS Debugger) or the Execution Results tab (if the package has been executed in the Debugger and the Debugger has then been stopped), as shown in Figure 7:

The warning we injected by changing the DataType property of the Min T column in the Flat File Connection Manager does not cause an error. The warning stems from the fact that there is a difference between the data type of the Min T column flowing into the Flat File Source adapter from the Flat File Connection Manager and the data type of the Min T column flowing out of the Flat File Source adapter. You can correct this warning by opening the Flat File Source adapter and clicking the Columns page. When opening the Editor (or Advanced Editor) of the Flat File Source adapter, you will be prompted as shown in Figure 8:

Click the “Yes” button to synchronize the Min T column’s data type in the Flat File Source adapter’s output (the metadata of the “output column”) with the Min T column’s data type supplied from the Flat File Connection Manager (the metadata of the “external column”).

Insert an Error

One way to generate an error in our simple data flow task is to configure the Flat File Connection Manager to perform an impossible data type coercion. Every value in the Min HI column contains “—“. Let’s configure the Min HI column in the Flat File Connection Manager and the Flat File Source adapter as data type Integer.

First, open the Flat File Connection Manager Editor and change the Min HI DataType property to DT_I4 as shown in Figure 9:

When you click the “OK” button, we see the same warning displayed in Figure 6 – the external columns are out of synchronization with the data source columns. Right-click the Flat File Source adapter and click Show Advanced Editor as shown in Figure 10:

When you attempt to open the Advanced Editor, you will be prompted to fix the data type mismatch between the output columns and the external columns. Click the “No” button to proceed, as shown in Figure 11:

When the Advanced Editor for the Flat File Source adapter opens, click the Input and Output Properties tab. Expand the Flat File Source Output node, and then expand the Output Columns node. Click on the Min HI column and change the DataType property to “four-byte signed integer [DT_I4]” as shown in Figure 12:

Click the “OK” button to close the Advanced Editor. Note the Union All now displays a validation error as shown in Figure 13:

What has happened? The tooltip reveals an issue with a metadata mismatch, but the error is truncated. We can view the full error message in the Error List by clicking View > Error List as shown in Figure 14:

Note: The Error List option is missing from the View menu in some versions of SSDT-BI. In those environments, use the keyboard shortcut: hold down Ctrl and press \ followed by E.

The Error List window displays the full text of errors, warnings, and other informational messages, and is shown in Figure 15:

There are a couple of ways to fix this error. The easy way is to delete the Union All and add a new Union All transformation in its place (the hard way is to edit the package’s XML).

Conclusion

We are going to stop here. We have, however, introduced an error in the Flat File Source adapter, and we will begin there in Part 2.

If you want to get started with SSIS with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)