SQL SERVER – Tips from the SQL Joes 2 Pros Development Series – Data Row Space Usage and NULL Storage – Day 15 of 35

August 15, 2011

Answer simple quiz at the end of the blog post and –
Every day one winner from India will get Joes 2 Pros Volume 3.
Every day one winner from United States will get Joes 2 Pros Volume 3.

Data Row Space Usage

Most of a table’s space is occupied by its records. Indexes and other properties use a relatively small amount of known space for the table. Suppose your company – or a hiring manager – shows you the design of the SalesInvoiceDetail table and says, “We expect this table to receive an average of 100,000 records per day during the next two years. How much hard drive space should we purchase to handle this expected growth?” You know how many rows will be received in a day and how many days there are in a year. The unknown in this scenario is the amount of space each row will use. If you calculate the amount of space each row needs, you can then answer this resource planning question for the new table.

100,000 rows/day * 365 days = 36,500,000 rows

36,500,000 rows * __ KB/row = ____ KB of storage space needed

Our calculations will similarly focus on data rows. In order to estimate a row’s space consumption, we must know the amount of space each field’s data will use.

There are three key components which contribute to a field’s space consumption:

· The data type
· Whether the data type is fixed or variable
· Whether the field is nullable

Common Data Types

The names and storage amounts per field for many commonly used data types are shown here.

Exact numeric data types
int (integer)                    4 bytes
bigint                           8 bytes
smallint                        2 bytes
tinyint                          1 byte
money                                     8 bytes
smallmoney                 4 bytes
decimal                        5-17 bytes, depending on the number of digits
numeric                       5-17 bytes, depending on the number of digits
bit                                1-8 bit fields use 1 byte; 9-16 bit fields use 2 bytes; etc.

Row Header

Every row has a 4 byte header. This is a positioning header which keeps track of where the row is placed in the table and which fields the row contains.

Update from Paul Randal (Blog | Twitter): The 4 byte row header has nothing to do with the position of the record or which columns are in the record. It contains two bytes (one for index records) that say what kind of record it is, plus the offest of the null bitmap. The nulll bitmap is always present in data records, regardless of whether the columns are nullable or not – unless the table is comprised solely of sparse columns.

Fixed Data

Fixed length data types always occupy the amount of space allotted to them. For example, an int will always use 4 bytes. A char(3) always takes up 3 bytes, even if the field contains just 1 or 2 characters. Fixed length data is predictable and the easiest type of data for SQL Server to manage. Calculations involving fixed data are straightforward. However, variable length data incurs additional overhead.

Variable Block

Every record containing variable length data includes something called a variable block. The first time you create a field with a variable length data type (e.g., varchar, nvarchar), the variable block is created. This block keeps track of the number of variable length data fields within the record and takes up 2 bytes.

The more variable length fields you have, the bigger the variable block grows. Each variable length field adds another 2 bytes to the block. (These 2 bytes keep track of where the data is positioned within the row.) For example, if you have one varchar field in your table, your variable block would contain 4 bytes for each record in the table. If you have two varchar fields, your variable block would contain 6 bytes (2 bytes per field plus 2 bytes to set this up).

Variable Data

Variable length data types do pretty much what the name implies – they expect the data length to vary from row to row. Fields using variable length data types, such as varchar and nvarchar, are typically name or address fields where you aren’t certain how long the data will be.

The advantage these data types offer is that shorter names or addresses can take up less storage space than a fixed data type. For example, if you have a char(100) field to allow for long addresses, then that field always uses 100 bytes no matter how long the data actually is. However, if you know that most of your addresses will consist of 20 characters, you probably would choose a varchar(100) to use less storage space but retain the flexibility to accept addresses up to 100 characters in length.

Varchar data consumes 1 byte per character. Nvarchar data consumes 2 bytes per character, because it is Unicode. For example a varchar(10) field containing the name “Rick” would consume 4 bytes. If it were an nvarchar, it would use 8 bytes.

This is one difference between fixed and varying length data. With fixed data, you can calculate the storage consumption without needing to look at the actual data. However, to precisely calculate how much storage a row or table is actually utilizing, you would need to examine the length of data in each of the variable data fields (the LEN( ) function is shown in the figure below.

SQL SERVER - Tips from the SQL Joes 2 Pros Development Series - Data Row Space Usage and NULL Storage - Day 15 of 35 j2p_15_1

Now let’s bring in the design interface for the RoomChart table, which we will use to calculate the actual space consumption for rows in the RoomChart table. We will look at the table design along with our LEN( ) query result so that we have the length measurements for the RoomName field.

SQL SERVER - Tips from the SQL Joes 2 Pros Development Series - Data Row Space Usage and NULL Storage - Day 15 of 35 j2p_15_2

Let’s calculate the actual space consumption for Row 1 of the RoomChart table. We will begin with the fixed length data. Each of the four rows contains two fixed length fields. The ID field uses 4 bytes and the field named “Code” uses 3 bytes. Each row also has a 4 byte header. Thus, without looking at the data, we already know each row uses at least 11 bytes.

Header + Fixed Length Fields (ID and Code fields)

[4 bytes + 4 bytes + 3 bytes = 11 bytes]

The final field (RoomName) contains variable length data, so in order to evaluate the space consumption we must: 1) calculate the variable block; and 2) look at the actual variable length data.

Since there is one variable field per row, we must allow 2 bytes for the creation of the variable block. Then we must multiply the number of variable field(s) in the row by 2 bytes.

Variable Block

[2 bytes + (1 field * 2 bytes/field) = 4 bytes]

Actual Data

[Renault-Langsford-Tribute, 25 unicode chars = 50 bytes]

Header 4 | Fixed Data 7 | Variable Block 4 | Variable Data 50 = 65 bytes

Null Data

One important piece of the storage calculation we haven’t yet considered is the null block. Somewhat like variable length data fields, each record in a table containing nullable field(s) uses a little extra storage space.

Null Block

Each record begins with a standard 4 byte row header. Right after the row header, the first item in the data portion of the record is the fixed data. SQL Server stores together all of the columns containing fixed width data.

If your table contains nullable data, then a null block follows the fixed data and occupies the third space in the physical structure of the record. (Without the null block, the usual order prevails – #1 Row Header, #2 Fixed Data, #3 Variable Block, and #4 Variable Data payload.)

SQL SERVER - Tips from the SQL Joes 2 Pros Development Series - Data Row Space Usage and NULL Storage - Day 15 of 35 j2p_15_3

The null block (also called the null bitmap) is created as soon as a nullable field is created in a table. The null block in each row begins as 2 bytes but may grow as you add more fields to your table.

Next you must count the total number of columns in the table. Add an additional byte to the row’s null block for the first field and another byte for every 8^th field. In other words, if a table has between 1-8 fields, then the null block in each row will be 3 bytes. If the table contains 9-16 fields, then the null block will be 4 bytes per row. If the table contains 17-24 fields, then the null block will be 5 bytes per row, and so forth.

These additional bytes contain an indication for each column’s nullability. In other words, whether the column will allow nulls (e.g., Code, RoomName) or won’t allow nulls (e.g., the ID column in the RoomChart table).

Null Block Storage Allocation

It often surprises people to know that it only takes one nullable field to cause every field in the table to take up 1 bit of extra space in the null block. These additional bits keep track of each column whether it does or doesn’t contain a null. The following diagram illustrates two tables: one in which every column is nullable and one containing only a single nullable column.

SQL SERVER - Tips from the SQL Joes 2 Pros Development Series - Data Row Space Usage and NULL Storage - Day 15 of 35 j2p_15_4

Each table contains 10 columns, c1 through c10. Since there are 10 columns and each table contains at least one nullable column, 10 additional bits are needed in the null block. The null block will contain a 2-byte fixed length field and a variable length bitmap of 1 bit per column. Memory is allocated in 8-bit bytes and 10 bits (c1 – c10) crosses into the next byte. The variable length bitmap takes up 2 more bytes, bringing the size to 4 bytes for each tables’ null block.

Note that in table t1, an INSERT statement placed the integer values 1 through 10 in columns c1 through c10, respectively. Since none of these values is null, the null block bitmap contains 0’s for columns c1 through c10. On the right half of this figure, you see these bits are located in byte 1 for columns c1 through c8, and byte 2 for columns c9 and c10. The additional 6 bits of byte 2 in the null block bitmap are not used by any columns. In table t2 we have one null value in the last column. An INSERT statement placed the integer values 1 through 9 in columns c1 through c9, and null in column c10. Columns c1 through c9 are not null, so the null block bitmap contains 0’s (zeros) for those columns. Column c10 does contain null, so its bit in the null block bitmap reflects a 1.

In a table that contains at least one nullable column, each row will contain a null block whose length depends on the number of columns in the table. If a row of such a table contains a null value for a nullable column, its bit in that row’s null block bitmap will be set to 1. Columns which are not null have their bits in the row’s null block bitmap set to 0.

SQL SERVER - Tips from the SQL Joes 2 Pros Development Series - Data Row Space Usage and NULL Storage - Day 15 of 35 j2p_15_5

Recall that we calculated the space consumption for Row 1 as 65 bytes.

Actual Data

[Renault-Langsford-Tribute, 25 unicode chars = 50 bytes]

Header 4 | Fixed Data 7 | Variable Block 4 | Variable Data 50 = 65 bytes

We know a null block is needed, because there are nullable fields in the RoomChart table. There are two nullable fields, Code and RoomName.

Creating the null block uses 2 bytes. Then you must count the total number of columns in the table. Since this table contains 3 columns, 1 byte is added to the null block.

Null Block

[2 bytes + 1 byte (only 3 fields)] = 3 bytes]

So we must add 3 bytes to our original storage calculation for Row 1:

Header 4 | Fixed Data 7 | Null Block 3 | Variable Block 4 | Variable Data 50 = 68 bytes

Thus, the full amount of space used by Row 1 of the RoomChart table is 68 bytes. Now let’s recalculate the second row’s space usage including the null block. Recall we calculated the space consumption for Row 2 as 53 bytes.

Actual Data

[Quinault-Experience, 19 unicode chars = 38 bytes]

Header 4 | Fixed Data 7 | Variable Block 4 | Variable Data 38 = 53 bytes

Since the null block for each record in the table will be the same size, we know the null block for Row 2 will be the same as Row 1: 3 bytes.

Null Block

[2 bytes + 1 byte (only 3 fields)] = 3 bytes]

So we must add 3 bytes to our original storage calculation for Row 2:

Header 4 | Fixed Data 7 | Null Block 3 | Variable Block 4 | Variable Data 38 = 56 bytes

Note: If you want to setup the sample JProCo database on your system you can watch this video. For this post you will want to run the SQLArchChapter4.0Setup.sql script from Volume 3.

Question 15

You have three variable length data fields. What are the rules that go into the calculation of how large the variable block will be (Choose two)?

You will allocate 2 bytes to the creation of the variable block
You will allocate 3 bytes to the creation of the variable block
You will allocate 2 more bytes for each of the three variable fields
You will allocate 1 byte for every eight columns in the table.

Rules:

Please leave your answer in comment section below with correct option, explanation and your country of resident.
Every day one winner will be announced from United States.
Every day one winner will be announced from India.
A valid answer must contain country of residence of answerer.
Please check my facebook page for winners name and correct answer.
Winner from United States will get Joes 2 Pros Volume 3.
Winner from India will get Joes 2 Pros Volume 3.
The contest is open till next blog post shows up at which is next day GTM+2.5.

Reference: Pinal Dave (https://blog.sqlauthority.com)

Joes 2 Pros, SQL Scripts

SQL SERVER – Tips from the SQL Joes 2 Pros Development Series – Output Clause in Simple Examples – Day 14 of 35

SQLAuthority News – Pluralsight Giving Away Free Subscription to Quiz Participants

57 Comments. Leave new

JOHN
August 16, 2011 8:08 am
Correct answers are 1 and 3
You will allocate 2 bytes to the creation of the variable block
You will allocate 2 more bytes for each of the three variable fields
USA
Reply
Gopalakrishnan Arthanarisamy
August 16, 2011 8:11 am
Correct answers are # 1 and # 3.
1.You will allocate 2 bytes to the creation of the variable block
3.You will allocate 2 more bytes for each of the three variable fields
Because, “Variable Block” requires 2 bytes for the creation and another 2 more bytes for each of the three variable fields to set this up.
Formula
[2 bytes (for creation) + (1 fields * 2 bytes/field) = 4 bytes]
As the calculation is for three variable length data fields, the total number of bytes would be
[2 bytes (for creation) + (3 fields * 2 bytes/field) = 8 bytes] is the answer.
Gopalakrishnan Arthanarisamy
Unisys, Bangalore, India
Reply
gvprabu
August 16, 2011 10:40 am
1) and 3) are the Correct Answers.
— GVPRABU || BANGALORE || INDIA
Reply
vishal shah
August 16, 2011 11:18 am
Correct answers are:
1.You will allocate 2 bytes to the creation of the variable block
3.You will allocate 2 more bytes for each of the three variable fields
Country: India
Reply
Shilpa Sharma
August 16, 2011 12:47 pm
Correct option is 1 and 3.
Shilpa
India
Reply
Rajesh Garg
August 16, 2011 12:51 pm
Option 1 and 3 are correct
1.You will allocate 2 bytes to the creation of the variable block
3.You will allocate 2 more bytes for each of the three variable fields
Rajesh Garg
India
Reply
Madhivanan
August 16, 2011 1:29 pm
Correct answers are 1 and 3
1.You will allocate 2 bytes to the creation of the variable block.
3.You will allocate 2 more bytes for each of the three variable fields
Country : India
Reply
Krunal Doshi
August 16, 2011 1:46 pm
Correct options are 1 and 3.
INDIA
Reply
Rajesh Mohanrangan
August 16, 2011 5:13 pm
Correct answers are 1 and 3
1.You will allocate 2 bytes to the creation of the variable block
3.You will allocate 2 more bytes for each of the three variable fields
From India
Reply
Eric
August 16, 2011 6:29 pm
The best answers are options 1 and 3.
Eric
USA
Reply
Uday Bhoopalam
August 17, 2011 12:18 am
Correct answers are option 1 & 3
1.You will allocate 2 bytes to the creation of the variable block
3.You will allocate 2 more bytes for each of the three variable fields
Reply
Nikhil Mahajan
August 17, 2011 8:43 am
the correct option is –1) and 3)
1) You will allocate 2 bytes to the creation of the variable block
3) You will allocate 2 more bytes for each of the three variable fields
The variable block is takes 2 bytes first time. Each of the variable fields will add 2 bytes each,
india
Reply
Somnath
August 17, 2011 10:51 am
The correct answers are Options-1 and 3
1).You will allocate 2 bytes to the creation of the variable block
3).You will allocate 2 more bytes for each of the three variable fields
Somnath Desai
India
Reply
John
August 31, 2011 6:38 am
I would like to know how to insert a Name with Apostrophe in SQL 2005. I did try this
INSERT INTO Cust
(R_CustName)
VALUES ( ‘My”s”Name’)
when you insert the result is My’s’Name
i don’t need the apostrophe after s. how can i remove it while inserting..
Any idea guys..
Reply
Kevmeister
March 20, 2014 6:04 am
It would be nice to see this article extended to cover BLOB data – varchar(max) and varbinary(max) and how that storage is computed.
Reply