SQL SERVER – Tips from the SQL Joes 2 Pros Development Series – Data Row Space Usage and NULL Storage – Day 15 of 35

August 15, 2011

Answer the simple quiz at the end of the blog post and:
Every day one winner from India will get Joes 2 Pros Volume 3.
Every day one winner from United States will get Joes 2 Pros Volume 3.

Data Row Space Usage

Most of a table’s space is occupied by its records. Indexes and other properties use a relatively small amount of known space for the table. Suppose your company – or a hiring manager – shows you the design of the SalesInvoiceDetail table and says, “We expect this table to receive an average of 100,000 records per day during the next two years. How much hard drive space should we purchase to handle this expected growth?” You know how many rows will be received in a day and how many days there are in a year. The unknown in this scenario is the amount of space each row will use. If you calculate the amount of space each row needs, you can then answer this resource planning question for the new table.

100,000 rows/day * 365 days/year * 2 years = 73,000,000 rows

73,000,000 rows * __ KB/row = ____ KB of storage space needed
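The projection above is simple multiplication. As a minimal sketch, assuming every row is the same size and ignoring page overhead, free space and indexes, the estimate could be scripted like this. The function name `projected_storage` is hypothetical, and the 68 bytes/row is only a placeholder borrowed from the RoomChart Row 1 result later in this post:

```python
from typing import Tuple


def projected_storage(rows_per_day: int, days: int, bytes_per_row: int) -> Tuple[int, float]:
    """Return (total_rows, total_kb) for a linear growth estimate.

    Assumes a constant row size; page headers, free space and
    indexes are ignored.
    """
    total_rows = rows_per_day * days
    total_kb = total_rows * bytes_per_row / 1024  # 1 KB = 1024 bytes
    return total_rows, total_kb


# Two years at 100,000 rows/day, using 68 bytes/row as a placeholder.
rows, kb = projected_storage(100_000, 365 * 2, 68)
print(rows, round(kb))  # 73000000 4847656
```

Once the per-row size is known from the calculations below, only the `bytes_per_row` argument changes.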

Our calculations will therefore focus on data rows. In order to estimate a row’s space consumption, we must know the amount of space each field’s data will use.

There are three key components which contribute to a field’s space consumption:

· The data type
· Whether the data type is fixed or variable
· Whether the field is nullable

Common Data Types

The names and storage amounts per field for many commonly used data types are shown here.

Exact numeric data types
int (integer)    4 bytes
bigint           8 bytes
smallint         2 bytes
tinyint          1 byte
money            8 bytes
smallmoney       4 bytes
decimal          5-17 bytes, depending on the number of digits
numeric          5-17 bytes, depending on the number of digits
bit              1-8 bit fields use 1 byte; 9-16 bit fields use 2 bytes; etc.
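Two rows in the table depend on more than the type name: bit packing and decimal/numeric. They can be expressed as small functions. This is a sketch: the decimal precision breakpoints (1-9, 10-19, 20-28 and 29-38 digits) come from the SQL Server documentation for decimal and numeric rather than from this post, and the names `bit_bytes` and `decimal_bytes` are hypothetical.

```python
import math

# Fixed storage sizes from the table above (bytes per value).
FIXED_SIZES = {
    "int": 4,
    "bigint": 8,
    "smallint": 2,
    "tinyint": 1,
    "money": 8,
    "smallmoney": 4,
}


def bit_bytes(bit_columns: int) -> int:
    """Bytes used by bit fields in a row: up to 8 bit fields share one byte."""
    return math.ceil(bit_columns / 8)


def decimal_bytes(precision: int) -> int:
    """Bytes used by decimal/numeric, based on precision (total digits, 1-38)."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    if precision <= 9:
        return 5
    if precision <= 19:
        return 9
    if precision <= 28:
        return 13
    return 17
```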

Row Header

Every row has a 4 byte header. This is a positioning header which keeps track of where the row is placed in the table and which fields the row contains.

Update from Paul Randal (Blog | Twitter): The 4 byte row header has nothing to do with the position of the record or which columns are in the record. It contains two bytes (one for index records) that say what kind of record it is, plus the offset of the null bitmap. The null bitmap is always present in data records, regardless of whether the columns are nullable or not, unless the table is comprised solely of sparse columns.
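To make the update concrete, here is a minimal sketch that splits a raw 4-byte header into the pieces Paul Randal describes: two status bytes, then a 2-byte offset to the null bitmap. It assumes little-endian byte order and does not decode the status bits; `parse_row_header` is a hypothetical helper, not a SQL Server API, and the example header bytes are made up.

```python
import struct


def parse_row_header(header: bytes) -> dict:
    """Split a 4-byte data-record header into its parts.

    Assumed layout: two status bytes describing the record, followed by
    a 2-byte little-endian offset to the null bitmap. The status bits
    are returned raw; interpreting them is out of scope here.
    """
    if len(header) != 4:
        raise ValueError("a row header is exactly 4 bytes")
    status_a, status_b, null_bitmap_offset = struct.unpack("<BBH", header)
    return {
        "status_a": status_a,
        "status_b": status_b,
        "null_bitmap_offset": null_bitmap_offset,
    }


# Hypothetical header: status bytes left as 0; the null bitmap starts right
# after the 4-byte header plus 7 bytes of fixed data, i.e. at offset 11.
example = struct.pack("<BBH", 0, 0, 4 + 7)
print(parse_row_header(example)["null_bitmap_offset"])  # 11
```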

Fixed Data

Fixed length data types always occupy the amount of space allotted to them. For example, an int will always use 4 bytes. A char(3) always takes up 3 bytes, even if the field contains just 1 or 2 characters. Fixed length data is predictable and the easiest type of data for SQL Server to manage. Calculations involving fixed data are straightforward. However, variable length data incurs additional overhead.
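Because fixed-length sizes never depend on the data, the fixed portion of a row can be totaled from the column definitions alone. A minimal sketch covering only the types discussed in this post; it assumes RoomChart’s Code column is char(3), which matches its 3-byte size, and the function names are hypothetical:

```python
import re
from typing import List

# Sizes, in bytes, of the fixed-length types mentioned in this post.
FIXED_TYPE_SIZES = {"int": 4, "bigint": 8, "smallint": 2, "tinyint": 1,
                    "money": 8, "smallmoney": 4}


def fixed_column_bytes(data_type: str) -> int:
    """Bytes a fixed-length column always uses, whatever it contains."""
    data_type = data_type.strip().lower()
    match = re.fullmatch(r"char\((\d+)\)", data_type)
    if match:
        # char(n) always occupies n bytes, even for shorter values.
        return int(match.group(1))
    return FIXED_TYPE_SIZES[data_type]


def fixed_data_bytes(column_types: List[str]) -> int:
    """Total fixed-length data for one row."""
    return sum(fixed_column_bytes(t) for t in column_types)


# RoomChart: ID int + Code char(3)
print(fixed_data_bytes(["int", "char(3)"]))  # 7
```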

Variable Block

Every record containing variable length data includes something called a variable block. The first time you create a field with a variable length data type (e.g., varchar, nvarchar), the variable block is created. This block keeps track of the number of variable length data fields within the record and takes up 2 bytes.

The more variable length fields you have, the bigger the variable block grows. Each variable length field adds another 2 bytes to the block. (These 2 bytes keep track of where the data is positioned within the row.) For example, if you have one varchar field in your table, your variable block would contain 4 bytes for each record in the table. If you have two varchar fields, your variable block would contain 6 bytes (2 bytes per field plus 2 bytes to set this up).
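The variable block rule fits in a few lines. A sketch of the arithmetic described above (the function name `variable_block_bytes` is hypothetical):

```python
def variable_block_bytes(variable_columns: int) -> int:
    """Size of the variable block for one record.

    2 bytes to set up the block plus 2 bytes per variable-length
    column; a record with no variable-length columns has no block.
    """
    if variable_columns < 0:
        raise ValueError("column count cannot be negative")
    if variable_columns == 0:
        return 0
    return 2 + 2 * variable_columns


for n in (1, 2, 3):
    print(n, variable_block_bytes(n))  # prints 1 4, then 2 6, then 3 8
```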

Variable Data

Variable length data types do pretty much what the name implies – they expect the data length to vary from row to row. Fields using variable length data types, such as varchar and nvarchar, are typically name or address fields where you aren’t certain how long the data will be.

The advantage these data types offer is that shorter names or addresses can take up less storage space than a fixed data type. For example, if you have a char(100) field to allow for long addresses, then that field always uses 100 bytes no matter how long the data actually is. However, if you know that most of your addresses will consist of 20 characters, you probably would choose a varchar(100) to use less storage space but retain the flexibility to accept addresses up to 100 characters in length.

Varchar data consumes 1 byte per character. Nvarchar data consumes 2 bytes per character, because it is Unicode. For example a varchar(10) field containing the name “Rick” would consume 4 bytes. If it were an nvarchar, it would use 8 bytes.
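One way to check the per-character rule is to encode a string the way each type would store it. This is a sketch under stated assumptions: Latin-1 stands in for a single-byte varchar code page, and UTF-16 little-endian (no byte-order mark) stands in for nvarchar’s 2-byte Unicode storage. The helper names are hypothetical.

```python
def varchar_bytes(value: str) -> int:
    # Assumes a single-byte code page; Latin-1 is used as a stand-in.
    return len(value.encode("latin-1"))


def nvarchar_bytes(value: str) -> int:
    # 2 bytes per character for ordinary (Basic Multilingual Plane)
    # characters: UTF-16 little-endian without a byte-order mark.
    return len(value.encode("utf-16-le"))


print(varchar_bytes("Rick"), nvarchar_bytes("Rick"))  # 4 8
```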

This is one difference between fixed and variable length data. With fixed data, you can calculate the storage consumption without looking at the actual data. However, to calculate precisely how much storage a row or table is actually using, you need to examine the length of the data in each variable length field (the LEN( ) function is shown in the figure below).

[Figure j2p_15_1: LEN( ) query results for the RoomName field of the RoomChart table]

Now let’s bring in the design interface for the RoomChart table, which we will use to calculate the actual space consumption for rows in the RoomChart table. We will look at the table design along with our LEN( ) query result so that we have the length measurements for the RoomName field.

[Figure j2p_15_2: RoomChart table design alongside the LEN( ) query results]

Let’s calculate the actual space consumption for Row 1 of the RoomChart table. We will begin with the fixed length data. Each of the four rows contains two fixed length fields. The ID field uses 4 bytes and the field named “Code” uses 3 bytes. Each row also has a 4 byte header. Thus, without looking at the data, we already know each row uses at least 11 bytes.

Header + Fixed Length Fields (ID and Code fields)

[4 bytes + 4 bytes + 3 bytes = 11 bytes]

The final field (RoomName) contains variable length data, so in order to evaluate the space consumption we must: 1) calculate the variable block; and 2) look at the actual variable length data.

Since each row has one variable length field, we must allow 2 bytes for the creation of the variable block. Then we multiply the number of variable length fields in the row by 2 bytes.

Variable Block

[2 bytes + (1 field * 2 bytes/field) = 4 bytes]

Actual Data

[Renault-Langsford-Tribute, 25 Unicode chars = 50 bytes]

Header 4 | Fixed Data 7 | Variable Block 4 | Variable Data 50 = 65 bytes
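The Row 1 total above can be reproduced with a short sketch that adds the same four components. The function name is hypothetical, and the nvarchar byte count is taken as 2 bytes per character, as in the post:

```python
from typing import List


def row_bytes_without_null_block(fixed_bytes: int, variable_lengths: List[int]) -> int:
    """Header + fixed data + variable block + variable data (no null block)."""
    header = 4
    var_block = 2 + 2 * len(variable_lengths) if variable_lengths else 0
    return header + fixed_bytes + var_block + sum(variable_lengths)


# Row 1: 7 bytes of fixed data, one nvarchar value of 25 characters.
room_name_bytes = 2 * len("Renault-Langsford-Tribute")  # 50
print(row_bytes_without_null_block(7, [room_name_bytes]))  # 65
```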

Null Data

One important piece of the storage calculation we haven’t yet considered is the null block. Somewhat like variable length data fields, each record in a table containing nullable field(s) uses a little extra storage space.

Null Block

Each record begins with a standard 4 byte row header. Right after the row header, the first item in the data portion of the record is the fixed data. SQL Server stores together all of the columns containing fixed width data.

If your table contains nullable data, then a null block follows the fixed data and occupies the third space in the physical structure of the record. (Without the null block, the usual order prevails – #1 Row Header, #2 Fixed Data, #3 Variable Block, and #4 Variable Data payload.)

[Figure j2p_15_3: physical order of the parts of a record]

The null block (also called the null bitmap) is created as soon as a nullable field is created in a table. The null block in each row begins as 2 bytes but may grow as you add more fields to your table.

Next, count the total number of columns in the table. Add 1 byte to the row’s null block for the first field and another byte for each additional group of 8 fields. In other words, if a table has 1-8 fields, the null block in each row will be 3 bytes. If the table contains 9-16 fields, the null block will be 4 bytes per row. If the table contains 17-24 fields, the null block will be 5 bytes per row, and so forth.

These additional bytes hold one bit per column, recording whether that column’s value in the row is null. Every column gets a bit, whether it allows nulls (e.g., Code, RoomName) or not (e.g., the ID column in the RoomChart table).
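The sizing rule is 2 bytes plus one byte per group of up to 8 columns, i.e. 2 + ceil(columns / 8). A sketch (the function name `null_block_bytes` is hypothetical):

```python
import math


def null_block_bytes(total_columns: int) -> int:
    """2 bytes to create the null block, plus 1 byte per group of 8 columns."""
    if total_columns < 1:
        raise ValueError("a table has at least one column")
    return 2 + math.ceil(total_columns / 8)


for cols in (3, 8, 9, 16, 17):
    print(cols, null_block_bytes(cols))
```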

Null Block Storage Allocation

It often surprises people that a single nullable field causes every field in the table to take up 1 bit of extra space in the null block. These bits record, for each column, whether or not it contains a null. The following diagram illustrates two tables: one in which every column is nullable and one containing only a single nullable column.

[Figure j2p_15_4: null block bitmaps for tables t1 and t2]

Each table contains 10 columns, c1 through c10. Since there are 10 columns and each table contains at least one nullable column, 10 additional bits are needed in the null block. The null block will contain a 2-byte fixed length field and a variable length bitmap of 1 bit per column. Storage is allocated in 8-bit bytes, so 10 bits (c1 through c10) spill into a second byte. The bitmap therefore takes up 2 bytes, bringing each table’s null block to 4 bytes.

Note that in table t1, an INSERT statement placed the integer values 1 through 10 in columns c1 through c10, respectively. Since none of these values is null, the null block bitmap contains 0’s for columns c1 through c10. On the right half of this figure, you see these bits are located in byte 1 for columns c1 through c8, and byte 2 for columns c9 and c10. The additional 6 bits of byte 2 in the null block bitmap are not used by any columns. In table t2 we have one null value in the last column. An INSERT statement placed the integer values 1 through 9 in columns c1 through c9, and null in column c10. Columns c1 through c9 are not null, so the null block bitmap contains 0’s (zeros) for those columns. Column c10 does contain null, so its bit in the null block bitmap reflects a 1.

In a table that contains at least one nullable column, each row will contain a null block whose length depends on the number of columns in the table. If a row of such a table contains a null value for a nullable column, its bit in that row’s null block bitmap will be set to 1. Columns which are not null have their bits in the row’s null block bitmap set to 0.
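The t1 and t2 rows can be mirrored with a small sketch that builds just the bitmap portion (not the 2-byte field in front of it). It assumes column 1 maps to the least significant bit of the first byte; `null_bitmap` is a hypothetical helper, and Python’s `None` stands in for SQL NULL.

```python
from typing import List, Optional


def null_bitmap(values: List[Optional[int]]) -> bytes:
    """Bit i is 1 when column i+1 is NULL (None); unused high bits stay 0.

    Assumes column 1 maps to the least significant bit of the first byte.
    """
    bitmap = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value is None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


t1_row = list(range(1, 11))           # c1..c10 = 1..10, no nulls
t2_row = list(range(1, 10)) + [None]  # c10 is null
print(null_bitmap(t1_row).hex(), null_bitmap(t2_row).hex())  # 0000 0002
```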

[Figure j2p_15_5]

Recall that we calculated the space consumption for Row 1 as 65 bytes.

Actual Data

[Renault-Langsford-Tribute, 25 Unicode chars = 50 bytes]

Header 4 | Fixed Data 7 | Variable Block 4 | Variable Data 50 = 65 bytes

We know a null block is needed, because there are nullable fields in the RoomChart table. There are two nullable fields, Code and RoomName.

Creating the null block uses 2 bytes. Then you must count the total number of columns in the table. Since this table contains 3 columns, 1 byte is added to the null block.

Null Block

[2 bytes + 1 byte (only 3 fields) = 3 bytes]

So we must add 3 bytes to our original storage calculation for Row 1:

Header 4 | Fixed Data 7 | Null Block 3 | Variable Block 4 | Variable Data 50 = 68 bytes

Thus, the full amount of space used by Row 1 of the RoomChart table is 68 bytes. Now let’s recalculate the second row’s space usage including the null block. Without the null block, Row 2 uses 53 bytes.

Actual Data

[Quinault-Experience, 19 Unicode chars = 38 bytes]

Header 4 | Fixed Data 7 | Variable Block 4 | Variable Data 38 = 53 bytes

Since the null block for each record in the table will be the same size, we know the null block for Row 2 will be the same as Row 1: 3 bytes.

Null Block

[2 bytes + 1 byte (only 3 fields) = 3 bytes]

So we must add 3 bytes to our original storage calculation for Row 2:

Header 4 | Fixed Data 7 | Null Block 3 | Variable Block 4 | Variable Data 38 = 56 bytes
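Putting all five components together in the order used above gives a small row-size calculator. This sketch follows this post’s model, in which the null block appears only when the table has a nullable column (see Paul Randal’s note above about the null bitmap); the function name `row_bytes` and its `has_nullable` flag are hypothetical.

```python
import math
from typing import List


def row_bytes(total_columns: int, fixed_bytes: int,
              variable_lengths: List[int], has_nullable: bool = True) -> int:
    """Row size in the order used above: header, fixed data, null block,
    variable block, variable data."""
    header = 4
    null_block = 2 + math.ceil(total_columns / 8) if has_nullable else 0
    var_block = 2 + 2 * len(variable_lengths) if variable_lengths else 0
    return header + fixed_bytes + null_block + var_block + sum(variable_lengths)


# RoomChart has 3 columns and 7 bytes of fixed data (ID int, Code char(3)).
print(row_bytes(3, 7, [2 * len("Renault-Langsford-Tribute")]))  # 68
print(row_bytes(3, 7, [2 * len("Quinault-Experience")]))        # 56
```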

Note: If you want to set up the sample JProCo database on your system, you can watch this video. For this post, you will want to run the SQLArchChapter4.0Setup.sql script from Volume 3.

Question 15

You have three variable length data fields. What are the rules that go into the calculation of how large the variable block will be (Choose two)?

You will allocate 2 bytes to the creation of the variable block
You will allocate 3 bytes to the creation of the variable block
You will allocate 2 more bytes for each of the three variable fields
You will allocate 1 byte for every eight columns in the table.

Rules:

Please leave your answer in the comment section below with the correct option, an explanation, and your country of residence.
Every day one winner will be announced from United States.
Every day one winner will be announced from India.
A valid answer must contain country of residence of answerer.
Please check my Facebook page for the winners’ names and the correct answer.
Winner from United States will get Joes 2 Pros Volume 3.
Winner from India will get Joes 2 Pros Volume 3.
The contest is open until the next blog post goes up, which is the next day at GMT+2.5.

Reference: Pinal Dave (https://blog.sqlauthority.com)


57 Comments

Bill Pepping
August 15, 2011 5:51 pm
Hi Pinal,
Challenge:
Question 15
You have three variable length data fields. What are the rules that go into the calculation of how large the
variable block will be (Choose two)?
1.You will allocate 2 bytes to the creation of the variable block
2.You will allocate 3 bytes to the creation of the variable block
3.You will allocate 2 more bytes for each of the three variable fields
4.You will allocate 1 byte for every eight columns in the table.
Correct Answer:
The correct answer is #1 and #3.
Explanation:
If you have any variable length fields, a Variable Block is created, consisting of 2 bytes initially. For each variable length field, 2 additional bytes are used. In this example, 3 variable length fields will result in a Variable Block of 8 bytes.
#2 is incorrect. You need to allocate 2 bytes for the creation of the Variable Block, not 3.
#4 is referring to the algorithm for calculating how many bytes are needed in the Null Block, besides the initial 2 bytes.
Country:
United States
Thanks for the knowledge!
Regards,
Bill Pepping
Don
August 15, 2011 5:53 pm
To obtain the correct answer, the calculation must take into account both 1 and 3:
1.You will allocate 2 bytes to the creation of the variable block
3.You will allocate 2 more bytes for each of the three variable fields
Country: United States
Deb
August 15, 2011 6:33 pm
Answer is #1 & #3
1.You will allocate 2 bytes to the creation of the variable block
3.You will allocate 2 more bytes for each of the three variable fields
per your article that states “Since there is one variable field per row, we must allow 2 bytes for the creation of the variable block. Then we must multiply the number of variable field(s) in the row by 2 bytes.”
Deb- USA
Partha Pratim Dinda
August 15, 2011 6:39 pm
Ans is 1 and 3
1) You will allocate 2 bytes to the creation of the variable block
3) You will allocate 2 more bytes for each of the three variable fields
Creation of the variable block will take up 2 bytes, and every variable length field will subsequently add 2 bytes to this amount.
2 bytes + (1 field * 2 bytes/field) = 4 bytes
2 bytes + (3 variable-length-fields * 2 bytes per variable-length-field) = 8 bytes
Partha
India
Diljeet Kumari
August 15, 2011 6:45 pm
The Correct Options for the question “You have three variable length data fields. What are the rules that go into the calculation of how large the variable block will be” is
Option 1 ) and Option 3)
1) You will allocate 2 bytes to the creation of the variable block
AND
3) You will allocate 2 more bytes for each of the three variable fields
Why 1 and 3 are correct
If we have any variable length fields, a Variable Block is created, starting at 2 bytes. Then every variable length field uses 2 additional bytes. In this question, 3 variable length fields will result in a Variable Block of 8 bytes.
why Others are wrong
Option 2) is invalid because we require to allocate 2 bytes for the creation of the Variable Block but not 3.
Option 4) This Option is referring to the algorithm for calculating how many bytes are needed in the Null Block along with the initial 2 bytes.
Diljeet Kumari
Country : India
manasdash
August 15, 2011 6:45 pm
1 & 3 are answers
ramdas
August 15, 2011 6:50 pm
Option 1 and Option 3 are correct.
2 bytes for variable block.
2 more bytes for each of the 3 variable fields.
Sean Wilsen
August 15, 2011 7:29 pm
Correct answers are #1 and #3.
USA
Jagdish Prajapati
August 15, 2011 7:35 pm
correct ans is 1 and 3.
INDIA
Sale A. A.
August 15, 2011 9:32 pm
Option 1 and 3 are correct
1.You will allocate 2 bytes to the creation of the variable block
3.You will allocate 2 more bytes for each of the three variable fields
These two options will provide the desired result.
(Sale, Nigeria)
Peter Spencer
August 15, 2011 10:28 pm
1 and 3.
The variable block will be a total of 8 bytes, for three variable length columns. 2 bytes to create the block, plus 2 bytes for each variable length column.
DiveSh Singhvi
August 15, 2011 10:40 pm
The correct answers are Options-1 and 3
1).You will allocate 2 bytes to the creation of the variable block.
3).You will allocate 2 more bytes for each of the three variable fields.
Divesh
INDIA
kkmjssate
August 15, 2011 10:46 pm
The correct answers are #1 and #3
You will allocate 2 bytes to the creation of the variable block
You will allocate 2 more bytes for each of the three variable fields
Gordon Kane
August 15, 2011 10:46 pm
1.You will allocate 2 bytes to the creation of the variable block
3.You will allocate 2 more bytes for each of the three variable fields
Gordon Kane
Allen TX
USA
Chetan
August 15, 2011 10:51 pm
1. You will allocate 2 bytes to the creation of the variable block.
3 .You will allocate 2 more bytes for each of the three variable fields
Chetan – USA
kkmjssate
August 15, 2011 10:51 pm
The correct answers are #1 and #3
You will allocate 2 bytes to the creation of the variable block
You will allocate 2 more bytes for each of the three variable fields
but pinal how we see practically clearly
kkmishra
India
Basavaraj
August 15, 2011 11:17 pm
Correct Answer is: 1 and 3
Thanks
Basavaraj
India
dilipkumarjena
August 15, 2011 11:29 pm
Hi Pinal Sir,
The Correct Answer for the above question is Options 1 and Option 3.
Option 1) You will allocate 2 bytes to the creation of the variable block, and
Option 3) You will allocate 2 more bytes for each of the three variable fields
Explanation Why Option #1 and #3 combined are correct
When there are variable length fields, a Variable Block is created, consisting of 2 bytes at the start.
Each variable length field then adds two more bytes. In this question, 3 variable length fields will result in a Variable Block of 8 bytes.
why Others are wrong :
Option 2) Here we need to allocate 2 bytes for the creation of the Variable Block but not 3 hence not correct.
Option 4) This query is referring for the algorithm for calculating how many
bytes are needed in the Null Block along with the initial 2 bytes hence Not valid.
A very happy Independence Day to you !!!
DILIP KUMAR JENA
Country : INDIA
David Brust
August 16, 2011 12:39 am
The answer is 1 and 3.
The first time you create a field with a variable length data type (e.g., varchar, nvarchar), the variable block is created.
Each variable length field adds another 2 bytes to the block.
I love you Pinal Dave.
David
USA
Damodaran Venkatesan
August 16, 2011 2:00 am
Answer: 1 and 3.
If there are any variable length data type in the record, it will have 2 byte variable block for the creation of the variable block.
For each variable data column in the record, it will take additional 2 bytes in the record.
Damodaran Venkatesan
Country: USA.