SQL SERVER – Tips from the SQL Joes 2 Pros Development Series – Data Row Space Usage and NULL Storage – Day 15 of 35

Answer simple quiz at the end of the blog post and –
Every day one winner from India will get Joes 2 Pros Volume 3.
Every day one winner from United States will get Joes 2 Pros Volume 3.

Data Row Space Usage

Most of a table’s space is occupied by its records. Indexes and other properties use a relatively small amount of known space for the table.  Suppose your company – or a hiring manager – shows you the design of the SalesInvoiceDetail table and says, “We expect this table to receive an average of 100,000 records per day during the next two years. How much hard drive space should we purchase to handle this expected growth?” You know how many rows will be received in a day and how many days there are in a year.  The unknown in this scenario is the amount of space each row will use. If you calculate the amount of space each row needs, you can then answer this resource planning question for the new table.

100,000 rows/day * 365 days    =  36,500,000 rows

36,500,000 rows   * __ KB/row = ____ KB of storage space needed

Our calculations will similarly focus on data rows. In order to estimate a row’s space consumption, we must know the amount of space each field’s data will use.

There are three key components which contribute to a field’s space consumption:

· The data type
· Whether the data type is fixed or variable
· Whether the field is nullable

Common Data Types

The names and storage amounts per field for many commonly used data types are shown here.

Exact numeric data types
int (integer)                    4 bytes
bigint                           8 bytes
smallint                        2 bytes
tinyint                          1 byte
money                                     8 bytes
smallmoney                 4 bytes
decimal                        5-17 bytes, depending on the number of digits
numeric                       5-17 bytes, depending on the number of digits
bit                                1-8 bit fields use 1 byte; 9-16 bit fields use 2 bytes; etc.

Row Header

Every row has a 4 byte header. This is a positioning header which keeps track of where the row is placed in the table and which fields the row contains.

Update from Paul Randal (Blog | Twitter): The 4 byte row header has nothing to do with the position of the record or which columns are in the record. It contains two bytes (one for index records) that say what kind of record it is, plus the offest of the null bitmap. The nulll bitmap is always present in data records, regardless of whether the columns are nullable or not – unless the table is comprised solely of sparse columns.

Fixed Data

Fixed length data types always occupy the amount of space allotted to them. For example, an int will always use 4 bytes. A char(3) always takes up 3 bytes, even if the field contains just 1 or 2 characters. Fixed length data is predictable and the easiest type of data for SQL Server to manage. Calculations involving fixed data are straightforward.  However, variable length data incurs additional overhead.

Variable Block

Every record containing variable length data includes something called a variable block. The first time you create a field with a variable length data type (e.g., varchar, nvarchar), the variable block is created. This block keeps track of the number of variable length data fields within the record and takes up 2 bytes.

The more variable length fields you have, the bigger the variable block grows. Each variable length field adds another 2 bytes to the block. (These 2 bytes keep track of where the data is positioned within the row.) For example, if you have one varchar field in your table, your variable block would contain 4 bytes for each record in the table. If you have two varchar fields, your variable block would contain 6 bytes (2 bytes per field plus 2 bytes to set this up).

Variable Data

Variable length data types do pretty much what the name implies – they expect the data length to vary from row to row. Fields using variable length data types, such as varchar and nvarchar, are typically name or address fields where you aren’t certain how long the data will be.

The advantage these data types offer is that shorter names or addresses can take up less storage space than a fixed data type. For example, if you have a char(100) field to allow for long addresses, then that field always uses 100 bytes no matter how long the data actually is. However, if you know that most of your addresses will consist of 20 characters, you probably would choose a varchar(100) to use less storage space but retain the flexibility to accept addresses up to 100 characters in length.

Varchar data consumes 1 byte per character. Nvarchar data consumes 2 bytes per character, because it is Unicode. For example a varchar(10) field containing the name “Rick” would consume 4 bytes. If it were an nvarchar, it would use 8 bytes.

This is one difference between fixed and varying length data. With fixed data, you can calculate the storage consumption without needing to look at the actual data. However, to precisely calculate how much storage a row or table is actually utilizing, you would need to examine the length of data in each of the variable data fields (the LEN( ) function is shown in the figure below.

SQL SERVER - Tips from the SQL Joes 2 Pros Development Series - Data Row Space Usage and NULL Storage - Day 15 of 35 j2p_15_1

Now let’s bring in the design interface for the RoomChart table, which we will use to calculate the actual space consumption for rows in the RoomChart table. We will look at the table design along with our LEN( ) query result so that we have the length measurements for the RoomName field.

SQL SERVER - Tips from the SQL Joes 2 Pros Development Series - Data Row Space Usage and NULL Storage - Day 15 of 35 j2p_15_2

Let’s calculate the actual space consumption for Row 1 of the RoomChart table. We will begin with the fixed length data. Each of the four rows contains two fixed length fields. The ID field uses 4 bytes and the field named “Code” uses 3 bytes. Each row also has a 4 byte header. Thus, without looking at the data, we already know each row uses at least 11 bytes.

Header + Fixed Length Fields (ID and Code fields)

[4 bytes + 4 bytes  + 3 bytes      = 11 bytes]

The final field (RoomName) contains variable length data, so in order to evaluate the space consumption we must: 1) calculate the variable block; and 2) look at the actual variable length data.

Since there is one variable field per row, we must allow 2 bytes for the creation of the variable block. Then we must multiply the number of variable field(s) in the row by 2 bytes.

Variable Block

[2 bytes + (1 field * 2 bytes/field) = 4 bytes]

Actual Data

[Renault-Langsford-Tribute, 25 unicode chars = 50 bytes]

Header 4 | Fixed Data 7 | Variable Block 4 | Variable Data 50 = 65 bytes

Null Data

One important piece of the storage calculation we haven’t yet considered is the null block. Somewhat like variable length data fields, each record in a table containing nullable field(s) uses a little extra storage space.

Null Block

Each record begins with a standard 4 byte row header. Right after the row header, the first item in the data portion of the record is the fixed data. SQL Server stores together all of the columns containing fixed width data.

If your table contains nullable data, then a null block follows the fixed data and occupies the third space in the physical structure of the record. (Without the null block, the usual order prevails – #1 Row Header, #2 Fixed Data, #3 Variable Block, and #4 Variable Data payload.)

SQL SERVER - Tips from the SQL Joes 2 Pros Development Series - Data Row Space Usage and NULL Storage - Day 15 of 35 j2p_15_3

The null block (also called the null bitmap) is created as soon as a nullable field is created in a table. The null block in each row begins as 2 bytes but may grow as you add more fields to your table.

Next you must count the total number of columns in the table. Add an additional byte to the row’s null block for the first field and another byte for every 8th field. In other words, if a table has between 1-8 fields, then the null block in each row will be 3 bytes. If the table contains 9-16 fields, then the null block will be 4 bytes per row. If the table contains 17-24 fields, then the null block will be 5 bytes per row, and so forth.

These additional bytes contain an indication for each column’s nullability. In other words, whether the column will allow nulls (e.g., Code, RoomName) or won’t allow nulls (e.g., the ID column in the RoomChart table).

Null Block Storage Allocation

It often surprises people to know that it only takes one nullable field to cause every field in the table to take up 1 bit of extra space in the null block. These additional bits keep track of each column whether it does or doesn’t contain a null. The following diagram illustrates two tables:  one in which every column is nullable and one containing only a single nullable column.

SQL SERVER - Tips from the SQL Joes 2 Pros Development Series - Data Row Space Usage and NULL Storage - Day 15 of 35 j2p_15_4

Each table contains 10 columns, c1 through c10. Since there are 10 columns and each table contains at least one nullable column, 10 additional bits are needed in the null block. The null block will contain a 2-byte fixed length field and a variable length bitmap of 1 bit per column. Memory is allocated in 8-bit bytes and 10 bits (c1 – c10) crosses into the next byte.  The variable length bitmap takes up 2 more bytes, bringing the size to 4 bytes for each tables’ null block.

Note that in table t1, an INSERT statement placed the integer values 1 through 10 in columns c1 through c10, respectively. Since none of these values is null, the null block bitmap contains 0’s for columns c1 through c10. On the right half of this figure, you see these bits are located in byte 1 for columns c1 through c8, and byte 2 for columns c9 and c10. The additional 6 bits of byte 2 in the null block bitmap are not used by any columns. In table t2 we have one null value in the last column. An INSERT statement placed the integer values 1 through 9 in columns c1 through c9, and null in column c10. Columns c1 through c9 are not null, so the null block bitmap contains 0’s (zeros) for those columns. Column c10 does contain null, so its bit in the null block bitmap reflects a 1.

In a table that contains at least one nullable column, each row will contain a null block whose length depends on the number of columns in the table. If a row of such a table contains a null value for a nullable column, its bit in that row’s null block bitmap will be set to 1. Columns which are not null have their bits in the row’s null block bitmap set to 0.

SQL SERVER - Tips from the SQL Joes 2 Pros Development Series - Data Row Space Usage and NULL Storage - Day 15 of 35 j2p_15_5

Recall that we calculated the space consumption for Row 1 as 65 bytes.

Actual Data

[Renault-Langsford-Tribute, 25 unicode chars = 50 bytes]

Header 4 | Fixed Data 7 | Variable Block 4 | Variable Data 50 = 65 bytes

We know a null block is needed, because there are nullable fields in the RoomChart table. There are two nullable fields, Code and RoomName.

Creating the null block uses 2 bytes. Then you must count the total number of columns in the table. Since this table contains 3 columns, 1 byte is added to the null block.

Null Block

[2 bytes + 1 byte (only 3 fields)] = 3 bytes]

So we must add 3 bytes to our original storage calculation for Row 1:

Header 4 | Fixed Data 7 | Null Block 3 | Variable Block 4 | Variable Data 50 = 68 bytes

Thus, the full amount of space used by Row 1 of the RoomChart table is 68 bytes. Now let’s recalculate the second row’s space usage including the null block. Recall we calculated the space consumption for Row 2 as 53 bytes.

Actual Data

[Quinault-Experience, 19 unicode chars = 38 bytes]

Header 4 | Fixed Data 7 | Variable Block 4 | Variable Data 38 = 53 bytes

Since the null block for each record in the table will be the same size, we know the null block for Row 2 will be the same as Row 1:  3 bytes.

Null Block

[2 bytes + 1 byte (only 3 fields)] = 3 bytes]

So we must add 3 bytes to our original storage calculation for Row 2:

Header 4 | Fixed Data 7 | Null Block 3 | Variable Block 4 | Variable Data 38 = 56 bytes

Note: If you want to setup the sample JProCo database on your system you can watch this video. For this post you will want to run the SQLArchChapter4.0Setup.sql script from Volume 3.

Question 15

You have three variable length data fields. What are the rules that go into the calculation of how large the variable block will be (Choose two)?

  1. You will allocate 2 bytes to the creation of the variable block
  2. You will allocate 3 bytes to the creation of the variable block
  3. You will allocate 2 more bytes for each of the three variable fields
  4. You will allocate 1 byte for every eight columns in the table.

Rules:

Please leave your answer in comment section below with correct option, explanation and your country of resident.
Every day one winner will be announced from United States.
Every day one winner will be announced from India.
A valid answer must contain country of residence of answerer.
Please check my facebook page for winners name and correct answer.
Winner from United States will get Joes 2 Pros Volume 3.
Winner from India will get Joes 2 Pros Volume 3.
The contest is open till next blog post shows up at which is next day GTM+2.5.

Reference:  Pinal Dave (https://blog.sqlauthority.com)

Joes 2 Pros, SQL Scripts
Previous Post
SQL SERVER – Tips from the SQL Joes 2 Pros Development Series – Output Clause in Simple Examples – Day 14 of 35
Next Post
SQLAuthority News – Pluralsight Giving Away Free Subscription to Quiz Participants

Related Posts

57 Comments. Leave new

  • Rene Alberto Castro Velasquez
    August 15, 2011 7:32 am

    Correct asnwers are No. 1 and No. 3, because 2 bytes are needed to create a variable block (answer No. 1), and 2 bytes are needed for each variable field (answer No. 3)
    Rene Castro
    El Salvador

    Reply
  • paulrandalPaul Randal
    August 15, 2011 7:52 am

    Some corrections:
    The 4 byte row header has nothing to do with the position of the record or which columns are in the record. It contains two bytes (one for index records) that say what kind of record it is, plus the offest of the null bitmap.
    The nulll bitmap is always present in data records, regardless of whether the columns are nullable or not – unless the table is comprised solely of sparse columns.

    Reply
  • 1) and 3)
    1) You will allocate 2 bytes to the creation of the variable block
    3) You will allocate 2 more bytes for each of the three variable fields

    The variable block is takes 2 bytes first time. Each of the variable fields will add 2 bytes each,

    Reply
  • Ramakrishnan Srinivasan
    August 15, 2011 9:06 am

    Creation of the variable block will take up 2 bytes, and every variable block will subsequently add 2 bytes to this amount. Hence,

    2 bytes + (3 variable-length-fields * 2 bytes per variable-length-field) = 8 bytes

    Options 1 and 3 support the above calculation, and therefore, they are correct

    Ramakrishnan RS,
    Mysore, India

    Reply
  • Correct answers are option 1 & 3

    1.You will allocate 2 bytes to the creation of the variable block
    3.You will allocate 2 more bytes for each of the three variable fields

    because Variable block requires 2 bytes for the creation and another 2 more bytes for each of the three variable fields to set this up.

    Formula
    [2 bytes (for creation) + (1 fields * 2 bytes/field) = 4 bytes]

    As the calculation is for three variable length data fields, the total number of bytes would be

    [2 bytes (for creation) + (3 fields * 2 bytes/field) = 8 bytes] is the answer

    Thanks.

    Country – India

    Reply
  • The correct answers are #1 and #3
    You will allocate 2 bytes to the creation of the variable block
    You will allocate 2 more bytes for each of the three variable fields

    I am from USA

    Reply
  • The correct answers are Options-1 and 3
    1).You will allocate 2 bytes to the creation of the variable block
    3).You will allocate 2 more bytes for each of the three variable fields

    Reply
  • Option 1 and 3 are correct.

    Thanks

    Sudhir Chawla
    New Delhi, India

    Reply
  • Rajneesh Verma
    August 15, 2011 10:38 am

    Hi,
    Option 1 and 3 are correct Answers.
    1.) You will allocate 2 bytes to the creation of the variable block
    3.) You will allocate 2 more bytes for each of the three variable fields

    Thanks…
    Rajneesh Verma
    (INDIA)

    Reply
  • shatrughna kumar
    August 15, 2011 10:46 am

    HAPPY INDEPENDENCE DAY.

    Correct options are 1 & 3.

    Special thanks for posting this beautiful article on memory allocation.
    I will sure follow these while design of DB in future.

    New Delhi
    India

    Reply
  • Nakul Vachhrajani
    August 15, 2011 11:54 am

    The correct answers are #1 & #3.

    Reason: As has been put up very well in the article: “2 bytes per field plus 2 bytes to set this up”:
    1. You will allocate 2 bytes to the creation of the variable block
    3. You will allocate 2 more bytes for each of the three variable fields

    Country of residence: India

    On a side-note:
    Pinal: This series is absolutely wonderful. Especially today’s article. I was aware about the high-level storage concepts, but not about how fixed-length, NULL blocks and variable-length fields are arranged on the row.

    I got to learn a great deal today, and I thank-you and the other writes of the “Joes 2 Pros” series.

    Reply
  • Aditya Bisoi (@AdityaBisoi07)
    August 15, 2011 12:18 pm

    Question 15
    Ans :
    1) You will allocate 2 bytes to the creation of the variable block
    3) You will allocate 2 more bytes for each of the three variable fields

    Chennai, INDIA

    Reply
  • Hi,

    1) You will allocate 2 bytes to the creation of the variable block

    3) You will allocate 2 more bytes for each of the three variable fields

    Base on this bellow equation above 2 Choose is correct.

    [2 bytes + (1 field * 2 bytes/field) = 4 bytes]

    [2 bytes + (3 field * 2 bytes/field) = 8 bytes]

    Correct ans. is : – 1 & 3

    I am from Ahmedabad,Gujarat,India

    Reply
  • Options 1 and 3 are correct.
    1.) You will allocate 2 bytes to the creation of the variable block
    3.) You will allocate 2 more bytes for each of the three variable fields

    Reply
  • Want to understand the space allocated to the indexes created for the table.

    I’m Ramalingam from chennai, India

    Reply
  • Correct answer is Option 1 and 3.

    1.You will allocate 2 bytes to the creation of the variable block.
    3.You will allocate 2 more bytes for each of the three variable fields

    Country:India

    Thanks,
    Fazal Vahora

    Reply
  • P.Anish Shenoy
    August 15, 2011 2:28 pm

    Hi Sir,

    The correct options are 1 and 3
    1. we should allocate 2 bytes to the creation of the variable block
    and
    3. we will allocate 2 more bytes for each of the three variable fields.

    so the calculation goes like this :

    We have three variable fields so

    [2 bytes + (3 field * 2 bytes/field) = 8 bytes]

    Option 2 is incorrect as we allocate only 2 bytes for the creation of the variable block.
    Option 4 doesn’t satisfy the rules.

    P.Anish Shenoy,
    INDIA,Bangalore,Karnataka

    Reply
  • Correct answer is 1) and 3)
    1) You will allocate 2 bytes to the creation of the variable block
    3) You will allocate 2 more bytes for each of the three variable fields

    The variable block is takes 2 bytes first time. Each of the variable fields will add 2 bytes each,

    My country of residence is UK

    Reply
  • 1) and 3)
    1) You will allocate 2 bytes to the creation of the variable block
    3) You will allocate 2 more bytes for each of the three variable fields

    The variable block is takes 2 bytes first time. Each of the variable fields will add 2 bytes each,

    I’m from INDIA

    Reply
  • Mike Michalicek
    August 15, 2011 5:32 pm

    Tricky question, trying to pick one answer yet 2 fit the bill.

    Correct answers are 1 & 3.

    USA

    Mike Michalicek

    Reply

Leave a Reply