This is the 8th post in my 10-post series about my 10th book – SQL Basics. Today we will see the importance of logging data changes.
You can get it in Paperback (USA) and Kindle (Worldwide).
Logging Data Changes
In my experience, many students do not find the concept of datafile and logfile activity an intuitive one. So we will ease into it with an example that we have found helps students grasp this topic more quickly. But first we need a little explanation as to why SQL Server uses logfiles.
We know that SQL Server stores its data much like other applications, in files which are saved to a persistent drive. But a distinguishing feature of SQL Server is its robust ability to keep track of things. The security and safety of the data and the reliability of the system are SQL Server’s top priorities. Therefore, you can imagine that logging activity, which tracks every transaction made in the database, is a pretty big deal. Examples where logging saves the day generally involve some type of database restore or recovery need. Once a database has been backed up, we have a reliable mechanism we can use to restore the system in case something unfavorable happens. Suppose we notice bad data has come into the system through one of the periodic feeds. In fact, this data is so problematic that the team decides we must restore the database back to its state from a week ago. This gets us back to a time before the bad data began entering the system. The periodic database backup is built using information provided by the logfile. Logfiles keep track of the database transactions and help ensure data and system integrity, in case a system recovery is ever necessary.
Ok, now we’re ready to tackle datafiles and logfiles. A datafile is fairly straightforward – it contains all the current data. Suppose we’ve been making changes to JProCo’s Employee table. If we could peek into the datafile, we would find data identical to the result of SELECT * FROM Employee. However, it wouldn’t tell us that an hour ago, we deleted an employee record, or that today at 9:45 AM the manager added a new employee record to the table.
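By the way, if you are curious where the datafile and logfile actually live for a database, one quick way to look is the sys.database_files catalog view. Here is a minimal sketch; the file names, paths, and sizes you see will of course differ on your own server.

USE JProCo;
GO

-- type_desc tells us which file is which: ROWS = datafile, LOG = logfile.
-- size is stored in 8 KB pages, so we convert it to MB for readability.
SELECT name,
       type_desc,
       physical_name,
       size * 8 / 1024 AS SizeMB
FROM sys.database_files;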
We sometimes compare the datafile to getting information from the ATM. Usually we are happy with the data the ATM shows us (i.e., the current balance), and it always provides us the data rapidly. But if we need to look back and see deposit or withdrawal information, the ATM can’t help us. To see the transaction detail which has led to the current account balance, we need to look at our bank statement. Logfiles are just like the transaction history shown in a bank statement, where we can find a separate transaction entry for every purchase, deposit, or withdrawal made. Two identically structured WordPad files (Document A and Document B) will help us visualize the transaction activity retained by the logfile.
These are pretty small files, just 1 KB each. We then make some significant edits to Document A, which we won’t make to Document B. Not only will the documents differ visually, but when we see that the changes cause Document A’s size to expand to 6 KB, it’s clear that Document A and Document B are no longer identical files.
My classes tend to find the “ah-ha” moment when we use Edit > Undo to backtrack through Document A and watch the edits disappear one by one. Similarly, if we delete a few words one by one, the Undo operation will backtrack and make each word reappear one by one. What the application is doing is traversing the log of changes held in memory to find each of those changes. If the changes weren’t logged, they wouldn’t be available.
At the end of the demonstration, Document A has been returned to its beginning state: it contains information identical to Document B, and we’ve saved the file in order to clear the logged changes from memory. Thus, Document A and Document B are each 1 KB in size at the end. But just prior to saving Document A, we make another interesting “ah-ha” observation. On the surface, both documents appear identical. However, when we compare the size of the two files, Document A is many times larger than Document B. In my classroom demos, the file grows to 250 KB with relatively few clicks. The log tracks changes made to the document from the last time it was saved up until the current moment.
At this point, an expert DBA would understandably be bored silly with this demo. However, over the years, we’ve found this is the fastest way to ramp up students who are new to the abstract concept of the work done by logfiles.
Document A’s condition at the beginning and end of the demo (i.e., 1 KB and reflecting the data “This File is Small.”) serves as a comparison to the datafile. Because the file was saved at the beginning of the demo, and then again at the end, the document showed just the current state of the data. This has nothing to do with tracking data which was added or deleted along the way. The datafile’s purpose is to reflect the current state of the database.
Database Datafiles and Logfiles
Step 1: Pretend we have a brand new database with one table (Employee) which contains zero records. There are no records in the JProCo database, so there are no records in the datafile, and since we haven’t made any changes to the database, there are zero records in the logfile.
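For readers who want to follow along in T-SQL, here is a minimal sketch of Step 1. The column list is a simplified assumption for this walkthrough, not the full Employee schema used in the book.

-- Step 1: a brand new database with one empty table.
CREATE DATABASE JProCo;
GO

USE JProCo;
GO

-- Simplified Employee table (assumed columns, for illustration only).
CREATE TABLE Employee (
    EmpID     INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName  VARCHAR(50),
    [Status]  VARCHAR(20)
);
-- No rows yet, and no changes logged since the database was created.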
Step 2: Now data starts coming into the JProCo database. You add one new record for Alex Adams to the Employee table. So now we have one record in the datafile and one record in the logfile.
Step 3: You then add another record (Barry Brown). Two records are now in JProCo, so there are two records in the datafile and two entries in the logfile reflecting those changes.
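Steps 2 and 3 might look like this in T-SQL; the values other than the employee names are assumptions for illustration.

-- Step 2: one row in the datafile, one entry in the logfile.
INSERT INTO Employee (EmpID, FirstName, LastName, [Status])
VALUES (1, 'Alex', 'Adams', 'Active');

-- Step 3: two rows in the datafile, two entries in the logfile.
INSERT INTO Employee (EmpID, FirstName, LastName, [Status])
VALUES (2, 'Barry', 'Brown', 'On Leave');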
Step 4: The next step updates an existing record. Employee 2 is coming back from leave, so we’re going to change his status from “On Leave” to “Active.” There will still be two records in the database, so the datafile will contain two records. But there will be three records in the logfile since we made three changes since the last time we backed up the database.
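Step 4 is a simple UPDATE; assuming Employee 2 is Barry Brown from the previous step, it could look like this.

-- Step 4: still two rows in the datafile, but this is the third change
-- recorded in the logfile since the last backup.
UPDATE Employee
SET [Status] = 'Active'
WHERE EmpID = 2;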
Step 5: The database is backed up nightly at midnight. After the three earlier changes happen, suppose it’s after midnight and the database backup has just finished running. At 12:05AM there would still be two records in the JProCo database so we would have two records in the datafile. During most backup processes, the changes contained in the logfile are sent to the backup file. The logfile is emptied as part of the backup process, so zero records remain in the logfile immediately after each backup.
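The nightly backup in Step 5 could be issued with a plain BACKUP DATABASE command; the disk path below is just a hypothetical example.

-- Step 5: full backup of the JProCo database.
BACKUP DATABASE JProCo
TO DISK = 'C:\Backups\JProCo_Full.bak'
WITH INIT;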
Step 6: On day 2, we insert Lee Osako’s record (the third record added to Employee). At this point we have three records in the datafile. The logfile has been empty since the backup, and this change now adds one record to the logfile.
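Step 6 is one more INSERT, and if you ever want to see how much of the logfile is actually in use, DBCC SQLPERF(LOGSPACE) reports the log size and the percentage used for every database on the instance.

-- Step 6: three rows in the datafile, one new entry in the logfile since the backup.
INSERT INTO Employee (EmpID, FirstName, LastName, [Status])
VALUES (3, 'Lee', 'Osako', 'Active');

-- Optional: check logfile size and the percentage currently in use.
DBCC SQLPERF(LOGSPACE);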
Action Item
Get the book for yourself and your friend. This is a reference everyone must have.
Available in Paperback (USA), Kindle (Worldwide)
Reference: Pinal Dave (https://blog.sqlauthority.com)
7 Comments
Excellent explanation, Pinal. Much appreciated…
Nice explanation, Pinal.
But I have one more question: what happens if a customer performs a transaction while a database backup is in progress?
Will the log file still be empty during that time? If so, how do we track those changes in the log file?
I thought this looked really familiar and then I realized I have this book :)
Hi Pinal
Can you please explain the Microsoft certification 70-461 exam details?
Thanks
very nice explanation…
very helpful…. :-)
Superb, sir. Explained very well. I have clearly understood the difference between the data file and the log file. Thank you.
After the backup, my logfile is not emptied. Please check whether this is correct.