Please read the Introductory Post before continue reading interview question and answers.
What is ETL?
ETL is abbreviation of extract, transform, and load. ETL is software that enables businesses to consolidate their disparate data while moving it from place to place, and it doesn’t really matter that that data is in different forms or formats. The data can come from any source. ETL is powerful enough to handle such data disparities. First, the extract function reads data from a specified source database and extracts a desired subset of data. Next, the transform function works with the acquired data – using rules or lookup tables, or creating combinations with other data – to convert it to the desired state. Finally, the load function is used to write the resulting data to a target database.
What is VLDB?
VLDB is abbreviation of Very Large Database. For instance, a one-terabyte database can be considered as a VLDB. Typically, these are decision support systems or transaction processing applications serving a large number of users.
Is OLTP Database is Design Optimal for Data Warehouse?
No. OLTP database tables are normalized, and it will add additional time to queries to return results. Additionally, the OLTP database is small; it does not contain data from a long period (many years), which needs to be analyzed. A OLTP system is basically an ER model and not a Dimensional Model. If a complex query is executed on an OLTP system, it may lead to heavy overhead on the OLTP server that will affect the normal business processes.
If denormalizing improves Data Warehouse Processes, then why is the Fact Table is in the Normal Form?
The foreign keys of facts tables are primary keys of Dimension tables. It is clear that the fact table contains columns which are a primary key to another table that itself make a normal form table.
What are Lookup Tables?
A lookup table is the table placed on the target table based upon the primary key of the target; it just updates the table by allowing only modified (new or updated) records based on the lookup condition.
What are Aggregate Tables?
An aggregate table contains the summary of existing warehouse data, which is grouped to certain levels of dimensions. It is always easy to retrieve data from aggregated tables than visiting original table which has millions of records. Aggregate tables reduce the load in the database server and improve the performance of the query, and they also can retrieve the result quickly.
What is Real-Time Data-Warehousing?
Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly.
What are Conformed Dimensions?
Conformed dimensions mean the exact same thing with every possible fact table to which they are joined. They are common to the cubes.
What is a Conformed Fact?
Conformed dimensions are the dimensions which can be used across multiple Data Marts in combination with multiple facts tables accordingly.
How do you Load the Time Dimension?
Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. 100 years may be represented in a time dimension, with one row per day.
What is a Level of Granularity of a Fact Table?
Level of granularity means the level of detail that you put into the fact table in a data warehouse. Level of granularity implies the detail you are willing to put for each transactional fact.
What are Non-Additive Facts?
Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. However, they are not considered as useless. If there are changes in dimensions, the same facts can be useful.
What is a Factless Facts Table?
A fact table that does not contain numeric fact columns is called a factless facts table.
What are Slowly Changing Dimensions (SCD)?
SCD is the abbreviation of slowly changing dimensions. SCD applies to cases where the attribute for a record varies over time. There are three different types of SCD.
- SCD1: The new record replaces the original record. Only one record exists in database – current data.
- SCD2: A new record is added into the customer dimension table. Two records exist in the database – current data and previous history data.
- SCD3: The original data is modified to include new data. One record exists in database – new information is attached with old information in same row.
Reference: Pinal Dave (http://blog.SQLAuthority.com)