Please read the Introductory Post before continue reading interview question and answers.
ETL is abbreviation of extract, transform, and load. ETL is software that enables businesses to consolidate their disparate data while moving it from place to place, and it doesn’t really matter that that data is in different forms or formats. The data can come from any source. ETL is powerful enough to handle such data disparities. First, the extract function reads data from a specified source database and extracts a desired subset of data. Next, the transform function works with the acquired data – using rules or lookup tables, or creating combinations with other data – to convert it to the desired state. Finally, the load function is used to write the resulting data to a target database.
VLDB is abbreviation of Very Large Database. For instance, a one-terabyte database can be considered as a VLDB. Typically, these are decision support systems or transaction processing applications serving a large number of users.
No. OLTP database tables are normalized, and it will add additional time to queries to return results. Additionally, the OLTP database is small; it does not contain data from a long period (many years), which needs to be analyzed. A OLTP system is basically an ER model and not a Dimensional Model. If a complex query is executed on an OLTP system, it may lead to heavy overhead on the OLTP server that will affect the normal business processes.
The foreign keys of facts tables are primary keys of Dimension tables. It is clear that the fact table contains columns which are a primary key to another table that itself make a normal form table.
A lookup table is the table placed on the target table based upon the primary key of the target; it just updates the table by allowing only modified (new or updated) records based on the lookup condition.
An aggregate table contains the summary of existing warehouse data, which is grouped to certain levels of dimensions. It is always easy to retrieve data from aggregated tables than visiting original table which has millions of records. Aggregate tables reduce the load in the database server and improve the performance of the query, and they also can retrieve the result quickly.
Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly.
Conformed dimensions mean the exact same thing with every possible fact table to which they are joined. They are common to the cubes.
Conformed dimensions are the dimensions which can be used across multiple Data Marts in combination with multiple facts tables accordingly.
Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. 100 years may be represented in a time dimension, with one row per day.
Level of granularity means the level of detail that you put into the fact table in a data warehouse. Level of granularity implies the detail you are willing to put for each transactional fact.
Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. However, they are not considered as useless. If there are changes in dimensions, the same facts can be useful.
A fact table that does not contain numeric fact columns is called a factless facts table.
SCD is the abbreviation of slowly changing dimensions. SCD applies to cases where the attribute for a record varies over time. There are three different types of SCD.
Reference: Pinal Dave (http://blog.SQLAuthority.com)