• Fact Table
According to Kimball, the fact table is the main table of a dimensional model, where the numerical measures of interest of the company are stored.
The word “fact” represents a measure of the modeled processes, such as quantities, values, and indicators. The fact table records the facts that will be analyzed.
It is composed of a primary key (formed by a unique combination of dimension key values) and by the metrics of interest to the business. The fact table must always be filled with the measures relating to the fact.
You should not fill a line with the fact table with zeros to represent that nothing happened (for example, that there was no sales of a product on a certain date), as this would cause the fact table to grow too much. In addition, the fact table should represent a unit of the business process, not mixing different subjects in the same fact table.
• Dimension Table
The dimension table is composed of attributes and contains the description of the business. Its attributes are sources of query constraints, results grouping, and report headers.
It has aspects by which one intends to observe the metrics related to the modeled process. The dimension table is usually much smaller than the fact table.
One of the best practices for constructing Dimensions is the use of Surrogate Keys as the primary key (associated with transactional keys). In this appendix, we will have a topic explaining the concept of Surrogate Key (SK)
An example of the difference between the fact table and the dimension is in the figure below:
Graph 1 – Fact table in the center and other dimension tables around
If the fact / metric to be measured is the revenue of a supermarket chain, the dimensions for the evaluation of the metric would be, for example, the number of stores, location, products sold and time.
• Aggregate Table
The aggregate table is created with data from the fact table, changing its granularity, that is, it summarizes the data, generating a smaller table. The aggregate table is used to optimize the access time of a query to the database. It is important to thoroughly evaluate the environment to define which aggregations should be created; The use of them requires an additional effort of maintenance, in addition to increasing the expense with storage, so one must always try to create aggregate tables that attend to multiple queries.
In addition, the aggregate tables may be temporary; In this way, it is necessary to take into account the possible extinction of this table and the future effects caused due to its exclusion.
It is the information stored in the fact tables that allow measuring the performance of business processes. Metrics are generally volumetric, numerical, may or may not be aggregated, and most often are additive type, ie allow operations such as addition, subtraction, and averages.
There are also two other types of metrics, the non-additive metrics and the semi-additive metrics. Non-additive metrics cannot be manipulated freely, such as percentage or relative values. The semi-additive metrics are values that cannot be summed in all dimensions.
• Star Model (Star Schema)
The name “star” is due to the arrangement in which the tables are found, the fact table being centralized relating to several other dimension tables. Here is an example of the Star Schema structure in the following figure:
Graph 2 – Star Model
In this model the data are unnormalized to avoid joins between tables, reducing query time, however due to the repetition of data, it uses more disk space.
The advantage of this model is the efficiency in data extraction, which is a great differential in the case of a Data Warehouse.
• Snowflake Model
Another very common type of structure is the Snowflake data model, which consists of an extension of the Star model where each of the “star tips” becomes the center of other stars. This is because each dimension table would be normalized by “breaking” the original table along existing hierarchies in its attributes.
It is recommended to use the snowflake scheme only when the dimension line gets too long and start to be relevant from a storage point of view. Here’s an example of the Snow Flakes structure in the following figure:
Graph 3 – Snowflake Pattern
Due to this structure, access to data is slower, but it facilitates the construction of cubes of some BI (Business Intelligence) and BA (Business Analytics) tools. The decision to opt for the star scheme or the snowflake should be made taking into account the data volume, the DBMS, the tools used, etc. Below is a comparative table between these two models.