Data Warehousing: Dimensional Role-Playing

When a single dimension needs to be part of a fact table multiple times! By Michelle A. Poolet.

Role-playing isn’t limited to the Broadway stage; data warehouse dimensions can also assume roles. Dimensional roles can be used not only for precision when you’re defining the purpose or function of a set of rows from a dimension, but also to easily maintain the data because there’s only one table to manage instead of many tables to synchronize.

cropped-cropped-filo01_cartoonized_42.jpgDimensional role-playing occurs when a single dimension needs to be part of the same fact table many times. For example, dates and times appear in most types of analysis because business activities happen in a timeframe and objects exist in time. Time is almost always used when calculating and evaluating measures in a fact table. Each event constitutes a link to the time dimension. For example, a customer order was taken on a specific date; the order was picked and packed at the warehouse on one or more dates; the order was shipped to the customer on a date; the order has a delivery date; the order’s payment was received on a date; and sometimes, the order was returned on a date and a refund was issued on a date. All of these dates need to be recorded, stored, and analyzed because they all play a role in the retail sale life cycle.

Defining Roles

When many same-type events, such as dates, are present in a single fact table, you have an accumulating snapshot fact table. (Of course, there are other types of fact tables, which I’ll discuss in upcoming articles.) Figure 1 shows the Shipping star schema. The fact table (i.e., factShipping) is an accumulating snapshot; it contains dates for multiple events, such as scheduled and actual departure and arrival dates and times, where each shipment starts and ends, and the buyer and the seller. We could record a lot more discreet date and time events in this fact table, but no others are necessary for this example.

This star schema isn’t a cube; it’s a set of relational tables that can be managed and manipulated using SQL Server Enterprise Manager or SQL Server Management Studio (SSMS). Clearly, it isn’t the OLTP tables that are used to initially capture the retail events because OLTP tables are more highly normalized than the tables in Figure 1. Also, the join or associative table, which is the OLTP counterpart to the OLAP fact table, would typically have fewer attributes than this fact table. These tables are containers that hold the source data for the OLAP Services/Analysis Services multidimensional databases (cubes). Another name for a multidimensional database is the staging database or data store. The staging database receives verified and validated data via the extraction, transformation, and loading (ETL) processes from various external data sources, including OLTP databases. (This staging database is also an excellent reporting database because the veracity of its data content and the logical orderliness of its architecture make sense to business users. Business users armed with Microsoft Excel can build ad hoc reports without having to rely on overworked business intelligence (BI) programmers and analysts.)

Take a look at the views in Figure 1, which you can identify because view names start with “v” (e.g., vSeller). (Note that in Figure 1, dimension table names start with “dim” and fact table names start with “fact.”) I’ve built views on each of the three dimensions, Time, Person, and Location. In this star schema, I’m linking the views rather than using the dimensions to link to the fact table. Now it’s easy to visualize that each shipment has a buyer and a seller, a starting and an ending location, and multiple date and time events associated with each record in the fact table. The views (aka roles) serve as virtual dimensions.

When creating a dimensional role, you can filter data. For example, if the company won’t ship orders on weekends or holidays, it’s easy to constrain the shipping date views with the following selection criteria:

select …
from dimTIME
where DayNumberOfWeek between 2 and 6
and Holiday = 0

Or you can separate buyers from all other people when creating the vBuyer view by running the following command:

create view vBuyer
AS
select …
from dimPerson
where PersonType = ‘buyer’

Supertypes and Subtypes

If something seems familiar about this discussion of dimensional roles, the reason might be that I’ve discussed the concepts of supertypes and subtypes in previous columns. Dimensional role-playing combines data subtyping and filtering into one operation. For example, in Figure 1, dimPerson is a supertype and vBuyer and vSeller are virtual subtype entities. Using the WHERE criterion, the WHERE clause of the query that creates the subsets of data for each of the views acts as a data filter and establishes the subtype, which then ensures that only buyers are listed in the vBuyer dimensional view and only sellers are listed in the vSeller dimensional view.

Each of the roles that these dimensions play in the accumulating snapshot fact table is a little different, and it’s important to relabel the view dimension and its attributes to better express the role meaning. I’ve done that in Figure 1: For example, the time dimension has morphed into vScheduledShipDate, and TimeKEY has become SchedShipDateKEY. There’s little chance that a business user would use vScheduledShipDate under the assumption that it represented arrival times. Remember, one characteristic of a data warehouse is ease of understanding, and simple naming conventions make the roles easy to understand.

The second characteristic of a data warehouse is performance. As anyone who works in data warehousing will tell you, proper indexing is crucial to good performance, so you should consider indexing views. Because of the simplicity of the dimensional table structure—a surrogate key, a flattened set of attributes—the dimensions lend themselves nicely to serving as base tables for indexed views. Also, the indexed view is dynamically updated as the base table changes, so you only have to manage the dimension table when you’re executing ETL operations.

Dimensional Role-Playing

If you’re still having trouble understanding how dimensional roleplaying works, visualize the last time you flew in an airplane. There was a scheduled departure date/ time, an actual departure date/time, a scheduled arrival date/time, and an actual arrival date/time. Then there was the time on the ground at the point of departure (calculated as the time interval between when the jet pulled away from the gate until the wheels were off the ground), the time spent in the air traveling from the departure airport to the arrival airport, and the time on the ground at the point of arrival (calculated as the time interval between touch down and when the jet docked at the gate). These times are critical for conventional airline flight analysis, and each would have an entry in a fact table that would be used to calculate airline flight on-time percentages.

role-playing-figura1

Até apróxima!!smurf

4 Respostas to “Data Warehousing: Dimensional Role-Playing”

  1. Rafael Azevedo (@rafabts) Says:

    Great article.
    I didn’t face this situation yet, but this article is going to prevent some headaches to me. Thanks!

    By the way, I got this post trying to find some guidance for the following scenario: if my business allows I have many buyers related to a product and I need this information when querying my data warehouse. Could you give some clue about how to model it in an elegant way?

    • Carlos Alberto Lorenzi Lima Says:

      Thanks a lot for your visit!!!

      Dimensional Model

      Let’s see if I understand your question: We have a product, and this particular product can be purchased for various clients. We can cite as an example, a retail store that sells a particular brand of television 22 inches, several customers buy this tv in one of many of our retail chain stores.

      We can solve this issue very simply in our dimensional modeling, creating a product dimension, a dimension of customers, a dimension of stores and a dimension of dates of purchase, which will be associated with our fact sales.

      Was that even your doubts? Helped?
      Big hug.
      Lito

      • Rafael Azevedo (@rafabts) Says:

        Sorry, my question was very badly formulated.. I really got confused doing it… shame! rs

        Let me try again…

        I have a source model similar to this one: https://docs.google.com/drawings/d/1DvSubJkaZQ_ejXkkT5dsm4GvkTWaRXhpp_BiBXy2yu4/edit

        and I am planning a target model like this:
        https://docs.google.com/drawings/d/1lYwaC_Z7U3vFhnZr7LjoL8Nj7LuCZ1lYxvh9ZyT7ULo/edit

        I don’t know if I design a DimStakeholders linked direct to FactSales, or if it is the case to apply some approach to make a many to many dimension relationship between OrganizationStructure dimension, a stakeholder dimension and fact table sales…

      • Carlos Alberto Lorenzi Lima Says:

        Well, it seems to me that your project is addressing a cooperative sales, and from what I see, you are trying to delete a snow flake structure and transforms it into star schema. With respect to specific stakholders, we need to answer some questions:

        1 – They have different codes?
        2 – can be a maximum of 2 per group?
        3 – How many groups we have in our project?
        4 – How many organizations will be?

        Well, if we have different codes for stakeholders, we can create without problems, a dimension of Stakeholders and instead we use a surrogate key, we will use the code of the stakeholder itself. (1)

        If we have groups with a maximum of two stakeholders, then add their names in our DimOrganizationStructure dimension. (2)

        If we have a few groups, we can create a junk dimension with stakeholders, and use in this dimension, surrogates key to identify it in fact, and columns code group name and stakeholder name as well. (3)

        If we have a few organizations, then the junk dimension can be used for the organization, and, in that dimension, use surrogate key to identify it in fact, NameFarm, NameFarmOwner, NameGroup, NameGroupConsultant, NameAgroindustry, NameStakeholder. (4)

        Otherwise, we can think of bridge tables or keep a standard structure (3NF) between DimOrganizationStructure and stakeholders dimension (snow flake) …

        So, wath you think about it? I was able to help you in any way?

        I hope so!

        Hugs!!!

        Lito

        By the way, are you here, in Brazil? Speak portuguese?


Deixe uma resposta

Preencha os seus dados abaixo ou clique em um ícone para log in:

Logotipo do WordPress.com

Você está comentando utilizando sua conta WordPress.com. Sair / Alterar )

Imagem do Twitter

Você está comentando utilizando sua conta Twitter. Sair / Alterar )

Foto do Facebook

Você está comentando utilizando sua conta Facebook. Sair / Alterar )

Foto do Google+

Você está comentando utilizando sua conta Google+. Sair / Alterar )

Conectando a %s

%d blogueiros gostam disto: