Some organizations have adopted a data warehouse architecture that includes an atomic third normal form (3NF) relational data warehouse. This architecture, often called the hub-and-spoke or Corporate Information Factory (CIF), includes a data acquisition ETL process to gather, clean and integrate data from various sources. Atomic data is loaded into third normal form data structures, typically called the enterprise data warehouse (EDW) in this architecture. Another ETL data delivery process then populates downstream reporting and analytic environments supporting business users.
Organizations who’ve adopted this architecture often find some business users developing reporting and analytic applications directly against the atomic 3NF data structures. Ideally, these users would leverage an architected downstream analytic platform. Unfortunately, many organizations either populate the downstream environments with summary rather than atomic data, or worse, never get around to building the user assessable environments. Inevitably this results in a set of frustrated business users. The 3NF data structures are difficult to understand, the required queries tend to be very complex and difficult to develop, and the query response times can be abysmal. As a result, business units resort to pulling complete sets of data from the atomic 3NF warehouse to populate their own shadow reporting and analytic platform. Clearly, this is undesirable and leads the organization down a path to non-integrated and inconsistent results.
To overcome these challenges, a popular modification to the Kimball Architecture has evolved. This hybrid architecture leverages the existing 3NF data warehouse as a primary source of clean, integrated data to feed a dimensionally-structured enterprise presentation area. The resulting dimensional presentation area would consist of a number of atomic, business process-centric fact tables integrated via a set of conformed dimensions.
The key advantages of enhancing the 3NF environment with a dimensional presentation area are to present an atomic, integrated, consistent environment to the business community that is significantly less complex. As a result, the data is easier to understand, users can more readily create the queries they require, and the queries themselves are less complex. In addition, the query response from the underlying dimensional structures will be significantly quicker.
The majority of organizations will implement the dimensional presentation layer physically. Some organizations may look to leverage other technical solutions to implement the dimensional presentation layer logically rather physically. In this scenario, the business users would interact directly against some logical dimensional layer and the technologies involved would resolve the required queries directly against the 3NF table structures. Options for implementing this logical layer would include dimensional designs implemented via:
- Logical database views, sometimes called dimensional views
- Semantic layer of a BI tool
- Data virtualization tool
The effectiveness of a logical layer versus physical instantiation is relatively unproven for large scale production use in most environments. There are several considerations to weigh carefully before committing to a logical dimensional implementation:
- While the logical presentation area relieves the ease of use/understandability issues, it does nothing to improve the performance of the underlying queries. The technology supporting the logical layer will ultimately need to resolve the query against the same underlying 3NF table structures. If query performance against the 3NF environment is already a challenge, then the logical presentation layer is not likely a viable option.
- The use of conformed dimensions is critical in any dimensional presentation area. Often the ETL processing to deliver the integration, de-duplication, survivorship, and slowly changing dimension logic necessary for an effective conformed dimension will defeat the use of logical presentation and strongly favor a physical presentation area.
In either case, organizations utilizing a 3NF data warehouse environment can often improve their ease of use and performance characteristics by enhancing their environment to include a well-designed, properly architected dimensional presentation area.