Most of the guidance in the Kimball method for designing, developing, and deploying a DW/BI system is just that: guidance. There are hundreds or thousands of rules in the Kimball Group’s many books, and I confess to having bent many of those rules over the decades, when faced with conflicting goals or unpleasant political realities. By Joy Mundy – © Kimball Group. All rights reserved.
There are several tenets to the Kimball method that I am passionate about. This Design Tip lists the things I say over and over, to audiences experienced and new. Any reader who has attended a class that I’ve taught, or hired me as a consultant, has heard me discourse on most or all of the items on this list.
- The dimensional model is the key asset.
The Kimball method, as outlined in The Data Warehouse Lifecycle Toolkit, Second Edition, is centered on the dimensional model. Dimensional modeling principles are Ralph Kimball’s and the Kimball Group’s most widely known contribution to the world of business intelligence. It’s our focus because a good dimensional model is absolutely vital to the success of your DW/BI endeavor. If you get the model right, and populate it with integrity, the rest is straightforward.
- Dimensional modeling is a group activity.
Even the best dimensional modeler will create a poor dimensional model if she works solo. Not only is dimensional modeling a group activity, it is a group activity that must involve the business user community. Over the years, we have turned down countless consulting requests to design a model without business input. Or, worse, struggled through painful projects where the promised business user participation did not materialize.
It is undoubtedly a huge request of the user community. Our design process usually requires 50-60 hours of sessions over a 4 to 6 week period (or more, depending on project complexity). The people whom we want to participate in the design sessions are always valuable and in demand. But if they can’t be convinced to put in the time and energy, the resulting system will fail.
We’ve talked about this many times over the years, from Margy’s article, Practical Steps for Designing a Dimensional Model, to Bob’s Design Tip #157 Involve Business Representatives in Dimensional Modeling. It’s an element of the Kimball method that I am unwaveringly passionate about.
- The dimensional model is the best specification for the DW / BI system.
The majority of clients that I work with don’t have a written specification for the DW/BI system, certainly no document that reflects reality in a meaningful way. The most common specification format includes mind-numbing lists of what users want to filter and drill on, as well as the demand that all 2000 of the existing reports be supported by the new system. If all we’ve accomplished with our new DW/BI system is re-platforming the existing canned reports, we have failed.
We ask the business users at the end of the design process to think about the analyses they’ve recently done, tried to do, or would have liked to do, from the information in the current design scope. We want them to say “Yes, this model meets our needs.” At the same time, the IT people on the team have been watching the discussion of data sources, transformations, and other technical details. We ask them to affirm “Yes, we can populate this data model.” The dimensional model design write-up is a meaningful and actionable specification of the requirements.
Bob wrote eloquently on this topic in Design Tip #123 Using the Dimensional Model to Validate Business Requirements.
- The dimensional model should add value beyond restructuring.
Some of the most valuable improvements that you can deliver in your DW/BI system are to add improved descriptors and groups to frequently used data. Yet these opportunities are often missed by the design team. I’ve even encountered teams with an explicit policy to add nothing beyond what is in the source system.
Examples of valuable data model additions include:
- Making it easy for users to filter out the seldom used transaction codes.
- Providing attractive sort columns for pick lists and reports.
- Precompute banding, such as age ranges or transaction quality measures.
- Supporting different or deeper hierarchies than are managed in the source systems, such as marketing versus manufacturing views of the product dimension.
- Master data management systems are a great source.
De-duplication is one of the most difficult tasks facing a data warehouse team. Since the earliest days of data warehousing, the back room team has struggled to design ETL processes that de-duplicate entities such as customer. The increasing popularity and functionality of master data management (MDM) technology and programs provides a much better solution than within the ETL flow. And not just because it’s hard! The rhythm of the ETL process, which on an ongoing basis we want to be bullet proof and hands free, is fundamentally at odds with the de-duplication process. No matter how great our tools, how clever our code, how complete our business rules, the automated de-duplication process cannot achieve 100% accuracy. A person is required to make a judgment on the questionable cases. This works much better if it is a job responsibility during the business day, rather than waiting for the ETL load.
Warren wrote about how to set up an MDM program way back in a 2007 article, Pick the Right Approach to MDM. His advice is still good, and I have been seeing increasing success in MDM implementations. One recent client requires that alldimension attributes come from the MDM system, where they are actively managed by assigned business owners. ETL is basically just a consumer.
Most organizations will not be quite so zealous, but a simple MDM system can be used to build the value-added data elements discussed previously in this Design Tip. Dimension hierarchies are notorious for being imperfectly structured if sourced from users’ desktops; the MDM tools deliver a simple platform to manage this information, as discussed in my article,Maintaining Dimension Hierarchies.
- Don’t skip the relational data warehouse.
Designing and populating an enterprise conformed data warehouse is hard. Everyone would like to skip this step. Throughout the 23+ years of my career in data warehousing, I have observed any number of technological attempts to simplify the process, from building the BI layer directly on the transaction system, to so-called virtual data warehouses, back full circle to the current flavor of visualization tool build scripts.
As I discussed in Design Tip #162 Leverage Data Visualization Tools, But Avoid Anarchy, just say no! Unless you completely control all of your source data, you should leave ETL to the ETL tool, leave data storage and management to a relational database engine, and let the BI tools shine at what they do best: great visualizations and user experience.
- It’s all about the business.
I say this many times during classes and consulting. It’s the most important characteristic of the Kimball Lifecycle method: practical, uncompromising focus on the business. It infuses everything we do, and it’s the single most important message to carry forward.