By Bob Wall, Senior Consultant
For the last several years I have been an avid fan of the television show “House”, an American medical drama. Each episode begins with an unsuspecting patient falling prey to some unknown and mysterious disease. The patient typically fails to receive a correct initial diagnosis, which marks the case as rare and very complex. Dr. House and his team of specialists then spend the remainder of the episode acting like the medical equivalents of Sherlock Holmes, trying to find the root cause of the disease and discover a life-saving cure. Throughout the episode, Dr. House is rude, obnoxious, condescending, and addicted to painkillers, and the team performs numerous ethically dubious procedures. For instance, one or more of the doctors often visits the patient’s home in search of clues that might point to a particular pathology. They enter the home with or without permission, sometimes breaking the law, to try to discover any underlying evidence.
Miraculously, by the end of the hour-long show, the cure is almost always found, the patient lives, and House and his team are ready to solve the next big mystery the following week.
Wondering what this has to do with your data warehouse project? Well, imagine, if you will, some slight changes in the show’s format and cast, and maybe you will see the similarity.
In this episode, a business manager mysteriously falls prey to a sick data warehouse environment, and all indications are that it has something to do with bad data. No one can figure out the underlying cause, and in fact the manager has received several misdiagnoses. Just as the manager is ready to give up, Dr. House and his team of diagnosticians step in to assist. In this case, House’s specialty is Data Management, and the other doctors are specialists in Data Governance, Data Architecture, Data Modeling, Data Analysis/Profiling, and Data Quality.
The Data team first identifies the overall problem and then the symptoms:
Overall Problem
- Inability to understand source data and uncover the bad data disease, imperiling the data warehouse
Symptoms
- Data quality issues
- Multiple sources generating the same data
- Non-existent documentation of source systems
- Data models and file structures not understood
- Lack of knowledge about core data elements
- Mismatch between source system data and developers’ definitions
- Multiple data sources that must be analyzed for a high-profile IT project
Next, the team interviews all of the patient’s stakeholders (SMEs, Data Stewards, Business Analysts, IT, etc.) and makes a visit to the corporate home to gather evidence that might help in the diagnosis (business plans, data models, system documentation, ETL specs, etc.).
Now the team concurs on several possible reasons for the problem and runs several diagnostic tests, the most prominent being:
- Data Source Analysis – the purpose is to attain an understanding of what data is available and where it resides within the patient’s myriad of systems and applications.
- Data Profiling – the purpose is to attain an understanding of an organization’s data by validating data types, domain values, etc.
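To make the Data Profiling test a little more concrete, here is a minimal sketch in Python of what checking data types and domain values might look like. It is an illustration only, not the tool or process the team would actually use: the function names (`infer_type`, `profile`), the sample rows, and the list of allowed state codes are all hypothetical.

```python
from collections import Counter


def infer_type(value):
    """Classify a raw string value as 'null', 'int', 'float', or 'str'."""
    if value is None or value.strip() == "":
        return "null"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            pass
    return "str"


def profile(rows, domains=None):
    """Summarize each column: observed types, null count, distinct values,
    and values that fall outside an expected domain (when one is supplied)."""
    domains = domains or {}
    report = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row.get(col) for row in rows]
        types = Counter(infer_type(v) for v in values)
        non_null = [v for v in values if infer_type(v) != "null"]
        allowed = domains.get(col)
        report[col] = {
            "types": dict(types),
            "nulls": types.get("null", 0),
            "distinct": len(set(non_null)),
            "out_of_domain": sorted(
                {v for v in non_null if allowed is not None and v not in allowed}
            ),
        }
    return report


# Hypothetical extract from a source system (assumed sample data).
rows = [
    {"customer_id": "101", "state": "CA", "balance": "250.00"},
    {"customer_id": "102", "state": "ca", "balance": ""},
    {"customer_id": "abc", "state": "NY", "balance": "75.5"},
]

# Assumed domain: the state codes this source is expected to contain.
report = profile(rows, domains={"state": {"CA", "NY", "TX"}})

for column, summary in report.items():
    print(column, summary)
```

Even on three rows, a report like this surfaces the kind of symptoms listed earlier: a non-numeric customer ID, a missing balance, and a state code that does not match the expected domain.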
Finally, the Data team is able to understand the patient’s disease and dispense a cure for the data warehouse problem by addressing the data quality problems. This cure includes short-term corrective actions for the data, the ongoing incorporation of data assessment (a data profiling tool and analysis process) as a permanent part of the source system infrastructure, and a common long-term recommendation: the creation of a new role, Source Data Steward, for systems that are either highly complex or in demand by other business applications.
So a “House” call from the Data team for a sick data warehouse project is much like an episode of the television show. However, there are several noteworthy differences: the lead “Data Doctor” in this case is never rude, obnoxious, or addicted to drugs! The team never breaks into a building to gather documents and evidence, but always asks permission first! And, most importantly, there is no certainty that the cure will be found within a single one-hour episode!
Bob Wall is a senior consultant with Baseline Consulting. He is an information technology specialist with 30 years of experience in all areas of data warehouse administration, data architecture, data resource management, training, and application systems development, as well as in corporate management.
