By Fernando Martinez Campos, Senior Consultant
In my last blog post I described a way to define BI metrics through business user interview techniques. Now it’s time to start investigating what data is needed to answer the business questions. This post discusses data requirements, a subset of business requirements gathering.
A fundamental step in defining data requirements is to build a logical data model (LDM) which reflects the natural way data relates to the business. Aside from interviewing users and building an LDM, another input involves analyzing the sources of this data: operational applications and departmental databases.
Data requirements should be a precursor to data source analysis and profiling. A mistake many analysts make is to assume that the source is good enough to run BI analytics. Once they load the source, much to their chagrin, they find “data surprises” that make the analysis unreliable. So it is extremely wise to begin data source analysis by interviewing the SMEs and asking these fundamental questions:
- Does the data contain duplicates?
- Is there occasional missing data in some of the fields?
- Is there a better source for this data?
- Is the data content recorded accurately, and does it reflect the business event?
- Are the business rules defined when adding or changing the data?
- Is there a data owner responsible for fixing it?
I am always surprised by how much users know about errors in their data. They are invaluable in pinpointing where the errors are in the sources. Your initial data quality investigation will uncover these “data hot spots” from them. When you start profiling the data to obtain more accurate results on exactly where the problems exist, you’ll have a great head start. With the data profiling results, you will be prepared to proactively handle unexpected problems with the data before you load it.
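As a rough illustration of what a first profiling pass might look like, here is a minimal Python sketch that checks two of the questions above: duplicates and missing values. The function name, the sample rows, and the choice of customer_id as the key are all hypothetical, and a real profiling effort would normally use a dedicated tool against the actual source.

```python
from collections import Counter

def profile_rows(rows, key_fields):
    """Count duplicate keys and missing values per field in a list of dict rows."""
    # Build a key tuple per row and count how often each key occurs.
    key_counts = Counter(tuple(row.get(f) for f in key_fields) for row in rows)
    duplicate_keys = {key: n for key, n in key_counts.items() if n > 1}

    # Treat None and empty strings as missing (an assumption; sources differ).
    all_fields = sorted({f for row in rows for f in row})
    missing_by_field = {
        f: sum(1 for row in rows if row.get(f) in (None, ""))
        for f in all_fields
    }
    return {
        "row_count": len(rows),
        "duplicate_keys": duplicate_keys,
        "missing_by_field": missing_by_field,
    }

# Hypothetical sample standing in for a source extract.
sample_rows = [
    {"customer_id": 1, "name": "Acme", "region": "West"},
    {"customer_id": 2, "name": "", "region": "East"},
    {"customer_id": 1, "name": "Acme", "region": None},
]

report = profile_rows(sample_rows, ["customer_id"])
print(report)
```

The counts in the report are a starting point for the conversation with the SMEs and the data owner, not a substitute for it.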
Besides creating the LDM, you will also need to create a Business Requirements Document (BRD) that will have a data requirements section. These are the activities needed to build this section:
- Define the data with a precise meaning and content, in other words, the metadata
- Determine how often the data needs to be loaded
- Determine the cutoff time for the data
- Analyze audit and compliance considerations
- Define the performance requirements and response times in retrieving the data
- Determine if the data has special security and privacy needs
- Define the length of history needed for the data
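One way to keep track of these activities is to record each data element's requirements in a small structure and list what is still unanswered. The sketch below is only an assumption about how such a record could look; the class name, field names, and example values are hypothetical and not part of any standard BRD template.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class DataRequirement:
    """One entry in the data requirements section (hypothetical layout)."""
    name: str
    definition: Optional[str] = None             # metadata: precise meaning and content
    load_frequency: Optional[str] = None         # how often the data is loaded
    cutoff_time: Optional[str] = None            # cutoff time for the data
    audit_compliance: Optional[str] = None       # audit and compliance considerations
    max_response_seconds: Optional[float] = None # retrieval performance target
    security_privacy: Optional[str] = None       # special security and privacy needs
    history_years: Optional[int] = None          # length of history to keep

    def open_items(self):
        """Return the names of requirements that have not been filled in yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, "")]

# Hypothetical example values for illustration only.
net_sales = DataRequirement(
    name="net_sales",
    definition="Gross sales minus returns and discounts",
    load_frequency="daily",
    cutoff_time="23:59 local",
    history_years=5,
)

print(net_sales.open_items())
```

Anything returned by open_items() is a question to take back to the business users before the design phase starts.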
Creating a BRD streamlines the design phase in several ways. You can uncover security, audit and performance considerations early, before they have to be addressed in the physical implementation. Data profiling helps you proactively minimize data quality problems during the load process. Finally, creating an LDM provides the foundation for the physical data model (PDM), which becomes the de facto data structure for the tables in your data warehouse.
Fernando Martinez-Campos is a senior consultant with Baseline Consulting and an expert in data architecture, data interchange standards, legacy coexistence strategies, and reference architecture templates for infrastructure, applications and databases.
