By Carol Newcomb, Senior Consultant
Diamond in the Rough: Data Quality
The third part of my summertime primer addresses Data Quality Analysis. Don’t start a data quality analysis until you have completed the first two steps of your Root Cause Analysis: investigate and prioritize any potential causative factors, and start your metadata assessment. Otherwise, your findings may mislead you.
Data Quality Analysis
A Data Quality Management process should be designed so that an area can start with a simple approach and mature over time to one that is more proactive and comprehensive. Initially, investigation may focus on single data elements or events. As patterns, data commonalities and other relationships appear, the data quality management process will grow to support complete business processes. A mature data quality management process will not just resolve individual issues. It will also track relationships between data elements, ensure that business rules are consistent, and generate statistical analyses to monitor previously addressed issues, so that data quality remains stable and an early warning system is in place as part of the data governance program. The goal is to design a data quality management lifecycle, as shown in this diagram:
Initial Data Quality Analysis Process
- Define data scope: determine the data elements that are associated with, or are direct results of, the reported issue
- Check that all metadata definitions are present and current
- Enlist the involvement of the Data SME or Data Stewards
- Identify all source systems where the data originates, is entered or derived
- Extract the relevant data from all key source systems.
- Design the profile. A profile will consist, at a minimum, of total record counts, min/max values, frequency of unique values, and frequency of invalid values (if defined) for each data element profiled.
- Profile the data to determine key characteristics that are contributing to the issue, such as:
  - Wrong values
  - Missing values
  - Corrupt transformation processes
  - Incorrect business rules
  - Incorrect usage rules
- Summarize the key findings from the profile detail
- Determine what key drivers are contributing to the impact
- Determine accountability for the data quality issue
- Involve other Data Stewards in troubleshooting and designing the data quality solution
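To make the "design the profile" step concrete, here is a minimal sketch of a single-element profile in Python. It covers only the minimum contents listed above (total record count, min/max, frequency of unique values, and frequency of invalid values when a rule is defined). The function name `profile_element`, the `is_valid` rule, and the sample ages are all hypothetical and stand in for whatever your extract and metadata definitions actually contain.

```python
from collections import Counter

def profile_element(values, is_valid=None):
    """Profile one data element (hypothetical helper, not a standard API).

    values   -- list of raw values extracted for the element
    is_valid -- optional rule returning True for valid values
    """
    # Treat None and empty strings as missing values.
    present = [v for v in values if v is not None and v != ""]
    freq = Counter(present)
    profile = {
        "total_records": len(values),
        "missing": len(values) - len(present),
        # min/max assume the present values are mutually comparable.
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "unique_values": len(freq),
        "value_frequency": dict(freq),
    }
    # Invalid-value frequency is reported only if a validity rule is defined.
    if is_valid is not None:
        invalid = Counter(v for v in present if not is_valid(v))
        profile["invalid_frequency"] = dict(invalid)
    return profile

# Hypothetical sample: patient ages with an assumed valid range of 0 to 120.
ages = [34, 51, None, 34, -2, 130, 51, 34]
age_profile = profile_element(ages, is_valid=lambda a: 0 <= a <= 120)
```

In this sample the profile would flag one missing value and two out-of-range values, which is the kind of detail you would then summarize as key findings.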
Two types of plans should be developed to address known data quality issues: a corrective action plan to fix the immediate source of the problem identified, and an ongoing monitoring plan, where thresholds have been determined and metrics are routinely collected and reported to data stakeholders. This monitoring process should be scalable based on the number of data elements being tracked.
- Corrective Action Plan
  - Does the scope of the problem warrant a change in metadata definitions, business practices or data entry rules?
  - Does the scope of the problem warrant a data governance standard?
  - Does the corrective action plan include details on how to fix the source of the problem as well as ways to correct historical data in the system?
- Preventive Action Plan
  - This plan is designed to minimize the probability of data quality issues recurring
  - Determine ‘early warning triggers’ based on designated thresholds. These thresholds should reflect the business tolerance for inaccurate data (for example, is 95% accuracy acceptable?)
  - If data latency is the source of a data quality issue, then latency thresholds should be included in the monitoring plan
  - Determine how frequently results of the monitoring plan will be reported to data stakeholders or governance oversight committees
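The early warning triggers described above can be sketched as a simple threshold check. This is only an illustration under stated assumptions: the function name `check_thresholds`, the 95% validity threshold, the 24-hour latency threshold, and the sample counts and dates are hypothetical, and your own thresholds should come from the business tolerance agreed with data stakeholders.

```python
from datetime import datetime, timedelta

def check_thresholds(valid_count, total_count, last_loaded, now,
                     min_valid_rate=0.95, max_latency=timedelta(hours=24)):
    """Return a list of early warning messages (empty if all is well).

    All names and default thresholds here are illustrative assumptions.
    """
    alerts = []
    # Share of records that passed the validity rules in this period.
    valid_rate = valid_count / total_count if total_count else 0.0
    if valid_rate < min_valid_rate:
        alerts.append(
            f"valid rate {valid_rate:.1%} is below threshold {min_valid_rate:.0%}"
        )
    # Latency: time elapsed since the data was last loaded.
    latency = now - last_loaded
    if latency > max_latency:
        alerts.append(f"latency {latency} exceeds threshold {max_latency}")
    return alerts

# Hypothetical monitoring run: 930 of 1,000 records valid, data 30 hours old.
alerts = check_thresholds(
    valid_count=930,
    total_count=1000,
    last_loaded=datetime(2024, 1, 1, 0, 0),
    now=datetime(2024, 1, 2, 6, 0),
)
```

A check like this could run on whatever reporting schedule you agree with stakeholders, with the resulting alerts included in the routine report to the governance oversight committee.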
photo by Swamibu via Flickr (Creative Commons License)
