By Frank Dravis, Senior Consultant
It’s counter-intuitive how something as simple as a lookup table can drive a data governance initiative. At one client site I worked at, they had a reference table called Customer Level Code. In that table, there were nine records containing values such as Premier, Platinum, Invalid, etc. The table included a description column for the tier and a corresponding lookup code, but that was it. For years, this little table sat on the legacy data warehouse with no true source system from which to track its origins or authorized editors. From time to time, a code was added, such as Not Required. The person who told the mainframe DBA to add the value was an IT manager who worked with the business and saw the need.
Fast forward to the time of building a much better data warehouse. The architects, in their efforts to satisfy the business requirements, accumulated a list of over 50 such reference tables. These tables spanned the enterprise. They were referenced by many applications, and encompassed everything from employee pay grades to merchandise categories. In order to do things right, the new data warehouse needed a defined source for each data domain and table it would load, and for a large number of the reference tables, there was either no single source or defined owner or even an electronic repository to hold the original values. They were written in a notebook.
To wrap our arms around all of the reference tables, a solution was implemented where the contents of all 50 tables was loaded into one table in the landing area for the new data warehouse. This new table included columns such as change date, who authorized the change, previous value, etc.; in other words, all the rich metadata to describe how the code values were managed. Better still, this single table was published as the source of record for those tables that had no previous source system. The owners of specific records were identified, and standard extracts were written to pull data from those source systems that did exist. Taken in isolation, maintaining the Customer Level Code table was a small, informal activity below the data management radar. But when the cumulative effect of all 50 tables was considered with their errors, latency, and lack of linkage, a substantial benefit was gained by managing them as a corporate resource. That single table housing all code values aided the implementation of a data governance initiative by: identifying gaps in ownership; establishing a central repository for management, common access routines, standardized interfaces; and better yet, uniform change management for policies and permissions.
So the next time you are working with a code table, ask yourself how many more are there like it, and wouldn’t it be nice to have one central place to go for every reference field you ever needed?
photo by p_a_h via Flickr (Creative Commons License)

Excellent post Frank,
Code tables are often the Holy Grail of a company’s collective business intelligence.
Perhaps, we could hire Dan Brown to write “The Da Vinci Code Table” – symbologist Robert Langdon racing through the corporate catacombs in search for the missing cryptex of Leonardo Da Vinci, founder of the Priory of Metadata, the secret society responsible for maintaining enterprise data lineage…
Or we could just follow your great advice and build a central repository of code tables, which would be more realistic, much more useful, and is far too often overlooked.
Best Regards,
Jim
Posted by: Jim Harris | December 03, 2009 at 10:15 AM