April 22, 2010

Data Profiling with SQL is Hazardous to Your Company’s Health

Comments

Stephen,

Good post. I find it difficult to convince clients mid-project that data profiling is worth their time (and my expense). I can't imagine too many of them getting data profiling religion and buying a software license.

I think that you're completely right, but lamentably many organizations just don't get it.

Perhaps that will change soon.

Stephen,

I definitely agree that data profiling is an essential activity. It is one of the commonly overlooked aspects of enterprise information initiatives -- sometimes even by data quality projects.

Data profiling tools can definitely help with the required analysis, and there is significant diversity in the vendor offerings, including some excellent open source alternatives that can alleviate the associated costs.

However, I would argue that non-tool options such as SQL queries can occasionally be a legitimate approach. Although a data profiling tool can definitely help you automate repeatable processes for some of the grunt work needed to begin your analysis, it is important to remember that the analysis itself cannot be automated.

Subject matter expertise and business knowledge of the data's meaning to the organization's tactical and strategic goals are integral to understanding the reports produced by data profiling, whether they were created by ad hoc SQL queries or a data profiling tool.

Therefore, I agree that implementing a robust data profiling system is an essential part of an effective data management environment. However, this can be one of the many areas where organizations believe that simply purchasing a tool satisfies the requirement -- which I know is definitely NOT the point you are making.

Best Regards,

Jim

@philsimon - At my last client, I was somewhat successful in pointing out that they were actually doing profiling in other contexts. When I pointed out the similarity, the reply was, "Oh, that's all you're talking about? That's easy!"

@jim - Point well taken. It's easy to sound exclusionary in 300 or so words, and SQL is great for quick-and-dirty analysis. I'm more interested in the ability to build a reusable library in a tool, which is more than a bunch of text files on a file share ;-)

Stephen,

I agree with you on both counts:

1. It is better to use a tool designed to make data profiling easier.

2. It is difficult to convince clients of this need.

I have found this to be the case, especially when there is no history or culture of data content profiling in the organisation.

The worst cases are the classic lines from IT: "The business is responsible for the quality of the data" and "love the data".

But then IT refuse to provide standard data quality metrics on what is actually in the data, standard metrics that a data profiling tool could readily provide.

Sorry for the rant.... it's a pet hate of mine.

Yes - Love the data.... but first and foremost, one must "know the data".

Rgds Ken

I have the best luck positioning data profiling within the context of a specific initiative and/or use case, preferably with a couple of case studies showing how other projects have benefitted. Ideally, they've already attempted to do manual data analysis -- then they're ready to hear the message of using tooling for the stat-gathering and humans to focus on the results.

Subject matter experts (SMEs) are critical. Tooling is most effective when leveraged to drive focused conversations with the SME, preferably on only one data domain at a time.

Great blog posting, and thanks for mentioning DataCleaner :-) You are absolutely right. While it is possible to go a long way with custom SQL scripts, I find that with a tool offering a wide set of predefined measures you will almost always find something that you would otherwise never think of asserting through a handwritten query.

About This Blog

The consulting experts at Baseline Consulting are posting their fresh perspectives on buzzworthy news, vendor features, industry trends, and other hot topics.
