First, let me apologize for the delay in getting this post out. The team was busy working on a big data cleanup project that took most of the summer and a lot of resources. The good news is that the effort produced enough statistics to make a business case for process and system changes that will improve the data going forward. One small step at a time!
In order to share what types of resources I believe are important for a successful data quality program, I'll need to provide a brief overview of what makes up our Data Quality Program.
Our program is based on recommendations that came out of an initial assessment of the maturity level of our data quality, and it includes the following:
For more information, check out Wikipedia: Data Profiling
This is the most important part of a data quality program. It allows you to see where your data problems are AND track the progress of any improvements. Once you start measuring the quality of your data you should continue to measure it on an on-going basis.
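To make the idea of measuring concrete, here is a minimal sketch in Python of the kind of statistics a profiling run might produce. It is not the tool we use: the `profile_column` function, the field names and the customer records are all invented for illustration, and a real profile would cover far more checks (formats, ranges, duplicates across systems).

```python
from collections import Counter


def profile_column(rows, column):
    """Return simple quality statistics for one column of a list of dict records."""
    values = [row.get(column) for row in rows]
    # Assumption: None and blank strings both count as missing.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    total = len(values)
    return {
        "column": column,
        "total": total,
        "missing": total - len(present),
        "completeness_pct": round(100 * len(present) / total, 1) if total else 0.0,
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(3),
    }


# Hypothetical customer extract, invented for illustration only.
customers = [
    {"name": "Acme Ltd", "postal_code": "K1A 0B1", "segment": "Commercial"},
    {"name": "Acme Ltd", "postal_code": "", "segment": "Commercial"},
    {"name": "J. Smith", "postal_code": None, "segment": "Residential"},
    {"name": "", "postal_code": "M5V 2T6", "segment": "Residential"},
]

report = [profile_column(customers, c) for c in ("name", "postal_code", "segment")]
for line in report:
    print(line)
```

Running the same profile against each new extract and keeping the results is what lets you track whether the numbers are getting better or worse over time.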
You'll need an analyst with a strong analytical mind. I use the term analyst only because data quality resource profiles are so new, so analyst can also refer to a business analyst, data analyst or information analyst. They need to be someone who likes to get to the bottom of things and who sees every problem as a challenge to be overcome. They also need to be good at developing and writing business cases, as the results of your data profiling generally lead to a case for making changes to a process or system. They should be an excellent communicator (OK, everyone should be an excellent communicator!) and have good influencing skills...they need this to get the data extracts that some groups don't like to give up :)
Data definition is a simple idea: define your data so that creators, users and stakeholders understand the meaning and purpose of the data. Remember the definition of data quality: "the data is in a state fit for its intended purpose"? Although the logic behind defining your data is simple, the effort to do this is sometimes the most challenging and time consuming. Have you ever tried to get everyone to agree to a definition of a 'customer'? Not so easy. And yet without a definition, you have nothing to work towards in a data quality program. You simply cannot understand, improve, consolidate or convert the data without understanding its meaning.
How did we do this? We started with the data that concerns everyone most, the basic customer information: name, address, segmentation, etc. We obtained the definitions from the data warehouse, put them all together in a Word document, posted it on a shared directory and sent links to everyone we thought would be interested in this information. The results were as follows:
- People contacted us to advise us that a definition was out of date - good! We ensured the definition was updated.
- Business analysts began to refer to the definitions as part of their requirements documentation - good!
- Business users began to refer to the definitions when discussing potential changes - good!
- New employees used the information as part of their training - good!
We then purchased a simple wiki tool, added the definitions and published this information on the corporate web site. That was in late 2007. Today we have a corporate wiki which contains over 1000 terms (definitions, business rules and purpose) and over 60 articles and help guides. It gets on average over 800 hits per month and has 16 contributors. The contributors are business users of the information who care about its accuracy and have agreed to participate in the upkeep of the information - they are our future Data Stewards.
Our goal - to have every corporate term published in the wiki and have the information managed by stewards.
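For anyone who thinks in structures, here is a rough Python sketch of what one wiki entry holds and how you might list the terms that still need a steward. The `GlossaryTerm` class, its field names and the sample entries are my own invention for illustration; they are not how our wiki tool stores anything.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GlossaryTerm:
    """One published term: its definition, purpose, business rules and steward."""
    term: str
    definition: str
    purpose: str
    business_rules: List[str] = field(default_factory=list)
    steward: Optional[str] = None  # None means nobody has taken ownership yet


def unmanaged_terms(terms):
    """Return the names of terms that do not yet have a steward."""
    return [t.term for t in terms if not t.steward]


# Hypothetical entries, invented for illustration only.
glossary = [
    GlossaryTerm(
        term="Customer",
        definition="A person or organization that holds an active account.",
        purpose="Used for billing and segmentation.",
        business_rules=["A customer must have at least one address."],
        steward="Billing team",
    ),
    GlossaryTerm(
        term="Segment",
        definition="The market grouping assigned to a customer.",
        purpose="Used for reporting.",
    ),
]

print(unmanaged_terms(glossary))
```

A list like this is a simple way to see how far you are from the goal of every term having someone who looks after it.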
We have 2 types of resources to manage this function:
One is an excellent writer and editor - they should be excellent communicators and should write from a business perspective. Why? Because you need to use simple, everyday language that everyone can understand. For example, if your term is called 'blue sky', your definition or business rule information should say "why the sky is blue" rather than "diffuse sky radiation and its impact on colour perception". Their main responsibility is to review the information provided or contributed by others and ensure it is clear, easy to understand and formatted correctly. Kind of like a wiki editor.
The second is a business analyst with really good (and I mean REALLY good) sales skills. It's their job to extract the information out of people's heads, emails, documents, user guides, folders and systems. They also need to get users of the information to become contributors - a big change for non-social media types.
Those are the main functions of our program - and the most important. We also implement data quality improvement projects (see my apology at the top of the page), provide data quality related support, develop help guides and tips and tricks, and perform manual and automated cleansing and enrichment using a data management software tool. In the next post, and I promise not to leave it so long, I'll identify some of our biggest challenges with not having executive sponsorship and how we overcame them using some purple cow techniques :)