Wednesday, December 30, 2009

My New Year Data Quality resolutions for 2010

Because we all know that if we write it down and publish it for all to see, we will have a much better chance of success! So here they are:

  1. ‘Communivate’ quality issues to a much broader audience – no more suppressed results – ever!
  2. Establish employee Data Quality performance targets – for ALL employees. (This could be a multi-year task; however, since it’s a big challenge, it will be lots of fun to do!)
  3. No more Mr. (Ms.) Nice Guy – those of you who know better yet keep causing Data Quality issues, beware! I have stats on you! Fear me!! ...or maybe I’ll just nicely advise you that I can help you get ready for the soon-to-be-established performance targets – see item #2.
  4. Establish measures and track how well our IT project processes are following the supposedly integrated information management protocols – I know it’s not my job to do, but sheesh, nobody else is doing it, dammit!
  5. Communivate results to the broad audience (see item #1)
  6. Obtain agreement from ‘informal’ data stewards (those that I have established relationships with after stalking for months on end...) to partner with us DQ folks and develop some agreed-upon roles and responsibilities. (These will become part of the employee performance targets.)
  7. Take ‘quasi’ business sponsors to task for not doing what they said they would do (also part of the employee performance targets).
  8. And last but not least, I also hereby promise to swear less often, or perhaps more quietly, when hitting the roadblocks that stand in the way.

I can’t wait to reward myself...


‘We will open the book. Its pages are blank. We are going to put words on them ourselves. The book is called Opportunity and its first chapter is New Year's Day’. ~Edith Lovejoy Pierce~

Wednesday, December 23, 2009

All I want for Christmas is...

Dear Santa:

I hope you are well. I have been very good this year (don’t listen to the rumours) and would appreciate the following for Christmas:
1. A business sponsor – so when I find a problem that requires a business decision or communication I don’t waste 3 weeks (and I don’t come cheap!) trying to find someone who cares.
2. Data Stewards – who are measured on how they manage the data they are accountable for.
3. Integration of our Data Management Software – so we can match and enhance at the source of data input rather than performing it after the fact.
4. IT best practices – where every IT project includes data profiling.
5. Complete Business Requirements – these would include:
• The identification of the Stewards for the related data and processes
• Data Definitions
• Business Rules
• Governance processes
• Stakeholders
• Business Purpose of data
• Quality requirements for data
• Training requirements related to quality of data
6. Comprehensive Benefits Realization – where costs associated with managing data reactively are identified
7. Corporate Training for new employees that includes Data Quality best practices
8. Accountability for those who know better yet still create bad data
9. Employee Performance Measures that include at least one objective related to quality of data
10. Effectively communicated and measured corporate policies and procedures related to quality of data

I know I’m asking for a lot, but if you could do this for me I promise it will improve the satisfaction of our customers, reduce our operating costs and make me one seriously happy camper.

Thanks very much,
Sincerely,
Jill


Wednesday, December 9, 2009

Ways to 'Communivate' your Data Issues

Part 1 of: The Purple Cow Approach to Data Quality

Or: How to have fun while trying to jump data quality sponsorship hurdles
Or: How to use innovative communication tactics to reach your Data Quality objectives.


‘Communivate’ is a combination of the words communicate and innovate, and it means to communicate in an innovative way. Our team uses it a lot to describe how we get our message across. We are one of those insane teams (aka Sneezers) who constantly push the boundaries of ‘appropriate’ tactics to get the job done, and are always coming up with new terms to describe our approaches. (Makes me wonder if coining new terms is a DQ thing?)


Background
I am responsible for implementing a Data Quality (DQ) program and I have no business sponsor. As a result, my team and I put an enormous amount of effort into achieving the following:
· Raising awareness
· Communicating poor DQ issues
· Stalking (did I say stalking?) I meant to say identifying and engaging business stakeholders
· Developing business cases to educate business and IT on best practices
· Getting buy-in


Essentially, we collect a lot of data and share it with whoever will listen. And because we don’t have that essential business sponsor, we need to communicate over and over (and over) the same messages to various stakeholders. It can get tiresome [insert shot of Dracula sucking enthusiasm out of lifeless body here] after a while…


So, here is one example of how we communivate.
Goal 1: Raise Awareness
The Strategy? Find a Captive Audience.
Since we don’t have a business sponsor we don’t have the same corporate tools to spread the word. Internal intranets, team portals and corporate newsletters are all off limits, so instead we took the message to the people. Because we’re sneezers, we wanted to push the boundaries and have fun. The team printed off screenshots of seriously bad data and posted them (under covert secrecy – more sneezer fun) on the doors of washroom stalls. You could not get a more captive audience than that.
The results? We ran this campaign 4 times over a 1-year period, and by the end of the year our communivative strategy AND the message we were trying to convey were mentioned by a Senior VP in a corporate communication, we received 57 positive comments (and 1 tree-hugging negative one), and another Senior VP asked us when our next campaign was going to start. (Ok, they still are not sponsoring us, but they do like us, so one thing at a time...)


Awareness raised, goal achieved.

Sunday, November 15, 2009

Data Quality, My Story

I wish I could write more frequently. I have a lot to say and I'm always writing stories in my head. Every week our Data Quality program moves a little bit farther up that maturity ladder, and I'd like to share what worked with the world! The problem is I struggle with how to take what happened, what is real, and write about it in a way that does not cross any of my organisation's confidentiality boundaries. Do I say this or that? Will that get me in trouble? What ARE the rules in these days of social media?

Anyway, I've been inspired lately by some great bloggers on the subject of Data Quality: OCDQ Blog, Jill Dyche, Dylan Jones, Steve Sarsfield...
Their stories are relevant, informative and personal, which is why I like to read them. And I do push the boundaries at work every single day. It's the only way to make change happen. And it has worked so far....So I will tell my story, and hope that maybe someone else will benefit just like I have. As Spock would say: "The needs of the many, outweigh the needs of the few, or the one".

Yep, bit of a Trekkie fan!

Tuesday, September 1, 2009

Data Quality Team Resources

First, let me apologize for the delay in getting this post out. The team was busy working on a big data clean up project that took most of the summer and a lot of resources. The good news is that the effort provided enough statistics to make a business case for process and system changes to improve the data going forward. One small step at a time!

In order to share what types of resources I believe are important for a successful data quality program, I'll need to provide a brief overview of what makes up our Data Quality Program.

Our program is based on recommendations that came out of an initial assessment of our data quality maturity level, and it includes the following:

Data Profiling
For more information check out Wikipedia : Data Profiling

This is the most important part of a data quality program. It allows you to see where your data problems are AND track the progress of any improvements. Once you start measuring the quality of your data you should continue to measure it on an on-going basis.

Resource Requirements
You'll need an analyst with a strong analytical mind. I use the term analyst only because data quality resource profiles are so new; analyst can also mean business analyst, data analyst or information analyst. They need to be someone who likes to get to the bottom of things and who sees every problem as a challenge to be overcome. They also need to be good at developing and writing business cases, as the results of your data profiling generally lead to a case for making changes to a process or system. They should be an excellent communicator (ok, everyone should be an excellent communicator!) and have good influencing skills...they need these to get the data extracts that some groups don't like to give up :)

Data Definition
Data definition is a simple idea: define your data so that creators, users and stakeholders understand the meaning and purpose of the data. Remember the definition of data quality: "the data is in a state fit for its intended purpose"? And although the logic behind defining your data is simple, the effort to do it is sometimes the most challenging and time consuming. Have you ever tried to get everyone to agree to a definition of a 'customer'? Not so easy. And yet without a definition, you have nothing to work towards in a data quality program. You simply cannot understand, improve, consolidate or convert the data without understanding its meaning.

How did we do this? We started with the data that most concerns everyone, and that is the basic customer information: Name, address, segmentation, etc. We obtained the definitions from the data warehouse, put them all together in a Word document, posted it on a shared directory and sent links to everyone we thought would be interested in this information. The results were as follows:
-People contacted us to advise us that a definition was out of date - good! We ensured the definition was updated.
-Business Analysts began to refer to the definitions as part of their requirements documentation - good!
-Business users began to refer to the definitions when discussing potential changes - good!
-New employees used the information as part of their training - good!

We then purchased a simple wiki tool, added the definitions and published this information on the corporate web site. That was in late 2007. Today we have a corporate wiki which contains over 1000 terms (definitions, business rules and purpose) and over 60 articles and help guides; it gets on average over 800 hits per month and has 16 contributors. The contributors are business users of the information who care about its accuracy and have agreed to participate in the upkeep of the information - they are our future Data Stewards.

Our goal - to have every corporate term published in the wiki and have the information managed by stewards.
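To give a feel for what one of those wiki terms carries, here's an illustrative sketch in Python. The field names, the example term and the steward are my own invention for illustration, not our actual wiki schema:

```python
from dataclasses import dataclass, field

@dataclass
class GlossaryTerm:
    """One wiki entry: definition, business rules and purpose, plus its steward."""
    name: str
    definition: str
    business_purpose: str
    business_rules: list = field(default_factory=list)
    steward: str = "unassigned"  # the contributor accountable for keeping it current

# A hypothetical entry, the kind of thing stakeholders must agree on.
customer = GlossaryTerm(
    name="Customer",
    definition="A person or organisation that has purchased at least one product or service.",
    business_purpose="Identifies who we bill, ship to and support.",
    business_rules=["Must have a unique customer ID", "Name may not be blank"],
    steward="Jane in Billing",  # hypothetical steward
)
print(customer.name, "-", customer.steward)
```

The point of the structure is that a term is more than a definition: the rules, the purpose and a named steward travel with it, which is exactly what makes the wiki entries usable.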

Resource Requirements
We have 2 types of resources to manage this function:
One is an excellent writer and editor - they should be excellent communicators and should write from a business perspective. Why? Because you need to use simple, everyday language that everyone can understand. For example, if your term is called 'blue sky', your definition or business rule information should say "why the sky is blue" rather than "diffuse sky radiation and its impact on colour perception". Their main responsibility is to review the information provided or contributed by others and ensure it is clear, easy to understand and formatted correctly. Kind of like a wiki editor.
The second is a business analyst with really good (and I mean REALLY good) sales skills. It's their job to extract the information out of people's heads, emails, documents, user guides, folders and systems. They also need to get users of the information to become contributors - a big change for non-social media types.

Those are the main functions of our program - and the most important. We also implement data quality improvement projects (see my apology at the top of the page), provide data quality related support, develop help guides and tips and tricks and perform manual and automated cleansing and enrichment using a data management software tool. For the next post, and I promise not to leave it so long, I'll identify some of our biggest challenges with not having executive sponsorship and how we overcame them using some purple cow techniques :)

Monday, March 16, 2009

Data Profiling - Basic Measures

Last week I talked about how data profiling is an important first step in getting your data quality program off the ground. Data profiling is just another term used to describe the process of examining the data available in an existing data source (e.g. a database or a file) and collecting statistics and information about that data.

There are a lot of robust data profiling tools available, but you should be able to begin performing analysis using a common tool such as Excel or Access.


Here are the basic measures that we started with:

Company (or Account) Name
· Contains the word(s) “duplicate” or “out of business”
· Contains invalid characters

Company (or Account) Address
· Is blank
· Contains invalid characters
· Contains the words “Address Not Known” or “unknown”

City
· Contains invalid characters and/or digits (0–9)
· Contains the word “various”, “unknown” or “City Not Known”

Company (or Account) Annual Sales (or income)
· Is Blank
· Contains a numeric value less than $100

Primary Contact First Name
· Is blank
· Contains the word(s) ‘Unknown’ or ‘Not Known’

Primary Contact Second Name
· Is blank
· Contains the word(s) ‘Unknown’ or ‘Not Known’

Primary Contact Phone
· Is Blank

It’s important to keep in mind that you should be measuring against the business rules. For example, if a phone number is mandatory then you should not need to check if there are any blanks.

It is also important to understand the purpose for the data. For example, if the purpose of collecting the Company (or Account) address is to mail documentation then you should probably be checking to see if some of your address data contains Physical address information (such as 123 Main Street) vs. Mailing address information (such as P.O. Box 123).
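To make these measures concrete, here's a minimal sketch in Python (standard library only). The column names, the sample rows and the "invalid character" rule are all hypothetical stand-ins, not our actual extract or business rules; adjust them to match your own:

```python
import csv
import io
import re

# A hypothetical sample extract; in practice you would read your own CSV export.
SAMPLE = """company_name,address,city,annual_sales,phone
Acme Ltd,123 Main Street,Springfield,50000,555-1234
DUPLICATE Acme,unknown,Various,50,
Beta Corp,,C1ty!,,555-9876
"""

# Keywords that signal junk values, per the measures above.
UNKNOWN_WORDS = re.compile(r"duplicate|out of business|unknown|not known|various", re.IGNORECASE)
# Characters considered invalid -- an assumed rule; tune it to your own data.
INVALID_CHARS = re.compile(r"[^A-Za-z0-9 .,'&/-]")

def profile(rows):
    """Count rule violations per field, mirroring the basic measures list."""
    issues = {"name_flagged": 0, "address_blank": 0, "address_flagged": 0,
              "city_invalid": 0, "sales_blank_or_low": 0, "phone_blank": 0}
    for row in rows:
        if UNKNOWN_WORDS.search(row["company_name"]) or INVALID_CHARS.search(row["company_name"]):
            issues["name_flagged"] += 1
        if not row["address"].strip():
            issues["address_blank"] += 1
        elif UNKNOWN_WORDS.search(row["address"]):
            issues["address_flagged"] += 1
        if (INVALID_CHARS.search(row["city"]) or re.search(r"\d", row["city"])
                or UNKNOWN_WORDS.search(row["city"])):
            issues["city_invalid"] += 1
        sales = row["annual_sales"].strip()
        if not sales or (sales.replace(".", "", 1).isdigit() and float(sales) < 100):
            issues["sales_blank_or_low"] += 1
        if not row["phone"].strip():
            issues["phone_blank"] += 1
    return issues

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(profile(rows))
```

Divide each count by the total row count and you have the percentages for your 'Did you know' flyer.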


The Initial Results

It doesn’t matter if you are a data geek or not, the results are always very interesting. Some of the statistics we identified as a result of our initial analysis were as follows:
47% of our address information was blank
10% of our address information was ‘Not Known’ or ‘unknown’
8% of our Company Names had ‘Duplicate’ in the name


What do your results tell you?

Your results tell you where you need to focus efforts to improve or even remove the data. If you are finding a lot of blanks (or null values) for your data then your company probably is not using the information collected and you may want to stop collecting it.


The Purple Cow (aka ‘how to stand out from the rest of the crowd’)

Since we didn’t have a sponsor and we had no idea who would be interested in the results we made a splashy ‘Did you know’ flyer that listed the results and posted them in the washrooms – on the doors of the stalls. Needless to say we got people’s attention...


Next Week
Next week I’ll talk about the types of resources we have on the team and how one of the most important attributes your team members can have is their ability to market the message.

Monday, March 9, 2009

From the Ground Up

I've been thinking about writing about Data Quality for a while now. I'm responsible for a Data Quality Program at a Crown Corporation, and since the program started just 2 1/2 years ago a lot of progress has been made. What's the big deal, you say? Well, the program has no executive business sponsor (bad!), our team has been floating around different groups trying to find a home (bad!), and there is no formal mandate (bad!), and yet despite all those no-nos we have made a lot of great progress! We are a team of enthusiastic creative thinkers who have broken the mold and achieved success due to 2 key strategies:
1/ Our program is based on Industry Best Practices
2/ Our methods for communicating and engaging others are very 'Purple Cow' - in his book Purple Cow: Transform Your Business by Being Remarkable, Seth Godin says that the key to success is to find a way to stand out: to be the purple cow in a field of monochrome Holsteins.

The title, 'Data Quality - From the Ground Up', is just that: implementing a successful Data Quality Program from the ground up, without an executive sponsor or a mandate, CAN work, and the goal of writing this is to share these strategies in the hope that what has worked for us will help others achieve the same success.

Getting Started - The Basics
This week I'll start with the basics; those industry best practices that are logical and do-able.
1/ Identify the important data
For us it was Customer type data and we started with the basics; name, address, city, province/state, country, phone, fax, email, website.
2/ Profile the important data
Data profiling is just another word for data analysis. Get an extract of the data and start with the most basic analysis. Is it complete? How much of it is blank?
3/ Find someone who cares
For us it meant someone in IT, as IT has known about data quality issues for a long time. Better, would be someone on the business side, but take what you can get.
4/ Communicate your results
Find a way, any way to communicate what you've found. Post the results on your intranet, send them in an email (interesting profiling results tend to get forwarded), or post them by the printers or water cooler. More information coming later on some of the 'purple cow' methods that worked for us.
5/ Define the data
Start gathering and documenting the basic definitions and business rules for the important data and share it. You would not believe how people will thank you.
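Step 2 really can start that simply. Here's a minimal completeness check in Python (standard library only; the column names and sample rows are hypothetical, standing in for whatever extract you can get):

```python
import csv
import io

# A toy extract standing in for your real data pull.
DATA = """name,address,city,phone
Acme Ltd,123 Main Street,Springfield,555-1234
Beta Corp,,,
Gamma Inc,45 Oak Ave,,555-0000
"""

rows = list(csv.DictReader(io.StringIO(DATA)))
total = len(rows)

# For each column, report what fraction of values is blank.
for column in rows[0].keys():
    blanks = sum(1 for r in rows if not r[column].strip())
    print(f"{column}: {blanks / total:.0%} blank")
```

A handful of percentages like these is already something you can take to step 4 and communicate.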

This was our approach, and we only used some basic Microsoft tools like Excel for the profiling and Word for the definitions. Today we have an enterprise data management tool for the profiling and a wiki with over 900 corporate definitions. Not bad for 2 1/2 years.

There are a lot of other things that can and need to be done as well, but the methods described above are a good start and you can't go wrong.

In the coming blogs I'll talk about the results of each of the above and explain how we logically progressed to where we are today. I'll also share some fun 'purple cow' stuff that we've done with amazing results.

Next Week: What did we find when we profiled our important data and what did we do about it?


Thoughts to ponder: Be a yardstick of quality. Some people aren't used to an environment where excellence is expected.
~Steve Jobs~