Data Quality - Tom's Ten Data Tips

Data quality gives a competitive edge. Everybody agrees how important good data quality is. And everybody has agonized over erroneous data. We've all lost a lot of time working with crappy data, and "Garbage In, Garbage Out" is probably the most commonly cited proverb in IT. Then how come it is always so hard to find volunteers to do something about it?

Because the consequences of non-quality data are propagated throughout the organization, one seemingly innocent problem upstream can easily cause a dozen problems downstream, and sometimes even more! The accumulated costs of dealing with the resulting errors can become staggering. Tackling and resolving the issues that cause data quality problems is one of the most high-leverage investments a company can make in a world that increasingly relies on digital information.

Why do these problems exist, and why do they live on? It often appears to be business misalignment of the worst kind: many 'bystanders' realize there are indeed data problems, but nobody "owns" these problems. This commonly recurring phenomenon lies at the heart of the omnipresent challenge to find resources (both money and time) to overcome such data quality problems.

1. What is data quality?
Data quality is determined not only by the accuracy of data, but also by relevance, timeliness, completeness, trust and accessibility (Olson, 2003). All these "qualities" need to be attended to if a business wants to improve its competitive advantage and make the best possible use of its data. Data quality implies fitness for use, including unanticipated future use. Accuracy holds a special place because none of the others matter at all if the data are inaccurate to begin with! All other qualities can be compromised, albeit at your peril.

2. Data non-quality is expensive
"Reports from the Data Warehousing Institute on data quality estimate that poor-quality customer data costs US business a staggering $611 billion a year in postage, printing and staff overhead" (Olson, 2003). There are many ways in which non-quality data can cost money, and typically these costs remain largely hidden. Senior management either doesn't notice these costs or, even more likely, is grappling with problems without it ever becoming clear that they are caused by poor-quality data.

3. Quantifying the cost of non-quality is very important
Since data quality problems have such a strong tendency to go unnoticed, it is even more important to translate the consequences of poor-quality data into the one dimension each and every manager understands so well: dollars. This also gives a perspective on the kinds of investments that are appropriate to make in order to resolve such issues. A mechanism for prioritizing improvement programs is also desirable. You want to begin by picking the low-hanging fruit, but you certainly also want to know where the whoppers are! According to Gartner, Fortune 1000 enterprises may lose more money in operational inefficiency due to data quality issues than they spend on Data Warehouse and CRM initiatives.

4. Data quality issues typically arise when existing data are used in new ways
In my experience as a data miner, where I am very often looking for new ways of using existing data, this is where many problems originate. The data themselves haven't changed, but new uses for existing data make apparent problems that were already there. So what constitutes "data quality" needs to be considered in relation to the intended use of the data. A change of usage then brings up new ways to evaluate the quality, and hence may bring up concerns. The reason these problems didn't surface before is usually that the business adapted to the data the way they are. People and processes avoided the consequences of inaccurate entries, which, incidentally, is also why legacy system migrations can be so painful.

5. Many CRM projects collapse under data quality issues
Gartner and Forrester have estimated that 60-70% of CRM implementations fail to deliver on expectations. That is not to say that these projects are all abandoned halfway; rather, expectations aren't met. One of the biggest reasons for the 'technical' challenges in bringing CRM projects to completion is that disparate data sources are getting merged to create a 360° customer view. Often, this is the first time that customer records from disparate systems are merged. There is typically tremendous "fallout", and records that do get merged contain many inconsistencies. This then often leads to disappointed end-users and unmet expectations.

6. Data quality is a management issue, not a technology issue
The typical situation in the overwhelming majority of organizations I have visited is like this:
- there is low awareness of the embedded cost of their data quality issues
- management has no idea of the potential value in fixing data quality issues "upstream"
- those who have insight into data quality issues have little or no incentive to bring these issues out
Hence, the problems have a nasty habit of perpetuating themselves. For sure, subordinates need to carry their weight and take responsibility. But notice how, for all three of these issues, the final responsibility for bringing these "unwelcome surprises" out in the open essentially lies with management. What is the culture like in your company? My experience has been that managers may or may not be motivated to bring such issues out in the open, sometimes depending on the time horizon they consider for their own tenure.

7. Manage data for what it is: a strategic resource
Data is not merely a byproduct of business processes, but something that has value beyond its immediate processes. Finding new uses for existing data makes it more valuable, at no capital investment! Future changes to the way the data are to be used cannot be predicted, yet are guaranteed to happen! This proliferation of data usage needs to be anticipated, and calls for flexible data models. Good database design is resilient in the face of unanticipated changes. On the tangible side, this means flexibility in hardware and infrastructure (avoid vendor or platform lock-in). On the intangible side, you want to avoid aggregation or any other data commitments that cannot be reversed within the data scheme. It is fundamentally impossible to find a generic "right" way to aggregate inconsistencies in data. That is why flexibility calls for late commitments in the data model.

8. Higher quality data lead to far more flexibility for your corporate strategy
Fast access to accurate data gives more than a competitive advantage. Even more important is the flexibility such companies enjoy in adjusting to changes in market conditions. So over time, as market changes occur, the gap with the competition can grow even further. Also, changes in legislation or market regulation can be much more easily exploited and turned into an opportunity rather than 'suffered'.

9. Data quality improvement is a process, not an event
In many ways, one can draw parallels between Total Quality Management efforts and the issues surrounding data quality. The Japanese use the word "Kaizen", which denotes both an incremental improvement method and a philosophy. What is crucial is that it's an ongoing, never-ending effort to keep raising the bar. Data quality is never "perfect", as every new application of existing data is likely to bring up new issues. And the proliferation of data usage is not ending any time soon. So data quality issues are guaranteed to stay with us for a while.

10. Collecting data is only a few decades old
No wonder we're dealing with "growing pains". Few corporations actually planned their data strategy, and their IT infrastructure grew in a time when data were being handled in silos. As data are increasingly being shared and warehoused, we need to think through the goals and objectives of the enterprise with regard to the data. This is all fairly new, and few if any 'established' standards exist. A sort of 'global plan' or 'road map' as to where and how to expand on existing capabilities is a sound investment to manage project risks. Also, this 'road map' needs to conform to the existing IT strategy. Time and money will only be invested if project goals are in line with the overall corporate strategies. The road is littered with unsuccessful BI projects, many of which started without a clear business case. A well-conceived data strategy greatly leverages the considerable investments that are needed to get the best mileage from your data.

We appreciate comments and feedback.