Meeting the data-quality challenge

Ask any business or IT executive if high-quality data is important to the health of his organization, and the answer will be, "Of course." But each day, organizations around the world use applications and databases filled with inconsistent, outdated or inaccurate data.

Companies routinely suffer from an inability to answer basic questions such as: "How many customers do we have?" "Who are our best customers?" "Which of our products provide the greatest profit margin?" and "What do we purchase, in what quantities and from whom?" Chances are, the answers to most of these questions reside within a system or, more likely, within several systems. But when companies rely on multiple applications to find the answer, they're likely to get conflicting answers.

One traditional approach to improving an organization's data, often referred to as "data quality" or "data cleansing," usually takes place while developing a data warehouse or when consolidating data from legacy systems into a new enterprise application. For example, consider a manufacturer building a data warehouse to serve as a single repository for business-critical data. As a first step, the company might implement a broad data-quality program to fix problematic information before it reaches the data warehouse. It may inspect the data with a data-profiling tool. Then, it might use traditional data-quality or -cleansing technology to standardize data and correct errors. Next, a data integration or consolidation phase would identify and resolve duplicate information across sources. Finally, a data enrichment phase would allow the company to add value to records, such as demographic data, geographic details or product specifications.

Once completed, the manufacturer would ostensibly have a data warehouse full of solid, reliable data. But now, almost like clockwork, a new challenge emerges. Data entering the warehouse from that point forward, whether from partners, employees or customers, often fails to meet the established standards.

Some companies then turn to data monitoring. In the data warehousing example above, a data monitoring regimen might:

- Detect problems with incoming data by validating it against established business rules
- Generate alerts that flag problematic data as it enters the system
- Identify trends in data quality, showing when the validity of the data starts to decline

Data monitoring uses business rules and metrics, developed jointly by the IT department and business users, to serve as the controls for ongoing maintenance of data integrity. These are often the same rules used during initial data-profiling and -cleansing initiatives. Business users know the data-quality parameters necessary to meet business needs and can work with IT to set monitoring tasks that will enforce established rules.
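To make the idea concrete, here is a minimal Python sketch of rule-based monitoring. It is not from the article; the field names, rules and thresholds are illustrative assumptions standing in for whatever standards a business and its IT department would actually agree on.

# Minimal data-monitoring sketch. All rules and field names below are
# illustrative assumptions, not a real product's API or rule set.
import re

# Each rule is a (name, predicate) pair; a record fails when the predicate is False.
RULES = [
    ("phone_is_10_digits",
     lambda r: len(re.sub(r"\D", "", r.get("phone", ""))) == 10),
    ("org_name_is_full_legal_name",
     lambda r: r.get("org_name", "").rstrip().endswith(("Corp.", "Inc.", "LLC"))),
    ("state_is_2_letter_code",
     lambda r: re.fullmatch(r"[A-Z]{2}", r.get("state", "")) is not None),
]

def monitor(records):
    """Validate each record, emit an alert per violation, and return the
    overall failure rate so data-quality trends can be tracked over time."""
    failures = 0
    for i, record in enumerate(records):
        broken = [name for name, check in RULES if not check(record)]
        if broken:
            failures += 1
            print(f"ALERT record {i}: failed {', '.join(broken)}")
    return failures / len(records) if records else 0.0

incoming = [
    {"phone": "919-447-3000", "org_name": "DataFlux Corp.", "state": "NC"},
    {"phone": "9", "org_name": "DataFlux", "state": "NC"},  # shortcut entry
]
rate = monitor(incoming)
print(f"failure rate: {rate:.0%}")  # a rising rate signals declining data quality

In practice, checks like these would run as data arrives, with the failure rate charted over time; that trend line is what lets monitoring surface problems such as the one-digit phone numbers described below.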
Like any technology implementation, though, the effectiveness of a monitoring regimen is only as good as the organizational effort and process methodologies behind the initiative. There are a few specific pitfalls to be aware of that can foil any data monitoring effort:

Lack of codified business standards. To enforce business rules through monitoring, an organization must have a set of data standards for corporate information. For example, a company may choose to store organization names as full legal names, such as "DataFlux Corp." instead of "DataFlux." Having a codified set of rules makes data monitoring possible because users know what to enforce when establishing data-quality controls. But these rules must be consistent and agreed upon across lines of business and among business and technical users.

Disconnect between business and IT while creating metrics for monitoring. While IT departments know how the data is stored and linked, the consumers of data, i.e., the line-of-business employees who rely on data to make decisions, know what the data should look like. The two sides must work together to create a meaningful set of control metrics within the existing IT environment. Without collaboration, any data monitoring project will fail to yield measurable results because the metrics won't reflect the needs or pains of the business users.

Using data monitoring as an excuse to ignore the root cause of a problem. Companies may find groups creating data that isn't in line with company standards. For example, one company ran a data monitoring routine on the phone number field in its customer database and learned that many of the fields contained only one digit. The data entry staff was taking a shortcut in order to move to the next screen more quickly. Through data-quality technology, the company was not only able to fix the phone number information based on what it already knew about the customers, but it was also able to change the way it measured and rewarded its data entry clerks, focusing more heavily on accuracy than speed. So, in this instance, data monitoring provided management with a way to diagnose and treat a process-based problem.

Because data is a fluid, dynamic resource, data quality isn't a one-time, fix-it-and-forget-it practice. Building and keeping good corporate data takes constant vigilance.

Source: www.computerworld.com