Set the stage with data preparation

Data preparation is the core set of processes for data integration. These processes gather data from diverse source systems, transform it according to business and technical rules, and stage it for later steps in its life cycle when it becomes information used by information consumers.

ETL tools Extract - Transform - Load- have been around for more than a decade, have grown significantly in functionality, and are offered by many vendors. When talking to anyone in the industry or reading articles, you would assume that everyone is using these tools on every data warehouse project however, Gartner states, ETL tools are not commonplace in large enterprises. The majority of data warehousing implementations still rely on custom coding to deliver their ETL processes. Data preparation is not just about extracting and loading data. Ensuring data quality is often a critical activity that is required for the business to effectively use the data. This activity is always oversimplified and underestimated. It is often the activity that slows your project or, if done inadequately, it is the activity that causes the most problems with the business users assessment of the usefulness of the data. As mentioned earlier, business requirements and expectations need to be developed with the business users to establish the critical success factors in this area. Most people consider data quality to be checking for error conditions, data format conversions and mappings, and conditional processing of business rules based on a combination of dimension values or ranges. These types of data quality processes can be adequately performed with an ETL tool if planned for and designed into the overall data-preparation architecture. However, a more important aspect of data quality is data consistency rather than simply error detection or data conversion. Data may be valid in its transactional form when brought into a data warehouse, but become inconsistent and incorrect- when transformed into business information. Data preparation needs to take into account these transformations and ensure data quality and consistency. Tools alone will not enforce this consistency unless it is designed into the processes. There are some instances of data quality that can be handled by tools. These areas are often categorized as data cleansing tools and manage a variety of complex data quality issues that many companies encounter. A significant example is customer names and addresses. A customer or supplier name person or company- may be input in different variations such as IBM, IBM Corp. or International Business Machines Corporation. Many companies dealing with large customer lists, for example, have purchased specialized data cleansing software. Another example of data cleansing is the householding that financial institutions perform where they link household or family members personal and business accounts both for their customers convenience, as well as for their own convenience in promoting a full range of their services to their customers. There are other examples of industry-specific data cleansing tools that may be appropriate to use in addition to the ETL tools based on the business requirements relating to data quality. Data cleansing tools are important tools in many companies arsenals in ensuring data quality and consistency. However, too often, many in our industry equate data cleansing tools with data quality. While the data cleansing tools provide data preparation processes for data quality, they are not the only processes that are needed in that area. Many see the data cleansing tool as a silver bullet. Others feel if they cannot afford those tools or they are not applicable to their situation-, then they can forget about data quality. You can never forget about data quality, and you have to build data quality processes throughout data preparation. Source and full article: www.dmreview.coma>