Data warehouses have now been with us for over a decade and will be with us for many years to come. However, with virtually all business data now created electronically and the price of storage dropping like a rock, a new type of business intelligence repository is starting to appear - the process warehouse (PW).
As we all know, a data warehouse is a data repository built from the get-go for business intelligence. The seminal idea was to pull important data from operational systems, transform it and put it into a database tuned for queries. The benefits would be several - transaction systems wouldn't take a performance hit due to end-user queries, and the business would have a well-ordered history which would give it guidance for the future.
The initial idea has been refined over the years: that data warehouses should be smaller, or geared toward a single subject (data marts); that corporate-wide data models are essential; that data warehouses should be refreshed more often to make them more real time. The list goes on.
But one area that has received little refinement is the type of data that should be stored. As a general rule, data warehouses store end counts rather than process checkpoints - for example, total units shipped in a month, rather than a unit (identified by serial number) tracked through the milestones of assembly, QA, packaging and distribution. Data warehouses have stored the corporation's end result, largely due to regulation and the fact that the end result is easier to pin down. The SEC wants to know a company's final revenue and net income, not its internal workings. In addition, getting internal checkpoint numbers into the data warehouse was usually difficult, if not impossible. For example, a lot of manufacturing companies knew when a product kit was picked and when it turned up in the finished goods warehouse, but couldn't always figure out where the product being built was as it moved through the factory. Put simply, there were a lot of digital data dead spots within business.
However, that is changing. A decade ago, typewritten memos resided in file cabinets, not every company had e-mail and PCs were still relatively expensive. Today, virtually everything is either digitally created or tracked.
Memos are written in Microsoft Word; messages are sent via e-mail; workflow systems manage processes; bar code readers and point-of-sale terminals track goods and sales. RFID tags are making it even easier for companies to track goods throughout the supply chain. In short, tracking data for virtually every business process is now available, and - just as importantly - companies can now afford to store it. Hitachi Global Storage Technologies just announced that it was testing 1TB disk drives for desktop PCs. These technological changes mean that companies can now store process checkpoints in a repository and analyze them to better understand and streamline the business. Rather than being satisfied that they got the product out the door, companies are now asking, "It took us 15 days to build that widget. Why?"
An example might make the difference between a data warehouse and a process warehouse clearer. Web analytics is the discipline of using clickstream data to analyze Web site visitor behavior and value. Virtually every visitor's click on a Web site can be tracked; Web analytics takes those clicks and generates metrics such as visit length, the visitor's path through the site and customer lifetime value. Five years ago, although the individual clicks were initially captured, they were often thrown away after being summarized in a data warehouse. Disk space was expensive and, once again, the end counts - total clicks, total visits - were what enterprises wanted. Forward-looking companies stored clickstream history for a year; most companies (especially those in the midst of dot-com mania) heaved it after six months. However, that practice has changed as time has gone on. Summary data is being stored longer, and individual transaction data is now being kept rather than deleted. E-commerce companies recognized the seasonal behavior of online customers and began storing years of data to gain better insight into future behavior.
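To make the counts-versus-checkpoints distinction concrete, here is a minimal Python sketch. It is only an illustration: the sample clicks, the field layout and the summarize function are all invented for this example, and it assumes each visitor makes a single visit. It shows that the end counts a data warehouse would keep can always be rebuilt from the raw clicks, while the path and visit length cannot be rebuilt from the counts.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw click events: (visitor_id, timestamp, page).
CLICKS = [
    ("v1", "2024-01-01 10:00:00", "/home"),
    ("v1", "2024-01-01 10:02:30", "/product"),
    ("v1", "2024-01-01 10:05:00", "/checkout"),
    ("v2", "2024-01-01 11:00:00", "/home"),
    ("v2", "2024-01-01 11:01:00", "/product"),
]


def summarize(clicks):
    """Return (end_counts, visitors) derived from raw click events.

    Assumption for this sketch: every visitor makes exactly one visit,
    so no session-splitting logic is needed.
    """
    by_visitor = defaultdict(list)
    for visitor_id, stamp, page in clicks:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_visitor[visitor_id].append((when, page))

    # Checkpoint-level view: each visitor's path and visit length.
    visitors = {}
    for visitor_id, events in by_visitor.items():
        events.sort()  # order the clicks by timestamp
        first, last = events[0][0], events[-1][0]
        visitors[visitor_id] = {
            "path": [page for _, page in events],
            "visit_seconds": (last - first).total_seconds(),
        }

    # End-count view: all that a summary-only warehouse would retain.
    end_counts = {
        "total_clicks": len(clicks),
        "total_visits": len(by_visitor),
    }
    return end_counts, visitors


end_counts, visitors = summarize(CLICKS)
```

Throw away CLICKS after computing end_counts and the question "where did visitor v2 drop out?" can no longer be answered; keep the clicks and both views remain available.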
Over time, companies realized that individual visitors had been coming to their sites for years, and the thought of chopping off a visitor's history due to archiving rules became unacceptable, especially with the continued drop in storage prices. At this point, online-savvy companies worry more about checkpoints than they do about counts. The race is on to figure out whether shrinking the checkout process by one step will increase sales (generally, yes, but not always). Enterprises have figured out that if they improve the process - e.g., make the site easier to navigate, easier to buy from - the counts (total sales, dollar value per customer) will increase.
Admittedly, Web analytics is a bit of a special case. The uniformity of the clickstream data makes it easier to analyze process checkpoints in the online world than in the sometimes balkanized offline world. However, as more internal business processes are tracked, whether by workflow systems, bar code scanners, RFID tags or whatever, businesses will increasingly demand the ability to measure and optimize processes, no matter where they reside. And, since optimization of one step sometimes de-tunes something else, companies will want to check whether they're doing better or worse - overall - than they were six months ago.
A process warehouse may not yet be appropriate for your specific situation, but get ready. The PW is coming.
Source: www.datawarehouse.com