Moving Beyond the Data Warehouse Impasse - Part 2

How can we move beyond the roadblock I described in my previous post? Let’s consider a powerful and positive alternative vision of corporate data resources, which we can call the “data un-warehouse” scenario. This can be thought of as a data management regime that is the inverse of the data warehouse: data is “deconstructed” down to its elemental structure, free of indexes, dimensional modeling and complex schemas, and retained in a manner that is cost-effective but still readily available for exploration using standard Business Intelligence tools. The concept would not be realized as a technology that replaces the data warehouse. Rather, it would guide development of a parallel infrastructure that keeps the data warehouse “honest” while offering an additional portfolio of powerful analytic processes.

Arthur Ritchie

The Data Un-Warehouse

The “data un-warehouse” concept focuses on the possibility of a data store of record that could be invoked on a regular basis to verify the assumptions underlying data warehouse structures, providing an audit trail that could reveal structural flaws and open up new areas of potential analysis. It also provides a secure read-only repository for the “historical truth”, immune to tampering and always accessible when questions about the validity of the transformed data in the warehouse arise.
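As a rough illustration of what such a verification might look like, the sketch below recomputes a summary figure from granular archive records and compares it with the totals a warehouse reports. It is only a minimal example under stated assumptions: the record layout (a "region" and an "amount" per row), the function names and the sample figures are all hypothetical and do not describe any particular product or schema.

```python
from collections import defaultdict
from decimal import Decimal


def recompute_totals(granular_rows):
    """Sum the granular archive rows per region (hypothetical layout)."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in granular_rows:
        totals[row["region"]] += Decimal(row["amount"])
    return dict(totals)


def find_discrepancies(granular_rows, warehouse_totals):
    """Return {region: (recomputed, reported)} wherever the two disagree.

    A region missing on either side is reported with None in its place.
    """
    recomputed = recompute_totals(granular_rows)
    discrepancies = {}
    for region in sorted(set(recomputed) | set(warehouse_totals)):
        expected = recomputed.get(region)
        reported = warehouse_totals.get(region)
        if expected != reported:
            discrepancies[region] = (expected, reported)
    return discrepancies


# Illustrative data only: granular records as archived before any transformation,
# and the summarized totals as a warehouse might report them.
archive_rows = [
    {"region": "east", "amount": "100.00"},
    {"region": "east", "amount": "50.25"},
    {"region": "west", "amount": "75.00"},
]
warehouse_totals = {"east": Decimal("150.25"), "west": Decimal("80.00")}

for region, (expected, reported) in find_discrepancies(
    archive_rows, warehouse_totals
).items():
    print(f"{region}: archive says {expected}, warehouse says {reported}")
```

Run on a regular schedule against a read-only copy of the granular data, a check of this kind would give the audit trail something concrete to record: which summarized figures could be reproduced from the original records and which could not.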

The data warehouse was originally conceived as a data store geared to a particular purpose, with specific data being retained to meet defined analytical goals. Over time, however, most organizations have come to view the data warehouse as a general-purpose repository of company data that represents “the truth” in a pure, untouched form.

In practice, however, this is clearly not the case. The data enters the warehouse through a defined process, goes through a further extract, transform and load (ETL) step, and is then analyzed by the various functional groups within the company (finance, marketing, sales, product research, and so on), who add considerable value through their various analyses of the data. By contrast, the data un-warehouse scenario centers on maintaining an archive of the data as it was before analysis. Without this discipline, the data will cease to exist in its original form.

Then and Now

In an article entitled “Information Management: Charting the Course: The System of Record in the Global Data Warehouse” (DM Review, September 2004), Bill Inmon (known as the “father of the data warehouse”) addresses this issue, noting that:

“Derived, summarized or aggregated data has its system of record in the granular data used in the calculation or derivation of the data. The system of record for summarized or derived data changes as often as the wind blows. The real issue is not where the system of record for summarized data or aggregated data resides, but where the system of record for the granular, detailed data resides.”

While the un-warehouse would have been a good idea in concept at any time, until recently the task of setting up a parallel infrastructure would have been technically daunting and financially prohibitive. Today, advances in storage technology and archiving software, together with a better understanding of how the data warehouse is actually used, have coalesced to make the idea not only possible but practical, given its potential for assuring accountability and identifying opportunity.

