Data “Dumping Grounds” and the Importance of Corporate Memory
Received wisdom about data warehousing instructs us not to create a “dumping ground” for our raw detail data. But why not? This principle is a legacy from the not-so-distant past when it was impractical to keep huge amounts of data around if it was not being actively used – so once aggregates had been built, the original details were simply discarded. Of course, this meant that the organization was then confined to working with a particular “version of the truth” that someone had imposed on the data; there was no way to revisit the original details should the need arise for a change of perspective.

Retaining the original detail data as a “corporate memory” brings a number of major benefits:
- If any or all of the original data turns out to be required for a new project (involving new aggregations, “cubes” or KPIs, for example), accessing it is much easier and substantially less expensive.
- If it becomes necessary to assess the value of the corporate data, this can be done carefully over a period of time using data mining techniques, rather than in a rush to meet specific project deadlines.
- The original detail data needs to be handled only once. It can then be retained as a permanent record for auditing the data warehouse if this becomes necessary (to prove compliance, to aid in conflict resolution, or simply to rectify “human errors” made at some point in the process).
- It becomes possible to document in the metadata layer who accessed the information, when and for what reason – incidentally, this can also provide valuable insight into the abilities of those who are responsible for manipulating the raw data.
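To make the last three points more concrete, here is a minimal in-memory sketch in Python. It assumes a hypothetical `CorporateMemory` class (not SAND's product or any real API) that stores raw detail rows once in append-only fashion, derives aggregates from those retained details on request, and records who read the data, when and why in an access log standing in for the metadata layer.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    """One metadata entry: who read the detail data, when, and for what reason."""
    user: str
    reason: str
    timestamp: datetime


@dataclass
class CorporateMemory:
    """Hypothetical append-only store for raw detail rows plus an access log."""
    _rows: list = field(default_factory=list, repr=False)
    access_log: list = field(default_factory=list)

    def load(self, rows):
        # Detail rows are copied in once and never updated or deleted afterwards.
        self._rows.extend(dict(row) for row in rows)

    def read(self, user, reason):
        # Every read is logged before data is returned; callers get copies,
        # so the retained originals cannot be altered through the result.
        self.access_log.append(AccessEvent(user, reason, datetime.now(timezone.utc)))
        return [dict(row) for row in self._rows]

    def aggregate(self, user, reason, key, measure):
        # Aggregates are rebuilt from the retained details, so a new
        # "version of the truth" can be derived at any time.
        totals = defaultdict(float)
        for row in self.read(user, reason):
            totals[row[key]] += row[measure]
        return dict(totals)


memory = CorporateMemory()
memory.load([
    {"region": "East", "product": "A", "amount": 100.0},
    {"region": "East", "product": "B", "amount": 50.0},
    {"region": "West", "product": "A", "amount": 75.0},
])
by_region = memory.aggregate("analyst1", "quarterly KPI", "region", "amount")
by_product = memory.aggregate("analyst2", "new cube", "product", "amount")
print(by_region)   # {'East': 150.0, 'West': 75.0}
print(by_product)  # {'A': 175.0, 'B': 50.0}
for event in memory.access_log:
    print(event.user, event.reason)
```

Because the detail rows are kept, the second aggregation (by product) can be built later without going back to source systems, and both reads leave an entry in the access log. A real warehouse would persist both the details and the log, of course; this sketch only illustrates the idea.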
In my next post, I’ll cover the ingredients for building Corporate Memory into a Next-Generation Data Warehouse.
About SAND
SAND Technology provides scalable enterprise software and best practices for storing, managing, and accessing all your data, on demand. SAND/DNA includes cost-effective nearline data access and high-speed, column-based analytics, aCRM, and specialized extensions designed to lower TCO and improve operational performance for SAP NetWeaver BI, IBM DB2, Microsoft SQL Server, Oracle, SAS, and more. SAND has offices in the United States, Canada, the United Kingdom and Central Europe, and can be reached online at www.sand.com.