Building Corporate Memory Into a Next-Generation Data Warehouse

Arthur Ritchie

It is now possible to design and implement a corporate memory within the data warehouse using mature, tested and well-understood products and methodologies that can be deployed relatively quickly and administered with minimal DBA overhead. These solutions can scale roughly linearly in both cost and performance while supporting both power analysts and reporting users. A successful data warehouse implementation built on the corporate memory concept combines hardware, architectural and software components, as listed below:

  • Hardware:
    • Cheap but powerful SMP hardware
    • Reasonably priced, fast and efficient SAN devices
    • Reasonably priced, fast and effective networks and switches for linking multiple SMP boxes together.
  • Architecture:
    • Nearline data storage (not archiving) to hold the detail data used to build and maintain aggregates and cubes. There is a very clear differentiation between the Nearline 2.0 and archiving approaches to data and performance management.
    • Database federation techniques that reduce the amount of data in key RDBMS “hot spots” and minimize batch window requirements
    • Well-defined uses for aggregation, indexing, and MOLAP cubes to support reporting.
  • Software:
    • Column-based data management technologies for advanced analytics, offering:
      • The ability to change data models “on the fly” to meet emerging requirements
      • Support for very wide tables (tens of thousands of columns), enabling very large numbers of KPIs
      • The ability to add new data types “on the fly”
      • The ability to allow existing applications to continue to work as data models change over time
      • The ability to present all available data in a simple, easily usable format, eliminating the need for analysts to navigate complex data relationships and slowly changing dimension constructs.
    • Better exploitation of new hardware and architectural techniques by traditional RDBMSs
    • The ability to support massively parallel operations, allowing users to iteratively run very complex queries to find patterns in the data without having their thought process interrupted, without impacting other power users and other reporting users, and without interfering with the ability to meet critical SLAs.
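
To make the column-based points above more concrete, here is a minimal sketch of a toy column-oriented table in which a new column can be added “on the fly” while an existing query keeps working. It is an illustration only, not how any particular product is implemented: the `ColumnTable` class, its methods, and the sample sales data are all hypothetical names invented for this example.

```python
from typing import Any, Dict, List


class ColumnTable:
    """Toy column-oriented table: each column is stored as its own list."""

    def __init__(self) -> None:
        self.columns: Dict[str, List[Any]] = {}
        self.row_count = 0

    def add_column(self, name: str, default: Any = None) -> None:
        """Add a column "on the fly"; existing columns are left untouched."""
        if name in self.columns:
            raise ValueError(f"column {name!r} already exists")
        # Backfill existing rows with the default value.
        self.columns[name] = [default] * self.row_count

    def append_row(self, row: Dict[str, Any]) -> None:
        """Append one row; columns missing from the row are stored as None."""
        unknown = set(row) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        for name, values in self.columns.items():
            values.append(row.get(name))
        self.row_count += 1

    def total(self, name: str) -> float:
        """Sum one column, skipping missing values; reads only that column."""
        return sum(v for v in self.columns[name] if v is not None)


# Hypothetical detail data: one row per sale.
sales = ColumnTable()
sales.add_column("region")
sales.add_column("revenue")
sales.append_row({"region": "EMEA", "revenue": 120.0})
sales.append_row({"region": "APAC", "revenue": 80.0})

revenue_before = sales.total("revenue")

# A new KPI column arrives later; the existing revenue query is unchanged.
sales.add_column("discount", default=0.0)
sales.append_row({"region": "EMEA", "revenue": 50.0, "discount": 5.0})

revenue_after = sales.total("revenue")
```

Because each column lives in its own list, the revenue query never touches the new `discount` column, which is the property that lets existing applications keep working as the data model changes. A real column store would add compression, typing and persistence on top of this idea.
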
Together, these components can work within a next-generation data warehouse design that supports standard reporting while also giving expert analysts the ability to access the raw “truth” of the original detail data. This makes it possible for the organization’s most creative thinkers to extend their thought processes “outside the box”, to challenge existing dogmas and provide alternative and hopefully more useful versions of “the truth”. In my subsequent posts, I will look more closely at the details of these various architectural components and elaborate on how each of them can work to contribute to a powerful, flexible, and efficient decision support infrastructure.
