Keeping the “Type A” Data Warehouse on a Leash

We have all encountered at least one of those “Type A” individuals in our working lives: overachieving and aggressive in every aspect of a project (scope, schedule and delivery), but quick to take on a new challenge, sometimes before finishing the previous one.

Glen Leslie

I’m sure I am not alone in noticing some parallels between this description of the typically “overcommitted” Type A personality and the all-encompassing promises of contemporary data warehousing.

Data warehouses rarely acquire “Type A” characteristics because anyone wants them to; they are simply tapped by so many different user communities that, under ROI- and market-driven delivery pressures, they gradually become overcommitted. The shape of a data warehousing project also tends to change each time the latest and greatest data warehousing technology is brought in.

Even under the best of circumstances (an on-time, properly scoped delivery of the first phase of a data warehouse project), the dreaded question inevitably arises:

“Mr. Data Warehouse Administrator, what if…?”

It is this question (indeed, it may literally be a demand for “what-if analysis”) that drives the data warehouse into “Type A personality” mode, if the stack of change requests on the administrator’s desk hasn’t already put it there. What is this “what if” factor?

K.I.S.S. and the Kitchen Sink

By way of background, allow me a moment of nostalgia. In the 1990s, I learned about a project management principle called K.I.S.S., short for “Keep It Simple, Stupid” (or some variation on that theme). I understood it to mean that you should manage your project scope to keep things simple: start small, then grow. Some ten years later, however, K.I.S.S. in data warehousing seems to have morphed into the “Kitchen Sink Syndrome”: when it comes to data, keep everything, including the kitchen sink! This is where the “what if” question comes in. What if we need this data… or that data… or data from 5, 7 or 10 years ago? Where are we going to keep all of it? All too often, “what if” ends up meaning “can you keep this data in the data warehouse just in case I might need it at some point in the future?”

Until now, the solution to this problem has simply been to add more storage, since, after all, storage is cheap. The problem with this approach is that it assumes the low prices of desktop (or “commodity”) storage carry over to large-scale corporate implementations. Unfortunately, for companies with large-scale data warehouses, storage is one of the areas where total cost of ownership (TCO) has actually been growing rapidly. Once the associated administrative, facility and utility costs are factored in, storage can easily cost upwards of $150,000 per terabyte per year.

Nearline Therapy

SAND/DNA Access, a software-based nearline storage repository, significantly reduces TCO while still giving users transparent access to their data, even though that data is highly compressed. It lets data administrators move massive volumes of data into “nearline” storage. This still involves disk storage, but it generally requires only 3% to 10% of the original footprint.
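To make these figures concrete, here is a back-of-the-envelope calculation combining the $150,000 per terabyte per year figure with the 3% to 10% footprint range. The 20 TB volume is hypothetical, and applying the same per-terabyte rate to the compressed footprint is a simplifying assumption that probably overstates the nearline cost, since nearline storage does not need an expensive disk subsystem.

```python
from fractions import Fraction

# The article's "upwards of $150,000 per terabyte per year" figure, used here
# as a flat illustrative rate (USD). Assumption: the same rate is applied to
# the nearline footprint, which likely overstates nearline cost.
COST_PER_TB_PER_YEAR = 150_000


def annual_cost(terabytes):
    """Fully loaded yearly cost of keeping `terabytes` on disk at the flat rate."""
    return terabytes * COST_PER_TB_PER_YEAR


def nearline_footprint(original_tb, percent):
    """Disk space remaining after compression to `percent` of the original footprint.

    Fraction keeps the arithmetic exact (no floating-point rounding).
    """
    return original_tb * Fraction(percent, 100)


if __name__ == "__main__":
    original_tb = 20  # hypothetical volume of historical data moved nearline
    print(f"Online: {original_tb} TB, ${annual_cost(original_tb):,} per year")
    for percent in (3, 10):  # the 3% to 10% range quoted in the article
        tb = nearline_footprint(original_tb, percent)
        print(f"Nearline at {percent}%: {float(tb)} TB, "
              f"${int(annual_cost(tb)):,} per year")
```

Under these assumptions, 20 TB kept online costs $3,000,000 per year, while its nearline footprint of 0.6 TB to 2.0 TB costs $90,000 to $300,000 per year.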

“Nearline” means that the secure repository keeps data accessible to the user’s SQL query tool of choice. Good performance is maintained by searching within the compressed data and then decompressing only the result set, instead of first decompressing all of the data and then searching it. Because SAND/DNA Access tends to be much more CPU-intensive than I/O-intensive, it can take advantage of advances in multi-core and 64-bit processor technologies without requiring an expensive disk subsystem. And because the data is highly compressed, the repository is much easier to manage with file system-based backup and recovery tools and lends itself readily to disaster recovery (DR) options.
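The idea of searching compressed data and decompressing only the result set can be illustrated with dictionary encoding, one common columnar compression technique. The article does not describe SAND’s actual storage format, so this is a generic sketch rather than SAND/DNA Access’s method; the function names and sample data are hypothetical. Because the dictionary is sorted, a predicate on the original values can be translated once into a range of small integer codes, the scan compares integers only, and only the matching rows are decoded.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Compress a column by replacing each value with its index in a sorted dictionary."""
    dictionary = sorted(set(values))
    position = {value: code for code, value in enumerate(dictionary)}
    codes = [position[value] for value in values]
    return dictionary, codes


def search_range(dictionary: List[str], codes: List[int],
                 low: str, high: str) -> List[Tuple[int, str]]:
    """Return (row, value) for rows whose value lies in [low, high].

    The predicate is translated once into a range of codes; because the
    dictionary is sorted, code order matches value order. The scan then
    compares integers only, and just the matching rows are decoded.
    """
    first = bisect_left(dictionary, low)    # smallest code whose value >= low
    stop = bisect_right(dictionary, high)   # one past the largest code whose value <= high
    return [(row, dictionary[code])         # decode matching rows only
            for row, code in enumerate(codes)
            if first <= code < stop]


if __name__ == "__main__":
    regions = ["EMEA", "APAC", "EMEA", "AMER", "APAC", "EMEA"]
    dictionary, codes = dictionary_encode(regions)
    print(dictionary)  # ['AMER', 'APAC', 'EMEA']
    print(codes)       # [2, 1, 2, 0, 1, 2]
    print(search_range(dictionary, codes, "AMER", "APAC"))
    # [(1, 'APAC'), (3, 'AMER'), (4, 'APAC')]
```

An equality search is just a range with `low == high`, and a value that never occurs maps to an empty code range, so no rows are decoded at all. A production engine would typically also pack the codes into fewer bits; that step is omitted here for brevity.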

SAND also offers packaged integration with a number of other software products (for example SAP, Oracle, IBM DB2 UDB and Microsoft SQL Server), allowing this functionality to be controlled transparently through other vendors’ user interfaces.

So if you know a Type A data warehouse, SAND/DNA Access might be just what the data warehouse doctor ordered to keep it manageable in the face of exponential data growth.


About SAND Technology

SAND is an international provider of intelligent information management software. The SAND/DNA product suite scales to help any size enterprise cope with exploding data requirements, now and into the future. SAND/DNA Access allows for retaining all potentially relevant data in a tiny footprint while providing instant access to just what's required. SAND/DNA Analytics allows for complex what-if analysis to meet any planned or unplanned business need.

SAND/DNA solutions include CRM analytics and specialized applications for government, healthcare, financial services, telecommunications, retail, transportation, and other business sectors. SAND/DNA has achieved "Certified for SAP NetWeaver" status and SAND Nearline Integration Controller has achieved "Powered by SAP NetWeaver" status.

SAND Technology has offices in the United States, Canada, the United Kingdom and Central Europe.