<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SAND News &#187; SAND Blogs</title>
	<atom:link href="http://www.sandmtl.com/news/category/sand-news/sand-blogs/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sandmtl.com/news</link>
	<description></description>
	<lastBuildDate>Thu, 04 Mar 2010 04:27:16 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Evolving the &#8220;Humpty Dumpty Warehouse&#8221; Into a &#8220;Phoenix&#8221;</title>
		<link>http://www.sandmtl.com/news/evolving-the-humpty-dumpty-warehouse-into-a-phoenix/</link>
		<comments>http://www.sandmtl.com/news/evolving-the-humpty-dumpty-warehouse-into-a-phoenix/#comments</comments>
		<pubDate>Mon, 26 Oct 2009 15:43:51 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Arthur's Blog]]></category>
		<category><![CDATA[SAND Labs]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=223</guid>
		<description><![CDATA[In my last blog post, I responded to Wayne Eckerson's post on his Wayne’s World Blog for TDWI, which revisited the dilemma of the “Humpty Dumpty Warehouse”. I suggested that the &#8220;Phoenix&#8221; might be a better model for modern enterprise data warehouses. Wayne continued the discussion in comments:

Arthur, you are right to suggest that the BI team needs [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://www.sandmtl.com/news/regarding-the-humpty-dumpty-data-warehouse-dilemma/">last blog post</a>, I responded to Wayne Eckerson's post on his <a href="http://portals.tdwi.org/Blogs/WayneEckerson/2009/09/Humpty-Dumpty.aspx">Wayne’s World Blog</a> for TDWI, which revisited the dilemma of the “Humpty Dumpty Warehouse”. I suggested that the &#8220;Phoenix&#8221; might be a better model for modern enterprise data warehouses. Wayne continued the discussion in comments:</p>

<blockquote>Arthur, you are right to suggest that the BI team needs to adapt to changes Phoenix-like rather than pick up the pieces every time the organization changes. I guess the Humpty Dumpty metaphor is not the best&#8211;albeit a lot of fun&#8211;unless the king&#8217;s men are using superglue to get Humpty back together again. Certainly, I&#8217;m a big advocate of adaptable DW and BI architectures. That&#8217;s a given I should have noted!</blockquote>

<p>Rather than superglue (though that does sound like fun!), in my last post I pointed to several key breakthroughs in information technology that have matured to the point where a viable, flexible, Phoenix-like EDW can be created without a “rip and replace” strategy that would discard what the organization has already accomplished.</p>

<p>Because it maximizes existing investments in both products and people, this is a much more secure and cost-effective route than trading a well-understood set of problems for a replacement technology that may well solve some of them, but will inevitably introduce a variety of altogether new ones. These breakthroughs include the following:</p>

<ul>
<li>Enhanced database federation capabilities from all the major RDBMS providers, as well as from many Business Intelligence tool vendors such as Business Objects.</li>
<li>Very high-performance, storage-efficient and massively scalable software-based Nearline 2.0 storage systems that can house the entirety of an organization&#8217;s structured detail data, federated with the primary RDBMS.</li>
<li>Very high-performance Column-Based Analytic Technology (CBAT) systems to support analytics for power users.</li>
<li>Very inexpensive and powerful desktop computers with adequate storage.</li>
<li>Relatively inexpensive blade servers.</li>
<li>Very high-performance, efficient, automated ETL tools that the organization can use to set up and control the flow of data over time (including Disaster Recovery support using the Nearline 2.0 storage architecture).</li>
</ul>

<p>It is now possible to integrate all of these subsystems into a single EDW architecture, resulting in a Scalable Corporate Information Factory (SCIF, to adapt Bill Inmon&#8217;s terminology). With this model in place, key BI analysts can focus less on transforming data to enable adequate performance and to achieve a single version of &#8220;the truth&#8221; (in reality, an unattainable goal), and more on helping users derive real business value from corporate information by maximizing access to &#8220;the facts&#8221; for the users who can provide essential business insights.</p>

<p>In subsequent posts I will explore this architecture in more detail.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/evolving-the-humpty-dumpty-warehouse-into-a-phoenix/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Regarding the Humpty Dumpty Data Warehouse Dilemma</title>
		<link>http://www.sandmtl.com/news/regarding-the-humpty-dumpty-data-warehouse-dilemma/</link>
		<comments>http://www.sandmtl.com/news/regarding-the-humpty-dumpty-data-warehouse-dilemma/#comments</comments>
		<pubDate>Sat, 26 Sep 2009 14:25:58 +0000</pubDate>
		<dc:creator>Arthur</dc:creator>
				<category><![CDATA[Arthur's Blog]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=195</guid>
		<description><![CDATA[Wayne Eckerson, on his Wayne&#8217;s World Blog for TDWI, revisits the dilemma of the &#8220;Humpty Dumpty Warehouse&#8221;:

Most organizations are like Humpty Dumpty teetering and tottering on top of a big wall. With the slightest gust of wind, Humpty crashes and breaks into dozens of pieces. And DW teams are “all the king’s horses and all [...]]]></description>
			<content:encoded><![CDATA[<p>Wayne Eckerson, on his <a href="http://portals.tdwi.org/Blogs/WayneEckerson/2009/09/Humpty-Dumpty.aspx">Wayne&#8217;s World Blog</a> for TDWI, revisits the dilemma of the &#8220;Humpty Dumpty Warehouse&#8221;:</p>

<blockquote>Most organizations are like Humpty Dumpty teetering and tottering on top of a big wall. With the slightest gust of wind, Humpty crashes and breaks into dozens of pieces. And DW teams are “all the king’s horses and all the king’s men” who are charged with putting Humpty Dumpty back together again.</blockquote>

<p>Whether we’re talking about “Humpty Dumpty” in terms of the enterprise as a whole or the data within a given warehouse, I agree: DW teams are doing the best they can with what they have. But so, often, are the CEOs, who face battles in the boardroom and battles among the shareholders, the company’s bankers, the board of directors, the various C-level executives within the organization and some of their powerful subordinates. That is before we even consider the vacuums created when key executives leave, or the changes to the nature of the business brought about by mergers, acquisitions, divestitures and the like.</p>

<p>Unfortunately, most current Data Warehouses are built in such a way that this Humpty Dumpty dilemma will repeat itself over and over again. The real dirty little secret is that the same tricks used to make DWs efficient for reporting (aggregation, indexing, and the subsequent discarding of underlying details) are the ones that make them difficult, and expensive, to change and update.</p>

<p>So what’s the answer? First, we must realize that there is no such thing as a “single version of the truth”, merely a convenient and workable one. Next, we must break out of the “Humpty Dumpty” dilemma and its tragic ending and find a better story, and a better model: the “Phoenix”, which can “rise from the ashes” overnight to meet all the new KPIs the business needs.</p>

<p>The good news is that technological developments in <a href="http://www.sandmtl.com/news/tag/nearline-20/">Nearline 2.0</a>, RDBMS federation
capabilities and high-performance ETL tools offer a way for companies to transition from “Humpty Dumpty” to the new “Phoenix”-like approach &#8212; without resorting to a “rip and replace” strategy. </p>

<p>My next post will explore these ideas further.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/regarding-the-humpty-dumpty-data-warehouse-dilemma/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building Corporate Memory Into a Next-Generation Data Warehouse</title>
		<link>http://www.sandmtl.com/news/building-corporate-memory-into-a-next-generation-data-warehouse/</link>
		<comments>http://www.sandmtl.com/news/building-corporate-memory-into-a-next-generation-data-warehouse/#comments</comments>
		<pubDate>Fri, 22 May 2009 13:00:46 +0000</pubDate>
		<dc:creator>Arthur</dc:creator>
				<category><![CDATA[Arthur's Blog]]></category>
		<category><![CDATA[Arthur's Blog Next Generation DW]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=169</guid>
		<description><![CDATA[It is now possible to design and implement a corporate memory within the data warehouse using a number of mature, tested and well-understood products and methodologies that can be deployed relatively quickly and administered with minimal DBA overhead. These solutions can grow with relatively linear scalability in terms of both cost and performance, while providing [...]]]></description>
			<content:encoded><![CDATA[<p>It is now possible to design and implement a corporate memory within the data warehouse using a number of mature, tested and well-understood products and methodologies that can be deployed relatively quickly and administered with minimal DBA overhead. These solutions can grow with relatively linear scalability in terms of both cost and performance, while providing powerful support for both power analysts and reporting users. The ingredients for a successful data warehouse implementation that makes use of the corporate memory concept involve hardware, software and architectural design components, as listed below:</p>

<p><span id="more-169"></span>
<div class="right"><img src="http://www.sandmtl.com/news/images/portraits/ritchie_arthur.png" alt="Arthur Ritchie" /></div>
<ul>
    <li>Hardware:
<ul>
    <li>Cheap but powerful SMP hardware</li>
    <li>Reasonably priced, fast and efficient SAN devices</li>
    <li>Reasonably priced, fast and effective networks and switches for linking multiple SMP boxes together.</li>
</ul>
</li>
    <li>Architecture:
<ul>
    <li>Nearline data storage (not archiving) to hold the detail data used to build and maintain aggregates and cubes. There is a very clear differentiation between the Nearline 2.0 and archiving approaches to data and performance management.</li>
    <li>Database federation techniques that reduce the amount of data held in key RDBMS “hot spots” and minimize batch window requirements.</li>
    <li>Well-defined uses for aggregation, indexing, and MOLAP cubes to support reporting.</li>
</ul>
</li>
    <li>Software:
<ul>
    <li>Column-based data management technologies for advanced analytics, offering:
<ul>
    <li>The ability to change data models &#8220;on the fly&#8221; to meet emerging requirements</li>
    <li>Support for very wide tables (tens of thousands of columns), enabling very large numbers of KPIs</li>
    <li>The ability to add new data types &#8220;on the fly&#8221;</li>
    <li>The ability to allow existing applications to continue to work as data models change over time</li>
    <li>The ability to present all available data in a simple, easily usable format, eliminating the need for analysts to navigate complex data relationships and slowly changing dimension constructs.</li>
</ul>
</li>
    <li>Better exploitation of new hardware and architectural techniques by traditional RDBMS systems</li>
    <li>The ability to support massively parallel operations, allowing users to iteratively run very complex queries to find patterns in the data without having their thought process interrupted, without impacting other power users and other reporting users, and without interfering with the ability to meet critical SLAs.</li>
</ul>
</li>
</ul>
Together, these components can work within a next-generation data warehouse design that supports standard reporting while also giving expert analysts the ability to access the raw &#8220;truth&#8221; of the original detail data. This makes it possible for the organization&#8217;s most creative thinkers to extend their thought processes &#8220;outside the box&#8221;, to challenge existing dogmas and provide alternative and hopefully more useful versions of &#8220;the truth&#8221;. In my subsequent posts, I will look more closely at the details of these various architectural components and elaborate on how each of them can work to contribute to a powerful, flexible, and efficient decision support infrastructure.</p>
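<p>To give a rough sense of how database federation lets reports see hot and nearline data as one, here is a minimal sketch in Python. It assumes SQLite (from Python's standard library) as a stand-in for both the primary RDBMS and the Nearline 2.0 store; the table names, columns and sample rows are hypothetical, and a real deployment would rely on the RDBMS vendor's own federation features rather than this toy setup.</p>

```python
import sqlite3


def build_federated_view(conn: sqlite3.Connection) -> None:
    """Create a hypothetical 'hot' table, a 'nearline' table and one view over both."""
    # A second in-memory database stands in for the nearline detail store.
    conn.execute("ATTACH DATABASE ':memory:' AS nearline")
    # Recent, frequently queried rows stay in the primary ("hot") database.
    conn.execute("CREATE TABLE main.sales_hot (sale_date TEXT, region TEXT, amount REAL)")
    # Older detail rows live in the nearline store.
    conn.execute("CREATE TABLE nearline.sales_detail (sale_date TEXT, region TEXT, amount REAL)")
    # SQLite only allows a view to span attached databases if it is a TEMP view.
    conn.execute(
        "CREATE TEMP VIEW sales_all AS "
        "SELECT sale_date, region, amount FROM main.sales_hot "
        "UNION ALL "
        "SELECT sale_date, region, amount FROM nearline.sales_detail"
    )


def load_sample(conn: sqlite3.Connection) -> None:
    """Insert a few made-up rows into each store."""
    conn.executemany(
        "INSERT INTO main.sales_hot VALUES (?, ?, ?)",
        [("2009-10-01", "East", 120.0), ("2009-10-02", "West", 80.0)],
    )
    conn.executemany(
        "INSERT INTO nearline.sales_detail VALUES (?, ?, ?)",
        [("2007-03-15", "East", 50.0), ("2008-06-30", "West", 30.0), ("2008-07-01", "East", 25.0)],
    )


def totals_by_region(conn: sqlite3.Connection) -> dict[str, float]:
    """A report query that sees hot and nearline rows through the single view."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales_all GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_federated_view(conn)
    load_sample(conn)
    print(totals_by_region(conn))  # {'East': 195.0, 'West': 110.0}
```

<p>Because reports query only the <code>sales_all</code> view, older rows can be moved out of the hot table into the nearline store without changing any report. That is the sense in which federation reduces the data held in RDBMS “hot spots”.</p>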
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/building-corporate-memory-into-a-next-generation-data-warehouse/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data &#8220;Dumping Grounds&#8221; and the Importance of Corporate Memory</title>
		<link>http://www.sandmtl.com/news/data-dumping-grounds-and-the-importance-of-corporate-memory/</link>
		<comments>http://www.sandmtl.com/news/data-dumping-grounds-and-the-importance-of-corporate-memory/#comments</comments>
		<pubDate>Mon, 04 May 2009 15:34:30 +0000</pubDate>
		<dc:creator>Arthur</dc:creator>
				<category><![CDATA[Arthur's Blog]]></category>
		<category><![CDATA[Arthur's Blog Next Generation DW]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=168</guid>
		<description><![CDATA[Received wisdom about data warehousing instructs us not to create a &#8220;dumping ground&#8221; for our raw detail data. But why not? This principle is a legacy from the not-so-distant past when it was impractical to keep huge amounts of data around if it was not being actively used – so once aggregates had been built, [...]]]></description>
			<content:encoded><![CDATA[<p>Received wisdom about data warehousing instructs us not to create a &#8220;dumping ground&#8221; for our raw detail data. But why not? This principle is a legacy from the not-so-distant past when it was impractical to keep huge amounts of data around if it was not being actively used – so once aggregates had been built, the original details were simply discarded. Of course, this meant that the organization was then confined to working with a particular &#8220;version of the truth&#8221; that someone had imposed on the data; there was no way to revisit the original details should the need arise for a change of perspective. </p>

<p><span id="more-168"></span></p>

<p><div class="right"><img src='http://www.sandmtl.com/news/images/portraits/ritchie_arthur.png' alt='Arthur Ritchie' /></div>Now, however, <a href="http://www.sandmtl.com/news/tag/nearline-20/">Nearline 2.0</a> technology allows us to store massive quantities of data in an easily accessible format without undue administration requirements, so such an approach is economically viable not only in terms of storage and retrieval, but also from a human resources perspective.</p>

<p>Creating a &#8220;corporate memory&#8221; of this sort brings a number of major benefits:</p>

<ul><li>If it turns out that any or all of the original data is required for a new project (involving new aggregations, &#8220;cubes&#8221; or KPIs, for example), it is substantially less expensive and much easier to access it.</li>
<li>If it becomes necessary to assess the value of the corporate data, this can be done carefully over a period of time using data mining techniques, rather than in a rush to meet specific project deadlines.</li>
<li>The original detail data will only need to be handled once, and can then be retained as a permanent record for use in auditing the data warehouse if this becomes necessary (to prove compliance, to aid in conflict resolution, or simply to rectify &#8220;human errors” that were made at some point in the process).</li>
<li>It becomes possible to document in the metadata layer who accessed the information, when and for what reason – incidentally, this can also provide valuable insight into the abilities of those who are responsible for manipulating the raw data.</li></ul>
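<p>The first of these benefits can be illustrated with a small Python sketch. The records and field names are hypothetical; the point is simply that once the detail has been discarded, an aggregate by region can never yield totals by product, whereas retained detail supports both with one more pass.</p>

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Sale(NamedTuple):
    """One hypothetical detail record kept in the corporate memory."""
    region: str
    product: str
    amount: float


def aggregate(rows: Iterable[Sale], key: str) -> dict[str, float]:
    """Sum amounts grouped by the named field of each detail row."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[getattr(row, key)] += row.amount
    return dict(totals)


detail = [Sale("East", "A", 100.0), Sale("East", "B", 40.0), Sale("West", "A", 60.0)]

# The original project only needed totals by region...
by_region = aggregate(detail, "region")
# ...but because the detail was retained, a new perspective costs one more pass.
by_product = aggregate(detail, "product")

print(by_region)   # {'East': 140.0, 'West': 60.0}
print(by_product)  # {'A': 160.0, 'B': 40.0}
```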

<p>In my next post, I&#8217;ll cover the ingredients for building Corporate Memory into a Next-Generation Data Warehouse.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/data-dumping-grounds-and-the-importance-of-corporate-memory/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Decision Support for Users Who Don&#8217;t Know What They Don&#8217;t Know</title>
		<link>http://www.sandmtl.com/news/decision-support-for-users-who-dont-know-what-they-dont-know/</link>
		<comments>http://www.sandmtl.com/news/decision-support-for-users-who-dont-know-what-they-dont-know/#comments</comments>
		<pubDate>Fri, 24 Apr 2009 13:10:46 +0000</pubDate>
		<dc:creator>Arthur</dc:creator>
				<category><![CDATA[Arthur's Blog]]></category>
		<category><![CDATA[Arthur's Blog Next Generation DW]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=167</guid>
		<description><![CDATA[Since the beginning of the computer era, system designers have struggled to reconcile conflicting aims of performance vs. functionality and maintainability vs. adaptability. In the case of Business Intelligence, there has been no less of a need for tradeoffs in order to deliver workable systems. However, BI system design has also typically been constrained even [...]]]></description>
			<content:encoded><![CDATA[<p>Since the beginning of the computer era, system designers have struggled to reconcile conflicting aims of performance vs. functionality and maintainability vs. adaptability. In the case of Business Intelligence, there has been no less of a need for tradeoffs in order to deliver workable systems. However, BI system design has also typically been constrained even further by four fundamental realities:</p>

<p><span id="more-167"></span></p>

<div class="right"><img src='http://www.sandmtl.com/news/images/portraits/ritchie_arthur.png' alt='Arthur Ritchie' /></div>
<ul>
<li>Users don&#8217;t know what they don&#8217;t know &#8211; they will always have difficulty articulating what they need until they actually start working with the data.</li>

<li>We in IT tend to want to implement relatively restrictive systems with well-defined acceptance criteria, in order to get buy-in from users. This protects us from backlash – and we know that users with looming deadlines will inevitably settle for less than they really require.</li>

<li>Our IT training and experience has taught us that any system able to accommodate all the information that might be needed would be beyond our means – and even if it were affordable, it would be unlikely to perform adequately. As a result, we have gone to great lengths to determine the minimum amount of information required, then extracted that information and transformed it into the records that we retain.

<p>However, it must be admitted that we almost never accurately judge what is “the minimum amount of information required” – and this typically leads to a series of costly attempts later on to find the missing data and retrofit it into our carefully designed data models.</p></li>

<li>Generally, corporate executives have given up on the possibility of understanding the complexities of the computer world, and so have relinquished control of computer systems to the IT department. This lack of overarching leadership often results in a conflict situation, with the IT department facing various frustrated end-user groups who “don’t know what they don’t know”.</li>
</ul>

<p>If IT departments are to make any progress in implementing Decision Support systems that really work, things need to change. We urgently need to approach the task of designing data warehouses in a spirit of humility and mutual co-operation. We must also acknowledge that some aspects of the resulting system may miss the mark &#8211; so we will need to build into the system a means of detecting and correcting problems, accepting user feedback, and efficiently incorporating the necessary adaptations.</p>

<p>For a data warehouse environment to be of real value, it needs to be able to operate on the basis that users don’t know what they don’t know, by supporting the twists and turns of &#8220;incremental thinking&#8221; whereby they only gradually come to an understanding of their requirements. If a system designer comes right out and asks users what they want to know, the designer will typically be met with uncertainty. At this point, individuals anxious to promote their own way of thinking may step in with the assurance that they know what is really needed, while business experts who need to anticipate the effects of any number of potential scenarios, from hurricanes to &#8220;credit crunches&#8221;, often find themselves unable to make their voices heard. Locking in to a rigid, inflexible design on the basis of such inadequate input practically guarantees that the system will fail to deliver value over the long term.</p>

<p>In a Business Intelligence environment, incremental thinking usually involves &#8220;feeling one&#8217;s way&#8221; towards a goal by asking successive questions. Frequently, the approach will change – in terms of query strategies and the data sets we want to analyze – as insight is gained into the problem. Analysts work much as a detective does at the scene of a crime, picking up and examining various clues but not being able to link them together until the necessary understanding is developed. There are many spectacular examples of the enormous business value that can be derived from being able to ask any question that may arise in an analyst&#8217;s train of thought, particularly in highly dynamic activities like fuel hedging and currency trading. Success in these areas requires that plenty of historical detail be available for unfettered and recursive access – allowing for the types of question that often provide the biggest business benefits.</p>

<p>However, it is obviously not necessary or even feasible to scan all the details every time a question needs to be answered. For this reason, a combination of traditional reporting and analysis architectures with a <a href="http://www.sandmtl.com/news/tag/nearline-20/">Nearline 2.0</a> repository (used to store massive amounts of detail data) can be the ideal solution. This architecture takes advantage of breakthroughs in database federation techniques and cheap, scalable SMP processors to create a massively parallel architecture that can easily scale to meet future needs, with little or no chance of technological obsolescence. In future posts I will look more closely at how such an architecture can be designed and implemented to enable high-performance, flexible decision support for business &#8220;at the speed of thought&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/decision-support-for-users-who-dont-know-what-they-dont-know/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Redefining the Role of IT in Business Intelligence</title>
		<link>http://www.sandmtl.com/news/redefining-the-role-of-it-in-business-intelligence/</link>
		<comments>http://www.sandmtl.com/news/redefining-the-role-of-it-in-business-intelligence/#comments</comments>
		<pubDate>Mon, 06 Apr 2009 16:00:42 +0000</pubDate>
		<dc:creator>Arthur</dc:creator>
				<category><![CDATA[Arthur's Blog]]></category>
		<category><![CDATA[Arthur's Blog Next Generation DW]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=165</guid>
		<description><![CDATA[If our businesses are going to survive, we need to stop designing Business Intelligence systems that tell us what we want to hear, and which work well in good times but behave incomprehensibly during periods of significant inconsistency. Instead, we need to build systems that empower our best analysts to help correct flaws in our [...]]]></description>
			<content:encoded><![CDATA[<p>If our businesses are going to survive, we need to stop designing Business Intelligence systems that tell us what we want to hear, and which work well in good times but behave incomprehensibly during periods of significant inconsistency. Instead, we need to build systems that empower our best analysts to help correct flaws in our activities and identify opportunities that we can exploit. We need to be in a position where existing paradigms can be challenged and replaced by new ones on an ongoing basis. However, just as new scientific theories need to fit with observed reality, these new business approaches must be well supported by the facts as recorded in a company’s information repository.</p>

<p><span id="more-165"></span></p>

<p><div class="right"><img src='http://www.sandmtl.com/news/images/portraits/ritchie_arthur.png' alt='Arthur Ritchie' /></div>The recent crisis in financial systems, resulting in major corporations going cap-in-hand to governments for handouts, dramatically illustrates why we need to move beyond reliance on reporting systems to adoption of a challenge-based system. With a challenge-based system, key analysts and &#8220;out-of-the-box&#8221; thinkers with sufficient analytic capability and insights are given unconstrained access to the data warehouse so that they can identify problems and recommend corrective action based on the facts. In this way, the people currently controlling the “spin” on corporate information can be educated (or bypassed if necessary), and executives can be better informed and made more responsive to constantly changing business needs and opportunities.</p>

<p>The IT department cannot continue to dictate how IT-held data will be used: users have to be empowered to make these decisions themselves within their area of primary expertise. The focus of IT should instead be on making data available to end users, ensuring that it is not corrupted and that it can be audited when required, and on infrastructural issues like security, disaster recovery and so on. The job of the IT department should thus go far beyond merely distributing preconfigured reports, to encompass the more essential task of empowering users to get what they really need, when and how they need it, and to make sure that adequate resources are in place to get the job done. In short, IT departments need to work in a similar way to utility companies, providing users with infrastructure options but not with final solutions – and as with a power company, for example, fair and appropriate charge-out policies and procedures need to be put in place, so that IT can be transformed from a cost centre into a profit centre.</p>

<p>So, the primary role of IT should be to provide unfettered access to data, to make sure that the data will not be lost for any reason (even natural disasters), and to meet various levels of demand. While security has traditionally been another key issue for IT, even this area might be better handled by a specialized department, with IT providing assistance as needed. In any event, IT should not be able to prevent anyone with the appropriate security levels (which in some cases may exceed those of the head of IT) from getting to the data they need. Often, invocations of security or the impact on performance for “other system users” are simply a ruse to cover up the fact that the data is not readily available or that the system is not designed to cope with the demand.</p>

<p>We need to understand that a functional Business Intelligence (BI) system is governed by the same rules that govern any other large architectural undertaking like an office building or an elevated highway. It must meet the diversified needs of the community it serves and those of the people whose lives it affects. It also has to fit within the constraints already in place in the environment in which it is to be established – since very few of us get to work with a &#8220;clean slate&#8221;, and the legal and administrative hurdles can often be formidable.</p>

<p>Successful implementation of BI systems thus needs to be considered as part art and part science. Pure science has not been able to solve all of society’s problems, and those who try to adopt a purely &#8220;scientific&#8221; approach to BI will not be successful. Fundamentally, the field of technology is not just about pursuing technical innovation, but about helping human beings become more effective and productive in pursuing their goals. In my following posts, I will be looking at how IT departments can go about designing systems that allow for just this sort of flexible, on-demand provision of information services to analysts, with the ultimate aim of enabling truly effective decision support for maximum responsiveness in today&#8217;s unpredictable business environment.</p>

<p>Arthur Ritchie</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/redefining-the-role-of-it-in-business-intelligence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Business Intelligence: An Oxymoron?</title>
		<link>http://www.sandmtl.com/news/business-intelligence-an-oxymoron/</link>
		<comments>http://www.sandmtl.com/news/business-intelligence-an-oxymoron/#comments</comments>
		<pubDate>Tue, 24 Mar 2009 20:41:59 +0000</pubDate>
		<dc:creator>Arthur</dc:creator>
				<category><![CDATA[Arthur's Blog]]></category>
		<category><![CDATA[Arthur's Blog Next Generation DW]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=164</guid>
		<description><![CDATA[An old joke has it that the term &#8220;military intelligence&#8221; is an oxymoron – and in light of the current global financial crisis, it is tempting to put &#8220;Business Intelligence&#8221; in this category as well. Our inability to predict or deal competently with major events, from wars in the Middle East to the meltdown of [...]]]></description>
			<content:encoded><![CDATA[<p>An old joke has it that the term &#8220;military intelligence&#8221; is an oxymoron – and in light of the current global financial crisis, it is tempting to put &#8220;Business Intelligence&#8221; in this category as well. Our inability to predict or deal competently with major events, from wars in the Middle East to the meltdown of global financial systems, shows just how ineffective our Business Intelligence/Data Warehousing strategies or “fit-for-purpose” reporting systems can be in responding to events as they unfold in this complex world. We are now confounded by the facts: we cannot predict the future; the largest military powers cannot conquer and control much weaker opponents; economists cannot adequately monitor essential financial systems. Automated trading systems, whose rules we once thought we understood and controlled, seem to have taken on a life of their own. </p>

<p><span id="more-164"></span></p>

<p><div class="right"><img src='http://www.sandmtl.com/news/images/portraits/ritchie_arthur.png' alt='Arthur Ritchie' /></div>In my view, unless talented analysts are given unfettered access to whatever corporate data they require and the ability to analyze it as they see fit in the context of the many external data sources that are available, we will continue to find ourselves unprepared to deal with the unexpected. Implemented correctly, corporate Business Intelligence (BI) systems can support an organization&#8217;s best analysts as they challenge traditional business dogmas and develop a practicable way forward based on the facts, as recorded in detailed corporate data. To achieve this, IT departments need to stop acting as “data jailors” who strictly control which data will be accessible, in what form, and start empowering creative thinkers to realize their maximum potential, be it in marketing, manufacturing, distribution or some other field. In order to do this, however, IT departments need to start acting more like a power utility service: enabling &#8220;decision support&#8221; (to revive an older term for Business Intelligence) by providing corporate information or raw data as required, in the right amounts at the right time, while also serving as &#8220;consultants&#8221; who help end users access the data they require, when and how they need it.</p>

<p>In this series of blog posts, I will discuss how an effective information infrastructure for decision support can be implemented without resorting to a &#8220;rip and replace&#8221; strategy that would involve completely scrapping the existing data warehouse. As the foundation of an optimal system for enterprise data management and decision support, I will propose the concept of the data warehouse as a tiered architecture that combines three database models/technologies to support standard reporting and &#8220;power analytics&#8221;, along with highly accessible storage of the massive amounts of granular data used by the reporting and analytics engines.</p>

<p>Arthur Ritchie</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/business-intelligence-an-oxymoron/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using XAM with Nearline 2.0 to Ensure Data Compliance</title>
		<link>http://www.sandmtl.com/news/using-xam-with-nearline-20-to-ensure-data-compliance/</link>
		<comments>http://www.sandmtl.com/news/using-xam-with-nearline-20-to-ensure-data-compliance/#comments</comments>
		<pubDate>Wed, 14 Jan 2009 16:12:18 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Richard's Blog]]></category>
		<category><![CDATA[Nearline 2.0]]></category>
		<category><![CDATA[xam]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=156</guid>
		<description><![CDATA[Recently, SAND has been conducting tests on our SAND/DNA Access product to benchmark its support for the XAM (eXtensible Access Method) API. These tests, executed at EMC&#8217;s lab in Hopkinton, Mass. using the latest version of EMC&#8217;s Centera solution, were a great success in a number of respects, and the SAND/DNA Access Nearline 2.0 software [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, SAND has been conducting tests on our <a href="http://www.sand.com/dna/access/index.html">SAND/DNA Access</a> product to benchmark its support for the XAM (eXtensible Access Method) API. These tests, executed at EMC&#8217;s lab in Hopkinton, Mass. using the latest version of EMC&#8217;s <a href="http://www.sandmtl.com/news/sand-technology-integrates-sanddna-with-emc%c2%ae-centera%e2%84%a2-content-addressed-storage-cas/">Centera solution</a>, were a great success in a number of respects, and the SAND/DNA Access <a href="http://www.sandmtl.com/news/tag/nearline-20/">Nearline 2.0</a> software component is now the first commercial product to obtain XAM certification from EMC. In today&#8217;s blog post, I will describe the XAM interface, explain our motivation for implementing it, and provide some details about the benchmark tests.</p>

<p><span id="more-156"></span></p>

<p><div class="right"><img src="http://www.sandmtl.com/news/images/portraits/grondin_richard.png" alt="Richard Grondin" /></div>The Wikipedia entry for XAM describes it as &#8220;a storage standard developed and maintained by the Storage Networking Industry Association (SNIA). It is in the process of being ratified as an ANSI standard. <a href="http://en.wikipedia.org/wiki/Xam">XAM</a> is an API for fixed content aware storage devices. XAM replaces the various proprietary interfaces that have been used for this purpose in the past. Content generating applications now have a standard means of saving and finding their content across a broad array of storage devices.”</p>

<p>For a data framework solution like Nearline 2.0, XAM provides the ability to store large amounts of structured data (for example, call detail records, application logs, web logs, syslogs, low-level manufacturing data, data exported from an enterprise data warehouse such as SAP BW, and so on) securely and efficiently to ensure compliance with data governance requirements. SAND/DNA Access can use XAM storage providers like EMC Centera as WORM (Write Once Read Many) devices, ensuring that stored data cannot be modified. Furthermore, the Centera XAM compliance solution offers robust integrated business continuity and disaster recovery protection for structured data in SAND/DNA Access. In turn, SAND/DNA Access maintains massive volumes of structured data as content for the XAM storage provider, enabling a complete enterprise solution for data governance and compliance. SAND/DNA Access also provides high-performance querying of the structured data held by the XAM storage provider, making it possible to conduct true electronic discovery (e-discovery) on this data. All these considerations contributed to SAND&#8217;s decision to implement support for the new XAM API.</p>
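
<p>To illustrate what WORM behavior means for stored records, here is a minimal Python sketch of a write-once, content-addressed store. It is purely hypothetical: the WormStore class and its write, read and delete methods are invented for illustration and do not reflect the actual XAM API or the Centera SDK. Each object is addressed by the SHA-256 digest of its content, so changing a record can only produce a new object, and deletion is refused.</p>

```python
import hashlib


class WormStore:
    """Hypothetical in-memory WORM store; illustrative only, not the XAM API."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def write(self, content: bytes) -> str:
        """Store content and return its address (the SHA-256 hex digest)."""
        address = hashlib.sha256(content).hexdigest()
        # Re-writing identical content maps to the same address and changes nothing.
        self._objects.setdefault(address, content)
        return address

    def read(self, address: str) -> bytes:
        content = self._objects[address]
        # Verify integrity on read: content must still match its address.
        if hashlib.sha256(content).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return content

    def delete(self, address: str) -> None:
        """WORM semantics: committed objects cannot be removed or changed."""
        raise PermissionError(f"{address[:12]}... is write-once and cannot be deleted")


store = WormStore()
original = store.write(b"rec-0001,2008-10-01,120s")
# "Editing" a record can only create a new object; the original stays intact.
edited = store.write(b"rec-0001,2008-10-01,999s")
print(original != edited)       # True
print(store.read(original))     # b'rec-0001,2008-10-01,120s'
try:
    store.delete(original)
except PermissionError as err:
    print("rejected:", err)
```

<p>Content addressing is used here because it makes tampering self-evident: any modified bytes hash to a different address, so the original record remains retrievable exactly as written.</p>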

<p>We began development of XAM API support at the beginning of October 2008, and within a month were ready to execute our benchmark tests at the EMC Hopkinton Lab. These tests, which were fully supported by the EMC Centera team, allowed us to validate our XAM API implementation and to evaluate the performance of the data migration process (moving data from SAND/DNA Access to the Centera server) as well as query speeds. In both cases, observed performance was very impressive: using limited processing power (4 blade servers), we were able to achieve data migration speeds higher than 400 MB per second, and query speeds of close to 11 million records scanned per second! This shows that the new XAM functionality of SAND/DNA Access is more than up to the task of enabling existing Centera or XAM storage provider customers to extend their infrastructure to incorporate data compliance and governance regimes for massive amounts of structured data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/using-xam-with-nearline-20-to-ensure-data-compliance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nearline 2.0 and the Data Management Framework</title>
		<link>http://www.sandmtl.com/news/nearline-20-and-the-data-management-framework/</link>
		<comments>http://www.sandmtl.com/news/nearline-20-and-the-data-management-framework/#comments</comments>
		<pubDate>Mon, 10 Nov 2008 14:35:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Richard's Blog]]></category>
		<category><![CDATA[Nearline 2.0]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=149</guid>
		<description><![CDATA[In my last post, I outlined some of the advanced data modeling options that have been made possible by the advent of Nearline 2.0. Today, I want to discuss how Nearline 2.0 can act as an essential component in a data management framework. The data management framework, which can be viewed as an extension of [...]]]></description>
			<content:encoded><![CDATA[<p>In my last post, I outlined some of the <a href="http://www.sandmtl.com/news/nearline-20-and-advanced-data-modeling/">advanced data modeling options</a> that have been made possible by the advent of Nearline 2.0. Today, I want to discuss how Nearline 2.0 can act as an essential component in a data management framework. The data management framework, which can be viewed as an extension of the data model concept to the level of enterprise data architecture, governs the processing and management of enterprise data throughout its &#8220;lifecycle&#8221;, from creation to disposal. It includes all operational components, and covers key issues such as data backup, disaster recovery, data retention, data access security and so on. </p>

<p><span id="more-149"></span></p>

<p>Let&#8217;s start at the beginning of the process. In a traditional data warehouse implementation, data is extracted from operational systems, then transformed and loaded into the data warehouse. Figure 1 shows this process, usually called ETL (Extract, Transform and Load) or sometimes ELT (Extract, Load and Transform), in the context of the Corporate Information Factory.</p>

<p><img src="http://www.sandmtl.com/news/images/blog/nearline_20_framework_01.jpg" alt="Traditional &quot;Nearline 1.0&quot; Framework" /></p>

<p>In previous posts, I presented Nearline 2.0 as a &#8220;database extension&#8221; that can reduce TCO, guarantee that SLAs are met, and improve the performance of the online database that underpins the data warehouse solution. This configuration is based on the traditional concept of Information Lifecycle Management (ILM), normally implemented as a tactical effort to protect the enterprise against problems associated with the &#8220;data tsunami&#8221;. Figure 2 illustrates the incorporation of Nearline 2.0 into the data management framework as a database extension used to store (and maintain access to) older or less frequently used detail data.</p>

<p><img src="http://www.sandmtl.com/news/images/blog/nearline_20_framework_02.jpg" alt="Modern Nearline 2.0 Framework" /></p>

<p>While this ILM concept is a sound one, it has important limitations because it relies on migrating data from the online database to the nearline repository, with all the technical and human resource requirements this entails. Is it really necessary to do it this way? In my last post, I introduced the notion of Just-in-Case (JIC) data, which involves maintaining access to the detail data that underlies data warehouse summary tables. If the JIC data (which is static by definition) is &#8220;nearlined&#8221; at creation time, or as soon as possible after it is created, we can avoid the costs and system impact of moving it into the data warehouse. Why should we migrate JIC data to the online data warehouse, only to face the additional costs and hassles of archiving it later on? This will be a vital consideration when strategic decisions are being made about the enterprise data management framework.</p>

<p>It clearly makes much more sense for the data to be stored directly in the Nearline 2.0 repository. At SAND, we call this type of architecture the Corporate Information Memory (CIM); it is only possible because of the availability of Nearline 2.0 technology. This is illustrated in Figure 3.</p>

<p><img src="http://www.sandmtl.com/news/images/blog/nearline_20_framework_03.jpg" alt="Corporate Information Memory" /></p>
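
<p>The difference between the traditional ILM path and the CIM approach can be sketched in a few lines of Python. This is a toy model under stated assumptions, not SAND's implementation: the Stores class, the two functions and the set of &quot;static&quot; record kinds are all hypothetical. Both paths end with the same data layout, but only the ILM path needs a later migration step.</p>

```python
from dataclasses import dataclass, field

# Hypothetical record kinds treated as static JIC detail (never updated after capture).
STATIC_KINDS = {"cdr", "pos", "web_log"}


@dataclass
class Stores:
    online: list = field(default_factory=list)
    nearline: list = field(default_factory=list)
    migrated: int = 0  # rows later moved from online to nearline


def load_then_archive(records):
    """Traditional ILM path: load everything online, then archive static rows."""
    s = Stores()
    s.online.extend(records)
    keep = []
    for rec in s.online:
        if rec["kind"] in STATIC_KINDS:
            s.nearline.append(rec)
            s.migrated += 1
        else:
            keep.append(rec)
    s.online = keep
    return s


def nearline_at_creation(records):
    """CIM-style path: static detail is written straight to nearline."""
    s = Stores()
    for rec in records:
        (s.nearline if rec["kind"] in STATIC_KINDS else s.online).append(rec)
    return s


records = [{"kind": "cdr"}, {"kind": "pos"}, {"kind": "customer"}]
ilm, cim = load_then_archive(records), nearline_at_creation(records)
print(ilm.online == cim.online, ilm.nearline == cim.nearline)  # True True
print(ilm.migrated, cim.migrated)                              # 2 0
```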

<p>The key benefit of this architecture for any enterprise is that it is based on an evolution of the current architecture, and so protects existing investments in people skills, software and equipment. This fits perfectly with Gartner&#8217;s observations in a recent report on “Key Issues for Delivering a Data Warehouse Project, 2008”:</p>

<p>&#8220;The big theme for data warehousing in 2008 is the increased demand for more data, in more places and doing so with an evolutionary approach. Data warehouses are mission-critical and integrated into operations. Failure to support business process change and inflexibility will not be tolerated. The problem is how to take the years of effort and funds invested previously and leverage them into a modern data warehouse because &#8216;rip and replace&#8217; strategies are not acceptable.&#8221;</p>

<p>So, it is not difficult to see how Nearline 2.0 can act as a pivotal element in an enterprise data management framework, and how it presents one of the best available solutions to the critical issues that have arisen in this area. In my next post, I will discuss the benefits of this type of architecture in more detail.</p>

<p>Richard Grondin</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/nearline-20-and-the-data-management-framework/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nearline 2.0 and Advanced Data Modeling</title>
		<link>http://www.sandmtl.com/news/nearline-20-and-advanced-data-modeling/</link>
		<comments>http://www.sandmtl.com/news/nearline-20-and-advanced-data-modeling/#comments</comments>
		<pubDate>Mon, 27 Oct 2008 19:19:59 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Richard's Blog]]></category>
		<category><![CDATA[Nearline 2.0]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=147</guid>
		<description><![CDATA[In my last post, I discussed the “Quick Check” method for identifying the benefits that a Nearline 2.0 implementation can deliver in the areas of operational performance, SLAs and TCO. Certainly, it is easy to see how it would be preferable to manage a database that is 1 TB rather than 20 TB in size, [...]]]></description>
			<content:encoded><![CDATA[<p>In my last post, I discussed the <a href="http://www.sandmtl.com/news/starting-nearline-20-the-quick-check-approach/">“Quick Check” method</a> for identifying the benefits that a <a href="http://www.sandmtl.com/news/introducing-nearline-20/">Nearline 2.0</a> implementation can deliver in the areas of operational performance, SLAs and TCO. Certainly, it is easy to see how it would be preferable to manage a database that is 1 TB rather than 20 TB in size, particularly when it comes to critical tasks like backup and recovery, disaster recovery, off-site backups and historical analytics. Today, however, I want to focus on another benefit of Nearline 2.0 that is less obvious but still very important: data modeling flexibility.</p>

<p><span id="more-147"></span></p>

<p><div class="right"><img src="http://www.sandmtl.com/news/images/portraits/grondin_richard.png" alt="Richard Grondin" /></div>Nearline 2.0 permits organizations to keep a much greater amount of useful data accessible, without requiring compromises on SLAs, TCO and reporting performance. This in turn makes a variety of flexible data modeling options available. </p>

<h3>The Physical Table Partitioning Model</h3>

<p>The first of these new data modeling options is based on physical table partitioning. The largest tables in a data warehouse or data mart can be physically divided between an online component and a Nearline counterpart. This allows the existing data model to be maintained, while introducing a &#8220;right-sizing&#8221; concept in which only regularly accessed data is kept online, and all data that doesn’t require such an expensive and/or hard-to-manage environment is moved to the Nearline 2.0 solution. A typical rule of thumb for defining partition boundaries is the 90-day aging principle: any static data older than 90 days is migrated from the online warehouse to the Nearline 2.0 repository. </p>
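
<p>As a rough sketch of the 90-day aging rule, the Python below decides which partition a static row belongs in. The function names and sample rows are hypothetical, and &quot;older than 90 days&quot; is read here as strictly more than 90 days; a real implementation would apply the boundary inside the database's own partitioning scheme rather than in application code.</p>

```python
from datetime import date

# Rule-of-thumb threshold from the text; a real boundary would be set per table.
AGING_DAYS = 90


def partition_for(record_date: date, today: date, aging_days: int = AGING_DAYS) -> str:
    """Return 'nearline' for static rows older than aging_days, else 'online'."""
    age_in_days = (today - record_date).days
    return "nearline" if age_in_days > aging_days else "online"


def split_by_age(rows, today: date):
    """Split (record_date, payload) rows into online and nearline partitions."""
    partitions = {"online": [], "nearline": []}
    for record_date, payload in rows:
        partitions[partition_for(record_date, today)].append(payload)
    return partitions


rows = [
    (date(2008, 10, 20), "recent"),    # 7 days old
    (date(2008, 7, 29), "boundary"),   # exactly 90 days old: stays online
    (date(2008, 7, 1), "old"),         # 118 days old: moves to nearline
]
print(split_by_age(rows, today=date(2008, 10, 27)))
# {'online': ['recent', 'boundary'], 'nearline': ['old']}
```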

<p>Now, many forms of enterprise data, such as call detail records (CDR), point-of-sale (POS), web, proxy or log data, are static by definition, and are usually the main sources of data warehouse growth. This is very good news, because it means that as soon as the data is captured, it can be moved to the Nearline 2.0 repository (in fact, it is conceivable that this kind of data could be fed directly to Nearline 2.0 from the source system, but that is a topic for another post). Because of the large volumes involved, this kind of detail data has usually been aggregated at one or more levels in the enterprise data warehouse. Users generally query the summary tables to identify trends, drilling down into the details for a particular range of records only when specific needs or opportunities are identified. This data access technique is well known and has been in use for quite some time. </p>

<h3>The Online Summary Table Model</h3>

<p>This leads me to the second novel design option offered by Nearline 2.0: the ability to store all static detail data in the Nearline repository and use it as the basis for building online summary tables, with the option to quickly drill down to detail in the Nearline 2.0 repository when required. More specifically, the Nearline 2.0 solution can feed the online system&#8217;s summary tables directly. The advantage of this implementation is that it substantially reduces the size of the online database, optimizes its performance, and permits trend analysis over even very long periods. This is particularly useful when looking for emerging trends (positive or negative) related to specific products or offerings, because it gives managers the chance to analyze and respond to issues and opportunities within a realistic time frame. A recent <a href="http://press.experian.com/documents/showdoc.cfm?doc=3264">press release from Experian</a> provides an excellent example of how expert analysts can produce business value using long-term historical data and the right set of tools.</p>
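
<p>The summary-plus-drill-down pattern can be sketched with Python's built-in sqlite3 module, using two in-memory databases as stand-ins for the nearline repository and the online system. The table names, columns and sample rows are hypothetical; the point is that the small online summary table is fed directly from nearline detail, and detail is only fetched from nearline when a summary cell warrants a closer look.</p>

```python
import sqlite3

# Two hypothetical stores: "nearline" holds all static detail rows,
# "online" holds only the summary table that reports query day to day.
nearline = sqlite3.connect(":memory:")
online = sqlite3.connect(":memory:")

nearline.execute("CREATE TABLE sales_detail (sale_date TEXT, product TEXT, amount REAL)")
nearline.executemany(
    "INSERT INTO sales_detail VALUES (?, ?, ?)",
    [
        ("2008-09-03", "A", 10.0),
        ("2008-09-17", "A", 15.0),
        ("2008-10-02", "A", 40.0),
        ("2008-10-05", "B", 7.5),
    ],
)

# Feed the online summary table directly from the nearline detail.
online.execute("CREATE TABLE sales_monthly (month TEXT, product TEXT, total REAL, n INTEGER)")
summary = nearline.execute(
    "SELECT substr(sale_date, 1, 7), product, SUM(amount), COUNT(*) "
    "FROM sales_detail GROUP BY substr(sale_date, 1, 7), product ORDER BY 1, 2"
).fetchall()
online.executemany("INSERT INTO sales_monthly VALUES (?, ?, ?, ?)", summary)


def drill_to_detail(month: str, product: str):
    """Fetch the detail rows behind one summary cell from the nearline store."""
    return nearline.execute(
        "SELECT sale_date, amount FROM sales_detail "
        "WHERE substr(sale_date, 1, 7) = ? AND product = ? ORDER BY sale_date",
        (month, product),
    ).fetchall()


# Trend analysis runs against the small online summary table...
for row in online.execute("SELECT * FROM sales_monthly ORDER BY month, product"):
    print(row)
# ...and only a cell worth investigating triggers a drill-down into nearline detail.
print(drill_to_detail("2008-10", "A"))  # [('2008-10-02', 40.0)]
```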

<p>Some organizations are already building this type of data hierarchy, using Data Marts or analytic cubes fed by the main Data Warehouse. I call this kind of architecture “data pipelining”. Nearline 2.0 can play an important role in such an implementation, since its repository can be shared between all the analytic platforms. This not only reduces data duplication, management/operational overhead, and requirements for additional hardware and software, it also relieves pressure on batch windows and lowers the risk of data being out of synch. Furthermore, this implementation can assist organizations with data governance and Master Data Management while improving overall data quality. </p>

<h3>The Just-In-Case Data Model</h3>

<p>Another important data modeling option offered by Nearline 2.0 relates to what we can call &#8220;just-in-case&#8221; data. In many cases, certain kinds of data are maintained outside the warehouse just in case an analyst requires ad hoc access to it for a specific study. Sometimes, for convenience, this &#8220;exceptional&#8221; data is stored in the data warehouse. However, keeping this data in an expensive storage and software environment, or storing it on tape or inexpensive disks as independent files, can create a data management nightmare. At the same time, studies demonstrate that a very large portion of the cost of ad hoc analysis is concentrated in the data preparation phase. As part of this phase, the analyst needs to &#8220;shop&#8221; for the just-in-case data to be analyzed, meaning that he or she needs to find, &#8220;slice&#8221;, clean and transform it, then use it to build a temporary analytic platform, sometimes known as an Exploration Warehouse or &#8220;Exploration Mart&#8221;. </p>

<p>Nearline 2.0 can play a very important role in such a scenario. Just-in-case data can be stored in the Nearline repository, and analysts can then query it directly using standard SQL-based front-end tools to extract, slice and prepare the data for analytic use. Since much less time is spent on data preparation, far more time is available for data analysis, and there is no impact on the performance of the main reporting system. This acceleration of the data preparation phase comes from having a central catalog that describes all the data on hand. The Nearline repository can be used to directly feed the expert&#8217;s preferred analytic platform, generally resulting in a substantial improvement in analyst productivity. Analysts can focus on executing their analyses and bringing more value to the enterprise, rather than struggling to get access to clean and reliable data.</p>
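
<p>The two preparation steps described above, finding data through a central catalog and then slicing and cleaning it with ordinary SQL, might look something like the following. This is a hedged sketch only: Python's sqlite3 module stands in for the nearline repository, and the catalog and web_log tables, their columns and the sample rows are all hypothetical. The result lands in a temporary table, playing the role of a small exploration mart.</p>

```python
import sqlite3

# Stand-in for the nearline repository; sqlite3 is used only for illustration.
repo = sqlite3.connect(":memory:")

# Hypothetical central catalog describing the just-in-case data on hand.
repo.execute("CREATE TABLE catalog (dataset TEXT PRIMARY KEY, description TEXT)")
repo.execute(
    "INSERT INTO catalog VALUES ('web_log', 'Raw web server hits, one row per request')"
)
repo.execute("CREATE TABLE web_log (ts TEXT, url TEXT, status INTEGER)")
repo.executemany(
    "INSERT INTO web_log VALUES (?, ?, ?)",
    [
        ("2008-09-15T08:00", "/products", 200),
        ("2008-10-01T10:01", " /checkout", 500),  # untidy value to clean up
        ("2008-10-02T09:30", "/checkout", 200),
        ("2008-10-02T09:31", "/products", 200),
    ],
)

# Step 1, "shopping": look the data up in the catalog instead of hunting for files.
print(repo.execute("SELECT dataset, description FROM catalog").fetchall())

# Step 2: slice and clean with ordinary SQL into a temporary exploration table,
# leaving the main reporting system untouched.
repo.execute(
    "CREATE TEMP TABLE checkout_oct AS "
    "SELECT ts, TRIM(url) AS url, status FROM web_log "
    "WHERE ts >= '2008-10-01' AND TRIM(url) = '/checkout'"
)
mart = repo.execute("SELECT * FROM checkout_oct ORDER BY ts").fetchall()
print(mart)
```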

<p>At SAND, we have developed a new analytical offering based on this approach: &#8220;Database on Demand&#8221; or DBOD. Using DBOD, an analyst can specify their data requirements and have a 100% indexed RDBMS built directly from these specifications, ready for querying by front-end tools such as SAS, SPSS, Business Objects, MicroStrategy and so on. I&#8217;ll touch on that more in a future post.</p>

<p>Richard Grondin</p>
<p><img src="http://www.sandmtl.com/news/images/blog/nearline_20_architecture.jpg" alt="Nearline 2.0 Architecture" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/nearline-20-and-advanced-data-modeling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
