<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SAND News &#187; Nearline 2.0</title>
	<atom:link href="http://www.sandmtl.com/news/tag/nearline-20/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sandmtl.com/news</link>
	<description></description>
	<lastBuildDate>Thu, 04 Mar 2010 04:27:16 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Using XAM with Nearline 2.0 to Ensure Data Compliance</title>
		<link>http://www.sandmtl.com/news/using-xam-with-nearline-20-to-ensure-data-compliance/</link>
		<comments>http://www.sandmtl.com/news/using-xam-with-nearline-20-to-ensure-data-compliance/#comments</comments>
		<pubDate>Wed, 14 Jan 2009 16:12:18 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Richard's Blog]]></category>
		<category><![CDATA[Nearline 2.0]]></category>
		<category><![CDATA[xam]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=156</guid>
		<description><![CDATA[Recently, SAND has been conducting tests on our SAND/DNA Access product to benchmark its support for the XAM (eXtensible Access Method) API. These tests, executed at EMC&#8217;s lab in Hopkinton, Mass. using the latest version of EMC&#8217;s Centera solution, were a great success in a number of respects, and the SAND/DNA Access Nearline 2.0 software [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, SAND has been conducting tests on our <a href="http://www.sand.com/dna/access/index.html">SAND/DNA Access</a> product to benchmark its support for the XAM (eXtensible Access Method) API. These tests, executed at EMC&#8217;s lab in Hopkinton, Mass. using the latest version of EMC&#8217;s <a href="http://www.sandmtl.com/news/sand-technology-integrates-sanddna-with-emc%c2%ae-centera%e2%84%a2-content-addressed-storage-cas/">Centera solution</a>, were a great success in a number of respects, and the SAND/DNA Access <a href="http://www.sandmtl.com/news/tag/nearline-20/">Nearline 2.0</a> software component is now the first commercial product to obtain XAM certification from EMC. In today&#8217;s blog post, I will describe the XAM interface, explain our motivation for implementing it, and provide some details about the benchmark tests.</p>

<p><span id="more-156"></span></p>

<p><div class="right"><img src="http://www.sandmtl.com/news/images/portraits/grondin_richard.png" alt="Richard Grondin" /></div>The Wikipedia entry for XAM describes it as &#8220;a storage standard developed and maintained by the Storage Networking Industry Association (SNIA). It is in the process of being ratified as an ANSI standard. <a href="http://en.wikipedia.org/wiki/Xam">XAM</a> is an API for fixed content aware storage devices. XAM replaces the various proprietary interfaces that have been used for this purpose in the past. Content generating applications now have a standard means of saving and finding their content across a broad array of storage devices.”</p>

<p>For a data framework solution like Nearline 2.0, XAM provides the ability to store large amounts of structured data (for example, call detail records, application logs, web logs, syslogs, low-level manufacturing data, data exported from an enterprise data warehouse such as SAP BW, and so on) securely and efficiently to ensure compliance with data governance requirements. SAND/DNA Access can use XAM storage providers like EMC Centera as WORM (Write Once Read Many) devices, ensuring that stored data cannot be modified. Furthermore, the Centera XAM compliance solution offers robust integrated business continuity and disaster recovery protection for structured data in SAND/DNA Access. In turn, SAND/DNA Access maintains massive volumes of structured data as content for the XAM storage provider, enabling a complete enterprise solution for data governance and compliance. SAND/DNA Access also brings high-performance query capability for structured data in the XAM storage provider, making it possible to conduct true electronic discovery (e-discovery) on this data. All these considerations contributed to SAND&#8217;s decision to implement support for the new XAM API.</p>

<p>We began development of XAM API support at the beginning of October 2008, and within a month were ready to execute our benchmark tests at the EMC Hopkinton Lab. These tests, which were fully supported by the EMC Centera team, allowed us to validate our XAM API implementation and to evaluate the performance of the data migration process (moving data from SAND/DNA Access to the Centera server) as well as query speeds. In both cases, observed performance was very impressive: using limited processing power (4 blade servers), we were able to achieve data migration speeds higher than 400 MB per second, and query speeds of close to 11 million records scanned per second! This shows that the new XAM functionality of SAND/DNA Access is more than up to the task of enabling existing Centera or XAM storage provider customers to extend their infrastructure to incorporate data compliance and governance regimes for massive amounts of structured data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/using-xam-with-nearline-20-to-ensure-data-compliance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nearline 2.0 and the Data Management Framework</title>
		<link>http://www.sandmtl.com/news/nearline-20-and-the-data-management-framework/</link>
		<comments>http://www.sandmtl.com/news/nearline-20-and-the-data-management-framework/#comments</comments>
		<pubDate>Mon, 10 Nov 2008 14:35:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Richard's Blog]]></category>
		<category><![CDATA[Nearline 2.0]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=149</guid>
		<description><![CDATA[In my last post, I outlined some of the advanced data modeling options that have been made possible by the advent of Nearline 2.0. Today, I want to discuss how Nearline 2.0 can act as an essential component in a data management framework. The data management framework, which can be viewed as an extension of [...]]]></description>
			<content:encoded><![CDATA[<p>In my last post, I outlined some of the <a href="http://www.sandmtl.com/news/nearline-20-and-advanced-data-modeling/">advanced data modeling options</a> that have been made possible by the advent of Nearline 2.0. Today, I want to discuss how Nearline 2.0 can act as an essential component in a data management framework. The data management framework, which can be viewed as an extension of the data model concept to the level of enterprise data architecture, governs the processing and management of enterprise data throughout its &#8220;lifecycle&#8221;, from creation to disposal. It includes all operational components, and covers key issues such as data backup, disaster recovery, data retention, data access security and so on. </p>

<p><span id="more-149"></span></p>

<p>Let&#8217;s start at the beginning of the process. In a traditional data warehouse implementation, data is extracted from operational systems, then transformed and loaded into the data warehouse. Figure 1 shows this process, usually called ETL (Extract, Transform and Load) or sometimes ELT (Extract, Load and Transform), in the context of the Corporate Information Factory.</p>

<p><img src="http://www.sandmtl.com/news/images/blog/nearline_20_framework_01.jpg" alt="Traditional &quot;Nearline 1.0&quot; Framework" /></p>

<p>In previous posts, I presented Nearline 2.0 as a &#8220;database extension&#8221; that can reduce TCO, guarantee satisfaction of SLAs, and improve the performance of the online database that underpins the data warehouse solution. This configuration is based on the traditional concept of Information Lifecycle Management (ILM), normally implemented as a tactical effort to protect the enterprise against problems associated with the &#8220;data tsunami&#8221;. Figure 2 illustrates the incorporation of Nearline 2.0 into the data management framework as a database extension used to store (and maintain access to) older or less frequently used detail data.</p>

<p><img src="http://www.sandmtl.com/news/images/blog/nearline_20_framework_02.jpg" alt="Modern Nearline 2.0 Framework" /></p>

<p>While this ILM concept is a sound one, it has important limitations because of its reliance on migrating data from the online database to the nearline repository, with all the technical and human resource requirements this entails. Is it really necessary to do it this way? In my last post, I introduced the notion of Just-in-Case (JIC) data, which involves maintaining access to the detail data that underlies data warehouse summary tables. If the JIC data (which is static by definition) is &#8220;nearlined&#8221; at creation time &#8212; or as soon as possible after it is created &#8212; we can avoid the costs and system impact of moving it into the data warehouse. Why should we migrate JIC data to the online data warehouse, only to face the additional costs and hassles of archiving it later on? This will be a vital consideration when strategic decisions are being made about the enterprise data management framework.</p>

<p>It clearly makes much more sense for the data to be stored directly in the Nearline 2.0 repository. At SAND, we call this type of architecture, which is only possible because of the availability of Nearline 2.0 technology, the Corporate Information Memory (CIM). This is illustrated in Figure 3.</p>

<p><img src="http://www.sandmtl.com/news/images/blog/nearline_20_framework_03.jpg" alt="Corporate Information Memory" /></p>

<p>The key benefit of this architecture for any enterprise is that it is based on an evolution of the current architecture, and so protects existing investments in people skills, software and equipment. This fits perfectly with Gartner&#8217;s observations in a recent report on “Key Issues for Delivering a Data Warehouse Project, 2008”:</p>

<p>&#8220;The big theme for data warehousing in 2008 is the increased demand for more data, in more places and doing so with an evolutionary approach. Data warehouses are mission-critical and integrated into operations. Failure to support business process change and inflexibility will not be tolerated. The problem is how to take the years of effort and funds invested previously and leverage them into a modern data warehouse because &#8220;rip and replace&#8221; strategies are not acceptable.&#8221;</p>

<p>So, it is not difficult to see how Nearline 2.0 can act as a pivotal element in an enterprise data management framework, and how it presents one of the best available solutions to the critical issues that have arisen in this area. In my next post, I will discuss the benefits of this type of architecture in more detail.</p>

<p>Richard Grondin</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/nearline-20-and-the-data-management-framework/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nearline 2.0 and Advanced Data Modeling</title>
		<link>http://www.sandmtl.com/news/nearline-20-and-advanced-data-modeling/</link>
		<comments>http://www.sandmtl.com/news/nearline-20-and-advanced-data-modeling/#comments</comments>
		<pubDate>Mon, 27 Oct 2008 19:19:59 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Richard's Blog]]></category>
		<category><![CDATA[Nearline 2.0]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=147</guid>
		<description><![CDATA[In my last post, I discussed the “Quick Check” method for identifying the benefits that a Nearline 2.0 implementation can deliver in the areas of operational performance, SLAs and TCO. Certainly, it is easy to see how it would be preferable to manage a database that is 1 TB rather than 20 TB in size, [...]]]></description>
			<content:encoded><![CDATA[<p>In my last post, I discussed the <a href="http://www.sandmtl.com/news/starting-nearline-20-the-quick-check-approach/">“Quick Check” method</a> for identifying the benefits that a <a href="http://www.sandmtl.com/news/introducing-nearline-20/">Nearline 2.0</a> implementation can deliver in the areas of operational performance, SLAs and TCO. Certainly, it is easy to see how it would be preferable to manage a database that is 1 TB rather than 20 TB in size, particularly when it comes to critical tasks like backup and recovery, disaster recovery, off-site backups and historical analytics. Today, however, I want to focus on another benefit of Nearline 2.0 that is less obvious but still very important: data modeling flexibility.</p>

<p><span id="more-147"></span></p>

<p><div class="right"><img src="http://www.sandmtl.com/news/images/portraits/grondin_richard.png" alt="Richard Grondin" /></div>Nearline 2.0 permits organizations to keep a much greater amount of useful data accessible, without requiring compromises on SLAs, TCO and reporting performance. This in turn makes a variety of flexible data modeling options available. </p>

<h3>The Physical Table Partitioning Model</h3>

<p>The first of these new data modeling options is based on physical table partitioning. The largest tables in a data warehouse or data mart can be physically divided between an online component and a Nearline counterpart. This allows the existing data model to be maintained, while introducing a &#8220;right-sizing&#8221; concept where only the regularly accessed data is kept online, and all data that doesn’t require such an expensive and/or hard to manage environment is put into the Nearline 2.0 solution. A typical rule of thumb for defining partition boundaries is based on the 90-day aging principle, so that any static data older than 90 days is migrated from the online warehouse to the Nearline 2.0 repository. </p>
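<p>As a rough illustration of the 90-day aging rule, here is a minimal Python sketch that splits rows between an online partition and a nearline partition. It is not SAND's implementation: the function name <code>partition_by_age</code> and the row keys <code>created</code> and <code>static</code> are assumptions made up for this example.</p>

```python
from datetime import date, timedelta

AGING_DAYS = 90  # rule-of-thumb boundary mentioned in the text


def partition_by_age(rows, today, aging_days=AGING_DAYS):
    """Split rows into (online, nearline) lists using the aging rule.

    Each row is a dict with the hypothetical keys 'created' (a date)
    and 'static' (True when no more updates are planned for the row).
    """
    cutoff = today - timedelta(days=aging_days)
    online, nearline = [], []
    for row in rows:
        # Only static data older than the cutoff is a migration candidate.
        if row["static"] and cutoff > row["created"]:
            nearline.append(row)
        else:
            online.append(row)
    return online, nearline
```

<p>Rows that are still being updated stay online regardless of age, which matches the idea that only static data is migrated.</p>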

<p>Now, many forms of enterprise data, such as CDR, POS, Web, Proxy or Log data, are static by definition, and are furthermore usually the main sources of data warehouse growth. This is very good news, because it means that as soon as the data is captured, it can be moved to the Nearline 2.0 repository (in fact, it is conceivable that this kind of data could be fed directly to Nearline 2.0 from the source system – but that is a topic for another post).  Because of the large volumes involved, this kind of detail data has usually been aggregated at one or more levels in the enterprise data warehouse. Users generally query the summary table in order to identify trends, only drilling down into the details for a specific range of records when specific needs or opportunities are identified. This data access technique is well known, and has been in use for quite some time. </p>

<h3>The Online Summary Table Model</h3>

<p>This leads me to the second novel design option offered by Nearline 2.0: the ability to store all static detail data in the Nearline repository, and then use this as the basis for building online summary tables, with the ability to quickly drill to detail in the Nearline 2.0 repository when required. More specifically, the Nearline 2.0 solution can be used to feed the online system&#8217;s summary tables directly. The advantage of this implementation is that it substantially reduces the size of the online database, optimizes its performance, and permits trend analysis on even very long periods. This is particularly useful when looking for emerging trends (positive or negative) related to specific products or offerings, because it gives managers the chance to analyze and respond to issues and opportunities within a realistic time frame. A recent <a href="http://press.experian.com/documents/showdoc.cfm?doc=3264">press release from Experian</a>  provides an excellent example of how business value can be produced by expert analysts using long-term historical data and the right set of tools.</p>

<p>Some organizations are already building this type of data hierarchy, using Data Marts or analytic cubes fed by the main Data Warehouse. I call this kind of architecture “data pipelining”. Nearline 2.0 can play an important role in such an implementation, since its repository can be shared between all the analytic platforms. This not only reduces data duplication, management/operational overhead, and requirements for additional hardware and software, it also relieves pressure on batch windows and lowers the risk of data being out of synch. Furthermore, this implementation can assist organizations with data governance and Master Data Management while improving overall data quality. </p>

<h3>The Just-In-Case Data Model</h3>

<p>Another important data modeling option offered by Nearline 2.0 relates to what we can call &#8220;just-in-case&#8221; data. In many cases, certain kinds of data will also be maintained outside the warehouse just in case an analyst requires ad hoc access to it for a specific study. Sometimes, for convenience, this &#8220;exceptional&#8221; data is stored in the data warehouse. However, keeping this data in an expensive storage and software environment, or even storing it on tape or inexpensive disks as independent files, can create a data management nightmare. At the same time, studies demonstrate that a very large portion of the costs associated with ad hoc analysis are concentrated in the data preparation phase. As part of this phase, the analyst needs to &#8220;shop&#8221; for the just-in-case data to be analyzed, meaning that he or she needs to find, &#8220;slice&#8221;, clean, transform and use it to build a temporary analytic platform, sometimes known as an Exploration Warehouse or Exploration Mart. </p>

<p>Nearline 2.0 can play a very important role in such a scenario. Just-in-case data can be stored in the Nearline repository, and analysts can then query it directly using standard SQL-based front-end tools to extract, slice and prepare the data for analytic use. Since much less time is spent on data preparation, far more time is available for data analysis &#8212; and there is no impact on the performance of the main reporting system. This acceleration of the data preparation phase results from the availability of a central catalog describing all the available data. The Nearline repository can be used to directly feed the expert&#8217;s preferred analytic platform, generally resulting in a substantial improvement in analyst productivity. Analysts can focus on executing their analyses, and on bringing more value to the enterprise, rather than on struggling to get access to clean and reliable data.</p>

<p>At SAND, we have developed a new analytical offering based on this approach: &#8220;Database on Demand&#8221; or DBOD. Using DBOD, an analyst can specify their data requirements and have a 100% indexed RDBMS built directly from these specifications, ready for querying by front-end tools such as SAS, SPSS, Business Objects, MicroStrategy and so on. I&#8217;ll touch on that more in a future post.</p>

<p>Richard Grondin
<img src="http://www.sandmtl.com/news/images/blog/nearline_20_architecture.jpg" alt="Nearline 2.0 Architecture" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/nearline-20-and-advanced-data-modeling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Starting Nearline 2.0: The Quick Check Approach</title>
		<link>http://www.sandmtl.com/news/starting-nearline-20-the-quick-check-approach/</link>
		<comments>http://www.sandmtl.com/news/starting-nearline-20-the-quick-check-approach/#comments</comments>
		<pubDate>Mon, 20 Oct 2008 19:44:58 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Richard's Blog]]></category>
		<category><![CDATA[Nearline 2.0]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=145</guid>
		<description><![CDATA[In previous posts, I introduced the concept of “Best Practices” for Nearline 2.0. Today, I will get down to the details of how and where to start with a Nearline 2.0 solution, beginning with a Best Practices approach designed to quickly identify the benefits of such an implementation in a given environment. At SAND [...]]]></description>
			<content:encoded><![CDATA[<p>In previous posts, I introduced the concept of <a href="http://www.sandmtl.com/news/nearline-20-best-practices/">“Best Practices” for Nearline 2.0</a>. Today, I will get down to the details of how and where to start with a Nearline 2.0 solution, beginning with a Best Practices approach designed to quickly identify the benefits of such an implementation in a given environment. At SAND Technology, we offer this &#8220;Nearline 2.0 Quick Check&#8221; as part of our professional services portfolio.</p>

<p><span id="more-145"></span></p>

<p><div class="right"><img src="http://www.sandmtl.com/news/images/portraits/grondin_richard.png" alt="Richard Grondin" /></div>The Nearline 2.0 Quick Check normally takes about a week, and is designed to identify the main sources of data growth in the enterprise, the overall rate of growth, and the impact of this growth on existing Service Level Agreements (SLAs). It also involves describing data update processes in terms of frequency and dependencies. Based on this information, the concrete benefits of Nearline 2.0 for the organization can be clearly defined, along with the potential effects on TCO of not adopting or delaying the implementation. The key by-product of this exercise is a business case report that can be presented to corporate management.</p>

<h3>First Steps</h3>

<p>The first step in the Nearline 2.0 Quick Check involves installing a set of measurement tools to identify the overall size of each data warehouse and data mart used by the enterprise, along with the annual growth rate of each (note that it is not unusual for organizations to have more than one data warehouse in production at a given time, for a variety of reasons). A first set of metrics, generally extracted by the operational staff, provides a general analysis of storage usage over a specific period. The second set (pertaining to database objects) can be provided by the DBA team or collected using the Quick Check measurement tools. After discussions with key personnel, the collected metrics are used as the basis for estimating the TCO savings that would result from implementing various Nearline 2.0 scenarios.</p>

<p>The final set of metrics collected during the Nearline 2.0 Quick Check relates to organizational SLAs and TCO. Batch windows are analyzed to identify the major points of contention, and the processes that require long execution times and place a heavy load on processors, the network and the I/O subsystem. Frequently, the operational team will already know which processes are causing them nightmares, and can say whether they occur on a daily, weekly, monthly, quarterly or annual basis. TCO can be more difficult to evaluate, and for this reason needs to be examined with the assistance of organizational management.</p>

<h3>The Nearline 2.0 Advantage</h3>

<p>Of course, at the heart of this approach is the recently developed Nearline 2.0 concept, with all the new data management scenarios this has enabled. As I mentioned in a previous post, <a href="http://www.sandmtl.com/news/introducing-nearline-20/">older Nearline 1.0 or archiving solutions presented major drawbacks</a>, in that data removed from the online environment became very difficult and costly, if not impossible, to access. Because of this, determining the precise timing of data migration to a Nearline solution or Archive is a critical decision that can have a major impact on the organization.</p>

<p>A Nearline 2.0 solution, because of its performance characteristics and unique internal architecture, enables implementation of Nearline scenarios designed to deliver the highest return on investment and reduction of TCO, without giving up high-performance, flexible access to data for analytic purposes. Elaboration of the scenario doesn’t require any in-depth analysis of data access patterns. Rather, the data can be &#8220;nearlined&#8221; as soon as it becomes &#8220;static&#8221;, meaning that no more updates are planned for it. Analysis of update requirements will show that some data records are never updated after their creation, and can therefore be considered for nearlining right away &#8212; this is frequently the case with CDR, RFI or transaction log data. Data of this sort offers some very interesting data modeling options in a Nearline 2.0 solution, a topic I will be covering in my next post.</p>

<h3>Quick Check Results</h3>

<p>Once the Nearline 2.0 Quick Check is complete, a report can be prepared describing the current situation and where the enterprise will be in 6 months, in 1 year and so on. What is the expected cost to the enterprise of supporting the current rate of growth? What would be the effect of implementing various Nearline 2.0 scenarios? Armed with this information, the management team will have clear facts on which to base a decision to implement Nearline 2.0. And, based on my experience over the last few years, I’m pretty sure that they will decide to proceed earlier rather than later. </p>

<p>Richard Grondin</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/starting-nearline-20-the-quick-check-approach/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nearline 2.0 Best Practices</title>
		<link>http://www.sandmtl.com/news/nearline-20-best-practices/</link>
		<comments>http://www.sandmtl.com/news/nearline-20-best-practices/#comments</comments>
		<pubDate>Mon, 13 Oct 2008 12:00:38 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Richard's Blog]]></category>
		<category><![CDATA[Nearline 2.0]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=143</guid>
		<description><![CDATA[In previous posts, we introduced the concept of Nearline 2.0, showed how it represented a significant step forward from traditional archiving practices, and discussed how Nearline 2.0 could help your business. To recapitulate: the major advantage of Nearline 2.0 is its superior data access performance, which enables a more aggressive approach to migrating data out [...]]]></description>
			<content:encoded><![CDATA[<p>In previous posts, we introduced the <a href="http://www.sandmtl.com/news/introducing-nearline-20/">concept of Nearline 2.0</a>, showed how it represented a significant step forward from <a href="http://www.sandmtl.com/news/nearline-20-vs-the-archive/">traditional archiving</a> practices, and discussed how Nearline 2.0 <a href="http://www.sandmtl.com/news/how-can-a-nearline-20-solution-help-your-business/">could help your business</a>. To recapitulate: the major advantage of Nearline 2.0 is its superior data access performance, which enables a more aggressive approach to migrating data out of the online repository to nearline (a process known as “data nearlining”) than is practical when using a traditional archiving product.
<span id="more-143"></span>
<div class="right"><img src="http://www.sandmtl.com/news/images/portraits/grondin_richard.png" alt="Richard Grondin" /></div>Today, I will be considering the question of when an enterprise should consider implementing a Nearline 2.0 solution. Broadly speaking, such implementations fall into two categories: they either offer a &#8220;cure&#8221; for an existing data management problem or represent a proactive implementation of data best practices within the organization. </p>

<h3>Cure or Prevention?</h3>

<p>The &#8220;cure&#8221; type of implementation is typically associated with a data warehouse &#8220;rescue&#8221; project. This is undertaken when the data warehouse grows to a point where database size causes major performance problems and affects the warehouse&#8217;s ability to meet Service Level Agreements (SLAs). In these kinds of situations, it is mainly the operations division of the organization that is affected, and that demands an immediate fix, which can take the form of a Nearline implementation. The question here is: How quickly can the &#8220;cure&#8221; implementation stabilize warehouse performance and ensure satisfaction of SLAs?</p>

<p>On the other hand, the best practice approach, much like current practices related to healthy living, focuses on prevention rather than on curing. In this respect, best practices dictate that the Nearline 2.0 implementation should start as soon as some of the data in the data warehouse becomes &#8220;infrequently accessed&#8221;. Normally, this means data older than 90 days, since the access rate for granular data older than 90 days is usually minimal. The main idea is to keep the size of the data warehouse from inflating for no good business reason, by nearlining data as soon as possible. Ultimately this should work to protect the enterprise from an operational crisis arising from deteriorating performance and unmet SLAs. </p>

<p>In order to better judge the impact of using either of these two approaches, it is important to understand the various steps involved in the “Data Nearlining” process. What do we find when we &#8220;dissect&#8221; the process of nearlining data?</p>

<h3>Dissecting the “Data Nearlining” Process</h3>

<p>“Data Nearlining” involves multiple processes, whose performance characteristics can significantly influence the speed at which data is migrated out of the online database. The various processes can be grouped into two major steps: data extraction and database housekeeping.</p>

<h4>Data Extraction</h4>

<ul> 
<li>The first step (optional in some cases) is to lock the data that is targeted by the nearlining process, in order to ensure that the data is not modified while the process is going on.</li>
<li>Next comes the extraction of the data to be migrated. This is usually achieved via an SQL statement based on business rules for data migration. Often, the extraction can be performed using multiple extraction/consumer processes working in parallel.</li>
<li>The next step is to secure the newly extracted data, so that it is recoverable.</li>
<li>Then, the integrity of the extracted data must be validated (normally by comparing it to its online counterpart).</li>
</ul>

<h4>Database Housekeeping</h4>

<p><ul>
<li>Next, delete the online data that has been moved to nearline.</li>
<li>Then, reorganize the tablespace of the deleted data.</li>
<li>Finally, rebuild/reorganize the index associated with the online table from which data has been nearlined.</li>
</ul></p>
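<p>The two major steps above can be sketched end to end. The following is only a simplified illustration using Python's built-in <code>sqlite3</code> module, not a description of how any Nearline product works: the table names <code>online</code> and <code>nearline</code>, their columns, the date-based business rule and the function name <code>nearline_rows</code> are all assumptions invented for this example, and a real nearline repository would be a separate system rather than a second table.</p>

```python
import sqlite3


def nearline_rows(conn, cutoff):
    """Move rows created before `cutoff` from `online` to `nearline`.

    Assumes `conn` was opened with isolation_level=None, so that the
    transaction boundaries below are explicit, and that both tables
    have the hypothetical columns (id, created, payload).
    """
    rule = "? > created"  # hypothetical business rule for migration
    conn.execute("BEGIN IMMEDIATE")  # lock: block other writers
    try:
        # Extract the targeted rows and store a copy on the nearline side.
        moved = conn.execute(
            "INSERT INTO nearline SELECT id, created, payload "
            "FROM online WHERE " + rule, (cutoff,)).rowcount
        # Validate: every targeted online row must exist in nearline.
        missing = conn.execute(
            "SELECT COUNT(*) FROM ("
            "SELECT id, created, payload FROM online WHERE " + rule +
            " EXCEPT SELECT id, created, payload FROM nearline)",
            (cutoff,)).fetchone()[0]
        if missing:
            raise RuntimeError("nearline copy does not match online data")
        # Housekeeping: delete the online rows that were moved.
        conn.execute("DELETE FROM online WHERE " + rule, (cutoff,))
        conn.execute("COMMIT")  # copy and delete become durable together
    except Exception:
        conn.execute("ROLLBACK")
        raise
    # Housekeeping: reclaim space, then rebuild the indexes.
    conn.execute("VACUUM")
    conn.execute("REINDEX")
    return moved
```

<p>In this sketch the delete, space reclamation and index rebuild run immediately after the copy. In a production setting these housekeeping statements would more likely be scheduled separately.</p>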

<p>The Database Housekeeping process is often the slowest part of a Data Nearlining process, and thus can dictate the pace and scheduling of the implementation. In a production environment, the database housekeeping process is frequently decoupled from ongoing operations and performed over a weekend. It may be surprising to learn that deleting data can be a more expensive process than inserting it, but just ask an enterprise DBA about what is involved in deleting 1 TB from an Enterprise Data Warehouse and see what answer you get: for many, the task of fitting such a process into standard Batch Windows would be a nightmare.</p>

<p>So, it is easy to see that starting earlier in implementing Nearline 2.0 as a best practice can help to massively reduce not only the cost of the implementation, but also the time required to perform it. Therefore, the main recommendation to take away from this discussion is: Don’t wait too long to consider embarking on your Nearline 2.0 strategy! </p>

<p>That&#8217;s it for today. In my next post, I will take up the topic of which data should initially be considered as candidates for migration.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/nearline-20-best-practices/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How Can a Nearline 2.0 Solution Help Your Business?</title>
		<link>http://www.sandmtl.com/news/how-can-a-nearline-20-solution-help-your-business/</link>
		<comments>http://www.sandmtl.com/news/how-can-a-nearline-20-solution-help-your-business/#comments</comments>
		<pubDate>Mon, 06 Oct 2008 16:37:38 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Richard's Blog]]></category>
		<category><![CDATA[Nearline 2.0]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=142</guid>
		<description><![CDATA[In my last post, I discussed how a Nearline 2.0 solution allows vast amounts of detail data to be accessed at speeds that rival the performance of online systems, which in turn gives business analysts the power to assess and fine-tune important business initiatives on the basis of actual historical facts. We saw that the [...]]]></description>
			<content:encoded><![CDATA[<p>In my last post, I discussed how a <a href="http://www.sandmtl.com/news/introducing-nearline-20/">Nearline 2.0 solution</a> allows vast amounts of detail data to be accessed at speeds that rival the performance of online systems, which in turn gives business analysts the power to assess and fine-tune important business initiatives on the basis of actual historical facts. We saw that the promise of Nearline 2.0 is basically to give you all the data you want, when and how you want it, without compromising the performance of existing warehouse reporting systems.
<span id="more-142"></span></p>

<p><div class="right"><img src="http://www.sandmtl.com/news/images/portraits/grondin_richard.png" alt="Richard Grondin" /></div>Today, I want to consider what this capability means specifically for a business. What are the concrete benefits of implementing Nearline 2.0? Here are a few of the most important ones.
<h3>Nearline 2.0 enables you to keep all your valuable data available for analysis.</h3>
Having more data accessible – more details, covering longer periods – enables a number of improvements in Business Intelligence processes:
<ul>
    <li>A clearer understanding of emerging trends in the business &#8211; what will go well in the future as well as what is now “going south”</li>
    <li>Better support for iterative analyses, enabling more intensive Business Performance Management (BPM)</li>
    <li>Better insight into customer behavior over the long term</li>
    <li>More precise target marketing, bringing a three- to five-fold improvement in campaign yield</li>
</ul>
<h3>Nearline 2.0 enables you to dramatically increase information storage and maintain service levels without increasing costs or administration requirements.</h3>
<ul>
    <li>Extremely high compression rates give the ability to store considerably more information in a given hardware configuration</li>
    <li>A substantially reduced data footprint means much faster data processing, enabling effective satisfaction of Service Level Agreements without extensive investments in processing power</li>
    <li>Minimal administration requirements bring reductions in resource costs, and ensure that valuable IT and business resources will not be diverted from important tasks just to manage and maintain the Nearline 2.0 implementation</li>
    <li>High data compression also substantially reduces the cost of maintaining a data center by reducing requirements for floor space, air conditioning and so on</li>
</ul>
<h3>Nearline 2.0 simplifies and accelerates Disaster Recovery scenarios.</h3>
A reduced data footprint means more data can be moved across existing networks, making Nearline 2.0 an ideal infrastructure for implementing and securing an offsite backup process for massive amounts of data.</p>

<h3>Nearline 2.0 keeps all detail data in an immutable form, available for delivery on request.</h3>

<p>Having read-only detail data available on demand enables quick response to audit requests, avoiding the possibility of costly penalties for non-compliance. Optional security packages can be used to control user access to the data.</p>

<h3>Nearline 2.0 makes it easy to offload data from the online database before making final decisions about what is to be moved to an archiving solution.</h3>

<p>The traditional archiving process typically involves extensive analysis of data usage patterns in order to determine what should be moved to relatively inaccessible archival storage. With a Nearline 2.0 solution, it’s a simple matter to move large amounts of data out of the online database, thereby improving performance and guaranteeing satisfaction of SLAs, while still keeping the data available for access when required. Data that is determined to be no longer used, but which still needs to be kept around to comply with data retention policies or regulations, can then be easily moved into an archiving solution. For more consideration of the <a href="http://www.sandmtl.com/news/nearline-20-vs-the-archive/">differences between Nearline 2.0 and traditional archiving</a>, see Arthur&#8217;s recent blog post.</p>

<p>Taken together, these benefits make a strong case for implementing a Nearline 2.0 solution when the data tsunami threatens to overwhelm the enterprise data warehouse. In future posts, I will be investigating each of these in more detail.</p>

<p>Richard Grondin</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/how-can-a-nearline-20-solution-help-your-business/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nearline 2.0 vs. the Archive</title>
		<link>http://www.sandmtl.com/news/nearline-20-vs-the-archive/</link>
		<comments>http://www.sandmtl.com/news/nearline-20-vs-the-archive/#comments</comments>
		<pubDate>Mon, 29 Sep 2008 16:54:09 +0000</pubDate>
		<dc:creator>Arthur</dc:creator>
				<category><![CDATA[Arthur's Blog]]></category>
		<category><![CDATA[Nearline 2.0]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=141</guid>
		<description><![CDATA[In his most recent SAND blog post, Richard introduced the notion of “Nearline 2.0” and discussed how this concept, and related best practices, can be of vital importance to businesses dealing with the “data tsunami” we’ve been experiencing in recent years.

In this post, I’d like to step back a moment and explore the ways in [...]]]></description>
			<content:encoded><![CDATA[<p>In his most recent SAND blog post, Richard <a href="http://www.sandmtl.com/news/introducing-nearline-20/">introduced the notion of “Nearline 2.0”</a> and discussed how this concept, and related best practices, can be of vital importance to businesses dealing with the “data tsunami” we’ve been experiencing in recent years.</p>

<p>In this post, I’d like to step back a moment and explore the ways in which the dynamics of Nearline 2.0 differ from traditional methods of data archiving in terms of their approach to keeping data warehouse size under control. </p>

<p><span id="more-141"></span></p>

<h3>Putting Your Database “on a Diet”</h3>

<p><div class="right"><img src='http://www.sandmtl.com/news/images/portraits/ritchie_arthur.png' alt='Arthur Ritchie' /></div>Faced with massive and continually growing data volumes, data warehouse administrators have come up with a number of techniques designed to maintain acceptable warehouse performance. These include pre-building aggregates and Key Performance Indicators (KPIs) from large amounts of detailed transaction data, and indexing as many columns as possible in order to speed up query processing. As data warehouses continue to grow, however, the time required to do all the necessary preprocessing of data increases to the point where these tasks can no longer be performed within available “batch windows” when the warehouse is not being accessed by users. So, trade-offs need to be made. Doing less preprocessing work reduces the required time, but also means that queries that depend on aggregates, KPIs or additional indexes may take an inordinately long time to run, and may also severely degrade performance for other users as the system attempts to do the processing “on the fly”. This impasse leads to two possible choices: either stop providing the analytic functionality (making the system less valuable, and users more frustrated) or “put the database on a diet” by moving some of the data it contains to another location.</p>

<p>Both Nearline 2.0 and archiving solutions can help trim down an over-expanded database: both allow a substantial reduction of database size through implementation of an Information Lifecycle Management (ILM) approach, where unused or infrequently used detailed transactional data is removed from the online database and stored elsewhere. When the database is smaller, it will perform better and be capable of supporting a wider variety of user needs. Aggregates and KPIs will be built from a much smaller amount of detailed transaction data. Additionally, column indexing will be more practicable as there will be fewer rows per column to be indexed. The natural side effect is, of course, that there is much less data to be analyzed and compared.</p>

<h3>Getting “Lean” Not “Mean”</h3>

<p>There are a number of important differences between archiving warehouse data (using products from Open Text, EMC Documentum, and so on) and storing it in Nearline 2.0 (using SAND/DNA). However, since both types of product are used to hold data that has been moved out of the main “online” system, it is unclear to some why one would need to be implemented if the other is in place. To help clarify why one or the other type of system (or both) might be required in a given situation, it is worthwhile to go over the major points of contrast between Nearline 2.0 data and archived data.</p>

<p><img src="http://www.sandmtl.com/news/images/blog/nearline_20_vs_archive.jpg" alt="Online, Nearline 2.0, and Archive" /></p>

<h4>Archive</h4>

<p>Normally, the concept of electronic archiving focuses on the preservation of documents or data in a form that has some sort of certifiable integrity (for example, conformity to legal requirements), is immune to unauthorized access and tampering, and is easily subject to certain record management operations within a defined process – for example, automatic deletion after a certain period, or retrieval when requested by an auditor. The archive is in fact a kind of operational system for processing documents/data that are no longer in active use.   </p>

<p>The notion of archiving has traditionally focused on unstructured data in the form of documents, but similar concepts can be applied to structured data in the warehouse. An archive for SAP BI, for example, would preserve warehouse data that is no longer needed for analytical use but which needs to be kept around because it may be required by auditors, as would be the case if SAP BI data were used as the basis for financial statements. The archive data does not need to be directly accessible to the user community, just locatable and retrievable in case it is required for inspection or verification – not for analysis in the usual sense. In fact, because much of the data that needs to be preserved in the archive is fairly sensitive (for example, detailed financial data), the ability to access it may need to be strictly regulated.  </p>

<p>While many vendors of archiving solutions stress the performance benefits of reducing the amount of data in the online database, accessing the archived data is a complicated and relatively slow process: the data must be located and then either restored into the online database or accessed directly in a much slower backup database that is not maintained with performance or accessibility in mind. For this reason, it is unrealistic to expect archived data to be usable for analysis/reporting purposes.</p>

<h4>Nearline 2.0</h4>

<p>In the Information Lifecycle Management approach, the Nearline 2.0 repository holds data that is used less frequently than the “hottest”, most current data, but which still needs to be readily available for analysis or for constructing new/revised analytic objects for the warehouse to evaluate emerging trends.</p>

<p>While the exact proportion of Nearline 2.0 to online data will vary, the amount of “less frequently used” data that needs to be kept available is normally quite large. Moving this out of the main database greatly reduces the pressure on the online database and enables continued performance of standard database operations within available time windows, even in the face of the explosive data growth that many organizations are currently facing.  </p>

<p>Thus, the archiving requirements described above do not apply to a Nearline 2.0 product such as SAND/DNA, which is designed to reduce the size of the online warehouse database, while at the same time keeping the data more or less transparently accessible to end users who may need to use it for analysis, for rebuilding KPIs and so on.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/nearline-20-vs-the-archive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introducing Nearline 2.0</title>
		<link>http://www.sandmtl.com/news/introducing-nearline-20/</link>
		<comments>http://www.sandmtl.com/news/introducing-nearline-20/#comments</comments>
		<pubDate>Tue, 23 Sep 2008 14:11:20 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Richard's Blog]]></category>
		<category><![CDATA[Nearline 2.0]]></category>

		<guid isPermaLink="false">http://www.sandmtl.com/news/?p=138</guid>
		<description><![CDATA[In today&#8217;s post, I want to introduce the notion of &#8220;Nearline 2.0&#8221;. While the name might seem esoteric, this concept represents the logical evolution of older data warehouse and information lifecycle approaches that have struggled to maintain acceptable performance levels in the face of the increasingly intense “data tsunami” that looms over today&#8217;s business world. [...]]]></description>
			<content:encoded><![CDATA[<p>In today&#8217;s post, I want to introduce the notion of &#8220;Nearline 2.0&#8221;. While the name might seem esoteric, this concept represents the logical evolution of older data warehouse and information lifecycle approaches that have struggled to maintain acceptable performance levels in the face of the increasingly intense “data tsunami” that looms over today&#8217;s business world. Whereas older archiving solutions based their viability on the declining prices of hardware and storage, and rigid “Nearline 1.0” solutions were primarily designed to work with transactional systems, Nearline 2.0 embraces the dynamism of a software and services approach to fully leverage the potential of large enterprise data architectures.</p>

<p><span id="more-138"></span></p>

<p><div class="right"><img src="http://www.sandmtl.com/news/images/portraits/grondin_richard.png" alt="Richard Grondin" /></div>Looking back, we can now see that the older data management solutions presented a paradox: in order to mitigate performance issues and meet Service Level Agreements (SLAs) with users, they actually prevented or limited ad-hoc access to data. On the basis of system monitoring and usage statistics, this inaccessible data was then declared to be unused, and this was cited as an excuse for locking it away entirely. In effect, users were told: &#8220;Since you can’t get at it, you can’t use it, and therefore we’re not going to give it to you&#8221;!</p>

<p>Nearline 2.0, by contrast, allows historical data to be accessed at near-online speeds, empowering business analysts to measure and perfect key business initiatives through analysis of actual historical details. In other words, Nearline 2.0 gives you all the data you want, when and how you want it. (And without impacting the performance of existing warehouse reporting systems!)</p>

<p>Aside from the obvious economic and environmental benefits of this software-centric approach and the associated best practices, the value of Nearline 2.0 can be assessed in terms of the core proposition cited by <a href="http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html">Tim O&#8217;Reilly</a> when he popularized the term &#8220;Web 2.0&#8221;:</p>

<blockquote>&#8220;The value of the software is proportional to the scale and dynamism of the data it helps to manage.&#8221; </blockquote>

<p>In this regard, Nearline 2.0 provides a number of important advantages over prior methodologies:</p>

<p><strong>Keeps data accessible:</strong> Nearline 2.0 enables optimal performance from the online database while keeping all data easily accessible. This massively reduces the work required to identify, access and restore archived data, while minimizing the performance hit involved in doing so in a production environment.</p>

<p><strong>Keeps the online database “lean”:</strong> Because Nearline 2.0 data can still be easily accessed by users at near-online speeds, it allows for much more recent data to be moved out of the online system than would be possible with archiving. This results in far better online system performance and greater flexibility to further support user requirements without performance trade-offs.</p>

<p><strong>Relieves data management stress: </strong>Data can be moved to Nearline 2.0 without the substantial ongoing analysis of user access patterns that is usually required by archiving products. The process is typically based on a rule as simple as “move all data older than x months from the ten largest tables”.</p>

<p><strong>Mitigates administrative risk: </strong>Unlike archived data, Nearline 2.0 data requires little or no additional ongoing administration, and no additional administrative intervention is required to access it.</p>

<p><strong>Lets analysts be analysts: </strong>With Nearline 2.0, far less time is taken up in gaining access to key data and “cleansing it”, so much more time can be spent performing “what if” scenarios before recommending a course of action for the company. This improves not only the productivity but also the quality of work of key business analysts and statistical gurus.</p>

<p><strong>Copes with data structure changes: </strong>Nearline 2.0 can easily deal with data model changes, making it possible to query data structured according to an older model alongside current data. With archive data, this would require considerable administrative work.</p>

<p><strong>Leverages existing storage environments: </strong>Compared to older archiving products/strategies, the high degree of compression offered by Nearline 2.0 greatly increases the amount of information that can be stored as well as the speed at which it can be accessed.</p>

<p><strong>Keeps data private and secure: </strong>Nearline 2.0 has optional privacy and security packages that protect key information from being seen by ad-hoc business analysts (for example: names, social security numbers, credit card information).</p>
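<p>To illustrate how little logic a rule like “move all data older than x months from the ten largest tables” needs, here is a small sketch in Python. Everything in it is an assumption made for the example: the function names, the table names and sizes, and the choice to round the cutoff to the first day of a month. It is not part of any SAND product.</p>

```python
from datetime import date


def months_before(today, months):
    """First day of the month that lies `months` months before `today`."""
    index = today.year * 12 + (today.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def plan_migration(table_sizes, today, months, top_n=10):
    """Pair each of the `top_n` largest tables with the cutoff date."""
    cutoff = months_before(today, months)
    largest = sorted(table_sizes, key=table_sizes.get, reverse=True)[:top_n]
    return [(table, cutoff) for table in largest]


# Hypothetical table sizes in GB; in practice these would come from the
# database catalog.
sizes = {"orders": 500, "clicks": 900, "users": 20}
plan = plan_migration(sizes, date(2008, 9, 23), months=18, top_n=2)
for table, cutoff in plan:
    print(f"move rows older than {cutoff} from {table}")
```

<p>The point of the sketch is that the rule depends only on table size and data age, not on any analysis of who has been querying what.</p>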

<p>In short, Nearline 2.0 offers a significant advantage over older Nearline 1.0 and archiving technologies. When data needs to be removed from the online database in order to improve performance, but still needs to be readily accessible by users to conduct long-term analyses or to rebuild aggregates/KPIs/InfoCubes for period-over-period analysis, Nearline 2.0 is currently the only workable solution available.</p>

<p>In my next post, I&#8217;ll discuss more specifically how implementing a Nearline 2.0 solution can benefit both your data warehouse and your business.</p>

<p>Richard Grondin</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sandmtl.com/news/introducing-nearline-20/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
