Sunday, February 9, 2014

Does Data Virtualization Foreshadow the End of Data Warehouses as We Know Them Today?

How many warehouses do you think have been eliminated (or never built) because of Amazon.com? I have no idea, but I bet it’s a big number. Maybe the warehouses they do need are smaller, too. This is worth reflecting on, even though it’s pretty obvious.  Why were they able to skip the warehouse in the distribution/delivery process? They figured out that it is much more efficient to deliver the goods directly from the source. They needed agility, and they made it happen.

It seems to me that the time has come for IT departments to start thinking the same way about Data Warehouses (DW). It ought to be easier to deal with electronic data than physical objects, shouldn’t it? So, what’s the problem with this picture? Why not go straight to the source for data when it’s needed and deliver the freshest data where it’s needed? Now that Data Virtualization (DV) has matured, a growing number of forward-looking companies are heading in that direction.

Your data warehouse diehards will tell you something similar to what Vincent Rainardi says on his blog at http://dwbi1.wordpress.com/2012/12/03/why-do-we-need-a-data-warehouse/, namely that a data warehouse is worth it because it is:

a) Integrated
b) Consistent
c) A store of historical data
d) Tested and verified
e) Performant

He goes on to say that the reason the DW meets these characteristics is that so much time has been invested by business analysts, data architects, ETL architects, ETL developers, and testers. (Is that a good thing?)

I believe that Data Virtualization can bring all of these characteristics to the table, with the arguable exception of historical data. But notice that the first and foremost reason given above for a data warehouse is that it provides integrated data. Perhaps going forward, Data Warehouses should be designed primarily to maintain historical data that is not being captured or maintained anywhere else. Let’s say we reduce the Data Warehouse’s role to maintaining historical data, with all other data access and movement being accomplished by Data Virtualization. That thought raises lots of flags, doesn’t it? Security, validation, moving data physically when needed, writing back when the data is federated, performance, and so on. Actually, by combining DV with other patterns, companies are addressing these requirements now.
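To make the split concrete, here is a minimal sketch of the idea, not a real DV product. It assumes SQLite as a stand-in for both systems; the database names (`warehouse`, `orders_app`), the tables, and the sample figures are all hypothetical. The warehouse holds only closed-out history, the application database holds current orders, and a virtual view presents them as one table without copying anything.

```python
import sqlite3

# Hypothetical stand-ins: one database for the historical warehouse,
# one for the live order-entry application.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
conn.execute("ATTACH DATABASE ':memory:' AS orders_app")

conn.execute("CREATE TABLE warehouse.order_history (order_id INTEGER, year INTEGER, amount REAL)")
conn.execute("CREATE TABLE orders_app.orders (order_id INTEGER, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO warehouse.order_history VALUES (?, ?, ?)",
                 [(1, 2012, 200.0), (2, 2013, 300.0)])
conn.execute("INSERT INTO orders_app.orders VALUES (3, 2014, 120.0)")

# The "virtual" layer: a view that spans both databases. No rows are copied;
# each query reads the sources as they are at that moment.
# (In SQLite, only a TEMP view may reference tables in other attached databases.)
conn.execute("""
    CREATE TEMP VIEW all_orders AS
    SELECT order_id, year, amount FROM warehouse.order_history
    UNION ALL
    SELECT order_id, year, amount FROM orders_app.orders
""")

# A new order lands in the live system and is visible through the view at once.
conn.execute("INSERT INTO orders_app.orders VALUES (4, 2014, 80.0)")

totals = conn.execute(
    "SELECT year, SUM(amount) FROM all_orders GROUP BY year ORDER BY year"
).fetchall()
print(totals)
```

A real deployment would still have to answer the flags raised above (security, validation, performance); the sketch only shows the access pattern.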

Data Warehouses and Data Virtualization are inextricably tied together, with clearly overlapping objectives. Now that Data Federation and Data Virtualization are coming of age, we need to begin thinking more in terms of the best use for each, so that we can leverage Data Virtualization wherever it makes sense. DV adds dramatically to the agility of a company’s infrastructure and to its capacity for informed, rapid decisions. Data Federation, which is at the heart of DV as we commonly speak of it, can also be applied to ETL-type data movement, eliminating the staging step. So, Data Federation can be the best way to populate today’s and tomorrow’s DW.
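As a rough illustration of federation used for ETL-type movement, the sketch below loads a warehouse table with a single query that joins two sources directly, with no intermediate staging tables. It again assumes SQLite as a stand-in; the `crm` and `billing` databases, the table names, the `load_warehouse` function, and the sample rows are hypothetical.

```python
import sqlite3

# The main connection plays the warehouse; two attached in-memory
# databases play independent source systems (hypothetical names).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO billing.invoices VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# Target table in the warehouse.
conn.execute("CREATE TABLE customer_revenue (name TEXT, total REAL)")


def load_warehouse(conn):
    """Refresh the warehouse table from a federated join across both sources.

    The join and aggregation run in one statement, so no staging copy of
    either source table is ever written.
    """
    conn.execute("DELETE FROM customer_revenue")
    conn.execute("""
        INSERT INTO customer_revenue (name, total)
        SELECT c.name, SUM(i.amount)
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.name
    """)
    conn.commit()


load_warehouse(conn)
rows = conn.execute(
    "SELECT name, total FROM customer_revenue ORDER BY name"
).fetchall()
print(rows)
```

In a conventional ETL flow, each source would first be extracted into its own staging table and the join would run against those copies; here the federated query takes their place.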

Within five years, the most competitive companies will be relying predominantly on agile integration for BI, BA, and transactions, with their data warehouses focused primarily on preserving historical data in an accessible form. Realistically speaking, though, there will still be many companies relying on their workhorse Data Warehouse, and they will still have trouble calling themselves “agile.”

Let me know what you think.