Data Virtualization is a technology for ETL and EAI, as well as Business Intelligence
I’m enjoying talking with enterprise architects about Data Virtualization (DV) in their companies, how they are fitting it into their overall architecture, and what their expectations are. What I’m seeing is a strong tendency toward thinking of DV as a tool for Business Intelligence and less as the basis for “convergence” of all silos of integration. The latter is the topic of conversations I had not too long ago with several Gartner analysts.
Typically data virtualization is considered only for “on-demand” uses. This makes sense if your fundamental premise is that DV is essentially the same as having a database, which is never pro-active about delivering data. That means that you must apply technology on top of it to issue queries and act on the results. Why not apply the same federation to other forms of data integration? Whether the data is served on demand or delivered physically to an endpoint, the approach eliminates staging databases and other overhead.
Enterprise Data Virtualization Domains
A few of the enterprise architects I have spoken with recently are very clear that agile data federation is one of the main reasons data virtualization is compelling. However, it is difficult, arguably impossible, to define an enterprise data model that is “everything to everybody.” DV brings the opportunity to think “agile,” and I just don’t remember any “single universal” approach to anything in IT that has ever succeeded for more than a few cool success stories about a handful of companies.
Just because a data model is virtual does not mean that it is not brittle. In the beginning, you may have a clean model, but the natural progression of additional and changing requirements will eventually bring back the traditional issues of staging databases and data warehouses, and lead to instability because of the intense level of interdependency across the model. That “extra mile” of custom programming required to make the data exactly relevant to each end use also carries risk and maintenance overhead.
Smaller domains for Data Virtualization tend to increase agility
Consider instead defining reusable rapidly-built, for-purpose federations that include appropriate data validation and conversion to exactly the requirements of the end usages, and that deliver to the endpoint either physically or on-demand as a service. From that perspective, it is not necessary to always define the universe as a single model, and it is not necessary to use different products and development approaches depending on whether you want to use the same data federation to move data (ETL), to handle operational transactions (EAI) with a surrounding orchestration, or to serve data on-demand as a web service or other calling mode.
Perhaps “reusable rapidly-built, for-purpose” seems a bit oxymoronic. Think instead of the concept of Master Data, where each definition is constructed in a way that is reusable and is sanctioned for a specific context. One would never have a Master Data definition that is a complete enterprise data model. Instead, there would be useful subsets of that, such as Customer, which, with Data Virtualization, is packaged with all the sources, transformation, alignment, and business rules needed to access it. That Customer can be called upon for any use: it can be incorporated into data workflows or ETL, served on-demand, uploaded to a database or application, shown on a dashboard, or queried for reporting.
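To make the idea of a packaged Customer definition a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how any particular product works: the class name `CustomerFederation`, the two source callables, the field names, and the "skip customers without a billing record" rule are all hypothetical. The point is simply that one definition carries its sources, alignment, and rules, and the same definition can be called on-demand or used to deliver data physically.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


@dataclass
class CustomerFederation:
    """Hypothetical for-purpose 'Customer' definition: sources plus rules."""
    crm_source: Callable[[], Iterable[Record]]      # assumed: returns CRM rows
    billing_source: Callable[[], Iterable[Record]]  # assumed: returns billing rows

    def fetch(self) -> List[Record]:
        """On-demand use: federate live from both sources, with no staging store."""
        billing_by_id = {row["cust_id"]: row for row in self.billing_source()}
        customers: List[Record] = []
        for row in self.crm_source():
            billing = billing_by_id.get(row["id"])
            if billing is None:
                continue  # assumed business rule: skip customers with no billing record
            customers.append({
                "customer_id": row["id"],
                "name": row["name"].strip().title(),  # conversion and alignment
                "balance": billing["balance"],
            })
        return customers

    def deliver(self, sink: Callable[[Record], None]) -> int:
        """Physical delivery (ETL-style): push the same federation to an endpoint."""
        count = 0
        for record in self.fetch():
            sink(record)
            count += 1
        return count


if __name__ == "__main__":
    # Stand-in sources; in practice these would query live systems.
    crm = lambda: [{"id": "1", "name": " ada lovelace "}, {"id": "2", "name": "bob"}]
    billing = lambda: [{"cust_id": "1", "balance": "42.00"}]

    customer = CustomerFederation(crm, billing)
    print(customer.fetch())           # on-demand call
    target: List[Record] = []
    customer.deliver(target.append)   # physical delivery to a list acting as the endpoint
    print(len(target))
```

Both calling modes go through the same `fetch` logic, so the validation and conversion rules are defined once and reused wherever the Customer is needed.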
Bite-sized chunks are always more palatable. While it’s possible to define a virtual enterprise model, queries against that model will almost always yield information that needs further modification to make sense with the objectives and in the context of the calling program. What does that mean? It means that additional data manipulation, even complex transformation, will most often be required. So, from the perspectives of agility, ease of use and understanding, and execution risk, enterprise models should be undertaken with discretion.
For-purpose mapping and data conversion, as with Stone Bond's Enterprise Enabler®, ensures that the usage receives the data in exactly the form required, with the most appropriate performance and latency.