Monday, October 7, 2013

The Good Ghosts of Data Virtualization

I’m a datum, tra-la, tra-la, just humming along, happier than most of my colleagues, now that Data Virtualization has come into my life. I’m what they call Master data, or rather a source of record, which has been quite a challenge, since I used to get cloned over and over, morphed, and turned upside down. That took lots of time and seemed to be necessary because they couldn't quite get to ME, the original: unvarnished and clean as a whistle. My colleagues still deal with this and are copied to staging databases all the time, leaving many places where versions reside. There is a constant tangle of synchronization and updates across all those databases. Just imagine all the time it takes to build and maintain all those comings and goings! Often it’s not even clear what the original, real value is or where it all began. Of course, this is done through such a gallimaufry of tools and custom coding that the information they convey is old by the time it is used. And everyone knows the greater the gallimaufry rating, the higher the tech debt accumulation. Bad stuff.

With this new Data Virtualization approach, I get to stay right where I am, and whenever called upon, I send a fresh virtual ME instantly. I say “virtual” because there’s no copy made of me, only a ghost of me is transmitted, aligned with other data, usually to a browser or to feed some analytics algorithm. Here’s where my psychiatrist has to get involved, because I have this existential dilemma. Does the ghost that’s passed forward actually exist? Does a datum exist if it’s passed virtually? Well, at least it can’t be stolen like copies of data can. I stay up nights worrying about my colleagues who have to be physically copied, or moved altogether, sometimes getting a little beat up, to some cloud in order to be used by a SaaS application. I never have to move because my ghost is passed directly to the cloud when it’s needed, never taking residence there. And one of the really cool things is that if a user of the browser decides that my value needs to be updated, they change it in the browser and send the ghost back to me with the new value, assuming they have permissions to do that.

Sometimes when I’m needed virtually, there are too many calls for ghosts, and the phantom of the opera sings one note too high and crashes my host software system. Fortunately, the masters at Stone Bond, with Enterprise Enabler®, not only have the speediest development environment on the planet, they also have a way of caching my ghost along with others in memory since I really don’t change that often. In the background, the cache gathers fresh ghosts whenever needed. When my call comes in, my ghost flies from the cache of ghosts and finishes the journey, combined with some other ghosts coming live from other systems. The phantom just hums along and the host system lives happily ever after. 
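For the technically curious, here is a minimal sketch, in Python, of the kind of read-through cache with background refresh that I’m describing. It is purely illustrative, not Stone Bond’s actual implementation; the fetch function is a stand-in for whatever call retrieves a fresh value from the source system.

```python
import threading
import time

class GhostCache:
    """Illustrative read-through cache: serves a cached "ghost" (virtual copy)
    of a value and lets a background job refresh entries before they go stale."""

    def __init__(self, fetch, ttl_seconds=60):
        self._fetch = fetch            # hypothetical call into the source system
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._entries = {}             # key -> (value, fetched_at)

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
        if entry is not None and time.time() - entry[1] <= self._ttl:
            return entry[0]            # still fresh: serve the cached ghost
        value = self._fetch(key)       # otherwise fetch a fresh one from the source
        with self._lock:
            self._entries[key] = (value, time.time())
        return value

    def refresh_all(self):
        """Background pass: re-fetch every cached key so reads stay warm.
        A scheduler or daemon thread would call this periodically."""
        with self._lock:
            keys = list(self._entries)
        for key in keys:
            value = self._fetch(key)
            with self._lock:
                self._entries[key] = (value, time.time())


if __name__ == "__main__":
    cache = GhostCache(fetch=lambda key: f"value-of-{key}", ttl_seconds=60)
    print(cache.get("master-record-42"))   # first call hits the source
    print(cache.get("master-record-42"))   # subsequent calls serve the cached ghost
```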

Thursday, September 19, 2013

The Synergy of SharePoint 2013 and Data Virtualization


It’s a powerful combination: Data Virtualization and SharePoint. SharePoint 2013, the most recent release, supports broader-than-ever opportunities for data integration and data-centric implementations. SharePoint has evolved to include availability as a service in the cloud and a hybrid architecture, along with its fully on-premises model.

The availability of these alternate architectures is a compelling argument for adopting SharePoint for many types of applications beyond the historical content management uses. The rich Business Connectivity Services (BCS) provide a range of features that are data-centric, elevating SharePoint to another level as a platform for applications, dashboards, and BI tools.

As implementations mature, though, certain requirements are coming to the forefront. In particular, the reality of most applications is that data needs are not limited to a single data source, or even a single data source type, which is a limitation not only of SharePoint, but of the third-party vendors that provide connectivity. The potential of the powerful built-in BCS is not fully leveraged, and the end result is either a limited scope of applications or the need to build a staging database, either on premises or in the cloud, the latter of which brings additional security concerns for many customers.


The ideal use of SharePoint leverages BCS to access data live from the original sources, validates the data, presents it to the end user or application for action or editing, and pushes changes back to the original sources without ever saving it anywhere. End-user security ensures that users only see or edit data to which they have access.

This is what Data Virtualization (DV) brings to the table, and Stone Bond’s Enterprise Enabler is a full-scope integration platform with data federation at its core. The single Integrated Development Environment (IDE) for development, testing, deployment, and monitoring is built on and fully extensible with Microsoft .Net. Within a few minutes, one can configure and generate data virtualization in all of the data access modes supported by SharePoint: OData, SOAP, REST, ODBC, ADO.Net, and Enterprise Enabler's Custom Connector.
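For a rough sense of what the on-demand side looks like to a consumer, here is a small Python sketch that queries a hypothetical OData endpoint exposed by the virtualization layer. The URL, entity name, and field names are invented for the example; SharePoint BCS or any other OData-aware client would issue essentially the same kind of request.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical OData endpoint exposed by the data virtualization layer.
BASE_URL = "https://dv.example.com/odata/CustomerView"

def query_customers(city):
    """Ask the virtualization layer for a filtered slice of the federated view.
    No staging database is involved; the layer federates the sources on demand."""
    params = urllib.parse.urlencode({
        "$filter": f"City eq '{city}'",
        "$select": "CustomerId,Name,City,OpenOrders",
        "$format": "json",
    })
    with urllib.request.urlopen(f"{BASE_URL}?{params}") as response:
        return json.load(response)["value"]   # assumes an OData-style JSON payload

if __name__ == "__main__":
    for customer in query_customers("Houston"):
        print(customer["CustomerId"], customer["Name"], customer["OpenOrders"])
```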


Monday, August 26, 2013

Convergence is, at Best, Asymptotic

No, not asymptomatic. Asymptotic. “Convergence” is a term we hear these days in IT. The convergence of Data Integration, in particular, is the one I care about. In the analysts’ vernacular, converged integration seems to mean a product, company, or platform that handles all modes of data integration – ETL, EAI, ESB, DV et al.

By definition, “convergence” means a coming together, which clearly implies the parts started from other places and came voluntarily or were coerced into coming together. Just looking at history, companies like IBM, Oracle, and Informatica absorbed outside companies and products to nominally have a product suite with all the modes. Here’s how it works: Need ETL? Buy Ascential, rename it, give it a massage with hot towels, then say, “Voilà! Voilà! Voilà!” and your product now covers ETL, too! Or, take a Data Virtualization product, write a bunch of code, and again, “Voilà!” and your product covers ETL, too.

With all due respect to the analysts, the word convergence may describe the reality of most companies incorporating more and more integration modes, but keep in mind that an asymptotic approach, in the mathematical sense, means getting closer and closer but never quite getting there [translation: Close, but no cigar!].

I am certain that the analysts do not mean Convergence in the mathematical sense. The term is quite useful for establishing a powerful vision of dramatically reduced time-to-value, clean architectures, flexible integration patterns, and highly streamlined change management and maintenance over time.


If you read my last blog (http://tinyurl.com/lmwtzth), you probably recognize that there is a significant difference with Enterprise Enabler®. It was designed from the ground up with a powerful core that handles all the common functionality of the range of modes, with implicit data federation across disparate sources (databases, electronic instruments, spreadsheets, data warehouses, ERP systems, cloud services, and so on).

That core is the common root of all data integration modes, a bit like the trunk of a tree that has any number of branches and leaves, instead of trying to converge a bunch of branches by stuffing them all in an opaque vase and pretending they share a single trunk. That spells trouble, and doesn't even come close to asymptotic. Probably not asymptomatic, either.

Friday, August 16, 2013

On-Demand vs. Event-Driven. Who Cares?


Let’s say I’m an integration. (Don't laugh!) What is this “integration” that I actually am? Well, I have been configured for a specific purpose or set of uses, and I know exactly how to access each source: not just the class of source, like SAP, a web service, an Oracle database, or an Excel spreadsheet, but where each resides, what its security access is, and what subset of information is of interest for this purpose. I also know what data to filter out from each, what validation needs to be done as I grab the data, and how to make whatever multiple sources are involved meaningful together.

I cross-reference key fields, change units of measure, align formats, and apply functions to aggregate, analyze, or otherwise manipulate the data so it makes sense for this set of uses. And if I’m passed a query along the way, I definitely know how to deal with providing a subset from the domain I’m designed for. I’m pretty smart, don’t you think? Not to mention self-centered. I’m an Active Integration from Enterprise Enabler® that has been configured in fifteen minutes.
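If you prefer something more concrete than my self-description, here is a toy sketch, in Python, of the kind of metadata an integration like me might carry. The source names, fields, and rules are invented for illustration; this is not Enterprise Enabler’s actual metadata format.

```python
# A toy, declarative picture of what an "integration" knows about itself.
# The source names, fields, and rules are invented for illustration only.
integration_spec = {
    "name": "OpenOrdersByCustomer",
    "sources": {
        "sap_orders":   {"type": "SAP",    "endpoint": "PRD.VBAK",     "filter": "status != 'CLOSED'"},
        "crm_accounts": {"type": "Oracle", "endpoint": "CRM.ACCOUNTS", "filter": None},
    },
    "alignment": {
        # cross-reference keys so rows from different systems line up
        "join_on": {"sap_orders.KUNNR": "crm_accounts.ACCOUNT_ID"},
    },
    "transforms": [
        {"field": "order_value", "rule": "convert_currency(sap_orders.NETWR, to='USD')"},
        {"field": "customer",    "rule": "titlecase(crm_accounts.ACCOUNT_NAME)"},
    ],
    "validation": ["order_value >= 0", "customer is not null"],
    "delivery": ["web_service", "etl_to_warehouse", "browser_refresh"],  # same spec, any mode
}

def describe(spec):
    """Print what this integration would do, regardless of how it is triggered."""
    print(f"Integration '{spec['name']}' reads from: {', '.join(spec['sources'])}")
    print(f"Aligns on: {spec['alignment']['join_on']}")
    print(f"Can be invoked as: {', '.join(spec['delivery'])}")

if __name__ == "__main__":
    describe(integration_spec)
```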

What do I care how I get initiated, or by whom? Certainly not me! I’m just standing by the phone waiting for a call. In the end, whatever terminology you use, I’m the core of “Convergence” of integration modes. Just tell me, and I’ll do the heavy lifting of an ETL, physically moving data from several heres to a there, or serve data up virtually as a data set through a web service. Put me on a schedule, or maybe have me do my integration whenever the ice cream truck comes. (That’s my favorite kind of event to respond to!) So you see that whether it’s a user refreshing a browser page or querying a specific subset of data, it’s all the same to me.

Sometimes I inadvertently find myself in the middle of a brawl of my counterparts from other worlds, who only know one way of operating. You know, ETL, EAI, SOA, or Data Virtualization. They think they’re the best at what they do, which is always only one of those. They have to contort themselves to the point of pain in order to play in more than one arena. I, on the other hand, am so flexible I can do any of the above and touch my toes (with an ice cream cone in my hand) without grunting. If it starts to drip, I just cache it somewhere. And by the way, I'll challenge my counterparts to their specialty any day!

It’s true, though, that sometimes I have my own internal existential contemplation, wondering if I really exist. All my counterparts call me names, saying I’m nothing but a bunch of metadata. Nevertheless, I am the Master that can be used anywhere in any mode, maintaining standardization and agility at the same time. It’s a bit like yoga.

Wednesday, June 5, 2013

What is Data Virtualization?

“Virtualization” is everywhere but nowhere. The term is virtually ubiquitous. The first use I remember, once computers got into the picture, was “virtual reality,” when we computationally rendered 3D worlds in 2D, complete with lighting models and all that. The good thing is that all those complicated algorithms are now encapsulated for cool things and usable even by beginner gamers. In those days, we had to actually calculate every pixel ourselves. But I digress.

First, let’s clarify that “virtualizing data” means putting it in the cloud or elsewhere in order to eliminate some of the hassles of its existence and maintenance. That has nothing to do with data virtualization, which is a term that I believe is still evolving.

Data Virtualization, according to Rick van der Lans, who literally wrote the book, is “the technology that offers data consumers a unified, abstracted, and encapsulated view for querying and manipulating data stored in a heterogeneous set of data stores.**”  

As the discipline matures, he is expanding his view, as in his new white paper, Creating an Agile Data Integration Platform Using Data Virtualization. Definitely recommended reading.

The “unified, abstracted, and encapsulated view” from his original definition is, in my opinion, the core concept of data virtualization. In other words, there is a mechanism to bring together, or “federate,” data from many sources virtually, in a way that is useful. This means that the data is federated without creating a physical or cached staging database, but is aligned, transformed, and made available for use. So, for example, you may have a SharePoint BCS application that needs data from SAP, Oracle, and Salesforce.com. Data virtualization will provide a mechanism to merge all of the data into the form necessary for the end user’s interaction in SharePoint. The data is federated “on the fly” and delivered virtually to a web page, on-demand upon refresh of the screen. Think about the security of the backend data that has been accessed: it never actually moves from its original source! Data virtualization also includes write-back to the sources (with end-user security, but that’s for another blog) so that an end user can, for example, correct his phone number or address, sending it as an update directly to the backend source. (See more examples at http://tinyurl.com/a3wkffc.)
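To make the “on the fly” part concrete, here is a minimal Python sketch of federation across three sources. The fetch functions are stand-ins for live connector calls to SAP, Oracle, and Salesforce.com; nothing is staged or persisted along the way.

```python
# Illustrative only: each fetch_* function stands in for a live connector call
# (SAP, Oracle, Salesforce.com); nothing here is staged or persisted.

def fetch_sap(customer_id):
    return {"customer_id": customer_id, "credit_limit": 50000}

def fetch_oracle(customer_id):
    return {"customer_id": customer_id, "open_invoices": 3}

def fetch_salesforce(customer_id):
    return {"customer_id": customer_id, "phone": "+1-713-555-0100"}

def federated_customer_view(customer_id):
    """Align the three live results on the shared key and return one merged
    record shaped for the consuming page -- federation 'on the fly'."""
    merged = {}
    for fetch in (fetch_sap, fetch_oracle, fetch_salesforce):
        merged.update(fetch(customer_id))
    return merged

if __name__ == "__main__":
    print(federated_customer_view("C-1001"))
    # A write-back path would route an edited field (say, phone) from the page
    # straight to the owning source system, subject to the user's permissions.
```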

You can see that this description expands the definition to include any sources, not just data stores, although the focus of most data virtualization products is BI, in which case that limitation makes sense. The BI view of data virtualization is usually with respect to federating relational databases for the sole purpose of querying. The tools that were designed assuming that constraint have some difficulty accommodating the expanding definition.

In addition to evolving from federating data stores to federating any kinds of disparate sources, data virtualization is shedding the concept of “on-demand” only. Now federated data is available not just through web services, ADO.Net, ODBC, JDBC, and the like, but for any type of data integration, such as ETL and EAI.

In fact, it is the “data virtualization” concept of federation that becomes the kingpin for “Convergence,” as Gartner is wont to say, of all integration modalities in a single toolset, sharing metadata and business rules across all. 

**Rick F. van der Lans, Data Virtualization for Business Intelligence Systems, Morgan Kaufmann, 2012

Tuesday, April 16, 2013

The Domain, It’s Plain, Is Mainly Under Strain

Data Virtualization is a technology for ETL, EAI, as well as Business Intelligence

I’m enjoying talking with enterprise architects about Data Virtualization (DV) in their companies, how they are fitting it in their overall architecture and what their expectations are. What I’m seeing is the strong tendency toward thinking of DV as a tool for Business Intelligence and less as the basis for “convergence” of all silos of integration. The latter is the topic of conversations I had not too long ago with several Gartner analysts.

Typically, data virtualization is considered only for “on-demand” uses. This makes sense if your fundamental premise is that DV is essentially the same as having a database, which is never proactive about delivering data. That means you must apply technology on top of it to issue queries and act on them. Why not apply the same federation to other forms of data integration? Whether on-demand or delivered physically to an endpoint, the approach eliminates staging databases and other overhead.

Enterprise Data Virtualization Domains

A few of the enterprise architects I have spoken with recently are very clear that agile data federation is one of the main reasons data virtualization is compelling. However, it is difficult – arguably impossible – to define an enterprise data model that is “everything to everybody.” DV brings the opportunity to think “agile,” and I just don’t remember any “single universal” approach to anything in IT that has ever succeeded beyond a few cool success stories about a handful of companies.

Just because a data model is virtual, that definitely does not mean that it is not brittle. In the beginning, you may have a clean model, but the natural progression of additional and changing requirements will eventually yield the traditional issues of staging databases and data warehouses and lead to instability because of the intense level of interdependency across the model. That “extra mile” of custom programming required to make the data exactly relevant to each end use also carries risk and overhead to maintain.




Smaller domains for Data Virtualization tend to increase agility

Consider instead defining reusable, rapidly built, for-purpose federations that include appropriate data validation and conversion matched exactly to the requirements of the end usages, and that deliver to the endpoint either physically or on-demand as a service. From that perspective, it is not necessary to always define the universe as a single model, and it is not necessary to use different products and development approaches whether you want to use the same data federation to move data (ETL), to handle operational transactions (EAI) with a surrounding orchestration, or to serve it on-demand as a web service or other calling mode.

Perhaps “reusable, rapidly built, for-purpose” seems a bit oxymoronic. Think instead of the concept of Master Data, where each definition is constructed in a way that is reusable and is sanctioned for a specific context. One would never have a Master Data definition that is a complete enterprise data model. Instead, there would be useful subsets of that, such as Customer, which, with Data Virtualization, is packaged with all the sources, transformation, alignment, and business rules needed to access it. That Customer can be called upon for any use; it can be incorporated in data workflows and ETL, served on-demand, uploaded to a database or application, fed to a dashboard, or queried for reporting.
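Here is a minimal sketch, in Python, of the idea: one for-purpose “Customer” federation reused both on-demand and in an ETL run. The sources, fields, and rules are invented for illustration; the point is the single, reusable definition.

```python
# Sketch: one for-purpose "Customer" federation, reused across modes.
# The sources and fields are invented; the point is the single definition.

def customer_federation(customer_id):
    """The sanctioned 'Customer' definition: sources, alignment, and rules
    packaged in one place."""
    billing = {"customer_id": customer_id, "balance": 1250.00}        # e.g. from ERP
    contact = {"customer_id": customer_id, "email": "a@example.com"}  # e.g. from CRM
    record = {**billing, **contact}
    record["standing"] = "good" if record["balance"] < 5000 else "review"
    return record

def serve_on_demand(customer_id):
    """DV mode: answer a live query from a dashboard or web service."""
    return customer_federation(customer_id)

def run_nightly_etl(customer_ids, load):
    """ETL mode: the same federation feeds a physical destination."""
    for cid in customer_ids:
        load(customer_federation(cid))

if __name__ == "__main__":
    print(serve_on_demand("C-42"))
    run_nightly_etl(["C-42", "C-43"], load=lambda row: print("loading", row))
```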

Bite-sized chunks are always more palatable. While it’s possible to define a virtual enterprise model, queries against that model will almost always yield information that needs further modification to make sense for the objectives and context of the calling program. What does that mean? It means that additional data manipulation, even complex transformation, will most often be required. So, from the perspectives of agility, ease of use and understanding, and execution risk, enterprise models should be undertaken with discretion.

For-purpose mapping and data conversion, as with Stone Bond's Enterprise Enabler®, ensures that the usage receives the data in exactly the form required, with the most appropriate performance and latency.



Friday, February 8, 2013

The Second Face of Data Virtualization

The industry, the analysts, and the rest of us are trying to get a handle on the many faces of Data Virtualization (DV). As soon as we think we get it, another use case shows up. If we define DV as federating data “in flight,” as opposed to putting it in a central database/warehouse/mart, then we are talking about streamlining EAI, ETL, ESB, SOA, and reporting (aka BI, BA, etc.), where data is either physically moved or served on-demand. The same multi-source federation is valuable for all these patterns; however, DV has become known nearly exclusively as a support technology for Business Intelligence.

How Unfortunate!

While DV is valuable across these other patterns, I want to highlight a new architectural pattern taking hold as a second key application of the technology, one that is being referred to as Transactional Data Virtualization (TDV). This pattern simply would not be possible without it.

Transactional Data Virtualization

Most people associate DV exclusively with its “first face,” Business Intelligence. The second face may surprise you as you absorb the concept of taking DV from reporting to operations. Data is federated live directly from the sources, served up on-demand to applications or to end users in portals, and updated by the application or user, bringing a heretofore unparalleled agility and efficiency to your business.

I wrote a blog some time ago about turning dashboards into operational consoles, where TDV makes data actionable. KPI dashboards can be turned into interactive user interfaces where users can act on the conclusions and decisions based on the KPIs. Our customers are streamlining their customer-facing applications and portals by using TDV to federate data from multiple backend systems and present it on web pages. Certain fields, for example an address or preferences, are editable. If the end user is authorized, and write-back is enabled, Enterprise Enabler® updates that information in the source systems, with transaction management. If the changed information needs to reside in more systems than the original source of record, it updates those systems at the same time. I’m sure you picked up that this means you don’t have to run a separate synchronization step, which means that synchronization latency can be reduced to an irrelevance.
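For the curious, here is a rough sketch, in Python, of the write-back idea: one edit from the portal applied to every system of record, with compensation if any update fails. The update and rollback calls are stand-ins; this is not Enterprise Enabler’s transaction manager, just an illustration of the all-or-nothing intent.

```python
# Illustration of the write-back idea: one edit from the portal is applied to
# every system of record in a single logical transaction. The update/rollback
# callables are stand-ins, not a real transaction manager.

class WriteBackError(Exception):
    pass

def write_back(customer_id, new_value, targets):
    """Apply the edited value to each backend; compensate on failure so the
    systems do not drift apart (all-or-nothing in spirit)."""
    applied = []
    try:
        for name, (update, _) in targets.items():
            update(customer_id, new_value)
            applied.append(name)
    except Exception as exc:
        failed_at = name
        for done in reversed(applied):
            targets[done][1](customer_id)     # run that system's rollback
        raise WriteBackError(f"write-back stopped at '{failed_at}': {exc}") from exc
    return applied

if __name__ == "__main__":
    targets = {
        # each entry: (apply-the-update, undo-the-update) -- stand-in lambdas
        "crm": (lambda cid, v: print(f"CRM: {cid} -> {v}"),
                lambda cid: print(f"CRM: rolled back {cid}")),
        "erp": (lambda cid, v: print(f"ERP: {cid} -> {v}"),
                lambda cid: print(f"ERP: rolled back {cid}")),
    }
    write_back("C-77", "500 Main St, Houston TX", targets)
```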

Data Federation Must Have a Transformation Engine

Data Virtualization brings tremendous value any time data is required that must be compiled with, or put in the context of, data from one or more other sources. In my experience, this happens close enough to “always” to round up. Unless there is a transformation engine involved that federates multiple disparate sources, the federation will require custom coding to align the data properly. That’s a strike against agility, which is not good; with such an engine, though, DV enables fluid, complex integrations of many flavors.

On the Horizon

Enterprises with forward-looking IT organizations are incorporating Data Virtualization into their Enterprise Architecture and are quickly reaping the ROI as they reduce tech debt. The days of heavy, complex infrastructure are numbered as CIOs elect to eliminate the old, obstinate, and unyielding integration platforms and finally deliver true agility to the business.




Tuesday, January 15, 2013

No Transformation Engine? No Agility.



Transformation Engine
     It keeps coming back to data transformation. That was the very first challenge with respect to integration that intrigued me years ago, because if all data at its various sources were in the same units of measure, had the same names, didn't need to be run through a formula to be in the same numeric context, were spelled the same, etcetera, etcetera, etcetera, then all you would have to do is get it all together where you wanted it.  I believe that the need to transform and manipulate data remains the single most important impediment to speedy, streamlined data flow throughout an enterprise. 
    The emergence of transformation engines aligned with the early Extract/Transform/Load (ETL) integration architecture pattern years ago. Unfortunately, the transformation engine is generally not considered an essential element of other patterns, which continues to astound me. Regardless of whether the incumbent architecture is EAI, ESB, ETL, B2B, or Data Virtualization, the same issues are present, but the transformation engine is often not part of the solution. That means that all that data transformation is done by one-off coding or scripting, sometimes augmented by limited-scope conversion utilities. It seems like an “unmentionable” topic: people turning their heads the other way and pretending that it’s not important. In fact, it’s every bit as big a deal for the other patterns as it is for ETL!
     Without a transformation engine, it is impossible to streamline the logic that makes the data meaningful. Without it, all data transformation yields brittle break points, impeding the ability to adjust quickly to changes in business or technical requirements, generally slowing development and execution time, and promoting the attitude of “it works; don’t touch it.”

What should you look for in a transformation engine?
·         Well, first of all, look for a transformation engine! If there is one, then consider…
·         A single transformation must handle
1.   Many sources to one (no staging required)
2.   Multiple source types (databases, cloud apps, electronic instruments, web services, etc.)
3.   Lookups, alignment, complex data manipulation, filtering, etc.
4.   “En route” data cleansing
·         Transformations must be completely metadata-driven
·         All metadata should be configured and modified in a single interface, without leaving for separate tools. (A minimal sketch of what “metadata-driven” might look like follows this list.)
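To make these criteria concrete, here is a toy, metadata-driven transformation in Python: several sources feed one output in a single pass, with a lookup, a filter, and en-route cleansing expressed as data rather than code. The field names and rules are invented for the example; no vendor’s engine is implied.

```python
# A toy metadata-driven transformation: several sources feed one destination
# record in a single pass, with lookup, filter, and cleansing rules expressed
# as data rather than code. Field names and rules are invented for the example.

UNIT_LOOKUP = {"KG": 1.0, "LB": 0.453592}   # lookup table applied en route

spec = {
    "join_key": "part_no",
    "filter": lambda row: row.get("status") != "obsolete",
    "fields": {
        "part_no":   lambda rows: rows["erp"]["part_no"],
        "desc":      lambda rows: rows["plm"]["description"].strip().title(),  # cleansing
        "weight_kg": lambda rows: rows["erp"]["weight"] * UNIT_LOOKUP[rows["erp"]["uom"]],
    },
}

def transform(erp_rows, plm_rows, spec):
    """Many sources to one output, no staging: align on the key, apply the rules."""
    plm_by_key = {r[spec["join_key"]]: r for r in plm_rows}
    output = []
    for erp in erp_rows:
        if not spec["filter"](erp):
            continue
        rows = {"erp": erp, "plm": plm_by_key[erp[spec["join_key"]]]}
        output.append({name: rule(rows) for name, rule in spec["fields"].items()})
    return output

if __name__ == "__main__":
    erp = [{"part_no": "P-1", "weight": 3.0, "uom": "LB", "status": "active"}]
    plm = [{"part_no": "P-1", "description": "  hex bolt  "}]
    print(transform(erp, plm, spec))
```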

What are some typical characteristics of transformation engines that do not meet all these criteria?
·      XSLT engines, for example, operate only on data structured as XML. That means you must perform a separate transformation for each source that does not inherently handle its data in XML. Any transformation engine that requires incoming and/or output data to be in a particular format violates point 2 above.
o   Result: custom coding, or utilities that must be executed to handle the conversion at both ends of the transformation. More hand coding, and in the end, manually coded transformations.
·   Classic ETL engines perform only one-to-one transformations, violating point 1 above. The only way to use these engines in a pattern that requires alignment across multiple sources is to stage the data physically or virtually.
o   Result: Development time includes designing the data model to align the sources, building the model, and implementing full transformations from each source to the model, and then from the model to the destination(s).
·    Many integration products have some data conversion utilities built in.
o   Caution: These are always limited in scope and require leaving the environment to use a scripting or programming language to implement complex data manipulation.

     Just remember that without a transformation engine, you are looking at plenty of overhead and, most likely, the inability to merge, transform, and move data in real time from multiple sources. Not only will run-time performance be impacted, but the reliance on coding inevitably means a dramatic reduction of agility in development and of the ability to adjust to changing requirements.

     Now let’s take a brief look at data transformation in a data virtualization pattern. The typical picture of the concept of data virtualization is something like this:

     Each arrow represents the logic to transform the data from how it is in the particular source to how it needs to be represented in the virtual model. If there is no transformation engine, this logic will require a considerable amount of manual coding, although I concede that there may be a very small subset of useful manifestations of this pattern that are simple enough to configure the data manipulation without any custom coding or transformation engine. However, if you drop the full transformation capability onto each arrow, you have a powerful, agile implementation that can be developed quickly and modified in seconds.
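To put the arrows in code terms, here is a tiny Python sketch where each arrow is a mapping function from a source’s native shape into a shared virtual model, with the results aligned on a common key. The shapes and field names are invented for illustration.

```python
# Each "arrow" in the picture is just a mapping from a source's native shape
# into the shared virtual model. The shapes below are invented for illustration.

def arrow_from_crm(row):
    return {"customer": row["AccountName"], "region": row["Territory"]}

def arrow_from_billing(row):
    return {"customer": row["cust_nm"].title(), "balance": row["bal_amt"] / 100}

def virtual_model(crm_rows, billing_rows):
    """Apply each source's arrow, then align the results on 'customer'."""
    view = {}
    for row in crm_rows:
        mapped = arrow_from_crm(row)
        view.setdefault(mapped["customer"], {}).update(mapped)
    for row in billing_rows:
        mapped = arrow_from_billing(row)
        view.setdefault(mapped["customer"], {}).update(mapped)
    return list(view.values())

if __name__ == "__main__":
    crm = [{"AccountName": "Acme Corp", "Territory": "South"}]
    billing = [{"cust_nm": "acme corp", "bal_amt": 125000}]
    print(virtual_model(crm, billing))
```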

     A variation on this pattern brings the ability to write back to the sources: what we call bi-directional data virtualization. But that’s for another day. 
     You may want to check out Stone Bond Technologies’ Enterprise Enabler® if you want to see a product that has a transformation engine that meets all the mentioned criteria.