Tuesday, January 15, 2013

No Transformation Engine? No Agility.



Transformation Engine
     It keeps coming back to data transformation. That was the very first integration challenge that intrigued me years ago, because if all data at its various sources were in the same units of measure, had the same names, were spelled the same, didn't need to be run through a formula to land in the same numeric context, and so on, then all you would have to do is get it all together where you wanted it. I believe that the need to transform and manipulate data remains the single most important impediment to speedy, streamlined data flow throughout an enterprise.
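To make that concrete, here is a minimal Python sketch with entirely hypothetical field names and records: two systems report the same part under different names, spellings, and units, and each needs its own conversion before the rows can be brought together.

```python
# Two hypothetical records describing the same part. The field names, the
# part-number spelling, and the units all differ across the two systems.
erp_row = {"PartNo": "A-1001", "WeightLbs": 2.2}
lab_row = {"part_number": "A1001", "mass_kg": 1.0}

def normalize_erp(row):
    """Map the ERP record onto a common shape: canonical names, kilograms."""
    return {
        "part": row["PartNo"].replace("-", ""),  # align part-number spelling
        "mass_kg": row["WeightLbs"] * 0.45359,   # convert pounds to kilograms
    }

def normalize_lab(row):
    """The lab feed is already metric; only the field names need aligning."""
    return {"part": row["part_number"], "mass_kg": row["mass_kg"]}

print(normalize_erp(erp_row))  # {'part': 'A1001', 'mass_kg': 0.997898}
print(normalize_lab(lab_row))  # {'part': 'A1001', 'mass_kg': 1.0}
```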
    The emergence of transformation engines aligned with the early Extract/Transform/Load (ETL) integration architecture pattern years ago. Unfortunately, the transformation engine is generally not considered an essential element of the other patterns, which continues to astound me. Whether the incumbent architecture is EAI, ESB, ETL, B2B, or Data Virtualization, the same issues are present, yet the transformation engine is often not part of the solution. That means all that data transformation is done by one-off coding or scripting, sometimes augmented by limited-scope conversion utilities. It seems like an “unmentionable” topic: people turn their heads the other way and pretend it’s not important. In fact, it’s every bit as big a deal for the other patterns as it is for ETL!
     Without a transformation engine, it is impossible to streamline the logic that makes the data work meaningfully. Every hand-built data transformation becomes a brittle break point, impeding the ability to adjust quickly to changes in business or technical requirements; it slows development and execution time and promotes the attitude of “It works. Don’t touch it.”

What should you look for in a transformation engine?
·         Well, first of all, look for a transformation engine! If there is one, then consider…
·         A single transformation must handle:
1.   Many sources to one (no staging required)
2.   Multiple source types (databases, Cloud apps, electronic instruments, web services, etc.)
3.   Lookups, alignment, complex data manipulation, filtering, etc.
4.   “En route” data cleansing
·         Transformations must be completely metadata driven (a small sketch of what this can look like follows this list)
·         All metadata should be configured and modified in a single interface, without leaving for separate tools.
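As a rough illustration of what “metadata driven” and “many sources to one, without staging” can mean in practice, here is a small Python sketch. Everything in it is invented for the example: the SPEC structure, the source names, and the field mappings. The point is that the merge logic reads its behavior from configuration, so a mapping change is a metadata edit, not a code change.

```python
# A minimal sketch of a metadata-driven, many-sources-to-one transformation.
# SPEC declares, per hypothetical source, which field is the join key and how
# each source field maps onto the target shape.
SPEC = {
    "target_key": "part",
    "sources": {
        "erp": {
            "key": "PartNo",
            "fields": {"mass_kg": ("WeightLbs", lambda v: v * 0.45359)},
        },
        "mes": {
            "key": "part_number",
            "fields": {"qty_on_hand": ("qty", int)},
        },
    },
}

def transform(rows_by_source, spec):
    """Merge every source into one target record set in a single pass,
    driven entirely by the spec -- no intermediate staging tables."""
    target = {}
    for name, meta in spec["sources"].items():
        for row in rows_by_source[name]:
            key = row[meta["key"]]
            out = target.setdefault(key, {spec["target_key"]: key})
            for tgt_field, (src_field, convert) in meta["fields"].items():
                out[tgt_field] = convert(row[src_field])
    return list(target.values())

rows = {
    "erp": [{"PartNo": "A1001", "WeightLbs": 2.2}],
    "mes": [{"part_number": "A1001", "qty": "14"}],
}
print(transform(rows, SPEC))
# [{'part': 'A1001', 'mass_kg': 0.997898, 'qty_on_hand': 14}]
```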

What are some typical characteristics of transformation engines that do not meet all these criteria?
·         XSLT engines, for example, operate only on data structured as XML. That means you must perform a separate conversion for each source that does not inherently handle its data in XML. Any transformation engine that requires incoming and/or outgoing data to be in a particular format violates point 2 above.
o   Result: custom coding, or utilities that must be executed to handle the conversion at both ends of the transformation; more hand coding and, in the end, manually coded transformations (the wrapping step is sketched after this list).
·         Classic ETL engines perform only one-to-one transformations, violating point 1 above. The only way to use these engines in a pattern that requires alignment across multiple sources is to stage the data, physically or virtually.
o   Result: Development time includes designing the data model to align the sources, building the model, and implementing full transformations from each source to the model and then from the model to the destination(s).
·         Many integration products have some data conversion utilities built in.
o   Caution: These are always limited in scope and require leaving the environment to use a scripting or programming language to implement complex data manipulation.
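To show what that pre-conversion burden looks like, here is a small Python sketch using a hypothetical CSV feed: the wrapping step an XML-only engine forces on a flat-file source before the “real” XSLT transformation can even begin, with a matching unwrapping step often needed on the output side.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A hypothetical flat-file feed. An XML-only engine cannot touch it until
# every row has first been wrapped in XML -- an extra step that exists only
# because the engine violates point 2 above.
csv_feed = "part,qty\nA1001,14\nB2002,3\n"

root = ET.Element("rows")
for record in csv.DictReader(io.StringIO(csv_feed)):
    row = ET.SubElement(root, "row")
    for field, value in record.items():
        ET.SubElement(row, field).text = value

print(ET.tostring(root, encoding="unicode"))
# <rows><row><part>A1001</part><qty>14</qty></row> ... </rows>
```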

     Just remember that without a transformation engine, you are looking at plenty of overhead and, most likely, the inability to merge, transform, and move data in real time from multiple sources. Not only will run-time performance suffer, but hand coding inevitably means a dramatic reduction in development agility and in the ability to adjust to changing requirements.

     Now let’s take a brief look at data transformation in a data virtualization pattern. The typical picture of data virtualization looks something like this:

[Diagram: multiple data sources, each connected by an arrow to a single virtual data model]
     Each arrow represents the logic to transform the data from how it exists in a particular source to how it needs to be represented in the virtual model. If there is no transformation engine, this logic requires a considerable amount of manual coding, although I concede that a very small subset of the useful manifestations of this pattern may be simple enough to configure the data manipulation without any custom coding or transformation engine. However, if you drop full transformation capability onto each arrow, you have a powerful, agile implementation that can be developed quickly and modified in seconds.
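Here is a toy Python sketch of that picture. The sources, field names, and the EUR-to-USD rate are all made up; the point is that each “arrow” is a transformation applied on the fly at query time, so no data is copied or staged, and replacing a mapping means swapping one small function.

```python
# A toy data-virtualization sketch: each source pairs a fetch function with
# its "arrow" (the per-source transformation into the virtual model).
SOURCES = {
    "crm": (
        lambda: [{"cust": "C-7", "rev_usd": 1200.0}],                       # fetch
        lambda r: {"customer": r["cust"], "revenue_usd": r["rev_usd"]},     # arrow
    ),
    "emea": (
        lambda: [{"customer_id": "C-7", "rev_eur": 400.0}],
        lambda r: {"customer": r["customer_id"],
                   "revenue_usd": r["rev_eur"] * 1.30},  # made-up exchange rate
    ),
}

def virtual_view():
    """Yield rows from every source, applying that source's arrow on the fly."""
    for fetch, arrow in SOURCES.values():
        for row in fetch():
            yield arrow(row)

print(list(virtual_view()))
# [{'customer': 'C-7', 'revenue_usd': 1200.0},
#  {'customer': 'C-7', 'revenue_usd': 520.0}]
```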

     A variation on this pattern brings the ability to write back to the sources: what we call bi-directional data virtualization. But that’s for another day. 
     If you want to see a product with a transformation engine that meets all of these criteria, you may want to check out Stone Bond Technologies’ Enterprise Enabler®.