Friday, August 14, 2015

Some People Just Don't Get Data Virtualization

“But where’s the data?” Terri-the-architect, as she was often called, had definitely been around the block and had plenty of successes under her belt. She grew up in the halcyon days of the data warehouse, proudly touting star schemas and cubes to anyone who would listen. Even as the work turned to drudgery, she kept carrying the mantle, though it grew heavier and heavier. Poor Terri was still buried in the heavy-duty design, extension, and redesign of a massive data warehouse model, and she was ETL-ing all over the place, which is always a messy proposition. She often longed for the days when she worked with the very latest technologies.

Marvin-the-millennial, who was tagging along, nudged Jerry-the-Gen-Xer, trying not to let out a guffaw. Both quickly busied themselves with their cell phones. When they had recovered their composure, Marvin tried an explanation: “It’s a little like a hologram. It looks and acts like it’s there, but in reality it’s only an illusion.”

Terri was frantically scanning the network to see where the database was. “Better not be up there in the cloud! You know that’s sensitive data we’re working with. No, you’re better off loading it into the data warehouse. I can arrange for a team to get that done for you. We’ll even expedite the project, so you could have it in, say, eight weeks.”
“Snicker, snicker.”

“Oh, good timing,” said Terri. “I was having my afternoon chocolate attack.” She stood up and walked all the way around her computer, even looking underneath it. “OK, guys, where did you stash the data?” Clearly she was getting distraught. “Come on, is this some kind of a trick?”

“Hmm... Yes. Maybe magic?” proposed Jerry. Marvin turned his back and madly double-thumbed his cell. His shoulders and head were shaking as he laughed silently.

“OK. Here’s the scoop,” said Marvin. Marvin was a self-proclaimed data scientist, and most people would agree that the title fit his expertise. He walked over to his cube and pulled up the latest analysis he had set up in Spotfire. “Until a couple of weeks ago, we had IT pull data from the data warehouse into a SQL database and add the data from two or three other data sources. They set up something that ran every week to update all the data for me.”

“You’re talking about the ETL scripts that keep the data fresh,” Terri interrupted.  “But now there’s no database, and it looks like the ETL scripts aren’t anywhere either.” 

Marvin continued, “See, this is data from SAP, Salesforce, Oracle, and even live data from the plant. I can make it sing and dance in Spotfire without waiting a week to get new data. It’s always the latest and greatest!”

Jerry added, “Yep, we bought this agile integration software called Enterprise Enabler® that does what’s called Data Virtualization.”

Terri interrupted Jerry before he could say any more. “Oh, so it IS in the cloud. You’ve virtualized the data warehouse into the cloud. Can’t do that. See what happens when I take a vacation? Everything goes cattywampus!”

“Calm down, Terri. Let me finish. The data is NOT in the cloud at all. Enterprise Enabler grabs the data directly from the sources as it is needed. No data warehouse or database needed. It aligns the data, resolves Spotfire’s queries, and returns the results essentially straight to the display. No copies, and no data moving anywhere.”

“Well, my word!” Terri exclaimed. “I’ve never seen anything like this before. Must take a lot of programming to get that to work.”

“That’s another cool thing, Terri,” exclaimed Jerry. “Stone Bond’s Enterprise Enabler is a single platform that you use to configure these ‘virtual models,’ and it stores all of the configuration as metadata. Again, no data is stored, unless, of course, you need to cache part of the data for a bit so as not to bring SAP to its knees. That’s configurable, too, and we did it all in two weeks.”

Terri seemed confused. “No, but where’s the CODE? There HAS to be programming involved!”

Marvin and Jerry, in unison: “Nope.” Both exit stage left.

Terri sits down, exhausted. She knows she hasn’t kept up with the newest technologies, and she really misses the thrill of making successes out of them. She drifts off…

No one really knows what happened to Terri. She just disappeared. No one heard from her again. But there were rumors of sightings late at night of a ghost-like lady with very white hair, madly searching the networks and mumbling something like, “Yoohoo! Code! Wheere aare yoooou? I’ll find you sooner or later.”

She awakes with a start. “Some people just don’t get it. No, Terri-the-architect is not going to disappear like that. Not me! So, where’s the documentation? I need to get started.”

Thursday, July 9, 2015

Agile ETL

It seems that most enterprise architects are now at least aware of the concept and meaning of data virtualization, something our team and I had awaited for many years. It really is a fairly significant mind-shift to eliminate dependence on staging databases and data warehouses in order to make federated data available as needed. Until the term “data virtualization” was coined and analysts began spreading the word, we were a bit stuck, since even they could not understand, or perhaps articulate, what our integration platform, Enterprise Enabler®, actually does and how.

The interesting thing is that we started out applying the underlying concepts of live federation to ETL configurations. As far as I can tell, analysts haven’t grabbed onto this concept yet, but the power of this variation on DV makes it worth contemplating.  For now we’ll call this “Agile ETL,” or, I suppose it could be dubbed “Virtual ETL.” Yes, that sounds better. 

What is ETL? As everyone knows, it means:

Extract data from source. Transform it to the destination format. Load it directly to a destination application or to a staging database or data warehouse. For each source, the process is the same. Then when it’s needed elsewhere, the consuming application, dashboard, or other system queries the staging DB or Data Warehouse. So that’s EXTRACT. TRANSFORM. LOAD… Three distinct steps for each source, generally involving considerable custom programming.
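
To make the three steps concrete, here is a minimal Python sketch of the classic pattern, assuming two hypothetical sources stubbed as in-memory lists and an in-memory SQLite database standing in for the staging store. Each source gets its own extract, transform, and load pass, and the consumer queries the staged copy rather than the sources.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical sources, stubbed as in-memory data. Real sources would be
# applications such as a CRM or ERP reached through their own interfaces.
CRM_ROWS: List[Dict[str, str]] = [
    {"CustID": "C1", "Name": "Acme"},
    {"CustID": "C2", "Name": "Globex"},
]
ERP_ROWS: List[Tuple[str, float]] = [("C1", 1200.0), ("C2", 300.5)]


def extract_crm() -> List[Dict[str, str]]:
    return list(CRM_ROWS)


def extract_erp() -> List[Tuple[str, float]]:
    return list(ERP_ROWS)


def transform_crm(rows):
    # Rename CRM fields into the staging schema (id, name).
    return [(r["CustID"], r["Name"]) for r in rows]


def transform_erp(rows):
    # Normalize balances to two decimals for the staging schema (id, balance).
    return [(cid, round(bal, 2)) for cid, bal in rows]


def load(conn: sqlite3.Connection, table: str, rows) -> None:
    if not rows:
        return
    placeholders = ", ".join("?" * len(rows[0]))
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()


def run_batch_etl(conn: sqlite3.Connection) -> None:
    # One separate Extract -> Transform -> Load pass per source.
    load(conn, "customers", transform_crm(extract_crm()))
    load(conn, "balances", transform_erp(extract_erp()))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT, name TEXT)")
conn.execute("CREATE TABLE balances (id TEXT, balance REAL)")
run_batch_etl(conn)

# The consumer queries the staged copy, never the sources themselves.
report = conn.execute(
    "SELECT c.name, b.balance FROM customers c "
    "JOIN balances b ON c.id = b.id ORDER BY c.name"
).fetchall()
print(report)  # [('Acme', 1200.0), ('Globex', 300.5)]
```

The staged tables are only as fresh as the last `run_batch_etl` call, which is why classic ETL is usually scheduled.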

In our book, “transformation” and “federation” are always done together, so any time the “T” in ETL is performed, federation is also invoked if multiple sources are involved. So Agile, or Virtual, ETL inherently involves one or more sources and is a point-to-point solution only as a special case, i.e., when there’s only one source.
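
A minimal sketch of transformation and federation performed as a single step, assuming sources are plain Python callables that return rows as dictionaries. The `"source.field"` mapping convention is hypothetical, chosen for illustration, and is not Enterprise Enabler's configuration format. With one source, the same code path degenerates to point-to-point.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def federate_and_transform(
    sources: Dict[str, Callable[[], Iterable[Row]]],
    key: str,
    mapping: Dict[str, str],
) -> List[Row]:
    """Read every source, merge rows that share `key`, and rename fields to the
    destination format in the same pass. `mapping` maps "source.field" to a
    destination field name; unmapped fields are dropped."""
    merged: Dict[object, Row] = {}
    for src_name, fetch in sources.items():
        for row in fetch():
            target = merged.setdefault(row[key], {})
            for field, value in row.items():
                dest = mapping.get(f"{src_name}.{field}")
                if dest is not None:
                    target[dest] = value
    return list(merged.values())


def crm() -> List[Row]:
    return [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]


def erp() -> List[Row]:
    return [{"id": 1, "bal": 1200.0}]


mapping = {"crm.id": "CustomerId", "crm.name": "Name", "erp.bal": "Balance"}

# Two sources: federation and transformation happen together.
print(federate_and_transform({"crm": crm, "erp": erp}, "id", mapping))
# [{'CustomerId': 1, 'Name': 'Acme', 'Balance': 1200.0}, {'CustomerId': 2, 'Name': 'Globex'}]

# One source: the point-to-point special case, same code path.
print(federate_and_transform({"crm": crm}, "id", mapping))
# [{'CustomerId': 1, 'Name': 'Acme'}, {'CustomerId': 2, 'Name': 'Globex'}]
```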

The steps in Data Virtualization

First, let’s look at Data Virtualization (DV). What constitutes DV? See the key elements in the diagram above:
  1. Data accessed live directly from sources
  2. ALL SOURCES are included (e.g., electronic devices, any application, data feed, bus, Big Data, data lake, cloud-based, on-premise, or in the field). Oh, AND, of course, all databases and relational or hierarchical formats, which constitute the total domain of other DV software products unless considerable programming is involved.
  3. Data is federated, transformed, and validated live as it comes from the sources. (This is not necessarily available without custom coding in competing products.)
  4. No data is stored en route, except where caching is applied to improve performance or to reduce the impact on source systems.
  5. The target entity or data model is defined up front, but can be easily modified.
  6. Each DV is packaged in one or many consumable and queryable formats.
  7. Each DV may include end-user awareness with full CRUD (Create, Read, Update, and Delete) functionality, honoring security permissions. This “write-back” capability has huge implications for simplifying synchronization and for enabling users of consuming applications to actually take action on data (e.g., updating or correcting it). This is not a built-in capability for most DV platforms.
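
The sketch below models several of these elements in Python: a target entity defined up front (5), live reads from each source on every query (1), federation with a simple completeness check (3), nothing stored except an optional time-limited cache (4), and permission-checked write-back to the source that owns a field (7). The `VirtualModel` class, its methods, and the stub sources are hypothetical illustrations under those assumptions, not any product's API.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]


class VirtualModel:
    """Hypothetical virtual data model: a target entity resolved live from its
    sources on every query. Nothing is persisted except an optional,
    time-limited cache per source."""

    def __init__(self, key: str, fields: List[str],
                 editors: Iterable[str] = (), cache_ttl: float = 0.0):
        self.key = key
        self.fields = fields            # target entity, defined up front
        self.editors = set(editors)     # users allowed to write back
        self.cache_ttl = cache_ttl      # seconds; 0 disables caching
        self._sources: Dict[str, dict] = {}
        self._cache: Dict[str, tuple] = {}

    def add_source(self, name: str, reader: Callable[[], List[Row]],
                   writer: Optional[Callable[[object, Row], None]] = None,
                   owns: Iterable[str] = ()) -> None:
        self._sources[name] = {"reader": reader, "writer": writer,
                               "owns": set(owns)}

    def _read(self, name: str) -> List[Row]:
        hit = self._cache.get(name)
        if hit is not None and time.monotonic() - hit[0] < self.cache_ttl:
            return hit[1]
        rows = self._sources[name]["reader"]()   # live call to the source
        if self.cache_ttl > 0:
            self._cache[name] = (time.monotonic(), rows)
        return rows

    def query(self, where: Optional[Callable[[Row], bool]] = None) -> List[Row]:
        merged: Dict[object, Row] = {}
        for name in self._sources:
            for row in self._read(name):
                k = row[self.key]
                target = merged.setdefault(k, {self.key: k})
                target.update({f: row[f] for f in self.fields if f in row})
        # Validation: return only rows resolved for every target field.
        complete = [r for r in merged.values()
                    if all(f in r for f in self.fields)]
        return [r for r in complete if where is None or where(r)]

    def update(self, user: str, key_value: object, changes: Row) -> None:
        if user not in self.editors:
            raise PermissionError(f"{user} may not write to this model")
        for name, src in self._sources.items():
            subset = {f: v for f, v in changes.items() if f in src["owns"]}
            if subset and src["writer"] is not None:
                src["writer"](key_value, subset)
                self._cache.pop(name, None)  # force a fresh read next time


# Stub sources standing in for, e.g., a CRM and an ERP system.
crm = {1: {"name": "Acme"}, 2: {"name": "Globex"}}
erp = {1: {"balance": 1200.0}, 2: {"balance": 300.5}}


def read_crm() -> List[Row]:
    return [{"id": k, **v} for k, v in crm.items()]


def write_crm(key_value: object, changes: Row) -> None:
    crm[key_value].update(changes)


def read_erp() -> List[Row]:
    return [{"id": k, **v} for k, v in erp.items()]


model = VirtualModel(key="id", fields=["name", "balance"], editors={"terri"})
model.add_source("crm", read_crm, writer=write_crm, owns=["name"])
model.add_source("erp", read_erp)

print(model.query(lambda r: r["balance"] > 1000))
# [{'id': 1, 'name': 'Acme', 'balance': 1200.0}]
model.update("terri", 2, {"name": "Globex Corp"})
print(crm[2]["name"])  # Globex Corp (written back to the source itself)
```

Routing each changed field to the source that `owns` it is one simple way to sketch write-back; a real platform would also have to handle transactions and conflicts, which this example ignores.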

Most people think of DV as being only for Business Intelligence and analytics. You can see that it is also a powerful tool for any on-demand use, such as portals, dashboards, and an Enterprise Service Layer (ESL), and as a basis for Master Data Management (MDM).

Compare the path of Agile ETL

Who says that ETL has to be clunky just because that’s the way it grew up? Who says it must be one-to-one? When we started out more than twelve years ago, our objective was to get data from lots of different sources, combine (federate) them in the process, and deliver the data, validated, however and whenever the destination application, data store, or electronic device required it.

Let’s look at what, in my book, constituted, and continues to define, Agile ETL. Hmmm… Being a strong proponent of reusability, I’ll refer to the list above for brevity and clarity:
  1. Same as (1.) above. Data accessed live.
  2. Same as (2.) above. ALL SOURCES.
  3. See (3.) above. Data is federated, transformed, and validated live.
  4. See (4.) above. No data is stored en route.
  5. ALL DESTINATIONS: the same list as SOURCES in (2.) above.
  6. Each Agile ETL is associated with one or more data workflows that include triggers, additional validations, business rules, multi-threaded logic, etc., essentially a configured composite application. Triggers take many forms, including web service calls, so the waters do get muddy.
  7. Same as (7.) above. End-user awareness for authorization to execute ETLs and/or the associated workflow process.
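
As a rough illustration of items 5 through 7, the following sketch wires a federated read to a chain of workflow steps and several destinations, fired by a named trigger and gated by a per-ETL authorization check, with no staging store in between. `AgileETL`, `on`, `fire`, and the event name `order_posted` are hypothetical names invented for this example; a real trigger might be a web service call, a schedule, or a change event.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]
Sink = Callable[[List[Row]], None]


class AgileETL:
    def __init__(self, name: str, federate: Callable[[], List[Row]],
                 steps: List[Step], destinations: List[Sink],
                 authorized: Set[str]):
        self.name = name
        self.federate = federate          # live, multi-source read (no staging)
        self.steps = steps                # validations, business rules, ...
        self.destinations = destinations  # every target gets the same result
        self.authorized = authorized      # who may execute this ETL

    def run(self, user: str) -> int:
        if user not in self.authorized:
            raise PermissionError(f"{user} may not run {self.name}")
        rows = self.federate()
        for step in self.steps:
            rows = step(rows)
        for deliver in self.destinations:
            deliver(rows)
        return len(rows)


# Trigger registry: event name -> ETLs to run when that event fires.
triggers: Dict[str, List[AgileETL]] = {}


def on(event: str, etl: AgileETL) -> None:
    triggers.setdefault(event, []).append(etl)


def fire(event: str, user: str) -> Dict[str, int]:
    return {etl.name: etl.run(user) for etl in triggers.get(event, [])}


# --- usage with stubbed sources and destinations ---
def federate_orders() -> List[Row]:
    orders = [{"id": 1, "cust": "C1", "amount": 500},
              {"id": 2, "cust": "C9", "amount": 80}]
    customers = {"C1": "Acme"}   # second source, joined on the fly
    return [{**o, "name": customers.get(o["cust"])} for o in orders]


def drop_unknown_customers(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r["name"] is not None]


def flag_large(rows: List[Row]) -> List[Row]:
    return [{**r, "large": r["amount"] >= 250} for r in rows]


billing: List[Row] = []
dashboard: List[Row] = []
etl = AgileETL("orders_to_billing", federate_orders,
               steps=[drop_unknown_customers, flag_large],
               destinations=[billing.extend, dashboard.extend],
               authorized={"jerry"})
on("order_posted", etl)   # e.g. raised by a web service call

result = fire("order_posted", user="jerry")
print(result)   # {'orders_to_billing': 1}
print(billing)  # [{'id': 1, 'cust': 'C1', 'amount': 500, 'name': 'Acme', 'large': True}]
```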

Because we started out with Agile ETL, adding DV was a matter of packaging the integrations as services, along with complex query capabilities.

Since Enterprise Enabler is a single secure platform where every element of configuration is stored as metadata, you can readily see that reusability becomes natural, and that added benefits abound, such as monitoring for change and analyzing its impact, tracing the actual data lineage, and MDM.
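
To show why storing configuration as metadata makes reuse, lineage, and impact analysis almost free, here is a small sketch in which each integration is just a field mapping held as data. The model names and source field paths are hypothetical placeholders, not real SAP, Salesforce, or Oracle object names, and the dictionary format is an assumption for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical metadata store: each configured integration is data, not code.
# Every entry maps a target field to the source field that feeds it.
METADATA: Dict[str, Dict[str, str]] = {
    "CustomerView": {"Name": "Salesforce.Account.Name",
                     "Balance": "SAP.Customer.Balance"},
    "OrdersETL": {"Customer": "Salesforce.Account.Name",
                  "Total": "Oracle.Orders.Amount"},
}

# Reuse: a new consumer starts from an existing configuration.
METADATA["CustomerPortal"] = dict(METADATA["CustomerView"])


def lineage(model: str, field: str) -> str:
    """Trace a target field back to the source field it comes from."""
    return METADATA[model][field]


def impact(source_field: str) -> List[Tuple[str, str]]:
    """List every (model, target field) affected if source_field changes."""
    return sorted((model, target)
                  for model, fields in METADATA.items()
                  for target, source in fields.items()
                  if source == source_field)


print(lineage("CustomerView", "Balance"))   # SAP.Customer.Balance
print(impact("Salesforce.Account.Name"))
# [('CustomerPortal', 'Name'), ('CustomerView', 'Name'), ('OrdersETL', 'Customer')]
```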

Even if you decide to continue using Data Warehouse architecture rather than going the DV route, isn’t it at least time to add agility to your ETL?

Wednesday, March 18, 2015

Agile Integration Software Rescues the Dead Side of Bimodal IT Architecture

I’m sure you’ve heard about the recent high-profile discussions of how to bring much-needed innovation to mature companies that are carrying the comforting ballast of old-fashioned infrastructure. That infrastructure is the greatest impediment to agility and innovation (unless you count the people and culture that go along with it). I first heard about “bimodal” at a local Gartner program a few weeks ago and found the concept both thrilling and disturbing.

The idea of bimodal divorces the reliable and stable back office (Mode 1 “Core”) from all that is innovative (Mode 2 “Innovation”). This means that innovation is explicitly separate, with new, presumably agile, infrastructure to create new lines of business that generate new revenue streams and to provide more contemporary modes of interacting with consumers and employees.

I’m a little concerned that a bimodal declaration offers Mode 1 laggards an easy way out. Their management no longer has to worry about modernizing or even interacting with Mode 2. In approaching the problem this way, we continue to act as enablers for the infrastructure and for its management, who are addicted to their brittle, ancient technologies and methodologies and afraid to even try to wean themselves off them. I suspect that part of the reason we got to this point is their continued abdication of decisions and recommendations to vendors and consultants on the dead side of bimodal. With advancement generally limited to creative marketing and re-messaging of the same 20-year-old technologies and ideas, the name-brand consultants in enterprise IT nominally grab the buzz but deliver it only on the periphery.

I definitely agree that relying on Mode 1 (reliable, stable) IT is highly unlikely to bring significant innovation, and I also believe that the best way to get real innovation underway is to separate it out completely, with different people, skill sets, management, and objectives. But if we imagine how this will play out, there is likely to be a complete bifurcation in which the innovative side is never able to leverage the back-office functions. They will inevitably invent their own (less reliable and less stable) versions of the back office. What happens then? Mode 1 eventually dies on the vine? We regress to pre-1980 basics? Business in general accepts worse performance on the back-end functions?

(You didn't really try clicking that button, did you?)
It’s probably obvious that my take is that BOTH modes should advance aggressively, though I do believe the innovative side should be unencumbered by the Old World. Mode 1 management should take this as a gauntlet and push hard to replace their integration infrastructure with Agile Integration Software, such as Stone Bond’s Enterprise Enabler®, a proven enterprise-ready framework that boosts agility, interacts with both Mode 1 and Mode 2 applications and data, and offers up to a 90% reduction in time to production along with a huge reduction in tech debt. That is what companies need to survive and enjoy a competitive advantage in the coming years.

You Mode 1 people do have a choice. You can continue as is and sit by waiting for your inevitable demise, or you can be the hero that solidly bridges Mode 2 and Mode 1. We are seeing this successfully implemented by forward-looking CIOs. So, find a leader and press GO!

Thursday, February 26, 2015

The Hyper-Converged Integration Platform

Frankly, I find it amazing that it has taken so long for the concept of convergence of integration to become a topic of discussion. In fact, it’s mind-boggling to me that almost all of the manifestations of integration functionality appeared on the scene as islands, with delineation only just now beginning to blur. ETL tools do ETL; EAI tools do transactions; SOA does web services; DV tools do data virtualization, and so on.

Stone Bond’s Enterprise Enabler® came on the scene ten years ago or so, as a platform with metadata shared across all “patterns,” rendering the classic integration patterns essentially moot. If someone stepped into data integration and contemplated it as a general problem to be solved, they might identify these various patterns, but they would also quickly see that the patterns are not mutually exclusive. There is clearly more overlap in the demands across these patterns than an observer of the evolution of data integration tools would suspect.

The providers of integration tools were much too hasty in solving the problem, considering nothing beyond the particular integration style at hand. It’s reminiscent of custom-programmed applications designed for a specific customer. Eventually it dawns on someone that the application solves a problem for a large set of businesses. What happens? This nice, clean solution gets bells, whistles, and tweaks for the second customer and... Voila! It becomes a (usually lousy) “Product” that requires months of customization to implement. Now think about how different the Product, or the integration tools, would have been had the initial design taken into consideration the superset of potential users and uses.

Whether you are physically moving data, packaging integrations as web services, or generating virtual data models for MDM or for querying, there are some key elements that must be at the core of the integration platform.

Let’s go back to the idea of a hyper-converged integration platform. It is only possible if the overall design takes into account the essential functionality shared across patterns, that is, the capabilities that will, or may, be needed in every pattern. Even if you don’t know what the patterns will be, you do know that the platform should always be able to, for example,

  • Read data from and write data to any kind of endpoint, live
  • Federate and align that data across multiple sources, whether they are the same or totally different
  • Align and transform the data to ensure it makes sense to the receiving endpoint, whether physical or virtual
  • Apply business logic
  • Validate and filter data
  • Manage various modes of security
  • Apply workflow, error handling, and notification
  • Package the integration in many different ways
  • Scale up and out
  • Reuse as much configuration as possible

A hyper-converged integration platform has all of these capabilities, and because it is a single platform, every configured object is reusable and available for broader value. For example, an ETL that brings five data sources together and posts to a destination (no staging) can also be reused as a Data Virtualization model for live querying on demand.
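
A minimal sketch of that reuse, assuming sources are simple callables and a federation is just a function that merges them on a shared key: the same federation object is wrapped once as a push-style ETL and once as a live query view. Three stub sources stand in for the five in the example, and the helper names `make_federation`, `as_etl`, and `as_virtual_view` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Source = Callable[[], List[Row]]


def make_federation(sources: List[Source], key: str) -> Source:
    """Define the federation once: merge every source's rows on a shared key."""
    def federate() -> List[Row]:
        merged: Dict[object, Row] = {}
        for fetch in sources:
            for row in fetch():
                merged.setdefault(row[key], {}).update(row)
        return list(merged.values())
    return federate


def as_etl(federate: Source,
           destination: Callable[[List[Row]], None]) -> Callable[[], None]:
    """Reuse it as an ETL: push the federated rows to a destination, no staging."""
    def run() -> None:
        destination(federate())
    return run


def as_virtual_view(federate: Source) -> Callable[..., List[Row]]:
    """Reuse the same federation as a DV model that answers queries live."""
    def query(where: Optional[Callable[[Row], bool]] = None) -> List[Row]:
        return [r for r in federate() if where is None or where(r)]
    return query


def crm() -> List[Row]:
    return [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]


def erp() -> List[Row]:
    return [{"id": 1, "balance": 1200.0}, {"id": 2, "balance": 300.5}]


def plant() -> List[Row]:
    return [{"id": 1, "units_today": 42}]


federation = make_federation([crm, erp, plant], key="id")

posted: List[Row] = []
run_etl = as_etl(federation, posted.extend)   # scheduled or triggered push
query = as_virtual_view(federation)           # on-demand, live

run_etl()
print(len(posted))                            # 2
print(query(lambda r: r["balance"] > 1000))
# [{'id': 1, 'name': 'Acme', 'balance': 1200.0, 'units_today': 42}]
```

Because both wrappers call the same `federation`, a change to its sources or merge logic shows up in the ETL and the virtual view at once, which is the reuse the paragraph describes.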
Whatever mental picture you have of integration toolsets, try thinking instead about an Enterprise Nervous System, with data flowing freely throughout the company exactly how and when it is needed.

Enterprise Enabler is a hyper-converged integration platform, perhaps because its overall design came from years of contemplating the essentials of integration as a whole. It’s easier to start out with a universal, consolidated solution than to back into one from ten different, fully developed tools.

An integrated set of tools is highly unlikely to become a converged integration platform, and it will forgo the powerful agility and elimination of tech debt that Enterprise Enabler can bring.