Thursday, October 20, 2016

Agile Data Virtualization

There is plenty of talk about Data Virtualization, also known as the Logical Data Warehouse. You can read anywhere about the virtues and cost savings of using Data Virtualization ("DV") in many scenarios. I tend to believe that Data Virtualization is one of the most important new trends in data management in at least a decade, driven, finally, by the reality of Big Data and the ensuing demand for efficiency in funneling data to non-technical business analysts, data scientists, and the average business user. Instead of spending months accumulating and correlating data from various sources, it is possible to provide the data in a matter of hours or days.

I also believe that the companies that adopt Data Virtualization promptly will discover that they have naturally developed a competitive advantage over their rivals.

So, this is data virtualization: the ability to define virtual data models from multiple, completely different sources, with validation and logic, and to provide those models as query-able services. Instead of a cumbersome Data Warehouse, the data stays at the sources, with logical federation happening when it is queried. Of course, each Data Virtualization product uses its own techniques and constraints to build and execute its models, and certainly some are more agile than others.
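
To make "query-able services" concrete, here is a minimal sketch of what consuming a virtual data model looks like from the consumer's side, in Python over ODBC. The DSN ("MyDVServer") and the view name ("CustomerOrders360") are hypothetical stand-ins, not any particular product's objects; the point is that a federated virtual model is queried exactly like an ordinary database table.

```python
import pyodbc

# Connect to the data virtualization layer as if it were a database.
# (Assumes an ODBC DSN named "MyDVServer" has been configured.)
conn = pyodbc.connect("DSN=MyDVServer;UID=analyst;PWD=secret")
cursor = conn.cursor()

# "CustomerOrders360" is a virtual model that federates several live
# sources; the joins, validation, and business rules run at query time.
cursor.execute(
    "SELECT customer_id, region, total_orders "
    "FROM CustomerOrders360 WHERE region = ?",
    "EMEA",
)
for row in cursor.fetchall():
    print(row.customer_id, row.region, row.total_orders)

conn.close()
```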

What Makes Some Data Virtualization Products Agile?

Most Logical Data Warehouse products are Plain Vanilla as opposed to Agile. Stone Bond Technologies’ Enterprise Enabler® (“EE”) has had data federation and virtualization at the core of its product for more than twelve years, first with Agile ETL. EE’s transformation engine handles multiple live sources simultaneously, in parallel, applying federation rules, validation rules, and business rules and formulas as the data is processed directly from the sources and delivered to the desired destination in whatever form is required. Those executions are initiated by a trigger (“push”).

This is the same engine and logic that EE uses for Data Virtualization, where an on-demand, query-able virtual model is resolved at query time (“pull”).
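
Here is a simplified, vendor-neutral sketch of that push pattern in Python. The source readers, the validation rule, and the delivery step are hypothetical stand-ins for configured metadata, not EE's actual API; the point is that the sources are read simultaneously and the rules are applied as the data streams through, with no staging step.

```python
from concurrent.futures import ThreadPoolExecutor

def read_crm():
    # Stand-in for a live CRM endpoint read.
    return {"C1": {"name": "Acme"}, "C2": {"name": "Globex"}}

def read_erp():
    # Stand-in for a live ERP endpoint read.
    return {"C1": {"balance": 1200.0}, "C2": {"balance": -50.0}}

def deliver(rows):
    # Stand-in for posting to the destination in its required format.
    print(f"delivered {len(rows)} federated rows")

def on_trigger():
    # Fetch both sources in parallel rather than staging them one by one.
    with ThreadPoolExecutor() as pool:
        crm_future, erp_future = pool.submit(read_crm), pool.submit(read_erp)
        crm, erp = crm_future.result(), erp_future.result()

    # Federation and validation rules applied as the data passes through.
    merged = []
    for cid, account in crm.items():
        balance = erp.get(cid, {}).get("balance")
        if balance is None or balance < 0:    # validation rule
            continue                          # or route to an error queue
        merged.append({"id": cid, "name": account["name"], "balance": balance})

    deliver(merged)

on_trigger()
```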

Because of the roots of Agile ETL that drive Enterprise Enabler’s Data Virtualization, its DV can inherently incorporate all kinds of options: caching, data validation that incorporates lookups from external sources, and even end-user-aware write-back to the sources.

Write-Back to the sources is a powerful feature that promotes agility and expands the scope of what can be expected from a DV solution. This feature is the key to realizing Gartner’s “Business Moments.”
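
As a rough illustration of why end-user-aware write-back matters, here is a sketch in Python of a dashboard pushing an edit back through the virtual layer. The URL, entity, and payload are hypothetical, not EE's actual interface; the essential idea is that the end user's own credentials travel with the update, so source-level security and auditing still apply.

```python
import requests

def write_back(record_id: str, new_status: str, user_token: str) -> None:
    """Push a dashboard edit back to the underlying source(s)."""
    resp = requests.patch(
        f"https://dv.example.com/odata/Orders('{record_id}')",
        json={"Status": new_status},
        # Forward the *end user's* token, not a shared service account,
        # so the source systems can enforce their own permissions.
        headers={"Authorization": f"Bearer {user_token}"},
        timeout=10,
    )
    resp.raise_for_status()  # surface a rejected update to the caller

# Example: write_back("SO-1042", "Approved", user_token=dashboard_user_token)
```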

Some Agile DV Characteristics to Look for

Note that these points do not cover the features that are generally expected in every DV product.

·         Single Platform. Enterprise Enabler is an all-in-one environment for configuring, testing, deploying, and monitoring all virtual models and executions. Design-time validation also means never needing to leave the environment.
·         Metadata Driven. Enterprise Enabler is 100% metadata driven, so there is never a need to drop out and use any other tool.
·         Reusability. All business rules, formulas, and configured objects, such as Virtual Models, are reusable.
·         Repurposing. Within a couple of minutes, a virtual model can be recast as an Agile ETL, which, when triggered, streams the data simultaneously from the sources, applies the same logic and validation rules, and posts the result to the indicated destination in exactly the required format.
·         Robust Transformation Engine. The transformation engine gets data from all sources in their native mode, so there is no staging step involved. It orchestrates across the sources to apply the configured semantic alignment, validation, and formulas.
·         Embedded Code Editors, Test Environment, and Compiler. When the logic becomes very complicated, it is sometimes necessary to pass code snippets to the transformation engine. The code snippets become part of the metadata and are managed within the platform itself.
·         No Restrictions on Data Sources. Literally.
·         Data Workflow. In real-life situations, it is common to need some kind of data workflow, with processes and notifications, for example, to make BI dashboards actionable without leaving the DV platform. If you think that’s not part of DV, maybe you’re right. But it definitely is needed in Agile DV.
·         Auto-Generation and Hosting of Services. SOAP 1.1 and 1.2, REST, OData, SharePoint External List, and ADO.NET, accessible via ODBC, JDBC, MySQL, and others (see the sketch after this list).
·         Full Audit Trails, Versioning, Security.
·         Plenty of Performance Tuning Options. For example, caching can be easily configured.
·         Framework for “Actionable MDM.” Source data sets can be designated as the Source of Record, and the Enterprise Master Services (virtual models) can be designated as the Master Data Definition, which incorporates all of the logic, notations, and security necessary to establish an Enterprise Service Layer.
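
As promised above, here is a small sketch of consuming one of those auto-generated services using nothing but standard OData over HTTP. The host and entity set are hypothetical; the point is that once a service is generated and hosted, any OData-aware client can query it with no custom server code.

```python
import requests

# Query a hosted OData service generated from a virtual model.
resp = requests.get(
    "https://dv.example.com/odata/CustomerOrders360",
    params={
        "$filter": "region eq 'EMEA'",
        "$select": "customer_id,total_orders",
        "$top": "10",
    },
    headers={"Accept": "application/json"},
    timeout=10,
)
resp.raise_for_status()

# OData wraps result rows in a "value" array.
for row in resp.json()["value"]:
    print(row["customer_id"], row["total_orders"])
```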

For years, IT used classic ETL to slowly build expensive, brittle infrastructure. Agile ETL could have delivered the same capability cheaper, faster, and with far more flexibility.

As you move forward toward enterprise use of Data Virtualization, why not start out with Agile DV and avoid the hidden pitfalls of most Data Virtualization platforms? 

Monday, June 20, 2016

The Dirty "Little" Secrets of Legacy Integration


The more I learn about integration products on the market, the more astounded I become. Fortune XXX companies buy politically “safe” products from IBM, Informatica, SAP, and Oracle and launch into tens of millions of dollars’ worth of services to implement them. Egad! They’d be better off with holes in their collective heads!

Remember the children’s story, The Emperor’s New Clothes? Isn’t it time for someone to come out and tell the real story?

Shouldn’t an enterprise-grade integration solution simplify data integration instead of creating Rube Goldberg projects? Does it really make sense to have to send personnel to intense training classes that take months?

Here are nine things that astound me about other enterprise-grade integration products, along with how Enterprise Enabler makes each one easier, ultimately reducing time-to-value by 60% to 80%.

1.     Robust transformation engines are mostly non-existent. This means that anything beyond the simplest relationships and formulas must be hand-coded. That is a huge impediment to fast time-to-value. Enterprise Enabler has a powerful transformation engine that captures mapping, validation, federation, and business rules as metadata instructions through a single GUI, without requiring programming.

2.      Transformation engines cannot interact with the endpoints in their native state. This means there has to be a complete integration to get each source’s data into a compatible format before transformation. Enterprise Enabler's transformation engine receives data from multiple sources in their native formats, combines (federates) them live, and delivers results in any form, or by query.  

3.       Data federation is not applied to ETL, DV, or other modes directly from the sources. Each source is handled individually and posted to a staging area in memory or in a database. Enterprise Enabler brings the data together logically, on the fly, without staging it in memory or anywhere else, and passes it through the transformation engine to the destination or the virtual model (see the sketch after this list). Sometimes, for performance, selected data is cached and refreshed as required.
  
4.       Many, if not most, endpoints are accessed via some standard like ODBC as opposed to their native mode. This means that it is not possible to leverage special features of the native application, and it negates the possibility of being a strong player in IoT. Enterprise Enabler accesses each source in its native format, enabling the execution to leverage features specific to each endpoint. Because of its robust proprietary endpoint connectors, called AppComms, Enterprise Enabler easily integrates legacy ERPs with electronic instrumentation in a single integration step.

5.   Data Virtualization does not support “write-back” to the sources (probably because of #4). Enterprise Enabler supports authorized, end-user-aware CRUD (Create, Read, Update, and Delete) write-back to one or more sources when data is changed or entered from the calling application or dashboard.

6.     Implementing an integration solution is a matter of working with a number of mostly stand-alone, disconnected tools, each of which imposes its own rules for interaction. Enterprise Enabler is a single platform where everything is configured, tested, deployed, and monitored, with design-time validation and embedded C# and VB code editors and compilers for outlier situations. A developer or DBA never needs to leave the Integrated Development Environment.

7.      Various data integration modes (e.g., ETL, EAI, DV, SOA) are delivered by separate tools and do not offer reusability across those modes. With Enterprise Enabler, all modes, including complex integration patterns, are configured within the same tool, leveraging and re-using metadata across modes. This means that an enterprise master virtual model can easily be re-purposed, with all the same logic, as an ETL.

8.     Further, Enterprise Enabler has a data workflow engine that serves as a composite application builder, with full visibility into the active state and the process variables defined throughout the execution.

9.   Finally, Enterprise Enabler's Integration Integrity Manager (IIM) monitors endpoints for schema changes at integration touchpoints. When a change is found, IIM traverses the metadata to determine the impact and notifies the owners of the affected objects.
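
Here is the sketch promised in point 3: a vendor-neutral Python illustration of federating on the fly, streaming one source against another at query time instead of landing either one in a staging table. All of the functions and field names are hypothetical stand-ins for native endpoint reads.

```python
def orders():
    # Streams rows from source A; nothing is written to a staging area.
    yield {"id": 1, "cust": "C1", "amount": 250}
    yield {"id": 2, "cust": "C2", "amount": 400}
    yield {"id": 3, "cust": "C9", "amount": 75}   # no matching customer

def customers():
    # Source B: a small lookup side, held in memory only for the join.
    return {"C1": "Acme", "C2": "Globex"}

def federated_view():
    lookup = customers()
    for order in orders():                 # large side streamed, not staged
        name = lookup.get(order["cust"])
        if name is None:
            continue                       # validation rule: drop orphan rows
        yield {"order_id": order["id"], "customer": name,
               "amount": order["amount"]}

for row in federated_view():               # resolved at query time ("pull")
    print(row)
```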

In short, none of the legacy integration platforms can hold up to the demands for agility that are essential for maintaining competitive advantage.

Friday, April 22, 2016

Top 20 Questions You Should Ask When Selecting a Logical Data Warehouse

If you are evaluating or researching a Logical Data Warehouse (LDW), it is likely that you are looking for a way to eliminate all the overhead and implementation time of your current Data Warehouse or to reduce the proliferation of databases and all the overhead associated with those. You may be looking to use it for one or more scenarios, for example:

  • Support Business Intelligence and Analytics with clean, fresh data consolidated live from multiple sources
  • Standardize usage of data throughout the enterprise with Data as a Service (DaaS)
  • Generate actionable and repeatable Master Data definitions
[Diagram: Logical Data Warehouse vs. Classic Data Warehouse]

The following 20 questions will help you make sure you won’t need to augment your platform with additional tools once you are deep into your project.

Top 20 Questions to Ask When Selecting an LDW Platform


Basic:

  1. Is your LDW agile? Your requirements are constantly changing, and you need to be able to make changes in a matter of seconds or minutes.
  2. Can the LDW connect directly to every source you need? You don’t want to have to invent custom programming to force-feed your LDW. That defeats the purpose.
  3. Are all the consumption methods you need supported? ODBC, JDBC, OData, SharePoint BCS, SOAP, REST, ADO.NET, and others.
  4. Can you configure in-memory and on-disk caching and refresh for selected sources that do not need to be refreshed at each query? This improves performance and alleviates load on the source system. In many situations, you really don’t need all of the source data updated on every query if it doesn’t change much or often (see the caching sketch after these questions). The best platforms will have full ETL capabilities.
Ease of Use and Configurability:

  1. Can you configure, test, deploy, and monitor from a single platform?
  2. Does it have design-time debugging? You don’t want to keep going in and out of the platform to test.
  3. Can you re-use all components, validations, rules, etc.?
  4. Is there numbered versioning on configurations, recording who changed what and when?
  5. Is there auto-packaging and self-hosting of all services?
Enterprise Readiness:

  1. Can you do data validation on source data?
  2. Is there a full transformation engine for complex business rules about combining data?
  3. Is it available on-premise and as iPaaS?
  4. Is there an execution audit trail?
  5. Can a virtual data model be a source to another virtual data model?
  6. Is there write-back to sources with transaction rollback?
  7. Is there end-user authentication for full CRUD (Create, Read, Update, Delete) for “actionable” dashboards and such?
  8. Does it handle streaming data?
  9. Are there multiple points for performance tuning?
  10. Does the platform have individual user logins specifying the permissions for each? For example, maybe a DBA can view and modify a transformation, but not create one.
  11. Is there definitive data lineage available, including validation rules, federation rules, etc.?
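
On question 4 under Basic, here is a minimal sketch of the caching behavior being asked about: serve repeat queries from a cached copy and only re-read the live source once a time-to-live expires, which spares slow-changing sources from being hit on every query. The class and names are hypothetical.

```python
import time

class CachedSource:
    """Wrap a source read with a simple time-to-live cache."""

    def __init__(self, fetch, ttl_seconds=300):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._data = None
        self._loaded_at = 0.0

    def read(self):
        if self._data is None or time.time() - self._loaded_at > self._ttl:
            self._data = self._fetch()       # refresh from the live source
            self._loaded_at = time.time()
        return self._data                    # otherwise serve the cache

# Reference data that changes rarely: refresh at most every 10 minutes.
regions = CachedSource(lambda: [{"code": "EMEA"}, {"code": "APAC"}],
                       ttl_seconds=600)
print(regions.read())   # first call hits the source
print(regions.read())   # second call is served from the cache
```
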
Want to learn more about the Logical Data Warehouse?  Download the whitepaper.

Friday, February 19, 2016

When Business Intelligence isn’t Intelligent Business (and what to do about it)


The great conference hall fell silent when the CFO stepped up to the front of the room.  He hadn’t called the whole IT team together since he had taken that position three years before.  The rumors and speculation of why he was doing this brought out everyone’s fears and excitement, not to mention creativity. There was even a rumor that he was going to leave the company and teach sky diving.  “There’s less risk in sky diving,” he often remarked.

“All of you know I’m the first person to encourage good Business Intelligence projects that can save us money or give us a competitive edge, but our track record with BI projects indicates that the cost simply isn’t justified in most cases. In fact, our track record is abysmal!”

The soon-to-be sky diver turned on the projector. “Here we are. This is the bottom line.”


“Last quarter we approved twelve projects, and only one has been completed… three months later. We can’t pretend any longer, and it’s my job to say this. These are all projects that theoretically could save us hundreds of thousands of dollars, but the data preparation is so complex and time-consuming that by the time we’re ready for the analytics, the business drivers have shifted, which means that the data requirements have changed. I’m seriously tempted to get us out of the BI business.

“I expect everyone in IT to put some thought into this. My door is open, so bring me a solution.”

After the meeting, Marvin the Millennial was already in the CFO’s office waiting for him.

“Hi, I’m Marvin. Fairly new here, but I have the answer. Remember Terri, the Data Warehouse architect, who… um… sort of disappeared a couple of months ago?” The CFO nodded. “Very smart but, I must say, a little odd.”

"Well, her legacy is a couple of huge data warehouses that, as you know, sir, constitute the official repositories for all BI and BA dashboards. If I may be so bold, I believe these are the root cause of the long implementation times.  Since Terri left, I have been working with Enterprise Enabler, which is a data virtualization technology.”

“And what might that be?” interrupted the CFO.

“Well, basically, instead of building ETL scripts and moving all the data into the Data Warehouse, you just grab the data from the original sources, live, with Enterprise Enabler resolving the BI queries on the fly and returning the data exactly the way it would if the data were physically stored in a database. You can avoid that whole classic data prep exercise and all the associated risk. On top of that, you can bring live data instead of data that is stale.”

“This whole story sounds like fiction to me. What if some of the data comes from systems like SAP instead of relational databases? That must require custom coding.”
“No – Out of the box.”
“Online data feeds?”
“Out of the box.”
“IoT, Big Data, flat files…”

“Same, same, same. Everything is configured from a single extensible platform, and the data relationships across these totally different sources are associated as easily as making a relationship across tables in a regular database. And by the way, these Virtual Data models can be used as Master Data definitions.

“You are not going to believe this, but yesterday I made a slide just like yours showing the data prep time with Enterprise Enabler.” Marvin unfolded his dog-eared slide on the table in front of the CFO.



“Come. I’ll show you what I’ve been working on. I’ve configured several virtual data models and set up various analytics dashboards using Tableau, Spotfire, and Power BI. Oh, and there’s another really cool thing! I can write back to the sources or take other actions from the dashboard based on decisions I conclude from playing around with the analytics.”

“Ok, Marvin, how are we going to get everyone trained on Enterprise Enabler?”
“They can start with www.stonebond.com.”

"We are going to scrap any more Data Warehouse projects unless we need to capture the data for historic reasons. This will bring an important competitive advantage (unless our competitors discover Enterprise Enabler, too.)

“Guess sky diving will have to wait. Let’s get this show on the road!”