Friday, June 9, 2017

Agile MDM - 1 Myth & 2 Truths

MDM using Data Virtualization

If there’s anything that can benefit from agility, it is most certainly Master Data Management. Untold MDM projects have flat-out failed over the last 15 years. Why? Largely because any dynamic, growing corporation is constantly changing, and with change comes the demand for Master Data to reflect the current reality immediately. That’s not possible with legacy methodologies and tools.

With the advent of Data Virtualization, “Enterprise Master Services” or “KPIs” are always fresh and accurate, reflecting the most recent information. This approach significantly reduces the number of copies of data, thereby reducing the chance of discrepancies across instances. Data remains in the original sources of record and is accessed on demand for Portals, BI, Reporting, and Integration.

Furthermore, it is not really necessary to define an "everything to everybody" Master definition. Think about it more like an organic approach, growing and changing the models, creating new versions for specific use cases or constituents. The key there is that Enterprise Enabler® (EE) tags every object with notes and keywords as well as the exact lineage, so that a search will find the best fit for the use.

Doesn’t Data Virtualization mean you’re getting a Streaming Data Point?

No, it does not; this is the myth. I often hear the following concern: “If I want to get the KPI, I don’t want just the current live value; I want last month’s value or even some specific range of days.” The answer is that a Data Master is actually a virtual data model defined as a set of metadata that indicates all of the sources, along with all of the security, validation, federation, and transformation logic. When the virtual model is queried, Enterprise Enabler® reaches out to the endpoints, federates them, applies the rest of the logic, resolves the query, and returns the latest results. So the data set returned depends on the query. In other words, a Master Data Service/Model resolves the query, retrieves data live from the sources of record, and delivers the latest data available along with any historical data requested.
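To make that concrete, here is a minimal sketch of what such a range query might look like from the consumer’s side, assuming the virtual model is exposed through a standard interface such as ODBC. The DSN, table, and column names below are hypothetical illustrations, not Enterprise Enabler’s actual object names.

```python
import pyodbc

# Hypothetical DSN and virtual model names -- the federation, validation, and
# security all happen behind this interface when the query is resolved.
conn = pyodbc.connect("DSN=EnterpriseEnablerDV")
cursor = conn.cursor()

# Ask for last month's KPI values rather than just the current live value.
cursor.execute(
    """
    SELECT kpi_name, kpi_value, reading_date
    FROM KPI_Master
    WHERE reading_date BETWEEN ? AND ?
    ORDER BY reading_date
    """,
    ("2017-05-01", "2017-05-31"),
)

for kpi_name, kpi_value, reading_date in cursor.fetchall():
    print(kpi_name, kpi_value, reading_date)

conn.close()
```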

In the case where the model consists of real-time streaming data, of course you are interested in the live values as they are generated. These models still apply business logic, federation, and so on, and you consume the streaming data in some way, perhaps as continuous updates to a dynamic dashboard. However, that’s not what makes MDM Agile.

The Challenge of Change

The more dynamic your business, the more important agility becomes in Master Data Management. Applications change, new data sources come along, and processes and applications move to cloud versions. Companies are acquired, and critical business decisions are made that impact operations and the shape of business processes. All of these changes can mean updates to your Master Data Definitions. The truth is, with legacy MDM methodologies, the definition, programming, and approvals are measured in months, all the while impeding the progress and alignment of new business processes.

What’s the “Agile” part of Enterprise Enabler's MDM?

Agile MDM is a combination of rapidly configuring metadata-based Data Masters, efficiently documenting them, “sanctioning” them, and making them available to authorized users. From there, it is a matter of being able to modify Data Masters in minutes, with versioning, and moving to the corrected or updated service/model. It’s also about storing physical Master data sets only when there is a true need for them.

Ready for the second truth? When you use an Agile Data Virtualization technology such as Stone Bond’s Enterprise Enabler®, along with proper use of its data validation and MDM processes for identifying, configuring, testing, and sanctioning Data Masters, you are applying agile technology and managed agile best practices to ensure a stable, but flexible, MDM operation. Enterprise Enabler offers the full range of MDM activities in a single platform.

The diagram below shows the basic process for Agile MDM that is built into Enterprise Enabler.


Step 1.  A programmer or DBA configures a Data Master as defined by the designated business person.

Step 2. The Data Steward views lineage and authorization, tests the model, augments the notes, and sanctions it as an official Master Data Definition.

Step 3. The approved Data Master is published to the company MDM portal for general usage.

Thursday, April 6, 2017

5 Reasons you should leverage EE BigDataNOW™ for your Big Data


Big Data has been sweeping the BI and Analytics world for a while now. It’s touted as a better approach than Data Warehousing for Business Intelligence (BI) and Business Analytics (BA) projects. It has removed hardware limitations on storage and data processing, and it has broken the barriers of rigid schema and query definitions. All of these advancements have propelled the industry forward.

Literally, you can dump data in any format into Hadoop and start building analytics on the records: any data, whether it’s a file, a table, an object, or any schema at all.



1. EE BigDataNOW™ will organize your Big Data repositories no matter the source

Ok, so everything is good until you realize all your data is sitting in your Hadoop clusters or Data Lakes with no way out; how are you supposed to understand or access your data? Can you even trust the data that is in there? How can you ensure everyone who needs access has a secure way of retrieving the data? How do you know whether the data is easy for the average user to explore and understand? Most importantly, how do you start exposing your Big Data store with APIs that are easy to use and create? These are some of the questions you face when you want to make sense of your Big Data repositories.

Stone Bond’s EE BigDataNOW™ helps you complete the “last mile” of your Big Data journey. It helps you organize your Big Data repositories, whether they live in a Data Lake, in the cloud, or on premises, and makes sense of all the data so your end users can access it. Users can browse the data with ease and expose it through APIs. EE BigDataNOW™ lets you organize the chaos left behind by whoever loaded the data.

2. Everyone is viewing and referencing the same data

For easy access to the data, Stone Bond provides a Data Virtualization Layer for your Big Data repository that organizes the data into logical models and APIs. It gives administrators a mechanism to build logical views with secure access to sensitive data. Now everyone is seeing the same data, not different versions of it. This reduces confusion by providing a clear set of Master Data Models and trusted data sets that are sanctioned as accurate for their needs. It auto-generates APIs for the models on the fly, so users can access the data through SOAP, REST, or OData and build dashboards and run analytics on it. It also provides a clean, queryable SQL interface, so users are not learning new languages or writing many lines of code. It finally brings the sense of calm and certainty that true Agile BI development needs.
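For illustration, here is a minimal sketch of consuming one of those auto-generated services from Python, assuming an OData endpoint; the URL, entity name, and fields are hypothetical, not actual EE BigDataNOW™ objects.

```python
import requests

# Hypothetical auto-generated OData endpoint for a logical model over the lake.
BASE_URL = "https://ee.example.com/odata/SalesByRegion"

# Standard OData query options: the consumer filters, projects, and sorts
# without writing any Hadoop-specific code.
params = {
    "$filter": "Region eq 'EMEA' and Year eq 2017",
    "$select": "Region,Product,Revenue",
    "$orderby": "Revenue desc",
}

response = requests.get(BASE_URL, params=params, timeout=30)
response.raise_for_status()

for row in response.json().get("value", []):
    print(row["Region"], row["Product"], row["Revenue"])
```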

3. It’s swift … did we mention you access & federate your data in real-time?

EE BigDataNOW™ can also be a valuable component on the ingestion side of the Big Data store; it will federate, apply transformations, and organize the data to be loaded into the Data Lake using its unique Agile ETL capabilities, making your overall Big Data experience responsive from end to end. EE BigDataNOW™ has a fully UI-driven data workflow engine that loads data into Hadoop whether its source is streaming or stored data. It can federate real-time data with historical data on demand for better analysis.
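As a point of reference, here is a generic, hand-coded sketch (in PySpark) of the kind of federate-transform-load step described above; the paths, schemas, and join key are assumptions, and EE BigDataNOW™ configures this sort of flow through its UI rather than code.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Generic federate-transform-load sketch; all paths and columns are illustrative.
spark = SparkSession.builder.appName("federate_and_load").getOrCreate()

# Two disparate inputs: newly arrived orders as JSON, and historical customer
# records already sitting in the lake as Parquet.
orders = spark.read.json("hdfs:///landing/orders/2017-04-06/")     # assumed path
customers = spark.read.parquet("hdfs:///lake/customers/")          # assumed path

# Federate (join), apply a simple transformation, and land the result.
enriched = (
    orders.join(customers, on="customer_id", how="left")
          .withColumn("order_total", F.col("quantity") * F.col("unit_price"))
)

enriched.write.mode("append").parquet("hdfs:///lake/enriched_orders/")
spark.stop()
```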

4. Take the load off your developers

One of the major complexities that Big Data developers run into is building and executing Map-Reduce jobs as part of the data workflow. EE BigDataNOW™ can create and execute Map-Reduce jobs through its Agile ETL Data Workflow Nodes, running the jobs and storing the results in a form that end users can easily access.
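To ground what a Map-Reduce job actually involves, below is a minimal hand-written Hadoop Streaming word-count pair in Python; it is a generic example, not EE code, and the point of the Agile ETL Data Workflow Nodes is precisely that this kind of job is generated and executed for you.

```python
#!/usr/bin/env python
# mapper.py -- a minimal Hadoop Streaming mapper: emit (word, 1) per word.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word.lower(), 1))
```

```python
#!/usr/bin/env python
# reducer.py -- sum the counts per word (input arrives sorted by key).
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)

if current_word is not None:
    print("%s\t%d" % (current_word, current_count))
```

Submitted through Hadoop Streaming, the mapper and reducer run across the cluster and the sorted, summed output lands back in HDFS for downstream use.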



5. EE BigDataNOW™ talks to your other non-Hadoop Big Data sources

EE BigDataNOW™ also supports non-Hadoop sources such as Google BigQuery, Amazon Redshift, and SAP HANA. It can connect to these Big Data sources and populate or federate data from them for all your Big Data needs.
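For contrast, here is what querying one of those sources directly with its own client library looks like (Google BigQuery in this case); the project, dataset, and table names are placeholders. Juggling a different client and SQL dialect per source is exactly the friction a single virtualization layer removes.

```python
from google.cloud import bigquery

# Credentials are resolved from the environment; table name is hypothetical.
client = bigquery.Client()

query = """
    SELECT region, SUM(revenue) AS total_revenue
    FROM `my_project.sales.orders`
    GROUP BY region
    ORDER BY total_revenue DESC
"""

for row in client.query(query).result():
    print(row["region"], row["total_revenue"])
```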

To read more about Big Data, don’t forget to check out Stone Bond’s Big Data page. What are you waiting for? Break through your Big Data barriers today!


This is a guest blog post written by,

Monday, February 13, 2017

Did You See the Gartner Market Guide for Data Virtualization?

Gartner’s Market Guide for Data Virtualization (DV), published a few months ago, was really a “coming of age” milestone for this relatively unknown data integration pattern. With the data explosion on all fronts, the traditional tools and patterns such as ETL, EAI, ESB, and the Data Warehouse are mostly obsolete. To download and read the full Gartner Market Guide for Data Virtualization, click here.

Unfortunately, it looks like we’re entering another déjà vu scene, where the next "best way" to handle integration problems is hyped as one more stand-alone category of integration. Remember how we had to decide, before initiating a new project, whether the problem required ETL, ESB, or SOA? Bear in mind that it was never that cut-and-dried; every project needed a little of each, so you just picked one. Then you realized you had to have three different tools and vendors, not to mention plenty of custom coding and timelines counted in years, to get to the desired end, if at all. In my experience, no architecture can rely solely on a single integration pattern. Most DV tools focus exclusively on Data Virtualization. There may be a vendor that offers tools in each category, but those are typically separate tools that don’t share objects and functionality.

Stone Bond Technologies has always considered integration as a continuum. There is a huge body of capabilities that is necessary for every single pattern. You always have to access all manner of disparate data sources; you always have to align them to make sense of them together; you always need to apply business rules and validations. You need to make sure the formats and units of measure are aligned … and on and on. Then you need data workflow, notifications, and events. You need security at every turn. That’s where Enterprise Enabler started – as a technological foundation that handles these requirements without staging the data anywhere, and that virtually eliminates programming. With that, delivering as DV, ETL, EAI, ESB, or SOA is not so difficult. Most integration software, on the other hand, starts with a particular pattern and ends up adding tools or custom coding to figure out "The Hard Part."

It turns out that Data Virtualization demands that multiple disparate data sources be logically aligned in such a way that together they comprise a virtual data model that can be queried directly back to the sources.

I like the diagram that Gartner included in the Guide (to view Gartner's diagram and read the full Market Guide, click here). Below is a similar image depicting Stone Bond’s Enterprise Enabler® (EE) integration platform in particular. Note that the single agile Integrated Development Environment (IDE) covers all integration patterns and is 100% metadata driven. The only time data is stored is when it is cached temporarily for performance or for time-slice persistence.


Enterprise Enabler®


Refer to the above diagram for a few additional things you should know about Enterprise Enabler:
  • As you can see, all arrows depicting data flow are bi-directional in this diagram. EE federates across any disparate sources, and can also write back to those sources with end-user awareness and security.
  • IoT is also included in the source list. Anything that emits a discernible signal can be a source or destination.
  • AppComms™ are Stone Bond’s proprietary connectivity layer. An AppComm knows how to communicate intimately with a particular class of sources (e.g., SAP, Salesforce, DB2, XML, and hundreds of others), including leveraging application-specific features. It also knows how to take instructions from the Transformation Engine as it orchestrates the federation of data live from the sources.
  • The Transformation Engine manages the resolution of relationships across sources as well as the validation and business rules.
  • EE auto-generates and hosts the DV services.
  • Data Virtualizations and associated logic can be re-used as Agile ETL with a couple of clicks. Agile ETL leverages the federation capabilities of DV without staging any data.
  • EE includes a full data workflow engine for use with Agile ETL, or seamlessly inserted as part of the overall DV requirements.
  • EE has a Self-Serve Portal that allows BI users to find and query appropriate virtual data models.
  • EE monitors endpoints for schema changes at touch-points where data is used in any of the DV services or Agile ETL. You’ll be immediately notified with detailed impact analysis (patented Integration Integrity Manager).
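Since the platform is described as 100% metadata driven, it may help to picture a virtual model as a declarative bundle of metadata. The sketch below is purely illustrative pseudo-metadata expressed as a Python dictionary, not Enterprise Enabler's actual format; every name in it is an assumption.

```python
# Illustrative pseudo-metadata for a virtual data model. This is NOT
# Enterprise Enabler's real metadata schema; it only sketches the kind of
# information such a model captures.
customer_360_model = {
    "name": "Customer360",
    "sources": [
        {"appcomm": "Salesforce", "object": "Account"},              # assumed
        {"appcomm": "SAP",        "object": "KNA1"},                  # assumed
        {"appcomm": "SQLServer",  "object": "dbo.SupportTickets"},    # assumed
    ],
    "federation": [
        # How records from the disparate sources line up with one another.
        {"left": "Salesforce.Account.AccountNumber", "right": "SAP.KNA1.KUNNR"},
        {"left": "Salesforce.Account.Id", "right": "SQLServer.dbo.SupportTickets.AccountId"},
    ],
    "validation": [
        {"field": "SAP.KNA1.LAND1", "rule": "not_null"},
    ],
    "security": {"consumers": ["BI_Analysts", "MDM_Stewards"]},
    "exposed_as": ["REST", "OData", "ODBC"],
}
```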

Thursday, January 5, 2017

Even Beyond the Logical Data Warehouse


What is a Logical Data Warehouse? There is still much uncertainty and ambiguity around this subject. Or, perhaps I should say, there should be.

Instead of trying to lock down a definition, let’s take advantage of the opportunity to think about what it CAN be. It is the role, if not the obligation, of Experts to describe the essence of any new discipline. However, in the case of LDW, a premature assessment is likely to sell short the potential reach and extensibility of the contribution of Data Virtualization (DV) and Federation to the entire universe of data and application integration and management.

Certainly, the players with the biggest marketing budgets are likely to spread a limited, but compelling, definition and set of case studies, which could become the de facto discipline of Logical Data Warehouse. While these definitions may represent a significant step forward for data management, they would be limiting the full potential of what these new models could bring to the marketplace.

I fear, however, a repeat of the biggest historical impediment to realizing a universal data management framework. Each new “wave” of innovation has been blindly adopted and touted as the single best approach ever. ETL only went so far; then EAI came along as a separate technology (to save the world), then the Data Warehouse (to store the world), then SOA (to serve the world), and now Data Virtualization and Logical Data Warehouses (to access data faster and with more agility). In the case of Data Virtualization and the Logical Data Warehouse, we owe it to our fellow technology implementers to leverage every aspect possible, to advance the cause of the ultimate data integration and management platform.


If we look at all of the data integration patterns, don’t we see that there is a tremendous amount of functionality that overlaps all of these patterns? Why do we even have these distinctions?

What if we seize this DV/LDW revolution as the opportunity to reinvent how we think about data integration and management altogether? Consider the possibility of a platform where:

LDW is a collection of managed virtual models 

  
  • These can be queried as needed by authorized users.
  • The same logic of each virtual model is reusable for physical data movement
  • Virtual data models incorporate data validation and business logic
  • Staging of data is eliminated except caching for performance
  • Virtual data models federate data live for ETL
  • Virtual data models and accompanying logic can be designated, or “sanctioned” as Master Data definitions
  • Master Data Management eliminates the need for maintaining copies of the data
  • Golden Records are auto-updated, and in many cases, become unnecessary
  • With “write-back” capabilities, data can be updated or corrected either from end-user applications/dashboards or by executing embedded logic
  • Write-back capabilities mean that anytime a source is updated, all of the relevant sources can also be synchronized immediately. (Imagine that eventually, the sync process as we know it today simply disappears.)
  • Complex data workflows allow the use of virtual models and in-process logic to be incorporated into the LDW definitions.
  • These logic workflows handle preventive and predictive analytics as well as application and process logic
  • Data Lineage is easily traced based on traversing the metadata that describes each virtual model. 
  • Every possible source: applications, databases, instruments, IoT, Big Data, live streaming data, all play seamlessly together.
  • Oh, and LDW is pretty cool for preparing data for BI/BA also!


We at Stone Bond Technologies have been leaders in Data Federation and Virtualization for more than ten years. We believe it is our responsibility to remove all obstacles and allow data to flow freely, but securely, wherever and whenever it is needed. Our vision has always been a single, intimately connected, organic platform with pervasive knowledge of all of the data flowing throughout the organization, whether cloud, on-premise, or cross-business; applications, databases, data lakes... any information anywhere.

Being too quick, individually or collectively, to take a stand on the definition of Logical Data Warehouse is likely to abort the thought process that is still ripe with the opportunity to take it way beyond the benefits that are commonly extolled today.


Thursday, October 20, 2016

Agile Data Virtualization

There is plenty of talk about Data Virtualization, also known as the Logical Data Warehouse. You can read anywhere about the virtues and cost savings of using Data Virtualization ("DV") in many scenarios. I tend to believe that Data Virtualization is one of the most important new trends in data management in at least a decade, drawn in, finally, by the great Big Data reality and the ensuing demand for efficiency in funneling data to non-technical business analysts, data scientists, and the average business user. Instead of spending months to accumulate and correlate data from various sources, it is possible to provide the data in a matter of hours or days.

I also believe that the companies that adopt Data Virtualization promptly will discover that they have naturally developed a competitive advantage over their rivals.

So, this is data virtualization:  The ability to define virtual data models from multiple totally different sources with validation and logic, and provide these as query-able services. Instead of a cumbersome Data Warehouse, the data resides at the sources with logical federation happening when it is queried. Of course, each Data Virtualization product uses its own techniques and constraints to build and execute its models, and certainly some are more agile than others.

What Makes Some Data Virtualization Products Agile?

Most Logical Data Warehouse products are Plain Vanilla as opposed to Agile. Stone Bond Technologies’ Enterprise Enabler® (“EE”) has had data federation and virtualization at the core of its product for more than twelve years, first with Agile ETL. EE’s transformation engine handles multiple live simultaneous sources in parallel, applying federation rules, validation rules, and business rules and formulas as the data is processed directly from the sources and delivered to the desired destination in whatever form is required. Those executions are initiated by a trigger (“push”).

EE uses this same engine to handle all of the logic for Data Virtualization, an on-demand, query-able virtual model (“pull”).

Because Agile ETL is at the root of Enterprise Enabler’s Data Virtualization, its DV inherently incorporates all kinds of options for caching, data validation with lookups from external sources, and even end-user-aware write-back to the sources.

Write-Back to the sources is a powerful feature that promotes agility and expands the scope of what can be expected from a DV solution. This feature is the key to realizing Gartner’s “Business Moments.”
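As a rough sketch of what write-back can look like from the consumer's side, the snippet below updates one field of a record through a hypothetical OData endpoint exposed for a virtual model; the URL, entity, field, and credentials are assumptions. In an end-user-aware DV layer, the change would be propagated to the underlying source(s) of record under the calling user's identity.

```python
import requests

# Hypothetical OData endpoint for a virtual model; entity set and field names
# are assumptions for illustration only.
ENTITY_URL = "https://ee.example.com/odata/CustomerMaster('C-1042')"

# An OData PATCH updates selected fields; a DV layer with write-back would
# push this change down to the underlying source(s) of record.
response = requests.patch(
    ENTITY_URL,
    json={"BillingEmail": "ap@acme.example.com"},
    headers={"Content-Type": "application/json"},
    auth=("jdoe", "********"),   # end-user-aware security, per the post
    timeout=30,
)
response.raise_for_status()
print("Update accepted:", response.status_code)
```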

Some Agile DV Characteristics to Look for

Note that these points do not cover the features that are generally expected in every DV product.

·         Single Platform. Enterprise Enabler is an all-in-one environment for configuring, testing, deploying, and monitoring all virtual models and executions. Design-time validation also contributes to never needing to leave the Environment.
·         Metadata Driven. Enterprise Enabler is 100% metadata driven. There is never a need to leave and use any other tool. 
·         Reusability. All business rules, formulas and configured objects such as Virtual Models are reusable.
·         Repurposing.  Within a couple of minutes, a virtual model can be re-cast as an Agile ETL, which, when triggered, streams the data simultaneously from the sources, applies the same logic and validation rules, and posts it to the indicated destination in exactly the required format.
·         Robust Transformation Engine. The transformation engine gets data from all sources in their native mode, so there is no staging step involved. The engine orchestrates across the sources to apply the configured semantic alignment, validation, and formulas.
·         Embedded code editors/test environment/compiler. When the logic becomes very complicated, sometimes it is necessary to pass code snippets to the transformation engine. The code snippets become part of the metadata and are managed within the platform itself.
·         No Restrictions on Data Sources. Literally.
·         Data Workflow. In real-life situations, it is common to need some kind of data workflow, processes, and notifications, for example, to make BI dashboards actionable without leaving the DV platform. If you think that’s not part of DV, maybe you’re right. But it definitely is needed in Agile DV.
·         Auto-generation and hosting of Services. SOAP 1.1 and 1.2, REST, OData, SharePoint External List, ADO.Net. Accessible via ODBC, JDBC, MySQL, and others.
·         Full Audit Trails, Versioning, Security.
·         Plenty of performance tuning available. For example, caching can be easily configured.
·         Framework for “Actionable MDM.” Source data sets can be designated as Source of Record, and the Enterprise Master Services (virtual models) can be designated as the Master Data Definition, which incorporates all of the logic, notations, and security necessary to establish an Enterprise Service Layer.

For years IT used classic ETL to slowly build an expensive, brittle infrastructure. Using Agile ETL could have brought cheaper, faster, and more flexible infrastructure.

As you move forward toward enterprise use of Data Virtualization, why not start out with Agile DV and avoid the hidden pitfalls of most Data Virtualization platforms? 

Monday, June 20, 2016

The Dirty "Little" Secrets of Legacy Integration


The more I learn about integration products on the market, the more astounded I become.  Fortune XXX companies buy politically “safe” products from IBM, Informatica, SAP, and Oracle and launch into tens of millions of dollars’ worth of services to implement them. Egad!   They’d be better off with holes in their collective heads!

Remember the children’s story, The Emperor’s New Clothes? Isn’t it time for someone to come out and tell the real story?

Shouldn’t an enterprise grade integration solution simplify data integration instead of creating Rube Goldberg projects?  Does it really make sense to have to send personnel to intense training classes that take months?

Here are nine things that astound me about other enterprise-grade integration products, along with how Enterprise Enabler makes each one easier, ultimately reducing time-to-value by 60% to 80%.

1.     Robust transformation engines are mostly non-existent. This means that anything beyond the simplest relationships and formulas must be hand coded. That’s a huge impediment to fast time-to-value. Enterprise Enabler has a powerful transformation engine that captures mapping, validation, federation, and business rules as metadata instructions through a single GUI without requiring programming. 

2.      Transformation engines cannot interact with the endpoints in their native state. This means there has to be a complete integration to get each source’s data into a compatible format before transformation. Enterprise Enabler's transformation engine receives data from multiple sources in their native formats, combines (federates) them live, and delivers results in any form, or by query.  

3.       Data federation is not applied to ETL, DV, or other modes directly from the sources. Each source is handled individually and posted to a staging area in memory or a database. Enterprise Enabler brings the data together logically "on-the-fly", without staging it in memory or anywhere else, and passes it through the transformation engine to the destination or the virtual model (a conceptual sketch of this follows the list). Sometimes, for performance, select data is cached and refreshed as required. 
  
4.       Many, if not most, endpoints are accessed via some standard like ODBC as opposed to using their native mode. This means that it is not possible to leverage special features of the native application, and negates the possibility of being a strong player in IoT. Enterprise Enabler accesses each source in its native format, enabling the execution to leverage various features specific to the endpoints at execution. Because of the robust proprietary endpoint connectors, called AppComms, Enterprise Enabler easily incorporates legacy ERPs with electronic instrumentation in a single integration step.

5.   Data Virtualization does not support “Write-Back” to the sources. (probably because of #4) Enterprise Enabler supports authorized end-user aware CRUD (Create, Read, Update, and Delete) write-back to source or sources when data is changed or entered from the calling application or dashboard. 

6.     Implementing an integration solution is a matter of working with a number of mostly stand-alone, disconnected tools, each of which imposes its own rules for interaction. Enterprise Enabler is a single platform where everything is configured, tested, deployed, and monitored, with design-time validation, and embedded C# and VB code editors and compilers for outlier situations. A developer or DBA never needs to leave the Integrated Development Environment. 

7.      Various data integration modes (e.g., ETL, EAI, DV, SOA) are delivered by separate tools and do not offer reusability across those modes. With Enterprise Enabler, all modes, including complex integration patterns are configured within the same tool, leveraging and re-using metadata across modes. This means that an enterprise master virtual model can be easily re-purposed, with all the same logic, as an ETL. 

8.     Further, Enterprise Enabler has a data workflow engine that serves as a composite application builder, with full visibility into the active state and process variables defined throughout the execution. 

9.   Finally, Enterprise Enabler's Integration Integrity Manager monitors endpoints for schema changes at integration touchpoints. When changes are found, IIM traverses the metadata to determine the impact and notifies the owners of those objects.  
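Returning to point 3 above, the sketch below illustrates the idea of federating "on-the-fly": two key-sorted record streams are merged record by record, so nothing is staged in a database or loaded wholesale into memory. It is a conceptual illustration only, not how Enterprise Enabler is implemented, and the sources and keys are made up.

```python
# Conceptual on-the-fly federation: a streaming merge-join over two
# key-sorted sources, with no staging area and no full in-memory copy.

def source_a():
    # Stand-in for rows streamed from, say, an ERP system, sorted by key.
    yield from [(1, {"name": "Acme"}), (2, {"name": "Globex"}), (4, {"name": "Initech"})]

def source_b():
    # Stand-in for rows streamed from a CRM, sorted by the same key.
    yield from [(1, {"region": "EMEA"}), (2, {"region": "APAC"}), (3, {"region": "AMER"})]

def federate(left, right):
    """Merge-join two key-sorted record streams without staging either one."""
    left_iter, right_iter = iter(left), iter(right)
    l, r = next(left_iter, None), next(right_iter, None)
    while l is not None and r is not None:
        if l[0] == r[0]:
            yield {"key": l[0], **l[1], **r[1]}
            l, r = next(left_iter, None), next(right_iter, None)
        elif l[0] < r[0]:
            l = next(left_iter, None)
        else:
            r = next(right_iter, None)

for record in federate(source_a(), source_b()):
    print(record)
```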

In short, none of the legacy integration platforms can hold up to the demands for agility that are essential for maintaining competitive advantage.

Friday, April 22, 2016

Top 20 Questions you should ask when selecting a Logical Data Warehouse

If you are evaluating or researching a Logical Data Warehouse (LDW), it is likely that you are looking for a way to eliminate all the overhead and implementation time of your current Data Warehouse or to reduce the proliferation of databases and all the overhead associated with those. You may be looking to use it for one or more scenarios, for example:

  • Support Business Intelligence and Analytics with clean, fresh data consolidated live from multiple sources
  • Standardize usage of data throughout the enterprise with Data as a Service (DaaS)
  • Generate actionable and repeatable Master Data definitions
Logical Data Warehouse vs. Classic Data Warehouse

The following 20 questions will help you to make sure you won’t need to augment with additional tools once you are into your project.

Top 20 Questions to Ask When Selecting a LDW platform


Basic:

  1. Is your LDW agile? Because your requirements are constantly changing, and you need to be able to make changes in a matter of seconds or minutes.
  2. Can the LDW connect directly to every source you need? You don’t want to have to invent custom programming to force-feed your LDW. That defeats the purpose.
  3. Are all the consumption methods you need supported? ODBC, JDBC, OData, SharePoint BCS, SOAP, REST, ADO.Net, and others.
  4. Can you configure in-memory and on-disk caching and refresh for selected sources that do not need to be refreshed at each query? This improves performance and alleviates load on the source system. In many situations, you really don’t need all of the source data updated on every query if it doesn’t change much or often. The best platforms will have full ETL capabilities.
Ease of Use and Configurability:

  1. Can you configure, test, deploy, and monitor from a single platform?
  2. Does it have design time debugging? You don’t want to keep going in and out of the platform to test.
  3. Can you re-use all components, validations, rules, etc.?
  4. Is there numbered versioning on configurations, including the who/what and time stamp?
  5. Is there auto packaging and self-hosting of all services?
Enterprise Readiness:

  1. Can you do data validation on source data?
  2. Is there a full transformation engine for complex business rules about combining data?
  3. Is it available on-premise and as iPaaS?
  4. Is there an Execution audit trail?
  5. Can a virtual data model be a source to another virtual data model?
  6. Is there write-back to sources with transaction rollback?
  7. Is there end user authentication for full CRUD (Create, Read, Update, Delete) for “actionable” dashboards and such?
  8. Does it handle streaming data?
  9. Are there multiple points for performance tuning?
  10. Does the Platform have individual user logins specifying the permissions for each? For example, maybe a DBA can view and modify a transformation, but not create one.
  11. Is there definitive data lineage available including validation rules, federation rules, etc.?
Want to learn more about the Logical Data Warehouse?  Download the whitepaper.