Monday, November 21, 2011

Data Quality and your Enabled Enterprise

Enterprise Enabler (http://www.enterpriseenabler.com/), as a good example of Agile Integration Software, serves as a representative basis for discussing data quality features and capabilities. In the context of data integration, I tend to think of data cleansing and profiling in two separate categories: "batch" and "in transit," or "real time."

Batch - Often this is performed as a first step of an integration implementation, to ensure that any existing data being used is as correct as possible. The context of correctness is generally defined by the source in which the data resides. When the source is an existing data warehouse, correctness is usually considered with respect to a pre-defined master data definition.

In-Transit or Real-Time - Once the integration is in place, new data is being generated and flows through the organization and systems via the agile integration framework. This data must be validated as soon as it appears in play, as well as when it is passed to its destination, since the definition of "correctness" is ultimately determined by the target use.

With Agile Integration, the philosophy is to focus on the data required for the purpose of the project at hand. While cleansing/validating an entire database or data warehouse full of data may be important, the chances are that it is not important for any particular integration project.  Addressing the subset needed means a more efficient project and faster time-to-value.

Pre-validating existing data

Using the inherent capabilities of Enterprise Enabler to discover data schemas and objects, one can simply "point" the appropriate AppComm (application communicator) to a database or application that is to become a source to the integration, and the schema or services available are presented. Select the tables, fields, objects, etc. of interest, and grab a sample or the full set of data. In a configured process, the data can be cleaned, validated and standardized using pre-built rules, external tools, or special logic for each unit of data, by field, by record, or by other cross-section.  Rules for logging, notifications, and mediation are configured as part of the process. With this approach, you are focused specifically on the data that will be used for the subsequent integration, and a staging database is not required. Once this process is configured, it can be triggered to automatically run as desired to ensure ongoing monitoring and validation of new data. The results can be fed to a BI tool or spreadsheet for statistical analysis on the data quality ("profiling").
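
To make the idea of configurable, per-field rules concrete, here is a minimal Python sketch (not Enterprise Enabler itself; the field names, rules, and sample records are hypothetical) that validates a batch of records and collects the failures so they can be fed to a profiling report:

```python
import re

# Hypothetical per-field validation rules; in practice these would be configured
# against the schema discovered from the source, not hand-coded.
RULES = {
    "customer_id": lambda v: v is not None and str(v).strip() != "",
    "email":       lambda v: v is None or re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", v) is not None,
    "country":     lambda v: v in {"US", "CA", "MX", "DE", "FR"},
    "credit_limit": lambda v: v is None or 0 <= float(v) <= 1_000_000,
}

def validate(records):
    """Return (clean_records, issues) for a sample batch pulled from a source."""
    clean, issues = [], []
    for i, rec in enumerate(records):
        failed = [f for f, rule in RULES.items() if f in rec and not rule(rec[f])]
        if failed:
            issues.append({"row": i, "fields": failed, "record": rec})
        else:
            clean.append(rec)
    return clean, issues

sample = [
    {"customer_id": "C001", "email": "a@example.com", "country": "US", "credit_limit": 5000},
    {"customer_id": "",     "email": "not-an-email",  "country": "XX", "credit_limit": -10},
]
clean, issues = validate(sample)
print(f"{len(clean)} clean, {len(issues)} flagged")  # the issue list can feed a BI tool for profiling
```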

With the AppComm approach, combined with the ability to easily create virtual relationships across disparate sources, cross-validation ("matching") across systems, or merging data to enrich it, becomes a reasonable exercise without having to design and build a consolidated staging database. Of course, if the situation still requires a staging database, there's no more efficient way to populate it than Enterprise Enabler.

After you have completed this step, the chances are that the new data that will be captured from here forward needs to be cleansed, too. This can be done "real-time" as it is being acquired from the source and passed to the federation and transformation steps of an integration.

Validating data on-the-fly

As is the nature of Agile Integration, Enterprise Enabler offers multiple places where data cleansing, validation, and remediation can be managed within the flow of data through an integration. Some amount of detection of erroneous data is done as a natural part of the data acquisition by the intelligent AppComm technology. Driven by metadata definitions, AppComms check not only for valid data (type, format, etc.), but also for the expected schema. Additionally:

• Validation/cleansing rules, pre-built processes, or 3rd-party tools can be dropped in or invoked for detection and mediation at various points in execution:
  • As soon as the data has been acquired
  • As it is being transformed and merged with other sources
  • After it has been transformed
  • By the destination's AppComm before/as the data is being posted (plus transaction rollback and assurance in the case of multiple destinations)
  • Anywhere in the data workflow process surrounding the transformations
• The Enterprise Master System ensures that the data comes from the correct source when an end user invokes a particular piece of information.
• Since Enterprise Enabler's user interface ("Designer Studio") is tied directly to a copy of the run-time engines, you can do a trial run from the studio as you design an integration and see a sample of the data for inspection, to get an idea of the quality of data you are dealing with.

Still don’t trust your data?

Sometimes there are situations where validation rules just won't cut it. Example: setting hard minimum and maximum values for something coming from a physical processing plant. You may be able to determine a reasonable range, but only with the knowledge of what happened yesterday will you be able to determine that a "way out of whack" set of numbers is actually due to a disruption at some part of the plant yesterday. Enterprise Enabler has a preview/analysis feature that holds the result data (post transformation and process) in a virtual store just before it is posted to the destination, to be released and posted only after review and approval by an authorized human being. That person can do quick tests on ranges, averages, etc. as a gut-feel reality check and then fix the data if necessary before releasing the set.
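
As a rough illustration of the kind of gut check an approver might automate while the data is held, here is a small Python sketch; the field name, tolerance, and yesterday's statistic are all made up:

```python
from statistics import mean

def sanity_check(held_rows, yesterday_mean, field="flow_rate", tolerance=0.30):
    """Flag the held data set if its average drifts more than `tolerance`
    from yesterday's average -- a crude stand-in for a human gut check."""
    values = [row[field] for row in held_rows if row.get(field) is not None]
    if not values:
        return False, "no values to check"
    today_mean = mean(values)
    drift = abs(today_mean - yesterday_mean) / max(abs(yesterday_mean), 1e-9)
    ok = drift <= tolerance
    return ok, f"today={today_mean:.2f}, yesterday={yesterday_mean:.2f}, drift={drift:.0%}"

held = [{"flow_rate": 101.2}, {"flow_rate": 98.7}, {"flow_rate": 430.0}]  # one suspicious spike
ok, summary = sanity_check(held, yesterday_mean=100.0)
print("release" if ok else "hold for review", "-", summary)
```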

And for those of you who care about data governance

Only an AIS offers a single end-to-end integration solution. This means that security can be maintained throughout the integration infrastructure. Developers and data analysts log in with the permissions of their role and group, and anything they build or change is logged with a who-what-when stamp. Every object in Enterprise Enabler is locked down in this way, preventing intentional or accidental diversion or modification of data and its flow through the enterprise.

And what about bad data in your ERP?

My apologies, but I just can't help saying to the ERP vendors, "shame on you" for not taking the responsibility to ensure that the data captured and generated by your systems is completely correct. How could you let that happen? People trusted you! OK, OK, I'll stop short of calling for an "Occupy ERP" movement.

Altogether…

With all of the various angles on Data Quality, it’s clear that Agile Integration inherently brings a range of capabilities that are simply not possible with other DQ products. Whether you are looking to correct existing data or ensure the quality of new data as it is created, the fact that the data quality is handled as a natural aspect of integration means a more efficient overall solution.


Thursday, November 17, 2011

Big Data Quality


Big Data means big data quality issues, right?  Well, of course, right.  Big Data means more data that can be bad or go bad one way or another.  Big, bad data could have big, bad consequences. But just think about some of the ways Big Data may be in better shape than other data.

Big Data
  • is usually captured automatically, without manual intervention
  • often has been gathered over many years, so the framework for capture and validation at the source has improved and been "debugged" over time. Various standards may also play a role in the data capture and ultimate quality. Examples might be weather-related data and GIS data.
  • is often used in ways where analytics and conclusions improve with data volume and errors in individual data points become less important.  Data quality is essential for Business Intelligence (BI), but from some perspectives, and for some aspects of data quality, DQ may move into the background.
Big Data from Social Media has some additional considerations.
  • Capture mechanisms are well known. Facebook, emails, Twitter, etc.
  • We know that the quality of information from these is highly questionable - that's the nature, and the beauty of the beast.
  • We also know that they are well structured. For example, an email has a very easily determined structure: there is the header, the body, attachments, etc. The content of the unstructured data (body, attachments) can be searched for relevant information and key words. Bad data might be a corrupted attachment or garbled text in the body, but other than that, errors are, almost by definition, not really bad data. (A minimal parsing sketch follows this list.)
  • What do you/we want from Social Media's Big Data? Mostly the trends of the masses. If you clean it up, that very exercise could corrupt the data.
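
To illustrate how readily that structure can be picked apart, here is a small Python sketch using the standard library's email parser to separate header fields, body text, and attachments (the message itself is made up):

```python
from email import message_from_string

raw = """\
From: analyst@example.com
To: team@example.com
Subject: Q3 sentiment summary
Content-Type: text/plain; charset="utf-8"

Mentions of the new product were up 40% week over week.
"""

msg = message_from_string(raw)

# Header: trivially structured key/value pairs
print(msg["From"], "->", msg["To"], "|", msg["Subject"])

# Body and attachments: walk the MIME parts
for part in msg.walk():
    if part.get_content_maintype() == "multipart":
        continue
    if part.get_filename():                      # an attachment
        print("attachment:", part.get_filename())
    else:                                        # body text, searchable for keywords
        body = part.get_payload(decode=True).decode("utf-8", errors="replace")
        print("body contains 'product':", "product" in body.lower())
```
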
Senile data forgets its source and loses relevance and accuracy

There is an altogether different situation with many of the nouveau trendy Corporate Big Data projects.  In this case, big data is likely to be consolidated data coming from a number of sources, including those suffering from data senility. Senile data has been through the wringer: moved from residence to residence, "cleansed," and perhaps never again saw the light of day. A data warehouse usually is populated with data from a huge number of sources, and fallible humans have pored through it, run human-defined cleansing and validation algorithms, and then subjected it to manually-programmed integration code.  It is incumbent upon the mining and analysis functions to accommodate assumptions about data quality.

So, as you can see, data quality and cleansing become an altogether different problem for Big Data.


Monday, October 31, 2011

Mainframe nearly to the cloud…

Most people know that Salesforce.com is one of the first and certainly most successful SaaS (Software-as-a-Service) applications on the market.  One good thing is that Salesforce stores all the data in the cloud and manages it, eliminating the need for customers to maintain the skills, hardware, software, and upkeep to keep it on-premise.  That good thing is also the biggest downside of SaaS: the concern that the data is stored in the cloud.  Unfortunately, companies worry about having their data stored off-premise with very little control over its management, security, and perhaps even accessibility.
 
Nevertheless, Salesforce.com has a huge customer base and offers business functionality important to every business I can think of. While business sectors like financial institutions and healthcare could easily make valuable use of the functionality of Salesforce and other cloud apps, the risk and regulatory restrictions make storing their data in the cloud impossible.
 
These institutions simply cannot make copies of their data or move it to the cloud. The data that is inherent to the functionality of Salesforce may not necessarily be the concern, but often it must be presented to users side-by-side with ancillary data that must come from the company's backend, on-premise systems.

But all is not lost!  Agile Integration Software (AIS) naturally solves this problem by creating federated views from multiple sources and making them available to any application, complete with end user access authentication. Here is the crux of the solution with Salesforce as the example:

1. Salesforce.com offers the capability of modifying its screens, so anyone who is conversant in doing that can modify a screen to populate data from an external source. One option would be to configure it to call a web service when the screen is presented or refreshed.

2. Within a few minutes, Agile Integration Software such as Stone Bond's Enterprise Enabler Virtuoso can be configured, generating metadata that virtualizes and aligns backend data with Salesforce data and packages it as a web service compliant with Salesforce. Optionally, this can be a bi-directional (read/write) connection.

3. When an end user brings up the Salesforce page, Salesforce calls the web service, and Enterprise Enabler Virtuoso accesses the on-premise data live, aligns it with the relevant Salesforce data, and sends it to the Salesforce screen. With the bi-directional option, data entered or corrected on the screen automatically updates not only the Salesforce data but also the on-premise data, assuming the end user has proper permissions to write back to those systems.
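
To make the serving side of step 1's web-service call a bit more concrete, here is a minimal Python/Flask sketch. It is purely illustrative: Enterprise Enabler generates its services from metadata rather than hand-written code, and the endpoint, table, and field names below are invented. It returns on-premise detail keyed by the Salesforce record ID the page passes in:

```python
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/onprem/account-detail")
def account_detail():
    """Return on-premise data for the Salesforce record ID the page passes in."""
    sf_id = request.args.get("sfid")
    if not sf_id:
        return jsonify({"error": "missing sfid"}), 400

    # Hypothetical on-premise store; in practice this would be the backend system of record.
    conn = sqlite3.connect("onprem.db")
    conn.row_factory = sqlite3.Row
    row = conn.execute(
        "SELECT credit_limit, risk_rating, last_audit FROM accounts WHERE sf_id = ?",
        (sf_id,),
    ).fetchone()
    conn.close()

    if row is None:
        return jsonify({"error": "no on-premise record"}), 404
    # The Salesforce page merges this with its own fields when it renders.
    return jsonify(dict(row))

if __name__ == "__main__":
    app.run(port=8080)
```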

Companies have spent millions of dollars over the last few years trying to do this, and with the Agile Integration Software as the basis, Enterprise Enabler Virtuoso was configured in three weeks to incorporate this Salesforce connectivity. Now it is available off-the-shelf so that anyone can implement it in a few minutes or at most a day. 

The diagrams below depict the data residence and flow where on-premise data is required in a Salesforce.com implementation. The first is the common solution, where a copy of the on-premise data is made and resides on the Salesforce cloud. I don't need to tell you the overhead and pervasive concern with doing this.  The second shows the on-premise data remaining on-premise, where it belongs, with AIS accessing, federating, and delivering a data view virtually to the Salesforce page.
 

http://tiny.cc/id5h0

Tuesday, October 11, 2011

MDM - Making It Actionable/Transactional as You Define It

How useful is your MDM… really? Does it just sit there in a repository, waiting for your MDM team to update it?

One of the common criticisms of MDM projects is the magnitude of the project and the low ROI. More than likely, you are in the middle of a project with great expectations of value.




Metadata and MDM

When most people think of metadata, the scope is limited. It's a schema that defines a virtual data set, for example. It may include a cross-reference in a lookup table. And maybe it includes definitions of what each element means and what unit of measure it is in.

Then what?

Then you have to add references to where the data ought to come from.

But then what?

You've spent quite a lot of resources defining this. Are you any better off than with the ancient "Corporate Dictionary?" How do you actually use it?

The most common ways to implement Master Data definitions are indicative of Big Projects:

1)    Define a data warehouse to store the data in, so that it is accessible in the form defined in the Data Master. Once the data warehouse is designed, corresponding integration must be built to populate it from the appropriate sources, aggregating and transforming as needed, as often as necessary for minimal latency.
2)    Write web services to access the data from the sources and make them available as Master Data sets.

When I talk about metadata, I think in terms of representing not only the data schemas but also the metadata that describes where the data is, what part of it is relevant, how it aligns with other data of interest, how you or the real or virtual destination (master) needs to see it, and how it must be converted, or mapped, to be meaningful to the destination.

Then there are the events that trigger data flows, and all the surrounding logic, notifications, security, and a host of other things. If you can capture all of this information as metadata, in reusable, separable "layers," you will have a highly flexible and "actionable" collection of metadata.

If you define a metadata Master, say, "Customer," for use corporate-wide, you will have several different sources in play to ensure that the various parts of the virtual "Customer" definition have the best information from the most appropriate sources. Part may come from your ERP, part from Salesforce.com, and another part from an Oracle database.

• Does your Master definition encapsulate everything you need to use the data?
• Can your metadata be pumped onto a message bus?
• Can it be packaged as a web service?
• As an ADO.net object?
• As a SharePoint external content type?
• Does it incorporate the capabilities to perform CRUD (Create, Read, Update, and Delete) operations at the endpoints?
• If one of the source schemas changes, do you have to do anything to accommodate it?
• Do you even need to know a source changed?
If I'm a programmer, I want to leverage the corporate Master Data for my programs and the users of my programs. I can look up the data definitions, sources, etc., and use them, but that still requires a lot of work. When the Master Data includes a full set of metadata, all I have to do is invoke the web service, the External Content Type in SharePoint, the ADO.net object, and so on. I simply select the Master I need and indicate how I want to use it. I don't need to know what the various sources even are, and if a source changes, I won't need to make any changes, since the metadata will reflect what it needs to. And I can pass that selection process on to the end user of my application or dashboard.
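
From the programmer's side, consuming such a Master can be as simple as calling the generated service. A hypothetical Python sketch, with the URL, parameters, and response fields invented purely for illustration:

```python
import requests

# Hypothetical endpoint generated from the "Customer" master definition.
# Which backend systems actually supply the fields is hidden behind the metadata.
MASTER_URL = "https://integration.example.com/masters/customer"

def get_customer(customer_id: str) -> dict:
    resp = requests.get(MASTER_URL, params={"id": customer_id}, timeout=10)
    resp.raise_for_status()
    return resp.json()   # already federated: ERP, Salesforce, and Oracle fields in one record

customer = get_customer("C-1001")
print(customer.get("name"), customer.get("credit_limit"), customer.get("open_opportunities"))
```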

The diagram above shows the scope of metadata captured for MDM by Agile Integration Software. The metadata is generated from a GUI and has an atomic structure so that a change to any metadata can be made without impacting the whole hierarchy of metadata. Using this type of metadata infrastructure, changes are absorbed without creating waves. Data is accessed directly from the original source, eliminating the need for a costly data warehouse to resolve virtual relationships across sources.


Monday, September 26, 2011

Atomic Architectures for Flexibility and Best Time-to-Value

Big Data definitely doesn't scare me as much as Big Projects. The good thing is that Agile Integration and cloud solutions, along with the pervasive viral nature of Social Media are fueling a shift away from Big Projects and toward incremental atomic approaches with highly reduced time to value.


Historically, Big Projects have been the only way to solve IT problems for Big companies. I've watched "generations" of IT management fall for the "next, next Big Project" promoted by Big hardware companies, Big systems integrators, and Big-time analysts. After all, who are you going to trust to set the direction for your Big company? The Big waves are always very well sold, and for the newbies, there is an air of doing something really new and really Big. Of course there's also Big money involved, enough to keep the economy healthy, maybe. As soon as one Big wave of Big Projects is several years in progress, the next Big begins to emerge and puts the last one out of business before most are completed. Many stall, are pared way down to the only working prototype, or are abandoned altogether, to be replaced with a fresh new Big approach.

Big Projects started long ago, but in the last twenty or so years they have included:

• Defining a single corporate database - planning for all the applications to share that same db
• Corporate Dictionary - standardizing the data names and documenting the source
• ERP - a single comprehensive application means that you don't have to rewrite all the apps to use that db
• EAI - to address the reality that the above two Big Projects can't be realized
• Business Process Re-engineering (BPR) - shifting focus from data to processes; Big Consulting Projects with no need to know much of anything about technology
• Change Management - because radical BPR created lots of employee issues and confusion
• Data Warehouse (DW) - in spite of the intentions, smaller projects and best-of-breed applications were more successful than Big Projects, and businesses came to rely heavily on those systems. Data Warehouses were supposed to bring all the data together for reporting.
• Business Intelligence (BI) - analyze the data in the DW.
• MDM - the modern Big Project for a corporate dictionary.

There were others, of course, but you get the idea. Finally, wedges are opening crevices in the Big Project, making room for solutions that are more atomic and less global. Some of the wedges are being driven by:

○ SOA
○ SaaS
○ Agile Integration Software (AIS)
○ Social Media
○ The economy and the imperative for improved time-to-value on projects

These factors open the floodgates for a bifurcation of approaches to enterprise technologies. As Yogi Berra said, "When you come to a fork in the road, take it." Technologies that were traditionally Big, like BI and ERP, are now offered as cloud-based services and for single users, without the overhead of Big.

These wedges are all eroding the cornerstone of Monolithic solutions. For example, SOA is inherently atomic, with an enterprise solution being a collection of SOAP objects. While the initial SOA initiatives were envisioned as enterprise-wide, in the end even the prototype projects were Big, long, and difficult. If the technology were not built on reusable components, the ongoing work that continues to be done would likely have been abandoned. Similarly, while data warehouses continue to be expanded for Business Intelligence, we are seeing a huge number of BI tools coming on the market for specific use or end user-centric implementation. Cloud computing is also whittling away at Big Projects, with significant cost and time reductions as well as shorter time to value.

One of the interesting things is that this split is creating an environment where emerging technology waves now may have two completely different interpretations, one the old Big approach and the other a more agile and atomic approach.

Take data federation and virtualization, for example. The Big approach is to define a complete (or at least really Big) virtual enterprise data model for federation that acts like a staging database would, and then to implement the integration across and through the virtual staging model. Of course, at some point, it's necessary to define those integrations based on what the end result datasets or use happen to be for the consumers.

The new fork in the road (which I would take) requires no data model, virtual or not. Agile Integration Software addresses federation and virtualization in an atomic manner, with the end use as the initial driving force. Entities that describe, for example, "customer" are defined, the source of record for each piece of the Customer data is identified, and metadata is auto-generated and packaged to grab the data from the sources, federate it "on the fly," and deliver it to the calling program, end user, or data workflow on demand or in an event-driven manner. An atomic approach to MDM naturally follows.

Hooray for the fork in the road!

Tuesday, August 23, 2011

Query Optimization across Apples and Oranges


I just recently realized that the problem of federated query optimization that my colleagues and I think about is a completely different problem from the one that has been so well addressed by academics and big database vendors. Even the more contemporary players in the federation and virtualization world don’t extend this concept across disparate sources, and they focus only on run-time speed, but not agility.

 
Those approaches simply do not address the reality that is brought to the forefront now that we have integration solutions that federate everything from web services, spreadsheets, medical instruments, social media, and many other sources, including relational databases in a single "query." The fundamental value of Agile Integration Software (AIS) is violated by the inherent constraints posed by the query optimization tools on the market.


  • What good to us is a query optimizer that assumes all of the data sources are relational databases?
  • And adding XML to the mix just doesn't "cut the mustard!"
  • What if, in order to use these tools, I have to construct a universal data model that includes all of the data that could possibly be in play? (The clunky antithesis of agility!)
  • Do I have to anticipate every data query I might want to optimize?
  • What if there is a lot of transformation that needs to be performed along the way to make the data meaningful across the sources?


For "pull" integration, where a user's browser interaction or a calling program triggers and specifies the data to be accessed, a SQL query is a universally comfortable way to access information. For a live query in virtual federation, that needs to be interpreted by the federating software into whatever the endpoints understand. The data flowing in from multiple connections needs to be synchronized as the query is being fulfilled from the disparate systems. A "push" integration typically is usually better known, with at least the sources pinned down ahead of time, and often with the exact data being sent each time.

 
In our world, performance is a different problem from typical query optimization on or across relational databases. In complex cross-application joins, the critical path is often more related to the I/O speed of one of the applications, or the frequency at which data is dispensed, or some other macro factor. The join and access-order logic, for example, can be tuned to accommodate the highest resource consumer.
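
To make the apples-and-oranges point concrete, here is a toy Python sketch of a cross-source join: one side comes from a relational query, the other from a (stubbed) slow web-service call, and the critical path is the slow source, not the SQL optimizer. All names and data are invented:

```python
import sqlite3
import time

def fetch_orders_from_db():
    """Relational side of the join (in-memory SQLite stands in for the warehouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [("C1", 120.0), ("C2", 75.5), ("C1", 30.0)])
    return conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id").fetchall()

def fetch_status_from_service():
    """Non-relational side: pretend this is a slow web service or instrument feed."""
    time.sleep(0.5)                      # the real critical path is usually here
    return {"C1": "gold", "C2": "silver"}

start = time.time()
orders = fetch_orders_from_db()
status = fetch_status_from_service()     # access order / parallelism is what you tune
joined = [(cust, total, status.get(cust, "unknown")) for cust, total in orders]
print(joined, f"({time.time() - start:.2f}s, dominated by the service call)")
```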

 
So you can see that our problem is not the same one. When people ask us about query optimization, we are sometimes talking apples and oranges!

Friday, August 5, 2011

The Illusion of Pre-Built Adapters


Why do people continue to fall for the idea of "pre-built" adapters? I guess that's pretty obvious. Anything you really want to believe in, you can. Unfortunately, it doesn’t follow that believing in something makes it so.


Dick: Ok, guys, have you figured out how we're going to get this Salesforce/SAP integration done in time for me to meet the VP's deadline?

Harry: I've been online all week studying the possibilities. I saw Adapters from three companies that look really good.

Dick: Come on, we've been down this road before.

Harry: Right, but things have changed! The latest Adapters work immediately off the shelf! Let me show you the videos on the one that looks like it has the most customers… (beep .. "Hello - Welcome to Something SOA Great's web site. I am about to show you the latest thing since…")

 Harry and Dick watch, enthralled. Tom stands behind them with a frown, rolling his eyes.

Dick: If I hadn't seen it, I wouldn't believe it.

 Tom: Hmm. I've seen it and I don’t believe it.

 Harry: Don’t be obstructionist. You just saw that SSG's Adapter automatically connected to both Salesforce and SAP. All the mapping is already built in, so we don't have to even know what the data fields are. You know what that means - we don’t have to deal with those know-it-all data analysts.

 Dick: We could just download it and be off to the races to make the deadline with time to spare. 

Tom: And what if we need to use a custom field in Salesforce?

 Harry: Didn't you see that they have 10,000 Adapters in their library? And fifty different versions of this one, so we can look for the closest fit. Then we can tweak it just a little bit to fit what we need. They said they have tools for that.

 Dick: Let's do it!
 
Tom: I need a vacation. Have fun.

So Harry downloaded the Adapter to his desktop.

Harry: Here we go! I'll install here and get it up and running.

 Adapter: /very faint chuckle/

 Harry doesn’t hear. He’s reading the on-screen instructions.

Harry: OK, I'm connecting to SAP

 Two weeks later

Harry: Now I'm connecting to Salesforce

 Two weeks later

Harry: I think I'm going crazy. I keep hearing this noise that's getting louder every day. But I digress. Here we go - let me try running this beast.

 Adapter: BANG! CRASH ! HA! HA! HA! /hysterical laughter that can be heard all the way to the VP's office/
 
Tom is back from a month's vacation overseas; he runs to Harry's cube to see what's going on.


Tom: AARGH! What's going on here? .. Oh, no! The Adapter is squirting SAP data out the port all over the desk!

Dick: /loudly/ Not again! Everyone to their stations! Call 911! Call the auditors! Call OSHA!


Tom: Unplug something before someone drowns in this big pile of SAP Data.

 As the VP arrives at the scene, a cloud forms near the ceiling, creeping out to the hallway. A final guffaw from Adapter, and the light mist of Salesforce data turns into a terrible storm.

------------ End of Same Story, 23rd time around ---------------



What is it that we all want so very badly from Adapters?
  • Off-the-shelf solution
  • Effortless integration between two endpoints
  • No need to program complex mapping and business rules
  • No need to know the technical aspects of connecting with either endpoint
  • No need to have domain or business knowledge in either endpoint application.
  • No need for data analysts to be involved
  • A perfect fit with both endpoints
What makes that impossible?
  • Almost every implementation of an endpoint is customized or changes over time
  • Your selection of source data is different from what is in the adapter
  • Your other endpoint also has been customized
  • Your business rules don’t match what's there already
What do you have to do to accommodate?
  • Write code to be able to feed the data to the adapter the way it expects to see it (a full integration in itself!)
  • Write code to adjust the manipulation and fit to the customized endpoint
  • Open up the adapter, if possible, and add code to modify the business and mapping rules. 
 What do pre-built adapters offer?
  • Working at most once off the shelf
  • Good experience in re-working code
  • Opportunity to practice emotion control
  • Incentive to find an alternative.

The alternative:

Connectivity must be designed in such a way that the reusable parts are solid and reusable for every instance of a source or destination. Decoupling the business rules from the technical rules and the connectivity improves reusability. This is the model used by Agile Integration Software and its AppComms.
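
A minimal sketch of that decoupling, purely illustrative (AppComms are configured rather than hand-coded like this): the connector knows only how to reach an endpoint, while the mapping and business rules are passed in separately and can be swapped per project:

```python
from typing import Callable, Iterable

class Connector:
    """Reusable connectivity: knows how to read from / write to one kind of endpoint."""
    def __init__(self, read: Callable[[], Iterable[dict]], write: Callable[[Iterable[dict]], None]):
        self.read, self.write = read, write

def run_integration(source: Connector, target: Connector,
                    mapping: Callable[[dict], dict],
                    rules: Iterable[Callable[[dict], bool]]):
    """Mapping and business rules are passed in, so the connectors themselves never change."""
    rows = [mapping(r) for r in source.read() if all(rule(r) for rule in rules)]
    target.write(rows)

# Hypothetical endpoints: the "Salesforce" and "SAP" sides are just stubs here.
salesforce = Connector(read=lambda: [{"Name": "Acme", "AnnualRevenue": 1_000_000}],
                       write=lambda rows: print("to Salesforce:", rows))
sap = Connector(read=lambda: [],
                write=lambda rows: print("to SAP:", rows))

run_integration(
    source=salesforce,
    target=sap,
    mapping=lambda r: {"KUNNR_NAME": r["Name"], "REVENUE": r["AnnualRevenue"]},
    rules=[lambda r: r.get("AnnualRevenue", 0) > 0],   # project-specific rule, not baked into the connector
)
```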

Friday, June 24, 2011

Harnessing Social Media's Big Data

With all the latest hadoopla, there are a lot of people wondering what Big Data means to them. There's a sea of data being generated constantly from Facebook, LinkedIn, and Twitter, and the value of mining and analyzing that body of information is easy to imagine. You can find out all kinds of things that are relevant to your business decisions as well as information that can be turned into stellar marketing initiatives.

Social media easily trumps structured data and documents from the hype perspective. Since its arrival has been relatively recent, we don't really have the same internalized model to extrapolate from in order to conceptualize its meaning and treatment as Big Data. Data from RFIDs and medical instruments is also growing at an exponentially increasing rate, and also offers a tremendous basis for completely new, innovative solutions.

Forrester's Brian Hopkins, in his informative and interesting "Big Opportunities in Big Data," discusses the areas and issues of Big Data that are at various stages of commercial readiness. It seems that we still don't have all the bases covered.

One of the interesting aspects of Big Data, particularly the Big Data being captured via social media or instruments like RFIDs, is that as soon as it's captured, it becomes history. The reason I'm focused on this type of data is that a huge body of historic information may not be particularly useful to many businesses. By its very nature, its value to businesses is mostly immediate. In the bigger picture of science and statistics, of course, or for Fortune 500 companies, the history can be valuable, but it is also possible to capitalize on the rapidly changing trends of your customer base, and the mob mentality on display, before the opportunity eludes you. With the speed of change we are experiencing today, by the time you get practical results from any Big Data project, you will have missed opportunities to react and reap today's value.

Yesterday I was discussing federation of Big Data with Dana Gardner, principal analyst at Interarbor Solutions, who noted, "Large amounts of data need to be mined, sure, but there are gems in the fresh data from the right applications at the right time that also spell business gold. The needle may be in a hay stack, or it may be inside two or more applications, where the value of the data is only accessible in the context of the integration activity."

Most people interested in Big Data focus on capturing and mining huge bodies of social media data or, in the case of RFIDs, having a complete picture of all of them at one time. There are plenty of uses of this kind of data that are much more practical, in some cases more useful, and that certainly do not incur huge projects. For social media, you can leverage the great search capabilities of Facebook, Twitter, and LinkedIn. With RFIDs, just focus on the subset you care about, meld it with a rule, action, and response, and you're off to the races long before your Fortune 999 competitors can get started. If you have been dreaming up great ideas about the value that social media's Big Data can bring to your business, let's consider an interim, easy alternative to the somewhat premature, heavy-duty Big Data approach.

Rather than getting the data and then asking questions of it, in the traditional data warehouse manner, figure out the questions and resultant actions first. Then capture exactly the data you need going forward. Anything in the past is social history. Grab the data as you see it and react immediately, or turn on selective capture for a month or so, analyze the data as it arrives, spot trends as they happen, take action, and "dispose of" the data. Keep in mind that the Big Guys in IT thrive on Big, whether it be databases, hardware, or global corporate projects. Do you need those? Can you wait years to get results from Big Data? Think about getting a head start on competitors with a five-figure investment and your own creativity.
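
As a toy illustration of "decide the question first, then capture only what you need," here is a small Python sketch that tallies keyword mentions as posts arrive and keeps the counts rather than the raw data; the keywords and the stubbed stream are invented:

```python
from collections import Counter

KEYWORDS = {"outage", "refund", "love", "broken"}   # decide the question first

def trend_counts(posts):
    """Tally keyword mentions as posts arrive, keeping counts but discarding the raw text."""
    counts = Counter()
    for text in posts:
        counts.update(k for k in KEYWORDS if k in text.lower())
    return counts

# Stand-in for a live feed pulled from the platform's own search capabilities.
stream = ["Love the new release!", "Site outage again?!", "Still broken after the outage"]
print(trend_counts(stream).most_common())   # act on the trend now, then dispose of the data
```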

Friday, May 27, 2011

Value In The Integrated Metadata Stack

If you're using or looking at Agile Integration Software (AIS), chances are you are discovering that there's metadata for everything that's not tied down (and even for the things that are). Think about the conceptual epitome of integration. There have been various analogies over time, conjuring up a brain with information flowing (ENS - Enterprise Nervous System), or the flow and pervasiveness of water; more recently we hear about the fabric. A few years ago I coined the term "synchronapse" to represent the idea of information flowing intelligently, like synapses firing anywhere as needed. Of course, that never took off - new words are fun, but an uphill battle.

I like the fabric metaphor. Good word: the fabric of nations, the geologic structure of a rock; something that represents the essence and the underlying structure, maintaining integrity but flexibly, so that if one point on the fabric moves, the fabric shifts to accommodate that change.

The only way to capture and control the fluid movement of the fabric, and to ensure that the enterprise can quickly respond to internal and external changes, is to describe everything that can change with metadata. That's a cornerstone philosophy of AIS. Whether the fabric needs to adjust for planned business initiatives or unforeseen external events, the supporting integration infrastructure is adjusted via metadata changes.

Notwithstanding security controls, the full metadata stack must be available to any object or process in the environment, so that conditions at one point on the fabric can effect change in another. That is at best very difficult if each component of your integration stack has its own independent set of metadata. With AIS, as you build your integration with GUI tools, the various layers of metadata and the inter-relationships across the layers are captured and managed automatically.

What's the value of an integrated metadata stack?
  • Reusability of metadata across the stack
    • Example: a for-purpose data selection from a source (e.g., customer demographics) can be reused as needed for any map. Rules and formulas are also reusable, along with processes and many other objects.
  • At run-time, any business rule can take action based on current values of any metadata
    • Example: a different transformation map can be executed depending on customer ID
  • Any layer can incorporate other metadata by reference
    • Example: an enterprise master data model can reference all the metadata that is needed to bi-directionally access and federate the appropriate sources
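
As a loose illustration of "layers by reference" (a made-up structure, not Enterprise Enabler's actual metadata format), note how the master, the map, and the source selections point to one another by name instead of duplicating anything:

```python
# Hypothetical, simplified metadata layers expressed as plain dictionaries.
metadata = {
    "sources": {
        "crm_customer_demographics": {"endpoint": "crm", "fields": ["id", "name", "segment"]},
        "erp_credit":                {"endpoint": "erp", "fields": ["customer_id", "credit_limit"]},
    },
    "maps": {
        "customer_360": {
            "inputs": ["crm_customer_demographics", "erp_credit"],   # reuse by reference
            "join_on": {"crm_customer_demographics.id": "erp_credit.customer_id"},
            "rules": ["credit_limit >= 0"],
        },
    },
    "masters": {
        "Customer": {"map": "customer_360", "expose_as": ["web_service", "sharepoint_ect"]},
    },
}

def resolve(master_name: str) -> dict:
    """Walk the references: master -> map -> source selections."""
    master = metadata["masters"][master_name]
    map_ = metadata["maps"][master["map"]]
    return {"master": master_name,
            "sources": [metadata["sources"][s] for s in map_["inputs"]],
            "rules": map_["rules"]}

print(resolve("Customer"))
```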

This is definitely one of the cool things about Agile Integration Software, possible because it's an IDE, all under one roof.


Wednesday, May 4, 2011

The Soft Side of Tech Debt

The lean and mean beats the sloth. Sure, some rabbits are a "flash in the pan," but eventually the turtle will lose. As I recall, in that fable the rabbit was fast but lazy and not so smart. You can't count on that being the case with your competitors that have less tech debt than your company. Just look at the big Dotcom successes. They solved problems that hadn't been solved before with completely new approaches and carried no tech debt. Now the problems they solved so well have become shared technology demands for the old "bricks and mortar" companies, implicitly increasing their tech debt, and whittling away at their competitive advantage.

Tech debt refers to the ever-increasing overhead and cumbersome nature that technology infrastructure brings to your company. Old programs that have been patched over and over, ancient hardware, and ever-changing trends all contribute to the tech deficit (http://tinyurl.com/3tnyjxf). Moving toward new trends always complicates your infrastructure unless you can make the 100% shift. Without a complete shift, the left-over ballast limits your ability to leverage new trends.

There is a soft underbelly of tech debt that can be equally debilitating to your company's competitive advantage, and that is the collective aspects of the IT department and services that prevent you from being able to address and keep up with the demands from the business side. There's a backlog of projects, too few people, and not enough of the right skills on the IT team. The focus is on high-profile, new trend projects that presumably would alleviate some of the older creaking infrastructure.

"All I need is five data points for my dashboard every week. How can it be that I'm looking at six months before I get it?" or, "I'm just building a little SharePoint application and I need a couple of pieces of information to include." Business just cannot comprehend why it is so difficult. Then they discover a "back-door" way to get the data themselves via a Rube Goldberg contraption that downloads, puts it in a spreadsheet, tosses it around with formulas and macros, and "Voila! Voila!" there's a palatable concoction to feed their needs. And so is born Shadow IT. The good thing is that the business person stops asking for things, and the bad thing is the surge of new, secret tech debt lurking in every department, where you least expect it.

Apart from subscribing to special purpose SaaS applications, or buying an in-house piece of software, the majority of Shadow IT centers around data access and integration. With continuous increases in empowerment of non-IT employees with tools such as SharePoint and others, it behooves you to start looking at ways to control the spike in tech debt by incorporating an agile tool for integration. Lean Integration methodologies are sensible and may reduce the rate of accumulation of long term tech debt with regard to existing tech debt-ridden infrastructure. Adding an inherently lean Agile Integration Software to your mix means that you will not only be able to respond quickly to many of the backlogged requests, but do so with less specialized IT skills, and ultimately, if your stars are aligned, to turn around the trend of tech debt.

Wednesday, April 27, 2011

How's Your Tech Debt?

The term "tech debt" has been around for a number of years, and resurfaces periodically. The early usage was a way to talk about the cost of short-sighted programming, bugs, and badly architected solutions. The metaphor gave programmers a financial analogy for contemplating the value of good programming practices, and for non-programmers to appreciate the need for plenty of time to do development well. If you don't do it well the first time, you will need to go back and patch it up, creating more potential break points and incurring increasing tech debt.

Clearly, this is a micro-view of tech debt. If you zoom out a bit, you start seeing debt in not just the quality of the starting point, but also in the ability to keep up with change over time. Eventually, even the best programming needs to be retired and replaced. Stepping back, you start looking beyond programming, to the quality of the product's design and architecture, and the infrastructure and hardware it is tied to.

Tech debt comes from:
  • Poor programming practices followed by patches
  • Not keeping your assets maintained
  • Isolating from the rest of the world
  • Not keeping up with universal trends

Over ten years ago, Y2K forced the greatest surge of eliminating technical debt, of course at a huge financial and operational cost. Now, that "new" software and infrastructure is nearing fifteen years in place, and it is incurring its own technical debt for its own reasons. In particular, the EAI that emerged during the Y2K time period was badly needed, but it may have been an afterthought for ERPs, hastily architected, and starting out with a tech debt burden.

What about the next fifteen years? You've mastered SOA, but will there be something new that sends it the way of EAI? We're just getting off the ground with SaaS, but maybe in that time the pendulum will begin to swing back to new takes on old trends.

If you look at the dimensions that are generating tech debt, you'll see:
  • Quality
  • Time
  • Trends
Quality is clear; time inevitably invites new requirements and new infrastructures as the old ones age out, forcing change on the other old-timer components; trends are a bit more insidious.

What do you do when the tech debt increases at an unsustainable rate? At that point of no return, you replace it. That point comes earlier if you have not kept up with basic investments in maintenance. It's inevitable that you will need to reformulate your strategy and tactics and get a fresh start.

How is your tech debt doing? Is it getting more expensive to keep the status quo than it would be to replace, or stepwise replace, what you have? One of the key technologies to look at that can give immediate benefit in managing your tech debt is Agile Integration Software.

Thursday, March 31, 2011

SaaS Vendors Take Note: You Can Operate in the Cloud but Not in a Vacuum

This week I watched a webinar proposing that more responsibility for integration with SaaS applications needs to be carried by the software provider. Analyst Dana Gardner of Interarbor discussed this topic in the webinar and on his blog (http://tinyurl.com/4fgyw5j). Workday recently announced its cloud-based integration services as part of its SaaS ERP offering, stepping up to the plate to provide tools that will ensure that customers can integrate with its software. Workday may well be setting a trend that other ISVs will need to keep up with.

It is clear that, while the convenience and specialization of cloud-based solutions opens a new opportunity for businesses to trim down their infrastructure and the associated maintenance costs and effort, it doesn't eliminate the need for integration. Instead, it calls for a new paradigm for data integration.

Unfortunately, the heavy middleware in enterprises today will have trouble keeping up with the increasing business demands for agility. Change is tough when no one wants to touch the integration for fear of breaking something.

With the growing dependence on cloud-based software, your customers need a new generation of integration that can streamline data flows as their business processes move freely across legacy, clouds, and collaboration portals. If you are a SaaS vendor you need to seriously think about how you can rise to the occasion for your customers. The better you do it, the happier your customers will be.

Workday addressed the challenge by buying an integration software company! That's not only an expensive way to go, but it means that now, in addition to domain expertise and software development teams for their ERP, they must also maintain expertise and developers in the rapidly-changing integration space. That's probably not a prudent business approach for most SaaS vendors.

The alternative is to embed, rebrand, and/or offer Agile Integration Software, such as Enterprise Enabler® (www.enterpriseenabler.com), in your offerings. That way you get all the benefits without the headaches. I don't know if Workday's integration fits the Agile Integration Software model (see http://tinyurl.com/3yv85bw), but with AIS, even a SaaS software vendor is able to offer integration across cloud apps, and also to incorporate on-premise backend legacy systems as well as pass data to and from customers' SharePoint installations, on-premise or hosted.

Monday, March 14, 2011

Crossing the Chasm Between Consumer and Business Technology

How can it be that consumer technology manages always to deliver on ease of use, compatibility, and basic human appeal? Why do we have to deal with ugly, difficult, clunky software in the business world? How is it that there are standards that have been pervasively accepted and implemented for cell phones and all manner of electronic gadgets? Don't you ever wonder why it is that consumer technology hits the spot and continues to do so?

A huge contributor to the difference is the nature of the products themselves. Consumer electronics, phones, and even software are basically "throw-away." They are low cost and are expected to be completely replaced every couple of years, so consumers are forgiving of bugs and don't even think about upgrades. They simply get a new one. Maintaining the history of the world, part 1 or part 2, is not the responsibility of these devices and software. The closest they come is the need to accommodate a few years of contacts' phone numbers and addresses. And even then, only the business users of these things care.

For consumer software, the imperatives of interoperability with existing and old technology are minimal, limited to internet and communications standards, which are discrete and maturing. The consumer market is huge, and with the constant replacement by customers, it's a path of enviable "recurring revenue" streams. Besides that, marketing to consumers is much more intuitive and pervasive than targeting the specific individuals who might want or need a business solution and also have the authority and the budget to buy it. With business systems, there's no walking by a store, seeing a flashing display, and buying on impulse.

For business software product companies, it's a whole different world: more complexity, expectations for longevity, necessarily higher price tags, and requirements of "upward compatibility" for new releases. There is always the imperative of having to work in tandem with everything else that's in place or being invented by competitors.

So our challenge is to figure out how to close the gap and move business technology closer to consumer technology. Of course, it's SaaS that offers the greatest potential to shake up the software product market over the next few years, but I think that Agile Integration will come into the mix in a big way, dramatically simplifying the underlying corporate IT infrastructure.

Monday, February 28, 2011

The Power and Flexibility of Data Federation and Virtualization Together

Finally, the world seems ready to take data federation and virtualization seriously. The power of each is, and has historically been, severely limited by the absence of the other in the same product. Using data virtualization means that you can switch out the underlying data and where it comes from without changing the actual integration layer, but rather by just redefining the metadata. That definitely adds flexibility to a company’s underlying infrastructure. Federating data means that you can bring data from multiple sources, combine them into a single data set or view, and let users and applications consume it without having to stage it. That eliminates the overhead of defining, implementing, and maintaining a data store. Combining the two, along with a robust transformation engine, in a single execution, adds orders of magnitude improvement in the agility of the overall IT infrastructure.


It seems to me that this is a very good time for IT decision makers to start aggressively implementing these combined capabilities throughout their organizations. Data warehouses have come of age, maybe have even reached "a certain age" status, and are carrying the ballast of years of data, maintenance, and accommodation of change. Streamlining and speeding up can only be done with more computing power and better access algorithms and tools: new forklifts. Meanwhile, to remain competitive, the company must be able to deliver information quickly and appropriately wherever it is needed.

It's essential that core IT department management and corporate executives not leave this flexibility to the "Shadow IT" that comes of desperation from the business and operations people who cannot wait, and who deal with the front lines of delivering the company's activity and daily business decisions.

With Agile Integration Software, the IT department can quickly deliver a flexible, maintainable layer that interacts with everything that is already in place, but that steps up to the plate with agility. Please check out my white paper on Data Federation and Virtualization: http://tinyurl.com/6a8zbxv

Monday, January 31, 2011

Federated Data for SharePoint 2010 using AIS

If you're not switching to SharePoint 2010, you are missing a huge opportunity. The new features actually position it so that it can become the ONLY application that end users need to log into. All kinds of useful SharePoint applications can be easily built to bring data from multiple backend systems together, aligned in a single window, for user viewing and updating.

You probably don't think of SharePoint as an MDM product, either, but maybe you don't really need the multi-million-dollar, high-upkeep MDM products from the (other) Big Guys.

You may know that SharePoint 2010 advanced the concept and usability of virtual data definitions from the earlier Microsoft Office SharePoint Server (MOSS) to include full CRUD (Create, Read, Update, Delete) and other features like SSO (single sign-on security). This "external entity" catalog, the BDC, or Business Data Catalog, functions as an MDM of sorts that contains the metadata describing corporate business entities along with the metadata describing how to get the data and where it comes from. When a user opens a window that is bound to one of the entities, s/he can interact with the live data without knowing or caring where it came from. The interaction is with virtual data accessed live from backend systems.

SharePoint Designer is a nice tool for building the metadata when it comes from a limited range of single sources. With some amount of programming, you can extend its use a little. Unfortunately, without the ability to update an external entity's metadata definition, SharePoint Designer must create a whole new definition, which then needs to be switched out for all the apps that use it.

SharePoint 2010 really flies, though, when you combine it with an Agile Integration Software (AIS) like Enterprise Enabler http://www.enterpriseenabler.com/. Agile integration brings the ability to build complex data mappings, from multiple sources (without programming) and generate the desired external entities in SharePoint. Federating data on the fly from multiple sources is powerful.

Now, suppose a backend system changes. Maybe your entity combines data from SAP and Dynamics CRM. If you switch from Dynamics CRM to Salesforce, an AIS can apply the changes behind the scenes and connect the same external entity definition in SharePoint without any disruption.

There are a couple of things you might be concerned about, like security and transaction rollback on updates to multiple back ends. SharePoint supports SSO, so AIS will pick up the end user's security context from SharePoint and pass it to the backend systems. If the user does not have permission to read from a backend system, s/he will not see the data. If the user does not have write access to one of the backends, the full transaction will be rolled back. This is generally the only time data is actually persisted in an AIS.
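
The rollback behavior described above can be pictured as an all-or-nothing write across backends. A simplified Python sketch, with the backend objects and their methods invented for illustration:

```python
class BackendWriteError(Exception):
    pass

class StubBackend:
    """Stand-in for SAP, Salesforce, etc.; real systems expose their own write/compensate calls."""
    def __init__(self, name, writable=True):
        self.name, self.writable, self.rows = name, writable, []
    def can_write(self, user):
        return self.writable          # with SSO, this reflects the end user's own permissions
    def write(self, payload):
        self.rows.append(payload)
    def undo(self, payload):
        self.rows.remove(payload)

def write_all_or_nothing(backends, payload, user):
    """Attempt the update on every backend; undo the ones that succeeded if any refuses."""
    completed = []
    try:
        for backend in backends:
            if not backend.can_write(user):
                raise BackendWriteError(f"{backend.name}: no write access for {user}")
            backend.write(payload)
            completed.append(backend)
    except BackendWriteError:
        for backend in reversed(completed):   # roll back in reverse order
            backend.undo(payload)
        raise

sap = StubBackend("SAP")
crm = StubBackend("Dynamics CRM", writable=False)   # this one forces the rollback
try:
    write_all_or_nothing([sap, crm], {"customer": "C1", "phone": "555-0100"}, user="jdoe")
except BackendWriteError as err:
    print("rolled back:", err, "| SAP rows now:", sap.rows)
```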

This should definitely give you something to think about. You could use all the savings for a company trip to Colorado, or Hawaii, or anywhere else. Or maybe give a fat raise to everyone in IT.

----------------------------------
Video of building a SharePoint entity in AIS http://tinyurl.com/5vxce55