Friday, August 22, 2014

Cache as Cache Can

Caching is one of those afterthoughts: you know you have a great solution, but then you start wondering about performance. Since caching is about moving data at varying speeds, it is (or should be) an inherent feature and responsibility of any integration solution. You will find that truly Agile Integration Software, such as Enterprise Enabler, makes it easy to configure a wide range of caching models and to adjust as your requirements change.

Agile Integration covers everything from ETL through near-real-time, bi-directional Data Virtualization (DV), all with federation at the core, so caching can be implemented anywhere in the data flow cycle, end to end.

The Continuum of Caching
According to Wikipedia, cache is a “component that transparently stores data so that future requests for that data can be served faster.” I think of it as any data store, however static or ephemeral, however big or small. The cached data may be exactly in the source form, perhaps to be federated on the way out; already federated into the form the endpoint needs, or into the Master Data form, ready to go on to its destination; or somewhere else in the flow of the data. The specific subset of data to be cached should be optimized to ensure the greatest efficiency, minimal size, and highest reusability. The transparency comes in because, in the big scheme of things, the destination, the consumer, and the workflow steps never need to know the data is not all coming live from the original sources.

This is where data federation and Data Virtualization add to the flexibility of caching. Agile Data Virtualization supports cache as one of the sources, so DV could be involved in creating the cache, whether in-memory, on disk, or in a database, and that cache can then be used as one source in a federation that is delivered either on demand or event-triggered.
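To make that concrete, here is a minimal sketch, written in generic Python rather than in Enterprise Enabler itself, of a federation in which one source is an in-memory cache of slow-changing customer data and the other is read live at request time. The names (CachedSource, federate) are hypothetical and only illustrate the pattern.

```python
# Minimal sketch (illustrative only, not Enterprise Enabler's API): one federation
# source is an in-memory cache of slow-changing data, the other is read live.
import time

class CachedSource:
    """Snapshot of a slow-changing source, refreshed when older than a TTL."""
    def __init__(self, fetch, ttl_seconds=3600):
        self._fetch = fetch            # callable that pulls the full data set from the source
        self._ttl = ttl_seconds
        self._snapshot = None
        self._loaded_at = 0.0

    def read(self):
        # Only go back to the real source when the snapshot has expired.
        if self._snapshot is None or time.time() - self._loaded_at > self._ttl:
            self._snapshot = self._fetch()
            self._loaded_at = time.time()
        return self._snapshot

def federate(live_orders, customers_by_id):
    """Join live order records with cached customer records on the way out."""
    for order in live_orders:
        customer = customers_by_id.get(order["customer_id"], {})
        yield {**order, "customer_name": customer.get("name", "unknown")}

# Hypothetical usage: customer master data comes from the cache, orders are read live.
customer_cache = CachedSource(lambda: {1: {"name": "Acme"}, 2: {"name": "Globex"}})
live_orders = [{"order_id": 100, "customer_id": 1, "total": 250.0}]
print(list(federate(live_orders, customer_cache.read())))
```

The consumer of federate() never needs to know that the customer records came from a cache rather than live from their source, which is exactly the transparency described above.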

Today, most people talk about cache as being refreshed rather than accumulating a history; however, with all the options that can be configured, accumulating history is actually a realistic and sometimes useful consideration. You can see that the possible combinations are many, clearly enough that one must be careful not to get tangled up and not to lose sight of the original objectives of caching!

One could easily argue that caching is more like ETL than like Data Virtualization; however, DV often requires caching more than other integration patterns do, since its users generally expect rapid, “live” data without latency. When the rubber meets the road, in many situations caching is the only way to ensure that a DV solution with many users does not bring the source applications “to their knees.” This is why Agile Integration Software, which combines all the integration patterns, solves Data Virtualization problems better than pure DV platforms.

What do you need to determine before you configure caching?
·         Which data to cache
·         Why you selected this particular data for caching
·         Where to cache – memory, disk, database, etc.
·         How often to refresh – on a schedule, on an event, or as soon as available
·         Where in its path to cache – directly from the source, partially processed, before or after federation, endpoint-ready, or as part of a Master Data definition
·         When to release from cache – as soon as it is read, or as soon as a particular set of consumers has read it
·         Whether the cache is subject to bi-directional data flow (a sketch of how these choices might be captured follows this list)
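One way to keep these decisions from getting tangled is to record them as a small, declarative policy per cached data set. The sketch below is only an illustration of that idea in Python; the field names are hypothetical and are not Enterprise Enabler configuration settings.

```python
# Illustrative sketch: capturing the caching checklist as a declarative policy.
# Field names are hypothetical, not Enterprise Enabler configuration settings.
from dataclasses import dataclass

@dataclass
class CachePolicy:
    dataset: str                      # which data to cache
    rationale: str                    # why this data was selected for caching
    store: str = "memory"             # where to cache: "memory", "disk", or "database"
    refresh: str = "schedule:1h"      # how often: a schedule, an event, or "asap"
    stage: str = "post-federation"    # where in its path: source, partial, pre/post federation, endpoint-ready
    release: str = "on-read"          # when to release: on read, or after a set of consumers has read
    bidirectional: bool = False       # is the cache subject to bi-directional data flow?

policies = [
    CachePolicy(
        dataset="customer_master",
        rationale="slow-changing reference data hit by every request",
        store="memory",
        refresh="schedule:24h",
        stage="endpoint-ready",
    ),
]
```

Whatever tool you use, keeping the rationale written next to the mechanics makes it much easier to revisit the cache when the requirements change.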

When should you plan to Cache?
First of all, keep in mind that if you don’t identify your caching needs up front, Agile Integration Software lets you add caching easily as your traffic grows and the parameters reach the point where it’s needed. Particularly when you are using Data Virtualization and are hitting backend source systems live at each request, you should take a close look at your needs and the best approaches to caching. You should consider caching in situations where:
·         You are concerned that too much traffic hitting mission-critical (or any) source systems could adversely impact their performance (see the sketch after this list).
·         You are concerned about response times for end users.
·         You need the same value throughout a process that may access it multiple times.
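For the first case, a read-through cache in front of the source keeps many identical requests from ever reaching the backend. The sketch below is a generic Python illustration of that pattern, not an Enterprise Enabler feature; the names are hypothetical.

```python
# Illustrative sketch: a read-through cache that shields a backend source from
# repeated identical requests (generic Python, not an Enterprise Enabler feature).
import time

class ReadThroughCache:
    def __init__(self, query_backend, ttl_seconds=60):
        self._query = query_backend     # callable that hits the real source system
        self._ttl = ttl_seconds
        self._entries = {}              # key -> (result, loaded_at)

    def get(self, key):
        hit = self._entries.get(key)
        if hit is not None and time.time() - hit[1] <= self._ttl:
            return hit[0]               # served from cache; the backend is not touched
        result = self._query(key)       # only a miss or an expired entry hits the source
        self._entries[key] = (result, time.time())
        return result

# Hypothetical usage: many users asking for the same customer within a short
# window produce a single backend call instead of one call per request.
calls = []
def query_backend(customer_id):
    calls.append(customer_id)
    return {"id": customer_id, "name": "Acme"}

cache = ReadThroughCache(query_backend, ttl_seconds=60)
for _ in range(1000):
    cache.get(1)
print(len(calls))   # 1: the source system was hit only once
```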

What to Cache?
·         Data that doesn’t need to be real-time
·         Data for which you want to ensure the same snapshot is used for different purposes (see the sketch after this list)
·         Data that changes so slowly that having it real-time doesn’t matter. You could refresh the cache once an hour, a day, or even a month.
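For the “same snapshot” case, the sketch below (again a generic Python illustration, not how Enterprise Enabler implements it) pins one copy of the data at the start of a process and reuses it in every step, so all consumers within that run see identical values.

```python
# Illustrative sketch: pin one snapshot of the data for the duration of a process
# so every step sees exactly the same values (not Enterprise Enabler's mechanism).
import copy

def run_process(fetch_rates):
    # Take a single snapshot up front instead of re-reading the source per step.
    rates = copy.deepcopy(fetch_rates())

    invoice_total = 100.0 * rates["EUR_USD"]      # step 1 uses the snapshot
    report_total = 100.0 * rates["EUR_USD"]       # step 2 reuses the same snapshot
    assert invoice_total == report_total          # guaranteed consistent within the run
    return invoice_total

# Hypothetical source: exchange rates that may change between requests.
print(run_process(lambda: {"EUR_USD": 1.08}))
```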

Agile Caching
Agile Integration Software offers a wide range of options for caching, making even complex caching patterns easy to configure without custom programming. With the ability to select full data sets or specific fields, mix in-memory and on-disk caching in any combination, and define conditional, fully workflow-driven caches, great architecting doesn’t have to be constrained by what is practical to implement.

Thursday, August 7, 2014

I Hate Data

I Hate Data. Loathing is the Mother of Invention.
I hate data. I’ve always hated data. As a programmer right out of school, I worked in “technical programming” as opposed to “business programming.” Business was about accounting mostly, and technical about, well, technical stuff. One would think that technical would involve data and numbers while business would be about less precise things. Nevertheless, the fact was that it was the business side that worried about data and about uploading huge amounts for backup every night. Therein, of course, lies my initial dilemma. It happened that I was fortunate enough to focus on my passion, which was computer graphics. This was back in the days when we were figuring out how to make circles look round and to get rid of the “jaggies.” At the time, I had little respect, if not outright disdain, for business, but then, I was much younger and living in my 3D graphics world. We on the technical side focused on what you could program computers to do. I wrote code that made every graphics device I could find sing and dance, and the input devices like mice and early tablets and joysticks, too. Imagine the thrill of making one of the first joysticks move cursors and navigate through 3D spaces rendered on 2D screens! Compare that to the dubious activity of staring at tons of numbers on countless reports. Those guys debugged things by studying numerical data, while I had the joy of detecting bugs with the screen looking like a war zone of odd shapes, colors, and flashes, or simply by crashing the computer altogether. Who wouldn’t choose the latter?
Sometime later, after my phases of programming robots and working with pattern recognition algorithms, I began working with refinery modeling and programs to perform mathematical optimization (LPs). These programs combined data from the refinery operations, from laboratories, inventories, planning systems and such, along with current economic information.  Lots of data was involved, but the huge impediment was getting the data from ten different sources in a manner that aligned all of it meaningfully. Often the end users would manually enter some of the data and they would pretend that running a very precise optimizer would give just as good results with some of last month’s data. Wrong! Why couldn’t the data just work smoothly in the background?
I hate data. It’s hard to deal with. Too many problems. That’s not what I want to focus on – there are much more interesting things to think about. That’s why Enterprise Enabler® just had to come along – so that I wouldn’t have to deal with all the idiosyncrasies of disparate data. It was very selfish of me. Let a computer handle all that craziness. Hide everything behind the scenes and automate everything that has to be done more than a couple of times. But then Enterprise Enabler unexpectedly swept me into “business” and all kinds of things I never imagined. I have to be careful: now that the headaches of data are managed, I might start liking it. I can’t admit it, but I’m starting to think data may be what it’s all about. Big Data, little data, virtual, bi-virtual, octy-virtual, and numberical, too.