by Greg Nawrocki

The Chicken or the Egg?

Aug 30, 2005

One of the hairiest challenges in modern computing has been finding ways for organizations to take better advantage of the explosion of data over the last twenty years. With the proliferation of eCommerce, CRM, and devices at the “edge,” the sheer amount of information being generated has increased drastically. That growth has also created new challenges in the movement, assimilation, and analysis of data, specifically in the speed at which these tasks must be done.

When we look back on the last decade of Grid computing (from the mid-’90s through today), we often see it pigeonholed into the taxonomies of data Grids and compute Grids. However, modern Grid computing has a less distinct division between the two.

The Google searches that we’re all so dependent upon to quickly return our queries are powered by a massive Grid. Financial services shops use Grid to run complex Monte Carlo simulations that help them predict capital market risks. These simulations are based on increasingly complex models that rely on an ever-growing number of data sets; the list goes on and on.
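To make that kind of workload concrete, here is a minimal Monte Carlo sketch in Python. It is an illustration only: the portfolio size, drift, volatility, and the simple geometric Brownian motion model are assumptions made for the example, not details from any particular shop. What matters for Grid is that every simulated scenario is independent, so the paths can be split across as many nodes as are available.

```python
import numpy as np

# Illustrative Monte Carlo estimate of portfolio Value-at-Risk (VaR).
# All parameters below are assumptions chosen for the example.
portfolio_value = 1_000_000.0   # current portfolio value in dollars
mu, sigma = 0.07, 0.20          # assumed annual drift and volatility
horizon = 10 / 252              # 10 trading days, expressed in years
n_paths = 1_000_000             # independent simulated scenarios

rng = np.random.default_rng(seed=42)

# Simulate end-of-horizon values with a simple geometric Brownian motion model.
z = rng.standard_normal(n_paths)
end_values = portfolio_value * np.exp(
    (mu - 0.5 * sigma**2) * horizon + sigma * np.sqrt(horizon) * z
)

# Losses relative to today's value; 99% VaR is the 99th percentile of loss.
losses = portfolio_value - end_values
var_99 = np.quantile(losses, 0.99)
print(f"Estimated 10-day 99% VaR: ${var_99:,.0f}")
```

Because each batch of paths touches no shared state, a Grid scheduler can hand out batches to separate machines and simply combine the resulting loss samples at the end.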

While many organizations started leveraging Grid to achieve better control over larger data sets, they are now tapping into Grid to keep pace with the evolution of the devices that produce the data. Take the LEAD (Linked Environments for Atmospheric Discovery) project, for example. Mesoscale meteorologists are seeing huge breakthroughs in the Doppler radar systems at their disposal. The next generation of Doppler radar is pushing the envelope in terms of sensitivity; both the volume of data being retrieved and the granularity of that data are continually on the rise. Thus, the sheer amount of data that meteorologists must crunch and analyze on the back end grows greater every day. In addition, this next-gen Doppler technology has both increased configurability and the ability to react to new weather patterns on the fly. Hence there is also a need to analyze the data quickly, simulate possible outcomes, and feed the results back to the instrumentation itself so that it can make active adjustments. The LEAD project is developing a new Grid cyberinfrastructure capable of moving that data to compute resources, scheduling data mining jobs, and enabling faster, more dynamic interpretation of real-time weather patterns.
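As a rough sketch of that closed loop, the short Python example below simulates a sense-analyze-adapt cycle: a scan is pulled from the instrument, analyzed on compute resources, and the result is used to re-task the radar. Every function name and threshold here is a hypothetical placeholder invented for illustration; none of it reflects the actual LEAD software.

```python
import random

def acquire_scan(radar_id: str) -> list[float]:
    # Placeholder for pulling the latest Doppler scan off the instrument.
    return [random.gauss(0.0, 1.0) for _ in range(1024)]

def analyze_on_grid(scan: list[float]) -> dict:
    # Placeholder for moving the data to Grid compute resources and
    # running data mining / simulation jobs against it.
    peak = max(abs(v) for v in scan)
    return {"severe": peak > 3.0, "peak_signal": peak}

def retask_radar(radar_id: str, result: dict) -> None:
    # Placeholder for feeding results back so the instrument can adjust
    # its scanning strategy on the fly.
    print(f"{radar_id}: refocusing scan, peak signal {result['peak_signal']:.2f}")

def control_loop(radar_id: str, cycles: int = 5) -> None:
    for _ in range(cycles):
        scan = acquire_scan(radar_id)
        result = analyze_on_grid(scan)
        if result["severe"]:
            retask_radar(radar_id, result)

control_loop("doppler-01")
```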

Another area where devices are driving exponential growth in generated data is 3D vision. “Stereo vision” uses cameras to gather environmental data, and on the back end, various data mining techniques are used to help machines “see” the real world in a fashion similar to how humans see it. They can sense depth and color, and they can differentiate one object from the next; it’s a huge growth area for autonomous systems.

In stereo vision, there’s actually a new law (called “Woodfill’s Law”) which holds that the computational complexity of computing a 3D image grows as the cube of the length of the image’s edge. As the camera sensor technology used to produce these images becomes more capable (driven by Moore’s Law trends in processing power) and image resolution increases, the amount of data that must be analyzed to direct the next actions of these autonomous systems also grows at a steep, nonlinear rate. One can see how this balance between accommodating larger data sets and the need to process that data faster will keep feeding back upon itself.
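A quick back-of-the-envelope sketch shows why that cubic relationship bites. The Python snippet below assumes, purely for illustration, that stereo matching work is roughly proportional to image width times height times the disparity search range, with each of those growing linearly with the edge length; that accounting is my assumption for the example, not a statement of how any particular stereo system is implemented.

```python
# Rough illustration of the cubic scaling attributed to "Woodfill's Law".
# Assumption: matching work ~ width x height x disparity search range,
# and each factor grows linearly with the image's edge length n.
def stereo_work(edge_length: int) -> int:
    width = height = edge_length
    disparity_range = edge_length // 8   # assumed fraction of the edge length
    return width * height * disparity_range

for n in (256, 512, 1024, 2048):
    print(f"edge {n:5d}px -> ~{stereo_work(n):,} matching operations")
# Doubling the edge length multiplies the work by roughly 8 (2 cubed).
```

So each jump in camera resolution buys richer data but demands disproportionately more compute, which is exactly the feedback loop the paragraph above describes.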

All that’s left is to coin a clever term for this “chicken and the egg” relationship.