by Jon Udell

Playing the Internet scales

analysis
Mar 7, 2003 · 5 mins

Is this a technical or political problem?

The blog world was enriched last week when Tim Bray’s Weblog appeared on the scene. The co-founder and CTO of Antarctica Systems, Bray co-edited the XML 1.0 specification and remains deeply engaged in Web development. “It’s not like it used to be,” Bray observed in the hours following the launch of his blog, while “hovering over the access log like an expectant mother.” The most dramatic change from a few years ago? The effect of RSS (Rich Site Summary/Really Simple Syndication) newsreaders:

“I’m seeing maybe four or five hits a minute on the RSS feed,” Bray says. “When you consider the number of sites out there with RSS feeds and the number of people who subscribe to a bunch of them, we’re talking some pretty serious traffic here. Architecturally, this seems pretty dumb, and you have to worry whether or not it’s going to scale.”  

Point taken. Although I’ve been publishing and subscribing to RSS feeds for four years now, I’ve noticed a change just recently. One Web site I manage used to attract more visitors by way of its e-mail newsletter than by way of its RSS feed. Three months ago, things tipped dramatically in favor of RSS.

My guess is that’s just the start. RSS readers such as Radio UserLand, NetNewsWire, AmphetaDesk, and NewzCrawler used to be pretty far off the beaten track. Last week, though, I started using Greg Reinacker’s NewsGator, a .Net-based plug-in for Outlook 2000 or 2002. If you’re an Outlook user, this is probably the best and most natural way to tap into the vital stream of information that Weblogs deliver. If you’re wondering where to start, just point NewsGator at somebody’s OPML (Outline Processor Markup Language) subscriptions file (mine, for example) to read the same feeds I do.

For RSS, integration with Outlook is a huge opportunity to break into the mainstream. But as Bray points out, “Yes, Houston, we [potentially] have a problem.” I, for example, subscribe to about 100 feeds. During the day, I migrate from one Windows box running NewsGator to another running Radio UserLand to a TiBook running NetNewsWire. Each of these RSS readers (also known as aggregators) polls each of its subscribed feeds hourly. A few months back, this sucked up a lot more bandwidth than it does today. Aggregators pulled the whole feed on every poll and only then looked for changes, until Simon Fell, the developer of PocketSOAP, made this modest proposal:

“Here’s a suggestion for people writing RSS aggregators: Use the HTTP/1.1 Etag and If-None-Match headers so that you only fetch the feed if it’s changed.”  

The sound of forehead slapping was heard throughout the blog world, and within days, RSS readers became much more polite. It’s been a huge improvement, but we’re still left with a growing population of readers that poll hourly for changes that may occur only daily, weekly, or even less often. Although we use the terms “publish” and “subscribe” when we talk about RSS, it’s not yet the kind of event-driven pub/sub technology that notifies subscribers when topics of interest change.
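To make Fell's suggestion concrete, here is a minimal sketch of a conditional GET in Python. The function name `fetch_if_changed` and its `opener` parameter are my own illustrative choices, not taken from any aggregator mentioned here. The reader remembers the ETag from its last successful fetch and sends it back in `If-None-Match`, and a server that supports the header answers `304 Not Modified` with no body when nothing has changed.

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, etag=None, opener=urllib.request.urlopen):
    """Fetch a feed only if it has changed since the ETag we last saw.

    Returns (body, etag). body is None when the server replies
    304 Not Modified, meaning the cached copy is still current.
    `opener` is an illustrative hook so the network call can be swapped out.
    """
    request = urllib.request.Request(url)
    if etag:
        # Tell the server which version we already hold.
        request.add_header("If-None-Match", etag)
    try:
        with opener(request) as response:
            # 200 OK: a new or changed feed, plus the ETag to remember.
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Unchanged: nothing was transferred beyond the headers.
            return None, etag
        raise
```

An aggregator would store the returned ETag alongside each subscription and pass it in on the next hourly poll. Note that this trims the size of each response, not the number of requests: the poll itself still happens.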

In our Technology of the Year issue, I lauded KnowNow and Kenamea, two vendors whose products could be used to endow technologies such as RSS with true pub/sub notification. But when I described these products as “Internet-scale,” David Rosenblum, CTO of PreCache, pushed back. It’s one thing to build a pub/sub system that uses the Internet, he argues, and quite another to make it perform well at scale. According to Rosenblum, a former professor at the University of California, Irvine (where he mentored KnowNow’s co-founder, Rohit Khare), PreCache’s NetInjector technology optimizes pub/sub using two adaptive strategies. First, the routers in this overlay network move subscriptions around automatically, a process called filter propagation. Second, they perform what he calls filter merging:

“As more and more subscriptions come onto the network,” Rosenblum says, “routers exploit overlaps in interest among subscriptions and merge them so that only the most general subscription characterizing all subscriber interests at the edge router propagates upstream to the publisher.”
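A toy model may help with the idea of merging. The sketch below is not PreCache's algorithm, which the description above does not spell out. It assumes, purely for illustration, that a subscription is a slash-separated topic prefix such as `weather/us`, and that a router forwards upstream only the filters not already covered by a more general one. The names `covers` and `merge_filters` are hypothetical.

```python
def covers(general, specific):
    """True if every event matching `specific` also matches `general`.

    Assumption: filters are topic prefixes, so "weather/us" covers
    "weather/us/ma" but not "weather/uk".
    """
    return specific == general or specific.startswith(general + "/")


def merge_filters(filters):
    """Return the filters an edge router would still need to send upstream.

    Any filter covered by a more general one is dropped, because the
    general filter already pulls in every event the specific one wants.
    """
    unique = set(filters)
    return sorted(
        f for f in unique
        if not any(g != f and covers(g, f) for g in unique)
    )


edge_subscriptions = [
    "weather/us/ma",
    "weather/us",
    "sports/nba",
    "weather/us/ca",
]
upstream = merge_filters(edge_subscriptions)
```

Here four subscriptions at the edge collapse to two upstream, `sports/nba` and `weather/us`. Real content-based filters are richer than prefixes, so deciding when one filter covers another is presumably where much of the hard work lies.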

That’s a mouthful, but it makes intuitive sense. I asked Rosenblum: “What does Rohit Khare think about this?” His response: “Rohit doesn’t think it will work.” But Rohit told me he does indeed think PreCache can scale to previously unattainable numbers of events and devices. Internet scale isn’t just a matter of large numbers and wide area, though, he adds: “It’s a political problem.”

Both perspectives are true. I was in the audience at BrainShare ’95 when Bob Frankenberg, then president and CEO of Novell, conjured up a vision of billions of connected devices. My refrigerator magnets still don’t receive weather reports, but when they do, we’ll need something like PreCache to make them work. At the same time, I keep recalling Rohit Khare’s joke at last year’s Emerging Technology Conference. The real integration challenge, he said, is in Layers 8 and 9 of the OSI stack: economic and political. That scale’s in a different key, and we’ll have to learn to play that one, too.