Paul Krill
Editor at Large

JavaScript framework smartens up Firefox

May 1, 2017

Mozilla's Fathom framework helps browsers understand web pages the way people do

With its Fathom JavaScript framework, Mozilla wants to extract meaning out of web pages and produce a more intelligent browser.

Positioned as a “mini language” for writing semantic extractors, Fathom is already in production in Firefox’s Activity Stream feature, where it picks out page descriptions, images, and other items, said Mozilla’s Erik Rose. Though still at an early stage of development, Fathom “enables Firefox to understand the structure and content of a web page,” he said. The framework could be used in browsers, browser extensions, and server-side software.

Rose described scenarios in which Firefox could understand pages the way a person does. For example, the browser could recognize and follow a log-in link, provide hotkeys to dismiss popovers, hide superfluous navigation or header sections on small screens, and determine what to print without needing print stylesheets.

These scenarios, he said, assume the browser can identify the meaningful parts of a page. Rose cited earlier attempts in this vein, echoing the much-touted semantic web: semantic tags, the Resource Description Framework (RDF), and microformats.

Fathom, meanwhile, is a data-flow language like Prolog. It extracts meaning from web pages, identifying parts like address forms, Previous/Next buttons, and the main textual content. DOM nodes are scored and extracted based on user-specified conditions, and a system of types and annotations expresses dependencies between scoring steps and controls state. Existing sets of scoring rules can be extended without having to directly edit them, so third-party refinements can be mixed in.
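To make the idea of scoring DOM nodes against conditions more concrete, here is a toy model in TypeScript. It is a sketch under stated assumptions, not Fathom's actual API: page nodes are plain objects rather than real DOM elements, and every name (`PageNode`, `Rule`, `scoreNodes`, `best`, the `"loginLink"` type) and every heuristic and weight is hypothetical. It shows two of the properties the article describes: nodes are scored by rules tied to a type, and a third party can extend a rule set by appending rules instead of editing the original ones.

```typescript
// Toy model of score-based extraction. NOT Fathom's API; all names are hypothetical.

interface PageNode {
  tag: string;   // element name, e.g. "a"
  text: string;  // visible text content
  href?: string; // link target, if any
}

// A rule: for nodes where `when` holds, multiply the node's score for `type` by `factor`.
interface Rule {
  type: string;
  when: (node: PageNode) => boolean;
  factor: number;
}

// Base rules for spotting a log-in link (illustrative heuristics only).
const baseRules: Rule[] = [
  { type: "loginLink", when: n => n.tag === "a", factor: 1.5 },
  { type: "loginLink", when: n => /log ?in|sign ?in/i.test(n.text), factor: 4 },
];

// A third-party refinement, mixed in without editing baseRules.
const extraRules: Rule[] = [
  { type: "loginLink", when: n => (n.href || "").includes("/login"), factor: 2 },
];

// Score each node for `type`: start at 1, multiply in every matching rule's factor.
function scoreNodes(nodes: PageNode[], rules: Rule[], type: string): number[] {
  return nodes.map(node =>
    rules
      .filter(r => r.type === type && r.when(node))
      .reduce((score, r) => score * r.factor, 1)
  );
}

// Highest-scoring node for `type`; undefined if no node scored above the starting value of 1.
function best(nodes: PageNode[], rules: Rule[], type: string): PageNode | undefined {
  const scores = scoreNodes(nodes, rules, type);
  let bestIndex = -1;
  let bestScore = 1;
  scores.forEach((s, i) => {
    if (s > bestScore) {
      bestScore = s;
      bestIndex = i;
    }
  });
  return bestIndex >= 0 ? nodes[bestIndex] : undefined;
}

// Usage: a tiny made-up page.
const samplePage: PageNode[] = [
  { tag: "a", text: "Home", href: "/" },
  { tag: "a", text: "Log in", href: "/account/login" },
  { tag: "button", text: "Sign in" },
];
console.log(scoreNodes(samplePage, baseRules, "loginLink"));                   // [ 1.5, 6, 4 ]
console.log(scoreNodes(samplePage, baseRules.concat(extraRules), "loginLink")); // [ 1.5, 12, 4 ]
console.log(best(samplePage, baseRules.concat(extraRules), "loginLink"));
```

Multiplying factors is just one simple way to combine evidence from independent rules; the point of the sketch is that the extension happens by concatenating rule lists, so the base set never has to change.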

Fathom’s rule sets are data that look like JavaScript function calls, but the calls are really building annotations in a kind of syntax tree. “Today, that gets us automatic tuning of score constants,” Rose said. “Tomorrow, it could get us automatic generation of rules themselves.”
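Why does treating rules as data make automatic tuning possible? A rough illustration, assuming a brute-force search rather than whatever optimizer Fathom actually uses: because each rule's weight is just a field on an object, a program can swap in candidate values and keep the one that makes the most human-labeled training pages pick the correct node. All names (`Candidate`, `TrainingPage`, `WeightedRule`, `tuneWeight`) and the two-page training set are hypothetical and invented for this example.

```typescript
// Hypothetical sketch of tuning a score constant by search. Not Fathom's code.

interface Candidate {
  isLink: boolean;        // stand-in for "element is an <a>"
  mentionsLogin: boolean; // stand-in for "text says log in / sign in"
}

interface TrainingPage {
  candidates: Candidate[];
  correctIndex: number; // the candidate a human labeled as the log-in link
}

// A rule is data: a feature test plus a tunable weight.
interface WeightedRule {
  test: (c: Candidate) => boolean;
  weight: number;
}

function score(c: Candidate, rules: WeightedRule[]): number {
  return rules.reduce((s, r) => (r.test(c) ? s * r.weight : s), 1);
}

// Index of the highest-scoring candidate; the earlier candidate wins ties.
function pick(page: TrainingPage, rules: WeightedRule[]): number {
  let bestIdx = 0;
  page.candidates.forEach((c, i) => {
    if (score(c, rules) > score(page.candidates[bestIdx], rules)) bestIdx = i;
  });
  return bestIdx;
}

// Fraction of training pages on which the rules pick the labeled node.
function accuracy(pages: TrainingPage[], rules: WeightedRule[]): number {
  const hits = pages.filter(p => pick(p, rules) === p.correctIndex).length;
  return hits / pages.length;
}

// Try each value for one rule's weight; return the value with the best accuracy.
function tuneWeight(pages: TrainingPage[], rules: WeightedRule[], ruleIndex: number, values: number[]): number {
  let bestValue = rules[ruleIndex].weight;
  let bestAcc = accuracy(pages, rules);
  for (const v of values) {
    const trial = rules.map((r, i) => (i === ruleIndex ? { test: r.test, weight: v } : r));
    const acc = accuracy(pages, trial);
    if (acc > bestAcc) {
      bestAcc = acc;
      bestValue = v;
    }
  }
  return bestValue;
}

// Made-up training data and starting rules.
const trainingPages: TrainingPage[] = [
  { candidates: [{ isLink: true, mentionsLogin: false }, { isLink: true, mentionsLogin: true }], correctIndex: 1 },
  { candidates: [{ isLink: true, mentionsLogin: false }, { isLink: false, mentionsLogin: true }], correctIndex: 0 },
];
const initialRules: WeightedRule[] = [
  { test: c => c.isLink, weight: 2 },
  { test: c => c.mentionsLogin, weight: 1 }, // weight 1 means "no effect yet"
];

console.log(accuracy(trainingPages, initialRules));                       // 0.5
console.log(tuneWeight(trainingPages, initialRules, 1, [0.5, 1.5, 3, 6])); // 1.5
```

Here only a weight between 1 and 2 gets both pages right, and the search finds 1.5 from the supplied options. A real tuner would search more cleverly and over many constants at once, but the enabling step is the same: the constants live in data a program can rewrite.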


Paul Krill is editor at large at InfoWorld. Paul has been covering computer technology as a news and feature reporter for more than 35 years, including 30 years at InfoWorld. He has specialized in coverage of software development tools and technologies since the 1990s, and he continues to lead InfoWorld’s news coverage of software development platforms including Java and .NET and programming languages including JavaScript, TypeScript, PHP, Python, Ruby, Rust, and Go. Long trusted as a reporter who prioritizes accuracy, integrity, and the best interests of readers, Paul is sought out by technology companies and industry organizations who want to reach InfoWorld’s audience of software developers and other information technology professionals. Paul has won a “Best Technology News Coverage” award from IDG.
