Serdar Yegulalp
Senior Writer

LinkedIn fills another SQL-on-Hadoop niche

news analysis
Jun 11, 2015 · 3 mins

LinkedIn's open source, home-brew OLAP project is a new way for Hadoop users (and others) to query both real-time and historical data

Social networks generate colossal amounts of data that defy conventional data-processing tools, so it’s no wonder their engineering teams have built their own toolsets, as Facebook has with its machine-learning tools.

Enter LinkedIn, now offering its own Apache-licensed, open-sourced data-processing solution: Pinot, a real-time analytics engine and datastore, designed to run at scale. Yes, Hadoop is one of its data sources, providing yet another option for those looking to perform SQL-style queries.

LinkedIn’s own OLAP

As originally discussed by LinkedIn’s engineers late last year, Pinot was designed to provide the company with a way to ingest “billions of events per day” and serve “thousands of queries per second” with low latency and near-real-time results — and provide analytics in a distributed, fault-tolerant fashion.

The original system was assembled out of a whole congeries of existing pieces (an Oracle database here, a Project Voldemort key-value store there), but LinkedIn found the volume of data ingested was too great for solutions not designed for OLAP-style jobs in the first place.

Like many other data-processing frameworks that live in or near Hadoop, Pinot is written in Java. It uses Apache Helix — also developed at LinkedIn — to perform cluster management. Real-time data comes in by way of Kafka, with historical data fetched from Hadoop.
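To make the split between real-time and historical data concrete, here is a minimal Python sketch of how a single count query might be answered from both sources. It is an illustration only, not Pinot's code: the `Event` type, the `count_actions` function, and the idea of a time `boundary` that decides which source answers for which time range are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; field names are assumptions."""
    timestamp: int  # epoch seconds
    member_id: int
    action: str


def count_actions(historical: Iterable[Event],
                  realtime: Iterable[Event],
                  boundary: int,
                  action: str) -> int:
    """Count matching events across both sources without double counting.

    Events older than `boundary` are taken from the historical batch;
    events at or after it are taken from the real-time stream.
    """
    old = sum(1 for e in historical
              if e.timestamp < boundary and e.action == action)
    new = sum(1 for e in realtime
              if e.timestamp >= boundary and e.action == action)
    return old + new
```

The boundary matters because the same event can show up in both places, once in the stream and later in the batch load. Picking one source per time range is one simple way to keep it from being counted twice.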

Some sacrifices were made

With querying, Pinot shows some of its limitations — although most are deliberate design decisions, reflecting Pinot’s focus on the specific conditions for which LinkedIn created it.

For instance, the SQL-like query language used with Pinot does not have the ability to perform table joins, “in order to ensure predictable latency” (according to LinkedIn’s engineers). There’s truth to this, since SQL-on-Hadoop solutions have been known to suffer from poor performance if they attempt to perform joins between data stored in highly disparate places. Full-text search and relevance ordering for results also aren’t supported.
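A query without joins reads a single table, so everything it filters or groups on has to be present on each row already. The Python sketch below shows that shape of query, a filter plus a grouped sum over one set of denormalized rows, roughly what generic SQL would express as `SELECT country, SUM(views) ... WHERE page = 'profile' GROUP BY country`. The column names and the `views_by_country` function are hypothetical, and the SQL shown is generic rather than Pinot's own query syntax.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def views_by_country(rows: Iterable[Mapping[str, object]],
                     page: str) -> Dict[str, int]:
    """Sum `views` per `country` for one page, scanning a single table.

    Each row already carries the country, so no lookup against a
    second table (that is, no join) is needed.
    """
    totals: Counter = Counter()
    for row in rows:
        if row["page"] == page:
            totals[row["country"]] += int(row["views"])
    return dict(totals)


# Hypothetical denormalized rows: one record per (country, page) sample.
sample_rows = [
    {"country": "US", "page": "profile", "views": 3},
    {"country": "DE", "page": "profile", "views": 2},
    {"country": "US", "page": "jobs", "views": 5},
    {"country": "US", "page": "profile", "views": 4},
]
```

Because the work is a single pass over one table, its cost grows with the number of rows scanned rather than with the sizes of several tables being matched against each other, which is consistent with the predictability argument quoted above.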

Finally, data is strictly read-only, although given the number of other SQL-for-big-data solutions that work the same way, this likely won’t be a major letdown.

A fairly vertical solution

Each SQL-on-Hadoop solution has so far addressed a slightly different set of needs — some for real-time queries (Spark SQL), some for historical data (Hive), some to emulate as much of SQL’s existing behavior as possible without sacrificing performance (Stinger). Pinot is similarly narrow in focus, given that it was built to scratch LinkedIn’s specific itches.

With the project going open source, though, LinkedIn clearly hopes it can scratch other people’s itches as well, especially if existing SQL-for-Hadoop/real-time-data solutions don’t cut it. It’s less clear whether LinkedIn wants Pinot to follow in the footsteps of other Hadoop projects and eventually become Apache-governed, although the choice of license for the project (Apache) would make such a transition a snap.

Serdar Yegulalp

Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
