Big data, but faster: API speeds links between apps and storage

Feb 23, 2016

Alluxio's storage API features support for Amazon S3, Google Cloud Storage, OpenStack Swift, EMC, and NetApp

Alluxio, originally known as Tachyon, is giving big data applications fast, unified access to the storage where their data resides.

Now at version 1.0, Alluxio gives frameworks such as Spark, MapReduce, Flink, and Presto access to multiple types of storage systems. It supports the cloud object stores Amazon S3, Google Cloud Storage, and OpenStack Swift, alongside products from storage vendors EMC and NetApp.

From the outside, Alluxio might look like an in-memory caching system such as Memcached or Redis. Instead, it’s a layer that sits between distributed computing applications and storage, giving the former access to the latter through a unified API. Applications can use Alluxio’s own API, which offers the best performance, or an existing API (an HDFS implementation, for instance), which is slower but more broadly compatible.
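To make the idea of a layer between applications and storage concrete, here is a minimal Python sketch. It is not Alluxio's API: the names `UnderStore`, `DictStore`, `UnifiedLayer`, and `mount` are hypothetical, and the "backends" are plain dictionaries standing in for systems like S3 or Swift. It only illustrates the general pattern of one namespace in front of several stores, with reads served from memory after the first fetch.

```python
from typing import Dict, Protocol, Tuple


class UnderStore(Protocol):
    """Hypothetical interface that every backing store must satisfy."""

    def read(self, path: str) -> bytes: ...

    def write(self, path: str, data: bytes) -> None: ...


class DictStore:
    """Stand-in for a real backend; keeps objects in a dict."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.reads = 0  # how many times the backend was actually read

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self.objects[path]

    def write(self, path: str, data: bytes) -> None:
        self.objects[path] = data


class UnifiedLayer:
    """Routes paths to mounted backends and caches data in memory."""

    def __init__(self) -> None:
        self.mounts: Dict[str, UnderStore] = {}
        self.cache: Dict[str, bytes] = {}

    def mount(self, prefix: str, store: UnderStore) -> None:
        self.mounts[prefix] = store

    def _resolve(self, path: str) -> Tuple[UnderStore, str]:
        # The longest matching mount prefix wins.
        for prefix in sorted(self.mounts, key=len, reverse=True):
            if path.startswith(prefix):
                return self.mounts[prefix], path[len(prefix):]
        raise KeyError(f"no backend mounted for {path}")

    def write(self, path: str, data: bytes) -> None:
        # Write through to the backend and keep a copy in memory.
        store, relative = self._resolve(path)
        store.write(relative, data)
        self.cache[path] = data

    def read(self, path: str) -> bytes:
        # Only the first read of a path touches the backend.
        if path not in self.cache:
            store, relative = self._resolve(path)
            self.cache[path] = store.read(relative)
        return self.cache[path]
```

An application would mount each store once, for example `layer.mount("/s3/", s3_store)`, and then call `layer.read("/s3/logs/a.txt")` without caring which system holds the bytes. A real deployment also has to handle eviction, consistency, and failures, none of which this sketch attempts.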

In a blog post published earlier this month, engineers at Intel described how Alluxio helps address a few common problems with big data frameworks, such as sharing data between applications. Rather than write data to HDFS and read it back out again, users can write data to Alluxio’s in-memory store and read it back out at far greater speed.
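The data-sharing pattern described here can be sketched in a few lines of Python. This is an illustration under loud assumptions: `SharedMemoryStore`, `producer`, `consumer`, and `share_via_disk` are hypothetical names, both "applications" are ordinary functions in one process, and a local JSON file stands in for the round trip through a disk-backed file system. A real deployment would share data across separate processes or machines.

```python
import json
import os
from typing import Any, Dict, List

Records = List[Dict[str, Any]]


def share_via_disk(records: Records, directory: str) -> Records:
    """Hand data off by serializing to a file and reading it back."""
    path = os.path.join(directory, "handoff.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle)
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


class SharedMemoryStore:
    """Hypothetical in-memory store that both jobs can reach."""

    def __init__(self) -> None:
        self._data: Dict[str, Records] = {}

    def put(self, key: str, value: Records) -> None:
        self._data[key] = value

    def get(self, key: str) -> Records:
        return self._data[key]


def producer(store: SharedMemoryStore) -> None:
    # First job: publish its output under a well-known key.
    store.put("clicks", [{"user": "a", "n": 3}, {"user": "b", "n": 5}])


def consumer(store: SharedMemoryStore) -> int:
    # Second job: pick up the data without touching disk.
    return sum(record["n"] for record in store.get("clicks"))
```

Calling `producer(store)` and then `consumer(store)` on the same `SharedMemoryStore` moves the records between the two jobs with no serialization or file I/O, which is the step the disk-based handoff cannot avoid.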

Likewise, the JVM’s garbage collection and on-heap cache issues, which are exacerbated by frameworks like Spark, can be alleviated by using Alluxio. The project’s developers have claimed that back in the Tachyon days, it outperformed in-memory HDFS by 110x for writes and “improves the end-to-end latency of a realistic workflow by 4x.”

Alluxio complements other solutions. Apache Arrow, for instance, speeds up data processing by presenting data to an application in an in-memory format that suits modern CPUs. Data requested by Arrow would be fetched from storage and provided by Alluxio.

In its Tachyon incarnation, Alluxio drew support from several big data projects, Spark chief among them. The company plans to continue building support from other big data projects and storage system vendors.

by Serdar Yegulalp

Senior Writer

Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
