Serdar Yegulalp
Senior Writer

NoSQL standouts: The best key-value databases compared

Oct 27, 2017 | 16 mins

Aerospike, Hazelcast, Memcached, Microsoft Azure Cosmos DB, and Redis put different twists on fast and simple data storage


Most applications need some form of persistence—a way to store the data outside the application for safekeeping. The most basic way is to write data to the file system, but that can quickly become a slow and unwieldy way to solve the problem. A full-blown database provides a powerful way to index and retrieve data, but it may also be overkill. Sometimes, all you need is a quick way to take a freeform piece of information, associate it with a label, stash it somewhere, and pull it back out again in a jiffy.

Enter the key-value store. It’s essentially a NoSQL database, but one with a highly specific purpose and a deliberately constrained design. Its job is to let you take data (a value), apply a label to it (a key), and store it either in-memory or in some storage system that’s optimized for fast retrieval. Applications use key-value databases for everything from caching objects to sharing commonly used data among application nodes.
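As a rough illustration of the idea, the sketch below models a key-value store as a thin wrapper around a Python dictionary. The class and method names (TinyKVStore, put, get, delete) are hypothetical and are not taken from any of the products reviewed here.

```python
from typing import Any, Dict, Optional


class TinyKVStore:
    """Hypothetical in-memory key-value store, for illustration only."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Associate a value with a label (the key), replacing any old value.
        self._data[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # Look the value up by key; return the default if the key is absent.
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Remove the key and report whether it was present.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = TinyKVStore()
store.put("session:42", {"user": "alice"})
print(store.get("session:42"))
```

Everything a real product adds (persistence, expiry, clustering) is layered on top of this put/get/delete core.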

Many relational databases can function as key-value stores, but that’s a little like using a tractor-trailer to go on grocery runs. It works, but it’s dramatically inefficient, and there are far lighter ways to solve the problem. A key-value store, like other NoSQL databases, provides just enough infrastructure for simple value storage and retrieval, integrates more directly with applications that use it, and scales in a more granular way with the application workload.

Key-value NoSQL database features compared

Five widely used products (including one cloud service) are worth your consideration; they are explicitly billed as key-value databases or offer key-value storage as a central feature. Their basic differences:

  • Hazelcast and Memcached tend toward minimalism, and don’t even bother to back up the data on disk.
  • Aerospike, Cosmos DB, and Redis are fuller-featured, but still revolve around the key-value metaphor.

Aerospike key-value NoSQL database in depth

If Redis is Memcached on steroids, Aerospike could be said to be Redis on steroids. Like Redis, Aerospike is a key-value store that can operate as a persistent database or a data cache. Aerospike is designed to be easy to cluster and easy to scale, to better support enterprise workloads.

Features unique to Aerospike

Much in Aerospike echoes both other key-value stores and other NoSQL databases. Data is stored and retrieved via keys, and the data can be kept in a number of fundamental data types, including 64-bit integers, strings, double-precision floats, and raw binary data serialized from a number of common programming languages.

Aerospike also can store data in complex types—lists of values, collections of key-value pairs called maps, and geospatial data in the GeoJSON format. Aerospike can perform native processing on geospatial data—such as to determine which locations stored in the database are closest to each other by just performing a query—making it an attractive option for developers of applications that rely on location.
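To give a sense of what a "closest location" query computes, here is a minimal sketch in plain Python using the haversine great-circle formula. It does not use Aerospike at all; the place names, coordinates, and function name are assumptions made for the example, and points are written as (latitude, longitude) tuples, whereas GeoJSON orders coordinates as longitude, latitude.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Point, b: Point) -> float:
    # Great-circle distance between two points on a sphere.
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Approximate city coordinates, used only as sample data.
places: Dict[str, Point] = {
    "London": (51.5074, -0.1278),
    "Berlin": (52.5200, 13.4050),
    "Madrid": (40.4168, -3.7038),
}
paris: Point = (48.8566, 2.3522)

# Pick the stored place with the smallest distance to the query point.
nearest = min(places, key=lambda name: haversine_km(paris, places[name]))
print(nearest)
```

A database with native geospatial support does this kind of comparison server-side, usually with an index, so the application does not have to scan every record.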

Data stored in Aerospike can be organized into several hierarchical containers. Some NoSQL systems are document-oriented, meaning data is encapsulated in some kind of object, typically JSON. With Aerospike, containers are roughly like documents, but with functions and behaviors specific to Aerospike. Each kind of container lets you set different behavioral properties on the data inside it.

For example, the topmost level of containers, namespaces, determines whether the data is stored on disk, in RAM, or both; whether the data is replicated in the cluster or across clusters; and when or how data is expired or evicted. Through namespaces, Aerospike lets developers keep the most frequently accessed data in memory for the fastest possible response.

How Aerospike handles storage and clustering

Aerospike can keep its data on almost any file system, but it has been written specifically to take advantage of SSDs. That said, don’t drop Aerospike on any old SSD and expect good results. Aerospike’s developers maintain a list of approved SSD devices, and they have created a tool called ACT to rate the performance of SSD storage devices under Aerospike workloads.

Aerospike, like most NoSQL systems, uses a shared-nothing architecture for the sake of replication and clustering. Aerospike has no master nodes and no manual sharding. Every node is identical. Data is randomly distributed across the nodes and automatically rebalanced to keep bottlenecks from forming. If you want, you can set rules for how aggressively data is rebalanced. You can configure multiple clusters, running in different network segments or even different datacenters, to synchronize with one another.
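One common way to get this kind of even, automatic distribution is to hash each key into a fixed number of partitions and then assign partitions to nodes. The sketch below illustrates that general technique only. The partition count, the hash function, and the round-robin assignment are assumptions chosen for the example, not a description of Aerospike's internals.

```python
import hashlib
from collections import Counter
from typing import Dict, List

NUM_PARTITIONS = 64  # assumed value for the example


def partition_for(key: str) -> int:
    # Hash the key and reduce the digest to a partition ID.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def assign_partitions(nodes: List[str]) -> Dict[int, str]:
    # Spread partitions across nodes round-robin; every node is a peer.
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for(key: str, table: Dict[int, str]) -> str:
    # Route a key to whichever node currently owns its partition.
    return table[partition_for(key)]


three_nodes = assign_partitions(["node-a", "node-b", "node-c"])
four_nodes = assign_partitions(["node-a", "node-b", "node-c", "node-d"])

keys = [f"user:{i}" for i in range(1000)]
print(Counter(node_for(k, three_nodes) for k in keys))
print(Counter(node_for(k, four_nodes) for k in keys))
```

Because partition_for depends only on the key, adding a node changes which node owns a partition but never which partition a key belongs to, so rebalancing can move whole partitions rather than rehashing individual records.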

Scripting in Aerospike

Like Redis, Aerospike allows developers to write Lua scripts, or UDFs (user-defined functions), that run inside the Aerospike engine. You can use UDFs to read or alter records, but it’s best to use them to perform high-speed, read-only, map-reduce operations across collections, or “streams,” of records on multiple nodes.
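The shape of such a map-reduce pass can be sketched in Python. Aerospike UDFs themselves are written in Lua, so this is only an illustration of the computation, not Aerospike code; the record fields, node names, and function names are all made up for the example.

```python
from functools import reduce
from typing import Dict, Iterable, List

# Hypothetical records, grouped by the node that holds them.
records_by_node: Dict[str, List[dict]] = {
    "node-a": [{"city": "Oslo", "sales": 3}, {"city": "Lima", "sales": 5}],
    "node-b": [{"city": "Oslo", "sales": 4}],
}


def map_node(stream: Iterable[dict]) -> Dict[str, int]:
    # Runs once per node: a read-only pass that totals sales per city.
    totals: Dict[str, int] = {}
    for rec in stream:
        totals[rec["city"]] = totals.get(rec["city"], 0) + rec["sales"]
    return totals


def merge(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    # Reduce step: combine two partial results into one.
    out = dict(a)
    for city, n in b.items():
        out[city] = out.get(city, 0) + n
    return out


partials = [map_node(stream) for stream in records_by_node.values()]
result = reduce(merge, partials, {})
print(result)
```

The per-node step never modifies a record, which is what makes it safe to run on every node in parallel before the partial results are merged.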

Where to get Aerospike

Aerospike’s community edition can be downloaded directly from Aerospike’s website. This includes server editions for Linux, desktop versions for Apple’s macOS and Microsoft’s Windows, cloud editions for Amazon EC2, Azure, and Google Compute Engine, and Docker containers. The enterprise edition of Aerospike is available via Aerospike’s Quick Start program, which provides an unlimited 90-day trial version.

The source code is available on GitHub.

Hazelcast IMDG key-value NoSQL database in depth

Hazelcast comes billed as an “in-memory data grid,” essentially a way to pool RAM and CPU resources across multiple machines to allow data sets to be distributed across those machines and manipulated in-memory.

NoSQL databases offer key-value, graph, or document features. Hazelcast concentrates on key-value functionality, emphasizing speedy access to distributed data. According to its makers, it can also be used as an alternative to products like Pivotal GemFire, Software AG Terracotta, and Oracle Coherence.

Hazelcast can be run as a distributed service or be embedded directly inside a Java application. Clients are available for Java, Scala, .NET, C/C++, Python, and Node.js, and one for Go is in the works.

Features unique to Hazelcast

Hazelcast is built with Java and has a Java-centric ecosystem. Each node in a Hazelcast cluster runs an instance of Hazelcast’s core library, IMDG, on the JVM. How Hazelcast works with data is also closely mapped to Java’s language structures. Java’s Map interface, for example, is used by Hazelcast to provide key-value storage. As with Memcached, nothing is written to disk; everything is kept in-memory at all times.

One benefit Hazelcast can provide in a distributed environment is “near cache,” where commonly requested objects are migrated to the server making the requests. This way, the requests can be performed directly in-memory on the same system, without requiring a round trip across the network.
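The near-cache idea can be sketched without Hazelcast at all. In the Python example below, RemoteStore stands in for a cluster member on another machine and counts simulated round trips; both class names are hypothetical, and a real near cache also needs invalidation when the remote value changes, which this sketch leaves out.

```python
from typing import Any, Dict


class RemoteStore:
    """Stand-in for a remote cluster member; counts simulated round trips."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data
        self.round_trips = 0

    def get(self, key: str) -> Any:
        self.round_trips += 1  # each call represents one network hop
        return self._data.get(key)


class NearCache:
    """Keeps local copies of values already fetched from the remote store."""

    def __init__(self, remote: RemoteStore) -> None:
        self._remote = remote
        self._local: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        if key not in self._local:
            # Miss: fetch from the remote member once and keep a copy.
            self._local[key] = self._remote.get(key)
        # Hit: served from local memory, no network involved.
        return self._local[key]


remote = RemoteStore({"config:theme": "dark"})
near = NearCache(remote)
for _ in range(5):
    near.get("config:theme")
print(remote.round_trips)
```

Five reads cost a single simulated network hop, which is the whole appeal for read-heavy data that rarely changes.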

Aside from key-value pairs, you can store and distribute many other kinds of data structures through Hazelcast. Some are simple implementations of Java objects, like Map. Others are specific to Hazelcast. MultiMap, for example, is a variant on key-value storage that can store multiple values under the same key. These features make it possible to emulate some behaviors of other NoSQL systems, such as organizing data into documents, but the emphasis is on structures that allow data to be distributed and accessed quickly.
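For readers unfamiliar with the multimap concept, a single-process Python equivalent is a dictionary whose values are lists. This sketch shows only the data shape; it is not Hazelcast's MultiMap API, and nothing here is distributed.

```python
from collections import defaultdict
from typing import DefaultDict, List

# Hypothetical stand-in for a multimap: each key maps to a list of values.
multimap: DefaultDict[str, List[str]] = defaultdict(list)

multimap["fruit"].append("apple")
multimap["fruit"].append("pear")   # second value under the same key
multimap["veg"].append("kale")

print(multimap["fruit"])
print(len(multimap["veg"]))
```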

How Hazelcast handles clustering

Hazelcast clusters have no master/slave setup; everything is peer-to-peer. Data is automatically sharded and distributed across all members of the cluster. You can also designate certain cluster members as “lite,” which hold no data at first but can later be promoted to full members. This lets some nodes be used strictly for computation, or to distribute data gradually through a cluster while it’s being brought online.

Hazelcast can also ensure that operations proceed only if at least a certain number of nodes are online. However, you have to configure this behavior manually, and it works only for certain data structures. As of Hazelcast Version 3.9, you can reconfigure data structures across a cluster without having to first take it offline.
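The minimum-members rule amounts to a guard that runs before each operation. The following Python sketch shows the logic in isolation; QuorumGuard, QuorumError, and guarded_put are hypothetical names, and the count of online members is passed in by hand rather than discovered from a real cluster.

```python
from typing import Any, Dict


class QuorumError(RuntimeError):
    """Raised when too few cluster members are online."""


class QuorumGuard:
    def __init__(self, minimum_members: int) -> None:
        self.minimum_members = minimum_members

    def check(self, online_members: int) -> None:
        # Refuse the operation unless enough members are reachable.
        if online_members < self.minimum_members:
            raise QuorumError(
                f"need {self.minimum_members} members, have {online_members}"
            )


def guarded_put(store: Dict[str, Any], guard: QuorumGuard, online: int,
                key: str, value: Any) -> None:
    guard.check(online)   # raises before any data is touched
    store[key] = value


data: Dict[str, Any] = {}
guard = QuorumGuard(minimum_members=3)
guarded_put(data, guard, online=3, key="a", value=1)      # succeeds
try:
    guarded_put(data, guard, online=2, key="b", value=2)  # rejected
except QuorumError as exc:
    print("rejected:", exc)
```

Rejecting writes when too few members are reachable trades availability for protection against a partitioned cluster accepting conflicting updates.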

Where to get Hazelcast

Hazelcast is available for download directly from the Hazelcast site. It is typically deployed as a collection of Java .JAR files. Docker images are also available at the official Docker registry.

You can download the enterprise edition of Hazelcast directly from Hazelcast. You can also get a 30-day free trial key for Hazelcast.

Memcached key-value NoSQL database in depth

Memcached is about as basic and fast as key-value storage gets. Originally written as an acceleration layer for the blogging platform LiveJournal, Memcached has since become a ubiquitous component of web technology stacks. If you have many small fragments of data that can be associated with a simple key and don’t need to be replicated between cache instances, Memcached is the right tool.

Features unique to Memcached

Memcached is most commonly used for caching queries from a database and keeping the results exclusively in memory. In that respect, it’s unlike many other NoSQL databases, key-value or otherwise, since they store data in some persistent form. 

Memcached does not back its data store to anything. All keys are held only in memory, so they evaporate whenever the Memcached instance or the server hosting it is reset. Thus, Memcached can’t really be used as a substitute for a NoSQL database.

What it can be used for, though, is a high-speed way to stash commonly used data that might take orders of magnitude more time to query from a source.

Any data that can be serialized to a binary stream can be stashed in Memcached. Values can be set to expire after a certain length of time, or an application can remove them on demand by key. The amount of memory you devote to any given instance of Memcached is entirely up to you, and multiple servers can run Memcached side by side to spread out the load. Furthermore, Memcached scales linearly with the number of cores available in a system because it is a multithreaded application.
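The usual way to put such a cache to work is the cache-aside pattern: check the cache first, and run the slow query only on a miss. The Python sketch below uses an in-process dictionary with expiry times in place of a real Memcached server; TTLCache, cached_query, and slow_query are hypothetical names, and the 60-second default is an arbitrary choice for the example.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class TTLCache:
    """Hypothetical in-memory cache with per-entry expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl: float) -> None:
        # Store the value together with the time at which it expires.
        self._entries[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave as a miss
            return None
        return value


def cached_query(cache: TTLCache, key: str,
                 slow_query: Callable[[], Any], ttl: float = 60.0) -> Any:
    value = cache.get(key)
    if value is None:            # miss: run the expensive query once
        value = slow_query()
        cache.set(key, value, ttl)
    return value


calls: List[int] = []


def slow_query() -> List[str]:
    calls.append(1)              # track how often the source is hit
    return ["row1", "row2"]


query_cache = TTLCache()
cached_query(query_cache, "report:daily", slow_query)
cached_query(query_cache, "report:daily", slow_query)
print(len(calls))
```

Note that, as with Memcached itself, nothing here survives a restart: the cache is an accelerator in front of the real data source, not a replacement for it.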

Most popular programming languages have client libraries for Memcached. For example, libmemcached allows C and C++ programs to work directly with Memcached instances. It also lets Memcached be embedded in C programs.

How Memcached handles clustering

Even though you can run multiple instances of Memcached, whether on the same server or on multiple nodes across a network, there is no automatic federation or synchronization of data among instances. The data inserted into a Memcached instance is available only from that instance, period.
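Since the instances never talk to one another, any spreading of keys across them has to happen in the client. A minimal sketch of that idea in Python follows, with plain dictionaries standing in for independent instances. ShardedClient and the server names are hypothetical, and the modulo hashing shown is the simplest possible scheme; clients often use consistent hashing instead so that adding a server remaps fewer keys.

```python
import zlib
from typing import Dict, List, Optional


class ShardedClient:
    """Hypothetical client that picks one independent instance per key."""

    def __init__(self, server_names: List[str]) -> None:
        self._names = server_names
        # Each dict stands in for a separate, unsynchronized cache instance.
        self.servers: Dict[str, Dict[str, bytes]] = {n: {} for n in server_names}

    def _pick(self, key: str) -> str:
        # Hash the key and map it onto one of the servers.
        return self._names[zlib.crc32(key.encode("utf-8")) % len(self._names)]

    def set(self, key: str, value: bytes) -> None:
        self.servers[self._pick(key)][key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self.servers[self._pick(key)].get(key)


client = ShardedClient(["cache-1", "cache-2", "cache-3"])
client.set("user:7", b"alice")

# Exactly one instance holds the value; the others know nothing about it.
holders = [name for name, data in client.servers.items() if "user:7" in data]
print(holders)
```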

Where to get Memcached

Memcached’s source code is available for download from GitHub and from the official Memcached site. Linux binaries are available in the repositories for most Linux distributions. Windows users can build it directly from source; some unofficial binaries have been built in the past but do not appear to be reliably available.

Microsoft Azure Cosmos DB key-value NoSQL database in depth

Most databases have one overarching paradigm: document store, key-value store, wide column store, graph database, and so on. Not so Azure Cosmos DB. Derived from Microsoft’s NoSQL database as a service, DocumentDB, Cosmos DB is Microsoft’s attempt to create a single database that can use multiple paradigms.

Features unique to Azure Cosmos DB

Cosmos DB uses what’s called an atom-record-sequence storage system to support different data models. Atoms are primitive types such as strings, integers, and Boolean values. Records are collections of atoms, like structs in C. Sequences are arrays of either atoms or records.

Cosmos DB uses these building blocks to replicate the behavior of multiple database types. It can reproduce the behavior of tables found in conventional relational databases. But it can also reproduce the functionality of data types found in NoSQL systems—schemaless JSON documents (DocumentDB and MongoDB) and graphs (Gremlin, Apache TinkerPop).

Table storage is how Cosmos DB provides its key-value functionality. When you query a table, you use a pair of keys, a partition key and a row key, to retrieve data. You can think of partition keys as bucket or table references, while row keys are used to retrieve the row with the data. The row can have multiple data values, but there’s nothing that says you can’t create a table with only one type of data stored in any particular row. You can retrieve data via .NET code or a REST API call.
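The two-level addressing can be pictured as a dictionary of dictionaries. This Python sketch models only the lookup shape, not the Cosmos DB SDK or REST API; the helper names, tenant names, and row contents are assumptions for the example.

```python
from typing import Any, Dict, Optional

# Hypothetical table: partition key -> row key -> row (a dict of properties).
Table = Dict[str, Dict[str, Dict[str, Any]]]


def upsert(table: Table, partition_key: str, row_key: str,
           row: Dict[str, Any]) -> None:
    # Create the partition on first use, then insert or replace the row.
    table.setdefault(partition_key, {})[row_key] = row


def lookup(table: Table, partition_key: str,
           row_key: str) -> Optional[Dict[str, Any]]:
    # Both keys are needed to address exactly one row.
    return table.get(partition_key, {}).get(row_key)


settings: Table = {}
upsert(settings, "tenant-a", "theme", {"value": "dark"})
upsert(settings, "tenant-b", "theme", {"value": "light"})
print(lookup(settings, "tenant-a", "theme"))
```

Storing a single property per row, as here, is what turns the table into a plain key-value store.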

How Azure Cosmos DB handles replication and clustering

Cosmos DB also offers global reach. Data stored in Cosmos DB can be automatically replicated throughout all 36 regions of the Azure cloud. You can also specify one of five consistency levels for reads or queries, depending on the needs of your application. If you want the lowest possible latency for reads at the expense of consistency, choose the eventual consistency model. If you want strong consistency, you can have it, but at the cost of your data being confined to a single Azure region. The three other options strike different balances between these poles.

Where to get Azure Cosmos DB

Azure Cosmos DB is only available as a service in the Microsoft Azure cloud. It is not available as an on-premises offering.

Redis key-value NoSQL database in depth

If Memcached doesn’t offer enough functionality, consider Redis. Redis starts with the same basic idea behind Memcached—an in-memory key-value data store—but takes it further. Redis can not only store and manipulate more complex data structures than just simple binary blobs, but it also supports on-disk persistence. Thus, Redis can serve as a full-fledged database, instead of just a cache or a quick-and-dirty dumping ground for data.

Data types and data structures in Redis

The creators of Redis call it a “data structures server.” The most basic data structure in Redis is a string, and you can use Redis to stash nothing but strings if that’s all you need.

But Redis can also store data elements inside larger collections—lists, sets, hashes, and more sophisticated structures. This is not quite the same as the concept of documents as found in other NoSQL systems, but it serves some of the same need for ways to gang data together in containers.

Applications interact with Redis in much the same way as they do with Memcached: Take a key, associate it with a certain chunk of data, and use the key to obtain the data. Any binary sequence can be used as a key, up to 512MB in size, although shorter is better. Keys can have time-to-live values or be evicted according to least-recently-used rules.
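Least-recently-used eviction is easy to picture with a small Python sketch built on OrderedDict. This is an exact, single-process LRU written for illustration; it is not how Redis implements eviction internally (Redis documents its LRU as an approximation), and LRUStore is a hypothetical name.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUStore:
    """Hypothetical store that evicts the least recently used key when full."""

    def __init__(self, max_keys: int) -> None:
        self.max_keys = max_keys
        self._data: "OrderedDict[str, Any]" = OrderedDict()

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)          # mark as most recently used
        if len(self._data) > self.max_keys:
            self._data.popitem(last=False)   # evict least recently used

    def get(self, key: str) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # a read also counts as a use
        return self._data[key]


lru = LRUStore(max_keys=2)
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")        # "a" is now more recent than "b"
lru.set("c", 3)     # store is full, so "b" is evicted
print(lru.get("b"))
```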

Keys are inherently freeform and have no implicit schema associated with them. If you want to enforce a schema for how keys are constructed, such as an object:type:thing naming convention, you must implement it in your application. Redis will not do it for you.
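Enforcing such a convention in the application can be as simple as funneling every key through one helper. The Python sketch below is one possible approach; the allowed character set and the function name make_key are assumptions for the example, not anything Redis prescribes.

```python
import re

# Assumed convention: exactly three lowercase segments separated by colons.
KEY_PATTERN = re.compile(r"^[a-z0-9_-]+:[a-z0-9_-]+:[a-z0-9_-]+$")


def make_key(obj: str, kind: str, thing: str) -> str:
    # Build the key, then reject anything that breaks the convention.
    key = f"{obj}:{kind}:{thing}"
    if not KEY_PATTERN.match(key):
        raise ValueError(f"key does not follow object:type:thing: {key!r}")
    return key


print(make_key("user", "profile", "1001"))

try:
    make_key("user", "pro:file", "1001")   # stray colon adds a fourth segment
except ValueError as exc:
    print("rejected:", exc)
```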

To do more complex things with the data, you can draw on Redis’s specialized data types. These are more akin to the data types found in programming languages than those found in other databases, with each type suited to different use cases.

Consider the Redis list, which is a collection of string elements organized using the same kind of linked-list structure found in Java. Redis lists are great for things like stacks or lists of elements to be read in a fixed order, because adding or removing elements to or from the head or tail of the list takes the same amount of time regardless of the list size. However, if you want random access to items, you’re better off using a Redis sorted set.
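Python's collections.deque has a similar performance profile and makes a convenient stand-in for seeing the trade-off. The sketch below runs entirely in-process; the comments name the Redis list commands (LPUSH, RPUSH, LPOP, RPOP) that play the corresponding roles, as an analogy only.

```python
from collections import deque

tasks: deque = deque()

tasks.appendleft("a")    # push onto the head (compare LPUSH)
tasks.appendleft("b")
tasks.append("c")        # push onto the tail (compare RPUSH)

first = tasks.popleft()  # pop from the head (compare LPOP)
last = tasks.pop()       # pop from the tail (compare RPOP)

print(first, last, list(tasks))
```

Operations at either end take constant time, while indexing into the middle of a long deque gets slower as it grows, which mirrors why a list is a poor fit for random access.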

Transactions, caching, scripting, and custom behaviors in Redis

Redis provides the ability to queue and execute operations atomically in the form of a transaction. Unlike transactions in other databases, Redis transactions don’t automatically roll back if a command in a transaction fails. Redis’s creators rationalize this by claiming that commands only fail due to programming errors, not conditions within Redis itself.
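The practical consequence of "no rollback" is easiest to see in a small simulation. In this Python sketch, a plain dictionary stands in for the data store, commands are queued as functions, and a failing command is recorded without undoing the ones around it. QueuedTransaction and the command functions are hypothetical and do not use any Redis client library.

```python
from typing import Any, Callable, Dict, List

Command = Callable[[Dict[str, Any]], Any]


class QueuedTransaction:
    """Hypothetical queue-then-execute transaction with no rollback."""

    def __init__(self, store: Dict[str, Any]) -> None:
        self._store = store
        self._queue: List[Command] = []

    def queue(self, command: Command) -> None:
        self._queue.append(command)   # nothing runs until execute()

    def execute(self) -> List[Any]:
        results: List[Any] = []
        for command in self._queue:
            try:
                results.append(command(self._store))
            except Exception as exc:
                results.append(exc)   # record the failure and keep going
        self._queue.clear()
        return results


def set_counter(s: Dict[str, Any]) -> str:
    s["counter"] = 1
    return "OK"


def bad_increment(s: Dict[str, Any]) -> str:
    s["name"] += 1                    # wrong type: adding an int to a str
    return "OK"


def set_flag(s: Dict[str, Any]) -> str:
    s["flag"] = True
    return "OK"


db: Dict[str, Any] = {"name": "redis"}
tx = QueuedTransaction(db)
for cmd in (set_counter, bad_increment, set_flag):
    tx.queue(cmd)
outcome = tx.execute()
print(outcome)
print(db)
```

The middle command fails, yet the writes before and after it both stick, so the application has to inspect the per-command results rather than assume all-or-nothing behavior.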

As a cache layer in front of other applications, Redis offers more flexibility than Memcached, starting with a variety of cache eviction policies to manage the data. Aside from a simple time-to-live policy, Redis also lets you do things like remove keys at random or give preference to removing keys with shorter time-to-live so that newer data can be added more efficiently. The number of choices can be confusing at first, but the recommended default works for the vast majority of use cases—and you can always change eviction policies on the fly, programmatically.

Redis includes an interpreter for the Lua language to run batch operations on Redis. You can think of Lua scripts as Redis’s version of stored procedures—a way to accomplish tasks that are slightly too complicated for Redis alone but that don’t need a full-blown application. But be warned that Lua scripts, when running, constitute blocking operations on the Redis instance. Nothing else can happen while a Lua script is executing.

Redis 4, which arrived in 2017, introduced a modules system, giving developers a way to add custom data structures and functionality to Redis. Functions that can be added via modules include JSON data types (another nod towards conventional NoSQL behavior), trainable neural networks, and full-text search functionality. It’s always worth exploring whether any functionality in your application could be offloaded to Redis, where the work can be performed closer to the data.

Where to get Redis

You can download the Redis source code directly from the official Redis site. Most Linux package managers can install Redis binaries for that edition of Linux. Microsoft has its own fork of Redis for creating Windows binaries.


Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
