Serdar Yegulalp
Senior Writer

What’s new in Prometheus monitoring for Docker and Kubernetes

Nov 8, 2017 (3 mins)

Prometheus 2.0 container monitoring system arrives with a more efficient time-series data storage format, better handling of stale event data, and snapshot-based database backups

Prometheus, the open source monitoring system for Docker-style containers running in cloud architectures, has formally released a 2.0 version with major architectural changes to improve its performance.

Among the changes that have landed since the release of version 1.6 earlier this year:

  • An entirely new storage format for the data accumulated by Prometheus.
  • A new way for Prometheus to handle “staleness,” that is, problems that arise when the data Prometheus reports no longer matches the actual state of the cluster.
  • A method for taking efficient snapshot backups of the entire database.

Most of the changes shouldn’t force experienced Prometheus users to retool their environments. The new features are meant to work under the hood, without significantly altering workflow, although there are a few breaking changes (documented here).

New in Prometheus 2.0: More efficient time-series database storage format

Under the hood, Prometheus is a time-series database—a system for gathering statistics about running containers and storing them in a way that’s indexed by timestamps. Because time-series data arrives at high speed and from many sources, it’s hard to aggregate properly. Writing the data to disk becomes a major bottleneck.

Prometheus 2.0 addresses this by partitioning the data by ranges of time, rather than by data source. The result is far less CPU and disk usage, more manageable latency for queries, and a better mechanism for mopping up data that isn’t needed anymore.
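To make the idea of partitioning by time range concrete, here is a toy sketch in Python. It is not Prometheus's storage engine; the class name, the method names, and the two-hour block width are assumptions chosen for illustration. It shows why grouping samples into time blocks makes range queries and cleanup cheap: a query only touches the blocks that overlap its time window, and expired data is removed by dropping whole blocks.

```python
from collections import defaultdict

# Assumed block width for this sketch: two hours, in seconds.
BLOCK_SECONDS = 2 * 60 * 60


def block_start(ts):
    """Return the start timestamp of the block that contains ts."""
    return ts - ts % BLOCK_SECONDS


class ToyTimePartitionedStore:
    """Hypothetical in-memory store: block start -> series name -> samples."""

    def __init__(self):
        self.blocks = defaultdict(lambda: defaultdict(list))

    def append(self, series, ts, value):
        # Samples from every source land in the block for their time range.
        self.blocks[block_start(ts)][series].append((ts, value))

    def query(self, series, start, end):
        # Only blocks overlapping [start, end] are inspected.
        out = []
        for bstart in sorted(self.blocks):
            if bstart + BLOCK_SECONDS <= start or bstart > end:
                continue
            for ts, value in self.blocks[bstart].get(series, []):
                if start <= ts <= end:
                    out.append((ts, value))
        return out

    def drop_before(self, cutoff):
        # Retention: delete every block that ends at or before the cutoff.
        expired = [b for b in self.blocks if b + BLOCK_SECONDS <= cutoff]
        for b in expired:
            del self.blocks[b]
        return len(expired)
```

In this sketch, retention never has to rewrite per-source data: old samples disappear when their block is deleted.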

Again, the vast majority of Prometheus deployments won’t need to do anything to leverage these improvements, other than deploy Prometheus 2.0.

New in Prometheus 2.0: Better handling of stale data from containers

Another problem Prometheus users have observed is that the system has trouble handling stale data. For instance, users sometimes get bombarded with alerts about a service being down, even after that service has already come back up. Similarly, if a resource disappears from monitoring and then reappears within a certain timeframe, it can end up being counted twice and produce misleading statistics.

Prometheus 2.0 deals with this by having more explicit rules for handling events from sources that have gone stale. The logic for handling this is surprisingly complex (see this slide deck for details), but the end user doesn’t have to deal with the vast majority of the details.
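One way to picture explicit staleness rules is a marker written into a series when its source disappears. The Python sketch below is a simplified illustration under stated assumptions, not Prometheus's actual logic: the `STALE` sentinel, the `instant_value` function, and the five-minute lookback window are all hypothetical choices for this example. Without a marker, a query falls back on the last sample inside the lookback window, so a vanished source still appears to report data; with a marker, the query returns nothing as soon as the source is known to be gone.

```python
# Hypothetical sentinel that marks "this series stopped reporting here".
STALE = object()

# Assumed lookback window for this sketch: five minutes, in seconds.
LOOKBACK_SECONDS = 300


def instant_value(samples, at, lookback=LOOKBACK_SECONDS):
    """Return the series value at time `at`, or None if there is none.

    `samples` is a list of (timestamp, value) pairs; a value may be STALE.
    """
    latest = None
    for ts, value in samples:
        # Consider only samples inside the lookback window ending at `at`.
        if at - lookback < ts <= at:
            if latest is None or ts >= latest[0]:
                latest = (ts, value)
    # No recent sample, or the most recent one is a staleness marker.
    if latest is None or latest[1] is STALE:
        return None
    return latest[1]


# A source reports at t=0 and t=15, then disappears.
without_marker = [(0, 1.0), (15, 1.0)]
with_marker = without_marker + [(30, STALE)]

print(instant_value(without_marker, 60))  # old sample is still returned
print(instant_value(with_marker, 60))     # marker hides the old sample
```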

New in Prometheus 2.0: Full database snapshot backups

The new storage engine in Prometheus 2.0 makes it possible to take efficient point-in-time snapshots of the database. Triggering a snapshot is as simple as hitting a specific Prometheus API endpoint.

According to Prometheus developer Fabian Reinartz, those snapshots are small—a fractional percentage of the size of the whole database—and can be copied somewhere for safekeeping. “On disk failure or other scenarios, new Prometheus servers can be started with the snapshot backup with minimal data loss,” says Reinartz.
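As a rough sketch of what hitting the snapshot endpoint can look like, the Python below sends an empty POST request and reads back the snapshot name. Several things here are assumptions rather than details from this article: the path `/api/v1/admin/tsdb/snapshot` and the response shape follow the admin API as documented for later 2.x releases (the exact path in 2.0 itself may differ), the server is assumed to be started with `--web.enable-admin-api`, and the base URL `http://localhost:9090` and the function names are illustrative.

```python
import json
import urllib.request

# Assumed path, per the TSDB admin API docs for later 2.x releases.
SNAPSHOT_PATH = "/api/v1/admin/tsdb/snapshot"


def build_snapshot_request(base_url):
    """Build the POST request that asks the server to take a snapshot."""
    url = base_url.rstrip("/") + SNAPSHOT_PATH
    return urllib.request.Request(url, data=b"", method="POST")


def parse_snapshot_response(body):
    """Extract the snapshot directory name from the JSON response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError("snapshot request failed: %r" % (payload,))
    return payload["data"]["name"]


def take_snapshot(base_url):
    """Trigger a snapshot and return its name (needs a running server)."""
    request = build_snapshot_request(base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_snapshot_response(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumes a local server with the admin API enabled.
    print(take_snapshot("http://localhost:9090"))
```

Under these assumptions, the returned name identifies a directory under `snapshots/` in the server's data directory, which can then be copied elsewhere for safekeeping.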

Prometheus download

Precompiled binaries and Docker images are available for download from the official Prometheus project page. Source code for the project, and all its related subprojects, is available on GitHub.

Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
