Operations costs are the Achilles’ heel of NoSQL

analysis
Oct 27, 2015, 4 mins

NoSQL vendors are under pressure to reduce maintenance costs

NoSQL databases scale by adding more commodity servers. With more commodity servers come increased costs and complexities. Some NoSQL systems handle this better than others and need fewer servers for the same workload.

Consider the size of the Apple Cassandra installation, reported at 75,000 nodes and over 10 petabytes of data. The complexity of the operations, monitoring, upgrades and other maintenance tasks must be overwhelming. Apple bought FoundationDB to cut its own costs while improving performance. Julie Bort writes:

While both Cassandra/DataStax and FoundationDB are NoSQL databases, FoundationDB had some unique technology. It works super-fast but needs far less hardware than Cassandra, making it even cheaper to use, even as it scales. (In geek speak, it’s an “in-memory” database that runs on flash storage.)

Goldmacher says it needs somewhere between 5% to 10% less hardware than Cassandra.

At Apple’s scale, 10% of 75,000 is 7,500 nodes, and that is not something to ignore. The most popular post on my blog is my article on how I’d like to replace Cassandra with DynamoDB in the AWS environment. The long-term costs of operating Cassandra are on the minds of Cassandra adopters.
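To put the quoted range in perspective, here is a quick back-of-the-envelope sketch. It only restates the figures above (75,000 nodes, 5% to 10% less hardware); the function name is made up for illustration.

```python
# Illustrative arithmetic only: the node count and percentages are the
# figures quoted above, not measurements.
def nodes_saved(total_nodes: int, percent_less: int) -> int:
    """Nodes no longer needed if hardware shrinks by percent_less percent."""
    return total_nodes * percent_less // 100

for percent in (5, 10):
    print(f"{percent}% of 75,000 nodes = {nodes_saved(75_000, percent):,} nodes")
```

Even at the low end of the range, that is 3,750 fewer servers to monitor, patch and upgrade.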

MongoDB is under pressure from customers to reduce operations costs as well. Viber migrated their MongoDB cluster to Couchbase, cutting the number of AWS EC2 instances in half. At Viber’s scale that is not a small number.

Companies interested in adopting NoSQL should consider their options carefully. The vast majority of database use cases do not need massive horizontal scalability. Most applications could be better off with traditional SQL databases. In the cloud, there are NoSQL alternatives that cost less and are easier to maintain. Let’s review just a few examples.

AWS RDS for PostgreSQL

PostgreSQL has offered MongoDB-like NoSQL capabilities since version 9.3. That includes hierarchical document data and the ability to index JSON documents, on top of the ACID guarantees of a relational database. The AWS RDS service for PostgreSQL offers high availability, redundancy and fail-over. Being a managed service, it requires very little attention. Many tasks such as backups and fail-over are fully automated. A rich management API and monitoring tools allow customization of scaling behavior.
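As a rough sketch of what indexing JSON documents can look like, the statements below store a document in a `json` column and index one field with an expression index, using the `->>` operator that is available in 9.3. The table, index and field names (`orders`, `orders_customer_idx`, `customer_id`, `total`) are hypothetical, and `find_orders` assumes an already open psycopg2 connection; treat it as an illustration rather than a recommended schema.

```python
# Hypothetical table and field names, for illustration only.
CREATE_TABLE = """
CREATE TABLE orders (
    id   serial PRIMARY KEY,
    body json NOT NULL
)
"""

# Expression index on a single field extracted from the JSON document.
CREATE_INDEX = """
CREATE INDEX orders_customer_idx ON orders ((body->>'customer_id'))
"""

# The WHERE clause uses the same expression as the index, so the planner
# can use the index for this lookup.
FIND_BY_CUSTOMER = """
SELECT id, body->>'total' AS total
FROM orders
WHERE body->>'customer_id' = %s
"""


def find_orders(conn, customer_id):
    """Run the lookup on an open psycopg2 connection (assumed, not created here)."""
    with conn.cursor() as cur:
        cur.execute(FIND_BY_CUSTOMER, (customer_id,))
        return cur.fetchall()
```

The document stays schemaless, while the fields you actually query get ordinary index support and the writes remain transactional.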

Redis

As John Martin of Computerworld wrote, “When it comes to storage, cache is king.” Azure, AWS and Google offer managed cache services. AWS ElastiCache in particular offers a choice of Memcached and Redis. Redis is an interesting alternative to NoSQL since its low-level data model is similar to that of Cassandra for some use cases. A Redis database has to fit entirely in memory, but it can be persisted to disk and recovered upon reboot. Redis can be configured in clusters for high availability and performance. On master failure, one of the slaves becomes the new master.

AWS DynamoDB and Google Bigtable

AWS DynamoDB and Google Bigtable offer a data model similar to Cassandra’s, as well as virtually unlimited scalability. Neither service requires any administration or devops. One has to be on the lookout for burst performance, however, because a managed service throttles requests that exceed the capacity you have paid for. Burst capacity is one area where a custom-configured NoSQL database can shine.
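To see why bursts deserve attention, consider a deliberately simplified token-bucket model of provisioned throughput. It is a sketch under assumptions, not a description of how any service is implemented: unused capacity accumulates for up to `burst_window` seconds (300 here, which mirrors the figure AWS documents for DynamoDB, though the service does not guarantee that the reserve is always available), and anything beyond provisioned plus banked capacity is throttled. The function and parameter names are made up for illustration.

```python
def simulate(provisioned, demand, burst_window=300):
    """Return the number of throttled capacity units for each second.

    provisioned  -- capacity units available per second
    demand       -- requested capacity units, one entry per second
    burst_window -- seconds of unused capacity that may be banked (assumption)
    """
    bucket_max = provisioned * burst_window
    bucket = 0          # banked, unused capacity
    throttled = []
    for want in demand:
        available = provisioned + bucket
        served = min(want, available)
        throttled.append(want - served)
        # Whatever was not used this second is banked, up to the cap.
        bucket = min(bucket_max, available - served)
    return throttled


# Ten quiet seconds at 20 units, then a three-second spike at 500 units,
# against 100 provisioned units per second.
result = simulate(100, [20] * 10 + [500] * 3)
print(result)
```

In this model the banked capacity absorbs the first two seconds of the spike, and the third second sees 400 units throttled. A longer quiet period would bank more, but a sustained spike always runs into the provisioned limit eventually.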

Object storage

An object storage service like AWS S3 is a long-term, effectively unbounded key/value store. As a cornerstone of AWS, S3 can integrate with CloudFront, Redshift and many other AWS services. It scales horizontally with no capacity planning on your part and can store JSON and binary documents as well as logs. S3 is also ridiculously cheap and can be used to store terabytes of data.

Final thoughts

Companies should weigh not only the technical merits of NoSQL technology but also the long-term costs of operating it. Development teams that choose the right tool for the right job will always win.

Oleg Dulin is a Big Data software engineer and consultant in the New York City area.

In 1997 Oleg co-founded Clarkson University Linux Users Group. This group was influential in bringing awareness of open source to Clarkson, and later morphed into what is now a dedicated lab and curriculum called Clarkson Open Source Institute. While at Clarkson, Oleg advocated on behalf of open source, Linux and the community, and helped with construction of Clarkson’s first open-source high-performance computing cluster called “The North Country.”

While at IBM T. J. Watson Research Center in 1999-2000 Oleg co-authored a paper on federated information systems that was presented at the Engineering of Federated Information Systems (EFIS) conference in 2000. This R&D project involved building a proof-of-concept federated IS that integrated structured (SQL) and unstructured (multi-media) data under a single set of APIs and user interfaces.

From 2001 to 2003 Oleg worked as a data integration consultant at a major investment bank in NYC on a web portal for private banking. This project involved aggregation of secure financial data from multiple legacy databases and presenting it in a customizable web portal.

In 2004, while working at a startup called ConfigureCode, Oleg contributed to two patent applications involving construction and semantic validation of mixed-schema XML documents. This technology was utilized in a Data Capture and Tracking System for Human Resources data integration.

From 2005 to 2011 Oleg worked at a Wall St. company (see Oleg’s LinkedIn Profile for more details) where he was instrumental in improving data quality, reducing trading errors, implementing analytics and reporting within the context of an equities order management system. The system was a 24/7 high performance computing platform that processed billions of dollars worth of trade executions daily.

From fall of 2011 to end of 2016, Oleg worked at Liquid Analytics as Cloud Platform Architect, where he was a thought leader in the implementation of a cloud-based PaaS for mobile Business Intelligence.

Presently, Oleg works at ADP Innovation Lab as Chief Architect.

The opinions expressed in this blog are those of Oleg Dulin and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.