Voldemort is a distributed key-value data store used at LinkedIn for high-scalability storage problems where simple functional partitioning is not sufficient.
It is named after Lord Voldemort, the fictional villain of the Harry Potter series. Voldemort combines in-memory caching with the storage system, so a separate caching tier is not needed. It scales horizontally for both reads and writes. In essence, it is a fault-tolerant distributed hash table.
- Horizontal scalability and high availability for O/R mappers such as Hibernate and ActiveRecord
- Pluggable data placement strategies that support distribution across geographically distant data centers
- Automatic data replication over a large number of servers
- Versioned data items to maintain and maximize data integrity
- Transparent failure handling
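Versioned data items are often implemented with vector clocks. The following is a minimal sketch under that assumption, showing how two versions of the same key can be ordered or recognized as conflicting. The function names, the dictionary representation, and the node IDs are hypothetical and are not Voldemort's client API.

```python
from typing import Dict

# Assumption: a version is a mapping of node id -> write counter.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: 'before', 'after', 'equal', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


v1 = increment({}, "node-a")   # first write, coordinated by node-a
v2 = increment(v1, "node-a")   # a later write on top of v1, via node-a
v3 = increment(v1, "node-b")   # a different write on top of v1, via node-b

print(compare(v1, v2))  # v2 descends from v1
print(compare(v2, v3))  # neither version descends from the other
```

When two versions are concurrent, neither can safely overwrite the other, so a store built this way would keep both and leave the reconciliation to the reader.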
Cassandra is an open source distributed database management system. It is an Apache Software Foundation project released under the Apache License, version 2.0.
It is designed to handle very large amounts of data spread across many commodity servers, in traditional or cloud environments, while providing a highly available service with no single point of failure. It is a NoSQL solution that was developed at Facebook and is now used by companies with large, active data sets, such as eBay, Twitter, Reddit, Cisco, OpenX, and Digg.
- MapReduce support
What is FlockDB?
FlockDB is an open source, fault-tolerant, distributed graph database, licensed under the Apache License, for managing data at web scale. Twitter created it to store its user graph, manage relationships, and support relationship-related analytics. It suits high-throughput, low-latency environments. FlockDB is optimized for very large adjacency lists and for fast reads and writes, but not for graph traversal operations.
In FlockDB, graphs are stored as sets of edges between nodes, and each node is identified by a 64-bit integer. Each edge is also marked with a 64-bit position, which is used for sorting. In a social graph, both node IDs are user IDs, while in a graph of favorite tweets the destination is a tweet ID.
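The edge model described above can be sketched as a small in-memory adjacency list. This is an illustration only: `Edge`, `EdgeStore`, and their methods are hypothetical names, not FlockDB's API, and the IDs are treated as unsigned 64-bit values as an assumption.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

MAX_U64 = 2**64 - 1  # assumption: IDs and positions are unsigned 64-bit values


class Edge(NamedTuple):
    source_id: int
    destination_id: int
    position: int  # sort key for the edge


class EdgeStore:
    """Toy adjacency-list store: source id -> {destination id: position}."""

    def __init__(self) -> None:
        self._forward: Dict[int, Dict[int, int]] = defaultdict(dict)

    def add(self, edge: Edge) -> None:
        for value in edge:
            if not 0 <= value <= MAX_U64:
                raise ValueError("IDs and positions must fit in 64 bits")
        self._forward[edge.source_id][edge.destination_id] = edge.position

    def destinations(self, source_id: int) -> List[int]:
        """Return the destination IDs of one source, ordered by position."""
        edges = self._forward.get(source_id, {})
        return sorted(edges, key=edges.get)


follows = EdgeStore()
follows.add(Edge(source_id=1, destination_id=20, position=300))
follows.add(Edge(source_id=1, destination_id=30, position=100))
print(follows.destinations(1))  # user 30 sorts first because its position is lower
```

Reading one node's adjacency list is a single lookup followed by a sort, which matches the stated focus on large adjacency lists. Multi-hop traversal would need repeated lookups, which is the operation the design does not optimize.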
What is Neo4j?
Neo4j is an open source property graph database implemented in Java. It stores data structured as graphs, and the graph-based model makes it agile and fast. It scales to several billion nodes and is highly available when distributed across multiple machines. It can also be embedded in an application by including the Neo4j library JARs in your build.
In high availability mode, Neo4j runs a single master and zero or more slaves. Its high availability feature can handle write requests on any machine, so there is no need to redirect them to the master. A slave handles a write by synchronizing with the master to maintain consistency. Updates then propagate from the master to the other slaves over time, so a write made through one slave may not be immediately visible on all other slaves.
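The visibility behavior described above can be illustrated with a toy model. This is not Neo4j code: `Node`, `Cluster`, `write`, and `propagate` are hypothetical names, and the model assumes a write through a slave updates the master and that slave immediately, while the remaining slaves catch up later.

```python
from typing import Dict, List


class Node:
    """One cluster member holding a copy of the data (toy model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}


class Cluster:
    def __init__(self, master: Node, slaves: List[Node]) -> None:
        self.master = master
        self.slaves = slaves

    def write(self, via: Node, key: str, value: str) -> None:
        """Accept a write on any member; the master is always updated."""
        self.master.data[key] = value
        if via is not self.master:
            via.data[key] = value  # the receiving slave stays in step with the master

    def propagate(self) -> None:
        """Later on, push the master's state to every slave."""
        for slave in self.slaves:
            slave.data.update(self.master.data)


master, slave_1, slave_2 = Node("master"), Node("slave-1"), Node("slave-2")
cluster = Cluster(master, [slave_1, slave_2])

cluster.write(via=slave_1, key="user:1", value="Ann")
print(slave_2.data.get("user:1"))  # not visible on slave-2 yet

cluster.propagate()
print(slave_2.data.get("user:1"))  # visible after propagation
```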
What is MongoDB?
MongoDB is an open source, scalable, high-performance, document-oriented database written in C++ and optimized for highly transient data. It provides a RESTful API, and a free cloud-based monitoring service is available for MongoDB deployments. It supports searches by field, by range, and by regular expression. Master-slave replication is supported: the master handles reads and writes, while slaves serve reads or are used for backups. MongoDB scales horizontally through sharding. It can also be used as an efficient file store that benefits from load balancing and data replication.
- Flexible schemas make it a good fit for document and content management systems
- Good fit alongside an RDBMS in e-commerce infrastructure
- Good fit for gaming because of its high-performance reads and writes
- Very efficient as server-side infrastructure for mobile applications
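The field, range, and regular-expression searches mentioned above are expressed in MongoDB as filter documents. The sketch below evaluates a small subset of that filter syntax (`$gte`, `$lt`, `$regex`) against in-memory dictionaries so that it runs without a server. The `matches` helper and the sample data are hypothetical and only approximate the real query semantics.

```python
import re
from typing import Any, Dict


def matches(document: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Evaluate a small subset of MongoDB-style filters against one document."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            for operator, operand in condition.items():
                if operator == "$gte":
                    ok = value is not None and value >= operand
                elif operator == "$lt":
                    ok = value is not None and value < operand
                elif operator == "$regex":
                    ok = isinstance(value, str) and re.search(operand, value) is not None
                else:
                    raise ValueError(f"unsupported operator: {operator}")
                if not ok:
                    return False
        elif value != condition:  # plain value: exact field match
            return False
    return True


players = [
    {"name": "alice", "level": 12, "game": "chess"},
    {"name": "bob", "level": 30, "game": "chess"},
    {"name": "alan", "level": 25, "game": "go"},
]

by_field = {"game": "chess"}
by_range = {"level": {"$gte": 10, "$lt": 26}}
by_regex = {"name": {"$regex": "^al"}}

for query in (by_field, by_range, by_regex):
    print([p["name"] for p in players if matches(p, query)])
```

With a driver such as PyMongo, the same filter dictionaries would be passed to a collection's `find` method instead of a local helper.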
What is Apache CouchDB?
- CouchDB provides ACID semantics by implementing a form of Multi-Version Concurrency Control (MVCC), which lets a high volume of concurrent readers and writers proceed without conflict.
- CouchDB supports bidirectional replication (or synchronization) and offline operation
- Each document has a unique URI that is exposed via HTTP. REST uses the POST, GET, PUT, and DELETE HTTP methods for the four CRUD operations (create, read, update, delete)
- It guarantees eventual consistency (a consistency model from distributed computing) so that it can provide both availability and partition tolerance.
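The revision-based concurrency control in the list above can be sketched with a toy in-memory store. Each successful write produces a new revision, and an update that is based on a stale revision is rejected instead of blocking other clients. `DocumentStore`, `Conflict`, and the revision format used here are assumptions for illustration, not CouchDB's implementation.

```python
import hashlib
import json
from typing import Any, Dict, Optional, Tuple


class Conflict(Exception):
    """Raised when an update is based on a stale revision."""


class DocumentStore:
    def __init__(self) -> None:
        # document id -> (current revision, body)
        self._docs: Dict[str, Tuple[str, Dict[str, Any]]] = {}

    def get(self, doc_id: str) -> Tuple[str, Dict[str, Any]]:
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id: str, body: Dict[str, Any], rev: Optional[str] = None) -> str:
        """Store a new revision; `rev` must name the current revision."""
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise Conflict(f"{doc_id}: expected revision {current_rev}, got {rev}")
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode()).hexdigest()
        new_rev = f"{generation}-{digest}"
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = DocumentStore()
rev1 = store.put("user:1", {"name": "Ann"})
rev2 = store.put("user:1", {"name": "Ann", "city": "Oslo"}, rev=rev1)

try:
    store.put("user:1", {"name": "Anna"}, rev=rev1)  # rev1 is no longer current
except Conflict as error:
    print("rejected:", error)
```

Over HTTP, CouchDB reports the same situation with a 409 Conflict response, and the client is expected to fetch the latest revision and retry.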
What is HBase?
HBase is modeled on Google's Bigtable. It is a column-oriented, open source, distributed NoSQL database. It runs on the Hadoop infrastructure (ZooKeeper as a lock service, the NameNode, and HDFS as the file system), so it inherits fault tolerance and scalability, and it adds random read-write capability.
Tables are distributed as regions, and regions are automatically split and redistributed as data grows. HBase scales linearly and modularly by adding RegionServers, which can be hosted in a public cloud. Regions are divided vertically by column family into stores, which are saved as files on HDFS.
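Automatic region splitting can be sketched with a toy table that keeps regions ordered by their start row key and splits a region at its middle key once it grows past a threshold. The class names, the `max_rows_per_region` threshold, and the split policy are hypothetical simplifications. Real HBase splits on store file size and also reassigns regions across RegionServers.

```python
import bisect
from typing import Dict, List


class Region:
    """A contiguous range of row keys beginning at `start_key`."""

    def __init__(self, start_key: str) -> None:
        self.start_key = start_key
        self.rows: Dict[str, str] = {}


class Table:
    def __init__(self, max_rows_per_region: int = 4) -> None:
        self.max_rows = max_rows_per_region  # hypothetical split threshold
        self.regions: List[Region] = [Region("")]  # "" sorts before every key

    def _region_index(self, row_key: str) -> int:
        """Find the last region whose start key is <= row_key."""
        starts = [region.start_key for region in self.regions]
        return bisect.bisect_right(starts, row_key) - 1

    def put(self, row_key: str, value: str) -> None:
        index = self._region_index(row_key)
        region = self.regions[index]
        region.rows[row_key] = value
        if len(region.rows) > self.max_rows:
            self._split(index)

    def _split(self, index: int) -> None:
        """Move the upper half of a region's keys into a new region."""
        region = self.regions[index]
        keys = sorted(region.rows)
        upper_keys = keys[len(keys) // 2:]
        upper = Region(upper_keys[0])
        for key in upper_keys:
            upper.rows[key] = region.rows.pop(key)
        self.regions.insert(index + 1, upper)

    def scan(self, start: str, stop: str) -> List[str]:
        """Return row keys in [start, stop), in sorted order."""
        result: List[str] = []
        for region in self.regions:
            result.extend(k for k in sorted(region.rows) if start <= k < stop)
        return result


table = Table(max_rows_per_region=4)
for n in range(1, 6):
    table.put(f"row-{n:02d}", f"value-{n}")

print([region.start_key for region in table.regions])
print(table.scan("row-02", "row-05"))
```

Because regions stay in row-key order, a scan over a key range only has to walk the regions in sequence, and new writes are routed to the right region with a binary search on the start keys.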
Potential use cases:
- Reads, supported by a single-write master
- Ordered Partitioning which supports row-scans
- Range based scans
- Batch Analysis
- Large cache
HBase lacks many features found in relational databases, such as triggers and secondary indexes.