Tag Archives: NoSQL

What is Voldemort

Voldemort is a distributed key-value data store used at LinkedIn for high-scalability storage problems where simple functional partitioning is not sufficient. It is named after Lord Voldemort, the villain of the Harry Potter novels. Voldemort combines in-memory caching with the storage system, so a separate caching tier is not needed. It scales horizontally for both reads and writes. In effect, it is a large, distributed, fault-tolerant hash table.

Features:

  • Horizontal scalability and high availability for O/R mappers such as Hibernate and ActiveRecord
  • Pluggable data placement strategies that support distribution across geographically distant data centers
  • Automatic data replication across a large number of servers
  • Versioned data items to maintain and maximize data integrity
  • Transparent failure handling
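
The "fault-tolerant hash table" idea and the replication feature above can be illustrated with a small consistent-hashing ring. This is a minimal sketch, not Voldemort's actual code or API: the names `HashRing`, `replicas`, and `partitions_per_node`, and the choice of MD5 as the hash, are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_right


def _hash(value: str) -> int:
    """Map a string to a large integer position on the ring (MD5 is an assumption)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring that assigns each key to several distinct nodes."""

    def __init__(self, nodes, partitions_per_node=8):
        # Each node owns several points on the ring to spread load evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(partitions_per_node)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, key, replication_factor=2):
        """Return the distinct nodes that should hold a copy of `key`."""
        wanted = min(replication_factor, self._node_count)
        # Start at the first ring point after the key's hash, wrapping around.
        index = bisect_right(self._points, _hash(key)) % len(self._ring)
        chosen = []
        while len(chosen) < wanted:
            node = self._ring[index % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
            index += 1
        return chosen


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.replicas("user:42", replication_factor=2))
```

Because every key lives on more than one node, a read or write can still be served if one of those nodes fails, and adding a node moves only the keys adjacent to its ring points.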

What is Cassandra

Apache Cassandra is an open source distributed database management system, developed as an Apache Software Foundation project under the Apache License (version 2.0). It is designed to handle very large amounts of data spread across many commodity servers, in a traditional environment or in the cloud, while providing a highly available service with no single point of failure. It is a NoSQL solution that was originally developed at Facebook and is now used by companies with large, active data sets, such as eBay, Twitter, Reddit, Cisco, OpenX, and Digg.

Data Model in Cassandra

 

Features

  • Scalability
  • Fault tolerance
  • MapReduce support
  • Decentralized

FlockDB Definition

What is FlockDB?

FlockDB is an open source, fault-tolerant, distributed graph database, licensed under the Apache license, for managing data at web scale. Twitter created it to manage relationships between users and to support relationship-related analytics. It is suited to high-throughput, low-latency environments. FlockDB is optimized for very large adjacency lists and for fast reads and writes, but not for graph traversal operations.

In FlockDB, a graph is stored as a set of edges between nodes, and each node is identified by a 64-bit integer. Each edge is also marked with a 64-bit position, which is used for sorting. In a social graph, the node IDs are user IDs; in a graph of favorite tweets, the destination is a tweet ID.
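
The edge layout described above can be sketched as a tiny in-memory adjacency store. This is an illustration only and does not use FlockDB's real interface: `Edge`, `AdjacencyStore`, `add_edge`, and `destinations` are hypothetical names, and treating the 64-bit values as unsigned is an assumption.

```python
from bisect import insort
from collections import defaultdict
from dataclasses import dataclass

MAX_UINT64 = 2**64 - 1  # assumption: IDs and positions are unsigned 64-bit values


@dataclass(frozen=True, order=True)
class Edge:
    # `position` comes first so that edges sort by position.
    position: int
    source_id: int
    destination_id: int


class AdjacencyStore:
    """Hypothetical store that keeps one position-sorted edge list per source node."""

    def __init__(self):
        self._forward = defaultdict(list)

    def add_edge(self, source_id, destination_id, position):
        for value in (source_id, destination_id, position):
            if not 0 <= value <= MAX_UINT64:
                raise ValueError("value does not fit in 64 bits")
        # insort keeps the adjacency list ordered by position on every insert.
        insort(self._forward[source_id], Edge(position, source_id, destination_id))

    def destinations(self, source_id, limit=None):
        """Return destination IDs for a source, ordered by edge position."""
        ids = [edge.destination_id for edge in self._forward.get(source_id, [])]
        return ids if limit is None else ids[:limit]


store = AdjacencyStore()
store.add_edge(source_id=1, destination_id=20, position=300)
store.add_edge(source_id=1, destination_id=10, position=100)
print(store.destinations(1))  # destination IDs ordered by position
```

Reading one node's adjacency list is a single lookup plus a slice, which matches the kind of query the text says FlockDB favors; multi-hop traversal would require repeated lookups.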

Neo4j – graph database

What is Neo4j

Neo4j is an open source property graph database implemented in Java. It stores data structured as graphs, and this graph-based model makes it agile and fast. It scales to several billion nodes and is highly available when distributed across multiple machines. It can also be embedded in an application by including the Neo4j library JARs in the build.

In high availability mode, a cluster has a single master and zero or more slaves. Every machine can accept write requests, so there is no need to redirect them to the master. A slave handles a write by synchronizing with the master to maintain consistency. Updates then propagate from the master to the other slaves over time, so a write made through one slave may not be immediately visible on the others.
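
The visibility behavior described above can be modeled with a toy simulation. This is not Neo4j code and uses none of its APIs: the `Master` and `Slave` classes and their methods are hypothetical, and a plain key-value log stands in for graph transactions.

```python
class Master:
    """Holds the authoritative, ordered log of committed updates."""

    def __init__(self):
        self.log = []

    def commit(self, key, value):
        self.log.append((key, value))


class Slave:
    """Applies the master's log lazily; writes are committed through the master."""

    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0  # number of log entries already applied locally

    def pull_updates(self):
        # Apply every log entry this slave has not seen yet.
        for key, value in self.master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.master.log)

    def write(self, key, value):
        # Catch up, commit on the master, then apply the new entry locally.
        self.pull_updates()
        self.master.commit(key, value)
        self.pull_updates()

    def read(self, key):
        return self.data.get(key)


master = Master()
slave_1, slave_2 = Slave(master), Slave(master)
slave_1.write("name", "Alice")
print(slave_1.read("name"))  # the writing slave sees its own write
print(slave_2.read("name"))  # the other slave has not pulled the update yet
slave_2.pull_updates()
print(slave_2.read("name"))  # now visible after catching up
```

The gap between the second and third reads is the window in which a write made through one slave is not yet visible on another.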

MongoDB definition

What is MongoDB

MongoDB is an open source, scalable, high-performance, document-oriented database written in C++ and optimized for highly transient data. It provides a RESTful API, and a free cloud-based monitoring service is available for MongoDB deployments. Queries can search by field, by range, and by regular expression. Master-slave replication is supported: the master handles reads and writes, while slaves serve reads or are used for backups. MongoDB scales horizontally through sharding. It can also serve as an efficient file store that benefits from load balancing and data replication.
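
The three kinds of search mentioned above (by field, by range, and by regular expression) can be illustrated without a running server. The sketch below is a pure-Python stand-in, not a MongoDB driver: `matches` and `find` are hypothetical helpers that only borrow MongoDB's operator names (`$gt`, `$gte`, `$lt`, `$lte`, `$regex`) and do not reproduce the full query semantics.

```python
import re

# Comparison operators, named after their MongoDB counterparts.
_OPERATORS = {
    "$gt": lambda value, operand: value > operand,
    "$gte": lambda value, operand: value >= operand,
    "$lt": lambda value, operand: value < operand,
    "$lte": lambda value, operand: value <= operand,
    "$regex": lambda value, operand: isinstance(value, str)
    and re.search(operand, value) is not None,
}


def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        if field not in document:
            return False
        value = document[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gte": 18, "$lt": 40}}.
            for operator, operand in condition.items():
                if operator not in _OPERATORS:
                    raise ValueError(f"unsupported operator: {operator}")
                if not _OPERATORS[operator](value, operand):
                    return False
        elif value != condition:
            # Plain form, e.g. {"name": "bob"}, means equality.
            return False
    return True


def find(documents, query):
    return [document for document in documents if matches(document, query)]


people = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 17},
    {"name": "alan", "age": 45},
]
print(find(people, {"name": "bob"}))                    # search by field
print(find(people, {"age": {"$gte": 18, "$lt": 40}}))   # range query
print(find(people, {"name": {"$regex": "^al"}}))        # regular expression
```
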

Use-cases:

  • Document and content management systems, where flexible schemas are a good fit
  • E-commerce infrastructure, in conjunction with an RDBMS
  • Gaming, thanks to high-performance reads and writes
  • Server-side infrastructure for mobile applications

Apache CouchDB

What is Apache CouchDB

Apache CouchDB is an open source NoSQL database. It stores data as JSON (JavaScript Object Notation, a lightweight data-interchange format) and uses JavaScript as its query language. CouchDB became an Apache Software Foundation project in 2008. In CouchDB, each database is a collection of independent documents, and each document maintains its own data and metadata (a self-contained schema). Its replication and synchronization capabilities make CouchDB well suited to situations where a network connection is not guaranteed. The BBC uses it for its dynamic content platforms. It also fits applications such as CRM and CMS, where data changes only occasionally and versioning is crucial. Cloudant is an enterprise software company that provides an open source distributed database service based on the Apache CouchDB project.

Features:

  • CouchDB provides ACID semantics by implementing a form of Multi-Version Concurrency Control (MVCC), which lets a high volume of concurrent readers and writers work without blocking one another.
  • CouchDB supports bidirectional replication (synchronization) and offline operation.
  • Each document has a unique URI that is exposed via HTTP. The REST API uses the POST, GET, PUT, and DELETE HTTP methods for the four CRUD operations.
  • It offers eventual consistency (a consistency model from distributed computing) so that it can provide both availability and partition tolerance.
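
The multi-version idea in the list above can be sketched as a store in which every update must name the revision it was based on. This is a simplified in-memory model, not CouchDB's implementation or HTTP API: `DocumentStore`, `Conflict`, and the way the revision digest is computed are assumptions; only the `_id` and `_rev` field names and the "generation-digest" revision shape follow CouchDB's conventions.

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when an update is based on a stale revision."""


class DocumentStore:
    """Hypothetical store that rejects writes made against an outdated revision."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        revision, body = self._docs[doc_id]
        return {"_id": doc_id, "_rev": revision, **body}

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            # Someone else updated the document first; the caller must re-read.
            raise Conflict(f"expected revision {current_rev}, got {rev}")
        generation = 1 if current_rev is None else int(current_rev.split("-")[0]) + 1
        # Assumption: the digest is an MD5 of the body, chosen only for illustration.
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
        new_rev = f"{generation}-{digest}"
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = DocumentStore()
first = store.put("article-1", {"title": "Draft"})
second = store.put("article-1", {"title": "Final"}, rev=first)
try:
    store.put("article-1", {"title": "Late edit"}, rev=first)  # stale revision
except Conflict as error:
    print("conflict:", error)
print(store.get("article-1")["title"])
```

Readers never wait on writers in this model: a reader always gets the latest committed revision, and a writer holding an old revision is told to re-read instead of silently overwriting newer data.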

Hypertable

What is Hypertable

Hypertable is an open source database inspired by publications on the design of Google’s BigTable. Hypertable runs on top of a distributed file system such as the Apache Hadoop DFS, GlusterFS, or the Kosmos File System (KFS). It is written almost entirely in C++ for performance.

Hypertable Model

 

Potential Use cases:

  • Scalability – based on a design developed by Google to meet scalability requirements
  • Performance – responsive user experience with lower request latency
  • Wide applicability – data is sorted by primary key, which suits a broad range of applications
  • Cost saving – capacity on a tiny proportion of the hardware
  • Clean Semantics – consistent database
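
Keeping rows sorted by primary key, as noted in the list above, is what makes range scans cheap. The sketch below shows the idea with Python's `bisect` module. It is a single-process illustration, not Hypertable's storage engine or query interface, and `SortedTable`, `put`, and `scan` are hypothetical names.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Hypothetical table that keeps its row keys in sorted order."""

    def __init__(self):
        self._keys = []   # row keys, always sorted
        self._rows = {}   # row key -> value

    def put(self, key, value):
        if key not in self._rows:
            insort(self._keys, key)  # insert while preserving sort order
        self._rows[key] = value

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        low = bisect_left(self._keys, start)
        high = bisect_left(self._keys, end)
        return [(key, self._rows[key]) for key in self._keys[low:high]]


table = SortedTable()
table.put("b1", "row for b1")
table.put("a2", "row for a2")
table.put("a1", "row for a1")
print(table.scan("a", "b"))  # every row whose key starts with "a", in order
```

Because neighboring keys sit next to each other, a scan locates its starting point with a binary search and then reads sequentially.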