Nicole Han


2019/03/25

NoSQL Database

NoSQL provides high performance and high availability at large scale.

NoSQL: non-relational, no SQL, not only SQL

Why is it needed? To handle a lot of new data, and to avoid complexity.

A NoSQL DB is:

  1. schema-less: no tables, no relations!
  2. flexible: easy to add new types of data
  3. (data) scalable: specifically, the ability to ‘scale out’, i.e. do ‘horizontal scaling’. Both terms mean that we can simply add more nodes (e.g. servers) to an existing cluster, to accommodate more users or to store more data for existing users.
  4. fast: easy to process large (massive) volumes of data

Why choose a NoSQL DB?

“to improve programmer productivity by using a database that better matches an application’s needs”

“to improve data access performance via some combination of handling larger data volumes, reducing latency, and improving throughput”

JSON for storing an ENTIRE DB!!

db == a JSON array [] of objects {}, where each object (‘row’) is a set of key-value pairs.
“ID”: 1,
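
As a minimal sketch of the "whole db as one JSON array" idea, the snippet below parses such an array and filters it in memory. The records, field names and the `find` helper are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical sample data: an entire "database" as one JSON array of objects.
db_text = """
[
  {"ID": 1, "name": "Ada", "langs": ["Python", "SQL"]},
  {"ID": 2, "name": "Lin", "email": "lin@example.com"}
]
"""

db = json.loads(db_text)  # a Python list of dicts, one dict per 'row'


def find(records, **criteria):
    """Return every record whose fields equal all the given key=value pairs."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]


# Objects need not share the same fields, so a new kind of field can simply be added.
db.append({"ID": 3, "name": "Sam", "age": 41})

matches = find(db, ID=2)
```

Note that `find` scans every object; a real store would normally add indexes rather than scan the whole array.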

“BASE” property:

  • Basic Availability: the DB is up most of the time
  • Soft state: consistency is not guaranteed while data is being written to a node, or between replicas
  • Eventual consistency: at a later point in time, by push or pull, data will become consistent across nodes and replicas
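
The soft-state and eventual-consistency points can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not how any particular database does it: the `Replica` class and its per-key version counter are hypothetical, and conflicting concurrent writes are not handled.

```python
class Replica:
    """One copy of the data, stored as key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value):
        # Bump the version so other replicas can tell this entry is newer.
        version = self.data.get(key, (0, None))[0] + 1
        self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def pull_from(self, other):
        """Adopt every entry that the other replica holds at a newer version."""
        for key, (version, value) in other.data.items():
            if version > self.data.get(key, (0, None))[0]:
                self.data[key] = (version, value)


a, b = Replica("a"), Replica("b")
a.write("cart", ["book"])   # the write lands on one replica only
stale = b.read("cart")      # soft state: b has not seen the write yet
b.pull_from(a)              # a later synchronisation step (pull)
fresh = b.read("cart")      # the replicas now agree
```

Between the write and the pull, a reader of `b` sees the old state; after the pull both replicas hold the same data.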

The schema is “implicit”

Comparison metrics:

  • architecture
  • administration
  • deployment
  • development
  • performance and scalability

NoSQL advantages:

  1. high scalability
  2. schema flexibility
  3. distributed computing
  4. no complicated relationships
  5. lower cost
  6. open source


Terms:

Node: a single NoSQL db instance - holds a part of a db

Cluster: a collection of nodes - holds an entire db
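
To make the node/cluster distinction concrete, here is a small sketch in which each node holds part of the data and the cluster as a whole holds all of it. The `Node` and `Cluster` classes are hypothetical, and routing keys by hash modulo the node count is an assumption made for brevity.

```python
import hashlib


class Node:
    """A single db instance; holds only part of the data."""

    def __init__(self, name):
        self.name = name
        self.store = {}


class Cluster:
    """A collection of nodes; together they hold the entire db."""

    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def _node_for(self, key):
        # Hash the key so the same key is always routed to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key).store[key] = value

    def get(self, key):
        return self._node_for(key).store.get(key)

    def size(self):
        return sum(len(n.store) for n in self.nodes)


cluster = Cluster(["node-0", "node-1", "node-2"])
for i in range(100):
    cluster.put(f"user:{i}", {"id": i})
```

With plain modulo routing, adding a node changes where most keys map; consistent hashing is the usual way to limit that movement when scaling out.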

4 types of NoSQL DBs

  1. Key-Value:
  • e.g. DynamoDB (Amazon)
  • The whole db is a dictionary, which has records (“rows”) which have fields (columns).
  • Querying occurs only on keys. When querying on a key, the entire value (aggregate) (for matching keys) is returned.
  • k/v DBs are lightweight (simple), schema-less, transaction-less.
  2. Column-Family:
  • Rather than dealing with rows of data, we deal with columns of them. So such databases are good for aggregate queries (eg. average age of employees in a company), and queries involving just a subset of all columns (eg. retrieve a student’s academic info).
  • Data is stored using ‘row keys’: each row (picturing the data as a classic relational table) is assigned a unique key.
  • Column: consists of a name (key) and a value; there is such a name:value pair for each row key, giving us a single column’s worth of data for all rows (eg. GPAs of all students at USC).
  • Column family: contains columns of related data (eg. for a ‘Users’ DB, the columns might be Name, Age, DOB, Address, Email), for all rows. A column family would have many rows of data, where for each row, there would be multiple columns and values.
  • In a column family, row keys directly contain (as values) column data, similar to a rectangular table; in contrast, in a supercolumn family, row keys contain (as values), k:v pairs, with supercolumn (“column group”) keys, and ‘column_name:column_value’ values [so there is an extra level of indirection, provided by the supercolumn names]. Column family -> column data; supercolumn family -> supercolumns -> column data.
  3. Graph:
  • A graph database uses (contains) graph entities such as nodes (vertices), relations (edges), and properties (k-v pairs) on vertices and edges, to store data.
  • A graph DB is said to be ‘index free’, since each node directly stores pointers to its adjacent nodes.
  • In a graph db, the focus is on relationships between ‘linked data’.
  • Uses: social networks, recommendation engines, etc
  • When relational data is mapped to a graph, each row in a table becomes a node, and its columns (and their values) become node properties.
  4. Document:
  • e.g. MongoDB
  • The basic unit of storage in a document DB is a document - this can be JSON, XML, etc. There can be an arbitrary number of fields (columns and values, ie. k/v pairs) in each document.
  • In a document DB, a key is paired with a document (which is its ‘value’), where the document itself can contain multiple k/v pairs, key-array pairs, or even key-document pairs (ie nested documents).
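
The four models above can be compared side by side with plain Python data structures. This is only a rough sketch: every name and record is hypothetical, and real products add persistence, indexing and distribution on top of these shapes.

```python
# Key-value: the store is a dictionary; the value is opaque and returned whole.
kv_store = {"user:1": '{"name": "Ada", "age": 36}'}
kv_value = kv_store["user:1"]  # lookup is by key only

# Column family: row key -> {column name: value}.
users_cf = {
    "row1": {"Name": "Ada", "Age": 36},
    "row2": {"Name": "Lin", "Age": 28},
    "row3": {"Name": "Sam", "Age": 41},
}


def column(cf, name):
    """Return one column's values for every row that has that column."""
    return {row_key: cols[name] for row_key, cols in cf.items() if name in cols}


ages = column(users_cf, "Age")
average_age = sum(ages.values()) / len(ages)  # an aggregate over a single column

# Supercolumn family: row key -> supercolumn name -> {column name: value}.
users_scf = {"row1": {"contact": {"Email": "ada@example.com"},
                      "profile": {"Name": "Ada", "Age": 36}}}

# Document: a key paired with a document that may nest arrays and documents.
documents = {
    "user:1": {
        "name": "Ada",
        "emails": ["ada@example.com"],
        "address": {"city": "London"},
    }
}
city = documents["user:1"]["address"]["city"]

# Graph: nodes with properties; each node directly lists its outgoing edges.
nodes = {"ada": {"age": 36}, "lin": {"age": 28}, "sam": {"age": 41}}
edges = {"ada": [("FOLLOWS", "lin")], "lin": [("FOLLOWS", "sam")], "sam": []}


def neighbours(node_id, relation):
    """Return the nodes reached from node_id over edges with this relation."""
    return [target for rel, target in edges[node_id] if rel == relation]


def two_hops(start, relation):
    """Follow the relation twice by walking adjacency lists directly."""
    return [far for near in neighbours(start, relation)
            for far in neighbours(near, relation)]
```

The key-value lookup returns the whole value without looking inside it, the column-family helper touches a single column across all rows, the document lookup navigates into nested fields, and the graph helpers move from node to node without consulting a separate index.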

What is a triple store?

  • A triple store (or triplestore, or RDF) database stores triples of (subject,predicate,object).
  • A triple defines a directed binary relation, via its predicate/attribute/property. In relational form, we express this as predicate(subject,object).

    Subject: what we are describing.

    Predicate: a property of the subject.

    Object: the predicate’s (property’s) value.

  • Querying a triple store can be done in one of several RDF query languages, e.g. RDQL, SPARQL, RQL, SeRQL, Versa. Of these, SPARQL is currently the most popular.

  • The output of a triple store query is called a ‘graph’.

In a triple, the predicate is given equal status to subject and object [upcoming examples will make this clear].
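
One way to see the equal status of subject, predicate and object is a tiny in-memory triple store in which any of the three positions can be fixed or left as a wildcard. This is a hedged sketch: the data and the `match` function are hypothetical, and the pattern match only stands in for what a query language such as SPARQL does.

```python
# Hypothetical data: each fact is a (subject, predicate, object) triple.
triples = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", 30),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return {
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    }


about_alice = match(triples, subject="alice")    # fix the subject
who_knows = match(triples, predicate="knows")    # fix the predicate
pointing_at_bob = match(triples, obj="bob")      # fix the object
```

Each result is itself a set of triples, i.e. a small graph, and querying by predicate is no different from querying by subject or object.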
