Published on

DS 4100 Day 13

Authors
  • avatar
    Name
    Jacob Aronoff
    Twitter

DS 4100 Data Collection, Integration, and Analysis

Today we're talking a bit about NoSQL databases. These databases can be divided into four generes:

  • Key-Value
  • Columnar
  • Document
  • Graph

Key-Value

  • Key-Value (KV) databases such as Redis and Riak are the simplest kind of database in which keys are stored with matching values.
  • A KV-DB is essentially a lookup table that often uses hashing to speed up retrieval.
  • KV-DBs scale easily and have high performance.
  • In fact, a file system could be looked at as a key-value data store where the path is the key and the file content is the value.

Columnar

  • Columnar databases, such as HBase, are similar to key-value databases in that they store keys with information.
  • However, rather than storing a single value, a columnar database stores multiple pieces of information – similar to a record.
  • Unlike a relational database the columns do not have to be of the same data type.
  • Unlike relational schemas, column-based stores do not require a pre-structured table.
  • Each record is comprised of one or more columns containing the information and each column of each record can be different.
  • Columnar databases allow very large and un-structured data to be managed.
  • They are generally used when simple key-value pairs are not sufficient and storing many records with large amounts of information is a required.

Document

  • Document databases such as MongoDB and CouchDB are very similar to columnar databases but allow for much deeper nesting of information.
  • They are ideal for structured documents.
  • Performance is often an issue with these databases.
  • XML Stores are an example of a document database.
  • Document databases often use JavaScript as the native query language with data being exchanged between the client and the server using JSON object.

Graph

  • Graph databases such as Neo4J use “graphs” with nodes and edges connecting each other through relationships.
  • These databases are ideal in situations where “networks” or “deep connection” must be tracked, such as social networks.

The rest of the class we're working on our new assignment