DS 4100 Day 13
DS 4100 Data Collection, Integration, and Analysis
Today we’re talking a bit about NoSQL databases. These databases can be divided into four genres:
- Key-Value (KV) databases such as Redis and Riak are the simplest kind of database: each key is stored with a matching value.
- A KV-DB is essentially a lookup table that often uses hashing to speed up retrieval.
- KV-DBs scale easily and have high performance.
- In fact, a file system could be looked at as a key-value data store where the path is the key and the file content is the value.
- Columnar databases, such as HBase, are similar to key-value databases in that they store keys with information.
- However, rather than storing a single value, a columnar database stores multiple pieces of information – similar to a record.
- Unlike in a relational database, a given column does not have to hold the same data type in every record.
- Unlike relational schemas, column-based stores do not require a pre-structured table.
- Each record is comprised of one or more columns containing the information and each column of each record can be different.
- Columnar databases allow very large, unstructured data sets to be managed.
- They are generally used when simple key-value pairs are not sufficient and many records with large amounts of information must be stored.
- Document databases such as MongoDB and CouchDB are very similar to columnar databases but allow for much deeper nesting of information.
- They are ideal for structured documents.
- Performance is often an issue with these databases.
- XML Stores are an example of a document database.
- Graph databases such as Neo4j store data as “graphs”: nodes connected to each other by edges that represent relationships.
- These databases are ideal in situations where “networks” or “deep connections” must be tracked, such as social networks.
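To make the four genres a bit more concrete, here is a rough sketch that mimics each data model with plain Python data structures. No real database is involved: the keys, field names, and the helper functions `neighbors` and `friends_of_friends` are all made up for illustration, and actual systems such as Redis, HBase, MongoDB, or Neo4j have their own APIs and storage layouts.

```python
# Plain-Python stand-ins for the four NoSQL data models.
# All names and data below are hypothetical, for illustration only.

# 1. Key-value: one value per key; a dict gives hash-based lookup.
kv_store = {
    "/home/ann/notes.txt": "remember the milk",  # file path as key, content as value
    "session:42": "ann",
}

# 2. Columnar: each row key maps to its own set of columns,
#    and two rows do not have to share the same columns.
column_store = {
    "user:1": {"name": "Ann", "email": "ann@example.com"},
    "user:2": {"name": "Bob", "age": 31, "city": "Boston"},
}

# 3. Document: like a record, but values can nest several levels deep.
document_store = {
    "order:7": {
        "customer": {"name": "Ann", "address": {"city": "Boston", "zip": "02115"}},
        "items": [{"sku": "A1", "qty": 2}, {"sku": "B9", "qty": 1}],
    }
}

# 4. Graph: nodes plus labeled edges stored as (source, relationship, target).
nodes = {"ann", "bob", "cara"}
edges = [("ann", "FOLLOWS", "bob"), ("bob", "FOLLOWS", "cara")]


def neighbors(node, relationship):
    """Return the nodes reached from `node` over one edge of the given type."""
    return [dst for src, rel, dst in edges if src == node and rel == relationship]


def friends_of_friends(node):
    """Follow FOLLOWS edges two hops out from `node`."""
    return [n2 for n1 in neighbors(node, "FOLLOWS") for n2 in neighbors(n1, "FOLLOWS")]


if __name__ == "__main__":
    print(kv_store["session:42"])                       # key-value lookup
    print(sorted(column_store["user:2"]))               # columns present for one row
    print(document_store["order:7"]["customer"]["address"]["city"])  # nested access
    print(friends_of_friends("ann"))                    # two-hop graph traversal
```

The point of the sketch is the shape of the data: the key-value store holds one value per key, the columnar rows carry different columns, the document nests several levels deep, and the graph answers a "who is two hops away" question by walking edges rather than looking up a single key.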
For the rest of the class, we’re working on our new assignment.