Objectives

What are we here to talk about today?

What is NoSQL?

Why and when should you consider using it/them?

How do you use it/them?

Take a look at (a few, of many possible) examples

Introduction to Databases

From zero to...

Introduction to Databases

What is a Database?

"A database system is basically just a computerized record-keeping system."

"The database itself can be regarded as a king of electronic filing cabinet; that is, a repository or container for a collection of computerized data files."

(Introduction to Database Systems, 8th Ed, Date)

Introduction to Databases

What is a Database?

"a collection of data, usually pertaining to some reasonably well-defined purpose"

"data has to be stored in a way that lends it certain characteristics"

data must have a known format

data should be stored, retrieved, and modified only by a special program (DBMS)

data should be under transaction control

Introduction to Databases

What is a Database?

"A place where data goes to die, never to be seen again except by database administrators who will hoard it gleefully, like Smaug's hoard under the mountain." --Anonymous

Introduction to Databases

What is a Database Management System (DBMS)?

Typically it is a software system that provides:

Ability to enter, retrieve, edit, and remove data in some record form

Some degrees of guarantees around consistency and verification

Note that this is a pretty vague definition!

Server-hosted network-aware databases

Embedded or client-hosted databases

Source-code control systems (GitHub!)

Files and file systems

Introduction to Databases

Examples

Networked relational databases (RDBMS)

Commercial: Oracle, SQLServer, DB2, Informix, ....

Open-Source: MySQL, Postgres, Maria, Apache Derby, ...

Embedded relational databases

Open-Source: SQLite, BerkeleyDB

Commercial: Microsoft Access (format)

"NoSQL"

MongoDB, CouchDB, Firebase

Redis, Cassandra, HBase, Couchbase

ArangoDB, OrientDB, FaunaDB

Introduction to Databases

What comprises a DBMS?

Data

Information and data are often synonymous, but not always

Data: what is stored in the database

Information: the meaning of that data

Metadata: data about the data (schema)

Integrated: a unification of facts, with any redundancy partly or wholly eliminated

Shared: among different users

Hardware

Most concern is around storage

Secondarily the CPU for processing

Introduction to Databases

What comprises a DBMS?

Software

Server processes

Reporting processes

Clients

Human users (direct interrogation)

User programs (bespoke programs)

Administration tools

Reporting tools

Introduction to Databases

Database System Architecture

External level: the way the data is seen by the clients

Programming language data types (int, String, etc)

Conceptual level: the way the data "feels" as a concept

Commonly called "database models" or "data models"

Internal level: the way the data is stored by the system

Highly optimized to support database functionality

Introduction to Databases

Database System Architecture (Take 2)

Storage Engine

Query Engine

Transport (Distributed System Layer)

Execution Engine

Transaction/Lock Manager

Buffer/Recovery manager

Introduction to Databases

Distributed System Layer

Manages network interaction

Primary means by which interaction with clients occurs

... in non-local/non-embedded databases

Responsible for peer nodes

... in clustered database scenarios

Provides data segregation, consistency, etc

Introduction to Databases

Query Engine

Language parser

Optimizes storage access

Sometimes called a "query plan" or "execution plan"

a sequence of operations, sometimes in bytecode

Hands query plan off to Execution Engine

Introduction to Databases

Execution Engine

Executes the query plan

Collects the results

Some may be local operations

Some may be remote operations

Some may be arbitrary non-relational code

Communicates with the Storage Engine

Introduction to Databases

Storage Engine

Manages direct-to-longterm-storage interface

Provides a simpler API to clients

Sometimes pluggable

File formats are usually closely mapped

Interacts with other components

Transaction Manager

Lock Manager

Buffer Manager/Cache Manager

Recovery Manager (operation/transaction log)

Introduction to Databases

Some interesting notes

Memory- vs Disk-based DBMS

Storage block strategies (column- vs row-oriented, others)

Data vs index files/blocks/structures

B-tree disk storage

Buffering, ordering, mutability (concerns/features)

Introduction to Databases

What is a database?

Infrastructure for efficient data management

... that we can mostly treat as a "black box"

... except for how we "see" the data inside the database

Service for keeping data safe

consistent

secure

resilient

Introduction to Databases

Terms/Glossary

record: collection of data elements

index: an optimized (sorted) collection of keys

key: a means of identifying a record

transaction: carrying out an operation against the database

lock: a mechanism designed to constrain concurrent access

query: a request to the database to retrieve some data

Data models

The 'shape' of data

Data models

Definition

"... an abstract, self-contained, logical definition of the objects, operators, and so forth, that together constitute the abstract machine with which users interact."

"The Third Manifesto", p11 (Date, Darwen, 1994)

A particular domain schema or collection of entity types

A particular "shape" to domain data models

Influences storage and relationships between entities

Influences performance of reads, writes, etc

Data models

What is a data model?

Structures: how is the data formatted ("shaped")?

Constraints: what rules are enforced on the data (if any)?

Operations: what can we do with the data (retrieve, manipulate, etc)?

Data models

Different "shapes" of data

Relational: relations, tuples, and relvars

If you don't know what these are, you don't know your relational theory

strongly-typed, enforced by the database

Objects

capturing the object graphs that appear in O-O systems

strongly-typed, defined by an O-O language (not external schema)

Key-value pairs

CRUD based solely on the primary key; no joins

weakly- or untyped

Data models

Different "shapes" of data

Documents

collections of named fields holding data (or more collections)

weakly- or untyped

Graphs

capturing not just graph structures, but the "arcs" between nodes

graph-based query API/language

Columns

think tables, then turn your head 90 degrees

we group by columns, not by rows

Data models

Different "shapes" of data

Hierarchical

single-rooted strictly "one-way" acyclic graphs

generally, these are XML stores

"Network"

predecessor type to RDBMSs, finding some interest again

collection of values with "pointers" to related data

users manually "follow" the pointers in code

Hybrids of all the above ("multi-model")

Data models

Database vs Language "shapes"

any attempt to use one model from a different model is problematic

fundamental assumptions of one not present in the other, and vice versa

called an impedance mismatch

significant loss of functionality

attempts to mitigate through tooling

Object-Relational impedance mismatch is only the most obvious

which we attempt to solve through tooling (O/R-Ms)

Object-Hierarchical, Object-Document, Object-Graph, ...

the "object" on the left-hand side is because of our choice of languages

sort of begs the question: what if we had different languages?

Data Models: Relational

What does that mean?

Data Models: Relational

Relational Model

a model of data centered around relational algebra

emphasis is on mathematical operations on relations, sets, and tuples

entities can be related to one another via attributes

most commonly expressed as "tables, columns, and rows"

CAP Theorem

What was wrong with the RDBMS, anyway?

CAP Theorem

Pre-Web

RDBMSs had great features

Efficient storage

Simple retrieval

Guaranteed data stability

But with more data came... more data

CAP Theorem

Pre-Web

We needed to "partition" data

across multiple database servers/storage engines

but we refused to lose the benefits of RDBMS

Enter "Two-Phase Commit Transactions"

And all was well and good, until...

CAP Theorem

The Web

Before, enterprise apps were internal

Known user base, known loads, known scale

This user base was not likely to change without huge warning

But now, the Web!

After, we began to project our enterprise apps out into the Internet

This meant an unknown and unpredictable user base

With that came an unknown and unpredictable load and scale

CAP Theorem

Load/scale and contention

"Each node in a system should be able to make decisions purely based on local state. If you need to do something under high load with failures occurring and you need to reach agreement, you've lost."

Werner Vogels, CTO, Amazon

But why did we care about scale?

CAP Theorem

The Result: Bad

Repeat after me: "Contention is the enemy of scalability"

Contention began to take down the existing infrastructure

Traditionally-managed RDBMS'es simply couldn't keep up

ACID has its uses... but we found the edge really quickly

CAP Theorem

The Solution (sort of)

CAP Theorem

Consistency: every read receives the most recent write or an error

Availability: every request receives a (non-error) response, without the guarantee that it contains the most recent write

Partitionability: system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

"Pick two"

CAP Theorem

Practical

RDBMS goes for C + A

Mostly owing to RDBMS embracing ACID

ACID: Atomic, Consistent, Idempotent, Durable

But we can choose A+P or C+P by establishing conventions

Most "NoSQL" databases go for A + P

NoSQLs often embrace BASE

BASE: Basically Available, Soft-state, Eventually consistent

But some keep C + A and just offer different "shapes" to data

CAP Theorem

CAP begat PACELC

PACELC: if partition (P) happens, the trade-off is between availability (A) and consistency (C); Else (E), the trade-off is between latency (L) and consistency (C)

Less known, but more accurate depiction of the problem

Data Models: Document-Oriented Data

What does that mean?

Data Models: Document-Oriented Data

What is a Document-Oriented Data Model?

semi-structured data

loosely/un- typed

data is arranged into "documents"

(typically) query-by-example retrieval facility

Data Models: Document-Oriented Data

What is a document?

documents are (more or less) independent entities

no strong relationships to one another

no strong definitions on content

semi-structured data

typically name/value pairs and arrays

semi-hierarchical (sort of)

documents usually arranged into named collections

think the filing cabinet in a doctor's or lawyer's office

Data Models: Document-Oriented Data

Advantages

loosely typed means easy refactoring

no strong relationships makes sharding/scaling easy

document concept "clusters" relevant data together

Data Models: Document-Oriented Data

Disadvantages

loosely typed means one typo ruins your whole day

data validation is whatever your code provides

this has serious impact in polyglot scenarios

lack of a standard query language hurts migration

but there is little migration story to these dbs anyway

lack of a query language makes complex queries tricky

query-by-example hurts here

no "hard links" between data elements isolates data

no "joins" in SQL terms means potentially multiple round trips

Data Models: Document-Oriented Data

Canonical Doc-Oriented examples

blog (or other CMS)

bank accounts and transaction history

customer payment history

Data Models: Columnar Data

What does that mean?

Data Models: Columnar Data

What is an columnar data model?

data attributes stored in columns, collected into rows

rows identified by row keys

but the emphasis is on columns, not rows

little to no "relationships" between rows

or between "tables", for that matter

typically very low-level data

binary arrays/buffers, strings, etc

emphasis is on speed of read/write

duplicating data as convenient

Data Models: Columnar Data

"Duplicated data?!?"

yep; the model is about data access, not abstraction

"what do we need?"

"what do we need when we query it with 'this' thing?"

example: Users

we may want to query users by email or username

so we create two "tables": one keyed on email, one on username

both "tables" store the same data

and rely on code to keep them in sync

this often makes Columnar DBs good candidates for caches

Data Models: Columnar Data

Creating/storing/updating

we don't define "types", per se

data is instead stored very "clustered"

tightly-related data elements

create the key, add the data elements, store it

updates: add the data elements to a key, store it

Data Models: Columnar Data

Queries

focused entirely on the key

little to no queries along alternate axes

pull back that row or column-collection, nothing else

Data Models: Columnar Data

Advantages

rows can be flexible in composition

tightly-clustered data stored/fetched in one "bucket"

scale, scale, scale

performance

Data Models: Columnar Data

Disadvantages

no relationships

little to no query language support

little to no query capability, period

Data Models: Graph-Oriented Data

What does that mean?

Data Models: Graph-Oriented Data

What is a graph-oriented data model?

four fundamental atoms:

nodes/vertices

labels (group nodes into sets)

relationships/arcs (between the nodes)

properties (name/value pairs on nodes and relationships)

how we use those depends on the problem

nodes are often "nouns"

arcs often connect nodes as modifiers or qualifiers

Data Models: Graph-Oriented Data

Advantages

flexible

like document- or hierarchical data models

allows for easy(er) refactoring

data model is "whiteboard friendly"

data model matches the whiteboard exactly

captures data about relationships

in other models, this would need to be modeled

data can be associated with the relationship

cyclic relationships trivial to see, model

nodes can have any number of arc connections

Data Models: Graph-Oriented Data

Disadvantages

concerns over scale

concerns over query performance

nonstandard query language

lack of schema enforcement (beyond nodes/arcs/etc)

Data Models: Graph-Oriented Data

Examples

Neo4J (https://neo4j.com)

TitanDB (http://thinkaurelius.github.io/titan/)

flexible data storage engines

Infogrid (http://www.infogrid.org)

Data Models: Object-Oriented Data

What does that mean?

Data Models: Object-Oriented Data

What is an object-oriented data model?

an object-oriented model, writ to disk

objects hold data

objects inherit

objects refer to one another

often in cyclic or acyclic graphs

Data Models: Object-Oriented Data

Creating, storing objects

instantiate the object

store it into the OODBMS "object set"

commit the object set

update the object

save (either the object or the "object set")

some OODBMSs "remember" the object, others reqire passing it into the object set again

"dirty flags"

automatic persistence of modified objects

Data Models: Object-Oriented Data

Querying objects

query-by-example

"OQL": "SQL for objects"

out of favor after 2000

query-by-native language

Important note: "fetch depth"

how far deep do we traverse?

Data Models: Object-Oriented Data

Advantages

no impedance mismatch to O-O languages

no need for O/R-Ms, no mapping, none of it

easy refactoring

just refactor the class model

easy inheritance support

no more "table-per-class/table-per-concrete" joins

Data Models: Object-Oriented Data

Disadvantages

not prevalent

in fact, often considered "dead"

concerns over scale

lack of tooling

concerns over query optimization

Data Models: Object-Oriented Data

Examples

db4o

Versant

GemStone

Data Models: Key/Value

... and distributed hashtables

Data Models: Key/Value

What is it?

buckets into which we drop data items

data items are keys that have a set of attributes

keeps related data together accessed by key

no formal structure beyond "key" and associated data (value)

Data Models: Key/Value

Advantages

super-fast retrieval (index is the lookup)

opaque values are entirely flexible to the app

Data Models: Key/Value

Disadvantages

(generally) no query capabilities beyond key-based lookup

opaque values are therefore application-specific

Data Models: Key/Value

Examples

Amazon Dynamo

Redis

Data Models: Multimodel

When you can't choose just one

Data Models: Multimodel

What is it?

database implementation that supports more than one data model natively

integrated relationship support

integrated query support

generally easier/more flexible than "polyglot persistence"

Data Models: Multimodel

Shhhh....

most databases are key-value storage engines under the hood

B-tree implementations

with sorted pointer structures for fast query

so really the multimodel support comes in a few places:

indexing targets

query language support

Practice

In theory, there's no difference|between theory and practice|In practice, however.... --(various)

Practice

Databases

SQLite

MongoDB

Couchbase

Neo4J

SQLite

What it is, what it isn't

Overview

A small-footprint full-featured RDBMS

".. a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine."

http://www.sqlite.org

Mostly SQL-92 compliant

ACID Transactions supported

Stored as a single file stored locally

Self-contained

Available as a single C file if necessary

SQLite: Getting Started

Getting it, getting in

SQLite: Getting Started

SQLite may already be installed

Many operating systems already have it

Test first: In a command-line shell, type sqlite

If you get "not found" you need to install it

If you get "SQLite version..." you already have it

SQLite: Getting Started

Installing: macOS

Do you have Homebrew?

Yes: brew install sqlite

No: get Homebrew! Seriously!

SQLite Download page: https://www.sqlite.org/download.html

SQLite: Getting Started

Installing: Linux

Linuxbrew: brew install sqlite

Debian/Ubuntu: sudo apt install sqlite

RedHat: sudo yum install sqlite -y

Others: It's probably the "sqlite" package

SQLite: Getting Started

Installing: Windows

Package Manager

WinGet: doesn't seem to be supported yet?

Chocolatey: choco install sqlite

Scoop: scoop install sqlite

Otherwise https://www.sqlite.org/download.html

Grab zip file, unzip and adjust PATH

SQLite: Getting Started

Test Again

Command-line shell: sqlite3

Should open a "transient in-memory database"

Assuming that works:

.help: Full list of meta-commands

.open: Connects to a database file

.backup: Backup database to a new file

.databases: List all databases in use

.exit: Quit the SQLite shell

SQLite: Getting Started

SQLite GUIs

SQLite Studio: https://sqlitestudio.pl/

DBeaver: https://dbeaver.io/

Neo4J

Graph-oriented

Neo4J

Neo4J: graph-oriented

data model is basically "just" nodes and arcs

graph API (Visitor) for navigation/query

Java implementation

http://www.neo4j.org

Upshot: perfect for graph analysis and storage

Getting Started w/Neo4J

From 0 to HelloWorld

Getting Started w/Neo4J

Installing Neo4J

Download: https://neo4j.com/download-neo4j-now/

Package managers:

macOS: brew install neo4j

Linux:

Windows:

Getting Started w/Neo4J

Starting up

neo4j-admin CLI tool

restricted subset available through neo4j

neo4j/neo4j-admin server commands:

status: return server status

console: start the server in the foreground

start: start server in the background

stop: stop the background server

restart: restart the background server

Getting Started w/Neo4J

Browser Editor (aka Editor)

http://localhost:7474/browser/ brings up GUI dashboard/UI

this is the Editor, the primary (developer) interface

help topics

connection to Neo4J databases

execute Editor commands (colon-prefixed)

execute Cypher (query language) commands

each command creates a "result frame"

Getting Started w/Neo4J

Initial databases

system: administrative upkeep and management

neo4j: default database, empty at start

Couchbase Overview

NoSQL meets SQL

Couchbase Overview

Couchbase Server

NoSQL

multimodel

distributed

ad-hoc query language

SDKs across mainstream languages

https://couchbase.com

Getting Started with Couchbase

From zero to queries

Getting Started with Couchbase

Installation

Cloud

https://cloud.couchbase.com/sign-up

Docker

Package managers

Homebrew: brew install --cask couchbase-server-community

Download

https://docs.couchbase.com/server/current/install/get-started.html

Getting Started with Couchbase

Docker download/run

docker run -t --name db -p 8091-8096:8091-8096 -p 11210-11211:11210-11211 couchbase/server:enterprise-7.2.0

Getting Started with Couchbase

Impatient?

Couchbase Server has several dataset samples

Settings | Sample Buckets

travel-sample

gamesim-sample

beer-sample

Choose one, CBS will set it all up

Getting Started with Couchbase

SQL++ queries

JSON document model

"json types" describe different kinds of JSON documents

SQL-like query syntax

SELECT document fields

FROM source

WHERE criteria

documents stored in collections stored in scopes

fully-qualified names: namespace:bucket.scope.collection

note that free-form documents lack the uniformity of relational tables

this is by design!

Getting Started with Couchbase

Query syntax

SELECT name FROM `gamesim-sample`;

Returns 586 elements in a JSON array

SELECT jsonType FROM 'gamesim-sample';

Returns 586 elements, all of which are either "player", "item", or "monster"

MongoDB

Document-oriented

MongoDB

What is it?

Network database server

http://www.mongodb.org

stores BSON documents

query-by-example ad-hoc query facilities

map/reduce, server-side execution

master/slave replication; sharding

Getting Started with MongoDB

0 to HelloWorld in ...

Getting Started with MongoDB

Installing

macOS:

Homebrew: brew install mongodb; brew services start mongodb

Linux:

Ubuntu: sudo apt-get install mongodb-org

RedHat: sudo yum install mongodb-org

Windows:

Chocolatey: choco install mongodb

Downloads: http://www.mongodb.org

standalone executables; just unzip and PATH

Getting Started with MongoDB

Local installation configuration

mongod.conf (text) file

configuration parameters are same as command-line parameters

Getting Started with MongoDB

Configuration file example

storage:
   dbPath: ./data
security:
   authorization: disabled

Getting Started with MongoDB

Cloud

MongoDB often available from your cloud provider

Mongo Atlas: https://www.mongodb.com/atlas/database

Getting Started with MongoDB

Client

a NodeJS console shell running JavaScript

connects to localhost server on default port (unless parameterized otherwise)

most Mongo examples assume the JS client shell

Installing

Connect to a running server with the mongo client -- client uses JavaScript and JSON

Current Mongosh Log ID:	644f441765e9cf735506f318
Connecting to:		mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+1.8.1
Using MongoDB:		6.0.5
Using Mongosh:		1.8.1

For mongosh info see: https://docs.mongodb.com/mongodb-shell/

test> show databases
admin   40.00 KiB
config  84.00 KiB
local   40.00 KiB
test>

Getting Started with MongoDB

Drivers

MongoDB has numerous popular language drivers

Some are community-suppoted, some from the company

Select as necessary

Wrapping up

What have you learned?

Summary

This is obviously a high-level view

further investigation is necessary!

prototypes are necessary!

allowing yourself time to fail is necessary!

Credentials

Who is this guy?

Architect, Engineering Manager/Leader, "force multiplier"

http://www.newardassociates.com

http://blogs.newardassociates.com

Sr Distinguished Engineer, Capital One

Educative (http://educative.io) Author

Performance Management for Engineering Managers

Books

Developer Relations Activity Patterns (w/Woodruff, et al; APress, forthcoming)

Professional F# 2.0 (w/Erickson, et al; Wrox, 2010)

Effective Enterprise Java (Addison-Wesley, 2004)

SSCLI Essentials (w/Stutz, et al; OReilly, 2003)

Server-Based Java Programming (Manning, 2000)

Busy Developer's Guide to NoSQL

Objectives

What are we here to talk about today?

Introduction to Databases

From zero to...

Introduction to Databases

What is a Database?

Introduction to Databases

What is a Database?

Introduction to Databases

What is a Database?

Introduction to Databases

What is a Database Management System (DBMS)?

Introduction to Databases

Examples

Introduction to Databases

What comprises a DBMS?

Introduction to Databases

What comprises a DBMS?

Introduction to Databases

Database System Architecture

Introduction to Databases

Database System Architecture (Take 2)

Introduction to Databases

Distributed System Layer

Introduction to Databases

Query Engine

Introduction to Databases

Execution Engine

Introduction to Databases

Storage Engine

Introduction to Databases

Some interesting notes

Introduction to Databases

What is a database?

Introduction to Databases

Terms/Glossary

Data models

The 'shape' of data

Data models

Definition

Data models

What is a data model?

Data models

Different "shapes" of data

Data models

Different "shapes" of data

Data models

Different "shapes" of data

Data models

Database vs Language "shapes"

Data Models: Relational

What does that mean?

Data Models: Relational

Relational Model

CAP Theorem

What was wrong with the RDBMS, anyway?

CAP Theorem

Pre-Web

CAP Theorem

Pre-Web

CAP Theorem

The Web

CAP Theorem

Load/scale and contention

CAP Theorem

The Result: Bad

CAP Theorem

The Solution (sort of)

CAP Theorem

Practical

CAP Theorem

CAP begat PACELC

Data Models: Document-Oriented Data

What does that mean?

Data Models: Document-Oriented Data

What is a Document-Oriented Data Model?

Data Models: Document-Oriented Data

What is a document?

Data Models: Document-Oriented Data