ted@tedneward.com | Blog: http://blogs.tedneward.com | Twitter: tedneward | Github: tedneward | LinkedIn: tedneward
installation
CAP Theorem
querying
storing
"applications"
"NoSQL" isn't really a great categorization
Very rarely is "not X" a useful descriptor
A better categorization/taxonomy is required
Thus, I choose to eschew "NoSQL" as a useful term
The problem is one of load/scale and contention
Repeat after me: Contention is the enemy of scalability
And scale is the issue of the Internet
With the advent of the Web, we changed our enterprise apps
Before, enterprise apps were internal
Known user base, known loads, known scale
This user base was not likely to change without huge warning
With the Web app, we began to project our enterprise apps out into the Internet
This meant an unknown and unpredictable user base
With that came an unknown and unpredictable load and scale
Scale and load began to take down the existing infrastructure
Traditionally-managed RDBMSes simply couldn't keep up
ACID has its uses... but we found the edge really quickly
Distributed Two-Phase Commit Transactions
CAP Theorem
Consistency: all database clients see the same data, even with concurrent updates
Availability: all database clients are able to access some version of the data
Partition Tolerance: the database can be split over multiple servers and keeps working when those servers cannot reach each other
"Pick two"
RDBMS goes for C + A
Most "NoSQL" databases go for A + P
Data "shapes"
Groupings of relations, tuples, and relvars
If you don't know what these are, you don't know your relational theory
strongly-typed, enforced by the database
Object databases
capturing the object graphs that appear in O-O systems
strongly-typed, defined by an O-O language (not external schema)
Key-value systems
CRUD based solely on the primary key; no joins
weakly- or untyped
Document-oriented
collections of named fields holding data (or more collections)
weakly- or untyped
Data "shapes" (continued)
Graph databases
capturing not just graph structures, but the "arcs" between nodes
graph-based query API/language
Column-oriented/tabular
columns-and-tables, but no relations/relvars
Hierarchical databases
generally, these are XML stores
Hybrids of the above
CouchDB: document-oriented
http://couchdb.apache.org
JSON documents
JavaScript map/reduce
replication in multiple forms
MVCC (no-lock) writes
CouchDB: document-oriented (continued)
no drivers; everything via REST protocol
"Couch apps": HTML+JavaScript+Couch
Erlang implementation
Upshot: best with predefined queries, accumulating data over time
Upshot: Couch apps offer an appserver-less way to build systems
No locks; MVCC instead
Multi-Version Concurrency Control
documents are never locked; instead, changes are written as new versions of a document on top of old document
each document has a revision number+content-hash
advantage: read requests already in place can finish even in the face of a concurrent write request; new read requests return the written version
result: high parallelization utility
conflicts mean both versions are preserved
leave it to the humans to figure out the rest!
CouchDB's principal I/O is HTTP/REST
GET requests "retrieve"
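PUT requests "store": create or update a document at a known URL
POST requests "create" a document with a server-assigned _id, or invoke operations such as _bulk_docs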
DELETE requests "remove"
this isn't ALWAYS true
for the most part, though, it holds
the CouchDB programmer's best friend: curl
or any other command-line HTTP client
good exercise: write your own!
CouchDB's principal data is the document
schemaless key/value nestable store
(essentially a JSON object)
certain keys (_rev, _id) reserved for CouchDB's use
_id is most often a UUID/GUID
fetch one (if you like) from http://localhost:5984/_uuids
database will attach one on new documents
_rev is this document's revision number
formatted as "N-{md5-hash}"
OK to leave blank for new document...
... but must be sent back as part of the stored document
assuming the _rev's match, all is good
if not, "whoever saves a changed document first, wins"
(essentially an optimistic locking scheme)
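The _rev check described above can be sketched as a tiny in-memory store. This is only an illustration of the optimistic-locking idea, not CouchDB's implementation: TinyStore, its methods, and the "-placeholder" suffix (standing in for the real content hash) are all hypothetical.

```javascript
// In-memory sketch of CouchDB-style revision checking (not CouchDB's actual code).
// TinyStore and its method names are hypothetical.
class TinyStore {
  constructor() {
    this.docs = new Map(); // _id -> latest stored document
  }

  // Save a document; reject it if its _rev does not match the stored revision.
  put(doc) {
    const current = this.docs.get(doc._id);
    const currentRev = current ? current._rev : undefined;
    if (doc._rev !== currentRev) {
      return { ok: false, error: "conflict" }; // CouchDB answers 409 Conflict here
    }
    // The prefix N counts edits; a real _rev also carries a content hash,
    // which this sketch replaces with a fixed placeholder.
    const n = currentRev ? parseInt(currentRev.split("-")[0], 10) + 1 : 1;
    const saved = { ...doc, _rev: `${n}-placeholder` };
    this.docs.set(doc._id, saved);
    return { ok: true, id: saved._id, rev: saved._rev };
  }

  get(id) {
    return this.docs.get(id);
  }
}

const store = new TinyStore();
const created = store.put({ _id: "ted", firstname: "Ted", age: 40 });

// Two clients read the same revision, then both try to write.
const clientA = { ...store.get("ted"), age: 41 };
const clientB = { ...store.get("ted"), age: 42 };
const first = store.put(clientA);  // saves first, wins
const second = store.put(clientB); // stale _rev, rejected
```

The losing client is expected to re-fetch the document, reapply its change on top of the new revision, and try again.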
Assume this is stored in person.json
{ "_id": "286ccf0edf77bfb6e780be88ae000d0b", "firstname": "Ted", "lastname": "Neward", "age": 40 }
... then insert it into Couch like so:
curl -X PUT http://localhost:5984/{database}/286ccf0edf77bfb6e780be88ae000d0b -d @person.json -H "Content-Type: application/json"
Documents can have attachments
attachments are binary files associated with the doc
essentially URLs within the document
simply PUT the attachment to the desired {_id}/{name}
provide the document's {_rev} as a query param
after all, you are modifying the document
add ?attachments=true to fetch the attachments as binary (Base64) data when fetching the document
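As a rough sketch of the attachment upload described above, the helper below only assembles the request (method, URL, headers) without sending it. The function name attachmentRequest, the database name "people", the file name, and the revision value are assumptions made for illustration.

```javascript
// Describe the PUT that uploads an attachment to {_id}/{name}.
// attachmentRequest is a hypothetical helper; it does not send anything.
function attachmentRequest(base, db, id, name, rev, contentType) {
  const url =
    `${base}/${db}/${encodeURIComponent(id)}/${encodeURIComponent(name)}` +
    `?rev=${encodeURIComponent(rev)}`; // the document's current _rev as a query param
  return { method: "PUT", url, headers: { "Content-Type": contentType } };
}

const upload = attachmentRequest(
  "http://localhost:5984",
  "people",                           // assumed database name
  "286ccf0edf77bfb6e780be88ae000d0b",
  "photo.png",                        // assumed attachment name
  "1-placeholder",                    // stand-in for a real _rev
  "image/png"
);
```

The binary file itself would travel as the request body, for example via curl's --data-binary option.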
Documents can be inserted in bulk
send a POST to http://localhost:5984/{dbname}/_bulk_docs
include the docs in the body of the POST as { "docs" : [ ... ] }
if updating, make sure to include _rev for each doc
"non-atomic" mode: response indicates which documents were saved and which weren't
default mode
"all-or-nothing" mode: all documents will be saved, but conflicts may exist
pass "all_or_nothing : true" in the request
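A minimal sketch of the request body for _bulk_docs, assuming the documents are wrapped in an object under a "docs" key. The helper name buildBulkBody and the sample documents are hypothetical, and whether "all_or_nothing" is honored depends on the CouchDB version, so check the documentation for the release you run.

```javascript
// Build the JSON body for POST /{dbname}/_bulk_docs.
// buildBulkBody is a hypothetical helper name.
function buildBulkBody(docs, allOrNothing = false) {
  const body = { docs };
  if (allOrNothing) {
    body.all_or_nothing = true; // request "all-or-nothing" mode
  }
  return JSON.stringify(body);
}

const sampleDocs = [
  { _id: "a", firstname: "Ted" },                        // new document: no _rev
  { _id: "b", _rev: "1-placeholder", firstname: "Amy" }  // update: _rev included
];

const defaultBody = buildBulkBody(sampleDocs);      // "non-atomic" (default) mode
const atomicBody = buildBulkBody(sampleDocs, true); // "all-or-nothing" mode
```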
Retrieving an individual document is just a GET
GET http://localhost:5984/{database}/{_id}
Retrieving multiple documents uses Views
views are essentially predefined MapReduce pairs defined within the database
create a "map" function to select the data from a given document
create a "reduce" function to collect the "map"ped data into a single result (optional)
views are "compiled" on first use
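To make the map/reduce pairing concrete, here is a small sketch that runs a map and a reduce function outside CouchDB. The emit stand-in and the runView harness are hypothetical and only imitate what the server does; the sample documents are made up.

```javascript
// Stand-in for the emit() that CouchDB provides to map functions.
let rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map: select the data of interest from one document.
function map(doc) {
  if (doc.lastname && typeof doc.age === "number") {
    emit(doc.lastname, doc.age);
  }
}

// Reduce: collapse the mapped values into a single result (here, a sum).
function reduce(keys, values, rereduce) {
  return values.reduce((total, v) => total + v, 0);
}

// Hypothetical harness: run map over every doc, sort rows by key, then reduce.
function runView(docs) {
  rows = [];
  docs.forEach((doc) => map(doc));
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  const total = reduce(rows.map((r) => r.key), rows.map((r) => r.value), false);
  return { rows, total };
}

const people = [
  { _id: "1", firstname: "Ted", lastname: "Neward", age: 40 },
  { _id: "2", firstname: "Ann", lastname: "Adams", age: 30 },
  { _id: "3", type: "note" } // skipped by map: no lastname
];
const viewResult = runView(people);
```

In a real design document only the map and reduce functions would be stored (as strings); the rest exists here purely to exercise them.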
Views can be constrained by query params
?key=... : return exact row
?startkey=...&endkey=... : return range of rows
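One detail worth sketching: the values of key, startkey, and endkey are JSON, so a string key carries its quotes and must then be URL-encoded. The helper below builds such a URL; viewUrl, the database "people", the design document "example", and the view "by_lastname" are assumed names.

```javascript
// Build a view query URL with JSON-encoded, URL-escaped parameters.
// viewUrl and its parameter names are hypothetical.
function viewUrl(base, db, design, view, params = {}) {
  const query = Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
  const path = `${base}/${db}/_design/${design}/_view/${view}`;
  return query ? `${path}?${query}` : path;
}

const base = "http://localhost:5984";
const exactRow = viewUrl(base, "people", "example", "by_lastname", { key: "Neward" });
const rowRange = viewUrl(base, "people", "example", "by_lastname", {
  startkey: "A",
  endkey: "N"
});
```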
Updating an individual document is just a PUT
PUT http://localhost:5984/{database}/{_id}
include the document's current _rev in the body
Recall: Everything is a document
a document can contain JavaScript code, executed by CouchDB
this is called a 'design document'
a design document is a doc URL-prefixed with "_design"
{ "_id" : "_design/example", "views" : { "all_docs" : { "map" : "function(doc) { emit(doc._id, doc._rev) }" } } }
CouchDB supports server-side validation
add a "validate_doc_update" member to the _design/ document
function takes three parameters:
"newDoc": the incoming document
"savedDoc": the document on disk (if any)
"userCtx": the user and their roles
CouchDB can also show arbitrary pages
add "show" functions to design document
"shows" member in design doc
object of name : function value pairs
functions take 2 params: doc (document) and req (request)
use GET /{db}/_design/{design}/_show/{show}/{id}
URL query parameters are available on request object
view "templates" are also possible, in much the same way
ditto for "lists"
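A show function along the lines described above might look like the following sketch. The name showPerson, the "greeting" query parameter, and the sample document are assumptions; the function is called directly here, whereas CouchDB would invoke it for a GET on the _show URL.

```javascript
// Show function: takes the document and the request, returns a response.
// showPerson and the "greeting" query parameter are hypothetical.
var showPerson = function (doc, req) {
  if (!doc) {
    return { code: 404, body: "No such person" };
  }
  var greeting = req.query.greeting || "Hello";
  return {
    headers: { "Content-Type": "text/plain" },
    body: greeting + ", " + doc.firstname + " " + doc.lastname
  };
};

// Simulate GET .../_show/person/{id}?greeting=Welcome
const page = showPerson(
  { firstname: "Ted", lastname: "Neward" },
  { query: { greeting: "Welcome" } }
);

// Simulate a request with no query parameters and no matching document.
const notFound = showPerson(null, { query: {} });
```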
CouchApp is a framework/build system for building CouchDB applications (design docs)
It stores views in the CouchDB database
Because CouchDB is a REST-based API, it effectively acts as its own app server
In other words, CouchDB can be a 1-tier application server
a schemaless pseudo-revisioning document store
JavaScript-based application engine
designed for easy replication
CouchDB: The Definitive Guide
J. Chris Anderson et al, O’Reilly, 2011
CouchDB website
http://couchdb.apache.org/
Download: http://couchdb.apache.org/downloads.html
Complete HTTP reference: http://wiki.apache.org/couchdb/Complete_HTTP_API_Reference
CouchApp
Download: https://github.com/couchapp/couchapp/downloads
CouchOne website
Downloads or signup: http://www.couchone.com/get
Who is this guy?
Architect, Engineering Manager/Leader, "force multiplier"
Co-founder, Solidify US
http://www.solidify.dev
Principal -- Neward & Associates
Author
Professional F# 2.0 (w/Erickson, et al; Wrox, 2010)
Effective Enterprise Java (Addison-Wesley, 2004)
SSCLI Essentials (w/Stutz, et al; O’Reilly, 2003)
Server-Based Java Programming (Manning, 2000)
See http://www.newardassociates.com