ted.neward@newardassociates.com | Blog: http://blogs.newardassociates.com | Github: tedneward | LinkedIn: tedneward
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." -- Leslie Lamport, 1987
Grady Booch, X/Twitter
https://twitter.com/grady_booch/status/1468376568695181312
"Not a joke: AWS outage affects Roomba vacuum cleaners" (2020)
https://twitter.com/doctorow/status/1468322810250424321
https://www.bbc.com/news/technology-55087054
"Did Three Million Smart Toothbrushes Lead to a DDoS Attack?" (2024)
https://www.spiceworks.com/it-security/endpoint-security/news/smart-toothbrush-ddos-attack/
There are only two hard problems in distributed systems:
#2 Exactly-once delivery
#1 Guaranteed order of messages
#2 Exactly-once delivery.
@mathiasverraes, X/Twitter
https://twitter.com/mathiasverraes/status/632260618599403520
Theoretically: A distributed system is any program which uses resources out-of-process to complete its work.
These out-of-process resources can include:
databases
unstructured storage
services
remote terminals/front-ends (including Web and mobile!)
Practically: network access
"If I can't run it while in airplane mode, it's a distributed system."
"And even if I can, it might still be a distributed system."
Tim Berners-Lee (TBL) begins research at CERN in 1989
writes "WorldWideWeb: A Proposal for a HyperText Project" in 1990
https://cds.cern.ch/record/2639699/files/Proposal_Nov-1990.pdf
opened to the public in 1991
made freely accessible in 1993
in 2014, "almost two in five people around the world were using it"
https://webfoundation.org/about/vision/history-of-the-web/
an entire page (p2, out of 8) is dedicated to the concepts of hypertext
"It will aim to provide a common (simple) protocol for requesting human readable information stored at a remote system, using networks" (p3)
"The architecture of the hypertext world is one of data stored on server machines, and client processes on the same or other machines. The machines are linked by some network. ... The servers are active processes that reply to requests. The hypertext data is explicitly accessible to them." (p4)
HTTP: HyperText Transfer Protocol
URI: Universal (now "Uniform") Resource Identifier (Location)
HTML: HyperText Markup Language
the browser (which was actually an editor too)
the server ("httpd")
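To make the client/server split concrete, here is a minimal sketch using only Python's standard library (`http.server` and `urllib.request`), not the original CERN httpd or browser. The handler name `HypertextHandler`, the path `/hello.html`, and the generated HTML are illustrative assumptions. The server is an active process that replies to requests; the client names what it wants with a URI and gets HTML back over HTTP.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HypertextHandler(BaseHTTPRequestHandler):
    # Hypothetical handler: the "server" is an active process that replies to requests.
    def do_GET(self) -> None:
        body = f"<html><body><p>You asked for {self.path}</p></body></html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the demo quiet


def fetch(url: str) -> str:
    # The "client": request a resource by its URI over HTTP and read the HTML back.
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Port 0 asks the OS for any free port on the loopback interface.
    server = ThreadingHTTPServer(("127.0.0.1", 0), HypertextHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    print(fetch(f"http://{host}:{port}/hello.html"))
    server.shutdown()
    server.server_close()
```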
No other distributed system has had that level of success
Plenty have had success on the platform
But none have created a platform of similar scale
Even fractional-sized efforts have struggled at times
Amazon
Many have tried; few succeed
The Web is a different beast than what most other projects are doing
Fielding walks through this in his thesis
Roy T. Fielding: born in 1965; named in 1999 by MIT Technology Review as a "top 100 innovator"
received his doctorate from UC Irvine in 2000
co-founded the Apache HTTP Server project
"REST": REpresentational State Transfer
"Architectural Styles and the Design of Network-based Software Architectures"
https://ics.uci.edu/~fielding/pubs/dissertation/fielding_dissertation.pdf
https://ics.uci.edu/~fielding/pubs/dissertation/top.htm
Chs 1 through 4:
"Software Architecture": a general definition
"Network-based Application Architecture": a general definition
"Network-based Architectural Styles": a catalog of styles
"Designing the Web Architecture: Patterns and Insights": requirements
Ch 5 is where the rubber meets the road
a detailed walkthrough of different architectural styles
"derives" REST as an architectural style from this
at each step, he describes a style, and what needs to change
"Null" style (starting point)
"Client-Server"
"Stateless" ("Client-stateless-server")
"Cache" ("Client-cache-stateless-server")
"Uniform Interface"
"Layered System"
"Code-on-Demand" (optional)
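The "Stateless" and "Cache" steps can be illustrated with a small in-process sketch. Everything here (`Request`, `Response`, `stateless_server`, `CachingClient`, the 60-second `max_age`) is hypothetical and not taken from the dissertation. Because the server's answer depends only on what the request carries, a client can safely reuse a response it has already seen until that response goes stale.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Request:
    # Stateless constraint: each request carries all the context the server needs.
    path: str
    user: str


@dataclass(frozen=True)
class Response:
    body: str
    max_age: float  # seconds this response may be reused; 0 means "do not cache"


def stateless_server(request: Request) -> Response:
    # Hypothetical server: the reply depends only on the request, never on earlier requests.
    return Response(body=f"{request.path} for {request.user}", max_age=60.0)


class CachingClient:
    # Cache constraint: a response labeled cacheable can be reused without asking the server.
    def __init__(self, server: Callable[[Request], Response],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.server = server
        self.clock = clock
        self.cache: Dict[Request, Tuple[float, Response]] = {}  # request -> (expires_at, response)
        self.server_calls = 0

    def get(self, request: Request) -> Response:
        entry = self.cache.get(request)
        if entry is not None and self.clock() < entry[0]:
            return entry[1]  # fresh cache hit: no round trip to the server
        self.server_calls += 1
        response = self.server(request)
        if response.max_age > 0:
            self.cache[request] = (self.clock() + response.max_age, response)
        return response


if __name__ == "__main__":
    client = CachingClient(stateless_server)
    req = Request(path="/index.html", user="alice")
    client.get(req)
    client.get(req)
    print(client.server_calls)  # 1: the second get was served from the cache
```

The injectable `clock` exists only so expiry can be tested without waiting; a real HTTP cache would read freshness from response headers instead.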
Distributed systems are really about distributed state
When the system is 100% stateless, there's little to no problem
Scale is simple
Failover is easy
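A minimal sketch of why statelessness makes scaling and failover easy, assuming a hypothetical shopping-cart service (`stateless_add`, `StatefulReplica`, and the two checkout functions are illustrative names). Requests are spread round-robin across replicas, as a load balancer would do, or as happens when traffic fails over to a fresh instance.

```python
from typing import List


def stateless_add(cart_total: int, item_price: int) -> int:
    # Stateless: the client sends the running total, so the server remembers nothing.
    return cart_total + item_price


class StatefulReplica:
    # Hypothetical stateful server: keeps the cart total in its own memory.
    def __init__(self) -> None:
        self.cart_total = 0

    def add(self, item_price: int) -> int:
        self.cart_total += item_price
        return self.cart_total


def checkout_stateless(prices: List[int], replicas: int) -> int:
    handlers = [stateless_add] * replicas  # identical, interchangeable replicas
    total = 0
    for i, price in enumerate(prices):
        # Round-robin: request i goes to replica i % replicas; the client carries the state.
        total = handlers[i % replicas](total, price)
    return total


def checkout_stateful(prices: List[int], replicas: int) -> int:
    servers = [StatefulReplica() for _ in range(replicas)]
    total = 0
    for i, price in enumerate(prices):
        # Same round-robin, but each replica only ever sees part of the cart.
        total = servers[i % replicas].add(price)
    return total


if __name__ == "__main__":
    prices = [10, 20, 30]
    print(checkout_stateless(prices, replicas=2))  # 60
    print(checkout_stateful(prices, replicas=1))   # 60
    print(checkout_stateful(prices, replicas=2))   # 40: wrong once load is spread
```

The stateful version only works with one replica (or with "sticky" routing), and loses the cart entirely if that replica dies.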
What are we storing where, and for how long?
Durable vs. Transient State
state held across processing steps vs...
state held during processing
Context-based vs. Process-based state
where is the transient state held?
Relational vs. Object vs. Hierarchical vs. Event vs. ...
what is the "shape" of the data?
NoSQL databases add a lot more shapes...
document, graph, key-value, ...
Location
where is the durable state held?
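As an illustration of "shape", the same hypothetical orders data (the `orders` and `order_lines` tables and `to_documents` are made up for this sketch) appears first as relational rows linked by a foreign key, then regrouped into one nested document per order, the way a document store might hold it. Even this small regrouping is real work that has to run every time data crosses between the two shapes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Relational shape: flat rows; the relationship lives in a foreign key (order_id).
orders: List[Tuple[int, str]] = [(1, "alice"), (2, "bob")]
order_lines: List[Tuple[int, str, int]] = [(1, "widget", 2), (1, "gadget", 1), (2, "widget", 5)]


def to_documents(orders: List[Tuple[int, str]],
                 order_lines: List[Tuple[int, str, int]]) -> List[Dict]:
    # Hierarchical/document shape: each order nests its own lines.
    lines_by_order: Dict[int, List[Dict]] = defaultdict(list)
    for order_id, sku, qty in order_lines:
        lines_by_order[order_id].append({"sku": sku, "qty": qty})
    return [{"id": oid, "customer": cust, "lines": lines_by_order[oid]}
            for oid, cust in orders]


if __name__ == "__main__":
    for doc in to_documents(orders, order_lines):
        print(doc)
```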
transient state needs to be accessed very quickly
durable state must survive failures
transforming data across shapes has costs
configuration is state
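A minimal sketch of durable state, assuming a single process and a local JSON file (`DurableCounter` and `counter.json` are hypothetical). Transient state would be the `value` field alone, gone on restart. Here each update is written to a temporary file, fsynced, and swapped in with `os.replace`, which is atomic on POSIX filesystems, so a crash mid-write leaves either the old value or the new one, never a torn file.

```python
import json
import os
import tempfile


class DurableCounter:
    # Hypothetical example: the count survives a process restart because it lives on disk.
    def __init__(self, path: str) -> None:
        self.path = path
        self.value = self._load()

    def _load(self) -> int:
        try:
            with open(self.path, encoding="utf-8") as f:
                return json.load(f)["value"]
        except FileNotFoundError:
            return 0  # first run: no durable state yet

    def increment(self) -> int:
        self.value += 1
        # Write to a temp file in the same directory, force it to disk, then atomically
        # rename it over the real file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"value": self.value}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)
        return self.value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "counter.json")
        DurableCounter(path).increment()
        print(DurableCounter(path).value)  # 1: a "restarted" counter reads it back
```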
"Essentially everyone, when they first build a distributed system, makes the following eight assumptions. All turn out to be false in the long run and all cause big trouble and painful learning experiences." -- L. Peter Deutsch, "The Fallacies of Distributed Computing"
The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
Topology doesn't change
There is one administrator
Transport cost is zero
The network is homogeneous
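Since "the network is reliable" and "latency is zero" are both false, remote calls have to expect failure. One common response, sketched here as an assumption rather than a prescription, is retrying with capped exponential backoff and random jitter. `call_with_retries` and its parameters are hypothetical, and the injectable `sleep` is there only to make it testable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T], attempts: int = 5,
                      base_delay: float = 0.1, max_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    # Assume any remote call can fail or time out; retry a bounded number of times.
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Capped exponential backoff (0.1 s, 0.2 s, 0.4 s, ...) with "full jitter",
            # so that many clients recovering at once do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    outcomes = iter([ConnectionError("dropped"), TimeoutError("slow"), "ok"])

    def flaky() -> str:
        # Hypothetical remote call: fails twice, then succeeds.
        result = next(outcomes)
        if isinstance(result, Exception):
            raise result
        return result

    print(call_with_retries(flaky))  # ok
```

Retrying is only safe when the operation is idempotent: a request that timed out may still have been processed on the other side, which is the exactly-once delivery problem again.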
Channel/Medium: TCP/IP, UDP/IP, filesystem, "*nix" pipes, database
Wire Format: ASCII text, JSON, XML, XSD-verified XML, binary
Interaction: Request-Response, Fire-and-Forget (Request, no response), Solicit-Notify, Bidirectional, Bidirectional concurrent
HTTP: TCP/IP - ASCII text - Request/Response
HTTP API: TCP/IP - JSON - Request/Response
Java RMI: TCP/IP - binary (Java serialization) - Request/Response
CORBA: TCP/IP - binary (ORB serialization) - Request/Response
NATS: TCP/IP - ASCII text - Bidirectional concurrent
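To see the three axes separately, here is a sketch of a tiny exchange in the same "channel - wire format - interaction" terms: the channel is a connected pair of stream sockets standing in for a TCP/IP connection, the wire format is newline-delimited JSON, and the interaction is one request followed by one response. `serve_once`, `call`, and the `{"id", "a", "b"}` message layout are hypothetical.

```python
import json
import socket
import threading


def serve_once(conn: socket.socket) -> None:
    # Hypothetical server: read one newline-delimited JSON request, send one JSON response.
    with conn, conn.makefile("rw", encoding="utf-8", newline="\n") as stream:
        request = json.loads(stream.readline())
        response = {"id": request["id"], "result": request["a"] + request["b"]}
        stream.write(json.dumps(response) + "\n")
        stream.flush()


def call(conn: socket.socket, request: dict) -> dict:
    # Client side of request-response: send, then block until the reply arrives.
    with conn.makefile("rw", encoding="utf-8", newline="\n") as stream:
        stream.write(json.dumps(request) + "\n")
        stream.flush()
        return json.loads(stream.readline())


if __name__ == "__main__":
    # Channel: a connected pair of stream sockets, standing in for a TCP/IP connection.
    client_sock, server_sock = socket.socketpair()
    server = threading.Thread(target=serve_once, args=(server_sock,))
    server.start()
    print(call(client_sock, {"id": 1, "a": 2, "b": 3}))  # {'id': 1, 'result': 5}
    server.join()
    client_sock.close()
```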
How do components know about each other?
a-priori knowledge
discovery
registration
How do components restart in failure?
How do components resist stressors?
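Registration and discovery can be sketched as a lease-based registry, an assumption here rather than any particular product. Components register an address under a service name and keep re-registering as a heartbeat; clients look addresses up by name; a component that stops heartbeating (because it crashed) drops out once its lease expires. `ServiceRegistry`, the 10-second TTL, and the addresses are hypothetical. With a-priori knowledge, by contrast, each client would simply have the address hard-coded.

```python
import time
from typing import Callable, Dict, List, Tuple


class ServiceRegistry:
    # Hypothetical in-memory registry: components register themselves, clients discover them.
    # Registrations expire unless renewed, so a crashed component eventually disappears.
    def __init__(self, ttl: float = 10.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[Tuple[str, str], float] = {}  # (service, address) -> expiry time

    def register(self, service: str, address: str) -> None:
        # Registration doubles as the heartbeat: calling it again extends the lease.
        self._entries[(service, address)] = self.clock() + self.ttl

    def deregister(self, service: str, address: str) -> None:
        # Graceful shutdown: leave immediately instead of waiting for the lease to lapse.
        self._entries.pop((service, address), None)

    def discover(self, service: str) -> List[str]:
        now = self.clock()
        # Drop expired leases, then return the live addresses for this service.
        self._entries = {key: exp for key, exp in self._entries.items() if exp > now}
        return sorted(addr for (name, addr) in self._entries if name == service)


if __name__ == "__main__":
    registry = ServiceRegistry(ttl=10.0)
    registry.register("orders", "10.0.0.1:8080")
    print(registry.discover("orders"))  # ['10.0.0.1:8080']
```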
Choosing distribution tools/channels is critical
Keep in mind transient vs durable state
Always ask "When this fails...." (never "If this fails....")
Never embrace an architectural style without understanding it deeply first
Architect, Engineering Manager/Leader, "force multiplier"
http://www.newardassociates.com
http://blogs.newardassociates.com
Sr Distinguished Engineer, Capital One
Educative (http://educative.io) Author
Performance Management for Engineering Managers
Books
Developer Relations Activity Patterns (w/Woodruff, et al; Apress, forthcoming)
Professional F# 2.0 (w/Erickson, et al; Wrox, 2010)
Effective Enterprise Java (Addison-Wesley, 2004)
SSCLI Essentials (w/Stutz, et al; O'Reilly, 2003)
Server-Based Java Programming (Manning, 2000)