System Design Memorandum on Priority Topics
TL;DR
A cache reduces latency in the system.
Caching works best for immutable or static data.
Cached data can go stale when it is not updated. Ask: how much staleness can we tolerate?
Write policy: write-through cache vs. write-back cache
Eviction policy: LRU, FIFO, LFU
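As an illustration of one eviction policy, here is a minimal LRU sketch. It assumes a single-threaded, in-memory cache; the class name LRUCache and its methods are made up for this note and are not taken from any particular library.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical in-memory cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key):
        if key not in self.items:
            return None  # cache miss
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry
```

FIFO would skip the move_to_end calls (evict in insertion order); LFU would track an access count per key and evict the smallest.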
How to sync between Cache and DB?
It gets more complicated when syncing cache replicas and DB replicas while maintaining both consistency and high throughput.
get() follows the cache-aside strategy; update() follows the write-through strategy.
delete() sends the operation to both the caches and the DBs. Once the DB has changed, trigger an asynchronous task that waits a short while and then tells the cache to evict the key again, removing any stale value that a concurrent read happened to bring from the DB into the cache (proven and incorporated by Facebook).
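The three operations above can be sketched roughly as follows. This is a single-process toy under stated assumptions: the DB is a plain dict, the class name SyncedStore and the delay_seconds parameter are hypothetical, and threading.Timer stands in for whatever asynchronous notification a real deployment would use.

```python
import threading


class SyncedStore:
    """Hypothetical cache + DB pair: cache-aside reads, write-through updates,
    and deletes followed by a delayed second eviction."""

    def __init__(self, db, delay_seconds=0.05):
        self.db = db  # assumption: a dict stands in for the database
        self.cache = {}
        self.delay_seconds = delay_seconds
        self.lock = threading.Lock()

    def get(self, key):
        # Cache-aside: try the cache, fall back to the DB, then populate the cache.
        with self.lock:
            if key in self.cache:
                return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            with self.lock:
                self.cache[key] = value
        return value

    def update(self, key, value):
        # Write-through: write the DB and the cache in the same operation.
        self.db[key] = value
        with self.lock:
            self.cache[key] = value

    def delete(self, key):
        # Remove from both, then evict once more after a delay to drop any
        # stale value that a concurrent get() re-inserted in the meantime.
        self.db.pop(key, None)
        self._evict(key)
        timer = threading.Timer(self.delay_seconds, self._evict, args=(key,))
        timer.daemon = True
        timer.start()
        return timer

    def _evict(self, key):
        with self.lock:
            self.cache.pop(key, None)
```

The delay is a tuning knob: it should be longer than the time a slow concurrent read needs to finish writing its stale value into the cache.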
A hash function transforms arbitrary pieces of data into fixed-size values (typically integers).
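A small sketch of that idea, assuming SHA-256 from the standard library as the underlying hash. The helper name hash_to_int and the bits parameter are made up for illustration.

```python
import hashlib


def hash_to_int(data: bytes, bits: int = 32) -> int:
    """Map arbitrary bytes to an integer in [0, 2**bits); bits must be a multiple of 8."""
    digest = hashlib.sha256(data).digest()  # always 32 bytes, whatever the input size
    return int.from_bytes(digest[: bits // 8], "big")
```

The same input always yields the same integer, so hash_to_int(key) % n can pick one of n cache nodes. The catch is that changing n remaps most keys.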
Consistent hashing
Consistent hashing keeps the cache hit rate high when cache nodes are added or removed, because it minimizes key redistribution; it also helps mitigate the hot-key problem.
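A minimal hash ring sketch, to make the redistribution claim concrete. Assumptions: SHA-256 as the hash, 100 virtual nodes per cache node to even out the spread, and hypothetical names (HashRing, add_node, remove_node, get_node) chosen for this note.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Take the first 8 bytes of SHA-256 as a position on the ring.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        if not self._points:
            raise KeyError("ring is empty")
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

Adding a node only moves the keys that now fall just before one of its positions; every other key keeps its old node, so its cached value stays reachable.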