Google File System

5.1 High Availability
Chunk replication: replication factor (RF) of 3; the master clones existing replicas when chunkservers go offline or when corrupted replicas are detected.
Master replication: the operation log and checkpoints are replicated.
"Shadow" masters provide read-only access to file metadata such as directory contents; these reads may be slightly stale.
A shadow master reads a replica of the operation log, polls chunkservers to locate chunk replicas, and depends on the primary master for decisions to create and delete replicas.
5.2 Data Integrity
Detecting corruption by comparing data across replicas is impractical, so each chunkserver verifies its own copy with a 32-bit checksum per 64 KB block, stored persistently with logging and separate from user data.
Reads: the chunkserver verifies the checksum before returning data. On a mismatch, the client reads from another replica, and the master creates a new replica and deletes the corrupted one.
Appends: incrementally update the checksum of the last partial checksum block. Even if that block is already corrupted, the new checksum value will not match the stored data, so the corruption is still detected on the next read.
Writes: if a write overwrites an existing range of the chunk, the first and last blocks of the range must be verified before writing; only then are the new checksums computed and recorded, so that corruption in the unchanged parts of those blocks is not hidden.
6 Measurements
Setup: 1 master, 2 master replicas, 16 chunkservers, and 16 clients.
6.
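The 5.2 checksum scheme can be sketched as a toy chunk replica. This is a minimal sketch under stated assumptions, not GFS code: the notes only say "32-bit checksum", so CRC32 via `zlib.crc32` is an assumption, and `ChecksummedChunk` with its `read`, `append` and `overwrite` methods are hypothetical names. It shows verify-before-read, the incremental update of the last partial block on append, and verifying the first and last blocks before an overwrite.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # one 32-bit checksum per 64 KB block


class ChecksummedChunk:
    """Toy chunk replica; CRC32 is an assumed stand-in for GFS's 32-bit checksum."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.checksums: list[int] = []  # checksums[i] covers block i

    def _block(self, i: int) -> bytes:
        return bytes(self.data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])

    def verify_block(self, i: int) -> None:
        if zlib.crc32(self._block(i)) != self.checksums[i]:
            raise IOError(f"checksum mismatch in block {i}")

    def _check_range(self, offset: int, length: int) -> None:
        if offset < 0 or length <= 0 or offset + length > len(self.data):
            raise ValueError("range outside stored data")

    def read(self, offset: int, length: int) -> bytes:
        # Verify every block overlapping the range before returning any data.
        self._check_range(offset, length)
        for i in range(offset // BLOCK_SIZE, (offset + length - 1) // BLOCK_SIZE + 1):
            self.verify_block(i)
        return bytes(self.data[offset:offset + length])

    def append(self, payload: bytes) -> None:
        # Extend the last partial block's checksum incrementally (no re-read),
        # and start a fresh checksum for each new block the append fills.
        pos = 0
        while pos < len(payload):
            i = len(self.data) // BLOCK_SIZE
            room = BLOCK_SIZE - len(self.data) % BLOCK_SIZE
            piece = payload[pos:pos + room]
            if i < len(self.checksums):
                self.checksums[i] = zlib.crc32(piece, self.checksums[i])
            else:
                self.checksums.append(zlib.crc32(piece))
            self.data.extend(piece)
            pos += len(piece)

    def overwrite(self, offset: int, payload: bytes) -> None:
        # Verify the first and last blocks first; otherwise the recomputed
        # checksums could hide corruption in their untouched bytes.
        self._check_range(offset, len(payload))
        first, last = offset // BLOCK_SIZE, (offset + len(payload) - 1) // BLOCK_SIZE
        self.verify_block(first)
        self.verify_block(last)
        self.data[offset:offset + len(payload)] = payload
        for i in range(first, last + 1):
            self.checksums[i] = zlib.crc32(self._block(i))
```

Because `append` extends the stored checksum without re-reading the block, corruption already present in the partial block is not caught during the append itself, but the extended checksum no longer matches the stored bytes, so the next `read` of that block raises.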

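The 5.1 re-replication rule (clone existing replicas until each chunk is back at RF 3) can be sketched as a planning step on the master. Everything here is hypothetical: `plan_re_replication` and its inputs are illustrative names, a corrupted replica is modeled by removing it from the chunk's replica set before the call, and target placement is naive (alphabetical) rather than GFS's real placement policy. Handling the chunks with the fewest surviving replicas first is a choice made for this sketch.

```python
RF = 3  # target replication factor from the notes


def plan_re_replication(
    replicas: dict[str, set[str]], live_servers: list[str]
) -> list[tuple[str, str, str]]:
    """Return (chunk, source, target) clone operations (hypothetical helper).

    replicas: chunk id -> chunkservers holding a good replica
    live_servers: chunkservers currently reachable
    """
    live = set(live_servers)
    # Replicas on offline chunkservers no longer count toward RF.
    surviving = {chunk: servers & live for chunk, servers in replicas.items()}
    plan = []
    # Most-endangered chunks first (fewest surviving replicas).
    for chunk in sorted(surviving, key=lambda c: (len(surviving[c]), c)):
        holders = surviving[chunk]
        if not holders:
            continue  # no good copy left to clone from
        source = min(holders)
        for target in sorted(live - holders):
            if len(holders) >= RF:
                break
            plan.append((chunk, source, target))
            holders = holders | {target}
    return plan
```

For example, with chunkserver `D` offline, a chunk that had replicas on `A`, `B` and `D` gets exactly one clone from `A` to the first live server that lacks it.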

Flink

What is it
A distributed runtime that uses pipelines for execution
Exactly-once state consistency
Lightweight checkpoints
Iterative processing
Window semantics
Out-of-order processing
2 Architecture
A cluster consists of a client, a job manager, and one or more task managers.
Client: takes a program, transforms it into a dataflow graph, and submits it to the job manager; also creates the data schema and serializers and performs cost-based query optimization.
Job manager: coordinates distributed execution of the dataflow; tracks the state and progress of each operator and stream; schedules new operators; coordinates checkpoints and recovery, persisting the minimal set of data needed for them.
Task manager: executes one or more operators; reports status to the job manager; maintains buffer pools to buffer or materialize streams; maintains network connections to exchange streams between operators.
3.
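To make "uses pipelines for execution" concrete, the sketch below chains Python generators as stand-in operators (source, map, filter, sink). It is an illustration under loose assumptions, not Flink's API: the function names are hypothetical, and the real runtime pushes records through network buffers between task managers rather than pulling them through generators. What it does show is the pipelining property: each record travels through the whole chain before the source emits the next one, so no stage materializes its full output.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Trace = List[Tuple[str, str]]  # (operator name, record) in execution order


def source(records: Iterable[str], trace: Trace) -> Iterator[str]:
    for r in records:
        trace.append(("source", r))
        yield r


def map_op(fn: Callable[[str], str], upstream: Iterator[str], trace: Trace) -> Iterator[str]:
    for r in upstream:
        out = fn(r)
        trace.append(("map", out))
        yield out


def filter_op(pred: Callable[[str], bool], upstream: Iterator[str], trace: Trace) -> Iterator[str]:
    for r in upstream:
        if pred(r):
            trace.append(("filter", r))
            yield r


def sink(upstream: Iterator[str], trace: Trace) -> List[str]:
    # Pulling from the last operator drives the whole chain, one record at a time.
    results = []
    for r in upstream:
        trace.append(("sink", r))
        results.append(r)
    return results
```

Running it on `["a", "b", "c"]` with `str.upper` and a filter that drops `"B"` returns `["A", "C"]`, and the trace records `("sink", "A")` before `("source", "b")`.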

Facebook - memcached

Requirements
Real-time
Aggregate dispersed data
Access the hot set
Scale
Refs: [1,2,5,6,12,14,34,36]
Front-end cluster: read-heavy workload (100:1 R/W), wide fanout, handle failures; 10 Mops/s
Q: what is a wide fanout?
Multiple FE clusters (single geo region): control data replication, data consistency; 100 Mops/s
Multiple regions (multiple geo regions): storage replication, data consistency; 1 Bops/s
Pre-memcached: high fanout; data dependency graph for a small user request
Look-aside cache
Q: why deletes over sets?
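The look-aside pattern and the "why deletes over sets" question can be sketched with two dicts standing in for memcached and the database. This is a toy model under stated assumptions (`LookAsideCache`, `race` and the key names are hypothetical): reads fill the cache on a miss, and writers update the database and then either delete or set the cached key. If two writers' cache operations arrive out of order, a set can leave the older value in the cache, while a delete is idempotent and order-insensitive, so the next read repopulates from the database.

```python
from typing import Dict, Optional


class LookAsideCache:
    """Toy look-aside cache: one dict plays memcached, another the database."""

    def __init__(self) -> None:
        self.db: Dict[str, str] = {}
        self.cache: Dict[str, str] = {}

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:          # cache hit
            return self.cache[key]
        value = self.db.get(key)       # miss: the client reads the database...
        if value is not None:
            self.cache[key] = value    # ...and then populates the cache itself
        return value

    def update_db(self, key: str, value: str) -> None:
        self.db[key] = value

    def invalidate(self, key: str) -> None:
        self.cache.pop(key, None)      # delete: idempotent, safe to repeat

    def set_cache(self, key: str, value: str) -> None:
        self.cache[key] = value        # set: outcome depends on arrival order


def race(use_delete: bool) -> Optional[str]:
    """Two writers update key "k"; their cache operations arrive out of order."""
    c = LookAsideCache()
    c.update_db("k", "v1")  # writer 1 commits first
    c.update_db("k", "v2")  # writer 2 commits second, so the database holds "v2"
    # Writer 2's cache operation is delivered before writer 1's delayed one.
    for value in ("v2", "v1"):
        if use_delete:
            c.invalidate("k")
        else:
            c.set_cache("k", value)
    return c.read("k")
```

In this run `race(use_delete=False)` returns the stale `"v1"` while the database holds `"v2"`, and `race(use_delete=True)` returns `"v2"`. Deletes remove this reordering hazard only; a reader that fetches an old value from the database just before a write can still repopulate the cache with stale data, which this sketch does not model.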