Articles are paginated with only three posts here for example. You can set the number of entries to show on this page with the “pagination” setting in the config file.
Posts
460 LFU Cache
Requirements O(1) insertion requires O(1) getting the least frequently used Suggested data structures f: a map from frequency to a tuple containing the head and tail of a portion of a double linked list kv: a map from key to a node in the double linked list dll: a double linked list where the least frequent item is at the head of the list node: element of dll, containing frequency info, as well a prev, next, value, the usual stuff put(a,v) find corresponding node via kv[a] delete head of double linked list and update f accordingly delete node from its current position and insert it at the head of the next frequency, i.
Posts
Starting hugo from scratch
deploying hugo to github.io There are a number of ways of doing this. Since I like to keep a separate branch for the deployment, I opted for the option below. Another option would be use a docs directory and add to config.toml a line with publishDir = “docs”. This would also work, and you then just need to work on master branch.
This method of maintaining a separate branch is made easier with a worktree that directly writes to a public folder in gh-pages from the master branch.
Posts
Klepmann Chapter 9 Consistency and Consensus
Consistency Guarantees eventual consistency eventually data converges different than transaction isolation isolation - avoid race conditions due to concurrent execution consistency - about coordinating replica state levels strongest - linearizability causality and total ordering commit in a distributed system consensus problem Linearizability one replica illusion (one copy of the data) guarantee read is most recent - recency guarantee read A / write C / read A ok read A / begin write C / read B / end write C / read A not ok read is concurrent with the write linearizability, 1 always follows 1 (no flipping) cas(x, vold, vnew) vs Serializability serializability is about transactions guaranteeing a sequential order linearizability is about recency guarantees on read and write importance locking and leader election lock must be linearizable all nodes must agree on who holds the lock zookeeper - used for distributed locking and leader election constraints and uniqueness guarantees enforcing uniquess (username, filename) similar to acquiring a lock similar to cas operation constraints that bank balance >= 0 two people don’t book the same flight requires a single up-to-date value for account balance or seat occupancy uniquess constraints in DBs are linearizable foreigh key and attribute constraints can be implemented without linearizability cross-channel timing dependencies File storage service is not linearizable, that is two requests went into it, one to store the image, and the other to resize the image.
Posts
Kleppmann Chapter 07 - Transactions
Definition of transaction a group of several reads and writes logical unit - looks like one operation all or nothing (commit or abort/rollback) Who needs transactions when you need certain guarantees Acid Atomicity - abortability (resiliant to faults) Consistency - some invariants about the data must be true Isolation - concurrent transactions are isolated serializability - too strong Durability - committed means data will not be lost there is no perfect guarantee Base Basically Available, Soft state, and Eventual consistency Single-Object vs Multi-Object Operations many writes atomicity : all written or none isolation : one transaction cannot see a partial write of another identify multi-object transaction tie to client’s TCP connection - bad separate transaction id Single object writes Storage engine atomicity : log for crash recovery isolation : using a lock atomic operations increment compare-and-set Multi-object transactions when do you need them?
Posts
Vector clocks
vector clocks A, B, C, D are trying to set a date.
Alice starts off (Wed, (A:1)) (Tue, (A:1,B:1)) (Tue, (A:1,B:1,D:1)) (Thu, (A:1,C:1)) -> Conflict (A:1,C:1) did not descencd from (A:1,B:1,D:1)
Descend Each marker in vclk2 must have corresponding or greater marker than in vclk1
Resolve (Thu, (A:1, B:1, C:1, D:2))
problems width of vector clock grows with number of actors, or clients.
keep growth under control with timestamp to prune old clocks revisited Actor Some entity in the system that makes an update to an object
Posts
R lang
operators Exponent: ^ Bit: & | Conditional: && ||
difference between = and <- operators Just use =
vectors (range) everything is a vector 1:9, 1:2:9, c(1,3,2,-8.1) c() for concat vector addition v1 + v2 : element wise matrices matrix(c(1,2,3,4,5,6,7,8,9), nrow=3) nrow is the number of rows or ncol for number of columns fills in column order, i.e. [[1,4,7],[2,5,8],[3,6,9]] matrix(1:9, nrow=3, byrow=T) : fills in row order matrix multiplication m1 %*% m2 transpose t(m1) slicing m1[1,3] m1[,3] : all elements on third column m1[1,] : all elements on first rows m1[,-2] : all but the second column m1[1,1] = 15 m1[,2:3] = 1 : set columns 2 and 3 to 2 m1[,2:3] = 4:9 : set columns 2 and 3 to col2:4,5,6, col3:7,8,9 m1[m1>5] : filter elements in m1 that are greater than 5 m1[m1>5] = 3 : set all elements greater than 5 to 3 loops for(i in 1:3) { print(i) } while(sum(v1)>=5) {}
Posts
Spanner
Key to paper TrueTime exposes clock uncertainty, if the uncertainty is large, Spanner slows down to wait out the uncertainty. TS reflect serialization order, external consistency, or linearizability.
Spanner zones Zone zonemaster assigns data to spanservers (1000) location proxies used by clients to locate spanservers Spanserver responsible for 1000 tablets Tablet (key: string, timestamp: int64) -> string more like multiversion kv store state store in a set of B-tree-like files and a write ahead log in DFS called colossus one Paxos state machine per tablet Paxos long time leader leases (10 sec) implements a consistently replicated bag of mappings writes initiates paxos protocol at leader reads access state directly from any replica set of replicas called Paxos group Logs: every ‘Paxos write’ goes to tablet log and to Paxos logic Replica leader every leader replica implements a lock table contains the state for 2-phase locking maps ranges of keys to lock states leaders are long lived to maintain this lock table performs poorly under optimistic concurrency control replicas in paxos group called participant slaves implements a transaction manager implements a participant leader used between other paxos groups for 2-phase commit Participant/Coordinator one participant leader (leader to a paxos group) is chosen as coordinator leader, other replica leader called coordinator slaves state of transaction manager store in underlying paxos group Directory and placement directory set of contiguous keys with common prefix unit of data placement -> same replica config unit of data movement between paxos groups paxos group/tablet contains multiple directories not necessarily lex.
Posts
KMP and Aho-Corasick
Main concept of KMP Consider the following string:
\(\underbrace{a_1 a_2 a_3 a_4}_\text{text1}\)
Posts
731 My Calendar II
Acknowledgement Taken from fun4leet’s My Calendar II wonderful article
tl;dr A max segment tree, with an incremental update function is sufficient for this problem. Max is used because such tree allows quick query of the max number of bookings given a range. Incremental update is useful because each time book is called, one needs to increment the bookings in a particular range.
Data structure 4 things: a range \([l,r]\), data, lazy flag
Posts
920 Number of music playlists
Notes on solution to Leetcode 920 Let’s take the example from the article, songs: \(\left\{abcde\right\}\), playlist: \(abacabdcbaeacbd\),
\(\bar{x} = (1,2,4,7,11)\)
For \(\bar{x}\), each number in the n-tuple indicates a position in the playlist for the first occurrence of a particular unique song. The article uses 1-indexing so I will use the same to be consistent.
As an example for the \(\bar{x}\) above, consider the playlist family:
\(p_l = (1_1,2_2,c_3,3_4,c_5,c_6,4_7,c_8,c_9,c_{10},5_{11},c_{12},c_{13},c_{14})\)
Here I have used the numbers (characters) \(1-5\), to indicate a song number, instead of the \(a-e\) in the article.