GRU vs LSTM: GRU uses 2 gates and is faster; LSTM uses 3 gates and tends to capture longer dependencies

GRU: 2 gates (update, reset), faster; LSTM: 3 gates (input, forget, output), longer dependencies
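
One rough way to see why the GRU is cheaper is to count parameters. The sketch below is illustrative only: the function names are made up for this card, and it assumes one weight block per gate plus one for the candidate state, with a single bias vector per block (some frameworks keep two bias vectors, so their totals differ slightly).

```python
def recurrent_param_count(input_size: int, hidden_size: int, num_blocks: int) -> int:
    """Parameter count for a recurrent layer made of `num_blocks` weight blocks.

    Assumption: each block has input weights, recurrent weights, and one bias vector.
    """
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return num_blocks * per_block


def gru_param_count(input_size: int, hidden_size: int) -> int:
    # 2 gates (update, reset) + 1 candidate hidden state = 3 blocks
    return recurrent_param_count(input_size, hidden_size, num_blocks=3)


def lstm_param_count(input_size: int, hidden_size: int) -> int:
    # 3 gates (input, forget, output) + 1 candidate cell state = 4 blocks
    return recurrent_param_count(input_size, hidden_size, num_blocks=4)


if __name__ == "__main__":
    print(gru_param_count(128, 256))   # GRU total
    print(lstm_param_count(128, 256))  # LSTM total
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size, which is one source of the speed difference.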

Related concepts

Triton differs from CUDA

Triton uses block-level programming, while CUDA uses thread-level programming
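
The difference in programming model can be emulated in plain Python. This is only a sketch of the two indexing styles, not real Triton or CUDA code: the function names, the sequential loops standing in for parallel execution, and the block size of 4 are all assumptions made for illustration.

```python
def add_thread_level(x, y, block_dim=4):
    """CUDA-style emulation: each (block, thread) pair handles exactly one element."""
    n = len(x)
    out = [0] * n
    num_blocks = (n + block_dim - 1) // block_dim
    for block_idx in range(num_blocks):
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx  # per-thread global index
            if i < n:  # bounds check written per thread
                out[i] = x[i] + y[i]
    return out


def add_block_level(x, y, block_size=4):
    """Triton-style emulation: each program instance handles a whole block of offsets."""
    n = len(x)
    out = [0] * n
    num_programs = (n + block_size - 1) // block_size
    for pid in range(num_programs):
        start = pid * block_size
        offsets = [start + k for k in range(block_size)]
        mask = [o < n for o in offsets]  # one mask for the whole block
        # A single "vector" operation over every valid offset in the block
        block = [x[o] + y[o] for o, valid in zip(offsets, mask) if valid]
        out[start:start + len(block)] = block
    return out


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6]
    b = [10, 10, 10, 10, 10, 10]
    print(add_thread_level(a, b))
    print(add_block_level(a, b))
```

Both functions compute the same result; what changes is the unit the programmer reasons about, a single element per thread versus a masked block per program.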

BFS vs DFS: BFS finds shortest paths in unweighted graphs; DFS typically uses less memory

BFS finds shortest paths in unweighted graphs; DFS typically uses less memory (a stack proportional to depth rather than a queue proportional to width)
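
A minimal sketch of why BFS yields a shortest path: nodes are visited in order of increasing distance from the start, so the first time the goal is reached the path has the fewest edges. The adjacency-dict graph format and the function name are assumptions for this example.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return a shortest path (fewest edges) from start to goal, or None if unreachable.

    Assumption: `graph` maps each node to a list of neighbours, and None is not a node.
    """
    if start == goal:
        return [start]
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour in parent:
                continue
            parent[neighbour] = node
            if neighbour == goal:
                # Walk parent links back to the start, then reverse
                path = []
                current = goal
                while current is not None:
                    path.append(current)
                    current = parent[current]
                return path[::-1]
            queue.append(neighbour)
    return None


if __name__ == "__main__":
    g = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
    print(bfs_shortest_path(g, "A", "E"))
```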

When to use an RNN/LSTM: for sequential data where order matters (mostly replaced by transformers)

Use RNN/LSTM for sequential data where order matters (mostly replaced by transformers)

CPU cache

The L1/L2 cache hierarchy reduces average main-memory access latency by keeping recently used data close to the core

LSM trees: optimize write-heavy workloads by buffering writes in memory

LSM trees optimize write-heavy workloads by buffering writes in memory and flushing them to disk as sorted files
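
The write path can be sketched with a toy in-memory model. This is a simplified illustration, not a real storage engine: the class name `TinyLSM`, the tiny memtable limit, and the use of Python lists in place of on-disk sorted files are all assumptions, and compaction and deletes are left out.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: a dict memtable plus immutable sorted segments (newest last)."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}   # absorbs writes in memory
        self.segments = []   # each segment is (sorted_keys, values), standing in for a disk file

    def put(self, key, value):
        self.memtable[key] = value  # cheap in-memory write
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the memtable out as one sorted, immutable segment
        items = sorted(self.memtable.items())
        keys = [k for k, _ in items]
        values = [v for _, v in items]
        self.segments.append((keys, values))
        self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable first, then segments newest to oldest
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.segments):
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


if __name__ == "__main__":
    db = TinyLSM(memtable_limit=2)
    db.put("a", 1)
    db.put("b", 2)   # triggers a flush
    db.put("a", 3)   # newer value shadows the flushed one
    print(db.get("a"), db.get("b"))
```

Writes never modify an existing segment, which is what keeps them fast; the cost moves to reads, which may have to consult several segments.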

Greedy vs dynamic programming: greedy makes locally optimal choices; DP considers all subproblems

Greedy: locally optimal choices, fast but not always globally optimal; DP: considers all subproblems and reuses their solutions
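
Coin change is a common way to see the two approaches disagree. The sketch below is illustrative; the function names and the coin set {1, 3, 4} are chosen for this example.

```python
def greedy_coin_count(coins, amount):
    """Always take the largest coin that fits. Returns a coin count, or None if it gets stuck."""
    count = 0
    for coin in sorted(coins, reverse=True):
        take, amount = divmod(amount, coin)
        count += take
    return count if amount == 0 else None


def dp_coin_count(coins, amount):
    """Fewest coins summing to `amount`, built up from every smaller amount. None if impossible."""
    unreachable = float("inf")
    best = [0] + [unreachable] * amount  # best[t] = fewest coins that sum to t
    for total in range(1, amount + 1):
        for coin in coins:
            if coin <= total and best[total - coin] + 1 < best[total]:
                best[total] = best[total - coin] + 1
    return best[amount] if best[amount] != unreachable else None


if __name__ == "__main__":
    print(greedy_coin_count([1, 3, 4], 6))  # greedy picks 4 + 1 + 1
    print(dp_coin_count([1, 3, 4], 6))      # DP finds 3 + 3
```

For amount 6, greedy uses three coins while DP finds the two-coin answer, because the locally best first choice (the 4) rules out the better combination.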
