Use RNN/LSTM for sequential data where order matters (mostly replaced by transformers)

Image: N509FZ, CC BY-SA 4.0, via Wikimedia Commons

When to use an RNN/LSTM: for sequential data where order matters (mostly replaced by transformers)
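As a rough illustration of why order matters to a recurrent model, here is a minimal sketch of a one-unit Elman-style RNN. The function name `rnn_final_state` and the weights `w_in` and `w_rec` are hypothetical values chosen for illustration, not taken from any trained model.

```python
import math

def rnn_final_state(inputs, w_in=0.5, w_rec=0.9):
    """One-unit recurrent update: h_t = tanh(w_in * x_t + w_rec * h_prev)."""
    h = 0.0  # initial hidden state
    for x in inputs:
        # Each step mixes the new input with the state carried from earlier steps.
        h = math.tanh(w_in * x + w_rec * h)
    return h

# Same values, different order: the final hidden state differs.
forward = rnn_final_state([1.0, 2.0, 3.0])
reverse = rnn_final_state([3.0, 2.0, 1.0])
print(forward, reverse)
```

Because the hidden state is carried from step to step, feeding the same values in a different order produces a different final state.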

Related concepts

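Transformers use LayerNorm, not BatchNorm

LayerNorm normalizes each token across its features, so its statistics do not depend on batch size or sequence length; BatchNorm relies on batch statistics, which are unreliable for variable-length, padded sequences and small batches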

When to use a log-transform: when data is right-skewed or spans multiple orders of magnitude

Log-transform: Apply when data is right-skewed or spans multiple orders of magnitude
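A small sketch of the effect, assuming non-negative data so that log(1 + x) is defined. The `incomes` list is made-up illustrative data, and `log_transform` and `skewness` are hypothetical helper names.

```python
import math
import statistics

def log_transform(values):
    """Apply log(1 + x) to each value; requires every x > -1."""
    return [math.log1p(v) for v in values]

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

# Made-up right-skewed data spanning several orders of magnitude.
incomes = [1, 2, 3, 5, 8, 13, 100, 1_000, 10_000]
logged = log_transform(incomes)

print("skew before:", skewness(incomes))
print("skew after: ", skewness(logged))
```

The transformed values have a noticeably smaller skewness because the log compresses the long right tail far more than the small values.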

Pre-LN transformers are easier to train

Pre-LN transformers apply LayerNorm inside the residual branch, leaving the skip connection as an identity path, so gradients flow more smoothly during backpropagation

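Pre-LN vs Post-LN

Pre-LN: LayerNorm before each sublayer (attention or feed-forward), inside the residual branch; Post-LN: LayerNorm after the sublayer output is added back to the residual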
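The two placements can be sketched on a single feature vector. This is a simplified illustration: `toy_sublayer` is a hypothetical stand-in for attention or a feed-forward layer, and the learned scale and shift of LayerNorm are omitted.

```python
import statistics

def layer_norm(x, eps=1e-5):
    """Normalize one vector to zero mean and unit variance (no learned scale/shift)."""
    mean = statistics.fmean(x)
    var = statistics.pvariance(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def pre_ln_step(x, sublayer):
    """Pre-LN: x + sublayer(LayerNorm(x)); the skip path stays unnormalized."""
    return add(x, sublayer(layer_norm(x)))

def post_ln_step(x, sublayer):
    """Post-LN: LayerNorm(x + sublayer(x)); the skip path is normalized too."""
    return layer_norm(add(x, sublayer(x)))

def toy_sublayer(x):
    # Hypothetical stand-in for attention or a feed-forward sublayer.
    return [0.5 * v for v in x]

x = [1.0, 2.0, 4.0]
pre = pre_ln_step(x, toy_sublayer)
post = post_ln_step(x, toy_sublayer)
print(pre, post)
```

In the Pre-LN step the input reaches the output through a plain addition, which is the identity path that gradients can follow; in the Post-LN step everything, including the skip path, passes through the normalization.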

Batch norm vs layer norm: BN across batch, LN across features

Batch norm (BN) normalizes each feature across the batch; layer norm (LN) normalizes each sample across its features, so LN handles variable-length sequences
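The difference comes down to which axis the mean and variance are computed over. Below is a minimal sketch on a tiny batch (rows are samples, columns are features); it leaves out the learned scale and shift and the running statistics that real implementations keep, and the helper names are illustrative.

```python
import statistics

EPS = 1e-5

def _standardize(values):
    """Shift to zero mean and scale to (roughly) unit variance."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values)
    return [(v - mean) / (var + EPS) ** 0.5 for v in values]

def layer_norm(batch):
    """Normalize each sample (row) across its own features."""
    return [_standardize(row) for row in batch]

def batch_norm(batch):
    """Normalize each feature (column) across the samples in the batch."""
    columns = [_standardize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]

batch = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
ln_out = layer_norm(batch)
bn_out = batch_norm(batch)
print(ln_out)
print(bn_out)
```

The LN output for a sample is the same no matter what else is in the batch, while the BN output changes with the batch contents; with a batch of one, this BN sketch collapses every feature to zero.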

When to use a CNN: for data with spatial structure like images or time series

CNNs excel at recognizing local patterns in grid-structured data such as images (2D) or time series (1D)
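The core operation can be sketched in one dimension: a small kernel slides over the signal and responds wherever its pattern appears. The `conv1d` helper, the `edge_kernel` weights, and the `series` data are hypothetical; a real CNN learns its kernels and stacks many of them.

```python
def conv1d(signal, kernel):
    """Valid cross-correlation: slide the kernel over the signal, no padding."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# Hypothetical kernel that responds to an upward step between neighbors.
edge_kernel = [-1.0, 1.0]

series = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
shifted = [0.0, 5.0, 5.0, 5.0, 5.0, 5.0]  # same step, two positions earlier

response = conv1d(series, edge_kernel)
shifted_response = conv1d(shifted, edge_kernel)
print(response)
print(shifted_response)
```

The same kernel detects the step wherever it occurs, and the peak in the response moves with the pattern; this weight sharing across positions is what makes convolutions a good fit for images and time series.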
