Minimum redundant symbols = (2t + 1) * k, where t = (number of symbol errors)/(2t + 1) and k = (codeword length - data length)
Minimum redundant symbols = (2t + 1) * k, where t = (number of symbol errors)/(2t + 1) and k = (codeword length - data length)
What the compute-optimal training ratio is: roughly 20 tokens per parameter
Optimal training ratio: Approximately 20 tokens/parameter
Shannon's channel capacity: C = B log₂(1 + S/N) bits per second
Shannon's formula: C = B log₂(1 + S/N) defines channel capacity in bits/s
What LDPC codes are: low-density parity-check codes used in 5G and WiFi
LDPC codes: Low-density parity-check codes for 5G and WiFi error correction
What 300-dim word2vec encodes: trained on word co-occurrence with skip-gram window
300-dim Word2Vec trained on word co-occurrence with skip-gram window
What BPE tokenization does: iteratively merges the most frequent byte pairs
BPE tokenization merges the most frequent byte pairs iteratively to create subword units
What tl.arange(0, BLOCK_SIZE) creates: a range of indices within the current block
`np.arange(0, BLOCK_SIZE)` generates an array of indices from 0 to BLOCK_SIZE-1
One email a day: 5 concepts + the 5 stories that matter →
Swipe through 100 ML concepts daily
Open TickerNews