How SentencePiece differs from BPE: it operates on raw text, including whitespace

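SentencePiece tokenizes raw text without a pre-tokenization step. Instead of splitting on whitespace first, as conventional BPE pipelines do, it treats whitespace as an ordinary symbol (written as ▁, U+2581), so spacing is preserved in the tokens.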
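
To make the contrast concrete, here is a minimal pure-Python sketch. It does not use the SentencePiece library, and it is an illustration under stated assumptions, not the actual implementation. The helper names (`pretokenize`, `escape_whitespace`, `toy_segment`, `detokenize`) are hypothetical, and `toy_segment` is only a stand-in for a learned subword model. The sketch also leaves out the text normalization the real library applies, which can change how whitespace is handled.

```python
# Minimal sketch (not the SentencePiece library itself): contrasts
# whitespace pre-tokenization with treating whitespace as a symbol.

META = "\u2581"  # the "▁" meta symbol SentencePiece uses to mark spaces


def pretokenize(text: str) -> list[str]:
    """Whitespace pre-tokenization, as in many BPE pipelines (lossy)."""
    return text.split()


def escape_whitespace(text: str) -> str:
    """Map each space to the meta symbol so it survives segmentation."""
    return text.replace(" ", META)


def toy_segment(escaped: str, size: int = 3) -> list[str]:
    """Hypothetical stand-in for a learned segmenter: fixed-size chunks."""
    return [escaped[i:i + size] for i in range(0, len(escaped), size)]


def detokenize(pieces: list[str]) -> str:
    """Concatenate the pieces and map the meta symbol back to spaces."""
    return "".join(pieces).replace(META, " ")


text = "Hello  world."  # note the two spaces between the words

# Pre-tokenization drops the information about how many spaces there were.
words = pretokenize(text)

# Escaping whitespace keeps it inside the pieces, so it can be restored.
pieces = toy_segment(escape_whitespace(text))
restored = detokenize(pieces)
```

In this sketch, joining `words` with single spaces cannot reproduce the original string, while `restored` matches `text` because the spaces were carried inside the pieces as ▁.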

Image: Los Angeles Times, CC BY 4.0, via Wikimedia Commons
