Tokenization
[/ˌtoʊkənɪˈzeɪʃən/]
noun
AI & Technology
#ai #nlp #tokens #preprocessing
Definitions
1
The process of converting raw text into discrete units, called tokens, that a language model can process. Tokens are typically subword units: common words become single tokens, while rare words are split into several. LLM API pricing and context limits are generally measured in tokens, not characters or words.
“The word "unbelievable" tokenized into three pieces: "un", "believ", "able".”
by @mlresearcher 1/1/1970
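To make the subword idea concrete, here is a minimal sketch of a greedy longest-match splitter. It is an illustration only: the vocabulary `VOCAB` and the function `tokenize` are hypothetical names chosen to reproduce the "unbelievable" example, and production tokenizers (BPE and similar) learn their vocabularies from data and may split the same word differently.

```python
# Toy vocabulary (an assumption for illustration, not a real model's vocabulary).
VOCAB = {"the", "un", "believ", "able"}


def tokenize(word, vocab=VOCAB):
    """Split one word into subword tokens by greedy longest match."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate substring until it matches a vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing in the vocabulary matches: fall back to one character.
            end = start + 1
        tokens.append(word[start:end])
        start = end
    return tokens


print(tokenize("the"))           # a common word stays whole
print(tokenize("unbelievable"))  # a rarer word is split into pieces
```

With this toy vocabulary, "the" comes back as one token and "unbelievable" as three, so the second word would count three times as much against a token-based limit.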