#preprocessing

1 approved public terms with this tag.

Tokenization

/ˌtoʊkənɪˈzeɪʃən/noun

AI & Technology

機械支援の翻訳下書き (Japanese) for "Tokenization": The process of converting raw text into discrete units called tokens that a language model can process. Tokens are typically subword units — common words become single tokens while rare words split into multiple tokens. All LLM pricing and context limits are measured in tokens, not characters or words.

“例文の下書き: The word "unbelievable" tokenized into three pieces: "un", "believ", "able".”

投稿者 @dictionary_auto_translate