Build a Tokenizer for the Thai Language from Scratch
A step-by-step guide to building a multilingual Thai sub-word tokenizer, based on the byte-pair encoding (BPE) algorithm and trained on Thai and English datasets

Source: Towards Data Science