Cleaned 10k customer records. One emoji crashed my entire pipeline.

Source: DEV Community
Was scraping ecommerce product reviews last month. Got 10k records, ran a cleaning script to normalize text before feeding it to a sentiment analysis tool.

Script ran fine on test data (500 rows). Pushed it to production. 48 minutes in, the whole thing just stops. No error message. Just frozen.

Thought it was memory. 10k rows shouldn't be a problem, but maybe something leaked. Restarted the process, added memory tracking. Same thing. Froze at exactly the same spot (row 6,842).

Checked the CSV manually. Row 6,842 looked fine. Customer name, review text, rating. Nothing weird.

Then I noticed it. The review had a 💩 emoji in it. Specifically: "This product is 💩 don't buy it"

Encoding hell

My script was using basic text encoding. UTF-8, right? Wrong. I was reading the CSV with encoding='latin-1' because an earlier version of the data had some Spanish characters that broke with UTF-8.

Emojis are multibyte UTF-8 characters. La
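
To make the mismatch concrete, here is a minimal sketch of what happens when UTF-8 bytes get decoded as latin-1. The review text is the one quoted above; the `decode_text` helper is a hypothetical name for illustration and not necessarily how the original script was fixed.

```python
# Illustrative sketch only. The review text comes from the post;
# decode_text is a hypothetical helper, not the original script.
review = "This product is \U0001F4A9 don't buy it"

raw = review.encode("utf-8")      # the bytes as they would sit in a UTF-8 CSV
misread = raw.decode("latin-1")   # never raises: latin-1 maps every byte to some character

# The emoji takes 4 bytes in UTF-8, so latin-1 turns it into 4 junk characters.
emoji_bytes = "\U0001F4A9".encode("utf-8")
print(len(emoji_bytes))           # 4
print(misread == review)          # False


def decode_text(data: bytes) -> str:
    """Try UTF-8 first; fall back to latin-1 only if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: legacy rows (e.g. the older Spanish data) were latin-1 encoded.
        return data.decode("latin-1")


print(decode_text(raw) == review)  # True
```

The point of the sketch is that latin-1 decoding cannot fail, so the wrong encoding produces silent garbage instead of an error. Trying UTF-8 first at least surfaces the rows that are not valid UTF-8.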