The LLM Is the New Parser
Source: DEV Community
I spent the early 2000s writing parsers. HTML scrapers with regex that would make you cry. XML deserializers that handled seventeen flavours of "valid". CSV readers that knew a comma inside quotes wasn't a delimiter. The pattern was always the same: the world gives you garbage, you write defensive code to extract meaning.

Then APIs won. JSON with schemas. Type-safe clients. The parsing era ended. We'd civilised the machines.

Now I'm building Indexatron, a local LLM pipeline for analysing family photos. LLaVA looks at an image, I ask for JSON, and I get... this:

```json
{
  "description": "A dog sitting on a wooden floor",
  "categories": ["dog"],
  "people": [
    {"estimated_age": "Beer is an alcoholic beverage"}
  ]
}
```

The model wrapped the JSON in markdown code fences. It put beer in the people array, with an age field containing a Wikipedia definition. Sometimes the braces don't balance. Sometimes it returns YAML when you asked for JSON.

Sound familiar? We're back to parsing unreliable output. T
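The defensive-extraction pattern looks roughly like this. A minimal sketch (not the actual Indexatron code): `extract_json` is a hypothetical helper that strips markdown fences, falls back to the outermost brace-delimited span, and returns `None` rather than raising when the model's output can't be salvaged:

```python
import json
import re


def extract_json(raw: str):
    """Best-effort extraction of a JSON object from LLM output.

    Hypothetical helper: strips ```json fences if present, then tries
    to parse the outermost {...} span. Returns None on failure.
    """
    # Prefer the contents of a fenced block, if the model added one
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    candidate = fenced.group(1) if fenced else raw

    # Fall back to the outermost brace-delimited span
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return None


# Fenced output, the way LLaVA often returns it
raw = '```json\n{"description": "A dog on a wooden floor", "categories": ["dog"]}\n```'
print(extract_json(raw)["categories"])  # ['dog']
```

This only papers over the shape problems (fences, stray prose around the braces); it does nothing about semantic garbage like beer in the people array. That still needs schema validation on top.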