Why Data is Important for LLM

Source: DEV Community
I had always thought that I could feed any data into an AI and expect good output. One small example of a habit I still sometimes fall into is giving too little context when prompting. I remember asking: "Create me a set of schedule to support my fundamental daily learning on Software and AI Engineer". It created schedules for me. It technically worked, but not quite! It gave me a schedule of 8 hours straight with no breaks.

What I actually wanted was:

- Multiple learning sessions (morning, afternoon, evening)
- Breaks in between
- Software Engineer topics in the morning and afternoon
- AI topics in the evening

As you can see, even though both requests share the same intent (create a set of schedules), the outcome is very different just because of missing context. This simple example already shows how critical input data is.

And that’s just prompting. When we scale this up to real-world systems that feed data into LLMs like Gemini, ChatGPT, Qwen, or Kimi, the impact becomes much bigger.

Data Types

Speaking of data, I think w