Working with LLMs makes me feel uneasy

2026-03-19 05:00

Three years ago, I tried to use ChatGPT to resolve a tricky issue with a library I was using. It generated some made-up nonsense that did not even pass the IDE's language server checks. Everything was fictional and entirely useless.

Two years ago, I started using GitHub Copilot in VSCode. Large autocompletions started appearing on my screen, as if it was reading my mind about what I was trying to do! Unfortunately, the suggestion was almost never what I really wanted, so I turned that thing off.

Last year, CodeRabbit was added to our GitHub repo. This thing was catching subtle issues in code that I had overlooked, even preventing some bugs. This thing really felt useful, so we kept it.

Finally these LLM tools were bringing some value!

These LLMs feel amazing

I spent the end of last year on parental leave with our 2-year-old son. When I got back, the first thing I did was set up Cursor. And off we went! This thing was generating tests for my code. It was creating React component templates for me with autocomplete. It was refactoring tricky code from our codebase. It was even generating some bigger pieces of logic that would have taken hours to do before. I was feeling really powerful! Finally, I could work in my IDE instead of jumping around the internet and reading docs. I could just ask the LLM what the docs say, and even have it explain the concepts to me clearly, with some examples if needed.

But I was still using Cursor, and my workflow had not really changed. It was still me figuring out how to build stuff, what to write where, what patterns to use. LLMs were helping, like a powerful autocomplete or a search engine on steroids. I started hearing that some people were no longer writing any code; LLMs were writing all of it. I started noticing this happening to myself as well. It was hard to go into the code panel and write stuff; instead I would just write what I wanted in the agent chat, and the agent would do it for me. I would do this even for one-line changes. It for sure was not faster, but it felt more convenient.

And then the big push started. This idea of people not writing code themselves became a target. "AI will not replace developers, developers who use AI will replace developers" was a phrase that was being repeated.

Maybe these LLMs are not so amazing?

So I decided to try the magical one-shot coding machine Claude Code, the holy grail that anyone could supposedly make an app with in a couple of hours. I bought 5 euros worth of Anthropic tokens and asked Claude to create a project for me with some specific technologies. It did, but I noticed Tailwind was on an old version, as were a lot of other packages. So I told Claude to upgrade Tailwind. It tried, and failed. I made it try again, and it failed again. Then I pointed it to the Tailwind upgrade docs; maybe reading them would help. It started working, and then ran out of tokens. I felt the familiar feeling from a couple of years ago: I would have done better if I had done this myself.

But of course I was not going to give up so easily. This was where we were headed, and I was determined to keep pace. I used plan mode, wrote meticulously what kind of setup I wanted, told Claude to use the latest versions of dependencies, told it to write unit, integration, and e2e tests, to adhere to accessibility standards, gave specifics of what the app's features should look like, and tried again. And it started working; for 45 minutes it kept going while banging its head against failing e2e tests that it had itself written. And finally, it was finished! I looked at the result, and it looked nice. I had instructed it to use shadcn/ui, so the app looked like every other app in the tech space since shadcn/ui took off: pretty sleek but nothing extraordinary. Certainly way better than what some of my colleagues got out of it, which looked like Bootstrap from 2020. But yeah, it worked! And it fulfilled the requirements.

On closer inspection, I noticed that all kinds of things were a bit off, though. For example, the hamburger nav I had asked for was there, but when I opened it, it was tiny and had both vertical and horizontal scrollbars due to being in the wrong container. I told Claude Code, and it fixed it. Problem solved, right? But something felt really off.

Starting to get uneasy about AI

Before LLMs, the development workflow was quite difficult and not straightforward at all. Before building anything, you would have some ideas of what had to be built. But you had to build in small increments to validate whether those ideas were actually what you wanted to build. And you had to optimize for weeding out the unknowns from your process so you would not hit a roadblock halfway into the project. From what I've seen, the issue of not really knowing what is needed has not changed. And while LLMs might make it faster to validate the ideas, the code they produce is not final. You need to go back into it and fix all kinds of details if you want to deliver strictly what is asked. If you do not have a strict specification for how the software should work, then the LLM can make the decisions, but this is rarely the case. The process of learning about issues and difficulties is also greatly diminished, as the LLM dishes out complex code that somehow manages to work.

The issue is especially bad when working with something unfamiliar, since the code from the LLM looks convincing. When working in an area I'm very proficient in, I can see that the code it produces is often a mess. This makes me feel very uneasy. Things that would previously have taken a day of research can now be done with a prompt. But I get none of the learning I got from doing the work. I see the code that the LLM produces, but I did not put effort into writing it, so none of it sticks in my memory. How do I learn things if the expectation is that we must go very fast now? Writing code was never the slow part for me. I can write thousands of lines of code a day if I just go at it. It was always understanding the issue, figuring out a good way to solve it, and quite often learning the skills to do the work that took the time.

What makes the thing even worse is how supportive and nice the LLMs are. Your idea is always the best; you are a world-class architect, working at the leading edge with the gold standards, using the holy grails and magic bullets of the cutting-edge technologies you have chosen (deployable on the edge, of course, for maximum performance). Oftentimes the ideas came from the LLM, though, so is it really praising you, or is it praising itself? I have already caught myself basking in a warm feeling of superiority when we discovered an incredible approach to solving some long-standing issue in our codebase, only to find out three hours later that it was an absolute dead end. Which brings me to my final point.

It's a slot machine

From the workflow perspective, a huge annoyance I have right now is that using an LLM to do the work is really like pulling the lever of a slot machine. Sure, the expected return might be positive, which is absolutely a difference from real slot machines. But the psychological effect is similar. Whenever I have an issue, I have the option of solving it myself, or pulling the lever on the slot machine. Today, I usually pull the lever and see if it worked. Pull the lever again with a slightly modified technique, see if it worked now. Write a note about how to pull the lever for this case in the future (of course the note is written with the slot machine), which might increase the probability of a good outcome. And if it still is not good, I go in and start doing the work myself. At this point, I have wasted maybe an hour or two in the worst case. Previously, I might have already finished without having to pull the levers of the slot machine first!

Every now and then, even quite often, the slot machine does work. So you get a reward out of pulling the lever. And if you pull the lever enough, you eventually get to a point that satisfies you, without doing any of the actual work. So was it that bad? Well, what is bad is that this is extremely addictive. No wonder people can't stop using LLMs to generate code, even the simplest code that they would be better off writing themselves.

Unfortunately, the addictive effect does not end when your day ends and you go home. There is this one bug that would be really easy to fix; why wouldn't I just quickly tell the agent to fix it while I prepare dinner for my family? So you do it. It's not work, it's just writing a prompt, right? Next thing, you are discussing with the LLM how to fix some issue while sitting on the couch with your partner after the kids have gone to bed.

How should we work with LLMs then?

I do not know.

At the moment, it's clear to me that LLMs do not replace competence. In fact, without guidance they are completely incompetent in many areas. Maybe the models will get better, maybe not. You need to make sure that you are competent. And that does not happen through vibe coding. You need to put in effort in order to learn.

It's also clear that LLMs are incredibly useful in some scenarios. Need to look up some function from a library that does what you need? Ask the LLM for details, and it will plow through the docs for you. This is what we have always done; it's a rare developer that has every API they use memorized for every time they need it. Maybe some code has aged badly and needs a refactor? The LLM can certainly help with that.

Write the entire app for you? I guess the LLM can do that too. But trying to work with it like this in its current state is incredibly frustrating. And it will not make you a better coder, for sure.

What does the future hold?

I do not have very coherent thoughts on this, and everything seems quite chaotic at the moment. For the software industry, I see a few paths going forward:

If the current trajectory continues, we will not be writing code at all. We will also become increasingly unaware of the implementation details of the applications being produced. The work of the software developer becomes developing guardrails for the LLMs to produce high-quality software. Testing becomes paramount. Developers no longer build end-user software, only pipelines and tooling for the LLM to do that part. The software development part moves entirely to the business developers or product owners. They will describe the features they want, and the LLMs produce variants to choose from. Communication remains the bottleneck, but teams become way smaller, which compensates for this a bit. People trained in software engineering are still valuable as maintainers, but there, too, their job is more to get the LLMs to fix the issues they have created and prevent them from doing it again. Non-functional requirements become encoded into the LLMs' instructions and are infinitely reused from there, while functional requirements come from the business directly. In this world, the model of a product owner telling software developers what to build, and the software developers using LLMs to build the software, becomes redundant. The traditional software developer ceases to exist. People who can take these tools and build valuable applications with them can make or break companies. These people could come from either the traditional product owner camp or the traditional software developer camp.

The alternative is that some kind of pushback happens. If sufficient guardrails that make LLM output reliably good cannot be created, then software developers must be kept in the loop, meticulously reviewing the code LLMs output at increasing pace. This feels like a pretty dystopian world for a software developer to be in: you are not in control of what is being done; that comes from the requirements. You are no longer doing the stuff; that is done by the LLM. But you are still somehow responsible for the output of the LLM, so you need to keep studying things that you yourself no longer do, and catch the LLMs when they miss a critical issue at line 1562 of their 10k line PR. Doesn't sound very exciting to me.

There could be a middle ground as well. Since a single application can be valuable to billions of people with low unit costs, good software developers have had an exceptionally high ceiling for their salaries, at least until now. LLM agents have been touted as being able to create "personal" software for each person's use, but I really do not see this taking off in the big picture. Just think of a typical corporate environment: management of the work becomes impossible if everyone uses their own software. So, just like today, creating something exceptional is still valuable, and customers choose the better products for their use case. If actual humans can keep an edge over LLMs in creating coherent software applications, then some kind of status quo will emerge where a lot of LLM-generated run-of-the-mill stuff floats around, and thoughtfully crafted software rises to the top.

At the end of the day, LLMs can only generate text; they cannot generate, for example, the network effects that actually make many of today's software companies incredibly successful. They are also not human, even if they try to trick you into believing so. Let's try to remember this, even while having to adjust to working with these magical code generators and occasional geniuses.