This article was originally published at guangzhengli.com.
Discussing the evolution of AI programming tools and Vibe Coding
Recently, there has been intense discussion about AI-assisted programming and Vibe Coding. After some thought, I decided to write a blog post to express my own views.
I started using GitHub Copilot in 2023, began continuous use of Cursor in 2024, and started heavily using Claude Code a few months ago.
Although I haven't had time to try every AI IDE on the market, I believe these three products are milestones in AI-assisted programming, and I have accumulated enough experience to say something about Vibe Coding.
Vibe Coding Is Not a Good Name
The term Vibe Coding (rendered in Chinese as 氛围编程) went from unknown to something every programmer talks about, probably starting with a tweet from Andrej Karpathy (former Tesla AI director and one of OpenAI's founding members).

I paste the original text below:
There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
Personally, I dislike nitpicking over words, but since Vibe Coding has become a catch-all term under which every AI-assisted and AI-generated coding tool and method is lumped, I believe some distinctions are necessary.
Based on the original tweet, I think Andrej Karpathy's definition of Vibe Coding highlights several key points:
- Forget that the code even exists.
- Barely participate in manual programming; even minor errors are fixed by the AI rather than by hand.
- No code review of AI-written code; only look at results, and keep conversing if unsatisfied.
- It's fine for throwaway projects, and quite fun.
So as I understand it, Vibe Coding originally referred to a new way of programming: coding entirely through conversation with an LLM.
Possibly because "Vibe Coding" is hard to translate, or because it lacks a precise meaning, the term has drifted: any method or tool that assists programming with AI, or writes code via AI, is now called Vibe Coding.
I believe many current discussions and debates stem from not distinguishing these two methods.
One could keep the name Vibe Coding, while the other would be better called something else, whether AI-assisted programming, AI Coding, Agents Coding, or Context Coding.
Although these two programming methods are fundamentally different, before discussing Vibe Coding, I want to talk about the basic AI-assisted programming approach.
Context Coding
Personally, I prefer to call AI-assisted programming methods Context Coding, meaning context-based or context-driven programming.
The reason is that, aside from the crucial improvement in the programming ability of large models (for example, models upgrading from GPT to Claude), another major advancement is the stronger context engineering capabilities of different AI programming tools.
The most widely discussed top AI programming tools have essentially evolved from GitHub Copilot to Cursor to Claude Code, and I believe the success of each lies in doing context engineering more scientifically.
With the large model unchanged, all improvements in AI-assisted programming arise from the fundamental principle of providing more suitable context to the LLM (large language model), whether via Chat, RAG, Rules, MCP, or some even cooler technologies in the future.
So, from the perspective of supplying the LLM with suitable context, let's analyze the features of these products and learn how to use AI assistance better.
GitHub Copilot
I believe most peopleās first AI-assisted programming tool was GitHub Copilot. Besides integrating AI chat functionality, the most impressive feature for me was code completion.
Before this, VSCode was often criticized for inferior code completion compared to IntelliJ IDEA. IDEA implemented code completion through complete code indexing, AST (Abstract Syntax Tree), and intelligent ranking algorithms; it has been highly loved by programmers for over a decade and has long been my favorite IDE.
Regardless of the project language, I used to blindly choose IDEA for programming until VSCode Copilot appeared. I began switching between VSCode and IDEA, eventually abandoning IDEA altogether.
GitHub Copilotās initial success came from being the pioneer in sharing code context with LLMs, mainly reflected in two capabilities:
First, it could provide code of the currently open IDE window to the LLM, allowing questions and suggestions based on that visible code.
At that time (2023), this was a major breakthrough because most people copied code into ChatGPT and then copied code back from ChatGPT to the IDE.
Second, it could perform code completion based on the current fileās context. Copilot provided the code context at the cursor position to the LLM, which then suggested completions.
With this feature, my programming habits changed significantly: I would write method comments first, let Copilot generate the corresponding method from the comments, then tweak the details. This mode greatly accelerated some coding tasks.
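The comment-first habit looks roughly like this hypothetical example (the function and log format are invented for illustration): you write only the docstring, and the completion model is expected to fill in a body along these lines.

```python
import re
from datetime import datetime

def parse_iso_timestamps(lines):
    """Extract ISO-8601 timestamps from log lines and return them as
    datetime objects, skipping lines that contain none."""
    # Given only the docstring above, a completion model would typically
    # propose a body close to this one, which you then review and tweak.
    pattern = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")
    results = []
    for line in lines:
        match = pattern.search(line)
        if match:
            results.append(datetime.strptime(match.group(), "%Y-%m-%dT%H:%M:%S"))
    return results

print(parse_iso_timestamps(["GET /api 2024-01-02T03:04:05 200", "malformed line"]))
```

The comment carries the intent, so the better and more specific the comment, the better the completion.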
However, Copilot had obvious drawbacks at the time. First, model capability was insufficient: GPT-3.5 was impressive but prone to hallucination in programming and accepted very limited context, so the odds of accepting its suggestions and completions wholesale were low. Also, Copilot + GPT-3.5 could not edit code directly; code had to be copied out of the LLM's suggestions by hand.
Another drawback was that Copilot could only give the LLM context from the currently open file, so the LLM could not make suggestions based on other files or the project as a whole. If you implemented a method in one file and then switched to another, the LLM lacked that method's context and could not complete calls to it. Cross-file retrieval and modification for a single programming task was unimaginable then.
In this situation, Cursor, which provided better context, stood out.
Cursor
After Copilot appeared, many IDE plugin AI tools emerged, differing mainly in prompt and LLM optimization. Only with the emergence of Cursor as a full AI IDE did the competition among AI plugin programming tools come to an end.
Letās first discuss improvements unrelated to context engineering.
First, Cursor designed a proprietary model for Tab code completion; its speed, accuracy, and high acceptance rate impressed me, and led to the joke that programmers went from copy-paste engineers to Tab engineers.
Second, the Claude 3.5 Sonnet model arrived with stronger programming ability than the GPT models, a longer context window, and the ability to edit files directly. Together, Cursor's Tab model and Claude 3.5 Sonnet evolved AI-assisted programming from a "code completion tool" into a "programming agent."
Beyond that, Cursor's context engineering is worth studying.
Cursor's first key breakthrough in context engineering was using RAG (Retrieval-Augmented Generation) to index the entire project codebase, giving the LLM whole-project context via semantic (vector) search.
If you open a new project with Cursor, you can see under Cursor Settings > Indexing that Cursor starts indexing your entire project and shows the number of indexed files.
The principle is: upon opening a new project, Cursor splits the codebase locally into small chunks, uploads them to Cursor's servers, embeds the code with an embedding model, and stores the vectors in a cloud vector database. When you ask a question in Chat/Agent, Cursor embeds the prompt at inference time, runs a nearest-neighbour search in Turbopuffer, and sends the matching (obfuscated) file paths and line ranges back to the client, which then reads those code chunks locally.
Turbopuffer is a serverless vector and full-text search engine built from scratch on object storage. If you are unfamiliar with vector search, you can simply think of it as semantic search: finding text with similar meaning.
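For intuition, here is a minimal, dependency-free sketch of that chunk-embed-search pipeline. It is not Cursor's implementation: a toy bag-of-words counter stands in for a real embedding model, and a sorted list stands in for a vector database like Turbopuffer.

```python
import math
import re
from collections import Counter

def chunk(source: str, size: int = 2) -> list[str]:
    """Split a file into fixed-size line chunks (real indexers split on semantic boundaries)."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words vector, standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(index, query: str, k: int = 1) -> list[str]:
    """Nearest-neighbour lookup over the chunk index, run once per prompt."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Index once per project; query on every conversation turn.
codebase = (
    "def login(user):\n"
    "    check_password(user)\n"
    "def render_sidebar():\n"
    "    draw(padding=8)"
)
index = [(c, embed(c)) for c in chunk(codebase)]
print(search(index, "where is the password check for login?"))
```

Even this toy version shows why semantic retrieval beats "only the open file": the query never mentions a file name, yet the relevant chunk is found.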
If you want to learn more about embeddings and vector databases, you can check my 2023 blogs GPT Application Development and Thoughts and Vector Databases.
With this capability, Cursor can retrieve the code context relevant to your current conversation and hand it to the LLM. Having multi-file context allows the LLM to:
- Achieve cross-file method calls
- Fix bugs involving multiple files
- Refactor entire modules
- Add new features requiring modifications across many files
On this basis, Cursor also supports directly @mentioning a file or folder to provide context to the LLM, and later added the ability to index Git history.
From the perspective of context engineering, Cursor succeeds over Copilot by providing more comprehensive code context to LLM and giving users more control over context.
In addition, Cursor later added document indexing to give the LLM up-to-date technical documentation, plus Rules features that supply general programming rules so generated code conforms to the project's architecture, coding style, and tech stack.
All these features aim to provide LLM with richer and more appropriate context, so I prefer calling it Context Coding.
Claude Code
While Cursor kept its lead through successive context engineering optimizations, Claude Code entered the competition in a completely unexpected form. Before Claude Code, I never imagined a command-line terminal (CLI) could be the vehicle for LLM-assisted coding.
Although I had never used an AI product in terminal form before, I was drawn to its Unix style at release and wanted to try it. In practice it was easy to pick up, and its programming efficiency was no worse than Cursor's, even surpassing it in some cases.
In actual testing of writing Next.js programs, Claude Code was on par with Cursor for small to medium programming tasks. I usually use Claude 4 Sonnet or Opus models for both tools. Cursor is more convenient in normal programming because of command + K or Tab completion.
However, for large programming tasks, such as searching and modifying more than ten files, Claude Code far exceeds Cursor. I believe the fundamental reason is that Claude Code feeds the LLM context on a "more is better" principle.
Because Cursor sits downstream of the large-model providers (e.g., Anthropic's Claude), it must balance what users pay against token usage to stay commercially viable. This has led to many programmer-irritating adjustments, such as model speed changes, usage limits, and automatic switching to weaker models.
After long-term use, I have my own complaints: random usage limits are one thing, but silently switching the model to "Auto" (a weaker tier), plus gaps in model speed and capability between US residential IPs and CN/HK regions, sometimes produce large discrepancies.
Overall, under these commercial trade-offs and with limited external competition, the actual experience has degraded considerably.
Unlike Cursor, Claude Code, as a product of Anthropic (provider of the large language model base), is less cautious about token consumption.
Each time Claude Code starts, it first analyzes the project structure and basic tech stack through terminal commands. This differs from Cursor's focus on the specific task and a small set of files: Claude Code takes a more global view and analyzes the whole project before developing. Though this consumes more tokens, the resulting code aligns better with the project's existing style and coding standards.
Claude Code also chooses a wholly different code context retrieval approach than Cursor. It uses Unix tool-based retrieval such as grep, find, git, cat, instead of RAG.
This approach generally fits programmers' habits better. When tackling a task in unfamiliar code, a programmer typically starts with keyword or regex searches on key method or object names, searching outward until all the relevant business code is found, and only then starts programming.
Claude Code adopts the same context engineering pattern: in response to your query, it searches keywords repeatedly until it has found all the project context it needs, then codes; or it cycles through rounds of conversation, programming, and searching until the LLM considers the context complete.
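That search loop can be sketched as follows. This is a toy reconstruction, not Claude Code's actual logic: a dictionary stands in for the working tree, and a regex over snake_case identifiers stands in for the model deciding which symbols are worth chasing next.

```python
import re

# Toy repository: path -> contents, standing in for a real working tree.
repo = {
    "billing.py": "from tax import vat_rate\ndef invoice_total(items):\n    return sum(items) * (1 + vat_rate())",
    "tax.py": "def vat_rate():\n    return 0.2",
    "ui.py": "def render():\n    pass",
}

def grep(pattern: str) -> dict[str, str]:
    """Files whose contents match the pattern (what `grep -rl` would list)."""
    rx = re.compile(pattern)
    return {path: text for path, text in repo.items() if rx.search(text)}

def gather_context(seed: str) -> set[str]:
    """Agentic retrieval loop: grep for the seed, mine new identifiers out of
    the hits, and grep again until no new terms turn up."""
    frontier, seen, context = {seed}, set(), set()
    while frontier:
        term = frontier.pop()
        seen.add(term)
        for path, text in grep(term).items():
            context.add(path)
            # Only chase snake_case identifiers: a crude stand-in for the
            # model judging which symbols to follow next.
            for ident in re.findall(r"[a-z]+_[a-z_]+", text):
                if ident not in seen:
                    frontier.add(ident)
    return context

print(sorted(gather_context("invoice_total")))
```

Starting from `invoice_total`, the loop discovers `vat_rate` in the hit and follows it into `tax.py`, while `ui.py` never enters the context: exactly the "search until the context is complete" behaviour described above.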
Claude Code's choice of this approach versus Cursor's RAG has sparked much debate.
The RAG camp argues that grep's retrieval quality is poor, pulling in many irrelevant results, and that the approach is token-hungry and slow because the LLM must keep conversing and retrieving new context.
The grep camp argues that complex programming requires precise context, and RAG's retrieval accuracy is not ideal: semantic similarity does not equal contextual or business relevance.
Also, Cursor's Merkle-tree-based file-hash indexing and synchronization can return stale code during heavy refactoring or when the index servers are under load, supplying obsolete context.
Both camps have valid points: Claude Code is slower and more token-hungry than RAG, while Cursor performs worse than Claude Code on complex tasks. This matches my experience.
However, in this era before large models reach extraordinary capability, speed and token consumption are not the primary concerns; what matters is whether a tool actually solves the engineering problem, which is the first goal of AI programming tools. From this view, Claude Code's approach is preferable.
Of course, I'm not fully endorsing the grep approach. In the near future, full AI IDEs will likely offer both RAG and grep, chosen case by case; Cursor will surely work to improve its grep capabilities rather than rely on RAG alone.
But for products like Claude Code or Gemini CLI, integrating RAG may be unnecessary. Many people do not realize that Claude Code's strength extends beyond coding assistance to scripted workflows that interoperate with the whole development environment: through bash it can work with codebases, MCP marketplaces, and DevOps workflows for CI/CD automation. These environments suit grep-style retrieval perfectly and have no need for RAG. Claude Code and similar products have huge potential here, a blue-ocean market for AI products.
How to Do Context Coding Better
Having studied the context engineering of these AI products, can it help us do Context Coding better? I believe there is much to learn from it.
Since the key to AI-assisted coding lies in delivering better context to LLMs, sometimes borrowing from our everyday development logic helps us understand how to provide LLMs with better context.
Suppose you join a new project team with only basic technical and framework knowledge needed by the project (and the LLM only has the same basics).
Usually, after cloning the codebase, the first thing is to understand the tech stack used, browse the directory structure, try to understand the overall project layout and layers, and learn what each type of named file does. This process substantially aids your understanding of a new project.
Therefore, the context we give LLMs should ideally include the general tech stack (which technologies and tools are used), the directory structure (project architecture and layering), and the purpose of key files (file naming and meaning). Because LLMs by default retain no memory from previous sessions and would lack all of the above, it is best to save this information in instruction or rule files, such as GitHub Copilot's .github/copilot-instructions.md file, Cursor's .cursor/rules directory, and Claude Code's CLAUDE.md file.
At present, each AI Agent uses differently named rule files with no standardization. So if your team uses multiple AI Agents simultaneously, or tools like Cursor and Claude Code together like me, you might consider using the Ruler open-source project to unify the management of instruction files.
With the above information, LLMs can retrieve basic overall project context every time a new conversation begins before programming, which aligns with our usual practice of looking at such information first before developing requirements.
From this perspective, we can think of more foundational project context. In regular development, besides the above, we also learn the project's common commands (install, lint, test, build, and so on), look through its utility classes to see what shared helpers exist, and find the core business modules, methods, and files and what they do. Only with this information can we program and develop well.
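Put together, this context might live in an instruction file along these lines (a hypothetical CLAUDE.md; the stack, paths, and commands below are invented placeholders to adapt to your project):

```markdown
# Project notes for the agent

## Tech stack
- Next.js 14 (App Router), TypeScript, Tailwind CSS, Drizzle ORM

## Layout
- `app/` - routes and pages; `lib/` - shared utilities; `db/` - schema and queries

## Common commands
- Install: `pnpm install` / Lint: `pnpm lint` / Test: `pnpm test` / Build: `pnpm build`

## Conventions
- Reuse the helpers in `lib/utils.ts` before writing new ones
- Small, focused commits; one concern per function
```

The same content can be mirrored into whichever rule-file name each tool expects.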
Since this information is very helpful to us when developing an unfamiliar project, it should in theory also help LLMs (which you can loosely think of as interns with only basic technical skills).
It should be noted that more context is not always better, especially easily outdated context information such as directory structures, files that are frequently refactored, and utility classes. If these change but the instruction rule files are not updated accordingly, the damage caused is greater than not providing these contexts at all. Maintaining and updating such files is a challenge, especially for large team projects.
Beyond this basic information, you can also ask LLMs to think and work like experienced programmers: for simple tasks, just develop; for difficult requirements, first break the work into subtasks, record them in a document, update each subtask's status as it completes, commit in small steps, and finally delete the document.
This process significantly reduces hallucination on complex tasks. I have observed that Claude Code's own development process largely follows this pattern, with some differences in implementation.
Of course, for such mature development ideas and programming specifications, we can think of more:
- Progressive modifications, small step commits
- Learning from existing code, finding 2-3 similar implementations, using same libraries/tools as much as possible
- Code readability is more important than showing off tricks
- A function should solve only one problem; if it needs explaining, it's too complex
- Donāt introduce new tools without sufficient reason
According to your project requirements and your teamās needed programming specifications, selectively adding some thoughts and specifications into your teamās instruction rule files is also very useful context for LLMs.
Of course, even if you put every programming specification and clean-code practice into the rule files, LLMs still rarely write code with sufficient abstraction and robustness. In my experience, good abstractions remain too hard for LLMs, perhaps because the training code itself contains few good examples of them.
Besides basic project information and programming specifications, commonly used tools and debugging techniques in development are also good sources of context for LLMs.
For example, in daily development, if you need to call a third-party library or its latest method/API, the usual approach is to consult the official documentation for the current method names or API paths. LLMs are no different: their training data goes stale, so it is best to feed them the latest documentation, which MCPs like context7 can do.
Debugging is similar: you can instruct the LLM to add logging around the problematic code, imitating the IDE's debug mode, so the LLM receives enough debugging information, just as we watch each method's inputs and outputs in a debugger. You can also gather context for the LLM through MCPs, browser console logs, web search results, and so on.
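The "log everything like debug mode" instruction tends to yield temporary instrumentation like this sketch (a hypothetical function; the logs are deleted once the bug is found):

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("debug-session")

def apply_discount(price: float, coupon: str) -> float:
    # Temporary instrumentation: log every input and intermediate value,
    # the way a debugger would show them; roll back after the bug is found.
    log.debug("apply_discount price=%r coupon=%r", price, coupon)
    rate = {"SAVE10": 0.10, "SAVE25": 0.25}.get(coupon, 0.0)
    log.debug("resolved rate=%r", rate)
    result = round(price * (1 - rate), 2)
    log.debug("returning %r", result)
    return result

print(apply_discount(80.0, "SAVE10"))
```

Each call then leaves a trace of inputs, the resolved branch, and the output, which is exactly the context an LLM needs to locate the fault.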
Having said this, I believe everyone understands that I am not implying the above instruction and rule files are silver bullets capable of solving all AI-assisted programming problems.
Rather, I want to convey that since AI-assisted programming fundamentally relies on delivering appropriate context, when an LLM performs poorly, start by asking how to give it better context, drawing on familiar development practices and habits. Whether through Rules or MCPs, everything revolves around the goal of context engineering.
Claude Code recently built in a /context command that shows, at a glance, how the current context breaks down across different kinds of tools and how many tokens of context remain.

This shows that the Claude team has a deep understanding of context coding, thinking about context management from a developer's perspective.
It tells you how many tokens remain, whether compaction may trigger at any moment, whether to compact the context proactively, and which prompts consume large token counts, helping you spot waste such as unused MCPs or tools eating an outsized share.
This is the first AI tool I have seen that exposes context usage to users, and it keeps improving in the context engineering field.
The interplay of LLM progress and context engineering is like past improvements in memory hardware never conflicting with software memory management: in times when model capability and context capacity are hard to grow, whoever provides better context and better context engineering stands out more easily. This applies to AI programming and to any AI product.
Personal Views
Besides the context coding techniques and experience above, I'd like to share some personal views.
No AI programming tool mentioned above solves all programming problems; it is not the case that Claude Code being better makes Cursor unnecessary.
In serious engineering practice, Cursor's Tab completion is actually what I use most. One reason is my earlier point about LLMs' limited grasp of abstraction: they either over-abstract or misunderstand my intent and produce unwanted results.
For example, when creating a NextDevKit template, because template code needs to be robust, clean, and have reasonable modular abstractions and layering, relying on LLM alone cannot improve efficiency.
In such cases, Tab completion greatly improves efficiency. I write the layering and abstractions myself, then write comments and method names, and use intensive Tab completion to generate the code, sometimes far more efficiently than going back and forth in dialogue or having the LLM emit everything at once.
Of course, Tab can be annoying; the extreme case was writing this blog post, where it could not produce what I wanted to say and I had to disable it with a shortcut.
Claude Code is very helpful for understanding unfamiliar projects. I now habitually use it to summarize a project's overall structure and status, analyze which parts a task should touch, and complete tasks in unfamiliar tech stacks under reasonable programming specifications; it is genuinely better than doing it all myself.
In summary, good engineers choose the right tool for the problem rather than becoming tool zealots; there is no need to prove your value through a tool.
Furthermore, LLMs have changed some of my old programming habits. I used to study exotic IDE tips and shortcuts to cut down repetitive coding; now I can thank the LLM for freeing me from those tedious tasks. Where I once debugged with the IDE's debug mode, I now generally have the LLM generate plenty of logs, then roll them back after debugging. And for scripts and one-off code, the old debate of whether scripting or doing it by hand is faster is settled: the LLM generates it, with no more internal friction or waiting.
Additionally, for engineering tasks where code quality and reuse do not matter, I complete the task and then delete the low-quality but effective code. The next time a similar need arises, I regenerate from scratch, which is more effective and easier to manage than maintaining a reusable code base.
Sometimes code quality simply doesn't matter, for example when generating demos for client presentations or prototyping new product features. If feedback and uptake are good, then consider refactoring and quality; if not, just delete it.
These are changes that happened with LLM assistance. I am not sure whether they are good or bad, but in this era we genuinely need to rethink code and products; change inevitably brings more reflection.
Vibe Coding
After all this discussion of context coding experience and views, let's finally talk about Vibe Coding.
After reading the above, it should be clear that even an experienced programmer assisted by an LLM struggles to write readable, maintainable code that accommodates future requirement changes. For someone without programming experience, fully launching a product and supporting future changes through Vibe Coding alone is still very difficult today.
In the short term, Vibe Coding introduces defects and security vulnerabilities; in the long term, it leads to hard-to-maintain code, accumulating technical debt, and greatly reduced system comprehensibility and stability.
The most vivid explanation I have seen: letting a non-programmer use Vibe Coding to build a large project they plan to maintain "is like giving a kid a credit card without first explaining the concept of debt" (source).
Building new features is like waving that little plastic card: buy whatever you want and ship features fast. Only when you have to maintain it does it become debt. And if you try to fix one Vibe-Coding-caused problem with more Vibe Coding, it is like paying off one credit card with another.
The best illustration is the story of Leo on X, who posted on March 15 this year that he had used Cursor, Vibe Coding style, to build a product that attracted paying users without any hand-written code.

But just two days later, after the post went viral, his product came under attack: API key usage hit its cap, someone bypassed the subscription, and random entries were created in the database.
Because Leo is not familiar with technology, each problem took longer to solve than implementing a feature via Vibe Coding.

Eventually on the 20th, Leo had to shut down his product and admitted that insecure code should not have been deployed to production.

I suspect most programmers seeing this story's outcome can breathe a temporary sigh of relief: no short-term unemployment risk, their value still stands. But long term? How should you think about your career path?
I have always been pessimistic. In 2023 I wrote that in the modern division of labor, a small number of excellent programmers improve code quality and performance, solve hard technical problems, create new solutions, and design system architectures and algorithms, while most programmers are translators, converting people's natural-language requirements and business logic into code that computers can understand and execute.
It is like the programmer complaining that the boss doesn't understand programming and counts lines of code as work, while the boss complains the programmer doesn't understand the business. Technically, the essence of programming is theory building and creative output.
From a business perspective, capital segments programmers into front-end, back-end, algorithms, and ever finer fields. The benefit is productivity: focused subfields enable technical innovation and better training of talent. The downside is labor alienation: programmers stop being creative contributors and become producers in a single domain, a screw in the production line, a replaceable translator with no independence.
This means Vibe Coding fundamentally revolutionizes the programming industry. As LLM capabilities grow and people realize LLMs can also act as translators, Vibe Coding will continue to erode and squeeze programmersā survival space.
From this stage, the number of average programmers will begin to decline until extinction. This process is less about AI taking jobs and more about excellent programmers taking jobs, with income disparity increasing in this phase.
The tide of the times is unstoppable. I don't want to be overly pessimistic about the future or overly harsh about the matter, but under the roar of industrial machines, no one really hears the voices of traditional craftsmen. Before computers, it was hard to imagine how large a group ticket sellers and telephone operators once were.
Of course, I am not saying average programmers have no path. With AI, an average programmer with good business sense and some marketing skill can create more commercial value than a screw in the division of labor ever could.
Work that used to require many people to collaborate can be greatly shortened in time and personnel scale by leveraging AI. Independent development and small team collaboration will definitely become more mainstream in the future.
In March this year, the well-known indie developer Pieter Levels launched a fully Vibe-Coded product: a real-time flight-simulator MMO game.

Levels declared that essentially all of the code was written by AI (Cursor + Grok 3), and he earned substantial income selling in-game ads, going from $0 to $1 million ARR in just 17 days.

Before this project, Levels was already an experienced independent developer fully capable of taking over the project at any time. I think the success of this product might not be easily replicated with different timing or founders.
I cite this example just to illustrate that jobs and positions may disappear, but demands and opportunities are always there.
In this era, the only solution to this problem and anxiety is continuous learning and constant practice. I personally believe programmers are the most learning-spirited group; regardless of industry changes, those with ongoing learning abilities will never be replaced.
I hope we all find paths we like in the new era. This article was written hurriedly and contains many personal views. If you have different opinions, you are very welcome to comment below.
References
- https://x.com/leojr94_/status/1901560276488511759
- https://x.com/leojr94_/status/1902537756674318347
- https://x.com/leojr94_/status/1900767509621674109
- https://x.com/karpathy/status/1959703967694545296
- https://cursor.com/security#codebase-indexing
- https://blog.val.town/vibe-code
- https://pages.cs.wisc.edu/~remzi/Naur.pdf
- https://x.com/levelsio/status/1899596115210891751
- https://x.com/levelsio/status/1894429987006288259