[Repost] MCP: The Fourth Pillar of the AI Application Layer

Recommended Words

Previous article: [Repost] The Three Pillars of the AI Application Layer: Prompt, RAG, Agent

Main text

This article is converted by SimpRead, original address guangzhengli.com

It has been nearly a year since I last updated my AI-related blog posts. On the one hand, I was busy with side projects; on the other hand, although AI technology has been advancing rapidly, there have not been many new developments in AI application layer development. It is generally still the three things described in my 2023 post: Prompt, RAG, and Agent.

However, since Claude (Anthropic) led the release of MCP (Model Context Protocol) at the end of last November, AI application layer development has entered a new era.

But regarding the explanation and development of MCP, there currently seems to be little information available, so I decided to organize some of my experience and thoughts into an article to hopefully help everyone.

Why MCP is a breakthrough

We know that over the past year AI models have developed very rapidly, from GPT-4 to Claude Sonnet 3.5 to Deepseek R1, with clear improvements in reasoning and clear reductions in hallucination.

There are many new AI applications, but one thing we can all feel is that the AI applications currently on the market are basically brand-new services that are not integrated with the services and systems we already use. In other words, integration between AI models and our existing systems has been progressing very slowly.

For example, we currently cannot use a single AI application to search online, send emails, and publish our own blog posts. None of these functions is difficult to implement individually, but integrating them all into one system has been almost unattainable.

If you do not have a specific sense of this yet, let’s think about daily development and imagine that in the IDE, we could use the IDE’s AI to complete the following tasks:

  • Ask AI to query existing data in a local database to assist development
  • Ask AI to search GitHub Issues to determine whether a problem is a known bug
  • Use AI to send PR comments to colleagues via instant messaging software (such as Slack) for code review
  • Use AI to query or even modify current AWS or Azure configurations to complete deployment

The functions mentioned above are currently becoming a reality through MCP. You can follow Cursor MCP and Windsurf MCP for more information. You can try using the Cursor MCP + browsertools plugin to experience the capability of automatically obtaining Chrome DevTools console logs within Cursor.

Why is AI integration with existing services progressing so slowly? There are many reasons: on one hand, enterprise-level data is very sensitive, and most enterprises require a long time and process to make changes. On the other hand, technically, we lack an open, universal, and consensus-based protocol standard.

MCP is an open, universal, and consensus-based protocol standard led by Claude (Anthropic). If you are a developer familiar with AI models, you should not be unfamiliar with Anthropic. They released the Claude 3.5 Sonnet model, which is still probably the strongest programming AI model (just after writing this, 3.7 was released :sweat_smile:).

It is worth mentioning here that the best opportunity to release this protocol probably belonged to OpenAI. If OpenAI had promoted such a protocol when GPT was first released, I believe no one would have refused. But OpenAI became CloseAI and only released the closed GPTs. Standard protocols like this, which require leadership and consensus, rarely form spontaneously in the community and are typically led by industry giants.

After Claude released MCP, the official Claude Desktop enabled the MCP feature, and Anthropic promoted the open-source organization Model Context Protocol, which involves different companies and communities. Below are some examples of MCP servers released by different organizations.

Official MCP Integration Tutorials:

  • Git - Git reading, operations, and search.
  • GitHub - Repo management, file operations, and GitHub API integration.
  • Google Maps - Integration with Google Maps for location information.
  • PostgreSQL - Read-only database queries.
  • Slack - Sending and querying Slack messages.

Examples of Third-Party Platforms Officially Supporting MCP

MCP servers built by third-party platforms.

  • Grafana - Searching and querying data in Grafana.
  • JetBrains - JetBrains IDEs.
  • Stripe - Interaction with Stripe API.

Community MCP Servers

Below are some MCP servers developed and maintained by the open-source community.

  • AWS - Operating AWS resources with LLM.
  • Atlassian - Interacting with Confluence and Jira, including searching/querying Confluence spaces/pages, accessing Jira Issues, and projects.
  • Google Calendar - Integration with Google Calendar, scheduling, finding times, and adding/deleting events.
  • Kubernetes - Connecting to Kubernetes clusters and managing pods, deployments, and services.
  • X (Twitter) - Interacting with Twitter API, posting tweets, and searching tweets via queries.
  • YouTube - Integration with YouTube API for video management, short video creation, etc.

Why MCP?

By now, you might have a question: when OpenAI released GPT function calling in 2023, wasn’t it already possible to achieve similar functionality? The AI Agents introduced in our previous blog are also designed to integrate different services. So why do we need MCP as well?

What are the differences between function calling, AI Agent, and MCP?

Function Calling

  • Function Calling refers to the mechanism by which an AI model decides, based on context, which function to call and with what arguments; the application then executes that call.
  • Function Calling acts as a bridge between AI models and external systems. Different models have different implementations of Function Calling, and the code integration methods differ. It is defined and implemented by different AI model platforms.
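To make the "different implementations" point concrete, here is a minimal sketch that converts one function description into the tool format used by OpenAI's Chat Completions API and the one used by Anthropic's Messages API. The ToolSpec interface and both converter functions are hypothetical names introduced only for this illustration; the point is just that the same information has to be packaged differently for each platform.

```typescript
// A provider-neutral description of one function (hypothetical shape for this sketch).
interface ToolSpec {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema describing the arguments
}

// OpenAI Chat Completions style: the schema lives under `function.parameters`.
function toOpenAITool(spec: ToolSpec) {
  return {
    type: "function" as const,
    function: {
      name: spec.name,
      description: spec.description,
      parameters: spec.parameters,
    },
  };
}

// Anthropic Messages style: the schema lives under `input_schema`.
function toAnthropicTool(spec: ToolSpec) {
  return {
    name: spec.name,
    description: spec.description,
    input_schema: spec.parameters,
  };
}

const searchIssuesSpec: ToolSpec = {
  name: "search_issues",
  description: "Search for issues across GitHub repositories",
  parameters: {
    type: "object",
    properties: { q: { type: "string" } },
    required: ["q"],
  },
};

const openaiTool = toOpenAITool(searchIssuesSpec);
const anthropicTool = toAnthropicTool(searchIssuesSpec);
```

Without a shared standard, every integration needs glue code like this for each model platform it wants to support.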

Model Context Protocol (MCP)

  • MCP is a standard protocol, much like USB Type-C for electronic devices (one connector for both charging and data transfer), enabling AI models to interact seamlessly with different APIs and data sources.
  • MCP aims to replace fragmented Agent code integration to make AI systems more reliable and efficient. By establishing a universal standard, service providers can launch AI capabilities of their own services based on the protocol, supporting developers to build more powerful AI applications faster. Developers do not need to reinvent the wheel and can use open-source projects to build a strong AI Agent ecosystem.
  • MCP can maintain context across different applications/services, enhancing the overall autonomous task execution capability.

AI Agent

  • AI Agent is an intelligent system that can autonomously run to achieve specific goals. Traditional AI chat only provides suggestions or requires manual task execution, while AI Agent can analyze specific situations, make decisions, and take action on its own.
  • AI Agent can leverage the function descriptions provided by MCP to understand more context and automatically execute tasks across various platforms/services.

Differences

Simply put, MCP gives the AI Agent a list of the capabilities that different services and platforms offer. The AI Agent uses the context and the model's reasoning to decide whether to call a particular service, and then uses Function Calling to execute it. The functions available to Function Calling are described through MCP, and the actual execution is handled by the code behind the MCP Server.
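The division of labor described here can be sketched in a few lines. Everything in this example is a hypothetical stand-in: ToolServer plays the role of an MCP Server, fakeModel replaces a real LLM with simple keyword matching, and runAgent is the agent loop. It is only meant to show the order of the three steps, not how any real SDK implements them.

```typescript
// What a server advertises for each capability.
interface ToolInfo {
  name: string;
  description: string;
}

// Hypothetical stand-in for an MCP server: it lists tools and executes them.
interface ToolServer {
  listTools(): ToolInfo[];
  callTool(name: string, args: Record<string, string>): string;
}

// A function call as a model would emit it: a name plus arguments, not an execution.
interface FunctionCall {
  name: string;
  arguments: Record<string, string>;
}

// Hypothetical stand-in for the model: picks a tool when every word of its
// name (split on "_") appears in the user's request.
function fakeModel(userInput: string, tools: ToolInfo[]): FunctionCall | null {
  const text = userInput.toLowerCase();
  const match = tools.find((t) => t.name.split("_").every((word) => text.includes(word)));
  return match ? { name: match.name, arguments: { query: userInput } } : null;
}

function runAgent(userInput: string, server: ToolServer): string {
  const tools = server.listTools(); // 1. MCP: discover the available capabilities
  const call = fakeModel(userInput, tools); // 2. Model: decide whether to call one
  if (call === null) return "no tool needed";
  return server.callTool(call.name, call.arguments); // 3. Execute through the server
}

const githubStub: ToolServer = {
  listTools: () => [{ name: "search_issues", description: "Search GitHub issues" }],
  callTool: (name, args) => `${name} called with query "${args["query"]}"`,
};

const agentAnswer = runAgent("Please search issues about a login crash", githubStub);
console.log(agentAnswer);
```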

So the main benefits of MCP for the community ecosystem are:

  • An open standard for service providers, so they can expose their APIs and some of their capabilities via MCP.
  • Developers need not reinvent the wheel, and can enhance their Agent with existing open-source MCP services.

Thoughts

Why has MCP been widely accepted after Claude launched it? Personally, I participated in several small AI project developments in the past year, and integrating AI models with existing systems or third-party systems was quite troublesome during development.

There are some frameworks on the market that support Agent development, such as LangChain Tools, LlamaIndex, and Vercel AI SDK, but each has its problems.

LangChain and LlamaIndex are both open-source projects, but their overall development is quite chaotic. First, their code abstraction level is too high. Their promotion mostly encourages developers to complete certain AI functions with just a few lines of code, which works well in demos but leads to very poor programming experiences once the business logic becomes complex. Also, these projects are too focused on commercialization and neglect overall ecosystem building.

Then there is Vercel AI SDK. Although I personally feel that Vercel AI SDK’s code abstraction is relatively good, it mainly works well for frontend UI integration and some AI function packaging. The biggest problem is that it is too deeply tied to Next.js, with insufficient support for other frameworks and languages.

Therefore, Claude driving MCP is a great opportunity. First, Claude Sonnet 3.5 holds a high position among developers, and MCP is an open standard, so many companies and communities are willing to participate, hoping that Claude can maintain a good open ecosystem.

How MCP works

Now let’s introduce how MCP works. First, let’s look at the official MCP architecture diagram.

It is divided into the following five parts:

  • MCP Hosts: Hosts are the LLM applications that initiate connections, such as Cursor, Claude Desktop, and Cline.
  • MCP Clients: Clients maintain 1:1 connections with Servers inside Host applications.
  • MCP Servers: Provide context, tools, and prompts for the Client-side through standardized protocols.
  • Local Data Sources: Local files, databases, and APIs.
  • Remote Services: External files, databases, and APIs.
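The relationship between these parts can be sketched with three small in-memory classes. SketchHost, SketchClient, and SketchServer are hypothetical names for this illustration only and are not the SDK's classes; the sketch just shows one Host owning one Client per Server (the 1:1 connection) and merging what the Servers offer.

```typescript
// Hypothetical in-memory stand-in for an MCP server.
class SketchServer {
  readonly serverName: string;
  private readonly tools: string[];
  constructor(serverName: string, tools: string[]) {
    this.serverName = serverName;
    this.tools = tools;
  }
  listTools(): string[] {
    return [...this.tools];
  }
}

// A client holds exactly one server connection (the 1:1 relationship).
class SketchClient {
  private readonly server: SketchServer;
  constructor(server: SketchServer) {
    this.server = server;
  }
  listTools(): string[] {
    // Prefix each tool with its server name so the host can tell them apart.
    return this.server.listTools().map((tool) => `${this.server.serverName}.${tool}`);
  }
}

// The host application creates one client per server and merges their tool lists.
class SketchHost {
  private readonly clients: SketchClient[] = [];
  connect(server: SketchServer): void {
    this.clients.push(new SketchClient(server));
  }
  clientCount(): number {
    return this.clients.length;
  }
  allTools(): string[] {
    const merged: string[] = [];
    for (const client of this.clients) {
      merged.push(...client.listTools());
    }
    return merged;
  }
}

const sketchHost = new SketchHost();
sketchHost.connect(new SketchServer("github", ["search_issues", "create_issue"]));
sketchHost.connect(new SketchServer("slack", ["post_message"]));
const hostTools = sketchHost.allTools();
console.log(hostTools.join(", "));
```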

The core of the entire MCP protocol is the Server because Host and Client concepts are familiar to those who know computer networks and are easy to understand. But how to understand the Server?

Looking at how Cursor’s AI features have developed, we find that AI automation progresses from Chat to Composer and then evolves into a complete AI Agent.

AI Chat only provides suggestions. How to convert AI responses into behavior and final results relies entirely on humans, e.g., manual copy-paste or making some modifications.

AI Composer can automatically modify code but requires human participation and confirmation and cannot perform operations other than modifying code.

AI Agent is a fully automated program that in the future can automatically read Figma images, generate code, read logs, debug code, and push code to GitHub.

MCP Server exists to realize AI Agent automation. It is an intermediary layer telling AI Agents which services, APIs, and data sources exist. AI Agent decides whether to call a service based on the Server’s information and uses Function Calling to execute functions.

How MCP Server works

Let’s look at a simple example. Suppose we want an AI Agent to automatically search GitHub repositories, then search their Issues to check whether a problem is a known bug, and finally decide whether to create a new Issue.

We need to create a GitHub MCP Server that provides three capabilities: search repositories, search Issues, and create an Issue.

Let’s directly look at the code:

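// SDK imports used below. VERSION and the repository, issues, and search modules
// are project-local and are not shown in this article.
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";
import { zodToJsonSchema } from "zod-to-json-schema";

const server = new Server(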
  {
    name: "github-mcp-server",
    version: VERSION,
  },
  {
    capabilities: {
      tools: {},
    },
  }
);

server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      {
        name: "search_repositories",
        description: "Search for GitHub repositories",
        inputSchema: zodToJsonSchema(repository.SearchRepositoriesSchema),
      },
      {
        name: "create_issue",
        description: "Create a new issue in a GitHub repository",
        inputSchema: zodToJsonSchema(issues.CreateIssueSchema),
      },
      {
        name: "search_issues",
        description: "Search for issues and pull requests across GitHub repositories",
        inputSchema: zodToJsonSchema(search.SearchIssuesSchema),
      }
    ],
  };
});

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  try {
    if (!request.params.arguments) {
      throw new Error("Arguments are required");
    }

    switch (request.params.name) {
      case "search_repositories": {
        const args = repository.SearchRepositoriesSchema.parse(request.params.arguments);
        const results = await repository.searchRepositories(
          args.query,
          args.page,
          args.perPage
        );
        return {
          content: [{ type: "text", text: JSON.stringify(results, null, 2) }],
        };
      }

      case "create_issue": {
        const args = issues.CreateIssueSchema.parse(request.params.arguments);
        const { owner, repo, ...options } = args;
        const issue = await issues.createIssue(owner, repo, options);
        return {
          content: [{ type: "text", text: JSON.stringify(issue, null, 2) }],
        };
      }

      case "search_issues": {
        const args = search.SearchIssuesSchema.parse(request.params.arguments);
        const results = await search.searchIssues(args);
        return {
          content: [{ type: "text", text: JSON.stringify(results, null, 2) }],
        };
      }

      default:
        throw new Error(`Unknown tool: ${request.params.name}`);
    }
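  } catch (error) {
    // Rethrow instead of silently swallowing the error, so the failure
    // is reported to the client rather than returning undefined.
    throw error;
  }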
});

async function runServer() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
  console.error("GitHub MCP Server running on stdio");
}

runServer().catch((error) => {
  console.error("Fatal error in main():", error);
  process.exit(1);
});

In the code above, the ListToolsRequestSchema handler registered with server.setRequestHandler tells the Client which capabilities the Server provides. Each tool's description field explains what the capability is for, and its inputSchema defines the input parameters it needs. The CallToolRequestSchema handler then executes whichever tool the Client asks for.
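On the wire, these two handlers correspond to JSON-RPC 2.0 requests with the methods tools/list and tools/call. The sketch below builds such messages by hand to show their shape. The helper names listToolsRequest and callToolRequest are hypothetical, and the sample result is trimmed to a single tool with a simplified schema; in practice the SDK client constructs and sends these messages for you.

```typescript
// JSON-RPC 2.0 envelope used by MCP requests.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextRequestId = 1;

// Ask the server which tools it offers (handled by the ListToolsRequestSchema handler).
function listToolsRequest(): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextRequestId++, method: "tools/list" };
}

// Ask the server to run one tool (handled by the CallToolRequestSchema handler).
function callToolRequest(name: string, args: Record<string, unknown>): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Shape of a tools/list result, trimmed to one tool with a simplified schema.
const sampleListToolsResult = {
  tools: [
    {
      name: "search_repositories",
      description: "Search for GitHub repositories",
      inputSchema: {
        type: "object",
        properties: { query: { type: "string" } },
        required: ["query"],
      },
    },
  ],
};

const listReq = listToolsRequest();
const callReq = callToolRequest("search_repositories", { query: "mcp", page: 1 });
console.log(JSON.stringify(listReq));
console.log(JSON.stringify(callReq));
```

The name and arguments in the tools/call request are what the server code reads as request.params.name and request.params.arguments.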

Now let’s take a look at the specific implementation code:

import { z } from "zod";

export const SearchOptions = z.object({
  q: z.string(),
  order: z.enum(["asc", "desc"]).optional(),
  page: z.number().min(1).optional(),
  per_page: z.number().min(1).max(100).optional(),
});

export const SearchIssuesOptions = SearchOptions.extend({
  sort: z.enum([
    "comments",
    ...
  ]).optional(),
});

export async function searchUsers(params: z.infer<typeof SearchUsersSchema>) {
  return githubRequest(buildUrl("https://api.github.com/search/users", params));
}

export const SearchRepositoriesSchema = z.object({
  query: z.string().describe("Search query (see GitHub search syntax)"),
  page: z.number().optional().describe("Page number for pagination (default: 1)"),
  perPage: z.number().optional().describe("Number of results per page (default: 30, max: 100)"),
});

export async function searchRepositories(
  query: string,
  page: number = 1,
  perPage: number = 30
) {
  const url = new URL("https://api.github.com/search/repositories");
  url.searchParams.append("q", query);
  url.searchParams.append("page", page.toString());
  url.searchParams.append("per_page", perPage.toString());

  const response = await githubRequest(url.toString());
  return GitHubSearchResponseSchema.parse(response);
}

As the code shows, the final implementation talks to GitHub through the https://api.github.com API: the githubRequest function calls GitHub’s API and returns the results.

Before calling GitHub’s official API, the main work of MCP is to describe what capabilities the Server provides (for the LLM), what parameters are needed (and what these parameters do), and what the returned results are.
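To see what this description looks like from the model's side, here is a hand-written approximation of the JSON Schema that SearchRepositoriesSchema would be converted into, together with a deliberately tiny argument check. The exact output of zodToJsonSchema may contain additional fields, and checkArguments is a hypothetical helper for this sketch; the real server validates with the zod schema's parse method instead.

```typescript
// Hand-written approximation of the inputSchema for search_repositories.
const searchRepositoriesInputSchema = {
  type: "object",
  properties: {
    query: { type: "string", description: "Search query (see GitHub search syntax)" },
    page: { type: "number", description: "Page number for pagination (default: 1)" },
    perPage: { type: "number", description: "Number of results per page (default: 30, max: 100)" },
  },
  required: ["query"],
};

// Minimal check (hypothetical helper): required keys are present and
// primitive types match. This is far less thorough than zod's parse().
function checkArguments(
  schema: { properties: Record<string, { type: string }>; required: string[] },
  args: Record<string, unknown>
): string[] {
  const problems: string[] = [];
  for (const key of schema.required) {
    if (!(key in args)) problems.push(`missing required argument: ${key}`);
  }
  for (const key of Object.keys(args)) {
    const expected = schema.properties[key];
    if (expected === undefined) {
      problems.push(`unknown argument: ${key}`);
    } else if (typeof args[key] !== expected.type) {
      problems.push(`${key} should be a ${expected.type}`);
    }
  }
  return problems;
}

const validProblems = checkArguments(searchRepositoriesInputSchema, { query: "mcp server", page: 2 });
const invalidProblems = checkArguments(searchRepositoriesInputSchema, { page: "2" });
console.log(validProblems, invalidProblems);
```

The description strings are the part the LLM actually reads when it decides which arguments to supply.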

Therefore, an MCP Server is nothing novel or sophisticated; MCP is simply a protocol that has gained consensus.

Suppose we want to build a more powerful AI Agent, for example one that automatically searches for relevant GitHub repositories based on local error logs, then searches their Issues, and finally sends the results to Slack.

We might need to create three different MCP Servers: one Local Log Server for querying local logs; one GitHub Server for searching Issues; and one Slack Server for sending messages.

After the user inputs the command “I need to query local error logs and send related Issues to Slack,” the AI Agent determines which MCP Servers to call, decides the calling sequence, and based on the returned results from different MCP Servers, decides whether to call the next Server, thus completing the entire task.
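This multi-server flow can be sketched as a small loop. All names here are hypothetical: each StubServer stands in for one MCP Server with a single tool, and planNext is a scripted replacement for the model's decision about which tool to call next. The stub outputs are made-up strings, not real log or API data.

```typescript
// Hypothetical stand-in for one MCP server exposing a single tool.
interface StubServer {
  tool: string;
  run(input: string): string;
}

const stubServers: StubServer[] = [
  { tool: "read_error_logs", run: () => "TypeError: cannot read properties of undefined" },
  { tool: "search_issues", run: (error) => `Related issue for: ${error}` },
  { tool: "send_slack_message", run: (text) => `sent to Slack: ${text}` },
];

// Scripted planner standing in for the model: chooses the next tool
// based on which tools have already been called.
function planNext(done: string[]): string | null {
  const order = ["read_error_logs", "search_issues", "send_slack_message"];
  return order[done.length] ?? null;
}

function runTask(servers: StubServer[]): { calls: string[]; result: string } {
  const calls: string[] = [];
  let lastResult = "";
  for (let tool = planNext(calls); tool !== null; tool = planNext(calls)) {
    const wanted = tool;
    const server = servers.find((s) => s.tool === wanted);
    if (server === undefined) break; // no connected server offers this capability
    lastResult = server.run(lastResult); // feed the previous result into the next call
    calls.push(wanted);
    if (lastResult === "") break; // nothing to act on, so stop early
  }
  return { calls, result: lastResult };
}

const taskOutcome = runTask(stubServers);
console.log(taskOutcome.calls.join(" -> "));
console.log(taskOutcome.result);
```

In a real agent the planner is the LLM itself, which sees each returned result before deciding on the next call.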

How to use MCP

If you haven’t tried MCP yet, you can experience it with Cursor (which I have personally tried), Claude Desktop, or Cline.

Of course, we don’t need to develop MCP Servers ourselves. The advantage of MCP is universality and standardization, so developers don’t need to reinvent the wheel (although learning might involve reinventing the wheel).

The first recommendation is some official organization Servers: Official MCP Server List.

Currently, community MCP Servers are still somewhat disorganized: many lack tutorials and documentation, and a lot of the code has functional issues. We can try some examples from Cursor Directory. I won’t go into specific configuration and practical details here; you can refer to the official documentation.
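For orientation only, here is a rough sketch of what a host configuration typically looks like, written as a TypeScript object that is serialized to JSON. It assumes the mcpServers layout used by Claude Desktop's claude_desktop_config.json and the official GitHub server package launched through npx; the StdioServerEntry interface is a hypothetical name for this sketch, the token value is a placeholder, and file locations and details differ between hosts, so check the official documentation before relying on it.

```typescript
// Shape of one entry under "mcpServers" for a server launched over stdio
// (interface name is hypothetical, chosen for this sketch).
interface StdioServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

const mcpConfig: { mcpServers: Record<string, StdioServerEntry> } = {
  mcpServers: {
    github: {
      command: "npx",
      args: ["-y", "@modelcontextprotocol/server-github"],
      // Placeholder only: supply your own token and never commit it.
      env: { GITHUB_PERSONAL_ACCESS_TOKEN: "<your-token>" },
    },
  },
};

// The JSON text that would go into the host's configuration file.
const mcpConfigJson = JSON.stringify(mcpConfig, null, 2);
console.log(mcpConfigJson);
```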

Some MCP resources

Here are some personally recommended MCP resources for your reference.

Official MCP Resources

Community MCP Server Lists

Final words

This article was written somewhat hastily, so errors are inevitable. I welcome any corrections from experts.

This article can be reprinted, but please indicate the source. It is simultaneously published on X/Twitter, Xiaohongshu, and WeChat Official Account. Please follow if interested.
