How dose MCP work

MCP, which stands for Model Context Protocol, is an open-source protocol proposed by Anthropic in late 2024, aimed at providing a standard communication protocol for large language models, making it easier for them to call third-party services and access third-party data.

The official analogy for MCP is that it serves as a “USB Type-C interface” for large language models. Regardless of whether it’s OpenAI, Claude, Deepseek, or any other AI model provider, and regardless of whether it’s GPT4o, deepseek-r1, or Qwen-2.5, all models can use MCP to interact with external services and data—just like computers, phones, earphones, or cameras can all use the USB-C interface for charging or data transfer.

Many articles about MCP stop at this point and immediately start demonstrating how to implement an MCP Server using the “check weather” example. But I think this is far from sufficient. Many “innovations” in the AI field aren’t really reliable. What problems does MCP actually solve, and for whom? Are these real needs or false premises? How were these problems solved before? Can MCP succeed? And what insights can we gain from it?

I. What Problems Does MCP Solve? In Anthropic’s article “Introducing the Model Context Protocol,” they explain:

As AI assistants are increasingly accepted by the mainstream, the industry has invested heavily in enhancing model capabilities, making rapid progress in reasoning and quality. However, even the most sophisticated models are limited by data isolation—trapped behind information silos and legacy systems. Because each data source has different requirements and cannot be handled in a unified way, building a system that efficiently connects to all data sources becomes extremely difficult, limiting system scalability and overall efficiency.

AI large language models’ reasoning abilities and response quality are getting better and better, but no matter how powerful they become, if they don’t know your personalized data, they can’t provide effective answers tailored to your questions. For example, if you ask deepseek how to lose weight, it will likely give you a bunch of technically correct but generic advice, because it doesn’t have your data and can’t tailor its teaching to you specifically. Your data might be in third-party systems like Keep or Apple Watch, so how can large language models access data from these third-party companies?

Improving the quality of large language model outputs has always been a core driving force behind model “evolution.” The personal weight loss example also applies to businesses using AI to improve efficiency: how can large language models become digital employees that enhance business productivity? Similarly, models need to integrate with a company’s internal services and access its internal data.

II. How Did Large Language Models Integrate with Third-Party Tools Before MCP? MCP is not the first solution to address this problem. Before it, there were numerous practices like LangChain, Function Calling, and ReAct. People have been exploring this field for quite some time.

LangChain Tools As early as late 2022, when ChatGPT first came out, LangChain proposed to orchestrate large language models by connecting various tools, APIs, and data sources to them. This was definitely a good idea, keenly recognizing both the advantages and limitations of large language models.

LangChain’s open-source community has always been very active, but industry criticism mainly focuses on its excessive abstraction, somewhat making simple problems more complex. For instance, using the LangChain framework for a simple translation task is more complex and abstract than directly using OpenAI, and the various encapsulations make debugging after errors extremely difficult.

LangChain’s documentation is also incredibly complex, with concepts that can’t fit on a single screen. I’ve tried to learn it several times but eventually gave up. If a technology is making things more complicated, it’s definitely a step backward.

Function Calling Function Calling is a solution first proposed by OpenAI, aiming to make “OpenAI’s models” more conveniently and flexibly interact with user code or external services. Developers can use tools and a series of conventional syntax in OpenAI’s API to use Function Calling, supporting tasks like calling weather APIs, executing database queries, etc.

It’s important to note that Function Calling doesn’t mean the large language model directly calls third-party APIs; rather, it infers the most suitable tool based on context and correctly assembles the parameters, allowing developers to initiate the call themselves. The specific process is shown in the diagram below.

The overall process can be divided into the following steps:

Developers need to predefine and implement functions or tools, such as “query weather.” Developers need to describe this function (e.g., the weather query function provides a service to look up weather by city) and parameters (e.g., the city’s English name, specific date). This definition is mainly to help the large language model understand the purpose and usage of the function.

The large language model decides which function to use based on the user’s input and the predefined function descriptions. For example, if a user asks “What’s the weather in Hangzhou today,” the model recognizes that it can call the “query weather” function and assembles parameters in the correct format, such as {“location”: “hangzhou”, “date”: “2025-03-23”}.

The developer or the client that encapsulates the large language model initiates a call to the “query weather” function.

The developer passes the raw data returned by the service (such as temperature value) back to the large language model.

The large language model generates the final answer based on the API result, such as “Hangzhou’s temperature today is 25°C, sunny.”

This entire process is quite universal and mature. Google Gemini’s Function Calling and Anthropic Claude’s Tool Use implement the same process. However, the three companies have slight differences in implementation. For example, Gemini’s function definition uses the special keyword function_declarations, Claude uses input_schema for parameter descriptions, while in GPT and Gemini it’s called parameters. Such differences unnecessarily increase the cost for developers using function calling.

Additionally, models that support Function Calling are fine-tuned with specific datasets; it’s not a natural ability of large language models. For instance, ChatGPT models before June 2023 didn’t support Function Calling, and deepseek r1 claimed not to support it when it was released. These factors also limit the usage scope of Function Calling.

ReAct, Reasoning and Acting ReAct (Reasoning and Acting) is another working paradigm that draws on how humans solve problems. When solving a problem, it first breaks down the problem, reasons step by step (Reasoning) about what needs to be done, then takes action (Acting) based on the reasoning results, and continues reasoning based on the action results until the problem is solved.

The paper “ReAct: Synergizing Reasoning and Acting in Language Models” found that in experiments on question answering, fact verification, and interactive decision-making tasks, ReAct gave large language models more powerful problem-solving capabilities.

This sounds abstract, but in practical use, it’s all about using prompts to make the large language model work, just with different prompt structures. For example, Qwen provided a ReAct prompt template that can “stimulate Qwen’s ability to use tools”:

TOOL_DESC = """{name_for_model}: Call this tool to interact with the {name_for_human} API. What is the {name_for_human} API useful for? {description_for_model} Parameters: {parameters} Format the arguments as a JSON object."""
REACT_PROMPT = """Answer the following questions as best you can. You have access to the following tools:
{tool_descs}
Use the following format:
Question: the input question you must answerThought: you should always think about what to doAction: the action to take, should be one of [{tool_names}]Action Input: the input to the actionObservation: the result of the action... (this Thought/Action/Action Input/Observation can be repeated zero or more times)Thought: I now know the final answerFinal Answer: the final answer to the original input question
Begin!
Question: {query}"""

As you can see, although the ReAct mode works well, the prompt might be an extremely long set of instructions, which is a very high barrier for developers, with a difficulty level comparable to communicating with someone who doesn’t understand technology. Our typical approach to such scenarios is to simply give up.

III. How Does MCP Define the Problem? In the previous section, we introduced solutions for integrating third-party tools with large language models before MCP, each with its own advantages and disadvantages. But overall, the problem is that they’re too complex. This complexity is also reflected in chaos to some extent, because these products don’t further differentiate developers. Whether it’s LangChain, Function Calling, or ReAct, they all assume that developers simultaneously develop Functions and integrate Functions into AI Agents, ultimately delivering a ChatBot to non-technical users.

Anthropic’s perception of Agents is not as ChatBots, but as AI-powered tools, such as the Claude client, Cursor, Cline, Warp, etc. Their problem definition is at a higher level than other model providers. Furthermore, developers are divided into Providers who provide data and Consumers who use data. MCP clearly defines the core scenarios for these two types of developers: Providers use MCP Server to expose their data, and Consumers use MCP Client to connect to these MCP Servers.

MCP is an open-source standard designed to help developers establish secure bidirectional connections between data sources and AI-driven tools. Its architecture is clear and concise: developers can either securely open data interfaces through MCP Server or develop AI applications (i.e., MCP Client) that connect to these servers. This protocol ensures data interaction security and enables dynamic integration of AI tools with data sources through a standardized communication framework.

As a result, the problem of “making large language models more flexibly and conveniently integrate third-party tools” is transformed into three sub-problems:

How can data-providing Providers more conveniently publish their tools/services?

How can data-consuming Consumers more easily find these services and use them more conveniently?

How can Providers, Consumers, and large language models establish mutually recognized protocols?

IV. How Does MCP Solve These Problems? MCP, introduced in late November 2024, has in just over four months shown a strong trend toward becoming the industry’s de facto standard. Besides Anthropic’s endorsement and open-source protocol, what else has it done right?

MCP’s Multi-language Support MCP provides SDKs for Python, TypeScript, Java, and Kotlin, making it easier for these programmers to implement MCP Server and MCP Client. MCP’s choices are definitely aimed at covering the largest developer communities, such as:

TypeScript: Covers Web/mobile development, TypeScript SDK released on October 23, 2024

Python: The de facto standard in data science and AI fields, Python SDK released on November 20, 2024

Kotlin: Android’s officially recommended development language, JetBrains released Kotlin SDK on December 21, 2024

Java: One of the preferred languages for building enterprise-level Web applications, VMware’s Spring AI released Java SDK on February 14, 2025

C#: One of the preferred languages for developing Windows desktop applications, Microsoft released it on March 24, 2025

From this perspective, MCP is extremely comprehensive. In contrast, LangChain still only supports Python and JavaScript, while Microsoft’s Semantic Kernel only supports C#, Python, and Java.

MCP’s “Friend Circle” Besides the official SDK providers like JetBrains, VMware, and Microsoft, MCP lists some of the servers and clients that support the MCP protocol on its website, such as MCP Servers including Docker, Github, PostgreSQL, etc., and MCP Clients including Claude’s desktop client, the popular coding IDE Cursor, the supposedly best free coding plugin Cline, etc.

These pre-configured Servers and Clients can reach developers to the greatest extent, allowing them to play with the pre-configured Servers and Clients, and creating a “herd effect” that makes new developers tend to believe this ecosystem has great prospects.

This is a very clever approach. I haven’t seen any other large language model provider put so much thought into cultivating the developer ecosystem. Having many friends is more conducive to forming a broad consensus for MCP.

How Does MCP Communicate? Many articles don’t cover this part, and the official documentation is rather vague, but how do AI-integrated tools implement natural language calls to MCP Server? I was always curious until I saw the video “Capturing AI Prompts, Deconstructing MCP’s Underlying Principles,” where the author intercepted the “chat history” between Cline and the large language model using Cloudflare’s AI gateway, solving this puzzle. I replicated the approach according to the video, and when I got the call logs, I laughed because it’s just a combination of Function Calling and ReAct!

Taking the example of a user asking “current time,” let’s look at the chat record I intercepted between Cline and the Deepseek model:

First, Cline puts the MCP tool information in the system prompt, along with the user’s actual question, sending them together to the large language model. For example, if I installed the time MCP, the descriptions of how to use the get_current_time and convert_time tools would appear in the system prompt.

Then, the large language model infers whether it needs to use a tool and which one to use based on the user’s question. After deciding, it extracts and assembles the specific parameters. After installing Time MCP, when I asked “what is the current time,” the model’s thinking and decision instruction was to use the get_current_time tool.

Next, Cline initiates a call to the tool based on the model’s instructions and sends the previous chat record Context and this execution result together to the large language model.

Finally, the large language model judges whether the problem has been solved based on the user’s original question and the execution result, and returns the final result to Cline.

As can be seen, intelligent tools (such as Cursor, Cline) all need to query the cloud-based large language model, whether using MCP or Function Calling, and their chat patterns are exactly the same.

At the same time, I also found that the essence of MCP is not the tools defined by MCP Server, but the pre-set, extremely long system prompts sent by intelligent tools to the large language model. Cline’s system prompt is over 1,000 lines, structurally guiding the large language model on how to call tools for step-by-step execution, explicit thinking-action-observation loops, fully applying the ReAct framework. For example, this part of the prompt tells the large language model how to use MCP tools.

Epilogue: Whoever Wins Developers Wins the World AI tools like Cursor, Cline, and Claude App have encapsulated complex system prompts, making things simpler for developers. Service providers can focus solely on how to publish their services as MCP Servers, while ordinary developers can focus only on which MCP Client to install.

From this perspective, it’s possible to build an AppStore or Android Market for the AI era based on MCP. As MCP Servers proliferate, the question becomes how users find high-quality MCP Servers, and how MCP Server developers can obtain commercial profits through the MCP Server Store. The good news is that the MCP community is already discussing this issue: MCP Server Registry, so let’s look forward to it.

MCP continues to boom. On March 27, 2025, even OpenAI, as Anthropic’s competitor, began supporting MCP. MCP has become the de facto standard, and the Agent field is getting more and more interesting.

How dose MCP work

May 1, 2024 • chanchan

Comments