LLM application development based on LangChain 7 – Agent

A gentleman is no different from others by nature; he is simply good at making use of things.

Ordinary users of large language models easily treat them as encyclopedias. Indeed, when using ChatGPT it is easy to get the illusion that it has memorized and learned a huge amount of information from the Internet: almost any question you ask gets a seemingly reasonable answer. (Occasionally it pretends not to understand a question, but that is really a program interception because the question is inconvenient to answer for some reason.)

In fact, at the current level of large language models, it is more useful to treat them as inference engines: provide them with some text or other sources of information, and the model will combine the background knowledge it learned from the Internet with the new information we supply to answer questions, reason about content, and even decide what to do next. LangChain's agent framework helps us do exactly this.

An agent is a system that uses a large language model to make decisions and calls tools to perform specific operations. By setting the agent's persona, background, and tool descriptions, you can customize its behavior so that it can understand and reason over the input text and automate task processing.

Many years ago, I used a popular open-source workflow engine called Shark. Shark supports the WfMC "ToolAgents" interface, and the way a ToolAgent uses a Tool is very similar to the Agent we are going to talk about today. There is nothing new under the sun: workflow systems had to deal with the external world in the past, and now large language models communicate with the external world, so it is natural to use similar mechanisms. Of course, technology keeps developing; in the past a specific tool was always called in a fixed way, while now the large language model chooses on its own which tool to call.

ToolAgents provide Enhydra Shark with interfaces to different software. For example, MailToolAgents, SOAPToolAgent, and RuntimeApplicationToolAgent implement the connectivity required to send and receive mail, call web services, and launch locally available applications.

If chains are the basic building block of LangChain, agents are its most powerful feature. Today we will take an in-depth look at what an agent is, how to create and use one, how to use different types of LangChain built-in tools (math tools, the Wikipedia tool, search engines, and so on), and how to create your own tools when the built-in ones are not enough, so that the agent can interact with any data store, interface, or function.

An agent is like a versatile dispatch manager with access to a suite of tools. Based on user input, the agent decides which tools to call. It can not only use multiple tools in one run, but also feed the output of one tool into another as input.

Preparation

As before, first initialize the environment variables through the .env file. Remember that we are using Microsoft Azure's GPT; for details, please refer to the first article of this column.

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

deployment = "gpt-4"
# deployment = "gpt-35-turbo"
model = "gpt-4"
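For reference, here is a minimal sketch of how you might verify that the Azure credentials were loaded. The environment variable names below are assumptions for a typical Azure OpenAI setup of that LangChain era; check your own .env file and the first article of this column for the exact ones.

import os

# After load_dotenv(), the Azure credentials should be available as environment
# variables. The names below are assumptions; adjust them to match your .env.
for name in ["OPENAI_API_TYPE", "OPENAI_API_BASE", "OPENAI_API_VERSION"]:
    print(name, "set:", name in os.environ)
print("OPENAI_API_KEY set:", "OPENAI_API_KEY" in os.environ)  # never print the key itself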

The wikipedia library needs to be installed.

!pip install -U wikipedia

We will use the AzureChatOpenAI class. Note that the temperature is set to 0 to eliminate any possible randomness. Because the LLM will serve as the agent's inference engine, connected to other data and computing resources, we want this engine to be as accurate and deterministic as possible.

AzureChatOpenAI(temperature=0, deployment_name=deployment)

Use Agent

Use tools to calculate math problems

from langchain.agents import load_tools, initialize_agent
from langchain.agents import AgentType
from langchain.chat_models import AzureChatOpenAI

llm = AzureChatOpenAI(temperature=0, deployment_name=deployment)
tools = load_tools(["llm-math", "wikipedia"], llm=llm)  # load the math and Wikipedia tools
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,  # a ReAct-style chat agent
    handle_parsing_errors=True,  # pass output-parsing errors back to the model instead of raising
    verbose=True)
agent("What is the 25% of 300?")

The llm-math tool is actually a chain that combines LLM and a calculator to solve math problems.

The wikipedia tool is a program that connects to the Wikipedia API, allowing you to query content on Wikipedia and return query results.
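Before handing these tools to an agent, you can try them directly to see what they return. This is just a quick sketch; it assumes the order in which load_tools returned them above (llm-math first, wikipedia second).

calculator, wiki = tools  # assumes the load_tools order above
print(calculator.name, "-", calculator.description)
print(calculator.run("What is 25% of 300?"))  # the chain asks the LLM for an expression, then evaluates it
print(wiki.name, "-", wiki.description)
print(wiki.run("Tom M. Mitchell")[:300])  # first 300 characters of the Wikipedia result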

We will use the CHAT_ZERO_SHOT_REACT_DESCRIPTION agent type, which uses the ReAct prompting technique to elicit better reasoning.

Here is a brief introduction to the ReAct framework for large language models. (Be careful not to confuse it with React on the front end.)

ReAct is a technique for organizing prompts that maximizes the reasoning ability of language models. Guided by the ReAct framework, the large model generates a task-solving trajectory: observe the environment, think, then take action. The observation and thinking stages are collectively called Reasoning, and the stage that carries out the next action is called Acting. Every step of the reasoning process is recorded in detail, which also improves the interpretability and credibility of large models when they solve problems.

  • During the reasoning phase, the model observes the current environment and state and generates a reasoning trajectory, allowing it to induce, track, and update action plans, and even handle exceptional situations.
  • During the action phase, the model takes the next step, such as interacting with external sources (for example, a knowledge base or an environment) to gather information, or giving a final answer.

These advantages of the ReAct framework give it great potential for future development. We can expect that the ReAct framework will be able to handle more and more complex tasks. Especially with the development of embodied intelligence, the ReAct framework will enable intelligent agents to conduct more complex interactions in virtual or real environments. For example, an intelligent agent might navigate a virtual environment or manipulate physical objects in a real environment. This will greatly expand the application scope of AI, allowing them to better serve our lives and work.

The code above is very simple: it calls load_tools to load the tools, initializes the agent through initialize_agent, and finally runs the agent on the math problem: What is 25% of 300?

This example is very simple, but as the saying goes, the sparrow may be small, yet all its organs are complete: it is enough to understand the basic framework of ReAct.

The run enters the AgentExecutor chain, which starts by thinking (Thought) about what needs to be done and then takes an action (Action). A JSON object is emitted here containing the action and the action's input (action_input): action names the tool to use, and action_input is the input to pass to that tool. Next comes the observation (Observation): the result returned by the tool, here Answer: 75.0. The model then thinks again (Thought); if it believes it has obtained the result, it returns the final answer (Final Answer). If it thinks it has not reached a final result yet, it may continue to call other tools, going through Action, Action Input, Observation, Thought again (the Thought/Action/Action Input/Observation loop can repeat N times). An illustrative trace is sketched below.
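For reference, such a trace looks roughly like the following. This is a sketch of the format, not the verbatim output of the run above; the exact wording the model generates varies from run to run.

> Entering new AgentExecutor chain...
Thought: I need to calculate 25% of 300.
Action:
{
  "action": "Calculator",
  "action_input": "300 * 0.25"
}
Observation: Answer: 75.0
Thought: I now know the final answer.
Final Answer: 25% of 300 is 75.0
> Finished chain.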

Call Wikipedia interface

Next, let's look at an example of calling the Wikipedia interface. We ask the question: what book did Tom M. Mitchell write? To avoid confusion with other people of the same name, we add some background information:

Tom M. Mitchell is an American computer scientist and the Founders University Professor at Carnegie Mellon University (CMU).

question = "Tom M. Mitchell is an American computer scientist \
and the Founders University Professor at Carnegie Mellon University (CMU).\
what book did he write?"
result = agent(question)

As you can see, the agent calls the Wikipedia tool and finds two Wikipedia pages. The LLM combines them with the background information we gave to pick the correct page, and returns the answer we want from the page's summary.
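If you are curious about the raw text the Wikipedia tool hands back, you can query the underlying wrapper directly. This is a sketch assuming the legacy langchain package layout used in this article (in newer versions the wrapper lives in langchain_community.utilities).

from langchain.utilities import WikipediaAPIWrapper

wiki_api = WikipediaAPIWrapper(top_k_results=2)  # return at most two matching pages
summary = wiki_api.run("Tom M. Mitchell")
print(summary[:500])  # the agent reads text like this and reasons over it to find the answer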

Execute Python code

If you have used GitHub Copilot or Cursor (if you have not, go try them), or ChatGPT's code interpreter plug-in, you know that their most commonly used capability is having the LLM write code and then execute the generated code. Here we use an agent to do the same thing.

from langchain_experimental.agents.agent_toolkits.python.base import create_python_agent
from langchain_experimental.tools.python.tool import PythonREPLTool

agent = create_python_agent(
    llm,
    tool=PythonREPLTool(),
    verbose=True
)
customer_list = [["Harrison", "Chase"],
                 ["Lang", "Chain"],
                 ["Dolly", "Too"],
                 ["Elle", "Elem"],
                 ["Geoff","Fusion"],
                 ["Trance","Former"],
                 ["Jen","Ayai"]
                ]
agent.run(f"""Sort these customers by last name and then first name
and print the output: {customer_list}""")

First, create a Python agent and pass in a tool: PythonREPLTool (a REPL is an interactive way to run code, similar to a Jupyter Notebook). The agent can use the REPL to execute code and obtain the results, which are passed back to the agent so it can decide what to do next.

The problem we want the agent to solve is to sort a list of customers, customer_list. A very important point here is that we ask the agent to print the output: the printed content is later fed back to the LLM, allowing it to reason about the result of the code it ran.
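To see what the tool itself does, you can also run the REPL tool directly, outside the agent. A minimal sketch, reusing the PythonREPLTool and customer_list defined above:

repl = PythonREPLTool()
code = (
    "customers = " + repr(customer_list) + "\n"
    + "print(sorted(customers, key=lambda c: (c[1], c[0])))"
)
print(repl.run(code))  # this printed output is what the agent sees as its Observation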

If you want to know the specific execution process, you can turn on Debug mode:

import langchain
langchain.debug=True
agent.run(f"""Sort these customers by \
last name and then first name \
and print the output: {customer_list}""")
langchain.debug=False

Once again I recommend the platform developed by LangChain: https://smith.langchain.com/ (see the previous article for the invitation code). With it, the whole calling process is laid out as clearly as the lines on your palm.

Use custom tools

The examples above all use LangChain's built-in tools. A major advantage of agents, however, is that they can connect to your own information sources, interfaces, and data. To do that, you need to create your own tools.

For simplicity, let’s define a tool to get the current date.

You need to install the library first: !pip install DateTime

from langchain.agents import tool
from datetime import date

@tool
def time(text: str) -> str:
    """Returns today's date, use this for any
    questions related to knowing today's date.
    The input should always be an empty string,
    and this function will always return today's
    date - any date mathematics should occur
    outside this function."""
    return str(date.today())

agent = initialize_agent(
    tools + [time],  # built-in tools plus our custom time tool
    llm,
    agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    handle_parsing_errors=True,
    verbose=True)
try:
    result = agent("whats the date today?")
except Exception:
    print("exception on external access")

Defining a tool is simple: import the tool decorator and apply it to any function to turn that function into a tool LangChain can use. The key is to write a good docstring for the tool function, so the LLM knows under what circumstances the tool should be called and how to call it.
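As one more sketch, here is a hypothetical custom tool that looks up an order status in a hard-coded dictionary. The order data and the tool are made up purely for illustration; the point is the pattern: the @tool decorator plus a docstring that tells the LLM when and how to call it.

# Hypothetical example: the order data below is fabricated for illustration only.
FAKE_ORDERS = {"A1001": "shipped", "A1002": "processing"}

@tool
def order_status(order_id: str) -> str:
    """Returns the status of an order given its order id,
    for example 'A1001'. Use this for any question about
    the current status of a customer's order."""
    return FAKE_ORDERS.get(order_id.strip(), "unknown order id")

agent = initialize_agent(
    tools + [time, order_status],
    llm,
    agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    handle_parsing_errors=True,
    verbose=True)
agent("What is the status of order A1002?")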

Outlook

At the OpenAI Developer Day on November 6, 2023, OpenAI released the Assistants API to make it easier to build AI-agent-style applications (ones that can learn new knowledge and then call models and tools to perform tasks). Does that make LangChain's agent mechanism unnecessary?

Reference

  1. Short course: https://learn.deeplearning.ai/langchain/lesson/7/agents
  2. Documentation: https://python.langchain.com/docs/modules/agents/