Using LangChain to Build Your Own LLM Application

As LLM technology develops, its business applications have become increasingly important, and LangChain has greatly lowered the barrier to LLM application development. By introducing what LangChain is, its core components, and how it is used in real scenarios, this article aims to help you get started quickly with LLM application development.


What is LangChain


LangChain is a framework for developing LLM-powered applications. It can be loosely thought of as the Spring of the LLM world, or as an open-source version of the ChatGPT plugin system. Its two core capabilities are:

1) Connecting LLM models to external data sources.

2) Allowing the LLM to interact with its environment, using tools through an Agent.


LangChain core components


LangChain provides a variety of components for working with LLMs. The core components are Models, Indexes, Chains, Memory, and Agents.


2.1 Models

LangChain itself does not provide LLMs; instead it provides a universal interface for accessing them, making it easy to swap the underlying LLM or plug in your own. Models fall into two main categories:

1) LLMs: models that take a text string as input and return a text string, such as OpenAI's text-davinci-003;

2) Chat Models: models backed by a language model that take a list of chat messages as input and return a chat message. The widely used ChatGPT and Claude are Chat Models.
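
A minimal sketch contrasting the two interfaces, assuming LangChain's OpenAI integrations and a configured API key (the model names are illustrative):

from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

# LLM interface: text string in, text string out
llm = OpenAI(model_name="text-davinci-003")
print(llm("Say hello in French"))

# Chat Model interface: list of messages in, a message out
chat = ChatOpenAI(model_name="gpt-3.5-turbo")
print(chat([HumanMessage(content="Say hello in French")]))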

Interaction with a model happens mainly through prompts. LangChain provides PromptTemplate to make prompts easy to construct and reuse.

from langchain import PromptTemplate

prompt_template = '''As a senior editor, please write a summary of the text between >>> and <<<.
>>> {text} <<<
'''

prompt = PromptTemplate(template=prompt_template, input_variables=["text"])
# the same template can be reused with different values for {text}
print(prompt.format_prompt(text="I love Beijing Tiananmen"))

2.2 Indexes

Indexes integrate external data so that answers can be retrieved from it. The main steps are:

1) Load data from various sources through Document Loaders;

2) Split text semantically through Text Splitters;

3) Store unstructured data as vectors through a Vectorstore;

4) Retrieve document data through a Retriever.


2.2.1 Document Loaders

LangChain loads external documents through Loaders and converts them into a standard Document type. A Document has two main attributes: page_content holds the content of the document, and metadata holds descriptive data about it, such as the document's path.
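
For illustration, a Document can also be constructed directly (a minimal sketch; the content and metadata here are made up):

from langchain.schema import Document

doc = Document(page_content="Hello LangChain", metadata={"source": "example.txt"})
print(doc.page_content, doc.metadata)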

LangChain currently supports loading a wide range of structured, unstructured, public, and private data sources.

2.2.2 Text Splitters

LLMs generally limit the size of the context window, e.g., 4k, 16k, or 32k tokens, so large texts must be split. The most commonly used splitter is RecursiveCharacterTextSplitter. Its separators parameter specifies a list of delimiters: it splits on the first delimiter, then recursively splits any chunk that is still too large using the next one.

There are two main considerations for text splitting:

1) Keep semantically related sentences together in one chunk. Delimiters are generally defined per document type, or you can split with a model.

2) Keep each chunk within a certain size, computed by a configurable function. By default the len function is used, while models internally count tokens. A token is a small unit or symbol that text or sequence data is divided into so machines can process it. For OpenAI-related models, you can count tokens with the tiktoken package, as sketched below.
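
A quick illustration of token counting (a sketch assuming the tiktoken package is installed):

import tiktoken

# look up the encoder used by a given OpenAI model and count tokens
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(len(enc.encode("I love Beijing Tiananmen")))

The splitter below relies on the same encoder via from_tiktoken_encoder.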

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    model_name="gpt-3.5-turbo",
    allowed_special="all",
    separators=["\n\n", "\n", ".", ","],
    chunk_size=7000,
    chunk_overlap=0
)
docs = text_splitter.create_documents(["Text here"])
print(docs)

2.2.3 Vectorstore

Text Embedding models convert text into vectors, enabling semantic search: finding the most similar text fragments in vector space. Commonly supported vector stores include Faiss and Chroma.

Supported embedding models include OpenAIEmbeddings, HuggingFaceEmbeddings, and others. Loading a local model through HuggingFaceEmbeddings avoids per-call embedding costs.

# Load a local model by pointing cache_folder at its directory
from langchain.embeddings import HuggingFaceEmbeddings

embeddings_model = HuggingFaceEmbeddings(model_name="text2vec-base-chinese", cache_folder="local model path")

embeddings = embeddings_model.embed_documents(
    [
        "I love Beijing Tiananmen!",
        "Hello world!"
    ]
)

2.2.4 Retriever

The Retriever interface retrieves documents for an unstructured query. The documents are generally stored in a vector database, and the get_relevant_documents method returns the documents relevant to a query.

from langchain.vectorstores import FAISS
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = WebBaseLoader("https://in.m.jd.com/help/app/register_info.html")
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    model_name="gpt-3.5-turbo",
    allowed_special="all",
    separators=["\n\n", "\n", ".", ","],
    chunk_size=800,
    chunk_overlap=0
)
docs = text_splitter.split_documents(data)
# Set your own local model path through cache_folder
embeddings = HuggingFaceEmbeddings(model_name="text2vec-base-chinese", cache_folder="models")
vectorstore = FAISS.from_documents(docs, embeddings)
result = vectorstore.as_retriever().get_relevant_documents("User registration qualification")
print(result)
print(len(result))

2.3 Chains

LangChain links components together through chains, and chains to other chains, to simplify building complex applications. The main types are LLMChain, SequentialChain, and RouterChain.

2.3.1 LLMChain

The most basic chain is LLMChain, composed of a PromptTemplate, an LLM, and an OutputParser. An LLM's output is generally plain text; the OutputParser instructs the LLM to produce structured output and then parses the result, making downstream calls easier.


In the example below, which performs keyword extraction and sentiment analysis on a comment, combining PromptTemplate, LLM, and OutputParser through an LLMChain easily achieves what previously required continually tuned small models.

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from azure_chat_llm import llm

# output parser
keyword_schema = ResponseSchema(name="keyword", description="Comment keyword list")
emotion_schema = ResponseSchema(name="emotion", description="The emotion of the comment: positive is 1, neutral is 0, negative is -1")
response_schemas = [keyword_schema, emotion_schema]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = output_parser.get_format_instructions()

# prompt template
prompt_template_txt = '''
As a senior customer service officer, please identify the keywords in the text between >>> and <<<, and whether the sentiment contained is positive, negative, or neutral.
>>> {text} <<<
RESPONSE:
{format_instructions}
'''

prompt = PromptTemplate(template=prompt_template_txt, input_variables=["text"],
                        partial_variables={"format_instructions": format_instructions})

# llmchain
llm_chain = LLMChain(prompt=prompt, llm=llm)
comment = "JD Logistics has nothing to say, its speed and attitude are awesome! This router is so good-looking. How can I put it, it's hot in Thai pants! The lines, the texture, the speed, it's so fast! Mom will never have to worry about the internet speed at home again!"
result = llm_chain.run(comment)
data = output_parser.parse(result)
print(f"type={type(data)}, keyword={data['keyword']}, emotion={data['emotion']}")


2.3.2 Sequential Chain

SequentialChains execute their steps in a predefined order. SimpleSequentialChain is the simplest form: each step has a single input and output, and each step's output becomes the next step's input. SequentialChain is the more general form, allowing multiple inputs and outputs.

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import SimpleSequentialChain
from azure_chat_llm import llm

first_prompt = PromptTemplate.from_template(
    "Translate the following into Chinese:"
    "\n\n{content}"
)
# chain 1: input: original text; output: Chinese translation
chain_trans = LLMChain(llm=llm, prompt=first_prompt, output_key="content_zh")

second_prompt = PromptTemplate.from_template(
    "Summarize the following in one sentence:"
    "\n\n{content_zh}"
)
# chain 2: input: Chinese translation; output: one-sentence summary
chain_summary = LLMChain(llm=llm, prompt=second_prompt)
overall_simple_chain = SimpleSequentialChain(chains=[chain_trans, chain_summary], verbose=True)
content = '''In a blog post authored back in 2011, Marc Andreessen warned that, "Software is eating the world." Over a decade later, we are witnessing the emergence of a new type of technology that's consuming the world with even greater voracity: generative artificial intelligence (AI). This innovative AI includes a unique class of large language models (LLM), derived from a decade of groundbreaking research, that are capable of out-performing humans at certain tasks. And you don't have to have a PhD in machine learning to build with LLMs-developers are already building software with LLMs with basic HTTP requests and natural language prompts.
In this article, we'll tell the story of GitHub's work with LLMs to help other developers learn how to best make use of this technology. This post consists of two main sections: the first will describe at a high level how LLMs function and how to build LLM-based applications. The second will dig into an important example of an LLM-based application: GitHub Copilot code completions.
Others have done an impressive job of cataloging our work from the outside. Now, we're excited to share some of the thought processes that have led to the ongoing success of GitHub Copilot.
'''
result = overall_simple_chain.run(content)
print(f'result={result}')


2.3.3 Router Chain

RouterChain dynamically selects the next chain based on the input; each chain handles a specific type of input.

RouterChain consists of two components:

1) The router chain itself, responsible for selecting the next chain to call. There are two main implementations: LLMRouterChain makes routing decisions through an LLM, and EmbeddingRouterChain makes them through vector search.

2) The list of destination chains the router can route to.

After initializing the RouterChain and destination_chains, combine the two through MultiPromptChain, as in the sketch below.
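
A condensed sketch of this wiring, assuming the azure_chat_llm module used elsewhere in this article and two illustrative destination prompts:

from langchain.chains import LLMChain
from langchain.chains.router import MultiPromptChain
from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser
from langchain.chains.router.multi_prompt_prompt import MULTI_PROMPT_ROUTER_TEMPLATE
from langchain.prompts import PromptTemplate
from azure_chat_llm import llm

prompt_infos = [
    {"name": "physics", "description": "Good for physics questions",
     "template": "You are a physics professor. Answer: {input}"},
    {"name": "history", "description": "Good for history questions",
     "template": "You are a historian. Answer: {input}"},
]

# build one destination chain per prompt, plus a fallback
destination_chains = {
    info["name"]: LLMChain(llm=llm, prompt=PromptTemplate(
        template=info["template"], input_variables=["input"]))
    for info in prompt_infos
}
default_chain = LLMChain(llm=llm, prompt=PromptTemplate(
    template="{input}", input_variables=["input"]))

# the router prompt lists each destination's name and description
destinations = "\n".join(f"{p['name']}: {p['description']}" for p in prompt_infos)
router_prompt = PromptTemplate(
    template=MULTI_PROMPT_ROUTER_TEMPLATE.format(destinations=destinations),
    input_variables=["input"],
    output_parser=RouterOutputParser(),
)
router_chain = LLMRouterChain.from_llm(llm, router_prompt)

chain = MultiPromptChain(router_chain=router_chain,
                         destination_chains=destination_chains,
                         default_chain=default_chain, verbose=True)
print(chain.run("Why is the sky blue?"))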


2.3.4 Documents Chain

The following four chains are designed for Document processing. They are commonly used for document summarization, document-based Q&A, and similar scenarios, and they appear again in the implementation practice section later.

2.3.4.1 Stuff

StuffDocumentsChain is the simplest and most direct chain: it puts all retrieved documents into the prompt as context and passes them to the LLM to get an answer.

This approach fully preserves context and calls the LLM only once; use stuff whenever it fits. It suits scenarios where chunks are small and few documents are retrieved at a time, since otherwise the prompt easily exceeds the token limit.


2.3.4.2 Refine

RefineDocumentsChain obtains an answer through iterative refinement: the first document is passed to the LLM as context to get an intermediate answer, then that intermediate answer plus the second document are sent to the LLM, and so on for each subsequent document.

Refine partially preserves context while keeping token usage within a bounded range.


2.3.4.3 MapReduce

MapReduceDocumentsChain first processes each document with the LLM individually, then merges all the per-document answers with the LLM to produce the final result.

MapReduce processes each document independently, so the calls can run concurrently; the trade-off is that context is lost between documents.


2.3.4.4 MapRerank

MapRerankDocumentsChain is similar to MapReduceDocumentsChain: each document is first processed by the LLM, each answer comes back with a score, and the highest-scoring answer is selected.

Like MapReduce, MapRerank issues many LLM calls and processes each document independently. A sketch of choosing among these four strategies follows.

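All four strategies share the same interface and are selected via the chain_type parameter, for example with load_qa_chain (a sketch; the document and question are placeholders, and llm comes from the azure_chat_llm module used elsewhere in this article):

from langchain.chains.question_answering import load_qa_chain
from langchain.schema import Document
from azure_chat_llm import llm

docs = [Document(page_content="LangChain links components into chains.")]
# chain_type can be "stuff", "refine", "map_reduce", or "map_rerank"
chain = load_qa_chain(llm, chain_type="map_rerank")
print(chain.run(input_documents=docs, question="What does LangChain do?"))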

2.4 Memory

Normally a chain is stateless: each interaction is independent, with no knowledge of previous interactions. LangChain's Memory component saves and manages historical messages so a conversation can span multiple turns while retaining its context. Memory supports a variety of storage backends and integrates with MongoDB, Redis, SQLite, and others; its simplest, most direct form is Buffer Memory. Commonly used buffer memories are:

1) ConversationSummaryMemory: stores the history in summarized form

2) ConversationBufferWindowMemory: stores the latest n messages verbatim

3) ConversationBufferMemory: stores all messages verbatim

Looking at the chain's prompt, you can see that the {history} variable carries the conversation context fetched from memory. The example below demonstrates Memory: note how the final answer draws on information from an earlier turn.

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

from azure_chat_llm import llm

memory = ConversationBufferMemory()
conversation = ConversationChain(llm=llm, memory=memory, verbose=True)
print(conversation.prompt)
print(conversation.predict(input="My name is tiger"))
print(conversation.predict(input="1 + 1=?"))
print(conversation.predict(input="What is my name"))

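To contrast the buffer variants, here is a sketch using ConversationBufferWindowMemory with k=1, so only the most recent exchange is kept (it assumes the same azure_chat_llm module):

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory
from azure_chat_llm import llm

memory = ConversationBufferWindowMemory(k=1)  # keep only the latest exchange
conversation = ConversationChain(llm=llm, memory=memory)
conversation.predict(input="My name is tiger")
conversation.predict(input="1 + 1 = ?")
# the first exchange has already dropped out of the window,
# so the model can no longer recall the name
print(conversation.predict(input="What is my name?"))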

2.5 Agent

Agent literally means "agent": if the LLM is the brain, Agents are the tools that brain can wield. Today's large models generally suffer from outdated knowledge and weak logical computation, problems that tool access through Agents can mitigate. This area is especially active right now, with excellent projects such as AutoGPT (https://github.com/Significant-Gravitas/AutoGPT), BabyAGI (https://github.com/yoheinakajima/babyagi), and AgentGPT (https://github.com/reworkd/AgentGPT). Traditionally with an LLM you prompt your way toward a goal step by step; with an Agent, you state the goal and it plans and executes automatically.

2.5.1 Agent core components

Agent: the agent itself, responsible for calling the LLM and deciding the next action. The LLM's prompt must contain the agent_scratchpad variable, which records the intermediate steps of execution.

Tools: methods the Agent can call. LangChain has many built-in tools, and you can also define custom tools. Pay attention to a Tool's description attribute: the LLM decides whether to use a tool based on its description.

ToolKits: collections of tools assembled for a specific purpose, such as the Office365 and Gmail toolkits.

Agent Executor: the agent executor, responsible for the actual execution.

2.5.2 Type of Agent

Generally, an Agent is initialized through the initialize_agent function. Besides parameters such as llm and tools, an AgentType must also be specified.

agent = initialize_agent(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                         tools=tools,
                         llm=llm,
                         verbose=True)
print(agent.agent.llm_chain.prompt.template)

This creates a zero-shot-react-description Agent: zero-shot means only the current request is considered, without recording or referencing previous interactions; react means it reasons through the ReAct framework; and description means the tools' descriptions are used to decide which tool to use.

Other types include chat-conversational-react-description, conversational-react-description, react-docstore, self-ask-with-search, etc. Types like chat-conversational-react-description record previous conversations through memory, so responses can reference earlier operations.

The template the Agent uses for its reasoning decisions can be read from the agent.agent.llm_chain.prompt.template attribute.
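
For instance, a chat-conversational agent can be wired up with memory like this (a sketch; it assumes the azure_chat_llm module and reuses the built-in llm-math tool):

from langchain.agents import initialize_agent, load_tools
from langchain.agents.agent_types import AgentType
from langchain.memory import ConversationBufferMemory
from azure_chat_llm import llm

# this agent type expects the history under the "chat_history" key, as chat messages
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
tools = load_tools(["llm-math"], llm=llm)
agent = initialize_agent(agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
                         tools=tools, llm=llm, memory=memory, verbose=True)
print(agent.run("Calculate 45 * 54"))
print(agent.run("Now divide that result by 2"))  # refers back to the previous turn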

2.5.3 Custom Tool

There are many ways to customize a Tool. The simplest is turning a function into a Tool with the @tool decorator; note that the function must have a docstring, which serves as the Tool's description.

from azure_chat_llm import llm
from langchain.agents import load_tools, initialize_agent, tool
from langchain.agents.agent_types import AgentType
from datetime import date

@tool
def time(text: str) -> str:
    """
    Returns today's date.
    """
    return str(date.today())


tools = load_tools(['llm-math'], llm=llm)
tools.append(time)
agent_math = initialize_agent(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                                   tools=tools,
                                   llm=llm,
                                   verbose=True)
print(agent_math("Calculate 45 * 54"))
print(agent_math("What day is today?"))


LangChain implementation practice


3.1 Document summarization

1) Load the remote document through a Loader;

2) Split the document by token through a Splitter;

3) Load a summarization chain of type refine and summarize iteratively.

from langchain.prompts import PromptTemplate
from langchain.document_loaders import PlaywrightURLLoader
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from azure_chat_llm import llm

loader = PlaywrightURLLoader(urls=["https://content.jr.jd.com/article/index.html?pageId=708258989"])
data = loader.load()

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    model_name="gpt-3.5-turbo",
    allowed_special="all",
    separators=["\n\n", "\n", ".", ","],
    chunk_size=7000,
    chunk_overlap=0
)

prompt_template = '''
As a senior editor, please write a summary of the text between >>> and <<<.
>>> {text} <<<
'''
refine_template = '''
As a senior editor, based on an existing summary: {existing_answer}, improve the existing summary for the text between >>> and <<<.
>>> {text} <<<
'''

PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
REFINE_PROMPT = PromptTemplate(
    template=refine_template, input_variables=["existing_answer", "text"]
)

chain = load_summarize_chain(llm, chain_type="refine", question_prompt=PROMPT, refine_prompt=REFINE_PROMPT, verbose=False)

docs = text_splitter.split_documents(data)
result = chain.run(docs)
print(result)

3.2 Q&A based on external documents

1) Load the remote document through a Loader;

2) Split the document by token through a Splitter;

3) Store the documents in a FAISS vector store, with embeddings loaded from HuggingFace's text2vec-base-chinese model;

4) Customize the QA prompt and answer questions through RetrievalQA.

from langchain.chains import RetrievalQA
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

from azure_chat_llm import llm

loader = WebBaseLoader("https://in.m.jd.com/help/app/register_info.html")
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    model_name="gpt-3.5-turbo",
    allowed_special="all",
    separators=["\n\n", "\n", ".", ","],
    chunk_size=800,
    chunk_overlap=0
)
docs = text_splitter.split_documents(data)
#Set your own model path
embeddings = HuggingFaceEmbeddings(model_name="text2vec-base-chinese", cache_folder="model")
vectorstore = FAISS.from_documents(docs, embeddings)

template = """Please use the background information provided below to answer the final question. If you don't know the answer, just say you don't know; don't try to make one up.
Use a maximum of three sentences when answering and keep your answer as concise as possible. At the end of your answer, be sure to say "Thank you for asking!"
{context}
Question: {question}
Useful answers:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)

qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever(),
                                       return_source_documents=True,
                                       chain_type_kwargs={"prompt": QA_CHAIN_PROMPT})

result = qa_chain({"query": "User registration qualification"})
print(result["result"])
print(len(result['source_documents']))

Future Development Direction


With the rapid development of large models, LangChain is currently probably the most popular LLM development framework. It connects to external data sources, integrates a wide range of commonly used components, and greatly lowers the barrier to LLM application development. Its founder, Harrison Chase, has also co-developed two short courses with Andrew Ng to help people quickly master LangChain.

Large models are iterating very quickly at the moment, and LangChain, as a framework, must keep pace. Development is intense: many commits land every day, a new version ships every few days, and the project has more than 1,200 contributors. It is very active.

Personally, I think that beyond building LLM applications for specific business needs, there are two directions worth further exploration:

1) Further lowering the barrier to LLM application development through low-code tools; visual orchestration tools like langflow are developing rapidly;

2) Building more powerful Agents. Agents are to large models, in my view, roughly what SQL is to databases: they can greatly expand the application scenarios of LLMs.

Reference materials


1. https://python.langchain.com/docs/get_started/introduction.html

2. https://github.com/liaokongVFX/LangChain-Chinese-Getting-Started-Guide

3. https://www.deeplearning.ai/short-courses/langchain-for-llm-application-development/

4. https://lilianweng.github.io/posts/2023-06-23-agent/

5. https://mp.weixin.qq.com/s/3coFhAdzr40tozn8f9Dc-w

6. https://github.com/langchain-ai/langchain