LangChain combines SQLAgent to implement TextToSql

LangChain currently provides SQL Chain (SqlDatabaseChain) and SQL Agent (SqlAgent) methods to interact with data stored in the database.

You can refer to the previous article: Using LangChain to implement TextToSql-CSDN blogicon-default.png?t=N7T8https://blog.csdn.net/wangning0714/article/details/133915510?spm=1001.2014 .3001.5502

LangChain makes it easier to build complex applications using LLM. It allows users to connect LLM to other data sources. By connecting LLM to other data sources, applications can process a wider range of information. This makes the application more powerful and diverse.

An agent is a component that has access to a suite of tools, including large language models (LLMs). Its distinguishing feature is the ability to make informed decisions based on user input, utilizing appropriate tools until a satisfactory answer is obtained. For example, in the case of text-to-SQL conversion, LangChain SQLAgent will not give up if an error occurs while executing the generated SQL. Instead, it will attempt to recover and correct the problem by explaining the error in subsequent LLM calls. Therefore, in theory, SQLAgent should be better than SQLChain in terms of productivity and accuracy.

This article shows how to let SQLAgent perform operations against our mysql database.

1. Dependence on the environment

The installation of mysql database is not shown here. Make sure there is an API-KEY available.

from langchain.llms import OpenAI
from langchain.sql_database import SQLDatabase
from langchain_experimental.sql.base import SQLDatabaseChain
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.agents import create_sql_agent
from langchain.agents.agent_types import AgentType

Mysql table structure display:

Create table sql as follows:

--Record user information (id, name, email)
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    username VARCHAR(255),
    email VARCHAR(255)
);

--Record user order information (order number, user ID, product ID, order time)
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    user_id INT,
product_id INT,
    order_date DATE,
    FOREIGN KEY (user_id) REFERENCES users(user_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);

--Record product information (product id, product name)
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(255)
);


--Insert test data
INSERT INTO users (user_id, username, email)
VALUES
(1, 'user1', '[email protected]'),
(2, 'user2', '[email protected]'),
(3, 'user3', '[email protected]');
INSERT INTO orders (order_id, user_id, product_id, order_date)
VALUES
(1, 1, 1, '2023-01-15'),
(2, 1, 1, '2023-02-10'),
(3, 2, 2, '2023-03-05'),
(4, 3, 3, '2023-04-20');
INSERT INTO products (product_id, product_name)
VALUES
(1, 'Product A'),
(2, 'Product B'),
(3, 'Product C');

2. Set up the environment

Similar content to the previous article

# mysql database connection
db_user = "root"
db_password = "root"
db_host = "localhost"
db_name = "test_001"
db = SQLDatabase.from_uri(f"mysql + pymysql://{db_user}:{db_password}@{db_host}/{db_name}")

# Here we use gpt-3.5 as the model
OPENAI_API_KEY = "Your OpenAI key"
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY, model_name='gpt-3.5-turbo')

3. Configure SQLAgent

ConfigureSQLAgent

toolkit = SQLDatabaseToolkit(db=db, llm=llm)
agent_executor = create_sql_agent(
    llm=llm,
    toolkit=toolkit,
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

4. Run the query

question = "Help me query the information about the user's corresponding order and the products in the order"
agent_executor.run(question)

View the running results:

D:\Anaconda\envs\myenv\python.exe E:/Text-to-sql/git/pergit_wangning/code/1007code/TextToSql/LLMToSql-with agent implementation.py

> Entering new AgentExecutor chain...

Action: sql_db_list_tables
Action Input: ""
Observation: actor, magic_api_backup, magic_api_file, magic_api_info, orders, products, t_iit_m_actreg, t_iit_region, users
Thought:There are several tables in the database that could be relevant to the question. I should query the schema of the "users", "orders", and "products" tables to see what columns they have.
Action: sql_db_schema
Action Input: "users, orders, products"
D:\Anaconda\envs\myenv\lib\site-packages\langchain\__init__.py:34: UserWarning: Importing llm_cache from langchain root module is no longer supported. Please use langchain.globals.set_llm_cache() / langchain.globals. get_llm_cache() instead.
  warnings.warn(
Observation:
CREATE TABLE orders (
order_id INTEGER(11) NOT NULL,
user_id INTEGER(11),
product_id INTEGER(11),
order_date DATE,
PRIMARY KEY (order_id),
CONSTRAINT orders_ibfk_1 FOREIGN KEY(user_id) REFERENCES users (user_id),
CONSTRAINT orders_ibfk_2 FOREIGN KEY(product_id) REFERENCES products (product_id)
)ENGINE=InnoDB DEFAULT CHARSET=utf8

/*
3 rows from orders table:
order_id user_id product_id order_date
1 1 1 2023-01-15
2 1 1 2023-02-10
3 2 2 2023-03-05
*/


CREATE TABLE products (
product_id INTEGER(11) NOT NULL,
product_name VARCHAR(255),
PRIMARY KEY (product_id)
)ENGINE=InnoDB DEFAULT CHARSET=utf8

/*
3 rows from products table:
product_id product_name
1 Product A
2 Product B
3 Product C
*/


CREATE TABLE users (
user_id INTEGER(11) NOT NULL,
username VARCHAR(255),
email VARCHAR(255),
PRIMARY KEY (user_id)
)ENGINE=InnoDB DEFAULT CHARSET=utf8

/*
3 rows from users table:
user_id username email
1 user1 [email protected]
2 user2 [email protected]
3 user3 [email protected]
*/
Thought:The relevant tables for this question are "users", "orders", and "products". The "orders" table has columns for order_id, user_id, product_id, and order_date. The "products" table has columns for product_id and product_name . The "users" table has columns for user_id, username, and email. I can join these tables on the appropriate columns to get the desired information.

Action: sql_db_query
Action Input: "SELECT users.username, orders.order_id, products.product_name FROM users JOIN orders ON users.user_id = orders.user_id JOIN products ON orders.product_id = products.product_id LIMIT 10"
Observation: [('user1', 1, 'Product A'), ('user1', 2, 'Product A'), ('user2', 3, 'Product B'), ('user3', 4, ' Product C')]
Thought:The query returned the usernames, order IDs, and product names for the users, orders, and products tables. I can use this information to answer the original question.
Final Answer: The users correspond to the orders and the products in the following way:
- User 1 has ordered Product A with order IDs 1 and 2.
- User 2 has ordered Product B with order ID 3.
- User 3 has ordered Product C with order ID 4.

> Finished chain.

Process ended with exit code 0

5. Summary

Through testing, we ran multiple questions against SQLChain and this article’s SQLAgent using GPT-3.5 and compared their respective results. The results show that SQLAgent outperforms SQLChain in answering more questions.

LangChain SQLAgent is a powerful tool that can create complex LLM chain calls to answer user questions. Although it returns answers to most relevant questions, it does not prevent LLM hallucinations.