This article continues the previous two:

- Elasticsearch: RAG using Open AI and Langchain – Retrieval Augmented Generation (1)
- Elasticsearch: RAG using Open AI and Langchain – Retrieval Augmented Generation (2)

In today's article, I'll show in detail how to use ElasticsearchStore, which is also the recommended usage. If you haven't set up your environment yet, please read the first article carefully.
Create the application
Install packages
#!pip3 install langchain
Import packages
from dotenv import load_dotenv
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import ElasticsearchStore
from langchain.text_splitter import CharacterTextSplitter
from urllib.request import urlopen
import os, json

load_dotenv()

openai_api_key = os.getenv('OPENAI_API_KEY')
elastic_user = os.getenv('ES_USER')
elastic_password = os.getenv('ES_PASSWORD')
elastic_endpoint = os.getenv("ES_ENDPOINT")
elastic_index_name = 'elasticsearch-store'
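For reference, load_dotenv() above reads these variables from a .env file in the working directory. A sketch with placeholder values (substitute your own credentials and host):

```
OPENAI_API_KEY="YourOpenAIApiKey"
ES_USER="elastic"
ES_PASSWORD="YourElasticPassword"
ES_ENDPOINT="localhost"
```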
Load documents and split them into chunks
with open('workplace-docs.json') as f:
    workplace_docs = json.load(f)

print(f"Successfully loaded {len(workplace_docs)} documents")
metadata = []
content = []

for doc in workplace_docs:
    content.append(doc["content"])
    metadata.append({
        "name": doc["name"],
        "summary": doc["summary"],
        "rolePermissions": doc["rolePermissions"]
    })

text_splitter = CharacterTextSplitter(chunk_size=50, chunk_overlap=0)
docs = text_splitter.create_documents(content, metadatas=metadata)
Write data to Elasticsearch
from elasticsearch import Elasticsearch

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

url = f"https://{elastic_user}:{elastic_password}@{elastic_endpoint}:9200"
connection = Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)

es = ElasticsearchStore.from_documents(
    docs,
    embedding=embeddings,
    es_url=url,
    es_connection=connection,
    index_name=elastic_index_name,
    es_user=elastic_user,
    es_password=elastic_password
)
Show results
def showResults(output):
    print("Total results: ", len(output))
    for index in range(len(output)):
        print(output[index])
Similarity / Vector Search (Approximate KNN Search) – ApproxRetrievalStrategy()
query = "work from home policy"
result = es.similarity_search(query=query)
showResults(result)
Hybrid Search (Approximate KNN + Keyword Search) – ApproxRetrievalStrategy()
Next, let's try hybrid search, which combines approximate kNN with keyword (BM25) matching:
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

es = ElasticsearchStore(
    es_url=url,
    es_connection=connection,
    es_user=elastic_user,
    es_password=elastic_password,
    embedding=embeddings,
    index_name=elastic_index_name,
    strategy=ElasticsearchStore.ApproxRetrievalStrategy(hybrid=True)
)

es.similarity_search("work from home policy")
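With hybrid=True, Elasticsearch runs a kNN query and a keyword query and merges the two ranked lists with reciprocal rank fusion (RRF): each document scores the sum of 1 / (k + rank) over the lists it appears in. The fusion happens server side; the sketch below is only a plain-Python illustration of the formula, with invented document names:

```python
def rrf(rankings, k=60):
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)) over all lists."""
    scores = {}
    for ranked_docs in rankings:
        for rank, doc in enumerate(ranked_docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

knn_hits = ["doc_wfh", "doc_pto", "doc_travel"]          # vector ranking
bm25_hits = ["doc_pto", "doc_travel", "doc_expenses"]    # keyword ranking
print(rrf([knn_hits, bm25_hits]))
# doc_pto and doc_travel rank highest: they appear near the top of both lists
```

A document found by both retrievers outranks one found by only one of them, which is why RRF works well without score normalization.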
Running this produces an error: the current license level does not support RRF. Let's go to Kibana and activate a trial license:
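Alternatively, a trial license can be started from Kibana's Dev Tools with the license API (this works only if the cluster has not already used its 30-day trial):

```
POST /_license/start_trial?acknowledge=true
```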
Let’s run the code again:
Exact KNN Search (Brute Force) – ExactRetrievalStrategy()
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

es = ElasticsearchStore(
    es_url=url,
    es_connection=connection,
    es_user=elastic_user,
    es_password=elastic_password,
    embedding=embeddings,
    index_name=elastic_index_name,
    strategy=ElasticsearchStore.ExactRetrievalStrategy()
)

es.similarity_search("work from home policy")
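Exact retrieval skips the approximate kNN index and scores every stored vector against the query, trading speed for exact results. Conceptually it is a brute-force scan like this plain-Python sketch (the vectors and document names are invented for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_knn(query_vec, doc_vecs, k=2):
    # Score every stored vector, then keep the k best: exact, but O(n) per query
    scored = sorted(doc_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

docs_store = {
    "wfh_policy": [0.9, 0.1, 0.0],
    "pto_policy": [0.1, 0.9, 0.0],
    "onboarding": [0.0, 0.2, 0.9],
}
print(brute_force_knn([1.0, 0.0, 0.1], docs_store))
# ['wfh_policy', 'pto_policy']
```

This is why exact search is fine for small indices but approximate kNN is preferred once the document count grows.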
Index / Search Documents using ELSER – SparseVectorRetrievalStrategy()
In this step we need to deploy ELSER. For information on deploying ELSER, see the article "Elasticsearch: Deploying ELSER – Elastic Learned Sparse Encoder".
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

es = ElasticsearchStore.from_documents(
    docs,
    es_url=url,
    es_connection=connection,
    es_user=elastic_user,
    es_password=elastic_password,
    index_name=elastic_index_name + "-" + "elser",
    strategy=ElasticsearchStore.SparseVectorRetrievalStrategy()
)

es.similarity_search("work from home policy")
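Unlike the dense OpenAI embeddings used earlier, ELSER represents each text as a sparse map of expanded tokens to weights, and relevance is essentially the dot product of the query's and document's weight maps over their shared tokens. A plain-Python illustration (the tokens and weights here are made up, not real ELSER output):

```python
def sparse_dot(query_weights, doc_weights):
    # Only tokens present in both sparse maps contribute to the score
    return sum(w * doc_weights[t] for t, w in query_weights.items() if t in doc_weights)

query = {"work": 1.2, "home": 0.9, "policy": 0.7}
doc_remote = {"work": 0.8, "home": 1.1, "office": 0.4}       # about remote work
doc_travel = {"expense": 1.0, "travel": 0.9, "policy": 0.5}  # about travel expenses

print(sparse_dot(query, doc_remote))  # shares "work" and "home" with the query
print(sparse_dot(query, doc_travel))  # shares only "policy"
```

Because most weights are zero, the score only touches overlapping tokens, which is why sparse retrieval can run on an ordinary inverted index.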
After running the above code, we can view the generated fields in Kibana:
The complete Jupyter notebook for the code above can be downloaded from https://github.com/liu-xiao-guo/semantic_search_es/blob/main/ElasticsearchStore.ipynb.