Doctran and LLMs: A powerful combination for analyzing consumer complaints

Introduction

In today’s competitive marketplace, businesses strive to understand and resolve consumer complaints effectively. Consumer complaints can reveal a wide range of problems, from product defects and poor customer service to billing errors and safety issues. They are a vital part of the feedback loop between a business and its customers, whether the feedback concerns a product, a service, or an experience. Analyzing these complaints can provide valuable insights into product or service improvements, customer satisfaction, and overall business growth. In this article, we’ll explore how you can use the Doctran Python library to analyze consumer complaints, extract insights, and make data-driven decisions.

Doctran

Doctran is a Python library for document transformation and analysis. It provides a set of functions to preprocess text data, extract key information, classify, interrogate, and summarize content, and translate text into other languages. Doctran relies on LLMs (large language models) such as OpenAI’s GPT-based models, together with open source NLP libraries, to parse text data.

It supports the following six types of document transformation:

  1. Extraction: Extract useful features/properties from the document.

  2. Redaction: Remove personally identifiable information (PII) such as names, email IDs, and phone numbers from the document before sending the data to OpenAI. Internally, it uses the spaCy library to remove sensitive information.

  3. Interrogation: Convert documents into question and answer format.

  4. Refine: Remove any content from the document that is not related to a predefined set of topics.

  5. Summary: Present the document as a concise, comprehensive, and meaningful summary.

  6. Translate: Translate documents into other languages.
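
To make the redaction step in the list above concrete, here is a minimal sketch of masking PII before text leaves your machine. It is not Doctran’s implementation (which, as noted, relies on spaCy). The regular expressions and the mask_pii helper are illustrative assumptions that catch only simple email and phone patterns and cannot detect names at all.

```python
import re

# Deliberately simple patterns: these are assumptions for illustration only
# and will miss many real-world formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")


def mask_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = PHONE_RE.sub("<PHONE>", text)
    return text


if __name__ == "__main__":
    print(mask_pii("Reach me at asha@example.com or +91 98765 43210."))
```

Short numbers such as an invoice number pass through untouched, because the phone pattern requires a run of at least ten digit-like characters.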

Doctran is also integrated into the document_transformers module of LangChain, a framework for building LLM-driven applications.

LangChain makes it possible to explore and use a variety of open and closed source LLMs. It connects to external data sources such as PDFs, text files, Excel spreadsheets, and PowerPoint files. It also lets you try different prompts, practice prompt engineering, use built-in chains and agents, and more.

LangChain’s document_transformers module contains three Doctran implementations: DoctranPropertyExtractor, DoctranQATransformer, and DoctranTextTranslator. They handle the extraction, interrogation, and translation transformations respectively.

Installation

Doctran can be easily installed using the pip command.

pip install doctran

With the basics of the doctran library covered, let us explore the different types of document transformation it offers, using the following consumer complaint enclosed in triple backticks (```).

```

November 26, 2021

The Manager

Customer Service Department

Taurus Shop

New Delhi – 110023

Subject: Complaint about defective 'VIP' washing machine

Dear Sir,

I had purchased an automatic washing machine on 15 July 2022, model no. G 24 and the invoice no. is 1598.

Last week, the machine stopped working abruptly and has not been working since then despite all our efforts. The machine stops running after the rinsing process is completed, causing a lot of problems. Moreover, the machine since the last day or so has also started making loud noises, creating inconvenience for us.

Please send your technician to repair it and if needed get it replaced within the following week.

Hoping for an early response

Yours truly

```

Load complaint as Doctran document

To apply transformations with doctran, we first need to convert the raw text into a doctran document. A doctran document is a basic data type optimized for vector search. It represents a piece of unstructured data and consists of the raw content and associated metadata.

Instantiate the Doctran object, passing your OpenAI API key in the openai_api_key parameter. Then parse the raw content into a doctran document by calling the parse() method on the Doctran object.

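import os

from doctran import Doctran

# Read the API key from the environment instead of hard-coding it
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

sample_complain = """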

November 26, 2021

The Manager
Customer Service Department
Taurus Shop
New Delhi – 110023

Subject: Complaint about defective 'VIP' washing machine


Dear Sir,

I had purchased an automatic washing machine on 15 July 2022,
model no. G 24 and the invoice no. is 1598.

Last week, the machine stopped working abruptly and has not been working
since then despite all our efforts.
The machine stops running after the rinsing process is completed,
causing a lot of problems.
Moreover, the machine since the last day or so has also started making loud noises,
creating inconvenience for us.

Please send your technician to repair it and if needed get it replaced within the following week.

Hoping for an early response

Yours truly
"""

doctran = Doctran(openai_api_key=OPENAI_API_KEY)
document = doctran.parse(content=sample_complain)
print(document.raw_content)

Output: the complaint text, printed exactly as it was passed to parse().

1. Extraction

One of the main functions of doctran is extracting key properties from documents. Internally, it uses OpenAI function calling to extract the properties (data points). It uses the OpenAI GPT-4 model with a token limit of 8,000 tokens.

GPT-4 (Generative Pre-trained Transformer 4) is a multimodal large language model developed by OpenAI. Compared to its predecessors, it handles complex tasks better and can accept visual input (such as images, diagrams, and memes) alongside text. The model achieved human-level performance on a variety of professional and academic benchmarks, including the Uniform Bar Exam.

We define a schema by instantiating the ExtractProperty class once for each property we want to extract. Each property has a name, a description, a data type, an optional list of allowed values (enum), and a required flag (Boolean).

Here, we specify four properties: category, sentiment, aggressiveness, and language.

from doctran import ExtractProperty
properties = [
    ExtractProperty(
        name="Category",
        description="What type of consumer complaint this is",
        type="string",
        enum=["Product or Service", "Wait Time", "Delivery", "Communication Gap", "Personnel"],
        required=True
        ),
    ExtractProperty(
        name="Sentiment",
        description = "Assess the polarity/sentiment",
        type="string",
        enum = ["Positive", "Negative", "Neutral"],
        required=True
        ),
    ExtractProperty(
        name="Aggressiveness",
        description="""describes how aggressive the complaint is,
        the higher the number the more aggressive""",
        type="number",
        enum=[1, 2, 3, 4, 5],
        required=True
        ),
    ExtractProperty(
        name="Language",
        type="string",
        description = "source language",
        enum = ["English", "Hindi", "Spanish", "Italian", "German"],
        required=True
        )
]
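
For intuition about how a property list like this could be handed to the model through function calling, the sketch below converts equivalent plain dictionaries into a JSON Schema parameters object of the kind OpenAI function calling accepts. This is an illustration under assumptions, not Doctran’s internal code: the to_function_schema helper and the function name extract_properties are hypothetical.

```python
import json


def to_function_schema(properties: list[dict]) -> dict:
    """Build a function definition whose parameters mirror the property list."""
    schema_props = {}
    required = []
    for prop in properties:
        entry = {"type": prop["type"], "description": prop["description"]}
        if prop.get("enum"):
            # Restrict the model to the allowed values, if any were given.
            entry["enum"] = prop["enum"]
        schema_props[prop["name"]] = entry
        if prop.get("required"):
            required.append(prop["name"])
    return {
        "name": "extract_properties",  # hypothetical function name
        "description": "Extract structured properties from a document.",
        "parameters": {
            "type": "object",
            "properties": schema_props,
            "required": required,
        },
    }


# Plain-dict versions of two of the properties defined above.
complaint_properties = [
    {
        "name": "Sentiment",
        "description": "Assess the polarity/sentiment",
        "type": "string",
        "enum": ["Positive", "Negative", "Neutral"],
        "required": True,
    },
    {
        "name": "Aggressiveness",
        "description": "How aggressive the complaint is; higher means more aggressive",
        "type": "number",
        "enum": [1, 2, 3, 4, 5],
        "required": True,
    },
]

if __name__ == "__main__":
    print(json.dumps(to_function_schema(complaint_properties), indent=2))
```

Because every value is constrained by an enum, the model’s reply can be validated mechanically, which is the main appeal of a schema-driven approach over free-text prompting.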

To retrieve the properties, call the extract() function on the document, passing the list of properties as a parameter.

extracted_doc = await document.extract(properties=properties).execute()
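
The line above uses await at the top level, which works in a notebook such as Jupyter but not in a regular Python script. The sketch below wraps the same call so it can run from a script with asyncio.run(). The helper names run_extract and extract_sync are hypothetical, and the code assumes, as the article’s snippets do, that execute() returns an awaitable; if your installed version returns the document directly, drop the await.

```python
import asyncio


async def run_extract(document, properties):
    """Await the extract transformation and return the extracted properties."""
    extracted_doc = await document.extract(properties=properties).execute()
    return extracted_doc.extracted_properties


def extract_sync(document, properties):
    """Blocking wrapper for plain scripts (not for use inside a running event loop)."""
    return asyncio.run(run_extract(document, properties))


# Usage in a script, with `document` and `properties` built as shown earlier
# (requires a valid OpenAI API key and network access):
#     print(extract_sync(document, properties))
```

The same wrapper pattern applies to the other transformations in this article, since they follow the same call-then-execute() shape.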

The extract operation returns a new document whose extracted_properties attribute holds the extracted values.

print(extracted_doc.extracted_properties)  

Output:

2. Interrogation

Doctran allows us to convert the content in the document into a question and answer format. User queries are usually formulated as questions. Therefore, to improve search results when using vector databases, it may be helpful to convert the information into questions. Creating an index from these questions allows for better retrieval of context than indexing the original text.
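
As a rough illustration of why indexing questions can help, the sketch below matches a user query against the questions of a small Q&A set and returns the paired answer. The Q&A pairs are hand-written placeholders based on the sample complaint (not actual Doctran output), the question/answer keys are an assumed shape, and word overlap stands in for the embedding similarity a vector database would use.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_answer(query: str, qa_pairs: list[dict]) -> str:
    """Return the answer whose question has the highest word overlap with the query."""
    query_words = tokenize(query)

    def score(pair: dict) -> float:
        question_words = tokenize(pair["question"])
        union = query_words | question_words
        # Jaccard similarity: shared words divided by all distinct words.
        return len(query_words & question_words) / len(union) if union else 0.0

    return max(qa_pairs, key=score)["answer"]


# Placeholder Q&A pairs written by hand for illustration.
qa_pairs = [
    {
        "question": "What product is the complaint about?",
        "answer": "An automatic washing machine, model G 24.",
    },
    {
        "question": "What is the invoice number?",
        "answer": "1598",
    },
    {
        "question": "What does the customer want the store to do?",
        "answer": "Send a technician to repair or replace the machine within a week.",
    },
]

if __name__ == "__main__":
    print(best_answer("What is the invoice number of the purchase?", qa_pairs))
```

A query phrased as a question tends to resemble a stored question more closely than it resembles a sentence buried in a letter, which is the intuition behind indexing the questions.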

To interrogate a document, use the built-in interrogate() function. It returns a new document with the generated Q&A pairs available under the questions_and_answers key of extracted_properties.

interrogated_doc = await document.interrogate().execute()
print(interrogated_doc.extracted_properties['questions_and_answers'])

Output:

3. Summary

Using doctran, we can also generate concise and meaningful summaries of the original text. Call the summarize() function to summarize the document, and specify token_limit to control the length of the summary.

summarized_doc = await document.summarize(token_limit=30).execute()
print(summarized_doc.transformed_content)

Output:

4. Translation

Translating documents into other languages can be helpful, especially when users need to query the knowledge base in a different language, or when there are no state-of-the-art embedding models for a given language.

Language translation for our consumer complaint use case is useful for global businesses with a multilingual customer base. Using the built-in translate() function, we can translate the information into other languages such as Hindi, Spanish, Italian, German, etc.

translated_doc = await document.translate(language="hindi").execute()
print(translated_doc.transformed_content)

Output:

Conclusion

In the era of data-driven decision-making, consumer complaint analysis is a vital process that can improve products and services and ultimately increase customer satisfaction. Using LLMs and advanced NLP tools, we can transform raw text data into actionable insights that drive business growth and improvement. In this article, we introduced doctran and walked through the document transformations it supports, using a consumer complaint as the example.

Key Takeaways

● Consumer complaints are not just expressions of dissatisfaction; they are also a valuable source of feedback that can give businesses important insights.

● The doctran Python library and large language models (LLMs) such as GPT-4 provide a powerful toolset for transforming and analyzing documents. Doctran supports transformations such as extraction, redaction, interrogation, summarization, and translation.

● Doctran’s extraction capabilities using OpenAI’s GPT-4 model can help enterprises extract key attributes from documents.

● Converting document content into question and answer format using doctran’s interrogation functionality improves contextual retrieval. This approach is valuable for building effective search indexes and getting better search results.

● Businesses with a global customer base can benefit from doctran’s language translation capabilities, making information accessible in multiple languages. Additionally, it provides the ability to generate concise and meaningful summaries of text content.

FAQ

Q1: What is the main purpose of the Doctran Python library?

A: The main purpose of the Doctran Python library is to perform document transformation and analysis. It provides a set of functions to preprocess text data, extract valuable information, classify content, and translate text into different languages. It uses large language models (LLMs) such as OpenAI’s GPT-based models to parse text data.

Q2: How can Doctran be used to extract key properties from a document, and what are some examples of properties it can extract?

A: Doctran can extract key properties from documents using OpenAI’s GPT-4 model. These properties are defined in a schema and can be retrieved using the extract() function. Examples include the category, sentiment, aggressiveness, and language of the raw text.

Q3: What are the benefits of converting document content into Q&A format, and how can this be achieved using Doctran?

A: Using Doctran’s interrogation feature to convert document content into question and answer format can improve information retrieval. It allows better contextual retrieval than indexing raw text, making it more suitable for search engines. The built-in interrogate() function converts documents into question and answer format, thereby improving search results.

Q4: Why is language translation important in consumer complaint analysis, and how does Doctran support this functionality?

A: Language translation is critical in consumer complaint analysis, especially for businesses with a multilingual customer base. This feature ensures that information is accessible to a global audience. Doctran supports language translation using the built-in translate() function, enabling documents to be translated into various languages such as Hindi, Spanish, Italian, German, etc.
