Named entity recognition using Python and spaCy


Named Entity Recognition (NER) is a natural language processing (NLP) method used to detect and classify named entities in text, including people, organizations, places, dates, quantities, and other identifiable real-world objects.

spaCy is an open-source natural language processing library for Python that provides a wide range of features, including tokenization, part-of-speech (POS) tagging, syntactic analysis, named entity recognition, and text classification. It offers pre-trained models and a flexible API, and is widely used by NLP practitioners and researchers.

How do you install spaCy? The commands below cover a typical setup; for other options, see the usage guide (https://spacy.io/usage).

'''
Prefix the install commands with an exclamation mark if you are using
Jupyter Notebook or Google Colab.
'''
!pip install -U pip setuptools wheel
# Install the GPU build of spaCy with pip (for a CPU-only install, see the spaCy documentation)
!pip install -U 'spacy[cuda-autodetect]'
# Download the pre-trained Transformer-based pipeline (English)
!python -m spacy download en_core_web_trf
# Download the small CPU-optimized pipeline (English)
!python -m spacy download en_core_web_sm

If you are using Google Colab or a GPU-equipped computer, switch the runtime type to GPU before continuing. With a CPU-only installation, keep the CPU runtime type. Let us perform NER on the following example text.
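Before loading a pipeline, it can help to confirm which device spaCy will actually use. The following is a minimal sketch, not part of the original walkthrough: `spacy_device` is a hypothetical helper name, and it relies on `spacy.prefer_gpu()`, which returns `True` when a GPU was activated and `False` otherwise.

```python
import importlib.util


def spacy_device() -> str:
    """Report the device spaCy will use: "gpu", "cpu", or "not installed"."""
    # Hypothetical helper for illustration; only prefer_gpu() is spaCy API.
    if importlib.util.find_spec("spacy") is None:
        return "not installed"
    import spacy

    # prefer_gpu() should be called before any pipeline is loaded.
    return "gpu" if spacy.prefer_gpu() else "cpu"


print("spaCy device:", spacy_device())
```

If this reports "cpu" on a GPU runtime, the GPU build of spaCy is probably not installed in the active environment.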

Input text to use as example:

Artificial Intelligence (AI), an ever-evolving field, has witnessed remarkable growth since its inception. Dating back to the Dartmouth Conference in 1956, AI has emerged as a multidisciplinary domain encompassing machine learning, natural language processing (NLP) , computer vision, and robotics. Recent breakthroughs, like the introduction of deep learning techniques in the early 2010s, have accelerated AI advancements. Tech giants like Google, IBM, and Microsoft have invested heavily in AI research and development. Significant milestones include the landmark victory of IBM’s Deep Blue over Garry Kasparov in 1997 and the emergence of voice assistants like Apple’s Siri in 2011. AI continues to shape industries across healthcare, finance, and transportation, fueling innovation and transforming the way we live and work.

We will use two different models, each with a different priority: “en_core_web_sm” focuses on efficiency, while “en_core_web_trf” focuses on accuracy.

# Import dependencies
import spacy
sample_text = "Artificial Intelligence (AI), an ever-evolving field, has witnessed remarkable growth since its inception. Dating back to the Dartmouth Conference in 1956, AI has emerged as a multidisciplinary domain encompassing machine learning, natural language processing (NLP) , computer vision, and robotics. Recent breakthroughs, like the introduction of deep learning techniques in the early 2010s, have accelerated AI advancements. Tech giants like Google, IBM, and Microsoft have invested heavily in AI research and development. Significant milestones include the landmark victory of IBM's Deep Blue over Garry Kasparov in 1997 and the emergence of voice assistants like Apple's Siri in 2011. AI continues to shape industries across healthcare, finance, and transportation, fueling innovation and transforming the way we live and work."
# Load the small CPU-optimized pipeline, which favors efficiency
nlp = spacy.load("en_core_web_sm")
# Run the pipeline on the sample text to predict entities
doc = nlp(sample_text)
# Display the text with highlighted entities in Jupyter mode
spacy.displacy.render(doc, style="ent", jupyter=True)

Output:

Figure: Entity visualization output using spaCy’s visualization tools and the en_core_web_sm model
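The highlighted rendering is convenient in a notebook, but the same entities can also be read programmatically from the processed document. The sketch below assumes a spaCy `Doc` and uses its `ents` attribute together with each span's `text`, `label_`, `start_char`, and `end_char`; `entity_rows` is a hypothetical helper name, not something from the original code.

```python
def entity_rows(doc):
    """Return (text, label, start_char, end_char) for each entity in a spaCy Doc."""
    # doc.ents holds the entity spans predicted by the pipeline's NER component.
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]


def print_entities(doc):
    """Print one entity per line, with its label and character offsets."""
    for text, label, start, end in entity_rows(doc):
        print(f"{label:<10} {text} [{start}:{end}]")
```

In the notebook above, `print_entities(nlp(sample_text))` would list the entities found by the small model as plain text, which is easier to store or post-process than the rendered output.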

# Use the Transformer-based model for NER
# Load the Transformer-based pipeline (roberta-base) with spaCy
nlp_trf = spacy.load('en_core_web_trf')
# Run the pipeline on the sample text to predict entities
doc_trf = nlp_trf(sample_text)
# Display the text with highlighted entities in Jupyter mode
spacy.displacy.render(doc_trf, style="ent", jupyter=True)

Figure: Entity visualization output using spaCy’s visualization tools and the en_core_web_trf model

In this comparison, the Transformer-based model made fewer errors: it identified “AI” and “NLP” as “ORG” entities, “Siri” as “PRODUCT”, and “the Dartmouth Conference” as “EVENT”. On this example, the Transformer-based model delivers the higher accuracy it is designed for.
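Comparing two renderings by eye gets tedious on longer texts. As a rough sketch, the predictions of two pipelines can be compared as sets of (text, label) pairs; `entity_set` and `compare_entities` are hypothetical helper names, and the approach assumes both arguments are spaCy `Doc` objects produced from the same text.

```python
def entity_set(doc):
    """Collect the distinct (text, label) pairs predicted for a spaCy Doc."""
    return {(ent.text, ent.label_) for ent in doc.ents}


def compare_entities(doc_a, doc_b):
    """Split the predictions into shared pairs and pairs unique to each Doc."""
    a, b = entity_set(doc_a), entity_set(doc_b)
    return {
        "both": sorted(a & b),    # same text and same label in both Docs
        "only_a": sorted(a - b),  # predicted only for doc_a
        "only_b": sorted(b - a),  # predicted only for doc_b
    }
```

Calling `compare_entities(nlp(sample_text), nlp_trf(sample_text))` would then show which entities only one of the two models reports. Note that an entity given different labels by the two models appears in both the "only_a" and "only_b" lists.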

NER is widely used in different industries to extract insights and facilitate decision-making. Here are specific use cases:

  1. Healthcare: NER extracts medical entities (diseases, symptoms) from clinical records to aid information retrieval, diagnosis and treatment planning.

  2. Financial industry: NER identifies entities (company names, financial terms) from articles and reports to support market analysis and investment decisions.

  3. E-commerce: NER extracts product names, brands and attributes from reviews to enhance sentiment analysis and personalized recommendations.

  4. Media and Publishing: NER classifies entities (people, places) in articles and social media to enable content classification and trend analysis.

  5. Legal: NER identifies legal entities (case titles, statutes) in documents, simplifying research and contract analysis.

  6. Tourism and hospitality industry: NER extracts location names and landmarks from reviews to support sentiment analysis and personalized recommendations.

  7. Government: NER identifies entities (institutions, places) in documents and social media to assist with policy analysis and sentiment monitoring.

  8. CRM: NER extracts customer details from interactions to enhance customer profiling, lead generation and sales management.

These use cases show how broadly NER applies, both for extracting insights and for improving operational efficiency.
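Several of the use cases above come down to the same pattern: run a pipeline over many documents and aggregate the entities of one type. The following is a minimal sketch of that pattern for organization names, as in the financial example. It assumes a loaded spaCy pipeline whose NER component uses the "ORG" label (as the models above do); `count_entities` is a hypothetical helper name.

```python
from collections import Counter


def count_entities(nlp, texts, label="ORG"):
    """Count how often each entity with the given label appears across texts."""
    counts = Counter()
    # nlp.pipe processes the texts as a stream instead of one call per text.
    for doc in nlp.pipe(texts):
        counts.update(ent.text for ent in doc.ents if ent.label_ == label)
    return counts
```

With a hypothetical list of article strings named `articles`, `count_entities(nlp, articles).most_common(5)` would return the five most frequently mentioned organizations.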
