Custom Graph Component: 1.1-JiebaTokenizer specific implementation

The JiebaTokenizer class inherits from the Tokenizer class, which inherits from the GraphComponent class, which in turn inherits from ABC (the abstract base class). This article uses the example in “Using ResponseSelector to Implement Campus Recruitment FAQ Robot” to explain in detail how the methods of the JiebaTokenizer class are implemented. 0. List of […]

TEA of NLP: Sentiment analysis based on Python programming (jieba library)

Table of Contents: TEA of NLP: Sentiment analysis based on Python programming (jieba library); Introduction; What is sentiment analysis?; Sentiment analysis process; Sentiment analysis applications; Sentiment analysis challenges and improvements; 1. Data preparation; 2. Word segmentation; 3. Build a sentiment dictionary; 4. Calculate the sentiment score; 5. Result analysis; Conclusion; Practical application scenarios; Sample code; TEA […]

[Python] You’ve heard of stuttering, but have you heard of jieba?

Chinese word segmentation, in layman’s terms, means splitting a sentence or paragraph into words, idioms, and individual characters according to certain rules (algorithms). It is a prerequisite for many applications, such as search engines, machine translation, part-of-speech tagging, and similarity analysis. The text information is first segmented into words, and then the […]
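
As a rough illustration of "splitting a sentence into words according to certain rules", here is a minimal sketch of forward maximum matching, one of the simplest dictionary-based rules. It is not the algorithm jieba uses; the function name `forward_max_match` and the toy vocabulary are assumptions made up for this example.

```python
def forward_max_match(text, vocabulary, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest
    vocabulary entry that matches, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy vocabulary, invented for this example.
VOCAB = {"中文", "分词", "搜索", "引擎", "搜索引擎", "的", "基础", "是"}

print(forward_max_match("中文分词是搜索引擎的基础", VOCAB))
```

Because the rule is greedy, it prefers the longer entry 搜索引擎 over 搜索 followed by 引擎; real segmenters weigh such alternatives with word statistics rather than length alone.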

[Python] The time library, random numbers, the PyInstaller library, sets, lists, dictionary operators and functions, and the use of jieba functions

Usage of the time library, random numbers, the PyInstaller library, sets, lists, dictionary operators and functions, and jieba functions. Functions to get the time; Time formatting; Program timing; Basic random number functions; Extended random number functions; Common parameters of the PyInstaller library; Set operators; 4 enhancement operators; Set processing methods; Sequence operators; List type; Dictionary type; Use of jieba […]

Use jieba word segmentation to split the logic and logical objects of text and make a search engine

jieba marks Chinese parts of speech according to the “Modern Chinese Parts of Speech Tagging” standard, which draws on a large amount of Chinese text to classify each part of speech in detail. For the detailed list, please refer to the official document: jieba Parts of Speech Tagging. The following is a comparison table of part-of-speech […]

jieba plus whoosh: build a search engine for your own local database

Examples

    from whoosh.index import create_in
    from whoosh.fields import Schema, TEXT, ID
    from jieba.analyse import ChineseAnalyzer
    from whoosh.qparser import QueryParser
    import os

    analyzer = ChineseAnalyzer()
    schema = Schema(title=TEXT(stored=True, analyzer=analyzer),
                    content=TEXT(stored=True, analyzer=analyzer),
                    id=ID(stored=True))
    if not os.path.exists("index"):
        os.mkdir("index")
    ix = create_in("index", schema)
    documents = [
        {
            "title": "below",
            "content": "First install jieba and whoosh libraries,",
            "id": "1"
    […]

Text analysis and feature extraction: stop words, jieba word segmentation, and training on the extracted features

Article directory: Use stop words; jieba word segmentation; View the word frequency matrix; Train on the extracted features

Use stop words

stop word at the end

    stopwords = []
    with open('./data/bayes_data/stopwords.txt', 'r', encoding='utf-8') as f:
        lines = f.readlines()
        # print(lines)
        for temp in lines:
            line = temp.strip()  # Delete blank characters such as spaces and […]
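
The stop-word step in the excerpt above is cut off, so the following is only a minimal sketch of the general idea: drop stop words and blank tokens from an already segmented sentence, then count what remains. The `STOPWORDS` set, the `remove_stopwords` helper, and the sample tokens are hypothetical and do not come from the article, which loads its own list from a file.

```python
from collections import Counter

# Hypothetical stop-word set for illustration only.
STOPWORDS = {"的", "了", "是"}


def remove_stopwords(tokens, stopwords):
    """Keep tokens that are non-blank and not in the stop-word set."""
    return [t for t in tokens if t.strip() and t not in stopwords]


# Pretend this list came from a word segmenter.
tokens = ["今天", "的", "天气", "是", "晴天", " ", "天气", "好", "了"]

filtered = remove_stopwords(tokens, STOPWORDS)
counts = Counter(filtered)  # word frequencies of the remaining tokens

print(filtered)
print(counts.most_common(1))
```

Using a set for the stop words keeps each membership check constant-time, which matters once the list grows to thousands of entries.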

Source code analysis of jieba word segmentation (1)

Table of Contents: 1 Four modes of jieba word segmentation; 2 Basic principles required in the first three modes; 2.1 Trie tree structure; 2.2 Generate a Directed Acyclic Graph (DAG); 2.3 Dynamic programming calculation of the path; 3 Method source code analysis in the first three modes; 3.1 cut method; 3.1.1 cut full mode; 3.1.2 cut search engine […]
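
The two principles named in this outline, building a DAG of candidate words and then choosing a path by dynamic programming, can be sketched in a few lines. This is an illustration of the general idea under stated assumptions, not jieba's source code: a plain dict with invented frequencies (`FREQ`) stands in for the trie and the real dictionary, and the function names are made up for the example.

```python
import math

# Toy word frequencies, invented for this example (not jieba's dictionary).
FREQ = {"研究": 50, "研究生": 30, "生命": 40, "命": 5, "生": 10,
        "的": 100, "起源": 20, "研": 2, "究": 2, "起": 5, "源": 5}
TOTAL = sum(FREQ.values())


def build_dag(sentence):
    """Map each start index to the end indices of dictionary words starting there."""
    dag = {}
    n = len(sentence)
    for start in range(n):
        ends = [end for end in range(start, n)
                if sentence[start:end + 1] in FREQ]
        # A character not in the dictionary still forms a one-character edge.
        dag[start] = ends or [start]
    return dag


def best_route(sentence, dag):
    """Right-to-left dynamic programming that maximises the summed log-probability."""
    n = len(sentence)
    route = {n: (0.0, 0)}
    log_total = math.log(TOTAL)
    for start in range(n - 1, -1, -1):
        route[start] = max(
            (math.log(FREQ.get(sentence[start:end + 1], 1)) - log_total
             + route[end + 1][0], end)
            for end in dag[start]
        )
    return route


def segment(sentence):
    """Follow the best end index stored for each position, left to right."""
    dag = build_dag(sentence)
    route = best_route(sentence, dag)
    words, i = [], 0
    while i < len(sentence):
        end = route[i][1]
        words.append(sentence[i:end + 1])
        i = end + 1
    return words


print(segment("研究生命的起源"))
```

With these toy frequencies the path 研究 / 生命 scores higher than 研究生 / 命, which is the kind of ambiguity the dynamic programming step exists to resolve.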

[Ruby on Rails] PostgreSQL word segmentation search: the pg_jieba and zhparser schemes

1. pg_jieba scheme

Install

    brew install cmake
    mkdir ~/tmp && cd ~/tmp && git clone https://github.com/jaiminpan/pg_jieba && cd pg_jieba
    git submodule update --init --recursive
    mkdir build && cd build
    cmake -DCMAKE_PREFIX_PATH=/usr/local/opt/postgres ..
    make install

Test

    $ psql -d vapordb
    psql (12.2)
    Type "help" for help. […]

Use of the jieba library in Python

Table of Contents: 1. Introduction: A. What is the jieba library; B. Features and advantages of the jieba library; C. Install the jieba library. 2. Word segmentation basics: A. Dictionary loading; B. Word segmentation modes; C. Example of use. 3. Custom dictionary: A. Add words; B. Load a custom dictionary; C. Example of use. 4. Keyword extraction […]