word2vec – SyntaxBug

Word2Vec word-vector analysis (word similarity) based on the character story of Keqing from Genshin Impact

First, get Keqing's character text: raw_texts = [ “Emperor Yanwang brought prosperity to Liyue Port, and his reputation for governing the world was turned into novels and biographies that people talked about. However, as one of the people closest to God, Keqing seems to be the one who has the least awe. \ […]

2-5 (bonus chapter) Word2Vec word vectors: hands-on code

Table of Contents 1 Statement: 2 Code link: 3 Reference links: 4 Practical steps: 4.1 Data selection: 4.2 Data preprocessing and word segmentation: 4.2.1 Data preprocessing: 4.2.2 Why word segmentation? 4.2.3 Word segmentation: 4.3 Model training: 4.4 Visualization: 4.4.1 PCA dimensionality reduction: 4.4.2 Draw a starry sky map: 4.5 Analogy experiment: 1 Statement: This course […]

word2vec improvement: hierarchical softmax

Table of Contents Summary Overall process 1. Construct the Huffman tree 2. Combine CBOW with the Huffman tree 3. Build the word2vec model (CBOW) Summary Complete code Summary This post describes in detail the process and code implementation of word2vec with hierarchical softmax. For the basic implementation of word2vec, you can read my previous post: […]
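As a rough illustration of the first step listed in that excerpt (constructing the Huffman tree), here is a minimal sketch in plain Python. It is not the post's code: the function name `build_huffman_codes` and the toy sentence are assumptions made for this example. In hierarchical softmax, the code assigned to a word is usually read as the sequence of left/right decisions taken from the root to that word's leaf, so frequent words get shorter paths.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(word_counts):
    """Return {word: binary code string} for a dict of word frequencies.

    Hypothetical helper for illustration only.
    """
    tiebreak = count()  # unique counter so heap entries never compare trees
    # Heap entry: (frequency, tiebreaker, tree).
    # A tree is either a word (leaf) or a (left, right) tuple (internal node).
    heap = [(freq, next(tiebreak), word) for word, freq in word_counts.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    # Repeatedly merge the two least frequent subtrees.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: descend both ways
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf: record the path taken to reach this word
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


tokens = "the cat sat on the mat the cat ran".split()
codes = build_huffman_codes(Counter(tokens))
for word, code in sorted(codes.items(), key=lambda kv: len(kv[1])):
    print(word, code)
```

A full hierarchical softmax model would additionally attach a trainable vector to every internal node; that part is left out here.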

word2vec word-vector training model

word2vec is an efficient tool, open-sourced by Google in 2013, for representing words as real-valued vectors. An implementation is included in the gensim library. word2vec includes two models: CBOW and Skip-gram. In the vector space, semantic and contextual relationships between words are represented by relationships between their vectors. For example, the semantic similarity between […]
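To make the idea of "relationships between vectors" concrete, the sketch below computes cosine similarity, a common way to compare word vectors. The three-dimensional vectors are hand-made toy values chosen for this example, not the output of a trained model, and `cosine_similarity` is a hypothetical helper that assumes non-zero vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors (assumed values, for illustration only).
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

sim_close = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
sim_far = cosine_similarity(toy_vectors["king"], toy_vectors["apple"])
print(f"king~queen: {sim_close:.3f}  king~apple: {sim_far:.3f}")
```

With vectors from a real trained model, related words would be expected to score closer to 1 than unrelated ones, as the toy values are set up to do here.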

fetch_20newsgroups data set combined with word2vec algorithm

Table of Contents fetch_20newsgroups data set combined with word2vec algorithm 1. fetch_20newsgroups data set 2. Text preprocessing 3. Train the word2vec model 4. Use the word vectors In natural language processing, word vector representation is a commonly used technique. The word2vec algorithm is a classic […]

Word2vec (CBOW, Skip-gram) word vector training based on the SentencePiece tool and Unicode-encoding word segmentation, combined with a TextCNN model, replacing the initial word vectors for text classification tasks

The experiment in this post is difficult, but the idea behind it is very good. Readers without a solid foundation may not follow the problem I am addressing. […]

word2vec + TextCNN text classification code

import pandas as pd
import numpy as np
import torch
from torch import nn
import torch.utils.data as data
import torch.nn.functional as F
from torch import tensor
from sklearn.metrics import f1_score
from datetime import datetime
import time
from collections import Counter
import re
import jieba
from tqdm import tqdm
import os
import gensim
from gensim.models import KeyedVectors
[…]

word2vec + GRU text classification: hands-on code

Use PyTorch to implement the Skip-gram model in Word2Vec

First, a custom dataset class, Word2VecDataset, is created to generate the training data. Then the Skip-gram model is defined and trained with the cross-entropy loss function and the Adam optimizer. In each training epoch, the data loader is traversed and, for each batch, the forward pass, loss computation, backpropagation, and weight update are performed. Finally, the trained word vector […]
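The data-generation step mentioned in that excerpt can be sketched without PyTorch. The function below is a hypothetical stand-in, not the post's Word2VecDataset class: it lists the (center word, context word) pairs that a Skip-gram model is typically trained on, given an assumed window size.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbours.

    Hypothetical helper; `window` is the number of words considered
    on each side of the center word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = ["keqing", "protects", "liyue", "harbor"]
pairs = skipgram_pairs(sentence, window=1)
for center, context in pairs:
    print(center, "->", context)
```

In a PyTorch setup, pairs like these would usually be mapped to vocabulary indices and served in batches by a data loader; the model then predicts the context index from the center index.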

NLP word embedding — (1) gensim Word2Vec

Table of Contents 1. Word2Vec model 1. Skip-gram 2. CBOW 2. Loading the AG News data set 1. AG News data set 2. AG News data set download 3. View the data 3. Stop word list 1. NLTK toolkit download 2. NLTK console 3. Download stop words in the console 4. Data preprocessing 5. Training the Word2Vec model 1. Train the model 2. […]