Panorama of NLP machine translation: full analysis from basic principles to technical practice

Directory 1. Introduction to machine translation 1. What is machine translation (MT)? 2. Source language and target language 3. Translation model 4. The importance of context 2. Rule-based machine translation (RBMT) 1. Formulation of rules 2. Dictionary and vocabulary selection 3. Limitations and Challenges 4. PyTorch implementation 3. Statistical Machine Translation (SMT) 1. Data-driven 2. […]

NLP word embedding — (2.1) FastText: training word vectors

Table of Contents 1. FastText principle and algorithm 1. fastText function 2. fastText advantages 3. N-gram 4. Hierarchical Softmax 2. FastText training word vector 1. Load FastText from the gensim package 2. Model hyperparameters 3. Model training word vector 4. Saving and loading models 5. Use of word vectors 3. Update model corpus 4. Extract FastText model training results […]

NLP word embedding — (1) gensim Word2Vec

Table of Contents 1. Word2Vec model 1. skip-gram 2. CBOW 2. AG News dataset loading 1. AG News dataset 2. AG News dataset download 3. View data 3. Stop word list 1. NLTK toolkit download 2. NLTK console 3. Download stop words in the console 4. Data preprocessing 5. Training the Word2Vec model 1. Train the model 2. […]

The knowledge system and technical principles of NLP: A Gentle Introduction to Natural Language Processing

Author: Zen and the Art of Computer Programming. 1. Introduction: Natural language processing (NLP) is an important branch of artificial intelligence concerned with how computers understand, analyze, and generate human language. Over the past ten years, NLP has become a research hotspot and has achieved great results. At the same time, with […]

AI “anti-corrosion”: Germany’s Max Planck Institute combines NLP and DNN to develop corrosion-resistant alloys

This article is about 3,100 words; the recommended reading time is 6 minutes. It introduces more advanced corrosion-resistant alloy designs. Content at a glance: In a world surrounded by stainless steel, we may have almost forgotten that corrosion exists. However, corrosion is present in every aspect of life. Whether it’s […]

Full analysis of NLP information extraction: PyTorch practical guide from named entities to events

Directory introduction The importance of context and information extraction Article Objectives and Structure Information Extraction Overview What is information extraction Application scenarios for information extraction Main challenges in information extraction Entity recognition What is entity recognition Application scenarios of entity recognition PyTorch implementation code Input, output and processing Relation extraction What is relation extraction Application […]

NLP (69) Intelligent Document Q&A Assistant Upgrade

This article extends the large-model intelligent document Q&A project the author developed earlier: it now supports multiple document types and URL links as well as access to multiple large models, making it more convenient and efficient to use. Project introduction: In the article NLP (61) using the Baichuan-13B-Chat model to […]

NLP-LSTM text classification model practice

# coding: UTF-8
import os
import torch
import numpy as np
import pickle as pkl
from tqdm import tqdm
import time
from datetime import timedelta

MAX_VOCAB_SIZE = 10000
UNK, PAD = '<UNK>', '<PAD>'

def build_vocab(file_path, tokenizer, max_size, min_freq):
    vocab_dic = {}
    with open(file_path, 'r', encoding='UTF-8') as f:
        for line in tqdm(f):
            lin = line.strip()
            […]