Table of Contents 1. Introduction to machine translation 1. What is machine translation (MT)? 2. Source language and target language 3. Translation models 4. The importance of context 2. Rule-based machine translation (RBMT) 1. Formulating rules 2. Dictionary and vocabulary selection 3. Limitations and challenges 4. PyTorch implementation 3. Statistical machine translation (SMT) 1. Data-driven 2. […]
Tag: nlp
NLP (70) Fine-tuning multiple-choice MRC with the LLAMA-2 model
This article introduces how to fine-tune the LLAMA-2 7B model on the multiple-choice reading comprehension dataset RACE middle using the Firefly large-model training framework; the fine-tuned model shows a significant improvement. Machine Reading Comprehension (MRC) is a form of Question Answering (QA) in NLP and is a basic and important task […]
NLP word embedding — (2.2) FastText: text classification
Table of Contents 1. Downloading and installing fasttext 1. Issues encountered 2. Download and installation steps 2. Data acquisition and processing 1. Data format required by fasttext 2. Data format processing 3. Training the fasttext model 1. Train the model 2. Training results 3. Model tuning 1) Increase the number of training epochs: epoch 2) Adjust the […]
NLP word embedding — (2.1) FastText: training word vectors
Table of Contents 1. FastText principle and algorithm 1. fastText functionality 2. fastText advantages 3. N-gram 4. Hierarchical Softmax 2. Training word vectors with FastText 1. Load FastText from the gensim package 2. Model hyperparameters 3. Train word vectors with the model 4. Saving and loading models 5. Using the word vectors 3. Updating the model corpus 4. Extracting FastText model training results […]
NLP word embedding — (1) gensim Word2Vec
Table of Contents 1. The Word2Vec model 1. skip-gram 2. CBOW 2. Loading the AG News dataset 1. The AG News dataset 2. Downloading the AG News dataset 3. Viewing the data 3. Stop word list 1. NLTK toolkit download 2. NLTK console 3. Download stop words in the console 4. Data preprocessing 5. Training the Word2Vec model 1. Train the model 2. […]
The Knowledge System and Technical Principles of NLP: A Gentle Introduction to Natural Language Processing
Author: Zen and the Art of Computer Programming 1. Introduction Natural language processing (NLP) is an important branch of artificial intelligence concerned with how computers understand, analyze, and generate human language. Over the past decade, NLP has become a research hotspot and achieved remarkable results. At the same time, with […]
AI “anti-corruption”: Germany’s Max Planck Institute combines NLP and DNNs to develop corrosion-resistant alloys
This article is about 3,100 words; estimated reading time: 6 minutes. It walks you through more advanced corrosion-resistant alloy designs. At a glance: In a world surrounded by stainless steel, we may have almost forgotten that corrosion exists. Yet corrosion touches every aspect of life. Whether it’s […]
A full analysis of NLP information extraction: a practical PyTorch guide from named entities to events
Table of Contents Introduction The importance of context and information extraction Article objectives and structure Information extraction overview What is information extraction Application scenarios for information extraction Main challenges in information extraction Entity recognition What is entity recognition Application scenarios of entity recognition PyTorch implementation code Input, output, and processing Relation extraction What is relation extraction Application […]
NLP (69) Intelligent Document Q&A Assistant Upgrade
This article builds on the author’s earlier large-model intelligent document question-answering project, adding support for multiple document types and URL links as well as multiple large-model backends, making it more convenient and efficient to use. Project introduction In the article NLP (61), using the Baichuan-13B-Chat model to […]
NLP: LSTM text classification model in practice
# utils_fasttext.py
# coding: UTF-8
import os
import torch
import numpy as np
import pickle as pkl
from tqdm import tqdm
import time
from datetime import timedelta

MAX_VOCAB_SIZE = 10000  # cap on vocabulary size
UNK, PAD = '<UNK>', '<PAD>'  # unknown-word and padding tokens

def build_vocab(file_path, tokenizer, max_size, min_freq):
    # Build a word-to-index vocabulary from a text file
    vocab_dic = {}
    with open(file_path, 'r', encoding='UTF-8') as f:
        for line in tqdm(f):
            lin = line.strip() […]