DataWhale Summer Camp Phase 3: AI4S Life Science Track Baseline Notes, Annotated Line by Line

AI for Science Life Science Track Baseline, line-by-line annotation
# Import the numpy library for numerical computation
import numpy as np
# Import the pandas library for data processing and analysis
import pandas as pd
# Import the polars library for processing large-scale datasets
import polars as pl
# Import defaultdict and Counter from the collections library for […]

[SentenceTransformer Series] Concepts behind computing sentence embeddings (01/10)

1. Description Word embeddings and sentence embeddings should be distinguished. Sentence embedding is the process of representing a sentence or document as a fixed-length vector that captures its semantic and contextual information. It is a common task in natural language processing (NLP) and […]
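To make the "fixed-length vector" idea concrete, here is a minimal sketch that builds a sentence vector by averaging word vectors (mean pooling). This is only one simple way to get a sentence embedding and is not necessarily the method the article goes on to describe. The word vectors, `DIM`, and `embed_sentence` are hypothetical and chosen for illustration.

```python
import numpy as np

# Hypothetical toy word vectors (3 dimensions); real models learn these from data.
WORD_VECTORS = {
    "cats": np.array([1.0, 0.0, 0.0]),
    "chase": np.array([0.0, 1.0, 0.0]),
    "mice": np.array([0.0, 0.0, 1.0]),
}
DIM = 3


def embed_sentence(sentence: str) -> np.ndarray:
    """Average the vectors of known words into one fixed-length vector."""
    vectors = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    if not vectors:
        # No known words: fall back to the zero vector of the same length.
        return np.zeros(DIM)
    return np.mean(vectors, axis=0)


short = embed_sentence("cats chase")
longer = embed_sentence("cats chase mice")
print(short.shape, longer.shape)  # both sentences map to vectors of length DIM
```

Sentences of different lengths end up with vectors of the same size, which is what allows them to be compared directly. Mean pooling ignores word order, so it captures only a limited amount of context.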

Sentence Transformers: Meanings in Disguise

1. Description Transformers have completely reshaped the natural language processing (NLP) landscape. Before Transformers, translation and language classification were passable thanks to recurrent neural networks (RNNs), but their language understanding was limited, which led to many small errors, and coherence over large blocks of text was almost impossible. Since the introduction of […]

NLP Text Matching [supervised training]: PointWise (single tower), DSSM (two towers), Sentence BERT (two towers) in practice

0. Background and related concepts This project implements three commonly used text matching methods: PointWise (single tower), DSSM (two towers), and Sentence BERT (two towers). Text matching is a branch of NLP, […]
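As a rough illustration of the two-tower shape named above, the sketch below encodes each text independently and then compares the two vectors with cosine similarity. It is not DSSM or Sentence BERT: the `encode` function is a hypothetical stand-in (hashed bag-of-words counts) for a trained encoder, and `DIM` and `match_score` are likewise made up for this example. A single-tower model would instead feed both texts into one model jointly.

```python
import zlib

import numpy as np

DIM = 16  # hypothetical embedding size


def encode(text: str) -> np.ndarray:
    """Toy 'tower': hashed bag-of-words counts, standing in for a trained encoder."""
    vec = np.zeros(DIM)
    for token in text.lower().split():
        # crc32 gives a hash that is stable across runs, unlike built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    return vec


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def match_score(query: str, doc: str) -> float:
    """Encode each side separately (the two towers), then compare the vectors."""
    return cosine(encode(query), encode(doc))


print(match_score("how to reset password", "how to reset password"))
print(match_score("how to reset password", "weather forecast for tomorrow"))
```

Because each side is encoded on its own, document vectors can be computed once and reused across queries, which is a common reason for choosing a two-tower design.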

Control the lamps at home in seconds with a single sentence

Many years ago, I designed an IoT system based on the XMPP protocol, in which each device corresponds to an XMPP account. The backend can easily see the online status of every device without a dedicated heartbeat API, and devices can be operated remotely in batches by sending group messages. However, the […]

[Fun AIGC] Training a tokenizer with sentencepiece

Table of Contents 1. Introduction 2. Installation 3. Training your own tokenizer 4. Running the model 5. Extensions 6. Supplement 1. Foreword Earlier we introduced a character encoding method in [How to train a Chinese-English translation model] LSTM machine translation seq2seq character encoding (1). That method encodes characters one by one, and a lot […]
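For reference, character-by-character encoding of the kind mentioned in the foreword can be sketched as follows. This is a minimal illustration rather than the code from the earlier article; `build_vocab`, `encode`, `decode`, and the tiny corpus are hypothetical names and data.

```python
def build_vocab(texts):
    """Assign one integer id to every distinct character in the corpus."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i for i, ch in enumerate(chars)}


def encode(text, vocab):
    """Map each character to its id; raises KeyError on unseen characters."""
    return [vocab[ch] for ch in text]


def decode(ids, vocab):
    """Invert the mapping to recover the original string."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids)


corpus = ["hello", "help"]
vocab = build_vocab(corpus)
ids = encode("hello", vocab)
print(ids)                 # one id per character
print(decode(ids, vocab))  # round-trips back to the input
```

The encoded sequence is exactly as long as the text, since every character gets its own id.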

[Using syntactic analysis (parsing) to ensure that segmented text is split into semantically complete sentences]

Table of Contents 1. Shortcomings of traditional text segmentation methods 2. Examples of traditional text segmentation methods 3. What semantics-aware post-processing of text segmentation is 4. Syntactic analysis (parsing) in detail 5. Building syntactic analysis capability with Python 6. Back to the original question, […]

Front-end JS: simulating Ctrl+A (select all page content) in one line with document.execCommand('selectAll');

document.execCommand('selectAll'); // The command name is not case-sensitive
document.execCommand(aCommandName, aShowDefaultUI, aValueArgument)
aCommandName: the command name
aShowDefaultUI: interaction mode, a Boolean; if true, a dialog box is displayed, and if false, it is not; generally false
aValueArgument: an extra argument for commands that need one, for example inserting a picture requires the image URL; the default is null
Returns a […]

Web security: Testing a PHP deserialization vulnerability (writing a one-line Trojan to the server)

Web Security: Testing a PHP Deserialization Vulnerability When programmers write code without strictly validating the serialized string supplied by the user, a malicious user can control the deserialization process, leading to uncontrollable consequences such as XSS vulnerabilities, code execution, SQL injection, and directory traversal. Certain magic methods […]

Red team antivirus evasion: one-line Trojan techniques

One-line Trojan techniques: the-backdoor-factory
Installation under Kali
Method 1: git clone https://github.com/secretsquirrel/the-backdoor-factory
Method 2: apt-get install backdoor-factory
Usage: ./backdoor.py -h
Check whether the target binary is supported (such as putty.exe)
Specify the code cave size
Query the supported payload modules
Inject using a single code cave
Inject using multiple code caves
Combined with […]