Text duplication checking (similarity) in Java, without third-party tools

Functional background: As business records accumulate, duplicate project names and duplicate content begin to appear, which lowers the quality of the project records. To prevent this, we considered running duplication checks on key data. We originally planned to use a third-party standard duplication checking […]

Word2Vec word-vector analysis (word similarity) of the character story of Keqing from Genshin Impact

First, get the character text for Keqing: raw_texts = [ “Emperor Yanwang brought prosperity to Liyue Port, and his reputation for governing the world was turned into novels and biographies that people talked about. However, as one of the people closest to God, Keqing seems to be the one most lacking in awe. \ […]

Calculation process of multi-scale structural similarity L1 loss

Multi-scale structural similarity L1 loss. SSIM: The Structural Similarity Index (SSIM) is an image quality metric that evaluates the similarity between two images. It is widely used for image quality assessment, performance evaluation of compression algorithms, image enhancement and restoration, and similar tasks. In application: The human eye perceives similarity between two images mainly based on […]
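As a rough illustration of the SSIM idea in the excerpt above, here is a minimal sketch of the standard single-window SSIM formula applied to two flattened grayscale images. It is not the multi-scale variant and not the combined MS-SSIM + L1 loss from the article; the function name `ssim_global`, the whole-image window, and the default 8-bit dynamic range are assumptions made for the example.

```python
def ssim_global(x, y, dynamic_range=255.0):
    """Single-window SSIM over two equally sized, flattened grayscale images.

    Simplification (assumption): statistics are taken over the whole image
    instead of local sliding windows, and only one scale is used.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)

    # Luminance term uses the means.
    mu_x = sum(x) / n
    mu_y = sum(y) / n

    # Contrast and structure terms use variances and covariance.
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    # Standard stabilizing constants with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

An image compared with itself scores 1 (up to floating-point rounding), while an inverted copy scores much lower, because the covariance term becomes negative.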

Recommendation algorithm based on Jaccard similarity—example

Contents: Data display; Classification of recommendation algorithms (based on similarity; based on popularity/context/social network); Jaccard similarity; Analyze data characteristics; Methods to consider; Advantages and disadvantages of the calculation methods; Calculate Jaccard similarity between users; Get the 10 users most similar to a given user; 10 books recommended to user 1713353. Data display: import pandas as pd import […]
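The steps listed in the excerpt above (Jaccard similarity between users, the most similar users, then book recommendations) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the article's code: the helper names, the `user_books` mapping from user id to a set of book ids, and the similarity-weighted scoring of candidate books are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def most_similar_users(target, user_books, k=10):
    """Return up to k (user, similarity) pairs, most similar to `target` first."""
    scores = [
        (other, jaccard(user_books[target], books))
        for other, books in user_books.items()
        if other != target
    ]
    # Sort by descending similarity; break ties by user id for determinism.
    scores.sort(key=lambda pair: (-pair[1], str(pair[0])))
    return scores[:k]


def recommend(target, user_books, k_users=10, n_books=10):
    """Recommend books the target has not read, weighted by neighbor similarity."""
    seen = set(user_books[target])
    weights = {}
    for other, sim in most_similar_users(target, user_books, k_users):
        if sim == 0:
            continue  # users with no overlap carry no signal
        for book in user_books[other]:
            if book not in seen:
                weights[book] = weights.get(book, 0.0) + sim
    ranked = sorted(weights, key=lambda book: (-weights[book], str(book)))
    return ranked[:n_books]
```

With toy data such as `{"u1": {"a", "b", "c"}, "u2": {"a", "b", "d"}, "u3": {"c", "e"}}`, user `u2` is the closest neighbor of `u1` because they share two of four distinct books, so `u2`'s unread book ranks first for `u1`.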

Using canvas on the front end to generate an image similarity hash (cropping the image so that the surrounding blank area is removed and only the content remains)

A while ago we had the following requirement. The designer created theme templates in Photoshop, and we then parsed the layer information in the PSD file programmatically, such as decorative pictures, text boxes, picture boxes, and background pictures (this may involve some layer tags). For information on […]

OpenCV+OpenCvSharp implements image feature vector extraction and similarity calculation

The image feature vector is a mathematical representation used to describe the content of the image. It can reflect the color, texture, shape and other information of the image. Image feature vectors can be used to do many things, such as image retrieval, classification, recognition, etc. This article will introduce the extraction of image feature […]

Calculate text similarity and output the n highest similarities

Contents: Configuration (create a virtual environment, download); TF (concept, code); word2vec (concept, model, code, result); SpaCy (concept, model, code, result); Bert (concept, model, code, result); Comparison. Configuration: Create a virtual environment (python3.9): conda create -n py39 python=3.9, then conda activate py39. Download: pip install -r D:\myfile\jpy\py\000rec\install\requirements.txt, with requirements: cx-Oracle==8.3.0 pandas==2.1.1 jieba==0.42.1 joblib==1.2.0 gensim==4.3.0 scikit-learn==1.3.0 tqdm==4.65.0 sqlalchemy==2.0.21 spacy==3.5.3 zeep==4.2.1 […]

Indian Cuisine Analysis and Similarity Study

1. Project background: India is a multi-ethnic and multi-cultural country with rich and diverse food traditions, and its food culture attracts a large number of tourists. Different regions in India have distinct differences in flavor and cooking methods, and understanding these differences is crucial to exploring the mysteries of Indian cuisine. Through research and analysis […]