[Large model] Large language model corpus download

Contents: Overview; Hugging Face; obs operation; git-lfs; example; RedPajama-Data-1T; SlimPajama-627B/; git clone resume; Data Format; References. Overview: Corpora are essential to large model training. Many corpora are currently available for download on the public Internet, but it is not feasible for every user and every training task to pull a corpus through the public […]

Write corpus text to database 20231104

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
public class BaseDao {
    public Connection conn = null;
    public PreparedStatement ps = null;
    public ResultSet rs = null;
    public void getConnection() throws Exception {
        Class.forName("com.mysql.cj.jdbc.Driver");
        conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/languages_material_database?serverTimezone=Asia/Shanghai&useTimezone=true", "root", "123456");
    }
    public ResultSet executeQuery(String sql, Object[] param) throws Exception {
        this.getConnection();
        ps […]

Generating a corpus with self-instruct: hands-on code

Contents: Introduction to self-instruct; The self-instruct framework; Implementation of the corpus-generation code; Step 1: Generate new instructions with the model; Step 2: Judge the instructions generated by the model; Step 3: Produce different outputs according to the judgment from Step 2; Step 4: Filtering and post-processing. This article analyzes the process of generating corpus […]
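The four steps listed in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the article's code: the class SelfInstructSketch, the prompt tags (NEW_INSTRUCTION, IS_CLASSIFICATION, and so on), and the Jaccard word-overlap filter are all hypothetical, and the "model" is any String-to-String function supplied by the caller. It also assumes that the judgment in Step 2 is whether an instruction is a classification task, as in the original Self-Instruct method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch of the four steps; the "model" is any String -> String function. */
final class SelfInstructSketch {

    /** One generated example: the instruction, the Step 2 judgment, and the model output. */
    static final class Example {
        final String instruction;
        final boolean classification;
        final String output;

        Example(String instruction, boolean classification, String output) {
            this.instruction = instruction;
            this.classification = classification;
            this.output = output;
        }
    }

    /** Word-level Jaccard similarity, a simple stand-in for a real overlap metric. */
    static double similarity(String a, String b) {
        Set<String> wordsA = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\s+")));
        Set<String> wordsB = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\s+")));
        Set<String> union = new HashSet<>(wordsA);
        union.addAll(wordsB);
        wordsA.retainAll(wordsB); // wordsA now holds the intersection
        return union.isEmpty() ? 0.0 : (double) wordsA.size() / union.size();
    }

    static List<Example> run(List<String> seedInstructions,
                             Function<String, String> model,
                             int rounds,
                             double maxSimilarity) {
        List<String> pool = new ArrayList<>(seedInstructions);
        List<Example> corpus = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            // Step 1: ask the model for a new instruction, showing it the current pool.
            String instruction = model.apply("NEW_INSTRUCTION\n" + String.join("\n", pool)).trim();

            // Step 2: ask the model to judge the instruction (assumed: is it a classification task?).
            boolean isClassification =
                    model.apply("IS_CLASSIFICATION\n" + instruction).trim().equalsIgnoreCase("yes");

            // Step 3: pick a different output prompt depending on the Step 2 judgment.
            String tag = isClassification ? "LABEL_THEN_INPUT\n" : "INPUT_THEN_OUTPUT\n";
            String output = model.apply(tag + instruction).trim();

            // Step 4: drop empty results and near-duplicates of instructions already in the pool.
            boolean tooSimilar = pool.stream()
                    .anyMatch(p -> similarity(p, instruction) > maxSimilarity);
            if (instruction.isEmpty() || output.isEmpty() || tooSimilar) {
                continue;
            }
            pool.add(instruction);
            corpus.add(new Example(instruction, isClassification, output));
        }
        return corpus;
    }
}
```

Keeping the model behind a plain Function makes the loop easy to exercise with a stub before wiring in a real model client; the similarity threshold is a parameter rather than a fixed value because the right cutoff depends on the metric used.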

Natural Language Processing (1) Brown Corpus

What is natural language processing? Natural language processing is an important area of computer science and artificial intelligence. It is a discipline that combines linguistics, computer science, and mathematics. Its full English name is Natural Language Processing, usually abbreviated as NLP. In simple terms, […]