Article directory
- Foreword
- 1. Determine the encoding format of the txt file
- 2. Solution
  - Specific use
- 3. Reference
Foreword
In a previous article, about reading an ANSI-encoded txt file with the QFile class in Qt and getting garbled characters in a QTextEdit control, I explained how to read ANSI and UTF-8 files. That approach has a precondition: we must already know the encoding of the txt file. What can we do when the program does not know it? This article describes how to detect the encoding of a file.
1. Determine the encoding format of the txt file
Qt provides the QTextCodec class for converting text between encodings (this is the Qt 5 API; in Qt 6 the class lives in the Qt 5 Core Compatibility module).
Its two most important member functions are fromUnicode and toUnicode. They are called on a codec instance obtained from, for example, QTextCodec::codecForName.
With toUnicode you convert bytes in some encoding (such as GBK) to a Unicode QString; with fromUnicode you convert a QString back to bytes in that encoding.
The basic principle is to read a byte stream of some length and inspect its bytes to infer the encoding. For a text file, first read the first two or three bytes and check whether they form a BOM (byte order mark). Qt decodes 8-bit strings as UTF-8 by default, so if you open a GBK file and decode it as UTF-8, the bytes are not recognized and the text is displayed as garbled characters.
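The BOM check described above can be sketched without Qt, using only the C++ standard library. The names `Bom` and `detectBom` are hypothetical and exist only for this illustration; the byte values are the standard BOMs for UTF-8 (EF BB BF), UTF-16 little-endian (FF FE), and UTF-16 big-endian (FE FF).

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch (not part of Qt).
enum class Bom { None, Utf16LE, Utf16BE, Utf8 };

// Inspect the first bytes of a buffer and report which BOM, if any, it starts with.
Bom detectBom(const std::string &bytes)
{
    // Read byte i as an unsigned value; callers check the size first.
    auto at = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };

    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return Bom::Utf8;
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return Bom::Utf16LE;
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return Bom::Utf16BE;
    return Bom::None; // no BOM: the content itself has to be examined
}
```

Checking the buffer size before each comparison keeps the function safe for empty or one-byte files. A file without a BOM tells us nothing at this stage, which is why a second, content-based test is still needed.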
When reading a txt file, the program often cannot know the file's encoding in advance, and decoding the bytes with the wrong encoding produces garbled characters. The bytes have to be converted to Unicode (the representation QString uses) before they can be used. Many encodings exist, but GBK and UTF-8 are the most common in practice, so you can try them in turn: if a conversion reports invalid characters, the file is not in that encoding.

    QString GetCorrectUnicode(const QByteArray &ba)
    {
        QTextCodec::ConverterState state;
        QTextCodec *codec = QText…
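To make the "try UTF-8 first, otherwise fall back to GBK" idea concrete, here is a minimal Qt-free sketch in standard C++. It does by hand what the invalid-character count of a UTF-8 decoder is used for: it checks whether the bytes are well-formed UTF-8. The names `isValidUtf8`, `Guess`, and `guessEncoding` are hypothetical, and the fallback to "ANSI" is the same assumption the article makes (anything that is not UTF-8 is treated as GBK).

```cpp
#include <cstddef>
#include <string>

// Return true if `bytes` is well-formed UTF-8 (hypothetical helper, not a Qt API).
bool isValidUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (c < 0x80) { ++i; continue; }   // plain ASCII byte

        std::size_t extra = 0;             // continuation bytes expected
        unsigned long cp = 0;              // decoded code point
        unsigned long minCp = 0;           // smallest value allowed for this length
        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; minCp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; minCp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; minCp = 0x10000; }
        else return false;                 // stray continuation or invalid lead byte

        if (i + extra >= n) return false;  // sequence is cut off at the end
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < minCp) return false;                     // overlong encoding
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate code point
        i += extra + 1;
    }
    return true;
}

// Hypothetical two-way guess mirroring the article's UTF-8 / ANSI decision.
enum class Guess { Utf8, Ansi };

Guess guessEncoding(const std::string &bytes)
{
    return isValidUtf8(bytes) ? Guess::Utf8 : Guess::Ansi;
}
```

This is a heuristic, not a proof. Pure ASCII text is valid in both encodings, and a short GBK text can by coincidence also form valid UTF-8 sequences, so the more bytes are examined, the more trustworthy the guess becomes.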
2. Solution
    enum class EncodingFormat : int
    {
        ANSI = 0,   // GBK
        UTF16LE,
        UTF16BE,
        UTF8,
        UTF8BOM,
    };
    #include <QByteArray>
    #include <QFile>
    #include <QString>
    #include <QTextCodec>

    EncodingFormat ProjectWin::FileCharacterEncoding(const QString &fileName)
    {
        // Assume UTF-8 by default
        EncodingFormat code = EncodingFormat::UTF8;

        QFile file(fileName);
        if (file.open(QIODevice::ReadOnly)) {
            // Read the whole file: the first 3 bytes are enough for the BOM check,
            // but the UTF-8 test below is only reliable on the full content
            const QByteArray buffer = file.readAll();
            file.close();

            // Missing bytes count as 0, so files shorter than 3 bytes are safe
            const quint8 sz1st = buffer.size() > 0 ? static_cast<quint8>(buffer.at(0)) : 0;
            const quint8 sz2nd = buffer.size() > 1 ? static_cast<quint8>(buffer.at(1)) : 0;
            const quint8 sz3rd = buffer.size() > 2 ? static_cast<quint8>(buffer.at(2)) : 0;

            if (sz1st == 0xFF && sz2nd == 0xFE) {
                code = EncodingFormat::UTF16LE;
            } else if (sz1st == 0xFE && sz2nd == 0xFF) {
                code = EncodingFormat::UTF16BE;
            } else if (sz1st == 0xEF && sz2nd == 0xBB && sz3rd == 0xBF) {
                code = EncodingFormat::UTF8BOM;
            } else {
                // Try to decode as UTF-8. Invalid characters, or an incomplete
                // sequence left over at the end, mean the file is ANSI encoded
                QTextCodec::ConverterState cs;
                QTextCodec *tc = QTextCodec::codecForName("utf-8");
                tc->toUnicode(buffer.constData(), buffer.size(), &cs);
                code = (cs.invalidChars > 0 || cs.remainingChars > 0)
                           ? EncodingFormat::ANSI
                           : EncodingFormat::UTF8;
            }
        }
        return code;
    }
Specific use
    #include <QDebug>
    #include <QFile>
    #include <QStringList>
    #include <QTextCodec>
    #include <QTextStream>

    void ProjectWin::readParaFile(QString filePath)
    {
        // Check the pointer before using it
        if (!m_paraText) {
            qDebug() << "m_paraText is null!";
            return;
        }
        m_paraText->clear();

        EncodingFormat code = FileCharacterEncoding(filePath);
        qDebug() << "code=" << (int)code;

        QFile file(filePath);
        if (!file.open(QIODevice::ReadOnly)) {
            qDebug() << file.errorString();
            return;
        }

        if (code == EncodingFormat::ANSI) {
            // Read an ANSI (GBK) encoded file.
            // Note: this changes the locale codec for the whole application
            QTextCodec::setCodecForLocale(QTextCodec::codecForName("gb2312")); // Chinese transcoding

            QByteArray arr = file.readAll();
            arr.replace('\x0B', '\x0D');
            const QString temStr = QString::fromLocal8Bit(arr); // QByteArray to QString on Windows
            m_paraText->append(temStr);

            // Read the task number from the decoded text. After readAll() the
            // file is already at its end, so it cannot be read line by line again
            const QStringList lines = temStr.split('\n');
            for (const QString &line : lines) {
                if (line.contains(u8"Task code:", Qt::CaseSensitive)) {
                    // Take everything after the last ':' and strip whitespace
                    const QString taskNum = line.mid(line.lastIndexOf(":") + 1).trimmed();
                    m_taskNumSet.insert(taskNum);
                    break;
                }
            }
        } else if (code == EncodingFormat::UTF8) {
            // Read a UTF-8 encoded file
            QTextStream in(&file);
            in.setCodec("UTF-8"); // Set the encoding to UTF-8
            while (!in.atEnd()) {
                const QString line = in.readLine();
                if (line.contains(u8"Task code:", Qt::CaseSensitive)) {
                    const QString taskNum = line.mid(line.lastIndexOf(":") + 1).trimmed();
                    m_taskNumSet.insert(taskNum);
                }
                m_paraText->append(line); // Add to the QTextEdit control
            }
        }
        file.close();
    }
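The task-number extraction used in both branches can be checked in isolation. The following is a small sketch in standard C++ rather than Qt, with hypothetical helper names `trim` and `extractTaskNumber`; it assumes, as the listing does, that the value follows the last ':' on a line containing the label "Task code:".

```cpp
#include <cstddef>
#include <string>

// Strip leading and trailing spaces, tabs, and line-ending characters.
std::string trim(const std::string &s)
{
    const char *ws = " \t\r\n";
    const std::size_t begin = s.find_first_not_of(ws);
    if (begin == std::string::npos)
        return "";                       // the string is all whitespace
    const std::size_t end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Return the text after the last ':' when the line carries the label,
// or an empty string when the label is absent.
std::string extractTaskNumber(const std::string &line)
{
    if (line.find("Task code:") == std::string::npos)
        return "";
    const std::size_t pos = line.rfind(':');
    return trim(line.substr(pos + 1));
}
```

Taking everything after the colon and trimming it works whether or not a space follows the colon, which a fixed offset would not.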
3. Reference
hello kandy