How to use QFile in Qt to identify the encoding of a txt file: UTF-8 or ANSI

Article directory

  • Foreword
  • 1. Determine the encoding format of the txt file
  • 2. Solution
      • Specific use
  • 3. Reference

Foreword

My previous article, about reading an ANSI-encoded txt file with Qt's QFile class and getting garbled characters in a QTextEdit control, covered how to read ANSI and UTF-8 files. That approach has a precondition: we must already know the encoding of the txt file. What can we do when the program does not know it? This article explains how to determine the encoding of a file.

1. Determine the encoding format of the txt file

Qt provides the QTextCodec class, which is dedicated to converting strings between different encodings.
Its two most important member functions are fromUnicode and toUnicode.
With toUnicode you can convert bytes in another encoding (such as GBK) into a Unicode QString, and with fromUnicode you can convert a QString back into bytes in another encoding (such as GBK).
The basic principle is to take a byte stream of a certain length and judge the encoding from the bytes it contains. For a text file, first read the first two or three bytes to see whether they form a BOM. Qt 5 interprets 8-bit string data as UTF-8 by default when it builds a QString, so if you open a GBK file and still decode it as UTF-8, the text is not recognized and is displayed as garbled characters.
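The BOM check described above can be sketched in standard C++ without any Qt types. This is only an illustration of the principle, not the Qt API: the `Bom` enum and the `detectBom` function are hypothetical names introduced here, and the input is assumed to be the first few bytes of the file.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch (not part of Qt).
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Inspects the first bytes of a file and reports which byte order mark,
// if any, they form. `head` is assumed to hold the start of the file.
Bom detectBom(const std::string& head)
{
    // Read bytes as unsigned so that values above 0x7F compare correctly.
    auto byteAt = [&head](std::size_t i) {
        return static_cast<unsigned char>(head[i]);
    };

    // UTF-8 BOM: EF BB BF
    if (head.size() >= 3 && byteAt(0) == 0xEF && byteAt(1) == 0xBB && byteAt(2) == 0xBF)
        return Bom::Utf8;
    // UTF-16 little-endian BOM: FF FE
    if (head.size() >= 2 && byteAt(0) == 0xFF && byteAt(1) == 0xFE)
        return Bom::Utf16LE;
    // UTF-16 big-endian BOM: FE FF
    if (head.size() >= 2 && byteAt(0) == 0xFE && byteAt(1) == 0xFF)
        return Bom::Utf16BE;
    return Bom::None;
}
```

Checking the size before indexing matters because a file may be shorter than a BOM. A file without a BOM still has to be classified by looking at its content.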
When reading a txt file, the encoding often cannot be obtained in advance, and using the bytes directly may produce garbled characters. They need to be converted to Unicode (the encoding QString uses internally) before use. Many encodings exist, but GBK and UTF-8 are the ones mainly used in practice, so you can try each in turn: if invalid characters appear during a conversion, the file is assumed not to be in that encoding.
QString GetCorrectUnicode(const QByteArray &ba) { QTextCodec::ConverterState state; QTextCodec *codec = QText
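To make the "try a conversion and look for invalid characters" idea concrete, here is a rough standard C++ sketch of the kind of check a UTF-8 decoder performs. It is not how QTextCodec is implemented; `isValidUtf8` is a hypothetical helper, and it treats a sequence cut off at the end of the buffer as invalid.

```cpp
#include <cstddef>
#include <string>

// Returns true if `bytes` is well-formed UTF-8 from start to end.
bool isValidUtf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        if (c < 0x80) len = 1;                       // ASCII
        else if (c >= 0xC2 && c <= 0xDF) len = 2;    // 2-byte lead
        else if (c >= 0xE0 && c <= 0xEF) len = 3;    // 3-byte lead
        else if (c >= 0xF0 && c <= 0xF4) len = 4;    // 4-byte lead
        else return false;                           // stray continuation or illegal lead

        if (i + len > n) return false;               // sequence runs past the buffer

        // Every following byte must have the form 10xxxxxx.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
        }

        // Reject overlong forms, surrogates and values above U+10FFFF.
        if (len >= 3) {
            const unsigned char c1 = static_cast<unsigned char>(bytes[i + 1]);
            if (c == 0xE0 && c1 < 0xA0) return false;
            if (c == 0xED && c1 > 0x9F) return false;
            if (c == 0xF0 && c1 < 0x90) return false;
            if (c == 0xF4 && c1 > 0x8F) return false;
        }
        i += len;
    }
    return true;
}
```

For example, the GBK bytes D6 D0 (the character 中) fail this check, while the UTF-8 bytes E4 B8 AD for the same character pass. The test is a heuristic: pure ASCII text passes it whatever the intended encoding, and some GBK byte pairs can happen to look like valid UTF-8.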

2. Solution

#include <QByteArray>
#include <QFile>
#include <QString>
#include <QTextCodec> // Qt 5; in Qt 6 this class is part of the Core5Compat module

enum class EncodingFormat : int
{
    ANSI = 0, // GBK
    UTF16LE,
    UTF16BE,
    UTF8,
    UTF8BOM,
};

EncodingFormat ProjectWin::FileCharacterEncoding(const QString &fileName)
{
    // Assume UTF-8 by default (also returned if the file cannot be opened)
    EncodingFormat code = EncodingFormat::UTF8;

    QFile file(fileName);
    if (file.open(QIODevice::ReadOnly))
    {
        // Read a sample from the start of the file: the first 3 bytes cover the BOM check,
        // the rest gives the UTF-8 trial conversion more than one character to look at
        const QByteArray buffer = file.read(4096);
        file.close();

        // Guard the indexes: the file may hold fewer than 3 bytes
        const quint8 sz1st = buffer.size() > 0 ? static_cast<quint8>(buffer.at(0)) : 0;
        const quint8 sz2nd = buffer.size() > 1 ? static_cast<quint8>(buffer.at(1)) : 0;
        const quint8 sz3rd = buffer.size() > 2 ? static_cast<quint8>(buffer.at(2)) : 0;

        if (sz1st == 0xFF && sz2nd == 0xFE)
        {
            code = EncodingFormat::UTF16LE;
        }
        else if (sz1st == 0xFE && sz2nd == 0xFF)
        {
            code = EncodingFormat::UTF16BE;
        }
        else if (sz1st == 0xEF && sz2nd == 0xBB && sz3rd == 0xBF)
        {
            code = EncodingFormat::UTF8BOM;
        }
        else
        {
            // Try to convert as UTF-8; if the number of invalid characters is greater than 0,
            // treat the file as ANSI
            QTextCodec::ConverterState cs;
            QTextCodec *tc = QTextCodec::codecForName("utf-8");
            tc->toUnicode(buffer.constData(), buffer.size(), &cs);
            code = (cs.invalidChars > 0) ? EncodingFormat::ANSI : EncodingFormat::UTF8;
        }
    }

    return code;
}
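A sample of fixed size can end in the middle of a multi-byte UTF-8 character. Such trailing bytes are incomplete rather than invalid, and QTextCodec::ConverterState has a remainingChars field alongside invalidChars for that reason. The sketch below, in standard C++ rather than Qt, shows one way to measure a cut-off tail so it can be trimmed before validating a sample; `incompleteUtf8TailLength` is a hypothetical name introduced for this illustration.

```cpp
#include <cstddef>
#include <string>

// Returns how many bytes at the end of `sample` form the beginning of a
// UTF-8 multi-byte sequence that was cut off by the end of the buffer.
// Returns 0 if the sample ends on a character boundary, or if the trailing
// bytes cannot be the start of a sequence at all.
std::size_t incompleteUtf8TailLength(const std::string& sample)
{
    const std::size_t n = sample.size();
    // A UTF-8 sequence has at most 4 bytes, so an incomplete one has at most 3.
    for (std::size_t back = 1; back <= 3 && back <= n; ++back) {
        const unsigned char c = static_cast<unsigned char>(sample[n - back]);
        if ((c & 0xC0) == 0x80)
            continue; // continuation byte: keep walking back to find the lead byte

        std::size_t expected = 0;
        if ((c & 0xE0) == 0xC0) expected = 2;      // 110xxxxx
        else if ((c & 0xF0) == 0xE0) expected = 3; // 1110xxxx
        else if ((c & 0xF8) == 0xF0) expected = 4; // 11110xxx
        else return 0;                             // ASCII or illegal lead byte

        // The sequence is incomplete only if it needs more bytes than we have.
        return (expected > back) ? back : 0;
    }
    return 0;
}
```

A caller could drop that many bytes from the end of the sample and run the validity check on the remainder, so that a truncated last character does not push a UTF-8 file into the ANSI branch.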

Specific use

#include <QDebug>
#include <QRegularExpression>
#include <QStringList>
#include <QTextStream>

void ProjectWin::readParaFile(QString filePath)
{
    // Check the control before using it
    if (!m_paraText) {
        qDebug() << "m_paraText is null!";
        return;
    }
    m_paraText->clear();

    // filePath = "E:/work/ImageManageSys/utf8/0000_051623_162252_05_004_00001_00008_00.txt";
    EncodingFormat code = FileCharacterEncoding(filePath);
    qDebug() << "code=" << static_cast<int>(code);

    QFile file(filePath);
    if (!file.open(QIODevice::ReadOnly)) {
        qDebug() << file.errorString();
        return;
    }

    // Read an ANSI (GBK) encoded file
    if (code == EncodingFormat::ANSI)
    {
        QTextCodec::setCodecForLocale(QTextCodec::codecForName("gb2312")); // Chinese transcoding statement
        QByteArray arr = file.readAll();
        arr.replace('\x0B', '\r'); // vertical tab -> carriage return
        const QString temStr = QString::fromLocal8Bit(arr); // QByteArray to QString
        m_paraText->append(temStr);

        // Read the task number from the decoded text. After readAll() the file is already
        // at its end, so a second loop over file.readLine() would never run.
        const QStringList lines = temStr.split(QRegularExpression("[\\r\\n]+"));
        for (const QString &line : lines)
        {
            if (line.contains(u8"Task code:", Qt::CaseSensitive))
            {
                int pos = line.lastIndexOf(":");
                QString taskNum = line.right(line.size() - pos - 2);
                taskNum = taskNum.trimmed();
                m_taskNumSet.insert(taskNum);
                break;
            }
        }
    }

    // Read a UTF-8 encoded file (with or without a BOM)
    if (code == EncodingFormat::UTF8 || code == EncodingFormat::UTF8BOM)
    {
        QTextStream in(&file);
        in.setCodec("UTF-8"); // Set the encoding to UTF-8 (Qt 5 API)

        while (!in.atEnd()) {
            QString line = in.readLine();

            if (line.contains(u8"Task code:", Qt::CaseSensitive))
            {
                int pos = line.lastIndexOf(":");
                QString taskNum = line.right(line.size() - pos - 2);
                taskNum = taskNum.trimmed();
                m_taskNumSet.insert(taskNum);
                // break;
            }

            m_paraText->append(line); // Add to QTextEdit control
        }
    }

    file.close();
}
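The task number extraction above assumes exactly one character between the colon and the value. As a small illustration in standard C++ (not Qt), the following sketch takes everything after the last colon and trims the whitespace, so it also works when the space is missing or repeated. `valueAfterLastColon` is a hypothetical helper, and it only looks for the ASCII colon.

```cpp
#include <cstddef>
#include <string>

// Returns the text after the last ':' in `line`, with leading and trailing
// whitespace removed. Returns an empty string if there is no colon or
// nothing but whitespace after it.
std::string valueAfterLastColon(const std::string& line)
{
    const std::size_t colon = line.rfind(':');
    if (colon == std::string::npos)
        return std::string();

    const char* whitespace = " \t\r\n";
    // First non-whitespace character after the colon
    const std::size_t first = line.find_first_not_of(whitespace, colon + 1);
    if (first == std::string::npos)
        return std::string();

    // Last non-whitespace character in the line (at or after `first`)
    const std::size_t last = line.find_last_not_of(whitespace);
    return line.substr(first, last - first + 1);
}
```

With QString the same effect would come from taking the part after the colon and calling trimmed() on it.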

3. Reference

hello kandy