Qt: garbled characters in a QTextEdit control when reading an ANSI-encoded txt file with QFile

Foreword

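Reading an ANSI-encoded text file with Qt's QFile class and displaying it in a QTextEdit control can produce garbled characters. The cause is an encoding mismatch. QFile itself only reads raw bytes; the decoding happens when those bytes are turned into a QString, for example by QTextStream, which in Qt 5 uses the system's locale codec unless told otherwise. "ANSI" is not a single encoding either: on Windows it means the system's active code page (Windows-1252 on Western European systems, GBK on Simplified Chinese ones), so the codec used for decoding must match the code page the file was actually saved in.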
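To make the mismatch concrete, here is a minimal sketch in plain C++ (no Qt required) of what happens when UTF-8 bytes are decoded with a single-byte codec. The helper name `decodeAsLatin1` is hypothetical, and Latin-1 stands in for Windows-1252 here, which is an assumption that only holds for bytes 0xA0 to 0xFF, where the two encodings agree.

```cpp
#include <string>

// Hypothetical helper: treat every input byte as one Latin-1 character and
// re-encode the result as UTF-8. This mimics a decoder that was told the
// wrong, single-byte encoding for multi-byte text.
std::string decodeAsLatin1(const std::string &bytes)
{
    std::string utf8;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            // ASCII is identical in Latin-1 and UTF-8
            utf8 += static_cast<char>(b);
        } else {
            // U+0080..U+00FF need two bytes in UTF-8
            utf8 += static_cast<char>(0xC0 | (b >> 6));
            utf8 += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return utf8;
}
```

The Chinese character "中" is stored in UTF-8 as the three bytes E4 B8 AD. Passed through this function, those bytes become three separate characters ("ä", "¸" and a soft hyphen), which is exactly the kind of garbled text seen in the control. Plain ASCII text is unaffected, which is why the problem often goes unnoticed until non-English content appears.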

To read an ANSI-encoded text file correctly and display it in a QTextEdit control, you can use the QTextCodec class to specify the encoding explicitly (this is the Qt 5 API; in Qt 6, QTextCodec lives in the Core5Compat module). The following sample reads an ANSI-encoded text file and shows its text in a QTextEdit control:

#include <QApplication>
#include <QFile>
#include <QTextStream>
#include <QTextCodec>
#include <QTextEdit>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);

    // Create a QTextEdit control
    QTextEdit textEdit;

    // Read the ANSI-encoded text file
    QFile file("path_to_your_file.txt");
    if (file.open(QIODevice::ReadOnly | QIODevice::Text))
    {
        // Create a QTextCodec object using ANSI encoding
        QTextCodec *codec = QTextCodec::codecForName("Windows-1252");

        // Create a QTextStream object using the specified encoding
        QTextStream stream(&file);
        stream.setCodec(codec);

        // Read the text file content
        QString content = stream.readAll();

        // Display the text in the QTextEdit control
        textEdit.setPlainText(content);

        // Close the file
        file.close();
    }

    // Show the window
    textEdit.show();

    return app.exec();
}

1. Still unable to solve the garbled problem

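The sample above creates a QTextCodec for Windows-1252, one of the code pages commonly called "ANSI", and applies it to the QTextStream to decode the file content; the decoded text is then set as the text of the QTextEdit control. For a file containing Chinese text this still produces garbled output, presumably because the file was not saved in Windows-1252. Switching the codec to UTF-8, as in the function below, does not solve the problem either: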

void ProjectWin::readParaFile(QString filePath)
{
    // Check the pointer before using it
    if (!m_paraText) {
        qDebug() << "m_paraText is null!";
        return;
    }
    m_paraText->clear();

    QFile file(filePath);
    if (!file.open(QIODevice::ReadOnly)) {
        qDebug() << file.errorString();
        return; // nothing to read if the file cannot be opened
    }

    // Alternative: read the whole file at once and decode it as UTF-8
    // QByteArray fileData = file.readAll();
    // QString decodedText = QTextCodec::codecForName("UTF-8")->toUnicode(fileData);
    // m_paraText->setPlainText(decodedText);

    QTextStream in(&file);
    in.setCodec("UTF-8"); // set the encoding to UTF-8
    // in.setCodec("GBK"); // set the encoding to GBK

    while (!in.atEnd()) {
        QString line = in.readLine();

        // Collect the task number that follows "Task code:"
        if (line.contains(u8"Task code:", Qt::CaseSensitive))
        {
            int pos = line.lastIndexOf(":");
            QString taskNum = line.right(line.size() - pos - 2);
            taskNum = taskNum.trimmed();
            m_taskNumSet.insert(taskNum);
        }

        // m_paraText->setFont(QFont("Microsoft YaHei")); // use the "Microsoft YaHei" font

        m_paraText->append(line); // add to the QTextEdit control
    }

    file.close();
}

2. Solutions

1. Method 1: Use the fromLocal8Bit() function of QString

void ProjectWin::readParaFile(QString filePath)
{
    // Check the pointer before using it
    if (!m_paraText) {
        qDebug() << "m_paraText is null!";
        return;
    }
    m_paraText->clear();

    // filePath = "E:/work/ImageManageSys/utf8/0000_051623_162252_05_004_00001_00008_00.txt"; // hard-coded path, for local testing only
    QFile file(filePath);
    if (!file.open(QIODevice::ReadOnly)) {
        qDebug() << file.errorString();
        return;
    }

    // Make GB2312 the locale codec so that fromLocal8Bit() decodes Chinese ANSI text
    QTextCodec::setCodecForLocale(QTextCodec::codecForName("gb2312"));

    // Read the whole file once, then decode the bytes with the locale codec
    QByteArray arr = file.readAll();
    file.close();
    arr.replace('\x0B', '\x0D'); // replace vertical tabs with carriage returns
    const QString temStr = QString::fromLocal8Bit(arr); // QByteArray to QString on Windows
    m_paraText->append(temStr);

    // Read the task number. After readAll() the file is already at its end,
    // so search the decoded lines instead of reading the file a second time.
    const QStringList lines = temStr.split('\n');
    for (const QString &line : lines) {
        if (line.contains(u8"Task code:", Qt::CaseSensitive)) {
            int pos = line.lastIndexOf(":");
            QString taskNum = line.right(line.size() - pos - 2);
            taskNum = taskNum.trimmed();
            m_taskNumSet.insert(taskNum);
            break;
        }
    }
}

This displays txt files in ANSI encoding correctly. However, if you read a UTF-8 txt file the same way, it comes out garbled instead. Remember this! It is important enough to say three times.

2. Read files in utf-8 encoding format

void ProjectWin::readParaFile(QString filePath)
{
    // Check the pointer before using it
    if (!m_paraText) {
        qDebug() << "m_paraText is null!";
        return;
    }
    m_paraText->clear();

    QFile file(filePath);
    if (!file.open(QIODevice::ReadOnly)) {
        qDebug() << file.errorString();
        return; // nothing to read if the file cannot be opened
    }

    // Alternative: read the whole file at once and decode it as UTF-8
    // QByteArray fileData = file.readAll();
    // QString decodedText = QTextCodec::codecForName("UTF-8")->toUnicode(fileData);
    // m_paraText->setPlainText(decodedText);

    QTextStream in(&file);
    in.setCodec("UTF-8"); // set the encoding to UTF-8
    // in.setCodec("GBK"); // set the encoding to GBK

    while (!in.atEnd()) {
        QString line = in.readLine();

        // Collect the task number that follows "Task code:"
        if (line.contains(u8"Task code:", Qt::CaseSensitive))
        {
            int pos = line.lastIndexOf(":");
            QString taskNum = line.right(line.size() - pos - 2);
            taskNum = taskNum.trimmed();
            m_taskNumSet.insert(taskNum);
        }

        // m_paraText->setFont(QFont("Microsoft YaHei")); // use the "Microsoft YaHei" font

        m_paraText->append(line); // add to the QTextEdit control
    }

    file.close();
}
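Since Method 1 only suits ANSI files and Method 2 only suits UTF-8 files, one possible way to choose between them is to check whether the raw bytes are well-formed UTF-8 before decoding. The following is a minimal sketch in plain C++ (no Qt required); the function name `looksLikeUtf8` is hypothetical and not part of Qt. In a Qt program the bytes would come from `QFile::readAll()`, and the result could decide between `QString::fromUtf8()` and `QString::fromLocal8Bit()`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true when `bytes` is well-formed UTF-8.
bool looksLikeUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);

        // Number of continuation bytes expected after the lead byte
        std::size_t extra = 0;
        if (lead < 0x80) extra = 0;
        else if (lead >= 0xC2 && lead <= 0xDF) extra = 1;
        else if (lead >= 0xE0 && lead <= 0xEF) extra = 2;
        else if (lead >= 0xF0 && lead <= 0xF4) extra = 3;
        else return false; // stray continuation byte or invalid lead byte

        // The sequence must not run past the end of the data
        if (extra > n - i - 1) return false;

        // Every continuation byte must have the form 10xxxxxx
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;
        }

        // Reject overlong forms, UTF-16 surrogates and values above U+10FFFF
        if (extra >= 2) {
            const unsigned char second = static_cast<unsigned char>(bytes[i + 1]);
            if ((lead == 0xE0 && second < 0xA0) || (lead == 0xED && second > 0x9F) ||
                (lead == 0xF0 && second < 0x90) || (lead == 0xF4 && second > 0x8F))
                return false;
        }

        i += extra + 1;
    }
    return true;
}
```

For example, the UTF-8 bytes of "中" (E4 B8 AD) pass the check, while its GBK bytes (D6 D0) fail it. This is a heuristic rather than a guarantee: pure ASCII text passes regardless of how it was saved, and a short ANSI byte sequence can occasionally happen to be valid UTF-8.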

Summary

In Qt, the QFile class itself does not provide a method to directly obtain the file encoding format. File encoding is an attribute of file content, and QFile only provides functions for reading and writing files. To get the encoding format of the file, you can use other libraries or methods to analyze the file content and infer the encoding format.

A common method is to use a third-party library such as uchardet or libmagic to detect the file’s encoding. These libraries can analyze the characteristics of the file content to guess its possible encoding format. You can read file contents into memory and use these libraries for encoding detection.

Here is an example that uses the uchardet library to detect a file's encoding:

1. Integrate the uchardet library into the Qt project, either with CMake or by compiling the library manually.
2. Include the uchardet header and link against the library in the project's code.
3. Use QFile to read the file content and pass it to uchardet to detect the encoding.

#include <QByteArray>
#include <QDebug>
#include <QFile>
#include <QString>
#include <uchardet/uchardet.h>

QString detectFileEncoding(const QString &filePath)
{
    QFile file(filePath);
    if (!file.open(QIODevice::ReadOnly)) {
        qDebug() << "Failed to open file:" << file.errorString();
        return QString();
    }

    QByteArray data = file.readAll();
    file.close();

    // Feed the raw bytes to uchardet and ask for its best guess
    uchardet_t ud = uchardet_new();
    uchardet_handle_data(ud, data.constData(), data.size());
    uchardet_data_end(ud);
    const char *encoding = uchardet_get_charset(ud); // empty string if detection failed
    QString detectedEncoding = QString::fromLatin1(encoding); // copy before freeing the detector
    uchardet_delete(ud);

    return detectedEncoding;
}

void ProjectWin::readFileAndDetectEncoding(const QString &filePath)
{
    QString detectedEncoding = detectFileEncoding(filePath);
    qDebug() << "Detected Encoding:" << detectedEncoding;

    QFile file(filePath);
    if (!file.open(QIODevice::ReadOnly)) {
        qDebug() << "Failed to open file:" << file.errorString();
        return;
    }

    QTextStream in(&file);
    if (!detectedEncoding.isEmpty()) {
        // Set the detected encoding; keep the default codec if detection failed
        in.setCodec(detectedEncoding.toLatin1().constData());
    }

    QString content = in.readAll();

    file.close();

    // Process the read file content...
}

The detectFileEncoding() function in the above code uses the uchardet library to detect the encoding of the file and returns the detected encoding name as a string. The readFileAndDetectEncoding() function then uses that name to set the codec of the QTextStream so that the file content is decoded correctly.

Please note that uchardet is an independent third-party library and does not come with Qt. You need to import and properly configure building and linking of this library in your project.

Also note that automatic encoding detection is not always accurate; files with unusual or mixed encodings in particular may be misjudged. It is therefore best to know the exact encoding of the file in advance, or to agree on one encoding for the files beforehand.
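One cheap check that can run before any statistical detection is to look for a byte-order mark (BOM) at the start of the file, since a BOM identifies a Unicode encoding unambiguously. Below is a minimal sketch in plain C++ (no Qt required); the function name `encodingFromBom` is hypothetical, and UTF-32 is left out to keep the example short. Files without a BOM, which includes ANSI files and many UTF-8 files, yield an empty string and still need another method.

```cpp
#include <string>

// Hypothetical helper: returns the encoding named by a leading byte-order
// mark, or an empty string when the data does not start with one.
std::string encodingFromBom(const std::string &bytes)
{
    // True when `bytes` begins with `prefix`
    auto startsWith = [&bytes](const std::string &prefix) {
        return bytes.compare(0, prefix.size(), prefix) == 0;
    };

    if (startsWith("\xEF\xBB\xBF")) return "UTF-8";
    if (startsWith("\xFF\xFE")) return "UTF-16LE";
    if (startsWith("\xFE\xFF")) return "UTF-16BE";
    return ""; // no BOM: the encoding has to be determined some other way
}
```

In a Qt program the first bytes could come from `QFile::peek()` or `QFile::readAll()`, and a non-empty result could be passed to `QTextCodec::codecForName()`.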