Generation, encoding and decoding output of Huffman tree

In the data structure computer experiment, the teacher asked to construct a Huffman tree Huffman code, output the Huffman code form of the string, and restore the Huffman code to a string based on a piece of ciphertext.

The overall idea is as follows

Header file

#include <stdio.h>
#include <string.h>
#define N 50
#define M 2*N-1

Main function part

int main()
{
int i;
char str[N] = "\0"; //Character array initialization\0
printf("Please enter a string:");
gets_s(str);
HTNode ht[N] = { 0 };
HCode hcd[N] = { 0 };
int count[127] = { 0 };
total(str,count);
int n = 0;//Count the types of letters that appear
for (i = 0; i < 127; i + + )//The characters and weights are stored in the Huffman node
{
if (count[i] != 0)
{
ht[n].data = (char)i;
ht[n].weight = count[i];
n + + ;
}
}
InitHT(ht, n);//Initialization
CreateHT(ht, n);//Create Huffman tree
CreateHCode(ht, hcd, n);//Create Huffman coding
DispHcode(ht, hcd, n);//Output Huffman code
EnCryption(str, ht, hcd, n);//Output ciphertext of characters
DeCryption(ht, n);
return 1;
}

Count characters and the number of times they appear

We use the occurrence of characters to represent the weight of the Huffman tree. Because it is a character, we can define an array, and then convert the character into an integer, which is the ASCII code corresponding to the character. Add one to the corresponding position in the array. The code is as follows:

void toal(char str[],int count[])
{
int i = 0;
int j = 0;
int num;
int n = strlen(str); //Character length
for (i = 0; i < n; i + + )//Count the frequency of occurrence of each character
{
num = str[i];
count[num - 0] + + ;
}
for (i = 0; i < 127; i + + ) //Output characters and frequency of occurrence of characters
{
if (count[i] != 0)
{
printf("Character: %c, weight: %d\
", i, count[i]);
}
}
return;
}

Node type of Huffman tree

To construct a Huffman tree, you must first determine the node type of the Huffman tree. The content of the node type of the Huffman tree includes node value, weight parent node, left child node, and right child node; so our node type design is as follows:

typedef struct
{
char data;
int weight;
int parent;
int lchild;
int rchild;
}HTNode;

After that, we need to define a node to represent and store the Huffman code. First, we need an array to store the Huffman code of the node, and we also need a value to record which part is the Huffman code of a certain character, so Our node type is defined as follows:

typedef struct
{
char cd[N];
int start;
}HCode;

Huffman tree construction algorithm

Theorem “For a Huffman tree with n leaf nodes, there are 2n-1 in total”

First define an ht[] array of Huffman node type to store the Huffman tree. Algorithm idea: n leaf nodes only have data and weight domain values. First initialize the Huffman tree, and first set the parent, lchild, and rchild domain values of 2n-1 nodes to the initial value -1.

The code is implemented as follows:

void InitHT(HTNode ht[], int n)
{
int i;

for (i = 0; i < 2 * n - 1; i + + )
{
ht[i].parent = -1;
ht[i].lchild = -1;
ht[i].rchild = -1;
}
printf("\
Initialize Huffman tree:\
");
printf("-----------------------\
");
printf("Subscript data weight lchild rchild parent\
");
printf("-----------------------\
");
for (i = 0; i < n; i + + )
{
printf("%d %c %d %d %d %d\
", i, ht[i].data, ht[i].weight, ht[i].lchild, ht[i].rchild, ht [i].parent);
}
printf("-----------------------\
");
}

Then we process each non-leaf node ht[i] (in ht[i]~ht[2n-2]); find the smallest root node (parent value is the initial value -1) from the non-leaf nodes The two nodes of , ht[lnode] and ht[rnode], regard them as the left and right subtrees of ht[i], set the parent nodes of ht[lnode] and ht[rnode] as ht[i], and ht[i].weight = ht[lnode].weight + ht[rnode].weight. Repeat these steps until n-1 non-leaf nodes are processed, and then write the code according to the idea. The code is as follows:

void CreateHT(HTNode ht[], int n)
{
int i, k, lnode, rnode;
int min1, min2;
for (i = n; i < 2 * n - 1; i + + )
{
min1 = min2 = 32767;
lnode = rnode = -1;
for (k = 0; k <= i - 1; k + + )//Find the smallest and second smallest nodes
{
if (ht[k].parent == -1)
{
if (ht[k].weight < min1)
{
min2 = min1;
rnode = lnode;
printf(" %d ", rnode);
printf(" %d ", lnode);
min1 = ht[k].weight;
lnode = k;
}
else if (ht[k].weight < min2)
{
min2 = ht[k].weight;
rnode = k;
}
}
}
ht[lnode].parent = i;
ht[rnode].parent = i;
ht[i].weight = ht[lnode].weight + ht[rnode].weight;
ht[i].lchild = lnode;
ht[i].rchild = rnode;
}
printf("\
 Display the created Huffman tree:\
");
printf("-----------------------\
");
printf("Subscript data weight lchild rchild parent\
");
printf("-----------------------\
");
for (i = 0; i < 2 * n - 1; i + + )
{
printf("%d %c %d %d %d %d\
", i, ht[i].data, ht[i].weight, ht[i].lchild, ht[i].rchild, ht [i].parent);
}
printf("-----------------------\
");
}

Huffman coding

We stipulate that the left branch in the Huffman tree is 0 and the right branch is 1. The sequence of 0s and 1s corresponding to the branches from the root node to the node where each character is located is our Huffman code. Set the start field value of the corresponding Huffman code hcd[i] to the initial value n, and find the parent node ht[f]. If the current node is the child node of the parent node, then in hcd[i] Add ‘0’ to the cd array, add ‘1’ to the right child, and set start-1. Then perform the same operation on the parent nodes until there are no parent nodes, so start points to the first character of the Huffman code. The code is implemented as follows:

void CreateHCode(HTNode ht[], HCode hcd[], int n)
{
int i, f, c;
HCode hc;
for (i = 0; i < n; i + + )
{
hc.start = n;
c = i;
f = ht[i].parent;
while (f != -1)
{
if (ht[f].lchild == c)//The left child stores character 0
hc.cd[hc.start--] = '0';
else
hc.cd[hc.start--] = '1'; //The right child stores character 1
c = f;
f = ht[f].parent;
}
hc.start + + ;
hcd[i] = hc;
}
}

Huffman coding output

If you want to know what Huffman coding looks like, you need to output it. When constructing the Huffman code, we know that the start value is the first character of the Huffman code, and the value order of the Huffman code array and the Huffman tree array corresponds one to one, so the code is as follows:

void DispHcode(HTNode ht[], HCode hcd[], int n)
{
int i, k;
int sum = 0, m = 0, j;
printf("\
 Output Huffman encoding:\
");
printf("-----------------------\
");
for (i = 0; i < n; i + + )
{
j = 0;
printf(" %c:\t", ht[i].data);
for (k = hcd[i].start; k <= n; k + + )
{
printf("%c", hcd[i].cd[k]);
j + + ;
}
printf("\
");
m + = ht[i].weight;
sum + = ht[i].weight * j;
}
printf("-----------------------\
");
printf("\
average length=%g\
", 1.0 * sum / m);
}

Output ciphertext

As mentioned earlier, Huffman coding has a one-to-one correspondence with Huffman trees, so we only need to find the character that is equal to the data value range of the Huffman tree, and then we can find the corresponding Huffman code, and finally output it. The code is implemented as follows:

void EnCryption(char c[], HTNode ht[], HCode hcd[], int n)
{
int i = 0;
int j = 0;
int k = 0;
int num = strlen(c);
printf("The entered characters are: %s\
", c);
printf("The number of characters is %d\
", num);
printf("The ciphertext corresponding to this text is:\
");
for (i = 0; i < num; i + + )//Traverse the array
{
for (j = 0; j < n; j + + )//Traverse the Huffman tree
{
if (c[i] == ht[j].data) //Find the Huffman code corresponding to the character
{
for (k = hcd[j].start; k <= n; k + + )
{
printf("%c", hcd[j].cd[k]);
}
}
}
}
printf("\
Encryption completed\
");
}

Decrypt based on Huffman coding

When constructing the Huffman code, it was said that 0 goes to the left and 1 goes to the right, so every time we start looking from the root node, go left when we encounter 0, and go right when we encounter 1, until we find a leaf node. According to the previous theorem, we can know that the following node is 2n-1, but we should pay attention here, because our array subscript starts from 0, so the position of the root node we initialize every time is 2n-2. code show as below:

void DeCryption(HTNode ht[], int n)
{
printf("Please enter a string of 0 and 1:\
");
char c1[N] = "\0"; //Character array initialization\0
gets_s(c1);
int length1 = strlen(c1);
printf("\
The ciphertext length is %d", length1);
\t
int i = 0;
int node;
HTNode* tmp = ht;
node = 2 * n - 2; //Record the position of the node, initialized to the root node position
printf("\
Decrypt the input ciphertext, the decrypted result is:\
");
while (i < length1)
{
\t\t 
while (tmp[node].lchild != -1 || tmp[node].rchild != -1) //Find the leaf node, find the leaf node and exit the loop
{
if (c1[i] == '0')//According to the definition of Huffman coding, 0 left and right 1
{
node = tmp[node].lchild;

}
else
{
node = tmp[node].rchild;
}
i + + ;
}
printf("%c", tmp[node].data);
node = 2 * n - 2;//Initialize the position of the record node to the root node again
}
printf("\
Decryption completed, welcome back next time!!!");
}

This is my first time learning the Huffman tree, so I didn’t learn it well, so please forgive me! ! !

The knowledge points of the article match the official knowledge files, and you can further learn related knowledge. Algorithm skill tree Home page Overview 57622 people are learning the system