Computer algorithm analysis and design (12)—Greedy algorithm (optimal loading problem and Huffman coding problem)

Article directory

  • 1. Optimal loading problem
    • 1.1 Problem statement
    • 1.2 Code writing
  • 2. Huffman coding
    • 2.1 Overview of Huffman coding
    • 2.2 Prefix code
    • 2.3 Problem description
    • 2.4 Code ideas
    • 2.5 Code writing
    • 2.6 Average code length
    • 2.7 Examples
      • 2.7.1 Question 1
      • 2.7.2 Question 2

1. Optimal loading problem

1.1 Problem statement

1. There is a batch of containers to be loaded onto a ship with carrying capacity c. The weight of container i (1 ≤ i ≤ n) is w_i. The optimal loading problem asks to load as many containers as possible onto the ship; there is no limit on loading volume.

2. Greedy selection strategy: load the lightest container first.

3. Algorithm idea: divide the loading process into a sequence of selections. Each step loads 1 container, choosing the lightest of the remaining ones. This continues until all containers are loaded or the ship cannot hold any more.

1.2 Code writing

The time complexity of the algorithm is O(n log n), dominated by the sort.

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    int n; // number of containers
    int c; // maximum carrying capacity of the ship
    cout << "Enter the number of containers and the maximum carrying capacity of the ship" << endl;
    cin >> n >> c;

    cout << "Enter the weight of each container" << endl;
    vector<int> w(n); // weights of the containers
    for (int i = 0; i < n; i++) {
        cin >> w[i];
    }

    sort(w.begin(), w.end()); // sort the container weights in ascending order

    int total = 0; // accumulated weight loaded so far
    int count = 0; // number of containers loaded
    for (int i = 0; i < n; i++) {
        total += w[i];
        if (total <= c) {
            count++;
        } else {
            break;
        }
    }

    cout << "The maximum number of containers that can be loaded is " << count << endl;
    return 0;
}

2. Huffman coding

2.1 Overview of Huffman coding

1. Huffman coding is a classic application in telecommunications and a very effective method for data file compression; its compression rate is usually between 20% and 90%. In telecommunication services, binary codes are usually used to represent letters or other characters, and sequences of such codes represent character sequences.

2. For example, suppose the message to be sent is 'ABACCDA'. It uses only four characters, which can be distinguished by two-bit binary codes. If A, B, C, and D are encoded as 00, 01, 10, and 11 respectively, the message becomes '00010010101100' (14 bits in total), and the decoder can restore the original message by decoding two bits at a time.

2.2 Prefix code

1. Prefix code definition: each character is assigned a 0/1 string as its code, with the requirement that no character's code is a prefix of any other character's code. Such an encoding is called a prefix code.

Character   a  b    c    d    e     f
Encoding 1  0  101  100  111  1101  1100
Encoding 2  0  1    00   01   10    11

It is easy to see that if we use encoding 2, decoding becomes ambiguous: for example, 00110 can be decoded as aabba but also as cfa. Since the decoding result is not unique, encoding 2 is not feasible.
With a prefix code (no character's code is a prefix of another's), the decoding result is unique and the encoding method is feasible.

2.3 Problem description

1. If we want the optimal encoding method, how do we compare different prefix codes? We compare the total length of the encoded binary string: the shorter the total length, the better the encoding. The total length depends on the frequencies of the characters to be encoded. Suppose we are given the characters, their frequencies, and two different prefix codes, as shown in the table below.

Character              a     b     c     d    e     f
Frequency (thousands)  45    13    12    16   9     5
Prefix code 1          0     101   100   111  1101  1100
Prefix code 2          1100  1101  111   100  101   0

Calculation shows that with prefix code 1 the encoded binary string has total length 224000 bits, while with prefix code 2 it has total length 348000 bits. Clearly, prefix code 1 is better than prefix code 2.

2. Greedy strategy: characters that appear more frequently receive shorter codes, and characters that appear less frequently receive longer codes. A binary tree is used to encode and decode.

2.4 Code ideas

1. We are given the encoding character set C and, for each character c in C, its frequency f(c). A prefix-code scheme for C corresponds to a binary tree T. The depth of character c in tree T is denoted d_T(c). The average code length of this encoding is defined as:

B(T) = Σ_{c ∈ C} f(c) · d_T(c)

2. The Huffman algorithm constructs the binary tree T representing an optimal prefix code in a bottom-up manner. It starts from n leaf nodes and performs n−1 merge operations, after which the required tree T is produced. In a Huffman tree, each character c of the encoded character set has frequency f(c). A priority queue Q keyed on f is used for the greedy selection: it efficiently identifies the two trees of minimum frequency that are to be merged next. When two minimum-frequency trees are merged, a new tree is created whose frequency is the sum of the two merged frequencies, and the new tree is inserted back into Q.

2.5 Code writing

With a priority queue the algorithm runs in O(n log n); the straightforward implementation below finds the two minima by linear scan instead, which takes O(n²) overall.

#include <iostream>
using namespace std;

#define MaxNode 200
#define MaxBit 100

// Convention: a left branch is 0, a right branch is 1
typedef struct
{
    double weight; // weight (frequency)
    int parent;    // index of the parent node
    int lchild;    // index of the left child
    int rchild;    // index of the right child
    char value;    // the character represented by the node
} hufnode;

typedef struct
{
    int start;       // start pointer: where this character's code begins in the bit array
    int bit[MaxBit]; // stores the Huffman code
} hufbit;

hufnode hujd[MaxNode];
hufbit hb[MaxBit];

// Initialize nodes: parent, lchild, rchild = -1; weight = 0
void initNode()
{
    for (int i = 0; i < MaxNode; i++)
    {
        hujd[i].lchild = -1;
        hujd[i].parent = -1;
        hujd[i].rchild = -1;
        hujd[i].weight = 0;
    }
}

// Build the Huffman tree
void creathuf(int n)
{
    int i;
    cout << "Please enter the character and weight of each node" << endl;
    for (i = 0; i < n; i++)
    {
        cin >> hujd[i].value >> hujd[i].weight;
    }

    // m1 and m2 hold the two smallest unprocessed weights;
    // x1 and x2 hold the indices of the corresponding nodes.
    double m1, m2;
    int x1, x2;
    for (int i = 0; i < n - 1; i++)
    {
        m1 = m2 = 1e18; // sentinel larger than any possible weight
        x1 = x2 = 0;

        for (int j = 0; j < n + i; j++)
        {
            // Found a new minimum-weight node without a parent
            if (hujd[j].weight < m1 && hujd[j].parent == -1)
            {
                m2 = m1;
                x2 = x1;
                m1 = hujd[j].weight;
                x1 = j;
            }
            // Found a new second-smallest weight node without a parent
            else if (hujd[j].weight < m2 && hujd[j].parent == -1)
            {
                m2 = hujd[j].weight;
                x2 = j;
            }
        }

        // Create a new parent node whose left child is x1 and right child is x2
        hujd[n + i].weight = m1 + m2;
        hujd[n + i].lchild = x1;
        hujd[n + i].rchild = x2;
        hujd[x1].parent = n + i;
        hujd[x2].parent = n + i;
    }
}

// Encoding: walk from each leaf up to the root, recording branch bits
void tshuff(int n)
{
    // p holds the index of the current node's parent; c holds the current node's index
    int p, c;

    for (int i = 0; i < n; i++)
    {
        p = hujd[i].parent;  // parent index of leaf i
        c = i;               // current node index
        hb[i].start = n - 1; // fill the code from the rightmost position (a code has at most n-1 bits)
        while (p != -1)      // loop until the root is reached
        {
            // If the current node is the left child of its parent,
            // record a 0 and move the start position one place to the left
            if (hujd[p].lchild == c)
            {
                hb[i].bit[hb[i].start] = 0;
                hb[i].start--;
            }
            // If the current node is the right child of its parent,
            // record a 1 and move the start position one place to the left
            if (hujd[p].rchild == c)
            {
                hb[i].bit[hb[i].start] = 1;
                hb[i].start--;
            }
            // Move one level up: the parent becomes the current node.
            // The loop ends when p == -1, i.e. the root has been passed.
            c = p;
            p = hujd[p].parent;
        }
    }
}

void output(int n)
{
    for (int i = 0; i < n; i++)
    {
        cout << hujd[i].value << ":";
        for (int j = hb[i].start + 1; j < n; j++)
        {
            cout << hb[i].bit[j];
        }
        cout << endl;
    }
}

int main()
{
    int n;
    cout << "Please enter the number of characters:" << endl;
    cin >> n;
    initNode();
    creathuf(n);
    tshuff(n);
    cout << "The encoding is:" << endl;
    output(n);
    return 0;
}

2.6 Average code length

2.7 Examples

When calculating the average code length in the two questions below, I count depth starting from 1.
If depth is counted starting from 0 instead, the answer to question 1 is 1065 and the answer to question 2 is 119.
Some textbooks define depth starting from 0 and some from 1; check which convention applies in your specific case.

2.7.1 Question 1


2.7.2 Question 2