Generating a 3×3 image processing matrix on an FPGA using FIFOs

In my earlier series of articles on median filtering, I described one way to generate the 3×3 matrix template:

Median filter design based on FPGA (3): Matrix template generation module design (CSDN blog)

In hindsight, that method is rather crude. It stores the entire image in ROM/RAM in advance and then fetches each point of the 3×3 matrix one at a time by addressing. It has several shortcomings. For example, extending it to a 5×5 matrix means fetching 25 points per window instead of 9, so the amount of code grows quickly, and it is not suitable for processing a high-speed data stream.

This article fills that gap. It introduces the matrix generation method that is now mainstream: sliding the data through FIFOs to obtain the 3×3 matrix for each pixel of an image.

1. Overall structure and function description

First, let’s take a look at the structure of data processing:

As shown in the figure above, the external image data Data_In is first stored in FIFO_0. Data may only be read out of FIFO_0 once it holds one full row, and each word read out is written into FIFO_1 at the same time. Likewise, FIFO_1 may only be read once it holds one full row. The whole operation looks like data sliding from one FIFO to the next. Now consider how the matrix is formed. The first matrix takes three beats. In the first beat, the input data c7 arrives together with c4 and c1, which are read from FIFO_0 and FIFO_1 at the same time; the second and third beats deliver the next two columns in the same way. Special handling of the edges is not considered here. The whole module works as a pipeline: image pixels go in one at a time, and the 3×3 matrix of every pixel comes out.
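The sliding can be checked with a small software model. The sketch below is Python, not RTL, and is only an illustration: the function name `column_stream` and the use of `collections.deque` as row buffers are assumptions of this example. Two queues stand in for FIFO_0 and FIFO_1, and edges are ignored, as in the description above.

```python
from collections import deque

def column_stream(pixels, width):
    """Yield (top, middle, bottom) columns once both row buffers are full.

    fifo0 delays the input by one row and fifo1 by two rows, mimicking
    FIFO_0 and FIFO_1. Border handling is deliberately left out.
    """
    fifo0 = deque()  # holds the most recent row
    fifo1 = deque()  # holds the row before that
    for p in pixels:
        if len(fifo0) == width:          # FIFO_0 holds one full row
            mid = fifo0.popleft()        # read FIFO_0 ...
            if len(fifo1) == width:      # FIFO_1 holds one full row
                top = fifo1.popleft()
                yield (top, mid, p)      # e.g. (c1, c4, c7)
            fifo1.append(mid)            # ... and push the word into FIFO_1
        fifo0.append(p)

# 3 rows of 3 pixels, numbered 1..9 like c1..c9
cols = list(column_stream(range(1, 10), width=3))
print(cols)  # [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
```

Each tuple is one column of the matrix, so three consecutive tuples form one 3×3 matrix.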

So forming a 3×3 matrix requires two FIFOs, and by the same reasoning a 5×5 matrix requires four: one FIFO for each buffered row.
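As a rough sizing aid, the rule "one FIFO per buffered row" can be written down in a few lines of Python. This is only a sketch: the helper name `line_buffer_budget` is hypothetical, the row width of 640 is an arbitrary example value, and the result is the minimum payload storage, not what a particular FIFO IP will actually allocate.

```python
def line_buffer_budget(k, row_width, pixel_bits=8):
    """Return (number of row FIFOs, minimum total storage in bits) for a k x k window."""
    if k < 1 or k % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    n_fifos = k - 1                      # one FIFO per buffered row
    bits = n_fifos * row_width * pixel_bits
    return n_fifos, bits

print(line_buffer_budget(3, 640))  # (2, 10240)
print(line_buffer_budget(5, 640))  # (4, 20480)
```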

2. Port descriptions and constants

The composition of the port is as follows:

Input signals: the module has four inputs. The clock clk and the reset srst are global signals. Data_In is the external input data, and Data_En is the flag that marks Data_In as valid. Thanks to this valid signal, the module can handle non-continuous input data.

Output signals: the output ports are also simple. They are the matrix data c0_0 to c2_2 and the matrix valid signal Mat3x3_data_en. The two digits are the row and column index counted from 0, so c1_2 is the point at row 1, column 2 of the 3×3 matrix (second row, third column when counting from 1).

The actual port and parameter code is as follows:

module Matrix_3x3#(
    parameter HS_num     = 5,   // columns per image row
              VS_num     = 6,   // rows per image
              Data_width = 8    // pixel bit width
)(
    input                       clk,
    input                       srst,
    input                       Data_En,
    input      [Data_width-1:0] Data_In,

    output reg                  Mat3x3_data_en,
    output reg [Data_width-1:0] c0_0, c0_1, c0_2,
    output reg [Data_width-1:0] c1_0, c1_1, c1_2,
    output reg [Data_width-1:0] c2_0, c2_1, c2_2
);

The parameters are HS_num, the total number of columns in an image; VS_num, the total number of rows in an image; and Data_width, the bit width of the pixel data.

3. Logical function design

The logic design is explained in three parts: FIFO write control, FIFO read control, and matrix search and generation. Before going through them, let's look at the signals used internally:

reg  [3:0]            state;                  // FIFO read/write state
wire                  Fifo_wr         [1:0];
wire [Data_width-1:0] Fifo_datain     [1:0];
reg                   Fifo_rd         [1:0];
wire [Data_width-1:0] Fifo_dataout    [1:0];
wire                  Fifo_full       [1:0];
wire                  Fifo_empty      [1:0];
wire [9:0]            Fifo_data_count [1:0];

reg  [3:0]            St;
reg  [7:0]            hs_cnt;                 // pixel counter for one row
reg  [7:0]            vs_cnt;                 // row counter
reg  [1:0]            delay_cnt;

The signals from state to Fifo_data_count are the internal signals of the FIFO read and write control part, while St to delay_cnt are the signals used for matrix generation. They are introduced one by one below.
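One detail in these declarations is worth checking whenever HS_num or VS_num is changed: hs_cnt and vs_cnt are declared 8 bits wide, so they can only count from 0 to 255, even though the FIFO itself is 512 deep. A quick Python check of how many bits a counter needs (the helper names `counter_bits` and `fits` are hypothetical, introduced only for this example):

```python
def counter_bits(n_values):
    """Bits needed for a counter that runs from 0 to n_values - 1."""
    return max(1, (n_values - 1).bit_length())

def fits(n_values, reg_width):
    """True if a reg of reg_width bits can count n_values distinct values."""
    return counter_bits(n_values) <= reg_width

print(counter_bits(5), counter_bits(6))   # 3 3
print(fits(256, 8), fits(512, 8))         # True False
```

With the default parameters (5 columns, 6 rows) the 8-bit counters are more than wide enough; for rows longer than 256 pixels they would have to be widened.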

1) FIFO write control

First come the write control and the instantiation of the FIFOs. The code is as follows:

assign Fifo_wr[0]     = Data_En;
assign Fifo_datain[0] = Data_In;

assign Fifo_wr[1]     = Fifo_rd[0];
assign Fifo_datain[1] = Fifo_dataout[0];

genvar i;
generate
    for (i = 0; i <= 1; i = i + 1) begin : row_fifo
        fifo_8x512 row (
            .clk        (clk),                 // input wire clk
            .srst       (srst),                // input wire srst
            .din        (Fifo_datain[i]),      // input wire [7 : 0] din
            .wr_en      (Fifo_wr[i]),          // input wire wr_en
            .rd_en      (Fifo_rd[i]),          // input wire rd_en
            .dout       (Fifo_dataout[i]),     // output wire [7 : 0] dout
            .full       (Fifo_full[i]),        // output wire full
            .empty      (Fifo_empty[i]),       // output wire empty
            .data_count (Fifo_data_count[i])   // output wire [9 : 0] data_count
        );
    end
endgenerate

On the write side, the first FIFO writes one word for every valid input word, and the second FIFO writes one word every time the first FIFO reads one. The two FIFOs are instantiated in a generate loop. Both use FWFT (first-word fall-through) mode, in which the word at the head of the FIFO is already present on dout before rd_en is asserted. Each FIFO is 8×512, where 8 is the data bit width and 512 is the depth; the depth only needs to be larger than the number of pixels in one image row.
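The FWFT behaviour and the cascade wiring can be illustrated with a small behavioural model. This is a Python sketch under simplifying assumptions, not a model of any vendor IP: the class name `FwftFifo` and its `clock` method are invented for this example, and flag or data_count latencies that a real FIFO core may have are not modelled.

```python
from collections import deque

class FwftFifo:
    """Behavioural model of a synchronous first-word fall-through FIFO."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = deque()

    @property
    def dout(self):
        # FWFT: the head word is visible without asserting rd_en
        return self.mem[0] if self.mem else None

    @property
    def data_count(self):
        return len(self.mem)

    def clock(self, wr_en=False, din=None, rd_en=False):
        """Advance one clock edge; full/empty are judged before the edge."""
        was_full = len(self.mem) >= self.depth
        if rd_en and self.mem:
            self.mem.popleft()
        if wr_en and not was_full:
            self.mem.append(din)

fifo0, fifo1 = FwftFifo(512), FwftFifo(512)
for px in (10, 20, 30):
    fifo0.clock(wr_en=True, din=px)
print(fifo0.dout, fifo0.data_count)   # 10 3

# Cascade: Fifo_wr[1] = Fifo_rd[0], Fifo_datain[1] = Fifo_dataout[0]
word = fifo0.dout
fifo0.clock(rd_en=True)
fifo1.clock(wr_en=True, din=word)
print(fifo0.dout, fifo1.dout)         # 20 10
```

Because the head word is already on dout, the same read strobe that pops FIFO_0 can be used directly as the write strobe of FIFO_1, which is what the two assign statements for index 1 express.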

2) FIFO read control

The read control logic is more complex, so it is split into 4 always blocks. First we need to know the current processing progress, that is, which pixel is currently being processed, because edge pixels have to be handled differently.

// Count of processed rows and columns
always @(posedge clk) begin
    if (srst) begin
        hs_cnt <= 'h0;
        vs_cnt <= 'h0;
    end
    else if (Fifo_rd[0] && (hs_cnt == HS_num-1) && (vs_cnt == VS_num-1)) begin
        hs_cnt <= 'h0;
        vs_cnt <= 'h0;
    end
    else if (Fifo_rd[0] && (hs_cnt == HS_num-1)) begin
        hs_cnt <= 'h0;
        vs_cnt <= vs_cnt + 'h1;
    end
    else if (Fifo_rd[0]) begin
        hs_cnt <= hs_cnt + 'h1;
        vs_cnt <= vs_cnt;
    end
    else begin
        hs_cnt <= hs_cnt;
        vs_cnt <= vs_cnt;
    end
end

The pixel position is determined by counting rows and columns. Each time the first FIFO reads a word, hs_cnt is incremented by one. When a full row has been counted, hs_cnt returns to zero and the row counter vs_cnt is incremented by one. When the whole image has been traversed, both counters return to zero, and counting starts over if more data follows.

Then the read status is derived from the row position.
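The counting rule can be mirrored in a few lines of Python to see where the wrap points fall. This is a sketch of the next-state logic only: the function name `next_counts` is hypothetical and reset is not modelled.

```python
def next_counts(hs_cnt, vs_cnt, fifo_rd0, hs_num=5, vs_num=6):
    """Next (hs_cnt, vs_cnt) after one clock edge, mirroring the always block."""
    if not fifo_rd0:
        return hs_cnt, vs_cnt            # no read: hold
    if hs_cnt == hs_num - 1:
        if vs_cnt == vs_num - 1:
            return 0, 0                  # end of image: wrap both counters
        return 0, vs_cnt + 1             # end of row
    return hs_cnt + 1, vs_cnt

pos = (0, 0)
trace = []
for _ in range(30):                      # 5 x 6 reads = one image
    trace.append(pos)
    pos = next_counts(*pos, fifo_rd0=True)
print(trace[4], trace[5], trace[29], pos)  # (4, 0) (0, 1) (4, 5) (0, 0)
```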

// FIFO read status flag
always @(posedge clk) begin
    if (srst)
        state <= 4'd0;
    else if (vs_cnt < VS_num-1)
        state <= 4'd0;
    else if ((vs_cnt == VS_num-1) && (delay_cnt == 2'd2)) // last row, after the delay
        state <= 4'd1;
    else
        state <= 4'd2;
end
// The delay leaves processing time at the row end, so that the next row of data
// is not read too fast and discarded.
always @(posedge clk) begin
    if (srst)
        delay_cnt <= 2'd0;
    else if (vs_cnt < VS_num-1)
        delay_cnt <= 2'd0;
    else if (vs_cnt == VS_num-1) begin // count up to 2 once the last row is reached
        if (delay_cnt < 2'd2)
            delay_cnt <= delay_cnt + 1'b1;
        else
            delay_cnt <= 2'd2;
    end
    else
        delay_cnt <= 2'd0;
end

The first always block above sets the read status flag in sequential logic:

The first state covers the normal rows. Every row except the last counts as a normal row and is marked state = 4'd0.

The second state covers the special last row and is marked state = 4'd1.

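The third state is the stop state, in which reading of both FIFOs is disabled. It is marked state = 4'd2. In the code above it is the else branch, so it is the value state takes while delay_cnt is still counting up on the last row.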

The row delay in the second always block is not needed for normal rows. For the last row, it prevents data from being read too fast, before the state transition has completed, which would cause data to be discarded. This only applies to individual images.
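How state and delay_cnt interact is easiest to see cycle by cycle. The Python sketch below mirrors the two always blocks (reset omitted); the function name `step` is hypothetical, and both next values are computed from the old register values, as non-blocking assignments do.

```python
def step(state, delay_cnt, vs_cnt, vs_num=6):
    """One clock edge of the state and delay_cnt registers (reset not modelled)."""
    last = vs_num - 1
    # next value of state
    if vs_cnt < last:
        nxt_state = 0
    elif vs_cnt == last and delay_cnt == 2:
        nxt_state = 1
    else:
        nxt_state = 2
    # next value of delay_cnt
    if vs_cnt < last:
        nxt_delay = 0
    elif vs_cnt == last:
        nxt_delay = min(delay_cnt + 1, 2)
    else:
        nxt_delay = 0
    return nxt_state, nxt_delay

state, delay = 0, 0
seq = []
for vs_cnt in (4, 5, 5, 5, 5, 5):        # vs_cnt reaches the last row and stays there
    state, delay = step(state, delay, vs_cnt)
    seq.append((state, delay))
print(seq)  # [(0, 0), (2, 1), (2, 2), (1, 2), (1, 2), (1, 2)]
```

In this model, reads are blocked (state 2) for two clock cycles after vs_cnt reaches the last row, and only then does state 1 start draining the FIFOs.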

// FIFO read control
always @(*)
    case (state)
        4'd0: begin
            Fifo_rd[0] = (Fifo_data_count[0] >= HS_num) && Data_En && !Fifo_full[1];
            Fifo_rd[1] = (Fifo_data_count[1] >= HS_num) && Data_En;
        end
        4'd1: begin
            Fifo_rd[0] = !Fifo_empty[0] && !Fifo_full[1];
            Fifo_rd[1] = !Fifo_empty[0] && (Fifo_data_count[1] >= HS_num);
        end
        4'd2: begin
            Fifo_rd[0] = 1'b0;
            Fifo_rd[1] = 1'b0;
        end
        default: begin
            Fifo_rd[0] = 1'b0;
            Fifo_rd[1] = 1'b0;
        end
    endcase

Finally comes the control of the FIFO read signals Fifo_rd[*], which is implemented in combinational logic and switched according to the state signal described above:

state = 4'd0. The first FIFO is read when the second FIFO is not full, the first FIFO holds at least one row of data (Fifo_data_count is the counting signal provided by the FIFO IP), and input data is arriving. The second FIFO is read when it holds at least one row of data and input data is arriving.

state = 4'd1. The first FIFO is read when it is not empty and the second FIFO is not full. The second FIFO is read when it holds at least one row of data and the first FIFO is not empty.

state = 4'd2. Reading of both FIFOs stops.
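The three cases can be restated as a small truth function, which is handy for checking a particular combination of flags by hand. This Python sketch mirrors the combinational case statement; the function name `fifo_rd` and its argument names are assumptions of the example.

```python
def fifo_rd(state, count0, count1, data_en, full1, empty0, hs_num=5):
    """Return (Fifo_rd[0], Fifo_rd[1]) for the given inputs."""
    if state == 0:       # normal rows: reads are paced by incoming pixels
        rd0 = count0 >= hs_num and data_en and not full1
        rd1 = count1 >= hs_num and data_en
    elif state == 1:     # last row: drain the FIFOs without new input
        rd0 = (not empty0) and not full1
        rd1 = (not empty0) and count1 >= hs_num
    else:                # state 2 and default: no reads
        rd0 = rd1 = False
    return bool(rd0), bool(rd1)

print(fifo_rd(0, 5, 0, True, False, False))   # (True, False)
print(fifo_rd(0, 5, 5, True, False, False))   # (True, True)
print(fifo_rd(0, 5, 5, False, False, False))  # (False, False)
print(fifo_rd(1, 3, 5, False, False, False))  # (True, True)
print(fifo_rd(2, 5, 5, True, False, False))   # (False, False)
```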

3) Matrix search and generation

Matrix generation is the most complicated part. Here is the complete code first:

// Matrix search
always @(posedge clk) begin
    if (srst) begin
        St <= 'h0;
        Mat3x3_data_en <= 1'b0;
        c0_0 <= 'h0; c0_1 <= 'h0; c0_2 <= 'h0;
        c1_0 <= 'h0; c1_1 <= 'h0; c1_2 <= 'h0;
        c2_0 <= 'h0; c2_1 <= 'h0; c2_2 <= 'h0;
    end
    else case (St) // "else" added so that the case cannot override the reset values
        4'd0: begin
            Mat3x3_data_en <= 1'b0;
            if (Fifo_rd[0] && (hs_cnt == 'h0)) begin
                St <= 4'd1;
                if (vs_cnt == 'h0) begin // process the first row
                    c0_0 <= Fifo_dataout[0];
                    c1_0 <= Fifo_dataout[0];
                    c2_0 <= Data_In;

                    c0_1 <= Fifo_dataout[0];
                    c1_1 <= Fifo_dataout[0];
                    c2_1 <= Data_In;
                end
                else if ((vs_cnt < VS_num-'h1) && (vs_cnt > 'h0)) begin // process the middle rows
                    c0_0 <= Fifo_dataout[1];
                    c1_0 <= Fifo_dataout[0];
                    c2_0 <= Data_In;

                    c0_1 <= Fifo_dataout[1];
                    c1_1 <= Fifo_dataout[0];
                    c2_1 <= Data_In;
                end
                else if (vs_cnt == VS_num-'h1) begin // process the last row
                    c0_0 <= Fifo_dataout[1];
                    c1_0 <= Fifo_dataout[0];
                    c2_0 <= Fifo_dataout[0];

                    c0_1 <= Fifo_dataout[1];
                    c1_1 <= Fifo_dataout[0];
                    c2_1 <= Fifo_dataout[0];
                end
            end
        end
        4'd1: begin
            if (Fifo_rd[0] && (hs_cnt == 'h1)) begin
                St <= 4'd2;
                if (vs_cnt == 'h0) begin
                    c0_2 <= Fifo_dataout[0];
                    c1_2 <= Fifo_dataout[0];
                    c2_2 <= Data_In;
                    Mat3x3_data_en <= 1'b1;
                end
                else if ((vs_cnt < VS_num-'h1) && (vs_cnt > 'h0)) begin
                    c0_2 <= Fifo_dataout[1];
                    c1_2 <= Fifo_dataout[0];
                    c2_2 <= Data_In;
                    Mat3x3_data_en <= 1'b1;
                end
                else if (vs_cnt == VS_num-'h1) begin
                    c0_2 <= Fifo_dataout[1];
                    c1_2 <= Fifo_dataout[0];
                    c2_2 <= Fifo_dataout[0];
                    Mat3x3_data_en <= 1'b1;
                end
            end
        end
        4'd2: begin
            if (Fifo_rd[0] && (hs_cnt == HS_num-1))
                St <= 4'd3;
            else
                St <= St;
            if (Fifo_rd[0]) begin
                Mat3x3_data_en <= 1'b1;
                c0_0 <= c0_1; c0_1 <= c0_2;
                c1_0 <= c1_1; c1_1 <= c1_2;
                c2_0 <= c2_1; c2_1 <= c2_2;
                if (vs_cnt == 'h0) begin
                    c0_2 <= Fifo_dataout[0];
                    c1_2 <= Fifo_dataout[0];
                    c2_2 <= Data_In;
                end
                else if ((vs_cnt < VS_num-'h1) && (vs_cnt > 'h0)) begin
                    c0_2 <= Fifo_dataout[1];
                    c1_2 <= Fifo_dataout[0];
                    c2_2 <= Data_In;
                end
                else if (vs_cnt == VS_num-'h1) begin
                    c0_2 <= Fifo_dataout[1];
                    c1_2 <= Fifo_dataout[0];
                    c2_2 <= Fifo_dataout[0];
                end
            end
            else
                Mat3x3_data_en <= 1'b0;
        end
        4'd3: begin
            St <= 4'd0;
            Mat3x3_data_en <= 1'b1;
            c0_0 <= c0_1; c0_1 <= c0_2; c0_2 <= c0_2;
            c1_0 <= c1_1; c1_1 <= c1_2; c1_2 <= c1_2;
            c2_0 <= c2_1; c2_1 <= c2_2; c2_2 <= c2_2;
        end
        default: begin
            St <= 'h0;
            Mat3x3_data_en <= 1'b0;
            c0_0 <= 'h0; c0_1 <= 'h0; c0_2 <= 'h0;
            c1_0 <= 'h0; c1_1 <= 'h0; c1_2 <= 'h0;
            c2_0 <= 'h0; c2_1 <= 'h0; c2_2 <= 'h0; // was a duplicate of the c1 line
        end
    endcase
end

The design has to distinguish between the normal part and the edge part of the image. The image edge is filled by replication here, that is, each missing point is copied from the nearest location:

As you can see, generating a 3×3 matrix requires filling as in the red box, and generating a 5×5 matrix requires filling as in the green box. The principle is the same.
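Replication filling is easy to reproduce in software, which helps when checking simulation results at the borders. The Python sketch below builds the padded image explicitly; the function name `replicate_pad` is hypothetical. Note that the RTL in this article never stores a padded image: it gets the same effect by choosing which source feeds each matrix element.

```python
def replicate_pad(image, pad):
    """Pad a 2-D list by copying the nearest edge pixel `pad` times on each side."""
    rows = [[row[0]] * pad + list(row) + [row[-1]] * pad for row in image]
    top = [rows[0][:] for _ in range(pad)]
    bottom = [rows[-1][:] for _ in range(pad)]
    return top + rows + bottom

img = [[1, 2, 3],
       [4, 5, 6]]
padded = replicate_pad(img, 1)      # pad=1 for 3x3, pad=2 for 5x5
for row in padded:
    print(row)
# [1, 1, 2, 3, 3]
# [1, 1, 2, 3, 3]
# [4, 4, 5, 6, 6]
# [4, 4, 5, 6, 6]
```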

Now let's look at how each state of the St state machine comes about. The first thing to understand is that the processing is organized around the pixels of each row. What does that mean?

Specifically, the matrix is processed row by row, so one row forms one cycle of the state machine. A row is divided into four states: edge matrix generation for the first point at the start of the row, matrix generation for the second point, matrix generation for the points in the middle of the row, and matrix generation for the last point at the end of the row. That is where the four states of the St state machine come from.

Just as the points within a row are divided, the rows themselves are also distinguished: the first row needs special handling, the last row needs special handling, and the rows in between are handled normally. That is the idea behind the if-else branches inside each St state.

Let's go through the states one by one. In state St = 4'd0:

4'd0: begin
    Mat3x3_data_en <= 1'b0;
    if (Fifo_rd[0] && (hs_cnt == 'h0)) begin
        St <= 4'd1;
        if (vs_cnt == 'h0) begin // process the first row
            c0_0 <= Fifo_dataout[0];
            c1_0 <= Fifo_dataout[0];
            c2_0 <= Data_In;

            c0_1 <= Fifo_dataout[0];
            c1_1 <= Fifo_dataout[0];
            c2_1 <= Data_In;
        end
        else if ((vs_cnt < VS_num-'h1) && (vs_cnt > 'h0)) begin // process the middle rows
            c0_0 <= Fifo_dataout[1];
            c1_0 <= Fifo_dataout[0];
            c2_0 <= Data_In;

            c0_1 <= Fifo_dataout[1];
            c1_1 <= Fifo_dataout[0];
            c2_1 <= Data_In;
        end
        else if (vs_cnt == VS_num-'h1) begin // process the last row
            c0_0 <= Fifo_dataout[1];
            c1_0 <= Fifo_dataout[0];
            c2_0 <= Fifo_dataout[0];

            c0_1 <= Fifo_dataout[1];
            c1_1 <= Fifo_dataout[0];
            c2_1 <= Fifo_dataout[0];
        end
    end
end

This state starts the row processing cycle and handles the first point of each row. Take the first point of the first row: at this moment the first FIFO holds the first row of data, and the first pixel of the second row arrives on Data_In. This yields the first and second columns of the 3×3 matrix of the first pixel, because for the first point of a row the first column of the matrix is a copy of the second column. The middle rows and the last row are handled in the same way. After processing, the state machine jumps to St = 4'd1.

In state St = 4'd1:

4'd1: begin
    if (Fifo_rd[0] && (hs_cnt == 'h1)) begin
        St <= 4'd2;
        if (vs_cnt == 'h0) begin // first row
            c0_2 <= Fifo_dataout[0];
            c1_2 <= Fifo_dataout[0];
            c2_2 <= Data_In;
            Mat3x3_data_en <= 1'b1;
        end
        else if ((vs_cnt < VS_num-'h1) && (vs_cnt > 'h0)) begin // middle rows
            c0_2 <= Fifo_dataout[1];
            c1_2 <= Fifo_dataout[0];
            c2_2 <= Data_In;
            Mat3x3_data_en <= 1'b1;
        end
        else if (vs_cnt == VS_num-'h1) begin // last row
            c0_2 <= Fifo_dataout[1];
            c1_2 <= Fifo_dataout[0];
            c2_2 <= Fifo_dataout[0];
            Mat3x3_data_en <= 1'b1;
        end
    end
end

This state handles the second pixel of each row. The three if-else branches handle the first row, the middle rows, and the last row. Take the second point of a middle row: the outputs of the first and second FIFO, Fifo_dataout[*], and the external input data Data_In together form the third column of the matrix, c0_2, c1_2, c2_2. The first matrix is now complete, so Mat3x3_data_en is raised to 1'b1. After processing, the state machine jumps to St = 4'd2.

In state St = 4'd2:

4'd2: begin
    if (Fifo_rd[0] && (hs_cnt == HS_num-1))
        St <= 4'd3;
    else
        St <= St;
    if (Fifo_rd[0]) begin
        Mat3x3_data_en <= 1'b1;
        c0_0 <= c0_1; c0_1 <= c0_2;
        c1_0 <= c1_1; c1_1 <= c1_2;
        c2_0 <= c2_1; c2_1 <= c2_2;
        if (vs_cnt == 'h0) begin // first row
            c0_2 <= Fifo_dataout[0];
            c1_2 <= Fifo_dataout[0];
            c2_2 <= Data_In;
        end
        else if ((vs_cnt < VS_num-'h1) && (vs_cnt > 'h0)) begin // middle rows
            c0_2 <= Fifo_dataout[1];
            c1_2 <= Fifo_dataout[0];
            c2_2 <= Data_In;
        end
        else if (vs_cnt == VS_num-'h1) begin // last row
            c0_2 <= Fifo_dataout[1];
            c1_2 <= Fifo_dataout[0];
            c2_2 <= Fifo_dataout[0];
        end
    end
    else
        Mat3x3_data_en <= 1'b0;
end

This state handles the pixels in the middle of the row, again split into first row, middle rows, and last row. The FIFO outputs and the external input data are stored in the third column of the matrix, the first column takes the data of the second column, and the second column takes the data of the third column. This is again a sliding motion: every slide produces a new 3×3 matrix, so efficiency is greatly improved. When the last pixel of the row has been read (hs_cnt == HS_num-1), the state machine jumps to St = 4'd3.

In state St = 4'd3:

4'd3: begin
    St <= 4'd0;
    Mat3x3_data_en <= 1'b1;
    c0_0 <= c0_1; c0_1 <= c0_2; c0_2 <= c0_2;
    c1_0 <= c1_1; c1_1 <= c1_2; c1_2 <= c1_2;
    c2_0 <= c2_1; c2_1 <= c2_2; c2_2 <= c2_2;
end

This state only needs the first column to take the second column and the second column to take the third. For the last pixel of a row, the third column is a copy of the second, so it stays unchanged, and the result is the 3×3 matrix of the last pixel of the row. One row is now finished, so the state machine jumps back to St = 4'd0 and starts a new row processing cycle.
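The four states can be summarized as one pass over the columns of a row. The Python sketch below is an abstraction of the St state machine, not a cycle-accurate model: the function name `row_windows` is hypothetical, each column is treated as a single token, and the timing of Fifo_rd[0] is ignored.

```python
def row_windows(columns):
    """Turn one row of columns into 3-column windows, left to right.

    Mirrors St = 0..3: load column 0 twice, add column 1, slide through
    the middle of the row, then repeat the last column.
    """
    if len(columns) < 2:
        raise ValueError("need at least two columns")
    out = []
    left = mid = columns[0]              # St 0: first and second matrix column
    right = columns[1]                   # St 1: third column, first matrix is valid
    out.append((left, mid, right))
    for col in columns[2:]:              # St 2: shift left, load the new column
        left, mid, right = mid, right, col
        out.append((left, mid, right))
    left, mid = mid, right               # St 3: shift, third column unchanged
    out.append((left, mid, right))
    return out

wins = row_windows(["A", "B", "C", "D", "E"])
print(["".join(w) for w in wins])  # ['AAB', 'ABC', 'BCD', 'CDE', 'DEE']
```

A row of five columns yields five windows, one per pixel, with the first and last windows showing the replicated edge column.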

4. Simulation test

The simulation stimulus is simple. The input image data is a non-continuous stream of the values 1 to 30, and the image is set to 6 rows and 5 columns (6 × 5 = 30 pixels, matching VS_num = 6 and HS_num = 5).

The test results are as follows.

It can be seen that for the first pixel point 1 in the first row, its matrix result is:

c0_0=8'd1; c0_1=8'd1; c0_2=8'd2;
c1_0=8'd1; c1_1=8'd1; c1_2=8'd2;
c2_0=8'd6; c2_1=8'd6; c2_2=8'd7;

Written out as a matrix, this is:

1 1 2
1 1 2
6 6 7
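The same values can be cross-checked against a software reference. The Python sketch below assumes the test image is 6 rows by 5 columns holding 1 to 30 in raster order (which is what the values above imply) and uses index clamping for the replicated borders; the function name `window3x3` is hypothetical.

```python
def window3x3(image, r, c):
    """3x3 neighbourhood of image[r][c] with replicated (clamped) borders."""
    h, w = len(image), len(image[0])

    def clamp(v, hi):
        return max(0, min(v, hi))

    return [[image[clamp(r + dr, h - 1)][clamp(c + dc, w - 1)]
             for dc in (-1, 0, 1)]
            for dr in (-1, 0, 1)]

HS_NUM, VS_NUM = 5, 6
image = [[r * HS_NUM + c + 1 for c in range(HS_NUM)] for r in range(VS_NUM)]

first = window3x3(image, 0, 0)
print(first)  # [[1, 1, 2], [1, 1, 2], [6, 6, 7]]
```

The rows of the result correspond to c0_x, c1_x and c2_x, so the first window agrees with the simulated matrix for pixel 1.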

5. Summary

This article is a further supplement on 3×3 matrix generation. The main points covered are the use of FIFOs, the use of state machines, and loop instantiation with generate. The key to implementing a 3×3 matrix lies in the logic design of the FIFO read control and the image edge handling. Once you have mastered these two points, you can basically write a 3×3 matrix generator yourself. The commonly used 5×5 and 7×7 matrices work on the same principle.

The design idea was inspired by more experienced engineers, while the code and the handling of details are my own. There is certainly room for improvement. I hope readers with ideas will suggest improvements so that we can all make progress together!
