7.2.1 Verilog parallel FIR filter design

An FIR (Finite Impulse Response) filter is a filter whose unit impulse response has finite length. It is also known as a non-recursive filter.

An FIR filter can provide strictly linear phase, and because its unit impulse response is finite it is always stable. It is widely used in digital communications, image processing and other fields.

FIR filter principle
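The direct-form structure of an FIR filter is a tapped delay line: the input sample passes through a chain of N - 1 delay registers, each tap is multiplied by its coefficient h(k), and the N products are summed to form the output.

An FIR filter computes the convolution of the input signal x(n) with the unit impulse response h(n). For a filter with N coefficients the expression is:

$$y(n) = \sum_{k=0}^{N-1} h(k)\,x(n-k)$$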

FIR filters have the following characteristics:

(1) The response is a finite-length sequence.
(2) The system function converges for |z| > 0 and all of its poles lie at z = 0, so the filter is a causal system.
(3) It is non-recursive in structure and has no feedback from output to input.
(4) The phase response is linear when the coefficients h(n) are symmetric, as they are in this design.
(5) With linear phase, every frequency component of the input is delayed by the same amount of time, so the relative phase relationship between the components is preserved.
(6) Convolution in the time domain equals multiplication in the frequency domain, so the filter multiplies each frequency component of the input spectrum by its own gain. Some components are retained and others are attenuated, which produces the filtering effect.
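
Properties (4) and (6) can be checked numerically. The following is a minimal sketch in Python, used here only as a convenient modelling language and not part of the original design. It assumes the eight integer coefficients that appear as coe[0..7] in the Verilog module later in this section, mirrors them into a 16-tap symmetric response, and divides by the scaling factor 2048 mentioned in the appendix. The names fir and response are made up for this example.

```python
import cmath
import math

FS = 50e6  # sampling frequency from the design parameters

# Half of the symmetric coefficient set, copied from coe[0..7] in the
# Verilog module; the full 16-tap response mirrors it.
HALF = [11, 31, 63, 104, 152, 198, 235, 255]
H = [c / 2048 for c in HALF + HALF[::-1]]  # 2048 = coefficient scaling factor


def fir(x, h):
    """Direct-form FIR: y(n) = sum_k h(k) * x(n-k), with x(n) = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def response(f, h=H, fs=FS):
    """Complex frequency response H(f) = sum_k h(k) * exp(-j*2*pi*f*k/fs)."""
    w = 2 * math.pi * f / fs
    return sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))


if __name__ == "__main__":
    # A unit impulse at the input reproduces the coefficients at the output.
    impulse = [1.0] + [0.0] * 19
    print("impulse response equals h:", fir(impulse, H)[:16] == H)
    for f in (0.25e6, 7.5e6):
        print(f"{f / 1e6:5.2f} MHz  gain = {abs(response(f)):.4f}")
```

With these coefficients the sketch reports a gain close to 1 at 250 kHz and a much smaller gain at 7.5 MHz. The phase at 250 kHz corresponds to a delay of (N-1)/2 = 7.5 samples, which is what a symmetric 16-tap response implies.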
Parallel FIR filter design
Design Notes

The input is a mixture of two sine waves with frequencies of 7.5 MHz and 250 kHz. After the FIR filter, the 7.5 MHz component is removed and only the 250 kHz signal remains. The design parameters are as follows:

Input frequency: 7.5 MHz and 250 kHz
Sampling frequency: 50 MHz
Stop band: 1 MHz ~ 6 MHz
Order: 15 (N-1=15)

The FIR filter structure shows that an order-15 implementation requires 16 multipliers, 15 adders and 15 sets of delay registers. To register the first input sample as well, one more set of delay registers can be added, giving 16 sets in total. Because the FIR coefficients are symmetric, the two samples that share a coefficient can be added before the multiplication, which halves the number of multipliers to 8.
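
The multiplier saving can be illustrated with a short Python sketch. It is only a model under stated assumptions: the coefficient values are taken from coe[0..7] in the Verilog module in this section, and the function names direct_sum and folded_sum are invented for the example. It compares the 16-multiplier sum with the folded 8-multiplier sum on the same delay-line contents.

```python
import random

HALF = [11, 31, 63, 104, 152, 198, 235, 255]  # coe[0..7] from the module
FULL = HALF + HALF[::-1]                       # 16 symmetric coefficients


def direct_sum(taps):
    """16 multiplications: one per delay-line entry."""
    return sum(c * x for c, x in zip(FULL, taps))


def folded_sum(taps):
    """8 multiplications: add each mirrored pair first, then multiply once."""
    return sum(HALF[i] * (taps[i] + taps[15 - i]) for i in range(8))


if __name__ == "__main__":
    random.seed(0)
    taps = [random.randrange(4096) for _ in range(16)]  # 12-bit samples
    print("same result:", direct_sum(taps) == folded_sum(taps))
```

The pre-addition costs one extra bit per operand (two 12-bit samples give a 13-bit sum), which is why the module declares add_reg as 13 bits wide.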

In the parallel design, the multiply and add operations on all 16 delayed samples are performed simultaneously within one clock cycle, and the filtered value is then output on the clock. The advantage of this approach is a short filtering delay; the drawback is that the timing requirements are relatively demanding.

Parallel design

The multiplier module used in the design is the pipelined multiplier designed previously.

To speed up simulation, the multiplication operator * can be used directly instead. The macro SAFE_DESIGN selects the multiplier: when it is defined, the pipelined multiplier is instantiated; otherwise the * operator is used.

The FIR filter coefficients can be generated with MATLAB; see the appendix for details.

/***********************************************************
>> V201001: Fs: 50MHz, fstop: 1MHz-6MHz, order: 15
************************************************************/
`define SAFE_DESIGN
 
module fir_guide (
    input rstn, //reset, active low
    input clk, //working clock, equal to the sampling frequency
    input en, //input data valid signal
    input [11:0] xin, //input data, mixed-frequency signal
    output valid, //output data valid signal
    output [28:0] yout //output data, low-frequency signal (250kHz)
    );
 
    //data en delay
    reg [3:0] en_r;
    always @(posedge clk or negedge rstn) begin
        if (!rstn) begin
            en_r[3:0] <= 'b0;
        end
        else begin
            en_r[3:0] <= {en_r[2:0], en};
        end
    end
 
   //(1) 16 sets of shift registers
    reg [11:0] xin_reg[15:0];
    integer i, j ; //integer loop indices, so that the i<16 loop can terminate
    always @(posedge clk or negedge rstn) begin
        if (!rstn) begin
            for (i=0; i<16; i=i + 1) begin //reset all 16 registers
                xin_reg[i] <= 12'b0;
            end
        end
        else if (en) begin
            xin_reg[0] <= xin;
            for (j=0; j<15; j=j + 1) begin
                xin_reg[j + 1] <= xin_reg[j]; //shift by one sample per clock
            end
        end
    end
 
   //Only 8 multipliers are needed because the FIR filter coefficients are symmetric
   //(2) Add each pair of shift-register samples that share a coefficient (first with last, and so on)
    reg [12:0] add_reg[7:0];
    always @(posedge clk or negedge rstn) begin
        if (!rstn) begin
            for (i=0; i<8; i=i + 1) begin
                add_reg[i] <= 13'd0;
            end
        end
        else if (en_r[0]) begin
            for (i=0; i<8; i=i + 1) begin
                add_reg[i] <= xin_reg[i] + xin_reg[15-i];
            end
        end
    end
 
    //(3) 8 multipliers
    //Filter coefficients, scaled up by a factor of 2048 (see the appendix)
    wire [11:0] coe[7:0] ;
    assign coe[0] = 12'd11;
    assign coe[1] = 12'd31;
    assign coe[2] = 12'd63;
    assign coe[3] = 12'd104;
    assign coe[4] = 12'd152;
    assign coe[5] = 12'd198;
    assign coe[6] = 12'd235;
    assign coe[7] = 12'd255;
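`ifdef SAFE_DESIGN
    wire [24:0] mout[7:0]; //driven by the multiplier instances, so it must be a net
`else
    reg [24:0] mout[7:0]; //assigned in the always block below
`endif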
 
`ifdef SAFE_DESIGN
    //Pipelined multiplier
    wire [7:0] valid_mult;
    genvar k;
    generate
        for (k=0; k<8; k=k + 1) begin : g_mult //named block, required by Verilog-2001
            mult_man #(13, 12)
            u_mult_paral (
              .clk (clk),
              .rstn (rstn),
              .data_rdy (en_r[1]),
              .mult1 (add_reg[k]),
              .mult2 (coe[k]),
              .res_rdy (valid_mult[k]), //all res_rdy outputs are identical
              .res (mout[k])
            );
        end
    endgenerate
    wire valid_mult7 = valid_mult[7];
 
`else
    //If the timing requirements are not high, you can directly use the multiplication sign
    always @(posedge clk or negedge rstn) begin
        if (!rstn) begin
            for (i=0; i<8; i=i + 1) begin
                mout[i] <= 25'b0;
            end
        end
        else if (en_r[1]) begin
            for (i=0; i<8; i=i + 1) begin
                mout[i] <= coe[i] * add_reg[i];
            end
        end
    end
    wire valid_mult7 = en_r[2];
`endif
 
    //(4) Accumulate the products: 8 groups of 25-bit data -> 1 group of 29-bit data
    //Data valid delay
    reg [3:0] valid_mult_r;
    always @(posedge clk or negedge rstn) begin
        if (!rstn) begin
            valid_mult_r[3:0] <= 'b0;
        end
        else begin
            valid_mult_r[3:0] <= {valid_mult_r[2:0], valid_mult7};
        end
    end

`ifdef SAFE_DESIGN
    //Pipeline the additions over several cycles to ease timing.
    reg [28:0] sum1 ;
    reg [28:0] sum2 ;
    reg [28:0] yout_t;
    always @(posedge clk or negedge rstn) begin
        if (!rstn) begin
            sum1 <= 29'd0;
            sum2 <= 29'd0;
            yout_t <= 29'd0;
        end
        else if(valid_mult7) begin
            sum1 <= mout[0] + mout[1] + mout[2] + mout[3] ;
            sum2 <= mout[4] + mout[5] + mout[6] + mout[7] ;
            yout_t <= sum1 + sum2;
        end
    end
 
`else
    //Accumulate all 8 products in one step; this long adder chain makes timing closure risky in practice
    reg signed [28:0] sum ;
    reg signed [28:0] yout_t;
    always @(posedge clk or negedge rstn) begin
        if (!rstn) begin
            sum <= 29'd0;
            yout_t <= 29'd0;
        end
        else if (valid_mult7) begin
            sum <= mout[0] + mout[1] + mout[2] + mout[3] + mout[4] + mout[5] + mout[6] + mout[7];
            yout_t <= sum ;
        end
    end
`endif
    assign yout = yout_t;
    assign valid = valid_mult_r[1]; //yout_t lags valid_mult7 by two register stages (sum, then yout_t)

endmodule
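
The register widths declared in the module (13-bit add_reg, 25-bit mout, 29-bit yout) can be sanity-checked with a worst-case calculation. The Python sketch below is an illustration only; it assumes unsigned 12-bit inputs and the coe[0..7] values from the listing, and the variable names are invented for this example.

```python
HALF = [11, 31, 63, 104, 152, 198, 235, 255]  # coe[0..7] from the module

X_MAX = 2**12 - 1      # largest 12-bit input sample (xin)
COE_MAX = 2**12 - 1    # largest value a 12-bit coefficient register can hold
PAIR_MAX = 2 * X_MAX   # largest pre-added pair (add_reg)

# Number of bits needed to hold each worst-case intermediate value.
add_bits = PAIR_MAX.bit_length()                       # add_reg is 13 bits
prod_bits = (PAIR_MAX * COE_MAX).bit_length()          # mout is 25 bits
sum_bits_any = (8 * PAIR_MAX * COE_MAX).bit_length()   # any 12-bit coefficients
sum_bits_actual = (PAIR_MAX * sum(HALF)).bit_length()  # the coefficients above

if __name__ == "__main__":
    print("add_reg bits:", add_bits)
    print("mout bits:", prod_bits)
    print("sum bits (any 12-bit coefficients):", sum_bits_any)
    print("sum bits (actual coefficients):", sum_bits_actual)
```

Under these assumptions the accumulated sum needs at most 28 bits for arbitrary 12-bit coefficients and 24 bits for the coefficients actually used, so the 29-bit yout leaves headroom in both cases.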

Testbench

The testbench is written as follows. Its main function is to continuously feed in the mixed 250 kHz and 7.5 MHz sine wave data. The mixed input data can also be generated with MATLAB; see the appendix for details.

`timescale 1ps/1ps
 
module test;
   //input
    reg clk;
    reg rst_n;
    reg en ;
    reg [11:0] xin ;
    //output
    wire valid;
    wire [28:0] yout;
 
    parameter SIMU_CYCLE = 64'd2000; //Number of clock cycles to simulate
    parameter SIN_DATA_NUM = 200; //Samples per 250kHz period at 50MHz sampling

//======================================
// 50MHz clk generating
    localparam TCLK_HALF = 10_000;
    initial begin
        clk = 1'b0;
        forever begin
            #TCLK_HALF;
            clk = ~clk ;
        end
    end
 
//============================
// reset and finish
    initial begin
        rst_n = 1'b0;
        # 30 rst_n = 1'b1;
        # (TCLK_HALF * 2 * SIMU_CYCLE) ;
        $finish;
    end
 
//========================================
// read signal data into register
    reg [11:0] stimulus [0: SIN_DATA_NUM-1];
    integer i;
    initial begin
        $readmemh("../tb/cosx0p25m7p5m12bit.txt", stimulus) ;
        i = 0;
        en = 0;
        xin = 0;
        # 200 ;
        forever begin
            @(negedge clk) begin
                en = 1'b1;
                xin = stimulus[i];
                if (i == SIN_DATA_NUM-1) begin // wrap around so the data repeats periodically
                    i = 0;
                end
                else begin
                    i = i + 1;
                end
            end
        end
    end
 
    fir_guide u_fir_paral (
      .xin (xin),
      .clk (clk),
      .en (en),
      .rstn (rst_n),
      .valid (valid),
      .yout (yout));
 
endmodule

Simulation results

The simulation results show that the signal after the FIR filter contains only the low-frequency component (250 kHz); the high-frequency component (7.5 MHz) has been filtered out. The output waveform is also continuous, so the filter can produce output without interruption.

However, the initial part of the waveform (marked with a red circle in the waveform figure) is irregular, so it is worth zooming in on.

With the start of the waveform enlarged, the duration of the irregular section can be measured: the interval between the two vertical cursor lines is 16 clock cycles.

Because the data arrives serially and the design uses 16 sets of delay registers, the first correct filtered sample appears 16 clock cycles after the first filtered output. In other words, the output valid signal valid should be delayed by a further 16 clock cycles so that the irregular start-up section is excluded from the output.
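
The start-up behaviour can be reproduced with a small software model. The Python sketch below is an assumption-laden illustration, not the simulator output: it rebuilds the stimulus with the same formula as the appendix script (min-max normalization standing in for mapminmax), ignores the pipeline latency of the hardware, and compares a filter whose delay line starts at zero with the steady-state result for the periodic input. All function names are hypothetical.

```python
import math

HALF = [11, 31, 63, 104, 152, 198, 235, 255]  # coe[0..7] from the module
FULL = HALF + HALF[::-1]                       # 16 symmetric coefficients
NUM = 200  # samples per 250 kHz period at Fs = 50 MHz


def stimulus():
    """12-bit unsigned mix of 250 kHz and 7.5 MHz, as in the appendix script."""
    raw = [math.cos(2 * math.pi * 0.25e6 * n / 50e6) +
           math.cos(2 * math.pi * 7.5e6 * n / 50e6) for n in range(NUM)]
    lo, hi = min(raw), max(raw)
    norm = [2 * (v - lo) / (hi - lo) - 1 for v in raw]  # scale to [-1, 1]
    return [math.floor((2**11 - 1) * v + 2**11) for v in norm]


def filter_from_reset(x, n_out):
    """Outputs when the delay line starts at zero and x repeats forever."""
    return [sum(FULL[k] * x[(n - k) % len(x)] for k in range(16) if n - k >= 0)
            for n in range(n_out)]


def steady_state(x, n_out):
    """Outputs once the delay line holds 16 real samples (periodic input)."""
    return [sum(FULL[k] * x[(n - k) % len(x)] for k in range(16))
            for n in range(n_out)]


def first_settled_index(x, n_out=64):
    """Index of the first output from which both sequences agree."""
    a, b = filter_from_reset(x, n_out), steady_state(x, n_out)
    return next(n for n in range(n_out) if a[n:] == b[n:])


if __name__ == "__main__":
    print("first settled output index:", first_settled_index(stimulus()))
```

In this model the first 15 outputs are computed from a partially filled delay line, and the 16th output (index 15) is the first one that uses 16 real samples. A valid signal that skips the initial outputs would therefore hide the irregular section.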

Appendix: MATLAB usage
Generate FIR filter coefficients

Open MATLAB and enter the command fdatool in the command window.

The filter design window opens. Set it up according to the FIR filter parameters.

The FIR design method chosen here is least-squares; different design methods produce different filtering results.

Click File -> Export.

Export the filter coefficients to the variable coef.

At this point coef holds floating-point data. Multiply it by a scaling factor and convert the result to integers to obtain the fixed-point coefficients used in the design. Here the scaling factor is 2048, and the resulting integers are the values assigned to coe[0] through coe[7] in the Verilog module (11, 31, 63, 104, 152, 198, 235, 255). Each value is used twice because the coefficients are symmetric.
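
The scaling step can be sketched in Python as follows. This is an illustration under assumptions: the floating-point values below are reconstructed as coe / 2048 and are not the actual fdatool export, rounding to the nearest integer is assumed as the conversion rule, and the helper names quantize and rescale are hypothetical.

```python
SCALE = 2048  # scaling factor from the text (2**11)


def quantize(coefs, scale=SCALE):
    """Scale floating-point coefficients and round to the nearest integer."""
    return [int(round(c * scale)) for c in coefs]


def rescale(acc, scale=SCALE):
    """Undo the coefficient scaling on an accumulated filter output."""
    return acc // scale  # for scale = 2048 this equals a right shift by 11 bits


# Integer coefficients used in the Verilog module (coe[0..7]).
COE = [11, 31, 63, 104, 152, 198, 235, 255]

# Hypothetical floating-point values: reconstructed as COE / 2048 purely
# for illustration, NOT the actual fdatool export.
coef_float = [c / SCALE for c in COE]

# Gain for a constant input: all 16 taps, each value of COE used twice.
dc_gain = 2 * sum(COE) / SCALE

if __name__ == "__main__":
    print("quantized:", quantize(coef_float))
    print("DC gain of the scaled filter:", dc_gain)
```

Because the coefficients are scaled by 2048, the accumulated output yout is also 2048 times larger than the filtered signal. Dividing by 2048 (or discarding the 11 least significant bits) returns it to the input scale.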

Generate the mixed input signal

The reference MATLAB code for generating the mixed input signal is as follows.
The signal is unsigned fixed-point data with a bit width of 12 bits and is stored in the file cosx0p25m7p5m12bit.txt.

clear all;close all;clc;
%========================================================
% generating a cos wave data with txt hex format
%========================================================

fc = 0.25e6; % center frequency
fn = 7.5e6; % clutter frequency
Fs = 50e6; % sampling frequency
T = 1/fc ; % signal period
Num = Fs * T ; % number of signal sampling points in the period
t = (0:Num-1)/Fs; % discrete time
cosx = cos(2*pi*fc*t); % center frequency sinusoidal signal
cosn = cos(2*pi*fn*t); % clutter signal
cosy = mapminmax(cosx + cosn) ; % normalize amplitude to [-1, 1]
cosy_dig = floor((2^11-1) * cosy + 2^11) ; % offset and scale to unsigned 12-bit (1~4095)
fid = fopen('cosx0p25m7p5m12bit.txt', 'wt'); % write data file
fprintf(fid, '%x\n', cosy_dig) ;
fclose(fid);
 
%Time domain waveform
figure(1);
subplot(121);plot(t,cosx);hold on;
plot(t,cosn);
subplot(122);plot(t,cosy_dig);
 
%frequency domain waveform
fft_cosy = fftshift(fft(cosy, Num)) ;
f_axis = (-Num/2 : Num/2 - 1) * (Fs/Num) ;
figure(5);
plot(f_axis, abs(fft_cosy)) ;