Asynchronous FIFO design simulation based on Verilog HDL

1. What is asynchronous FIFO

An asynchronous FIFO is a type of FIFO (First In First Out: data written first is read out first) whose read and write sides run on independent clocks. Because the read and write operations are independent, an asynchronous FIFO is often used to transmit multi-bit data across clock domains. The asynchronous FIFO will be referred to simply as the FIFO in the following.

Under the control signals of one clock domain (the write clock domain), data is written to the FIFO buffer memory array. Under the control signals of another clock domain (the read clock domain), data is read from another port of the same FIFO buffer memory array. Conceptually this seems simple, but it is not easy to implement in practice. The difficulty in FIFO design lies in choosing what kind of FIFO pointer to use and how to determine the full and empty status of the FIFO buffer memory array.

2. What kind of pointer is used?

Before discussing how to choose appropriate pointers, we should first understand how pointers in a FIFO work. The write pointer always points to the next address to which data is to be written. Likewise, the read pointer always points to the next address to read data from. When the reset signal is asserted, both pointers are set to zero. At this time no data is stored in the FIFO, so it is empty and the read pointer points to invalid data. When a FIFO write operation is performed, the location pointed to by the write pointer is written, and then the write pointer is incremented to point to the next location to be written. When the first data is written into the FIFO, the write pointer is incremented and the read empty signal is deasserted. The read pointer, which still points to the first FIFO storage location, immediately drives the first valid data onto the output. When that data is read, the read pointer is also incremented and points to the next data to be read.

From the way the pointers work, we can see that the read and write pointers are equal, and the FIFO is empty, when both pointers are reset to zero or when the read pointer catches up with the write pointer after reading the last word from the FIFO. When the write pointer wraps around and catches up with the read pointer, the pointers are equal again, but this time the FIFO is full. This raises a question: when the pointers are equal, is the FIFO empty or full?

By adding one extra MSB to each pointer, the empty and full status of the FIFO can be told apart. When the write pointer increments past the final address of the FIFO, the otherwise unused MSB toggles and the remaining bits return to zero. The same goes for the read pointer. If the MSBs of the two pointers are different, the write pointer has wrapped one more time than the read pointer. If the MSBs of the two pointers are the same, both pointers have wrapped the same number of times.

Figure 1 Using the pointer with extra MSB added to determine the FIFO empty and full status

Using n-bit pointers (where n-1 is the number of address bits required to access the entire FIFO memory buffer), when all bits of the two pointers, including the MSB, are equal, the FIFO is empty. When all bits of the two pointers except the MSB are equal and the MSBs differ, the FIFO is full. The FIFO design in this article uses an n-bit pointer for a FIFO with 2^(n-1) writable locations to determine the empty and full status.
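The rule above can be tried out in a few lines of simulation-only Verilog. This is only a minimal sketch with plain binary pointers in a single clock domain; the module name ptr_compare_demo and the choice ASIZE = 2 (a 4-entry FIFO with 3-bit pointers) are assumptions made for illustration and are not part of the design in this article.

```verilog
// Sketch: empty/full detection using one extra pointer MSB.
// Binary pointers, no clock-domain crossing; names are illustrative only.
module ptr_compare_demo;
    localparam ASIZE = 2;               // 4 locations, 3-bit pointers
    reg [ASIZE:0] wptr, rptr;

    // All bits equal: both pointers have wrapped the same number of times.
    wire empty = (wptr == rptr);
    // MSBs differ, address bits equal: the writer is exactly one lap ahead.
    wire full  = (wptr[ASIZE] != rptr[ASIZE]) &&
                 (wptr[ASIZE-1:0] == rptr[ASIZE-1:0]);

    initial begin
        wptr = 0; rptr = 0;             // after reset
        #1 $display("w=%b r=%b empty=%b full=%b", wptr, rptr, empty, full);
        wptr = 3'b100;                  // four writes, no reads
        #1 $display("w=%b r=%b empty=%b full=%b", wptr, rptr, empty, full);
        rptr = 3'b100;                  // four reads later
        #1 $display("w=%b r=%b empty=%b full=%b", wptr, rptr, empty, full);
        $finish;
    end
endmodule
```

In the second case the address bits of both pointers are 00, yet the differing MSB marks the FIFO as full rather than empty.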

We now know that we can judge the FIFO empty and full status by comparing the write pointer and the read pointer. The problem is that what we are designing is an asynchronous FIFO. The write pointer is located in the write clock domain, the read pointer is located in the read clock domain, and the two clock domains are asynchronous to each other. With ordinary binary encoding, a change between adjacent values may involve several bits. For example, from 7 to 8 the binary code changes from 0111 to 1000: all four bits flip, and there are 14 possible intermediate states during the transition. If an ordinary binary read (write) pointer is passed to the write (read) clock domain for comparison, one of these intermediate states may be captured and lead to a wrong empty or full decision. How can this problem be solved?

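Converting ordinary binary encoding to Gray code is an effective method. Adjacent Gray code values differ in only one bit, so a pointer sampled in the other clock domain while it is changing is captured as either the old value or the new value, never as an unrelated intermediate code. This removes the problem of several bits of a multi-bit signal changing on the same clock edge. The single changing bit can still go metastable, which is why the pointers are also passed through two-stage synchronizers (sections 3.3 and 3.4).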
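To make the difference concrete, the following simulation-only sketch counts how many different values a sampler could capture while a pointer moves from one value to the next, under the simplifying assumption that every changing bit may independently show its old or its new level. The module name sample_skew_demo and the function possible_samples are hypothetical helpers written for this illustration.

```verilog
// Sketch: count the values that could be sampled during a transition a -> b
// if each changing bit may be seen at either its old or its new level.
module sample_skew_demo;
    localparam N = 4;

    function integer possible_samples(input [N-1:0] a, input [N-1:0] b);
        reg [N:0] s;        // one bit wider than N so the loop can terminate
        integer   k;
        reg       ok;
        begin
            possible_samples = 0;
            for (s = 0; s < (1 << N); s = s + 1) begin
                ok = 1'b1;
                // s is reachable only if every bit matches a or b
                for (k = 0; k < N; k = k + 1)
                    if (s[k] != a[k] && s[k] != b[k])
                        ok = 1'b0;
                if (ok)
                    possible_samples = possible_samples + 1;
            end
        end
    endfunction

    initial begin
        // 7 -> 8 in binary: all four bits change
        $display("binary 0111 -> 1000: %0d possible samples",
                 possible_samples(4'b0111, 4'b1000));
        // 7 -> 8 in Gray code: only the MSB changes
        $display("Gray   0100 -> 1100: %0d possible samples",
                 possible_samples(4'b0100, 4'b1100));
    end
endmodule
```

For the binary transition the count is 16, that is, the old value, the new value and the 14 intermediate states mentioned above. For the Gray code transition only the old and the new value are possible.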

Figure 2 The correspondence between the four-bit Gray code and its decimal value
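The correspondence in Figure 2 can also be regenerated in simulation. The sketch below prints the 4-bit Gray code for each value from 0 to 15 using the same expression as the pointer modules later in this article, (bin >> 1) ^ bin, and counts how many bits differ from the previous code. The module name gray_table_demo is an assumption made for illustration.

```verilog
// Sketch: print the 4-bit binary/Gray table and the number of bits that
// change between consecutive Gray codes.
module gray_table_demo;
    localparam N = 4;
    reg [N-1:0] bin, gray, prev;
    integer i, k, flips;

    initial begin
        prev = 0;                                  // Gray code of 0
        for (i = 0; i < (1 << N); i = i + 1) begin
            bin  = i;
            gray = (bin >> 1) ^ bin;               // binary to Gray
            flips = 0;
            for (k = 0; k < N; k = k + 1)
                if (gray[k] != prev[k])
                    flips = flips + 1;
            $display("%2d  bin=%b  gray=%b  bits changed=%0d",
                     i, bin, gray, flips);
            prev = gray;
        end
    end
endmodule
```

Every row after the first reports exactly one changed bit. The last code, 1000, also differs from the first code, 0000, in a single bit, so the property still holds when the pointer wraps.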

3. Detailed explanation of asynchronous FIFO design

The schematic block diagram of asynchronous FIFO design is shown below.

Figure 3 Principle block diagram of asynchronous FIFO design

In order to facilitate the analysis and understanding of each part, the entire FIFO design is divided into six Verilog modules. They are:

①. fifo1.v: top-level wrapper module. It contains all external input and output signals of the FIFO design and instantiates all other modules;

②. fifomem.v: the FIFO memory buffer accessed by both the write clock domain and the read clock domain. The buffer is represented in this article as an instantiated synchronous dual-port RAM (DPRAM);

③. sync_r2w.v: synchronization module, used to synchronize the read pointer to the write clock domain. The synchronized read pointer is used by the wptr_full module to determine the full signal. This module contains only flip-flops clocked by the write clock and no other logic;

④. sync_w2r.v: synchronization module, used to synchronize the write pointer to the read clock domain. The synchronized write pointer is used by the rptr_empty module to determine the read empty signal. This module contains only flip-flops clocked by the read clock and no other logic;

⑤. rptr_empty.v: this module is completely synchronous to the read clock domain. It increments the read address and generates the empty signal;

⑥. wptr_full.v: this module is completely synchronous to the write clock domain. It increments the write address and generates the full signal.

The functions of each module and the issues that need attention are explained below:

3.1 fifo1.v

module fifo1 #(parameter DSIZE = 8,   // data width
               parameter ASIZE = 4)   // address width, depth = 2**ASIZE
(output [DSIZE-1:0] rdata,
 output wfull,
 output rempty,
 input [DSIZE-1:0] wdata,
 input winc, wclk, wrst_n,
 input rinc, rclk, rrst_n);

 wire [ASIZE-1:0] waddr, raddr;                  // binary RAM addresses
 wire [ASIZE:0] wptr, rptr, wq2_rptr, rq2_wptr;  // Gray code pointers with extra MSB

 // Pointer synchronizers between the two clock domains
 sync_r2w #(ASIZE) sync_r2w (.wq2_rptr(wq2_rptr), .rptr(rptr),
                             .wclk(wclk), .wrst_n(wrst_n));
 sync_w2r #(ASIZE) sync_w2r (.rq2_wptr(rq2_wptr), .wptr(wptr),
                             .rclk(rclk), .rrst_n(rrst_n));

 // Dual-port memory buffer
 fifomem #(DSIZE, ASIZE) fifomem (.rdata(rdata), .wdata(wdata),
                                  .waddr(waddr), .raddr(raddr),
                                  .wclken(winc), .wfull(wfull), .wclk(wclk));

 // Read pointer and empty flag (read clock domain)
 rptr_empty #(ASIZE) rptr_empty (.rempty(rempty), .raddr(raddr),
                                 .rptr(rptr), .rq2_wptr(rq2_wptr),
                                 .rinc(rinc), .rclk(rclk), .rrst_n(rrst_n));

 // Write pointer and full flag (write clock domain)
 wptr_full #(ASIZE) wptr_full (.wfull(wfull), .waddr(waddr),
                               .wptr(wptr), .wq2_rptr(wq2_rptr),
                               .winc(winc), .wclk(wclk), .wrst_n(wrst_n));

endmodule

The top-level FIFO module is a parameterized FIFO design, and all sub-blocks are instantiated using the recommended named port connection method. The data carried by the FIFO designed here is [DSIZE-1:0], that is, 8 bits wide; the pointers are [ASIZE:0], that is, 5 bits wide (4 bits to address the DPRAM plus 1 extra MSB). The signals that the FIFO receives from the outside are write data wdata, write enable winc, write clock wclk, write reset wrst_n, read enable rinc, read clock rclk, and read reset rrst_n. The signals output by the FIFO to the outside are read data rdata, read empty signal rempty, and write full signal wfull.

3.2 fifomem.v

module fifomem #(parameter DATASIZE = 8,   // memory data word width
                 parameter ADDRSIZE = 4)   // number of memory address bits
                (output [DATASIZE-1:0] rdata,
                 input [DATASIZE-1:0] wdata,
                 input [ADDRSIZE-1:0] waddr, raddr,
                 input wclken, wfull, wclk);

`ifdef VENDORRAM
    // Vendor dual-port RAM, only compiled when VENDORRAM is defined
    vendor_ram mem (.dout(rdata), .din(wdata),
                    .waddr(waddr), .raddr(raddr),
                    .wclken(wclken), .wclken_n(wfull), .clk(wclk));
`else
    // Behavioral memory model
    localparam DEPTH = 1<<ADDRSIZE;
    reg [DATASIZE-1:0] mem [0:DEPTH-1];

    // Combinational (asynchronous) read
    assign rdata = mem[raddr];

    // Synchronous write, blocked while the FIFO is full
    always @(posedge wclk)
        if (wclken && !wfull)
            mem[waddr] <= wdata;
`endif

endmodule

As you can see, fifomem.v is essentially a dual-port RAM (DPRAM, Dual Port RAM). Conditional compilation directives are used in the source code. Since VENDORRAM is not defined, you only need to pay attention to the part between `else and `endif. The depth of the FIFO storage is DEPTH = 1<<ADDRSIZE, that is, 16 locations for ADDRSIZE = 4.

3.3 sync_r2w.v

module sync_r2w #(parameter ADDRSIZE = 4)
                 (rptr,wclk,wrst_n,wq2_rptr);
    
                input [ADDRSIZE:0] rptr;
                input wclk,wrst_n;
                output [ADDRSIZE:0] wq2_rptr;
                
                reg [ADDRSIZE:0] wq1_rptr;
                reg [ADDRSIZE:0] wq2_rptr;
                
                always@ (posedge wclk or negedge wrst_n)
                    begin
                        if (!wrst_n)
                            begin
                                wq1_rptr <= 0;
                                wq2_rptr <= 0;
                            end
                        else
                            begin
                                wq1_rptr <= rptr;
                                wq2_rptr <= wq1_rptr;
                            end
                    end
endmodule

The read pointer rptr, already converted into Gray code in the read clock domain, is passed through a two-stage D flip-flop chain in the write clock domain (that is, delayed by two write clock beats) and appears in the write clock domain as wq2_rptr. Note that non-blocking assignments are used so that two separate flip-flop stages are described.
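Why the non-blocking assignments matter can be seen in a small simulation. The sketch below is not part of the FIFO; the module name nba_demo and all signal names are made up for this comparison. It clocks the same input through two registers, once with non-blocking and once with blocking assignments.

```verilog
// Sketch: two-stage register chain written with non-blocking (<=) and with
// blocking (=) assignments. Simulation only, names are illustrative.
module nba_demo;
    reg clk = 0;
    reg d   = 0;
    reg q1_nb, q2_nb;   // non-blocking version
    reg q1_b,  q2_b;    // blocking version

    always #5 clk = ~clk;           // rising edges at 5, 15, 25, ...

    always @(posedge clk) begin
        q1_nb <= d;
        q2_nb <= q1_nb;             // uses the value q1_nb had before the edge
    end

    always @(posedge clk) begin
        q1_b = d;
        q2_b = q1_b;                // already sees the new q1_b
    end

    initial begin
        #12 d = 1;                  // input changes at t = 12
        #4  $display("t=%0t q2_nb=%b q2_b=%b", $time, q2_nb, q2_b);
        #10 $display("t=%0t q2_nb=%b q2_b=%b", $time, q2_nb, q2_b);
        $finish;
    end
endmodule
```

After the first rising edge that follows the change (t=16), q2_b is already 1 while q2_nb is still 0; q2_nb only becomes 1 one clock later. In other words, the blocking version behaves like a single stage in simulation, which is not what a two-stage synchronizer needs.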

3.4 sync_w2r.v

module sync_w2r #(parameter ADDRSIZE = 4)
                 (wptr,rclk,rrst_n,rq2_wptr);
    
                input [ADDRSIZE:0] wptr;
                input rclk,rrst_n;
                output [ADDRSIZE:0] rq2_wptr;
                
                reg [ADDRSIZE:0] rq1_wptr;
                reg [ADDRSIZE:0] rq2_wptr;
                
                always@ (posedge rclk or negedge rrst_n)
                    begin
                        if (!rrst_n)
                            begin
                                rq1_wptr <= 0;
                                rq2_wptr <= 0;
                            end
                        else
                            begin
                                rq1_wptr <= wptr;
                                rq2_wptr <= rq1_wptr;
                            end
                    end
endmodule

The write pointer wptr, already converted into Gray code in the write clock domain, is passed through a two-stage D flip-flop chain in the read clock domain and appears in the read clock domain as rq2_wptr. Again, note the use of non-blocking assignments.

3.5 rptr_empty.v

module rptr_empty #(parameter ADDRSIZE = 4)
                   (rq2_wptr, rinc, rclk, rrst_n,
                    rempty, raddr, rptr);

    input  [ADDRSIZE:0]   rq2_wptr;
    input                 rinc, rclk, rrst_n;
    output                rempty;
    output [ADDRSIZE-1:0] raddr;
    output [ADDRSIZE:0]   rptr;

    reg                   rempty;
    reg  [ADDRSIZE:0]     rptr;

    reg  [ADDRSIZE:0]     rbin;
    wire [ADDRSIZE:0]     rgraynext, rbinnext;
    wire                  rempty_val;

    // Binary and Gray code read pointers
    always @(posedge rclk or negedge rrst_n)
        if (!rrst_n)
            {rbin, rptr} <= 0;
        else
            {rbin, rptr} <= {rbinnext, rgraynext};

    // Memory read address: binary pointer without the extra MSB
    assign raddr     = rbin[ADDRSIZE-1:0];
    // Increment only on a read request while not empty
    assign rbinnext  = rbin + (rinc & ~rempty);
    // Binary to Gray code conversion
    assign rgraynext = (rbinnext>>1) ^ rbinnext;

    // Empty when the next read pointer equals the synchronized write pointer
    assign rempty_val = (rgraynext == rq2_wptr);

    always @(posedge rclk or negedge rrst_n)
        if (!rrst_n)
            rempty <= 1'b1;
        else
            rempty <= rempty_val;
endmodule

Pay special attention to the differences between the signals in this section. The signals used in this module are: the Gray code write pointer rq2_wptr synchronized into the read clock domain, the binary read pointer rbin, the next-beat binary read pointer rbinnext, the Gray code read pointer rptr, the next-beat Gray code read pointer rgraynext, and the binary read address raddr used to address the RAM.

Sequential logic is used to update the binary pointer and the Gray code pointer. The pointers do not increment on every clock: the read address only increments when the read enable is valid and the FIFO is not empty.

As mentioned earlier, n-bit pointers can be used for FIFOs with 2^(n-1) writable locations. Using the lower 4 bits of the 5-bit binary read pointer as the read address raddr provides the read addressing for fifomem.v.
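A short simulation-only sketch of this slicing, assuming ASIZE = 4 as in the article, shows what happens around the wrap point. The module name addr_slice_demo is hypothetical.

```verilog
// Sketch: the lower ASIZE bits of the pointer address the RAM, the extra
// MSB only records how often the pointer has wrapped.
module addr_slice_demo;
    localparam ASIZE = 4;
    reg  [ASIZE:0]   rbin;
    wire [ASIZE-1:0] raddr = rbin[ASIZE-1:0];
    integer i;

    initial begin
        for (i = 14; i <= 17; i = i + 1) begin
            rbin = i;
            #1 $display("rbin=%b (%0d) -> raddr=%0d, wrap bit=%b",
                        rbin, rbin, raddr, rbin[ASIZE]);
        end
    end
endmodule
```

When the pointer goes from 15 to 16, the address returns to 0 and only the wrap bit changes, so the memory is reused from its first location.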

Combinational logic converts rbinnext (binary code) to rgraynext (Gray code). The method is simple: each Gray code bit is the XOR of the binary bit at the same position and the next higher binary bit, which is exactly what (rbinnext>>1) ^ rbinnext computes.

As mentioned earlier, using n-bit pointers (where n-1 is the number of address bits required to access the entire FIFO memory buffer), when all bits of the two pointers, including the MSB, are equal, the FIFO is empty. This condition still holds for Gray code pointers. So that the registered rempty flag is updated on the same clock edge as the read pointer, the next-beat Gray code read pointer rgraynext is compared with the Gray code write pointer rq2_wptr synchronized into the read clock domain. When the two are exactly the same, the FIFO is considered empty. Otherwise, the read empty signal is deasserted and the FIFO can continue to perform read operations.

3.6 wptr_full.v

module wptr_full #(parameter ADDRSIZE = 4)
                  (wq2_rptr, winc, wclk, wrst_n,
                   wfull, waddr, wptr);

    input  [ADDRSIZE:0]   wq2_rptr;
    input                 winc, wclk, wrst_n;
    output                wfull;
    output [ADDRSIZE-1:0] waddr;
    output [ADDRSIZE:0]   wptr;

    reg                   wfull;
    reg  [ADDRSIZE:0]     wptr;

    reg  [ADDRSIZE:0]     wbin;
    wire [ADDRSIZE:0]     wgraynext, wbinnext;
    wire                  wfull_val;

    // Binary and Gray code write pointers
    always @(posedge wclk or negedge wrst_n)
        if (!wrst_n)
            {wbin, wptr} <= 0;
        else
            {wbin, wptr} <= {wbinnext, wgraynext};

    // Memory write address: binary pointer without the extra MSB
    assign waddr     = wbin[ADDRSIZE-1:0];
    // Increment only on a write request while not full
    assign wbinnext  = wbin + (winc & ~wfull);
    // Binary to Gray code conversion
    assign wgraynext = (wbinnext>>1) ^ wbinnext;

    // Full when the two MSBs are inverted and all other bits are equal
    assign wfull_val = (wgraynext == {~wq2_rptr[ADDRSIZE:ADDRSIZE-1],
                                      wq2_rptr[ADDRSIZE-2:0]});

    always @(posedge wclk or negedge wrst_n)
        if (!wrst_n)
            wfull <= 1'b0;
        else
            wfull <= wfull_val;
endmodule

The signals used in this module are: the Gray code read pointer wq2_rptr synchronized into the write clock domain, the binary write pointer wbin, the next-beat binary write pointer wbinnext, the Gray code write pointer wptr, the next-beat Gray code write pointer wgraynext, and the binary write address waddr used to address the RAM.

Sequential logic is used to update the binary pointer and the Gray code pointer. The pointers do not increment on every clock: the write address only increments when the write enable is valid and the FIFO is not full.

As mentioned earlier, n-bit pointers can be used for FIFOs with 2^(n-1) writable locations. Using the lower 4 bits of the 5-bit binary write pointer as the write address waddr provides the write addressing for fifomem.v.

Combinational logic converts wbinnext (binary code) to wgraynext (Gray code). As in the read side, each Gray code bit is the XOR of the binary bit at the same position and the next higher binary bit, which is what (wbinnext>>1) ^ wbinnext computes.

As mentioned earlier, using n-bit pointers (where n-1 is the number of address bits required to access the entire FIFO memory buffer), when all bits of the two pointers except the MSB are equal and the MSBs differ, the FIFO is full. This condition needs to be modified for Gray code pointers. Specifically, the two most significant bits of the two pointers must be exactly opposite, and all other bits must be the same. So that the registered wfull flag is updated on the same clock edge as the write pointer, the next-beat Gray code write pointer wgraynext is compared with the Gray code read pointer wq2_rptr synchronized into the write clock domain. When their two most significant bits are exactly opposite and all other bits are the same, the FIFO is considered full. Otherwise, the full signal is deasserted and the FIFO can continue to perform write operations.
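The modified full condition can be cross-checked against the plain binary rule by brute force. The following simulation-only sketch, assuming ASIZE = 4 as in the article, walks through every pair of 5-bit pointers and compares the Gray code full and empty tests with their binary counterparts. The module name gray_full_check is made up for this check.

```verilog
// Sketch: exhaustively compare the Gray code full/empty tests with the
// binary ones for all pointer pairs. Simulation only.
module gray_full_check;
    localparam ASIZE = 4;
    reg [ASIZE:0] wbin, rbin, wgray, rgray;
    reg full_bin, full_gray, empty_bin, empty_gray;
    integer w, r, errors;

    initial begin
        errors = 0;
        for (w = 0; w < (1 << (ASIZE+1)); w = w + 1)
            for (r = 0; r < (1 << (ASIZE+1)); r = r + 1) begin
                wbin  = w;
                rbin  = r;
                wgray = (wbin >> 1) ^ wbin;
                rgray = (rbin >> 1) ^ rbin;
                // Binary rule: MSB differs, address bits equal
                full_bin   = (wbin == {~rbin[ASIZE], rbin[ASIZE-1:0]});
                // Gray rule: two MSBs inverted, remaining bits equal
                full_gray  = (wgray == {~rgray[ASIZE:ASIZE-1],
                                        rgray[ASIZE-2:0]});
                empty_bin  = (wbin == rbin);
                empty_gray = (wgray == rgray);
                if (full_bin !== full_gray || empty_bin !== empty_gray)
                    errors = errors + 1;
            end
        if (errors == 0)
            $display("PASS: Gray and binary conditions agree for all pairs");
        else
            $display("FAIL: %0d mismatches", errors);
    end
endmodule
```

The two rules agree because inverting the MSB of a binary value inverts exactly the two most significant bits of its Gray code and leaves the other bits unchanged.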

4. Testbench file writing and timing diagram analysis

According to the characteristics of FIFO, it is not difficult to write a testbench file to test it.

`timescale 1ns / 1ps
module fifo1_tb;
    parameter ASIZE = 4;
    parameter DSIZE = 8;
    
    reg winc,wclk,wrst_n,rinc,rclk,rrst_n;
    wire wfull,rempty;
    reg [DSIZE-1:0] wdata;
    wire [DSIZE-1:0] rdata;
    
    reg init_done;
    
    initial
        begin
           winc = 0;
           wclk = 0;
           wrst_n = 1;
           rinc = 0;
           rclk = 0;
           rrst_n = 1;
           init_done = 0;
           #30 wrst_n = 0;
               rrst_n = 0;
           #30 wrst_n = 1;
               rrst_n = 1;
           #30 init_done = 1;
        end
        
        always #2 wclk = ~wclk;
        always #4 rclk = ~rclk;
        

        always @(*)
            begin
                if (init_done)
                    begin
                        winc = 1;
                        rinc = 1;
                    end
            end
 
        always @(posedge wclk)
            begin
                if (~init_done)
                    wdata <= 'b0;
                else if ( {winc,wfull} == 2'b10 )
                    wdata <= wdata + 1;
                else
                    wdata <= wdata;
                end

        fifo1 fifo1_test (.winc(winc),.wclk(wclk),.wrst_n(wrst_n),
                          .rinc(rinc),.rclk(rclk),.rrst_n(rrst_n),
                          .wdata(wdata),.rdata(rdata),.wfull(wfull),.rempty(rempty));
            

endmodule

Note that in order to make the timing analysis easier, the write enable winc and the write full signal wfull are used to control the automatic increment of the write data. The periods of the read and write clocks can easily be modified to observe how the FIFO behaves with different combinations of fast and slow clocks.

4.1 Fast writing (2ns) and slow reading (4ns)

The timing waveforms are viewed in Vivado. First set the write clock to toggle every 2ns and the read clock to toggle every 4ns. During the first 90ns the reset sequence completes and the affected signals are cleared to 0. At 90ns the read and write enables become valid, and data transmission officially begins.

It can be seen that at 90ns, data begins to be written to the FIFO and wdata begins to increase. Note that the wdata shown here is the externally supplied, incrementing write data, and the data actually written to the FIFO lags the wdata shown here by one write clock. Since the read data rdata is a combinational logic output, at 90ns rdata outputs the initial write data wdata, which is 0.

When the next write clock edge comes (94ns), the first write data 01 (hexadecimal) is officially written at mem[0]. Since the read data rdata is a combinational logic output, rdata immediately outputs 01 at 94ns. The 01 value stays on the output for several clock beats. The reason is that the write pointer needs several clock beats to be passed into the read clock domain and compared with the read pointer, and only after the read empty signal is deasserted can new data be read continuously.

It can be seen that at 178ns, the wgraynext and wq2_rptr used to determine the full signal are 1d (11101) and 05 (00101) respectively. The upper two MSBs are inverted and the remaining bits are the same, so the full condition is met and the full signal is asserted. Therefore, when the next write clock edge comes (182ns), the write data wdata no longer increments. (Here winc and wfull are used in the testbench to control the input of write data. If data were input from the outside at a constant rate, some data would be missed, which would make the timing diagram harder to read.) At the same time (182ns), wgraynext and wq2_rptr are 1d (11101) and 04 (00100) respectively, which no longer meet the full condition, and the full signal is deasserted. The analysis at each subsequent rising clock edge follows the same pattern.

4.2 Fast writing (2ns) and slow reading (32ns)

Set the read clock to toggle every 32ns. The simulation timing diagram is as follows:

It can be seen that, because the read clock is slow, the write full signal is asserted before rq2_wptr has had time to change (the upper two MSBs of wgraynext and wq2_rptr are opposite, and the remaining bits are the same). The write data stops incrementing and waits for data to be read out.

At 160ns, rq2_wptr finally changes (this change necessarily takes two beats of the read clock) and is no longer equal to rgraynext. On the next read clock edge the read empty signal rempty is deasserted, and data is output on the read clock edge after that.

4.3 Fast reading (2ns) and slow writing (4ns)

Set the write clock to toggle every 4ns and the read clock to toggle every 2ns. The timing simulation is as follows:

The timing analysis steps are similar to those in the case of fast writing and slow reading. Note that because of when init_done is asserted, the invalid data 00 is written into mem at the start and is not overwritten in the early stage. This problem can be solved by asserting init_done at 92ns. Specifically, assert init_done at the same time as a rising edge of the write clock, so that no valid write takes place on that rising edge.

4.4 Fast reading (2ns) and slow writing (32ns)

During this timing simulation, the author found that the original method of using combinational logic to output the read data has a problem, as shown below.

When the FIFO writes data, the comparison of rq2_wptr and rgraynext deasserts the read empty signal, and the data in the FIFO starts to be read. Because combinational logic is used to output the read data, rdata then outputs the data at the current read address mem[01]. In fact, no data has been written there yet, which results in invalid data being output.

Outputting the read data through sequential logic solves this problem. Note that the read clock, the read enable and the empty signal are now connected to the RAM, so the FIFO block diagram differs from Figure 3. The modified fifomem.v code is posted below. The top-level module fifo1 also needs the corresponding changes to the port connections of the instantiated module.

module fifomem #(parameter DATASIZE = 8,   // memory data word width
                 parameter ADDRSIZE = 4)   // number of memory address bits
                (output reg [DATASIZE-1:0] rdata,
                 input [DATASIZE-1:0] wdata,
                 input [ADDRSIZE-1:0] waddr, raddr,
                 input wclken, wfull, wclk,
                 input rinc, rempty, rclk);

`ifdef VENDORRAM
    // Note: rdata is declared as reg here, so it would have to become a
    // wire again before it can be driven by an instantiated vendor RAM.
    vendor_ram mem (.dout(rdata), .din(wdata),
                    .waddr(waddr), .raddr(raddr),
                    .wclken(wclken), .wclken_n(wfull), .clk(wclk));
`else
    localparam DEPTH = 1<<ADDRSIZE;
    reg [DATASIZE-1:0] mem [0:DEPTH-1];

    // assign rdata = mem[raddr];   // original combinational read

    // Registered read: update rdata only on a valid read request
    always @(posedge rclk)
        if (rinc && !rempty)
            rdata <= mem[raddr];

    // Synchronous write, blocked while the FIFO is full
    always @(posedge wclk)
        if (wclken && !wfull)
            mem[waddr] <= wdata;
`endif

endmodule

At this time, the simulation timing diagram is as follows:

Since the read data is now registered, invalid data is no longer output. After this change, the cases in 4.1, 4.2, and 4.3 were tested again and showed no errors.

5. Some important issues in FIFO design

(To be added)