LoongArch CPU Design Experiment, Practical Task 7: A Simple Pipelined CPU Without Hazard Handling

Statement

Because I could not get the LoongArch cross-compilation toolchain configured, this experiment uses the exp release package instead (the download link is listed in the reference materials at the end of the article).

Experimental requirements

This practical task builds on the single-cycle CPU implemented in practical task 6 and requires the following work:

  1. Adjust the CPU top-level interface: add the chip select signal inst_sram_ce for the instruction RAM and the chip select signal data_sram_ce for the data RAM.
  2. Adjust the CPU top-level interface: change both inst_sram_we and data_sram_we from a 1-bit write enable to a 4-bit byte write enable.
  3. Design a single-issue, five-stage pipelined CPU that does not handle hazards caused by dependencies between instructions.
  4. Run the func test for exp7; it must pass both simulation and on-board verification.
  • Note that in the code in this article, the en signal plays the role of the ce signal (inst_sram_en, data_sram_en).

Pipeline design ideas (no hazards)

Add pipeline registers between stages

Pipeline division

The five pipeline stages are instruction fetch (IF), decode (ID), execute (EXE), memory access (MEM), and write-back (WB).

  1. Instruction fetch: update the PC, including the PC update for jump instructions; use nextpc to request the next instruction from inst_ram.
  2. Decode: parse the instruction and generate the control signals. Read the general register file to produce the source operands, and write the data coming from the WB stage into the general register file. Handle jump instructions: generate the jump-taken signal and the jump target address, and send them to the IF stage.
  3. Execute: select the source operands and perform the arithmetic or logic operation. For a store instruction, write the data to data_ram. Issue the read request to data_ram.
  4. Memory access: receive the result of the data_ram read requested in the EXE stage, and choose the final result between the ALU result and the data read from data_ram.
  5. Write-back: generate the write-back signals (enable, address, data) and pass them to the ID stage, where the register file is actually written.

Pipeline registers

Flip-flops are placed between the five stages as pipeline registers. We call the flip-flops between the IF and ID stages the ID reg; the others are named in the same way.
To tell whether the content of a pipeline register is meaningful, each one carries a valid bit, valid.
A pipeline register holds both control signals and data, that is, everything on the signal bus passed to the next stage.

Combinational logic and sequential logic (single-cycle pipeline)

The logic inside each pipeline stage is combinational.
The pipeline registers are sequential logic.

Sync RAM

A synchronous RAM receives the read enable and read address in one clock cycle and returns the read data in the following cycle.
inst_ram: in the current cycle, nextpc is used to request the instruction that will occupy the IF stage in the next cycle.
data_ram: the read request is issued in the EXE stage, and the read data is obtained in the MEM stage.
The timing relationship between PC, target, and inst is shown in the figure below (the figure is drawn for MIPS).
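
The one-cycle read latency can be sketched with a small behavioral model. This is only an illustration: the module name sync_ram_sketch, the 256-word depth, and the read-before-write behavior are assumptions made for the example, and the RAM actually used in the experiment is the one instantiated in soc_lite_top, which is not shown in this article.

```verilog
// Hypothetical behavioral model of a synchronous RAM (not the RAM used in soc_lite_top).
module sync_ram_sketch(
    input  wire        clk,
    input  wire        en,     // chip select (the "ce" signal)
    input  wire [ 3:0] we,     // byte write enables
    input  wire [31:0] addr,   // byte address; only addr[9:2] is used as the word index
    input  wire [31:0] wdata,
    output reg  [31:0] rdata
);
reg  [31:0] mem [0:255];       // assumed depth: 256 words
wire [ 7:0] idx = addr[9:2];

always @(posedge clk) begin
    if (en) begin
        // Each we bit enables the write of one byte lane.
        if (we[0]) mem[idx][ 7: 0] <= wdata[ 7: 0];
        if (we[1]) mem[idx][15: 8] <= wdata[15: 8];
        if (we[2]) mem[idx][23:16] <= wdata[23:16];
        if (we[3]) mem[idx][31:24] <= wdata[31:24];
        // The address is sampled on this edge and rdata changes only after it,
        // so the requester sees the data during the following cycle.
        rdata <= mem[idx];
    end
end
endmodule
```

This is why the IF stage drives inst_sram_addr with nextpc rather than with the current PC, and why a load issues its address in EXE but picks up the data in MEM.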

PC data path

Jump instructions

Whether to jump and the jump target address are computed in the ID stage, and the result is sent to the IF stage.
For the PC-relative jumps (b, bl, beq, bne), the target is computed from ds_pc, because in LoongArch the jump target is relative to the PC of the jump instruction itself, not to the PC of a delay-slot instruction. jirl instead adds its offset to the value of register rj.
Note that LoongArch has no delay-slot instructions.
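
As a small illustration of the PC-relative case, the sketch below computes a branch target from the PC of the branch itself and a 16-bit offset field, in the same way br_offs is formed in the ID stage code later in this article. The module name br_target_sketch and its port names are made up for the example.

```verilog
// Illustration only: PC-relative target for a branch with a 16-bit offset field.
module br_target_sketch(
    input  wire [31:0] pc,       // PC of the branch instruction itself (ds_pc in the ID stage)
    input  wire [15:0] offs16,   // offset field of the instruction, counted in instructions
    output wire [31:0] target
);
// Shift left by 2 (instructions are 4 bytes) and sign-extend to 32 bits.
wire [31:0] br_offs = {{14{offs16[15]}}, offs16, 2'b0};
assign target = pc + br_offs;
endmodule
```

With pc = 32'h1c000010 and offs16 = 16'hfffe (that is, two instructions backwards), target is 32'h1c000008.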

PC register reset

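While the reset signal is asserted, the PC register (fs_pc) is set to 32'h1bfffffc rather than to the entry address. Instructions are fetched with nextpc, which is fs_pc + 4 when no jump is taken, so during reset nextpc is 32'h1c000000 and the first fetch goes to the entry address.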
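
The arithmetic behind the reset value can be checked with a few lines of simulation code. The module name pc_reset_check is hypothetical; only fs_pc and seq_pc correspond to names used in the IF stage.

```verilog
// Illustration only: checks the reset-value arithmetic used by the IF stage.
module pc_reset_check;
reg  [31:0] fs_pc;
wire [31:0] seq_pc = fs_pc + 32'h4;   // sequential next PC, as in the IF stage
initial begin
    fs_pc = 32'h1bfffffc;             // value loaded into fs_pc while reset is asserted
    #1;
    $display("seq_pc = %h", seq_pc);  // prints seq_pc = 1c000000
end
endmodule
```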

Handshake signals between pipeline stages

  • fs_ready_go: indicates that this stage has finished its work and is ready to send data; active high.
    Stalling, when it is needed later, is implemented through this signal.
  • fs_allowin: indicates whether this stage can accept new data.
    The stage can accept data only when it holds no valid data, or when it is ready to send its data and the next stage can accept it.
  • fs_to_ds_valid: indicates whether the data sent from the fs stage to the ds stage is valid.
    The data sent to ds is valid only when this stage holds valid data and is ready to send it.
  • fs_valid: indicates whether the data held in the fs stage is valid.

// IF stage
assign fs_ready_go    = 1'b1;
assign fs_allowin     = !fs_valid || fs_ready_go && ds_allowin;
assign fs_to_ds_valid = fs_valid && fs_ready_go;
always @(posedge clk) begin
    if (reset) begin
        fs_valid <= 1'b0;
    end
    else if (fs_allowin) begin
        fs_valid <= to_fs_valid;
    end
end
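
In this task every ready_go signal is tied to 1'b1, so the pipeline never stalls. To show what the handshake would do if a stage did have to wait, here is a minimal sketch of a generic middle stage. The module name stage_sketch, the stall input, and the prev/next port names are hypothetical and are not part of the CPU in this article; only the valid / allowin / ready_go pattern is taken from it.

```verilog
// Hypothetical generic pipeline stage illustrating the handshake pattern.
module stage_sketch #(parameter WD = 32)(
    input  wire          clk,
    input  wire          reset,
    input  wire          stall,          // assumed input: high while this stage must wait
    input  wire          prev_to_valid,  // previous stage offers valid data
    input  wire [WD-1:0] prev_to_bus,
    input  wire          next_allowin,   // next stage can accept data
    output wire          allowin,
    output wire          to_next_valid,
    output wire [WD-1:0] to_next_bus
);
reg          valid;
reg [WD-1:0] bus_r;

wire ready_go = !stall;

// Accept new data when empty, or when the current data leaves in this cycle.
assign allowin       = !valid || (ready_go && next_allowin);
assign to_next_valid = valid && ready_go;
assign to_next_bus   = bus_r;

always @(posedge clk) begin
    if (reset) begin
        valid <= 1'b0;
    end
    else if (allowin) begin
        valid <= prev_to_valid;
    end

    if (prev_to_valid && allowin) begin
        bus_r <= prev_to_bus;
    end
end
endmodule
```

While stall is high and the stage holds valid data, allowin and to_next_valid both drop to 0, so the stage keeps its data and the stages before it are held back as well.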

Code implementation

IF stage

`include "mycpu.h"

module if_stage(
    input                          clk            ,
    input                          reset          ,
    // allowin
    input                          ds_allowin     ,
    // br_bus
    input  [`BR_BUS_WD       -1:0] br_bus         ,
    // to ds
    output                         fs_to_ds_valid ,
    output [`FS_TO_DS_BUS_WD -1:0] fs_to_ds_bus   ,
    // inst sram interface
    output                         inst_sram_en   ,
    output [ 3:0]                  inst_sram_we   ,
    output [31:0]                  inst_sram_addr ,
    output [31:0]                  inst_sram_wdata,
    input  [31:0]                  inst_sram_rdata
);

reg fs_valid;
wire fs_ready_go;
wire fs_allowin;
wire to_fs_valid;

wire [31:0] seq_pc;
wire [31:0] nextpc;

wire br_taken;
wire [31:0] br_target;
assign {br_taken, br_target} = br_bus;

wire [31:0] fs_inst;
reg  [31:0] fs_pc;
assign fs_to_ds_bus = {fs_inst ,
                       fs_pc   };

// pre-IF stage
assign to_fs_valid = ~reset;
// fs_pc is the PC of the instruction currently in IF, so the sequential
// successor is simply fs_pc + 4; no further adjustment is needed here.
assign seq_pc = fs_pc + 3'h4;
assign nextpc = br_taken ? br_target : seq_pc;

// IF stage
assign fs_ready_go    = 1'b1; // always ready to send
assign fs_allowin     = !fs_valid || fs_ready_go && ds_allowin; // can accept data (not stalled)
assign fs_to_ds_valid = fs_valid && fs_ready_go;
always @(posedge clk) begin
    if (reset) begin
        fs_valid <= 1'b0;
    end
    else if (fs_allowin) begin
        fs_valid <= to_fs_valid; // Data is valid
    end
end

always @(posedge clk) begin
    if (reset) begin
        fs_pc <= 32'h1bfffffc; // trick: makes nextpc equal 0x1c000000 during reset
    end
    else if (to_fs_valid && fs_allowin) begin
        fs_pc <= nextpc;
    end
end

assign inst_sram_en = to_fs_valid && fs_allowin;
assign inst_sram_we = 4'h0;
assign inst_sram_addr = nextpc;
assign inst_sram_wdata = 32'b0;

assign fs_inst = inst_sram_rdata;

endmodule

ID stage

`include "mycpu.h"

module id_stage(
    input                          clk           ,
    input                          reset         ,
    // allowin
    input                          es_allowin    ,
    output                         ds_allowin    ,
    // from fs
    input                          fs_to_ds_valid,
    input  [`FS_TO_DS_BUS_WD -1:0] fs_to_ds_bus  ,
    // to es
    output                         ds_to_es_valid,
    output [`DS_TO_ES_BUS_WD -1:0] ds_to_es_bus  ,
    // to fs
    output [`BR_BUS_WD       -1:0] br_bus        ,
    // to rf: for write back
    input  [`WS_TO_RF_BUS_WD -1:0] ws_to_rf_bus
);

wire br_taken;
wire [31:0] br_target;

wire [31:0] ds_pc;
wire [31:0] ds_inst;

reg ds_valid;
wire ds_ready_go;

wire [11:0] alu_op;

wire load_op;
wire src1_is_pc;
wire src2_is_imm;
wire res_from_mem;
wire dst_is_r1;
wire gr_we;
wire mem_we;
wire src_reg_is_rd;
wire [4: 0] dest;
wire [31:0] rj_value;
wire [31:0] rkd_value;
wire [31:0] imm;
wire [31:0] br_offs;
wire [31:0] jirl_offs;

wire [ 5:0] op_31_26;
wire [ 3:0] op_25_22;
wire [ 1:0] op_21_20;
wire [ 4:0] op_19_15;
wire [ 4:0] rd;
wire [ 4:0] rj;
wire [ 4:0] rk;
wire [11:0] i12;
wire [19:0] i20;
wire [15:0] i16;
wire [25:0] i26;

wire [63:0] op_31_26_d;
wire [15:0] op_25_22_d;
wire [3:0] op_21_20_d;
wire [31:0] op_19_15_d;

wire inst_add_w;
wire inst_sub_w;
wire inst_slt;
wire inst_sltu;
wire inst_nor;
wire inst_and;
wire inst_or;
wire inst_xor;
wire inst_slli_w;
wire inst_srli_w;
wire inst_srai_w;
wire inst_addi_w;
wire inst_ld_w;
wire inst_st_w;
wire inst_jirl;
wire inst_b;
wire inst_bl;
wire inst_beq;
wire inst_bne;
wire inst_lu12i_w;

wire need_ui5;
wire need_si12;
wire need_si16;
wire need_si20;
wire need_si26;
wire src2_is_4;

wire [4:0] rf_raddr1;
wire [31:0] rf_rdata1;
wire [4:0] rf_raddr2;
wire [31:0] rf_rdata2;

wire rf_we;
wire [4:0] rf_waddr;
wire [31:0] rf_wdata;

wire [31:0] alu_src1;
wire [31:0] alu_src2;
wire [31:0] alu_result;

wire [31:0] mem_result;
wire [31:0] final_result;


assign op_31_26 = ds_inst[31:26];
assign op_25_22 = ds_inst[25:22];
assign op_21_20 = ds_inst[21:20];
assign op_19_15 = ds_inst[19:15];

assign rd = ds_inst[4: 0];
assign rj = ds_inst[9: 5];
assign rk = ds_inst[14:10];

assign i12 = ds_inst[21:10];
assign i20 = ds_inst[24: 5];
assign i16 = ds_inst[25:10];
assign i26 = {ds_inst[9: 0], ds_inst[25:10]};

decoder_6_64 u_dec0(.in(op_31_26 ), .out(op_31_26_d ));
decoder_4_16 u_dec1(.in(op_25_22 ), .out(op_25_22_d ));
decoder_2_4 u_dec2(.in(op_21_20 ), .out(op_21_20_d ));
decoder_5_32 u_dec3(.in(op_19_15 ), .out(op_19_15_d ));

assign inst_add_w   = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h00];
assign inst_sub_w   = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h02];
assign inst_slt     = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h04];
assign inst_sltu    = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h05];
assign inst_nor     = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h08];
assign inst_and     = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h09];
assign inst_or      = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h0a];
assign inst_xor     = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h0b];
assign inst_slli_w  = op_31_26_d[6'h00] & op_25_22_d[4'h1] & op_21_20_d[2'h0] & op_19_15_d[5'h01];
assign inst_srli_w  = op_31_26_d[6'h00] & op_25_22_d[4'h1] & op_21_20_d[2'h0] & op_19_15_d[5'h09];
assign inst_srai_w  = op_31_26_d[6'h00] & op_25_22_d[4'h1] & op_21_20_d[2'h0] & op_19_15_d[5'h11];
assign inst_addi_w  = op_31_26_d[6'h00] & op_25_22_d[4'ha];
assign inst_ld_w    = op_31_26_d[6'h0a] & op_25_22_d[4'h2];
assign inst_st_w    = op_31_26_d[6'h0a] & op_25_22_d[4'h6];
assign inst_jirl    = op_31_26_d[6'h13];
assign inst_b       = op_31_26_d[6'h14];
assign inst_bl      = op_31_26_d[6'h15];
assign inst_beq     = op_31_26_d[6'h16];
assign inst_bne     = op_31_26_d[6'h17];
assign inst_lu12i_w = op_31_26_d[6'h05] & ~ds_inst[25];

assign alu_op[ 0] = inst_add_w | inst_addi_w | inst_ld_w | inst_st_w
                  | inst_jirl | inst_bl;
assign alu_op[ 1] = inst_sub_w;
assign alu_op[ 2] = inst_slt;
assign alu_op[ 3] = inst_sltu;
assign alu_op[ 4] = inst_and;
assign alu_op[ 5] = inst_nor;
assign alu_op[ 6] = inst_or;
assign alu_op[ 7] = inst_xor;
assign alu_op[ 8] = inst_slli_w;
assign alu_op[ 9] = inst_srli_w;
assign alu_op[10] = inst_srai_w;
assign alu_op[11] = inst_lu12i_w;

assign need_ui5 = inst_slli_w | inst_srli_w | inst_srai_w;
assign need_si12 = inst_addi_w | inst_ld_w | inst_st_w;
assign need_si16 = inst_jirl | inst_beq | inst_bne;
assign need_si20 = inst_lu12i_w;
assign need_si26 = inst_b | inst_bl;
assign src2_is_4 = inst_jirl | inst_bl;

assign imm = src2_is_4 ? 32'h4                      :
             need_si20 ? {i20[19:0], 12'b0}         :
             need_ui5  ? rk                         :
            /*need_si12*/{{20{i12[11]}}, i12[11:0]} ;

assign br_offs = need_si26 ? {{ 4{i26[25]}}, i26[25:0], 2'b0} :
                             {{14{i16[15]}}, i16[15:0], 2'b0} ;

assign jirl_offs = {{14{i16[15]}}, i16[15:0], 2'b0};

assign src_reg_is_rd = inst_beq | inst_bne | inst_st_w;

assign src1_is_pc = inst_jirl | inst_bl;

assign src2_is_imm = inst_slli_w |
                       inst_srli_w |
                       inst_srai_w |
                       inst_addi_w |
                       inst_ld_w |
                       inst_st_w |
                       inst_lu12i_w|
                       inst_jirl |
                       inst_bl;

assign res_from_mem = inst_ld_w;
assign load_op      = inst_ld_w; // drive load_op so its ds_to_es_bus field is not left floating
assign dst_is_r1    = inst_bl;
assign gr_we        = ~inst_st_w & ~inst_beq & ~inst_bne & ~inst_b;
assign mem_we       = inst_st_w;
assign dest         = dst_is_r1 ? 5'd1 : rd;

assign rf_raddr1 = rj;
assign rf_raddr2 = src_reg_is_rd ? rd :rk;
regfile u_regfile(
    .clk (clk),
    .raddr1 (rf_raddr1),
    .rdata1 (rf_rdata1),
    .raddr2 (rf_raddr2),
    .rdata2 (rf_rdata2),
    .we (rf_we ),
    .waddr (rf_waddr),
    .wdata (rf_wdata)
    );

assign rj_value = rf_rdata1;
assign rkd_value = rf_rdata2;

wire rj_eq_rd; // declared explicitly instead of relying on an implicit net
assign rj_eq_rd = (rj_value == rkd_value);
assign br_taken = (   inst_beq  &&  rj_eq_rd
                   || inst_bne  && !rj_eq_rd
                   || inst_jirl
                   || inst_bl
                   || inst_b
                  ) && ds_valid;
assign br_target = (inst_beq || inst_bne || inst_bl || inst_b) ? (ds_pc + br_offs) :
                                                   /*inst_jirl*/ (rj_value + jirl_offs);

assign br_bus = {br_taken, br_target};

reg [`FS_TO_DS_BUS_WD -1:0] fs_to_ds_bus_r;

assign {ds_inst,
        ds_pc  } = fs_to_ds_bus_r;

assign {rf_we   ,  //37:37
        rf_waddr,  //36:32
        rf_wdata   //31:0
       } = ws_to_rf_bus;

assign ds_to_es_bus = {alu_op       ,  // 12
                       load_op      ,  // 1
                       src1_is_pc   ,  // 1
                       src2_is_imm  ,  // 1
                       src2_is_4    ,  // 1
                       gr_we        ,  // 1
                       mem_we       ,  // 1
                       dest         ,  // 5
                       imm          ,  // 32
                       rj_value     ,  // 32
                       rkd_value    ,  // 32
                       ds_pc        ,  // 32
                       res_from_mem    // 1
                      };

assign ds_ready_go    = 1'b1;
assign ds_allowin     = !ds_valid || ds_ready_go && es_allowin;
assign ds_to_es_valid = ds_valid && ds_ready_go;
always @(posedge clk) begin
    if (reset) begin
        ds_valid <= 1'b0;
    end
    else if (ds_allowin) begin
        ds_valid <= fs_to_ds_valid;
    end

    if (fs_to_ds_valid && ds_allowin) begin
        fs_to_ds_bus_r <= fs_to_ds_bus;
    end
end


endmodule
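
The ID stage instantiates decoder_6_64, decoder_4_16, decoder_2_4, decoder_5_32, and regfile, none of which is listed in this article. As a rough idea of what they do, here is a minimal sketch, assuming one-hot decoders and a register file with two asynchronous read ports, one synchronous write port, and a register 0 that always reads as zero. The names decoder_sketch and regfile_sketch are hypothetical, and the files actually used in the experiment may differ; only the port names are taken from the instantiations above.

```verilog
// Hypothetical one-hot decoder: out[i] is 1 exactly when in == i.
module decoder_sketch #(
    parameter IN_WD  = 5,
    parameter OUT_WD = 1 << IN_WD
)(
    input  wire [IN_WD-1:0]  in,
    output wire [OUT_WD-1:0] out
);
genvar i;
generate
    for (i = 0; i < OUT_WD; i = i + 1) begin : gen_dec
        assign out[i] = (in == i);
    end
endgenerate
endmodule

// Hypothetical register file: 2 combinational read ports, 1 clocked write port.
module regfile_sketch(
    input  wire        clk,
    input  wire [ 4:0] raddr1,
    output wire [31:0] rdata1,
    input  wire [ 4:0] raddr2,
    output wire [31:0] rdata2,
    input  wire        we,
    input  wire [ 4:0] waddr,
    input  wire [31:0] wdata
);
reg [31:0] rf [31:0];

// Write on the rising clock edge.
always @(posedge clk) begin
    if (we) rf[waddr] <= wdata;
end

// Reads are combinational; register 0 always reads as zero.
assign rdata1 = (raddr1 == 5'b0) ? 32'b0 : rf[raddr1];
assign rdata2 = (raddr2 == 5'b0) ? 32'b0 : rf[raddr2];
endmodule
```

With one-hot decoder outputs, each inst_* signal in the ID stage reduces to an AND of a few decoded opcode bits. Because the write only takes effect on the clock edge, an instruction in ID cannot see a value that is still travelling through EXE, MEM, or WB, which is exactly the hazard this task leaves unhandled.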

EXE stage

`include "mycpu.h"

module exe_stage(
    input                          clk            ,
    input                          reset          ,
    // allowin
    input                          ms_allowin     ,
    output                         es_allowin     ,
    // from ds
    input                          ds_to_es_valid ,
    input  [`DS_TO_ES_BUS_WD -1:0] ds_to_es_bus   ,
    // to ms
    output                         es_to_ms_valid ,
    output [`ES_TO_MS_BUS_WD -1:0] es_to_ms_bus   ,

    // data sram interface (write)
    output                         data_sram_en   ,
    output [ 3:0]                  data_sram_we   ,
    output [31:0]                  data_sram_addr ,
    output [31:0]                  data_sram_wdata
);

reg es_valid;
wire es_ready_go;

reg [`DS_TO_ES_BUS_WD -1:0] ds_to_es_bus_r;

wire [11:0] alu_op;
wire es_load_op;
wire src1_is_pc;
wire src2_is_imm;
wire src2_is_4;
wire res_from_mem;
wire dst_is_r1;
wire gr_we;
wire es_mem_we;
wire [4: 0] dest;
wire [31:0] rj_value;
wire [31:0] rkd_value;
wire [31:0] imm;
wire [31:0] es_pc;


assign {alu_op,
        es_load_op,
        src1_is_pc,
        src2_is_imm,
        src2_is_4,
        gr_we,
        es_mem_we,
        dest,
        imm,
        rj_value,
        rkd_value,
        es_pc,
        res_from_mem
       } = ds_to_es_bus_r;

wire [31:0] alu_src1;
wire [31:0] alu_src2;
wire [31:0] alu_result;


// not used in lab7
wire es_res_from_mem;
assign es_res_from_mem = es_load_op;



assign es_to_ms_bus = {res_from_mem,  //70:70 1
                       gr_we       ,  //69:69 1
                       dest        ,  //68:64 5
                       alu_result  ,  //63:32 32
                       es_pc          //31:0  32
                      };

assign es_ready_go    = 1'b1;
assign es_allowin     = !es_valid || es_ready_go && ms_allowin;
assign es_to_ms_valid = es_valid && es_ready_go;
always @(posedge clk) begin
    if (reset) begin
        es_valid <= 1'b0;
    end
    else if (es_allowin) begin
        es_valid <= ds_to_es_valid;
    end

    if (ds_to_es_valid && es_allowin) begin
        ds_to_es_bus_r <= ds_to_es_bus;
    end
end

assign alu_src1 = src1_is_pc ? es_pc : rj_value;
assign alu_src2 = src2_is_imm ? imm : rkd_value;

alu u_alu(
    .alu_op (alu_op ),
    .alu_src1 (alu_src1 ),
    .alu_src2 (alu_src2 ),
    .alu_result (alu_result)
    );

assign data_sram_en = 1'b1;
assign data_sram_we = (es_mem_we && es_valid) ? 4'hf : 4'h0;
assign data_sram_addr = alu_result;
assign data_sram_wdata = rkd_value;


endmodule

alu

module alu(
  input wire [11:0] alu_op,
  input wire [31:0] alu_src1,
  input wire [31:0] alu_src2,
  output wire [31:0] alu_result
);

wire op_add; //add operation
wire op_sub; //sub operation
wire op_slt; //signed compared and set less than
wire op_sltu; //unsigned compared and set less than
wire op_and; //bitwise and
wire op_nor; //bitwise nor
wire op_or; //bitwise or
wire op_xor; //bitwise xor
wire op_sll; //logic left shift
wire op_srl; //logic right shift
wire op_sra; //arithmetic right shift
wire op_lui; //Load Upper Immediate

// control code decomposition
assign op_add = alu_op[ 0];
assign op_sub = alu_op[ 1];
assign op_slt = alu_op[ 2];
assign op_sltu = alu_op[ 3];
assign op_and = alu_op[ 4];
assign op_nor = alu_op[ 5];
assign op_or = alu_op[ 6];
assign op_xor = alu_op[ 7];
assign op_sll = alu_op[ 8];
assign op_srl = alu_op[ 9];
assign op_sra = alu_op[10];
assign op_lui = alu_op[11];

wire [31:0] add_sub_result;
wire [31:0] slt_result;
wire [31:0] sltu_result;
wire [31:0] and_result;
wire [31:0] nor_result;
wire [31:0] or_result;
wire [31:0] xor_result;
wire [31:0] lui_result;
wire [31:0] sll_result;
wire [63:0] sr64_result;
wire [31:0] sr_result;


// 32-bit adder
wire [31:0] adder_a;
wire [31:0] adder_b;
wire adder_cin;
wire [31:0] adder_result;
wire adder_cout;

assign adder_a = alu_src1;
assign adder_b = (op_sub | op_slt | op_sltu) ? ~alu_src2 : alu_src2; //src1 - src2 rj-rk
assign adder_cin = (op_sub | op_slt | op_sltu) ? 1'b1 : 1'b0;
assign {adder_cout, adder_result} = adder_a + adder_b + adder_cin;

// ADD, SUB result
assign add_sub_result = adder_result;

// SLT result
assign slt_result[31:1] = 31'b0; // result is 1 when rj < rk (signed), else 0
assign slt_result[0]    = (alu_src1[31] & ~alu_src2[31])
                        | ((alu_src1[31] ~^ alu_src2[31]) & adder_result[31]);

//SLTU result
assign sltu_result[31:1] = 31'b0;
assign sltu_result[0] = ~adder_cout;

//bitwise operation
assign and_result = alu_src1 & alu_src2;
assign or_result = alu_src1 | alu_src2;
assign nor_result = ~or_result;
assign xor_result = alu_src1 ^ alu_src2;
assign lui_result = alu_src2;

// SLL result
assign sll_result = alu_src1 << alu_src2[4:0]; //rj << ui5

// SRL, SRA result
assign sr64_result = {{32{op_sra & alu_src1[31]}}, alu_src1[31:0]} >> alu_src2[4:0]; //rj >> ui5

assign sr_result = sr64_result[31:0];

// final result mux
assign alu_result = ({32{op_add|op_sub}} & add_sub_result)
                  | ({32{op_slt       }} & slt_result)
                  | ({32{op_sltu      }} & sltu_result)
                  | ({32{op_and       }} & and_result)
                  | ({32{op_nor       }} & nor_result)
                  | ({32{op_or        }} & or_result)
                  | ({32{op_xor       }} & xor_result)
                  | ({32{op_lui       }} & lui_result)
                  | ({32{op_sll       }} & sll_result)
                  | ({32{op_srl|op_sra}} & sr_result);

endmodule

MEM stage

`include "mycpu.h"

module mem_stage(
    input                          clk           ,
    input                          reset         ,
    // allowin
    input                          ws_allowin    ,
    output                         ms_allowin    ,
    // from es
    input                          es_to_ms_valid,
    input  [`ES_TO_MS_BUS_WD -1:0] es_to_ms_bus  ,
    // to ws
    output                         ms_to_ws_valid,
    output [`MS_TO_WS_BUS_WD -1:0] ms_to_ws_bus  ,

    // from data-sram
    input  [31:0]                  data_sram_rdata
);

reg ms_valid;
wire ms_ready_go;

reg [`ES_TO_MS_BUS_WD -1:0] es_to_ms_bus_r;
wire ms_res_from_mem;
wire ms_gr_we;
wire [4:0] ms_dest;
wire [31:0] ms_alu_result;
wire [31:0] ms_pc;

wire [31:0] mem_result;
wire [31:0] ms_final_result;


assign {ms_res_from_mem,  //70:70
        ms_gr_we       ,  //69:69
        ms_dest        ,  //68:64
        ms_alu_result  ,  //63:32
        ms_pc             //31:0
       } = es_to_ms_bus_r;

assign ms_to_ws_bus = {ms_gr_we       ,  //69:69
                       ms_dest        ,  //68:64
                       ms_final_result,  //63:32
                       ms_pc             //31:0
                      };

assign ms_ready_go    = 1'b1;
assign ms_allowin     = !ms_valid || ms_ready_go && ws_allowin;
assign ms_to_ws_valid = ms_valid && ms_ready_go;
always @(posedge clk) begin
    if (reset) begin
        ms_valid <= 1'b0;
    end
    else if (ms_allowin) begin
        ms_valid <= es_to_ms_valid;
    end

    if (es_to_ms_valid && ms_allowin) begin
        // non-blocking assignment, consistent with the other pipeline registers
        es_to_ms_bus_r <= es_to_ms_bus;
    end
end

assign mem_result = data_sram_rdata;
assign ms_final_result = ms_res_from_mem ? mem_result : ms_alu_result;

endmodule

WB stage

`include "mycpu.h"

module wb_stage(
    input                          clk           ,
    input                          reset         ,
    // allowin
    output                         ws_allowin    ,
    // from ms
    input                          ms_to_ws_valid,
    input  [`MS_TO_WS_BUS_WD -1:0] ms_to_ws_bus  ,
    // to rf: for write back
    output [`WS_TO_RF_BUS_WD -1:0] ws_to_rf_bus  ,
    // trace debug interface
    output [31:0]                  debug_wb_pc      ,
    output [ 3:0]                  debug_wb_rf_we   ,
    output [ 4:0]                  debug_wb_rf_wnum ,
    output [31:0]                  debug_wb_rf_wdata
);

reg ws_valid;
wire ws_ready_go;

reg [`MS_TO_WS_BUS_WD -1:0] ms_to_ws_bus_r;
wire ws_gr_we;
wire [4:0] ws_dest;
wire [31:0] ws_final_result;
wire [31:0] ws_pc;
assign {ws_gr_we       ,  //69:69
        ws_dest        ,  //68:64
        ws_final_result,  //63:32
        ws_pc             //31:0
       } = ms_to_ws_bus_r;

wire        rf_we;
wire [ 4:0] rf_waddr;
wire [31:0] rf_wdata;
assign ws_to_rf_bus = {rf_we   ,  //37:37
                       rf_waddr,  //36:32
                       rf_wdata   //31:0
                      };

assign ws_ready_go = 1'b1;
assign ws_allowin  = !ws_valid || ws_ready_go;
always @(posedge clk) begin
    if (reset) begin
        ws_valid <= 1'b0;
    end
    else if (ws_allowin) begin
        ws_valid <= ms_to_ws_valid;
    end

    if (ms_to_ws_valid && ws_allowin) begin
        ms_to_ws_bus_r <= ms_to_ws_bus;
    end
end

assign rf_we    = ws_gr_we && ws_valid;
assign rf_waddr = ws_dest;
assign rf_wdata = ws_final_result;

// debug info generate
assign debug_wb_pc       = ws_pc;
assign debug_wb_rf_we    = {4{rf_we}};
assign debug_wb_rf_wnum  = ws_dest;
assign debug_wb_rf_wdata = ws_final_result;

endmodule

mycpu.h

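`ifndef MYCPU_H
    `define MYCPU_H

    // br_taken (1) + br_target (32)
    `define BR_BUS_WD 33
    // fs_inst (32) + fs_pc (32)
    `define FS_TO_DS_BUS_WD 64
    // alu_op (12) + 6 one-bit controls + dest (5) + 4 x 32-bit values + res_from_mem (1)
    `define DS_TO_ES_BUS_WD 152
    // res_from_mem (1) + gr_we (1) + dest (5) + alu_result (32) + es_pc (32)
    `define ES_TO_MS_BUS_WD 71
    // gr_we (1) + dest (5) + final_result (32) + ms_pc (32)
    `define MS_TO_WS_BUS_WD 70
    // rf_we (1) + rf_waddr (5) + rf_wdata (32)
    `define WS_TO_RF_BUS_WD 38
`endif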

mycpu_top

`include "mycpu.h"

module mycpu_top(
    input clk,
    input resetn,
    // inst sram interface
    output inst_sram_en,
    output [3:0] inst_sram_we,
    output [31:0] inst_sram_addr,
    output [31:0] inst_sram_wdata,
    input [31:0] inst_sram_rdata,
    // data sram interface
    output data_sram_en,
    output [3:0] data_sram_we,
    output [31:0] data_sram_addr,
    output [31:0] data_sram_wdata,
    input [31:0] data_sram_rdata,
    // trace debug interface
    output [31:0] debug_wb_pc,
    output [3:0] debug_wb_rf_we,
    output [4:0] debug_wb_rf_wnum,
    output [31:0] debug_wb_rf_wdata
);
reg reset;
always @(posedge clk) reset <= ~resetn;

wire ds_allowin;
wire es_allowin;
wire ms_allowin;
wire ws_allowin;
wire fs_to_ds_valid;
wire ds_to_es_valid;
wire es_to_ms_valid;
wire ms_to_ws_valid;
wire [`FS_TO_DS_BUS_WD -1:0] fs_to_ds_bus;
wire [`DS_TO_ES_BUS_WD -1:0] ds_to_es_bus;
wire [`ES_TO_MS_BUS_WD -1:0] es_to_ms_bus;
wire [`MS_TO_WS_BUS_WD -1:0] ms_to_ws_bus;
wire [`WS_TO_RF_BUS_WD -1:0] ws_to_rf_bus;
wire [`BR_BUS_WD -1:0] br_bus;

// IF stage
if_stage if_stage(
    .clk (clk),
    .reset (reset),
    //allowin
    .ds_allowin (ds_allowin ),
    // br_bus
    .br_bus (br_bus ),
    //outputs
    .fs_to_ds_valid (fs_to_ds_valid ),
    .fs_to_ds_bus (fs_to_ds_bus ),
    // inst sram interface
    .inst_sram_en (inst_sram_en ),
    .inst_sram_we (inst_sram_we ),
    .inst_sram_addr (inst_sram_addr ),
    .inst_sram_wdata(inst_sram_wdata),
    .inst_sram_rdata(inst_sram_rdata)
);
//ID stage
id_stage id_stage(
    .clk (clk),
    .reset (reset),
    //allowin
    .es_allowin (es_allowin ),
    .ds_allowin (ds_allowin ),
    //from fs
    .fs_to_ds_valid (fs_to_ds_valid ),
    .fs_to_ds_bus (fs_to_ds_bus ),
    // to es
    .ds_to_es_valid (ds_to_es_valid ),
    .ds_to_es_bus (ds_to_es_bus ),
    //to fs
    .br_bus (br_bus ),
    //to rf: for write back
    .ws_to_rf_bus (ws_to_rf_bus)
);
// EXE stage
exe_stage exe_stage(
    .clk (clk),
    .reset (reset),
    //allowin
    .ms_allowin (ms_allowin ),
    .es_allowin (es_allowin ),
    //from ds
    .ds_to_es_valid (ds_to_es_valid ),
    .ds_to_es_bus (ds_to_es_bus ),
    //to ms
    .es_to_ms_valid (es_to_ms_valid ),
    .es_to_ms_bus (es_to_ms_bus ),
    // data sram interface
    .data_sram_en (data_sram_en ),
    .data_sram_we (data_sram_we ),
    .data_sram_addr (data_sram_addr ),
    .data_sram_wdata(data_sram_wdata)
);
// MEM stage
mem_stage mem_stage(
    .clk (clk),
    .reset (reset),
    //allowin
    .ws_allowin (ws_allowin),
    .ms_allowin (ms_allowin ),
    //from es
    .es_to_ms_valid (es_to_ms_valid ),
    .es_to_ms_bus (es_to_ms_bus ),
    //to ws
    .ms_to_ws_valid (ms_to_ws_valid ),
    .ms_to_ws_bus (ms_to_ws_bus ),
    //from data-sram
    .data_sram_rdata(data_sram_rdata)
);
// WB stage
wb_stage wb_stage(
    .clk (clk),
    .reset (reset),
    //allowin
    .ws_allowin (ws_allowin),
    //from ms
    .ms_to_ws_valid (ms_to_ws_valid ),
    .ms_to_ws_bus (ms_to_ws_bus ),
    //to rf: for write back
    .ws_to_rf_bus (ws_to_rf_bus ),
    //trace debug interface
    .debug_wb_pc (debug_wb_pc ),
    .debug_wb_rf_we (debug_wb_rf_we ),
    .debug_wb_rf_wnum (debug_wb_rf_wnum ),
    .debug_wb_rf_wdata(debug_wb_rf_wdata)
);

endmodule

soc_lite_top

soc_lite_top is left unmodified.

TestBench

Test passed

Reference materials

[1] CPU Design Practice (Wang Wenxiang) Chapter 4
[2] LoongArch CPU Design Experiment_Practical Task 7
[3] CDP_EDE_local
[4] Loongson Architecture 32-bit Lite Reference Manual
[5] exp experiment release package (download link)