Table of contents
- LoongArch CPU Design Experiment, Practice Task 7: A Simple Pipelined CPU Without Hazard Handling
  - Statement
  - Experimental requirements
  - Pipeline design ideas (no hazards)
    - Adding registers between pipeline stages
      - Pipeline division
      - Pipeline registers
      - Combinational logic and sequential logic (single-cycle pipeline)
      - Synchronous RAM
    - PC data path
      - Jump instructions
      - PC register reset
    - Handshake signals between pipeline stages
  - Code implementation
    - IF stage
    - ID stage
    - EXE stage
      - alu
    - MEM stage
    - WB stage
    - mycpu.h
    - mycpu_top
    - soc_lite_top
  - TestBench
  - Reference materials
LoongArch CPU Design Experiment, Practice Task 7: A Simple Pipelined CPU Without Hazard Handling
Statement
Because I could not get the LoongArch cross-compilation toolchain configured, this experiment uses the exp release package instead (the download link is listed in the reference materials at the end of the article).
Experimental requirements
This practical task requires completing the following work based on the single-cycle CPU implemented in practical task 6:
- Adjust the CPU top-level interface: add the chip select signal inst_sram_ce for the instruction RAM and the chip select signal data_sram_ce for the data RAM.
- Adjust the CPU top-level interface: change both inst_sram_we and data_sram_we from a 1-bit write enable to a 4-bit byte write enable.
- Design a single-issue, five-stage pipelined CPU that does not handle hazards caused by dependences between instructions.
- Run the func test program for exp7; it must pass both simulation and on-board verification.
- Note: in the code in this article, the en signals (inst_sram_en, data_sram_en) are the ce signals.
Pipeline design ideas (no hazards)
Adding registers between pipeline stages
Pipeline division
The five pipeline stages are instruction fetch (IF), decode (ID), execute (EXE), memory access (MEM), and write-back (WB).
- IF stage: updates the PC, including the redirect when a jump is taken, and uses next_pc (nextpc in the code) to request the next instruction from inst_ram.
- ID stage: decodes the instruction and generates the control signals. It reads the general-purpose register file to produce the source operands and writes the data arriving from the WB stage into the register file. It also resolves jump instructions: it generates the jump-taken signal and the target address and forwards them to the IF stage.
- EXE stage: selects the source operands and performs the arithmetic and logic operations. For a store instruction it writes the data to data_ram; for a load it issues the read request to data_ram.
- MEM stage: receives the data_ram read data requested in the EXE stage, then selects the final result: either the ALU result or the data read from data_ram.
- WB stage: generates the write-back signals (enable, address, data) and passes them to the ID stage, where the register file write is performed.
Pipeline registers
Flip-flops are placed between the five stages as pipeline registers (the "pipeline cache"). The register between the IF and ID stages is called the ID reg; the others are named the same way.
Each pipeline register has a valid bit, valid, that indicates whether its content is valid.
A pipeline register holds both control signals and data, that is, everything on the signal lines passed on to the later stages.
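The idea of a valid bit plus a registered bus can be sketched as a small generic module. This is only an illustration under assumptions: the module name pipe_reg, the WIDTH parameter, and the port names are hypothetical and do not appear in the lab code, where each stage keeps its own xs_valid and bus register instead. The allowin input corresponds to the handshake signal described later in this article.

```verilog
// Hypothetical generic pipeline register (not part of the lab files).
// It stores one valid bit plus the bus handed over by the previous stage.
module pipe_reg #(
    parameter WIDTH = 64               // assumed bus width, e.g. FS_TO_DS_BUS_WD
) (
    input                  clk,
    input                  reset,
    input                  allowin,    // this stage may accept new content
    input                  in_valid,   // content offered by the previous stage is valid
    input      [WIDTH-1:0] in_bus,     // control signals and data from the previous stage
    output reg             valid,      // content currently held is valid
    output reg [WIDTH-1:0] bus_r       // registered copy of in_bus
);
    always @(posedge clk) begin
        // The valid bit is the only state that needs a reset.
        if (reset)
            valid <= 1'b0;
        else if (allowin)
            valid <= in_valid;

        // The payload is captured only when valid content is actually accepted.
        if (in_valid && allowin)
            bus_r <= in_bus;
    end
endmodule
```

Only the valid bit is reset; the bus content does not need a reset value, because the later stages ignore it while valid is low.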
Combinational logic and sequential logic (single-cycle pipeline)
The logic inside each pipeline stage is combinational.
The pipeline registers between the stages are sequential logic.
Synchronous RAM
A synchronous RAM receives the read enable and read address in one clock cycle and returns the read data in the next clock cycle.
- inst_ram: the IF stage sends next_pc as the address, so the instruction for the next cycle is requested during the current cycle.
- data_ram: the read request is issued in the EXE stage, and the read data is obtained in the MEM stage.
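The one-cycle read latency and the 4-bit byte write enable can be made concrete with a small behavioral model. This is a sketch under assumptions, not the RAM used by soc_lite_top: the module name sync_ram_model, the ADDR_BITS parameter, and the read-before-write behavior on a simultaneous access are all choices made for this example.

```verilog
// Hypothetical behavioral model of a synchronous RAM with byte write enables.
// It is NOT the RAM instantiated by soc_lite_top; names and size are assumptions.
module sync_ram_model #(
    parameter ADDR_BITS = 10                 // 1024 words, an arbitrary size
) (
    input             clk,
    input             en,                    // chip select (ce)
    input      [ 3:0] we,                    // one write enable per byte
    input      [31:0] addr,                  // byte address
    input      [31:0] wdata,
    output reg [31:0] rdata
);
    reg  [31:0] mem [0:(1 << ADDR_BITS) - 1];
    wire [ADDR_BITS-1:0] word = addr[ADDR_BITS+1:2];   // word index

    always @(posedge clk) begin
        if (en) begin
            // Each we bit updates exactly one byte lane.
            if (we[0]) mem[word][ 7: 0] <= wdata[ 7: 0];
            if (we[1]) mem[word][15: 8] <= wdata[15: 8];
            if (we[2]) mem[word][23:16] <= wdata[23:16];
            if (we[3]) mem[word][31:24] <= wdata[31:24];
            // The address is sampled at this edge; rdata is visible afterwards,
            // i.e. in the following clock cycle.
            rdata <= mem[word];
        end
    end
endmodule
```

With a model like this, an address presented in the EXE stage yields its data while the same instruction is in the MEM stage, which matches the data_ram timing described above.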
The timing relationship between PC, target, and inst is shown in the figure below (MIPS figure).
PC data path
Jump instructions
Whether to jump and the jump target address are computed in the ID stage, and the result is forwarded to the IF stage.
The target address is computed from ds_pc, because in LoongArch the target of a PC-relative jump is computed from the PC of the jump instruction itself, not from the PC of a delay-slot instruction (jirl instead uses the rj register value plus the offset, as in the ID stage code).
Note that LoongArch has no delay-slot instructions.
PC register reset
While the reset signal is asserted, the PC register is set to 32'h1bfffffc rather than to the entry address, because the fetch address is taken from next_pc, which is the PC plus 4 when no jump is taken. During reset, next_pc is therefore 32'h1c000000.
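The reset arithmetic can be checked with a tiny self-checking testbench. This is a minimal sketch: the module name pc_reset_tb is hypothetical, the branch path is left out (nextpc is simply seq_pc), and only the names fs_pc, seq_pc, and nextpc are taken from the IF stage code.

```verilog
`timescale 1ns / 1ps
// Hypothetical stand-alone check; pc_reset_tb is not part of the lab files.
module pc_reset_tb;
    reg         clk   = 1'b0;
    reg         reset = 1'b1;
    reg  [31:0] fs_pc;
    wire [31:0] seq_pc = fs_pc + 32'h4;
    wire [31:0] nextpc = seq_pc;              // assumption: no branch is taken

    always #5 clk = ~clk;

    always @(posedge clk) begin
        if (reset) fs_pc <= 32'h1bfffffc;     // one word below the entry address
        else       fs_pc <= nextpc;
    end

    initial begin
        // After one clock edge under reset, the fetch address is the entry address.
        @(posedge clk); #1;
        if (nextpc !== 32'h1c000000) $display("FAIL: nextpc = %h", nextpc);
        else                         $display("PASS: first fetch address = %h", nextpc);

        // Release reset: fs_pc then takes the address that was just fetched.
        reset = 1'b0;
        @(posedge clk); #1;
        if (fs_pc !== 32'h1c000000) $display("FAIL: fs_pc = %h", fs_pc);
        else                        $display("PASS: fs_pc = %h", fs_pc);
        $finish;
    end
endmodule
```

Any Verilog simulator should be able to run this; with Icarus Verilog, for example, it can be compiled with iverilog and executed with vvp.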
Handshake signals between pipeline stages
Taking the IF stage (prefix fs) as the example, the handshake signals are:
- fs_ready_go: active high; indicates that this stage is ready to send its data on. Stalls added in later tasks are implemented through this ready_go signal.
- fs_allowin: indicates whether this stage may accept new data. A stage can accept data only when its current content is invalid, or when it is ready to send and the next stage can accept.
- fs_to_ds_valid: indicates whether the data passed from the fs stage to the ds stage is valid. It is valid only when the content of this stage is valid and ready to be sent.
- fs_valid: indicates whether the content of the fs stage is valid.
```verilog
// IF stage
assign fs_ready_go    = 1'b1;
assign fs_allowin     = !fs_valid || fs_ready_go && ds_allowin;
assign fs_to_ds_valid = fs_valid && fs_ready_go;
always @(posedge clk) begin
    if (reset) begin
        fs_valid <= 1'b0;
    end
    else if (fs_allowin) begin
        fs_valid <= to_fs_valid;
    end
end
```
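To watch the handshake in motion, the valid bits of two neighbouring stages can be simulated on their own. The following testbench is a sketch under assumptions: handshake_tb is a hypothetical name, only the fs and ds valid bits are modelled (no buses), and es_allowin is tied high as if the EXE stage always accepted data.

```verilog
`timescale 1ns / 1ps
// Hypothetical stand-alone testbench; only the fs/ds valid bits are modelled.
module handshake_tb;
    reg clk   = 1'b0;
    reg reset = 1'b1;
    reg fs_valid, ds_valid;

    wire es_allowin     = 1'b1;                 // assumption: EXE always accepts
    wire fs_ready_go    = 1'b1;                 // no stalls in this task
    wire ds_ready_go    = 1'b1;
    wire to_fs_valid    = ~reset;
    wire ds_allowin     = !ds_valid || (ds_ready_go && es_allowin);
    wire fs_allowin     = !fs_valid || (fs_ready_go && ds_allowin);
    wire fs_to_ds_valid = fs_valid && fs_ready_go;

    always #5 clk = ~clk;

    always @(posedge clk) begin
        if (reset) begin
            fs_valid <= 1'b0;
            ds_valid <= 1'b0;
        end
        else begin
            if (fs_allowin) fs_valid <= to_fs_valid;
            if (ds_allowin) ds_valid <= fs_to_ds_valid;
        end
    end

    initial begin
        // Hold reset across one clock edge, then release it.
        @(posedge clk); #1 reset = 1'b0;
        // Print the valid bits after each of the next three clock edges.
        repeat (3) begin
            @(posedge clk); #1;
            $display("t=%0t fs_valid=%b ds_valid=%b", $time, fs_valid, ds_valid);
        end
        $finish;
    end
endmodule
```

Since every ready_go is tied high, fs_valid rises on the first clock edge after reset is released and ds_valid follows one edge later: the valid bit advances one stage per clock, which is how an instruction moves through the pipeline in this task.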
Code implementation
IF stage
```verilog
`include "mycpu.h"

module if_stage(
    input                          clk            ,
    input                          reset          ,
    // allowin
    input                          ds_allowin     ,
    // br_bus
    input  [`BR_BUS_WD       -1:0] br_bus         ,
    // to ds
    output                         fs_to_ds_valid ,
    output [`FS_TO_DS_BUS_WD -1:0] fs_to_ds_bus   ,
    // inst sram interface
    output                         inst_sram_en   ,
    output [ 3:0]                  inst_sram_we   ,
    output [31:0]                  inst_sram_addr ,
    output [31:0]                  inst_sram_wdata,
    input  [31:0]                  inst_sram_rdata
);

reg         fs_valid;
wire        fs_ready_go;
wire        fs_allowin;
wire        to_fs_valid;

wire [31:0] seq_pc;
wire [31:0] nextpc;

wire        br_taken;
wire [31:0] br_target;
assign {br_taken, br_target} = br_bus;

wire [31:0] fs_inst;
reg  [31:0] fs_pc;
assign fs_to_ds_bus = {fs_inst, fs_pc};

// pre-IF stage
assign to_fs_valid = ~reset;
// seq_pc is the address of the instruction that follows the one currently in IF
assign seq_pc      = fs_pc + 3'h4;
assign nextpc      = br_taken ? br_target : seq_pc;

// IF stage
assign fs_ready_go    = 1'b1;                                   // ready to send
assign fs_allowin     = !fs_valid || fs_ready_go && ds_allowin; // can receive data (not blocked)
assign fs_to_ds_valid = fs_valid && fs_ready_go;
always @(posedge clk) begin
    if (reset) begin
        fs_valid <= 1'b0;
    end
    else if (fs_allowin) begin
        fs_valid <= to_fs_valid;    // data is valid
    end
end

always @(posedge clk) begin
    if (reset) begin
        fs_pc <= 32'h1bfffffc;      // trick: makes nextpc 0x1c000000 during reset
    end
    else if (to_fs_valid && fs_allowin) begin
        fs_pc <= nextpc;
    end
end

assign inst_sram_en    = to_fs_valid && fs_allowin;
assign inst_sram_we    = 4'h0;
assign inst_sram_addr  = nextpc;
assign inst_sram_wdata = 32'b0;

assign fs_inst         = inst_sram_rdata;

endmodule
```
ID stage
```verilog
`include "mycpu.h"

module id_stage(
    input                          clk           ,
    input                          reset         ,
    // allowin
    input                          es_allowin    ,
    output                         ds_allowin    ,
    // from fs
    input                          fs_to_ds_valid,
    input  [`FS_TO_DS_BUS_WD -1:0] fs_to_ds_bus  ,
    // to es
    output                         ds_to_es_valid,
    output [`DS_TO_ES_BUS_WD -1:0] ds_to_es_bus  ,
    // to fs
    output [`BR_BUS_WD       -1:0] br_bus        ,
    // to rf: for write back
    input  [`WS_TO_RF_BUS_WD -1:0] ws_to_rf_bus
);

wire        br_taken;
wire [31:0] br_target;

wire [31:0] ds_pc;
wire [31:0] ds_inst;

reg         ds_valid;
wire        ds_ready_go;

wire [11:0] alu_op;
wire        load_op;
wire        src1_is_pc;
wire        src2_is_imm;
wire        res_from_mem;
wire        dst_is_r1;
wire        gr_we;
wire        mem_we;
wire        src_reg_is_rd;
wire [ 4:0] dest;
wire [31:0] rj_value;
wire [31:0] rkd_value;
wire        rj_eq_rd;       // declared explicitly; it was an implicit net in the original listing
wire [31:0] imm;
wire [31:0] br_offs;
wire [31:0] jirl_offs;

wire [ 5:0] op_31_26;
wire [ 3:0] op_25_22;
wire [ 1:0] op_21_20;
wire [ 4:0] op_19_15;
wire [ 4:0] rd;
wire [ 4:0] rj;
wire [ 4:0] rk;
wire [11:0] i12;
wire [19:0] i20;
wire [15:0] i16;
wire [25:0] i26;

wire [63:0] op_31_26_d;
wire [15:0] op_25_22_d;
wire [ 3:0] op_21_20_d;
wire [31:0] op_19_15_d;

wire        inst_add_w;
wire        inst_sub_w;
wire        inst_slt;
wire        inst_sltu;
wire        inst_nor;
wire        inst_and;
wire        inst_or;
wire        inst_xor;
wire        inst_slli_w;
wire        inst_srli_w;
wire        inst_srai_w;
wire        inst_addi_w;
wire        inst_ld_w;
wire        inst_st_w;
wire        inst_jirl;
wire        inst_b;
wire        inst_bl;
wire        inst_beq;
wire        inst_bne;
wire        inst_lu12i_w;

wire        need_ui5;
wire        need_si12;
wire        need_si16;
wire        need_si20;
wire        need_si26;
wire        src2_is_4;

wire [ 4:0] rf_raddr1;
wire [31:0] rf_rdata1;
wire [ 4:0] rf_raddr2;
wire [31:0] rf_rdata2;

wire        rf_we;
wire [ 4:0] rf_waddr;
wire [31:0] rf_wdata;

wire [31:0] alu_src1;
wire [31:0] alu_src2;
wire [31:0] alu_result;

wire [31:0] mem_result;
wire [31:0] final_result;

// instruction fields
assign op_31_26 = ds_inst[31:26];
assign op_25_22 = ds_inst[25:22];
assign op_21_20 = ds_inst[21:20];
assign op_19_15 = ds_inst[19:15];

assign rd  = ds_inst[ 4: 0];
assign rj  = ds_inst[ 9: 5];
assign rk  = ds_inst[14:10];

assign i12 = ds_inst[21:10];
assign i20 = ds_inst[24: 5];
assign i16 = ds_inst[25:10];
assign i26 = {ds_inst[ 9: 0], ds_inst[25:10]};

decoder_6_64 u_dec0(.in(op_31_26), .out(op_31_26_d));
decoder_4_16 u_dec1(.in(op_25_22), .out(op_25_22_d));
decoder_2_4  u_dec2(.in(op_21_20), .out(op_21_20_d));
decoder_5_32 u_dec3(.in(op_19_15), .out(op_19_15_d));

// instruction decode
assign inst_add_w   = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h00];
assign inst_sub_w   = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h02];
assign inst_slt     = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h04];
assign inst_sltu    = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h05];
assign inst_nor     = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h08];
assign inst_and     = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h09];
assign inst_or      = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h0a];
assign inst_xor     = op_31_26_d[6'h00] & op_25_22_d[4'h0] & op_21_20_d[2'h1] & op_19_15_d[5'h0b];
assign inst_slli_w  = op_31_26_d[6'h00] & op_25_22_d[4'h1] & op_21_20_d[2'h0] & op_19_15_d[5'h01];
assign inst_srli_w  = op_31_26_d[6'h00] & op_25_22_d[4'h1] & op_21_20_d[2'h0] & op_19_15_d[5'h09];
assign inst_srai_w  = op_31_26_d[6'h00] & op_25_22_d[4'h1] & op_21_20_d[2'h0] & op_19_15_d[5'h11];
assign inst_addi_w  = op_31_26_d[6'h00] & op_25_22_d[4'ha];
assign inst_ld_w    = op_31_26_d[6'h0a] & op_25_22_d[4'h2];
assign inst_st_w    = op_31_26_d[6'h0a] & op_25_22_d[4'h6];
assign inst_jirl    = op_31_26_d[6'h13];
assign inst_b       = op_31_26_d[6'h14];
assign inst_bl      = op_31_26_d[6'h15];
assign inst_beq     = op_31_26_d[6'h16];
assign inst_bne     = op_31_26_d[6'h17];
assign inst_lu12i_w = op_31_26_d[6'h05] & ~ds_inst[25];

assign alu_op[ 0] = inst_add_w | inst_addi_w | inst_ld_w | inst_st_w
                  | inst_jirl | inst_bl;
assign alu_op[ 1] = inst_sub_w;
assign alu_op[ 2] = inst_slt;
assign alu_op[ 3] = inst_sltu;
assign alu_op[ 4] = inst_and;
assign alu_op[ 5] = inst_nor;
assign alu_op[ 6] = inst_or;
assign alu_op[ 7] = inst_xor;
assign alu_op[ 8] = inst_slli_w;
assign alu_op[ 9] = inst_srli_w;
assign alu_op[10] = inst_srai_w;
assign alu_op[11] = inst_lu12i_w;

assign need_ui5  = inst_slli_w | inst_srli_w | inst_srai_w;
assign need_si12 = inst_addi_w | inst_ld_w | inst_st_w;
assign need_si16 = inst_jirl | inst_beq | inst_bne;
assign need_si20 = inst_lu12i_w;
assign need_si26 = inst_b | inst_bl;
assign src2_is_4 = inst_jirl | inst_bl;

assign imm = src2_is_4 ? 32'h4                  :
             need_si20 ? {i20[19:0], 12'b0}     :
             need_ui5  ? rk                     :
             /*need_si12*/ {{20{i12[11]}}, i12[11:0]};

assign br_offs = need_si26 ? {{ 4{i26[25]}}, i26[25:0], 2'b0} :
                             {{14{i16[15]}}, i16[15:0], 2'b0};

assign jirl_offs = {{14{i16[15]}}, i16[15:0], 2'b0};

assign src_reg_is_rd = inst_beq | inst_bne | inst_st_w;

assign src1_is_pc    = inst_jirl | inst_bl;

assign src2_is_imm   = inst_slli_w  |
                       inst_srli_w  |
                       inst_srai_w  |
                       inst_addi_w  |
                       inst_ld_w    |
                       inst_st_w    |
                       inst_lu12i_w |
                       inst_jirl    |
                       inst_bl;

// load_op was left undriven in the original listing; driving it from inst_ld_w
// is an assumption (it is not used by the later stages in this task)
assign load_op       = inst_ld_w;
assign res_from_mem  = inst_ld_w;
assign dst_is_r1     = inst_bl;
assign gr_we         = ~inst_st_w & ~inst_beq & ~inst_bne & ~inst_b;
assign mem_we        = inst_st_w;
assign dest          = dst_is_r1 ? 5'd1 : rd;

assign rf_raddr1 = rj;
assign rf_raddr2 = src_reg_is_rd ? rd : rk;
regfile u_regfile(
    .clk    (clk      ),
    .raddr1 (rf_raddr1),
    .rdata1 (rf_rdata1),
    .raddr2 (rf_raddr2),
    .rdata2 (rf_rdata2),
    .we     (rf_we    ),
    .waddr  (rf_waddr ),
    .wdata  (rf_wdata )
);

assign rj_value  = rf_rdata1;
assign rkd_value = rf_rdata2;

// branch resolution
assign rj_eq_rd  = (rj_value == rkd_value);
assign br_taken  = (   inst_beq &&  rj_eq_rd
                    || inst_bne && !rj_eq_rd
                    || inst_jirl
                    || inst_bl
                    || inst_b
                   ) && ds_valid;
assign br_target = (inst_beq || inst_bne || inst_bl || inst_b) ? (ds_pc + br_offs) :
                                                   /*inst_jirl*/ (rj_value + jirl_offs);

assign br_bus = {br_taken, br_target};

reg [`FS_TO_DS_BUS_WD -1:0] fs_to_ds_bus_r;

assign {ds_inst,
        ds_pc
       } = fs_to_ds_bus_r;

assign {rf_we   ,  // 37:37
        rf_waddr,  // 36:32
        rf_wdata   // 31:0
       } = ws_to_rf_bus;

assign ds_to_es_bus = {alu_op      ,  // 12
                       load_op     ,  // 1
                       src1_is_pc  ,  // 1
                       src2_is_imm ,  // 1
                       src2_is_4   ,  // 1
                       gr_we       ,  // 1
                       mem_we      ,  // 1
                       dest        ,  // 5
                       imm         ,  // 32
                       rj_value    ,  // 32
                       rkd_value   ,  // 32
                       ds_pc       ,  // 32
                       res_from_mem   // 1
                      };

assign ds_ready_go    = 1'b1;
assign ds_allowin     = !ds_valid || ds_ready_go && es_allowin;
assign ds_to_es_valid = ds_valid && ds_ready_go;
always @(posedge clk) begin
    if (reset) begin
        ds_valid <= 1'b0;
    end
    else if (ds_allowin) begin
        ds_valid <= fs_to_ds_valid;
    end

    if (fs_to_ds_valid && ds_allowin) begin
        fs_to_ds_bus_r <= fs_to_ds_bus;
    end
end

endmodule
```
EXE stage
```verilog
`include "mycpu.h"

module exe_stage(
    input                          clk            ,
    input                          reset          ,
    // allowin
    input                          ms_allowin     ,
    output                         es_allowin     ,
    // from ds
    input                          ds_to_es_valid ,
    input  [`DS_TO_ES_BUS_WD -1:0] ds_to_es_bus   ,
    // to ms
    output                         es_to_ms_valid ,
    output [`ES_TO_MS_BUS_WD -1:0] es_to_ms_bus   ,
    // data sram interface (write)
    output                         data_sram_en   ,
    output [ 3:0]                  data_sram_we   ,
    output [31:0]                  data_sram_addr ,
    output [31:0]                  data_sram_wdata
);

reg         es_valid;
wire        es_ready_go;

reg  [`DS_TO_ES_BUS_WD -1:0] ds_to_es_bus_r;

wire [11:0] alu_op;
wire        es_load_op;
wire        src1_is_pc;
wire        src2_is_imm;
wire        src2_is_4;
wire        res_from_mem;
wire        dst_is_r1;
wire        gr_we;
wire        es_mem_we;
wire [ 4:0] dest;
wire [31:0] rj_value;
wire [31:0] rkd_value;
wire [31:0] imm;
wire [31:0] es_pc;

assign {alu_op      ,
        es_load_op  ,
        src1_is_pc  ,
        src2_is_imm ,
        src2_is_4   ,
        gr_we       ,
        es_mem_we   ,
        dest        ,
        imm         ,
        rj_value    ,
        rkd_value   ,
        es_pc       ,
        res_from_mem
       } = ds_to_es_bus_r;

wire [31:0] alu_src1;
wire [31:0] alu_src2;
wire [31:0] alu_result;

// not used in lab7
wire        es_res_from_mem;
assign es_res_from_mem = es_load_op;

assign es_to_ms_bus = {res_from_mem,  // 70:70  1
                       gr_we       ,  // 69:69  1
                       dest        ,  // 68:64  5
                       alu_result  ,  // 63:32  32
                       es_pc          // 31:0   32
                      };

assign es_ready_go    = 1'b1;
assign es_allowin     = !es_valid || es_ready_go && ms_allowin;
assign es_to_ms_valid = es_valid && es_ready_go;
always @(posedge clk) begin
    if (reset) begin
        es_valid <= 1'b0;
    end
    else if (es_allowin) begin
        es_valid <= ds_to_es_valid;
    end

    if (ds_to_es_valid && es_allowin) begin
        ds_to_es_bus_r <= ds_to_es_bus;
    end
end

assign alu_src1 = src1_is_pc  ? es_pc : rj_value;
assign alu_src2 = src2_is_imm ? imm   : rkd_value;

alu u_alu(
    .alu_op     (alu_op    ),
    .alu_src1   (alu_src1  ),
    .alu_src2   (alu_src2  ),
    .alu_result (alu_result)
);

assign data_sram_en    = 1'b1;
assign data_sram_we    = es_mem_we && es_valid ? 4'hf : 4'h0;
assign data_sram_addr  = alu_result;
assign data_sram_wdata = rkd_value;

endmodule
```
alu
```verilog
module alu(
    input  wire [11:0] alu_op,
    input  wire [31:0] alu_src1,
    input  wire [31:0] alu_src2,
    output wire [31:0] alu_result
);

wire op_add;   // add operation
wire op_sub;   // sub operation
wire op_slt;   // signed compare and set less than
wire op_sltu;  // unsigned compare and set less than
wire op_and;   // bitwise and
wire op_nor;   // bitwise nor
wire op_or;    // bitwise or
wire op_xor;   // bitwise xor
wire op_sll;   // logical left shift
wire op_srl;   // logical right shift
wire op_sra;   // arithmetic right shift
wire op_lui;   // load upper immediate

// control code decomposition
assign op_add  = alu_op[ 0];
assign op_sub  = alu_op[ 1];
assign op_slt  = alu_op[ 2];
assign op_sltu = alu_op[ 3];
assign op_and  = alu_op[ 4];
assign op_nor  = alu_op[ 5];
assign op_or   = alu_op[ 6];
assign op_xor  = alu_op[ 7];
assign op_sll  = alu_op[ 8];
assign op_srl  = alu_op[ 9];
assign op_sra  = alu_op[10];
assign op_lui  = alu_op[11];

wire [31:0] add_sub_result;
wire [31:0] slt_result;
wire [31:0] sltu_result;
wire [31:0] and_result;
wire [31:0] nor_result;
wire [31:0] or_result;
wire [31:0] xor_result;
wire [31:0] lui_result;
wire [31:0] sll_result;
wire [63:0] sr64_result;
wire [31:0] sr_result;

// 32-bit adder
wire [31:0] adder_a;
wire [31:0] adder_b;
wire        adder_cin;
wire [31:0] adder_result;
wire        adder_cout;

assign adder_a   = alu_src1;
assign adder_b   = (op_sub | op_slt | op_sltu) ? ~alu_src2 : alu_src2;  // src1 - src2 (rj - rk)
assign adder_cin = (op_sub | op_slt | op_sltu) ? 1'b1      : 1'b0;
assign {adder_cout, adder_result} = adder_a + adder_b + adder_cin;

// ADD, SUB result
assign add_sub_result = adder_result;

// SLT result: 1 when rj < rk (signed)
assign slt_result[31:1] = 31'b0;
assign slt_result[0]    = (alu_src1[31] & ~alu_src2[31])
                        | ((alu_src1[31] ~^ alu_src2[31]) & adder_result[31]);

// SLTU result
assign sltu_result[31:1] = 31'b0;
assign sltu_result[0]    = ~adder_cout;

// bitwise operations
assign and_result = alu_src1 & alu_src2;
assign or_result  = alu_src1 | alu_src2;
assign nor_result = ~or_result;
assign xor_result = alu_src1 ^ alu_src2;
assign lui_result = alu_src2;

// SLL result
assign sll_result = alu_src1 << alu_src2[4:0];   // rj << ui5

// SRL, SRA result
assign sr64_result = {{32{op_sra & alu_src1[31]}}, alu_src1[31:0]} >> alu_src2[4:0];  // rj >> ui5
assign sr_result   = sr64_result[31:0];

// final result mux
assign alu_result = ({32{op_add | op_sub}} & add_sub_result)
                  | ({32{op_slt         }} & slt_result)
                  | ({32{op_sltu        }} & sltu_result)
                  | ({32{op_and         }} & and_result)
                  | ({32{op_nor         }} & nor_result)
                  | ({32{op_or          }} & or_result)
                  | ({32{op_xor         }} & xor_result)
                  | ({32{op_lui         }} & lui_result)
                  | ({32{op_sll         }} & sll_result)
                  | ({32{op_srl | op_sra}} & sr_result);

endmodule
```
MEM stage
```verilog
`include "mycpu.h"

module mem_stage(
    input                          clk           ,
    input                          reset         ,
    // allowin
    input                          ws_allowin    ,
    output                         ms_allowin    ,
    // from es
    input                          es_to_ms_valid,
    input  [`ES_TO_MS_BUS_WD -1:0] es_to_ms_bus  ,
    // to ws
    output                         ms_to_ws_valid,
    output [`MS_TO_WS_BUS_WD -1:0] ms_to_ws_bus  ,
    // from data sram
    input  [31:0]                  data_sram_rdata
);

reg         ms_valid;
wire        ms_ready_go;

reg  [`ES_TO_MS_BUS_WD -1:0] es_to_ms_bus_r;

wire        ms_res_from_mem;
wire        ms_gr_we;
wire [ 4:0] ms_dest;
wire [31:0] ms_alu_result;
wire [31:0] ms_pc;

wire [31:0] mem_result;
wire [31:0] ms_final_result;

assign {ms_res_from_mem,  // 70:70
        ms_gr_we       ,  // 69:69
        ms_dest        ,  // 68:64
        ms_alu_result  ,  // 63:32
        ms_pc             // 31:0
       } = es_to_ms_bus_r;

assign ms_to_ws_bus = {ms_gr_we       ,  // 69:69
                       ms_dest        ,  // 68:64
                       ms_final_result,  // 63:32
                       ms_pc             // 31:0
                      };

assign ms_ready_go    = 1'b1;
assign ms_allowin     = !ms_valid || ms_ready_go && ws_allowin;
assign ms_to_ws_valid = ms_valid && ms_ready_go;
always @(posedge clk) begin
    if (reset) begin
        ms_valid <= 1'b0;
    end
    else if (ms_allowin) begin
        ms_valid <= es_to_ms_valid;
    end

    if (es_to_ms_valid && ms_allowin) begin
        // non-blocking assignment, consistent with the other stage registers
        es_to_ms_bus_r <= es_to_ms_bus;
    end
end

assign mem_result      = data_sram_rdata;
assign ms_final_result = ms_res_from_mem ? mem_result : ms_alu_result;

endmodule
```
WB stage
```verilog
`include "mycpu.h"

module wb_stage(
    input                          clk           ,
    input                          reset         ,
    // allowin
    output                         ws_allowin    ,
    // from ms
    input                          ms_to_ws_valid,
    input  [`MS_TO_WS_BUS_WD -1:0] ms_to_ws_bus  ,
    // to rf: for write back
    output [`WS_TO_RF_BUS_WD -1:0] ws_to_rf_bus  ,
    // trace debug interface
    output [31:0]                  debug_wb_pc      ,
    output [ 3:0]                  debug_wb_rf_we   ,
    output [ 4:0]                  debug_wb_rf_wnum ,
    output [31:0]                  debug_wb_rf_wdata
);

reg         ws_valid;
wire        ws_ready_go;

reg  [`MS_TO_WS_BUS_WD -1:0] ms_to_ws_bus_r;

wire        ws_gr_we;
wire [ 4:0] ws_dest;
wire [31:0] ws_final_result;
wire [31:0] ws_pc;

assign {ws_gr_we       ,  // 69:69
        ws_dest        ,  // 68:64
        ws_final_result,  // 63:32
        ws_pc             // 31:0
       } = ms_to_ws_bus_r;

wire        rf_we;
wire [ 4:0] rf_waddr;
wire [31:0] rf_wdata;
assign ws_to_rf_bus = {rf_we   ,  // 37:37
                       rf_waddr,  // 36:32
                       rf_wdata   // 31:0
                      };

assign ws_ready_go = 1'b1;
assign ws_allowin  = !ws_valid || ws_ready_go;
always @(posedge clk) begin
    if (reset) begin
        ws_valid <= 1'b0;
    end
    else if (ws_allowin) begin
        ws_valid <= ms_to_ws_valid;
    end

    if (ms_to_ws_valid && ws_allowin) begin
        ms_to_ws_bus_r <= ms_to_ws_bus;
    end
end

assign rf_we    = ws_gr_we && ws_valid;
assign rf_waddr = ws_dest;
assign rf_wdata = ws_final_result;

// debug info generate
assign debug_wb_pc       = ws_pc;
assign debug_wb_rf_we    = {4{rf_we}};
assign debug_wb_rf_wnum  = ws_dest;
assign debug_wb_rf_wdata = ws_final_result;

endmodule
```
mycpu.h
```verilog
`ifndef MYCPU_H
    `define MYCPU_H

    `define BR_BUS_WD       33
    `define FS_TO_DS_BUS_WD 64
    `define DS_TO_ES_BUS_WD 152
    `define ES_TO_MS_BUS_WD 71
    `define MS_TO_WS_BUS_WD 70
    `define WS_TO_RF_BUS_WD 38
`endif
```
mycpu_top
```verilog
`include "mycpu.h"

module mycpu_top(
    input         clk,
    input         resetn,
    // inst sram interface
    output        inst_sram_en,
    output [ 3:0] inst_sram_we,
    output [31:0] inst_sram_addr,
    output [31:0] inst_sram_wdata,
    input  [31:0] inst_sram_rdata,
    // data sram interface
    output        data_sram_en,
    output [ 3:0] data_sram_we,
    output [31:0] data_sram_addr,
    output [31:0] data_sram_wdata,
    input  [31:0] data_sram_rdata,
    // trace debug interface
    output [31:0] debug_wb_pc,
    output [ 3:0] debug_wb_rf_we,
    output [ 4:0] debug_wb_rf_wnum,
    output [31:0] debug_wb_rf_wdata
);

reg reset;
always @(posedge clk) reset <= ~resetn;

wire ds_allowin;
wire es_allowin;
wire ms_allowin;
wire ws_allowin;
wire fs_to_ds_valid;
wire ds_to_es_valid;
wire es_to_ms_valid;
wire ms_to_ws_valid;
wire [`FS_TO_DS_BUS_WD -1:0] fs_to_ds_bus;
wire [`DS_TO_ES_BUS_WD -1:0] ds_to_es_bus;
wire [`ES_TO_MS_BUS_WD -1:0] es_to_ms_bus;
wire [`MS_TO_WS_BUS_WD -1:0] ms_to_ws_bus;
wire [`WS_TO_RF_BUS_WD -1:0] ws_to_rf_bus;
wire [`BR_BUS_WD       -1:0] br_bus;

// IF stage
if_stage if_stage(
    .clk            (clk            ),
    .reset          (reset          ),
    // allowin
    .ds_allowin     (ds_allowin     ),
    // br_bus
    .br_bus         (br_bus         ),
    // outputs
    .fs_to_ds_valid (fs_to_ds_valid ),
    .fs_to_ds_bus   (fs_to_ds_bus   ),
    // inst sram interface
    .inst_sram_en   (inst_sram_en   ),
    .inst_sram_we   (inst_sram_we   ),
    .inst_sram_addr (inst_sram_addr ),
    .inst_sram_wdata(inst_sram_wdata),
    .inst_sram_rdata(inst_sram_rdata)
);

// ID stage
id_stage id_stage(
    .clk            (clk            ),
    .reset          (reset          ),
    // allowin
    .es_allowin     (es_allowin     ),
    .ds_allowin     (ds_allowin     ),
    // from fs
    .fs_to_ds_valid (fs_to_ds_valid ),
    .fs_to_ds_bus   (fs_to_ds_bus   ),
    // to es
    .ds_to_es_valid (ds_to_es_valid ),
    .ds_to_es_bus   (ds_to_es_bus   ),
    // to fs
    .br_bus         (br_bus         ),
    // to rf: for write back
    .ws_to_rf_bus   (ws_to_rf_bus   )
);

// EXE stage
exe_stage exe_stage(
    .clk            (clk            ),
    .reset          (reset          ),
    // allowin
    .ms_allowin     (ms_allowin     ),
    .es_allowin     (es_allowin     ),
    // from ds
    .ds_to_es_valid (ds_to_es_valid ),
    .ds_to_es_bus   (ds_to_es_bus   ),
    // to ms
    .es_to_ms_valid (es_to_ms_valid ),
    .es_to_ms_bus   (es_to_ms_bus   ),
    // data sram interface
    .data_sram_en   (data_sram_en   ),
    .data_sram_we   (data_sram_we   ),
    .data_sram_addr (data_sram_addr ),
    .data_sram_wdata(data_sram_wdata)
);

// MEM stage
mem_stage mem_stage(
    .clk            (clk            ),
    .reset          (reset          ),
    // allowin
    .ws_allowin     (ws_allowin     ),
    .ms_allowin     (ms_allowin     ),
    // from es
    .es_to_ms_valid (es_to_ms_valid ),
    .es_to_ms_bus   (es_to_ms_bus   ),
    // to ws
    .ms_to_ws_valid (ms_to_ws_valid ),
    .ms_to_ws_bus   (ms_to_ws_bus   ),
    // from data sram
    .data_sram_rdata(data_sram_rdata)
);

// WB stage
wb_stage wb_stage(
    .clk              (clk              ),
    .reset            (reset            ),
    // allowin
    .ws_allowin       (ws_allowin       ),
    // from ms
    .ms_to_ws_valid   (ms_to_ws_valid   ),
    .ms_to_ws_bus     (ms_to_ws_bus     ),
    // to rf: for write back
    .ws_to_rf_bus     (ws_to_rf_bus     ),
    // trace debug interface
    .debug_wb_pc      (debug_wb_pc      ),
    .debug_wb_rf_we   (debug_wb_rf_we   ),
    .debug_wb_rf_wnum (debug_wb_rf_wnum ),
    .debug_wb_rf_wdata(debug_wb_rf_wdata)
);

endmodule
```
soc_lite_top
Unmodified
TestBench
Test passed
Reference materials
[1] CPU Design Practice (Wang Wenxiang) Chapter 4
[2] LoongArch CPU Design Experiment_Practical Task 7
[3] CDP_EDE_local
[4] Loongson Architecture 32-bit Lite Reference Manual
[5] Download address of the exp experiment release package