Simulation of Scoreboard Scheduling Method Based on C++

1. Experiment name

Scoreboard Scheduling Method Simulation

2. Experiment report author

3. Experiment content

3.1. Define the basic data structure of NK-CPU instruction pipeline scoreboard simulation

3.1.1 Scoreboard

The scoreboard is the core data structure of this experiment. Because the experiment requires the scoreboard process to be visualized, I store all scoreboard data directly in the tables on the dialog panel, so the on-screen tables are themselves the data structure.

Whenever data is needed, it is read directly from the panel; whenever data is modified, the panel is updated immediately.

As in the previous experiment, I use two sets of scoreboards to represent the state at the start and at the end of a clock cycle. The second scoreboard is hidden, as the design diagram shows. All reads come from the first scoreboard and all changes are made to the second, which avoids artificial data hazards and structural hazards inside the simulation itself. At the end of each cycle, the contents of the second scoreboard are copied to the first.
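The two-scoreboard scheme can be illustrated in isolation. The following is a minimal sketch, assuming a table of four text cells; the names `Table`, `DoubleBuffered`, `read`, `write` and `commit` are hypothetical and do not appear in the project.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical stand-in for one scoreboard table: four text cells.
using Table = std::array<std::string, 4>;

struct DoubleBuffered {
    Table current{};  // visible state: every read in a cycle comes from here
    Table next{};     // hidden state: every write in a cycle goes here

    const std::string& read(std::size_t i) const { return current[i]; }
    void write(std::size_t i, const std::string& v) { next[i] = v; }

    // End of the clock cycle: publish the hidden state to the visible one.
    void commit() { current = next; }
};
```

A value written during a cycle becomes visible only after `commit()`, so the order in which the stages are simulated inside one cycle cannot leak results from one stage to another.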

3.2. Use C++ to implement the NK-CPU instruction pipeline scoreboard simulation program

3.2.1. Four implementation steps

The main task is to update the scoreboard. The specific operations follow the table in the textbook:

“Computer Architecture: A Quantitative Approach (Fifth Edition)” P510

However, the conditions in that table are not sufficient on their own. Each step must also consult the instruction status table, so that a step is executed only after the previous step of the same instruction has completed.

3.2.1.1. Issue

nkcpuScoreboardDlg.cpp

// INTEGER_UNIT ... FP_DIV_UNIT are the row indices of the functional units in m_funstate.
void CnkcpuScoreboardDlg::issue()
{
    UpdateData(TRUE);
    CString ins = m_instate.GetItemText(m_PC - 6, 1);
    if (ins != _T(""))
    {
        // op, D, S1 and S2 (operation, destination, sources) are assumed to have
        // been parsed from ins; the parsing code is not shown in this report.
        int FU;
        CString FUx;
        if (op == _T("load") || op == _T("store") || op == _T("branch") || op == _T("integer operation")) { FU = INTEGER_UNIT; FUx = _T("Integer"); }
        else if (op == _T("addition") || op == _T("subtraction")) { FU = FP_ADD_UNIT; FUx = _T("Add"); }
        else if (op == _T("multiplication"))
        {
            FU = FP_MUL1_UNIT; FUx = _T("Mult1");
            // Fall back to the second multiplier if the first is busy (column 1 is Busy)
            if (m_funstate.GetItemText(FU, 1) == _T("Yes")) { FU++; FUx = _T("Mult2"); }
        }
        else if (op == _T("divide")) { FU = FP_DIV_UNIT; FUx = _T("Divide"); }
        else return; // unknown operation: nothing to issue
        CString busy = m_funstate.GetItemText(FU, 1);
        CString result = m_result.GetItemText(0, rmap[D] + 1);
        CString resultS1 = m_result.GetItemText(0, rmap[S1] + 1);
        CString resultS2 = m_result.GetItemText(0, rmap[S2] + 1);
        // Issue only if the unit is free (no structural hazard) and no active
        // instruction is going to write the same destination (no WAW hazard)
        if (busy == _T("No") && result == _T(""))
        {
            m_instate1.SetItemText(m_PC - 6, 2, _T("√"));
            m_funstate1.SetItemText(FU, 1, _T("Yes"));
            m_funstate1.SetItemText(FU, 2, op);
            m_funstate1.SetItemText(FU, 3, D);
            m_funstate1.SetItemText(FU, 4, S1);
            m_funstate1.SetItemText(FU, 5, S2);
            m_funstate1.SetItemText(FU, 6, resultS1);   // Qj
            m_funstate1.SetItemText(FU, 7, resultS2);   // Qk
            // An operand is ready when no unit is going to write it
            m_funstate1.SetItemText(FU, 8, resultS1 == _T("") ? _T("Yes") : _T("No"));
            m_funstate1.SetItemText(FU, 9, resultS2 == _T("") ? _T("Yes") : _T("No"));
            CString pc; pc.Format(_T("%d"), m_PC);
            m_funstate1.SetItemText(FU, 10, pc);
            m_PC++;
            m_result1.SetItemText(0, rmap[D] + 1, FUx);
            UpdateData(FALSE);
        }
    }
}

3.2.1.2. Read operand

nkcpuScoreboardDlg.cpp

void CnkcpuScoreboardDlg::readOperand()
{
    for (int i = INTEGER_UNIT; i <= FP_DIV_UNIT; i++)
    {
        CString Rj = m_funstate.GetItemText(i, 8);
        CString Rk = m_funstate.GetItemText(i, 9);
        // Both source operands must be ready (no outstanding RAW hazard)
        if (Rj == _T("Yes") && Rk == _T("Yes"))
        {
            CString pc = m_funstate.GetItemText(i, 10);
            m_instate1.SetItemText(_ttoi(pc) - 6, 3, _T("√"));
            // Clear Qj/Qk and mark the operands as consumed
            m_funstate1.SetItemText(i, 6, _T(""));
            m_funstate1.SetItemText(i, 7, _T(""));
            m_funstate1.SetItemText(i, 8, _T("No"));
            m_funstate1.SetItemText(i, 9, _T("No"));
            CString Fj = m_funstate.GetItemText(i, 4);
            CString Fk = m_funstate.GetItemText(i, 5);
            UpdateData(TRUE);
            // Latch the operand values for this unit
            B[i] = m_fReg[rmap[Fj]];
            C[i] = m_fReg[rmap[Fk]];
        }
    }
}

3.2.1.3. Execution completed

nkcpuScoreboardDlg.cpp

void CnkcpuScoreboardDlg::executionComplete()
{
    for (int i = INTEGER_UNIT; i <= FP_DIV_UNIT; i++)
    {
        CString pc = m_funstate.GetItemText(i, 10);
        // Only a unit whose instruction has already read its operands may execute
        if (m_instate.GetItemText(_ttoi(pc) - 6, 3) == _T("√"))
        {
            // Each unit function returns "√" once its operation has finished
            switch (i)
            {
            case INTEGER_UNIT:
                m_instate1.SetItemText(_ttoi(pc) - 6, 4, integer());
                break;
            case FP_MUL1_UNIT:
                m_instate1.SetItemText(_ttoi(pc) - 6, 4, floatMul1());
                break;
            case FP_MUL2_UNIT:
                m_instate1.SetItemText(_ttoi(pc) - 6, 4, floatMul2());
                break;
            case FP_ADD_UNIT:
                m_instate1.SetItemText(_ttoi(pc) - 6, 4, floatAdd());
                break;
            case FP_DIV_UNIT:
                m_instate1.SetItemText(_ttoi(pc) - 6, 4, floatDiv());
                break;
            }
        }
    }
}

Execution involves the five functional units below. Because they are independent of each other, they can all operate in parallel, so the execution stage consists of five units, which are described in detail in the next section.

3.2.1.4. Write result

nkcpuScoreboardDlg.cpp

void CnkcpuScoreboardDlg::writeResult()
{
    for (int i = INTEGER_UNIT; i <= FP_DIV_UNIT; i++)
    {
        CString pc = m_funstate.GetItemText(i, 10);
        if (m_instate.GetItemText(_ttoi(pc) - 6, 4) == _T("√"))
        {
            CString FUx; // must match the unit name written to the result table at issue
            switch (i)
            {
            case INTEGER_UNIT: FUx = _T("Integer"); break;
            case FP_MUL1_UNIT: FUx = _T("Mult1"); break;
            case FP_MUL2_UNIT: FUx = _T("Mult2"); break;
            case FP_ADD_UNIT: FUx = _T("Add"); break;
            case FP_DIV_UNIT: FUx = _T("Divide"); break;
            }
            CString FiFU = m_funstate.GetItemText(i, 3);
            // WAR check: no other unit may still be waiting to read the old value of Fi
            bool term = true;
            for (int j = INTEGER_UNIT; j <= FP_DIV_UNIT; j++)
            {
                if (i != j)
                {
                    CString Fjf = m_funstate.GetItemText(j, 4);
                    CString Fkf = m_funstate.GetItemText(j, 5);
                    CString Rjf = m_funstate.GetItemText(j, 8);
                    CString Rkf = m_funstate.GetItemText(j, 9);
                    term = term && (Fjf != FiFU || Rjf == _T("No")) && (Fkf != FiFU || Rkf == _T("No"));
                }
            }
            if (term)
            {
                m_instate1.SetItemText(_ttoi(pc) - 6, 5, _T("√"));
                // Wake up every unit that was waiting for this result
                for (int j = INTEGER_UNIT; j <= FP_DIV_UNIT; j++)
                {
                    if (i != j)
                    {
                        CString Qjf = m_funstate.GetItemText(j, 6);
                        CString Qkf = m_funstate.GetItemText(j, 7);
                        if (Qjf == FUx) m_funstate1.SetItemText(j, 8, _T("Yes"));
                        if (Qkf == FUx) m_funstate1.SetItemText(j, 9, _T("Yes"));
                    }
                }
                // Release the destination register and the functional unit
                m_result1.SetItemText(0, rmap[FiFU] + 1, _T(""));
                m_funstate1.SetItemText(i, 1, _T("No"));
                for (int col = 2; col <= 10; col++)
                    m_funstate1.SetItemText(i, col, _T(""));
                // Commit the value: loads write LMD, everything else the ALU output
                m_fReg[rmap[FiFU]] = ALUOutput[i];
                if (m_funstate.GetItemText(i, 2) == _T("load")) m_fReg[rmap[FiFU]] = LMD;
                UpdateData(FALSE);
            }
        }
    }
}

3.2.2. Five functional units

All five functional units may be used in the execution stage. I set their latencies according to the assumptions of the textbook example, so that the results are easier to observe: addition takes 2 clock cycles, multiplication takes 10 clock cycles, and division takes 40 clock cycles. The units themselves also follow the design in the textbook.

“Computer Architecture: A Quantitative Approach (Fifth Edition)” P506

Each unit keeps a pc variable that records the instruction it has already executed. Because each operation takes a fixed number of cycles, the simulation is simplified: the computation is performed in the last clock cycle of the operation, and the corresponding state is updated at that point.
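The fixed-latency behaviour can be sketched separately from the dialog code. This is a minimal illustration, assuming one counter per unit; `CountdownUnit` and `tick` are hypothetical names, not part of the project.

```cpp
// Hypothetical multi-cycle unit: tick() is called once per clock cycle
// while the unit is executing an operation.
struct CountdownUnit {
    int latency;    // cycles per operation, e.g. 2 (add), 10 (multiply), 40 (divide)
    int remaining;  // cycles left for the operation in flight

    explicit CountdownUnit(int cycles) : latency(cycles), remaining(cycles) {}

    // Returns true on the cycle in which the operation completes and
    // re-arms the counter for the next operation; otherwise counts down.
    bool tick() {
        if (remaining == 1) {
            remaining = latency;
            return true;
        }
        --remaining;
        return false;
    }
};
```

With a latency of 2, the first call to `tick()` returns false and the second returns true, which corresponds to doing the actual computation in the last clock cycle.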

3.2.2.1. Integer unit

nkcpuScoreboardDlg.cpp

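CString pc1; // instruction most recently executed by the integer unit
CString CnkcpuScoreboardDlg::integer()
{
    // Compute only once per instruction; the integer unit takes a single cycle
    if (pc1 != m_funstate.GetItemText(INTEGER_UNIT, 10))
    {
        ALUOutput[INTEGER_UNIT] = B[INTEGER_UNIT] + C[INTEGER_UNIT];
        pc1 = m_funstate.GetItemText(INTEGER_UNIT, 10);
    }
    return _T("√");
}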

3.2.2.2. Two floating-point multiply units

nkcpuScoreboardDlg.cpp

int floatMul1Remaining = 10; // cycles left for the multiplication in flight
CString pc2;                 // instruction most recently completed by multiplier 1
CString CnkcpuScoreboardDlg::floatMul1()
{
    if (pc2 == m_funstate.GetItemText(FP_MUL1_UNIT, 10))
    {
        return _T("√"); // already finished, waiting for write result
    }
    else
    {
        if (floatMul1Remaining == 1)
        {
            // Last cycle: compute the product and re-arm the counter
            ALUOutput[FP_MUL1_UNIT] = B[FP_MUL1_UNIT] * C[FP_MUL1_UNIT];
            floatMul1Remaining = 10;
            pc2 = m_funstate.GetItemText(FP_MUL1_UNIT, 10);
            return _T("√");
        }
        else
        {
            floatMul1Remaining--;
            return _T("");
        }
    }
}

int floatMul2Remaining = 10; // cycles left for the multiplication in flight
CString pc3;                 // instruction most recently completed by multiplier 2
CString CnkcpuScoreboardDlg::floatMul2()
{
    if (pc3 == m_funstate.GetItemText(FP_MUL2_UNIT, 10))
    {
        return _T("√"); // already finished, waiting for write result
    }
    else
    {
        if (floatMul2Remaining == 1)
        {
            // Last cycle: compute the product and re-arm the counter
            ALUOutput[FP_MUL2_UNIT] = B[FP_MUL2_UNIT] * C[FP_MUL2_UNIT];
            floatMul2Remaining = 10;
            pc3 = m_funstate.GetItemText(FP_MUL2_UNIT, 10);
            return _T("√");
        }
        else
        {
            floatMul2Remaining--;
            return _T("");
        }
    }
}

3.2.2.3. Floating point addition unit

nkcpuScoreboardDlg.cpp

int floatAddRemaining = 2; // cycles left for the addition in flight
CString pc4;               // instruction most recently completed by the adder
CString CnkcpuScoreboardDlg::floatAdd()
{
    if (pc4 == m_funstate.GetItemText(FP_ADD_UNIT, 10))
    {
        return _T("√"); // already finished, waiting for write result
    }
    else
    {
        if (floatAddRemaining == 1)
        {
            // Addition only; subtraction, which shares this unit, would need to check Op (column 2)
            ALUOutput[FP_ADD_UNIT] = B[FP_ADD_UNIT] + C[FP_ADD_UNIT];
            floatAddRemaining = 2;
            pc4 = m_funstate.GetItemText(FP_ADD_UNIT, 10);
            return _T("√");
        }
        else
        {
            floatAddRemaining--;
            return _T("");
        }
    }
}

3.2.2.4. Floating-point division unit

nkcpuScoreboardDlg.cpp

int floatDivRemaining = 40; // cycles left for the division in flight
CString pc5;                // instruction most recently completed by the divider
CString CnkcpuScoreboardDlg::floatDiv()
{
    if (pc5 == m_funstate.GetItemText(FP_DIV_UNIT, 10))
    {
        return _T("√"); // already finished, waiting for write result
    }
    else
    {
        // Count down the divider's own counter, not the adder's
        if (floatDivRemaining == 1)
        {
            ALUOutput[FP_DIV_UNIT] = B[FP_DIV_UNIT] / C[FP_DIV_UNIT];
            floatDivRemaining = 40;
            pc5 = m_funstate.GetItemText(FP_DIV_UNIT, 10);
            return _T("√");
        }
        else
        {
            floatDivRemaining--;
            return _T("");
        }
    }
}

3.2.3. Simulating the clock cycle

3.2.3.1. Stepping

nkcpuScoreboardDlg.cpp

// Use the button to control the clock: each press advances one clock cycle
void CnkcpuScoreboardDlg::OnBnClickedButton2()
{
    if (!endState)
    {
        // Every stage reads the visible tables and writes the hidden ones
        writeResult();
        executionComplete();
        readOperand();
        issue();
        // Publish the hidden tables and count the completed stage marks
        int num = 0;
        for (int i = 0; i < m_instate.GetItemCount(); i++)
        {
            for (int j = 0; j < m_instate.GetHeaderCtrl()->GetItemCount(); j++)
            {
                if (m_instate1.GetItemText(i, j) == _T("√")) num++;
                m_instate.SetItemText(i, j, m_instate1.GetItemText(i, j));
            }
        }
        for (int i = 0; i < m_funstate.GetItemCount(); i++)
        {
            for (int j = 0; j < m_funstate.GetHeaderCtrl()->GetItemCount(); j++)
            {
                m_funstate.SetItemText(i, j, m_funstate1.GetItemText(i, j));
            }
        }
        for (int i = 0; i < m_result.GetItemCount(); i++)
        {
            for (int j = 0; j < m_result.GetHeaderCtrl()->GetItemCount(); j++)
            {
                m_result.SetItemText(i, j, m_result1.GetItemText(i, j));
            }
        }
        // Finished when every instruction has a mark in all four stage columns
        // (the first two columns are not stage columns)
        if (num == m_instate.GetItemCount() * (m_instate.GetHeaderCtrl()->GetItemCount() - 2))
            endState = 1;
    }
    else {
        MessageBox(_T("The program is finished!"), _T("Prompt"), MB_ICONINFORMATION);
    }
}

Each press of the button advances all four stages by one clock cycle. After the update, the program checks the completion marks on the scoreboard to decide whether every instruction has finished, so that the simulation can stop in time.
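The completion test can be stated independently of the list controls. The sketch below assumes the instruction status is held as four boolean flags per instruction; `StageFlags` and `allInstructionsDone` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// One row per instruction; four flags in stage order:
// issue, read operands, execution complete, write result.
using StageFlags = std::array<bool, 4>;

// True when every instruction has completed all four stages,
// i.e. the number of set flags equals rows * 4.
bool allInstructionsDone(const std::vector<StageFlags>& table) {
    std::size_t done = 0;
    for (const StageFlags& row : table)
        for (bool flag : row)
            if (flag) ++done;
    return done == table.size() * 4;
}
```

Counting marks rather than inspecting only the last instruction matters here, because instructions can complete out of order on a scoreboard.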

3.2.3.2. Execution

nkcpuScoreboardDlg.cpp

void CnkcpuScoreboardDlg::OnBnClickedButton1()
{
    // Run to completion by repeating the single-step handler
    while (!endState)
    {
        OnBnClickedButton2();
    }
    MessageBox(_T("The program is finished!"), _T("Prompt"), MB_ICONINFORMATION);
}

Running the program simply repeats the single-step operation until the program ends.
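A run loop of this kind never returns if the simulated program cannot finish. The following sketch shows one possible guard, assuming the single step is exposed as a callback that reports completion; `runToCompletion` and the `maxCycles` bound are hypothetical additions, not part of the project.

```cpp
#include <functional>

// Repeats a single-step callback until it reports completion or until
// maxCycles steps have been taken. Returns the number of steps executed.
int runToCompletion(const std::function<bool()>& stepAndCheckDone, int maxCycles) {
    int cycles = 0;
    while (cycles < maxCycles) {
        ++cycles;
        if (stepAndCheckDone()) break;  // the step reported that the program ended
    }
    return cycles;
}
```

The returned cycle count can also be displayed as the total execution time of the simulated program.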

3.3. Use NK-CPU assembly language to write the test program of the scoreboard simulation program

The sorting program from the previous experiment involves only integers and no floating-point numbers, so I had to write a new test program. I use the example from the textbook directly, which makes the results easy to check.

The assembly program I use is:

l.d   $f6, 1($v0)
l.d   $f2, 0($v1)
mul.d $f0, $f2, $f4
sub.d $f8, $f6, $f2
div.d $f10, $f0, $f6
add.d $f6, $f8, $f2

3.4. Obtain the test results and compare them with the Experiment 3 simulation test results

The program from Experiment 3 uses only integer instructions, all of which contend for the single integer unit in this experiment. Its execution is therefore equivalent to multi-cycle sequential execution, and the scoreboard brings no benefit. For that program, the pipeline from Experiment 3 is more efficient.

4. Basis for experimental design

4.1. Stages

Instructions are decoded sequentially and go through the following four stages.

4.1.1. Issue:

The system checks which registers the instruction will read and write. This information is recorded because it is needed in the following stages. To avoid output dependencies (WAW – write after write), the instruction stalls until any earlier instruction that writes the same register has completed. The instruction also stalls while the required functional unit is busy.

4.1.2. Read operand:

After an instruction has been issued and properly allocated to the required hardware modules, the instruction waits for all operands to become available. This process resolves read dependencies (RAW – read after write), because a register to be written to by another instruction is not considered available until it is actually written.

4.1.3. Execution:

When all operands have been fetched, the functional unit starts executing. The scoreboard is notified when the results are ready.

4.1.4. Write result:

At this stage, the result is written to its destination register. However, the write is delayed until every earlier instruction that still needs to read the old value of that register has completed its read-operand stage. In this way anti-dependencies (WAR – write after read) are resolved.
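The three dependency types handled by these stages can be detected from register names alone. The sketch below is an illustration under simple assumptions: an instruction has one destination and up to two sources, and an empty string means an unused operand. `Instr`, `same` and `hazards` are hypothetical names.

```cpp
#include <string>
#include <vector>

// Simplified instruction: operation, destination and up to two source registers.
struct Instr {
    std::string op, dst, src1, src2;
};

// Two operand names refer to the same register only if both are non-empty and equal.
static bool same(const std::string& a, const std::string& b) {
    return !a.empty() && a == b;
}

// Dependencies of `later` on `earlier`, in program order.
std::vector<std::string> hazards(const Instr& earlier, const Instr& later) {
    std::vector<std::string> out;
    // RAW: later reads what earlier writes (resolved at read operands)
    if (same(later.src1, earlier.dst) || same(later.src2, earlier.dst)) out.push_back("RAW");
    // WAR: later writes what earlier reads (resolved at write result)
    if (same(later.dst, earlier.src1) || same(later.dst, earlier.src2)) out.push_back("WAR");
    // WAW: both write the same register (resolved at issue)
    if (same(later.dst, earlier.dst)) out.push_back("WAW");
    return out;
}
```

For example, `mul.d $f0, $f2, $f4` after `l.d $f2, 0($v1)` is reported as RAW on `$f2`, and `add.d $f6, $f8, $f2` after `div.d $f10, $f0, $f6` is reported as WAR on `$f6`.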

4.2. Data structure

To control the execution of instructions, the scoreboard maintains three state tables:

  • **Instruction status:** For each instruction being executed, indicates which of the four stages it is in.
  • **Functional unit status:** Indicates the state of each functional unit. Each functional unit maintains 9 fields in the table:
    • Busy: whether the unit is in use
    • Op: the operation to perform in the unit (e.g. MUL, DIV or MOD)
    • Fi: destination register
    • Fj, Fk: source register numbers
    • Qj, Qk: functional units that will produce the source registers Fj, Fk
    • Rj, Rk: flags indicating whether Fj, Fk are ready
  • **Register status:** Indicates, for each register, which functional unit will write its result.
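The three tables map naturally onto plain structs. The following is a sketch only, assuming registers and units are identified by name; the type and field names are hypothetical, and the project itself keeps this data in the dialog's tables rather than in structs.

```cpp
#include <map>
#include <string>
#include <vector>

// Instruction status: which of the four stages an instruction has completed.
struct InstructionStatus {
    std::string text;  // the instruction as written, e.g. "mul.d $f0, $f2, $f4"
    bool issued = false;
    bool operandsRead = false;
    bool executed = false;
    bool resultWritten = false;
};

// Functional unit status: the nine fields of one functional unit.
struct FunctionalUnitStatus {
    bool busy = false;
    std::string op;          // operation being performed
    std::string Fi, Fj, Fk;  // destination and source registers
    std::string Qj, Qk;      // units producing Fj and Fk ("" if none)
    bool Rj = false;         // Fj is ready and not yet read
    bool Rk = false;         // Fk is ready and not yet read
};

// The complete scoreboard: one table of each kind.
struct ScoreboardTables {
    std::vector<InstructionStatus> instructions;        // in program order
    std::map<std::string, FunctionalUnitStatus> units;  // keyed by unit name
    std::map<std::string, std::string> registerStatus;  // register -> producing unit
};
```

A register that is absent from `registerStatus` has no pending write, which plays the role of the empty cell in the register status table.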

4.3. Algorithms

The detailed algorithm of scoreboard control is as follows:

Function issue(op, dst, src1, src2)
    Wait until (!Busy[FU] AND !Result[dst]); // FU can be any functional unit that can execute operation op
    Busy[FU] ← Yes;
    Op[FU] ← op;
    Fi[FU] ← dst;
    Fj[FU] ← src1;
    Fk[FU] ← src2;
    Qj[FU] ← Result[src1];
    Qk[FU] ← Result[src2];
    Rj[FU] ← not Qj[FU];
    Rk[FU] ← not Qk[FU];
    Result[dst] ← FU;
Function read_operands(FU)
    Wait until (Rj[FU] AND Rk[FU]);
    Rj[FU] ← No;
    Rk[FU] ← No;
Function execute(FU)
    // Execute whatever FU must do
Function write_back(FU)
    Wait until (∀f {(Fj[f] ≠ Fi[FU] OR Rj[f] = No) AND (Fk[f] ≠ Fi[FU] OR Rk[f] = No)});
    for each f do
        if Qj[f] = FU then Rj[f] ← Yes;
        if Qk[f] = FU then Rk[f] ← Yes;
    Result[Fi[FU]] ← 0;
    Busy[FU] ← No;
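The pseudocode can be turned into a small, GUI-independent C++ sketch. It assumes units and registers are identified by name and splits each "wait until" into a separate `can...` predicate that a simulator would poll once per cycle. All names here (`FU`, `Scoreboard`, `producer`, `canIssue`, and so on) are illustrative and do not come from the project.

```cpp
#include <map>
#include <string>

// Functional unit status fields, named as in the pseudocode.
struct FU {
    bool busy = false;
    std::string op, Fi, Fj, Fk, Qj, Qk;
    bool Rj = false, Rk = false;
};

struct Scoreboard {
    std::map<std::string, FU> fu;               // functional unit status, by unit name
    std::map<std::string, std::string> result;  // register -> unit that will write it

    // Name of the unit that will produce reg, or "" if no write is pending.
    std::string producer(const std::string& reg) const {
        auto it = result.find(reg);
        return it == result.end() ? std::string() : it->second;
    }

    // Issue condition: unit free (structural) and no pending write to dst (WAW).
    bool canIssue(const std::string& u, const std::string& dst) const {
        auto it = fu.find(u);
        bool busy = (it != fu.end()) && it->second.busy;
        return !busy && producer(dst).empty();
    }

    void issue(const std::string& u, const std::string& op, const std::string& dst,
               const std::string& src1, const std::string& src2) {
        FU& f = fu[u];
        f.busy = true;  f.op = op;  f.Fi = dst;  f.Fj = src1;  f.Fk = src2;
        f.Qj = producer(src1);
        f.Qk = producer(src2);
        f.Rj = f.Qj.empty();  // ready if nobody is going to write it
        f.Rk = f.Qk.empty();
        result[dst] = u;
    }

    // Read-operand condition: both sources ready (RAW resolved).
    bool canReadOperands(const std::string& u) const {
        const FU& f = fu.at(u);
        return f.Rj && f.Rk;
    }

    void readOperands(const std::string& u) {
        FU& f = fu.at(u);
        f.Rj = false;  f.Rk = false;
        f.Qj.clear();  f.Qk.clear();  // also clear Qj/Qk, as the report's read-operand code does
    }

    // Write-back condition: no unit still has an unread, ready operand equal to Fi (WAR).
    bool canWriteBack(const std::string& u) const {
        const std::string& fi = fu.at(u).Fi;
        for (const auto& kv : fu) {
            const FU& f = kv.second;
            if ((f.Fj == fi && f.Rj) || (f.Fk == fi && f.Rk)) return false;
        }
        return true;
    }

    void writeBack(const std::string& u) {
        for (auto& kv : fu) {  // wake up units waiting for this result
            if (kv.second.Qj == u) kv.second.Rj = true;
            if (kv.second.Qk == u) kv.second.Rk = true;
        }
        result.erase(fu.at(u).Fi);  // Result[Fi[FU]] <- 0
        fu.at(u) = FU();            // Busy[FU] <- No, other fields cleared
    }
};
```

In this sketch, issuing a load to `F2` on the integer unit followed by a multiply that reads `F2` leaves the multiplier unable to read its operands until the load's `writeBack` has run, which is the RAW behaviour described above.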

4.4. Remarks

The scoreboard approach must stall the issue stage when no functional unit is available. In that case, later instructions that could otherwise execute must wait until the structural hazard is resolved. Other techniques, such as Tomasulo’s algorithm, can avoid this structural hazard and also resolve WAR and WAW dependencies through register renaming.

5. Experimental results and analysis

5.1. Program interface

5.2. Stepping

5.3. Execution

The experimental results are basically in line with expectations.

6. Experimental experience

After several rounds of refinement, the scoreboard in this experiment formally adds floating-point computation. I did not add a floating-point module of my own but used the example from the textbook, so that the book's results can be reproduced. I think cultivating experimental ability is important. There was a lot of homework at the end of the term, so I did not finish this until today; I hope the teacher understands.

Resources

Size: 112MB
Resource download: https://download.csdn.net/download/s1t16/87425303