Hi, welcome back to our Hardcaml MIPS project! Although we have been quiet recently, a lot has happened behind the scenes. We finished the core functionality of our pipeline, implemented the vast majority of MIPS instructions, and added some functionality. Today, we will discuss forwarding/stalls, jumping, and branching as the final features to finish out our CPU. If you would like to see the end result of this post, it's tagged as 1.0.0 on GitHub.
Data Hazards, Forwarding, and Stalls
Sometimes, an instruction depends on the output of a previous instruction. This can be problematic in the context of a pipelined design: register values are fetched during the Decode stage, but written in the Writeback stage. This means that sometimes, the output of one instruction won’t be written until after the Decode stage of a dependent instruction. As a result, the dependent instruction will run using outdated input values. This scenario is known as a data hazard.
There are 2 potential solutions to this issue:
Stall the dependent instruction until current instruction has been completely processed OR
Find some way to send the output of the current instruction directly to the dependent instruction
When possible, we’d like to avoid stalling since that wastes time, so (2) is preferable. The technique we’ll use is called forwarding.
For example, say we have registers t0
, t1
, and t2
with initial values 0
, 5
, and 6
respectively. We want to run the following instructions:
add t0, t1, t2
add t2, t1, t0
In theory, the outputs should be t0 = 11
and t2 = 16
. However, without forwarding, the output would be t0 = 11
and t2 = 5
. This is because t0
had not been written back when it was needed in the second instruction. With forwarding, we can use the ALU output from the Execute stage and send it directly to the Decode stage. Then, our second instruction will operate on the proper inputs.
We only potentially need to forward from the two previous instructions, because anything before then will be processed completely in time. Since we fetch input values in the Decode stage, we only need to consider forwarding from the Execute and Memory stages.
We cannot forward from a “load word” instruction that’s still in the Execute stage. This is because the output of “load word” only becomes available when we read from data memory in the Memory stage. In this case, we must stall for one clock cycle. This allows the load word to move to the Memory stage, from which we can then forward.
Priority Muxes
We implemented our forwarding logic using a priority mux. In this section, we’ll talk a bit about what a priority mux is, and how to make one in Hardcaml.
A regular multiplexor (mux) uses a selector input which selects from a list of elements. A priority mux is effectively a chain of muxes, with a separate selector at each stage. This allows us to replicate if-else statements with many if-else
branches.
let circuit_impl (_scope : Scope.t) (input : _ I.t) =
let opt = input.options in
let ctrl = input.controls in
let forward_data =
priority_select_with_default ~default:opt.reg_value
[
{
With_valid.valid =
ctrl.e_reg_write_enable
&: (input.source ==: input.e_dest)
&: ~:(ctrl.e_sel_mem_for_reg_data);
value = opt.e_alu_output;
};
{
With_valid.valid =
ctrl.m_reg_write_enable
&: (input.source ==: input.m_dest)
&: ctrl.m_sel_mem_for_reg_data;
value = opt.m_data_output;
};
{
With_valid.valid =
ctrl.m_reg_write_enable &: (input.source ==: input.m_dest);
value = opt.m_alu_output;
};
]
in
{ O.forward_data }
Jumping and Branching
Often, we want to move to a different place in our program. For example, we may call/return from a function, repeat a loop, or conditionally execute some logic via if
statements. The “source of truth” for where we are in our program is the pc (program counter) register. Consequently, we can jump by changing the value stored in the pc register.
The pc register isn’t in the regfile, so we need to design a special way to update it. We use a priority mux to choose whether we want to branch/jump, or just advance to the next instruction. The decision is made in the control unit.
Sometimes you want to jump conditionally, like with a loop or an if-else statement. Branching allows you to jump based on a condition. There are two such instructions:
This is an example of how to simulate a for loop in MIPS.
Hardcaml Observations
Ideally, when we write register values during the Writeback stage of our pipeline, we want to write during the first half of a cycle. Then, values could be read during the second half, and we could have both a write and a read in the same cycle. Unfortunately, the Hardcaml simulation library doesn't support half-cycle timing. This is a problem, because there will be a 1-cycle-delay between when data is written, and when it can be read. With some help from the author of the Hardcaml library, we came with a workaround. Essentially, we implemented a mini-forwarding system so that if we are reading and writing the same ports in a clock cycle, we output the written value instead of reading from memory.