Hi, welcome back to our Hardcaml MIPS project! In the last post, we created a separate module for the instruction fetch stage in our CPU design. Today, we'll discuss how we can work with memory in Hardcaml, and add instruction memory to that fetch stage. If you'd like to see the end result of this post, it's tagged as v0.3.0 on GitHub.
Memory in Our Design
Our pipelined CPU design needs 3 "blocks" of memory:
- The instruction memory is a read-only block of storage containing the program (a list of MIPS instructions) that our CPU is executing.
- The data memory is a large read-write block of storage that we use for data structures (arrays, lists, etc), the execution stack, and other general-purpose storage.
- The register file (no relation to the filesystem) is a set of 32 32-bit registers that's used for storing temporary/intermediate values, local variables, addresses of larger structures in the data memory, etc.
These 3 blocks have different requirements, and as a result, will have different implementations. The register file needs to be extremely fast, so it's usually implemented as a grid of flip-flops directly on the CPU. On the other hand, instruction and data memory need to be very large, so RAM sticks are often a better fit.
For simplicity, the MIPS CPU we made in class also implemented instruction and data memory as grids of flip-flops. This allows us to instantly read/write memory without the latency of communicating with external hardware. We'll keep this oversimplification for now, and add support for RAM-based memory later if time allows.
Hardcaml Multiport Memory
Hardcaml has a primitive called multiport_memory to represent memory structures. It actually converts to Verilog as a grid of registers, which works great for the simplified design I explained above. Hardcaml also has tools for interacting with RAM and other FPGA memory, but that's a matter for another post.
Conceptually, you can think about multiport_memory as an array: there are n contiguous cells, each of which has w bits. We'll specify n when we declare the memory unit, and w when we configure write ports.
A write port configures how we write to our memory. It's implemented as an OCaml record, and includes:
- A clock signal, since register writes happen once every clock cycle.
- An enable signal, since not every instruction we process needs to write to memory.
- A signal containing the address (from 0 to n-1) of the memory cell to which we're writing.
- A signal containing the data we're writing.
Of course, we also need to read from our memory. In addition to write ports, we'll also provide a list of read addresses, which are just signals containing addresses of the cells we want to read from. The output of the multiport_memory function will be an array of the read data for each read address.
Note that every write port has to have the same width, and there must be at least one write port and one read address. This means that the width of our write data determines w, which is also the width of the output of every read.
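To make the write-port and read-address behaviour described above more concrete, here is a minimal software model in plain OCaml. It is only a sketch and does not use Hardcaml: cell values are modelled as ordinary integers rather than w-bit signals, there is no clock record field, and the names create_memory, clock_edge and read are made up for this illustration.

```ocaml
(* A plain-OCaml model of the behaviour described above; not Hardcaml code.
   All names here are hypothetical and chosen for illustration only. *)
type write_port =
  { write_enable : bool
  ; write_address : int
  ; write_data : int
  }

(* n cells, all initially zero. *)
let create_memory n = Array.make n 0

(* One clock edge: every enabled write port stores its data. *)
let clock_edge memory write_ports =
  Array.iter
    (fun port ->
      if port.write_enable then memory.(port.write_address) <- port.write_data)
    write_ports

(* Reads return one value per read address, in the same order. *)
let read memory read_addresses =
  Array.map (fun address -> memory.(address)) read_addresses

let () =
  let memory = create_memory 8 in
  clock_edge memory
    [| { write_enable = true; write_address = 3; write_data = 42 }
     ; { write_enable = false; write_address = 5; write_data = 7 }
    |];
  let outputs = read memory [| 3; 5 |] in
  (* Prints "42 0": the second port was disabled, so cell 5 is unchanged. *)
  Printf.printf "%d %d\n" outputs.(0) outputs.(1)
```

The shape mirrors the description: writes only take effect on a clock edge and only when enabled, while the read result is an array with one entry per read address.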
Let's take a look at how we could implement our data memory in Hardcaml using multiport_memory:
open Hardcaml
open Hardcaml.Signal
let data_memory write_clock write_enable write_address write_data read_address =
  let write_port = { write_clock; write_address; write_enable; write_data } in
  let number_of_cells = 512 in
  let mem =
    multiport_memory number_of_cells ~write_ports:[| write_port |]
      ~read_addresses:[| read_address |]
  in
  (* One read address was given, so the output array has one element. *)
  Array.get mem 0
We're defining this as a function, which we'll use when we make the circuit for our CPU's memory stage. It takes all the signals we need for a write port and an address to read from. It'll output the contents of data memory at the read address. Since the output of multiport_memory is an array of signals, we use Array.get to get the signal corresponding to our one read address.
Our write_data wire will be 32 bits (not explicitly stated yet) and we're declaring 512 (n) cells of memory, so this block will hold 512 × 32 = 16,384 bits in total: 16 kilobits, or 2 KiB.
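As a quick check of the size arithmetic, here is a small plain-OCaml calculation. The constant names are made up for this sketch; the values 512 and 32 come from the data memory definition above.

```ocaml
(* Size of the data memory: number of cells times bits per cell. *)
let cells = 512
let bits_per_cell = 32
let total_bits = cells * bits_per_cell
let total_bytes = total_bits / 8

let () =
  (* Prints "16384 bits = 2048 bytes". *)
  Printf.printf "%d bits = %d bytes\n" total_bits total_bytes
```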
All in all, the result looks like this:
This isn't in the GitHub repo yet because we're still working on the fetch stage, but it will be eventually!
Instruction Memory Implementation
Now, let's switch from data memory to instruction memory. In an actual CPU, instructions would be fetched from storage hardware: this would be BIOS when your computer is booting up, or RAM / disk memory while it's running. However, since we want to keep things simple, we'll hardcode a program into our instruction memory as a constant value.
That being said, it's important to recognize that despite the hardcoding, a particular program is not a part of the CPU’s design/implementation. Luckily, Hardcaml can help us here: if we make the program that gets executed an argument in our main.ml, we can treat it as a variable in OCaml, but a hardcoded value in the generated Verilog. Another advantage of making it an argument is that we can now easily test our design with various programs.
And that's exactly what we'll do! I won't go through all the changes here, but here's our new main.ml where we bind Program.sample (a sample program for testing) into our design:
open Hardcaml
open Mips
module MipsCircuit = Circuit.With_interface (Datapath.I) (Datapath.O)
let scope = Scope.create ()
let circuit =
  let bound_create = Datapath.create ~program:Program.sample scope in
  MipsCircuit.create_exn ~name:"datapath" bound_create
let () = Rtl.print ~database:(Scope.circuit_database scope) Verilog circuit
program is passed down through Datapath to Instruction_fetch, where it'll be used by our instruction memory implementation.
Finally, we need to implement instruction memory. Unfortunately, multiport_memory won't work here, as it doesn't support initial values (since most real projects would interface with actual RAM). We're going to need a workaround.
What we really want is a variable-length, read-only block of data that we can set to an initial value. Luckily, we can commandeer Hardcaml's mux primitive to do exactly this. mux takes 2 arguments: a selector signal (sel) and a list of signals (lst). It returns the element of lst at index sel. So if we use our program for lst and the current address for sel, we get exactly what we want.
let instruction_memory program address =
  mux address (Program.to_signals program)
let create ~program (_scope : Scope.t) (i : _ I.t) =
  (* pc is a byte address; shifting right by 2 gives the instruction index. *)
  let address = srl i.pc 2 in
  let instruction = instruction_memory program address in
  { O.next_pc = i.pc +:. 4; O.instruction }
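To see what the fetch logic above computes, here is a rough plain-OCaml model of it. This is a sketch under assumptions, not the Hardcaml implementation: the program is modelled as a list of integers, List.nth stands in for mux, and the three instruction words are arbitrary placeholders. Unlike a hardware mux, List.nth raises an exception for an out-of-range index, so the model only covers in-range fetches.

```ocaml
(* Plain-OCaml model of the fetch stage; the function names mirror the
   Hardcaml code above, but everything here works on ordinary ints. *)

(* Instructions are 4 bytes wide, so byte address / 4 = instruction index. *)
let word_index pc = pc lsr 2

(* List.nth plays the role of mux: pick the element at the given index. *)
let instruction_memory program address = List.nth program address

(* Returns the next program counter and the fetched instruction word. *)
let fetch program pc =
  let instruction = instruction_memory program (word_index pc) in
  (pc + 4, instruction)

let () =
  (* Placeholder instruction words, for illustration only. *)
  let program = [ 0x20080005; 0x20090007; 0x01095020 ] in
  let next_pc, instruction = fetch program 4 in
  (* Prints "next_pc=8 instruction=0x20090007". *)
  Printf.printf "next_pc=%d instruction=0x%08x\n" next_pc instruction
```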
And that's our instruction fetch module done, as planned!
Hardcaml Observations
- I like that memory is represented as a function outputting read data. As with circuits, functions are a great way to reason about parts of a system.
- It would be very useful if we could specify initial values for multiport_memory and RAM primitives, even if that only worked for testing.
- Encapsulating programs behind the Program.t abstract type feels really clean: I think I might even like this more than the OOP private/public declarations I'm used to.
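For readers new to OCaml, here is a minimal sketch of what hiding a type behind a module signature looks like. It is not the project's actual Program module: of_words, to_words and length are hypothetical names, and the representation (a list of ints) is an assumption made for this example.

```ocaml
(* A hypothetical module with an abstract type. Code outside the module
   can only build and inspect values through the functions in the signature. *)
module Program : sig
  type t

  val of_words : int list -> t
  val to_words : t -> int list
  val length : t -> int
end = struct
  (* The representation is visible only inside this struct. *)
  type t = int list

  let of_words words = words
  let to_words program = program
  let length = List.length
end

let () =
  let program = Program.of_words [ 1; 2; 3 ] in
  (* Prints "3 instructions". *)
  Printf.printf "%d instructions\n" (Program.length program)
```

Because the signature does not reveal that t is a list, the representation could later change without touching any code that uses the module.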