We’re back!
Last fall, we completed our project of implementing a simple MIPS CPU in Hardcaml to explore hardware design in OCaml. A few weeks later, we got an invite to meet with the team that maintains Hardcaml and talk about our experience. They even sent us actual Arty A7 FPGAs so we could test out our simulation on real hardware!
So… we’re back! In these next 2 posts, we’ll cover FPGAs in a bit more depth, how we adapted our project to actually run on a real FPGA, and how we implemented support for interacting with external devices. We actually implemented most of this back in January, but haven’t had time to wrap it up due to a very busy semester.
Thank you to Andrew Ray, Fu Yong Quah, Ben Devlin, and the rest of the Jane Street FPGA team for creating Hardcaml, meeting with us, and answering our numerous questions throughout this process. Thank you also to Yaron Minsky and Jane Street for sending us the FPGAs to try out our code. So without further ado:
The Arty A7 FPGA
The FPGA model we are using is Digilent’s Arty A7. For a brief refresher on FPGAs and why they matter, see our previous post. The board's layout is illustrated in the diagram below and described in the table of contents.
The components that matter most to us are:
- DONE LED (1): lights up once the FPGA has been successfully configured (that is, once a bitstream has finished loading).
- UART port (2): enables communication through MicroUSB, over UART. We’ll use this in the next post.
- Ethernet connector (3): supports incoming/outgoing Ethernet connections.
- MAC address sticker (4): is a unique ID for the Ethernet connector.
- Power jack (5): allows for an optional external power supply.
- Power good LED (6): shows whether the power supply (whether through USB or the dedicated jack) is acceptable.
- LEDs (7) and RGB LEDs (8): the design running on our FPGA can turn these on/off, allowing them to act as a basic “output” for our design. We’ll use these in this post to demonstrate that our CPU can run programs correctly.
- User Push Buttons (9) and Slide Switches (10): send their current state (pressed/unpressed, on/off) to the FPGA as input. This can be used by some designs to provide user input.
- Artix FPGA (19): is the actual programmable FPGA; everything else on this board is just peripherals and pins to help interface with / provide power to this FPGA.
- Micron DDR3 memory (20): a block of DDR3 memory that can be used by the FPGA.
Running Hardcaml on an FPGA
The work we’ve done so far (posts 1-12) has allowed us to code and test our MIPS CPU design in OCaml/Hardcaml, and compile that to equivalent Verilog. However, a few questions remain unanswered:
- How do we load this design onto a physical FPGA?
- How do we map the inputs/outputs of our design to the FPGA’s actual hardware?
One option is copying our generated Verilog into the Vivado GUI, and running synthesis, implementation, and bitstream generation there. That’s not ideal, as it would add a manual step to our automated workflow. Instead, we’ll use the Hardcaml Arty library, which was created by Fu Yong Quah, a member of the Hardcaml team, and provides:
- A high-level API for interacting with the Arty A7’s hardware.
- Scripts and configuration files that programmatically generate a bitstream, which can be loaded onto our FPGA. These still use Xilinx’s Vivado under the surface, but are done automatically.
Note that the hardcaml_arty library isn’t currently installable as-is; for now, you’ll need to pin it with the following command:
opam pin hardcaml_arty https://github.com/askvortsov1/hardcaml_arty.git#as/make-installable
We decided to place all Hardcaml Arty-related files into the arty subfolder of our project. This consists of config files/scripts, which we copied from Hardcaml Arty as per its instructions, and code that wraps our design in the interface required by Hardcaml Arty, with relevant tests. You can see all the code changes we made in this commit.
Let’s discuss the aforementioned wrapper code a bit. As mentioned in the Hardcaml Arty documentation, the top-level Hardcaml module must accept Hardcaml_arty.User_application.I.t as input, and output Hardcaml_arty.User_application.O.t. These include incoming clock signals, outgoing LED control bits, and receive/transmit data structures for UART and Ethernet. With that in mind, here’s our basic wrapper:
open Hardcaml
open Hardcaml_arty
open Signal

let rgb_off =
  {
    User_application.Led_rgb.r = of_string "0";
    g = of_string "0";
    b = of_string "0";
  }

let create _scope (input : _ User_application.I.t) =
  {
    User_application.O.led_4bits = of_string "4'b1111";
    uart_tx = input.uart_rx;
    led_rgb = [ rgb_off; rgb_off; rgb_off; rgb_off ];
    ethernet = User_application.Ethernet.O.unused (module Signal);
  }

let () =
  Hardcaml_arty.Rtl_generator.generate ~instantiate_ethernet_mac:false create
    (To_channel Stdio.stdout)
To keep things simple, we won’t use our datapath yet; for now, we’ll just light up the 4 non-RGB LEDs to test that our Hardcaml code can control the FPGA. We’ll use placeholder/“empty” values for everything else. These are all relatively straightforward except for uart_tx, where we just echo the incoming uart_rx as a placeholder. We can then pass this wrapper function to Hardcaml_arty.Rtl_generator, which will transform our high-level inputs/outputs into a low-level format that maps onto the FPGA’s pins.
We can now run the following commands to generate the bitstream and program our FPGA. Note that we ran these from the arty subdirectory of our project:
source path/to/xilinx/installation/Vivado/2018.2/settings64.sh
make outputs/hardcaml_arty_top.bit
djtgcfg prog --file outputs/hardcaml_arty_top.bit -d Arty -i 0
For another sample Hardcaml Arty project, check out their blinker example.
Linking In Our Design
Now that we have been able to run simple Hardcaml code on our FPGA, we want to link in our datapath. This itself is pretty easy: we can just include it in the Arty wrapper as a hierarchical subcircuit. The challenge is testing whether the CPU actually works on real hardware. We’ll do this by running a simple program that computes 10 + 4, and validating the answer (14). But this raises the question: how do we display this value on our board?
Practically speaking, we have 3 “output” options: the LEDs (simplest but most limited), UART (relatively simple but flexible), and Ethernet (complex, but powerful). To keep things simple for now, we’ll stick to the LEDs. We’ll also treat the RGB LEDs as binary to give us 8 bits of output. We could have used the full range of r/g/b combinations, but this would be more complicated to visually decipher without major benefits. 8 bits is enough to display a trivial program’s output; in our case, 14. In a future post, we’ll restructure our design to use UART for input and output.
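To make the "8 bits of output" idea concrete, here is a minimal software sketch in plain OCaml (no Hardcaml involved). The `rgb` record, `rgb_of_bit`, `bit`, and `leds_of_byte` are hypothetical names used only for illustration; the split between the two LED banks mirrors the wrapper shown below, where the high nibble drives the plain LEDs and bits 0 to 3 drive the RGB LEDs.

```ocaml
(* Software model only: plain OCaml values stand in for hardware signals. *)
type rgb = { r : bool; g : bool; b : bool }

(* Drive all three channels together, so each RGB LED acts as a single bit. *)
let rgb_of_bit on = { r = on; g = on; b = on }

(* Bit [i] of [value], with bit 0 as the least significant bit. *)
let bit value i = (value lsr i) land 1 = 1

(* Split a byte into the two LED banks: bits 7..4 go to the plain LEDs and
   bits 0..3 go to the RGB LEDs (assumed to match the wrapper's ordering). *)
let leds_of_byte value =
  let plain = List.map (bit value) [ 7; 6; 5; 4 ] in
  let rgbs = List.map (fun i -> rgb_of_bit (bit value i)) [ 0; 1; 2; 3 ] in
  (plain, rgbs)

let () =
  let plain, rgbs = leds_of_byte 14 in
  List.iter (fun on -> print_string (if on then "1" else "0")) plain;
  print_string " ";
  List.iter (fun led -> print_string (if led.r then "1" else "0")) rgbs;
  print_newline ()
(* prints "0000 0111": no plain LEDs, and RGB LEDs 1, 2, and 3 lit *)
```

Treating each RGB LED as one bit gives 4 + 4 = 8 bits in total. Driving the r, g, and b channels independently would give 4 + 12 = 16 bits, at the cost of having to read colors off the board.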
The code for our new wrapper, which was added in this commit, is as follows:
open Hardcaml
open Hardcaml_arty
open Signal

let rgb on = { User_application.Led_rgb.r = on; g = on; b = on }

(* Hardcoding this is not great, but there won't be a need to once we implement mmio. *)
let last_pc = "32'd8"

let display_val pc data =
  mux2
    (pc >=: of_string last_pc)
    (of_bool true @: data.:[(6, 0)])
    (of_string "8'h00")

let store_on_finished clk pc data =
  let spec = Reg_spec.create ~clock:clk () in
  let on_output_instr = pc ==: of_string last_pc in
  let data_reg =
    reg_fb spec ~enable:vdd ~w:8 (fun v ->
        mux2 on_output_instr data.:[(7, 0)] v)
  in
  mux2 on_output_instr data.:[(7, 0)] data_reg

let circuit_impl program _scope (input : _ User_application.I.t) =
  let datapath =
    Mips.Datapath.hierarchical program _scope { clock = input.clk_166 }
  in
  let finished_data =
    store_on_finished input.clk_166 datapath.writeback_pc
      datapath.writeback_data
  in
  let display = display_val datapath.writeback_pc finished_data in
  {
    User_application.O.led_4bits = display.:[(7, 4)];
    uart_tx = input.uart_rx;
    led_rgb =
      [ rgb display.:(0); rgb display.:(1); rgb display.:(2); rgb display.:(3) ];
    ethernet = User_application.Ethernet.O.unused (module Signal);
  }
The datapath currently outputs the writeback data and pc, which can be thought of as the result and address of the instruction currently being written back. We take that output and send it to the store_on_finished subcircuit, which uses a register to hold the final result of the program, so it stays visible even after the program has finished. We then send that to display_val, which produces a “formatted” 8-bit output:
- If the program has finished, the first (most significant) bit is 1, and the remainder is the low 7 bits of the program output.
- Otherwise, all 0s.
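The latch-then-format behaviour described above can be modeled in plain OCaml, with ints standing in for Hardcaml signals. This is a sketch under stated assumptions, not the circuit itself: the cycle trace in `example_trace` is made up for illustration (the real datapath's per-cycle pc and data values may differ), the register is assumed to start at 0, and `to_binary8` and `run` are hypothetical helpers.

```ocaml
(* Software model only: ints stand in for Hardcaml signals. *)
let last_pc = 8

(* One clock cycle of the store_on_finished register: while pc = last_pc the
   low 8 bits of [data] pass through and are captured; otherwise the stored
   value is kept. Returns (value visible this cycle, register for next cycle). *)
let store_on_finished reg ~pc ~data =
  if pc = last_pc then (data land 0xFF, data land 0xFF) else (reg, reg)

(* display_val: a "done" flag in bit 7 plus the low 7 bits once pc >= last_pc,
   and all zeros before that. *)
let display_val ~pc data =
  if pc >= last_pc then 0x80 lor (data land 0x7F) else 0

(* Render a byte as 8 binary digits, most significant bit first. *)
let to_binary8 v =
  String.init 8 (fun i -> if (v lsr (7 - i)) land 1 = 1 then '1' else '0')

(* Feed a list of (pc, data) cycles through both stages. *)
let run trace =
  let _, shown =
    List.fold_left
      (fun (reg, acc) (pc, data) ->
        let stored, reg = store_on_finished reg ~pc ~data in
        (reg, display_val ~pc stored :: acc))
      (0, []) trace
  in
  List.rev shown

(* Hypothetical trace: two loads, the add at pc = 8, then idle cycles. *)
let example_trace = [ (0, 10); (4, 4); (8, 14); (12, 0); (16, 0) ]

let () = List.iter (fun v -> print_endline (to_binary8 v)) (run example_trace)
(* prints:
   00000000
   00000000
   10001110
   10001110
   10001110 *)
```

The last two cycles show why the register matters: the datapath's writeback data has moved on to 0, but the displayed byte still carries the done flag and the captured 14.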
This signal can then be used to light up our board’s LEDs. The expected output is 14, which is 1110 in binary. So only the first “done” LED and the first 3 “value” LEDs in the second row should light up. And indeed: