Hands on RISC-V (RV32IMAC) assembler : Part 2

Registers

The GD32VF103 has 32 CPU core registers (x0 to x31), each of which is 32 bits wide. There is also a 32 bit program counter (pc), also known as the instruction pointer. Apart from x0, which is read-only and always returns a value of zero, all the registers are interchangeable. This means that any register can be a stack pointer, a link register, an argument to a function and so on. While this freedom may seem great, it could lead to chaos if you want pre-compiled program modules or libraries to work with one another. There must be some agreement between the authors of such modules as to which registers carry return results and parameters, which one behaves as a stack pointer and so on. The RISC-V Application Binary Interface (ABI) defines this and also renames the registers so that their use is more apparent. Assemblers and compilers are aware of these names too. The register names used in the RISC-V ABI are:

x0 is renamed to zero. This reminds me of the constant generator in the TI MSP430, which could output 6 different constant values that were commonly used in code. Using the zero register is faster than loading the value 0 from memory, and it is commonly used in program loops etc.

a0 to a7 : These are used to pass arguments to functions.

a0 and a1 are also used to return values from functions.

x2 is nominated as the Stack Pointer (sp).

x1 is used as a link register: a call instruction stores the return address in it. It is called “ra” (return address). A leaf function can return straight from ra; a function that calls other functions must save ra first (typically on the stack). This is similar to the link register in ARM Cortex-M processors.

t0 to t6 are “temporary” registers. Functions need not preserve values in these registers.

s0 to s11 are “saved” or “variable” registers. Functions must preserve values in these registers. They typically are used to hold a variable for quick access in a function (e.g. a loop counter).

x3 is renamed as gp (global pointer) and can be used to point at the middle of the global memory space.

x4 is renamed as tp (thread pointer). It is used in multi-threaded applications and points at a block of memory containing static data used by the current thread.

The mapping of these ABI register names to the underlying “x” register names may seem a little arbitrary. Presumably it is influenced by various efficiency constraints and the need to accommodate a version of the architecture which has only 16 registers (the “E” or embedded architecture). From a programmer’s perspective it makes no difference which underlying “x” register is used for each role, so don’t worry too much about it!

In summary, the registers typically used by an application program are as follows:

t0 to t6 : temporary or scratch registers
a0 to a7 : function arguments and return values
s0 to s11 : registers where you can keep variables inside a block of code. Register s0 is used as a frame pointer inside a function call.
sp : stack pointer
ra : return address for leaf functions
gp : global pointer
tp : thread pointer
zero : a register that always returns a value of zero.
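
For reference, the full mapping from x register numbers to ABI names can be written out as a small lookup table. This is only a sketch: the table follows the standard RISC-V calling convention (and matches the order in which GDB lists the registers), but the helper names abi_name and abi_number are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* ABI names indexed by x register number (x0..x31). */
static const char *const abi_names[32] = {
    "zero", "ra", "sp",  "gp",  "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0",  "a1",  "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2",  "s3",  "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6"
};

/* Hypothetical helper: ABI name for register xN (N must be 0..31). */
const char *abi_name(unsigned n)
{
    assert(n < 32);
    return abi_names[n];
}

/* Hypothetical helper: x register number for an ABI name, or -1 if the
   name is unknown. "fp" is accepted as an alias for s0 (x8). */
int abi_number(const char *name)
{
    if (strcmp(name, "fp") == 0)
        return 8;
    for (int n = 0; n < 32; n++)
        if (strcmp(name, abi_names[n]) == 0)
            return n;
    return -1;
}
```

Note how the temporary and saved registers are not contiguous: t0 to t2 sit at x5 to x7 while t3 to t6 are at x28 to x31, and s0 and s1 (x8 and x9) are separated from s2 to s11 (x18 to x27).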

How do I put a number in a register?

The GD32VF103 uses an RV32IMAC core. This means that it does Integer calculations only, has hardware Multiply (and divide), is capable of certain Atomic (non-interruptible) instructions (useful for multitasking and interrupts) and can execute Compressed (16 bit) instructions as well as 32 bit ones.

From a programmer’s point of view, it might be nice if we could write instructions like this:

1) Put this 32 bit number into this register.

2) Add 1 to this register.

3) Store this register at this 32 bit memory address.

4) Set this register to zero.

From a CPU design perspective these instructions are less than ideal. Instruction 1 must be more than 32 bits wide as it has to encode the instruction, the target register and the 32 bit value.

Instruction 2 could be easily encoded in 16 bits.

Instruction 3 is, once again, wider than 32 bits.

Instruction 4 could be encoded in 16 (or fewer) bits.

These variable length instructions cause problems for instruction pipelines and complicate the instruction fetch mechanism. It would be nicer if instructions were a fixed width, e.g. 32 bits. If you have lots of memory then this is fine. In embedded situations, where memory is in short supply, this is quite wasteful: if all instructions occupy 32 bits then simpler instructions will include lots of unused bits. RISC-V and ARM designers have compromised on instruction size by processing a mix of 16 and 32 bit instructions. This allows more instructions to be packed into less memory and only slightly complicates the instruction fetch and pipeline hardware. In the case of RISC-V the 16 bit instructions are referred to as Compressed instructions (the “C” in RV32IMAC).
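
To give a feel for how little extra work the fetch hardware has to do, here is a rough sketch in C of how the two sizes can be told apart. In the RISC-V encoding the two lowest bits of an instruction are 11 for a 32 bit instruction and anything else for a compressed one. The function names are my own, and the sketch assumes that only 16 and 32 bit instructions are present (as is the case for RV32IMAC).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length in bytes (2 or 4) of the instruction whose first 16 bit parcel
   is `parcel`. Assumes only 16 and 32 bit encodings are in use. */
unsigned instr_length(uint16_t parcel)
{
    return ((parcel & 0x3u) == 0x3u) ? 4u : 2u;
}

/* Count the instructions in a buffer of `n` 16 bit parcels.
   Assumes the buffer starts on an instruction boundary and does not
   end in the middle of a 32 bit instruction. */
size_t count_instructions(const uint16_t *parcels, size_t n)
{
    assert(parcels != NULL || n == 0);
    size_t count = 0;
    size_t i = 0;
    while (i < n) {
        i += instr_length(parcels[i]) / 2u;  /* advance 1 or 2 parcels */
        count++;
    }
    return count;
}
```

For example, the parcel 0x0001 (a compressed nop) has its low bits set to 01 and so is 2 bytes long, while 0x0013 (the low half of the 32 bit nop, addi x0,x0,0) ends in 11 and so starts a 4 byte instruction.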

Ok, we have 32 bit and 16 bit instructions. How do we do instruction 1 above:

Put this 32 bit value into this register

You could do it in two halves and load the upper 16 bits followed by the lower 16 bits using two 32 bit instructions.

Or, you could execute a command of the following form:

Load the 32 bit value in memory that is N bytes away from here.

In the case of RISC-V, you can do the following:

Load the following 20 bits into the upper bits of this register (clearing the lower 12 bits)

Add the following 12 bit number.

The programmer can write these two commands:

lui t0,0x12345 /* load upper 20 bits */

addi t0,t0,0x678 /* add lower 12 bits */

This is further complicated by the fact that the addi instruction takes a signed 12 bit value. If the lower 12 bits of your number have their top bit (bit 11) set, addi treats them as a negative number, so you have to work out the two’s complement value and also add 1 to the upper 20 bits to compensate. Recognizing that this is likely to lead to all sorts of human errors, a handy pseudo instruction is available: load immediate or li. The assembler translates this into the correct lui and addi instructions. So, our load now goes like this:

li t0,0x12345678
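
The arithmetic that li saves you from can be sketched in C. This is not the assembler's actual code, just an illustration of the split under the rules described above; the struct and function names are invented for the example. The same adjustment is what the %hi and %lo operators perform on addresses.

```c
#include <assert.h>
#include <stdint.h>

/* The two immediates that "li rd,value" can be expanded to. */
struct li_parts {
    uint32_t upper20;  /* immediate for lui  (0 .. 0xFFFFF)    */
    int32_t  lower12;  /* immediate for addi (-2048 .. 2047)   */
};

/* Hypothetical helper: split a 32 bit constant into lui/addi immediates. */
struct li_parts split_li(uint32_t value)
{
    struct li_parts p;

    /* Low 12 bits, sign-extended the way addi will interpret them. */
    p.lower12 = (int32_t)(value & 0xFFFu);
    if (p.lower12 >= 0x800)
        p.lower12 -= 0x1000;

    /* Remove the (possibly negative) low part before taking the upper
       20 bits. When lower12 is negative this bumps the upper part by 1. */
    p.upper20 = ((value - (uint32_t)p.lower12) >> 12) & 0xFFFFFu;

    /* Sanity check: lui followed by addi rebuilds the value (mod 2^32). */
    assert(((p.upper20 << 12) + (uint32_t)p.lower12) == value);
    return p;
}
```

With 0x12345678 this gives 0x12345 and 0x678, matching the lui/addi pair shown earlier. With 0x12345FFF the lower part becomes -1, so the upper part has to be 0x12346 rather than 0x12345.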

The Load Store architecture.

All arithmetic and logical operations in the RV32IMAC are carried out via the CPU registers. It is not possible to add values in memory directly to one another: you need to get them into registers first (load), do the calculation and then optionally write (store) the result back to memory. Suppose you want to do the following calculation:

c = a + b;

Typically the process works like this:

Make a pointer to a.

Load the value at a into a register.

Make a pointer to b.

Load the value at b into a (different) register.

Add the two registers together.

Make a pointer to c.

Write the result to c.

The code shown below implements this (it is not particularly optimal).

	lui t2,%hi(a)		/* load 20 high bits of address of a into t2 */
	addi t2,t2,%lo(a)   /* add lower 12 bits of address of a to t2 */
	lw t0,0(t2) 		/* load the value pointed to by (0+t2) into t0 */
	
	lui t2,%hi(b)		/* load 20 high bits of address of b into t2 */
	addi t2,t2,%lo(b)	/* add lower 12 bits of address of b to t2 */
	lw t1,0(t2)			/* load the value pointed to by (0+t2) into t1 */
	
	add t0,t0,t1		/* add the values at a and b */
		
	lui t2,%hi(c)		/* load 20 high bits of address of c into t2 */
	addi t2,t2,%lo(c)	/* add lower 12 bits of address of c to t2 */
	sw t0,0(t2)			/* store the value in t0 to address pointed to by (0+t2) */

exit_spin: 
	j exit_spin
/* constants below are in flash */	
a:	.word 0x12345678
b:	.word 0x23456789
/* variables are placed in ram */
	.data
c:	.word 0
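
For comparison, the same sequence can be written in C with one statement per step. This is only a sketch to show the correspondence; the function name and the local variable names (which borrow the register names) are my own, and a compiler is free to generate quite different instructions.

```c
#include <assert.h>
#include <stdint.h>

static const uint32_t a = 0x12345678;  /* constant (would be in flash) */
static const uint32_t b = 0x23456789;  /* constant (would be in flash) */
static uint32_t c;                     /* variable (would be in RAM)   */

/* Hypothetical helper: performs c = a + b the load/store way. */
uint32_t add_a_b(void)
{
    const uint32_t *p_a = &a;  /* make a pointer to a    (lui + addi) */
    uint32_t t0 = *p_a;        /* load the value at a    (lw)         */

    const uint32_t *p_b = &b;  /* make a pointer to b    (lui + addi) */
    uint32_t t1 = *p_b;        /* load the value at b    (lw)         */

    t0 = t0 + t1;              /* add the two registers  (add)        */

    uint32_t *p_c = &c;        /* make a pointer to c    (lui + addi) */
    *p_c = t0;                 /* write the result to c  (sw)         */

    assert(c == a + b);        /* sanity check against plain C        */
    return c;
}
```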

Hands on RISC-V (RV32IMAC) assembler : Part 1

Setting up the development environment

     

I was looking around for a board to tinker with RV32 assembly language as a way of getting to know the architecture a bit better. I tried using a WCH-Link debugger module and a CH32VF103 board but so far I have had no success using OpenOCD with it. I have opted instead to use a Longan Nano GD32VF103 in conjunction with a J-Link Edu debugger. This worked well enough for me to get going although the debug interface appears to be very sensitive to noise.

Using the J-Link tools from Segger, a GDB link to the target can be started as follows:
JLinkGDBServer -device GD32VF103C8T6 -if JTAG

First code.

My goal here is to get started with RISC-V assembler with the minimum amount of fuss. When the Longan Nano GD32VF103 boots it begins executing code at address 0. Typically this code would initialize global and static variables, set the stack pointer and then call main. For this particular architecture it also needs to set up the interrupt controller. I will do this at a later time. For now I will work without interrupts.

/* init.s
 Initialization routine which sets the stack pointer, 
 sets initial global values and clears those that are not
 specifically initialized.  Assumes that the linker script aligned 
 data sections along a word (4 byte) boundary.
*/
	.global Reset_Handler
	.extern INIT_DATA_VALUES
	.extern INIT_DATA_START
	.extern INIT_DATA_END
	.extern BSS_START
	.extern BSS_END
	.extern main
	.section start
Reset_Handler:
	lui sp,0x20005 # set stack pointer to top of RAM
# Fill global and static variables with initial values
	la	t0,INIT_DATA_VALUES
	la  t1,INIT_DATA_START
	la  t2,INIT_DATA_END
init_data_store_loop:
	beq t1,t2,done_init_data
	lw  a0,0(t0)
	sw  a0,0(t1)
	addi t0,t0,4
	addi t1,t1,4
	j init_data_store_loop
done_init_data:
# Fill uninitialized global and static variables with zero
	la	t0,BSS_START
	la  t1,BSS_END
zero_data_store_loop:
	beq t0,t1,done_zero_data
	sw  x0,0(t0)	# store zero at the current address (t0), not at BSS_END (t1)
	addi t0,t0,4
	j zero_data_store_loop
done_zero_data:
# call main C code
	jal main
main_exit_spin: /* should not get here. */
	j main_exit_spin

This code needs to be placed at address 0 (aliased from 0x08000000). The linker script helps do this by associating the section name “start” with the first entry in the flash ROM.

/* linker_script.ld */
/* useful reference: www.linuxselfhelp.com/gnu/ld/html_chapter/ld_toc.html */
/* sdata and sbss : the 's' prefix stands for small data, which the compiler can address relative to gp */
MEMORY
{
    flash : org = 0x00000000, len = 64k
    ram : org = 0x20000000, len = 20k
}
  
SECTIONS
{
        
	. = ORIGIN(flash);
        .text : {
          *(start);
		  *(.vectors); /* The interrupt vectors */
		  *(.text);
		  *(.rodata);
		  *(.comment);		  
		
		  . = ALIGN(4);
        } >flash
	. = ORIGIN(ram);
        .data : {
	  INIT_DATA_VALUES = LOADADDR(.data);
	  INIT_DATA_START = .;
	    *(.data);
	    *(.sdata);
	  INIT_DATA_END = .;
	  . = ALIGN(4);
        } >ram AT>flash
	BSS_START = .;
	.bss : {	  
	    *(.bss);
	    *(.sbss);
	    . = ALIGN(4);
	} > ram
	BSS_END = .;
}
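
To make the job of the two loops in init.s a little clearer, here is a rough C equivalent. It is only a sketch: the function names are invented, and on the target the arguments would be the addresses of the linker script symbols (INIT_DATA_VALUES, INIT_DATA_START, INIT_DATA_END, BSS_START and BSS_END) rather than ordinary arrays.

```c
#include <assert.h>
#include <stdint.h>

/* Copy initial values, one word at a time, from `src` (flash on the
   target) into the region [dst, dst_end) (RAM on the target). */
void copy_init_data(uint32_t *dst, const uint32_t *dst_end,
                    const uint32_t *src)
{
    assert(dst <= dst_end);
    while (dst != dst_end)
        *dst++ = *src++;
}

/* Clear every word in the region [start, end). */
void zero_bss(uint32_t *start, const uint32_t *end)
{
    assert(start <= end);
    while (start != end)
        *start++ = 0;
}
```

Both loops step in whole words and stop only when the pointers are exactly equal, which is why the sections must be aligned to (and sized in multiples of) 4 bytes.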

/* main.c */
int x=0x12345678;
int y=0xabcd1234;
int z;
int main()
{
	
	y += 5;
	z = 4;
	while(1)
	{
		x+=y;
	}
}

The following command compiles the code:

riscv64-unknown-elf-gcc -march=rv32imac -mabi=ilp32 main.c init.s -nostdlib -T linker_script.ld -g3 -O0

The -march parameter is set to rv32imac which matches the GD32VF103. The -mabi argument selects the ilp32 ABI, which has the following integer and pointer sizes:

long long : 64 bits, long : 32 bits, int : 32 bits, short : 16 bits, pointers : 32 bits

(ref : https://www.sifive.com/blog/all-aboard-part-1-compiler-args)

There are two files in this project : main.c (a simple C program) and init.s.

The -nostdlib argument tells the linker not to pull in the standard startup files and libraries: this is a completely bare-metal program that requires no additional components.

The linker script file name is specified with the -T argument.

The -g3 argument turns debugging information up to the maximum, which helps debugging.

The -O0 argument turns off all optimizations so that the code is left “as is”.

Debug session

Execute the following command to start the debug session (assuming you have started the JLinkGDBServer in another window).

gdb-multiarch a.out  

This starts the following GDB session.
GNU gdb (Ubuntu 13.1-2ubuntu2) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...
(gdb) target ext :2331
Remote debugging using :2331
main () at main.c:12
12 x+=y;
(gdb) monitor reset
Resetting target
(gdb) load
Loading section .text, size 0x9c lma 0x0
Loading section .data, size 0x8 lma 0x9c
Start address 0x00000000, load size 164
Transfer rate: 160 KB/sec, 82 bytes/write.
(gdb) stepi
Reset_Handler () at init.s:17
17 la t0,INIT_DATA_VALUES
(gdb) i r
ra 0x0 0x0 <Reset_Handler>
sp 0x20005000 0x20005000
gp 0x0 0x0 <Reset_Handler>
tp 0x0 0x0 <Reset_Handler>
t0 0x0 0
t1 0x0 0
t2 0x0 0
fp 0x0 0x0 <Reset_Handler>
s1 0x0 0
a0 0x0 0
a1 0x0 0
a2 0x0 0
a3 0x0 0
a4 0x0 0
a5 0x0 0
a6 0x0 0
a7 0x0 0
s2 0x0 0
s3 0x0 0
s4 0x0 0
s5 0x0 0
s6 0x0 0
s7 0x0 0
s8 0x0 0
s9 0x0 0
s10 0x0 0
s11 0x0 0
t3 0x0 0
t4 0x0 0
t5 0x0 0
t6 0x0 0
pc 0x4 0x4 <Reset_Handler+4>

Commands that are entered are the ones that follow the (gdb) prompt in the above listing. The first of these is

target ext :2331

This connects to the JLinkGDBServer over TCP port 2331 on the local machine.

monitor reset

This resets (and halts in this case) the GD32VF103.

load

Loads the program specified in the command line (a.out) into flash memory.

stepi

Executes the single assembler instruction pointed to by the program counter (pc).

i r

Shorthand for info registers. This displays the contents of the CPU registers.

Now that all of this seems to be working, further adventures in RISC-V assembler will follow.