August 2023 – ioprog

Registers

The GD32VF103 has 32 CPU core registers (x0 to x31), each of which is 32 bits wide. There is also a 32 bit program counter (pc), also known as the instruction pointer. Apart from x0, which is read-only and always returns a value of zero, all the registers are interchangeable. This means that any register can be a stack pointer, a link register, an argument to a function and so on. While this freedom may seem great, it could lead to chaos if you want pre-compiled program modules or libraries to work with one another. There must be some agreement between the authors of such modules as to which registers carry return results, carry parameters, behave as a stack pointer and so on. The RISC-V Application Binary Interface (ABI) defines this and also renames the registers so that their use is more apparent. Assemblers and compilers are aware of these names too. The register names used in the RISC-V ABI are:

x0 is renamed to zero. This reminds me of the constant generator in the TI MSP430, which could output 6 different constant values that were commonly used in code. Using the zero register is faster than loading the value 0 from memory and is commonly used in program loops etc.

a0 to a7 : These are used to pass arguments to functions.

a0 and a1 are also used to return values from functions.

x2 is nominated as the Stack Pointer (sp).

x1 is used as a link register: it holds the return address when a function is called, and a leaf function can return through it without saving it on the stack. It is called “ra” (return address). This is similar to the link register in ARM Cortex-M processors.

t0 to t6 are “temporary” registers. Functions need not preserve values in these registers.

s0 to s11 are “saved” or “variable” registers. Functions must preserve values in these registers. They typically are used to hold a variable for quick access in a function (e.g. a loop counter).

x3 is renamed as gp (global pointer) and can be used to point at the middle of the global memory space.

x4 is renamed as tp (thread pointer). It is used in multi-threaded applications and points at a block of memory containing static data used by the current thread.

The mapping of these ABI register names to the underlying “x” register names may seem a little arbitrary. Presumably it is influenced by various efficiency constraints and the need to accommodate a version of the architecture which has only 16 registers (the “E” or embedded architecture). From a programmer's perspective it makes no difference which underlying “x” register is used for each role, so don’t worry too much about it!

In summary, the registers typically used by an application program are as follows:

t0 to t6	temporary or scratch registers
a0 to a7	function arguments and return values
s0 to s11	registers where you can keep variables inside a block of code. Register s0 is used as a frame pointer inside a function call.
sp	stack pointer
ra	return address for leaf functions
gp	global pointer
tp	thread pointer
zero	a register that always returns a value of zero.
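
To make the register roles a little more concrete, here is a minimal C sketch. The function names sum3 and call_sum3 are made up for illustration, and the register assignments in the comments assume the standard RISC-V integer calling convention on a 32 bit target. A compiler is free to inline or rearrange things, so read the comments as the expected mapping rather than a guarantee.

```c
#include <stdint.h>

/* Hypothetical leaf function. Under the standard calling convention the
   three arguments are expected to arrive in a0, a1 and a2, and the result
   is handed back to the caller in a0. It calls nothing, so it can return
   through ra without touching the stack. */
int32_t sum3(int32_t first, int32_t second, int32_t third)
{
    return first + second + third;
}

/* Hypothetical non-leaf function. The call to sum3 overwrites ra, so
   (unless the compiler inlines or tail-calls) ra is typically saved on the
   stack via sp before the call and restored afterwards. */
int32_t call_sum3(int32_t base)
{
    return sum3(base, base + 1, base + 2);
}
```

Compiling something like this with a RISC-V toolchain and looking at the generated assembler is a good way to see the ABI names in action.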

How do I put a number in a register?

The GD32VF103 uses an RV32IMAC core. This means it does Integer calculations only, has a hardware Multiply (and divide) unit, is capable of certain Atomic (non-interruptible) instructions (useful for multitasking and interrupts), and can execute Compressed (16 bit) instructions as well as 32 bit ones.

From a programmer's point of view, it might be nice if we could write instructions like this:

1) Put this 32 bit number into this register.

2) Add 1 to this register.

3) Store this register at this 32 bit memory address.

4) Set this register to zero.

From a CPU design perspective these instructions are less than ideal. Instruction 1 must be more than 32 bits wide as it has to encode the instruction, the target register and the 32 bit value.

Instruction 2 could be easily encoded in 16 bits.

Instruction 3 is, once again, wider than 32 bits.

Instruction 4 could be encoded in 16 (or fewer) bits.

These variable length instructions cause problems for instruction pipelines and complicate the instruction fetch mechanism. It would be nicer if instructions were a fixed width, e.g. 32 bits. If you have lots of memory then this is fine. In embedded situations, where memory is in short supply, this is quite wasteful. If all instructions occupy 32 bits then simpler instructions will include lots of unused bits. RISC-V and ARM designers have compromised on instruction size by processing a mix of 16 and 32 bit instructions. This allows more instructions to be packed into less memory and only slightly complicates the instruction fetch and pipeline hardware. In the case of RISC-V the 16 bit instructions are referred to as Compressed instructions (the “C” in RV32IMAC).
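
One reason the fetch hardware is only slightly complicated is that the length of a RISC-V instruction can be read from its two lowest bits. The C sketch below shows the idea. The function name instruction_length is my own, and it assumes that only 16 and 32 bit encodings are present, as on an RV32IMAC core.

```c
#include <stdint.h>

/* Returns the length in bytes of an instruction, given the first
   (lowest addressed) 16 bit halfword of its encoding.
   Assumption: only 16 and 32 bit encodings are in use. */
unsigned instruction_length(uint16_t first_halfword)
{
    /* 32 bit instructions have both low bits set (binary 11);
       any other pattern marks a compressed 16 bit instruction. */
    return ((first_halfword & 0x3u) == 0x3u) ? 4u : 2u;
}
```

For example, lui t0,0x12345 encodes as 0x123452b7, whose low halfword 0x52b7 ends in binary 11, so it is a 32 bit instruction. The compressed no-operation instruction c.nop encodes as 0x0001 and is 16 bits long.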

Ok, we have 32 bit and 16 bit instructions. How do we do instruction 1 above:

Put this 32 bit value into this register

You could do it in two halves and load the upper 16 bits followed by the lower 16 bits using two 32 bit instructions.

Or, you could execute a command of the following form:

Load the 32 bit value in memory that is N bytes away from here.

In the case of RISC-V, you can do the following:

Load the following 20 bits into the upper bits of this register (clearing the lower 12 bits)

Add the following 12 bit number.

The programmer can write these two commands:

lui t0,0x12345 /* load upper 20 bits */

addi t0,t0,0x678 /* add lower 12 bits */

This is further complicated by the fact that the addi instruction takes a signed 12 bit value. If the lower 12 bits of your constant have bit 11 (the 12th bit) set, addi treats them as a negative number, so you have to figure out two’s complement values, add what looks like a negative number, and add one to the upper 20 bits to compensate. Recognizing that this is likely to lead to all sorts of human errors, a handy pseudo instruction is available: load immediate or li. This is translated by the assembler into the correct pair of lui and addi instructions. So, our load now goes like this:

li t0,0x12345678
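
To see what the assembler has to work out when it expands li, here is a small C sketch of the arithmetic. The function names lui_part, addi_part and lui_then_addi are my own inventions for illustration. This models the calculation only, not how any particular assembler is written.

```c
#include <stdint.h>

/* Upper 20 bits to give to lui. Adding 0x800 first bumps the upper part
   by one whenever bit 11 is set, which compensates for the negative
   value that addi will then add. */
uint32_t lui_part(uint32_t value)
{
    return ((value + 0x800u) >> 12) & 0xFFFFFu;
}

/* Lower 12 bits to give to addi, sign extended to the range -2048..2047. */
int32_t addi_part(uint32_t value)
{
    int32_t low = (int32_t)(value & 0xFFFu);
    return (low >= 0x800) ? low - 0x1000 : low;
}

/* What ends up in the register after lui followed by addi. */
uint32_t lui_then_addi(uint32_t upper20, int32_t imm12)
{
    return (upper20 << 12) + (uint32_t)imm12;
}
```

For 0x12345678 the two parts are 0x12345 and 0x678, matching the lui and addi pair shown earlier. For 0x12345FFF the parts become 0x12346 and -1: the upper part is one bigger than you might expect and the addi subtracts 1 to land on the right value.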

The Load Store architecture.

All arithmetic and logical operations in the RV32IMAC are carried out via the CPU registers. It is not possible to add values in memory directly to one another: you need to get them into registers first (load), do the calculation and then optionally write (store) the result back to memory. Suppose you want to do the following calculation:

c = a + b;

Typically the process works like this:

Make a pointer to a.

Load the value at a into a register.

Make a pointer to b.

Load the value at b into a (different) register.

Add the two registers together.

Make a pointer to c.

Write the result to c.

The code shown below implements this (it is not particularly optimal).

	lui t2,%hi(a)		/* load 20 high bits of address of a into t2 */
	addi t2,t2,%lo(a)   /* add lower 12 bits of address of a to t2 */
	lw t0,0(t2) 		/* load the value pointed to by (0+t2) into t0 */
	
	lui t2,%hi(b)		/* load 20 high bits of address of b into t2 */
	addi t2,t2,%lo(b)	/* add lower 12 bits of address of b to t2 */
	lw t1,0(t2)			/* load the value pointed to by (0+t2) into t1 */
	
	add t0,t0,t1		/* add the values at a and b */
		
	lui t2,%hi(c)		/* load 20 high bits of address of c into t2 */
	addi t2,t2,%lo(c)	/* add lower 12 bits of address of c to t2 */
	sw t0,0(t2)			/* store the value in t0 to address pointed to by (0+t2) */

exit_spin: 
	j exit_spin
/* constants below are in flash */	
a:	.word 0x12345678
b:	.word 0x23456789
/* variables are placed in ram */
	.data
c:	.word 0

/* init.s
   Initialization routine which sets the stack pointer, sets initial global
   values and clears those that are not specifically initialized.
   Assumes that the linker script aligned data sections along a word
   (4 byte) boundary. */
	.global Reset_Handler
	.extern INIT_DATA_VALUES
	.extern INIT_DATA_START
	.extern INIT_DATA_END
	.extern BSS_START
	.extern BSS_END
	.extern main
	.section start
Reset_Handler:
	lui sp,0x20005		# set stack pointer to top of RAM
	# Fill global and static variables with initial values
	la t0,INIT_DATA_VALUES
	la t1,INIT_DATA_START
	la t2,INIT_DATA_END
init_data_store_loop:
	beq t1,t2,done_init_data
	lw a0,0(t0)
	sw a0,0(t1)
	addi t0,t0,4
	addi t1,t1,4
	j init_data_store_loop
done_init_data:
	# Fill uninitialized global and static variables with zero
	la t0,BSS_START
	la t1,BSS_END
zero_data_store_loop:
	beq t0,t1,done_zero_data
	sw x0,0(t0)		# write zero to the word that t0 points at
	addi t0,t0,4
	j zero_data_store_loop
done_zero_data:
	# call main C code
	jal main
main_exit_spin:
	/* should not get here. */
	j main_exit_spin
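
If the assembler loops in init.s are hard to follow, the same two jobs can be sketched in C. The function names copy_init_data and zero_bss are hypothetical. The pointer parameters stand in for the linker symbols INIT_DATA_VALUES, INIT_DATA_START, INIT_DATA_END, BSS_START and BSS_END, and, like init.s, the sketch assumes each region is a whole number of 32 bit words.

```c
#include <stdint.h>

/* Copy the initial values of global variables from flash to RAM, one word
   at a time, until the destination pointer reaches the end marker. */
void copy_init_data(const uint32_t *values, uint32_t *start, const uint32_t *end)
{
    while (start != end) {
        *start++ = *values++;
    }
}

/* Write zero to every word of the uninitialized data region. */
void zero_bss(uint32_t *start, const uint32_t *end)
{
    while (start != end) {
        *start++ = 0;
    }
}
```

Note that the pointer being compared with the end marker is also the one being written through and incremented. That is what makes each loop terminate after touching every word exactly once.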

/* linker_script.ld */
/* useful reference: www.linuxselfhelp.com/gnu/ld/html_chapter/ld_toc.html */
/* sdata and sbss : the 's' prefix stands for "small"; small data items placed here can be reached with short gp-relative addressing */
MEMORY
{
	flash : org = 0x00000000, len = 64k
	ram : org = 0x20000000, len = 20k
}
SECTIONS
{
	. = ORIGIN(flash);
	.text : {
		*(start);
		*(.vectors); /* The interrupt vectors */
		*(.text);
		*(.rodata);
		*(.comment);
		. = ALIGN(4);
	} >flash
	. = ORIGIN(ram);
	.data : {
		INIT_DATA_VALUES = LOADADDR(.data);
		INIT_DATA_START = .;
		*(.data);
		*(.sdata);
		INIT_DATA_END = .;
		. = ALIGN(4);
	} >ram AT>flash
	BSS_START = .;
	.bss : {
		*(.bss);
		*(.sbss);
		. = ALIGN(4);
	} > ram
	BSS_END = .;
}
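
As a quick sanity check on the numbers, the C sketch below works out where the top of RAM is from the MEMORY block and compares it with what lui sp,0x20005 leaves in the stack pointer. The macro and function names are mine, chosen for illustration only.

```c
#include <stdint.h>

#define RAM_ORIGIN 0x20000000u       /* ram : org in the linker script */
#define RAM_LENGTH (20u * 1024u)     /* ram : len = 20k */

/* First address past the end of RAM. */
uint32_t ram_top(void)
{
    return RAM_ORIGIN + RAM_LENGTH;
}

/* Value left in a register by lui: the 20 bit immediate goes into the
   upper bits and the lower 12 bits are cleared. */
uint32_t lui_value(uint32_t upper20)
{
    return upper20 << 12;
}
```

20k is 0x5000 bytes, so the top of RAM is 0x20005000, which is exactly what lui produces from the immediate 0x20005.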
