A measurement of the performance of Rust vs C.

This post follows on from my previous one which was an opening foray into the world of Rust. Following a comment I received I decided I would look at the relative performance of Rust and C for the trivial example of blinky running as fast as the CPU can do it (using the default reset clock speed). This involved modifying the C and Rust code.
The modified main.c
The modified code for main.c is shown below. The main differences between this and the version from my previous post are:
#defines used to define the addresses of the I/O registers. This means that the handling of pointer indirection happens at compile time – not run time. Also, it means that RAM is not used to represent the pointers at run-time.
The BSR and BRR (bit-set and bit reset registers) are used for setting and clearing bit 13. This produces smaller and faster code as there is no software bitwise AND/OR involved at run-time.
Two build changes:
optimization is turned up to level 2 by passing the -O2 argument to arm-none-eabi-gcc
debugging information is removed (building for release)

/* User LED for the Blue pill is on PC13 */
#include  // This header includes data type definitions such as uint32_t etc.
#define rcc_apb2enr (  (volatile uint32_t *) 0x40021018  )
#define gpioc_crh  (  (volatile uint32_t *) 0x40011004  )
#define gpioc_odr (  (volatile uint32_t *) 0x4001100c  )
#define gpioc_bsr (  (volatile uint32_t *) 0x40011010  )
#define gpioc_brr (  (volatile uint32_t *) 0x40011014  )


// The main function body follows 
int main() {
    // Do I/O configuration
    // Turn on GPIO C
    *rcc_apb2enr |= (1 << 4); // set bit 4
    // Configure PC13 as an output
    *gpioc_crh |= (1 << 20); // set bit 20
    *gpioc_crh &= ~((1 << 23) | (1 << 22) | (1 << 21)); // clear bits 21,22,23
    while(1) // do this forever
    {
        *gpioc_bsr = (1 << 13); // set bit 13       
         *gpioc_brr = (1 << 13); // clear bit 13     
    }
}
// The reset interrupt handler
void reset_handler() {
    main(); // call on main 
    while(1); // if main exits then loop here until next reset. 
}

// Build the interrupt vector table
const void * Vectors[] __attribute__((section(".vector_table"))) ={
	reset_handler
};

This code is built and gdb reports it's load size as 76 bytes when loaded onto the STM32F103. Executing the code we see the port pin behave as follows:

rust_blink_speed
I’m not clear on why there is a variation of cycle length every other cycle – maybe its as a result of the sampling on my cheap logic analyzer but the important point here is that the waveform is exactly the same for both C and Rust versions of this program so there is no performance hit in tight loops such as this.

The Rust version
The Rust version of the program is shown below. This is different to the version in my previous post as it uses the stm32f1 crate. The example is taken from a Jonathan Klimt’s blog over here. There’s quite a lot going on under the surface in this crate but crucially this all seems to happen at compile time because the output code is just as efficient as the C code from a speed perspective although it is a good deal bigger : 542 bytes vs C’s 76. Why the difference? There are several reasons but one big chunk is accounted for by the fact that the Rust version has a fully populated interrupt vector table which takes up 304 bytes. The C-version only includes the initial stack pointer and the Reset vector : a total of 8 bytes.

// std and main are not available for bare metal software
#![no_std]
#![no_main]

extern crate stm32f1;
extern crate panic_halt;
extern crate cortex_m_rt;

use cortex_m_rt::entry;
use stm32f1::stm32f103;

// use `main` as the entry point of this application
#[entry]
fn main() -> ! {
    // get handles to the hardware
    let peripherals = stm32f103::Peripherals::take().unwrap();
    let gpioc = &peripherals.GPIOC;
    let rcc = &peripherals.RCC;   

    // enable the GPIO clock for IO port C
    rcc.apb2enr.write(|w| w.iopcen().set_bit());
    gpioc.crh.write(|w| unsafe{
        w.mode13().bits(0b11);
        w.cnf13().bits(0b00)
    });

    loop{
        gpioc.bsrr.write(|w| w.bs13().set_bit());        
        gpioc.brr.write(|w| w.br13().set_bit());                
    }
}

The blinky loops in assembler
The version produced by the C code is shown below. r1 and r2 have been previously set up to point at the BSR and BRR registers; while r3 contains the mask value (1 << 13).

   0x08000034 :    str     r3, [r1, #0]
   0x08000036 :    str     r3, [r2, #0]
   0x08000038 :    b.n     0x8000034 

The version produced by the Rust code is below. In this case, r0 is set to point at the data structure that manages Port C and the offsets used in the store instructions are used to select the BSR and BRR registers. Register r1 has been preset to the mask value (1 << 13).

   0x0800017a :    str     r1, [r0, #12]
   0x0800017c :    str     r1, [r0, #16]
   0x0800017e :    b.n     0x800017a 

Final thoughts
Rust is considered to be a safer language to use than C. This benefit must come at some cost however. I’ve seen here that the cost is not in execution speed so where is it? Well, the C version is completely built in 168ms while the Rust version takes over 2 minutes – note this is a “once-off” cost in a way; incremental builds are nearly as quick as the C version. A full rebuild only happens if you build after a “cargo clean” command or if you modify the Cargo.toml file (which changes the build options).
Does this encourage me to pursue Rust a little more – definitely yes and with a more complex project that might reveal the greater safety of Rust.

Rust and C side-by-side on the STM32F103C8T6 (“Blue pill”) board

rustybluepill2
For most people, the first program they write for an MCU is “blinky”. This post explores blinky in C and Rust for the STM32F103C8T6 Blue pill board. This board has a built-in LED on port C bit 13. Programs are downloaded via an STLink-V2 SWD debug interface clone.

What does the program do?

When you power on an STM32F103 it looks to the lower addresses in FLASH memory to figure out what to do. The first entry (32 bit) should contain the value for the initial stack pointer. This is typically set to the top of RAM. The next entry is the address of the code that handles reset : the reset_handler.
Typically, the job of the reset handler is to initialize global and static variables and maybe perform some clock initialization. When this is done it then calls on the “main” function although this is not essential : you can write all your code in the reset handler. I prefer to use the main function.
In order to make the LED blink the main function must configure the appropriate GPIO pin as an output (Port C, bit 13). Having done this, the program enters an endless loop which does the following:
Set port C bit 13 high (turning on the LED)
Wait for a while (so the user can see the LED change)
Set port C bit 13 low (turning off the LED)
Wait for a while (so the user can see the LED change)
And that’s it. Now lets look at the C and Rust ways of doing this. Note: these are very much minimal programs that seek to bridge the gap between the higher level languages and the hardware. They don’t necessarily represent a recommended programming style for complex systems.

Blinky in C : memory layout
memory_layout

The figure above shows how the linker script file (linker_script.ld) lays out the memory image output by the linker. Flash memory starts at address 0x08000000. The initial stack pointer value is set by the line:

          LONG(ORIGIN(RAM) + LENGTH(RAM));

which evaluates to 0x20005000.
The interrupt vector table is then placed immediately after this – the reset vector being the first and only entry in this case. The text section contains code, rodata contains constants and so on. The ARM.exidx section contains data that can be used during debugging to perform a stack backtrace. The last section is not really relevant to this example but is included for completeness. This is used to help initialize global and static data.

Blinky in C : code

/* User LED for the Blue pill is on PC13 */
#include  // This header includes data type definitions such as uint32_t etc.

// Simple software delay.  The larger dly is the longer it takes to count to zero.
void delay(uint32_t dly) {
    while(dly--);
}
// GPIO configuration
void config_pins() {
    // Make pointers to the relevant registers
    volatile uint32_t * rcc_apb2enr = (  (volatile uint32_t *) 0x40021018  );
    volatile uint32_t * gpioc_crh = (  (volatile uint32_t *) 0x40011004  );
    
    // Turn on GPIO C
    *rcc_apb2enr |= (1 << 4); // set bit 4
    // Configure PC13 as an output
    *gpioc_crh |= (1 << 20); // set bit 20
    *gpioc_crh &= ~((1 << 23) | (1 << 22) | (1 << 21)); // clear bits 21,22,23
}
void led_on() {
    // Make a pointer to the output data register for port C
    volatile uint32_t * gpioc_odr = (  (volatile uint32_t *) 0x4001100c  );
    *gpioc_odr |= (1 << 13); // set bit 13
}

void led_off() {
    // Make a pointer to the output data register for port C
    volatile uint32_t * gpioc_odr = (  (volatile uint32_t *) 0x4001100c  );
    *gpioc_odr &= ~(1 << 13); // clear bit 13
}
// The main function body follows 
int main() {
    // Do I/O configuratoin
    config_pins(); 
    while(1) // do this forever
    {
        led_on();
        delay(100000);
        led_off();
        delay(100000);
    }
}
// The reset interrupt handler
void reset_handler() {
    main(); // call on main 
    while(1); // if main exits then loop here until next reset. 
}

// Build the interrupt vector table
const void * Vectors[] __attribute__((section(".vector_table"))) ={
	reset_handler
};

Let's start at the bottom of the code: Here you can see the interrupt vector table being defined. It begins with:

    const void * Vectors[] __attribute__((section(".vector_table"))) ={

What does this mean:
Vectors is a an array of constant pointers to undefined types. It is given the linker section attribute ".vector_table" which places it at the appropriate place in the program memory image. There is just one element in this array: reset_handler which represents the address of that function. So, when reset or power up happens, the first code to be executed will be at the address “reset_handler”. In this example, the reset_handler code simply calls on main. If by some chance the main function exits, reset_handler then enters an empty endless loop.
The main function calls on lower level functions : config_pins, led_on, delay and led_off.

The job of config_pins is to set up GPIO Port C, bit 13 as an output. Note the way pointers to the hardware registers are created. This mechanism is less than ideal because the pointer takes up valuable RAM. It is more usual to use #define macros for these pointers instead but this approach is taken here so that the C code lines up with the Rust code more closely.
The functions led_on, led_off behave in a similar way. The delay function implements a simple software delay loop. The program is built with the following command line (this is in a “batch” file in the github repository called build.bat – this batch file extension is chosen so that it can be directly executed on Windows as well as Linux)

    arm-none-eabi-gcc -static -mthumb -g -mcpu=cortex-m3 *.c -T linker_script.ld -o main.elf -nostartfiles 

This invokes the arm gcc compliler, performs static linking (no dll’s), generates “thumb” code, with debugging information for the arm-cortex-m3 core. All C files in the current directory are included. The linker is instructed to use this particular linker script, the output program will be called main.elf and the linker is instructed not to include any default initialization code (the reset_handler function does this). You must have arm-none-eabi-gcc installed on your system and reachable via the PATH environment variable.

Blinky in Rust
Rust uses a linker file just like C – in fact it is identical apart from a couple of minor points:
1) It is called memory.x (which is in line with other examples on the Internet)
2) It doesn’t contain the section relating to the initialization of global and static variables.

While the C version of the program sits entirely in one directory, the Rust version is distributed across a number of sub-directories. This is because the Rust build tool Cargo is used. The directory tree is as follows:
cargo_directories

This is obviously WAY more complex however you really only have to concern yourself with the highlighted files.

The config file in the “.cargo” directory contains the following:

[target.thumbv7m-none-eabi]
rustflags = ["-C", "link-arg=memory.x"]

[build]
target = "thumbv7m-none-eabi"

This tells the rust compiler the target architecture : thumbv7m (Cortex m3), none (no operating system), eabi : defines the function calling convention, the way data is organized in memory and the object file format as embedded application binary interface. The rustflags setting causes the linker to use the linker script file memory.x when generating the program image.

The file Cargo.toml contains the following


[package]
name = "slightly_rusty"
version = "0.1.0"


[profile.release]
# enable debugging in release mode.
debug = true

This simply names the output executable file name, specifies the version number and causes debugging data to be included in the release version (can be handy while testing)

The memory.x linker script file has been discussed above.

The rust code for this project is included in main.rs. It is as follows:

#![no_main]
#![no_std]

use core::panic::PanicInfo;

fn delay(mut dly : u32) {
    
    while dly > 0
    {
        dly = dly -1;
    }
}
// GPIO configuration
fn config_pins() {
   unsafe {
        // Make pointers to the relevant registers
        let rcc_apb2enr  = 0x40021018 as *mut u32;
        let gpioc_crh    = 0x40011004 as *mut u32; 
                
        // Turn on GPIO C
        *rcc_apb2enr |= 1 << 4; // set bit 4
        // Configure PC13 as an output
        *gpioc_crh |= 1 << 20;  // set bit 20
        *gpioc_crh &= !((1<<23) | (1<<22) | (1 << 21)); // clear bits 21,22,23
    }
}

fn led_on() {
    unsafe {
        // Make a pointer to the output data register for port C
        let gpioc_odr  = 0x4001100C as *mut u32; 
        *gpioc_odr |= 1 << 13; // set bit 13
    }
}

fn led_off() {
    unsafe {
        // Make a pointer to the output data register for port C
        let gpioc_odr  = 0x4001100C as *mut u32; 
        *gpioc_odr &= !(1  ! = reset_handler;




// Rust requires a function to handle program panics - this one simply loops.  You 
// could perhaps write some code to output diagnostic information.
#[panic_handler]
fn panic(_panic: &PanicInfo) -> ! {
    loop {}
}

The code is pretty similar to the C code so let’s just look at some of the differences.
First of all, we see the #![no_main] macro. This tells rust that there is no default startup function. Without this the compiler generates an error saying “error: requires `start` lang_item” which presumably means it can’t find the
default entry point for this program – not a problem here as the reset_handler is the entry point.
The #![no_std] macro tells the rust ( and the linker) not to include the rust standard library as it is not implemented (at least not fully) for the thumbv7m-none-eabi target.

The next line: use core::panic::PanicInfo is a bit like a C-include. It includes the definition for the type “PanicInfo” which is needed by the panic handler at the end of the program. The rust functions that follow have obvious parallels with their C counterparts. The key differences are:
unsafe : this keyword suspends some of Rust’s compile time memory/type safety checking to allow raw pointers to be used.
(unsafe reference : https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html). The data types are quite similar to those defined in stdint.h : uint32_t = u32 etc. Pointer behaviour is similar to C also. Consider this line from config_pins:

  let rcc_apb2enr  = 0x40021018 as *mut u32;

This declares rcc_apb2enr as a pointer to the address 0x40021018 which contains a changeable unsigned 32 bit integer. Pointer deferencing is just like C’s.
One additional difference: the bitwise NOT is an exclamation point (!) as opposed to C’s tilde (~)

To build the rust program the following command is executed from the same directory as the Cargo.toml file:

cargo build

Assuming all goes well the output executable is written to the file
target/thumbv7m-none-eabi/debug/slightly_rusty

Loading on to the chip
The output files main.elf and slightly_rusty are loaded in the same way on to the target hardware. Start an openocd session in one terminal from the top level directory (rust_vs_c) as follows
/usr/bin/openocd -f stm32f103_aliexpress.cfg
This is the default openocd (version 0.10.0) that comes with Ubuntu 19.04
Assuming your devices are plugged in ok you should see an output something like this:
Open On-Chip Debugger 0.10.0
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
adapter speed: 1000 kHz
adapter_nsrst_delay: 100
none separate
none separate
Info : Unable to match requested speed 1000 kHz, using 950 kHz
Info : Unable to match requested speed 1000 kHz, using 950 kHz
Info : clock speed 950 kHz
Info : STLINK v2 JTAG v17 API v2 SWIM v4 VID 0x0483 PID 0x3748
Info : using stlink api v2
Info : Target voltage: 3.244914
Info : stm32f1x.cpu: hardware has 6 breakpoints, 4 watchpoints

In another window run arm-none-eabi-gdb and execute the following commands
target remote :3333
load c/main.elf
monitor reset

This loads the C version of the progam and you should see the blue pill’s onboard LED blink.
Execute the following commands to run the rust version:
load rust/target/thumbv7m-none-eabi/debug/slightly_rusty
monitor reset

Hopefully the LED starts blinking again.

Where to from here? There are crates (rust libraries) online that define the various peripherals for STM32F103 devices etc. I’ve begun looking at these and while they are great in that they convert the STM32 SVD files to Rust they are also quite big – leading to a quite large final executable. More investigations are needed as well as a project to drive it all along.
Code for these examples is over on github

Environmental sensing using the STM32G071

stm32g071_environment_assembled_board
Full size image
This project displays information about the environment on an ST7789 display. There’s a text version of the data and a simple trend graph of the dust/pollen data. The sensors used are as follows:
Pressure/Temperature: BMP280
Light : TLS2561
Dust : DM501A

The BMP280 and TLS2561 connect back to the STM32G071 over the I2C1 bus. The DM501A light sensor connects back to a GPIO pin. This pin is sampled at 1kHz. When a dust particle is detected in the sensor this pin is driven low. The program simply counts the number of milliseconds (samples) that this pin is low over a 1 minute interval and displays this figure.

Environmental information is sent back to a host PC using a UART. It is also written to the ST7789 display.

Code is available over here on github.

Getting started with the STM32G071

I bought a couple of STM32G071’s about a month ago but didn’t do much with them as openocd as supplied in Ubuntu didn’t support them. There is however a snapshot of openocd that does. It’s a little out of the way for now and no doubt will be included in the next release so check before trying any of this. So the steps I followed are:

Download and extract : http://openocd.zylin.com/gitweb?p=openocd.git;a=snapshot;h=dcec354bfc756c4a4e1034c9461b5d3f4e55a63e;sf=tgz
Go in to this directory and run the following
git clone http://openocd.zylin.com/jimtcl
Now, I wasn’t expecting to have to do the next edit but I got compiler errors without it – I suspect my gcc version (8.3.0) is different from the original author’s.
Open file src/flash/nor/stm32g0x.c and starting at line 929 you will see the following code:
COMMAND_HANDLER(stm32x_handle_options_read_command)
{
struct target *target = NULL;
//struct stm32g0x_flash_bank *stm32x_info = NULL; // <—— COMMENT OUT THIS LINE

if (CMD_ARGC < 1)
return ERROR_COMMAND_SYNTAX_ERROR;

struct flash_bank *bank;
int retval = CALL_COMMAND_HANDLER(flash_command_get_bank, 0, &bank);
if (ERROR_OK != retval)
return retval;

//stm32x_info = bank->driver_priv; // <—— COMMENT OUT THIS LINE

target = bank->target;

if (target->state != TARGET_HALTED) {
LOG_ERROR(“Target not halted”);
return ERROR_TARGET_NOT_HALTED;
}

etc.

run ./bootstrap
run ./configure –enable-stlink –enable-maintainer-mode –disable-internal-libjaylink
make
sudo make install
This left me with an installation of openocd in /usr/local

Now that I had openocd I needed to get set up for blinky. The STM32G071 version I have is an LQFP32 so a breakout board is needed along with pin headers. Lots of flux and some solder wick was used to fix errors.

The picture below shows the finished circuit
stm32g0blinky

Next step was to put together a simple blinky program. A little background work was needed first to get register definitions and so on. I dug around the web and found the appropriate SVD file for this chip and used svdconv to generate a header file. The final code and the openocd configuration script (stm32g0.cfg) can be found over on github. More examples will follow.

Dublin Maker and Apollo 11

As it happens, Dublin Maker 2019 will occur on the same day as the 50th anniversary of the Apollo 11 moon landing. Once I realized this I had to add a simple moon-lander style game to the Dublin Maker (unofficial) badge.
DMBadgeMoonLander
Flash memory usage is running at 99.98% so I suppose the software for the badge is now “done”
Final badge code is available over here on github.

Badge building in Tog

badge_building_1_2019
We had a 2 hour Dublin Maker badge building session in Tog on Monday the 27th. 10 people showed up to (partially) assemble 13 badges. The firmware will need to be finalized before the final build which should hopefully happen in the next couple of weeks. Hoping to get a total of 15 built.