This post follows on from my previous one, which was an opening foray into the world of Rust. Following a comment I received, I decided to compare the performance of Rust and C for the trivial example of blinky running as fast as the CPU can manage at the default reset clock speed. This involved modifying both the C and the Rust code.
The modified main.c
The modified code for main.c is shown below. The main differences between this and the version from my previous post are:
#defines are used for the addresses of the I/O registers. This means that the pointer indirection is resolved at compile time, not at run time. It also means that no RAM is used to hold the pointers at run time.
The BSR and BRR (bit-set and bit-reset) registers are used for setting and clearing bit 13. This produces smaller and faster code because no software bitwise AND/OR is needed at run time.
Two build changes:
optimization is turned up to level 2 by passing the -O2 argument to arm-none-eabi-gcc
debugging information is removed (building for release)
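To make the BSR/BRR point concrete, here is a minimal host-side sketch. It does not touch real hardware: `gpio_model`, `write_bsr`, `write_brr`, `odr_set_rmw` and `odr_clear_rmw` are hypothetical names that only mimic how the registers affect the output data register. It assumes the usual STM32F1 behaviour, where a 1 written to bit n of the set register sets output n and a 1 written to bit n of the reset register clears it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of one GPIO port (assumption: only ODR matters here). */
typedef struct {
    uint32_t odr; /* output data register */
} gpio_model;

/* Model a store to the bit set/reset register: the low 16 bits set outputs,
   the high 16 bits reset outputs, and a set wins if both are requested. */
static void write_bsr(gpio_model *g, uint32_t value)
{
    g->odr &= ~(value >> 16);
    g->odr |= (value & 0xFFFFu);
}

/* Model a store to the bit reset register: each 1 in the low 16 bits clears that output. */
static void write_brr(gpio_model *g, uint32_t value)
{
    g->odr &= ~(value & 0xFFFFu);
}

/* The alternative: software read-modify-write on ODR (load, OR, store). */
static void odr_set_rmw(gpio_model *g, unsigned pin)
{
    assert(pin < 16);
    g->odr |= (1u << pin);
}

/* Software read-modify-write to clear a pin (load, AND, store). */
static void odr_clear_rmw(gpio_model *g, unsigned pin)
{
    assert(pin < 16);
    g->odr &= ~(1u << pin);
}
```

In this model, `write_bsr(&g, 1u << 13)` and `odr_set_rmw(&g, 13)` leave `odr` in the same state. On the real part the first is a single store, while the second needs a load, an OR and a store.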
/* User LED for the Blue pill is on PC13 */
#include <stdint.h> // This header includes data type definitions such as uint32_t etc.
#define rcc_apb2enr ( (volatile uint32_t *) 0x40021018 )
#define gpioc_crh ( (volatile uint32_t *) 0x40011004 )
#define gpioc_odr ( (volatile uint32_t *) 0x4001100c )
#define gpioc_bsr ( (volatile uint32_t *) 0x40011010 )
#define gpioc_brr ( (volatile uint32_t *) 0x40011014 )
// The main function body follows
int main()
{
    // Do I/O configuration
    // Turn on GPIO C
    *rcc_apb2enr |= (1 << 4); // set bit 4
    // Configure PC13 as an output
    *gpioc_crh |= (1 << 20); // set bit 20
    *gpioc_crh &= ~((1 << 23) | (1 << 22) | (1 << 21)); // clear bits 21,22,23
    while(1) // do this forever
    {
        *gpioc_bsr = (1 << 13); // set bit 13
        *gpioc_brr = (1 << 13); // clear bit 13
    }
}
// The reset interrupt handler
void reset_handler()
{
    main(); // call on main
    while(1); // if main exits then loop here until next reset.
}
// Build the interrupt vector table
const void * Vectors[] __attribute__((section(".vector_table"))) ={
    reset_handler
};
This code is built and gdb reports its load size as 76 bytes when loaded onto the STM32F103. Executing the code, we see the port pin behave as follows:
I’m not clear on why the cycle length varies every other cycle; maybe it’s a result of the sampling on my cheap logic analyzer. The important point is that the waveform is exactly the same for both the C and Rust versions of this program, so there is no performance hit in tight loops such as this.
The Rust version
The Rust version of the program is shown below. It differs from the version in my previous post in that it uses the stm32f1 crate. The example is taken from Jonathan Klimt’s blog. There’s quite a lot going on under the surface in this crate, but crucially it all seems to happen at compile time, because the output code is just as fast as the C code. It is a good deal bigger, though: 542 bytes vs C’s 76. Why the difference? There are several reasons, but one big chunk is accounted for by the fact that the Rust version has a fully populated interrupt vector table, which takes up 304 bytes. The C version only includes the initial stack pointer and the Reset vector: a total of 8 bytes.
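The 304 and 8 byte figures can be checked with a little arithmetic. The sketch below assumes 32-bit table entries, and assumes the Rust table holds 16 core Cortex-M entries (initial stack pointer, Reset and the other exception slots) plus 60 device interrupt slots. Those two counts are my assumption, chosen because they reproduce the 304 bytes quoted above. `vector_table_bytes` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a vector table made of 32-bit entries.
   core_entries: initial stack pointer, Reset and any other core exception slots.
   device_irqs:  device-specific interrupt slots (assumed count, see the text). */
static size_t vector_table_bytes(size_t core_entries, size_t device_irqs)
{
    assert(core_entries >= 2); /* initial SP and Reset are always present */
    return (core_entries + device_irqs) * sizeof(uint32_t);
}
```

With these assumptions, `vector_table_bytes(16, 60)` gives 304 and `vector_table_bytes(2, 0)` gives 8. Subtracting the tables from the load sizes leaves 238 bytes for the Rust build against 68 bytes for the C build.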
// std and main are not available for bare metal software
#![no_std]
#![no_main]

extern crate stm32f1;
extern crate panic_halt;
extern crate cortex_m_rt;

use cortex_m_rt::entry;
use stm32f1::stm32f103;

// use `main` as the entry point of this application
#[entry]
fn main() -> ! {
    // get handles to the hardware
    let peripherals = stm32f103::Peripherals::take().unwrap();
    let gpioc = &peripherals.GPIOC;
    let rcc = &peripherals.RCC;

    // enable the GPIO clock for IO port C
    rcc.apb2enr.write(|w| w.iopcen().set_bit());
    gpioc.crh.write(|w| unsafe {
        w.mode13().bits(0b11);
        w.cnf13().bits(0b00)
    });

    loop {
        gpioc.bsrr.write(|w| w.bs13().set_bit());
        gpioc.brr.write(|w| w.br13().set_bit());
    }
}
The blinky loops in assembler
The loop produced by the C code is shown below. r1 and r2 have previously been set up to point at the BSR and BRR registers, while r3 contains the mask value (1 << 13).
0x08000034: str r3, [r1, #0]
0x08000036: str r3, [r2, #0]
0x08000038: b.n 0x8000034
The version produced by the Rust code is below. In this case, r0 is set to point at the data structure that manages Port C and the offsets used in the store instructions are used to select the BSR and BRR registers. Register r1 has been preset to the mask value (1 << 13).
0x0800017a: str r1, [r0, #12]
0x0800017c: str r1, [r0, #16]
0x0800017e: b.n 0x800017a
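As a sanity check on the offsets in the Rust loop, the sketch below computes where the two stores land. It assumes r0 holds 0x40011004, the CRH address from the C listing. That is my guess, since the disassembly does not show how r0 is loaded, but it is the base for which offsets #12 and #16 hit the same two addresses the C code writes to. `store_target` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Register addresses taken from the #defines in the C listing. */
#define GPIOC_CRH_ADDR 0x40011004u
#define GPIOC_BSR_ADDR 0x40011010u
#define GPIOC_BRR_ADDR 0x40011014u

/* Effective address of `str rX, [base, #offset]`.
   Assumption: base is whatever value the compiler left in the base register. */
static uint32_t store_target(uint32_t base, uint32_t offset)
{
    assert(offset % 4 == 0); /* the word stores in this loop are word aligned */
    return base + offset;
}
```

Under that assumption, `store_target(GPIOC_CRH_ADDR, 12)` equals `GPIOC_BSR_ADDR` and `store_target(GPIOC_CRH_ADDR, 16)` equals `GPIOC_BRR_ADDR`, so both loops perform the same two stores followed by a branch.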
Final thoughts
Rust is considered to be a safer language than C. This benefit must come at some cost, however. I’ve seen here that the cost is not in execution speed, so where is it? Well, the C version builds completely in 168 ms while the Rust version takes over 2 minutes. Note that this is a “once-off” cost in a way: incremental builds are nearly as quick as the C version. A full rebuild only happens if you build after a “cargo clean” command or if you modify the Cargo.toml file (which changes the build options).
Does this encourage me to pursue Rust a little more? Definitely yes, and with a more complex project that might reveal the greater safety of Rust.
Whether the interrupt table is included in the binary or not doesn’t matter; the flash memory is not usable for anything else.
The rest of the size difference lies in the error handling: you’re using `.unwrap()` which is fallible and thus adds runtime checks and panicking. Furthermore you’re using the `panic_halt` crate which also adds some bytes for the panic_handler…
NB: `extern crate` is outdated style; in the Rust 2018 edition only `use` statements are necessary.