Speeding up some micropython with a touch of inline assembly on the Raspberry Pi Pico

I have been working on a graphics library for the ST7735 and the Raspberry Ri Rico. The first version was written in pure micropython and worked well enough but was quite slow – especially when writing out blocks of colour (fillRectangle). This was the original code for fillRectangle:

def fillRectangle(self,x1,y1,w,h,colour):
        self.openAperture(x1,y1,x1+w-1,y1+h-1)        
        pixelcount=h*w
        self.command(0x2c)
        self.a0.value(1)        
        msg=bytearray()
        while(pixelcount >0):
            pixelcount = pixelcount-1          
            msg.append(colour >> 8)
            msg.append(colour & 0xff)
        self.spi.write(msg)

Not only was this slow, it also required that a buffer be created that held the filled rectangle in RAM. This was slow and memory intensive.

The new version looks like this:

def fillRectangle(self,x1,y1,w,h,colour):
        self.openAperture(x1,y1,x1+w-1,y1+h-1)        
        pixelcount=h*w
        self.command(0x2c)
        self.a0.value(1)
        self.fill_block(colour,pixelcount) 

It makes use of an inline assembler function the source code of which is as follows:

@micropython.asm_thumb
    def fill_block(r0,r1,r2):
        # pointer to self passed in r0
        # r1 contains the 16 bit data to be written
        # r2 countains count
        # Going to use SPI0.
        # Base address = 0x4003c000
        # SSPCR0 Register OFFSET 0
        # SSPCR1 Register OFFSET 4
        # SSPDR Register OFFSET 8
        # SSPSR Register OFFSET c
        push({r1,r2,r3,r4,r7})
        # Convoluted load of a 32 value into r7
        mov(r7,0x40)
        lsl(r7,r7,8)
        add(r7,0x03)
        lsl(r7,r7,8)
        add(r7,0xc0)
        lsl(r7,r7,8)
        add(r7,0x00)
        mov(r4,2)        
        label(fill_block_loop_start)
        cmp(r2,0)
        beq(fill_block_exit)        
        mov(r3,r1) # read next byte
        lsr(r3,r3,8)
        strb(r3,[r7,8]) # write to SPI
        label(fill_block_spi_wait1)        
        ldr(r3,[r7,0xc]) # read next byte
        and_(r3,r4)
        beq(fill_block_spi_wait1)
        
        mov(r3,r1) # read next byte        
        strb(r3,[r7,8]) # write to SPI        
        sub(r2,r2,1) # decrement count                
        label(fill_block_spi_wait2)        
        ldr(r3,[r7,0xc]) # read next byte
        and_(r3,r4)
        beq(fill_block_spi_wait2)
        b(fill_block_loop_start)
        
        label(fill_block_exit)
        pop ({r1,r2,r3,r4,r7})

This writes the colour value directly to the SPI port the required number of times. It needs to pause when the SPI FIFO fills up (hence he need for the labels fill_block_spi_wait1/2).

The performance improvement is about a factor of 20!

Code is available over on gihub and is likely to change lots in the next couple of weeks while I prepare for a STEM event.

Missing interrupts with Zephyr OS on the Microbit V2.21

The case of the missing interrupts

Recently, we asked our students to buy BBC Microbits for our Internet of Things module. Most of them received version 2.0 of the board. Some however received version 2.21. The main difference between the boards is the USB interface MCU. Version 2.00 used an NXP MKL27Z256VFM4. The 2.21 version changed this to a Nordic Semiconductor NRF52820 device. In most cases, users will not notice the difference between the two boards. However, if you are programming hardware registers directly there are some differences. We use Zephyr OS in our IoT module and work at the hardware level so, for us, this showed up as missing interrupts from the LSM303 Accelerometer/Magnetometer.

Our sample code programs the LSM303 as a step counter. When the accelerometer experiences low acceleration in all 3 axes (i.e. in freefall) it outputs an interrupt signal. This signal pulls the I2C_INT_INT line low (falling edge interrupt trigger). On Version 2.00 boards this worked fine, not so on V2.21 boards. The problem turned out to be that the interface MCU was holding the I2C_INT_INT line low permanently which prevented falling edge interrupts.

The solution

I contacted microbit.org and quickly received a response from Carlos. He pointed me at this site which documents the I2C interface protocol implemented by the interface MCU. This looks quite interesting and will need further exploration at a later date. Carlos suggest that I program the target to perform a dummy read of the interface MCU over the I2C bus. I tried this and it almost solved the problem. Just after programming or after pressing the reset button, the Microbit processed interrupts correctly. After a power on reset however interrupts did not take place. Again Carlos came to the rescue and suggested that I pause the target boot for 1 second before performing the dummy read. This time allows the interface MCU to complete boot up before the dummy read. The result? Interrupts were processed correctly!

The initialization code for the LSM303 motion sensor was modified as shown below:

static const struct device *i2c;
int lsm303_ll_begin()
{
	int nack;
	uint8_t device_id;
	// Set up the I2C interface
	i2c = device_get_binding("I2C_1");
	if (i2c==NULL)
	{
		printk("Error acquiring i2c1 interface\n");
		return -1;
	}	
	// Fix for version 2.21 of the Microbit.  
	// This code resets the I2C_INT_INT signal coming out of the interface IC (DAPLink)
	// There is an acknowledged bug in the firmware for this IC which leave the interrupt
	// line asserted under certain conditions.  This prevents the LSM303 from raising interrupts
	// A dummy read of the interface IC (I2C address 0x70) deasserts this signal
	// Thanks to Carlos in microbit for this help.
	k_msleep(1000); // allow interface MCU complete booting before dummy read
	uint8_t dummy_value[5];
	nack=i2c_read(i2c,dummy_value,1,0x70);	
	printk("nack=%x\n",nack);
	
	
	// Check to make sure the device is present by reading the WHO_AM_I register
	nack = lsm303_ll_readRegister(0x0f,&device_id);
	if (nack != 0)
	{
		printk("Error finding LSM303 on the I2C bus\n");
		return -2;
	}
	else	
	{
		printk("Found LSM303.  WHO_AM_I = %d\n",device_id);
	}
	lsm303_ll_writeRegister(0x20,0x77); //wake up LSM303 (max speed, all accel channels)
	lsm303_ll_writeRegister(0x23,0x08); //enable  high resolution mode +/- 2g
	
	return 0;
}

Thanks to Carlos Pereira Atencio from the Microbit foundation for lots of help solving this problem.

Full schematics are available here:https://tech.microbit.org/hardware/schematic/