Speeding up some micropython with a touch of inline assembly on the Raspberry Pi Pico

I have been working on a graphics library for the ST7735 and the Raspberry Ri Rico. The first version was written in pure micropython and worked well enough but was quite slow – especially when writing out blocks of colour (fillRectangle). This was the original code for fillRectangle:

def fillRectangle(self,x1,y1,w,h,colour):
        self.openAperture(x1,y1,x1+w-1,y1+h-1)        
        pixelcount=h*w
        self.command(0x2c)
        self.a0.value(1)        
        msg=bytearray()
        while(pixelcount >0):
            pixelcount = pixelcount-1          
            msg.append(colour >> 8)
            msg.append(colour & 0xff)
        self.spi.write(msg)

Not only was this slow, it also required that a buffer be created that held the filled rectangle in RAM. This was slow and memory intensive.

The new version looks like this:

def fillRectangle(self,x1,y1,w,h,colour):
        self.openAperture(x1,y1,x1+w-1,y1+h-1)        
        pixelcount=h*w
        self.command(0x2c)
        self.a0.value(1)
        self.fill_block(colour,pixelcount) 

It makes use of an inline assembler function the source code of which is as follows:

@micropython.asm_thumb
    def fill_block(r0,r1,r2):
        # pointer to self passed in r0
        # r1 contains the 16 bit data to be written
        # r2 countains count
        # Going to use SPI0.
        # Base address = 0x4003c000
        # SSPCR0 Register OFFSET 0
        # SSPCR1 Register OFFSET 4
        # SSPDR Register OFFSET 8
        # SSPSR Register OFFSET c
        push({r1,r2,r3,r4,r7})
        # Convoluted load of a 32 value into r7
        mov(r7,0x40)
        lsl(r7,r7,8)
        add(r7,0x03)
        lsl(r7,r7,8)
        add(r7,0xc0)
        lsl(r7,r7,8)
        add(r7,0x00)
        mov(r4,2)        
        label(fill_block_loop_start)
        cmp(r2,0)
        beq(fill_block_exit)        
        mov(r3,r1) # read next byte
        lsr(r3,r3,8)
        strb(r3,[r7,8]) # write to SPI
        label(fill_block_spi_wait1)        
        ldr(r3,[r7,0xc]) # read next byte
        and_(r3,r4)
        beq(fill_block_spi_wait1)
        
        mov(r3,r1) # read next byte        
        strb(r3,[r7,8]) # write to SPI        
        sub(r2,r2,1) # decrement count                
        label(fill_block_spi_wait2)        
        ldr(r3,[r7,0xc]) # read next byte
        and_(r3,r4)
        beq(fill_block_spi_wait2)
        b(fill_block_loop_start)
        
        label(fill_block_exit)
        pop ({r1,r2,r3,r4,r7})

This writes the colour value directly to the SPI port the required number of times. It needs to pause when the SPI FIFO fills up (hence he need for the labels fill_block_spi_wait1/2).

The performance improvement is about a factor of 20!

Code is available over on gihub and is likely to change lots in the next couple of weeks while I prepare for a STEM event.

Zephyr on the NRF52840 dongle

I got an NRF52840 dongle at Christmas and have been trying to work with Zephyr on it. An initial set of examples is over here. These examples include a simple “Hello World” with a blinking LED, a minimal BLE example and an example that drives an ST7735 display.

Nrf52840 dongle with an ST7735 display

A couple of points about the wiring:

The ST7735 has a built-in 3.3V regulator so it is supplied from the 5V VBus pin.

The serial interface is connected to a USB-Serial converter (not shown). The TX pin (from the NRF) is on P0.20. The RX pin (in to the NRF) is on pin P0.24