Speeding up some micropython with a touch of inline assembly on the Raspberry Pi Pico

I have been working on a graphics library for the ST7735 and the Raspberry Ri Rico. The first version was written in pure micropython and worked well enough but was quite slow – especially when writing out blocks of colour (fillRectangle). This was the original code for fillRectangle:

def fillRectangle(self,x1,y1,w,h,colour):
        self.openAperture(x1,y1,x1+w-1,y1+h-1)        
        pixelcount=h*w
        self.command(0x2c)
        self.a0.value(1)        
        msg=bytearray()
        while(pixelcount >0):
            pixelcount = pixelcount-1          
            msg.append(colour >> 8)
            msg.append(colour & 0xff)
        self.spi.write(msg)

Not only was this slow, it also required that a buffer be created that held the filled rectangle in RAM. This was slow and memory intensive.

The new version looks like this:

def fillRectangle(self,x1,y1,w,h,colour):
        self.openAperture(x1,y1,x1+w-1,y1+h-1)        
        pixelcount=h*w
        self.command(0x2c)
        self.a0.value(1)
        self.fill_block(colour,pixelcount) 

It makes use of an inline assembler function the source code of which is as follows:

@micropython.asm_thumb
    def fill_block(r0,r1,r2):
        # pointer to self passed in r0
        # r1 contains the 16 bit data to be written
        # r2 countains count
        # Going to use SPI0.
        # Base address = 0x4003c000
        # SSPCR0 Register OFFSET 0
        # SSPCR1 Register OFFSET 4
        # SSPDR Register OFFSET 8
        # SSPSR Register OFFSET c
        push({r1,r2,r3,r4,r7})
        # Convoluted load of a 32 value into r7
        mov(r7,0x40)
        lsl(r7,r7,8)
        add(r7,0x03)
        lsl(r7,r7,8)
        add(r7,0xc0)
        lsl(r7,r7,8)
        add(r7,0x00)
        mov(r4,2)        
        label(fill_block_loop_start)
        cmp(r2,0)
        beq(fill_block_exit)        
        mov(r3,r1) # read next byte
        lsr(r3,r3,8)
        strb(r3,[r7,8]) # write to SPI
        label(fill_block_spi_wait1)        
        ldr(r3,[r7,0xc]) # read next byte
        and_(r3,r4)
        beq(fill_block_spi_wait1)
        
        mov(r3,r1) # read next byte        
        strb(r3,[r7,8]) # write to SPI        
        sub(r2,r2,1) # decrement count                
        label(fill_block_spi_wait2)        
        ldr(r3,[r7,0xc]) # read next byte
        and_(r3,r4)
        beq(fill_block_spi_wait2)
        b(fill_block_loop_start)
        
        label(fill_block_exit)
        pop ({r1,r2,r3,r4,r7})

This writes the colour value directly to the SPI port the required number of times. It needs to pause when the SPI FIFO fills up (hence he need for the labels fill_block_spi_wait1/2).

The performance improvement is about a factor of 20!

Code is available over on gihub and is likely to change lots in the next couple of weeks while I prepare for a STEM event.

Missing interrupts with Zephyr OS on the Microbit V2.21

The case of the missing interrupts

Recently, we asked our students to buy BBC Microbits for our Internet of Things module. Most of them received version 2.0 of the board. Some however received version 2.21. The main difference between the boards is the USB interface MCU. Version 2.00 used an NXP MKL27Z256VFM4. The 2.21 version changed this to a Nordic Semiconductor NRF52820 device. In most cases, users will not notice the difference between the two boards. However, if you are programming hardware registers directly there are some differences. We use Zephyr OS in our IoT module and work at the hardware level so, for us, this showed up as missing interrupts from the LSM303 Accelerometer/Magnetometer.

Our sample code programs the LSM303 as a step counter. When the accelerometer experiences low acceleration in all 3 axes (i.e. in freefall) it outputs an interrupt signal. This signal pulls the I2C_INT_INT line low (falling edge interrupt trigger). On Version 2.00 boards this worked fine, not so on V2.21 boards. The problem turned out to be that the interface MCU was holding the I2C_INT_INT line low permanently which prevented falling edge interrupts.

The solution

I contacted microbit.org and quickly received a response from Carlos. He pointed me at this site which documents the I2C interface protocol implemented by the interface MCU. This looks quite interesting and will need further exploration at a later date. Carlos suggest that I program the target to perform a dummy read of the interface MCU over the I2C bus. I tried this and it almost solved the problem. Just after programming or after pressing the reset button, the Microbit processed interrupts correctly. After a power on reset however interrupts did not take place. Again Carlos came to the rescue and suggested that I pause the target boot for 1 second before performing the dummy read. This time allows the interface MCU to complete boot up before the dummy read. The result? Interrupts were processed correctly!

The initialization code for the LSM303 motion sensor was modified as shown below:

static const struct device *i2c;
int lsm303_ll_begin()
{
	int nack;
	uint8_t device_id;
	// Set up the I2C interface
	i2c = device_get_binding("I2C_1");
	if (i2c==NULL)
	{
		printk("Error acquiring i2c1 interface\n");
		return -1;
	}	
	// Fix for version 2.21 of the Microbit.  
	// This code resets the I2C_INT_INT signal coming out of the interface IC (DAPLink)
	// There is an acknowledged bug in the firmware for this IC which leave the interrupt
	// line asserted under certain conditions.  This prevents the LSM303 from raising interrupts
	// A dummy read of the interface IC (I2C address 0x70) deasserts this signal
	// Thanks to Carlos in microbit for this help.
	k_msleep(1000); // allow interface MCU complete booting before dummy read
	uint8_t dummy_value[5];
	nack=i2c_read(i2c,dummy_value,1,0x70);	
	printk("nack=%x\n",nack);
	
	
	// Check to make sure the device is present by reading the WHO_AM_I register
	nack = lsm303_ll_readRegister(0x0f,&device_id);
	if (nack != 0)
	{
		printk("Error finding LSM303 on the I2C bus\n");
		return -2;
	}
	else	
	{
		printk("Found LSM303.  WHO_AM_I = %d\n",device_id);
	}
	lsm303_ll_writeRegister(0x20,0x77); //wake up LSM303 (max speed, all accel channels)
	lsm303_ll_writeRegister(0x23,0x08); //enable  high resolution mode +/- 2g
	
	return 0;
}

Thanks to Carlos Pereira Atencio from the Microbit foundation for lots of help solving this problem.

Full schematics are available here:https://tech.microbit.org/hardware/schematic/

Early experiments with Raspberry Pi Pico, Micropython and an ST7735 LCD

The Raspberry Pi Pico and micropython are new to me. I was curious to see how it would perform with graphics on an ST7735. From what I can see, the SPI output is just as fast as any other language however the preparation of SPI data is slower. I tried two approaches for writing filled rectangles to the display. The first prepared a bytearray filled with the desired colour values. This bytearray is then sent to the SPI bus with a single write command. From what I could see, the write is at full SPI speed with no gaps between bytes. The filling of the bytearray was a little slow.

The second approach involved sending the colour to the display in 2 byte blocks SPI. This eliminated the in-memory bytearray however it added a lot of CPU overhead. In the end I stayed with the first approach for now. I expect that I will be coming back to this in the future.

My initial demo was developed using the Thonny editor which is quite nice and looks like it may be suitable for STEM workshops.

The main function looks like this:

from machine import Pin, Timer,SPI
import urandom
import time
import st7735 
display=st7735.st7735()

while(1):
    x1=urandom.randint(0,120)
    y1=urandom.randint(0,150)
    x2=urandom.randint(0,120)
    y2=urandom.randint(0,150)
    w=urandom.randint(1,127-x1)
    h=urandom.randint(1,159-y1)
    colour=urandom.randint(0,65535)    
    display.drawLine(x1,y1,x2,y2,colour)
    



And the ST7735 library (for now) is this:

from machine import Pin, Timer,SPI
import time
class st7735:
    def __init__(self):        
        # configure pins for output
        self.reset=Pin(21,Pin.OUT)
        self.cs=Pin(22,Pin.OUT)
        self.a0=Pin(20,Pin.OUT)
        self.spi=SPI(0,baudrate=20000000,polarity=1,phase=1,bits=8,firstbit=SPI.MSB,sck=Pin(18),mosi=Pin(19))        
        self.cs.value(1)
        self.reset.value(1)
        time.sleep_ms(10)
        
        self.reset.value(0)
        time.sleep_ms(200)
        
        self.reset.value(1)
        time.sleep_ms(200)
        self.cs.value(200)
        time.sleep_ms(200)
        
        self.cs.value(0)
        time.sleep_ms(20)
        
        self.command(1) # software reset
        time.sleep_ms(100)
        self.cs.value(1)
        time.sleep_ms(1)
        self.cs.value(0)
        time.sleep_ms(1)
        
        self.command(0x11)
        time.sleep_ms(120)
        self.cs.value(1)
        time.sleep_ms(1)
        self.cs.value(0)        
        time.sleep_ms(1)
        
        self.command(0xb1)
        self.data(0x05)
        self.data(0x3c)
        self.data(0x3c);
        self.cs.value(1)
        time.sleep_ms(1)
        self.cs.value(0)
        time.sleep_ms(1)
        
        self.command(0xb2)
        self.data(0x05)
        self.data(0x3c)
        self.data(0x3c);        
        self.cs.value(1)
        time.sleep_ms(1)
        self.cs.value(0)
        time.sleep_ms(1)
        
        self.command(0xb3)
        self.data(0x05)
        self.data(0x3c)
        self.data(0x3c)
        self.data(0x05)
        self.data(0x3c)
        self.data(0x3c)
        self.cs.value(1)
        time.sleep_ms(1)
        self.cs.value(0)
        time.sleep_ms(1)
        
        self.command(0xb4)
        self.data(0x03)
        self.cs.value(1)
        time.sleep_ms(1)
        self.cs.value(0)
        time.sleep_ms(1)
        
        self.command(0x36)
        self.data(0xc8)
        self.cs.value(1)
        time.sleep_ms(1)
        self.cs.value(0)
        time.sleep_ms(1)
        
        self.command(0x3a)
        self.data(0x05)
        self.cs.value(1)
        time.sleep_ms(1)
        self.cs.value(0)
        time.sleep_ms(1)
        
        self.command(0x29)
        time.sleep_ms(100)
        self.cs.value(1)
        time.sleep_ms(1)
        self.cs.value(0)
        time.sleep_ms(1)
        
        self.command(0x2c)          
        self.fillRectangle(0,0,128,160,0)
        
    def command(self,cmd):
        self.a0.value(0)
        msg=bytearray()
        msg.append(cmd)
        self.spi.write(msg)
        
    def data(self,cmd):
        self.a0.value(1)
        msg=bytearray()
        msg.append(cmd)
        self.spi.write(msg)        
    def openAperture(self,x1,y1,x2,y2):
        self.command(0x2a)
        self.data(x1 >> 8)
        self.data(x1 & 0xff)
        self.data(x2 >> 8)
        self.data(x2 & 0xff)
        self.command(0x2b)
        self.data(y1 >> 8)
        self.data(y1 & 0xff)
        self.data(y2 >> 8)
        self.data(y2 & 0xff)        
        self.command(0x2c)
    def fillRectangle(self,x1,y1,w,h,colour):
        self.openAperture(x1,y1,x1+w-1,y1+h-1)        
        pixelcount=h*w        
        self.command(0x2c)
        self.a0.value(1)
        msg=bytearray()
        while(pixelcount >0):
            pixelcount = pixelcount-1           
            msg.append(colour >> 8)
            msg.append(colour & 0xff)
        self.spi.write(msg)
    def putPixel(self,x,y,colour):
        self.openAperture(x,y,x+1,y+1)        
        self.a0.value(1)
        msg=bytearray()
        msg.append(colour >> 8)
        msg.append(colour & 0xff)
        self.spi.write(msg)
    def drawLine(self,x0,y0,x1,y1,colour):
        if ( (abs(y1-y0) < abs(x1-x0))):
             if (x0 > x1):
                 self.drawLineLowSlope(x1,y1,x0,y0,colour)
             else:
                 self.drawLineLowSlope(x0,y0,x1,y1,colour)
        else:
            if (y0 > y1):
                self.drawLineHighSlope(x1,y1,x0,y0,colour)
            else:
                self.drawLineHighSlope(x0,y0,x1,y1,colour)
            
    def drawLineLowSlope(self,x0,y0,x1,y1,colour):
        # Reference : https://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm    
        dx = x1 - x0
        dy = y1 - y0
        yi = 1
        if (dy < 0):
            yi = -1
            dy = -dy
        D = 2*dy - dx
        y = y0
        for x in range(x0,x1+1):
            self.putPixel(x,y,colour)
            if (D > 0):
                y = y + yi
                D = D - 2*dx
            D=D + 2*dy
    def drawLineHighSlope(self,x0,y0,x1,y1,colour):
        # Reference : https://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm    
        dx = x1 - x0
        dy = y1 - y0
        xi = 1
        if (dx < 0):
            xi = -1
            dx = -dx
        D = 2*dx - dy
        x = x0
        for y in range(y0,y1+1):
            self.putPixel(x,y,colour)
            if (D > 0):
                x = x + xi
                D = D - 2*dy
            D=D + 2*dx
            

Update on porting DM Badge 2022 to Zephyr 3.1

My Dublin Maker Badge 2022 ran on top of Zephyr 2.6. I’ve been working on porting it to Zephyr 3.1. This involved some superficial changes to header files. It also required some deeper changes to the way hardware was accessed (as a result of the requirement to make use of Device Tree). Anyway, I got through all of this and it works fine. I was not so lucky with the Bluetooth mesh communications however. It seems that there are some application key binding differences between 2.6 and 3.1 that escape me. Code over on github has been updated and I will keep working on the mesh communications problem when I get the time.

Moving a BLE application from Zephyr 2.6.0 to version 3.1.0

I have been using the Microbit V2 as a BLE development tool with our engineering students. Part of this involves looking below the covers at what is happening on the I2C bus. Students interact with the built-in accelerometer using I2C reads and writes (i.e. not using Zephyr’s high level driver). Version 2.6.0 of Zephyr allowed us to do this without many problems. Students were given a basic BLE step counter application which they analysed and extended in a practical session. Zephyr OS is under constant development of course and it has moved on to version 3.1 over the past few months. Naturally this broke all of my sample code 🙂

So what’s changed. Well, first of all, the Zephyr include path has gone from

#include <bluetooth/bluetooth.h>

to

#include <zephyr/bluetooth/bluetooth.h>

Now there is a project configuration variable that allows legacy include directory names but I decided that I would go in and make the changes to the various source files. This turned out to be the easiest change to make.

The next hurdle turned out to be the PINCTRL system built in to Device Tree. This is something I avoided in Zephyr 2.6 as it has a big enough learning curve, lacks some documentation and is very different to the bare-metal microcontroller way of managing I/O pins. The approach taken was instead to assign pins to functions in the app.overlay file as follows:


&i2c1 { 
	compatible = "nordic,nrf-twim"; 
	status = "okay"; 
	sda-pin = < 16 >;// P0.16 = I2C_INT_SDA 
	scl-pin = < 8 >; // P0.8 = I2C_INT_SCL 
}; 
&spi2 { 
	compatible = "nordic,nrf-spi"; 
	status = "okay"; 
	sck-pin = <17>; 
	mosi-pin = <13>; 
/* Redirecting MISO to a pin that is not connected on the microbit v2 board */ 
	miso-pin = <27>; 
	clock-frequency = <1000000>; 
}; 
&adc { 
	status = "okay"; 
}; 
&pwm0 { 
	status = "okay"; 
// P0.3 is labelled RING1 on the microbit. 
	ch0-pin = <3>; 
};


Attempting to do this with version 3.1 results in lots of errors of the following form:

static assertion failed: “/soc/spi@40023000 has legacy *-pin properties defined although PINCTRL is enabled

Why does this happen? When I build the code I use the following command:

west build -b bbc_microbit_v2 ble_stepcount -p

This uses the board definitions for the bbc_microbit_v2 device as defined in the directory zephyr/boards/arm/bbc_microbit_v2/

This directory contains a number of files that define the device tree and various project build options. The result is that the build system requires all pin definitions to be carried out using the PINCTRL mechanism. So what’s different? Here’s the PINCTRL version of my app.overlay file.

&pinctrl {
   /* IMPORTANT!  There should not be a space before the : in the next line (and 	similar below) */
    spi2_default_alt: spi2_default_alt {
        group1 {
            psels = <NRF_PSEL(SPIM_MOSI,0,17)>,
                    <NRF_PSEL(SPIM_SCK,0,13)>;
        };
        group2 {
        	psels = <NRF_PSEL(SPIM_MISO, 0, 27)>;
	      bias-pull-down;
        };
        
    };
    spi2_sleep_alt: spi2_sleep_alt {
        group1 {
            psels = <NRF_PSEL(SPIM_MOSI,0,17)>,
                    <NRF_PSEL(SPIM_SCK,0,13)>,
                    <NRF_PSEL(SPIM_MISO, 0, 27)>;
            low-power-enable;
        };
    };
    i2c1_default_alt: i2c1_default_alt {
        group1 {
            psels = <NRF_PSEL(TWIM_SDA,0,16)>,
                    <NRF_PSEL(TWIM_SCL,0,8)>;
            bias-pull-up;
        };
    };
    i2c1_sleep_alt: i2c1_sleep_alt {
        group1 {
            psels = <NRF_PSEL(TWIM_SDA,0,16)>,
                    <NRF_PSEL(TWIM_SCL,0,8)>;
            low-power-enable;
        };
    };
 };
 
 &spi2 {
    compatible = "nordic,nrf-spi";
    status = "okay";
    pinctrl-0 = <&spi2_default_alt>;
    pinctrl-1 = <&spi2_sleep_alt>;
    pinctrl-names = "default", "sleep";

    clock-frequency = <1000000>;
};
&i2c1 {
	compatible = "nordic,nrf-twim";
	status = "okay";
	pinctrl-0 = <&i2c1_default_alt>;
	pinctrl-1 = <&i2c1_sleep_alt>;
	pinctrl-names = "default", "sleep";
	label = "I2C_1";
};
&i2c0 {
    status="disabled";
};
&gpio0 {
    status="okay";
    label="GPIO_0";
};
&adc {
        status = "okay";
};
&pwm0 {
        status = "okay";
// P0.3 is labelled RING1 on the microbit. 
        ch0-pin = <3>; 
};


		

As you can see, there are lots of changes. The pinctrl section is new. This section allows us to redefine the pins used by the microcontroller peripherals spi2 and i2c1. We write definitions for each of these for the default powered up state and for the sleep state. (Definitions for sleep state are not required if you disable power management in the project configuration file). States contain one or more groups of pin definitions. These groups allocate actual pin numbers to peripheral functions and can set group properties such as whether there will be pull-up resistors etc. Pin assignment for SPI2 is follows:

psels = <NRF_PSEL(SPIM_MOSI,0,17)>,
<NRF_PSEL(SPIM_SCK,0,13)>,
<NRF_PSEL(SPIM_MISO, 0, 27)>;

This states that MOSI is on GPIO0, bit 17, SCK GPIO0, bit 13 etc. The trickiest part of this is finding out what are the correct names of the pins e.g. SPIM_MOSI, TWIM_SDA and so on. I found them in

zephyr/boards/arm/bbc_microbit_v2/bbc_microbit_v2-pinctrl.dtsi and

zephyr/boards/arm/nrf52833dk_nrf52833/nrf52833dk_nrf52833-pinctrl.dtsi

Having defined the pin assignments the next sections (&spi2, &i2c1) allow you to apply them to the hardware in this project. In the case of spi2, the pinctrl settings for state 0 (default) are remapped to spi2_default_alt with this line:

pinctrl-0 = <&spi2_default_alt>;

The configuration for sleep mode are configured by assigning a value to pinctrl-1 as follows:

pinctrl-1 = <&i2c1_sleep_alt>;

In my C code I had acquired a device handle I2C1 and GPIO0 using the names “I2C_1” and “GPIO_0” respectively. This did not work with Zephyr 3.1 but inserting label directives as shown above fixed this error.

At this point, the code would build and appear to work with one exception: Bluetooth notify failed to work correctly. I saw lots of warnings of the following form in a serial terminal monitoring the microbit:

<wrn> Device is not subscribed to characteristic

This happened when the bt_gatt_notify function was called in response to a change in step count value. Fixing this error took a while! The original BLE characteristic and service definition was as follows:

#define BT_GATT_CHAR1 \ BT_GATT_CHARACTERISTIC(&stepcount_id.uuid,BT_GATT_CHRC_READ | \  BT_GATT_CHRC_WRITE |  BT_GATT_CHRC_NOTIFY, 	BT_GATT_PERM_READ | \ BT_GATT_PERM_WRITE, read_stepcount, write_stepcount, &stepcount_value)


// ********************[ Service definition ]********************
#define BT_UUID_CUSTOM_SERVICE_VAL	\
	BT_UUID_128_ENCODE(1, 2, 3, 4, (uint64_t)0)
static struct bt_uuid_128 my_service_uuid = BT_UUID_INIT_128( BT_UUID_CUSTOM_SERVICE_VAL);

BT_GATT_SERVICE_DEFINE(my_service_svc,
	BT_GATT_PRIMARY_SERVICE(&my_service_uuid),
		BT_GATT_CHAR1
);

The new definition and additional code is as follows:

#define BT_GATT_CHAR1 BT_GATT_CHARACTERISTIC(&stepcount_id.uuid,BT_GATT_CHRC_READ | \ BT_GATT_CHRC_WRITE |  BT_GATT_CHRC_NOTIFY | BT_GATT_CCC_NOTIFY, \ BT_GATT_PERM_READ | BT_GATT_PERM_WRITE, read_stepcount, write_stepcount, \ &stepcount_value)

static void step_changed(const struct bt_gatt_attr *attr,
                                 uint16_t value)
{   if (value==BT_GATT_CCC_NOTIFY)
    {
       printk("Subscribed\n");
       Subscribed = 1;
    }
    else
    {
        printk("Not Subscribed\n");
        Subscribed = 0;
    }
}



// ********************[ Service definition ]********************
#define BT_UUID_CUSTOM_SERVICE_VAL	BT_UUID_128_ENCODE(1, 2, 3, 4, (uint64_t)0)
static struct bt_uuid_128 my_service_uuid = BT_UUID_INIT_128( BT_UUID_CUSTOM_SERVICE_VAL);
BT_GATT_SERVICE_DEFINE(my_service_svc,
	BT_GATT_PRIMARY_SERVICE(&my_service_uuid),
		BT_GATT_CHAR1,
        BT_GATT_CCC(step_changed,BT_GATT_PERM_READ | BT_GATT_PERM_WRITE)

Changes are highlighted in bold. What are the changes all about?

BT_GATT_CCC_NOTIFY : configures the attribute to send notifications if there are changes to attribute values.

Function step_changed : This function is a callback that is activated when there is a change to the subscription status of a BLE client. It updates a global variable called Subscribed which is used to later to determine whether a ble_gatt_notify event is sent.

BLE_GATT_CCC : This macro defines what looks like a new attribute of the characteristic BT_GAT_CHAR1 (the step counter). This attribute can be read and written by the client device to enable or disable notifications. When a BLE client subscribes or unsubscribes from notifications for this characteristic, the function step_changed is called (CCC stands for Client Characteristic Configuration).

All of the above took quite some time and I hope the description is of some help to other lost souls. Code is available over here on github. This will grow as a port the rest of my examples.

Dublin Maker Badge PCB design

Well, Dublin Maker is coming up on the 23rd of July this year. The unofficial electronic badge design is coming along well. I think (hope!) that the PCB design is fine and it will be sent off for manufacture soon. You can see the 3D viewer output from KiCad below. U3 is the pin header for the display, U2 is a pin header for the boost converter. U1 is an NRF52833 module. SW7 is shown as a pin header but is in fact an on/off switch with the same pin pitch. A “Simple Add-On” connector (SAO1) is also provided. This nearly conforms to the BadgeLife SAO 1.69 standard. It provides power, I2C and a single GPIO (rather than 2). I ran out of GPIO port bits in the design as I will not be using the pads underneath the NRF52833 module (can’t hand solder them). The IDC socket for the SAO interface will not be populated in the final badge but will instead be left to anyone who cares to solder one on.

All components for the build have now arrived except for the battery cases. These can hold a single AA battery and are mounted on the back of the badge. One last check and it will be off to PCBWay with the design!

Sprites, tiles, motion and transparency for DMB2022

All of the games I have written to date for embedded systems such as Breadboard Games and badges featured a uniform background which was usually black. This was simple to implement . If a character is to be moved, it is first overwritten with the background colour and then redrawn at a new location. More complex backgrounds that featured terrain such as grass or rock were not so easy to deal with. In a desktop programming environment I would probably tackle the problem as follows:

(1) Make a copy of the area that the character will obscure

(2) Draw the character

If a character is then to be moved, you simply write the copy of the obscured area to the screen to hide the character and then repeat the process at a new location.

Step (1) requires a readable display buffer. The ST7789 in this project is a write-only device. In theory I could make a frame buffer big enough to hold the entire screen and then repeatedly write this out over the SPI interface. The display has 240×240 pixels, each pixel encoded in 16 bits. A complete framebuffer for this requires 240x240x2 = 115200 bytes. The NRF52833 MCU driving the display has 128kB of RAM so not much would be left over for stack and variables. Dividing the display up into “tiles” can greatly reduce the RAM requirements.

Let’s divide the screen up into 30×30 pixel tiles. This may seem a little large however the ST7789 screen is very small and has a high pixel density. A 30×30 tile represents an area of 3mm x 3mm approx within which a texture or character is drawn. Let’s also use the following two tiles:

The left tile represents a character in a game (surrounded by transparency), the right one represents grass on the ground. We can completely fill the screen with 64 grass tiles. When our character moves across this background it can cover (at least partially) at most 4 grass tiles. This means that we can make use of a framebuffer in RAM that is 2 tiles x 2 tiles or 60x60x2 bytes in size (7200 bytes). We can move the character across the framebuffer in RAM and then write the framebuffer to the display. If the character moves beyond the edge of the framebuffer we can move where we write the framebuffer to the display and adjust the character’s position within the framebuffer. The image below shows how the framebuffer moves when the character moves diagonally across the screen (the background was not drawn to emphasize the movement).

The character moves across screen as shown in the following video:

https://youtube.com/shorts/gbuUdUqhvyA?feature=share

As can be seen, the movement is smooth and quite fast (it is actually artificially slowed down). There is however a problem: The bounding square of the character is drawn as white which overwrites the background. It would be much better if the background bounding rectangle for the character was treated as transparent.

A slight detour for PNG files.

I’m using KolourPaint in KDE (Kubuntu) to produce the tiles. These images are saved as PNG files with 4 channels: Red, Green, Blue and Alpha. The Alpha channel represents the transparency of a pixel. In this case I’m concerned with two levels of this: completely opaque (Alpha=255) and completely transparent (Alpha = 0). These PNG files are converted to C header files which encode the RGB values into a 16 bit colour value suitable for use with the ST7789. As mentioned in a previous blog post, the ST7789 uses 565 (RGB) encoding. A value of 11111 111111 11111 represents white. A slightly different value of 11111 111110 11111 (the LSB of the green channel is set to 0) looks nearly the same on the screen. I decided that the colour value of 11111 111111 11111 (65535) would represent “transparent” while any value that was meant to be white would be re-encoded as 11111 111110 11111 (65503). The following python script was then used to convert the png file to a C header file.

# Want to deal with transparency.  Need to nominate a particular colour as "transparent"
# Going to go with 0xffff as being transparent
# if a pixel is designed to be this colour it will be changed to 
# 0b11111 111110 11111 
# i.e. the least significant green bit will be set to 0.  This is slightly off the intended white
# but not by much.
import sys
Filename=sys.argv[1]
Forename=Filename.split(".")[0]
from PIL import Image
img=Image.open(Filename)
width, height = img.size
pixels = list(img.getdata())
print("#define ",end="")
print(Forename,end="")
print("_width ",end="")
print(width)
print("#define ",end="")
print(Forename,end="")
print("_height ",end="")
print(height)
print("static const uint16_t ",end="")
print(Forename,end="")
print("[]={")
for x in range(0,width):
	
	for y in range (0, height):		
		(Red,Green,Blue,Alpha) = pixels[(x*height)+y]
		# Colour format : Red : 5 bits, Green 6 bits, Blue 5 bits
		# Assuming all components are in range 0 - 255
		if (Alpha == 255):
			Red = Red >> 3 # discard 3 bits
			Blue = Blue >> 3 # discard 3 bits
			Green = Green >> 2 # discard 2 bits
			st7789_16 = (Red << 11) + (Green << 5) + Blue
			low_byte = st7789_16 & 0xff
			# have to do an endian swap
			high_byte = st7789_16 >> 8
			st7789_16 = (low_byte << 8) + high_byte		
			if (st7789_16 == 0xffff):
				st7789_16 = 0b1111111111011111
			print(st7789_16, end="")			
		else:
			print("65535", end="")
		print(",")		
print("};")	

The framebuffer output functions were adapted to take this special transparent colour into account and the new motion now looks like this:

https://youtube.com/shorts/Vsr7n7rsNRI?feature=share

Once again, motion looks smooth (and has been artificially slowed down).

Code for all of this is in quite an untidy state for now but will be uploaded to github over the next couple of weeks

DMB2022 Graphics subsystem

DMB2022 (Dublin Maker Badge for 2022 (not official)) is an electronic badge consisting of an NRF52833 module and a display. The display for the badge is an ST7789 module. It has an SPI interface that is operated at 32MHz. It is configured to use 16 bit RGB colour values arranged as follows:

5 most significant bits : Red

6 middle bits : Green

5 least significant bits : Blue

The NRF52833’s SPI3 is used to drive the display as SPI0 and SPI1 are limited to 8Mbs. SPI3 can operate up to 32Mbs.

The display.cpp module handles all interaction with the ST7789. Lots of LCD displays of this type have a similar pattern of operations.

At startup, the display is first configured which typically involves bringing the device out of low power mode and configuring its colour mode, orientation and optionally its colour palette. The initialization code achieves this by sending commands and data to the display. The D/C signal tells the display whether a command or data byte is being transmitted. If it is Low then a command is being sent; a High level implies data is being sent.

In order to put pixels on the display, the controlling program must first open up an aperture for drawing in. This aperture can range from the entire display to just a single pixel. Once opened, data can be written to the aperture as a continuous stream of bytes. The display performs a raster like operation on the incoming data with display lines auto-wrapping when the data stream reaches the right-hand side of the aperture.

All of the remaining graphic primitives are built on top of this mechanism (with a few performance optimizations). The display class includes the following functions (for now):

int begin();
	void command(uint8_t cmd);
	void data(uint8_t data);
	void openAperture(uint16_t x1, uint16_t y1, uint16_t x2, uint16_t y2);
	void putPixel(uint16_t x, uint16_t y, uint16_t colour);
	void putImage(uint16_t x, uint16_t y, uint16_t width, uint16_t height, uint16_t *Image);
	void drawLine(uint16_t x0, uint16_t y0, uint16_t x1, uint16_t y1, uint16_t Colour);
	int iabs(int x); // simple integer version of abs for use by graphics functions        
	void drawRectangle(uint16_t x, uint16_t y, uint16_t w, uint16_t h, uint16_t Colour);
	void fillRectangle(uint16_t x,uint16_t y,uint16_t width, uint16_t height, uint16_t colour);
	void drawCircle(uint16_t x0, uint16_t y0, uint16_t radius, uint16_t Colour);
	void fillCircle(uint16_t x0, uint16_t y0, uint16_t radius, uint16_t Colour);
	void print(const char *Text, uint16_t len, uint16_t x, uint16_t y, uint16_t ForeColour, uint16_t BackColour);
	void print(uint16_t number, uint16_t x, uint16_t y, uint16_t ForeColour, uint16_t BackColour);
	uint16_t RGBToWord(uint16_t R, uint16_t G, uint16_t B);
	void drawLineLowSlope(uint16_t x0, uint16_t y0, uint16_t x1,uint16_t y1, uint16_t Colour);
	void drawLineHighSlope(uint16_t x0, uint16_t y0, uint16_t x1,uint16_t y1, uint16_t Colour);    

I will not do a deep dive on all of the functions, instead I will just focus on two of them : putPixel and fillRectangle.

Getting a dot on the screen : putPixel

This badge uses Zephyr OS as the underlying operating system (principally to make use of its Bluetooth Low Energy features). As a result it makes use of Zephyr’s peripheral device drivers to interface with the onboard peripherals of the NRF52833.

// Configuration for the SPI port.  Note the 32MHz clock speed possible only on SPI 3
// Pin usage by SPI bus defined in app.overlay.
static const struct spi_config cfg = {
	.frequency = 32000000,
	.operation = SPI_WORD_SET(8) | SPI_TRANSFER_MSB |  SPI_MODE_CPOL | SPI_MODE_CPHA,
	.slave = 0,
};
void display::putPixel(uint16_t x, uint16_t y, uint16_t colour)
{
    this->openAperture(x, y, x + 1, y + 1);
	struct spi_buf tx_buf = {.buf = &colour, .len = 2};
	struct spi_buf_set tx_bufs = {.buffers = &tx_buf, .count = 1};
	DCHigh();
	spi_write(spi_display, &cfg, &tx_bufs);
}

The SPI configuration structure cfg sets the SPI mode, bit order and role. It also specifies the transfer speed of 32Mbps. This is used in calls to spi_write.

Inside putPixel we see that a 1 pixel size aperture is opened on the display. Next, an spi_buf structure is prepared which contains a pointer to the data being written as well as it’s length. This is wrapped in an spi_buf_set structure required by the spi_write function. Finally, the D/C line is driven high indicating to the ST7789 that data is being written. The spi_write function handles the actual write operation. The first parameter for spi_write is a Zephyr device structure called spi_display which identifies the SPI device being used. This was obtained when the SPI interface was initialized and is used in a manner similar to a FILE structure in C file operations

As you can see from the above there is a bit of overhead associated with writing a pixel to the display. Functions in display.cpp that write multiple pixels don’t necessarily call putPixel as this would be too slow. Instead they interface directly with Zephyr’s I/O functions. An example of this is fillRectangle.

Filling large areas of the screen: fillRectangle

void display::fillRectangle(uint16_t x,uint16_t y,uint16_t width, uint16_t height, uint16_t colour)
{
	// This routine breaks the filled area up into sections and fills these.
	// This allows it to make more efficient use of the control lines and SPI bus
#define PIXEL_CACHE_SIZE 64
	static uint16_t fill_cache[PIXEL_CACHE_SIZE]; // use this to speed up writes
	uint32_t pixelcount = height * width;
	uint32_t blockcount = pixelcount / PIXEL_CACHE_SIZE;
	
	this->openAperture(x, y, x + width - 1, y + height - 1);
	DCHigh();
	struct spi_buf tx_buf = {.buf = &colour, .len = 2};
	struct spi_buf_set tx_bufs = {.buffers = &tx_buf, .count = 1};   

	if (blockcount)
	{
  		  for (int p=0;p<PIXEL_CACHE_SIZE;p++)
		  {
			fill_cache[p]=colour;
		  }
	}
	while(blockcount--)
	{
	   tx_buf.buf=fill_cache;
	   tx_buf.len = PIXEL_CACHE_SIZE*2;
	   spi_write(spi_display, &cfg, &tx_bufs);
	}

	pixelcount = pixelcount % PIXEL_CACHE_SIZE;
	while(pixelcount--) 
	{
	  tx_buf.buf = &colour;
 	  tx_buf.len = 2;		
		  spi_write(spi_display, &cfg, &tx_bufs);
	}	
}

Filling a rectangle on screen consists of two steps:

Open an aperture for the rectangle

Write pixel values to the display.

In an attempt to reduce the number of Zephyr API function calls this function divides the pixel data into chunks of size PIXEL_CACHE_SIZE. This allows that number of pixels to be written in a single call to spi_write (and maybe to leverage any optimization within the driver such as DMA). For example, suppose you want to fill an area of the screen consisting of 200 pixels. Using the value of 64 as a block size this would require the writing of three 64 pixel block and 8 individual pixel writes. So 200 calls to spi_write have been reduced to 11. If the function were to use putPixel then it would suffer from the additional overhead of opening an aperture for each individual pixel as well as controlling the D/C line for each of them. The above mechanism is a lot faster.

Code for all of this is very much a work in progress and is available over here on github. It will be changing a lot over the next few weeks

Testing an NRF52833 Module

The (unofficial) Dublin Maker Badge for 2022 will hopefully be based on an NRF52833 module. I got hold of couple and asked a colleague to make a breakout PCB for it to allow me experiment.

The PCB is a little rough but is OK for evaluation purposes. I won’t be using the inner layer of contacts for the module as I can’t solder on to them manually however I should have enough pins in what remains. Luckily the extremely flexible NRF52833 allows for routing of signals to just about any pin. Next step is to the module to the board.

The drill holes for the pin headers have more or less wiped out all of the pads for the pin headers but with some careful soldering I think I got everything wired up. Next step Blinky!

VDD and VDDH are connected to a 3.3V DC supply; the SWD interface is connected to a JLink EDU probe. A simple Zephyr based LED Blinky program was downloaded and tested. The JLink software complained a little about the SWD interface being unstable and it automatically dropped its speed to a lower value. Blinky seemed to work fine; how about a simple BLE example I previously used on the BBC Microbit V2? Well that worked fine too without any changes 🙂

Next step: Interface with a display to see if the SPI interface can be operated at full speed.