April 2020 – ioprog

MEMORY { flash : org = 0x00000000, len = 128k ram : org = 0x20000000, len = 16k } SECTIONS { . = ORIGIN(flash); .text : { *(.vectors); /* The interrupt vectors */ *(.text); *(.rodata); } >flash . = ORIGIN(ram); .data : { INIT_DATA_VALUES = LOADADDR(.data); INIT_DATA_START = .; *(.data); INIT_DATA_END = .; . = ALIGN(4); } >ram AT>flash BSS_START = .; .bss : { *(.bss); . = ALIGN(4); } > ram BSS_END = .; .padding : { /* This is a 64 byte block of 0xff's to ensure that the last */ /* page of the program is written to the MCU */ /* The openocd SAMD driver does not seem to flush the last partial */ /* page out properly */ LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff); LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff); LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff); LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff); } >flash }

stm32f030_st7789_nrf24l01

I’ve been working on a new project involving an STM32F030, an ST7789 display and an NRF24L01 radio link. As part of this project I took a good look at the graphics library that I used in the Dublin Maker badge in 2019. It turns out that there was plenty of scope to improve it’s performance. Tweaks included flattening function calls and using the set/reset registers in the STM32F030. Here’s an excerpt from the old library:

void display::fillRectangle(uint16_t x, uint16_t y, uint16_t width, uint16_t height, uint16_t Colour)
{
    openAperture(x, y, x + width - 1, y + height - 1);
    for (y = 0; y < height; y++)
    {
        for (x = 0; x < width; x++)
        {
            writeData16(Colour);
        }
    }
}
void display::RSLow()
{
    GPIOB->ODR &= ~(1 << 1); // drive D/C pin low
}
void display::RSHigh()
{ 
    GPIOB->ODR |= (1 << 1); // drive D/C pin high
}

The new version of these functions looks like this:

void display::fillRectangle(uint16_t x, uint16_t y, uint16_t width, uint16_t height, uint16_t Colour)
{
    
    register uint32_t pixelcount = height * width;
    uint16_t LowerY = height+y;
    if ((LowerY) <= VIRTUAL_SCREEN_HEIGHT) 
    {
        openAperture(x, y, x + width - 1, y + height - 1);
        RSHigh();
        while(pixelcount--)
            transferSPI16(Colour);
    }
    else
    {
        // Drawing a box beyond the extents of the virtual screen.  
        // Need to wrap this around to the start of the screen.
        uint16_t LowerHeight = (VIRTUAL_SCREEN_HEIGHT-y);
        uint16_t UpperHeight = height - LowerHeight;
        openAperture(x, y, x + width - 1, VIRTUAL_SCREEN_HEIGHT-1);
        RSHigh();
        pixelcount = LowerHeight * width;
        while(pixelcount--)
            transferSPI16(Colour);
      
        openAperture(x, 0,x + width - 1, UpperHeight);
        RSHigh();
        pixelcount = UpperHeight * width;
        while(pixelcount--)
                transferSPI16(Colour);
        
    }
}
void display::RSLow()
{ 
// Using Set/Reset register here as this needs to be as fast as possible   
    GPIOB->BSRR = ((1 << 1) << 16); // drive D/C pin low
}
void display::RSHigh()
{ 
// Using Set/Reset register here as this needs to be as fast as possible     
    GPIOB->BSRR = ((1 << 1)); // drive D/C pin high
}

The new version is a good deal bigger for a couple of reasons:
First of all, the fill rectangle function has been extended so that it is usable with display scrolling (a new feature)
Secondly, the call to writeData16 has been eliminated (removing the function call overhead). This means that lower level SPI function calls have to be used. Also, the nested loop for x and y co-ordinates has been changed to a single loop that fires out the pixels as a continuous stream – the display hardware itself looks after the x and y coordinates.

So how much faster is it? To test this I wrote a simple program to fire a full filled rectangle at the display 50 times and measured how long it took.
The results:
The old driver :
50 rectangles (240*240) took 8.6 seconds. This corresponds to a pixel write speed of 334883 pixels per second.
The new driver:
50 rectangles (240*240) took 4.6 seconds or 626086 pixels per second. Nearly twice as fast as the older library. At this speed it takes 92 milliseconds to fill the display. Not stellar by PC standards but good enough for my needs.
Code is available over on github.

ioprog

Programming for input/output

Month: April 2020

Getting around an openocd bug

Performance improvement for STM32F030/ST7789 graphics library