Performance improvement for STM32F030/ST7789 graphics library

stm32f030_st7789_nrf24l01

I’ve been working on a new project involving an STM32F030, an ST7789 display and an NRF24L01 radio link. As part of this project I took a good look at the graphics library that I used in the Dublin Maker badge in 2019. It turns out that there was plenty of scope to improve it’s performance. Tweaks included flattening function calls and using the set/reset registers in the STM32F030. Here’s an excerpt from the old library:

void display::fillRectangle(uint16_t x, uint16_t y, uint16_t width, uint16_t height, uint16_t Colour)
{
    openAperture(x, y, x + width - 1, y + height - 1);
    for (y = 0; y < height; y++)
    {
        for (x = 0; x < width; x++)
        {
            writeData16(Colour);
        }
    }
}
void display::RSLow()
{
    GPIOB->ODR &= ~(1 << 1); // drive D/C pin low
}
void display::RSHigh()
{ 
    GPIOB->ODR |= (1 << 1); // drive D/C pin high
}

The new version of these functions looks like this:

void display::fillRectangle(uint16_t x, uint16_t y, uint16_t width, uint16_t height, uint16_t Colour)
{
    
    register uint32_t pixelcount = height * width;
    uint16_t LowerY = height+y;
    if ((LowerY) <= VIRTUAL_SCREEN_HEIGHT) 
    {
        openAperture(x, y, x + width - 1, y + height - 1);
        RSHigh();
        while(pixelcount--)
            transferSPI16(Colour);
    }
    else
    {
        // Drawing a box beyond the extents of the virtual screen.  
        // Need to wrap this around to the start of the screen.
        uint16_t LowerHeight = (VIRTUAL_SCREEN_HEIGHT-y);
        uint16_t UpperHeight = height - LowerHeight;
        openAperture(x, y, x + width - 1, VIRTUAL_SCREEN_HEIGHT-1);
        RSHigh();
        pixelcount = LowerHeight * width;
        while(pixelcount--)
            transferSPI16(Colour);
      
        openAperture(x, 0,x + width - 1, UpperHeight);
        RSHigh();
        pixelcount = UpperHeight * width;
        while(pixelcount--)
                transferSPI16(Colour);
        
    }
}
void display::RSLow()
{ 
// Using Set/Reset register here as this needs to be as fast as possible   
    GPIOB->BSRR = ((1 << 1) << 16); // drive D/C pin low
}
void display::RSHigh()
{ 
// Using Set/Reset register here as this needs to be as fast as possible     
    GPIOB->BSRR = ((1 << 1)); // drive D/C pin high
}

The new version is a good deal bigger for a couple of reasons:
First of all, the fill rectangle function has been extended so that it is usable with display scrolling (a new feature)
Secondly, the call to writeData16 has been eliminated (removing the function call overhead). This means that lower level SPI function calls have to be used. Also, the nested loop for x and y co-ordinates has been changed to a single loop that fires out the pixels as a continuous stream – the display hardware itself looks after the x and y coordinates.

So how much faster is it? To test this I wrote a simple program to fire a full filled rectangle at the display 50 times and measured how long it took.
The results:
The old driver :
50 rectangles (240*240) took 8.6 seconds. This corresponds to a pixel write speed of 334883 pixels per second.
The new driver:
50 rectangles (240*240) took 4.6 seconds or 626086 pixels per second. Nearly twice as fast as the older library. At this speed it takes 92 milliseconds to fill the display. Not stellar by PC standards but good enough for my needs.
Code is available over on github.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s