Getting around an openocd bug

I’ve been back working with the SAMD20 microcontroller family again for an RS485 network project. While writing code for the device I noticed that it crashed quite often. Further investigation revealed that sections of the code were not being written to flash memory. This device has a 64 byte flash page size and it will only accept writes of whole 64 byte blocks; if you try to write a smaller amount, the write is ignored. This causes a problem when you want to write a flash image whose size is not an integer multiple of 64 bytes: the last few bytes will not be written out.

The version of openocd I’m using came from here https://sourceforge.net/p/openocd/code/ci/master/tree/ . I reported the issue, but while I’m waiting for a response I’ve managed to work around the problem by adding some padding to the flash image in the linker file, as shown here.

MEMORY
{
    flash : org = 0x00000000, len = 128k
    ram : org = 0x20000000, len = 16k
}
  
SECTIONS
{
        
	. = ORIGIN(flash);
        .text : {
		  *(.vectors); /* The interrupt vectors */
		  *(.text);		  
		  *(.rodata);		  

        } >flash
	. = ORIGIN(ram);
        .data : {
	  INIT_DATA_VALUES = LOADADDR(.data);
	  INIT_DATA_START = .;
	    *(.data);
	  INIT_DATA_END = .;
	  . = ALIGN(4);
        } >ram AT>flash
     
	BSS_START = .;
	.bss : {	  
	    *(.bss);
	    . = ALIGN(4);
	} > ram
	BSS_END = .;
	
	.padding : {
		  /* This is a 64 byte block of 0xff's to ensure that the last */
		  /* page of the program is written to the MCU */
		  /* The openocd SAMD driver does not seem to flush the last partial */
		  /* page out properly */

          LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);
		  LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);
		  LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);
		  LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);
	} >flash
}


This linker script is for the SAMD20E17 MCU. The value chosen for the padding data is important, as the erased state of flash is logic ‘1’. The padding itself may or may not be written to the flash, but because erased flash already reads back as 0xffffffff, the contents match either way and read-back verification will report success.
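The arithmetic behind the padding can be sketched on the host side. This is only an illustration of rounding an image up to a whole number of 64 byte pages with 0xff filler; the names roundUpToPage and padImage are hypothetical and are not part of openocd or of the linker script above, which instead appends a fixed 64 byte block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flash page size of the SAMD20, as described in the text.
constexpr std::size_t kPageSize = 64;

// Round a byte count up to the next multiple of the page size.
constexpr std::size_t roundUpToPage(std::size_t size, std::size_t page = kPageSize)
{
    return ((size + page - 1) / page) * page;
}

// Return a copy of the image padded with 0xff (the erased state of flash)
// so that its length is a whole number of pages.
std::vector<std::uint8_t> padImage(std::vector<std::uint8_t> image)
{
    image.resize(roundUpToPage(image.size()), 0xff);
    return image;
}
```

A 100 byte image, for example, would be padded to 128 bytes, while an image that is already 64 bytes long is left unchanged.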

Performance improvement for STM32F030/ST7789 graphics library


I’ve been working on a new project involving an STM32F030, an ST7789 display and an NRF24L01 radio link. As part of this project I took a good look at the graphics library that I used in the Dublin Maker badge in 2019. It turns out that there was plenty of scope to improve its performance. Tweaks included flattening function calls and using the set/reset registers in the STM32F030. Here’s an excerpt from the old library:

void display::fillRectangle(uint16_t x, uint16_t y, uint16_t width, uint16_t height, uint16_t Colour)
{
    openAperture(x, y, x + width - 1, y + height - 1);
    for (y = 0; y < height; y++)
    {
        for (x = 0; x < width; x++)
        {
            writeData16(Colour);
        }
    }
}
void display::RSLow()
{
    GPIOB->ODR &= ~(1 << 1); // drive D/C pin low
}
void display::RSHigh()
{ 
    GPIOB->ODR |= (1 << 1); // drive D/C pin high
}

The new version of these functions looks like this:

void display::fillRectangle(uint16_t x, uint16_t y, uint16_t width, uint16_t height, uint16_t Colour)
{
    
    register uint32_t pixelcount = height * width;
    uint16_t LowerY = height+y;
    if ((LowerY) <= VIRTUAL_SCREEN_HEIGHT) 
    {
        openAperture(x, y, x + width - 1, y + height - 1);
        RSHigh();
        while(pixelcount--)
            transferSPI16(Colour);
    }
    else
    {
        // Drawing a box beyond the extents of the virtual screen.  
        // Need to wrap this around to the start of the screen.
        uint16_t LowerHeight = (VIRTUAL_SCREEN_HEIGHT-y);
        uint16_t UpperHeight = height - LowerHeight;
        openAperture(x, y, x + width - 1, VIRTUAL_SCREEN_HEIGHT-1);
        RSHigh();
        pixelcount = LowerHeight * width;
        while(pixelcount--)
            transferSPI16(Colour);
      
        openAperture(x, 0, x + width - 1, UpperHeight - 1); // end row is inclusive, as in the calls above
        RSHigh();
        pixelcount = UpperHeight * width;
        while(pixelcount--)
            transferSPI16(Colour);
        
    }
}
void display::RSLow()
{ 
// Using Set/Reset register here as this needs to be as fast as possible   
    GPIOB->BSRR = ((1 << 1) << 16); // drive D/C pin low
}
void display::RSHigh()
{ 
// Using Set/Reset register here as this needs to be as fast as possible     
    GPIOB->BSRR = ((1 << 1)); // drive D/C pin high
}
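
The old RSLow/RSHigh read ODR, change one bit and write the result back, whereas the new ones do a single write to BSRR that only affects the pins named in the written value. The following is a host-side model of that register behaviour as I understand it from the STM32 reference manual, not code that runs on the MCU; applyBSRR is a hypothetical name used only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Model of what a write to BSRR does to the 16 bit output data register:
// bits 0-15 of the written value set the matching output bits,
// bits 16-31 reset them, and a set request wins if both are given
// for the same pin. Bits that are zero in the written value leave
// the corresponding pins untouched.
constexpr std::uint16_t applyBSRR(std::uint16_t odr, std::uint32_t bsrr)
{
    return static_cast<std::uint16_t>((odr & ~(bsrr >> 16)) | (bsrr & 0xffffu));
}

// The D/C pin is bit 1 of port B in the library code above.
constexpr std::uint32_t kDcSet   = (1u << 1);
constexpr std::uint32_t kDcReset = (1u << 1) << 16;
```

Writing kDcReset to a port whose outputs are 0x00ff leaves 0x00fd, and writing kDcSet to 0x0000 gives 0x0002; no other pins change and no read of the port is needed.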

The new version is a good deal bigger for a couple of reasons:
First of all, the fill rectangle function has been extended so that it is usable with display scrolling (a new feature).
Secondly, the call to writeData16 has been eliminated (removing the function call overhead). This means that lower level SPI function calls have to be used. Also, the nested loop over x and y co-ordinates has been replaced by a single loop that fires out the pixels as a continuous stream; the display hardware itself looks after the x and y co-ordinates.
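
The wrap-around split used for scrolling can be looked at in isolation. This is a minimal sketch, assuming a virtual screen height of 320 rows (the real value of VIRTUAL_SCREEN_HEIGHT is defined in the library and may differ) and assuming y is already inside the virtual screen; FillSplit and splitFill are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed value for illustration only.
constexpr std::uint16_t kVirtualScreenHeight = 320;

struct FillSplit {
    std::uint16_t lowerHeight; // rows drawn from y down to the bottom of the virtual screen
    std::uint16_t upperHeight; // rows wrapped around to the top (0 if nothing wraps)
};

// Split a rectangle of the given height starting at row y into the part
// that fits below y and the part that wraps to row 0.
// Assumes y < kVirtualScreenHeight.
inline FillSplit splitFill(std::uint16_t y, std::uint16_t height)
{
    FillSplit split{height, 0};
    if (y + height > kVirtualScreenHeight) {
        split.lowerHeight = static_cast<std::uint16_t>(kVirtualScreenHeight - y);
        split.upperHeight = static_cast<std::uint16_t>(height - split.lowerHeight);
    }
    return split;
}
```

With these assumptions a 40 row rectangle starting at row 300 becomes 20 rows at the bottom and 20 rows at the top, and the two pixel counts always add up to height * width.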

So how much faster is it? To test this I wrote a simple program that fires a full-screen filled rectangle at the display 50 times and measured how long it took.
The results:
The old driver:
50 rectangles (240*240) took 8.6 seconds. This corresponds to a pixel write speed of 334883 pixels per second.
The new driver:
50 rectangles (240*240) took 4.6 seconds, or 626086 pixels per second. That is nearly twice as fast as the older library (about 1.9 times). At this speed it takes 92 milliseconds to fill the display. Not stellar by PC standards, but good enough for my needs.
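
The figures above follow from simple arithmetic on the measured times. Here is a small sketch of the calculation, with the times expressed in milliseconds; the function names are hypothetical and the results are truncated to whole numbers.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kPixelsPerFrame = 240u * 240u;                // 57600 pixels
constexpr std::uint32_t kFrames = 50;                                 // rectangles drawn in the test
constexpr std::uint32_t kTotalPixels = kPixelsPerFrame * kFrames;     // 2880000 pixels

// Pixels per second for a test run that took elapsedMs milliseconds.
constexpr std::uint32_t pixelsPerSecond(std::uint32_t elapsedMs)
{
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(kTotalPixels) * 1000u) / elapsedMs);
}

// Time to fill the display once, in milliseconds.
constexpr std::uint32_t msPerFrame(std::uint32_t elapsedMs)
{
    return elapsedMs / kFrames;
}
```

pixelsPerSecond(8600) and pixelsPerSecond(4600) give the two pixel rates quoted above, and msPerFrame(4600) gives the 92 millisecond fill time. Since each pixel is 16 bits, the new rate works out to roughly 10 Mbit/s of pixel data.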
Code is available over on GitHub.