Getting around an openocd bug

I’ve been back working with the SAMD20 microcontroller family again for an RS485 network project. While writing code for the device I noticed that it would crash quite often. Further investigations revealed that sections of the code were not being written to flash memory. This device has a 64 byte flash page size and it will only accept writes of 64 byte blocks. If you try to write a smaller amount the write is ignored. This causes a problem when you want to write a flash image that is not an integer multiple of 64 bytes in size – the last few bytes will not be written out. The version of openocd I’m using came from here https://sourceforge.net/p/openocd/code/ci/master/tree/ . I reported the issue but while I’m waiting for a response I’ve managed to workaround the problem by adding some padding to the flash image in the linker file as shown here.

MEMORY
{
    flash : org = 0x00000000, len = 128k
    ram : org = 0x20000000, len = 16k
}
  
SECTIONS
{
        
	. = ORIGIN(flash);
        .text : {
		  *(.vectors); /* The interrupt vectors */
		  *(.text);		  
		  *(.rodata);		  

        } >flash
	. = ORIGIN(ram);
        .data : {
	  INIT_DATA_VALUES = LOADADDR(.data);
	  INIT_DATA_START = .;
	    *(.data);
	  INIT_DATA_END = .;
	  . = ALIGN(4);
        } >ram AT>flash
     
	BSS_START = .;
	.bss : {	  
	    *(.bss);
	    . = ALIGN(4);
	} > ram
	BSS_END = .;
	
	.padding : {
		  /* This is a 64 byte block of 0xff's to ensure that the last */
		  /* page of the program is written to the MCU */
		  /* The openocd SAMD driver does not seem to flush the last partial */
		  /* page out properly */

          LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);
		  LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);
		  LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);
		  LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);LONG(0xffffffff);
	} >flash
}


This linker script is for the SAMD20E17 MCU. The value chose for the padding data is important as the erased state of flash is logic ‘1’. By using 0xffffffff as a padding value (which may or may not be written to the flash) read-back verification will report a success.

Performance improvement for STM32F030/ST7789 graphics library

stm32f030_st7789_nrf24l01

I’ve been working on a new project involving an STM32F030, an ST7789 display and an NRF24L01 radio link. As part of this project I took a good look at the graphics library that I used in the Dublin Maker badge in 2019. It turns out that there was plenty of scope to improve it’s performance. Tweaks included flattening function calls and using the set/reset registers in the STM32F030. Here’s an excerpt from the old library:

void display::fillRectangle(uint16_t x, uint16_t y, uint16_t width, uint16_t height, uint16_t Colour)
{
    openAperture(x, y, x + width - 1, y + height - 1);
    for (y = 0; y < height; y++)
    {
        for (x = 0; x < width; x++)
        {
            writeData16(Colour);
        }
    }
}
void display::RSLow()
{
    GPIOB->ODR &= ~(1 << 1); // drive D/C pin low
}
void display::RSHigh()
{ 
    GPIOB->ODR |= (1 << 1); // drive D/C pin high
}

The new version of these functions looks like this:

void display::fillRectangle(uint16_t x, uint16_t y, uint16_t width, uint16_t height, uint16_t Colour)
{
    
    register uint32_t pixelcount = height * width;
    uint16_t LowerY = height+y;
    if ((LowerY) <= VIRTUAL_SCREEN_HEIGHT) 
    {
        openAperture(x, y, x + width - 1, y + height - 1);
        RSHigh();
        while(pixelcount--)
            transferSPI16(Colour);
    }
    else
    {
        // Drawing a box beyond the extents of the virtual screen.  
        // Need to wrap this around to the start of the screen.
        uint16_t LowerHeight = (VIRTUAL_SCREEN_HEIGHT-y);
        uint16_t UpperHeight = height - LowerHeight;
        openAperture(x, y, x + width - 1, VIRTUAL_SCREEN_HEIGHT-1);
        RSHigh();
        pixelcount = LowerHeight * width;
        while(pixelcount--)
            transferSPI16(Colour);
      
        openAperture(x, 0,x + width - 1, UpperHeight);
        RSHigh();
        pixelcount = UpperHeight * width;
        while(pixelcount--)
                transferSPI16(Colour);
        
    }
}
void display::RSLow()
{ 
// Using Set/Reset register here as this needs to be as fast as possible   
    GPIOB->BSRR = ((1 << 1) << 16); // drive D/C pin low
}
void display::RSHigh()
{ 
// Using Set/Reset register here as this needs to be as fast as possible     
    GPIOB->BSRR = ((1 << 1)); // drive D/C pin high
}

The new version is a good deal bigger for a couple of reasons:
First of all, the fill rectangle function has been extended so that it is usable with display scrolling (a new feature)
Secondly, the call to writeData16 has been eliminated (removing the function call overhead). This means that lower level SPI function calls have to be used. Also, the nested loop for x and y co-ordinates has been changed to a single loop that fires out the pixels as a continuous stream – the display hardware itself looks after the x and y coordinates.

So how much faster is it? To test this I wrote a simple program to fire a full filled rectangle at the display 50 times and measured how long it took.
The results:
The old driver :
50 rectangles (240*240) took 8.6 seconds. This corresponds to a pixel write speed of 334883 pixels per second.
The new driver:
50 rectangles (240*240) took 4.6 seconds or 626086 pixels per second. Nearly twice as fast as the older library. At this speed it takes 92 milliseconds to fill the display. Not stellar by PC standards but good enough for my needs.
Code is available over on github.

Various examples for the STM32G431

The STM32G431 was recently introduced by ST-Microelectronics. It contains a Cortex M4 core running at 170MHz along with ADC’s, DAC’s timers and some interesting DSP acceleration hardware. I’ve just got started on this chip and have uploaded a number of examples to github.
The version of openocd that came with my Ubuntu installation did not support this chip so I had to download a more up to date version from here.
Example code so far ranges from Blinky up to stereo analogue pass-through. I plan to work work on some FIR and IIR examples soon. stm32g431_bbreadboard

Micropython on a cheap NRF51822 module

IMG_20200202_153744
It is possible to buy small NRF51822 modules like the black “daughter-board” in the picture above for about €2. The motherboard it connects to breaks all the pins out to various headers, provides an JTAG/SWD port and a USB/Serial interface. These motherboards cost around $6 on Aliexpress.
I was curious about micropython on this platform so I took the following steps:

First, I downloaded the Nordic Semiconductors command line tools from here
They were then installed as follows:

sudo dpkg -i nRF-Command-Line-Tools_10_6_0_Linux-amd64.deb

(I had to fiddle with my PATH environment variable to make the programs available to the script used later).

Next, the motherboard and JTAG interfaces were plugged in to a USB port.

Micropython was compiled and downloaded as follows (Note, I had an arm cross compiler already on my system.):

git clone https://github.com/micropython/micropython.git micropython
cd micropython
make -C mpy-cross
cd ports/nrf
./drivers/bluetooth/download_ble_stack.sh
make submodules
make BOARD=wt51822_s4at SD=110 sd

The last line targets a particular board that is compatible with the little cheap Aliexpress one, it also includes the Bluetooth soft-device and flashes the board.

All that seemed to work ok and I tried to connect to the board over a serial terminal: No success. After a bit of searching I discovered file called mpconfigboard.h in the micropython/ports/nrf/boards/wt51822_s4at directory. This needs to be edited as follows so that TX and RX for the micropython shell are routed properly.

/*
 * This file is part of the MicroPython project, http://micropython.org/
 *
 * The MIT License (MIT)
 *
 * Copyright (c) 2017 Ayke van Laethem
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */

// Datasheet for board:
// https://4tronix.co.uk/picobot2/WT51822-S4AT.pdf
#define MICROPY_HW_BOARD_NAME       "WT51822-S4AT"
#define MICROPY_HW_MCU_NAME         "NRF51822"
#define MICROPY_PY_SYS_PLATFORM     "nrf51"

#define MICROPY_PY_MACHINE_UART     (1)
#define MICROPY_PY_MACHINE_HW_SPI   (1)
#define MICROPY_PY_MACHINE_TIMER    (1)
#define MICROPY_PY_MACHINE_RTCOUNTER (1)
#define MICROPY_PY_MACHINE_I2C      (1)
#define MICROPY_PY_MACHINE_ADC      (1)
#define MICROPY_PY_MACHINE_TEMP     (1)
#define MICROPY_PY_RANDOM_HW_RNG    (1)

#define MICROPY_HW_HAS_LED          (0)

// UART config
#define MICROPY_HW_UART1_RX         (11)
#define MICROPY_HW_UART1_TX         (9)
#define MICROPY_HW_UART1_HWFC       (0)

// SPI0 config
#define MICROPY_HW_SPI0_NAME        "SPI0"
#define MICROPY_HW_SPI0_SCK         (1)
#define MICROPY_HW_SPI0_MOSI        (2)
#define MICROPY_HW_SPI0_MISO        (3)

Note the changes to MICROPY_HW_UART1_RX and MICROPY_HW_UART1_TX.

I repeated the last “make” command and hey presto : I was able to type python into a serial terminal!

nrf51822_python

There are a couple of little problems:
1) You can’t paste large amounts of python into the terminal as it overflows the serial buffer in the device
2) The range of pins available is restricted as this board is not quite the same as the wt51822_s4at

Anyway, it’s a start

A simple oscilloscope using the Longan-Nano

Longan-nano-scope
The Longan-Nano comes with a nice little display that opens up lots of possibilities. One of these is an oscilloscope. The GD32VF103 on the board has a pair of fast ADC’c that supposedly work up to 1MHz. The approach I took didn’t quite achieve this so I had to settle for 500kHz instead – good enough for a demo.
The code makes use of Timer 2 to pace (trigger) ADC conversions on two channels (one on each ADC) which are labelled A0 and A3 on the ‘nano.
The picture above shows the display when two inputs are used: A steady DC value from a potentiometer and a sine-wave generated by a blue-pill board doing some fast PWM (See this post).
The picture of the display is showing “ghosting” because of the slow shutter speed of the camera.
Two buttons are fitted to the board to allow the user increase and decrease the sample rate which stretches and compresses the waveform on the screen.
This is not meant to be an actually useful instrument – it’s just a learning exercise for this device.
Code is over on github.

Fast PWM on the STM32F103 Bluepill

I needed a signal source for another project I’m working on so I set about building a signal generator of sorts using a “Bluepill” board. The STM32F103 on these boards does not have a Digital to Analogue Converter but it does have a very capable timer which can be used to generate waveforms.
STM32F103_TIM1_DMA_PWM

The timer (TIM1) is shown in the diagram above. In the mode that it is configured for this project it works as follows:
The CNT register counts up to the value in ARR
When CNT = ARR the upper digital comparator outputs a pulse which is used to reset CNT to 0 and to set the PWM output signal high (via the S-R Flip-Flop).
When CNT is equal to the value in CCR1, the lower digital comparator outputs a pulse which resets the PWM output low. ARR therefore controls the output frequency while CCR1 controls the output duty.
(CNT = count, ARR = auto-reload register, CCR1 = Counter-Compare Register 1)

I’m interested in producing audio range signals so up to about 20kHz. An trade-off between switching frequency, resolution, and output filtering arises here. First of all, I want to use a simple RC filter on the output to remove the high frequency switching. This means that the switching frequency has to be a lot higher than the target output signal i.e. much greater than 20kHz. But how much higher? The clock in the STM32F103 runs at 72MHz so this places an upper limit on things. Lets say we go for a switching frequency of 12MHz. This means that the counter will count from zero to five and then back up to zero again (6 states in total). This means that our PWM waveform generation system is only able to produce 6 different outputs (between 2 and 3 bits of resolution) which is pretty poor. I wanted a lot more resolution than that so after playing with the numbers I arrived at the following. Let ARR = 255 (8 bits of resolution). This gives a switching frequency of 280500Hz which is certainly not audible and is pretty easy to filter with passive components.
In order to produce a sine-wave output the CCR1 value needs to be varied sinusoidally. This could be done by generating an interrupt for each PWM cycle and loading values from a look-up table into CCR1. This places a reasonably high load on the CPU however given the high switching frequency. Another option is to use DMA which can transfer values from the lookup-table directly into CCR1. After all the values of the table have been used up, the DMA controller can then generate an interrupt allowing the program to perhaps modify the output signal if necessary. The interrupt rate is a lot lower with this approach and the CPU is free to do other tasks.
stm32f103_280kHz_PWM_logic_out
The figure above shows a logic analyzer view of the outputs. The upper signal is a fixed duty signal from CCR2 (another PWM output channel). The middle signal is the sinusoidal PWM output. The lower signal is a digital output that is toggled at the end of each DMA transfer (every millisecond).

After filtering the output looks like this:
stm32f103_280kHz_PWM

The upper trace is the PWM output after it has been passed through an RC filter with a cutoff of 20kHz. As expected the switching frequency (which is most pronounced around zero-crossings of the sinewave) is attenuated to about 1 tenth of its original size. The lower trace shows the unfiltered output.

The next step is to produce a variable frequency and voltage output – perhaps even two channels. Code is available over on github
Incidentally, the particular board used here is one with 128kB of flash (earlier ones I used had only 64kB).

Measuring GD32VF103 Interrupt response time

Following on from the previous article that talked about the ECLIC in the GD32VF103 RISC-V microcontroller I decided to measure its interrupt response. The example works like this: Configure a port pin to generate an interrupt when it goes through a high-low transition. The interrupt handler then drives the pin back high again.
The main body of this example consists of a loop which drives the pin low thus triggering an interrupt. The interrupt handler sends it high again. This is done in a loop which incorporates a small delay to facilitate measurement. The GPIO pin in question is Port C bit 13 which happens to control the red LED on the Sipeed Longan Nano. The results are shown in the figure below.
GD32VF103IRQResponseTime

As can be seen, it takes 416.667ns to drive the pin high again. This corresponds to 45 clock cycles at 108Mhz which is the time it takes to save the CPU registers, determine the cause of the interrupt, jump to the handler and perform a standard function entry prologue (set up stack etc). Not too shabby 🙂

Code is available over on Github

Interrupts and the GD32VF103

Interrupt handling is a core part of embedded systems and different architectures have different ways of dealing with it. The RISC-V Bumblebee core in the GD32VF103 uses an interrupt controller called the Enhanced Core-Local Interrupt Controller (ECLIC). All interrupts (internal and external) are handled by the ECLIC. The ECLIC handles prioritization and level/edge triggering of interrupts.The documentation on the ECLIC is pretty poor at the moment. The Bumblebee core datasheet is formatted very badly (fonts all over the place, broken diagrams etc) and it was only by combining this with the assembler provided in the GD32VF103 that I began to get a clearer picture of what is going on.

The ECLIC has two modes of dealing with interrupts : vectored and non-vectored. Vectored interrupt handling is similar to other MCU’s such as the ARM, MSP430 etc. The CPU receives interrupt request ‘N’ and finds the interrupt handler by looking up entry ‘N’ in the interrupt vector table. Saving of registers (context saving) is left to the writer of the interrupt handler and on exit the handler often must execute a return from interrupt instruction (though not for all architectures).
Non-Vectored interrupt handling is quite different. All interrupt requests cause the same interrupt handler code to be executed initially. This code saves the CPU registers, finds out which interrupt happened and calls the handler pointed to by the appropriate interrupt vector table entry. This happens using a standard subroutine/function call instruction. The handler deals with the interrupt and executes a normal function return instruction. The registers are then restored and a return from interrupt instruction is executed.

The advantage of this approach is that the code to preserve/restore the registers is done (safely) and only needs to occur once in memory. Also, the interrupt handlers can be written in C or C++ without the need to tag on unusual attributes like “interrupt” etc. The potential disadvantage is that maybe the handling of interrupts is a little slower than it might have been using the vectored approach because you end up saving and restoring all CPU registers – whether you use them or not.

Excerpt from ~/.platformio/packages/framework-gd32vf103-sdk/RISCV/env_Eclipse/entry.S

.weak irq_entry
irq_entry: // -------------> This label will be set to MTVT2 register
  // Allocate the stack space
  

  SAVE_CONTEXT// Save 16 regs

  //------This special CSR read operation, which is actually use mcause as operand to directly store it to memory
  csrrwi  x0, CSR_PUSHMCAUSE, 17
  //------This special CSR read operation, which is actually use mepc as operand to directly store it to memory
  csrrwi  x0, CSR_PUSHMEPC, 18
  //------This special CSR read operation, which is actually use Msubm as operand to directly store it to memory
  csrrwi  x0, CSR_PUSHMSUBM, 19
 
// ****************** THIS IS WHERE THE JUMP TO THE INTERRUPT HANDLER FUNCTION HAPPENS! **************

service_loop:
  //------This special CSR read/write operation, which is actually Claim the CLIC to find its pending highest
  // ID, if the ID is not 0, then automatically enable the mstatus.MIE, and jump to its vector-entry-label, and
  // update the link register 
  csrrw ra, CSR_JALMNXTI, ra 
  
  //RESTORE_CONTEXT_EXCPT_X5

  #---- Critical section with interrupts disabled -----------------------
  DISABLE_MIE # Disable interrupts 

  LOAD x5,  19*REGBYTES(sp)
  csrw CSR_MSUBM, x5  
  LOAD x5,  18*REGBYTES(sp)
  csrw CSR_MEPC, x5  
  LOAD x5,  17*REGBYTES(sp)
  csrw CSR_MCAUSE, x5  


  RESTORE_CONTEXT

  
  // Return to regular code
  mret

The comments just before the instruction csrrw ra, CSR_JALMNXTI, ra are a little obscure but their sense is clear enough: If there is a pending interrupt the ECLIC will execute a call to its interrupt handler and will update the Link Register so that when the handler executes a return from subroutine instruction control will return to the next line in the irq_entry.
The macros SAVE_CONTEXT, RESTORE_CONTEXT are defined elsewhere in the startup source code. The constant CSR_JALMNXTI evalautes to 0x7ed – a register number in the ECLIC. According to the Bumblebee core data sheet this register is
“The custom register is used to enable the ECLIC interrupt. The read operation of this register can process the next interrupt and return the entry address of the next interrupt handler. Jump to this address.”.

This is the mechanism by which the developer supplied interrupt handler is called.
How does the irq_entry code get called in the first place? Looking at the code for .platformio/packages/framework-gd32vf103-sdk/RISCV/env_Eclipse/start.S we find the following code that executes during boot:

_start:

        csrc CSR_MSTATUS, MSTATUS_MIE
        /* Jump to logical address first to ensure correct operation of RAM region  */
    la          a0,     _start
    li          a1,     1
        slli    a1,     a1, 29
    bleu        a1, a0, _start0800
    srli        a1,     a1, 2
    bleu        a1, a0, _start0800
    la          a0,     _start0800
    add         a0, a0, a1
        jr      a0

_start0800:

    /* Set the the NMI base to share with mtvec by setting CSR_MMISC_CTL */
    li t0, 0x200
    csrs CSR_MMISC_CTL, t0

        /* Intial the mtvt*/
    la t0, vector_base
    csrw CSR_MTVT, t0

        /* Intial the mtvt2 and enable it*/
    la t0, irq_entry
    csrw CSR_MTVT2, t0
    csrs CSR_MTVT2, 0x1

Note the instructions la t0, irq_entry , csrw CSR_MTVT2, t0 . These tell the ECLIC where to find the irq_entry code. The Bumblebee core reference manual defines this register CSR_MTV2 (0x7ec) as
“Custom registers are used to set non-vector interrupt handling Mode interrupt entry address” .

So, there we have it. The picture below shows my rough understanding of the process.
ECLIC
I have written up two more demo programs: One which uses the internal clock cycle timer interrupt (mtimecmp) (Systick interrupt), the other uses Timer 6 as an external source of timer interrupts (TimerIRQ). Code is available over on Github

Longan-nano RISC-V and PlatformIO

longan_patchwork

The Longan-nano board shown above includes a microcontroller with a RISC-V core. The chip seems to be very similar to the STM32F103C8T6 with the exception that the ARM-Cortex M3 core has been swapped out for a RISC-V Bumblebee core called the GD32VF103CBT6. This little development kit has an Arduino nano form factor and also sports a 160×80 full colour display, an RGB LED and a micro SD-card socket. This particular one came from Seeed Studio at a cost of $4.90 + shipping.

There are two ways of programming this chip: You can use a JTAG debugger (various kinds supported) or you can put the chip into DFU (Device Firmware Update) mode by holding the Reset and Boot button down, releasing the Reset button first. This allows you to program the board using a simple USB-Serial converter. I’ve gone with this initially but will definitely be exploring the details of the RISC-V core with a proper debugger shortly.

Developing code
The documentation for this chip pushes you towards using PlatformIO for development. This is an extension for Visual Studio Code – another first for me. You first install VSCode and then add in the PlatformIO extension (I just followed the online guide and it all just worked 🙂 )

There is good support for the GD32VF103CBT6 within PlatformIO and so there is no problem with setting up environment variables, directories and so on. I chose to develop in C++ which caused me a slight problem with the gd32vf103.h header file. This is intended for use with C projects and includes a definition the bool type. This causes a problem for C++ as it already has such a type. If you edit gd32vf103.h as follows you can fix this (around line 180 look for the enum declaration called bool)

#ifndef __cplusplus
typedef enum {FALSE = 0, TRUE = !FALSE} bool;
#endif

One more thing: when you include this file in your C++ files be sure to surround the include statement with “extern C” as shown below:

extern "C" {
#include "gd32vf103.h"
}

If you don’t do this you run into all sorts of problems with C++ decorated names.

And along came a gotcha
I have to admit that I have been a little lazy when it comes to function return values over the years. Let’s say you write an I/O function that may, in some future design, return an error code – as the project is only starting however you have not written that code yet.
So, your code might look like this:

int test()
{
    int X;
    X = 1;
    /* etc */
}

Note the missing return statement – after all, you haven’t written that code yet. Well, this runs fine on ARM Cortex MCU’s I’ve used, and, also on x86. On RISC-V/GCC9.2 this crashes your program. I spent a while scratching my head over this one. The C++ standard apparently states that behavior in this example is “undefined”. So, be warned: If you state that you are going to return a value then do!

Demo code
I’ve started a repository on github with some examples (multi-colour blinky and a graphics demo for now). You can view it here

NRF905 Revisited

I have worked with the NRF905 radio transceiver in the past as part of a weather station. At the time I wasn’t entirely happy about the code for the radio as it felt a little hacky. Like many others before me I vowed “This time I’m going to do it right” 🙂 so I have set about rewriting the radio part in a more object oriented way. I have put together a transceiver pair as shown below and I intend working on the code to simply the NRF905 application interface as much as I can. Right now there is little abstraction going on – that needs to change.
The transceiver pair consists of a couple of NRF905’s wired to some STM32F030’s mounted on breakout boards. These are wire-wrapped together and a full schematic will eventually be posted along with the code over on github


Update 19-Nov-2019
I have updated the code on github with a tighter interface to the nrf905 class. I also did an initial range test and was getting error free line of sight range of about 200m @ 10dBm. The transmitter was on top of a car and the receiver was at about 1m above the ground. Transmitter and receiver had 1/4 wave monopole antennae. A longer range is possible if you are not concerned about the odd missing packet.