Using the FMAC in the STM32G431

The STM32G431 has a Filter Math Accelerator (FMAC) hardware unit inside of it. This unit can be used to implement an FIR or IIR filter without burdening the CPU. The FMAC unit has input and output circular buffers as well as a coefficient buffer. It is possible to connect the input buffer to an ADC using DMA, and similarly it is possible to connect the output buffer directly to a DAC over DMA. In this project I used an ADC interrupt handler to manage data input to and output from the FMAC.

There are lots of tools to help you design a digital filter. I chose to use Python and a Jupyter notebook in this case. The notebook code is as follows (it is also on the GitHub site linked below):

import numpy as np
import scipy as sp
import scipy.signal as sg
import matplotlib.pyplot as plt
Fs=48000
Fpass=1000
Order=16
Wp=Fpass/(Fs/2)
b=sg.firwin(Order+1,Wp,window = "hamming",pass_zero = True)
w,h=sg.freqz(b)
mag=20*np.log10(abs(h))
plt.figure()
plt.semilogx(w*(Fs/(2*np.pi)), mag)
plt.show()
bmax=np.max(np.abs(b))
# Working out the scale factor can be a bit tricky.  There is a 
# 24 bit accumulator in the FMAC.  The ADC has a 12bit range.
# This leaves 12 bits for coefficients if overflows are to be prevented.
# Furthermore, the multiply and accumulate nature of the FIR will push 
# results beyond 24 bits if we are not careful.  This is more pronounced with
# lower cut-off frequencies where there is a large central lobe to the filter 
# coefficients which may lead to overflows, particularly at low input 
# frequencies.  For now I'm just doing this by trial and error
ScaleFactor=4095/(bmax)
f = open('coffs.h', 'w')
f.write("#include <stdint.h>\n")
f.write("#define SCALE_FACTOR ")
f.write(str(int(np.round(ScaleFactor))))
f.write("\n")
f.write("#define FILTER_LENGTH ")
f.write(str(Order))
f.write("\n")
f.write("const int16_t b[]={")
for coeff in b:
    f.write(str(int(np.round(coeff*ScaleFactor))))
    f.write(",\n")
f.write("};\n")

f.close();
plt.figure();
plt.plot(b);

This code outputs a header file that includes the filter coefficients for a low pass FIR filter with a cutoff frequency of 1000Hz. The output at 2kHz is shown below

And at 4kHz this becomes:

It would appear that the filter is indeed working; however, there are a number of caveats. The FMAC uses fixed point arithmetic, so coefficients and input signals must be shifted and scaled appropriately. The FMAC has a limited numeric range (24 bits of fractional data internally, 15 bits input and output) and overflows will happen if the scaling is too generous. This is a particular problem at low frequencies with filters whose coefficients are mostly or all positive, because the products then add up rather than cancel. I had to do some manual tweaking of the coefficients to get the output performance I wanted. When testing for such overflows it is useful to input a DC signal of maximum voltage and check that the output does not overflow.
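The DC test described above can also be estimated on paper before flashing anything. The following is a minimal sketch, not the method used in this project: it multiplies the largest input value by the sum of the integer coefficients (the DC case) and by the sum of their absolute values (the worst possible input pattern), then checks whether the totals fit in a signed accumulator. The function names, the example coefficients and the 24-bit accumulator width are assumptions for illustration; the real width and any input shifting should be checked against the reference manual and the firmware.

```python
def accumulator_extremes(coeffs, max_input):
    """Return (dc_total, worst_total) for an integer FIR filter.

    dc_total    : accumulator value when every input sample equals max_input
    worst_total : largest magnitude any input bounded by max_input can produce
    """
    dc_total = max_input * sum(coeffs)
    worst_total = max_input * sum(abs(c) for c in coeffs)
    return dc_total, worst_total


def fits_signed(value, bits):
    """True if value is representable as a signed two's complement number."""
    limit = 1 << (bits - 1)
    return -limit <= value < limit


# Illustrative values only: these are NOT the coefficients written to coffs.h.
example_coeffs = [-100, 400, 1000, 400, -100]
ADC_MAX = 4095   # full-scale 12-bit ADC reading
ACC_BITS = 24    # assumed accumulator width, taken from the text above

dc_total, worst_total = accumulator_extremes(example_coeffs, ADC_MAX)
print("DC total:", dc_total, "fits:", fits_signed(dc_total, ACC_BITS))
print("Worst case:", worst_total, "fits:", fits_signed(worst_total, ACC_BITS))
```

If the worst-case total does not fit, reducing ScaleFactor in the notebook until it does is one conservative starting point; the DC total alone is a weaker check but matches the bench test with a maximum-voltage DC input.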

As usual, code is available over on GitHub.

Waiting for /CS

I have been working on an interface between an STM32G431 and a W25Q128FV SPI flash memory chip (128 Mbit/16MByte). The image above shows the memory chip on a breakout board with a surface mount capacitor next to it. Given the nature of breadboards and the wires used, I’ve been running the memory interface at a reduced speed (1.3MHz). A PCB will be used at a later stage, which should allow for higher speeds.

Erasing the chip was presenting some problems. The code for erase was as follows:

void serial_flash::bulk_erase()
{
	write_enable();
	SPI->startTransaction();
	SPI->transfer((uint8_t)0xc7);	
	SPI->stopTransaction();
	while(read_status1() & 1); // wait until erase has completed
}

The chip must be put into “write” mode before the erase command (0xC7) is sent. The function exits when the “busy” bit in status register 1 is clear. While everything seemed ok, the function did not erase the chip. I took a closer look at the SPI bus using a logic analyzer. I have a very cheap logic analyzer which doesn’t have a very good trigger mechanism. My normal workaround for this is to put the area under test into an everlasting loop and then view the pins of interest on the logic analyzer. This is a problem for erase operations like this one, as the SPI flash chip has only so many erase cycles. As a precaution I changed the command code to 0xd7 (not a supported command), which allowed me to look at the SPI bus without harming the chip. I also commented out the loop that polled the status register.

The write enable command (0x06) is plainly visible as is the “fake” chip erase command 0xd7. The CS line is driven low just before the 0x06 command and goes high some time after the 0xd7 command. This is not the correct way to erase this chip. The data sheet clearly states that the CS line must go high for a period after each command. It does not do this after the write enable command. The write_enable function is as follows:

void serial_flash::write_enable(void)
{
	SPI->startTransaction();
	SPI->transfer((uint8_t)0x06);	
	SPI->stopTransaction();		
}	

The stopTransaction function should drive the CS line high but it didn’t seem to be working. The relevant SPI code is:

void spi::stopTransaction(void)
{	
	volatile unsigned Timeout = 1000;    
	while (SPI1->SR & ((1 << 12) + (1 << 11)) );     // wait for TX FIFO to empty (FTLVL == 0)
	while (((SPI1->SR & (1 << 0))!=0)&&(Timeout--)); // wait until RXNE clears
	Timeout = 1000;
	while (((SPI1->SR & (1 << 1))==0)&&(Timeout--)); // wait until TXE is set
	Timeout = 1000;
	while (((SPI1->SR & (1 << 7))!=0)&&(Timeout--)); // wait until BSY clears
	SPI1->CR1 &= ~(1 << 6); // Disable SPI (SPE = 0)
				
}

This should have worked but it clearly didn’t. Thinking about the sequence of events involved in the bulk_erase function it occurred to me that the call to startTransaction just after the write_enable command may actually be happening before the SPI peripheral had a chance to raise the CS line. The SPI peripheral is routed through GPIO port A in this setup. I noticed that I could monitor the status of the CS pin by reading GPIOA’s input data register and hence wait for it to go high. The stopTransaction code was modified as follows:

void spi::stopTransaction(void)
{	
	volatile unsigned Timeout = 1000;    
	while (SPI1->SR & ((1 << 12) + (1 << 11)) );     // wait for TX FIFO to empty (FTLVL == 0)
	while (((SPI1->SR & (1 << 0))!=0)&&(Timeout--)); // wait until RXNE clears
	Timeout = 1000;
	while (((SPI1->SR & (1 << 1))==0)&&(Timeout--)); // wait until TXE is set
	Timeout = 1000;
	while (((SPI1->SR & (1 << 7))!=0)&&(Timeout--)); // wait until BSY clears
	SPI1->CR1 &= ~(1 << 6); // Disable SPI (SPE = 0)

	while((GPIOA->IDR & (1 << 4))==0); // wait for CS (PA4) to go high; note: no timeout
	
}

This produced the following output from the logic analyzer:

A high pulse can now be seen between the two SPI commands. As a final test, I replaced the fake “0xD7” command with “0xC7” and presto: erases now work.

Various examples for the STM32G431

The STM32G431 was recently introduced by STMicroelectronics. It contains a Cortex M4 core running at 170MHz along with ADCs, DACs, timers and some interesting DSP acceleration hardware. I’ve just got started on this chip and have uploaded a number of examples to GitHub.
The version of OpenOCD that came with my Ubuntu installation did not support this chip, so I had to download a more up to date version from here.
Example code so far ranges from Blinky up to stereo analogue pass-through. I plan to work on some FIR and IIR examples soon.