Multi-threading on the Tiva C Launchpad

Threads and processes

A process is a running program. Multitasking operating systems (e.g. Linux, Windows) run a number of processes apparently simultaneously. Each process has a global (or static) memory area, a stack and code. Processes in a multitasking OS are protected from one another by a hardware-based memory management unit. A scheduler allocates CPU time to each process. The simplest scheduler is a “round-robin” scheduler, which lets each process run for a short time before switching to the next, so that every process gets a turn on the CPU.

[Figure: multitasking]
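The round-robin rule described above can be sketched in a few lines of C. This is only an illustration: the function name next_task and its parameters are hypothetical and are not taken from the demonstrator code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the index of the next task to run,
 * wrapping back to 0 after the last task has had its turn. */
static uint32_t next_task(uint32_t current, uint32_t task_count)
{
    assert(task_count > 0);        /* there must be at least one task */
    uint32_t next = current + 1;
    if (next >= task_count)
        next = 0;                  /* wrap around: back to the first task */
    return next;
}
```

Calling next_task repeatedly with three tasks visits the indices 0, 1, 2, 0, 1, 2 and so on, so every task gets the CPU in turn.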

Threads are similar to processes in some ways; however, the threads within a process share the same global/static data and the same code, but each has its own stack.

[Figure: threads1]

Threads can be scheduled just like processes and so appear to operate in parallel – this is multi-threading.

[Figure: threads2]

Context switching

Each process or thread switch involves a context change: the current processor state (all of its register contents) must be saved and the processor state for the next thread or process loaded. The image below illustrates a context change from Thread 1 to Thread 2.

[Figure: context_change]

The context change is triggered by a timer interrupt, and the ARM Cortex processors have a special timer aimed at just this role: the SysTick timer. In the following example the SysTick timer is configured to interrupt the CPU every millisecond, and each interrupt triggers a context change.
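As a rough sketch of what a 1 ms SysTick setup involves, the code below computes the reload value and writes the three SysTick registers. The register layout (starting at 0xE000E010) and the control bits are standard for Cortex-M parts, but the names SysTickRegs, systick_reload and systick_start are made up for this illustration, and the 16 MHz clock in the usage note is an assumption, not a figure from the demonstrator.

```c
#include <assert.h>
#include <stdint.h>

/* Layout of the first three SysTick registers (base 0xE000E010). */
typedef struct {
    volatile uint32_t CTRL;   /* control and status */
    volatile uint32_t LOAD;   /* reload value, 24 bits wide */
    volatile uint32_t VAL;    /* current counter value */
} SysTickRegs;

#define SYSTICK_ENABLE    (1u << 0)   /* start the counter */
#define SYSTICK_TICKINT   (1u << 1)   /* raise the SysTick exception at zero */
#define SYSTICK_CLKSOURCE (1u << 2)   /* count processor clock cycles */

/* The counter runs from LOAD down to 0, so one period is LOAD + 1 cycles. */
static uint32_t systick_reload(uint32_t clock_hz, uint32_t tick_hz)
{
    assert(tick_hz > 0 && clock_hz >= tick_hz);
    uint32_t reload = clock_hz / tick_hz - 1;
    assert(reload <= 0x00FFFFFFu);    /* must fit in the 24-bit register */
    return reload;
}

/* Hypothetical setup routine; the registers are passed in as a pointer
 * so that the same code can be exercised off-target. */
static void systick_start(SysTickRegs *st, uint32_t clock_hz, uint32_t tick_hz)
{
    st->CTRL = 0;                                  /* stop while configuring */
    st->LOAD = systick_reload(clock_hz, tick_hz);
    st->VAL  = 0;                                  /* clear the current count */
    st->CTRL = SYSTICK_CLKSOURCE | SYSTICK_TICKINT | SYSTICK_ENABLE;
}
```

On the target this would be called as systick_start((SysTickRegs *)0xE000E010, 16000000u, 1000u), assuming a 16 MHz system clock; a different clock only changes the first number.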

ARM Cortex M0 Exception handling

The following registers are placed on the interrupted thread's stack (the Process Stack) automatically when an interrupt (such as SysTick) is taken:

Address                  Contents
SP prior to interrupt    ????????
SP + 0x0000001C          xPSR
SP + 0x00000018          PC
SP + 0x00000014          LR
SP + 0x00000010          R12
SP + 0x0000000C          R3
SP + 0x00000008          R2
SP + 0x00000004          R1
SP + 0x00000000          R0

Why not save all of the registers? It would be too slow: your ISR may not change every register, so stacking all of them in hardware would add delay to every interrupt.

Why just these ones? R0-R3 are typically used for argument passing and, like R12, may be freely overwritten by any compiled function, so they must always be preserved on behalf of the interrupted code. R12 is used by some compilers in their inner function call glue. The LR may hold a function return address. PC must be remembered so we know where to go back to, and xPSR must be remembered for the flags.
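The stacked registers can be described in C as a structure laid over the Process Stack, which makes the offsets in the table easy to check. This is a sketch for illustration only; the names HardwareFrame and interrupted_pc are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The eight words stacked by the hardware, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;
    uint32_t pc;
    uint32_t xpsr;
} HardwareFrame;

/* Compile-time checks against the offsets listed in the table. */
static_assert(offsetof(HardwareFrame, r12)  == 0x10, "R12 at SP + 0x10");
static_assert(offsetof(HardwareFrame, pc)   == 0x18, "PC at SP + 0x18");
static_assert(offsetof(HardwareFrame, xpsr) == 0x1C, "xPSR at SP + 0x1C");
static_assert(sizeof(HardwareFrame) == 32, "8 registers x 4 bytes");

/* Given the Process Stack pointer as seen inside the handler, return
 * the address at which the interrupted thread will resume. */
static uint32_t interrupted_pc(const uint32_t *psp)
{
    const HardwareFrame *frame = (const HardwareFrame *)psp;
    return frame->pc;
}
```

Inside a handler, the value read with MRS R0,PSP would be the pointer passed to interrupted_pc.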

For a full context switch, the remaining registers (R4-R11) must also be placed on the Process Stack:

Address                  Contents
SP prior to interrupt    ????????
SP + 0x0000001C          xPSR
SP + 0x00000018          PC
SP + 0x00000014          LR
SP + 0x00000010          R12
SP + 0x0000000C          R3
SP + 0x00000008          R2
SP + 0x00000004          R1
SP + 0x00000000          R0
SP - 0x00000004          R11
SP - 0x00000008          R10
SP - 0x0000000C          R9
SP - 0x00000010          R8
SP - 0x00000014          R7
SP - 0x00000018          R6
SP - 0x0000001C          R5
SP - 0x00000020          R4

It is not possible to carry this out in the C language, so a little inline assembler is needed here to complete the context change.


// Preserve remaining registers on stack of thread that is being suspended (Thread A)
asm(" cpsid i "); // disable interrupts during thread switch
asm(" MRS R0,PSP "); // get Thread A stack pointer
asm(" SUB R0,#32"); // Make room for the other registers : R4-R11 = 8 x 4 = 32 bytes
asm(" STMIA R0! , { R4-R7 } "); // Can only do a multiple store on registers up to R7
asm(" MOV R4,R8 "); // Copy higher registers to lower ones
asm(" MOV R5,R9 ");
asm(" MOV R6,R10 ");
asm(" MOV R7,R11 ");
asm(" STMIA R0! , { R4-R7 } "); // and repeat the multiple register store
// Locate the Thread Control Block (TCB) for Thread A
asm(" LDR R0,=TCB_Size "); // get the size of each TCB
asm(" LDR R0,[R0] ");
asm(" LDR R1,=ThreadIndex "); // Which one is being used right now?
asm(" LDR R1,[R1] ");
asm(" MUL R1,R0,R1 "); // Calculate offset of Thread A TCB from start of TCB array
asm(" LDR R0,=Threads "); // point to start of TCB array
asm(" ADD R1,R0,R1 "); // add offset to get pointer to Thread A TCB
asm(" MRS R0,PSP "); // get Thread A stack pointer
// Save Thread A's stack pointer (adjusted for the new registers being pushed)
asm(" SUB R0,#32 "); // Adjust for the other registers : R4-R11 = 8 x 4 = 32 bytes
asm(" STR R0,[R1] "); // Save Thread A Stack pointer to the TCB (first entry = Saved stack pointer)

// Update the ThreadIndex
ThreadIndex++;
if (ThreadIndex >= ThreadCount)
  ThreadIndex = 0;

// Locate the Thread Control Block (TCB) for Thread B
asm(" LDR R0,=TCB_Size "); // get the size of each TCB
asm(" LDR R0,[R0] ");
asm(" LDR R1,=ThreadIndex "); // Which one is being used right now?
asm(" LDR R1,[R1] ");
asm(" MUL R1,R0,R1 "); // Calculate offset of Thread B TCB from start of TCB array
asm(" LDR R0,=Threads "); // point to start of TCB array
asm(" ADD R1,R0,R1 "); // add offset to get pointer to Thread B TCB
asm(" LDR R0,[R1] "); // read saved Thread B Stack pointer
asm(" ADD R0,#16 "); // Skip past saved low registers for the moment
asm(" LDMIA R0!,{R4-R7} "); // read saved registers
asm(" MOV R8,R4 "); // Copy the values just read into the higher registers
asm(" MOV R9,R5 ");
asm(" MOV R10,R6 ");
asm(" MOV R11,R7 ");
asm(" LDR R0,[R1] "); // read saved Thread B Stack pointer
asm(" LDMIA R0!,{R4-R7} "); // read saved LOW registers
asm(" LDR R0,[R1] "); // read saved Thread B Stack pointer
asm(" ADD R0,#32 "); // re-adjust saved stack pointer
asm(" MSR PSP,R0 "); // write Thread B stack pointer

Threads are managed using a structure called a Thread Control Block which is defined as follows:


typedef struct {
    uint32_t *ThreadStack;  // saved stack pointer (must be the first entry)
    void (*ThreadFn )();    // thread entry function
    uint32_t Attributes;
} ThreadControlBlock;
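The LDR/MUL/ADD sequences in the assembler locate a TCB by computing "start of array + ThreadIndex x TCB_Size". A C equivalent might look like the sketch below. The names Threads, TCB_Size and ThreadIndex appear in the assembler above, but their declarations here, the MAX_THREADS constant and the current_tcb helper are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t *ThreadStack;   /* saved stack pointer */
    void (*ThreadFn)(void);  /* thread entry function */
    uint32_t Attributes;
} ThreadControlBlock;

/* The assembler stores the stack pointer with STR R0,[R1], i.e. at
 * offset 0 of the TCB, so ThreadStack has to stay the first member. */
static_assert(offsetof(ThreadControlBlock, ThreadStack) == 0,
              "ThreadStack must be the first entry");

#define MAX_THREADS 3        /* assumed: the demonstrator runs three threads */

ThreadControlBlock Threads[MAX_THREADS];
const uint32_t TCB_Size = sizeof(ThreadControlBlock);
uint32_t ThreadIndex = 0;

/* Same arithmetic as the assembler: base address plus index times size. */
static ThreadControlBlock *current_tcb(void)
{
    uint8_t *base = (uint8_t *)Threads;
    return (ThreadControlBlock *)(base + ThreadIndex * TCB_Size);
}
```

In plain C this is simply &Threads[ThreadIndex]; the byte arithmetic is spelled out only to mirror what the assembler does.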

Implementation

A demonstrator application with three threads was developed for the Tiva C Launchpad. Each thread flashes an LED on the board at a different rate. The trickiest part to get right was the initial launch of the thread switching, which involved a little bit of stack fiddling. The code is available on GitHub.
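The demonstrator's launch code is not reproduced here, but one common way to do this kind of stack fiddling is to pre-fill each new thread's stack so that it looks as if the thread had already been suspended by the context switch code above. The sketch below assumes that approach; init_thread_stack and its parameters are hypothetical names, and the actual code on GitHub may do this differently.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_WORDS 16u          /* R4-R11, then R0-R3, R12, LR, PC, xPSR */
#define XPSR_THUMB  0x01000000u  /* T bit: Cortex-M cores only run Thumb code */

/* Hypothetical helper: build a fake saved context at the top of a new
 * thread's stack and return the value to store in the TCB's ThreadStack.
 * Word layout from the returned pointer, matching the switch code above:
 *   [0..3] R4-R7   [4..7] R8-R11   [8..11] R0-R3
 *   [12] R12   [13] LR   [14] PC   [15] xPSR                          */
static uint32_t *init_thread_stack(uint32_t *stack, uint32_t words,
                                   uint32_t entry, uint32_t exit_handler)
{
    assert(words >= FRAME_WORDS);
    uint32_t *sp = stack + words - FRAME_WORDS;  /* the stack grows downwards */
    for (uint32_t i = 0; i < FRAME_WORDS; i++)
        sp[i] = 0;                   /* general registers start out as zero */
    sp[13] = exit_handler;           /* LR: where the thread function returns to */
    sp[14] = entry & ~1u;            /* PC: entry point, with bit 0 cleared */
    sp[15] = XPSR_THUMB;             /* xPSR: Thumb state comes from the T bit */
    return sp;
}
```

When the switch code later restores this thread, it reloads R4-R11 from the first eight words and sets PSP to the returned pointer plus 32; the exception return then pops the remaining eight words and the thread function starts running.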