Threads and processes
A process is a running program. Multitasking operating systems (e.g. Linux, Windows) run a number of processes that appear to execute simultaneously. Each process has its own global (or static) memory area, a stack and code. In a multitasking OS, processes are protected from one another by a hardware memory management unit (MMU). A scheduler allocates CPU time to each process. The simplest is a “round-robin” scheduler, which lets each process run for a short time slice before switching to the next, so that every process gets a turn on the CPU.
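To make the round-robin idea concrete, here is a small host-side C sketch (not part of the original project) that hands out time slices in turn to a set of hypothetical processes, each needing a given number of slices. A process that has finished simply drops out of the rotation.

```c
/* Round-robin scheduling of hypothetical "processes".
   remaining[i] is the number of time slices process i still needs.
   The index of the process that runs in each slice is written to order[].
   Returns the number of slices used (at most max_slices). */
static int round_robin(int remaining[], int n, int order[], int max_slices)
{
    int slices = 0;
    int unfinished = 0;
    for (int i = 0; i < n; i++)
        if (remaining[i] > 0)
            unfinished++;

    int current = 0;
    while (unfinished > 0 && slices < max_slices) {
        if (remaining[current] > 0) {
            order[slices++] = current;      /* run this process for one slice */
            if (--remaining[current] == 0)
                unfinished--;               /* done: it leaves the rotation */
        }
        current = (current + 1) % n;        /* then move on to the next one */
    }
    return slices;
}
```

For example, with remaining work {2, 1, 3} the six slices go to processes 0, 1, 2, 0, 2, 2.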

Threads are similar to processes in some ways; however, threads within the same process share the same global/static data and the same code, while each thread has its own stack.

Threads can be scheduled just like processes, so they appear to run in parallel: this is multithreading.

Context switching
Each process or thread switch involves a context change: the current processor state (the contents of all its registers) must be saved and the processor state of the next thread or process loaded. The image below illustrates a context change from Thread 1 to Thread 2.

The context change is triggered by a timer interrupt, and ARM Cortex-M processors have a timer intended for exactly this role: the SysTick timer. In the following example, the SysTick timer is configured to interrupt the CPU every millisecond, and each interrupt triggers a context change.
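A minimal sketch of the 1 ms SysTick setup. The original does not state the processor clock, so `cpu_hz` is a parameter and 16 MHz and 80 MHz appear below only as examples. The register addresses and bit positions are those of the standard Cortex-M SysTick; the function names and the `TARGET_CORTEX_M` build flag are hypothetical. (CMSIS also provides `SysTick_Config()` for the same job.)

```c
#include <stdint.h>

/* SysTick Control and Status Register (SYST_CSR) bits */
#define SYST_CSR_ENABLE    (1u << 0)  /* start the counter */
#define SYST_CSR_TICKINT   (1u << 1)  /* raise the SysTick exception on reaching zero */
#define SYST_CSR_CLKSOURCE (1u << 2)  /* count processor clock cycles */

#define SYST_RVR_MAX 0x00FFFFFFu      /* the reload register is 24 bits wide */

/* Compute the SYST_RVR reload value for tick_hz interrupts per second.
   The counter runs reload..0, i.e. reload + 1 cycles per tick.
   Returns 0 on success, -1 if the rate cannot be reached. */
static int systick_reload_for(uint32_t cpu_hz, uint32_t tick_hz, uint32_t *reload)
{
    if (tick_hz == 0)
        return -1;
    uint32_t cycles = cpu_hz / tick_hz;          /* clock cycles per tick */
    if (cycles < 2u || cycles - 1u > SYST_RVR_MAX)
        return -1;                               /* too fast, or too slow for 24 bits */
    *reload = cycles - 1u;
    return 0;
}

#ifdef TARGET_CORTEX_M   /* hypothetical flag: only built for the target */
#define SYST_CSR (*(volatile uint32_t *)0xE000E010u)
#define SYST_RVR (*(volatile uint32_t *)0xE000E014u)
#define SYST_CVR (*(volatile uint32_t *)0xE000E018u)

static void systick_start(uint32_t reload)
{
    SYST_CSR = 0;          /* stop the counter while it is configured */
    SYST_RVR = reload;     /* interrupt every reload + 1 cycles */
    SYST_CVR = 0;          /* any write clears the current value */
    SYST_CSR = SYST_CSR_CLKSOURCE | SYST_CSR_TICKINT | SYST_CSR_ENABLE;
}
#endif
```

At 16 MHz a 1 ms tick needs a reload value of 15999; at 80 MHz, 79999. A 1 Hz tick at 80 MHz would need 80,000,000 cycles, which does not fit in the 24-bit counter.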
ARM Cortex-M0 exception handling
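The following registers are pushed automatically onto the interrupted thread's stack (the Process Stack) when an exception such as the SysTick interrupt is taken. In the table, SP is the stack pointer value after these registers have been pushed: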
Address               | Contents
SP prior to interrupt | ????????
SP + 0x0000001C       | xPSR
SP + 0x00000018       | PC
SP + 0x00000014       | LR
SP + 0x00000010       | R12
SP + 0x0000000C       | R3
SP + 0x00000008       | R2
SP + 0x00000004       | R1
SP + 0x00000000       | R0
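The table maps directly onto a C structure. The sketch below (the names `HwStackFrame` and `interrupted_pc` are hypothetical) checks the offsets at compile time and shows how a handler could find the interrupted thread's PC given the process stack pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* The eight words stacked by the Cortex-M0 on exception entry, lowest
   address first. SP (after stacking) points at r0. */
typedef struct {
    uint32_t r0;    /* SP + 0x00 */
    uint32_t r1;    /* SP + 0x04 */
    uint32_t r2;    /* SP + 0x08 */
    uint32_t r3;    /* SP + 0x0C */
    uint32_t r12;   /* SP + 0x10 */
    uint32_t lr;    /* SP + 0x14 */
    uint32_t pc;    /* SP + 0x18 */
    uint32_t xpsr;  /* SP + 0x1C */
} HwStackFrame;

/* Compile-time checks that the structure matches the table. */
_Static_assert(offsetof(HwStackFrame, r12) == 0x10, "R12 at SP + 0x10");
_Static_assert(offsetof(HwStackFrame, pc) == 0x18, "PC at SP + 0x18");
_Static_assert(offsetof(HwStackFrame, xpsr) == 0x1C, "xPSR at SP + 0x1C");
_Static_assert(sizeof(HwStackFrame) == 32, "8 words = 32 bytes");

/* Return the interrupted thread's PC, given the process stack pointer
   as seen inside the handler (on the target, e.g. read with MRS R0,PSP). */
static uint32_t interrupted_pc(const uint32_t *psp)
{
    return psp[offsetof(HwStackFrame, pc) / sizeof(uint32_t)];
}
```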
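Why not save all of the registers? It would slow down every interrupt, and most ISRs do not modify all of the registers.
Why just these ones? They are the registers that the ARM Procedure Call Standard (AAPCS) allows a C function to overwrite without saving: R0-R3 pass arguments and return values, R12 is a scratch register (used, for example, by compiler- or linker-generated call veneers) and LR holds a return address. Because the hardware saves them, an ISR can be written as an ordinary C function; any of R4-R11 that it uses are saved and restored by the compiler. PC must be saved so the processor knows where to return to, and xPSR must be saved to preserve the condition flags.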
For a full context switch, the remaining registers (R4-R11) must also be saved on the Process Stack, just below the hardware-stacked registers:
Address               | Contents
SP prior to interrupt | ????????
SP + 0x0000001C       | xPSR
SP + 0x00000018       | PC
SP + 0x00000014       | LR
SP + 0x00000010       | R12
SP + 0x0000000C       | R3
SP + 0x00000008       | R2
SP + 0x00000004       | R1
SP + 0x00000000       | R0
SP - 0x00000004       | R11
SP - 0x00000008       | R10
SP - 0x0000000C       | R9
SP - 0x00000010       | R8
SP - 0x00000014       | R7
SP - 0x00000018       | R6
SP - 0x0000001C       | R5
SP - 0x00000020       | R4
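The full saved context can likewise be described as one 16-word structure starting at the lowest address (R4). This is a sketch with hypothetical names; it shows why the stack pointer stored in the TCB is PSP minus 32 bytes, and why adding 32 bytes back gives the value to load into PSP before returning.

```c
#include <stddef.h>
#include <stdint.h>

/* Full saved context of a suspended thread, lowest address first.
   The saved SP kept in the TCB points at r4; the hardware frame starts
   32 bytes higher, where PSP pointed just after exception entry. */
typedef struct {
    uint32_t r4, r5, r6, r7;       /* stored by the first STMIA  */
    uint32_t r8, r9, r10, r11;     /* stored by the second STMIA */
    uint32_t r0, r1, r2, r3;       /* pushed by the hardware ... */
    uint32_t r12, lr, pc, xpsr;    /* ... on exception entry     */
} ThreadContext;

_Static_assert(offsetof(ThreadContext, r4) == 0x00, "R4 at SP - 0x20");
_Static_assert(offsetof(ThreadContext, r11) == 0x1C, "R11 at SP - 0x04");
_Static_assert(offsetof(ThreadContext, r0) == 0x20, "R0 at SP + 0x00");
_Static_assert(offsetof(ThreadContext, xpsr) == 0x3C, "xPSR at SP + 0x1C");
_Static_assert(sizeof(ThreadContext) == 64, "16 words");

/* Saved SP for the TCB, given PSP after hardware stacking (SUB R0,#32). */
static uint32_t saved_sp_from_psp(uint32_t psp)
{
    return psp - 8u * (uint32_t)sizeof(uint32_t);
}

/* PSP to restore, given the saved SP from the TCB (ADD R0,#32). */
static uint32_t psp_from_saved_sp(uint32_t saved_sp)
{
    return saved_sp + 8u * (uint32_t)sizeof(uint32_t);
}
```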
It is not possible to do this in C alone, so a little inline assembly is needed to complete the context change. The following code forms the body of the context switch:
// NOTE: mixing separate asm() statements with C code is fragile. This code assumes
// the compiler does not use R4-R11 in the enclosing handler (e.g. R7 as a frame
// pointer at -O0), since its epilogue would then overwrite the registers restored
// for Thread B, and that the ThreadIndex update is written to memory before the
// following asm() reads it.
// Preserve remaining registers on stack of thread that is being suspended (Thread A)
asm(" cpsid i "); // disable interrupts during thread switch
asm(" MRS R0,PSP "); // get Thread A stack pointer
asm(" SUB R0,#32"); // Make room for the other registers: R4-R11 = 8 x 4 = 32 bytes
asm(" STMIA R0! , { R4-R7 } "); // Thumb multiple store only accepts the low registers R0-R7
asm(" MOV R4,R8 "); // Copy high registers into the (already saved) low ones
asm(" MOV R5,R9 ");
asm(" MOV R6,R10 ");
asm(" MOV R7,R11 ");
asm(" STMIA R0! , { R4-R7 } "); // and repeat the multiple register store
// Locate the Thread Control Block (TCB) for Thread A
asm(" LDR R0,=TCB_Size "); // get the size of each TCB
asm(" LDR R0,[R0] ");
asm(" LDR R1,=ThreadIndex "); // Which one is being used right now?
asm(" LDR R1,[R1] ");
asm(" MUL R1,R0,R1 "); // Calculate offset of Thread A TCB from start of TCB array
asm(" LDR R0,=Threads "); // point to start of TCB array
asm(" ADD R1,R0,R1 "); // add offset to get pointer to Thread A TCB
asm(" MRS R0,PSP "); // get Thread A stack pointer
// Save Thread A's stack pointer (adjusted for the registers just stored)
asm(" SUB R0,#32 "); // Adjust for the other registers: R4-R11 = 8 x 4 = 32 bytes
asm(" STR R0,[R1] "); // Save Thread A stack pointer to the TCB (first entry = saved stack pointer)
// Update the ThreadIndex (round-robin: wrap back to the first thread)
ThreadIndex++;
if (ThreadIndex >= ThreadCount)
    ThreadIndex = 0;
// Locate the Thread Control Block (TCB) for Thread B
asm(" LDR R0,=TCB_Size "); // get the size of each TCB
asm(" LDR R0,[R0] ");
asm(" LDR R1,=ThreadIndex "); // Which one runs next?
asm(" LDR R1,[R1] ");
asm(" MUL R1,R0,R1 "); // Calculate offset of Thread B TCB from start of TCB array
asm(" LDR R0,=Threads "); // point to start of TCB array
asm(" ADD R1,R0,R1 "); // add offset to get pointer to Thread B TCB
asm(" LDR R0,[R1] "); // read saved Thread B stack pointer
asm(" ADD R0,#16 "); // Skip past saved low registers (R4-R7) for the moment
asm(" LDMIA R0!,{R4-R7} "); // read saved R8-R11 values into R4-R7
asm(" MOV R8,R4 "); // Copy them from the low registers to the high ones
asm(" MOV R9,R5 ");
asm(" MOV R10,R6 ");
asm(" MOV R11,R7 ");
asm(" LDR R0,[R1] "); // read saved Thread B stack pointer
asm(" LDMIA R0!,{R4-R7} "); // read saved low registers R4-R7
asm(" LDR R0,[R1] "); // read saved Thread B stack pointer
asm(" ADD R0,#32 "); // skip past R4-R11 to the hardware-stacked frame
asm(" MSR PSP,R0 "); // write Thread B stack pointer
asm(" cpsie i "); // re-enable interrupts; exception return then pops Thread B's R0-R3, R12, LR, PC and xPSR
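To follow the save/restore logic without hardware, the host-side C model below mimics the three phases of a switch: the hardware push on exception entry, the assembler's save of R4-R11 plus the round-robin index update, and the hardware pop on exception return. Everything here (`ModelCpu`, `model_context_switch` and the rest) is invented for illustration and is not from the original project. Stack "pointers" are word indices, so subtracting 32 bytes becomes subtracting 8 words.

```c
#include <stdint.h>

#define NUM_THREADS 2
#define STACK_WORDS 64

typedef struct {
    uint32_t r[13];              /* r0..r12 */
    uint32_t lr, pc, xpsr;
    uint32_t psp;                /* word index into the current thread's stack */
} ModelCpu;

typedef struct {
    uint32_t saved_sp;           /* plays the role of ThreadStack in the TCB */
    uint32_t stack[STACK_WORDS];
} ModelThread;

static ModelThread model_threads[NUM_THREADS];
static unsigned model_index;     /* plays the role of ThreadIndex */

/* Hardware, on exception entry: push R0-R3, R12, LR, PC, xPSR. */
static void model_exception_entry(ModelCpu *cpu)
{
    uint32_t *s = model_threads[model_index].stack;
    cpu->psp -= 8;
    s[cpu->psp + 0] = cpu->r[0];
    s[cpu->psp + 1] = cpu->r[1];
    s[cpu->psp + 2] = cpu->r[2];
    s[cpu->psp + 3] = cpu->r[3];
    s[cpu->psp + 4] = cpu->r[12];
    s[cpu->psp + 5] = cpu->lr;
    s[cpu->psp + 6] = cpu->pc;
    s[cpu->psp + 7] = cpu->xpsr;
}

/* The inline assembler: save R4-R11 below the hardware frame, advance
   the round-robin index, then restore R4-R11 of the next thread. */
static void model_context_switch(ModelCpu *cpu)
{
    ModelThread *a = &model_threads[model_index];
    uint32_t base = cpu->psp - 8;               /* SUB R0,#32 */
    for (int i = 0; i < 8; i++)
        a->stack[base + i] = cpu->r[4 + i];     /* R4..R11, lowest address first */
    a->saved_sp = base;                         /* STR R0,[R1] */

    model_index = (model_index + 1) % NUM_THREADS;

    ModelThread *b = &model_threads[model_index];
    for (int i = 0; i < 8; i++)
        cpu->r[4 + i] = b->stack[b->saved_sp + i];
    cpu->psp = b->saved_sp + 8;                 /* ADD R0,#32; MSR PSP,R0 */
}

/* Hardware, on exception return: pop the 8 stacked words. */
static void model_exception_return(ModelCpu *cpu)
{
    const uint32_t *s = model_threads[model_index].stack;
    cpu->r[0]  = s[cpu->psp + 0];
    cpu->r[1]  = s[cpu->psp + 1];
    cpu->r[2]  = s[cpu->psp + 2];
    cpu->r[3]  = s[cpu->psp + 3];
    cpu->r[12] = s[cpu->psp + 4];
    cpu->lr    = s[cpu->psp + 5];
    cpu->pc    = s[cpu->psp + 6];
    cpu->xpsr  = s[cpu->psp + 7];
    cpu->psp  += 8;
}
```

With two threads, running entry, switch and return twice brings the CPU back to exactly the register values it started with, which is the property a context switch must have.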
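Threads are managed using a structure called a Thread Control Block (TCB), defined as follows. The assembler above depends on ThreadStack being the first member, because it stores and loads the saved stack pointer at offset 0 of the TCB.
#include <stdint.h>
typedef struct {
    uint32_t *ThreadStack;   // saved stack pointer (must remain the first member)
    void (*ThreadFn)(void);  // thread entry function
    uint32_t Attributes;     // thread attributes
} ThreadControlBlock;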
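The assembler finds a TCB by computing `Threads + ThreadIndex * TCB_Size` in bytes. A sketch of the declarations this implies follows; the array size and the exact declarations of `Threads`, `TCB_Size`, `ThreadIndex` and `ThreadCount` are assumptions, since the original does not show them. `sizeof(ThreadControlBlock)` is 12 bytes on the 32-bit Cortex-M0 but larger on a 64-bit host; reading the size from `TCB_Size` keeps the calculation correct either way.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t *ThreadStack;   /* saved stack pointer: offset 0, used by the asm */
    void (*ThreadFn)(void);
    uint32_t Attributes;
} ThreadControlBlock;

/* Hypothetical declarations of the globals the asm refers to by name. */
#define MAX_THREADS 3
ThreadControlBlock Threads[MAX_THREADS];
uint32_t TCB_Size = sizeof(ThreadControlBlock);  /* 12 on Cortex-M0 */
volatile uint32_t ThreadIndex;                   /* thread currently running */
uint32_t ThreadCount = MAX_THREADS;              /* threads actually created */

_Static_assert(offsetof(ThreadControlBlock, ThreadStack) == 0,
               "STR R0,[R1] and LDR R0,[R1] use offset 0");

/* C equivalent of: LDR R0,=TCB_Size; LDR R0,[R0]; LDR R1,=ThreadIndex;
   LDR R1,[R1]; MUL R1,R0,R1; LDR R0,=Threads; ADD R1,R0,R1 */
static ThreadControlBlock *current_tcb(void)
{
    return (ThreadControlBlock *)((uint8_t *)Threads + ThreadIndex * TCB_Size);
}
```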
Implementation
A demonstrator application with three threads was developed for the Tiva C LaunchPad. Each thread flashes an LED on the board at a different rate. The trickiest part to get right was the initial launch of thread switching, which involved a little stack fiddling. The code is available on GitHub.
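One common way to launch a thread, used by many small kernels but not necessarily the approach taken in the original code, is to pre-build a fake saved context at the top of each thread's stack. The first context switch into the thread then "returns" into its entry function. In this sketch, `init_thread_stack` and the exit routine are hypothetical, and the stack array is assumed to be 8-byte aligned and at least 16 words long.

```c
#include <stddef.h>
#include <stdint.h>

#define INITIAL_XPSR 0x01000000u   /* only the Thumb (T) bit, bit 24, set */

/* Lay out a fake saved context at the top of a new thread's stack, so the
   first switch into the thread "returns" into entry_addr.
   stack/words: the thread's stack array (assumed 8-byte aligned, words >= 16).
   entry_addr:  address of the thread function (Thumb: bit 0 set).
   exit_addr:   hypothetical routine to run if the thread function returns.
   Returns the word index to record as the TCB's saved stack pointer. */
static size_t init_thread_stack(uint32_t *stack, size_t words,
                                uint32_t entry_addr, uint32_t exit_addr)
{
    size_t top = words & ~(size_t)1;   /* even word count keeps the top 8-byte aligned */
    size_t frame = top - 16;           /* R4-R11 plus the 8 hardware-stacked words */

    for (size_t i = 0; i < 16; i++)
        stack[frame + i] = 0;          /* R4-R11, R0-R3 and R12 start at zero */

    /* Hardware frame begins 8 words above the saved SP: R0 R1 R2 R3 R12 LR PC xPSR */
    stack[frame + 8 + 5] = exit_addr;            /* LR */
    stack[frame + 8 + 6] = entry_addr & ~1u;     /* PC: clear the Thumb bit */
    stack[frame + 8 + 7] = INITIAL_XPSR;         /* xPSR */
    return frame;
}
```

On the target, the thread's `ThreadStack` would be set to `&stack[frame]` and the entry address obtained with `(uint32_t)(uintptr_t)fn`. The first switch into the thread then loads zeros into R4-R11, sets PSP 32 bytes higher, and the exception return starts the thread at its entry point with a clean xPSR.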