The high registers (r8-r14) cannot be used directly by most of the
instructions in the Cortex-M0 instruction set.
One of the few instructions that can use them is mov, which can use
any pair of registers.
This saves two cycles by replacing two loads (2 cycles each) with
moves (1 cycle each), after stashing the required values in high
registers at startup.
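
As a rough sketch of the pattern (GNU assembler syntax; the constants,
labels and register choices here are hypothetical, not taken from the
actual code):

```asm
.syntax unified
.cpu cortex-m0
.thumb

@ Hypothetical values, for illustration only.
.equ VALUE_A, 0x20007000
.equ VALUE_B, 0x00007FFF

init:
    @ Startup: load each value once and stash it in a high register.
    ldr r0, =VALUE_A    @ literal load, 2 cycles, paid once
    mov r8, r0          @ mov accepts high registers
    ldr r0, =VALUE_B
    mov r9, r0

fast_path:
    @ Hot path: 1-cycle moves instead of 2-cycle literal loads.
    mov r0, r8
    mov r1, r9
    @ ... work using r0 and r1 ...
    b fast_path
```
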
Keeping the base address of the SGPIO interrupt registers in a register
allows us to use ldr/str with an immediate offset to access them, rather
than first having to load a register with the specific address we want
to access.
This change saves up to 6 cycles, by eliminating two loads (2 cycles
each), one of which could be executed twice.
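
A minimal sketch of base-plus-offset access, assuming a base address and
register offsets that are placeholders rather than the real SGPIO
register map. Thumb ldr/str immediate offsets for word accesses must be
multiples of 4 in the range 0 to 124, so the registers need to sit
close to the chosen base:

```asm
.syntax unified
.cpu cortex-m0
.thumb

@ Hypothetical addresses and offsets, for illustration only.
.equ INT_REGS_BASE, 0x40101F00
.equ STATUS_OFFSET, 0x2C
.equ CLEAR_OFFSET,  0x30

    @ Once, at startup: keep the base address in r7.
    ldr r7, =INT_REGS_BASE

    @ At each use: no separate address load is needed.
    ldr r0, [r7, #STATUS_OFFSET]    @ read one register
    str r0, [r7, #CLEAR_OFFSET]     @ write another
```
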
The current code reads and writes in two chunks: one of 6 words,
followed by one of 2.
Instead, use two chunks of 4 words each. This takes the same number of
total cycles, but frees up two registers for other uses.
Note that we can't do this in a single chunk, because we'd need eight
registers to hold the data, plus a ninth to hold the buffer pointer. The
ldm/stm instructions can only use the eight low registers, r0-r7.
So we have to use two chunks, and the most register-efficient way to do
that is to use two equal chunks.
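
The cycle counts match because ldm/stm take 1 + N cycles for N words on
the Cortex-M0: 7 + 3 = 10 for chunks of 6 and 2, and 5 + 5 = 10 for two
chunks of 4. A minimal sketch of the two-chunk form, assuming the buffer
pointer lives in r4 (the label and register assignments are
hypothetical):

```asm
.syntax unified
.cpu cortex-m0
.thumb

@ Assumption: r4 points at 8 words in a buffer; r0-r3 are scratch.
read_8_words:
    ldm r4!, {r0-r3}    @ first chunk: 4 words, 1 + 4 = 5 cycles
    @ ... consume r0-r3 here ...
    ldm r4!, {r0-r3}    @ second chunk: 4 words, 1 + 4 = 5 cycles
    @ ... consume r0-r3 here ...
    bx lr
```

With four data registers and one pointer in use, r5-r7 stay free. A
6-word chunk would tie up six data registers plus the pointer, leaving
only one low register spare.
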
Previously this register was reloaded with the same value on every pass
through the loop. Initialising it once, outside the loop, saves two cycles.
Note the separation of the loop start ("loop") from the entry point ("main").
Code between these labels will be run once, at startup.
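
The label layout can be sketched as follows. The constant and the use of
r6 are hypothetical, chosen only to show where one-time setup goes:

```asm
.syntax unified
.cpu cortex-m0
.thumb

.equ LOOP_CONSTANT, 0x00007FFF    @ hypothetical value

.global main
.thumb_func
main:
    @ Between "main" and "loop": executed once, at startup.
    ldr r6, =LOOP_CONSTANT        @ 2-cycle literal load, paid once
loop:
    @ Loop body: r6 already holds the constant, so no reload is needed.
    ands r0, r6                   @ example use of the preloaded value
    b loop                        @ branch back to "loop", not "main"
```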