The lsr instruction here shifts the value in r0 right by one bit,
putting the LSB into the carry flag.
By setting the destination register to r1, we can retain the original
unshifted value in r0, and later write this to the INT_CLEAR register in
order to clear all bits that were set.
This saves two cycles by avoiding the need to load an 0xFFFF value to
write to INT_CLEAR.
The effect of the lsr instruction here is to shift the LSB of r0 into
the processor's carry flag. The subsequent bcc instruction ("branch if
carry clear") will then branch if this bit was zero.
The LSB corresponds to the exchange interrupt flag for slice A only. The
other interrupt flag bits are not checked here, contrary to the comment.
Keeping the base address of this structure in a register allows us to
use offsets to load individual fields from it, without needing their
individual addresses.
However, the ldr instruction can only use immediate offsets relative to
the low registers (r0-r7), or the stack pointer (r13).
Low registers are in short supply and are needed for other instructions
which can only use r0-r7, so we use the stack pointer here.
It's safe to do this because we do not use the stack. There are no
function calls, interrupt handlers or push/pop instructions in the M0
code.
This change saves four cycles by eliminating loads of the addresses for
the offset & tx registers, plus a further two by eliminating the need to
stash one of these addresses in r8.
The high registers (r8-r14) cannot be used directly by most of the
instructions in the Cortex-M0 instruction set.
One of the few instructions that can use them is mov, which can use
any pair of registers.
This allows saving two cycles, by replacing two loads (2 cycles each)
with moves (1 cycle each), after stashing the required values in high
registers at startup.
This allows us to use ldr/str with an immediate offset to access the
SGPIO interrupt registers, rather than first having to load a register
with the specific address we want to access.
This change saves a total of 6 cycles, by eliminating two loads (2
cycles each), one of which could be executed twice.
The current code does reads and writes in two chunks: one of
6 words, followed by one of 2.
Instead, use two chunks of 4 words each. This takes the same number of
total cycles, but frees up two registers for other uses.
Note that we can't do things in one chunk, because we'd need eight
registers to hold the data, plus a ninth to hold the buffer pointer. The
ldm/stm instructions can only use the eight low registers, r0-r7.
So we have to use two chunks, and the most register-efficient way to do
that is to use two equal chunks.
Previously this register was reloaded with the same value during each loop.
Initialising it once, outside the loop, saves two cycles.
Note the separation of the loop start ("loop") from the entry point ("main").
Code between these labels will be run once, at startup.