The M4 previously buffered 16K of zeroes for the M0 to transmit, whilst
waiting for the first USB bulk transfer from the host to complete. The
first bulk transfer was placed in the second 16K buffer.
This avoided the M0 transmitting uninitialised data, but was not a
reliable solution, and delayed the transmission of the first
host-supplied samples.
Now that the M0 is placed in TX_START mode, this trick is no longer
necessary, because the M0 can automatically send zeroes until the first
bulk transfer is completed.
As such, the first bulk transfer now goes to the first 16K buffer.
Once the M4 byte count is increased by the bulk transfer completion
callback, the M0 will start transmitting the samples immediately.
In TX_START mode, a lack of data to send is not treated as a shortfall.
Zeroes are written to SGPIO, but no shortfall is recorded in the stats.
Using this mode helps avoid spurious shortfalls at startup.
As soon as there is data to transmit, the M0 switches to TX_RUN mode.
This change adds five cycles to the normal TX path, in order to check
for TX_START mode before sending data, and to switch to TX_RUN in that
case.
It also adds two cycles to the TX shortfall path, to check for TX_START
mode and skip shortfall processing in that mode.
Note the allocation of r3 to store the mode setting, such that this
value is still available after the tx_zeros routine.
This limit allows implementing a timeout: if a TX underrun or RX overrun
continues for the specified number of bytes, the M0 will revert to idle.
A setting of zero disables the limit.
This change adds 5 cycles to the TX & RX shortfall paths, to check if a
limit is set and to check the shortfall length against the limit.
To enable this, we keep a count of the current shortfall length. Each
time an SGPIO read/write cannot be completed due to a shortfall, we
increase this length. Each time an SGPIO read/write is completed
successfully, we reset the shortfall length to zero.
When a shortfall occurs and the existing shortfall length is zero, this
indicates a new shortfall, and the shortfall count is incremented.
This change adds one cycle to the normal RX & TX paths, to zero the
shortfall count. To enable this to be done in a single cycle, we keep a
zero handy in a high register.
The extra accounting adds 10 cycles to the TX and RX shortfall paths,
plus an additional 3 cycles to the RX shortfall path since there are
now two branches involved: one to the shortfall handler, and another
back to the main loop.
Previously, these counts were zeroed by the M4 when leaving the OFF
transceiver mode. Instead, do this on the M0 at the point where the M0
leaves IDLE mode.
This avoids a potential race in which the M4 zeroes the M0 count after
the M0 has already started incrementing it.
At this point, streaming has been stopped, so there will be no further
SGPIO interrupts. However, the M0 will still be spinning on the interrupt
flag, waiting to proceed.
To ensure that the M0 actually reaches its idle loop, we set the SGPIO
interrupt flag once. The M0 will then finish spinning on the flag, clear
the flag, see the new mode setting, and jump to the idle loop.
In the idle mode, the M0 simply waits for a different mode to be set.
No SGPIO access is done.
One extra cycle is added to both TX code paths, to check whether the
M0 should return to the idle loop based on the mode setting. The RX
paths are unaffected as the branch to RX is handled first.
In TX, check if there are sufficient bytes in the buffer to write a
block to SGPIO. If not, write zeros to SGPIO instead.
In RX, check if there is sufficent space in the buffer to store a block
read from SGPIO. If not, do nothing, which discards the data.
In both of these shortfall cases, the M0 count is not incremented.
This ensures that in TX, old data is never repeated. The M0 will not
resume writing TX samples to SGPIO until the M4 count advances,
indicating new data being ready in the buffer. This fixes bug #180.
Similarly, in RX, old data is never overwritten. The M0 will not resume
writing RX samples to the buffer until the M4 count advances, indicating
new space being available in the buffer.
With both counters in place, the number of bytes in the buffer is now
indicated by the difference between the M0 and M4 counts.
The M4 count needs to be increased whenever the M4 produces or consumes
data in the USB bulk buffer, so that the two counts remain correctly
synchronised.
There are three places where this is done:
1. When a USB bulk transfer in or out of the buffer completes, the count
is increased by the number of bytes transferred. This is the most
common case.
2. At TX startup, the M4 effectively sends the M0 16K of zeroes to
transmit, before the first host-provided data.
This is done by zeroing the whole 32K buffer area, and then setting
up the first bulk transfer to write to the second 16K, whilst the M0
begins transmission of the first 16K.
The count is therefore increased by 16K during TX startup, to account
for the initial 16K of zeros.
3. In sweep mode, some data is discarded. When this is done, the count
is incremented by the size of the discarded data.
The USB IRQ is masked whilst doing this, since a read-modify-write is
required, and the bulk transfer completion callback may be called at
any point, which also increases the count.
Instead of this count wrapping at the buffer size, it now increments
continuously. The offset within the buffer is now obtained from the
lower bits of the count.
This makes it possible to keep track of the total number of bytes
transferred by the M0 core.
The count will wrap at 2^32 bytes, which at 20Msps will occur every
107 seconds.
This adds a `hackrf_debug [-S|--state]` option, and the necessary
plumbing to libhackrf and the M4 firmware to support it.
The USB API and libhackrf versions are bumped to reflect the changes.
The previous change moved this flush from the vendor request handler to
the transceiver_shutdown() function which runs outside ISR context.
However, without this flush in the ISR, the firmware can sometimes end up
stuck in TX or RX mode after a request to go IDLE. It's still not clear
how this happens, but keeping the flush in the USB ISR fixes it, and as
soon as a mode change is requested we know we are going to be flushing
anyway, so there is no harm to do so here.
This commit does not remove the flush in transceiver_shutdown(), because
it is possible that the ISR will return just before the transceiver loop
queues a new transfer. Rather than try to avoid that race, we can just
flush again later.
This is a defensive change to make the transceiver code easier to reason
about, and to avoid the possibility of races such as that seen in #1042.
Previously, set_transceiver_mode() was called in the vendor request
handler for the SET_TRANSCEIVER_MODE request, as well in the callback
for a USB configuration change. Both these calls are made from the USB0
ISR, so could interrupt the rx_mode(), tx_mode() and sweep_mode()
functions at any point. It was hard to tell if this was safe.
Instead, set_transceiver_mode() has been removed, and its work is split
into three parts:
- request_transceiver_mode(), which is safe to call from ISR context.
All this function does is update the requested mode and increment a
sequence number. This builds on work already done in PR #1029, but
the interface has been simplified to use a shared volatile structure.
- transceiver_startup(), which transitions the transceiver from an idle
state to the configuration required for a specific mode, including
setting up the RF path, configuring the M0, adjusting LEDs and UI etc.
- transceiver_shutdown(), which transitions the transceiver back to an
idle state.
The *_mode() functions that implement the transceiver modes now call
transceiver_startup() before starting work, and transceiver_shutdown()
before returning, and all this happens in the main thread of execution.
As such, it is now guaranteed that all the steps involved happen in a
consistent order, with the transceiver starting from an idle state, and
being returned to an idle state before control returns to the main loop.
For consistency of interface, an off_mode() function has been added to
implement the behaviour of the OFF transceiver mode. Since the
transceiver is already guaranteed to be in an idle state when this is
called, the only work required is to set the UI mode and wait for a new
mode request.
This fixes bug #1042, which occured when an RX->OFF->RX sequence
happened quickly enough that the loop in rx_mode() did not see the
change. As a result, the enable_baseband_streaming() call at the start
of that function was not repeated for the new RX operation, so RX
progress stalled.
To solve this, the vendor request handler now increments a sequence
number when it changes the transceiver mode. Instead of the RX loop
checking whether the transceiver mode is still RX, it now checks whether
the current sequence number is the same as when it was started. If not,
there must have been at least one mode change, so the loop exits, and
the main loop starts the necessary loop for the new mode. The same
behaviour is implemented for the TX and sweep loops.
For this approach to be reliable, we must ensure that when deciding
which mode and sequence number to use, we take both values from the same
set_transceiver_mode request.
To achieve this, we briefly disable the USB0 interrupt to stop the
vendor request handler from running whilst reading the mode and sequence
number together. Then the loop dispatch proceeds using those pre-read
values.
This is not currently essential, since the current M4 code will not
trigger an SGPIO interrupt until the offset and tx fields are set.
In future though, we want to explicitly set up the M0 state here.
One of the few instructions that can use the high registers (r8-r14) is
the add instruction, which can add any two registers, as long as one of
them is also used as the destination register.
By using this form of add , we can add buf_base (in a high register) to
the offset within the buffer (in a low register), to get the desired
pointer value (buf_ptr) which we want to access.
This saves one cycle by eliminating the need to move buf_base to a low
register first.
The lsr instruction here shifts the value in r0 right by one bit,
putting the LSB into the carry flag.
By setting the destination register to r1, we can retain the original
unshifted value in r0, and later write this to the INT_CLEAR register in
order to clear all bits that were set.
This saves two cycles by avoiding the need to load an 0xFFFF value to
write to INT_CLEAR.
The effect of the lsr instruction here is to shift the LSB of r0 into
the processor's carry flag. The subsequent bcc instruction ("branch if
carry clear") will then branch if this bit was zero.
The LSB corresponds to the exchange interrupt flag for slice A only. The
other interrupt flag bits are not checked here, contrary to the comment.
Keeping the base address of this structure in a register allows us to
use offsets to load individual fields from it, without needing their
individual addresses.
However, the ldr instruction can only use immediate offsets relative to
the low registers (r0-r7), or the stack pointer (r13).
Low registers are in short supply and are needed for other instructions
which can only use r0-r7, so we use the stack pointer here.
It's safe to do this because we do not use the stack. There are no
function calls, interrupt handlers or push/pop instructions in the M0
code.
This change saves four cycles by eliminating loads of the addresses for
the offset & tx registers, plus a further two by eliminating the need to
stash one of these addresses in r8.
The high registers (r8-r14) cannot be used directly by most of the
instructions in the Cortex-M0 instruction set.
One of the few instructions that can use them is mov, which can use
any pair of registers.
This allows saving two cycles, by replacing two loads (2 cycles each)
with moves (1 cycle each), after stashing the required values in high
registers at startup.
This allows us to use ldr/str with an immediate offset to access the
SGPIO interrupt registers, rather than first having to load a register
with the specific address we want to access.
This change saves a total of 6 cycles, by eliminating two loads (2
cycles each), one of which could be executed twice.
The current code does reads and writes in two chunks: one of
6 words, followed by one of 2.
Instead, use two chunks of 4 words each. This takes the same number of
total cycles, but frees up two registers for other uses.
Note that we can't do things in one chunk, because we'd need eight
registers to hold the data, plus a ninth to hold the buffer pointer. The
ldm/stm instructions can only use the eight low registers, r0-r7.
So we have to use two chunks, and the most register-efficient way to do
that is to use two equal chunks.
Previously this register was reloaded with the same value during each loop.
Initialising it once, outside the loop, saves two cycles.
Note the separation of the loop start ("loop") from the entry point ("main").
Code between these labels will be run once, at startup.
These variables are already placed together; this commit just groups
them into a struct and declares this in a new header.
This commit should not result in any change to the firmware binary.
Only the names of symbols are changed.
This removes the definition of the offset variable,
volatile uint32_t usb_bulk_buffer_offset = 0;
which is actually superfluous. This variable, along with its neighbour
usb_bulk_buffer_tx, is placed explicitly by the linker script. Its type
is defined by the declaration in usb_bulk_buffer.h.
There is no need to define it here, and doing so gives the misleading
impression that its initial value can be changed by modifying this line!
The initialization to zero never actually takes effect, because the
variable is not placed in the .data or .bss sections which are
initialised by the startup code.
The offset and tx variables are both set in set_transceiver_mode
before SGPIO streaming is started, so the M0 code does not use
them uninitialised.
This changes m0_rom_to_ram to read directly from the ROM region, rather
than copying from where .text has been copied to RAM.
Previously the second .text section would be merged with the first one,
so the M0 image would be copied to RAM during `pre_main`. However, on
newer linkers the sections are not merged together so the second .text
section did not get copied to RAM.
fixes#936