DroneCAN: ISR-driven TX for STM32F7 bxCAN driver#11560
Conversation
… verbosity Remove LOG_DEBUG from canardSTM32Transmit() failure path — this will run in ISR callback context after the TX ISR migration and calling printf from an ISR is undefined behaviour. Collapse 5 intermediate canardSTM32ComputeTimings() debug calls to the single summary line already present at the end of the function. Remove per-frame success-path LOG_DEBUG calls in dronecan.c transfer handlers. These fire on every received GPS fix, NodeStatus and battery update and are too noisy for upstream submission.
Replace blocking HAL_CAN_AddTxMessage() polling in canardSTM32Transmit()
with a SPSC SW ring buffer (TX_QUEUE_SIZE=32) drained by CAN1_TX_IRQHandler
via HAL_CAN_TxMailbox{0,1,2}CompleteCallback().
Bootstrapping: when all 3 mailboxes are idle RQCP=0 so TMEIE alone won't
fire; seed the HW directly from the main-loop path when
GetTxMailboxesFreeLevel()==3. ISR handles all subsequent frames.
CAN_IT_TX_MAILBOX_EMPTY is activated on each enqueue and deactivated when
the queue drains, keeping the interrupt silent at idle.
Adds four debug visibility hooks to the STM32F7 CAN TX ISR driver: - canTxDropped counter: incremented when SW TX queue (depth 32) is full and canardSTM32Transmit() silently drops a frame. Cumulative across the session; exposed via canardProtocolStatus_t.tx_dropped. - ESR register fields in canardProtocolStatus_t: tec (TX error counter), rec (RX error counter), lec (last error code) read from hcan1.Instance->ESR. Reads BusOff/ErrorPassive atomically with the ESR snapshot. - canardSTM32GetTxQueueFillLevel(): returns current SW queue depth as (head - tail + TX_QUEUE_SIZE) % TX_QUEUE_SIZE. - CLI 'dronecan' command (USE_DRONECAN guard): prints all of the above plus RX buffer fill level in a single readable block. Useful for polling during long-running hardware validation tests without needing the configurator or MSP.
- Add NVIC_PRIO_CAN = 4 to nvic.h; use it for CAN1_RX0 and CAN1_TX IRQs (was 0, violating NVIC_PRIO_MAX=1 and risking preemption of motor timers at priority 3) - Add canTxQueueHWM / canRxBufferHWM tracking; expose via canardProtocolStatus_t and dronecan CLI command as "N / 32 (hwm: N)" for both TX queue and RX buffer - Include drivers/nvic.h in canard_stm32f7xx_driver.c
…pace - Replace direct canTxQueuePop() in main-loop seed path with canTxDrainQueue() called under __disable_irq()/__enable_irq(), making the ISR the sole consumer of the SW TX queue at all times - Only call ActivateNotification if queue is non-empty after seeding, eliminating the spurious ISR invocation on single-frame bursts - Add forward declaration for canTxDrainQueue() - Strip trailing whitespace throughout file
Add __DMB() in canTxQueuePush() between writing frame data and advancing head, and in canTxQueuePop() between reading head and reading frame data. Prevents the M7 write buffer from reordering the head store before the frame data store is globally visible.
Enable TransmitFifoPriority (TXFP) so the peripheral transmits mailboxes in request order rather than mailbox-number order. Without TXFP, loading the next frame into mailbox 0 after it completes causes it to transmit before frames still pending in mailboxes 1 and 2, corrupting the toggle bit sequence of multi-frame UAVCAN transfers. With TXFP enabled, restore the while loop in canTxDrainQueue to fill all free mailboxes per ISR callback for full pipeline utilization. Also fix remaining code review findings: - Check HAL_CAN_AddTxMessage return value; count failures as dropped frames - Add CanTxQueue_t struct tag to match RxBuffer_t convention - Remove blank line artifact at top of onTransferReceived body
- Add volatile to RxBuffer writeIndex/readIndex: written in ISR and read in main loop, missing volatile allows compiler to cache stale values causing silent receive stalls - Peek-then-pop in canTxDrainQueue: advance tail only after HAL_CAN_ AddTxMessage succeeds so a HAL rejection never silently loses a frame - Move ActivateNotification inside critical section: eliminates race where ISR could drain queue and deactivate between __enable_irq and the ActivateNotification call, leaving a spurious notification armed - Snapshot tail before advancing head in canTxQueuePush: prevents ISR advancing tail between the HWM read and head write, which understated the high water mark - Remove dead canTxQueuePop (inlined into peek-then-pop drain logic) - Remove dead RxFrame_t rxMsg global - Fix 3-space indent on first if in canardSTM32Transmit - Fix missing space in ErrorPassive CLI output - Add missing newline at EOF
…eNotification error handling - Fix REC register extraction: ESR bits 7:0 are error flags, REC lives at bits 31:24 (CAN_ESR_REC_Pos=24). Previous code always read near- zero instead of the actual receive error counter. - Remove canTxDropped increment from ISR drain path: a HAL_CAN_ AddTxMessage failure leaves tail unadvanced so the frame is retried on the next ISR — it is not dropped. Incrementing the drop counter here gave wrong semantics and created a dual-writer race on the volatile uint16_t counter. - Check ActivateNotification return in seed path: failure means the TX ISR will never fire for in-flight completions, stalling the queue with no indication. Return 0 on failure so the DroneCAN task retries.
…e access Replace direct struct field access in canTxDrainQueue with two new queue functions: canTxQueuePeek() returns a const pointer to the tail frame (including the DMB barrier) without consuming it, and canTxQueueConsume() advances the tail after the HW accepts the frame. This keeps queue invariants — barrier placement, index arithmetic, and the peek-then-consume ordering — inside the queue API rather than spread across canTxDrainQueue.
…, flag CLI denominators - Fix ESR bit-range annotation on rec field in header: was ESR[7:0] (error flags), correct position is ESR[31:24] per CAN_ESR_REC_Pos=24. Extraction code was already correct; wrong comment risked a future maintainer reverting it. - Add comment explaining why TxMailboxAbortCallback is not wired: AutoRetransmission=ENABLE prevents ALST; TERR leaves the frame in the queue (tail not advanced) so the next Transmit call re-seeds the HW. - Add coupling comments on hardcoded /32 denominators in CLI output so they stay in sync if TX_QUEUE_SIZE or RX_BUFFER_SIZE ever changes.
Add @brief/@param/@RetVal doc blocks to all previously undocumented functions in canard_stm32f7xx_driver.c. Fix stale FDCAN1/hfdcan copy-paste in the canardSTM32CAN1_Init block — this is bxCAN, not FDCAN.
Group functions by subsystem in all three files to aid navigation: F7 driver: Initialization → RX subsystem → TX subsystem → Diagnostics. Removes forward declarations (each function now defined before use). H7 driver: Initialization → RX → TX → Diagnostics. Same section layout as F7 for consistency. Also adds missing canardSTM32GetTxQueueFillLevel stub (declared in shared header, never implemented for H7 — caused undefined-reference link error on MATEKH743). dronecan.c: Message handlers → Libcanard callbacks → Message senders → TX/periodic processing → Public API. No logic changed. Both MATEKF765SE and MATEKH743 build clean.
…ions Public API (dronecanInit, dronecanUpdate) at the top, internals below: TX/periodic processing → message senders → libcanard callbacks → message handlers. Add forward declarations for all internal functions. Mark all internal functions static.
- canardSTM32Transmit: return 1 after successful canTxQueuePush regardless of ActivateNotification result — prevents libcanard from re-pushing the same frame and producing duplicate CAN transmissions that corrupt the UAVCAN transfer-ID toggle sequence - handle_GetNodeInfo: replace printf() with LOG_DEBUG() — printf has no output backend in firmware and risks a fault or linker error - rxBufferPushFrame/rxBufferPopFrame: change return type uint8_t → int8_t so the -1 error sentinel is type-correct
Issue 4 — AbortCallback comment: previous text incorrectly described TERR as a synchronous AddTxMessage failure and misattributed the abort path suppression. Correct explanation: AutoRetransmission=ENABLE causes the hardware to retransmit automatically on arbitration loss (ALST), so the software abort path is never triggered; bus-off is handled by the state machine in dronecanUpdate(). Issue 5 — critical section: replace __disable_irq()/__enable_irq() with ATOMIC_BLOCK(NVIC_PRIO_CAN) from build/atomic.h. This raises BASEPRI to mask only the CAN TX ISR (priority 4 and below), allowing higher-priority IRQs such as the gyro timer (priority 3) to continue firing during the mailbox-seeding window. Also simplifies the two return-1 paths into one.
- Fix 7: Clarify SJW=3 comment in canardSTM32ComputeTimings - Fix 8: Remove commented-out CubeMX scaffold from canardSTM32CAN1_Init - Fix 9: Convert tab indentation to 4-space in canardSTM32Recieve (F7) - Fix 10: Remove USER CODE BEGIN/END sentinel comments from H7 FDCAN1_Init - Fix 11: Replace Unicode section separators with ASCII dashes in all three files
… and 12) - Fix 6: Clarify hardcoded 32 denominators in CLI status output — add comments naming the matching TX_QUEUE_SIZE / RX_BUFFER_SIZE constants - Fix 12: Remove WHAT-comment "// Transmitting" from processCanardTxQueue
F7 driver: - Add __DMB() before writeIndex advance in rxBufferPushFrame (ISR→main memory ordering: ensures frame data visible before consumer sees new index) - Add __DMB() before memcpy in rxBufferPopFrame (acquire barrier: ensures ISR's writes are visible before main loop reads frame data) - Check canardSTM32ComputeTimings() return value; return error if no valid timing solution found for the requested bitrate - Mark rxBufferPushFrame, rxBufferPopFrame, rxBufferNumMessages static H7 driver: - Fix prescaler upper bound: 1024 → 512 (FDCAN NBTP.NBRP is 9-bit) - Zero-initialize canardProtocolStatus_t in GetProtocolStatus so TEC/REC/LEC/queue fields are not garbage on H7 builds - Remove always-true DataLength guard in canardSTM32Transmit; libcanard constrains data_len to 0-8 so the else branch was unreachable
- Mark canard and memory_pool static in dronecan.c (internal state, no reason for external linkage) - Fix spelling: "incremeneted" -> "incremented" in dronecan.c - Clarify H7 FDCAN_FILTER_DUAL comment: the specific filter is a no-op because ConfigGlobalFilter accepts all non-matching frames
Both F7 (max_quanta_per_bit=18) and H7 (max_quanta_per_bit=17) differ from the reference paper maximum of 17. Mark both for revisit alongside SJW tuning — SJW=3 was set conservatively and should be validated against actual crystal performance on the board.
Per INAV coding standards — dead debug logging should be deleted, not left commented out.
The raw sjw=3 value is written to the HAL without the N-1 offset that BS1/BS2 correctly apply, so the hardware actually runs 4TQ SJW not 3TQ. Mark with TODO for the planned SJW/timing tuning pass.
F7 driver: - Fix false-full race in canTxQueuePush: read tail fresh at the full-check so a concurrent ISR drain cannot cause a spurious TX drop; move snapshot after the push for HWM calculation only H7 driver: - Replace ErrorCode/!= 1 timing check with if (!canardSTM32ComputeTimings()) to match F7 pattern and remove fragile bool-stored-as-int16_t - Fix @RetVal for canardSTM32CAN1_Init: documents CANARD_OK (0) on success, not 1 as previously stated - Remove stray USER CODE BEGIN FDCAN1_Init 2 CubeMX sentinel
- Set optional_field_flags to OPTIONAL_FIELD_FLAG_VCS_COMMIT in handle_GetNodeInfo: the field is a bitmask indicating which optional fields are valid, not a storage field for patch version - Remove stale TODO comment from handle_GetNodeInfo - Drop '/ 32' denominator from CLI TX queue and RX buffer lines: value is misleading on H7 (no software queue exists); fill level alone is sufficient on both targets
- Remove rhetorical question comments from GPIO init in F7 and H7 drivers (AF selection is target-specific by design; cmake selects the right driver) - Remove "Wheres the data?" comment from canardSTM32Recieve in F7 driver - Replace "unsure about this one" on H7 TX header fields with factual notes: ESI_ACTIVE (not error-passive), NO_TX_EVENTS (no FIFO allocated), MessageMarker=0 (unused without TX events) - Normalize tab/space mixed indentation in dronecanInit in dronecan.c
- Fix F7 canardSTM32CAN1_Init @RetVal: CANARD_OK is 0, not 1 - Enable AutoRetransmission on H7 to match F7; without it frames lost to bus arbitration are silently dropped, breaking multi-frame transfers - Add volatile qualifier to RX buffer function parameters so the compiler cannot cache index reads across ISR/main-loop boundary - Rename canardSTM32Recieve -> canardSTM32Receive (fix persistent misspelling in header, F7/H7/SITL drivers, and dronecan.c call site)
…N drivers - Correct file header names in canard_stm32f7xx_driver.c and canard_stm32h7xx_driver.c - Remove leading space before #ifdef CAN1_STANDBY in both drivers - Replace all tabs with spaces in dronecan.c (RX loop, 1Hz service block, processCanardTxQueue body)
…neCAN drivers
- Add canardSTM32GetTxQueueFillLevel() stub to canard_sitl_driver.c to fix
linker error when building SITL target with USE_DRONECAN defined
- Add memset in sitlCANGetStatsStub and sitlCANGetStatsSocketCAN to zero all
canardProtocolStatus_t fields, preventing stack garbage in CLI output
- Fix stale comment in processCanardTxQueue: "TX FIFO full" -> "SW TX queue full"
- Remove redundant LOG_DEBUG("Battery Info") from onTransferReceived
- Fix 6-space indent to 4-space in CAN1_RX0_IRQHandler
…ership docs - rxBufferPopFrame: snapshot readIndex before __DMB() so the barrier precedes the data dereference, matching the push-side store-release pattern - canardSTM32Transmit: seed HW mailboxes when any mailbox is free (> 0) rather than only when all three are idle (== 3), preventing a permanent TX stall when HAL_CAN_AddTxMessage fails with the SW queue non-empty - Add counter ownership comments: canTxDropped/canTxQueueHWM written by main loop, canRxBufferHWM written by ISR; all are single-instruction atomic reads - H7 AutoRetransmission=ENABLE: add detailed comment explaining frame ordering guarantee (FDCAN TX FIFO is strictly ordered) and the TODO to evaluate whether DISABLE would be safer for the H7 polling model
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Issue with F4 builds on #11514 has been fixed. |
| header->DLC = frame->data_len; | ||
| header->RTR = (frame->id & CANARD_CAN_FRAME_RTR) ? CAN_RTR_REMOTE : CAN_RTR_DATA; | ||
| header->TransmitGlobalTime = DISABLE; | ||
| memcpy(data, frame->data, frame->data_len); | ||
| } |
There was a problem hiding this comment.
Fixed in commit caa6d63 — added data_len > 8 validation in canardSTM32Transmit() before the frame is accepted into the queue. Classic CAN max payload is 8 bytes.
| } | ||
|
|
||
| // If all mailboxes are idle, RQCP will never fire to start the ISR chain. | ||
| // Seed the HW via canTxDrainQueue while blocking only the CAN TX ISR (not |
There was a problem hiding this comment.
Fixed in commit caa6d63 — corrected comment to accurately describe that ATOMIC_BLOCK(NVIC_PRIO_CAN) masks all interrupts at priority >= NVIC_PRIO_CAN via BASEPRI, not just CAN TX ISR.
| static bool canTxQueuePush(const CanardCANFrame *frame) { | ||
| uint8_t next = (canTxQueue.head + 1) % TX_QUEUE_SIZE; | ||
| // Read tail fresh: ISR only advances tail, so a fresh read can only show more space, | ||
| // never less — eliminates false-full when ISR drains a slot concurrently. | ||
| if (next == canTxQueue.tail) { | ||
| canTxDropped++; | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Fixed in commit caa6d63 — added clarifying comment that usable capacity is TX_QUEUE_SIZE - 1 (31) due to the one-slot-empty scheme.
|
My bot suggested that the following may be worth looking at: HAL_CAN_DeactivateNotification() called from ISR — This HAL function may use __HAL_LOCK() internally, which is not reentrant. Should use the underlying __HAL_CAN_DISABLE_IT() macro instead, which is interrupt-safe. SJW encoding inconsistency — SJW is stored differently from BS1/BS2 (no N-1 offset) creating a silent discrepancy in bit timing configuration. The TODO comment Empty switch cases in handle_NodeStatus() — dead code disguised as live code; replace with a TODO comment or remove entirely? volatile overuse — canTxDropped and canTxQueueHWM are main-loop-only but declared volatile alongside the genuinely ISR-shared canRxBufferHWM, potentially misleading future readers. The overall SPSC architecture is sound. The optional_field_flags fix, AutoRetransmission enable on H7, TXFP=ENABLE, removing printf from handler context, and the FDCAN prescaler range correction are all correct. |
Update comment to reflect unique ID used instead of dummy value. Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
|
Test firmware build ready — commit Download firmware for PR #11560 234 targets built. Find your board's
|
…#11560 - Replace empty switch cases in handle_NodeStatus with (void)nodeStatus + TODO - Remove volatile from main-loop-only canTxDropped/canTxQueueHWM - Use __HAL_CAN_DISABLE_IT() instead of HAL_CAN_DeactivateNotification() in canTxDrainQueue (ISR context safety)
|
All items from your bot's review have been addressed and hardware-tested: 1. HAL_CAN_DeactivateNotification() in ISR — Fixed in commit 5e5f0ff. Replaced with 2. SJW encoding — Left as-is. Existing TODO acknowledges it, and as you noted, not a blocker. 3. Empty switch cases in handle_NodeStatus() — Fixed in commit 5e5f0ff. Replaced with 4. volatile overuse on canTxDropped/canTxQueueHWM — Fixed in commit 5e5f0ff. Removed Hardware test: PASS — Successful CAN node enumeration and telemetry on F765SE with the fixes applied. |
Summary
Migrates the STM32F7 bxCAN TX implementation from polling/blocking to ISR-driven transmission using a 32-deep SPSC ring queue. Also fixes several latent bugs found in H7 and SITL drivers during review.
Changes
canardSTM32GetTxQueueFillLevelstub, improved diagnosticscanardSTM32GetTxQueueFillLevelstub (was causing linker error), fixed uninitialized status fieldscanardSTM32Receive(was "Recieve")optional_field_flagsmisuse, made canard/memory_pool static, removed printf from ISR contextdronecancommand showing TX/RX queue stats and protocol statusTesting
ISR-driven TX verified queuing and transmitting frames correctly. SPSC ring buffer (32-deep) operating with no overruns. CLI
dronecancommand confirmed showing live queue fill levels and protocol status. No regression in RX or frame parsing.F4 build failure was a pre-existing issue in #11514 (
__FPU_PRESENTredefined incmake/stm32f4.cmake). The sameSYSTEM_INCLUDE_DIRECTORIESfix applied tocmake/stm32f7.cmake(commit 37e6b23) needs to be applied tocmake/stm32f4.cmake. This has been resolved in #11514.Known Deferred Items (Non-Blocking)
max_quanta_per_bit=18on F7 vs paper max of 17 (H7 uses 17) — TODO comment in placehandle_NodeStatus/handle_GNSSRCTMStream— scaffolding/silent discard, follow-up neededNone of these block merging.