
idma: Streamline TCDM connection, enable multi-channel operation (NEW) #322

Draft
gbellocchi wants to merge 33 commits into pulp-platform:main from gbellocchi:gb/idma_tcdm_multich


@gbellocchi


Description

This PR updates the iDMA integration within the Snitch cluster, as shown in Figures 1 and 2.
It is a refreshed version of PR #238, which had fallen behind the main branch of the Snitch cluster after being open for a long time.

This includes:

  • The wide port from the SoC directly connects to the TCDM subsystem, allowing external data access while DMA transfers run.
  • The wide DMA XBAR has been simplified.
  • The DMA uses the OBI protocol to connect to the TCDM subsystem.
  • Arbitration among the DMA channels, the superbanks, and the SoC port now happens in the TCDM subsystem, increasing flexibility and throughput. With minimal changes, multiple DMA cores can be instantiated in the cluster.
  • DMINIT support is added to the Snitch cluster.
Figure 1: Old integration of `idma`. The wide AXI4 XBAR interconnect is used for: (i) the DMA interfaces to the TCDM and SoC; (ii) the NoC wide in/out ports; (iii) the I$; (iv) the zero memory; and (v) the BootROM.
Figure 2: New integration of `idma`. The wide AXI4 XBAR interconnect is simplified: (i) the DMA connects to the TCDM via OBI, while its wide AXI4 requests to other clusters go through the AXI4 XBAR (NoC wide out); (ii) the AXI4 TCDM port is removed, and the NoC wide inputs from external DMAs connect directly to the TCDM subsystem, bypassing the XBAR; (iii) the zero memory is removed, since the iDMA now supports memset initialization of the TCDM.
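The PR does not state which arbitration policy the TCDM subsystem uses among the DMA channels and the SoC port. Purely as an illustration, the Python sketch below assumes a per-superbank round-robin arbiter over two DMA channels plus the SoC wide port; the class name `RoundRobinArbiter`, the requester mapping, and the round-robin policy itself are assumptions, not taken from the `snitch_cluster` RTL.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Hypothetical per-superbank arbiter (assumed policy): grants at most one
    requester per cycle and rotates priority past the last winner."""

    def __init__(self, num_requesters: int) -> None:
        self.n = num_requesters
        self.last = self.n - 1  # start so that requester 0 has top priority

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        assert len(requests) == self.n
        # Scan starting just after the previous winner.
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None  # nobody requested this cycle


# Requesters 0 and 1: DMA channels; requester 2: SoC wide port (hypothetical mapping).
arb = RoundRobinArbiter(3)
grants = [arb.arbitrate([True, True, True]) for _ in range(4)]
print(grants)  # [0, 1, 2, 0]
```

With all three requesters busy, each one is granted once every three cycles, so in this model no DMA channel can starve the SoC port; the actual interconnect may use a different scheme.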

Tasks

I have collected the tasks and reviewer comments/suggestions from PR #88:

gbellocchi and others added 30 commits June 2, 2026 23:12
* `axi_zero_mem` is removed because the iDMA can now initialize memory to a desired value.
* Modify the wide cluster XBAR address map to reflect the removal of the zero memory and the new iDMA-TCDM integration.
* Update the cluster and DMA enums in `snitch_pkg`.
The wide `soc_in_axi_req` port connects directly to the TCDM subsystem, bypassing the wide AXI cluster XBAR.
* `snitch_cluster`: Add memory and OBI typedefs.
* `snitch_cluster`: Add OBI-to-TCDM protocol conversion for DMA requests toward the TCDM subsystem.
* `snitch_cluster`: Update the interface of the `snitch_cc` instance.
* `snitch_cc`: Instantiate `idma` with OBI interfaces.
idma: Fix dminit opcode encoding and add TCDM tests
* Avoid flattening arrays in the TCDM DMA interconnect.
* Fix a deadlock in TCDM-to-TCDM iDMA transfers, caused by the absence of `p_valid` for write transactions.

* Add a write-pipeline shift register (mirroring `id_pipeline`) that tracks whether each in-flight slot is a write.
* Add a GUI version of `vsim` to the simulators supported by `run.py`.

* Add a `wave-file` argument to specify a wave file that is sourced automatically when launching the `vsim-gui` simulator.

* Add support for the wave argument in the generated `snitch_cluster.vsim.gui` script.
* Tie off `obi_dma_req_o` when no DMA is instantiated in `snitch_cc`.
* Tie off undriven signals to avoid undefined behavior.

* Add documentation for the OBI-to-TCDM bridge.
* hw: Update hw configuration files and templates.

* hw: Remove leftover commented-out address-remapping lines in `snitch_cluster.sv` after the zero-memory removal.

* sw: Update experiment JSON configuration files.
* Remove the hardcoded assumption that the DM core is core 8.

* Update the DMA wait API.
* Map arrays to L1 to test L1-to-L1 DMA transfers, using `snrt_l1_alloc()` to allocate the `src` and `dst` arrays.

* Extend the range of traffic sizes to reproduce the L1-to-L1 deadlock observed with other kernels (`exp`).
* This test checks that TLS is properly initialized by the runtime (sanity check) and can be modified at application time (core isolation).
* This test reproduces the traffic patterns generated by `snrt_init_tls` at runtime.

* Parameters are currently tuned for the `exp` kernel.
* TCDM does not issue a response for write transactions, whereas OBI requires
an R-channel acknowledgement for both reads and writes. A shift-register
pipeline is added to track each accepted transaction and drive a write
acknowledgement exactly `MemRespLat` cycles after the corresponding grant.
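To make the timing concrete, here is a minimal cycle-level Python model of the write-acknowledgement pipeline described above. It is a sketch under stated assumptions: the class `WriteAckPipeline` and its `step()` method are hypothetical, each granted request is recorded as an `(id, is_write)` pair, and the `"read_data"` tag stands for forwarding the TCDM read response. It illustrates the idea, not the SystemVerilog bridge itself.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class WriteAckPipeline:
    """Hypothetical model: TCDM answers reads after mem_resp_lat cycles but
    never answers writes, so a shift register remembers which granted slots
    were writes and synthesizes an OBI response for them."""

    def __init__(self, mem_resp_lat: int) -> None:
        assert mem_resp_lat >= 1
        # One entry per stage: None = empty slot, else (txn_id, is_write).
        self.stages: Deque[Optional[Tuple[int, bool]]] = deque([None] * mem_resp_lat)

    def step(self, granted: Optional[Tuple[int, bool]]) -> Optional[Tuple[int, str]]:
        """Advance one cycle. `granted` is the request granted this cycle (or
        None). Returns the OBI R-channel response leaving the pipeline, if any."""
        out = self.stages.popleft()
        self.stages.append(granted)
        if out is None:
            return None
        txn_id, is_write = out
        # Writes get a synthesized acknowledgement; reads forward TCDM data.
        return (txn_id, "write_ack" if is_write else "read_data")


pipe = WriteAckPipeline(mem_resp_lat=2)
trace = [pipe.step(g) for g in [(0, True), (1, False), None, None]]
print(trace)  # [None, None, (0, 'write_ack'), (1, 'read_data')]
```

Each transaction leaves the model exactly `mem_resp_lat` cycles after its grant, so writes and reads come back in order on the same R channel.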

* The iDMA OBI read backend issues back-to-back A-channel requests
regardless of `r_dp_ready_i`. With `MemRespLat=1`, consecutive grants
produce consecutive acknowledgements; if both arrive while `rready=0`, the
one-entry hold register overflows and the second acknowledgement is
silently dropped, causing the iDMA to stall indefinitely. To fix this,
a `can_accept` condition is added that withholds new grants when the hold
register is already occupied, or when a response is arriving that the
initiator cannot yet consume.
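The overflow and its fix can be reproduced with a small Python model of the `MemRespLat=1` case. This is a sketch: `simulate_obi_bridge` and `gate_grants` are hypothetical names, the initiator is assumed to request every cycle, and the `can_accept` condition is paraphrased from the description above rather than copied from the RTL.

```python
from typing import List, Optional, Tuple


def simulate_obi_bridge(rready: List[bool], gate_grants: bool) -> Tuple[List[int], List[int]]:
    """Hypothetical MemRespLat = 1 model: the initiator requests every cycle,
    each grant yields a response one cycle later, and the bridge has a
    single-entry hold register. Returns (delivered ids, dropped ids)."""
    hold: Optional[int] = None     # one-entry hold register
    resp_in: Optional[int] = None  # response arriving this cycle (granted last cycle)
    next_id = 0
    delivered: List[int] = []
    dropped: List[int] = []

    for ready in rready:
        # Paraphrased `can_accept`: no grant while the hold register is full
        # or while an arriving response cannot be consumed this cycle.
        can_accept = hold is None and not (resp_in is not None and not ready)
        grant = (not gate_grants) or can_accept

        # Response path: the held response is always presented first.
        if hold is not None:
            if ready:
                delivered.append(hold)
                hold = None
            if resp_in is not None:
                if hold is None:
                    hold = resp_in           # refill the freed slot
                else:
                    dropped.append(resp_in)  # overflow: acknowledgement lost
        elif resp_in is not None:
            if ready:
                delivered.append(resp_in)
            else:
                hold = resp_in

        # A grant in this cycle produces a response in the next cycle.
        resp_in = next_id if grant else None
        if grant:
            next_id += 1

    return delivered, dropped


trace = [False, False, False, True, True, True, True, True]
print(simulate_obi_bridge(trace, gate_grants=False))  # ([0, 2, 3, 4, 5], [1])
print(simulate_obi_bridge(trace, gate_grants=True))   # ([0, 1, 2, 3], [])
```

Without gating, response 1 is lost while `rready` is low; in hardware this corresponds to the iDMA waiting for an acknowledgement that never arrives. With gating, nothing is dropped, at the cost of fewer grants in this trace while the initiator is back-pressuring.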
