
Uncache Load Processing Unit LoadQueueUncache

Update Time | Code Version | Updater | Notes
2025.02.26 | eca6983 | Maxpicca-Li | Initial version

Functional Description

The LoadQueueUncache module, along with the Uncache module, acts as an intermediate station between the LoadUnit pipeline and the bus for uncache load access requests. The Uncache module, being closer to the bus, performs functions detailed in Uncache. The LoadQueueUncache module, being closer to the pipeline, is responsible for the following:

  1. Receiving uncache load requests from LoadUnit pipeline modules 0, 1, and 2.
  2. Selecting uncache load requests that are ready for uncache access and sending them to the Uncache Buffer.
  3. Receiving processed uncache load requests from the Uncache Buffer.
  4. Returning processed uncache load requests to the LoadUnit.

Structurally, LoadQueueUncache currently contains 4 UncacheEntry items (the number is configurable). Each item independently handles one request and uses a set of state registers to control its specific processing flow. A FreeList manages the allocation and release status of the items. LoadQueueUncache itself mainly coordinates the overall logic of new item allocation, request selection, response dispatch, and dequeueing across the 4 items.

Feature 1: Enqueue Logic

LoadQueueUncache receives requests from LoadUnit modules 0, 1, and 2; each request is either an MMIO request or an NC request. The requests are first sorted by robIdx in program order (oldest to newest) so that the oldest requests are allocated to free items first, preventing deadlocks that could otherwise arise in corner cases from the rollback of older requests. A request is accepted for processing only if it is not a re-send and carries no exception; accepted requests are then assigned items in order, according to the free items available in the FreeList.

When LoadQueueUncache reaches its capacity limit and some requests remain unallocated, the oldest of these unallocated requests is selected for rollback.
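Below is a minimal Chisel sketch of this allocation-and-rollback idea. It is an illustration, not the XiangShan source: the requests are assumed to arrive already age-sorted, the re-send and exception checks are omitted, and the FreeList is reduced to a free-entry bitmap.

```scala
import chisel3._
import chisel3.util._

class UncacheEnqSketch(numEnq: Int = 3, numEntries: Int = 4) extends Module {
  val io = IO(new Bundle {
    val req      = Vec(numEnq, Flipped(Decoupled(UInt(64.W)))) // age-sorted uncache load requests
    val rollback = Output(Valid(UInt(64.W)))                   // oldest request left without an entry
  })

  // 1 = entry is free; the release path on dequeue is omitted in this sketch.
  val freeMask = RegInit(((1 << numEntries) - 1).U(numEntries.W))

  // Walk the (already age-sorted) requests oldest-first; each takes one free entry.
  var remaining = freeMask
  val accepted = io.req.map { r =>
    val canAlloc = remaining.orR
    r.ready  := canAlloc
    val take  = r.valid && canAlloc
    remaining = Mux(take, remaining & (remaining - 1.U), remaining) // clear lowest set bit
    take
  }
  freeMask := remaining

  // The oldest request that is valid but received no entry triggers a rollback.
  val failed = io.req.zip(accepted).map { case (r, ok) => r.valid && !ok }
  io.rollback.valid := VecInit(failed).asUInt.orR
  io.rollback.bits  := PriorityMux(failed, io.req.map(_.bits))
}
```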

Feature 2: Dequeue Logic

When an item completes the Uncache access operation and returns to the LoadUnit, or is flushed by a redirect, that item is dequeued and its flag in the FreeList is released. Multiple items may be dequeued in the same cycle. Requests returned to the LoadUnit are selected in the first cycle and returned in the second cycle.

The LoadUnit ports that can accept uncache return requests are fixed. Currently, MMIO requests return only to LoadUnit 2, while NC requests can return to LoadUnit 1 or 2. When more than one return port is available, the entry ID modulo the number of ports determines which LoadUnit port an item may return through, and for each port one item is selected from its candidates to be written back.
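The following Chisel sketch (assumed signal names, not the actual source) illustrates the modulo-based port mapping for the multi-port NC case; the MMIO path, which uses a single fixed port, is left out.

```scala
import chisel3._
import chisel3.util._

class UncacheDeqSketch(numEntries: Int = 4, numWbPorts: Int = 2) extends Module {
  val io = IO(new Bundle {
    val entryReady = Input(Vec(numEntries, Bool()))      // entry has a result to write back
    val entryData  = Input(Vec(numEntries, UInt(64.W)))
    val wb         = Vec(numWbPorts, Valid(UInt(64.W)))  // one writeback channel per LoadUnit port
  })

  for (port <- 0 until numWbPorts) {
    // An entry is a candidate for this port only if entryId % numWbPorts == port.
    val cand = (0 until numEntries).map(i => io.entryReady(i) && (i % numWbPorts == port).B)
    io.wb(port).valid := VecInit(cand).asUInt.orR
    io.wb(port).bits  := PriorityMux(cand, io.entryData) // pick one candidate for this port
  }
}
```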

Feature 3: Uncache Interaction Logic

(1) Sending req

In the first cycle, one request that is ready for uncache access is selected; in the second cycle, it is sent to the Uncache Buffer. The sent request carries the ID of the selected item, referred to as mid. Whether the request is accepted is indicated by req.ready.
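A minimal Chisel sketch of this two-cycle send path is given below (names such as entryCanSend are assumptions for illustration): one ready entry is selected, latched in an intermediate register, and presented to the Uncache Buffer together with its mid until the request is accepted.

```scala
import chisel3._
import chisel3.util._

class UncacheSendSketch(numEntries: Int = 4) extends Module {
  val io = IO(new Bundle {
    val entryCanSend = Input(Vec(numEntries, Bool()))   // entries ready for uncache access
    val entryReq     = Input(Vec(numEntries, UInt(64.W)))
    val uncacheReq   = Decoupled(new Bundle {           // request to the Uncache Buffer
      val mid  = UInt(log2Ceil(numEntries).W)           // index of the selected entry
      val data = UInt(64.W)
    })
  })

  // Cycle 1: pick one ready entry.
  val selValid = io.entryCanSend.asUInt.orR
  val selIdx   = PriorityEncoder(io.entryCanSend)

  // Cycle 2: hold the selection in an intermediate register until the Uncache Buffer takes it.
  val bufValid = RegInit(false.B)
  val bufMid   = Reg(UInt(log2Ceil(numEntries).W))
  val bufData  = Reg(UInt(64.W))

  when(io.uncacheReq.fire) { bufValid := false.B }
  when(selValid && (!bufValid || io.uncacheReq.fire)) {
    bufValid := true.B
    bufMid   := selIdx
    bufData  := io.entryReq(selIdx)
  }

  io.uncacheReq.valid     := bufValid
  io.uncacheReq.bits.mid  := bufMid
  io.uncacheReq.bits.data := bufData
}
```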

(2) Receiving idResp

If the sent request is received by the Uncache Buffer, the Uncache's idResp will be received in the cycle following the reception. This response contains mid and the entry ID allocated by the Uncache Buffer for this request (referred to as sid). LoadQueueUncache uses mid to find the corresponding internal item and stores sid in that item.

(3) Receiving resp

After the Uncache Buffer completes the bus access for the request, it returns the access result to LoadQueueUncache. This response contains sid. Considering the merge characteristic of the Uncache Buffer (detailed enqueue merge logic can be found in Uncache), one sid may correspond to multiple items in LoadQueueUncache. LoadQueueUncache uses sid to find all relevant internal items and passes the access result to these items.
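The sketch below (assumed names and widths, not the actual source) shows the mid/sid bookkeeping described in (2) and (3): idResp uses mid to address one entry and store its sid, while resp uses sid to broadcast the access result to every entry that recorded the same sid.

```scala
import chisel3._
import chisel3.util._

class UncacheRespSketch(numEntries: Int = 4, sidWidth: Int = 3) extends Module {
  val io = IO(new Bundle {
    val idResp = Input(Valid(new Bundle {
      val mid = UInt(log2Ceil(numEntries).W)  // LoadQueueUncache entry index
      val sid = UInt(sidWidth.W)              // Uncache Buffer entry id
    }))
    val resp = Input(Valid(new Bundle {
      val sid  = UInt(sidWidth.W)
      val data = UInt(64.W)
    }))
    val entryDone = Output(Vec(numEntries, Bool()))
  })

  val slaveAccept = RegInit(VecInit(Seq.fill(numEntries)(false.B)))
  val slaveId     = Reg(Vec(numEntries, UInt(sidWidth.W)))

  // idResp: the entry addressed by mid records the Uncache Buffer id it was given.
  when(io.idResp.valid) {
    slaveAccept(io.idResp.bits.mid) := true.B
    slaveId(io.idResp.bits.mid)     := io.idResp.bits.sid
  }

  // resp: every accepted entry whose sid matches receives the access result.
  for (i <- 0 until numEntries) {
    io.entryDone(i) := io.resp.valid && slaveAccept(i) && slaveId(i) === io.resp.bits.sid
  }
}
```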

Overall Block Diagram

LoadQueueUncache Overall Block Diagram

Interface Timing

Enqueue Interface Timing Example

As shown in the figure below, assume 5 consecutive NC requests enter through LoadUnit 0/1/2 in order, and LoadQueueUncache currently has only 4 items. The first four requests are therefore allocated to free items as normal. r5, which appears in the 3rd cycle, cannot be allocated an item because the buffer is full, and thus causes a rollback in the 5th cycle. Note that the figure assumes the NC requests of each cycle arrive already in program order, i.e., r1 < r2 < r3 and r4 < r5; if sorting is needed, replace io_req with the sorted result, and the rest of the logic remains the same.

LoadQueueUncache Enqueue Interface Timing Diagram

Dequeue Interface Timing Example

The figure below shows the cases for mmioOut, one ncOut per cycle, and two ncOut simultaneously in one cycle. Let's explain the first example in detail: in the 2nd cycle, the writeback item is selected, the freeList is updated, and the request is registered for one cycle before being written back to the LoadUnit in the 3rd cycle. Subsequent examples can be understood similarly.

LoadQueueUncache Dequeue Interface Timing Diagram

Uncache Interface Timing Example

(1) Without outstanding support, only one uncache access can be in flight at a time (controlled by io_uncache_req_ready): no new request is sent until the uncache response has been received. As shown below, when io_uncache_req_ready is asserted high in the 5th cycle, the uncache request is sent out. The Uncache receives the request in the 6th cycle and returns idResp in the 7th cycle. After some access latency, the Uncache access result is received in cycle 10+n.

LoadQueueUncache and Uncache Interface Timing Diagram

(2) With outstanding support, multiple uncache accesses can be in flight at the same time (still controlled by io_uncache_req_ready). As shown below, requests m1, m2, m3, and m4 are sent back to back. In the 4th and 5th cycles, the Uncache allocation responses (idResp) for the first two requests are received. At this point the Uncache is full, m3 is held in the intermediate register, and m4 waits for io_uncache_req_ready to be asserted high. In cycle 9+n, io_uncache_req_ready is asserted high and m4 is also sent out. The idResp responses for m3 and m4 are received in cycles 10+n and 11+n, respectively. The Uncache access responses arrive gradually in subsequent cycles.

LoadQueueUncache and Uncache Interface Timing Diagram when outstanding

UncacheEntry Module

UncacheEntry is responsible for independently processing the lifecycle of a request and uses a set of state registers to control its specific processing flow. The key structures are as follows:

  • req_valid: Indicates whether the item is valid.
  • req: Stores all relevant contents of the received request.
  • uncacheState: Records the current life stage of the item.
  • slaveAccept, slaveId: Record whether the item has been accepted by the Uncache Buffer and the Uncache Buffer entry ID allocated to it.
  • needFlushReg: Indicates whether the item needs a delayed flush.

Feature 1: Lifecycle and State Machine

The lifecycle of each UncacheEntry can be fully described by uncacheState. It includes the following states:

  • s_idle: The default state, indicating no request, or the request exists but is not yet ready to be sent to the Uncache Buffer.
  • s_req: Indicates that the conditions for sending the request to the Uncache Buffer are met, and the item is waiting to be selected by LoadQueueUncache and accepted by its intermediate register. (In principle the request should be accepted by the Uncache Buffer itself, but after selection LoadQueueUncache holds the request in an intermediate register for one cycle before sending it on; if the Uncache Buffer does not accept it, it simply stays in that register.) The UncacheEntry is unaware of this intermediate register; from its point of view the request has been sent and successfully accepted.
  • s_resp: Indicates that the request has been received by the intermediate register, waiting for the Uncache Buffer to return the access result.
  • s_wait: Indicates that the access result from the Uncache Buffer has been received, waiting to be selected by LoadQueueUncache and received by the LoadUnit.

The state transition diagram is shown below, where black indicates the normal lifecycle of the item, and red indicates the item's lifecycle ending abnormally due to a redirect requiring the item to be flushed.

UncacheEntry Finite State Machine Diagram

For the normal lifecycle, the trigger events are detailed as follows (a state-machine sketch follows the list):

  • canSendReq: For MMIO requests, when the corresponding instruction reaches the head of the ROB, the Uncache access can be sent. For NC requests, when req_valid is valid, the Uncache access can be sent.
  • uncacheReq.fire: The item is received by the LoadQueueUncache intermediate register. The item will receive the idResp transferred from the Uncache Buffer in the next cycle and update slaveAccept and slaveId.
  • uncacheResp.fire: The item receives the access result returned by the Uncache Buffer.
  • writeback: When in the s_wait state, a writeback request can be sent. The writeback signals for MMIO requests and NC requests are different and need to be distinguished.
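The following Chisel sketch reproduces this state machine using the state names above; it is an illustration, not the XiangShan code. The MMIO ROB-head condition and the separate MMIO/NC writeback channels are reduced to single input signals.

```scala
import chisel3._
import chisel3.util._

class UncacheEntryFsmSketch extends Module {
  val io = IO(new Bundle {
    val reqValid        = Input(Bool())  // entry holds a valid request
    val canSendReq      = Input(Bool())  // mmio: at ROB head; nc: req_valid is set
    val uncacheReqFire  = Input(Bool())  // request accepted by the intermediate register
    val uncacheRespFire = Input(Bool())  // access result returned by the Uncache Buffer
    val writebackFire   = Input(Bool())  // result accepted by the LoadUnit
    val state           = Output(UInt(2.W))
  })

  val s_idle :: s_req :: s_resp :: s_wait :: Nil = Enum(4)
  val uncacheState = RegInit(s_idle)

  switch(uncacheState) {
    is(s_idle) { when(io.reqValid && io.canSendReq) { uncacheState := s_req } }
    is(s_req)  { when(io.uncacheReqFire)            { uncacheState := s_resp } }
    is(s_resp) { when(io.uncacheRespFire)           { uncacheState := s_wait } }
    is(s_wait) { when(io.writebackFire)             { uncacheState := s_idle } }
  }

  io.state := uncacheState
}
```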

Feature 2: Redirect Flush Logic

An abnormal end to an item's lifecycle is usually caused by a pipeline redirect.

When a pipeline redirect signal is received, each item checks whether it is newer than the redirect. If so, it must be flushed, and a needFlush signal is generated. Normally, all contents of the item are flushed immediately and the item is reclaimed by the FreeList. However, uncache requests and responses must correspond one-to-one to the same uncache load request; therefore, if the item has already sent an uncache request, its lifecycle can only end after the Uncache response is received, producing a "delayed flush". In that case the needFlush signal is stored in the needFlushReg register; the flush is performed, and needFlushReg cleared, only when the Uncache response arrives.
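The following Chisel sketch (assumed names) captures this immediate-versus-delayed flush decision for a single entry.

```scala
import chisel3._

class UncacheFlushSketch extends Module {
  val io = IO(new Bundle {
    val needFlush       = Input(Bool())  // this entry is newer than the redirect
    val reqInFlight     = Input(Bool())  // uncache request sent, response still pending
    val uncacheRespFire = Input(Bool())  // response for this entry received this cycle
    val flushNow        = Output(Bool()) // release the entry this cycle
  })

  val needFlushReg = RegInit(false.B)

  // Flush immediately if no request is in flight; otherwise wait for the response.
  val immediate = io.needFlush && !io.reqInFlight
  val delayed   = (needFlushReg || io.needFlush) && io.uncacheRespFire

  when(delayed) {
    needFlushReg := false.B            // delayed flush performed, clear the pending flag
  }.elsewhen(io.needFlush && io.reqInFlight) {
    needFlushReg := true.B             // remember the flush until the response arrives
  }

  io.flushNow := immediate || delayed
}
```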

Feature 3: Exception Cases

Exception cases in LoadQueueUncache include:

  1. The bus returns corrupt or denied for the request. This exception is marked when the UncacheEntry is written back and is then handled by the LoadUnit.