Uncache Load Processing Unit LoadQueueUncache
| Update Time | Code Version | Updater | Notes |
| --- | --- | --- | --- |
| 2025.02.26 | eca6983 | Maxpicca-Li | Initial version |
Functional Description
The LoadQueueUncache module, along with the Uncache module, acts as an intermediate station between the LoadUnit pipeline and the bus for uncache load access requests. The Uncache module, being closer to the bus, performs functions detailed in Uncache. The LoadQueueUncache module, being closer to the pipeline, is responsible for the following:
- Receiving uncache load requests from LoadUnit pipeline modules 0, 1, and 2.
- Selecting uncache load requests that are ready for uncache access and sending them to the Uncache Buffer.
- Receiving processed uncache load requests from the Uncache Buffer.
- Returning processed uncache load requests to the LoadUnit.
Structurally, LoadQueueUncache currently contains 4 UncacheEntry items (the number is configurable). Each item independently handles one request and uses a set of state registers to control its processing flow. A FreeList manages the allocation and deallocation status of the items. LoadQueueUncache itself primarily coordinates the overall logic of new item allocation, request selection, response dispatch, and dequeueing across the items.
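The allocation bookkeeping described above can be sketched as a small behavioral model. This is an illustrative Python model, not the Chisel RTL; the class and method names are assumptions, and the lowest-free-index policy is only one plausible choice.

```python
# Hypothetical behavioral model of the FreeList that tracks which of the
# (by default 4) UncacheEntry slots are free. Names are illustrative.
class FreeList:
    def __init__(self, size=4):
        self.free = set(range(size))  # all entries start unallocated

    def can_allocate(self):
        return bool(self.free)

    def allocate(self):
        # Hand out the lowest-numbered free entry (an assumed policy).
        idx = min(self.free)
        self.free.remove(idx)
        return idx

    def release(self, idx):
        # Dequeue or flush returns the entry to the pool.
        self.free.add(idx)
```

A full queue simply reports `can_allocate()` as false until some entry is released by dequeue or flush.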
Feature 1: Enqueue Logic
LoadQueueUncache is responsible for receiving requests from LoadUnit modules 0, 1, and 2. These requests can be either MMIO requests or NC requests. First, the system sorts the requests by `robIdx` in chronological order (oldest to newest), so that the oldest requests are prioritized for allocation to available items; this prevents deadlocks in special cases caused by rollback of older items. A request enters processing only if it is not a re-send and has no exceptions; qualifying requests are then allocated items sequentially from the free items in the FreeList.
When LoadQueueUncache reaches its capacity limit and some requests remain unallocated, the system selects the oldest of the unallocated requests for rollback.
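The enqueue rule above (sort oldest-first, filter re-sends and exceptions, allocate from the free pool, roll back the oldest loser) can be sketched as follows. This is a simplified behavioral model: the `Req` fields and the plain-integer `rob_idx` comparison (which ignores the circular ROB index wraparound) are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Req:
    rob_idx: int            # program-order index; smaller = older (wraparound ignored)
    is_resend: bool = False
    has_exception: bool = False

def enqueue(requests, free_entries):
    """One enqueue cycle, sketched: sort oldest-first by rob_idx, allocate
    free entries in that order, and report the oldest request left without
    an entry as the rollback target. free_entries is a list of free slot ids."""
    accepted, rollback = [], None
    for req in sorted(requests, key=lambda r: r.rob_idx):
        if req.is_resend or req.has_exception:
            continue  # only first-time, exception-free requests are processed
        if free_entries:
            accepted.append((free_entries.pop(0), req))
        elif rollback is None:
            rollback = req  # oldest request that could not be allocated
    return accepted, rollback
```

For example, with two free entries and three eligible requests, the two oldest get entries and the third becomes the rollback target.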
Feature 2: Dequeue Logic
When an item completes the Uncache access operation and returns to the LoadUnit, or is flushed by a redirect, that item is dequeued and its flag in the FreeList is released. Multiple items may be dequeued in the same cycle. Requests returned to the LoadUnit are selected in the first cycle and returned in the second cycle.
The LoadUnit ports available for handling uncache return requests are predetermined: currently, MMIO requests return only to LoadUnit 2, while NC requests can return to LoadUnit 1 or 2. When returning through multiple ports, each item's candidate port is determined by the remainder of its uncache entry ID divided by the number of ports, and one item is selected from the candidates for each port to be returned.
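The port-binding rule can be sketched as below. The port numbers come from the text; the choice of the lowest-ID candidate per port is an illustrative assumption, not necessarily the RTL's arbitration policy.

```python
def select_writeback(ready_entries, nc_ports=(1, 2), mmio_port=2):
    """Sketch of dequeue port binding: an NC entry may only return through
    the LoadUnit port given by entry_id % len(nc_ports); MMIO entries are
    fixed to LoadUnit 2. One candidate is chosen per port per cycle."""
    chosen = {}
    for entry_id, is_mmio in ready_entries:
        port = mmio_port if is_mmio else nc_ports[entry_id % len(nc_ports)]
        if port not in chosen:
            chosen[port] = entry_id  # first (lowest-id) candidate wins
    return chosen
```

With NC entries 0, 1, and 2 ready, entries 0 and 2 both map to the first NC port, so only one of them can return that cycle while entry 1 returns through the other port.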
Feature 3: Uncache Interaction Logic
(1) Sending req
In the first cycle, one request ready for uncache access is selected. In the second cycle, it is sent to the Uncache Buffer. The sent request carries the ID of the selected item, referred to as `mid`. Whether it was successfully received can be determined from `req.ready`.
(2) Receiving idResp
If the sent request is received by the Uncache Buffer, the Uncache's `idResp` arrives in the cycle following reception. This response contains `mid` and the entry ID allocated by the Uncache Buffer for this request (referred to as `sid`). LoadQueueUncache uses `mid` to find the corresponding internal item and stores `sid` in that item.
(3) Receiving resp
After the Uncache Buffer completes the bus access for the request, it returns the access result to LoadQueueUncache. This response contains `sid`. Because of the Uncache Buffer's merge behavior (the detailed enqueue merge logic is described in Uncache), one `sid` may correspond to multiple items in LoadQueueUncache. LoadQueueUncache uses `sid` to find all matching internal items and passes the access result to each of them.
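The `mid`/`sid` bookkeeping described in steps (2) and (3) can be sketched as a small lookup model. This is an illustrative behavioral model; the class and method names are assumptions.

```python
class UncacheTracker:
    """Sketch of the mid/sid bookkeeping: `mid` identifies the
    LoadQueueUncache entry in an outgoing request; `idResp` returns the
    Uncache Buffer slot (`sid`) allocated for that mid; a later `resp`
    carries only sid and may fan out to several merged entries."""
    def __init__(self):
        self.sid_of = {}  # mid -> sid

    def on_id_resp(self, mid, sid):
        self.sid_of[mid] = sid  # store sid in the entry selected by mid

    def on_resp(self, sid):
        # Every entry whose stored sid matches receives the access result.
        return [m for m, s in self.sid_of.items() if s == sid]
```

The fan-out in `on_resp` is what models the merge case: two entries that were merged into the same Uncache Buffer slot both complete on one response.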
Overall Block Diagram
Interface Timing
Enqueue Interface Timing Example
As shown in the figure below, assume 5 consecutive NC requests enter through LoadUnit 0, 1, and 2 sequentially, and LoadQueueUncache currently has only 4 items. The first four requests are therefore allocated to free items normally. `r5`, which arrives in the 3rd cycle, cannot be allocated an item because the buffer is full, and thus triggers a rollback in the 5th cycle. Note that the figure assumes NC requests arrive in order within each cycle, i.e., `r1` < `r2` < `r3` and `r4` < `r5`; if sorting is needed, replace `io_req` with the sorted result, and the rest of the logic remains the same.
Dequeue Interface Timing Example
The figure below shows the cases of `mmioOut`, one `ncOut` per cycle, and two `ncOut` in the same cycle. Taking the first example in detail: in the 2nd cycle, the writeback item is selected and the FreeList is updated; the request is registered for one cycle and then written back to the LoadUnit in the 3rd cycle. The subsequent examples can be understood similarly.
Uncache Interface Timing Example
(1) When outstanding requests are not supported, only one uncache access can be in flight at a time (controlled by `io_uncache_req_ready`) until the uncache response is received. As shown below, when `io_uncache_req_ready` is asserted high in the 5th cycle, the uncache request is sent out. The Uncache receives the request in the 6th cycle and returns `idResp` in the 7th cycle. After some access latency, the Uncache access result is received in cycle 10+n.
(2) When outstanding requests are supported, multiple uncache accesses can be in flight (controlled by `io_uncache_req_ready`). As shown below, requests m1, m2, m3, and m4 are sent back to back. In the 4th and 5th cycles, the Uncache dispatch results for the first two requests are received. At this point the Uncache is full, so m3 is held in the intermediate register and m4 waits for `io_uncache_req_ready` to be asserted high. In cycle 9+n, `io_uncache_req_ready` is asserted high and m4 is sent out as well. The Uncache dispatch results for m3 and m4 are received in cycles 10+n and 11+n, respectively, and the Uncache access responses arrive gradually in subsequent cycles.
UncacheEntry Module
UncacheEntry is responsible for independently processing the lifecycle of a request and uses a set of state registers to control its specific processing flow. The key structures are as follows:
- `req_valid`: Indicates whether the item is valid.
- `req`: Stores all relevant contents of the received request.
- `uncacheState`: Records the current life stage of the item.
- `slaveAccept`, `slaveId`: Record whether the item has been allocated an Uncache Buffer entry, and the allocated Uncache Buffer ID.
- `needFlushReg`: Indicates whether the item needs a delayed flush.
Feature 1: Lifecycle and State Machine
The lifecycle of each UncacheEntry is fully described by `uncacheState`, which includes the following states:
- `s_idle`: The default state, indicating either no request, or a request that is not yet ready to be sent to the Uncache Buffer.
- `s_req`: The conditions for sending the request to the Uncache Buffer are met; the item waits to be selected by LoadQueueUncache and received by its intermediate register. (In principle the request should be received by the Uncache Buffer, but after selection it is held for one cycle in an intermediate register before being sent on; if the Uncache Buffer does not accept it, it remains in that register. The UncacheEntry is unaware of the intermediate register; it only observes that the request has been sent and successfully accepted.)
- `s_resp`: The request has been received by the intermediate register; the item waits for the Uncache Buffer to return the access result.
- `s_wait`: The access result from the Uncache Buffer has been received; the item waits to be selected by LoadQueueUncache and received by the LoadUnit.
The state transition diagram is shown below, where black indicates the normal lifecycle of the item, and red indicates the item's lifecycle ending abnormally due to a redirect requiring the item to be flushed.
For the normal lifecycle, the trigger events are detailed as follows:
- `canSendReq`: For MMIO requests, the Uncache access can be sent once the corresponding instruction reaches the head of the ROB; for NC requests, it can be sent as soon as `req_valid` is valid.
- `uncacheReq.fire`: The item is accepted by the LoadQueueUncache intermediate register. The item will receive the `idResp` forwarded from the Uncache Buffer in the next cycle and update `slaveAccept` and `slaveId`.
- `uncacheResp.fire`: The item receives the access result returned by the Uncache Buffer.
- `writeback`: In the `s_wait` state, a writeback request can be sent. The writeback signals for MMIO requests and NC requests are different and must be distinguished.
Feature 2: Redirect Flush Logic
In cases of abnormal lifecycle, it is usually caused by a pipeline redirect.
When a pipeline redirect signal is received, the module determines whether the current item is newer than the redirected item. If the current item is newer, it must be flushed, generating a `needFlush` signal. In general, all contents of the item are flushed immediately and the item is reclaimed by the FreeList. However, Uncache requests and responses must correspond exactly to the same uncache load request; therefore, if the item has already sent an uncache request, its lifecycle can only end after the Uncache response is received, leading to a "flush delay" situation. Thus, when the `needFlush` signal is generated but the item cannot be flushed immediately, the signal is stored in the `needFlushReg` register. The flush is executed only when the Uncache response is received, and `needFlushReg` is then cleared.
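The delayed-flush rule can be sketched as follows. This is an illustrative behavioral model; the class name, the `req_sent` flag, and the boolean redirect-age input are assumptions standing in for the RTL's actual signals.

```python
class EntryFlush:
    """Sketch of the delayed-flush rule: a redirected entry flushes at once
    unless its uncache request is already in flight, in which case
    needFlushReg holds the flush until the Uncache response arrives."""
    def __init__(self):
        self.valid = True
        self.req_sent = False       # has the uncache request been issued?
        self.need_flush_reg = False

    def on_redirect(self, is_newer_than_redirect):
        if not is_newer_than_redirect:
            return                  # older entries survive the redirect
        if self.req_sent:
            self.need_flush_reg = True  # request/response pairing forces a wait
        else:
            self.valid = False          # flush immediately; entry reclaimed

    def on_uncache_resp(self):
        if self.need_flush_reg:     # the deferred flush executes now
            self.valid = False
            self.need_flush_reg = False
```

This keeps every Uncache response matched to the request that produced it, even when the requesting instruction has been squashed.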
Feature 3: Exception Cases
Exception cases in LoadQueueUncache include:
- When the request is sent to the bus and the bus returns corrupt or denied. This exception must be marked when the UncacheEntry is written back, and is then handled by the LoadUnit.