Rob
- Version: V2R2
- Status: OK
- Date: 2025/01/20
- commit: xxx
Terminology Description
Abbreviation | Full Name | Description |
---|---|---|
rob | Reorder Buffer | Reorder Buffer |
rab | Rename Buffer | Rename Buffer |
- | Redirect | Redirect information sent from ctrlblock |
- | Walk | Rollback process after a redirect occurs |
snpt | Snapshot | Snapshot information sent from ctrlblock |
wfi | Wait For Interrupt | Wait For Interrupt |
Submodule List
Submodule | Description |
---|---|
RobEnqPtrWrapper | Maintains the enqueue pointer for the rob |
NewRobDeqPtrWrapper | Maintains the dequeue pointer for the rob |
Rab | Maintains the state of each rat during commit or walk, interacts with rename module |
VTypeBuffer | Maintains a structure similar to Rab for Vtype, interacts with decode module |
ExceptionGen | Exception generation module |
SnapshotGenerator | Snapshot generation module |
Design Specifications
- Supports instruction write-back and commit
- Supports instruction redirect
- Supports interrupt handling
- Supports Rob compression
- Supports snapshot
- Supports exception handling
- Supports vector memory accesses writing back first, then handling exceptions and setting vstart
- Rob supports committing/walking up to 8 entries per cycle
- Rab supports committing/walking up to 6 entries per cycle
Functionality
The Rob module includes: RobEnqPtrWrapper for the enqueue pointer, NewRobDeqPtrWrapper for the dequeue pointer, Rab for maintaining the state of each rat during commit or walk, VTypeBuffer for maintaining the state of Vtype, ExceptionGen for generating exceptions, and SnapshotGenerator for generating snapshots.
The main body of the Rob module is a circular queue with 160 entries. Each pointer consists of a 1-bit flag and an 8-bit value. When the value wraps around after reaching its maximum, the flag inverts, which keeps instruction order distinguishable across the wrap. The queue is empty when enqptr === deqptr (flag and value both equal) and full when enqptr.value === deqptr.value and enqptr.flag =/= deqptr.flag (value equal, flag different). The signals contained in each RobEntry are shown in the table below.
Signal Name | Description |
---|---|
isVset | Is it a Vset instruction? |
commitType | Instruction commit type |
isHls | Is it a virtualized load/store instruction? |
wflags | Does it write fcsr's fflags? |
ftqIdx | ftq pointer, used to read pcMem |
ftqOffset | ftq offset, used to calculate pc |
traceBlockInPipe | Trace data in the pipeline, including iretire, ilastsize, itype |
instrSize | Number of instructions in Rob compression |
fpWen | Used to update csr's FS |
isRVC | Is it a compressed instruction? |
dirtyVs | Used to update csr's VS |
realDestSize | Number of destination registers written by the instruction |
stdWritebacked | Has the store instruction written back? |
uopNum | Number of uops needing write-back |
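As a rough illustration of the flag-and-value pointer scheme described before the RobEntry table, the following plain Scala model (a software sketch, not Chisel RTL) shows the empty test, the full test, and an age comparison. The names `RobPtrSketch`, `Ptr`, `distance`, and `isAfter` are hypothetical and are not taken from the Rob source; only the 160-entry size and the empty/full conditions come from the text.

```scala
// Software model of a circular-queue pointer with a wrap flag (not Chisel RTL).
object RobPtrSketch {
  val RobSize = 160 // number of Rob entries, from the text

  // The flag flips every time value wraps past RobSize - 1.
  final case class Ptr(flag: Boolean, value: Int) {
    def +(n: Int): Ptr = {
      require(n >= 0 && n <= RobSize, "sketch only supports steps of 0..RobSize")
      val sum = value + n
      if (sum >= RobSize) Ptr(!flag, sum - RobSize) else Ptr(flag, sum)
    }
  }

  // Empty: flag and value both equal.
  def isEmpty(enq: Ptr, deq: Ptr): Boolean = enq == deq

  // Full: value equal, flag different.
  def isFull(enq: Ptr, deq: Ptr): Boolean =
    enq.value == deq.value && enq.flag != deq.flag

  // Number of occupied entries between deq (oldest) and enq (next free slot).
  def distance(enq: Ptr, deq: Ptr): Int =
    if (enq.flag == deq.flag) enq.value - deq.value
    else RobSize + enq.value - deq.value

  // True when a is younger than b, assuming both lie in the same live window.
  def isAfter(a: Ptr, b: Ptr): Boolean =
    if (a.flag == b.flag) a.value > b.value else a.value < b.value
}
```

The flag is what lets `isAfter` order two pointers whose values have wrapped: with different flags, the pointer with the smaller value is the younger one.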
The Rob uses an 8-Bank read design, with entries assigned to banks by the low 3 bits of the robidx. For example, robBanks0 holds robidx (decimal) 0, 8, 16, 24, 32, ..., and robBanks1 holds robidx (decimal) 1, 9, 17, 25, 33, ... Every 8 consecutive entries form a Line: 0-7, 8-15, 16-23, ... Each Bank has 20 Entries, giving 20 Lines in total. The Bank division diagram is shown below.
A one-hot Line pointer (20 bits) is used to read RobEntry data. Data from the current Line and the next Line (total 16 Entries) are read from the 8 Banks. After being updated with the current cycle's write-back information, one Line is selected from the two Lines and written to the 8 robDeqGroup registers (if all instructions in the first Line commit in the current cycle, the second Line is selected). During instruction commit, data is read from the 8 robDeqGroup registers for committing. hasCommitted (8 bits) indicates whether each instruction in the current line has been committed, serving as one of the conditions for other instructions to commit. allCommitted indicates that all instructions in the current line have committed, which is the control signal for switching the line pointer. When allCommitted is 1, the data from the second Line that was read is selected, meaning the updated data for the subsequent 8 entries is written to robDeqGroup.
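The bank and line arithmetic, together with the choice between the current and the next Line, can be sketched in plain Scala as follows. This is a simplified software model, not the RTL: `RobBankSketch` and all of its helpers are hypothetical names, and the rule that a Line is complete once all 8 slots are marked committed is an assumption made for the sketch.

```scala
// Plain Scala model of the 8-bank layout and robDeqGroup line selection (not RTL).
object RobBankSketch {
  val RobSize = 160               // entries, from the text
  val BankNum = 8                 // banks, from the text
  val LineNum = RobSize / BankNum // 20 lines

  def bankOf(robIdx: Int): Int = robIdx & (BankNum - 1) // low 3 bits of robidx
  def lineOf(robIdx: Int): Int = robIdx / BankNum

  def entriesOfBank(bank: Int): List[Int] =
    (0 until LineNum).map(line => line * BankNum + bank).toList

  def entriesOfLine(line: Int): List[Int] =
    (0 until BankNum).map(bank => line * BankNum + bank).toList

  // 20-bit one-hot line pointer, modelled as a list of booleans.
  def lineOneHot(line: Int): List[Boolean] =
    (0 until LineNum).map(_ == line).toList

  // Picks the Line latched into robDeqGroup for the next cycle and the new
  // hasCommitted mask. Assumption: a Line is done once all 8 slots are committed.
  def nextDeqGroup[A](
      curLine: List[A],
      nextLine: List[A],
      hasCommitted: List[Boolean],
      commitNow: List[Boolean]
  ): (List[A], List[Boolean]) = {
    val committed = hasCommitted.zip(commitNow).map { case (h, c) => h || c }
    val allCommitted = committed.forall(identity)
    if (allCommitted) (nextLine, List.fill(BankNum)(false))
    else (curLine, committed)
  }
}
```

Reading two Lines per cycle is what makes the switch free: when `allCommitted` holds, the second Line is already available and no extra read cycle is needed.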
Rob Enqueue: When the Rob can accept instructions, io_enq_canAccept is asserted high, and Dispatch can send up to 6 instructions to the Rob. Upon receiving instructions, the Rob updates enqptr: it calculates dispatchNum from the enqueue request and allocates enqptr accordingly. If no redirect occurs, enqptr is updated to enqptr + dispatchNum. If a redirect occurs, enqptr is set based on the robidx of the redirecting instruction (depending on the redirect level).
During enqueue, if an instruction needs move elimination, its writebacked signal is asserted directly, allowing it to commit without write-back. If an instruction generates an exception during decode, its numWB is set to 0 in the rename stage, it is not dispatched to IQ, and it is marked as written back upon entering the Rob. Note in particular that vector memory access instructions must wait for all uops to write back before the exception is handled.
allocatePtrVec holds the 6 allocated enqPtrs. An enqPtr is allocated when the instruction is valid and is the first uop (either decoded or derived from Rob compression, as indicated by the firstUop signal). canEnqueue (6 bits) indicates whether each instruction can enter the Rob: the instruction is valid, is the first uop, and the rob can accept it.
uopNum records how many instructions the rob has compressed into the entry (for Rob compression) or how many uops the instruction was split into (for vector instruction splitting). uopNum is set during enqueue and then decremented each time a uop writes back (multiple uops can write back in the same cycle). For store instructions, uopNum is set to 1 and stdWritebacked is deasserted; std uops are not counted in uopNum, and stdWritebacked is asserted when they write back.
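The uopNum and stdWritebacked bookkeeping can be illustrated with a small plain Scala model. It is only a sketch under stated assumptions: `UopCountSketch`, `Entry`, and the `isStore` parameter are hypothetical, and an entry is treated as fully written back once uopNum reaches 0 and stdWritebacked is set.

```scala
// Software model of per-entry write-back counting (not Chisel RTL).
object UopCountSketch {
  final case class Entry(uopNum: Int, stdWritebacked: Boolean) {
    // Assumption: commit requires no outstanding uops and, for stores, the std part.
    def isWritebacked: Boolean = uopNum == 0 && stdWritebacked
  }

  // numWB is the number of uops that must write back (0 for move elimination
  // or for an instruction that already raised an exception at decode).
  def enqueue(numWB: Int, isStore: Boolean): Entry =
    if (isStore) Entry(uopNum = 1, stdWritebacked = false)
    else Entry(uopNum = numWB, stdWritebacked = true)

  // Several uops may write back in one cycle, so the decrement is a count.
  def writeback(e: Entry, wbCount: Int): Entry = {
    require(wbCount >= 0 && wbCount <= e.uopNum, "cannot write back more uops than pending")
    e.copy(uopNum = e.uopNum - wbCount)
  }

  // The std uop is tracked separately from uopNum.
  def stdWriteback(e: Entry): Entry = e.copy(stdWritebacked = true)
}
```

For example, a store entry stays uncommittable after its single counted uop writes back, and only becomes written back once `stdWriteback` has also been applied.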
Rob Write-back: The control signals that Exus use to write back to the rob are pipelined by one cycle in the ctrlBlock. Because Rob compression can cause multiple Exus to write back to the same robidx, the Rob compression count is calculated during this pipeline stage. Each Exu counts how many Exus write back to the same robidx as its own and passes this count to the Rob via io_writebackNums. Only Exus that can be compressed together with it are considered: certain Exus can never have a compression relationship, so counting across all of them would waste area and timing.
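A minimal sketch of the per-port count follows, written in plain Scala rather than Chisel. The `compressGroups` partition of write-back ports is a hypothetical stand-in for "Exus that can be compressed together", and the sketch assumes that each count includes the port itself and that an invalid port reports 0.

```scala
// Software model of the io_writebackNums computation (not Chisel RTL).
object WritebackNumSketch {
  final case class Wb(valid: Boolean, robIdx: Int)

  // compressGroups: hypothetical lists of port indices that may share a robidx.
  def writebackNums(wbs: List[Wb], compressGroups: List[List[Int]]): List[Int] =
    wbs.indices.toList.map { i =>
      if (!wbs(i).valid) 0
      else {
        // Ports outside every group are only compared with themselves.
        val peers = compressGroups.find(_.contains(i)).getOrElse(List(i))
        peers.count(j => wbs(j).valid && wbs(j).robIdx == wbs(i).robIdx)
      }
    }
}
```

Restricting the comparison to `peers` mirrors the point made in the text: ports that can never be compressed together need no comparators between them.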
Rob Commit: The instruction at the dequeue pointer commits when the Rob state machine is in the idle state, the instruction is valid, all of its uops have written back, and blockCommit is deasserted. When the instruction at the dequeue position has an exception, blockCommit is asserted, preventing the instruction from committing until exception handling is complete. commitValidThisLine indicates whether each of the 8 entries in the line pointed to by deqptr can commit. An entry is commit-valid when it is valid, all of its uops have written back, the rob does not have interrupts enabled at this time, the dequeue instructions have no exception, no dequeue instruction needs replay, no older instruction blocks it from committing, and it has not already committed. Note the allowOnlyOneCommit case: when one of the 8 entries being dequeued has an exception, or when interrupts are enabled, the rob allows only one instruction to commit per cycle.
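One possible reading of the commit conditions above, expressed as a plain Scala model. This is a hedged sketch, not the RTL: `RobCommitSketch`, `Entry`, and its fields are hypothetical, in-order commit is modelled by requiring every older slot in the line to be committed already or to commit in the same cycle, and an excepting or replaying entry is assumed never to commit through this path.

```scala
// Software model of commitValidThisLine with allowOnlyOneCommit (not Chisel RTL).
object RobCommitSketch {
  final case class Entry(
      valid: Boolean,
      writebacked: Boolean,
      hasException: Boolean = false,
      needReplay: Boolean = false
  )

  def commitValidThisLine(
      line: List[Entry],
      hasCommitted: List[Boolean],
      intrEnable: Boolean
  ): List[Boolean] = {
    val pendingException = line.zip(hasCommitted).exists {
      case (e, done) => e.valid && !done && e.hasException
    }
    val allowOnlyOneCommit = pendingException || intrEnable
    var olderDone = true  // every older slot has committed or commits this cycle
    var committedNow = 0  // commits already granted in this cycle
    line.zip(hasCommitted).map { case (e, done) =>
      val eligible = e.valid && e.writebacked && !e.hasException &&
        !e.needReplay && !intrEnable && !done
      val can = eligible && olderDone && !(allowOnlyOneCommit && committedNow > 0)
      if (can) committedNow += 1
      olderDone = olderDone && (done || can)
      can
    }
  }
}
```

With an excepting entry anywhere in the line, the older entries drain one per cycle, so the excepting instruction reaches the head of the line with nothing younger committed.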
Rob Dequeue: The Rob dequeues committed instructions, counts the number of committed entries, adds this count to the deqptr value to update the dequeue pointer, and deasserts the valid signal for the dequeued entries.
Rob State Machine: The state machine has two states, s_idle and s_walk. State transitions primarily depend on redirect.
- s_idle: Normal state; instructions can commit. After a redirect, the Rob returns to s_idle only after spending at least two cycles in the walk state.
- s_walk: Walk state; instructions cannot commit. The Rob waits for all modules to finish walking and then returns to s_idle.
The state machine transition code is as follows.
/**
 * state changes
 * (1) redirect: switch to s_walk
 * (2) walk: when walking comes to the end, switch to s_idle
 */
state_next := Mux(
  io.redirect.valid || RegNext(io.redirect.valid), s_walk,
  Mux(
    state === s_walk && walkFinished && rab.io.status.walkEnd && vtypeBuffer.io.status.walkEnd, s_idle,
    state
  )
)
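To see why the Rob stays in the walk state for at least two cycles, the transition above can be replayed in a cycle-by-cycle software model. The plain Scala sketch below mirrors the Mux structure; `RobFsmSketch`, `next`, and `run` are hypothetical names, and `allWalkDone` stands for the conjunction of walkFinished and the two walkEnd status signals.

```scala
// Cycle-level software model of the Rob state machine (not Chisel RTL).
object RobFsmSketch {
  sealed trait State
  case object Idle extends State
  case object Walk extends State

  // Same priority as the Mux chain: redirect (this cycle or last) wins.
  def next(state: State, redirect: Boolean, redirectLast: Boolean, allWalkDone: Boolean): State =
    if (redirect || redirectLast) Walk
    else if (state == Walk && allWalkDone) Idle
    else state

  // Returns the state observed in each cycle for the given input traces.
  def run(redirects: List[Boolean], walkDone: List[Boolean]): List[State] = {
    var state: State = Idle
    var last = false // models RegNext(io.redirect.valid)
    redirects.zip(walkDone).map { case (r, d) =>
      val observed = state
      state = next(state, r, last, d)
      last = r
      observed
    }
  }
}
```

Even when the walk is reported as done immediately, a single redirect produces two consecutive cycles in the walk state, because the registered copy of redirect.valid holds the state machine in s_walk for one extra cycle.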
Rob Redirect and Snapshot: The Rob does not commit instructions in the cycle in which redirect valid is asserted. Instead, it switches the Rob read pointer to the walk start address, which comes from one of two sources: a snapshot or deqptr. The source chosen is the position that is older than, and closest to, the robidx of the instruction that issued the redirect. A snapshot saved in the Rob is a set of 8 robidx values: the robidx of the first instruction at enqueue plus offsets 0 through 7. Rob snapshots are controlled by the snapshot logic inside the ctrlblock. The figure below shows an example of walkPtr selection.
walkPtr Update: When redirect is valid, walkptr takes the snapshot selected by io_snpt_snptSelect if io_snpt_useSnpt is 1, or deqPtr if io_snpt_useSnpt is 0. Note that walkptr must be aligned to a bank0 address. When redirect is not valid, the rob is in the walk state, and the walk has not finished, walkptr increments by 8 every cycle. In all other cases walkptr does not update.
lastWalkPtr is the end point of the walk. It depends on whether the instruction that issued the redirect is itself flushed: if it is flushed, lastWalkPtr is redirect_robidx - 1; if it is not flushed, lastWalkPtr is redirect_robidx.
donotNeedWalk mechanism: In the first cycle of walking the 8 entries, instructions older than the robidx that issued the redirect do not need to walk.
Walk end condition: walkFinished is 1 when walkPtrTrue > lastWalkPtr, where walkPtrTrue is the walkPtr without Bank address alignment. When walkFinished is 1, the walk end information is passed to rab and vtypeBuffer.
shouldWalkVec indicates whether each of the 8 entries should walk. The condition is that the entry is older than lastWalkPtr, combined with donotNeedWalk to give the final decision.
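The choice of walk start point (the closest position that is not younger than the redirecting instruction, taken from the snapshots or from deqPtr) can be sketched as below. This is a simplified plain Scala model: in the design the selection arrives from the ctrlblock via io_snpt_useSnpt and io_snpt_snptSelect, whereas the sketch recomputes it from hypothetical inputs. Ages are measured as distances from deqPtr using values only, which assumes every index lies in the live window, and a snapshot at exactly the redirect robidx is assumed to be usable.

```scala
// Software model of walk start selection (not Chisel RTL).
object WalkStartSketch {
  val RobSize = 160 // entries, from the text

  // Distance from the dequeue position; larger means younger.
  def age(robIdx: Int, deq: Int): Int = (robIdx - deq + RobSize) % RobSize

  // A snapshot holds 8 robidx values: first, first + 1, ..., first + 7.
  def snapshotIdxs(first: Int): List[Int] =
    (0 until 8).map(i => (first + i) % RobSize).toList

  // snapshotFirsts: hypothetical list of the first robidx of each live snapshot.
  def walkStart(deq: Int, snapshotFirsts: List[Int], redirectIdx: Int): Int = {
    val limit = age(redirectIdx, deq)
    val usable = snapshotFirsts.filter(s => age(s, deq) <= limit)
    if (usable.isEmpty) deq            // corresponds to useSnpt = 0
    else usable.maxBy(s => age(s, deq)) // closest usable snapshot
  }
}
```

Starting from the closest snapshot instead of deqPtr shortens the walk, since only the instructions between the snapshot and the redirect need to be replayed.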
When Redirect is valid, the rob cannot commit instructions in that cycle, and the walk pointer is updated to the walk start point (snapshot recovery or dequeue position). Note that the walk start address can only be a robidx corresponding to an entry in Bank0. The walk end position lastWalkPtr is recorded. In the next cycle, the state machine changes to the walk state, the read Bank pointer is updated to the position corresponding to the walk pointer, and the valid signal of instructions after the redirect in robEntry is deasserted to 0. In the cycle after that, information needed for walk (realDestSize) is taken from the 8 robDeqGroup and passed to rab, and isVset is accumulated and passed to VTypeBuffer. When the rob walks to lastWalkPtr, the rob's own walk stops, but the rob can only return to the idle state after rab and VTypeBuffer have also finished walking. Rab can walk up to 6 entries per cycle. VTypeBuffer can walk up to 8 entries per cycle.
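The walk itself can be pictured as a sequence of 8-wide masks, one per cycle. The plain Scala sketch below uses unwrapped, non-negative indices to keep the arithmetic simple and rests on two labelled assumptions: lastWalkPtr is treated as inclusive (consistent with walkFinished firing only when walkPtrTrue > lastWalkPtr), and donotNeedWalk is read as masking the slots of the first aligned line that sit before the unaligned start position. `WalkMaskSketch` and its helpers are hypothetical names.

```scala
// Software model of walkPtr stepping and shouldWalkVec (not Chisel RTL).
object WalkMaskSketch {
  val Width = 8 // entries walked per cycle, from the text

  def lastWalkPtr(redirectIdx: Int, flushItself: Boolean): Int =
    if (flushItself) redirectIdx - 1 else redirectIdx

  // Align down to the bank0 entry of the line.
  def alignDown(idx: Int): Int = idx - (idx % Width)

  // One mask per walk cycle; true means the slot walks in that cycle.
  def walkMasks(start: Int, last: Int): List[List[Boolean]] = {
    require(start >= 0, "sketch uses unwrapped, non-negative indices")
    var walkPtr = alignDown(start)
    var walkPtrTrue = start
    var firstCycle = true
    val out = scala.collection.mutable.ListBuffer.empty[List[Boolean]]
    while (walkPtrTrue <= last) { // walkFinished is walkPtrTrue > last
      val mask = (0 until Width).map { i =>
        val idx = walkPtr + i
        val donotNeedWalk = firstCycle && idx < start // assumption, see lead-in
        idx <= last && !donotNeedWalk
      }.toList
      out += mask
      walkPtr += Width
      walkPtrTrue = walkPtr
      firstCycle = false
    }
    out.toList
  }
}
```

For a start position of 10 and a flushed redirect at robidx 20, the model walks 6 entries in the first cycle and 4 in the second, covering indices 10 through 19.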
Rob Exception Handling: Since none of the instructions after an excepting instruction will execute, the Rob only needs to save the oldest exception, which is the job of the Rob exception generation module. Internally, the Rob only needs to check for exceptions among the instructions currently being committed.
In the Rob's exception generation module, the enq signal (same cycle as Rob enqueue) receives exception information from the frontend and decode for up to 6 instructions. The wb signal receives exception information written back from functional units (csr + fence + load + store + vload + vstore). The current signal holds the currently saved exception information, and the module outputs the exception information of the oldest instruction.
Instructions from enq arrive in order, so a priorityMux is enough to pick the oldest exception. Instructions from wb arrive out of order, so their robidx values must be compared to select the oldest exception. This selection is done in groups over two cycles: in the first cycle the oldest is selected within each group, and in the second cycle the oldest is selected from the first-cycle results. The result of the second cycle is then compared with current, and if current is younger, current is updated to that result.
Vector memory access write-back exceptions are a special case: many uops share the same robidx. Comparing robidx alone is therefore not enough, and the vstart value that each exception would set is also compared, keeping the exception information with the smaller vstart.
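The two-stage selection with the vstart tie-break can be sketched in plain Scala as follows. This is an illustrative model only: `ExceptionGenSketch`, `Exc`, and the `groupSize` parameter are hypothetical, and a linear `age` field (smaller means older) replaces the flag-and-value robidx comparison used in hardware.

```scala
// Software model of oldest-exception selection (not Chisel RTL).
object ExceptionGenSketch {
  // age: hypothetical linear age, smaller is older. vstart only matters on ties.
  final case class Exc(age: Int, vstart: Int = 0)

  // Older robidx wins; on the same robidx the smaller vstart wins.
  def older(a: Exc, b: Exc): Exc =
    if (a.age != b.age) { if (a.age < b.age) a else b }
    else if (a.vstart <= b.vstart) a
    else b

  // Stage 1 picks the oldest within each group, stage 2 across the groups.
  def selectOldest(wb: List[Exc], groupSize: Int): Option[Exc] = {
    require(groupSize > 0, "groupSize must be positive")
    val stage1 = wb.grouped(groupSize).map(_.reduce(older)).toList
    stage1.reduceOption(older)
  }

  // current is replaced only when the incoming exception is older.
  def update(current: Option[Exc], incoming: Option[Exc]): Option[Exc] =
    (current, incoming) match {
      case (Some(c), Some(n)) => Some(older(c, n))
      case (c, None)          => c
      case (None, n)          => n
    }
}
```

Splitting the reduction into groups keeps each comparison tree shallow, which is the usual reason for spending two cycles on the selection.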
Rob Interrupt Handling: Interrupt handling is similar to exception handling. Interrupts come from the CSR module. Instructions that need to issue flushPipe and replayInst are currently also processed in ExceptionGen. The Rob handles them by first sending a flushOut to the ctrlBlock, which returns a redirect to flush the pipeline. The difference lies in how the target is obtained. For redirects caused by branch mispredictions and memory access violations, the target is obtained faster: a pc is read directly from pcMem and combined with ftqOffset to calculate the target, which is sent to the frontend. For interrupts and exceptions, the information must first be sent to the CSR, and the CSR returns the corresponding target before it is sent to the frontend. Currently, an interrupt is only responded to when the instruction at deqPtr is not a load, store, fence, csr, or vset instruction.
When the wfi_enable signal (the wait-for-interrupt enable from the CSR register) is asserted and a wfi instruction is enqueued in the Rob, hasWFI is set to 1. hasWFI drives blockCommit to 1, blocking the rob from committing and thus pausing the pipeline to wait for an interrupt. When the csr receives an interrupt, io_csr_wfiEvent is asserted and hasWFI is cleared to 0; hasWFI is also cleared if no interrupt arrives before a timeout of 1M cycles. After that, the Rob can commit instructions normally.
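The hasWFI behaviour can be modelled as a small state update, sketched here in plain Scala. `WfiSketch`, `WfiState`, and the `waited` counter are hypothetical, and the timeout is passed in as a parameter because the text only gives it as roughly 1M cycles; exactly when the counter starts is also an assumption.

```scala
// Software model of hasWFI set, clear, and timeout (not Chisel RTL).
object WfiSketch {
  final case class WfiState(hasWFI: Boolean = false, waited: Int = 0)

  def step(
      s: WfiState,
      wfiEnable: Boolean,  // wfi_enable from the CSR
      enqWfi: Boolean,     // a wfi instruction enters the Rob this cycle
      wfiEvent: Boolean,   // io_csr_wfiEvent
      timeoutCycles: Int   // assumption: on the order of 1M in the design
  ): WfiState =
    if (s.hasWFI) {
      val waited = s.waited + 1
      if (wfiEvent || waited >= timeoutCycles) WfiState() // resume committing
      else s.copy(waited = waited)
    } else if (wfiEnable && enqWfi) WfiState(hasWFI = true)
    else s

  // hasWFI is one of the sources of blockCommit.
  def blocksCommit(s: WfiState): Boolean = s.hasWFI
}
```

The timeout bounds how long commit can stay blocked, so a wfi with no incoming interrupt cannot stall the pipeline indefinitely.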
Overall Design
Overall Block Diagram
Interface List
See interface documentation
Module Design
Level 2 Modules
Functionality
Overall Block Diagram
Interface List
See interface documentation