IssueQueueEntries

  • Version: V2R2
  • Status: OK
  • Date: 2025/01/20
  • Commit: xxx

Glossary

Abbreviation | Full Name  | Description
IQ           | IssueQueue | Issue Queue

Design Specifications

  • Supports three types of issue queue entries: EnqEntry, SimpleEntry, and ComplexEntry
  • Supports dual-port read and write
  • Supports writeback wakeup and speculative wakeup
  • Supports direct dequeue of EnqEntry
  • Supports instruction transfer between Entries
  • Supports wakeup cancellation feedback

Function

Overall Function

Entries is the module within the issue queue that stores uops. It contains multiple entry modules, each capable of holding one uop. These entries fall into two broad types: EnqEntry, which corresponds to the issue queue's enqueue ports, and OthersEntry, which is more numerous.

Entries consolidates the issue and status information of all entries and passes it to the issue queue control logic. It receives the selection results from the control logic and outputs all information for the uops to be issued.

Entries receives wakeup signals from the IQ (either the local IQ or fast wakeup from other IQs) and from WriteBack (writeback wakeup), as well as cancellation signals from the datapath (og0Cancel, og1Cancel, etc.). It also receives and integrates the feedback signals that arrive after issue and forwards them to each entry.

Entries is also responsible for the transfer logic between entries. EnqEntry receives uops from the IQ enqueue port. If an OthersEntry is ready, the uop is transferred to it according to the transfer rules. EnqEntry supports transferring out the previous uop and enqueuing the next uop in the same cycle, so the two operations run back to back. In advanced issue queue configurations, OthersEntry is further divided into two types, SimpleEntry and ComplexEntry, and Entries also controls the transfer policy from SimpleEntry to ComplexEntry.

Transfer Policy

ComplexEntry is the final entry type, and a uop in it cannot be transferred further. SimpleEntry can transfer to ComplexEntry. EnqEntry can transfer to ComplexEntry or SimpleEntry.

Only entries that have not been issued can be transferred. If an issued entry receives feedback that the issue failed, its issued flag is cleared and it becomes transferable again. If an issued entry receives feedback that the issue succeeded, the entry becomes invalid and needs no further transfer.

Transfer from EnqEntry to OthersEntry: EnqEntry prefers to transfer to ComplexEntry, then to SimpleEntry. Transfers are all or nothing: either all EnqEntries transfer to ComplexEntry, or all transfer to SimpleEntry, or no transfer occurs. EnqEntry transfers to ComplexEntry only when ComplexEntry has enough free entries and SimpleEntry is completely empty; otherwise it can only transfer to SimpleEntry.

Transfer from SimpleEntry to ComplexEntry: each cycle, SimpleEntry can transfer up to num_enq entries to ComplexEntry, where num_enq equals the number of EnqEntries. As long as ComplexEntry has one free slot, one entry can be transferred. SimpleEntry transfers have higher priority than EnqEntry transfers. The transfer order among SimpleEntries is strict: older entries have higher transfer priority. The age order of entries is obtained by querying the age matrix within the IQ.
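The transfer rules above can be sketched as a small behavioral model. This is a minimal illustration, not the actual RTL: the names `Uop` and `plan_transfers` are hypothetical, an integer `age` stands in for the age matrix, and free-slot counts are assumed to be sampled at the start of the cycle (slots freed by a same-cycle SimpleEntry transfer are not reused).

```python
from dataclasses import dataclass


@dataclass
class Uop:
    age: int              # smaller means older; stands in for the IQ age matrix
    issued: bool = False  # issued uops are not transferable


def plan_transfers(enq, simple, complex_free, simple_free, num_enq):
    """Plan one cycle of transfers (hypothetical helper).

    enq, simple: lists of valid Uops in the EnqEntries / SimpleEntries.
    complex_free, simple_free: free slots at the start of the cycle.
    Returns (uops moving Simple -> Complex, target group for the EnqEntries).
    """
    # SimpleEntry -> ComplexEntry: oldest first, at most num_enq per cycle,
    # and one transfer per free ComplexEntry slot.
    candidates = sorted((u for u in simple if not u.issued), key=lambda u: u.age)
    simple_to_complex = candidates[:min(num_enq, complex_free)]

    # EnqEntry -> OthersEntry: all or nothing.
    movers = [u for u in enq if not u.issued]
    if not movers:
        enq_target = None
    elif not simple and complex_free >= len(movers):
        enq_target = "complex"   # SimpleEntry is empty and there is enough room
    elif simple_free >= len(movers):
        enq_target = "simple"
    else:
        enq_target = None        # no transfer this cycle
    return simple_to_complex, enq_target
```

Because EnqEntry may target ComplexEntry only when SimpleEntry is empty, the two transfer paths never compete for the same ComplexEntry slot in this model.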

Diagram

Issue and Dequeue

Entries collects the valid and canIssue signals from each entry and passes them to the IQ. For each dequeue port, the IQ returns the selection result for the entry to be dequeued (deqSelOH) and whether the port can accept it (deqReady). Currently deqReady is a constant and is always asserted. When both are asserted, the entry is considered dequeued, and the deqSel signal is passed to that entry.

Upon receiving deqSel, the entry is not cleared immediately. Instead it is marked as issued, and it records the issue port and the number of cycles elapsed since issue. It then waits for the resp signals that follow the issue, and it is cleared only after receiving a resp that reports a successful issue.

Entries is responsible for aggregating all resp signals and passing the matching resp to each entry. For non-memory-access IQs, the only resp signals are og0resp and og1resp, selected according to the entry's dequeue port and the cycles elapsed since issue. When the entry's robIdx matches the resp's robIdx, the resp is passed to the entry. Memory access IQs have more resp signals, which may differ between memory access IQs, and they must also compare lqidx and sqidx to select the resp.

During issue, the uop information of the selected entry is also passed to the IQ. For timing reasons, deqSelOH is not used directly for this selection, because its bits arrive at significantly different times. To reduce latency, the IQ passes in the selection results from each stage: enqEntryOldest, simpEntryOldest, and compEntryOldest. Each of the three is used to select a candidate uop from its own entry group, and the final dequeued uop is then chosen among the candidates in the priority order comp, simp, enq.
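The two-step uop selection described above (one mux per entry group, then a fixed priority among the groups) can be illustrated with a short sketch for a single dequeue port. The function names are hypothetical, and each argument is assumed to be a pair of a one-hot list and the list of uops in that group.

```python
def mux_one_hot(one_hot, uops):
    """Return the uop picked by a one-hot vector, or None if no bit is set."""
    picked = [u for bit, u in zip(one_hot, uops) if bit]
    assert len(picked) <= 1, "selection vector must be one-hot"
    return picked[0] if picked else None


def select_deq_uop(enq_oldest, simp_oldest, comp_oldest):
    """Pick the dequeued uop for one port (hypothetical helper).

    Each argument is (one_hot, uops) for one entry group. The three muxes are
    independent, so they can be evaluated in parallel before the final choice.
    """
    candidates = [
        mux_one_hot(*comp_oldest),   # highest priority
        mux_one_hot(*simp_oldest),
        mux_one_hot(*enq_oldest),    # lowest priority
    ]
    for uop in candidates:
        if uop is not None:
            return uop
    return None
```

Splitting the selection this way keeps each mux narrow, so a late-arriving bit in one group does not delay the muxes of the other groups.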

Wakeup and Cancellation

Entries does not implement the wakeup logic itself; it passes the wakeup and cancellation signals into every entry. For timing reasons, however, Entries does handle cancellation that occurs in the same cycle. The cancellation sources arrive late, so a path that runs through normal wakeup, then cancellation, and then IQ dequeue selection would have very poor timing. Therefore, only the result after same-cycle wakeup is given to the IQ for dequeue selection. Entries computes same-cycle cancellation separately and finally applies the cancellation check to the uop selected for dequeue at each port.
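A minimal sketch of this ordering, with hypothetical names: the IQ selects on the post-wakeup canIssue vector, the same-cycle cancel vector is computed in parallel, and the cancel check is applied only to the entry each port selected.

```python
def deq_valid_per_port(can_issue_after_wakeup, cancel_this_cycle, deq_sel_oh):
    """Apply same-cycle cancellation after dequeue selection.

    can_issue_after_wakeup: per-entry flags that the IQ selected from.
    cancel_this_cycle: per-entry flags, computed in parallel with selection.
    deq_sel_oh: one one-hot list over entries per dequeue port.
    Returns one flag per port: True if the selected uop really dequeues.
    """
    results = []
    for sel in deq_sel_oh:
        selected = [i for i, bit in enumerate(sel) if bit]
        if not selected:
            results.append(False)    # nothing selected on this port
            continue
        i = selected[0]
        results.append(can_issue_after_wakeup[i] and not cancel_this_cycle[i])
    return results
```

The late cancel signal therefore only gates the final result and stays off the selection path.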

Overall Block Diagram

Diagram

Interface Timing

Diagram

The io_* signal group carries the instructions entering the IQ, up to two per cycle, together with any accompanying wakeup signals. For timing reasons, when an incoming instruction is woken up in the same cycle it enters, the wakeup is delayed by one cycle, shown in the figure as enqDelay_wakeup. To stay aligned with the previous cycle, this delayed wakeup has bypass timing similar to speculative wakeup: it affects srcStateNext, which in turn affects canIssueBypass, similar to same-cycle wakeup and issue in ComplexEntry.

Secondary Modules EnqEntry & OthersEntry

Function

EnqEntry and OthersEntry have essentially the same function. Because EnqEntry is connected directly to the enqueue port, it has one additional layer that handles enqueue wakeup; the remaining functions are identical, so the two are described together.

The most important signals of an Entry are valid, canIssue, issued, and status.

valid indicates whether the entry is valid. When a uop enters the entry, the uop information from enq is written into registers and valid is set. The entry is cleared and valid is deasserted when any one of three conditions is met: flush is asserted, tranSel is asserted, or issueResp indicates a successful issue.

canIssue is asserted when all source operands are ready and the entry has not been issued.

status is a set of fields describing the state of the source operands: the source operand type (srcType), state (srcState), data source (dataSources), the load information that woke up the operand (srcLoadDependency), the EXU that woke up the operand (srcWakeUpL1ExuOH), and a cycle counter started at wakeup (srcTimer).

wakeUpFromWB and wakeUpFromIQ carry the pdest to be woken up and its register type (xp, fp, or vp). If pdest matches the entry's operand register number and the register type also matches, the operand is woken up and marked ready.

og0Cancel and og1Cancel carry the number of the EXU to be cancelled. For ogCancel, the operand is cancelled if the cancelled EXU matches the EXU that woke up the operand and srcTimer corresponds to the pipeline stage latency of the issued operation. For ldCancel, the operand is cancelled if the cancelled load pipeline stage matches srcLoadDependency. When wakeup and cancellation hit the same operand in the same cycle, cancellation takes priority.

The source operand status output by the Entry comes in two forms, immediate and delayed, corresponding to fast and slow wakeup. Immediate means the status is read from the register, updated by the wakeup and cancellation logic above, and output in the same cycle. Delayed means the status is updated by the wakeup and cancellation logic, written back to the register, and only becomes visible at the register output in the next cycle. WB wakeup is always slow, while IQ wakeup can be configured as fast or slow. Entries configured as fast are called ComplexEntry, and entries configured as slow are called SimpleEntry. EnqEntry could in principle be configured either way, but in practice it is always fast.

The difference between EnqEntry and OthersEntry is the additional handling of enqueue wakeup. For timing reasons, wakeup and cancellation that coincide with enqueue are hard to apply before the uop is written into EnqEntry, so they are deferred to the start of the cycle after the write. The delayed wakeup and cancellation signals (enqDelay) are first applied to the state read directly from the register, and normal wakeup and cancellation follow. Enqueue wakeup applies only in the first cycle after the uop enters EnqEntry; afterward, the state read from the register is used as is.
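The wakeup match, the cancel-over-wakeup priority, and the immediate versus delayed outputs can be sketched for a single source operand. This is an illustrative model only: `SrcStatus`, `next_status`, and `visible_status` are hypothetical names, the `cancelled` flag is assumed to be the already-evaluated result of the ogCancel and ldCancel matching, and the enqDelay path is left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SrcStatus:
    psrc: int            # physical register number of the operand
    reg_type: str        # "xp", "fp" or "vp"
    ready: bool = False


def next_status(src, wakeups, cancelled):
    """Apply wakeup and cancellation to one operand.

    wakeups: iterable of (pdest, reg_type) pairs.
    cancelled: result of the ogCancel / ldCancel matching for this operand.
    """
    woken = any(p == src.psrc and t == src.reg_type for p, t in wakeups)
    if cancelled:
        return replace(src, ready=False)   # cancellation has priority
    if woken:
        return replace(src, ready=True)
    return src


def visible_status(src_reg, wb_wakeups, iq_wakeups, cancelled, fast_iq_wakeup):
    """Return (status visible this cycle, status written to the register)."""
    stored_next = next_status(src_reg, list(wb_wakeups) + list(iq_wakeups), cancelled)
    if fast_iq_wakeup:
        # Immediate: IQ wakeup and cancellation are visible in the same cycle.
        # WB wakeup is always slow, so it is left out of the immediate path.
        seen_now = next_status(src_reg, iq_wakeups, cancelled)
    else:
        # Delayed: only the register value is visible this cycle.
        seen_now = src_reg
    return seen_now, stored_next
```

In this model the only difference between a fast entry and a slow entry is what is visible in the current cycle; the value written to the register is the same for both.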

Summary:

1. An Entry is a structure within the IssueQueue that stores key information about a uop, analogous to an RS (Reservation Station) entry.
2. The standard design specification for the integer IssueQueue in Kunming Lake has 24 Entry items.
3. Entries are classified into three types based on their behavior logic: EnqEntry, SimpleEntry, and ComplexEntry.
4. 2 EnqEntries serve as enqueue ports; the two instructions entering the IQ per cycle can only be stored here.
5. The other 22 entries are 6 SimpleEntries and 16 ComplexEntries.

Overall Block Diagram

Diagram

Diagram

imm stores the immediate value, payload stores the original instruction information; the entry does not process them.

Diagram

srcStatus indicates the state of each source operand of the uop. issued indicates the issue status of the uop. Because an issue attempt can succeed or fail, validReg can only be modified once the issue has succeeded, so issued is used to mark that the uop is currently being issued.

Diagram

issueTimer and deqPortIdx exist to support the entry transfer mechanism. After an instruction is issued, it must pass through the OG0 and OG1 stages. A uop is considered successfully issued only when it passes OG1 and enters an EXU. If it fails along the way, the IQ must be notified so that the uop can be re-issued. Without the transfer mechanism, the uop could be located by entryIdx. With the transfer mechanism, an issued uop might move to another location in the next cycle, which makes it hard for the OG0/OG1 resp signals to locate it. Therefore, issueTimer and deqPortIdx are added. Once a uop is issued, issueTimer is updated and then increments each cycle, and deqPortIdx records which dequeue port the uop was issued from. According to the timing relationship in the figure above, the OG0 and OG1 resp signals only need to match these two values within each Entry to locate the uop.
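The lookup described above can be sketched as follows. The model is hypothetical in its details: it assumes issueTimer is set to 0 in the issue cycle, that timer value 0 lines up with the OG0 resp and 1 with the OG1 resp, and that a resp carries a robIdx and a success flag. The exact timer encoding is given by the timing figure, not by this sketch.

```python
from dataclasses import dataclass


@dataclass
class Resp:
    rob_idx: int
    success: bool


@dataclass
class Entry:
    rob_idx: int
    valid: bool = True
    issued: bool = False
    issue_timer: int = 0
    deq_port_idx: int = 0

    def on_deq_sel(self, port):
        """Mark the uop as issued; it stays in the entry until a resp arrives."""
        self.issued = True
        self.issue_timer = 0        # assumed starting value
        self.deq_port_idx = port


def pick_resp(entry, og0_resp, og1_resp):
    """og0_resp / og1_resp: one Resp or None per dequeue port."""
    if not (entry.valid and entry.issued):
        return None
    per_port = {0: og0_resp, 1: og1_resp}.get(entry.issue_timer)
    if per_port is None:
        return None
    resp = per_port[entry.deq_port_idx]
    if resp is not None and resp.rob_idx == entry.rob_idx:
        return resp
    return None


def step(entry, og0_resp, og1_resp):
    """Advance one cycle: consume a matching resp or keep counting."""
    resp = pick_resp(entry, og0_resp, og1_resp)
    if resp is None:
        if entry.issued:
            entry.issue_timer += 1
    elif resp.success:
        entry.valid = False         # issue succeeded: clear the entry
    else:
        entry.issued = False        # issue failed: eligible for re-issue
```

Nothing in the lookup depends on the entry's index, so the match still works after the uop has been transferred to a different entry.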

Diagram

Wakeup modifies srcState. srcWakeUpL1ExuOH marks which EXU the speculative wakeup signal came from.

Diagram

Writeback wakeup is issued in the last cycle of uop execution. Entries woken up by writeback do not support same-cycle wakeup and issue.

Diagram

dataSource is used in speculative wakeup scenarios. Writeback wakeup sets it directly to reg. A speculative same-cycle wakeup sets it to forward. It then advances by one step for each additional cycle the uop stays in the entry and finally remains at reg: forward --> bypass --> reg --> reg.
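The progression can be written as a small state machine. The enum and function names below are hypothetical and only mirror the three values named in the text.

```python
from enum import Enum


class DataSource(Enum):
    FORWARD = "forward"
    BYPASS = "bypass"
    REG = "reg"


# One step per additional cycle the uop stays in the entry.
_NEXT = {
    DataSource.FORWARD: DataSource.BYPASS,
    DataSource.BYPASS: DataSource.REG,
    DataSource.REG: DataSource.REG,      # reg is the final, stable value
}


def on_wakeup(speculative):
    """Speculative same-cycle wakeup starts at forward; writeback wakeup at reg."""
    return DataSource.FORWARD if speculative else DataSource.REG


def advance(data_source):
    """Value of dataSource one cycle later if the uop has not left the entry."""
    return _NEXT[data_source]
```

For example, an operand woken speculatively that waits two more cycles reads its data from the register file, since `advance(advance(on_wakeup(True)))` is `DataSource.REG`.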

Diagram

srcLoadDependency is 3 bits wide and records the load dependencies of each uop. When ldCancel occurs, it is used to flush all uops in the wakeup chain.