IssueQueue

  • Version: V2R2
  • Status: OK
  • Date: 2025/01/20
  • commit: xxx

Design Specifications

  • Supports 4 different types of issue queue modules to accommodate Scalar Integer, Vector Floating Point, Scalar Memory Access, and Vector Memory Access instructions.
  • Supports two enqueue ports and two dequeue ports.
  • Supports speculative wakeup signal generation.
  • Supports speculative wakeup signal register replication.
  • Supports early detection of instruction writeback conflicts.
  • Supports selecting the oldest ready instruction for issue.

Functionality

The issue queue (IQ) is the starting point of out-of-order scheduling in the processor. It connects the preceding Dispatch pipeline stage to the subsequent DataPath pipeline stage. In a superscalar out-of-order processor, instructions execute correctly out of order only if the dependencies between them are respected. Whether an instruction can execute is determined by the ready status of its source operands: a source operand becomes ready only after the earlier instruction it depends on has finished execution. The IQ receives up to two instructions per cycle from the Dispatch stage. Because their source operands may not yet be ready, the instructions are held inside the IQ. The IQ continuously monitors wakeup signals, each of which changes the corresponding source operands from not ready to ready. Each cycle, the IQ selects up to two instructions whose operands are all ready, following an oldest-first strategy, and issues them to the DataPath stage. In this way the issue queue ensures that all inter-instruction dependencies are satisfied during out-of-order execution while keeping out-of-order scheduling performance as high as possible.
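The behavior described above can be illustrated with a minimal software model. This is only a sketch under simplifying assumptions: the names `ToyIssueQueue` and `Entry` are hypothetical, operand tags are plain strings, age is a simple enqueue counter, and per-port FU constraints are ignored. It is not the hardware implementation.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    srcs: dict  # operand tag -> ready flag (hypothetical representation)
    age: int    # enqueue order; a smaller value means an older instruction

    def ready(self):
        return all(self.srcs.values())


class ToyIssueQueue:
    def __init__(self, num_enq=2, num_deq=2):
        self.num_enq = num_enq
        self.num_deq = num_deq
        self.entries = []
        self._next_age = 0

    def enqueue(self, insts):
        """Accept up to num_enq (name, srcs) pairs in one cycle."""
        assert len(insts) <= self.num_enq
        for name, srcs in insts:
            self.entries.append(Entry(name, dict(srcs), self._next_age))
            self._next_age += 1

    def wakeup(self, tag):
        """Mark every waiting source operand with this tag as ready."""
        for e in self.entries:
            if tag in e.srcs:
                e.srcs[tag] = True

    def issue(self):
        """Issue up to num_deq ready instructions, oldest first."""
        ready = sorted((e for e in self.entries if e.ready()), key=lambda e: e.age)
        picked = ready[: self.num_deq]
        for e in picked:
            self.entries.remove(e)
        return [e.name for e in picked]
```

In this model, an instruction whose operand is not ready stays in the queue while younger ready instructions issue past it, and it becomes eligible as soon as the matching `wakeup` call arrives.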

Instruction Enqueue

The enqueue logic of the 4 issue queue types is essentially the same; the only differences come from a few type-specific signals. Each issue queue instantiates an Entries module that stores the instructions. The issue queue generally has two enqueue ports, so it can accept up to 2 valid instructions per cycle from the previous pipeline stage, and the Entries module has two matching enqueue ports. On enqueue, an instruction arrives at the IQ input, the key signals are selected, and they are forwarded to the Entries input. Signals intrinsic to the instruction, such as robIdx and fuType, are passed through without extra processing. Signals that describe instruction status, such as srcState, are given their initial values by combinational logic before entering Entries. Refer to the Entries interface documentation for details on each signal. The issue queue also supports waking up an instruction at the same time it is enqueued. For timing reasons, this wakeup is not applied before the instruction enters Entries. Instead, the incoming wakeup signals are sent into Entries, registered synchronously, and only then trigger the wakeup.

Instruction Age Relationship Maintenance

To implement the oldest-first issue strategy, the issue queue must track, every cycle, the age of the instructions held in Entries. It does so by instantiating AgeDetector modules, one per Entry type, for up to 3 AgeDetectors. Each AgeDetector is an age matrix that can serve age queries from multiple dequeue ports simultaneously and returns the oldest entry among those queried. Every cycle, the AgeDetectors receive the enqueue status of the three Entry types from Entries and maintain the age relationships of all instructions. A newly enqueued instruction is necessarily the youngest of all instructions. The age relationships are thus maintained through the signals passed from Entries to the AgeDetectors, and the oldest-first strategy is applied at issue selection by reading the AgeDetectors.

Instruction Issue Selection

The issue queue classifies Entries into up to three types (EnqEntry, SimpleEntry, and ComplexEntry) and supports moving an instruction's storage location from one Entry type to another. There is a strict age relationship between the three Entry types. Issue selection is therefore performed in parallel for the three Entry types, followed by a 3-to-1 selection that picks the oldest instruction to issue. For timing reasons, and to serve two dequeue ports, the design gives priority to the first dequeue port. The AgeDetectors for EnqEntry, SimpleEntry, and ComplexEntry each select their oldest issuable instruction. Then, based on the strict age relationship between the Entry types, one of the three is chosen in the priority order Complex > Simple > Enq and sent to the first dequeue port.

The behavior of the second dequeue port depends on the FU configuration of the two dequeue ports. If the two ports have different FUs, the instructions they can dequeue do not overlap, and the second port selects its oldest instruction in the same way as the first port. If the two ports have the same FUs, overlap is possible. In that case, the selection result of the first port is masked off and a valid instruction is selected "randomly" from the remainder. In the current IQ configuration, the FUs of the two ports are different (or there is only one dequeue port), so "random" selection never occurs, and its details are not described here.
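The 3-to-1 selection for one dequeue port can be sketched as a simple priority pick. The function name and the use of `None` for "no issuable instruction of this type" are assumptions made for illustration only.

```python
def select_for_port(enq_oldest, simple_oldest, complex_oldest):
    """Pick one candidate in the priority order Complex > Simple > Enq.

    Each argument is the oldest issuable entry index reported for that
    Entry type, or None if that type has no issuable instruction.
    Returns (entry_type, index), or None if nothing can issue.
    """
    for kind, cand in (
        ("complex", complex_oldest),
        ("simple", simple_oldest),
        ("enq", enq_oldest),
    ):
        if cand is not None:
            return kind, cand
    return None
```

Because of the strict age relationship between the Entry types, no age comparison across types is needed at this step: the first non-empty candidate in priority order is already the oldest.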

Speculative Wakeup Signal Generation

The issue queue manages instructions in two ways: it decides when an instruction is issued, and it informs other instructions when they can be issued. The latter is achieved through speculative wakeup. Normally, if a source operand of an instruction depends on the writeback value of a preceding instruction, that operand can only be set to ready when the preceding instruction writes back. However, if the preceding instruction has a fixed execution latency, its writeback time is already known at the moment it is issued from the issue queue. The issue queue can therefore generate a speculative wakeup signal at the appropriate time to improve out-of-order performance. As long as the speculatively woken instruction reads its source operand no earlier than the preceding instruction produces its result, execution can be accelerated through forwarding, bypassing, and similar mechanisms.

For non-memory access IQs, the module that generates speculative wakeup signals is the WakeupQueue. When an instruction is selected for issue, it also enters the WakeupQueue, where it is placed in a shift pipeline chosen by its execution latency. For latency 0, the speculative wakeup is generated one cycle later; for latency 2, three cycles later; and so on. For memory access IQs, the wakeup signals are received from the memory access unit through the dedicated loadWakeUp interface. After being registered for one cycle, they are treated as the IQ's own wakeup signals and broadcast to other IQs through the same interface used by the WakeupQueue.
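The latency-dependent delay of the WakeupQueue can be modeled with one shift pipeline per latency. The sketch below is an illustrative model only: the class name `ToyWakeupQueue`, the `step` interface, and the choice of pipeline depth `latency + 1` are assumptions chosen so that the timing matches the description (latency 0 fires one cycle after issue, latency 2 fires three cycles after issue).

```python
class ToyWakeupQueue:
    def __init__(self, latencies):
        # One shift pipeline per distinct FU latency; depth is latency + 1
        # (an assumption that reproduces the timing in the text).
        self.pipes = {lat: [None] * (lat + 1) for lat in sorted(set(latencies))}

    def step(self, issued=None):
        """Advance one cycle.

        issued is (dest_tag, latency) for the instruction issued this
        cycle, or None. Returns the destination tags whose speculative
        wakeup fires this cycle.
        """
        fired = []
        for pipe in self.pipes.values():
            out = pipe.pop()      # the last stage leaves the pipeline
            if out is not None:
                fired.append(out)
            pipe.insert(0, None)  # shift everything by one stage
        if issued is not None:
            tag, lat = issued
            self.pipes[lat][0] = tag
        return fired
```

For example, with pipelines for latencies 0 and 2, a latency-2 instruction issued in cycle 0 fires its wakeup in cycle 3, while a latency-0 instruction issued in cycle 1 fires in cycle 2.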

Early Detection of Writeback Conflicts

Writeback conflicts fall into two categories. The first is a conflict among the IQ's own dequeue ports. Each issue queue dequeue port corresponds to one EXU. An EXU may contain a set of FUs with different execution latencies, but it has only one writeback port (this is the writeback port to the ROB, not the register file write port; one or more EXU writeback ports share one register file write port). For example, an EXU that combines Alu and Mul FUs can see conflicts between latency-0 and latency-2 instructions. If a latency-0 instruction is issued two cycles after a latency-2 instruction, both complete in the same cycle, causing an FU writeback conflict. To avoid this, the issue queue instantiates a fuBusyTable module to detect FU conflicts. The fuBusyTable works per dequeue port: it records the instructions issued by each dequeue port every cycle and uses their feedback signals to update the recorded values. Selection and issue of later instructions consult these records to avoid FU writeback conflicts.

The second category is a conflict at the register file. Because the register file has a limited number of write ports, the writeback ports of multiple EXUs may share one register file write port, which can cause write port conflicts between issue queues. To handle this, the issue queue instantiates intWbBusyTable and vfWbBusyTable. Whenever an instruction is issued, the corresponding write information is generated and sent to the external WbFuBusyTable module, which ORs together the contributions of all issue queues sharing a write port to produce the final WbFuBusyTable. During issue selection, the WbFuBusyTable entry for the relevant write port is read from the external module and fed into the IQ as an input to selection.
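One way to picture a per-port busy table is as a bit vector that shifts every cycle, where bit k set means "the writeback port is already claimed k cycles from now". The sketch below is an assumption about the encoding, not a description of the actual fuBusyTable logic; `ToyFuBusyTable` and `merge_wb_busy` are hypothetical names.

```python
from functools import reduce


class ToyFuBusyTable:
    def __init__(self, max_latency):
        self.width = max_latency + 1
        self.bits = 0  # bit k set: writeback port is taken k cycles from now

    def can_issue(self, latency):
        assert 0 <= latency < self.width
        return (self.bits >> latency) & 1 == 0

    def issue(self, latency):
        """Record that an instruction with this latency issues now."""
        assert self.can_issue(latency)
        self.bits |= 1 << latency

    def tick(self):
        """Advance one cycle: every reservation moves one step closer."""
        self.bits >>= 1


def merge_wb_busy(per_iq_bits):
    """OR the busy vectors of all IQs sharing one register file write port."""
    return reduce(lambda a, b: a | b, per_iq_bits, 0)
```

With this encoding, after a latency-2 instruction issues, a latency-0 instruction is still allowed one cycle later but is blocked two cycles later, which is exactly the Alu/Mul conflict described above.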

Overall Block Diagram

Schematic Diagram

Interface Timing Diagram

Schematic Diagram

Secondary Module: WakeupQueue

This module controls when each IQ sends out its speculative wakeup signals. For a non-memory access IQ that acts as a wakeup source, each dequeue port has its own WakeupQueue module. Internally, the module consists of several pipelines. The number of pipelines is determined by the FU latencies of the dequeue port: if the port's FUs have several distinct latencies, there is one pipeline for each of them.

Overall Block Diagram

Schematic Diagram

Interface Timing Diagram

Schematic Diagram

Secondary Module: AgeDetector

This module is an age matrix that maintains the age order of the instructions held in the Entries of the issue queue. The issue queue has up to three types of Entries, and each Entry type has its own independent AgeDetector module. The module uses a matrix of registers to represent the age relationships between Entries. The matrix has as many rows and as many columns as there are Entries. Below is an example using SimpleEntry with 6 entries:

Schematic Diagram
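An age matrix of this kind can be sketched in software as follows. The encoding (`older[i][j]` is true when entry i holds an older instruction than entry j), the class name `ToyAgeMatrix`, and the sequential handling of enqueues are assumptions for illustration; the hardware handles same-cycle enqueues and queries in parallel.

```python
class ToyAgeMatrix:
    def __init__(self, n):
        self.n = n
        self.valid = [False] * n
        # older[i][j] is True when entry i is older than entry j.
        self.older = [[False] * n for _ in range(n)]

    def enqueue(self, i):
        """A newly enqueued instruction is younger than every valid entry."""
        for j in range(self.n):
            if j != i:
                self.older[i][j] = False
                self.older[j][i] = self.valid[j]
        self.valid[i] = True

    def dequeue(self, i):
        self.valid[i] = False

    def oldest(self, candidates):
        """Return the oldest valid entry among the queried candidates."""
        cands = [i for i in candidates if self.valid[i]]
        for i in cands:
            if all(self.older[i][j] for j in cands if j != i):
                return i
        return None
```

With 6 entries, enqueuing into slots 3, 0, and 5 in that order makes slot 3 the oldest. If slot 3 is later dequeued and reused, its new instruction becomes the youngest, regardless of its position in the array.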

Overall Block Diagram

Schematic Diagram