
Rename

  • Version: V2R2
  • Status: OK
  • Date: 2025/01/20
  • Commit: xxx

The Rename module receives instruction decode information from the Decode module, assigns robIdx and physical registers to instructions based on the decode information, and queries the corresponding physical registers via operands. At the same time, this module also maintains the state of the freeList based on instruction decode information, instruction commit information, and register release information from the RenameTable, and sends write requests to the RenameTable based on instruction decode information and instruction commit information to update the register mapping status during speculative execution. Furthermore, this module also handles redirection requests from the ROB, re-updating the state of the freeList based on the redirection information. After completing renaming, Rename sends the renamed instruction information to the Dispatch module.

Basic Functionality

Map logical registers to physical registers, assigning one physical register to each logical register in an instruction.

Register renaming maintains renaming-related tables or pointers. It maintains a mapping table from logical registers to physical registers, recording the physical register number most recently allocated for each logical register.

For integer, floating-point, and vector registers, it maintains physical register status tables with 224, 192, and 128 entries respectively, recording the state of physical registers (whether allocated or not), and using free physical register allocation pointers to record unallocated physical registers.

It maintains a mapping table (RenameTable, RAT) for committed logical registers to corresponding physical registers, recording the mapping relationship between logical registers and physical registers in the committed state.

It maintains a pointer for free physical register allocation in the committed state. Register renaming technology eliminates Write-After-Read (WAR) and Write-After-Write (WAW) dependencies between instructions, ensuring precise state restoration when an instruction execution causes an exception or a branch misprediction cancels subsequent instructions.

Rename Inputs

  • Inputs from the decode stage. Along the way, the FusionDecoder modifies the instructions output by DecodeStage for macro-op fusion: it changes valid, uop, and other fields based on combinations of adjacent instructions, and modifies the commit type (CommitType) of the fused instruction based on the combination of ftqptr and ftqoffset of the adjacent instructions.
  • Speculative rename data read back from the RAT.
  • Instruction fusion information, used to modify the decoded input instruction stream according to the fusion status.
  • ssit and waittable information.
  • CtrlBlock snapshot control information and enqueue/dequeue pointers.
  • RAB commit information.

Rename Outputs

  • To RAT: rename write information.
  • To Dispatch: pipelined output of renamed uop information, sent when the Dispatch receive (recv) signal is valid.
  • To snapshot: enqdata, allowing snapshot generation.

Integer Physical Register Allocation

After receiving valid integer instruction decode information from the Decode module, the Rename module determines whether a new integer physical register needs to be allocated based on the io_in_[0-5]_bits_rfWen signal and the io_in_[0-5]_bits_ldest signal. If rfWen is high and ldest is not 0, a new integer physical register needs to be allocated. If a new integer physical register needs to be allocated, an allocation request is sent to intFreeList, and the allocation result is obtained in the same cycle; otherwise, no allocation request is sent. In addition, the Rename module supports integer Move instruction elimination. If the decoded instruction is detected to be an integer Move instruction, no new integer physical register is allocated.

Floating-Point or Vector Physical Register Allocation

After receiving valid floating-point or vector instruction decode information from the Decode module, the Rename module determines whether a new floating-point or vector physical register needs to be allocated based on the io_in_[0-5]_bits_fpWen and io_in_[0-5]_bits_vecWen signals. If fpWen or vecWen is high, a new floating-point or vector physical register, respectively, needs to be allocated: an allocation request is sent to fpFreeList or vecFreeList, and the allocation result is obtained in the same cycle. Otherwise, no allocation request is sent.
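The allocation decisions in the two sections above can be summarized per channel. The following is a minimal Python sketch, not the XiangShan Chisel code: the DecodedUop class and its field names are hypothetical stand-ins for the io_in_* signals, and gating every request on the channel's valid bit is an assumption based on "valid decode information".

```python
from dataclasses import dataclass


@dataclass
class DecodedUop:
    # Hypothetical stand-in for one io_in_* channel of decode information.
    valid: bool
    rf_wen: bool = False   # io_in_*_bits_rfWen
    fp_wen: bool = False   # io_in_*_bits_fpWen
    vec_wen: bool = False  # io_in_*_bits_vecWen
    ldest: int = 0         # io_in_*_bits_ldest
    is_move: bool = False  # integer move selected for move elimination


def alloc_requests(uops):
    """Return (intFreeList, fpFreeList, vecFreeList) allocation requests per channel."""
    reqs = []
    for u in uops:
        # Integer: rfWen high, ldest != 0 (x0 is hardwired to zero), and not an eliminated move.
        int_req = u.valid and u.rf_wen and u.ldest != 0 and not u.is_move
        # Floating-point / vector: the corresponding write enable alone decides.
        fp_req = u.valid and u.fp_wen
        vec_req = u.valid and u.vec_wen
        reqs.append((int_req, fp_req, vec_req))
    return reqs


group = [
    DecodedUop(valid=True, rf_wen=True, ldest=5),                # writes x5: int register
    DecodedUop(valid=True, rf_wen=True, ldest=0),                # writes x0: no request
    DecodedUop(valid=True, rf_wen=True, ldest=7, is_move=True),  # eliminated move: no request
    DecodedUop(valid=True, fp_wen=True, ldest=3),                # writes f3: fp register
    DecodedUop(valid=True, vec_wen=True, ldest=2),               # writes v2: vec register
    DecodedUop(valid=False, rf_wen=True, ldest=9),               # empty slot: no request
]
for reqs in alloc_requests(group):
    print(reqs)
```

Only the first, fourth, and fifth channels produce a request, one for each of the three free lists.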

Setting Source Operand Physical Registers (psrc)

If the instruction decode information from the Decode module includes source operands of integer, floating-point, or vector register type, then under normal circumstances the Decode module queries the RenameTable one cycle in advance for the physical register mapped to each logical source register. One cycle later, the Rename module receives the result of this speculative rename table read and passes it to the Dispatch module via io_out_[0-5]_bits_psrc_[0-4]. As an exception, if an earlier instruction renamed in the same cycle has a destination operand equal to a source operand of the current instruction (same register type and number), the psrc of the current instruction must be set to the pdest of that earlier instruction; if several earlier instructions match, the youngest one wins.
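The same-cycle bypass described above can be sketched as follows. This is a minimal Python model under stated assumptions: uops are plain dictionaries with hypothetical keys (srcs, dest, pdest), registers are tagged with a type string, and rat_read holds the values already read from the speculative rename table.

```python
def resolve_psrcs(uops, rat_read):
    """Apply same-cycle bypass to the psrc values read from the speculative rename table.

    uops[i]["srcs"]  : list of (reg_type, logical_src) for channel i
    uops[i]["dest"]  : (reg_type, logical_dest) if the uop writes a renamed register, else None
    uops[i]["pdest"] : physical register assigned to channel i's destination
    rat_read[i]      : psrc values read from the RenameTable for channel i
    """
    result = []
    for i, uop in enumerate(uops):
        psrcs = list(rat_read[i])
        for s, src in enumerate(uop["srcs"]):
            # Scan older channels in order; a later match overwrites an earlier one,
            # so the youngest older writer of the same register wins.
            for j in range(i):
                if uops[j]["dest"] == src:
                    psrcs[s] = uops[j]["pdest"]
        result.append(psrcs)
    return result


uops = [
    {"srcs": [("int", 1), ("int", 2)], "dest": ("int", 3), "pdest": 40},  # add x3, x1, x2
    {"srcs": [("int", 3), ("int", 1)], "dest": ("int", 3), "pdest": 41},  # add x3, x3, x1
    {"srcs": [("int", 3), ("fp", 3)], "dest": None, "pdest": None},       # fsw f3, 0(x3)
]
rat_read = [[10, 11], [12, 10], [12, 20]]
print(resolve_psrcs(uops, rat_read))
```

The store picks up x3 from the second add (p41), not the first, while f3 keeps its table value because an integer x3 write does not match a floating-point f3 read.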

Setting Destination Operand Physical Registers (pdest)

If the instruction decode information from the Decode module indicates the presence of a destination operand (see Integer Physical Register Allocation and Floating-Point or Vector Physical Register Allocation), then under normal circumstances the Rename module passes the newly allocated physical register to the Dispatch module via io_out_[0-5]_bits_pdest. As an exception, if the instruction is an integer Move instruction, the pdest of the current instruction is set to the psrc of the current instruction.
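A minimal Python sketch of the pdest choice for one rename group, assuming hypothetical dictionary keys (has_dest, is_move, psrc0) and that psrc0 already includes the same-cycle bypass. Registers handed out by the free list are consumed in channel order; an eliminated move consumes none.

```python
def select_pdests(uops, free_regs):
    """Choose io_out_*_bits_pdest for each channel of a rename group.

    uops[i]["has_dest"] : the uop writes a renamed destination register
    uops[i]["is_move"]  : the uop is an eliminated integer move
    uops[i]["psrc0"]    : its (already bypassed) first physical source register
    free_regs           : registers returned by the free list for this group, in order
    """
    free = iter(free_regs)
    pdests = []
    for uop in uops:
        if not uop["has_dest"]:
            pdests.append(None)
        elif uop["is_move"]:
            pdests.append(uop["psrc0"])  # move elimination: share the source's register
        else:
            pdests.append(next(free))    # normal case: newly allocated register
    return pdests


group = [
    {"has_dest": True, "is_move": False, "psrc0": 10},   # add x3, x1, x2
    {"has_dest": True, "is_move": True, "psrc0": 50},    # mv x4, x3 (x3 bypassed to p50)
    {"has_dest": False, "is_move": False, "psrc0": 12},  # sw x4, 0(x2)
    {"has_dest": True, "is_move": False, "psrc0": 50},   # addi x5, x4, 1
]
print(select_pdests(group, [50, 51]))
```

After elimination, x3 and x4 both map to p50, which is why the RenameTable later has to track shared physical registers before releasing them.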

Integer Instruction Commit

When an integer instruction commits, Rename sends a free signal to intFreeList based on the io_int_need_free_[0-5] and io_int_old_pdest_[0-5] information from RenameTableWrapper, thereby releasing the corresponding integer physical registers for use by new instructions. When io_int_need_free_[0-5] is high, it indicates that the io_int_old_pdest_[0-5] integer physical register for the corresponding channel needs to be released. Furthermore, Rename also sends the commit signal from RAB to intFreeList for maintaining the architectural state rename pointer.

Floating-Point or Vector Instruction Commit

When floating-point or vector instruction commit information arrives from RAB, Rename combines it with the commit information from RenameTableWrapper and sends free signals to fpFreeList or vecFreeList, releasing floating-point or vector physical registers that are no longer in use so that they can be allocated to new instructions. If the io_rabCommits_info_[0-5]_fp/vecWen signal from RAB, delayed by one cycle, is high together with the one-cycle-delayed io_rabCommits_isCommit and io_rabCommits_commitValid_[0-5] signals (these two signals indicate that the cycle is a commit cycle and that the commit signal for that channel is valid; see the Instruction Commit Logic Timing Diagram), then the floating-point or vector register io_fp/vec_old_pdest_[0-5] of the corresponding channel must be released. Rename also forwards the commit signal from RAB to fpFreeList (and likewise vecFreeList) to maintain the architectural-state rename pointer.
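The one-cycle alignment between the RAB commit information and io_fp/vec_old_pdest_* can be modeled with a delayed copy of the commit info. This is a minimal Python sketch over per-cycle traces; the trace format is hypothetical, only two channels are shown, and the prev variable plays the role of a RegNext register.

```python
def fp_vec_frees(commit_trace, old_pdest_trace):
    """Per-cycle fp/vec physical registers to release.

    commit_trace[t]    : RAB commit info at cycle t, a dict with keys
                         isCommit, commitValid, fpWen, vecWen (lists per channel)
    old_pdest_trace[t] : dict with "fp" and "vec" old_pdest lists from RenameTableWrapper at cycle t
    """
    frees = []
    prev = None  # one-cycle-delayed RAB commit info (a RegNext in hardware)
    for t, cur in enumerate(commit_trace):
        fp_free, vec_free = [], []
        if prev is not None and prev["isCommit"]:
            for i, valid in enumerate(prev["commitValid"]):
                if valid and prev["fpWen"][i]:
                    fp_free.append(old_pdest_trace[t]["fp"][i])
                if valid and prev["vecWen"][i]:
                    vec_free.append(old_pdest_trace[t]["vec"][i])
        frees.append((fp_free, vec_free))
        prev = cur
    return frees


# Two channels shown for brevity (the real interface has six).
commits = [
    {"isCommit": True, "commitValid": [True, True], "fpWen": [True, False], "vecWen": [False, True]},
    {"isCommit": False, "commitValid": [False, False], "fpWen": [False, False], "vecWen": [False, False]},
]
old_pdests = [
    {"fp": [0, 0], "vec": [0, 0]},
    {"fp": [77, 0], "vec": [0, 12]},  # old pdests arrive one cycle after the commit
]
print(fp_vec_frees(commits, old_pdests))
```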

Redirection

When the redirection signal is received from the io_redirect port, the freeList will pause physical register allocation and will restore the freeList's physical register allocation pointer to the architectural state or a snapshot state. Furthermore, the Rename module will no longer send write request signals to the RenameTable.

Re-Renaming

One cycle after the redirection signal arrives, the Rename module enters the re-renaming process. The re-renaming signal is received from RAB via the io_rabCommits port. During re-renaming, the Rename module will no longer output valid instruction signals to Dispatch, nor will it send write request signals to the RenameTable.

The Rename module sends re-renaming signals to intFreeList, fpFreeList, and vecFreeList via their respective io_walkReq_[0-5] ports. These re-renaming signals are the io_rabCommits_walkValid_[0-5], io_rabCommits_info_[0-5]_isMove, io_rabCommits_info_[0-5]_ldest, and io_rabCommits_info_[0-5]_rf/fp/vecWen signals from the RAB module. Only when io_rabCommits_isWalk is high are the signals received via io_walkReq_[0-5] valid.

For intFreeList, when io_rabCommits_walkValid_[0-5] is high, and the corresponding channel's io_rabCommits_info_[0-5]_rfWen is high, io_rabCommits_info_[0-5]_ldest is not 0, and io_rabCommits_info_[0-5]_isMove is low, a valid signal is sent to the corresponding io_walkReq_[0-5] port, meaning re-renaming is required.

For fpFreeList and vecFreeList, when io_rabCommits_walkValid_[0-5] is high, and the corresponding channel's io_rabCommits_info_[0-5]_fp/vecWen signal is high, a valid signal is sent to the corresponding io_walkReq_[0-5] port, meaning re-renaming is required.
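The walk-request conditions for the three free lists can be written side by side. A minimal Python sketch, assuming each channel's io_rabCommits_info_* fields are passed as a dictionary with hypothetical keys named after the signals; gating on is_walk follows the rule that io_walkReq_* is only meaningful while io_rabCommits_isWalk is high.

```python
def walk_requests(is_walk, walk_valid, info):
    """io_walkReq_* for intFreeList, fpFreeList and vecFreeList, one entry per channel."""
    int_req, fp_req, vec_req = [], [], []
    for valid, inf in zip(walk_valid, info):
        active = is_walk and valid  # walk requests only count while isWalk is high
        int_req.append(active and inf["rfWen"] and inf["ldest"] != 0 and not inf["isMove"])
        fp_req.append(active and inf["fpWen"])
        vec_req.append(active and inf["vecWen"])
    return int_req, fp_req, vec_req


info = [
    {"rfWen": True, "fpWen": False, "vecWen": False, "ldest": 5, "isMove": False},  # int write
    {"rfWen": True, "fpWen": False, "vecWen": False, "ldest": 6, "isMove": True},   # eliminated move
    {"rfWen": False, "fpWen": True, "vecWen": False, "ldest": 1, "isMove": False},  # fp write
    {"rfWen": False, "fpWen": False, "vecWen": True, "ldest": 8, "isMove": False},  # walkValid low
]
walk_valid = [True, True, True, False]
print(walk_requests(True, walk_valid, info))
```

The eliminated move is excluded from the integer walk because it never took a register from intFreeList during renaming.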

robIdx Allocation

The Rename module is responsible for assigning a robIdx to each micro-instruction. It maintains an internal robIdxHead. Under normal circumstances, the Rename module sequentially assigns consecutive robIdx values to the decoded instructions from Decode and increments robIdxHead. However, if the channel's io_in_[0-5]_bits_lastUop is low, or the channel's io_out_needRobFlags_[0-5] from compressUnit is low, robIdxHead is not advanced after that channel, so the micro-instruction in the next channel is not allocated a new robIdx and shares the current one.
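A minimal Python sketch of the robIdx assignment rule above. The rob_size parameter and the plain modulo wrap-around are assumptions for illustration; the real robIdx is a circular-queue pointer that also carries a wrap flag, which is omitted here.

```python
def assign_rob_idx(rob_idx_head, valid, last_uop, need_rob_flag, rob_size):
    """Return the robIdx of each channel and the robIdxHead for the next cycle."""
    idx = rob_idx_head
    rob_idxs = []
    for v, last, need in zip(valid, last_uop, need_rob_flag):
        rob_idxs.append(idx if v else None)
        # Only the last uop of an instruction that needs its own ROB entry advances
        # robIdx; otherwise the next channel shares the current robIdx.
        if v and last and need:
            idx = (idx + 1) % rob_size
    return rob_idxs, idx


# Channels 0-1: one instruction split into two uops; channels 2-3: compressed into one entry.
print(assign_rob_idx(10, [True] * 6,
                     [False, True, True, True, True, True],
                     [True, True, False, True, True, True],
                     rob_size=160))
```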

In the cycle when redirection occurs, this module resets robIdxHead to the redirection robIdx, and in the next cycle, decides whether to increment robIdxHead based on the value of io_redirect_bits_level.

Deciding on Rename Snapshot Generation

The Rename module is also responsible for deciding whether to generate a rename snapshot. Rename snapshots are intended to shorten the re-renaming time after redirection occurs. Rename snapshots are distributed across various modules such as RenameTable, RenameTable_1, RenameTable_2, intFreeList, fpFreeList, vecFreeList, Rob, Rab, CtrlBlock, etc., and the content stored in the snapshots varies. Therefore, a module is needed to tell each module when to generate a snapshot, and this module is Rename. Externally, Rename transmits the snapshot generation signal to other modules via io_out_*_bits_snapshot; internally, Rename also transmits the snapshot generation signal to intFreeList, fpFreeList, and vecFreeList.

There are several restrictions on the generation of rename snapshots. Firstly, the Rename module maintains a snapshot counter snapshotCtr internally; a snapshot can only be generated when this counter is 0. Secondly, if other snapshots currently exist, the robIdx assigned to the first micro-instruction renamed in the current cycle must differ from the robIdx of the most recently generated snapshot by 6, which is greater than the ROB commit width. Finally, the first micro-instruction renamed in the current cycle must be the first micro-instruction of its parent instruction, i.e., io_in_0_bits_firstUop must be high. Only when the above three conditions are met, and there is a branch instruction among the six micro-instructions being renamed in the current cycle, will a snapshot be generated. At this time, the io_out_*_bits_snapshot signals for the channels corresponding to branch instructions will be pulled high, and Rename will also notify its internal sub-modules of the snapshot generation signal.
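The three conditions plus the branch requirement can be combined into one per-cycle decision. This minimal Python sketch makes two assumptions that the text does not settle: "differ by 6" is read as a distance of at least min_distance, and robIdx wrap-around is ignored. All parameter names are hypothetical.

```python
def snapshot_bits(snapshot_ctr, has_snapshot, first_rob_idx, last_snpt_rob_idx,
                  first_uop, fire, is_branch, min_distance=6):
    """Return io_out_*_bits_snapshot for one rename cycle."""
    # Condition 2: no snapshot exists, or the new group is far enough from the last one.
    far_enough = (not has_snapshot) or (first_rob_idx - last_snpt_rob_idx >= min_distance)
    # Conditions 1-3: counter expired, distance satisfied, channel 0 starts an instruction.
    allow = snapshot_ctr == 0 and far_enough and first_uop
    # Only fired branch channels raise their snapshot bit; no branch means no snapshot.
    return [allow and f and b for f, b in zip(fire, is_branch)]


fire = [True] * 6
branches = [False, False, True, False, False, False]
print(snapshot_bits(0, True, 40, 30, True, fire, branches))
```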

The snapshot counter snapshotCtr is a counter that controls the interval between snapshot generations. Such a counter is implemented because snapshots taken too close together are meaningless and waste snapshot resources. The initial value of snapshotCtr is set to 4 times the RAB commit width, which is 4×8=32. If no valid rename snapshot currently exists, snapshotCtr will be set to 0; otherwise, snapshotCtr will decrement by n for every n micro-instructions renamed, until it reaches 0. After snapshotCtr reaches 0, if a rename snapshot is generated at some point, snapshotCtr will be reset to the maximum value minus the number of micro-instructions renamed in the current cycle, i.e., 32-PopCount(io_out_*_valid && io_out_*_ready).
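The snapshotCtr update rule can be sketched as a next-state function. This is a minimal Python model, assuming that generating a snapshot takes priority over the other two cases (the text does not state the priority explicitly); the function and constant names are hypothetical.

```python
SNAPSHOT_CTR_MAX = 4 * 8  # 4 x RAB commit width, per the text above


def next_snapshot_ctr(ctr, has_valid_snapshot, gen_snapshot, renamed):
    """Next value of snapshotCtr.

    renamed = PopCount(io_out_*_valid && io_out_*_ready) in the current cycle.
    Assumption: generating a snapshot takes priority over the other two cases.
    """
    if gen_snapshot:
        return SNAPSHOT_CTR_MAX - renamed
    if not has_valid_snapshot:
        return 0
    return max(ctr - renamed, 0)  # count down by n renamed uops, saturating at 0


# A snapshot is taken while renaming 6 uops, then 6 uops are renamed per cycle.
ctr = next_snapshot_ctr(0, True, True, 6)
trace = [ctr]
while ctr > 0:
    ctr = next_snapshot_ctr(ctr, True, False, 6)
    trace.append(ctr)
print(trace)
```

With six uops renamed per cycle, the counter expires about five cycles after a snapshot, so snapshots are spaced at least roughly 32 uops apart.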

Overall Block Diagram

Rename Overall Block Diagram

Interface Timing

Decode Input Interface Timing Diagram

Decode Input Interface Timing Diagram

This figure illustrates three examples of decoded inputs from Decode. When the ready and valid signals are both high, the corresponding bits are received by the Rename module.

Rename Output Interface Timing Diagram

Rename Output Interface Timing Diagram

This figure illustrates three examples of renaming results. When the ready and valid signals are both high, the corresponding bits are sent by the Rename module to Dispatch.

Instruction Commit Logic Timing Diagram

Instruction Commit Logic Timing Diagram

This figure illustrates five instruction commit inputs from the ROB. When io_rabCommits_isCommit is high and io_rabCommits_isWalk is low, io_rabCommits_info_*_* carries instruction commit information. When io_rabCommits_commitValid_* is high, the corresponding io_rabCommits_info_*_* passes valid instruction commit information into the Rename module. Meanwhile, io_*_old_pdest_* passes the old physical register number to be released to the Rename module after a one-cycle delay, and the io_int_need_free_* port indicates whether the integer physical register should be released after a further one-cycle delay.

Redirection and Re-Renaming Timing Diagram

Redirection and Re-Renaming Timing Diagram

This figure illustrates the relevant signals before and after a redirection. In the first two cycles, io_redirect_valid is low and Rename is in its normal operating state, as in the earlier timing diagrams. Then io_redirect_valid is pulled high for one cycle: the redirection arrives, the related information is sent on io_redirect_bits_*, and Rename enters the re-renaming state starting from the next cycle. In the following three cycles, io_rabCommits_isCommit is low and io_rabCommits_info_*_* no longer carries commit information. Instead, io_rabCommits_isWalk is high, indicating that io_rabCommits_info_*_* is carrying re-renaming information and Rename must perform re-renaming. When io_rabCommits_walkValid_* is high, the re-renaming information on the corresponding io_rabCommits_info_*_* is valid.

RenameTableWrapper

RenameTableWrapper is a wrapper module that internally contains the integer rename table RenameTable module, the floating-point rename table RenameTable_1 module, and the vector rename table RenameTable_2 module. This wrapper module, in addition to simply packaging the three rename tables, also handles commit and re-renaming related logic internally. RenameTableWrapper acts as a bridge communicating between the internal rename tables and external modules.

Reading the Speculative Rename Table

RenameTableWrapper has a total of 12 integer register read ports, 18 floating-point register read ports, and 30 vector register read ports. The integer register read ports are grouped 2 per group, the floating-point register read ports 3 per group, and the vector register read ports 5 per group, with 6 groups of read ports for each type. Integer register read ports read the speculative mapping from integer logical registers to integer physical registers, floating-point register read ports read the speculative mapping from floating-point logical registers to vector floating-point physical registers, and vector register read ports read the speculative mapping from vector logical registers to vector floating-point physical registers.

Reading from RenameTableWrapper is synchronous: a read request sent at clock cycle T via io_(int/fp/vec)ReadPorts_*_*_addr returns the physical register mapped to that logical register at cycle T, but only at cycle T+1, on io_(int/fp/vec)ReadPorts_*_*_data.

Reading from RenameTableWrapper has forwarding. If at clock cycle T, a read request is sent to a certain address and a write request is also sent to the same address simultaneously, then at clock cycle T+1, the value read will be the value written to that address at clock cycle T.

Reading from RenameTableWrapper has a hold feature. If at clock cycle T, the io_(int/fp/vec)ReadPorts_*_*_hold signal for a certain read port is high, then at clock cycle T+1, the value read will be the same as the value read at clock cycle T.

Writing the Speculative Rename Table in the Renaming Stage

RenameTableWrapper has a total of 6 integer register write ports, 6 floating-point register write ports, and 6 vector register write ports. These ports are used to write the speculative rename table during the renaming stage. Integer register write ports are used to update the speculative mapping relationship from integer logical registers to integer physical registers during the renaming stage. Floating-point register write ports are used to update the speculative mapping relationship from floating-point logical registers to vector floating-point physical registers during the renaming stage. Vector register write ports are used to update the speculative mapping relationship from vector logical registers to vector floating-point physical registers during the renaming stage.

Writing to RenameTableWrapper is synchronous. This means that a write request sent at clock cycle T via io_(int/fp/vec)RenamePorts_*_addr and io_(int/fp/vec)RenamePorts_*_data will only be readable at clock cycle T+1.

Writes to RenameTableWrapper are gated by an enable signal: only write requests whose io_(int/fp/vec)RenamePorts_*_wen is high take effect.

Writing to RenameTableWrapper has priority. The larger the write channel number, the higher the priority, meaning that if two channels write to the same address, the final written result will be from the channel with the larger number.
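The read and write rules above (synchronous read, same-cycle forwarding, hold, write enable, and higher-channel-wins priority) fit in a small cycle-level model. This is a minimal Python sketch, not the hardware structure: the class and method names are hypothetical, and forwarding is modeled simply by applying the cycle's writes before sampling the read ports.

```python
class SpecRenameTable:
    """Cycle-level sketch of one speculative rename table behind RenameTableWrapper."""

    def __init__(self, num_regs, num_read_ports):
        self.table = [0] * num_regs
        self.read_data = [0] * num_read_ports  # value currently on each *_data output

    def cycle(self, reads, writes):
        """Advance one clock edge.

        reads  : per read port, (addr, hold) presented in this cycle
        writes : per write channel, (wen, addr, data), channel 0 first
        Returns the read data visible in the next cycle.
        """
        # Apply enabled writes in channel order: on an address conflict the
        # higher-numbered channel is applied last and therefore wins.
        for wen, addr, data in writes:
            if wen:
                self.table[addr] = data
        new_data = []
        for port, (addr, hold) in enumerate(reads):
            if hold:
                new_data.append(self.read_data[port])  # hold: repeat this cycle's output
            else:
                # Synchronous read; sampling after the writes models forwarding of
                # a same-cycle write to the same address.
                new_data.append(self.table[addr])
        self.read_data = new_data
        return new_data


rat = SpecRenameTable(num_regs=32, num_read_ports=1)
idle = [(False, 0, 0), (False, 0, 0)]
print(rat.cycle([(14, False)], [(True, 14, 73), (False, 0, 0)]))  # forwarding -> [73]
print(rat.cycle([(5, True)], idle))                               # hold -> [73]
print(rat.cycle([(0, False)], [(True, 4, 74), (True, 4, 75)]))    # conflicting writes -> [0]
print(rat.cycle([(4, False)], idle))                              # channel 1 won -> [75]
```

The call sequence mirrors the values used in the integer read/write timing diagram later in this document (73 forwarded to address 14, 75 winning over 74 at address 4).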

Writing the Architectural Rename Table in the Commit Stage

RenameTableWrapper updates the architectural rename table by monitoring commit information from RAB. If the io_rabCommits_isCommit signal for a certain cycle is high, it indicates that a commit is in progress during that cycle. At this time, if a certain io_rabCommits_commitValid_* signal is high, it indicates that the commit signal for that port is valid. At this time, further examination of io_rabCommits_info_*_rfWen, io_rabCommits_info_*_fpWen, and io_rabCommits_info_*_vecWen is required. If io_rabCommits_info_*_rfWen is high, it indicates that the integer register needs to update the architectural rename table; if io_rabCommits_info_*_fpWen is high, it indicates that the floating-point register needs to update the architectural rename table; if io_rabCommits_info_*_vecWen is high, it indicates that the vector register needs to update the architectural rename table. In the above three cases, RenameTableWrapper will modify the entry at address io_rabCommits_info_*_ldest in the integer, floating-point, or vector architectural rename table to io_rabCommits_info_*_pdest.

Providing Physical Register Release Information in the Commit Stage

RenameTableWrapper provides physical register release information based on the state of writing the architectural rename table during the commit stage. This information includes the integer physical register number to be released io_int_old_pdest_* and its corresponding valid signal io_int_need_free_*, as well as the vector floating-point physical register number to be released io_(fp/vec)_old_pdest_*. These signals come directly from RenameTableWrapper's sub-modules and the physical registers will be released in the Rename module based on the instruction commit status.

Writing the Speculative Rename Table in the Re-Renaming Stage

RenameTableWrapper performs re-renaming by monitoring commit information from RAB. If the io_rabCommits_isWalk signal for a certain cycle is high, it indicates that re-renaming is in progress during that cycle. At this time, if a certain io_rabCommits_walkValid_* signal is high, it indicates that the re-renaming signal for that port is valid. At this time, further examination of io_rabCommits_info_*_rfWen, io_rabCommits_info_*_fpWen, and io_rabCommits_info_*_vecWen is required. If io_rabCommits_info_*_rfWen is high, it indicates that the integer register needs to be re-renamed; if io_rabCommits_info_*_fpWen is high, it indicates that the floating-point register needs to be re-renamed; if io_rabCommits_info_*_vecWen is high, it indicates that the vector register needs to be re-renamed. In the above three cases, RenameTableWrapper will modify the entry at address io_rabCommits_info_*_ldest in the integer, floating-point, or vector speculative rename table to io_rabCommits_info_*_pdest.
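The commit-stage and re-renaming-stage updates differ only in which table they write. A minimal Python sketch, assuming tables are plain lists keyed by register class and that the higher-channel-wins priority applies to both commit and walk writes (the timing diagram later shows it for walk writes); the function name and dictionary keys are hypothetical.

```python
def apply_rab_info(arch, spec, is_commit, is_walk, valids, infos):
    """Route RAB commit/walk information into the rename tables.

    arch, spec : {"int": [...], "fp": [...], "vec": [...]} architectural / speculative tables
    valids     : commitValid_* while committing, walkValid_* while walking
    infos      : per channel dict with rfWen, fpWen, vecWen, ldest, pdest
    """
    if is_commit:
        target = arch  # commit stage updates the architectural rename tables
    elif is_walk:
        target = spec  # re-renaming stage updates the speculative rename tables
    else:
        return
    # Channel order: a higher-numbered channel writing the same ldest wins (assumed for both cases).
    for valid, inf in zip(valids, infos):
        if not valid:
            continue
        for kind, wen in (("int", "rfWen"), ("fp", "fpWen"), ("vec", "vecWen")):
            if inf[wen]:
                target[kind][inf["ldest"]] = inf["pdest"]


def make_tables():
    # Entry counts follow the register counts given for RenameTable, RenameTable_1, RenameTable_2.
    return {"int": [0] * 32, "fp": [0] * 34, "vec": [0] * 48}


def info(ldest, pdest, rf=False, fp=False, vec=False):
    return {"rfWen": rf, "fpWen": fp, "vecWen": vec, "ldest": ldest, "pdest": pdest}


arch, spec = make_tables(), make_tables()
apply_rab_info(arch, spec, False, True, [True, False], [info(0, 37, fp=True), info(0, 0)])
apply_rab_info(arch, spec, False, True, [True, True], [info(12, 50, rf=True), info(12, 57, rf=True)])
apply_rab_info(arch, spec, True, False, [True, False], [info(18, 92, fp=True), info(0, 0)])
print(spec["fp"][0], spec["int"][12], arch["fp"][18])
```

The value 50 for channel 0 is illustrative only; it is overridden by channel 1's 57.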

Maintenance of Rename Snapshots

RenameTableWrapper passes the external rename snapshot signals io_snpt_* to its sub-modules for the generation, release, flushing, and usage of rename snapshots.

Overall Block Diagram

RenameTableWrapper Overall Block Diagram

Interface Timing

Integer Read/Write Interface Timing Diagram (Floating-Point and Vector are similar)

Integer Read/Write Interface Timing Diagram

This figure illustrates the interface timing for integer reads and writes.

At time 2, io_intRenamePorts_0 wrote 73 to address 14. At the same time, io_intReadPorts_0_0 also initiated a read request to address 14, thus reading 73, the value written at time 2, at time 3.

At time 4, io_intRenamePorts_0 wrote 74 to address 4, and io_intRenamePorts_1 also wrote 75 to address 4. Consequently, when io_intReadPorts_0_0 issued a read request to address 4 at time 5, the value 75, written by io_intRenamePorts_1, was read at time 6.

At times 3 and 7, io_intReadPorts_0_0_hold is high. Therefore, the value read at time 4 is the same as the value read at time 3, which is 73, not the value at address 5. Similarly, the value read at time 8 is the same as the value read at time 7, which is 76, not the new value 77 written at time 7.

Re-Renaming and Commit Interface Timing Diagram

Re-Renaming and Commit Interface Timing Diagram

This figure illustrates the timing of the re-renaming and commit interfaces for two channels.

From time 1 to time 4, the io_rabCommits_isWalk signal is high and the io_rabCommits_isCommit signal is low, indicating the re-renaming state. At time 2, io_rabCommits_walkValid_0 is high, io_rabCommits_info_0_rfWen is low, and io_rabCommits_info_0_fpWen is high, so re-renaming interface 0 writes 37 to address 0 of the floating-point speculative rename table. At time 3, both re-renaming interfaces write to logical integer register 12. Interface 1 has higher priority than interface 0, so 57 is actually written to address 12 of the integer speculative rename table.

From time 5 to time 9, the io_rabCommits_isWalk signal is low and the io_rabCommits_isCommit signal is high, indicating the commit state. At time 7, io_rabCommits_commitValid_0 is high, io_rabCommits_info_0_rfWen is low, and io_rabCommits_info_0_fpWen is high, so commit interface 0 writes 92 to address 18 of the floating-point architectural rename table.

RenameTable Supporting Move Elimination

The RenameTable supporting move elimination is used for the integer register rename table, with the module name RenameTable. It maintains the mapping relationship between logical integer registers and physical integer registers. It has 12 read speculative rename table ports, 6 write speculative rename table ports, and 6 write architectural rename table ports. Internally, it uses 32 registers of width 8 to actually maintain the mapping relationship. The behavior of the read and write ports is entirely consistent with the behavior described in RenameTableWrapper. It should be noted that for timing considerations, write speculative rename table requests at module time T0 are actually processed at time T1, and write speculative rename table data at time T0 is bypassed to the read speculative rename table result at time T1.

Secondly, the module also has 4 copies of the speculative rename table snapshot internally for fast recovery during redirection and re-renaming. These snapshots are stored in the sub-module SnapShotGenerator_3, with the name _snapshots_snapshotGen_io_snapshots_0/1/2/3_[0-31] in RenameTable. Snapshot generation, release, usage, and flushing are completely controlled by the external signals io_redirect and io_snpt_*.

The external redirection signal io_redirect and snapshot control signals io_snpt_* become the t1_redirect and t1_snpt_* signals after a one-cycle delay. When t1_redirect is high, the module checks t1_snpt_useSnpt. If t1_snpt_useSnpt is low, the speculative rename table is set to the architectural rename table. If t1_snpt_useSnpt is high, the speculative rename table is set to _snapshots_snapshotGen_io_snapshots_[t1_snpt_snptSelect]_[0-31].

Furthermore, the module outputs physical register release signals based on the write-architectural-rename-table ports and the internal architectural rename table. If the write enable io_archWritePorts_n_wen of an architectural write channel is low, io_old_pdest_n in the next cycle is 0. If the write enable is high, io_old_pdest_n in the next cycle is the current cycle's arch_table[io_archWritePorts_n_addr]. Note that io_old_pdest_n has bypassing. For n>0, if a channel j with a smaller sequence number (j<n) writes a value to the same logical register of the architectural rename table, that is, io_archWritePorts_j_wen is high and io_archWritePorts_j_addr == io_archWritePorts_n_addr, then io_old_pdest_n in the next cycle is set to io_archWritePorts_j_data rather than arch_table[io_archWritePorts_n_addr]. If several channels with smaller sequence numbers than n write to the same logical register, io_old_pdest_n in the next cycle is set to the value written by the channel with the largest sequence number among them. For example, for j<k<n, if io_archWritePorts_n_wen, io_archWritePorts_j_wen, and io_archWritePorts_k_wen are all high, and io_archWritePorts_n_addr == io_archWritePorts_j_addr == io_archWritePorts_k_addr, then io_old_pdest_n in the next cycle is set to io_archWritePorts_k_data, rather than arch_table[io_archWritePorts_n_addr] or io_archWritePorts_j_data.
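The io_old_pdest_* bypass rule amounts to "the youngest older write to the same logical register overrides the table". A minimal Python sketch with hypothetical list-based inputs, one entry per architectural write channel:

```python
def old_pdests(arch_table, wen, addr, data):
    """io_old_pdest_* (visible next cycle) for one cycle of architectural write ports."""
    result = []
    for n in range(len(wen)):
        if not wen[n]:
            result.append(0)
            continue
        value = arch_table[addr[n]]
        # Bypass: scanning older channels in order, the largest j < n that writes
        # the same logical register overrides the table value.
        for j in range(n):
            if wen[j] and addr[j] == addr[n]:
                value = data[j]
        result.append(value)
    return result


arch_table = [0] * 32
arch_table[7], arch_table[9] = 30, 31
print(old_pdests(arch_table,
                 wen=[True, True, True, True, False],
                 addr=[7, 9, 7, 7, 7],
                 data=[40, 41, 42, 43, 44]))
```

Channel 3 reports 42 (from channel 2), not 40 (channel 0) or the table's 30, matching the j<k<n example above.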

Physical register release signals also include io_need_free_*. If the io_old_pdest_n signal of a channel in the current cycle differs from every entry in arch_table_*, io_need_free_n is set high in the next cycle. For n>0, however, if some channel j<n has io_old_pdest_j equal to io_old_pdest_n, io_need_free_n is not set high in the next cycle, so the same physical register is not released twice. Both rules stem from move elimination: because move elimination lets different logical registers share the same physical register, identical physical registers can appear in different entries of the RenameTable, and an overwritten physical register may still be referenced by another logical register. This is why the io_need_free output of RenameTable may stay low even though a mapping has been overwritten; the free requests io_freeReq_* are derived from this io_need_free output.
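A minimal Python sketch of the io_need_free_* rule, using a hypothetical 4-entry architectural table for brevity (the real table has 32 entries). In the example, x1 and x2 share p5 after an eliminated move, and two older mappings that both used p12 are overwritten in the same cycle.

```python
def need_free(arch_table, old_pdest):
    """io_need_free_* (visible next cycle).

    arch_table : architectural table contents in the current cycle
    old_pdest  : io_old_pdest_* values in the current cycle
    """
    still_mapped = set(arch_table)
    result = []
    for n, preg in enumerate(old_pdest):
        shared = preg in still_mapped      # another logical register still uses it
        duplicate = preg in old_pdest[:n]  # an older channel already releases it
        result.append(not shared and not duplicate)
    return result


arch_table = [0, 5, 5, 8]  # x1 and x2 share p5; x3 -> p8
print(need_free(arch_table, [12, 5, 12, 9]))
```

p5 is not released because x1 and x2 still reference it, and p12 is released only once, by channel 0.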

Overall Block Diagram

RenameTable Overall Block Diagram

Interface Timing

RenameTable Supporting Move Elimination Read/Write Interface Timing Diagram

RenameTable Supporting Move Elimination Read/Write Interface Timing Diagram

RenameTable Not Supporting Move Elimination

The RenameTable not supporting move elimination is basically similar to the RenameTable supporting move elimination, but it does not include the io_need_free_* signal. The floating-point register rename table RenameTable_1 and the vector register rename table RenameTable_2 use this type of rename table.

The floating-point register rename table RenameTable_1 maintains the mapping relationship between logical floating-point registers and physical vector floating-point registers. It has 18 read speculative rename table ports, 6 write speculative rename table ports, and 6 write architectural rename table ports. Internally, it uses 34 registers of width 8 to actually maintain the mapping relationship.

The vector register rename table RenameTable_2 maintains the mapping relationship between logical vector registers and physical vector floating-point registers. It has 30 read speculative rename table ports, 6 write speculative rename table ports, and 6 write architectural rename table ports. Internally, it uses 48 registers of width 8 to actually maintain the mapping relationship.

Interface Timing

RenameTable Not Supporting Move Elimination Read/Write Interface Timing Diagram

RenameTable Not Supporting Move Elimination Read/Write Interface Timing Diagram

StdFreeList

StdFreeList is instantiated as fpFreeList and vecFreeList in the Rename module. As described in Floating-Point or Vector Physical Register Allocation, Floating-Point or Vector Instruction Commit, and Re-Renaming, fpFreeList is responsible during renaming for receiving allocation requests for vector floating-point physical registers and returning the allocated free vector floating-point physical registers. During re-renaming, it re-allocates vector floating-point physical registers based on re-renaming requests from RAB. During commit, it releases vector floating-point physical registers that are no longer in use and updates the architectural dequeue pointer.

Overall Block Diagram

StdFreeList Overall Block Diagram

Interface Timing

Free Register Allocation Timing Diagram

StdFreeList Free Register Allocation Timing Diagram

This figure illustrates the timing of free physical register allocation. At times 3, 5, and 6, io_redirect and io_walk are low while io_doAllocate and io_canAllocate are high, so free physical registers are allocated. At time 3, io_allocateReq_[2-4] is high, and StdFreeList returns the allocated free physical register numbers 151, 112, and 143 via io_allocatePhyReg_[2-4], respectively. At time 5, io_allocatePhyReg_[0-2] and io_allocatePhyReg_5 return 127, 162, 163, and 144, respectively. At time 6, it returns 174, 182, and 179. For every n free physical registers successfully allocated, the module's internal headPtr increments by n.
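The per-lane allocation can be sketched as "lane i reads the entry at headPtr plus the number of requests in lower lanes". This minimal Python model uses a toy free list whose order is chosen to reproduce the register numbers in the figure; the function name, the None placeholder for non-requesting lanes, and the plain modulo pointer (no wrap flag) are simplifications.

```python
def allocate(free_list, head_ptr, alloc_req, can_allocate=True, do_allocate=True):
    """Return (io_allocatePhyReg_* per lane, new headPtr) for one cycle."""
    size = len(free_list)
    regs = []
    taken = 0
    for req in alloc_req:
        if req and can_allocate and do_allocate:
            # Lane i reads the entry at headPtr + (number of requests in lanes < i).
            regs.append(free_list[(head_ptr + taken) % size])
            taken += 1
        else:
            regs.append(None)
    return regs, (head_ptr + taken) % size  # headPtr advances by n allocations


free_list = [151, 112, 143, 127, 162, 163, 144, 174, 182, 179]
regs, head = allocate(free_list, 0, [False, False, True, True, True, False])  # time 3
print(regs, head)
regs, head = allocate(free_list, head, [True, True, True, False, False, True])  # time 5
print(regs, head)
```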

Instruction Commit Timing Diagram

StdFreeList Instruction Commit Timing Diagram

This figure illustrates the instruction commit timing, where io_freeReq_* represents the io_freeReq signal of a given lane and io_freePhyReg_* represents the io_freePhyReg signal of the same lane. When both io_redirect and io_walk are low and io_freeReq is high, StdFreeList adds the corresponding io_freePhyReg to the free queue.

Furthermore, in the cycle preceding io_freeReq_*, the Rename module also passes the RAB commit information to update the architectural dequeue pointer archHeadPtr. When io_commit_isCommit and the io_commit_commitValid_* signal for the corresponding channel are high, it indicates that the update signal for the corresponding channel is valid. At this time, if the io_commit_info_*_fpWen or io_commit_info_*_vecWen for the corresponding channel is high, it indicates that this channel will cause archHeadPtr to increment by one. If k channels satisfy the above conditions, archHeadPtr will increment by k.
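The two commit-side updates, enqueuing released registers at tailPtr and advancing archHeadPtr, can be sketched as separate functions because they happen in different cycles. This is a minimal Python model with hypothetical names; pointers are plain indices modulo the list size, without the wrap flag.

```python
def free_registers(free_list, tail_ptr, redirect, walk, free_req, free_phy_reg):
    """Enqueue released registers at tailPtr in lane order; returns the new tailPtr."""
    size = len(free_list)
    if redirect or walk:
        return tail_ptr  # no releases while redirecting or re-renaming
    for req, preg in zip(free_req, free_phy_reg):
        if req:
            free_list[tail_ptr] = preg
            tail_ptr = (tail_ptr + 1) % size
    return tail_ptr


def advance_arch_head(arch_head_ptr, size, is_commit, commit_valid, fp_wen, vec_wen):
    """archHeadPtr moves forward by k, the number of valid fp/vec-writing commit channels."""
    if not is_commit:
        return arch_head_ptr
    k = sum(1 for v, f, w in zip(commit_valid, fp_wen, vec_wen) if v and (f or w))
    return (arch_head_ptr + k) % size


fl = [0] * 8
print(free_registers(fl, 6, False, False, [True, False, True], [40, 41, 42]), fl)
print(advance_arch_head(5, 8, True, [True, True, False], [True, False, True], [False, True, False]))
```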

Instruction Re-Renaming Timing Diagram

StdFreeList Instruction Re-Renaming Timing Diagram

This figure illustrates the instruction re-renaming timing. When io_redirect is pulled high for one cycle at time 1, io_walk is pulled high for several cycles, indicating that the module has entered the re-renaming stage. At time 1, since io_snpt_useSnpt is low, headPtr is restored to the value of archHeadPtr. This restoration does not happen immediately: at time 2, the number of high io_walkReq_* signals (2) is added to it, yielding headPtrAllocate (5), and this value is written to headPtr at time 3. Subsequently, while io_walk is high, headPtrAllocate is set to headPtr+PopCount(io_walkReq_*), and this value is written to headPtr in the next cycle.
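The headPtr sequence during re-renaming can be sketched as a running sum starting from the restore point. This minimal Python model follows the figure's first step (archHeadPtr 3 plus 2 walk requests gives headPtrAllocate 5; the value 3 is implied rather than stated); the later walk counts and the free-list size of 160 are made up for illustration.

```python
def walk_head_ptrs(restore_ptr, walk_counts, size):
    """headPtr values written during re-renaming.

    restore_ptr : archHeadPtr, or the snapshot's pointer when io_snpt_useSnpt is high
    walk_counts : PopCount(io_walkReq_*) for each walk cycle, in order
    Each step computes headPtrAllocate = base + PopCount(io_walkReq_*); the hardware
    writes it to headPtr in the following cycle.
    """
    head = restore_ptr
    written = []
    for count in walk_counts:
        head = (head + count) % size
        written.append(head)
    return written


print(walk_head_ptrs(3, [2, 3, 1], size=160))
```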

The re-renaming process aims to eliminate the renaming state on the speculative execution incorrect path. This is achieved by first restoring headPtr to the architectural archHeadPtr state (or to a snapshot state when io_snpt_useSnpt is high), and then re-renaming up to the point before entering the incorrect path.
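The restoration arithmetic can be modelled as below. This sketch simplifies the timing by folding the one-cycle delay between computing headPtrAllocate and writing headPtr into one step per walk cycle; `walk_head_ptrs`, `size` and the argument names are hypothetical. The usage example assumes archHeadPtr was 3, which is what the figure's values imply (3 + 2 = 5).

```python
from typing import List, Optional, Sequence


def walk_head_ptrs(arch_head_ptr: int, snapshot_head_ptr: Optional[int],
                   use_snpt: bool, walk_reqs: Sequence[Sequence[bool]],
                   size: int) -> List[int]:
    """headPtr value produced by each walk cycle (hypothetical model).

    The starting point is the snapshot when use_snpt is set, otherwise the
    committed archHeadPtr. Each cycle adds PopCount(walkReq); in hardware the
    sum is computed as headPtrAllocate and lands in headPtr one cycle later.
    """
    head = snapshot_head_ptr if use_snpt else arch_head_ptr
    produced = []
    for cycle_reqs in walk_reqs:
        head = (head + sum(cycle_reqs)) % size  # sum of bools == PopCount
        produced.append(head)
    return produced
```

With archHeadPtr at 3 and walk requests on two lanes, then one lane, `walk_head_ptrs(3, None, False, [[True, True, False], [True, False, False]], 32)` returns `[5, 6]`.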

Key Circuit: Circular Queue

Free physical registers are maintained by a circular queue. This circular queue consists of a register file freeList (freeList_* in the code) with size entries, a head pointer headPtr (headPtr_* in the code), and a tail pointer tailPtr (tailPtr_* in the code). headPtr is the dequeue pointer, and tailPtr is the enqueue pointer.

For ease of explanation, first consider a regular queue, where both headPtr and tailPtr are pointers to an element in freeList. During normal operation, tailPtr is always greater than or equal to headPtr, and the elements in the queue are {headPtr, headPtr + 1, ..., tailPtr - 1}. When an element needs to be enqueued, it is placed at freeList[tailPtr], and tailPtr is incremented by one. When an element is dequeued, freeList[headPtr] is taken out, and headPtr is incremented by one. When tailPtr is equal to headPtr, the queue is empty; when tailPtr is greater than headPtr, the queue is not empty.

Regular Queue

However, since freeList cannot be infinitely long, we designed a circular queue. A circular queue can be thought of as connecting the ends of a finite-length regular queue. At this point, tailPtr and headPtr can no longer be just pointers to an element in freeList: with the original design, when tailPtr equals headPtr, the circular queue could be either empty or full.

To solve this issue, we added a flag field to both tailPtr and headPtr. The flag is initially false and is inverted every time the pointer wraps from freeList[size - 1] to freeList[0]. When the two pointers have the same value, identical flags mean the circular queue is empty, while different flags mean it is full.
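The flag scheme can be illustrated with a small pointer type. This is a hypothetical Python model (`CircularPtr`, `is_empty`, `is_full` and `occupancy` are invented names) that only shows how the flag disambiguates the empty and full cases and how the queue's occupancy follows from the two pointers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CircularPtr:
    """Queue pointer with a wrap flag (hypothetical model of headPtr/tailPtr)."""
    flag: bool
    value: int
    size: int

    def __add__(self, n: int) -> "CircularPtr":
        # Assumes n >= 0; the flag flips once per wrap past size - 1.
        total = self.value + n
        wraps = total // self.size
        return CircularPtr(self.flag ^ (wraps % 2 == 1), total % self.size, self.size)


def is_empty(head: CircularPtr, tail: CircularPtr) -> bool:
    return head.value == tail.value and head.flag == tail.flag


def is_full(head: CircularPtr, tail: CircularPtr) -> bool:
    return head.value == tail.value and head.flag != tail.flag


def occupancy(head: CircularPtr, tail: CircularPtr) -> int:
    """Entries from head (dequeue) up to, but excluding, tail (enqueue)."""
    if head.flag == tail.flag:
        return tail.value - head.value
    return head.size - head.value + tail.value
```

With size 8 and head at value 6, advancing the tail by 8 lands on the same value with the opposite flag, which `is_full` reports as full with an occupancy of 8.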

The update timing of canAllocate is as follows: in the current cycle, freeRegCnt is calculated from headPtr, tailPtr, freeReq, and allocateReq; this value is then delayed by one cycle to obtain freeRegCntReg, i.e. the number of free registers in the queue. When freeRegCntReg is greater than the decode width, canAllocate is set high and output in that same cycle.
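The one-cycle lag between freeRegCnt and canAllocate can be shown with a tiny cycle model. `CanAllocateModel` is a hypothetical name, the reset value 0 of the register is an assumption, and freeRegCnt is taken as an input rather than recomputed from the pointers.

```python
class CanAllocateModel:
    """One-cycle-delayed canAllocate (hypothetical model).

    free_reg_cnt_reg stands for freeRegCntReg (reset value 0 is assumed);
    rename_width stands for the decode width.
    """

    def __init__(self, rename_width: int) -> None:
        self.rename_width = rename_width
        self.free_reg_cnt_reg = 0

    def cycle(self, free_reg_cnt: int) -> bool:
        # canAllocate is driven combinationally from last cycle's registered count
        can_allocate = self.free_reg_cnt_reg > self.rename_width
        # clock edge: this cycle's freeRegCnt becomes next cycle's freeRegCntReg
        self.free_reg_cnt_reg = free_reg_cnt
        return can_allocate
```

Feeding counts 10, 3, 3 with a width of 6 yields False, True, False: each decision reflects the count from the previous cycle.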

Circular Queue

MEFreeList

MEFreeList is instantiated as intFreeList in the Rename module. As mentioned in , , and , intFreeList receives allocation requests for integer physical registers during renaming and returns the allocated free integer physical registers; during re-renaming, it re-allocates integer physical registers according to the re-renaming requests from the RAB; and during commit, it releases integer physical registers that are no longer in use. Unlike StdFreeList, MEFreeList supports move elimination: if an instruction is a move instruction, Rename does not set io_allocateReq_*_valid high, so MEFreeList does not allocate a free physical register for it.

Overall Block Diagram

MEFreeList Overall Block Diagram

Interface Timing

Free Register Allocation Timing Diagram

MEFreeList Free Register Allocation Timing Diagram

This figure illustrates the timing of free physical register allocation. At times 3, 5, and 6, io_redirect and io_walk are low while io_doAllocate and io_canAllocate are high, so free physical registers are allocated. At time 3, io_allocateReq_[2-4] are high, and MEFreeList returns the allocated free physical registers 151, 112, and 143 on io_allocatePhyReg_[2-4] respectively. At time 5, io_allocatePhyReg_[0-2] and io_allocatePhyReg_5 return 127, 162, 163, and 144 respectively. At time 6, it returns 174, 182, and 179.

Instruction Commit Timing Diagram

MEFreeList Instruction Commit Timing Diagram

This figure illustrates the instruction commit timing. io_freeReq_* denotes the io_freeReq signal of a given lane, and io_freePhyReg_* denotes the io_freePhyReg signal of the same lane. When io_redirect and io_walk are both low and io_freeReq is high, MEFreeList adds the corresponding io_freePhyReg to the free queue.

Furthermore, two cycles before io_freeReq_*, the Rename module also passes the RAB commit information to update the architectural dequeue pointer archHeadPtr. When io_commit_isCommit and a channel's io_commit_commitValid_* are both high, that channel's update information is valid. If, in addition, that channel's io_commit_info_*_rfWen is high, io_commit_info_*_ldest is not 0, and io_commit_info_*_isMove is low, the channel advances archHeadPtr by one. If k channels satisfy these conditions, archHeadPtr advances by k.
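Compared with StdFreeList, the per-channel condition gains two terms: the destination must not be logical register 0, and the instruction must not be an eliminated move. A hypothetical Python model (`CommitInfo` and `me_arch_head_increment` are invented names; pointer wrap-around is ignored):

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CommitInfo:
    """Hypothetical per-channel commit fields (rfWen, ldest, isMove)."""
    rf_wen: bool
    ldest: int
    is_move: bool


def me_arch_head_increment(is_commit: bool, commit_valid: Sequence[bool],
                           info: Sequence[CommitInfo]) -> int:
    """Number of steps archHeadPtr advances in one cycle (hypothetical model)."""
    if not is_commit:
        return 0
    # Writes to x0 and eliminated moves never took a register from the free
    # list, so they must not advance the committed dequeue pointer either.
    return sum(
        1
        for valid, c in zip(commit_valid, info)
        if valid and c.rf_wen and c.ldest != 0 and not c.is_move
    )
```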

When io_commit_commitValid_* is high in a certain cycle, the io_freeReq_* signal will not always be high two cycles later. The reason for this phenomenon is move elimination. Here, io_freeReq_* comes from the RenameTable output signal io_need_free. As mentioned in the RenameTable module, the io_need_free signal of RenameTable may not be pulled high when there are identical physical registers in arch_table. It is precisely because move elimination causes different logical registers to share the same physical register that identical physical registers exist in different entries of the RenameTable.
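The following sketch illustrates why a shared mapping suppresses the release. It assumes arch_table is a list indexed by logical register number and that the old physical register is freed only when no other entry still maps to it after the update; `commit_write` is a hypothetical name, and this is an illustration of the effect described above rather than the RenameTable's actual logic.

```python
from typing import List, Tuple


def commit_write(arch_table: List[int], ldest: int, new_preg: int) -> Tuple[int, bool]:
    """Update the committed mapping of ldest; return (old preg, need_free).

    Hypothetical model: the old physical register is released only if no other
    logical register in arch_table still maps to it after the update.
    """
    old_preg = arch_table[ldest]
    arch_table[ldest] = new_preg
    need_free = old_preg not in arch_table
    return old_preg, need_free
```

If an eliminated move left x1 and x2 both mapped to physical register 33 (`arch = [0, 33, 33, 40]`), committing a new write to x1 returns `(33, False)` because x2 still holds 33; a later write to x2 returns `(33, True)`.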

Instruction Re-Renaming Timing Diagram

MEFreeList Instruction Re-Renaming Timing Diagram

This figure illustrates the instruction re-renaming timing. When io_redirect is pulled high for one cycle at time 1, io_walk is then held high for several cycles, indicating that the module has entered the re-renaming stage. Because io_snpt_useSnpt is low at time 1, headPtr is restored to archHeadPtr. The restoration is not immediate: at time 2, archHeadPtr plus the number of high io_walkReq_* signals (2) gives headPtrAllocate (5), which is written to headPtr at time 3. Thereafter, while io_walk is high, headPtrAllocate is set to headPtr + PopCount(io_walkReq_*), and this value is written to headPtr in the next cycle.

The re-renaming process aims to eliminate the renaming state on the speculative execution incorrect path. This is achieved by first restoring headPtr to the architectural archHeadPtr state (or to a snapshot state when io_snpt_useSnpt is high), and then re-renaming up to the point before entering the incorrect path.

CompressUnit

CompressUnit is used to determine which instructions can share the same ROB entry, i.e., can be compressed into the same ROB entry. This module receives output from the decode unit and obtains ROB compression information based on the decode output.

A channel is marked as ROB-compressible (canCompress_[0-5]) if and only if the decode information passed through that channel satisfies all of the following:

  • the decode information for the channel is valid (io_in_[0-5]_valid);
  • the channel carries no fused instruction (!io_in_[0-5]_bits_commitType[2]);
  • the instruction is not split, or this is the last micro-op of a split instruction (io_in_[0-5]_bits_lastUop);
  • the channel raises no exception (all io_in_[0-5]_bits_exceptionVec_* are low);
  • the decode unit has marked the instruction as ROB-compressible (io_in_[0-5]_bits_canRobCompress).
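The five conditions combine into a single AND per channel, sketched below in Python. `can_compress` and its argument names are hypothetical; commitType is treated as an integer whose bit 2 flags instruction fusion, as the list states.

```python
from typing import Sequence


def can_compress(valid: bool, commit_type: int, last_uop: bool,
                 exception_vec: Sequence[bool], can_rob_compress: bool) -> bool:
    """canCompress for one channel (hypothetical model of the conditions above)."""
    fused = bool((commit_type >> 2) & 1)  # commitType[2] marks instruction fusion
    return (valid
            and not fused
            and last_uop              # unsplit, or last micro-op of a split
            and not any(exception_vec)
            and can_rob_compress)
```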

CompressUnit outputs, for each channel, a flag io_out_needRobFlags_[0-5] indicating whether a ROB entry must be allocated. A channel's io_out_needRobFlags_[0-5] is set high if and only if its canCompress_[0-5] is 0, or it is the highest-numbered channel within its run of consecutive channels whose canCompress_[0-5] is 1.

CompressUnit outputs the number of instructions in the ROB entry for that channel, io_out_instrSizes_[0-5], for each channel. When the canCompress_[0-5] for a certain channel is 0, the io_out_instrSizes_[0-5] for that channel is 1. When the canCompress_[0-5] for a certain channel is 1, the io_out_instrSizes_[0-5] for that channel is the number of elements in the consecutive group of canCompress_[0-5] signals that are 1 to which it belongs.

CompressUnit outputs a channel mask io_out_masks_[0-5] for each channel, indicating the channels that share the same ROB entry with that channel. This signal has a bit width of 6, which is the same as the number of channels. When the canCompress_n for a certain channel is 0, io_out_masks_n[n] for that channel is 1, and bits other than io_out_masks_n[n] are 0. When the canCompress_n for a certain channel is 1, the indices of the bits that are 1 in io_out_masks_n are the numbers of the channels within the "consecutive group of canCompress_[0-5] signals that are 1" to which it belongs.

For example, if {canCompress_5, canCompress_4, canCompress_3, canCompress_2, canCompress_1, canCompress_0} == {1, 0, 0, 1, 1, 0}, then {io_out_needRobFlags_5, io_out_needRobFlags_4, io_out_needRobFlags_3, io_out_needRobFlags_2, io_out_needRobFlags_1, io_out_needRobFlags_0} == {1, 1, 1, 1, 0, 1}, {io_out_instrSizes_5, io_out_instrSizes_4, io_out_instrSizes_3, io_out_instrSizes_2, io_out_instrSizes_1, io_out_instrSizes_0} == {1, 1, 1, 2, 2, 1}, and {io_out_masks_5, io_out_masks_4, io_out_masks_3, io_out_masks_2, io_out_masks_1, io_out_masks_0} == {{1, 0, 0, 0, 0, 0}, {0, 1, 0, 0, 0, 0}, {0, 0, 1, 0, 0, 0}, {0, 0, 0, 1, 1, 0}, {0, 0, 0, 1, 1, 0}, {0, 0, 0, 0, 0, 1}}, with each mask written from bit 5 down to bit 0. Here channels 1 and 2 form one run and share a single ROB entry, which is allocated by channel 2.
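The three outputs can be derived by scanning for runs of consecutive 1s. This is a behavioural Python sketch, not the combinational circuit itself; `compress_outputs` is a hypothetical name, list index i stands for channel i, and each mask is returned as an integer whose bit k means channel k shares the entry.

```python
from typing import List, Sequence, Tuple


def compress_outputs(can_compress: Sequence[bool]) -> Tuple[List[bool], List[int], List[int]]:
    """Return (needRobFlags, instrSizes, masks), indexed by channel number."""
    n = len(can_compress)
    need_rob = [False] * n
    sizes = [1] * n
    masks = [0] * n
    i = 0
    while i < n:
        if not can_compress[i]:
            # A non-compressible channel always gets its own ROB entry.
            need_rob[i] = True
            masks[i] = 1 << i
            i += 1
            continue
        # Extend j to the last channel of this run of consecutive 1s.
        j = i
        while j + 1 < n and can_compress[j + 1]:
            j += 1
        group_mask = sum(1 << k for k in range(i, j + 1))
        for k in range(i, j + 1):
            sizes[k] = j - i + 1
            masks[k] = group_mask
        # Only the highest-numbered channel of the run allocates the entry.
        need_rob[j] = True
        i = j + 1
    return need_rob, sizes, masks
```

Calling it with the example above listed from channel 0 upward, `[False, True, True, False, False, True]`, gives needRobFlags `[True, False, True, True, True, True]`, instrSizes `[1, 2, 2, 1, 1, 1]`, and masks `0b000001, 0b000110, 0b000110, 0b001000, 0b010000, 0b100000`, which are the listed values read in channel-0-first order.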

Overall Block Diagram

CompressUnit Overall Block Diagram

Interface Timing

This module is purely combinational logic; signals enter and exit in the same cycle.

SnapshotGenerator

As mentioned in , rename snapshots are kept in each module whose incorrect-path renaming state must be discarded after a redirect, so that re-renaming can be accelerated. Among the rename-related modules, this sub-module is present in RenameTable, RenameTable_1, RenameTable_2, StdFreeList, and MEFreeList.

The snapshot contents (snapshots) differ between sub-modules: each RenameTable(*) stores four copies of spec_table, while StdFreeList and MEFreeList each store four copies of headPtr.

The module internally maintains a pair of circular pointers, snptEnqPtr and snptDeqPtr. When io_redirect is low, if the snapshot storage is not full and io_enq is high, the module writes the data from io_enqData_* into snapshots_[snptEnqPtr_value], sets snptValids_[snptEnqPtr_value] high, and then increments snptEnqPtr by one.

Conversely, when io_redirect is low and io_deq is high, the snapshot module dequeues a snapshot: snptValids_[snptDeqPtr_value] is set low, and snptDeqPtr is incremented by one.

When a redirect occurs, the snapshot module flushes its internal snapshots according to the io_flushVec_* signals. First, for every io_flushVec_* that is high, the corresponding snptValids_* entry is set low. Second, snptEnqPtr is rolled back to the position of the first snptValids_* entry that has been set low.
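The enqueue, dequeue and flush behaviour can be sketched as a small Python class. It is a hypothetical behavioural model: the class and method names are invented, the full check (the slot at the enqueue pointer is still valid) stands in for whatever pointer comparison the hardware uses, and "the first entry set low" is interpreted as the first invalid slot found when scanning forward from snptDeqPtr.

```python
from typing import Any, List, Optional, Sequence


class SnapshotGenerator:
    """Behavioural sketch of the snapshot queue (hypothetical model)."""

    def __init__(self, size: int = 4) -> None:
        self.size = size
        self.snapshots: List[Any] = [None] * size
        self.valids = [False] * size
        self.enq_ptr = 0  # snptEnqPtr value (wrap flag omitted)
        self.deq_ptr = 0  # snptDeqPtr value (wrap flag omitted)

    def full(self) -> bool:
        # Slots fill in order, so the enqueue slot is still valid only when all are taken.
        return self.valids[self.enq_ptr]

    def cycle(self, redirect: bool = False, enq: bool = False, enq_data: Any = None,
              deq: bool = False, flush_vec: Optional[Sequence[bool]] = None) -> None:
        if redirect:
            # Invalidate every flushed entry.
            for i, flush in enumerate(flush_vec or [False] * self.size):
                if flush:
                    self.valids[i] = False
            # Roll enq_ptr back to the first invalid slot, scanning from deq_ptr.
            for k in range(self.size):
                idx = (self.deq_ptr + k) % self.size
                if not self.valids[idx]:
                    self.enq_ptr = idx
                    break
            return  # no enqueue or dequeue while a redirect is being handled
        if enq and not self.full():
            self.snapshots[self.enq_ptr] = enq_data
            self.valids[self.enq_ptr] = True
            self.enq_ptr = (self.enq_ptr + 1) % self.size
        if deq:  # caller only dequeues when a snapshot is present
            self.valids[self.deq_ptr] = False
            self.deq_ptr = (self.deq_ptr + 1) % self.size
```

For instance, after three enqueues and one dequeue, flushing slot 2 leaves only slot 1 valid and moves the enqueue pointer back to slot 2, so the next snapshot overwrites the flushed one.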

The data stored in the snapshots is transferred to outside the module via the io_snapshots_[0-3]_* interface for restoration and use by various modules during redirection. Whether to use a snapshot during redirection and which snapshot to use is controlled by signals uniformly generated by CtrlBlock; this module only provides the snapshot data.