Vector Segment Access Instruction Processing Unit VSegmentUnit

Function Description

The main body is an 8-entry queue. Each entry has a 128-bit address register, a 128-bit data register, an index/stride register, and registers that hold per-uop information such as the physical register number, write enable, and uopidx. A separate register stores the decoded information for the whole instruction. An internal state machine drives the splitting in segment order.
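
As a rough mental model of the storage described above, the following Python sketch models one queue entry and the 8-entry queue. The class and field names (`SegmentEntry`, `pdest`, `wen`, and so on) are hypothetical and chosen only for illustration; the real register names and the width of the index/stride register are defined in VSegmentUnit.scala and are not stated here.

```python
from dataclasses import dataclass

QUEUE_ENTRIES = 8          # from the text: the queue has 8 entries
MASK_128 = (1 << 128) - 1  # from the text: 128-bit address and data registers


@dataclass
class SegmentEntry:
    """One queue entry (field names are hypothetical)."""
    addr: int = 0             # 128-bit address register
    data: int = 0             # 128-bit data register
    stride_or_index: int = 0  # index/stride register (width not stated in the text)
    pdest: int = 0            # physical register number of this uop
    wen: bool = False         # write enable of this uop
    uop_idx: int = 0          # uopidx of this uop

    def write_data(self, value: int) -> None:
        # Truncate to the 128-bit register width.
        self.data = value & MASK_128


def make_queue() -> list:
    """Build an empty 8-entry queue."""
    return [SegmentEntry() for _ in range(QUEUE_ENTRIES)]
```

This is a software analogy only; it says nothing about how the hardware allocates or frees entries.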

Detailed comments are written inline in VSegmentUnit.scala. Read the following text alongside those comments and the code to understand the logic of the SegmentUnit.

When a Segment instruction is executed, the pipeline's out-of-order backend needs to ensure that: preceding instructions have all finished execution, subsequent instructions cannot enter the pipeline (similar to the waiting mechanism for atomic instructions), and at the same time, the uops of the instruction must enter the SegmentUnit according to the split order. Only then can the order of the Segment instruction within the SegmentUnit be guaranteed.

Feature 1: Segment Instruction Splitting

[Figure: example of queue pointer jumping during segment splitting]

  • segmentIdx: The index of the current segment, segmentIdx <= vl. It indicates which segment is currently being issued and is also used to select and merge data.
  • fieldIdx: The index of the current field, fieldIdx < nfields. It indicates whether all fields of the current segment have been issued.
  • fieldOffset: The relative offset of each element within the same segment, implemented as an accumulator that increments by 1.
  • segmentOffset: The offset between segments. For stride instructions it is an accumulator that increments by the stride value; for unit-stride instructions it is an accumulator that increments by nfields * eew; for index instructions it is the element of the index register selected by segmentIdx.
  • vaddr = baseaddr + (fieldIdx << eew) + segmentOffset
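
The address formula above can be illustrated with a small Python sketch that enumerates element addresses segment by segment. It assumes that `eew` in the shift is the log2 of the element size in bytes (named `log2_eew_bytes` here) and that the unit-stride increment nfields * eew is measured in bytes; both are interpretations, and the function name and parameters are hypothetical.

```python
def segment_addresses(base, nfields, vl, log2_eew_bytes,
                      mode="unit", stride=0, indices=None):
    """List vaddr for every (segment, field) pair in issue order.

    mode: "unit", "stride" or "index" (hypothetical names).
    """
    addrs = []
    segment_offset = 0
    for segment_idx in range(vl):
        if mode == "index":
            # Index: the offset is the index element selected by segmentIdx.
            segment_offset = indices[segment_idx]
        for field_idx in range(nfields):
            # vaddr = baseaddr + (fieldIdx << eew) + segmentOffset
            addrs.append(base + (field_idx << log2_eew_bytes) + segment_offset)
        if mode == "unit":
            # Unit-stride: accumulate nfields * element size (assumed in bytes).
            segment_offset += nfields << log2_eew_bytes
        elif mode == "stride":
            # Stride: accumulate the stride value.
            segment_offset += stride
    return addrs
```

For example, `segment_addresses(0x1000, nfields=2, vl=2, log2_eew_bytes=2)` yields `[0x1000, 0x1004, 0x1008, 0x100c]`: two 4-byte fields per segment, with consecutive segments packed back to back.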

The figure above shows an example of queue pointer jumping under the configuration lmul=1, nf=2, vl=16. segmentIdx points to the segment currently being split, and splitPtr points to the field register being split. In the figure, segmentIdx is 0 and splitPtr is 0. After the first element of the first uop has been split and its memory access performed, splitPtr advances by nf and the field1 element of segment0 is accessed. After the field2 element has been accessed, the accesses for the current segment are complete: segmentIdx increments by 1, and at the same time splitPtr jumps to the register containing field0 of the next segment. When segmentIdx reaches 8, the register group for field0 moves on to the first element of the next uop (corresponding to the second element in each field register in the figure above). When segmentIdx = 16 and the field2 element has been accessed, the instruction finishes. For indexed segment instructions, there is an additional pointer that selects the index register; it works in the same way as the selection of different registers for the same field described above.

Feature 2: Separate Writeback of the VL-Modifying uop for Fault-Only-First Instructions

For fault-only-first instructions, the VSegmentUnit does not use VfofBuffer to write back the extra uop. Instead, it enters the s_fof_fix_vl state itself and writes back the uop that modifies the VL register.

Feature 3: Support for Unaligned Memory Access in Segments

The VSegmentUnit performs unaligned memory accesses on its own, without the assistance of MisalignBuffer. It handles both the splitting of unaligned accesses and the merging of their data.
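
To make the split-and-merge idea concrete, here is a minimal Python sketch, not the hardware algorithm. It assumes that an access is split where it crosses a 16-byte boundary (chosen to match the 128-bit data register; the actual split boundary is not stated in this document). The function names and the `read_line` callback are hypothetical.

```python
LINE_BYTES = 16  # assumption: split at 128-bit (16-byte) boundaries


def split_unaligned(vaddr, size):
    """Split one access into pieces that each stay inside a 16-byte block.

    Returns a list of (aligned_base, byte_offset, nbytes) tuples.
    """
    first_base = vaddr & ~(LINE_BYTES - 1)
    offset = vaddr - first_base
    if offset + size <= LINE_BYTES:
        # The access does not cross a boundary: a single piece suffices.
        return [(first_base, offset, size)]
    low_bytes = LINE_BYTES - offset
    return [
        (first_base, offset, low_bytes),                  # tail of the first block
        (first_base + LINE_BYTES, 0, size - low_bytes),   # head of the next block
    ]


def merge_load_data(pieces, read_line):
    """Read each piece via read_line(aligned_base) and concatenate the bytes."""
    out = bytearray()
    for base, offset, nbytes in pieces:
        line = read_line(base)
        out += line[offset:offset + nbytes]
    return bytes(out)
```

With a memory whose byte at address `a` holds the value `a`, a 4-byte load at address 14 is split into `(0, 14, 2)` and `(16, 0, 2)` and merges back to the bytes 14, 15, 16, 17.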

State Transition Diagram

[Figure: VSegmentUnit state transition diagram]

State Description

| State | Description |
| :-- | :-- |
| s_idle | Waiting for SegmentUnit uop to enter |
| s_flush_sbuffer_req | Flush sbuffer |
| s_wait_flush_sbuffer_resp | Waiting for Sbuffer and StoreQueue to be empty |
| s_tlb_req | Query DTLB |
| s_wait_tlb_resp | Waiting for DTLB response |
| s_pm | Check execution permissions |
| s_cache_req | Request reading DCache |
| s_cache_resp | DCache response |
| s_misalign_merge_data | Merge unaligned Load Data |
| s_latch_and_merge_data | Merge the Data of each element into complete uop-grained Data |
| s_send_data | Send data to Sbuffer |
| s_wait_to_sbuffer | Waiting for the pipeline stage sending to Sbuffer to clear, i.e., truly sent to Sbuffer |
| s_finish | This instruction execution is complete, start writing back to the backend at the uop granularity |
| s_fof_fix_vl | fault only first instruction data uop has been written back, write back the uop modifying the VL register |

Decoding Examples

Segment Unit-Stride/Stride

Unit-stride is processed as a stride instruction with stride = eew * nf. The offset register used by this type of instruction is a scalar register. The number of uops depends on the number of data registers, so the number of uops split = emul * nf. For example, if emul = 2 and nf = 4, the uops are numbered as follows:

  • uopIdx = 0, base address rs1, stride rs2, destination register vd
  • uopIdx = 1, base address rs1, stride rs2, destination register vd+1
  • uopIdx = 2, base address rs1, stride rs2, destination register vd+2
  • ......
  • uopIdx = 7, base address rs1, stride rs2, destination register vd+7
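
The unit-stride/stride numbering described above can be reproduced with a short Python sketch. It only generates the uop list in the form used in this document; the function name and the dictionary keys are hypothetical, and register names are kept symbolic.

```python
def split_stride_uops(emul, nf):
    """Enumerate the emul * nf uops of a unit-stride/stride segment instruction."""
    uops = []
    for i in range(emul * nf):
        uops.append({
            "uopIdx": i,
            "base": "rs1",    # every uop uses the same scalar base address
            "stride": "rs2",  # every uop uses the same scalar stride
            "dest": "vd" if i == 0 else f"vd+{i}",
        })
    return uops
```

For emul = 2 and nf = 4 this produces 8 uops, the last of which has uopIdx = 7 and destination register vd+7, matching the example.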

Segment Index

  • The number of splits is: Max(lmul * nf, emul). It must be ensured that the splitting starts sequentially from the register group of the first field.

  • For example: emul=4, lmul=2, nf=2, the uop splitting is as follows:

    • uopidx=0, base address src, offset vs2, destination register vd
    • uopidx=1, base address (dontCare), offset vs2+1, destination register vd+1
    • uopidx=2, base address (dontCare), offset vs2+2, destination register vd+2
    • uopidx=3, base address (dontCare), offset vs2+3, destination register vd+3
  • Another example: emul=2, lmul=1, nf=3, the uop splitting is as follows:

    • uopidx=0, base address src, offset vs2, destination register vd
    • uopidx=1, base address (dontCare), offset vs2+1, destination register vd+1
    • uopidx=2, base address (dontCare), offset (dontCare), destination register vd+2
  • Another example: emul=8, lmul=1, nf=8, the uop splitting is as follows:

    • uopidx=0, base address src, offset vs2, destination register vd
    • uopidx=1, base address (dontCare), offset vs2+1, destination register vd+1
    • uopidx=2, base address (dontCare), offset vs2+2, destination register vd+2
    • uopidx=3, base address (dontCare), offset vs2+3, destination register vd+3
    • uopidx=4, base address (dontCare), offset vs2+4, destination register vd+4
    • uopidx=5, base address (dontCare), offset vs2+5, destination register vd+5
    • uopidx=6, base address (dontCare), offset vs2+6, destination register vd+6
    • uopidx=7, base address (dontCare), offset vs2+7, destination register vd+7
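
The three indexed examples above follow one pattern, sketched here in Python. The sketch assumes integer emul and lmul, that only uop 0 carries the base address, and that the offset register is dontCare once the index register group (emul registers) is exhausted, as in the second example. Marking the destination as dontCare when emul exceeds lmul * nf is a guess, since none of the examples covers that case. All names are hypothetical.

```python
def reg(name, i):
    """Format a symbolic register name such as vd, vd+1, vd+2."""
    return name if i == 0 else f"{name}+{i}"


def split_index_uops(emul, lmul, nf):
    """Enumerate the Max(lmul * nf, emul) uops of an indexed segment instruction."""
    n_data = lmul * nf          # number of data registers
    n_uops = max(n_data, emul)  # number of splits, as stated in the text
    uops = []
    for i in range(n_uops):
        uops.append({
            "uopidx": i,
            "base": "src" if i == 0 else "dontCare",
            "offset": reg("vs2", i) if i < emul else "dontCare",
            # Assumption: no destination once the data registers run out.
            "dest": reg("vd", i) if i < n_data else "dontCare",
        })
    return uops
```

Calling `split_index_uops(2, 1, 3)` gives three uops whose last entry has offset dontCare and destination vd+2, as in the second example.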

Main Ports

| Port | Direction | Description |
| :-- | :-- | :-- |
| in | In | Receives uop issue from Issue Queue |
| uopwriteback | In | Writes back the finished uop to the backend |
| rdcache | In/Out | DCache request/response |
| sbuffer | Out | Write Sbuffer request |
| vecDifftestInfo | Out | Information required for DifftestStoreEvent in sbuffer |
| dtlb | In/Out | Read/Write DTLB request/response |
| pmpResp | In | Receives access permission information from PMP |
| flush_sbuffer | Out | Flush sbuffer request |
| feedback | Out | Feedback to Issue Queue module |
| redirect | In | Redirect port |
| exceptionInfo | Out | Outputs Exception information, participating in the arbitration for writing back exception information in MemBlock |
| fromCsrTrigger | In | Receives Trigger-related data from CSR |

Interface Timing

The interface timing is relatively simple, so only text descriptions are provided.

| Port | Description |
| :-------------- | :--------------------------------------------------------------- |
| in | Has Valid, Ready. Data is valid when Valid && Ready is true |
| uopwriteback | Has Valid, Ready. Data is valid when Valid && Ready is true |
| rdcache | Has Valid, Ready. Data is valid when Valid && Ready is true |
| sbuffer | Has Valid, Ready. Data is valid when Valid && Ready is true |
| vecDifftestInfo | Valid simultaneously with the sbuffer port |
| dtlb | Has Valid, Ready. Data is valid when Valid && Ready is true |
| pmpResp | Has Valid, Ready. Data is valid when Valid is true |
| flush_sbuffer | Has Valid. Data is valid when Valid is true |
| feedback | Has Valid. Data is valid when Valid is true |
| redirect | Has Valid. Data is valid when Valid is true |
| exceptionInfo | Has Valid. Data is valid when Valid is true |
| fromCsrTrigger | Does not have Valid. Data is always considered valid, corresponding signal generates immediate response |
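
The two handshake styles in the timing descriptions can be illustrated with a cycle-by-cycle Python sketch. It is a software model of the rule only, not of any specific port; the function names and the trace format (one list element per cycle) are hypothetical.

```python
def valid_ready_transfers(valid_trace, ready_trace, data_trace):
    """Valid/Ready ports: data is taken only in cycles where both are high."""
    accepted = []
    for valid, ready, data in zip(valid_trace, ready_trace, data_trace):
        if valid and ready:
            accepted.append(data)
    return accepted


def valid_only_transfers(valid_trace, data_trace):
    """Valid-only ports: data is taken in every cycle where Valid is high."""
    return [data for valid, data in zip(valid_trace, data_trace) if valid]
```

In the Valid/Ready case a sender that sees Ready low must keep presenting the same data until the cycle in which both signals are high, which is why `"a"` appears twice in the data trace of `valid_ready_transfers([1, 1, 0, 1], [0, 1, 1, 1], ["a", "a", "-", "b"])` but only once in the result `["a", "b"]`.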