Vector Segment Access Instruction Processing Unit VSegmentUnit
Function Description
The core of the unit is an 8-entry queue. Each entry holds a 128-bit address register, a 128-bit data register, an index/stride register, and per-uop registers for information such as the physical register number, write enable, and uopidx. A separate register stores the decoded information for the entire instruction. An internal state machine drives the splitting in segment order.
VSegmentUnit.scala carries comments alongside the code. Read the following text together with those comments and the code to understand the logic of the SegmentUnit.
When a Segment instruction is executed, the pipeline's out-of-order backend needs to ensure that: preceding instructions have all finished execution, subsequent instructions cannot enter the pipeline (similar to the waiting mechanism for atomic instructions), and at the same time, the uops of the instruction must enter the SegmentUnit according to the split order. Only then can the order of the Segment instruction within the SegmentUnit be guaranteed.
Feature 1: Segment Instruction Splitting
- segmentIdx: The index of the segment, segmentIdx <= vl. Indicates which segment is currently being issued, and is also used to select and merge data.
- fieldIdx: The index of the field, fieldIdx < nfields. Indicates whether all fields of the current segment have been issued.
- fieldOffset: The relative offset of each element within the same segment, implemented as an accumulator that increments by 1.
- segmentOffset: Records the offset between segments. For stride instructions, it is an accumulator that increments by the stride value; for unit-stride, it is an accumulator that increments by nfields * eew; for index, it is the index register element corresponding to segmentIdx.
- vaddr = baseaddr + (fieldIdx << eew) + segmentOffset
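The address formula above can be sketched in software as follows. This is a minimal model, not the hardware implementation. It assumes that `eew` in the shift is the log2 of the element width in bytes and that the unit-stride increment `nfields * eew` uses the element width in bytes. The function name `segment_vaddrs` and its parameters are hypothetical.

```python
from typing import List, Optional


def segment_vaddrs(base: int, eew_log2: int, nfields: int, vl: int,
                   mode: str, stride: int = 0,
                   indices: Optional[List[int]] = None) -> List[List[int]]:
    """Return vaddrs[segmentIdx][fieldIdx] = base + (fieldIdx << eew) + segmentOffset.

    Assumption: eew_log2 is log2(element bytes); mode is "unit", "stride" or "index".
    """
    if mode not in ("unit", "stride", "index"):
        raise ValueError("mode must be 'unit', 'stride' or 'index'")
    if mode == "index" and (indices is None or len(indices) < vl):
        raise ValueError("index mode needs one index per segment")

    elem_bytes = 1 << eew_log2
    result = []
    segment_offset = 0  # accumulator used by unit-stride and stride
    for segment_idx in range(vl):
        if mode == "index":
            # Indexed: offset comes from the index element of this segment.
            segment_offset = indices[segment_idx]
        result.append([base + (field_idx << eew_log2) + segment_offset
                       for field_idx in range(nfields)])
        if mode == "unit":
            segment_offset += nfields * elem_bytes
        elif mode == "stride":
            segment_offset += stride
    return result


# Two segments of three 4-byte fields, unit-stride: segments are 12 bytes apart.
unit = segment_vaddrs(0x1000, eew_log2=2, nfields=3, vl=2, mode="unit")
```

In this model the only difference between the three instruction types is how `segment_offset` is produced; the per-field term `fieldIdx << eew` is shared.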
The figure above illustrates queue pointer jumping for the configuration lmul=1, nf=2, vl=16. segmentIdx points to the segment currently being split, and splitPtr points to the field register being split. In the figure, segmentIdx is 0 and splitPtr is 0. After the first element of the first uop has been split and its memory access performed, splitPtr advances by nf and the field1 element of segment0 is accessed. Once the field2 element has been accessed, the current segment is complete: segmentIdx increments by 1 and splitPtr jumps to the register containing field0 of the next segment. When segmentIdx reaches 8, the register group corresponding to field0 moves on to the first element of the next uop (corresponding to the second element in each field register in the figure above). Once segmentIdx reaches 16 and the memory access for the field2 element has completed, instruction execution finishes.

For indexed segment instructions, there is an additional pointer that selects the index register. It is implemented in the same way as the selection among different registers of the same field described above.
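The traversal order described above can be sketched as a generator. This is an illustrative model only. It assumes three fields (field0 to field2), 8 elements per register (inferred from the switch to the next uop at segmentIdx = 8), and queue entries laid out field by field with `regs_per_field` entries per field; under that assumption the step between fields is 2, which matches the jump in the example. The names `split_order`, `elems_per_reg`, and `regs_per_field` are hypothetical.

```python
from typing import Iterator, Tuple


def split_order(nfields: int, vl: int, elems_per_reg: int,
                regs_per_field: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (segmentIdx, fieldIdx, splitPtr, element slot) in access order.

    Assumption: queue entry index = fieldIdx * regs_per_field + register
    within the field, so moving to the next field adds regs_per_field.
    """
    for segment_idx in range(vl):
        reg_in_field = segment_idx // elems_per_reg  # which uop of the field
        slot = segment_idx % elems_per_reg           # element inside that register
        for field_idx in range(nfields):
            split_ptr = field_idx * regs_per_field + reg_in_field
            yield segment_idx, field_idx, split_ptr, slot


order = list(split_order(nfields=3, vl=16, elems_per_reg=8, regs_per_field=2))
# Segment 0 visits entries 0, 2, 4; segment 8 starts at entry 1 (the next uop of field0).
```

All fields of one segment are accessed before the pointer returns to field0 for the next segment, which is the segment-order guarantee the unit relies on.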
Feature 2: Separate Writeback of the VL-Modifying uop for Fault-Only-First Instructions
For fault-only-first instructions, the VSegmentUnit does not use VfofBuffer to write back the extra uop. Instead, its own state machine transitions to s_fof_fix_vl and writes back the uop that modifies the VL register.
Feature 3: Support for Unaligned Memory Access in Segments
The VSegmentUnit performs unaligned memory access on its own, without the assistance of MisalignBuffer. It handles both the splitting of unaligned accesses and the merging of their data.
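The split-and-merge idea can be sketched as follows. This is a software illustration under stated assumptions, not the unit's actual logic: it assumes a 16-byte aligned access granule (chosen to match the 128-bit data register) and little-endian byte order. `BANK_BYTES`, `split_misaligned`, and `load_element` are hypothetical names.

```python
from typing import List, Tuple

BANK_BYTES = 16  # assumption: one aligned access covers 128 bits


def split_misaligned(vaddr: int, size: int) -> List[Tuple[int, int, int]]:
    """Split an element access into aligned pieces.

    Returns a list of (aligned_base, byte_offset, nbytes); one piece if the
    element fits in a single granule, two if it crosses a granule boundary.
    """
    first_base = vaddr & ~(BANK_BYTES - 1)
    offset = vaddr - first_base
    if offset + size <= BANK_BYTES:
        return [(first_base, offset, size)]
    low_bytes = BANK_BYTES - offset
    return [(first_base, offset, low_bytes),
            (first_base + BANK_BYTES, 0, size - low_bytes)]


def load_element(memory: bytes, vaddr: int, size: int) -> int:
    """Perform each aligned access and merge the pieces, low bytes first."""
    merged = b""
    for base, offset, nbytes in split_misaligned(vaddr, size):
        granule = memory[base:base + BANK_BYTES]  # one aligned read
        merged += granule[offset:offset + nbytes]
    return int.from_bytes(merged, "little")


memory = bytes(range(64))
pieces = split_misaligned(14, 4)       # crosses the boundary at address 16
value = load_element(memory, 14, 4)    # bytes 14, 15, 16, 17 merged
```

In the state table that follows, the merge step of this sketch corresponds most closely to s_misalign_merge_data for loads.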
State Transition Diagram
State Description
State | Description |
---|---|
s_idle | Waiting for SegmentUnit uop to enter |
s_flush_sbuffer_req | Flush sbuffer |
s_wait_flush_sbuffer_resp | Waiting for Sbuffer and StoreQueue to be empty |
s_tlb_req | Query DTLB |
s_wait_tlb_resp | Waiting for DTLB response |
s_pm | Check execution permissions |
s_cache_req | Request reading DCache |
s_cache_resp | DCache response |
s_misalign_merge_data | Merge unaligned Load Data |
s_latch_and_merge_data | Merge the Data of each element into complete uop-grained Data |
s_send_data | Send data to Sbuffer |
s_wait_to_sbuffer | Waiting for the pipeline stage sending to Sbuffer to clear, i.e., truly sent to Sbuffer |
s_finish | This instruction execution is complete, start writing back to the backend at the uop granularity |
s_fof_fix_vl | fault only first instruction data uop has been written back, write back the uop modifying the VL register |
Decoding Examples
Segment Unit-Stride/Stride
Unit-stride is processed as a stride instruction with stride = eew * nf. The offset register used by this type of instruction is a scalar register. The number of uops depends on the number of data registers, so the number of uops split = emul * nf.

For example, if emul = 2, nf = 4, the uop numbering is as follows:
- uopIdx = 0, base address rs1, stride rs2, destination register vd
- uopIdx = 1, base address rs1, stride rs2, destination register vd+1
- uopIdx = 2, base address rs1, stride rs2, destination register vd+2
- ......
- uopIdx = 7, base address rs1, stride rs2, destination register vd+7
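The uop numbering above can be reproduced with a short sketch. It only enumerates the operands named in the example; the function name `stride_segment_uops` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Union


def stride_segment_uops(emul: int, nf: int) -> List[Dict[str, Union[int, str]]]:
    """Enumerate the emul * nf uops of a unit-stride/stride segment instruction."""
    uops = []
    for uop_idx in range(emul * nf):
        uops.append({
            "uopIdx": uop_idx,
            "base": "rs1",      # every uop carries the scalar base address
            "stride": "rs2",    # and the scalar stride
            "dest": "vd" if uop_idx == 0 else f"vd+{uop_idx}",
        })
    return uops


uops = stride_segment_uops(emul=2, nf=4)  # 8 uops, destinations vd .. vd+7
```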
Segment Index
- The number of splits is: Max(lmul * nf, emul). It must be ensured that the splitting starts sequentially from the register group of the first field.
- For example: emul=4, lmul=2, nf=2, the uop splitting is as follows:
    - uopidx=0, base address src, offset vs2, destination register vd
    - uopidx=1, base address (dontCare), offset vs2+1, destination register vd+1
    - uopidx=2, base address (dontCare), offset vs2+2, destination register vd+2
    - uopidx=3, base address (dontCare), offset vs2+3, destination register vd+3
- Another example: emul=2, lmul=1, nf=3, the uop splitting is as follows:
    - uopidx=0, base address src, offset vs2, destination register vd
    - uopidx=1, base address (dontCare), offset vs2+1, destination register vd+1
    - uopidx=2, base address (dontCare), offset (dontCare), destination register vd+2
- Another example: emul=8, lmul=1, nf=8, the uop splitting is as follows:
    - uopidx=0, base address src, offset vs2, destination register vd
    - uopidx=1, base address (dontCare), offset vs2+1, destination register vd+1
    - uopidx=2, base address (dontCare), offset vs2+2, destination register vd+2
    - uopidx=3, base address (dontCare), offset vs2+3, destination register vd+3
    - uopidx=4, base address (dontCare), offset vs2+4, destination register vd+4
    - uopidx=5, base address (dontCare), offset vs2+5, destination register vd+5
    - uopidx=6, base address (dontCare), offset vs2+6, destination register vd+6
    - uopidx=7, base address (dontCare), offset vs2+7, destination register vd+7
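The three examples above follow one rule, which can be sketched as below. `None` stands for dontCare. The rule that the offset register is dontCare once uopidx reaches emul is read off the second example; marking the destination as dontCare once uopidx reaches lmul * nf is an assumption for the case emul > lmul * nf, which none of the examples covers. The name `index_segment_uops` is hypothetical.

```python
from typing import Dict, List, Optional, Union


def _reg(name: str, idx: int) -> str:
    """Format a register as 'vd', 'vd+1', ..."""
    return name if idx == 0 else f"{name}+{idx}"


def index_segment_uops(emul: int, lmul: int,
                       nf: int) -> List[Dict[str, Union[int, Optional[str]]]]:
    """Enumerate the Max(lmul * nf, emul) uops of an indexed segment instruction."""
    data_regs = lmul * nf
    uops = []
    for uop_idx in range(max(data_regs, emul)):
        uops.append({
            "uopidx": uop_idx,
            "base": "src" if uop_idx == 0 else None,               # only uop 0 carries the base
            "offset": _reg("vs2", uop_idx) if uop_idx < emul else None,
            "dest": _reg("vd", uop_idx) if uop_idx < data_regs else None,  # assumption
        })
    return uops


second = index_segment_uops(emul=2, lmul=1, nf=3)  # 3 uops, last one has no index register
```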
Main Ports
Port | Direction | Description |
---|---|---|
in | In | Receives uop issue from Issue Queue |
uopwriteback | Out | Writes back the finished uop to the backend |
rdcache | In/Out | DCache request/response |
sbuffer | Out | Write Sbuffer request |
vecDifftestInfo | Out | Information required for DifftestStoreEvent in sbuffer |
dtlb | In/Out | Read/Write DTLB request/response |
pmpResp | In | Receives access permission information from PMP |
flush_sbuffer | Out | Flush sbuffer request |
feedback | Out | Feedback to Issue Queue module |
redirect | In | Redirect port |
exceptionInfo | Out | Outputs Exception information, participating in the arbitration for writing back exception information in MemBlock |
fromCsrTrigger | In | Receives Trigger-related data from CSR |
Interface Timing
The interface timing is relatively simple, so only text descriptions are provided.

| Port | Description |
| :-------------- | :--------------------------------------------------------------- |
| in | Has Valid, Ready. Data is valid when Valid && Ready is true |
| uopwriteback | Has Valid, Ready. Data is valid when Valid && Ready is true |
| rdcache | Has Valid, Ready. Data is valid when Valid && Ready is true |
| sbuffer | Has Valid, Ready. Data is valid when Valid && Ready is true |
| vecDifftestInfo | Valid simultaneously with the sbuffer port |
| dtlb | Has Valid, Ready. Data is valid when Valid && Ready is true |
| pmpResp | Has Valid, Ready. Data is valid when Valid is true |
| flush_sbuffer | Has Valid. Data is valid when Valid is true |
| feedback | Has Valid. Data is valid when Valid is true |
| redirect | Has Valid. Data is valid when Valid is true |
| exceptionInfo | Has Valid. Data is valid when Valid is true |
| fromCsrTrigger | Does not have Valid. Data is always considered valid, corresponding signal generates immediate response |
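The two handshake styles in the table can be illustrated with a cycle-by-cycle sketch. This is a generic model of a Valid/Ready and a Valid-only interface, not a trace of the actual ports; `transfers` and the example signal values are hypothetical.

```python
from typing import List, Optional, Sequence


def transfers(valid: Sequence[bool], data: Sequence[str],
              ready: Optional[Sequence[bool]] = None) -> List[str]:
    """Return the data items accepted, one list entry per transferring cycle.

    With a Ready signal, a transfer happens when Valid && Ready.
    Without one (ready=None), a transfer happens whenever Valid is true.
    """
    if ready is None:
        ready = [True] * len(valid)
    return [d for v, r, d in zip(valid, ready, data) if v and r]


# Valid/Ready port: "a" is held for two cycles until the receiver is ready.
accepted = transfers(valid=[True, True, False, True],
                     data=["a", "a", "b", "c"],
                     ready=[False, True, True, True])

# Valid-only port: every cycle with Valid high carries data.
seen = transfers(valid=[True, False, True], data=["x", "y", "z"])
```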