
XiangShan Decode Design Document

  • Version: V2R2
  • Status: OK
  • Date: 2025/02/28
  • Commit: xxx

Terminology Explanation

Abbreviation | Full Name | Description
- | Decode Unit | Decode Unit
uop | Micro Operation | Micro Operation
- | numOfUop | Number of uops split from one instruction
- | numOfWB | Number of uops requiring writeback among uops split from one instruction
- | vtypeArch | vtype configuration of the latest committed vector instruction
- | vtypeSpec | Current vtype configuration for vector instructions
- | walkVType | vtype rolled back and restored during redirection

Sub-module List

Sub-module | Description
DecodeUnit | Decode Unit
DecodeUnitComp | Vector Instruction Splitting Module
FPDecoder | Floating-Point Instruction Decoder
UopInfoGen | Instruction Splitting Type and Count Generation Unit
VecDecoder | Vector Instruction Decoder
VecExceptionGen | Vector Exception Check Module
VTypeGen | Vector Instruction vtype Configuration Generation Module

Design Specifications

  • Added Vector configuration generation module, Vector decode module, Vector instruction splitting module, Vector exception check module. All vector instructions are split and enter decoderComp.
  • Decodes up to 6 scalar instructions in the same cycle.
  • Decodes at most 1 vector instruction in the same cycle.
  • Some instructions are translated:
    • A zimop instruction is translated to an addi instruction with src as x0 and imm as 0.
    • A csrr vlenb instruction is translated to an addi instruction with src as x0 and imm as VLEN/8.
    • A csrr vl instruction is translated to a vset instruction that reads the vl register and writes to a scalar register.
  • When reading a read-only CSR, waitForward and blockBackward signals are no longer asserted, supporting out-of-order execution.
  • Other functionalities are the same as Nanhu.

Functionality

The decode stage converts each 32-bit instruction encoding into the control signals for that instruction. Vector instructions and AMO_CAS instructions additionally require instruction splitting: the instruction is split into one or more uops, and the source register numbers, source register types, destination register numbers, destination register types, functional unit, and operation type of each uop are reassigned according to the split type. After decoding, the instruction and its control information are passed to the rename module, which allocates physical registers based on the source register number and source register type. Illegal-instruction and virtual-instruction exceptions are checked in the decode stage, and the corresponding signals in exceptionVec are asserted.

Overall Design

Decoding instantiates 6 DecodeUnit modules for the input instructions. Each DecodeUnit outputs a signal indicating whether its instruction is a vector instruction; a vector instruction must be passed to the complex decoder decoderComp for instruction splitting. A vector instruction has to go through DecodeUnit and UopInfoGen before it enters the complex decoder, which results in a long critical path, so instructions entering the complex decoder are held for one cycle and undergo vector exception checks and instruction splitting in the next cycle, where they are converted into one or more uops. If the number of uops exceeds 6, decoding takes multiple cycles. If the remaining uops can all be decoded in the current cycle, the next vector instruction that needs decoding is passed to decoderComp in that same cycle. Assuming rename is ready, the cases below are classified by the order of the input instructions:

  1. Scalar instruction: Decode directly.
  2. Vector instruction: When decoderComp is ready, the vector instruction is passed to decoderComp for instruction splitting. Only one vector instruction can be processed.
  3. Vector instruction + Scalar instruction: When decoderComp is ready, the vector instruction is passed to decoderComp for instruction splitting. Only one vector instruction can be processed; scalar instructions cannot be processed simultaneously.
  4. Scalar instruction + Vector instruction: Scalar instructions before the vector instruction are decoded directly. When decoderComp is ready, the vector instruction is passed to decoderComp for instruction splitting. Only one vector instruction can be processed.
  5. Split uops + Scalar instruction: Assume there are n split uops needing rename in the current cycle, and m scalar instructions needing rename simultaneously. If n + m <= 6, decode directly. Otherwise, only decode 6-n scalar instructions.
  6. Split uops + Vector instruction: Processing of the vector instruction is the same as the Vector instruction case.
  7. Split uops + Vector instruction + Scalar instruction: Processing of scalar instructions is the same as the Split uops + Scalar instruction case. Processing of vector instruction is the same as the Vector instruction case.
  8. Split uops + Scalar instruction + Vector instruction: Processing of scalar instructions is the same as the Split uops + Scalar instruction case. Processing of vector instruction is the same as the Vector instruction case.
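The slot arithmetic in cases 5, 7 and 8 can be sketched as a small software model. This is plain Scala rather than Chisel, and the object and method names (DecodeSlotModel, acceptedScalars) are hypothetical, introduced only for illustration. It assumes rename is ready and that the decode width is the 6 stated above.

```scala
// Plain Scala model (not hardware) of "Split uops + Scalar instruction".
object DecodeSlotModel {
  // At most 6 uops/instructions are handed to rename per cycle.
  val DecodeWidth = 6

  /** Number of scalar instructions that can be decoded this cycle.
    *
    * @param splitUops   n: split uops that need rename in this cycle
    * @param scalarInsts m: scalar instructions that need rename in this cycle
    */
  def acceptedScalars(splitUops: Int, scalarInsts: Int): Int = {
    require(splitUops >= 0 && splitUops <= DecodeWidth, "n must be in 0..6")
    require(scalarInsts >= 0 && scalarInsts <= DecodeWidth, "m must be in 0..6")
    // If n + m <= 6 every scalar instruction fits; otherwise only 6 - n do.
    math.min(scalarInsts, DecodeWidth - splitUops)
  }
}
```

For example, with n = 4 split uops and m = 5 scalar instructions, only 2 scalar instructions are accepted in that cycle and the rest wait for a later cycle.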

Overall Block Diagram

[Figure: decode block diagram]

Interface List

See Interface Document

Secondary Module VTypeGen

The VTypeGen module is primarily used to maintain the vtype configuration to be used by the currently decoded vector instruction. Whenever a vset instruction is executed or a redirection occurs requiring rollback, the vtype information stored in VTypeGen is updated.

Inputs

  • 32-bit instruction information from the frontend instruction stream;
  • vtype rollback information from the ROB's vtype buffer;
  • vtype commit information from the ROB's vtype buffer;
  • vtype information for the vsetvl instruction from the backend. Since the vtype of a vsetvl instruction is obtained by reading a register rather than by decoding, it is passed to VTypeGen when the vsetvl instruction writes back.

Outputs

Outputs vtype information (the vtype configuration used by the vector instruction currently in the decode stage) to the Decode Unit.

Design Specifications

There are 4 cases for vtypeSpec update, listed from highest to lowest priority:

  1. When a vsetvl instruction is committed, vtypeSpec is updated to the vtype of the vsetvl instruction, which is obtained when the vsetvl instruction writes back. Since vsetvl instruction flushes the pipeline, it does not conflict with other cases.

  2. During the redirection rollback process, vtypeSpec is updated to the walkVType passed from the vtype buffer.

  3. When redirection starts, vtypeSpec is updated to vtypeArch.

  4. If the decoded instruction contains vsetivli or vsetvli instruction and no exception occurs, the vtype information of vsetivli and vsetvli instructions can be obtained from the immediate field. VTypeGen contains a simple decoder to determine if the input instruction includes these two types of instructions. If these two vset instructions exist, a PriorityMux is used to select the first vset instruction, and the VsetModule module is used to parse the vtype information.

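  when(io.commitVType.hasVsetvl) {
    // Case 1: a vsetvl commits; use the vtype it wrote back
    vtypeSpecNext := io.vsetvlVType
  }.elsewhen(io.walkVType.valid) {
    // Case 2: redirection rollback in progress; use the walked vtype
    vtypeSpecNext := io.walkVType.bits
  }.elsewhen(io.walkToArchVType) {
    // Case 3: redirection starts; fall back to the architectural vtype
    vtypeSpecNext := vtypeArch
  }.elsewhen(inHasVset && io.canUpdateVType) {
    // Case 4: a vsetivli/vsetvli is being decoded; use its immediate vtype
    vtypeSpecNext := vtypeNew
  }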
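As a rough illustration of case 4, the sketch below models in plain Scala (not Chisel) how vsetvli and vsetivli can be recognized from the instruction bits and how the first one among the input instructions supplies the vtype fields. The field positions follow the RVV 1.0 encoding; the names VsetImmDecode, VTypeFields and firstVsetVType are hypothetical. The real VsetModule does more than this (for example, deciding vill), and reserved immediate bits are ignored here.

```scala
// Plain Scala model of a simple vsetvli/vsetivli detector and vtype extractor.
object VsetImmDecode {
  // vtype fields carried in the low 8 bits of the zimm immediate (RVV 1.0).
  final case class VTypeFields(vma: Boolean, vta: Boolean, vsew: Int, vlmul: Int)

  // Extract inst[hi:lo] from a 32-bit instruction held in a Long.
  private def bits(inst: Long, hi: Int, lo: Int): Int =
    ((inst >>> lo) & ((1L << (hi - lo + 1)) - 1)).toInt

  // vset* instructions use opcode OP-V (0x57) with funct3 = 0b111.
  private def isVCfg(inst: Long): Boolean =
    bits(inst, 6, 0) == 0x57 && bits(inst, 14, 12) == 7

  // vsetvli has inst[31] = 0; vsetivli has inst[31:30] = 0b11.
  // vsetvl (inst[31:30] = 0b10) matches neither, so it is not decoded here.
  def isVsetvli(inst: Long): Boolean  = isVCfg(inst) && bits(inst, 31, 31) == 0
  def isVsetivli(inst: Long): Boolean = isVCfg(inst) && bits(inst, 31, 30) == 3

  // Both forms place vlmul/vsew/vta/vma in inst[27:20].
  def vtypeOf(inst: Long): Option[VTypeFields] =
    if (isVsetvli(inst) || isVsetivli(inst)) {
      val zimm = bits(inst, 27, 20)
      Some(VTypeFields(
        vma   = ((zimm >> 7) & 1) == 1,
        vta   = ((zimm >> 6) & 1) == 1,
        vsew  = (zimm >> 3) & 7,
        vlmul = zimm & 7))
    } else None

  // Software stand-in for the PriorityMux: the first vset in program order wins.
  def firstVsetVType(insts: Seq[Long]): Option[VTypeFields] =
    insts.iterator.map(vtypeOf).collectFirst { case Some(v) => v }
}
```

For instance, 0x0D1572D7 (vsetvli t0, a0, e32, m2, ta, ma) yields vsew = 2, vlmul = 1, vta = 1, vma = 1, while a vsetvl instruction yields nothing because its vtype comes from a register.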

There are 2 cases for vtypeArch update:

  1. When a vsetvl instruction is committed, vtypeArch is updated to the vtype written back by the vsetvl instruction.

  2. When a vsetivli or vsetvli instruction is committed, vtypeArch is updated to the vtype commit information passed from the vtype buffer.

Secondary Module DecodeUnit

Inputs and Outputs

  • Inputs
    • DecodeUnitEnqIO: Instruction stream information from the frontend, plus the vtype and vstart information used by vector instructions.
    • CustomCSRCtrlIO: CSR control signals.
    • CSRToDecode: CSR control signals.
  • Outputs
    • DecodeUnitDeqIO: Decoded instruction information, whether it is a vector instruction, and the number of instruction splits.

Functionality

This module is the decode unit of the XiangShan backend. It converts the incoming instruction information into micro-operations carrying richer control information, including source register number, source register type, destination register number, destination register type, immediate type, functional unit type used, operation type, and other information.

Design Specifications

  1. Decoding Information

    • XSDecode: decodeArray is defined in DecodeConstants to convert the 32-bit instruction encoding into XSDecode, which contains the following information:

      • srcType0: Type of source register 0.
      • srcType1: Type of source register 1.
      • srcType2: Type of source register 2, used for FMA instructions.
      • fuType: Functional unit type.
      • fuOpType: Operation type.
      • rfWen: Whether to write back to scalar register.
      • fpWen: Whether to write back to floating-point register.
      • vfWen: Whether to write back to vector register.
      • isXSTrap: Whether it is an XSTrap instruction.
      • noSpecExec: Whether speculative (out-of-order) execution is disallowed, i.e., whether the instruction must wait for previous instructions to commit before executing.
      • blockBackward: Whether to block subsequent instructions, i.e., subsequent instructions can only enter ROB after the current instruction commits.
      • flushPipe: Whether to flush the pipeline, i.e., the pipeline needs to be flushed after the current instruction commits.
      • canRobCompress: Whether the instruction supports ROB compression (instructions that do not trigger exceptions and are not at the boundary of the FTQ are considered compressible in ROB).
      • uopSplitType: Instruction splitting type. The splitting type for scalar instructions is UopSplitType.SCA_SIM, which does not require splitting. Vector instructions and AMO_CAS instructions require splitting. If a vector instruction only needs to be split into one uop and does not require modification of the instruction control signals, the splitting type is UopSplitType.dummy, so that it still enters the vector complex decoder for vector instruction exception checking.

    • VPUCtrlSignals: Vector and floating-point instructions need to set VPUCtrlSignals, which contains vector configuration information such as sew and lmul.

      • Vector instructions: Vector configuration information comes from the vtype information of VTypeGen in the Decode Stage.
      • Floating-point instructions: The floating-point module and the vector module are independent, but floating-point instructions reuse the vector functional units. The functional unit determines the element width from the sew information. Therefore, a dedicated sub-module, FPToVecDecoder, generates the VPUCtrlSignals control signals for floating-point instructions.

    • FPUCtrlSignals: Generated in the decode sub-module FPDecoder. The rm signal controls floating-point rounding, wflags controls the i2f module and fflag update, and other signals control the i2f module.

        class FPUCtrlSignals(implicit p: Parameters) extends XSBundle {
          val typeTagOut = UInt(2.W) // H S D
          val wflags = Bool()
          val typ = UInt(2.W)
          val fmt = UInt(2.W)
          val rm = UInt(3.W)
        }
      

    • uopnum: UopInfoGen generates the number of instruction splits. The number of instruction splits for scalar instructions is 1. The number of splits for AMO_CAS instructions can be 2 or 4 depending on the type. The number of instruction splits for vector instructions needs to be calculated based on lmul. Vector load/store instructions also need to calculate the number of instruction splits based on lmul, sew, and eew.
  2. Translation Processing

    • move instruction: Since the move instruction is a special addi instruction, it is identified by the instruction fields, and move elimination is performed in the subsequent rename stage.
    • zimop instruction: Since the zimop instruction only needs to write 0 to rd, it is translated into an addi instruction with src as x0 and imm as 0.
    • csrr vlenb instruction: The value of vlenb is fixed, so the instruction is translated into an addi instruction with src as x0 and imm as VLEN/8.
    • csrr vl instruction: vl uses an independent register file, thus supporting renaming and out-of-order execution. An instruction that reads vl is converted into a vset instruction that reads vl and writes it to the corresponding rd.
    • Soft prefetch instruction: fuType is changed to FuType.ldu.U and the instruction is passed to the corresponding functional unit for processing.
  3. Exception Handling

    DecodeUnit handles two types of exceptions: illegalInstr (exception code 2) and virtualInstr (exception code 22).

    • illegalInstr
      • Check if immediate selection is invalid.
      • Exceptions triggered when the instruction is executed under certain CSR settings.
      • Vector-related exceptions are not checked in this module but in the complex decoder.
    • virtualInstr
      • Exceptions triggered when the instruction is executed under certain CSR settings.
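To make the csrr vlenb translation from the Translation Processing item concrete, the following plain Scala sketch rewrites the instruction encoding into the equivalent addi. It is only an illustration at the encoding level: how the hardware applies the translation to its decoded control signals is not shown here, VLEN = 128 is an assumption of this sketch, and the names CsrReadTranslate, isCsrrVlenb and encodeAddi are hypothetical. The vlenb CSR address (0xC22) and the csrrs/addi encodings come from the RISC-V specifications.

```scala
// Plain Scala model of translating "csrr rd, vlenb" into "addi rd, x0, VLEN/8".
object CsrReadTranslate {
  val VLEN = 128           // assumption for this sketch; use the configured VLEN
  val VlenbCsrAddr = 0xC22 // vlenb CSR address

  // Extract inst[hi:lo] from a 32-bit instruction held in a Long.
  private def bits(inst: Long, hi: Int, lo: Int): Int =
    ((inst >>> lo) & ((1L << (hi - lo + 1)) - 1)).toInt

  // csrr rd, csr is the pseudo-instruction for csrrs rd, csr, x0:
  // opcode SYSTEM (0x73), funct3 = 0b010, rs1 = x0.
  def isCsrrVlenb(inst: Long): Boolean =
    bits(inst, 6, 0) == 0x73 && bits(inst, 14, 12) == 2 &&
      bits(inst, 19, 15) == 0 && bits(inst, 31, 20) == VlenbCsrAddr

  // I-type encoding of addi: imm[11:0] | rs1 | funct3 = 000 | rd | opcode 0x13.
  def encodeAddi(rd: Int, rs1: Int, imm: Int): Long =
    ((imm & 0xFFF).toLong << 20) | (rs1.toLong << 15) | (rd.toLong << 7) | 0x13L

  // Rewrite csrr vlenb; leave every other instruction unchanged.
  def translate(inst: Long): Long =
    if (isCsrrVlenb(inst)) encodeAddi(bits(inst, 11, 7), 0, VLEN / 8) else inst
}
```

With these assumptions, csrr a0, vlenb (0xC2202573) becomes addi a0, x0, 16 (0x01000513), which no longer needs to be handled as a CSR access.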

Secondary Module DecodeUnitComp

Inputs and Outputs

Instruction splitting only modifies information such as operand register numbers and operand types in the instruction, so the input and output types are both DecodeUnitCompInput. Since the vtype information of a vset instruction is obtained through decoding rather than from VTypeGen, the vtype used by the vset instruction is updated to that instruction's own vtype information via the vtypeBypass signal.

  • DecodeUnitCompIO

    class DecodeUnitCompIO(implicit p: Parameters) extends XSBundle {
      val redirect = Input(Bool())
      val csrCtrl = Input(new CustomCSRCtrlIO)
      val vtypeBypass = Input(new VType)
      // When the first inst in decode vector is complex inst, pass it in
      val in = Flipped(DecoupledIO(new DecodeUnitCompInput))
      val out = new DecodeUnitCompOutput
      val complexNum = Output(UInt(3.W))
    }

Functionality

Splits a vector instruction into multiple micro-operations based on the splitting type and lmul information, and modifies information such as operand register numbers and operand types in the micro-operations. Simultaneously, vector instruction exception checking is also performed in this module. This module uses a state machine. The ready signal is asserted only when no instruction is being processed or when the processing of a split instruction is completed in the current cycle, allowing the next instruction to be processed.
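The sketch below is a plain Scala software model (not the Chisel state machine) of splitting one instruction and handing the uops out over several cycles. It assumes, for illustration only, that a VEC_VVV instruction with integer lmul is split into lmul uops whose register numbers advance by one per uop, and that at most 6 uops leave per cycle. The names VecSplitModel, Uop, splitVVV and perCycle are hypothetical.

```scala
// Plain Scala model of splitting a vector instruction into uops.
object VecSplitModel {
  // One split uop with its reassigned logical vector register numbers.
  final case class Uop(uopIdx: Int, vs1: Int, vs2: Int, vd: Int)

  // At most 6 uops are passed on per cycle.
  val RenameWidth = 6

  /** Split a VEC_VVV-style instruction into lmul uops (assumed scheme).
    * Fractional lmul is not modeled here.
    */
  def splitVVV(vs1: Int, vs2: Int, vd: Int, lmul: Int): Seq[Uop] = {
    require(Set(1, 2, 4, 8).contains(lmul), "integer lmul only: 1, 2, 4 or 8")
    // uop i operates on the i-th register of each register group
    (0 until lmul).map(i => Uop(i, vs1 + i, vs2 + i, vd + i))
  }

  /** Group uops into per-cycle batches of at most RenameWidth. */
  def perCycle(uops: Seq[Uop]): Seq[Seq[Uop]] =
    uops.grouped(RenameWidth).toSeq
}
```

Under these assumptions an lmul = 8 instruction produces 8 uops that take two cycles (6, then 2), and the module would only become ready for the next instruction in the cycle that emits the final batch.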

Design Specifications

Currently, there are many types of instruction splitting, which will be streamlined and optimized in the future.

Splitting Type | Corresponding Instruction Type
AMO_CAS_W/AMO_CAS_D/AMO_CAS_Q | AMO_CAS instruction
VSET | vset instruction
VEC_VVV | Instructions where two source registers and the destination register are vector registers
VEC_VFV | Instructions where one source register is a floating-point register, and one source register and the destination register are vector registers
VEC_EXT2/VEC_EXT4/VEC_EXT8 | Vector sign extension instructions
VEC_0XV | Scalar to vector move instruction
VEC_VXV | Instructions where one source register is a scalar register, and one source register and the destination register are vector registers
VEC_VVW/VEC_VFW/VEC_WVW/VEC_VXW/VEC_WXW/VEC_WVV/VEC_WFW/VEC_WXV | Widening/narrowing vector instructions
VEC_VVM/VEC_VFM/VEC_VXM | Vector instructions where the destination register is a mask register
VEC_SLIDE1UP | vslide1up instruction
VEC_FSLIDE1UP | vfslide1up instruction
VEC_SLIDE1DOWN | vslide1down instruction
VEC_FSLIDE1DOWN | vfslide1down instruction
VEC_VRED | Scalar reduction instruction
VEC_VFRED | Unordered floating-point reduction instruction
VEC_VFREDOSUM | Ordered floating-point reduction instruction
VEC_SLIDEUP | vslideup instruction
VEC_SLIDEDOWN | vslidedown instruction
VEC_M0X | vcpop instruction
VEC_MVV | vid/viota instructions
VEC_VWW | Scalar widening reduction instruction
VEC_RGATHER | vrgather instruction
VEC_RGATHER_VX | vrgather instruction where one operand comes from a scalar register
VEC_RGATHEREI16 | vrgatherei16 instruction
VEC_COMPRESS | vcompress instruction
VEC_MVNR | vmvnr instruction
VEC_US_LDST | Unit-stride load/store instructions
VEC_S_LDST | Strided load/store instructions
VEC_I_LDST | Indexed load/store instructions

Secondary Module VecExceptionGen

  • Inputs:
    • inst: 32-bit instruction
    • decodedInst: Decoded information
    • vtype: vtype information
    • vstart: vstart information
  • Output:
    • illegalInst: Whether the instruction is illegal (exception)

Functionality

Checks for exceptions in vector instructions. Except for memory access related exceptions of vector load/store instructions, all others are checked in the decode stage.

Design Specifications

Vector instruction related exceptions are classified into the following eight types:

Exception Name | Description
inst Illegal | Exception reported for reserved instructions
vill Illegal | Exception reported when executing vector instructions other than vset when the vill field of vtype is 1
EEW Illegal | EEW exception for vector floating-point, sign extension, widening, and narrowing instructions
EMUL Illegal | EMUL exception for vector load/store, sign extension, widening, narrowing, and vrgatherei16 instructions
Reg Number Align | vs1, vs2, vd not aligned according to lmul
v0 Overlap | Exception reported when reading v0 register and modifying v0 simultaneously for some instructions
Src Reg Overlap | Exception reported when vs1, vs2, and vd overlap for some instructions
vstart Illegal | Exception reported when executing vector instructions other than vset and vector load/store instructions when vstart is not 0

If any of these exceptions are triggered, the exception signal is asserted.
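Two of the checks in the table, EMUL Illegal and Reg Number Align, reduce to simple arithmetic. The sketch below is a plain Scala model, not the VecExceptionGen implementation. It relies on the RVV 1.0 rules that EMUL = (EEW / SEW) * LMUL must lie in [1/8, 8] and that a register group with LMUL > 1 must start at a register number that is a multiple of LMUL. LMUL is represented as its log2 (-3 to 3) to avoid fractions, and the names VecLegalityModel, emulIllegal and regMisaligned are hypothetical.

```scala
// Plain Scala model of two vector legality checks.
object VecLegalityModel {
  // log2 of a power-of-two bit width (8, 16, 32, 64 -> 3, 4, 5, 6).
  private def log2(x: Int): Int = {
    require(x > 0 && (x & (x - 1)) == 0, "width must be a power of two")
    Integer.numberOfTrailingZeros(x)
  }

  /** log2(EMUL), where EMUL = (EEW / SEW) * LMUL. */
  def emulLog2(eewBits: Int, sewBits: Int, lmulLog2: Int): Int =
    log2(eewBits) - log2(sewBits) + lmulLog2

  /** EMUL is illegal when it falls outside [1/8, 8]. */
  def emulIllegal(eewBits: Int, sewBits: Int, lmulLog2: Int): Boolean = {
    val e = emulLog2(eewBits, sewBits, lmulLog2)
    e < -3 || e > 3
  }

  /** For LMUL > 1 the register number must be a multiple of LMUL. */
  def regMisaligned(reg: Int, lmulLog2: Int): Boolean =
    lmulLog2 > 0 && (reg % (1 << lmulLog2)) != 0
}
```

For example, a load with EEW = 64 while SEW = 8 and LMUL = 2 gives EMUL = 16, which is illegal, and v2 is not a valid group base when LMUL = 4.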