XiangShan Decode Design Document

Version: V2R2
Status: OK
Date: 2025/02/28
Commit: xxx

Terminology Explanation

Abbreviation	Full Name	Description
-	Decode Unit	Decode Unit
uop	Micro Operation	Micro Operation
-	numOfUop	Number of uops split from one instruction
-	numOfWB	Number of uops requiring writeback among uops split from one instruction
-	vtypeArch	vtype configuration of the latest committed vector instruction
-	vtypeSpec	Current vtype configuration for vector instructions
-	walkVType	vtype rolled back and restored during redirection

Sub-module List

Sub-module	Description
DecodeUnit	Decode Unit
DecodeUnitComp	Vector Instruction Splitting Module
FPDecoder	Floating-Point Instruction Decoder
UopInfoGen	Instruction Splitting Type and Count Generation Unit
VecDecoder	Vector Instruction Decoder
VecExceptionGen	Vector Exception Check Module
VTypeGen	Vector Instruction vtype Configuration Generation Module

Design Specifications

Adds a vector configuration generation module, a vector decode module, a vector instruction splitting module, and a vector exception check module. All vector instructions enter decoderComp to be split.
Supports decoding up to 6 scalar instructions per cycle.
Supports decoding at most 1 vector instruction per cycle.
Some instructions are translated:
zimop instruction is translated to an addi instruction with src as x0 and imm as 0.
csrr vlenb instruction is translated to an addi instruction with src as x0 and imm as VLEN/8.
csrr vl instruction is translated to a vset instruction that reads the vl register and writes to a scalar register.
When reading a read-only CSR, waitForward and blockBackward signals are no longer asserted, supporting out-of-order execution.
Other functionalities are the same as Nanhu.

Functionality

Decodes instructions, converting each 32-bit instruction encoding into control signals. Vector instructions and AMO_CAS instructions additionally require instruction splitting: the instruction is split into one or more uops, and the source register numbers, source register types, destination register numbers, destination register types, functional unit, and operation type are reassigned according to the split type. After decoding, the instruction and its control information are passed to the rename module, which allocates physical registers based on the source register numbers and source register types. Illegal-instruction and virtual-instruction exceptions are checked during the decode stage, and the corresponding bits in exceptionVec are asserted.

Overall Design

Decoding instantiates 6 DecodeUnit modules, one per input instruction. Each DecodeUnit outputs a signal indicating whether its instruction is a vector instruction; if so, the instruction must be passed to the complex decoder decoderComp for instruction splitting. Because a vector instruction must first pass through DecodeUnit and UopInfoGen before entering the complex decoder, which creates a long critical path, instructions entering the complex decoder are registered for one cycle and undergo vector exception checks and instruction splitting in the next cycle, where they are converted into one or more uops. If the number of uops exceeds 6, decoding takes multiple cycles. If the remaining uops can all be decoded in the current cycle, the next vector instruction that needs decoding is passed to decoderComp in that same cycle. Assuming rename is ready, the cases can be classified by the order of the input instructions:

Scalar instruction: Decode directly.
Vector instruction: When decoderComp is ready, the vector instruction is passed to decoderComp for instruction splitting. Only one vector instruction can be processed.
Vector instruction + Scalar instruction: When decoderComp is ready, the vector instruction is passed to decoderComp for instruction splitting. Only one vector instruction can be processed; scalar instructions cannot be processed simultaneously.
Scalar instruction + Vector instruction: Scalar instructions before the vector instruction are decoded directly. When decoderComp is ready, the vector instruction is passed to decoderComp for instruction splitting. Only one vector instruction can be processed.
Split uops + Scalar instruction: Assume there are n split uops needing rename in the current cycle, and m scalar instructions needing rename simultaneously. If n + m <= 6, decode directly. Otherwise, only decode 6-n scalar instructions.
Split uops + Vector instruction: Processing of the vector instruction is the same as the Vector instruction case.
Split uops + Vector instruction + Scalar instruction: Processing of scalar instructions is the same as the Split uops + Scalar instruction case. Processing of vector instruction is the same as the Vector instruction case.
Split uops + Scalar instruction + Vector instruction: Processing of scalar instructions is the same as the Split uops + Scalar instruction case. Processing of vector instruction is the same as the Vector instruction case.
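
The slot-sharing rule in the "Split uops + Scalar instruction" case can be illustrated with a small behavioral model. This is only a sketch in plain Scala, not the Chisel RTL; the names `DecodeSlotModel` and `scalarsAccepted` are hypothetical and do not come from the XiangShan source.

```scala
// Behavioral sketch only: plain Scala, not the actual decode-stage RTL.
object DecodeSlotModel {
  // Decode width stated in this document (6 instructions per cycle).
  val DecodeWidth = 6

  /** Number of scalar instructions accepted in this cycle, given
    * n split uops that already need rename slots and m scalar
    * instructions waiting behind them.
    * If n + m <= 6 all m are accepted, otherwise only 6 - n.
    */
  def scalarsAccepted(n: Int, m: Int): Int = {
    require(n >= 0 && n <= DecodeWidth, "n must be within the decode width")
    require(m >= 0, "m must be non-negative")
    math.min(m, DecodeWidth - n)
  }
}
```

For example, with 4 split uops and 5 waiting scalar instructions, `DecodeSlotModel.scalarsAccepted(4, 5)` yields 2, so the remaining 3 scalar instructions wait for a later cycle.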

Overall Block Diagram

Interface List

See Interface Document

Secondary Module VTypeGen

The VTypeGen module is primarily used to maintain the vtype configuration to be used by the currently decoded vector instruction. Whenever a vset instruction is executed or a redirection occurs requiring rollback, the vtype information stored in VTypeGen is updated.

Inputs

32-bit instruction information from the frontend instruction stream;
vtype rollback information from the ROB's vtype buffer;
vtype commit information from the ROB's vtype buffer;
vtype information for the vsetvl instruction from the backend. Because this vtype is obtained by reading a register rather than by decoding, it is passed to VTypeGen when the vsetvl instruction writes back.

Outputs

Outputs vtype information (the vtype configuration used by the vector instruction currently in the decode stage) to the Decode Unit.

Design Specifications

There are 4 cases for vtypeSpec update:

When a vsetvl instruction is committed, vtypeSpec is updated to the vtype of the vsetvl instruction, which is obtained when the vsetvl instruction writes back. Since vsetvl instruction flushes the pipeline, it does not conflict with other cases.
During the redirection rollback process, vtypeSpec is updated to the walkVType passed from the vtype buffer.
When redirection starts, vtypeSpec is updated to vtypeArch.
If the instructions being decoded include a vsetivli or vsetvli instruction and no exception occurs, the vtype information of that instruction can be obtained from its immediate field. VTypeGen contains a simple decoder to determine whether the input instructions include either of these two instruction types. If they do, a PriorityMux selects the first vset instruction, and the VsetModule module parses the vtype information.

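  // Highest priority first; the order mirrors the four cases listed above.
  when(io.commitVType.hasVsetvl) {
    // Case 1: a vsetvl commits, so use the vtype it wrote back.
    vtypeSpecNext := io.vsetvlVType
  }.elsewhen(io.walkVType.valid) {
    // Case 2: redirect rollback in progress, take walkVType from the vtype buffer.
    vtypeSpecNext := io.walkVType.bits
  }.elsewhen(io.walkToArchVType) {
    // Case 3: redirect starts, fall back to the architectural vtype.
    vtypeSpecNext := vtypeArch
  }.elsewhen(inHasVset && io.canUpdateVType) {
    // Case 4: a vsetivli/vsetvli is being decoded, use the vtype parsed from its immediate.
    vtypeSpecNext := vtypeNew
  }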
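
As an illustration of how vtype can be read from the immediate of vsetvli and vsetivli, the following plain-Scala sketch extracts the zimm field and splits it into vtype fields. It follows the field layout of the RISC-V "V" extension version 1.0 (vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7). It is not the VsetModule implementation; `VTypeImmModel`, `VTypeFields`, `zimmOf`, and `parse` are hypothetical names used only for this example.

```scala
// Behavioral sketch only: plain Scala, not the VsetModule RTL.
object VTypeImmModel {
  // Hypothetical container for the decoded vtype fields.
  final case class VTypeFields(vlmul: Int, vsew: Int, vta: Boolean, vma: Boolean) {
    // SEW in bits: SEW = 8 * 2^vsew.
    def sewBits: Int = 8 << vsew
  }

  /** Extract zimm: inst[30:20] for vsetvli, inst[29:20] for vsetivli. */
  def zimmOf(inst: Int, isVsetivli: Boolean): Int =
    if (isVsetivli) (inst >>> 20) & 0x3ff else (inst >>> 20) & 0x7ff

  /** Split the low 8 bits of zimm into vtype fields (RVV 1.0 layout). */
  def parse(zimm: Int): VTypeFields =
    VTypeFields(
      vlmul = zimm & 0x7,
      vsew  = (zimm >>> 3) & 0x7,
      vta   = ((zimm >>> 6) & 1) == 1,
      vma   = ((zimm >>> 7) & 1) == 1
    )
}
```

For instance, a zimm of 0xD1 corresponds to vsew = 2 (SEW = 32), vlmul = 1 (LMUL = 2), with both vta and vma set.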

There are 2 cases for vtypeArch update:

When a vsetvl instruction is committed, vtypeArch is updated to the vtype written back by the vsetvl instruction.
When a vsetivli or vsetvli instruction is committed, vtypeArch is updated to the vtype commit information passed from the vtype buffer.

Secondary Module DecodeUnit

Inputs and Outputs

Inputs
DecodeUnitEnqIO: Instruction stream information from the frontend, vtype, vstart information used by vector instructions.
CustomCSRCtrlIO: CSR control signals.
CSRToDecode: CSR control signals.
Outputs
DecodeUnitDeqIO: Decoded instruction information, whether it is a vector instruction, number of instruction splits.

Functionality

This module is the decode unit of the XiangShan backend. It expands the instruction information received from the frontend into a richer micro-operation that includes the source register numbers, source register types, destination register number, destination register type, immediate type, functional unit type, operation type, and other information.

Design Specifications

Decoding Information
- XSDecode: decodeArray is defined in DecodeConstants to convert the 32-bit instruction encoding into XSDecode, which contains the following information:
  - srcType0: Type of source register 0.
  - srcType1: Type of source register 1.
  - srcType2: Type of source register 2, used for FMA instructions.
  - fuType: Functional unit type.
  - fuOpType: Operation type.
  - rfWen: Whether to write back to a scalar register.
  - fpWen: Whether to write back to a floating-point register.
  - vfWen: Whether to write back to a vector register.
  - isXSTrap: Whether it is an XSTrap instruction.
  - noSpecExec: Whether out-of-order execution is prohibited, i.e., the instruction must wait for all previous instructions to commit before executing.
  - blockBackward: Whether to block subsequent instructions, i.e., subsequent instructions can only enter the ROB after the current instruction commits.
  - flushPipe: Whether to flush the pipeline, i.e., the pipeline needs to be flushed after the current instruction commits.
  - canRobCompress: Whether the instruction supports ROB compression (instructions that do not trigger exceptions and are not at an FTQ boundary are considered compressible in the ROB).
  - uopSplitType: Instruction splitting type. The splitting type for scalar instructions is UopSplitType.SCA_SIM, which does not require splitting. Vector instructions and AMO_CAS instructions require splitting. If a vector instruction only needs to be split into one uop and does not require modification of the instruction control signals, the splitting type is UopSplitType.dummy so that it still enters the vector complex decoder for vector instruction exception checking.
- VPUCtrlSignals: Vector and floating-point instructions need to set VPUCtrlSignals, which contains vector configuration information such as sew and lmul.
  - Vector instructions: Vector configuration information comes from the vtype information of VTypeGen in the decode stage.
  - Floating-point instructions: Floating-point instructions are independent of the vector instructions but reuse the vector functional units, which determine the element width from the sew information. Therefore, a dedicated sub-module, FPToVecDecoder, generates the VPUCtrlSignals control signals for floating-point instructions.
- FPUCtrlSignals: Generated in the decode sub-module FPDecoder. The rm signal controls floating-point rounding, wflags controls the i2f module and fflag update, and the other signals control the i2f module.
```
  class FPUCtrlSignals(implicit p: Parameters) extends XSBundle {
    val typeTagOut = UInt(2.W) // H S D
    val wflags = Bool()
    val typ = UInt(2.W)
    val fmt = UInt(2.W)
    val rm = UInt(3.W)
  }
```
- uopnum: UopInfoGen generates the number of instruction splits. The number of splits for scalar instructions is 1. The number of splits for AMO_CAS instructions is 2 or 4 depending on the type. The number of splits for vector instructions is calculated based on lmul. Vector load/store instructions calculate the number of splits based on lmul, sew, and eew.
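
To make the dependence of the split count on lmul concrete, the sketch below decodes the 3-bit vlmul field (encoding per the RISC-V "V" extension version 1.0) and derives a uop count for the simplest case. The rule "one uop per destination vector register, so a fractional LMUL still costs one uop" is an assumption made for illustration; the actual per-split-type formulas live in UopInfoGen and are not reproduced here. `UopCountModel`, `lmulOf`, and `numOfUopSimple` are hypothetical names.

```scala
// Behavioral sketch only: plain Scala, not the UopInfoGen RTL.
object UopCountModel {
  /** Decode the 3-bit vlmul field into LMUL as (numerator, denominator).
    * Encoding per RVV 1.0; 0b100 is reserved and yields None.
    */
  def lmulOf(vlmul: Int): Option[(Int, Int)] = vlmul match {
    case 0 => Some((1, 1))
    case 1 => Some((2, 1))
    case 2 => Some((4, 1))
    case 3 => Some((8, 1))
    case 5 => Some((1, 8))
    case 6 => Some((1, 4))
    case 7 => Some((1, 2))
    case _ => None
  }

  /** ASSUMPTION: a plain vector-vector operation needs one uop per
    * destination vector register, so the count is max(1, LMUL).
    */
  def numOfUopSimple(vlmul: Int): Option[Int] =
    lmulOf(vlmul).map { case (num, den) => math.max(1, num / den) }
}
```

Under this assumption, LMUL = 8 gives 8 uops, while LMUL = 1/2 still gives a single uop.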
Translation Processing
- move instruction: Since the move instruction is a special addi instruction, it is identified by its instruction fields, and move elimination is performed in the subsequent rename stage.
- zimop instruction: Since the zimop instruction only needs to write 0 to rd, it is translated into an addi instruction with src as x0 and imm as 0.
- csrr vlenb instruction: The value of vlenb is fixed, so the instruction is translated into an addi instruction with src as x0 and imm as VLEN/8.
- csrr vl instruction: vl uses an independent register file, thus supporting renaming and out-of-order execution. An instruction that reads vl is converted into a vset instruction that reads vl and writes to the corresponding rd.
- Soft Prefetch instruction: fuType is changed to FuType.ldu.U, and the instruction is passed to the corresponding functional unit for processing.
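
The zimop and csrr vlenb translations both reduce to an `addi rd, x0, imm`. The sketch below shows the equivalent standard I-type encoding (opcode 0x13, funct3 0). It does not claim that DecodeUnit re-encodes instruction bits; it only illustrates which addi the translated instruction behaves like. `AddiTranslationModel` and its methods are hypothetical names, and the VLEN value is passed in as a parameter rather than taken from any configuration.

```scala
// Behavioral sketch only: plain Scala, not the DecodeUnit RTL.
object AddiTranslationModel {
  /** Encode `addi rd, rs1, imm` (I-type, opcode 0x13, funct3 0). */
  def encodeAddi(rd: Int, rs1: Int, imm: Int): Int = {
    require(rd >= 0 && rd < 32 && rs1 >= 0 && rs1 < 32, "register index out of range")
    require(imm >= -2048 && imm <= 2047, "imm must fit in 12 signed bits")
    ((imm & 0xfff) << 20) | (rs1 << 15) | (rd << 7) | 0x13
  }

  /** zimop: behaves like `addi rd, x0, 0`. */
  def zimopAsAddi(rd: Int): Int = encodeAddi(rd, 0, 0)

  /** csrr rd, vlenb: behaves like `addi rd, x0, VLEN/8`.
    * vlenBits is a caller-supplied parameter (hypothetical here).
    */
  def csrrVlenbAsAddi(rd: Int, vlenBits: Int): Int =
    encodeAddi(rd, 0, vlenBits / 8)
}
```

Assuming VLEN = 128 purely as an example, `csrrVlenbAsAddi(5, 128)` encodes `addi x5, x0, 16`.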
Exception Handling
DecodeUnit handles two types of exceptions: illegalInstr (exception code 2) and virtualInstr (exception code 22).
- illegalInstr
  - The immediate selection is invalid.
  - Exceptions triggered when the instruction is executed under certain CSR settings.
  - Vector-related exceptions are not checked in this module but in the complex decoder.
- virtualInstr
  - Exceptions triggered when the instruction is executed under certain CSR settings.

Secondary Module DecodeUnitComp

Inputs and Outputs

Instruction splitting only modifies information such as operand register numbers and operand types in the instruction, so the input and output types are both DecodeUnitCompInput. Since the vtype information of a vset instruction is obtained through decoding rather than from VTypeGen, the vtype used by the vset instruction is updated to its own vtype information via the vtypeBypass signal.

- DecodeUnitCompIO

    class DecodeUnitCompIO(implicit p: Parameters) extends XSBundle {
      val redirect = Input(Bool())
      val csrCtrl = Input(new CustomCSRCtrlIO)
      val vtypeBypass = Input(new VType)
      // When the first inst in decode vector is complex inst, pass it in
      val in = Flipped(DecoupledIO(new DecodeUnitCompInput))
      val out = new DecodeUnitCompOutput
      val complexNum = Output(UInt(3.W))
    }

Functionality

Splits a vector instruction into multiple micro-operations based on the splitting type and lmul information, and modifies information such as operand register numbers and operand types in the micro-operations. Simultaneously, vector instruction exception checking is also performed in this module. This module uses a state machine. The ready signal is asserted only when no instruction is being processed or when the processing of a split instruction is completed in the current cycle, allowing the next instruction to be processed.

Design Specifications

Currently, there are many types of instruction splitting, which will be streamlined and optimized in the future.

Splitting Type	Corresponding Instruction Type
AMO_CAS_W/AMO_CAS_D/AMO_CAS_Q	AMO_CAS instruction
VSET	vset instruction
VEC_VVV	Instructions where two source registers and the destination register are vector registers
VEC_VFV	Instructions where one source register is a floating-point register, and one source register and the destination register are vector registers
VEC_EXT2/VEC_EXT4/VEC_EXT8	Vector sign extension instructions
VEC_0XV	Scalar to vector move instruction
VEC_VXV	Instructions where one source register is a scalar register, and one source register and the destination register are vector registers
VEC_VVW/VEC_VFW/VEC_WVW/VEC_VXW/VEC_WXW/VEC_WVV/VEC_WFW/VEC_WXV	Widening/narrowing vector instructions
VEC_VVM/VEC_VFM/VEC_VXM	Vector instructions where the destination register is a mask register
VEC_SLIDE1UP	vslide1up instruction
VEC_FSLIDE1UP	vfslide1up instruction
VEC_SLIDE1DOWN	vslide1down instruction
VEC_FSLIDE1DOWN	vfslide1down instruction
VEC_VRED	Scalar reduction instruction
VEC_VFRED	Unordered floating-point reduction instruction
VEC_VFREDOSUM	Ordered floating-point reduction instruction
VEC_SLIDEUP	vslideup instruction
VEC_SLIDEDOWN	vslidedown instruction
VEC_M0X	vcpop instruction
VEC_MVV	vid/viota instructions
VEC_VWW	Scalar widening reduction instruction
VEC_RGATHER	vrgather instruction
VEC_RGATHER_VX	vrgather instruction where one operand comes from a scalar register
VEC_RGATHEREI16	vrgatherei16 instruction
VEC_COMPRESS	vcompress instruction
VEC_MVNR	vmvnr instruction
VEC_US_LDST	Unit-stride load/store instructions
VEC_S_LDST	Strided load/store instructions
VEC_I_LDST	Indexed load/store instructions

Secondary Module VecExceptionGen

Inputs:
inst: 32-bit instruction
decodedInst: Decoded information
vtype: vtype information
vstart: vstart information
Output:
illegalInst: Whether the instruction is illegal (exception)

Functionality

Checks for exceptions in vector instructions. Except for memory access related exceptions of vector load/store instructions, all others are checked in the decode stage.

Design Specifications

Vector instruction related exceptions are classified into the following eight types:

Exception Name	Description
inst Illegal	Exception reported for reserved instructions
vill Illegal	Exception reported when executing vector instructions other than `vset` when the `vill` field of `vtype` is 1
EEW Illegal	EEW exception for vector floating-point, sign extension, widening, and narrowing instructions
EMUL Illegal	EMUL exception for vector load/store, sign extension, widening, narrowing, and vrgatherei16 instructions
Reg Number Align	vs1, vs2, vd not aligned according to lmul
v0 Overlap	Exception reported when reading v0 register and modifying v0 simultaneously for some instructions
Src Reg Overlap	Exception reported when vs1, vs2, and vd overlap for some instructions
vstart Illegal	Exception reported when executing vector instructions other than vset and vector load/store instructions when vstart is not 0

If any of these exceptions are triggered, the exception signal is asserted.
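
Three of the checks in the table (vill Illegal, vstart Illegal, and Reg Number Align) can be modeled in a few lines. The following plain-Scala sketch is a simplification for illustration only: it assumes every listed register uses the same integer LMUL, which does not hold for widening or narrowing instructions whose operands use a different EMUL, and it omits the other five checks. `VecExceptionModel`, `regAligned`, and `illegal` are hypothetical names, not identifiers from VecExceptionGen.

```scala
// Behavioral sketch only: plain Scala, not the VecExceptionGen RTL.
object VecExceptionModel {
  /** Reg Number Align: a register group must start at a register
    * index that is a multiple of its (integer) LMUL.
    */
  def regAligned(reg: Int, lmul: Int): Boolean = {
    require(Set(1, 2, 4, 8).contains(lmul), "lmul must be 1, 2, 4, or 8")
    require(reg >= 0 && reg < 32, "vector register index out of range")
    reg % lmul == 0
  }

  /** Combine three checks; any single violation marks the instruction illegal.
    * ASSUMPTION: all registers in `regs` share the same integer LMUL.
    */
  def illegal(
      vill: Boolean,      // vill field of vtype
      vstart: Int,        // current vstart value
      isVset: Boolean,    // instruction is a vset instruction
      isVecLdSt: Boolean, // instruction is a vector load/store
      regs: Seq[Int],     // vs1, vs2, vd as applicable
      lmul: Int
  ): Boolean = {
    val villIllegal   = vill && !isVset
    val vstartIllegal = vstart != 0 && !isVset && !isVecLdSt
    val alignIllegal  = regs.exists(r => !regAligned(r, lmul))
    villIllegal || vstartIllegal || alignIllegal
  }
}
```

With LMUL = 4, registers v4, v8, and v12 pass the alignment check, whereas v2 does not, so an instruction naming v2 as an operand would be flagged.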