XiangShan Decode Design Document
- Version: V2R2
- Status: OK
- Date: 2025/02/28
- Commit: xxx
Terminology Explanation
Abbreviation | Full Name | Description |
---|---|---|
- | Decode Unit | Decode Unit |
uop | Micro Operation | Micro Operation |
- | numOfUop | Number of uops split from one instruction |
- | numOfWB | Number of uops requiring writeback among uops split from one instruction |
- | vtypeArch | vtype configuration of the latest committed vector instruction |
- | vtypeSpec | Current vtype configuration for vector instructions |
- | walkVType | vtype rolled back and restored during redirection |
Sub-module List
Sub-module | Description |
---|---|
DecodeUnit | Decode Unit |
DecodeUnitComp | Vector Instruction Splitting Module |
FPDecoder | Floating-Point Instruction Decoder |
UopInfoGen | Instruction Splitting Type and Count Generation Unit |
VecDecoder | Vector Instruction Decoder |
VecExceptionGen | Vector Exception Check Module |
VTypeGen | Vector Instruction vtype Configuration Generation Module |
Design Specifications
- Added a vector configuration generation module, a vector decode module, a vector instruction splitting module, and a vector exception check module. All vector instructions enter `decoderComp` to be split.
- Supports decoding up to 6 scalar instructions in the same cycle.
- Supports decoding up to 1 vector instruction in the same cycle.
- Some instructions are translated:
  - A `zimop` instruction is translated to an `addi` instruction with `src` as `x0` and `imm` as 0.
  - A `csrr vlenb` instruction is translated to an `addi` instruction with `src` as `x0` and `imm` as `VLEN/8`.
  - A `csrr vl` instruction is translated to a `vset` instruction that reads the `vl` register and writes to a scalar register.
- When reading a read-only CSR, the `waitForward` and `blockBackward` signals are no longer asserted, which allows out-of-order execution.
- Other functionality is the same as Nanhu.
Functionality
Decodes instructions, converting each 32-bit instruction encoding into the control signals for that instruction. Vector instructions and AMO_CAS instructions additionally require instruction splitting: the instruction is split into one or more uops, and the source register numbers, source register types, destination register numbers, destination register types, functional unit, and operation type are reassigned according to the split type. After decoding, the instruction and its control information are passed to the rename module, which allocates physical registers based on the source register number and source register type. Illegal-instruction and virtual-instruction exceptions are checked during the decode stage, and the corresponding bits in `exceptionVec` are asserted.
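As a rough illustration of the exception marking described above, the following plain-Scala sketch sets the bits for the two decode-stage exceptions. The cause numbers 2 and 22 are the ones this document gives for `illegalInstr` and `virtualInstr`; the vector width and the names `DecodeExceptionSketch` and `exceptionVec` are assumptions made for this example, not the actual XiangShan definitions.

```scala
object DecodeExceptionSketch {
  // Cause numbers given in this document for the two decode-stage exceptions.
  val IllegalInstr = 2
  val VirtualInstr = 22
  // Hypothetical width, chosen only so that index 22 exists in this sketch.
  val ExceptionVecWidth = 24

  /** Builds a software stand-in for exceptionVec with the matching bits set. */
  def exceptionVec(illegal: Boolean, virtualInstr: Boolean): Vector[Boolean] =
    Vector.tabulate(ExceptionVecWidth) { i =>
      (i == IllegalInstr && illegal) || (i == VirtualInstr && virtualInstr)
    }
}
```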
Overall Design
Decoding is done by instantiating 6 `DecodeUnit` modules for the input instructions. Each `DecodeUnit` outputs a signal indicating whether its instruction is a vector instruction. A vector instruction must be passed to the complex decoder `decoderComp` for instruction splitting.

Vector instructions must pass through `DecodeUnit` and `UopInfoGen` before entering the complex decoder, which results in a long critical path. Instructions entering the complex decoder are therefore buffered for one cycle, and vector exception checking and instruction splitting take place in the next cycle, converting each instruction into one or more uops. If the number of uops exceeds 6, decoding takes multiple cycles. If the remaining uops can all be decoded in the current cycle, the next vector instruction that needs decoding is passed to `decoderComp` in that same cycle.
Assuming rename is ready, the situation can be classified based on the order of input instructions:
- Scalar instruction: Decode directly.
- Vector instruction: When `decoderComp` is ready, the vector instruction is passed to `decoderComp` for instruction splitting. Only one vector instruction can be processed.
- Vector instruction + Scalar instruction: When `decoderComp` is ready, the vector instruction is passed to `decoderComp` for instruction splitting. Only one vector instruction can be processed; scalar instructions cannot be processed simultaneously.
- Scalar instruction + Vector instruction: Scalar instructions before the vector instruction are decoded directly. When `decoderComp` is ready, the vector instruction is passed to `decoderComp` for instruction splitting. Only one vector instruction can be processed.
- Split uops + Scalar instruction: Assume there are `n` split uops needing rename in the current cycle and `m` scalar instructions needing rename at the same time. If `n + m <= 6`, all of them are decoded directly. Otherwise, only `6 - n` scalar instructions are decoded.
- Split uops + Vector instruction: Processing is the same as in the Vector instruction case.
- Split uops + Vector instruction + Scalar instruction: Scalar instructions are processed as in the Split uops + Scalar instruction case. The vector instruction is processed as in the Vector instruction case.
- Split uops + Scalar instruction + Vector instruction: Scalar instructions are processed as in the Split uops + Scalar instruction case. The vector instruction is processed as in the Vector instruction case.
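The slot arithmetic in the cases above can be captured in a small software model. This is a minimal sketch in plain Scala, assuming rename is always ready and a width of 6 as stated in the text; the names `DecodeSlots`, `scalarsAccepted`, and `cyclesToEmit` are invented for illustration and do not appear in the XiangShan source.

```scala
object DecodeSlots {
  // From the text: at most 6 uops/instructions are decoded per cycle.
  val DecodeWidth = 6

  /** Scalar instructions decoded this cycle when n split uops already take slots. */
  def scalarsAccepted(n: Int, m: Int): Int = {
    require(n >= 0 && n <= DecodeWidth && m >= 0)
    // n + m <= 6 accepts all m; otherwise only the remaining 6 - n slots are used.
    math.min(m, DecodeWidth - n)
  }

  /** Cycles needed to emit numUops split uops, assuming rename never stalls. */
  def cyclesToEmit(numUops: Int): Int = {
    require(numUops >= 1)
    (numUops + DecodeWidth - 1) / DecodeWidth // ceiling division
  }
}
```

For example, with 4 split uops pending and 5 scalar instructions waiting, only 2 scalar instructions fit in the current cycle.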
Overall Block Diagram
Interface List
See Interface Document
Secondary Module VTypeGen
The `VTypeGen` module maintains the `vtype` configuration to be used by the vector instruction currently being decoded. The `vtype` information stored in `VTypeGen` is updated whenever a `vset` instruction is executed or a redirection requires a rollback.
Inputs
- 32-bit instruction information from the frontend instruction stream;
- `vtype` rollback information from the ROB's `vtype` buffer;
- `vtype` commit information from the ROB's `vtype` buffer;
- `vtype` information for the `vsetvl` instruction from the backend. Since the `vtype` of a `vsetvl` instruction is obtained by reading a register rather than by decoding, it is passed to `VTypeGen` when the `vsetvl` instruction writes back.
Outputs
Outputs `vtype` information (the `vtype` configuration used by the vector instruction currently in the decode stage) to the Decode Unit.
Design Specifications
There are 4 cases in which `vtypeSpec` is updated:
- When a `vsetvl` instruction is committed, `vtypeSpec` is updated to the `vtype` of that `vsetvl` instruction, which is obtained when the instruction writes back. Since a `vsetvl` instruction flushes the pipeline, this case does not conflict with the others.
- During the redirection rollback process, `vtypeSpec` is updated to the `walkVType` passed from the `vtype` buffer.
- When redirection starts, `vtypeSpec` is updated to `vtypeArch`.
- If the decoded instructions contain a `vsetivli` or `vsetvli` instruction and no exception occurs, the `vtype` information of that instruction can be obtained from its immediate field. `VTypeGen` contains a simple decoder that determines whether the input instructions include either of these two instruction types. If they do, a `PriorityMux` selects the first `vset` instruction, and the `VsetModule` module parses the `vtype` information.
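// Priority: vsetvl commit > rollback walk > redirect start > decoded vsetivli/vsetvli
when(io.commitVType.hasVsetvl) {
  // Case 1: a committed vsetvl supplies the vtype it wrote back
  vtypeSpecNext := io.vsetvlVType
}.elsewhen(io.walkVType.valid) {
  // Case 2: rollback in progress, take the walked vtype from the vtype buffer
  vtypeSpecNext := io.walkVType.bits
}.elsewhen(io.walkToArchVType) {
  // Case 3: redirection starts, fall back to the architectural vtype
  vtypeSpecNext := vtypeArch
}.elsewhen(inHasVset && io.canUpdateVType) {
  // Case 4: a decoded vsetivli/vsetvli provides a new vtype from its immediate
  vtypeSpecNext := vtypeNew
}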
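To make the fourth case more concrete, here is a plain-Scala sketch of extracting `vtype` fields from the immediate of a `vsetvli` or `vsetivli` instruction and picking the first such instruction in a decode group. The field positions follow the RISC-V "V" extension encoding; the names `VsetImmSketch`, `vtypeFromInst`, and `firstVset` are hypothetical, and the sketch ignores the exception check (including reserved `vtype` bits) that the real `VsetModule` path has to respect.

```scala
object VsetImmSketch {
  // The four vtype fields carried in bits [7:0] of the immediate.
  final case class VType(vma: Boolean, vta: Boolean, vsew: Int, vlmul: Int)

  // Extracts bits hi..lo (inclusive) of x as a non-negative Int.
  private def bits(x: Long, hi: Int, lo: Int): Int =
    ((x >>> lo) & ((1L << (hi - lo + 1)) - 1)).toInt

  /** Some(vtype) for vsetvli/vsetivli, None otherwise (vsetvl takes vtype from a register). */
  def vtypeFromInst(inst: Long): Option[VType] = {
    val isOpV      = bits(inst, 6, 0) == 0x57  // OP-V major opcode
    val isCfg      = bits(inst, 14, 12) == 7   // funct3 = 111 marks vset* instructions
    val isVsetvli  = bits(inst, 31, 31) == 0
    val isVsetivli = bits(inst, 31, 30) == 3
    if (isOpV && isCfg && (isVsetvli || isVsetivli)) {
      val zimm = bits(inst, 27, 20)            // low 8 bits of the immediate
      Some(VType(
        vma   = bits(zimm, 7, 7) == 1,
        vta   = bits(zimm, 6, 6) == 1,
        vsew  = bits(zimm, 5, 3),
        vlmul = bits(zimm, 2, 0)))
    } else None
  }

  /** Software analogue of the PriorityMux: the first vsetvli/vsetivli in the group wins. */
  def firstVset(insts: Seq[Long]): Option[VType] =
    insts.flatMap(i => vtypeFromInst(i)).headOption
}
```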
There are 2 cases in which `vtypeArch` is updated:
1. When a `vsetvl` instruction is committed, `vtypeArch` is updated to the `vtype` written back by the `vsetvl` instruction.
2. When a `vsetivli` or `vsetvli` instruction is committed, `vtypeArch` is updated to the `vtype` commit information passed from the `vtype` buffer.
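The two update rules can be exercised together in a cycle-level software model. The sketch below is plain Scala rather than Chisel, and it is only an approximation under stated assumptions: the `In` fields loosely mirror the signals in the `vtypeSpecNext` snippet above, `commitVType` stands in for the commit information from the `vtype` buffer, and the names `VTypeRegsModel`, `In`, `State`, and `step` are invented for this example.

```scala
object VTypeRegsModel {
  final case class VType(vsew: Int, vlmul: Int)

  // Inputs for one cycle (hypothetical grouping of the signals described in the text).
  final case class In(
    commitHasVsetvl: Boolean = false,
    vsetvlVType: VType = VType(0, 0),     // vtype written back by vsetvl
    commitVType: Option[VType] = None,    // committed vsetivli/vsetvli vtype
    walkVType: Option[VType] = None,      // rollback value from the vtype buffer
    walkToArchVType: Boolean = false,     // redirection starts
    decodedVset: Option[VType] = None     // vtype parsed from a decoded vsetivli/vsetvli
  )

  final case class State(spec: VType, arch: VType)

  /** Computes the next vtypeSpec and vtypeArch from the current state and inputs. */
  def step(s: State, in: In): State = {
    val nextSpec =
      if (in.commitHasVsetvl) in.vsetvlVType
      else if (in.walkVType.isDefined) in.walkVType.get
      else if (in.walkToArchVType) s.arch           // uses the current vtypeArch
      else in.decodedVset.getOrElse(s.spec)
    val nextArch =
      if (in.commitHasVsetvl) in.vsetvlVType
      else in.commitVType.getOrElse(s.arch)
    State(nextSpec, nextArch)
  }
}
```

A decoded `vsetvli` moves only `vtypeSpec`; `vtypeArch` follows once the instruction commits, and a redirection in between restores `vtypeSpec` from `vtypeArch`.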
Secondary Module DecodeUnit
Inputs and Outputs
- Inputs
  - `DecodeUnitEnqIO`: Instruction stream information from the frontend, plus the `vtype` and `vstart` information used by vector instructions.
  - `CustomCSRCtrlIO`: CSR control signals.
  - `CSRToDecode`: CSR control signals.
- Outputs
  - `DecodeUnitDeqIO`: Decoded instruction information, whether the instruction is a vector instruction, and the number of uops it splits into.
Functionality
This module is the decode unit of the XiangShan backend. It converts the incoming control-flow information into richer micro-operations, which carry the source register numbers, source register types, destination register number, destination register type, immediate type, functional unit type, operation type, and other information.
Design Specifications
- Decoding Information
  - XSDecode

    `decodeArray` is defined in `DecodeConstants` to convert the 32-bit instruction encoding into `XSDecode`, which contains the following information:
    - `srcType0`: Type of source register 0.
    - `srcType1`: Type of source register 1.
    - `srcType2`: Type of source register 2, used for FMA instructions.
    - `fuType`: Functional unit type.
    - `fuOpType`: Operation type.
    - `rfWen`: Whether to write back to a scalar register.
    - `fpWen`: Whether to write back to a floating-point register.
    - `vfWen`: Whether to write back to a vector register.
    - `isXSTrap`: Whether it is an XSTrap instruction.
    - `noSpecExec`: Whether the instruction is barred from out-of-order execution, i.e., it must wait for all previous instructions to commit before executing.
    - `blockBackward`: Whether to block subsequent instructions, i.e., subsequent instructions can enter the ROB only after the current instruction commits.
    - `flushPipe`: Whether to flush the pipeline, i.e., the pipeline is flushed after the current instruction commits.
    - `canRobCompress`: Whether the instruction supports ROB compression. An instruction that cannot trigger an exception and is not at an FTQ boundary is considered compressible in the ROB.
    - `uopSplitType`: Instruction splitting type. Scalar instructions use `UopSplitType.SCA_SIM`, which requires no splitting. Vector instructions and AMO_CAS instructions require splitting. A vector instruction that splits into only one uop and needs no change to its control signals uses `UopSplitType.dummy`, so that it still enters the vector complex decoder for vector exception checking.
  - VPUCtrlSignals

    Vector and floating-point instructions need to set `VPUCtrlSignals`, which contains vector configuration information such as `sew` and `lmul`.
    - Vector instructions: The vector configuration comes from the `vtype` information that `VTypeGen` provides in the decode stage.
    - Floating-point instructions: Floating-point instructions are independent of the vector module but reuse the vector functional units, and a functional unit takes its element width from `sew`. A dedicated sub-module, `FPToVecDecoder`, therefore generates the `VPUCtrlSignals` control signals for floating-point instructions.
  - FPUCtrlSignals

    Generated in the decode sub-module `FPDecoder`. The `rm` signal controls floating-point rounding, `wflags` controls the `i2f` module and the `fflag` update, and the other signals control the `i2f` module.

        class FPUCtrlSignals(implicit p: Parameters) extends XSBundle {
          val typeTagOut = UInt(2.W) // H S D
          val wflags = Bool()
          val typ = UInt(2.W)
          val fmt = UInt(2.W)
          val rm = UInt(3.W)
        }
  - uopnum

    `UopInfoGen` generates the number of uops each instruction is split into. A scalar instruction yields 1 uop. An AMO_CAS instruction yields 2 or 4 uops depending on its type. For vector instructions the count is calculated from `lmul`; for vector load/store instructions it is calculated from `lmul`, `sew`, and `eew`.
- Translation Processing
  - move instruction

    Since the `move` instruction is a special `addi` instruction, it is identified from the instruction fields, and move elimination is performed in the subsequent rename stage.
  - zimop instruction

    Since a `zimop` instruction only needs to write 0 to `rd`, it is translated into an `addi` instruction with `src` as `x0` and `imm` as 0.
  - csrr vlenb instruction

    The value of `vlenb` is fixed, so the instruction is translated into an `addi` instruction with `src` as `x0` and `imm` as `VLEN/8`.
  - csrr vl instruction

    `vl` uses an independent register file, which supports renaming and out-of-order execution. An instruction that reads `vl` is converted into a `vset` instruction that reads `vl` and writes it to the corresponding `rd`.
  - Soft Prefetch instruction

    `fuType` is changed to `FuType.ldu.U`, and the instruction is passed to the corresponding functional unit for processing.
- Exception Handling

  `DecodeUnit` handles two types of exceptions: `illegalInstr` (exception value 2) and `virtualInstr` (exception value 22).
  - illegalInstr
    - Checks whether the immediate selection is invalid.
    - Exceptions triggered when the instruction is executed under certain CSR settings.
    - Vector-related exceptions are not checked in this module but in the complex decoder.
  - virtualInstr
    - Exceptions triggered when the instruction is executed under certain CSR settings.
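The `zimop` and `csrr vlenb` translations described under Translation Processing can be sketched at the instruction-encoding level. The following plain-Scala example re-encodes matching instructions as `addi rd, x0, imm`. It assumes `VLEN = 128` purely for illustration, uses the standard RISC-V encodings (Zimop `mop.r.N`/`mop.rr.N` match/mask values, `vlenb` at CSR address `0xC22`), and the names `TranslateSketch`, `isZimop`, `isCsrrVlenb`, and `translate` are hypothetical. The real decoder rewrites control signals rather than re-encoding instruction bits.

```scala
object TranslateSketch {
  // Assumed vector register length in bits; the text only states imm = VLEN/8.
  val VLEN = 128

  /** Encodes addi rd, rs1, imm (I-type, opcode 0x13, funct3 0). */
  def addi(rd: Int, rs1: Int, imm: Int): Long = {
    require(rd >= 0 && rd < 32 && rs1 >= 0 && rs1 < 32 && imm >= -2048 && imm < 2048)
    ((imm & 0xFFF).toLong << 20) | (rs1.toLong << 15) | (rd.toLong << 7) | 0x13L
  }

  // Extracts bits hi..lo (inclusive) of a 32-bit instruction held in a Long.
  private def field(inst: Long, hi: Int, lo: Int): Int =
    ((inst >>> lo) & ((1L << (hi - lo + 1)) - 1)).toInt

  /** Matches mop.r.N or mop.rr.N from the Zimop extension. */
  def isZimop(inst: Long): Boolean =
    (inst & 0xB3C0707FL) == 0x81C04073L || (inst & 0xB200707FL) == 0x82004073L

  /** Matches csrr rd, vlenb, i.e. csrrs rd, vlenb, x0. */
  def isCsrrVlenb(inst: Long): Boolean =
    field(inst, 6, 0) == 0x73 && field(inst, 14, 12) == 2 &&
      field(inst, 19, 15) == 0 && field(inst, 31, 20) == 0xC22

  /** Returns the translated encoding, or the instruction unchanged. */
  def translate(inst: Long): Long = {
    val rd = field(inst, 11, 7)
    if (isZimop(inst)) addi(rd, 0, 0)
    else if (isCsrrVlenb(inst)) addi(rd, 0, VLEN / 8)
    else inst
  }
}
```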
Secondary Module DecodeUnitComp
Inputs and Outputs
Instruction splitting only modifies information such as operand register numbers and operand types in the instruction, so the input and output types are both `DecodeUnitCompInput`. Since the `vtype` information of a `vset` instruction is obtained through decoding rather than from `VTypeGen`, the `vtype` used by the `vset` instruction is updated to that instruction's own `vtype` information via the `vtypeBypass` signal.
- DecodeUnitCompIO
class DecodeUnitCompIO(implicit p: Parameters) extends XSBundle {
val redirect = Input(Bool())
val csrCtrl = Input(new CustomCSRCtrlIO)
val vtypeBypass = Input(new VType)
// When the first inst in decode vector is complex inst, pass it in
val in = Flipped(DecoupledIO(new DecodeUnitCompInput))
val out = new DecodeUnitCompOutput
val complexNum = Output(UInt(3.W))
}
Functionality
Splits a vector instruction into multiple micro-operations based on the splitting type and `lmul` information, and modifies information such as operand register numbers and operand types in the micro-operations. Vector instruction exception checking is also performed in this module. The module is implemented as a state machine. The `ready` signal is asserted only when no instruction is being processed or when the processing of a split instruction completes in the current cycle, so that the next instruction can be accepted.
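As an illustration of what splitting means for one simple case, the sketch below models a `VEC_VVV`-style split in plain Scala. It assumes that each uop handles one register of the group, so register numbers advance by the uop index, and that a fractional `lmul` still yields one uop. These rules, and the names `VvvSplitSketch`, `Uop`, `splitVvv`, and `emitPerCycle`, are assumptions for this example; the actual per-type rules live in `DecodeUnitComp`.

```scala
object VvvSplitSketch {
  // Hypothetical minimal uop record: just the register numbers and position flags.
  final case class Uop(vs1: Int, vs2: Int, vd: Int, uopIdx: Int,
                       firstUop: Boolean, lastUop: Boolean)

  /** Splits one vector-vector instruction into max(1, lmul) uops. */
  def splitVvv(vs1: Int, vs2: Int, vd: Int, lmul: Int): Seq[Uop] = {
    require(Set(1, 2, 4, 8).contains(lmul), "integer LMUL only in this sketch")
    (0 until lmul).map { i =>
      // Assumption: uop i operates on the i-th register of each register group.
      Uop(vs1 + i, vs2 + i, vd + i, i, firstUop = i == 0, lastUop = i == lmul - 1)
    }
  }

  /** Groups uops into per-cycle batches, mirroring the multi-cycle emission. */
  def emitPerCycle(uops: Seq[Uop], width: Int = 6): List[Seq[Uop]] =
    uops.grouped(width).toList
}
```

With `lmul = 8` the instruction yields 8 uops, which this model emits as a batch of 6 followed by a batch of 2.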
Design Specifications
Currently, there are many types of instruction splitting, which will be streamlined and optimized in the future.
Splitting Type | Corresponding Instruction Type |
---|---|
AMO_CAS_W/AMO_CAS_D/AMO_CAS_Q | AMO_CAS instruction |
VSET | vset instruction |
VEC_VVV | Instructions where two source registers and the destination register are vector registers |
VEC_VFV | Instructions where one source register is a floating-point register, and one source register and the destination register are vector registers |
VEC_EXT2/VEC_EXT4/VEC_EXT8 | Vector sign extension instructions |
VEC_0XV | Scalar to vector move instruction |
VEC_VXV | Instructions where one source register is a scalar register, and one source register and the destination register are vector registers |
VEC_VVW/VEC_VFW/VEC_WVW/VEC_VXW/VEC_WXW/VEC_WVV/VEC_WFW/VEC_WXV | Widening/narrowing vector instructions |
VEC_VVM/VEC_VFM/VEC_VXM | Vector instructions where the destination register is a mask register |
VEC_SLIDE1UP | vslide1up instruction |
VEC_FSLIDE1UP | vfslide1up instruction |
VEC_SLIDE1DOWN | vslide1down instruction |
VEC_FSLIDE1DOWN | vfslide1down instruction |
VEC_VRED | Scalar reduction instruction |
VEC_VFRED | Unordered floating-point reduction instruction |
VEC_VFREDOSUM | Ordered floating-point reduction instruction |
VEC_SLIDEUP | vslideup instruction |
VEC_SLIDEDOWN | vslidedown instruction |
VEC_M0X | vcpop instruction |
VEC_MVV | vid/viota instructions |
VEC_VWW | Scalar widening reduction instruction |
VEC_RGATHER | vrgather instruction |
VEC_RGATHER_VX | vrgather instruction where one operand comes from a scalar register |
VEC_RGATHEREI16 | vrgatherei16 instruction |
VEC_COMPRESS | vcompress instruction |
VEC_MVNR | vmvnr instruction |
VEC_US_LDST | Unit-stride load/store instructions |
VEC_S_LDST | Strided load/store instructions |
VEC_I_LDST | Indexed load/store instructions |
Secondary Module VecExceptionGen
- Inputs:
  - `inst`: 32-bit instruction
  - `decodedInst`: Decoded information
  - `vtype`: vtype information
  - `vstart`: vstart information
- Output:
  - `illegalInst`: Whether the instruction is illegal (raises an exception)
Functionality
Checks for exceptions in vector instructions. Except for memory access related exceptions of vector load/store instructions, all others are checked in the decode stage.
Design Specifications
Vector instruction related exceptions are classified into the following eight types:
Exception Name | Description |
---|---|
inst Illegal | Exception reported for reserved instructions |
vill Illegal | Exception reported when executing vector instructions other than vset when the vill field of vtype is 1 |
EEW Illegal | EEW exception for vector floating-point, sign extension, widening, and narrowing instructions |
EMUL Illegal | EMUL exception for vector load/store, sign extension, widening, narrowing, and vrgatherei16 instructions |
Reg Number Align | vs1, vs2, vd not aligned according to lmul |
v0 Overlap | Exception reported when reading v0 register and modifying v0 simultaneously for some instructions |
Src Reg Overlap | Exception reported when vs1, vs2, and vd overlap for some instructions |
vstart Illegal | Exception reported when executing vector instructions other than vset and vector load/store instructions when vstart is not 0 |
If any of these exceptions are triggered, the exception signal is asserted.
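A few of these checks are simple enough to sketch in plain Scala. The example below covers the EMUL range check (EMUL = EEW / SEW × LMUL must lie in [1/8, 8] per the RISC-V "V" specification), register-number alignment for LMUL > 1, the `vill` check, and the `vstart` check, and then ORs the results together. It is a simplified approximation: values are passed as log2 so that fractional LMUL stays an integer, and all names (`VecExceptionSketch` and its methods) are invented for this example rather than taken from `VecExceptionGen`.

```scala
object VecExceptionSketch {
  /** log2(EMUL) = log2(EEW) - log2(SEW) + log2(LMUL). */
  def emulLog2(eewLog2: Int, sewLog2: Int, lmulLog2: Int): Int =
    eewLog2 - sewLog2 + lmulLog2

  /** EMUL Illegal: EMUL outside [1/8, 8]. */
  def emulIllegal(eewLog2: Int, sewLog2: Int, lmulLog2: Int): Boolean = {
    val e = emulLog2(eewLog2, sewLog2, lmulLog2)
    e < -3 || e > 3
  }

  /** Reg Number Align: for LMUL > 1 the register number must be a multiple of LMUL. */
  def regMisaligned(reg: Int, lmulLog2: Int): Boolean =
    lmulLog2 > 0 && (reg % (1 << lmulLog2)) != 0

  /** vill Illegal: any vector instruction other than vset while vtype.vill is set. */
  def villIllegal(vill: Boolean, isVset: Boolean): Boolean = vill && !isVset

  /** vstart Illegal: non-zero vstart for anything except vset and vector load/store. */
  def vstartIllegal(vstart: Int, isVset: Boolean, isVecLdSt: Boolean): Boolean =
    vstart != 0 && !isVset && !isVecLdSt

  /** The final illegalInst output: asserted if any individual check fires. */
  def illegalInst(checks: Seq[Boolean]): Boolean = checks.exists(identity)
}
```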