Store Address Execution Unit StoreUnit

Function Description

The store address pipeline is divided into five stages, S0 through S4, as shown in \ref{fig:LSU-StoreUnit-Pipeline}. It receives requests from the store address issue queue. While processing a request, it sends feedback to the issue queue and to the StoreQueue; once processing completes, it writes back and responds to the backend and the vector part. If an exception occurs partway through, the instruction is re-issued from the issue queue.

Feature 1: StoreUnit supports scalar Store instructions

stage 0:
- Calculate the virtual address (VA)
- Check address alignment and record any misalignment in uop.cf.exceptionVec(storeAddrMisaligned)
- Issue a DTLB read request to the TLB
- Write the instruction's mask information to s0_mask_out and send it to the StoreQueue
- Determine whether the store has a data width of 128 bits
stage 1:
- Send the DTLB query result to the StoreQueue
- Issue a store-load violation check request to the LoadQueue
- On a DTLB hit, send the store issue information to the backend
stage 2:
- Perform the MMIO/PMP check and update the StoreQueue
- Send the DTLB result to the backend via feedback_slow
stage 3:
- An additional cycle is inserted so that the result reaches the backend in sync with the RAW violation check
stage 4:
- Scalar stores initiate writeback, which is sent to the backend via stout
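
The stage 0 steps for a scalar store can be sketched in software as follows. This is a minimal illustrative model, not the RTL: the function name, the `size_log2` encoding (0 = byte up to 3 = doubleword), the 64-bit address wrap, and the 16-bit byte mask over a 16-byte window are all assumptions made for the example.

```python
def s0_address_and_mask(base: int, imm: int, size_log2: int):
    """Model of stage 0 for a scalar store (hypothetical helper).

    base, imm  : register value and signed immediate offset
    size_log2  : log2 of the access size in bytes (assumed encoding)
    Returns (vaddr, misaligned, mask).
    """
    xlen_mask = (1 << 64) - 1            # assumption: 64-bit address arithmetic
    vaddr = (base + imm) & xlen_mask     # calculate the virtual address
    nbytes = 1 << size_log2
    # An access is misaligned when the address is not a multiple of its size.
    misaligned = (vaddr & (nbytes - 1)) != 0
    # Byte mask within an assumed 16-byte window; bytes that would fall
    # outside the window are dropped from the mask.
    offset = vaddr & 0xF
    mask = (((1 << nbytes) - 1) << offset) & 0xFFFF
    return vaddr, misaligned, mask


# Aligned word store at 0x1004: bytes 4..7 of the 16-byte window are written.
print(s0_address_and_mask(0x1000, 4, 2))
# Misaligned word store at 0x1003: the misalignment flag is set.
print(s0_address_and_mask(0x1000, 3, 2))
```

In the real pipeline the misalignment flag would be recorded in the exception vector and the mask forwarded to the StoreQueue; here both are simply returned.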

Feature 3: StoreUnit supports vector Store instructions

StoreUnit handles vector Store instructions in much the same way as scalar ones, with the following differences:

stage 0:
- Accepts execution requests from vsSplit, which take priority over scalar requests; the virtual address does not need to be calculated
stage 1:
- Calculate vecVaddrOffset and vecTriggerMask
stage 2:
- Does not need to send feedback_slow response to the backend
stage 4:
- Vector store initiates Writeback, sent to the backend via vecstout

Feature 2: StoreUnit supports non-aligned Store instructions

StoreUnit handles non-aligned Store instructions in much the same way as scalar ones, with the following differences:

stage 0:
- Accepts requests from StoreMisalignBuffer, which take priority over vector and scalar requests; the virtual address does not need to be calculated
stage 2:
- Does not need to send a feedback response to the backend
- If it is not a request from StoreMisalignBuffer and is a non-aligned request that does not cross a 16-byte boundary, then it needs to enter StoreMisalignBuffer for processing
  - Via the io_misalign_buf interface, send an enqueue request to StoreMisalignBuffer
  - Does not enter stage 3
- If it is a request from StoreMisalignBuffer and does not cross a 16-byte boundary, then it needs to send a retry or writeback response to StoreMisalignBuffer
  - Via the io_misalign_sout interface, send a response to StoreMisalignBuffer
  - If a TLB miss occurs, a retry is required, otherwise writeback
  - Does not enter stage 3
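
The request priority in stage 0 and the stage 2 cases listed above can be summarised in a small decision model. This is a sketch under stated assumptions: the function names and returned labels are hypothetical, and any combination of conditions the text does not cover is reported as `"not_described"` rather than guessed.

```python
from typing import Optional


def s0_select(misalign_valid: bool, vector_valid: bool,
              scalar_valid: bool) -> Optional[str]:
    """Pick the request source for stage 0 (hypothetical helper).

    Priority as described: StoreMisalignBuffer > vector (vsSplit) > scalar.
    """
    if misalign_valid:
        return "misalign_buffer"
    if vector_valid:
        return "vector"
    if scalar_valid:
        return "scalar"
    return None  # no request this cycle


def s2_route(from_misalign_buf: bool, misaligned: bool,
             crosses_16b: bool, tlb_miss: bool) -> str:
    """Classify what stage 2 does with a request (hypothetical helper)."""
    if from_misalign_buf and not crosses_16b:
        # Response goes back over io_misalign_sout; stage 3 is skipped.
        return "retry" if tlb_miss else "writeback"
    if not from_misalign_buf and misaligned and not crosses_16b:
        # Enqueue request over io_misalign_buf; stage 3 is skipped.
        return "enqueue"
    if not from_misalign_buf and not misaligned:
        # Ordinary aligned store continues down the pipeline.
        return "stage3"
    return "not_described"  # case not covered by the text


print(s0_select(True, True, True))
print(s2_route(from_misalign_buf=True, misaligned=True,
               crosses_16b=False, tlb_miss=True))
```

Keeping the uncovered cases explicit makes it easy to see which behaviour comes from the description and which would need to be confirmed against the RTL.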

\newpage

Overall Block Diagram

\newpage

Interface Timing

Interface Timing Example

As shown in \ref{fig:LSU-StoreUnit-Timing}, after a store instruction enters the StoreUnit, it sends a request to the TLB in stage 0 and receives the paddr returned by the TLB in stage 1. The mask is written to the StoreQueue in stage 0. In stage 1, a RAW violation check request is sent and the remaining information about the store instruction is written to the LoadStoreQueue via io_lsq. Feedback-related information is obtained in stage 2, and writeback is performed via stout in stage 4.
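
The timing described above can be laid out as a per-cycle event list. The sketch below assumes one stage per cycle with no stalls or replays; the `STAGE_EVENTS` table and the `trace` function are hypothetical names introduced only for this illustration.

```python
# Events per pipeline stage, taken from the timing description.
# Stage 3 has no externally visible event in that description.
STAGE_EVENTS = {
    0: ["send request to TLB", "write mask to StoreQueue"],
    1: ["receive paddr from TLB",
        "send RAW violation check request",
        "update LoadStoreQueue via io_lsq"],
    2: ["obtain feedback information"],
    3: [],
    4: ["writeback via stout"],
}


def trace(entry_cycle: int):
    """Return (cycle, stage, event) tuples for a store entering at entry_cycle.

    Assumes the store advances exactly one stage per cycle.
    """
    return [(entry_cycle + stage, stage, event)
            for stage, events in sorted(STAGE_EVENTS.items())
            for event in events]


for cycle, stage, event in trace(10):
    print(f"cycle {cycle}: S{stage} {event}")
```

A store entering at cycle 10 under these assumptions would reach writeback at cycle 14.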