Load Misalign Buffer
Function Description
The LoadMisalignBuffer holds one misaligned Load instruction that crosses a 16-byte boundary. Its control logic is a 7-state state machine. When the LoadUnit detects that an instruction is misaligned and crosses a 16-byte boundary, the instruction requests entry into the LoadMisalignBuffer. The LoadMisalignBuffer latches the Load instruction and splits it into two separate Load accesses (flows), which are then re-issued to the LoadUnit.
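The entry condition can be illustrated with a small behavioral sketch. It is not the RTL: the function names are hypothetical, and it assumes access sizes of 1, 2, 4, or 8 bytes with "misaligned" meaning the address is not a multiple of the access size.

```python
BOUNDARY = 16  # bytes; the boundary size named in the text


def is_misaligned(addr: int, size: int) -> bool:
    """True when addr is not a multiple of the access size (assumed 1, 2, 4 or 8)."""
    return addr % size != 0


def crosses_16b_boundary(addr: int, size: int) -> bool:
    """True when the first and last accessed bytes lie in different 16-byte blocks."""
    return addr // BOUNDARY != (addr + size - 1) // BOUNDARY


def needs_misalign_buffer(addr: int, size: int) -> bool:
    """Hypothetical helper: both conditions from the text must hold."""
    return is_misaligned(addr, size) and crosses_16b_boundary(addr, size)
```

For example, a 4-byte load at `0x100E` touches bytes `0x100E` to `0x1011` and therefore qualifies, while a 4-byte load at `0x1001` is misaligned but stays inside one 16-byte block and does not.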
The LoadMisalignBuffer collects the responses to the Load accesses it issues. After both accesses complete, it concatenates their data and sends a wakeup operation back to the LoadUnit. This operation does not actually execute in the LoadUnit pipeline; it only triggers a wakeup signal, and it takes three cycles. After those three cycles, the LoadMisalignBuffer receives another write-back request from the LoadUnit, marked as originating from the wakeup operation. At that point the LoadMisalignBuffer dequeues the entry and performs the real write-back to the backend and to the bypass.
Scalar misaligned write-back to the backend may only occur when LoadUnit 1's scalar write-back is not enabled; otherwise the LoadMisalignBuffer's write-back to the backend is blocked. Likewise, vector misaligned write-back to the VLMergeBuffer may only occur when LoadUnit 1's vector write-back is not enabled; otherwise the LoadMisalignBuffer's write-back to the VLMergeBuffer is blocked.
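The blocking rule can be sketched as a single predicate. This is an illustrative model only: `can_write_back` is a hypothetical name, and it assumes that the `loadOutValid` and `loadVecOutValid` ports are the signals that indicate LoadUnit 1's scalar and vector write-back, respectively.

```python
def can_write_back(is_vector: bool,
                   load_out_valid: bool,
                   load_vec_out_valid: bool) -> bool:
    """Return True when the misaligned write-back is not blocked.

    Assumption: load_out_valid / load_vec_out_valid model the loadOutValid /
    loadVecOutValid ports, i.e. LoadUnit 1 is writing back this cycle.
    """
    # A vector write-back competes with the vector path, a scalar one with
    # the scalar path; the other path's activity is ignored.
    blocker = load_vec_out_valid if is_vector else load_out_valid
    return not blocker
```

In this model a scalar write-back proceeds even while the LoadUnit is writing back a vector load, because only the matching path blocks it.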
Feature 1: Supports Splitting Misaligned Loads Crossing 16-Byte Boundaries
How the buffer proceeds depends on the flow that has just completed. After the first flow writes back, the state machine re-enters the s_req state to send the second flow. If the first flow carries an exception when it writes back to the LoadMisalignBuffer, the buffer writes back to the backend directly with that exception information and does not execute the second flow. Either flow may come back with a replay; whatever the replay reason, the LoadMisalignBuffer resends that flow to the LoadUnit.
- lb instructions can never be misaligned.
- lh is split into two corresponding lb operations.
- lw split varies depending on the address splitting method.
- ld split varies depending on the address splitting method.
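The split and the later data concatenation can be sketched at the byte-range level. This is a simplified model under stated assumptions: the load is cut exactly at the 16-byte boundary, data is little-endian, and sign extension is omitted. It does not reproduce which concrete load opcodes the hardware picks for the lw and ld cases; `split_flows`, `concat`, and `load_misaligned` are hypothetical names.

```python
BOUNDARY = 16  # bytes


def split_flows(addr: int, size: int) -> list[tuple[int, int]]:
    """Split a boundary-crossing load into (address, byte count) for each flow."""
    boundary = (addr // BOUNDARY + 1) * BOUNDARY  # first byte of the next block
    low_len = boundary - addr                     # bytes below the boundary
    if not 0 < low_len < size:
        raise ValueError("load does not cross a 16-byte boundary")
    return [(addr, low_len), (boundary, size - low_len)]


def concat(low_data: int, low_len: int, high_data: int) -> int:
    """Little-endian concatenation: the high flow supplies the upper bytes."""
    return low_data | (high_data << (8 * low_len))


def load_misaligned(memory: bytes, addr: int, size: int) -> int:
    """Run both flows against a byte array and combine the results."""
    (lo_addr, lo_len), (hi_addr, hi_len) = split_flows(addr, size)
    low = int.from_bytes(memory[lo_addr:lo_addr + lo_len], "little")
    high = int.from_bytes(memory[hi_addr:hi_addr + hi_len], "little")
    return concat(low, lo_len, high)
```

With this model, an lh at address 15 yields two one-byte flows at 15 and 16, which matches the statement that lh splits into two lb operations.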
Feature 2: Supports Vector Misalignment
Vector misaligned loads are handled the same way as scalar ones, except that vector write-back goes to the VLMergeBuffer, while scalar write-back goes directly to the backend.
Feature 3: Does Not Support Misaligned Loads from Non-Memory Space
Misaligned Loads from non-Memory space are not supported. When a Load from non-Memory space is misaligned, it will generate a LoadAddrMisalign exception.
Overall Block Diagram
State Description
State | Description |
---|---|
s_idle | Waiting for a misaligned Load uop to enter |
s_split | Splitting the misaligned Load |
s_req | Issuing the split misaligned Load operations to LoadUnit |
s_resp | Waiting for the LoadUnit to write back a split access |
s_comb_wakeup_rep | Combining results of the two misaligned Loads, issuing wakeup uop |
s_wb | Writing back to backend or VLMergeBuffer |
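The transitions between these states can be sketched as a next-state function. This is a reading of the prose in this document, not the RTL: the input names are hypothetical, redirect handling is omitted, and two points are assumptions (an exception is checked before a replay, and an exception on either flow goes straight to s_wb).

```python
from enum import Enum, auto


class State(Enum):
    S_IDLE = auto()
    S_SPLIT = auto()
    S_REQ = auto()
    S_RESP = auto()
    S_COMB_WAKEUP_REP = auto()
    S_WB = auto()


def next_state(state, *, enqueue=False, req_fire=False, resp_valid=False,
               exception=False, replay=False, last_flow=False,
               wakeup_returned=False, wb_fire=False):
    """Return the next state for one cycle (all inputs are hypothetical names)."""
    if state is State.S_IDLE:
        return State.S_SPLIT if enqueue else state
    if state is State.S_SPLIT:
        return State.S_REQ                      # assumed to take one cycle
    if state is State.S_REQ:
        return State.S_RESP if req_fire else state
    if state is State.S_RESP:
        if not resp_valid:
            return state
        if exception:
            return State.S_WB                   # write back with the exception
        if replay:
            return State.S_REQ                  # resend the same flow
        # First flow done: send the second. Second flow done: combine.
        return State.S_COMB_WAKEUP_REP if last_flow else State.S_REQ
    if state is State.S_COMB_WAKEUP_REP:
        # Wait for the write-back request marked as coming from the wakeup.
        return State.S_WB if wakeup_returned else state
    return State.S_IDLE if wb_fire else state   # S_WB
```

A load without exceptions or replays then visits s_idle, s_split, s_req, s_resp, s_req, s_resp, s_comb_wakeup_rep, s_wb, and returns to s_idle.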
Main Ports
Port | Direction | Description |
---|---|---|
redirect | In | Redirect port |
req | In | Receives enqueue requests from LoadUnit |
rob | In | Internally unused |
splitLoadReq | Out | Sends split flow access requests to LoadUnit |
splitLoadResp | In | Receives split flow access responses from LoadUnit |
writeBack | Out | Scalar misaligned write-back to backend |
vecWriteBack | Out | Vector misaligned write-back to VLMergeBuffer |
loadOutValid | In | Load Unit has a Load instruction about to write back to backend |
loadVecOutValid | In | Load Unit has a Vector Load instruction about to write back to VLMergeBuffer |
overwriteExpBuf | Out | Unused |
loadMisalignFull | Out | LoadMisalignBuffer full flag |
Interface Timing
Interface timing is relatively simple, so only textual descriptions are provided.
Port | Description |
---|---|
redirect | Has Valid. Data is valid synchronously with Valid. |
req | Has Valid, Ready. Data is valid synchronously with Valid && Ready. |
rob | Internally unused. |
splitLoadReq | Has Valid, Ready. Data is valid synchronously with Valid && Ready. |
splitLoadResp | Has Valid. Data is valid synchronously with Valid. |
writeBack | Has Valid, Ready. Data is valid synchronously with Valid && Ready. |
vecWriteBack | Has Valid, Ready. Data is valid synchronously with Valid && Ready. |
loadOutValid | No Valid signal. The value is always considered valid and takes effect as soon as the signal is asserted. |
loadVecOutValid | No Valid signal. The value is always considered valid and takes effect as soon as the signal is asserted. |
overwriteExpBuf | Unused. |
loadMisalignFull | No Valid signal. The value is always considered valid and takes effect as soon as the signal is asserted. |