
\newpage

Read-After-Write Violation Check: LoadQueueRAW

Function Description

LoadQueueRAW handles store-load violations. Because loads and stores execute out of order in the pipeline, a load often overtakes an older store to the same address. Such a load should have received its data from the store by forwarding, but the store's address or data was not yet ready, so the load completed without the store's data, and every later instruction that consumed the load's result is wrong as well. This is a store-to-load forwarding violation.

When a store address is issued from the STA reservation station into the store pipeline, it is checked against two sets of loads that are younger than the store and access the same address: loads in LoadQueueRAW (LQRAW) that have already completed their memory access, and loads still in flight in the load pipeline. Any match is a store-to-load forwarding violation. Several loads may violate at once; in that case the load closest to the store in program order, that is, the oldest violating load, is selected. A redirect request is then sent to the RedirectGenerator to flush that load and all instructions after it.

A cbo.zero instruction executing in the store pipeline also requires a store-load violation check.

Feature 1: Load Query Enqueue

When a query reaches stage s2 of the load pipeline, the enqueue condition is evaluated: the load is enqueued if at least one older store still has an address that is not ready and the load itself has not been flushed.
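The enqueue condition can be sketched as a small behavioral model. This is an illustration only, not the hardware implementation: the names `LoadQuery`, `sq_idx`, and `addr_ready_ptr` are hypothetical, and the sketch assumes store-queue indices that grow monotonically without wrapping, where every store below `addr_ready_ptr` already has its address.

```python
from dataclasses import dataclass


@dataclass
class LoadQuery:
    """Hypothetical view of a load query at load-pipeline stage s2."""
    sq_idx: int    # assumed: index of the first store younger than this load
    flushed: bool  # the load has been cancelled by a redirect


def need_enqueue(q: LoadQuery, addr_ready_ptr: int) -> bool:
    """Return True when the load must be tracked in the RAW queue.

    Stores older than the load occupy indices [0, q.sq_idx). If the
    address-ready pointer has not yet passed all of them, at least one
    older store still has an unknown address.
    """
    has_older_unready_store = addr_ready_ptr < q.sq_idx
    return has_older_unready_store and not q.flushed


print(need_enqueue(LoadQuery(sq_idx=5, flushed=False), addr_ready_ptr=3))  # True
print(need_enqueue(LoadQuery(sq_idx=5, flushed=False), addr_ready_ptr=5))  # False
print(need_enqueue(LoadQuery(sq_idx=5, flushed=True), addr_ready_ptr=3))   # False
```

A load whose older stores all have known addresses has nothing left to be overtaken by, so it does not need an entry.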

An allocatable entry and its index are obtained from the freelist.

The physical address of the enqueued query is compressed to 24 bits and stored in the corresponding entry of PaddrModule.

The mask of the enqueued query is stored in the corresponding entry of maskModule.
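The three enqueue steps can be modeled as follows. The text does not say how the address is compressed, so this sketch assumes plain truncation to the low 24 bits; `RawQueueModel`, `compress_paddr`, and the list-based free list are hypothetical stand-ins for the freelist, PaddrModule, and maskModule.

```python
PARTIAL_BITS = 24  # stored address width, as stated in the text


def compress_paddr(paddr: int) -> int:
    """Assumed compression scheme: keep only the low 24 address bits."""
    return paddr & ((1 << PARTIAL_BITS) - 1)


class RawQueueModel:
    """Behavioral stand-in for the freelist, PaddrModule, and maskModule."""

    def __init__(self, size: int = 32):
        self.free = list(range(size))  # indices of unallocated entries
        self.paddr = [0] * size        # compressed physical addresses
        self.mask = [0] * size         # access masks

    def enqueue(self, paddr: int, mask: int):
        """Allocate an entry and fill it; return None if the queue is full."""
        if not self.free:
            return None
        idx = self.free.pop(0)
        self.paddr[idx] = compress_paddr(paddr)
        self.mask[idx] = mask
        return idx


q = RawQueueModel(size=2)
idx = q.enqueue(paddr=0x1_2345_6788, mask=0x0F)
print(idx, hex(q.paddr[idx]))  # 0 0x456788
```

Whatever compression is actually used, equal addresses always compress to equal values, so a compressed comparison can report a spurious match but cannot miss a real one. A spurious match costs an unnecessary rollback and does not affect correctness.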

Feature 2: Store-Load Violation Check

The store-load check starts when a store reaches stage s1 of the store pipeline. The store is compared against loads in LoadQueueRAW that have completed their memory access and against loads currently accessing memory in stages s1 and s2 of the load pipeline, because any of these may have missed the store's forwarded data. A violation occurs when a load is younger than the store and the physical addresses they access overlap. The oldest violating load must then be found, and it and all later instructions are re-executed starting from instruction fetch. The result of the check is available in stage s4 of the store pipeline.
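One common way to decide whether two accesses overlap is to compare an aligned block address and then intersect per-byte masks. The sketch below shows that idea only; the 16-byte block size and the helper names `byte_mask` and `overlaps` are assumptions, not taken from the design.

```python
BLOCK_BYTES = 16  # assumed comparison granularity


def byte_mask(addr: int, size: int) -> int:
    """Bit i is set when byte i of the aligned block is accessed."""
    offset = addr % BLOCK_BYTES
    assert offset + size <= BLOCK_BYTES, "sketch ignores block-crossing accesses"
    return ((1 << size) - 1) << offset


def overlaps(addr_a: int, size_a: int, addr_b: int, size_b: int) -> bool:
    """True when both accesses hit the same block and share at least one byte."""
    same_block = addr_a // BLOCK_BYTES == addr_b // BLOCK_BYTES
    shared = byte_mask(addr_a, size_a) & byte_mask(addr_b, size_b)
    return same_block and shared != 0


print(overlaps(0x1000, 4, 0x1002, 2))  # True: bytes 2 and 3 are shared
print(overlaps(0x1000, 4, 0x1004, 4))  # False: same block, disjoint bytes
```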

The check takes four cycles:

  • Cycle 1: match physical addresses and the other conditions to produce a match mask. The candidates are loads younger than the store. A candidate that already has its data (datavalid), or that missed in the dcache and is waiting for a refill (miss), cannot have received this store's data by forwarding.
  • Cycle 2: the store uses the mask to locate all matching loads in LoadQueueRAW. The 32 entries are split evenly into four groups of eight, and the oldest matching load is selected within each group, giving up to 4 candidates.
  • Cycle 3: the oldest of those 4 candidates is selected.
  • Cycle 4: if the stores in both store pipelines have detected a store-to-load violation, the older of the two pipelines' oldest matching loads is selected, and a rollback request is generated and sent to redirect.
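The four steps above can be sketched as a behavioral model. It is a simplification under stated assumptions: age is a hypothetical monotonically increasing sequence number (smaller means older) rather than a hardware pointer, the group size is a parameter set to 8 so that 32 entries yield the four candidates the third step expects, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    age: int                 # assumed sequence number, smaller = older
    paddr: int               # compressed physical address
    mask: int                # byte mask of the access
    datavalid: bool = False  # load already has its data
    miss: bool = False       # load missed in dcache, waiting for refill


def oldest(cands: List[Optional[Entry]]) -> Optional[Entry]:
    """Oldest non-empty candidate, or None when there is none."""
    live = [c for c in cands if c is not None]
    return min(live, key=lambda e: e.age, default=None)


def match_vector(entries, st_age, st_paddr, st_mask) -> List[bool]:
    """Cycle 1: younger, overlapping loads that cannot have been forwarded to."""
    return [
        e is not None
        and e.age > st_age
        and e.paddr == st_paddr
        and (e.mask & st_mask) != 0
        and (e.datavalid or e.miss)
        for e in entries
    ]


def group_oldest(entries, match, group_size=8) -> List[Optional[Entry]]:
    """Cycle 2: one oldest matching load per group."""
    result = []
    for base in range(0, len(entries), group_size):
        idxs = range(base, min(base + group_size, len(entries)))
        result.append(oldest([entries[i] for i in idxs if match[i]]))
    return result


def check_store(entries, st_age, st_paddr, st_mask) -> Optional[Entry]:
    """Cycles 1 to 3 for a single store pipeline."""
    match = match_vector(entries, st_age, st_paddr, st_mask)
    return oldest(group_oldest(entries, match))


def select_rollback(per_pipeline: List[Optional[Entry]]) -> Optional[Entry]:
    """Cycle 4: older of the per-pipeline results; None means no rollback."""
    return oldest(per_pipeline)


queue: List[Optional[Entry]] = [None] * 32
queue[3] = Entry(age=12, paddr=0x40, mask=0x0F, datavalid=True)
queue[20] = Entry(age=9, paddr=0x40, mask=0x03, miss=True)
queue[21] = Entry(age=5, paddr=0x40, mask=0x0F, datavalid=True)  # older than the store

pipe0 = check_store(queue, st_age=7, st_paddr=0x40, st_mask=0x01)
pipe1 = check_store(queue, st_age=10, st_paddr=0x40, st_mask=0x08)
print(pipe0.age, pipe1.age, select_rollback([pipe0, pipe1]).age)  # 9 12 9
```

Splitting the search into per-group and cross-group steps keeps each comparison tree shallow, which is a plausible reason for spreading the selection over several cycles.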

Overall Block Diagram

[Figure: LoadQueueRAW overall block diagram]

Interface Timing

LoadQueueRAW Request Enqueue Timing Example

[Figure: LoadQueueRAW request enqueue timing]

A handshake succeeds when io_query_*_req_valid and io_query_*_req_ready are both high. When needEnqueue and io_canAllocate_* are both high, io_doAllocate_* is driven high, meaning that the query needs to be enqueued and the FreeList can allocate an entry. io_allocateSlot_* identifies the entry that receives the query. In the next cycle the allocate signal of that entry is asserted and sqIdx is written into the entry. One cycle later the mask is written to the corresponding entry of LqMaskModule and the compressed physical address is written to the corresponding entry of LqPAddrModule.
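The enqueue timing described above can be followed with a small cycle-stepped model. It is a sketch, not the RTL: `EnqueueTimingModel` and its fields are hypothetical, each call to `step` represents one clock cycle, and the state observed after a call is what would be visible in the following cycle.

```python
class EnqueueTimingModel:
    """Cycle-stepped sketch of the enqueue sequence for one query port."""

    def __init__(self, size: int = 4):
        self.allocated = [False] * size
        self.sq_idx = [None] * size
        self.mask = [None] * size
        self.paddr = [None] * size
        self._pending = None  # (slot, mask, paddr) waiting one more cycle

    def step(self, valid=False, ready=True, need_enqueue=False,
             can_allocate=True, slot=0, sq_idx=0, mask=0, paddr=0) -> bool:
        """Advance one cycle and return the value of doAllocate in that cycle."""
        # Mask and address accepted one cycle ago reach their modules now.
        if self._pending is not None:
            s, m, p = self._pending
            self.mask[s], self.paddr[s] = m, p
            self._pending = None

        do_allocate = valid and ready and need_enqueue and can_allocate
        if do_allocate:
            # Visible in the next cycle: allocate flag and sqIdx.
            self.allocated[slot] = True
            self.sq_idx[slot] = sq_idx
            # Visible one cycle after that: mask and compressed address.
            self._pending = (slot, mask, paddr)
        return do_allocate


m = EnqueueTimingModel()
fired = m.step(valid=True, need_enqueue=True, slot=2, sq_idx=5,
               mask=0xF, paddr=0x456788)
print(fired, m.allocated[2], m.sq_idx[2], m.mask[2])  # True True 5 None
m.step()
print(m.mask[2], hex(m.paddr[2]))  # 15 0x456788
```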

Store-Load Violation Timing Example

[Figure: store-load violation check timing]

When io_rollback_valid is high, a store-load violation has occurred; the details of the violation are carried on io_rollback_bits_*.