MainPipe (Main Pipeline)

Functional Description

MainPipe uses a pipeline to control the execution of Store, Probe, Refill, and Atomic operations (that is, all requests that must contend for the WritebackQueue in order to issue requests or write data back to the lower-level cache).

Feature 1: Functions of Each MainPipe Pipeline Stage

Stage 0: Arbitrates among incoming MainPipe requests and selects the one with the highest priority; determines from the request information whether the resources the request needs are ready; issues the tag and meta read requests.
Stage 1: Obtains the tag and meta read results; performs the tag match to determine whether the request hits; if replacement is needed, obtains the replacement way chosen by PLRU; performs the permission check based on the meta that was read; pre-determines whether the MissQueue needs to be accessed.
Stage 2: Obtains the data read result and merges it with the data to be written; on a miss, attempts to write the request information into the MissQueue; checks tag_error and l2_error.
Stage 3: Updates meta, data, and tag according to the operation result; on a hit, returns the store response to the LSU; if the request needs to access or write data back to the lower-level cache, generates a WritebackQueue request at this stage and attempts to write it into the WritebackQueue; checks data_error. Atomic instructions get special handling: an AMO instruction stalls in this stage for two cycles, first performing its computation and then writing the result back to dcache and returning the response; LR/SC instructions set or check their reservation set here.

Feature 2: MainPipe Contention and Stalling

MainPipe arbitrates among requests with the priority probe_req > refill_req > store_req > atomic_req. A request is accepted only if all the resources it needs are ready, it has no set conflict, and no higher-priority request is present. For timing reasons, write requests from the committed store buffer use separate checking logic.
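The acceptance rule above can be illustrated with a small behavioral sketch in Python. It is not the RTL: the `Req` type and its `resources_ready` and `set_conflict` flags are hypothetical names introduced here, and the separate checking logic for committed store buffer writes is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Priority order stated in the text: probe > refill > store > atomic.
PRIORITY = ("probe", "refill", "store", "atomic")


@dataclass(frozen=True)
class Req:
    kind: str              # one of PRIORITY
    resources_ready: bool  # hypothetical flag: everything the request needs is available
    set_conflict: bool     # hypothetical flag: set index collides with an in-flight request


def arbitrate(reqs: Sequence[Req]) -> Optional[Req]:
    """Return the request accepted in s0 this cycle, or None if nothing is accepted."""
    if not reqs:
        return None
    # Only the highest-priority pending request may win; a lower-priority
    # request is never accepted while a higher-priority one is present.
    winner = min(reqs, key=lambda r: PRIORITY.index(r.kind))
    if winner.resources_ready and not winner.set_conflict:
        return winner
    return None
```

In this sketch a blocked probe also blocks a ready store in the same cycle, which is one reading of "no higher priority request exists".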

Feature 3: Set Stalling Logic

The set stalling logic ensures that requests in flight in the pipeline do not access different lines of the same set at the same time. This preserves data consistency and correctness: it prevents a request entering s0 from reading incorrect data while an earlier request in s1, s2, or s3 has not finished writing. When the stages involved are valid, MainPipe's set conflict check compares the address index of s0 with that of s1, and the address index of s0 with that of s2. If either pair matches, a set conflict is raised and s0 stalls.
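As a rough illustration of the comparison described above, the following Python sketch checks s0 against s1 and s2. The `StageSlot` type, the helper names, and the address layout (64-byte lines, 64 sets) are assumptions chosen for the example, not values taken from the design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSlot:
    valid: bool
    set_idx: int


def set_index(addr: int, block_offset_bits: int = 6, set_bits: int = 6) -> int:
    """Extract the set index from an address (bit widths are assumed values)."""
    return (addr >> block_offset_bits) & ((1 << set_bits) - 1)


def s0_set_conflict(s0: StageSlot, s1: StageSlot, s2: StageSlot) -> bool:
    """True when the valid request in s0 targets the same set as a valid s1 or s2."""
    if not s0.valid:
        return False
    return any(s.valid and s.set_idx == s0.set_idx for s in (s1, s2))
```

With these assumed widths, addresses 0x1040 and 0x2040 are different lines that map to the same set, so the second request would stall in s0 while the first is still in s1 or s2.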

Feature 4: Meta Update

Meta updates are performed in s3. All four request types in MainPipe must update the metadata of the corresponding cache line, and all of them do so through the meta_write port, but the specific behavior differs.

Probe requests generate the meta_coh to be written in s3 based on the probe_param parameter carried in the request, corresponding to the permission modification desired by this Probe request.

For hit Store and AMO requests, the coh of the corresponding data block is obtained in s1, the new_coh after this access is generated in s2, and the two are compared in s3 to decide whether meta writing is needed this time. If writing is needed, it is updated to the new_coh generated in s2.

For requests that missed on the first attempt and subsequently re-entered MainPipe after refill by MissQueue, the miss_new_coh to be updated for this refill access is generated in s3 based on the Acquire-related parameters carried in the MissQueue refill request, and the meta is written.
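The probe and hit store/AMO cases can be sketched in Python as follows. The state names and their ordering follow the usual TileLink-style client states (Nothing < Branch < Trunk < Dirty) and the probe parameters toN/toB/toT; treating a probe as a cap on the current permission, and a write hit as a transition to Dirty, are assumptions of this sketch rather than statements about the actual implementation. The refill case (miss_new_coh) is not modeled.

```python
from enum import IntEnum


class Coh(IntEnum):
    NOTHING = 0
    BRANCH = 1
    TRUNK = 2
    DIRTY = 3


# Assumed mapping: the probe parameter caps the permission the line may keep.
PROBE_CAP = {"toN": Coh.NOTHING, "toB": Coh.BRANCH, "toT": Coh.TRUNK}


def probe_meta_coh(coh: Coh, probe_param: str) -> Coh:
    """meta_coh written in s3 for a probe: the current state capped by probe_param."""
    return min(coh, PROBE_CAP[probe_param])


def store_hit_new_coh(coh: Coh) -> Coh:
    """new_coh generated in s2 for a hit store/AMO (assumed: a write hit ends in Dirty)."""
    if coh < Coh.TRUNK:
        raise ValueError("a write cannot hit without Trunk or Dirty permission")
    return Coh.DIRTY


def s3_meta_write_needed(coh: Coh, new_coh: Coh) -> bool:
    """s3 compares the coh read in s1 with the new_coh from s2."""
    return coh != new_coh
```

Under these assumptions, a store that hits a Trunk line triggers a meta write (Trunk to Dirty), while a store that hits an already Dirty line does not.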

Feature 5: AMO Instruction Processing

AMO requests enter MainPipe after priority contention. In the first two pipeline stages, the execution flow is basically the same as other instruction types. In s1, the results of tag and meta read requests are obtained, tag matching check and meta permission check are performed, and amo_hit is determined, deciding whether this AMO request needs to enter MissQueue. If the current AMO request misses the cache, the request information is written to the MissQueue during stage s2; if this AMO request hits, the data read result is obtained in s2, and then it proceeds to s3. After entering s3, AMO instructions stall for two cycles at this stage. In the first cycle, the AMO instruction's computation operation is executed. In the second cycle, the modification of the result is written back to dcache, and a response is returned to the Atomic Instruction Processing Unit.
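The two-cycle behavior of a hit AMO in s3 can be sketched as below. The function names, the dictionary layout, and the small set of supported operations are illustrative assumptions; the sketch also assumes, following the RISC-V AMO definition, that the response carries the original memory value.

```python
def amo_compute(op: str, old: int, operand: int, width_bits: int = 64) -> int:
    """First s3 cycle: compute the new memory value (unsigned model, a few ops only)."""
    mask = (1 << width_bits) - 1
    if op == "amoswap":
        return operand & mask
    if op == "amoadd":
        return (old + operand) & mask
    if op == "amoand":
        return old & operand & mask
    if op == "amoor":
        return (old | operand) & mask
    if op == "amoxor":
        return (old ^ operand) & mask
    raise ValueError(f"unsupported op in this sketch: {op}")


def s3_amo_cycles(op: str, old: int, operand: int) -> list:
    """Return what a hit AMO does in each of its two s3 cycles."""
    result = amo_compute(op, old, operand)
    return [
        # Cycle 1: the computation is executed.
        {"cycle": 1, "action": "compute", "result": result},
        # Cycle 2: the result is written to dcache and the response is returned.
        {"cycle": 2, "action": "write_and_respond",
         "write_data": result, "resp_data": old},
    ]
```

For example, `s3_amo_cycles("amoadd", 3, 4)` describes a compute cycle producing 7, followed by a cycle that writes 7 to the cache line and responds with the old value 3.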

For LR/SC instructions, their reservation set is set/checked during the s3 stage, and the lrsc_count is updated to maintain execution correctness and prevent execution from being interrupted or deadlocked.

Feature 6: MainPipe Writeback

MainPipe writeback requests are initiated in s3. For instructions that need to issue access/write back data to the lower-level L2 Cache, a writeback request is sent to the WritebackQueue. The actual writing to L2 Cache is completed after being processed by the WritebackQueue. There are three types of requests in MainPipe that require writeback.

For refill requests sent back by the MissQueue, if the block being refilled must replace a block that is currently valid in dcache (that is, a block whose state is not Nothing), the replaced block must be released to the L2 Cache, and s3 attempts to write it into the WritebackQueue (wbq).

For Probe requests, a ProbeAck needs to be returned to the lower-level cache, so a request needs to be written to wbq; if the probed data block contains dirty data, it needs to be written back to the lower-level L2 Cache, and ProbeAckData should be returned, which also requires sending a writeback request to wbq.

For miss AMO requests, the scenarios requiring writeback are similar to the refill process. Miss AMO requests re-enter the MainPipe pipeline after refill. At this point, if the refilled data block needs to replace a valid data block, this data block needs to be released to the lower-level cache and a writeback request to wbq will be generated in s3.
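The three writeback cases above can be summarized in a short Python sketch. The function name, the request-kind strings, and the boolean inputs are hypothetical; the returned strings simply name the kind of message handed to wbq and do not reflect the real request encoding.

```python
from typing import Optional


def s3_wb_request(kind: str, victim_valid: bool = False,
                  probed_dirty: bool = False) -> Optional[str]:
    """Return the kind of wbq request generated in s3, or None if none is needed."""
    if kind == "probe":
        # A probe always answers the lower-level cache; dirty data travels with it.
        return "ProbeAckData" if probed_dirty else "ProbeAck"
    if kind in ("refill", "amo_refill"):
        # A refill (including a miss AMO re-entering after refill) releases the
        # victim only when the replaced block is valid (not Nothing).
        return "Release" if victim_valid else None
    # Other requests, such as a hit store, need no writeback in this sketch.
    return None
```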

Feature 7: MainPipe Refill Data Exception Handling

Currently, the MissQueue issues every refill request to MainPipe early, as soon as it receives the hint signal. MainPipe obtains the refilled data block through refill_info forwarding when the request reaches s2. As a result, an abnormal gap between l2_hint and the refill data may occur, in which case the corresponding MSHR cannot forward valid refill data after the request has entered s2. The following two measures handle this exceptional situation.

To ensure refill efficiency and reduce the number of replays, an extra cycle of tolerance is allowed in stage s2 if data has not arrived. When a refill request enters stage s2 and refill_info is found invalid (s2_req_miss_without_data is true), it can stall for an extra cycle, waiting for refill data to arrive in the next cycle before proceeding with subsequent processes.

If valid refill data is still not received after stalling for one cycle, the corresponding MSHR is notified via s2_replay_to_mq to resend the refill_req. The current request exits MainPipe and no longer proceeds with subsequent data operations.

In case of cache aliasing and some other special circumstances, the Cacheline to be replaced by a refill request might currently be in a valid MSHR entry, waiting for a response to an L2 Acquire request. To ensure correctness and comply with the manual's specifications, this replacement operation cannot proceed. For such a refill request, the corresponding MSHR is also notified via s2_replay_to_mq to resend the refill_req, and the current request exits MainPipe and no longer proceeds with subsequent data operations.
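The decision made for a refill request in s2 can be sketched as a small Python function. The function name, its arguments, and the returned labels are assumptions made for illustration, and the order in which the two replay conditions are checked is not specified in the text.

```python
def s2_refill_outcome(data_valid_on_entry: bool,
                      data_valid_next_cycle: bool,
                      victim_in_valid_mshr: bool = False) -> str:
    """Classify what happens to a refill request in s2.

    Returns "proceed", "proceed_after_stall", or "replay"; "replay" stands for
    notifying the MSHR via s2_replay_to_mq and leaving MainPipe.
    """
    if victim_in_valid_mshr:
        # The line to be replaced is still waiting on an L2 Acquire response.
        return "replay"
    if data_valid_on_entry:
        return "proceed"
    # s2_req_miss_without_data: tolerate exactly one extra cycle in s2.
    if data_valid_next_cycle:
        return "proceed_after_stall"
    return "replay"
```

The single cycle of tolerance trades a short stall for fewer replays: data that arrives one cycle late is still consumed, and only a longer gap sends the request back to the MissQueue.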

Overall Block Diagram

The overall architecture of MainPipe is shown in this figure.

Interface Timing

Request Interface Timing Example

The interface timing is shown in this figure.

req1 is a store request. It reads meta and tag in the first cycle. In the second cycle, tag comparison finds that the request misses, and the way to be replaced is selected according to the replacement algorithm. In the third cycle, the miss request is sent to the MissQueue. In the fourth cycle, because of the miss, no response is returned to the StoreBuffer.

req2 is a probe request. It reads meta and tag in the first cycle, reads data in the second cycle, and obtains the probed data block in the third cycle. In the fourth cycle, it updates meta according to the probe command, issues a wb request to the WritebackQueue, and returns a ProbeAck response.

req3 is an AMO instruction. It reads meta and tag in the first cycle. In the second cycle, the tag comparison hits and a data read request is issued. In the third cycle, the data result is obtained. The fourth and fifth cycles are both spent in stage_3: the computation is executed in the fourth cycle, and in the fifth cycle a data write is issued to update the original data block and a response is returned to the AtomicsUnit.

req4 is the refill request corresponding to req1. In the first cycle, when the MissQueue sends refill_req, meta is read. Because req2 is performing a meta write at that time and metaArray writes have priority over reads, req4 stalls for one cycle in stage_0 and handshakes successfully only in the next cycle. In the third cycle, stage_1 reads data and obtains the replacement selection result provided by PLRU. Because req3 is performing a data write at that time, req4 stalls for one more cycle in stage_1. In the fifth cycle, stage_2 obtains the data of the block to be replaced together with the refill data forwarded from the MissQueue. In the sixth cycle, stage_3 issues a wb request to the WritebackQueue so that the replaced block can enter the wb queue, writes the refilled data into the storage unit, and returns a refill completion response to the MissQueue.