
BPU Submodule Composer

Function Overview

Composer is the module that combines multiple predictors. In Nanhu, it combines five predictors (uFTB, FTB, TAGE-SC, ITTAGE, and RAS) and presents them externally as a single three-stage pipelined overriding predictor. Each predictor in Composer can be enabled or disabled by writing the custom CSR sbpctl, so predictors can be used as needed. When an external redirect arrives, Composer forwards the redirect request to every predictor so that speculatively updated state can be restored. After all instructions in a predicted block have committed, the predictors in Composer are trained. Finally, Composer outputs the three-stage prediction results to the Predictor module.

When a prediction error causes an internal redirect within the three-stage BPU pipeline, only speculatively updated state, such as the branch history and the RAS, is restored. All other predictor updates are performed after commit.

This raises a question: if only the pipeline is flushed and the predictors are not updated at this point, will the same location be mispredicted again next time? After the flush, prediction restarts from the corrected path. If a later path passes through the same location, the same result may indeed be predicted again. However, the branch history may differ by then, in which case history-indexed predictors such as TAGE select different entries.
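The effect of branch history on table indexing can be illustrated with a minimal sketch. The hash below (XOR of PC bits with an XOR-folded history) is a simplified, hypothetical stand-in, not the actual TAGE index function used in Nanhu; the names fold_history and table_index, and the chosen widths, are assumptions for illustration.

```python
def fold_history(history: int, hist_len: int, out_bits: int) -> int:
    """XOR-fold the low hist_len bits of history down to out_bits bits."""
    history &= (1 << hist_len) - 1
    mask = (1 << out_bits) - 1
    folded = 0
    while history:
        folded ^= history & mask
        history >>= out_bits
    return folded


def table_index(pc: int, history: int, hist_len: int = 8, index_bits: int = 4) -> int:
    """Hypothetical table index: PC bits XORed with the folded history."""
    mask = (1 << index_bits) - 1
    return ((pc >> 1) ^ fold_history(history, hist_len, index_bits)) & mask


pc = 0x80000010
# Same PC, two different global histories: different entries are selected.
print(table_index(pc, 0b00000001), table_index(pc, 0b00000011))
```

With the same PC and an unchanged history, the index is identical, which is why the same misprediction can repeat until training at commit updates the entry.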

If execution finds that the target address is incorrect, the redirect is not initiated immediately; it is deferred, together with other such cases, until instruction commit. One reason for this design is that a misprediction-triggered redirect may itself lie on an incorrect path, so its execution result may also be incorrect, and training on it could pollute the predictor.

Configuration of Starting PC

The io_reset_vector IO interface of Composer configures the starting PC: drive this port with the desired starting PC.

Connection with Predictors

Composer connects the five predictors: uFTB, FTB, TAGE-SC, ITTAGE, and RAS. The branch predictor pipeline has three stages, and each predictor has a fixed latency, so each predictor's result is always ready at a known pipeline stage. Composer therefore only needs to output each predictor's result at the corresponding stage.
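A minimal behavioral sketch of this stage-by-stage output follows. It assumes that a later-stage result replaces an earlier one and that a mismatch raises an internal redirect; the function name compose and the per-stage target list are hypothetical and do not reflect the actual Composer signals.

```python
from typing import Dict, List, Optional


def compose(stage_targets: List[Optional[int]]) -> List[Dict]:
    """stage_targets[i] is the predicted target that becomes ready at stage i+1,
    or None if no predictor produces a new result at that stage."""
    outputs = []
    current = None  # most recent prediction seen so far
    for stage, target in enumerate(stage_targets, start=1):
        redirect = False
        if target is not None:
            # A later stage that disagrees with the earlier result overrides it.
            redirect = current is not None and target != current
            current = target
        outputs.append({"stage": stage, "target": current, "redirect": redirect})
    return outputs


# Stage 3 disagrees with stages 1 and 2, so it overrides them.
for out in compose([0x1000, 0x1000, 0x2000]):
    print(out)
```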

meta is the data a predictor uses when making a prediction; it is saved and read back during the update phase so that the predictor can be updated. It is called meta because Composer integrates all the predictors and exchanges this data with the outside through a single common interface named meta.
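The idea of a shared meta interface can be sketched as packing each predictor's meta into one bit vector and slicing it apart again at update time. The field order and widths below are hypothetical, chosen only for illustration; the real widths depend on the predictor configuration.

```python
# Hypothetical (name, width-in-bits) pairs; not the real Nanhu widths.
META_FIELDS = [("uFTB", 8), ("FTB", 4), ("TAGE-SC", 16)]


def pack_meta(values: dict) -> int:
    """Concatenate per-predictor meta values into one integer, first field highest."""
    packed = 0
    for name, width in META_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} meta does not fit in {width} bits")
        packed = (packed << width) | value
    return packed


def unpack_meta(packed: int) -> dict:
    """Slice the combined meta back into per-predictor values."""
    values = {}
    for name, width in reversed(META_FIELDS):
        values[name] = packed & ((1 << width) - 1)
        packed >>= width
    return values


meta = pack_meta({"uFTB": 0xAB, "FTB": 0x3, "TAGE-SC": 0x1234})
print(hex(meta), unpack_meta(meta))
```

Each predictor only needs to know its own slice, while everything outside Composer handles a single opaque value.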

Predictor Enable/Disable

Zicsr instructions can read and write the custom CSR sbpctl, which controls whether each predictor in Composer is enabled. sbpctl[6:0] holds the enable bits for seven predictors: {LOOP, RAS, SC, TAGE, BIM, BTB, uFTB}. A bit value of 1 enables the predictor, and 0 disables it. The sbpctl value is passed to each predictor through Composer's io_ctrl_* IO interface, and each predictor implements its own enable/disable behavior.
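As a small illustration, the enable bits can be decoded as follows. This sketch assumes the list {LOOP, RAS, SC, TAGE, BIM, BTB, uFTB} is written in Verilog concatenation order, so that uFTB is bit 0 and LOOP is bit 6; the function name decode_sbpctl is hypothetical.

```python
# Assumed bit assignment: index in this list == bit position in sbpctl[6:0].
PREDICTOR_BITS = ["uFTB", "BTB", "BIM", "TAGE", "SC", "RAS", "LOOP"]


def decode_sbpctl(value: int) -> dict:
    """Return a predictor-name -> enabled mapping from the low 7 bits of sbpctl."""
    return {name: bool((value >> bit) & 1) for bit, name in enumerate(PREDICTOR_BITS)}


# Example: enable only uFTB, TAGE, and RAS.
print(decode_sbpctl(0b0101001))
```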

Redirect Restoration

Composer receives redirect requests through IO ports such as io_s2_redirect, io_s3_redirect, and io_redirect_*. These requests are sent to its various predictors to restore speculatively updated elements, such as the top item of the RAS stack.
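The restoration of a speculatively updated RAS top can be sketched with a checkpoint taken at prediction time. This is a simplified model under assumed behavior (a circular stack whose checkpoint holds the stack pointer and the top entry); the class SpeculativeRAS and its methods are hypothetical, not the actual RAS implementation.

```python
class SpeculativeRAS:
    """Toy return address stack with checkpoint/restore of its top entry."""

    def __init__(self, depth: int = 8):
        self.depth = depth
        self.stack = [0] * depth
        self.sp = 0  # index of the current top entry

    def push(self, addr: int) -> None:
        self.sp = (self.sp + 1) % self.depth
        self.stack[self.sp] = addr

    def pop(self) -> int:
        addr = self.stack[self.sp]
        self.sp = (self.sp - 1) % self.depth
        return addr

    def top(self) -> int:
        return self.stack[self.sp]

    def checkpoint(self):
        """Save what a redirect needs: the pointer and the top entry."""
        return (self.sp, self.stack[self.sp])

    def restore(self, ckpt) -> None:
        self.sp, top = ckpt
        self.stack[self.sp] = top


ras = SpeculativeRAS()
ras.push(0x100)
ckpt = ras.checkpoint()   # taken when the prediction is made
ras.pop()                 # wrong-path return
ras.push(0x200)           # wrong-path call overwrites the old top
ras.restore(ckpt)         # redirect arrives
print(hex(ras.top()))
```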

Predictor Training

Composer sends training signals to its predictors through the io_update_* IO ports. In general, to keep incorrect execution paths from polluting predictor contents, each predictor is trained only after all instructions in the predicted block have committed. The training data consists of the predictor's own prediction information plus the decode and execution results of the instructions in the predicted block. All of this is read from the FTQ and sent back to the BPU: the predictor's own prediction information is packed and stored in the FTQ after prediction; the decode results come from the IFU's pre-decode module and are written back to the FTQ after the instructions are fetched; and the execution results come from the execution units.
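The commit-time training flow can be sketched with a toy FTQ entry that collects the three kinds of information and releases an update only when the whole block has committed. The class FtqEntry, its fields, and the dictionary contents are hypothetical simplifications, not the real FTQ layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FtqEntry:
    meta: int                           # prediction info stored at prediction time
    num_insts: int                      # instructions in the predicted block
    predecode: Optional[dict] = None    # written back by the IFU after fetch
    exec_result: Optional[dict] = None  # written back by the execution units
    committed: int = 0

    def commit_one(self) -> Optional[dict]:
        """Commit one instruction; return an update request only when the
        whole block has committed, otherwise None."""
        self.committed += 1
        if self.committed < self.num_insts:
            return None
        return {
            "meta": self.meta,
            "predecode": self.predecode,
            "exec_result": self.exec_result,
        }


entry = FtqEntry(meta=0xAB, num_insts=2)
entry.predecode = {"is_branch": True}
entry.exec_result = {"taken": True}
print(entry.commit_one())  # block not fully committed yet
print(entry.commit_one())  # update request is produced
```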

Overall Block Diagram

Figure: Composer Module Overall Block Diagram

Interface Timing

Control Signal Ctrl Interface Timing

Figure: Control Signal Ctrl Interface Timing

The figure above shows a timing example for the Ctrl control signal interface of the Composer module. After the io_ctrl signal enters Composer, it is delayed by one cycle and then forwarded to the internal components submodules.

Redirect Interface Timing

Figure: Redirect Interface Timing

The figure above shows the timing of redirect requests to the Composer module. After the BPU receives a redirect request from the backend, it delays the request by one cycle before sending it to Composer, so the predictors in Composer see the request one cycle later.

Branch Prediction Block Training Interface Timing

Figure: Branch Prediction Block Training Interface Timing

As with redirects, the update interface for branch prediction block training is delayed by one cycle inside the BPU, for timing optimization, before it reaches Composer and its internal predictors.
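The one-cycle delay described for the ctrl, redirect, and update interfaces behaves like a single pipeline register. A minimal cycle-by-cycle sketch, assuming a register that resets to 0 (the function name delay_one_cycle and the reset value are illustrative assumptions):

```python
from typing import List


def delay_one_cycle(inputs: List[int], reset_value: int = 0) -> List[int]:
    """Model a single register: the output in cycle n is the input from cycle n-1."""
    reg = reset_value
    outputs = []
    for value in inputs:
        outputs.append(reg)  # downstream logic sees last cycle's input
        reg = value          # the register captures this cycle's input
    return outputs


# A value presented in cycle 0 is visible downstream in cycle 1.
print(delay_one_cycle([0x7F, 0x7F, 0x01]))
```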

Key Circuits

The following figures show the Composer meta concatenation and the arbitration logic for redirect/branch history update sources, respectively.

Figure: Composer meta Concatenation

Figure: Redirect/Branch History Update Source Arbitration Logic