BPU Submodule uFTB
Function Overview
uFTB is the BPU's next-line predictor. It provides bubble-free basic predictions so that the processor can continuously generate the next speculative PC.
uFTB Request Reception
Whenever the stage 0 request is valid, bits 16 to 1 of the starting PC of the predicted block are extracted as a tag. The tag is sent to the fully associative uFTB storage inside this module to read an FTB entry, whose format is as described previously. Each bank of the uFTB holds 32 entries, implemented as a register-based fully associative structure. Because the entries are registers, every entry can, in the current cycle, produce a hit signal (asserted when its stored data is valid and its stored tag matches the incoming tag) and output its FTB entry. This data is returned to the uFTB top level but is not used until the next cycle.
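The lookup step can be modelled in software as follows. This is a behavioral sketch in Python, not the RTL; the names make_tag, UftbWay and lookup are hypothetical, and the FTB entry is an opaque placeholder. The list comprehension stands in for the 32 comparators that evaluate in parallel in hardware.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_WAYS = 32   # entries in the fully associative uFTB
TAG_BITS = 16   # tag width, taken from PC bits [16:1]


def make_tag(pc: int) -> int:
    """Extract PC bits 16..1 (inclusive) as a 16-bit tag."""
    return (pc >> 1) & ((1 << TAG_BITS) - 1)


@dataclass
class UftbWay:
    valid: bool = False
    tag: int = 0
    entry: Optional[dict] = None  # hypothetical placeholder for the 60-bit FTB entry


def lookup(ways: List[UftbWay], pc: int) -> List[bool]:
    """Per-way hit bits: a way hits when it is valid and its tag matches."""
    tag = make_tag(pc)
    return [w.valid and w.tag == tag for w in ways]
```

For example, make_tag(0x80002000) is 0x1000 and make_tag(0x80003b9a) is 0x1DCD.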
uFTB Data Reading and Return
In the next cycle, the uFTB storage has returned the hit signals and the read data, and the predictor enters stage 1. In this stage, at most one hitting entry is selected from the returned hit signals, and the prediction result is generated from that entry. The complete algorithm for generating the prediction result is described in the FTB module section. In addition, the uFTB has a counter mechanism: each uFTB entry covers up to 2 branch instructions, and each of them has its own 2-bit counter. A branch is predicted taken if its counter is greater than 1 or if the corresponding always_taken field of the FTB entry is set (the latter mechanism also exists in the FTB module). Furthermore, when the other predictors reach stage 3, the hit signal from this stage and the selected way number are sent out as this predictor's meta information and stored in the FTQ along with the final prediction result. This predictor performs no other actions in stages 2 and 3.
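A behavioral sketch of the stage 1 decision follows, assuming each way stores two 2-bit counters and two per-branch always_taken flags. The names select_hit, predict_taken and stage1 are hypothetical, and the first-match search stands in for the one-hot selection done in hardware.

```python
from typing import List, Optional, Tuple


def select_hit(hits: List[bool]) -> Optional[int]:
    """Return the index of the (at most one) hitting way, or None on a miss."""
    for way, hit in enumerate(hits):
        if hit:
            return way
    return None


def predict_taken(counters: List[int], always_taken: List[bool]) -> List[bool]:
    """Per-branch direction: taken if the 2-bit counter is > 1 or always_taken is set."""
    return [c > 1 or at for c, at in zip(counters, always_taken)]


def stage1(hits: List[bool],
           counters: List[List[int]],
           always_taken: List[List[bool]]) -> Tuple[bool, Optional[int], List[bool]]:
    """Return (hit, way, per-branch taken); (hit, way) is the meta later sent to the FTQ."""
    way = select_hit(hits)
    if way is None:
        return False, None, []
    return True, way, predict_taken(counters[way], always_taken[way])
```

A counter value of 2 or 3 predicts taken; 0 or 1 predicts not taken unless always_taken overrides it.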
uFTB Data Update
When all instructions of a predicted block have been committed, the FTQ sends the updated FTB entry, built from its instruction commit information, over the FTQ-to-BPU update channel, which connects directly to this module. Because the fully associative uFTB storage is built entirely from registers, writes do not interfere with parallel reads, so every incoming update is applied. When the update channel is valid, the update PC is used in the same cycle to generate a tag, which is compared against all existing uFTB entries to produce a match signal and the matching way. In the next cycle, if a match was found, the write enable of the matching way is asserted; otherwise, a victim way is chosen with the pseudo-LRU replacement algorithm and its write enable is asserted. The data written is the updated FTB entry.
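The write path can be sketched as below. This is a behavioral model under stated assumptions: the pseudo-LRU victim is computed elsewhere and passed in as the hypothetical parameter plru_victim, the storage is represented by three parallel lists, and the two-cycle timing (compare, then write) is collapsed into one call.

```python
from typing import List, Optional

TAG_MASK = 0xFFFF  # 16-bit tag from PC bits [16:1]


def make_tag(pc: int) -> int:
    return (pc >> 1) & TAG_MASK


def update_uftb(valids: List[bool], tags: List[int], entries: List[Optional[dict]],
                update_pc: int, new_entry: dict, plru_victim: int) -> int:
    """Write new_entry into the matching way, or into plru_victim on a miss.

    Returns the way that was written.
    """
    tag = make_tag(update_pc)
    # Cycle 1: compare the update tag against every way.
    match = [v and t == tag for v, t in zip(valids, tags)]
    # Cycle 2: write the matching way, or the pseudo-LRU victim if nothing matched.
    way = match.index(True) if any(match) else plru_victim
    valids[way] = True
    tags[way] = tag
    entries[way] = new_entry
    return way
```

A second update to the same PC reuses the existing way instead of allocating a new one, so the uFTB never holds duplicate tags.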
The per-branch counters are also updated when the update channel is valid. In the cycle after the update arrives, the counters of the taken branch and of the branches before it in the updated FTB entry are updated: a taken branch increments its counter by 1, and a not-taken branch decrements it by 1. A counter that is already saturated (0 when decrementing, or all ones, i.e. 3, when incrementing) keeps its current value.
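The saturating counter update can be sketched as follows. The names sat_update and update_counters are hypothetical. The source only describes the case where a branch is taken; treating every valid branch as not taken when no branch in the entry was taken is an assumption of this sketch.

```python
from typing import List, Optional

CTR_MAX = 3  # a 2-bit counter saturates at 0b11


def sat_update(ctr: int, taken: bool) -> int:
    """2-bit saturating counter: +1 on taken, -1 on not taken, clamped to [0, 3]."""
    if taken:
        return min(ctr + 1, CTR_MAX)
    return max(ctr - 1, 0)


def update_counters(ctrs: List[int], br_valid: List[bool],
                    taken_idx: Optional[int]) -> List[int]:
    """Update the taken branch and the branches before it.

    Assumption: if taken_idx is None (no branch taken), every valid branch
    is updated as not taken.
    """
    out = list(ctrs)
    for i, valid in enumerate(br_valid):
        if not valid:
            continue
        if taken_idx is not None and i > taken_idx:
            break  # branches after the taken one were not reached
        out[i] = sat_update(ctrs[i], taken=(i == taken_idx))
    return out
```

For example, with counters [2, 2] and the second branch taken, the result is [1, 3].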
The pseudo-LRU state also has to be updated. It has two sources: the way that hit during prediction, and the way written during a uFTB update. If only one of them is valid, it alone updates the pseudo-LRU state. If both are valid, combinational logic applies both updates sequentially within a single cycle.
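The source does not say which pseudo-LRU variant is used, so the sketch below assumes a tree-based PLRU (31 state bits for 32 ways) with a hypothetical TreePLRU class. It also assumes the prediction hit is applied first and the update write second, so the written way ends up most recently used; the source only states that the two are applied sequentially.

```python
from typing import Optional


class TreePLRU:
    """Tree pseudo-LRU over a power-of-two number of ways (assumed variant)."""

    def __init__(self, num_ways: int):
        assert num_ways > 1 and (num_ways & (num_ways - 1)) == 0
        self.levels = num_ways.bit_length() - 1
        # Heap-ordered internal nodes; bit 0 means the next victim is in the left subtree.
        self.bits = [0] * (num_ways - 1)

    def touch(self, way: int) -> None:
        """Mark way as most recently used: point every node on its path away from it."""
        node = 0
        for level in range(self.levels - 1, -1, -1):
            go_right = (way >> level) & 1
            self.bits[node] = 0 if go_right else 1
            node = 2 * node + 1 + go_right

    def victim(self) -> int:
        """Follow the node bits from the root to the pseudo-least-recently-used way."""
        node, way = 0, 0
        for _ in range(self.levels):
            go_right = self.bits[node]
            way = (way << 1) | go_right
            node = 2 * node + 1 + go_right
        return way

    def update(self, pred_hit_way: Optional[int], write_way: Optional[int]) -> None:
        """Apply the prediction-hit touch first, then the update-write touch."""
        if pred_hit_way is not None:
            self.touch(pred_hit_way)
        if write_way is not None:
            self.touch(write_way)
```

With 32 ways, touching way 0 makes way 16 (the first way in the other half) the next victim.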
SRAM Specifications
This module does not use SRAM; its storage consists of register-based structures, listed below.
The module contains 32 ways. Each way holds two 2-bit saturating counters that record basic branch direction predictions; a 60-bit FTB entry, whose fields are identical to those in the FTB module (see the FTB SRAM specification); a 16-bit tag; and a 1-bit way valid signal.
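The register budget follows directly from these field widths. The short calculation below adds them up per way and for all 32 ways; it counts only the fields listed above and does not include the pseudo-LRU state.

```python
NUM_WAYS = 32
COUNTERS_PER_WAY = 2
COUNTER_BITS = 2
FTB_ENTRY_BITS = 60
TAG_BITS = 16
VALID_BITS = 1

# Per way: 2 x 2-bit counters + 60-bit FTB entry + 16-bit tag + 1 valid bit.
bits_per_way = COUNTERS_PER_WAY * COUNTER_BITS + FTB_ENTRY_BITS + TAG_BITS + VALID_BITS
total_bits = NUM_WAYS * bits_per_way
print(bits_per_way, total_bits)  # 81 2592
```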
Overall Block Diagram
TODO: The diagram does not match Kunming Lake and needs to be updated.
Interface Timing
Result Output Interface
The figure above shows a valid uFTB output whose next predicted block starts at address 0x80002000.
Update Interface
The figure above shows a valid update request, updating the FTB entry at address 0x80003b9a.