Level 3 Module: Page Cache
Page Cache refers to the following module:
- PtwCache cache
Design Specifications
- Supports caching three levels of page tables separately
- Supports receiving PTW requests from L1 TLB
- Supports receiving PTW requests from Miss Queue
- Supports returning hit results to L1 TLB, returning PTW replies
- Supports returning miss results to L2 TLB, forwarding PTW requests
- Supports Page Cache refill
- Supports ECC validation
- Supports sfence flush
- Supports exception handling mechanism
- Supports TLB compression
- Supports three types for each level of page tables
- Supports receiving stage 2 translation requests (hptw requests)
- Supports hfence flush
Functionality
Separate caching of three levels of page tables
Page Cache is an "enlarged" version of the L1 TLB and is the L2 TLB in the true sense. In Page Cache, the three levels of page tables are cached separately, allowing all three levels to be queried in one cycle (with the H extension, each level of page table is further divided into VS stage page tables, G stage page tables, and host page tables, which are introduced in subsequent chapters). Page Cache determines whether a request hits based on the requested address and returns the result closest to the leaf node. Since the memory access width is 512 bits, which holds 8 page table entries, one Page Cache item contains 8 page table entries (1 virtual page number corresponding to 8 physical page numbers and 8 sets of permission bits).
In Page Cache, page tables are cached separately according to their level, divided into l1, l2, l3, and sp items. l1, l2, and l3 items only store valid page table entries, storing level 1, level 2, and level 3 page tables respectively. l1 contains 16 items, with a fully associative structure; l2 contains 64 items with 2-way set associative structure, and l3 contains 512 items with 4-way set associative structure. sp is a 16-item fully associative structure, storing huge pages (level 1 or level 2 page tables that are leaf nodes), as well as invalid (V bit in the page table is 0, or W bit is 1 and R bit is 0, or the page table is unaligned) level 1 or level 2 page tables. When storing, l1 and l2 items do not need to store permission bits, while l3 and sp items need to store permission bits.
The Page Cache item configuration is shown in the following table.
Item | Number of Items | Organization Structure | Implementation Method | Replacement Algorithm | Stored Content |
---|---|---|---|---|---|
l1 | 16 | Fully Associative | Register File | PLRU | Valid level 1 (1GB size) page tables, no need to store permission bits |
l2 | 64 | 2-way Set Associative | SRAM | PLRU | Valid level 2 (2MB size) page tables, no need to store permission bits |
l3 | 512 | 4-way Set Associative | SRAM | PLRU | Valid level 3 (4KB size) page tables, need to store permission bits |
sp | 16 | Fully Associative | Register File | PLRU | Huge pages (level 1 or level 2 page tables that are leaf nodes), invalid level 1 or level 2 page tables, need to store permission bits |
The information stored in a Page Cache item includes: tag, asid, ppn, perm (optional), level (optional), and prefetch. The H extension adds vmid and h (used to distinguish between the three types of page tables).
Since l1 and sp items use a fully associative structure, their tag bit widths are vpnnlen (9) and 2 * vpnnlen (18) respectively. l2 and l3 use a set associative structure, so the number of sets must be taken into account, and each virtual page number indexes 8 page table entries. l2 is 2-way set associative, so its tag bit width is 2 * vpnnlen (18) - log2(64) - log2(8) + log2(2) = 10 bits; l3 is 4-way set associative, so its tag bit width is 3 * vpnnlen (27) - log2(512) - log2(8) + log2(4) = 17 bits. Since the address in stage 2 translation has two more bits than in stage 1, every tag is extended by two more bits.
Since l3 and sp items store leaf nodes, they need to store the perm item, while l1 and l2 items do not. The perm item stores the D, A, G, U, X, W, R bits as specified in the RISC-V manual and does not need to store the V bit. For the sp item, level is stored to indicate the page table level (level 1 or level 2). prefetch indicates that the page table entry was obtained from a prefetch request. vmid is only used in VS stage page tables and G stage page tables, and asid is not used in G stage page tables. h is a two-bit register that distinguishes between the three types of page tables, with an encoding consistent with s2xlate.
The information stored in each Page Cache item, the page table attribute bits, and the encoding of h are shown in the following tables:
Item | tag | asid | vmid | ppn | perm | level | prefetch | h |
---|---|---|---|---|---|---|---|---|
l1 | Yes, 9 bits + 2 bits | Yes | Yes | Yes | No | No | Yes | Yes |
l2 | Yes, 10 bits + 2 bits | Yes | Yes | Yes | No | No | Yes | Yes |
l3 | Yes, 17 bits + 2 bits | Yes | Yes | Yes | Yes | No | Yes | Yes |
sp | Yes, 18 bits + 2 bits | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
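
The tag widths listed above can be reproduced with a few lines of arithmetic. The following is a minimal sketch that simply mirrors the formula quoted in the text; the function and constant names are illustrative and do not come from the Xiangshan source. The two extra bits added for stage 2 addresses are not included.

```python
from math import log2

VPNN_LEN = 9        # bits per VPN segment (vpnnlen in the text)
PTES_PER_ITEM = 8   # one 512-bit access holds 8 page table entries


def set_assoc_tag_bits(levels: int, items: int, ways: int) -> int:
    """Tag width of a set associative array, using the formula from the text."""
    return int(levels * VPNN_LEN - log2(items) - log2(PTES_PER_ITEM) + log2(ways))


def fully_assoc_tag_bits(levels: int) -> int:
    """Tag width of a fully associative array: the whole VPN prefix is the tag."""
    return levels * VPNN_LEN


if __name__ == "__main__":
    print("l1:", fully_assoc_tag_bits(1))
    print("l2:", set_assoc_tag_bits(2, items=64, ways=2))
    print("l3:", set_assoc_tag_bits(3, items=512, ways=4))
    print("sp:", fully_assoc_tag_bits(2))
```

Subtracting log2(items) and adding log2(ways) is equivalent to subtracting the number of set index bits, log2(items / ways).
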
Bit | Field | Description |
---|---|---|
7 | D | Dirty, indicates the virtual page has been written since the last clear of the D bit |
6 | A | Accessed, indicates the virtual page has been read, written, or instruction fetched since the last clear of the A bit |
5 | G | Indicates whether the page is a global mapping. A value of 1 means the page is a global mapping, meaning it exists in all address spaces |
4 | U | Indicates whether the page can be accessed by User Mode. A value of 0 means it cannot be accessed by User Mode; a value of 1 means it can be accessed |
3 | X | Indicates whether the page is executable. A value of 0 means not executable; a value of 1 means the page is executable |
2 | W | Indicates whether the page is writable. A value of 0 means not writable; a value of 1 means the page is writable |
1 | R | Indicates whether the page is readable. A value of 0 means not readable; a value of 1 means the page is readable |
0 | V | Indicates whether the page table entry is valid. If this bit is 0, it means the page table entry is invalid, and other bits of the page table entry can be freely used by software |
h | Description |
---|---|
00 | noS2xlate, Host Page Table |
01 | onlyStage1, VS Stage Page Table |
10 | onlyStage2, G Stage Page Table |
The RISC-V manual allows A/D bits to be updated by either software or hardware. Xiangshan chooses the software method: when either of the following two situations is detected, a page fault is reported and the page table is updated by software.
- Access a page, but the A bit in its page table is 0
- Write to a page, but the D bit in its page table is 0
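The two conditions above amount to a small predicate. A minimal sketch, with a hypothetical function name chosen for illustration only:

```python
def needs_ad_page_fault(a_bit: int, d_bit: int, is_write: bool) -> bool:
    """Return True when the software A/D update scheme must raise a page fault."""
    if a_bit == 0:
        # Any access to a page whose A bit is clear faults.
        return True
    # A write to a page whose D bit is clear faults.
    return is_write and d_bit == 0


if __name__ == "__main__":
    print(needs_ad_page_fault(a_bit=1, d_bit=0, is_write=True))   # write, D clear
    print(needs_ad_page_fault(a_bit=1, d_bit=0, is_write=False))  # read, D clear
```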
The possible combinations and meanings of the X, W, and R bits in a page table entry are shown in the following table:
X | W | R | Description |
---|---|---|---|
0 | 0 | 0 | Indicates that this page table entry is not a leaf node, and the next level of the page table needs to be indexed through this page table |
0 | 0 | 1 | Indicates that this page is read-only |
0 | 1 | 0 | Reserved |
0 | 1 | 1 | Indicates that this page is readable and writable |
1 | 0 | 0 | Indicates that this page is executable only |
1 | 0 | 1 | Indicates that this page is readable and executable |
1 | 1 | 0 | Reserved |
1 | 1 | 1 | Indicates that this page is readable, writable, and executable |
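
The table above can be expressed as a small decoder over the low bits of a page table entry. This is an illustrative sketch; the bit positions follow the attribute table (R = bit 1, W = bit 2, X = bit 3), while the function name and the returned strings are assumptions made for the example.

```python
PTE_R = 1 << 1
PTE_W = 1 << 2
PTE_X = 1 << 3


def classify_xwr(pte: int) -> str:
    """Classify a page table entry by its X, W and R bits."""
    r = bool(pte & PTE_R)
    w = bool(pte & PTE_W)
    x = bool(pte & PTE_X)
    if not (r or w or x):
        return "pointer"    # not a leaf: points to the next level
    if w and not r:
        return "reserved"   # W=1, R=0 combinations are reserved
    # Leaf node: build a permission string such as "r", "rw", "rx", "rwx".
    return ("r" if r else "") + ("w" if w else "") + ("x" if x else "")


if __name__ == "__main__":
    for pte in (0b0001, 0b0011, 0b0101, 0b0111, 0b1001, 0b1111):
        print(f"{pte:04b} -> {classify_xwr(pte)}")
```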
Receive PTW requests and return results
Page Cache receives PTW requests from L2 TLB. These requests pass through an arbiter before being sent to Page Cache, and may originate from Miss Queue, L1 TLB, hptw_req_arb, or Prefetcher. Since Page Cache can only process one request per cycle, an allStage request first queries only the first stage: when comparing the 'h' field of each item, it matches only page tables of type onlyStage1. Stage 2 translation is handled by PTW or LLPTW after the request is forwarded to them. The Page Cache query process is as follows:
- Cycle 0: Issue read requests to l1, l2, l3, and sp items, performing simultaneous queries.
- Cycle 1: Obtain the results read from the register file (l1, sp items) and SRAM (for l2, l3 items). However, due to timing reasons, these results are not used directly in the same cycle but wait for the next cycle for subsequent operations.
- Cycle 2: Compare the stored tag in each Page Cache item with the tag of the incoming request, compare the 'h' register of each item with the incoming s2xlate (allStage is converted to query onlyStage1). Simultaneously perform match queries in l1, l2, l3, and sp items, and also perform ECC checks.
- Cycle 3: Summarize the results obtained from matching in l1, l2, l3, and sp items, as well as the results of ECC checks.
After the above Page Cache query process, if a leaf node is found in the Page Cache, it is returned to the L1 TLB (for allStage requests, if the first stage hits, it is sent to PTW for processing); otherwise, depending on the situation, the request is forwarded to LLPTW, PTW, HPTW, or Miss Queue.
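The tag and 'h' comparison performed in cycle 2 can be sketched as follows. This is a simplified software model, not the hardware description: asid and vmid comparison and ECC are left out, and the class and function names are hypothetical. Only the three 'h' encodings given in the table are used; the request type is passed as a string so that no encoding for allStage has to be assumed.

```python
from dataclasses import dataclass
from enum import Enum


class H(Enum):
    NO_S2XLATE = 0b00    # host page table
    ONLY_STAGE1 = 0b01   # VS stage page table
    ONLY_STAGE2 = 0b10   # G stage page table


@dataclass
class Entry:
    valid: bool
    tag: int
    h: H


def query_type(s2xlate: str) -> H:
    """Map the request's s2xlate to the 'h' value to look for."""
    if s2xlate == "allStage":
        return H.ONLY_STAGE1   # allStage first queries the first stage only
    return {
        "noS2xlate": H.NO_S2XLATE,
        "onlyStage1": H.ONLY_STAGE1,
        "onlyStage2": H.ONLY_STAGE2,
    }[s2xlate]


def hits(entry: Entry, req_tag: int, s2xlate: str) -> bool:
    """An entry hits when it is valid and both tag and 'h' match."""
    return entry.valid and entry.tag == req_tag and entry.h == query_type(s2xlate)


if __name__ == "__main__":
    e = Entry(valid=True, tag=0x1A, h=H.ONLY_STAGE1)
    print(hits(e, 0x1A, "allStage"), hits(e, 0x1A, "onlyStage2"))
```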
Send PTW requests to L2 TLB
Page Cache will forward requests to LLPTW, PTW, HPTW, or Miss Queue depending on the situation.
- For noS2xlate, onlyStage1, and allStage requests, if Page Cache misses a leaf node but hits a level 2 page table (for onlyStage1 and allStage, a hit on the first stage's level 2 page table), and the PTW request is not a bypass request, Page Cache forwards the request to LLPTW.
- For noS2xlate, onlyStage1, and allStage requests, if Page Cache misses a leaf node and also misses a level 2 page table (for onlyStage1 and allStage, a miss on the first stage's level 2 page table), the request needs to be forwarded to Miss Queue or PTW. If the request is not a bypass request, originates directly from Miss Queue, and PTW is idle, it is forwarded to PTW. An allStage request whose first stage translation hits a leaf node is also sent to PTW for the final stage 2 translation. An onlyStage2 request that misses a stage 2 leaf node is also sent to PTW for subsequent translation.
- If the request is a stage 2 translation request (hptwReq) sent from PTW or LLPTW, it is sent to hptw_resp_arb on a hit and to HPTW on a miss. If HPTW is busy at that time, Page Cache is blocked.
- If Page Cache misses a leaf node and the request is neither a prefetch request nor an hptwReq request, it enters Miss Queue when one of the following three conditions is met:
  - The request is a bypass request.
  - The request misses a level 2 page table or the first stage translation hits, and either the request originates from L1 TLB or PTW is unable to receive the request from Page Cache.
  - The request hits a level 2 page table, but LLPTW is unable to receive the request.
Note that the four points above are evaluated in parallel. Every request forwarded by Page Cache meets exactly one of conditions 1, 2, 3, or 4, but the four conditions are judged independently and there is no ordering between them. To describe request forwarding more clearly, a serialized flowchart is used for further explanation; the hardware itself is necessarily parallel, with no serial relationship. The serialized flowchart is shown in the figure.
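A serialized view of the forwarding rules can also be written as a small decision function. The sketch below is deliberately restricted to one case: a noS2xlate request that has missed a leaf node and is neither a prefetch request nor an hptwReq. The function name, argument names, and returned strings are assumptions for illustration; the hardware evaluates these conditions in parallel.

```python
def route_leaf_miss(l2_hit: bool, bypass: bool, from_miss_queue: bool,
                    ptw_ready: bool, llptw_ready: bool) -> str:
    """Pick the destination of a noS2xlate, non-prefetch request that missed a leaf."""
    if bypass:
        # Bypassed requests go to Miss Queue and access Page Cache again later.
        return "MissQueue"
    if l2_hit:
        # Level 2 hit: LLPTW fetches the last level, if it can accept the request.
        return "LLPTW" if llptw_ready else "MissQueue"
    if from_miss_queue and ptw_ready:
        # Level 2 miss: only requests coming from Miss Queue may enter an idle PTW.
        return "PTW"
    # The request came from L1 TLB, or PTW cannot accept it.
    return "MissQueue"


if __name__ == "__main__":
    print(route_leaf_miss(l2_hit=True, bypass=False, from_miss_queue=False,
                          ptw_ready=True, llptw_ready=True))
    print(route_leaf_miss(l2_hit=False, bypass=False, from_miss_queue=True,
                          ptw_ready=True, llptw_ready=True))
```

Each input combination yields exactly one destination, which matches the statement that a forwarded request meets exactly one condition.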
Refill Cache
When a PTW request sent to memory by PTW or LLPTW receives a reply, a refill request is also sent to Page Cache. The information passed to Page Cache includes the page table entry, page table level, virtual page number, page table type, and so on. The entry is then filled into an l1, l2, l3, or sp item according to its level and attribute bits: a valid page table is filled into l1, l2, l3, or sp according to its level, and an invalid level 1 or level 2 page table is filled into the sp item. The replacement policy for evicted Page Cache items can be selected via ReplacementPolicy; Xiangshan's Page Cache currently uses the PLRU replacement policy.
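The choice of refill destination can be sketched as below. This is a simplified model built only from the rules stated in this document (valid non-leaf entries go to l1 or l2, valid level 3 entries go to l3, huge pages and invalid level 1 or level 2 entries go to sp). The function name and the `misaligned` flag are hypothetical, and the case of an invalid level 3 entry returns None because the text does not name a destination for it.

```python
from typing import Optional


def refill_target(level: int, v: int, r: int, w: int, x: int,
                  misaligned: bool = False) -> Optional[str]:
    """Return which array ("l1", "l2", "l3", "sp") a refilled entry goes to."""
    invalid = v == 0 or (w == 1 and r == 0) or misaligned
    if invalid:
        # Invalid level 1 / level 2 entries are kept in sp.
        return "sp" if level in (1, 2) else None
    if level == 3:
        return "l3"
    is_leaf = bool(r or w or x)
    if is_leaf:
        return "sp"          # huge page: leaf at level 1 or level 2
    return "l1" if level == 1 else "l2"


if __name__ == "__main__":
    print(refill_target(1, v=1, r=0, w=0, x=0))  # pointer at level 1
    print(refill_target(2, v=1, r=1, w=1, x=0))  # 2MB huge page
    print(refill_target(3, v=1, r=1, w=0, x=0))  # 4KB leaf
```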
Support bypass access
When a Page Cache request misses, but data is simultaneously being written to the Cache at the requested address, the Page Cache request will be bypassed. If this occurs, the data being written to the Cache will not be directly given to the Page Cache access request. Page Cache will send a miss signal to L2 TLB, and simultaneously send a bypass signal to L2 TLB, indicating that this request is a bypass request and needs to access Page Cache again to get the result. Bypassed PTW requests do not enter PTW, but directly enter MissQueue, waiting for the next access to Page Cache to get the result. However, it should be noted that for hptw req (from PTW and LLPTW) stage 2 translation requests, bypass may also occur, but hptw req does not enter miss queue. Therefore, to avoid repeated filling into Page Cache, Page Cache sends a bypassed signal to HPTW. When this signal is valid, the results of the memory access performed after this request enters HPTW will not be refilled into Page Cache. HptwReq requests also support l1Hit and l2Hit functionality.
Support ECC validation
Page Cache supports ECC validation. When accessing l2 or l3 items, ECC checks are performed simultaneously. If an ECC check reports an error, it does not report an exception, but rather sends a miss signal for that request to L2 TLB. At the same time, Page Cache flushes the ECC-errored item and resends the PTW request. Other behaviors are the same as when Page Cache misses. SECDED strategy is used for ECC checking.
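To illustrate what SECDED (single error correction, double error detection) provides, the following sketch implements an extended Hamming (8,4) code. The 4-bit data width is chosen only to keep the example small and is not the width used in Page Cache, and all function names are hypothetical. Following the behavior described above, the lookup helper treats any detected error as a miss rather than correcting it.

```python
from typing import List


def secded_encode(data: int) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword (bit list)."""
    d = [(data >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d   # data bits at non-power-of-two positions
    code[1] = code[3] ^ code[5] ^ code[7]    # parity over positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]    # parity over positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]    # parity over positions with bit 2 set
    code[0] = sum(code[1:]) % 2              # overall parity over positions 1..7
    return code


def secded_check(code: List[int]) -> str:
    """Return "ok", "single" (correctable) or "double" (detected only)."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        return "ok"
    if overall == 1:
        return "single"
    return "double"


def lookup_result(code: List[int]) -> str:
    """Any ECC error is reported as a miss, as described in the text."""
    return "hit" if secded_check(code) == "ok" else "miss"


if __name__ == "__main__":
    word = secded_encode(0b1011)
    flipped = word.copy()
    flipped[5] ^= 1
    print(secded_check(word), secded_check(flipped), lookup_result(flipped))
```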
Support sfence flush
When the sfence signal is valid, Page Cache flushes Cache items based on the sfence rs1 and rs2 signals and the current virtualization mode. A flush is performed by setting the v bit of the corresponding Cache line to 0. Since l2 and l3 items are stored in SRAM, asid comparison cannot be done in the same cycle; therefore, l2 and l3 flushes ignore asid (vmid is handled the same way as asid). For details of the sfence signal, refer to the RISC-V manual. In a virtualized environment, sfence flushes VS stage (stage 1 translation) page tables (vmid needs to be considered in this case); in a non-virtualized environment, sfence flushes host page tables (h field is noS2xlate; vmid is not considered in this case).
Support exception handling mechanism
An ECC validation error may occur in Page Cache. In this case, Page Cache will invalidate the current item, return a miss result, and restart the Page Walk. See Section 6 of this document: Exception Handling Mechanism.
Support TLB compression
To support TLB compression, when Page Cache hits a 4KB page, it needs to return 8 consecutive page table entries. Because of the 512-bit memory access width, each Page Cache item already contains 8 page table entries, so these 8 entries can be returned directly. Unlike L1 TLB, L2 TLB still uses TLB compression with the H extension.
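The relationship between the access width and the group of 8 entries can be sketched as follows. This is an illustrative model only: the 64-bit entry width follows from 512 bits holding 8 entries, and the function names and the shape of the returned reply are assumptions.

```python
LINE_BITS = 512
PTE_BITS = 64
PTES_PER_ITEM = LINE_BITS // PTE_BITS   # 8 entries per Page Cache item


def split_vpn(vpn: int):
    """Split a virtual page number into (group number, offset within the group)."""
    return vpn // PTES_PER_ITEM, vpn % PTES_PER_ITEM


def compressed_reply(item_ptes, vpn: int) -> dict:
    """On a 4KB hit, return all 8 entries of the item plus the requested offset."""
    group, offset = split_vpn(vpn)
    return {"group_vpn": group, "offset": offset, "ptes": list(item_ptes)}


if __name__ == "__main__":
    reply = compressed_reply(range(8), vpn=0x12345)
    print(hex(reply["group_vpn"]), reply["offset"], len(reply["ptes"]))
```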
Support three types of page tables at each level
In the H extension, there are three types of page tables, which are managed by vsatp, hgatp, and satp respectively. A new 'h' register is added in Page Cache to distinguish these three types. onlyStage1 represents page tables related to vsatp, onlyStage2 represents page tables related to hgatp (asid is invalid in this case), and noS2xlate represents page tables related to satp (vmid is invalid in this case).
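Which identifiers matter for each page table type can be summarized in a small table-driven check. This sketch only encodes the asid and vmid rules stated in this document; the handling of global (G bit) mappings is left out, and the names used are hypothetical.

```python
# For each page table type: the controlling CSR and which IDs are meaningful.
ID_FIELDS = {
    "noS2xlate":  {"csr": "satp",  "asid": True,  "vmid": False},
    "onlyStage1": {"csr": "vsatp", "asid": True,  "vmid": True},
    "onlyStage2": {"csr": "hgatp", "asid": False, "vmid": True},
}


def ids_match(h: str, entry_asid: int, entry_vmid: int,
              req_asid: int, req_vmid: int) -> bool:
    """Compare only the IDs that are meaningful for the given page table type."""
    fields = ID_FIELDS[h]
    asid_ok = (not fields["asid"]) or entry_asid == req_asid
    vmid_ok = (not fields["vmid"]) or entry_vmid == req_vmid
    return asid_ok and vmid_ok


if __name__ == "__main__":
    # A G stage entry ignores asid, so differing asids still match.
    print(ids_match("onlyStage2", entry_asid=1, entry_vmid=7, req_asid=2, req_vmid=7))
```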
Support receiving stage 2 translation requests (hptw requests)
PTW and LLPTW in L2 TLB send stage 2 translation requests (indicated by the isHptwReq signal). Such a request is first queried in Page Cache. The query process is the same as for onlyStage2 requests: only page tables of type onlyStage2 are queried. Depending on whether the request hits, it is sent to hptw_resp_arb (hit) or HPTW (miss). The return signal for Page Cache's hptwReq includes an 'id' signal, which is used to determine whether the reply returns to PTW or LLPTW. The return signal also includes a bypassed signal, indicating that a bypass occurred for this request. If a bypassed request enters HPTW for translation, the page tables that HPTW obtains from memory are not filled into Page Cache. HptwReq requests also support l1Hit and l2Hit functionality.
Support hfence flush
The hfence instruction can only be executed in non-virtualized mode. There are two such instructions, responsible for flushing VS stage page tables (stage 1 translation, h field is onlyStage1) and G stage page tables (stage 2 translation, h field is onlyStage2) respectively. Based on hfence's rs1 and rs2, as well as the added vmid and h fields, the content to be flushed is determined. Similarly, because asid and vmid are stored in SRAM in l3 and l2, flushing l3 and l2 does not consider vmid and asid. Furthermore, for the implementation of flushing l3, a simple approach is adopted, directly flushing VS or G stage page tables (if necessary in the future, the flush of the set containing the address can be further refined).
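The selection of entries for an hfence flush can be sketched as below. The two instructions are taken to be HFENCE.VVMA and HFENCE.GVMA from the RISC-V hypervisor extension. Address filtering by rs1 is omitted, `None` stands for "all" (rs2 = x0), and the function and parameter names are assumptions made for the example rather than signals from the design.

```python
from typing import Optional


def hfence_selects(entry_h: str, entry_asid: int, entry_vmid: int, kind: str,
                   asid: Optional[int] = None, vmid: Optional[int] = None,
                   in_sram: bool = False) -> bool:
    """Return True if an entry should be invalidated by the given hfence."""
    # "vvma" targets VS stage page tables, "gvma" targets G stage page tables.
    target = {"vvma": "onlyStage1", "gvma": "onlyStage2"}[kind]
    if entry_h != target:
        return False
    if in_sram:
        # l2 / l3: asid and vmid live in SRAM and are not compared.
        return True
    if vmid is not None and entry_vmid != vmid:
        return False
    if kind == "vvma" and asid is not None and entry_asid != asid:
        return False
    return True


if __name__ == "__main__":
    print(hfence_selects("onlyStage1", 3, 5, "vvma", asid=3, vmid=5))
    print(hfence_selects("onlyStage1", 3, 5, "gvma", vmid=5))
```

With `in_sram=True` the function models the coarse flush described above, where every l2 or l3 entry of the targeted type is invalidated.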
Overall Block Diagram
Page Cache is essentially a Cache, and its internal implementation has been introduced in detail above. A block diagram of Page Cache's internal structure has limited reference value. For the connection relationships between Page Cache and other modules in L2 TLB, please refer to Section 5.3.3.
Interface List
Page Cache signals can mainly be summarized into the following 3 categories:
- req: arb2 sends PTW requests to Page Cache.
- resp: Page Cache returns PTW replies to L2 TLB. Page Cache may send requests to PTW, LLPTW, Miss Queue, HPTW; send replies to mergeArb and hptw_resp_arb.
- refill: Page Cache receives refill data returned by mem.
For details, please refer to the interface list document.
Interface Timing
Page Cache interacts with other modules in L2 TLB using a valid-ready handshake. The involved signals are quite numerous and fragmented, and there are no particularly noteworthy timing relationships, thus they will not be elaborated further.