Level 2 Module: PMP & PMA
The PMP module includes the following sub-modules. PMA checks are included within the PMP modules:
- PMP (Distributed PMP & PMA Registers)
  - PMP pmp (Frontend)
  - PMP pmp (Memblock)
  - PMP pmp (L2TLB)
- PMPChecker (PMP & PMA Checker, returns the result in the same cycle)
  - PMPChecker PMPChecker (Frontend)
  - PMPChecker PMPChecker_1 (Frontend)
  - PMPChecker PMPChecker_2 (Frontend)
  - PMPChecker PMPChecker_3 (Frontend)
  - PMPChecker PMPChecker (L2TLB)
  - PMPChecker PMPChecker_1 (L2TLB)
- PMPChecker_8 (PMP & PMA Checker, returns the result in the next cycle)
  - PMPChecker_8 PMPChecker (Memblock)
  - PMPChecker_8 PMPChecker_1 (Memblock)
  - PMPChecker_8 PMPChecker_2 (Memblock)
  - PMPChecker_8 PMPChecker_3 (Memblock)
  - PMPChecker_8 PMPChecker_4 (Memblock)
  - PMPChecker_8 PMPChecker_5 (Memblock)
Design Specifications
- Supports Physical Address Protection
- Supports Physical Address Attributes
- Supports PMP and PMA checks executing in parallel
- Supports dynamic checking and static checking
- Supports distributed PMP and distributed PMA
- Supports an exception handling mechanism
Functionality
Supports Physical Address Protection
The XiangShan processor supports Physical Memory Protection (PMP) checks. PMP defaults to 16 entries, and the number of entries can be modified parametrically. For timing reasons, a distributed replicated implementation is adopted: the PMP registers in the CSR unit handle instructions such as CSRRW, while the Frontend (instruction fetch), the Memblock (memory access), and the Page Table Walker each hold a copy of the PMP registers. Consistency with the PMP registers in the CSR unit is ensured by forwarding the CSR write signals to each copy.
Please refer to the XiangShan Open Source Processor User Manual and the RISC-V Privileged Architecture Manual for details on the PMP register format, reset values, etc.
Supports Physical Address Attributes
Physical Memory Attributes (PMA) are implemented similarly to PMP, using two reserved bits in the PMP configuration register. These bits are designated 'atomic' and 'cacheable', indicating support for atomic operations and cacheability, respectively. Unlike the PMP registers, which have no initial value, the PMA registers have default initial values, which must be set to match the platform's address attributes. The PMA registers use the reserved register address space within the M-mode CSRs, default to 16 entries, and can be modified parametrically.
Please refer to the XiangShan Open Source Processor User Manual for the default PMA configuration.
PMP and PMA Checks Execute in Parallel
PMP and PMA checks are queried in parallel; if either check denies the access, the operation is considered illegal. All physical address accesses within the core require physical address permission checks: after the ITLB and DTLB lookups, and before memory accesses by the Page Table Walker, Hypervisor Page Table Walker, and Last Level Page Table Walker. The mapping between the distributed PMP and PMA copies used by the ITLB, DTLB, Page Table Walker, Last Level Page Table Walker, and Hypervisor Page Table Walker and the corresponding PMP and PMA checkers is shown in the table below. The Frontend, Memblock, and L2 TLB each hold a backup copy of the PMP and PMA registers (see Section 5.2.5), and these copies drive the PMP and PMA checkers for the Frontend, Memblock, and L2 TLB, respectively.
Module | Channel | Distributed PMP&PMA | PMP&PMA Checker |
---|---|---|---|
ITLB | requestor(0) | pmp (Frontend) | PMPChecker |
ITLB | requestor(1) | pmp (Frontend) | PMPChecker_1 |
ITLB | requestor(2) | pmp (Frontend) | PMPChecker_2 |
ITLB | requestor(3) | pmp (Frontend) | PMPChecker_3 |
DTLB_LD | requestor(0) | pmp (Memblock) | PMPChecker |
DTLB_LD | requestor(1) | pmp (Memblock) | PMPChecker_1 |
DTLB_LD | requestor(2) | pmp (Memblock) | PMPChecker_2 |
DTLB_ST | requestor(0) | pmp (Memblock) | PMPChecker_3 |
DTLB_ST | requestor(1) | pmp (Memblock) | PMPChecker_4 |
DTLB_PF | requestor(0) | pmp (Memblock) | PMPChecker_5 |
L2 TLB | Page Table Walker | pmp (L2 TLB) | PMPChecker |
L2 TLB | Last Level Page Table Walker | pmp (L2 TLB) | PMPChecker_1 |
L2 TLB | Hypervisor Page Table Walker | pmp (L2 TLB) | PMPChecker_2 |
According to the RV manual, Page Fault has higher priority than Access Fault. However, if an Access Fault occurs during the PMP or PMA check performed by the Page Table Walker or Last Level Page Table Walker, the fetched page table entry is also illegal. In this special case, where both a Page Fault and an Access Fault occur, XiangShan chooses to report the Access Fault. The manual does not explicitly describe this situation, so this behavior may not be consistent with the manual. In all other cases, the higher priority of Page Fault over Access Fault is maintained.
Dynamic Checking and Static Checking
As specified in the manual, PMP and PMA checks should be dynamic checks: the physical address translated by the TLB is used for the permission check. The PMPCheckers for the Frontend, L2 TLB, and Memblock (see the table above) all perform dynamic checks. For timing reasons, the PMP & PMA check results for the DTLB can instead be queried in advance and stored in the TLB entry at write-back time; this is static checking. Specifically, when an L2 TLB page table entry is written back to the DTLB, the written-back entry is simultaneously sent to the PMP and PMA for a permission check. The resulting attribute bits (R, W, X, C, and Atomic; their meanings are described in Section 5.4) are also stored in the DTLB, so these check results can be returned directly to the MemBlock without another check. To support static checking, the granularity of PMP and PMA must be increased to 4KB.
Note that PMP & PMA checking is currently not a timing bottleneck for Kunminghu, so static checking is not used and all checks are dynamic: checking is performed after the physical address is obtained from the TLB lookup. The Kunminghu V1 code implements only dynamic checking, not static checking. For compatibility, however, the granularity of PMP and PMA remains 4KB.
The information obtained from dynamic and static checking is as follows:
- Dynamic Checking: Returns whether an inst access fault, load access fault, or store access fault occurred; whether the checked physical address belongs to the MMIO address space.
- Static Checking: Returns the attribute bits of the checked physical address, including R, W, X, C, Atomic. Please note that Kunminghu V1 will not use static checking by default.
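The two kinds of check results above can be sketched as simple records. This is an illustrative model only; the field names are assumptions, and the actual design uses Chisel bundles.

```python
from dataclasses import dataclass

@dataclass
class DynamicCheckResult:
    """Result of a dynamic check (hypothetical field names)."""
    inst_access_fault: bool   # for ITLB requests
    load_access_fault: bool   # for LoadUnits / L2 TLB requests
    store_access_fault: bool  # for StoreUnits / AtomicsUnit requests
    mmio: bool                # checked paddr falls in the MMIO address space

@dataclass
class StaticCheckResult:
    """Attribute bits filled into the DTLB by a static check (hypothetical field names)."""
    r: bool       # readable
    w: bool       # writable
    x: bool       # executable
    c: bool       # cacheable (PMA attribute bit)
    atomic: bool  # supports atomic operations (PMA attribute bit)
```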
Distributed PMP and PMA
The implementation of PMP and PMA spans four parts: the CSR Unit, Frontend, Memblock, and L2 TLB. The CSR Unit responds to CSR instructions such as CSRRW that read and write the PMP and PMA registers. Since the CSR Unit is relatively far from the ITLB, DTLB, and L2 TLB, backup copies of the PMP and PMA registers must be kept near them for physical address checking and physical attribute checking. Distributed PMP and PMA are therefore implemented, with backup copies of these registers stored near the ITLB, DTLB, and L2 TLB.
Backup copies of these PMP and PMA registers are included in the Frontend, Memblock, and L2 TLB and are responsible for address checking. Consistency of the register contents is guaranteed by forwarding the CSR write signals to each copy. Since the L1 TLB has a small area, its backup copies of the PMP and PMA registers are stored in the Frontend and Memblock, providing checks for the ITLB and DTLB, respectively. The L2 TLB has a larger area, so its backup copies are stored directly within the L2 TLB.
PMP and PMA Checking Flow
Physical address checking is required after obtaining physical addresses from ITLB and DTLB lookups, and before memory accesses by the L2 TLB's Page Table Walker, Last Level Page Table Walker, and Hypervisor Page Table Walker. The ITLB, DTLB, and L2 TLB must provide the PMPChecker with: the contents of the PMP and PMA configuration and address registers; the number of consecutive ones from the least significant bit upwards in each PMP and PMA address register (at least 12, since the granularity of PMP and PMA is 4KB); the physical address to be checked; and the type of permission required, namely execute (ITLB), read/write (L2 TLB, LoadUnits, and StoreUnits), or atomic read/write (AtomicsUnit).
The information that PMP and PMA check requests must provide is shown in the table below:
PMPChecker Module | Information to be provided | Source |
---|---|---|
Frontend | PMP and PMA configuration registers | Frontend pmp |
Frontend | PMP and PMA address registers | Frontend pmp |
Frontend | PMP and PMA mask, i.e., the number of consecutive ones from the least significant bit upwards in the address register; minimum 12 | Frontend pmp |
Frontend | Queried paddr | Icache, IFU |
Frontend | Queried cmd; the ITLB is fixed at 2, indicating required execute permission | Icache, IFU |
Memblock | PMP and PMA configuration registers | Memblock pmp |
Memblock | PMP and PMA address registers | Memblock pmp |
Memblock | PMP and PMA mask, i.e., the number of consecutive ones from the least significant bit upwards in the address register; minimum 12 | Memblock pmp |
Memblock | Queried paddr | LoadUnits, L1 Load Stream & Stride Prefetch, StoreUnits, AtomicsUnit, SMS prefetcher |
Memblock | Queried cmd; the DTLB can be 0, 1, 4, or 5, representing required read, write, atom_read, and atom_write permissions, respectively | LoadUnits, L1 Load Stream & Stride Prefetch, StoreUnits, AtomicsUnit, SMS prefetcher |
Memblock Static Check | PMP and PMA configuration registers | Memblock pmp |
Memblock Static Check | PMP and PMA address registers | Memblock pmp |
Memblock Static Check | PMP and PMA mask; the low i bits are 1 and the high bits are 0, where i = log2(size of the address space matched by the PMP entry) | Memblock pmp |
Memblock Static Check | Queried paddr | Page table entry returned by the L2 TLB |
L2 TLB | PMP and PMA configuration registers | L2 TLB pmp |
L2 TLB | PMP and PMA address registers | L2 TLB pmp |
L2 TLB | PMP and PMA mask; the low i bits are 1 and the high bits are 0, where i = log2(size of the address space matched by the PMP entry) | L2 TLB pmp |
L2 TLB | Queried paddr | Page Table Walker, Last Level Page Table Walker, Hypervisor Page Table Walker |
L2 TLB | Queried cmd; the L2 TLB is fixed at 0, indicating required read permission | Page Table Walker, Last Level Page Table Walker, Hypervisor Page Table Walker |
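Viewed as a data structure, a check request carrying the information in the table above could be modeled as follows. This is a sketch with hypothetical field names; the real interface is a Chisel IO bundle.

```python
from dataclasses import dataclass

@dataclass
class PMPCheckRequest:
    """Illustrative model of one PMP/PMA check request (field names assumed)."""
    cfg: list    # PMP and PMA configuration registers (from the local pmp copy)
    addr: list   # PMP and PMA address registers (from the local pmp copy)
    mask: list   # per-entry masks: low i bits set, i >= 12 (4KB minimum granularity)
    paddr: int   # physical address to be checked
    cmd: int     # required permission: 2 for ITLB (execute); 0/1/4/5 for DTLB; 0 for L2 TLB
```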
The PMPChecker returns to the ITLB, DTLB, and L2 TLB whether an inst access fault (ITLB), load access fault (LoadUnits, L2 TLB), or store access fault (StoreUnits, AtomicsUnit) occurred, and whether the address belongs to the MMIO address space (ITLB, DTLB, L2 TLB). In addition, static checking fills the address's attribute bits (cacheable, atomic, x, w, r) into the DTLB.
For requests from the ITLB and L2 TLB, the PMP and PMA check results are returned in the same cycle; for requests from the DTLB, the results are returned in the next cycle. The interface timing for the ITLB and L2 TLB PMP module is shown in the corresponding figure.
PMPChecker Module | Information to be returned | Destination |
---|---|---|
Frontend | Whether an inst access fault occurred | Icache, IFU |
Frontend | Whether the address belongs to the MMIO space | Icache, IFU |
Memblock Dynamic Check | Whether a load access fault occurred | LoadUnits |
Memblock Dynamic Check | Whether a store access fault occurred | StoreUnits, AtomicsUnit |
Memblock Dynamic Check | Whether the address belongs to the MMIO space | LoadUnits, StoreUnits, AtomicsUnit |
Memblock Static Check | Whether the address is cacheable | DTLB |
Memblock Static Check | Whether the address is atomic | DTLB |
Memblock Static Check | Whether the address is executable | DTLB |
Memblock Static Check | Whether the address is writable | DTLB |
Memblock Static Check | Whether the address is readable | DTLB |
L2 TLB | Whether a load access fault occurred | Page Table Walker, Last Level Page Table Walker, Hypervisor Page Table Walker |
L2 TLB | Whether the address belongs to the MMIO space | Page Table Walker, Last Level Page Table Walker, Hypervisor Page Table Walker |
The interface timing for the DTLB PMP module is shown in the corresponding figure. The interface timing for static checking and dynamic checking is the same.
Exception Handling
The possible exceptions generated by PMP and PMA checks are: inst access fault (ITLB), load access fault (LoadUnits, L2 TLB), and store access fault (StoreUnits, AtomicsUnit). Exceptions generated by the ITLB and DTLB are delivered, based on the request source, to the module that issued the physical address query: the ITLB delivers to the Icache or IFU; the DTLB delivers to the LoadUnits, StoreUnits, or AtomicsUnit.
Since the Page Table Walker, Last Level Page Table Walker, or Hypervisor Page Table Walker need to perform PMP and PMA checks on the physical address being accessed before accessing memory, the L2 TLB may generate an access fault. The L2 TLB does not directly handle the generated access fault but returns this information to the L1 TLB. Upon discovering an access fault during lookup, the L1 TLB will generate an inst access fault, load access fault, or store access fault based on the request's cmd; and deliver it to the respective modules for processing based on the request source.
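The conversion performed by the L1 TLB can be sketched as follows. This is a behavioural sketch, not the actual RTL; the cmd encodings follow the values given earlier in this document (0/1/4/5 for read/write/atom_read/atom_write, 2 for execute), and atomic accesses are assumed to surface as store access faults since the AtomicsUnit is listed as a store access fault destination.

```python
# cmd encodings as given in this document
CMD_READ, CMD_WRITE, CMD_EXEC = 0, 1, 2
CMD_ATOM_READ, CMD_ATOM_WRITE = 4, 5

def fault_for_cmd(cmd: int) -> str:
    """Map an access fault reported by the L2 TLB to the fault type the
    L1 TLB raises toward the original requestor, based on the request cmd."""
    if cmd == CMD_EXEC:
        return "inst access fault"   # delivered to Icache / IFU
    if cmd == CMD_READ:
        return "load access fault"   # delivered to LoadUnits
    if cmd in (CMD_WRITE, CMD_ATOM_READ, CMD_ATOM_WRITE):
        return "store access fault"  # delivered to StoreUnits / AtomicsUnit
    raise ValueError(f"unknown cmd {cmd}")
```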
The possible exceptions and the MMU module's processing flow are shown in the table below:
Module | Possible Exception | Processing Flow |
---|---|---|
ITLB | inst access fault | Delivered to the Icache or IFU for processing, based on the request source |
DTLB | load access fault | Delivered to the LoadUnits for processing |
DTLB | store access fault | Delivered to the StoreUnits or AtomicsUnit, based on the request source |
L2 TLB | access fault | Returned to the L1 TLB, which delivers it for processing based on the request source |
Checking Rules
The PMP and PMA checking rules in the XiangShan Kunminghu architecture follow the PMP and PMA sections in the RV manual. Here, only the matching modes are introduced. The physical address range controlled by a PMP or PMA entry is jointly determined by the A bit of the PMP or PMA configuration register and the PMP or PMA address register. To support DTLB static checking (see Section 5.4.2.4), the granularity of PMP and PMA needs to be increased to 4KB, so the minimum physical address range controlled by a PMP or PMA entry is 4KB.
The matching modes corresponding to the configuration register A bit are as follows. A bits 0, 1, 2, and 3 correspond to OFF, TOR, NA4, and NAPOT modes, respectively.
- A = 0, OFF mode: This PMP or PMA entry is disabled and does not match addresses.
- A = 1, TOR mode (Top of range): Matches addresses between the address register of the previous PMP or PMA entry and the address register of this PMP or PMA entry.
- A = 2, NA4 mode (Naturally Aligned Four-byte regions): The XiangShan Kunminghu architecture does not support NA4 mode.
- A = 3, NAPOT mode (Naturally Aligned Power-of-two regions): count the number of consecutive ones starting from the least significant bit of the PMP or PMA address register. Let the address register be `ADDR = yyy...y0111...1` with x trailing ones (the address register holds the physical address shifted right by 2 bits). The matched address range is \(2^{x+3}\) bytes starting from the byte address `yyy...y0000...0 << 2`. Since the XiangShan Kunminghu architecture stipulates that the minimum granularity for PMP and PMA checks is 4KB, the minimum matched address range is 4KB.
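The supported matching modes can be summarized in a short behavioural sketch. This is illustrative only, not the actual Chisel implementation; `pmpaddr` holds the physical address shifted right by 2 bits, and the 4KB minimum granularity is assumed to have already been applied when the registers were written.

```python
OFF, TOR, NA4, NAPOT = 0, 1, 2, 3  # A-field encodings from the privileged spec

def entry_matches(a_mode: int, pmpaddr: int, prev_pmpaddr: int, paddr: int) -> bool:
    """Decide whether one PMP/PMA entry matches a physical address."""
    if a_mode == OFF:
        return False  # entry disabled, matches nothing
    if a_mode == TOR:
        # matches [prev_pmpaddr << 2, pmpaddr << 2)
        return (prev_pmpaddr << 2) <= paddr < (pmpaddr << 2)
    if a_mode == NA4:
        raise ValueError("NA4 mode is not supported by Kunminghu")
    # NAPOT: x = number of trailing ones; region size is 2**(x+3) bytes,
    # naturally aligned, based at the address with those bits cleared
    x = 0
    while (pmpaddr >> x) & 1:
        x += 1
    size = 1 << (x + 3)
    base = (pmpaddr << 2) & ~(size - 1)
    return base <= paddr < base + size
```

For instance, a NAPOT entry with `pmpaddr = 0xBFFF` (14 trailing ones) matches the naturally aligned 128KB region starting at `0x20000`.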
To facilitate address matching, the distributed PMP and PMA send a mask signal to the PMPChecker. In the mask, the low i bits are 1 and the high bits are 0, where i = log2(size of the address space matched by the PMP entry). When a PMP or PMA entry is updated, its mask is updated simultaneously. Since the XiangShan Kunminghu architecture supports a minimum PMP and PMA granularity of 4KB, the low 12 bits of the mask signal are always 1.
For example, if the pmpaddr of a pmp entry is `16'b1111_0000_0000_0000`, then, since the XiangShan Kunminghu architecture supports a minimum PMP and PMA granularity of 4KB, the address range matched in NAPOT mode is \(2^{12}\) B, i.e. 4KB, and the value of the mask signal is `18'hfff`. As another example, if the pmpaddr of a pmp entry is `16'b1011_1111_1111_1111`, the address range matched in NAPOT mode is \(2^{17}\) B, i.e. 128KB, and the value of the mask signal is `18'h1ffff`.
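Under these rules, the mask derivation for a NAPOT entry can be sketched as follows. This is an assumed helper, consistent with the two examples above, not the actual RTL.

```python
def napot_mask(pmpaddr: int) -> int:
    """Derive the mask for a NAPOT entry: low i bits set, where
    i = log2(matched region size in bytes), with a 4KB (i >= 12) minimum."""
    x = 0  # number of consecutive ones from the least significant bit
    while (pmpaddr >> x) & 1:
        x += 1
    i = max(x + 3, 12)  # region size is 2**(x+3) bytes, at least 4KB
    return (1 << i) - 1

# the two examples from the text
assert napot_mask(0b1111_0000_0000_0000) == 0xFFF     # 4KB region
assert napot_mask(0b1011_1111_1111_1111) == 0x1FFFF   # 128KB region
```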
Overall Block Diagrams
The overall block diagrams for the PMP module and the PMA module are shown in the corresponding figures. The CSR Unit responds to CSR instructions such as CSRRW that read and write the PMP and PMA registers; backup copies of these registers are included in the Frontend, Memblock, and L2 TLB and are responsible for address checking. Consistency of the register contents is guaranteed by forwarding the CSR write signals to each copy.
Interface List
Refer to the interface list document.
Interface Timing
For the ITLB and L2 TLB, PMP and PMA checks return results in the same cycle; for the DTLB, they return results in the next cycle. The interface timing for the ITLB and L2 TLB PMP module is shown in the corresponding figure.
The interface timing for the DTLB PMP module is shown in the corresponding figure. The interface timing for static checking and dynamic checking is the same.