Level 2 Module PMP&PMA

The PMP module includes the following sub-modules. PMA checks are included within the PMP modules:

  1. PMP (Distributed PMP & PMA Registers)
    1. PMP pmp (Frontend)
    2. PMP pmp (Memblock)
    3. PMP pmp (L2TLB)
  2. PMPChecker (PMP & PMA Checker, returns result in the same cycle)
    1. PMPChecker PMPChecker (Frontend)
    2. PMPChecker PMPChecker_1 (Frontend)
    3. PMPChecker PMPChecker_2 (Frontend)
    4. PMPChecker PMPChecker_3 (Frontend)
    5. PMPChecker PMPChecker (L2TLB)
    6. PMPChecker PMPChecker_1 (L2TLB)
  3. PMPChecker_8 (PMP & PMA Checker, returns result in the next cycle)
    1. PMPChecker_8 PMPChecker (Memblock)
    2. PMPChecker_8 PMPChecker_1 (Memblock)
    3. PMPChecker_8 PMPChecker_2 (Memblock)
    4. PMPChecker_8 PMPChecker_3 (Memblock)
    5. PMPChecker_8 PMPChecker_4 (Memblock)
    6. PMPChecker_8 PMPChecker_5 (Memblock)

Design Specifications

  1. Supports Physical Address Protection
  2. Supports Physical Address Attributes
  3. Supports PMP and PMA checks executing in parallel
  4. Supports dynamic checking and static checking
  5. Supports distributed PMP and distributed PMA
  6. Supports exception handling mechanism

Functionality

Supports Physical Address Protection

The XiangShan processor supports Physical Memory Protection (PMP) checks. PMP has 16 entries by default, and the number of entries can be changed via a parameter. For timing reasons, a distributed, replicated implementation is adopted: the PMP registers in the CSR unit handle instructions such as CSRRW, while the Frontend (instruction fetch), the Memblock (memory access), and the Page Table Walker each hold a copy of the PMP registers. Consistency with the PMP registers in the CSR unit is maintained by routing the CSR write signals to each copy.

Please refer to the XiangShan Open Source Processor User Manual and the RISC-V Privileged Architecture Manual for details on the PMP register format, reset values, etc.

Supports Physical Address Attributes

Physical Memory Attributes (PMA) are implemented in a manner similar to PMP, using two reserved bits in the PMP configuration register, designated 'atomic' and 'cacheable', to indicate support for atomic operations and cacheability, respectively. Unlike the PMP registers, which have no reset value, the PMA registers reset to default values, which must be set to match the platform's address attributes. The PMA registers occupy reserved register address space among the M-mode CSRs; they number 16 by default, and the number of entries can be changed via a parameter.
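
To make the layout concrete, the sketch below decodes one 8-bit configuration field (a minimal Python model for illustration only; the R, W, X, A, and L positions follow the standard pmpcfg layout, while placing 'atomic' in bit 5 and 'cacheable' in bit 6 is an assumption, not a statement about the RTL):

```python
def decode_pma_cfg(cfg: int) -> dict:
    """Decode one 8-bit PMA configuration field into named attributes.

    R/W/X/A/L follow the standard pmpcfg layout; the 'atomic' and
    'cacheable' positions (bits 5 and 6) are assumptions for illustration.
    """
    return {
        "r": bool(cfg & 0x01),          # read permission
        "w": bool(cfg & 0x02),          # write permission
        "x": bool(cfg & 0x04),          # execute permission
        "a": (cfg >> 3) & 0x3,          # matching mode: OFF/TOR/NA4/NAPOT
        "atomic": bool(cfg & 0x20),     # reserved bit, assumed: atomic support
        "cacheable": bool(cfg & 0x40),  # reserved bit, assumed: cacheability
        "l": bool(cfg & 0x80),          # lock bit
    }
```

Under these assumed positions, a TOR entry that is readable, writable, cacheable, and atomic would decode from 8'b0110_1011.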

Please refer to the XiangShan Open Source Processor User Manual for the default PMA configuration.

PMP and PMA Checks Execute in Parallel

PMP and PMA checks are queried in parallel; if either check denies the permission, the access is illegal. All physical address accesses within the core require physical address permission checks: after the ITLB and DTLB lookups, and before memory accesses by the Page Table Walker, Hypervisor Page Table Walker, and Last Level Page Table Walker. The mapping between the distributed PMP and PMA copies used by the ITLB, DTLB, Page Table Walker, Last Level Page Table Walker, and Hypervisor Page Table Walker and the corresponding PMP and PMA checkers is shown in the table below. In other words, the Frontend, Memblock, and L2 TLB each hold a backup copy of the PMP and PMA registers (see Section 5.2.5), and these copies drive the PMP and PMA checkers for the Frontend, Memblock, and L2 TLB, respectively.

Correspondence between PMP and PMA Check Modules

| Module  | Channel                      | Distributed PMP & PMA | PMP & PMA Checker |
|---------|------------------------------|-----------------------|-------------------|
| ITLB    | requestor(0)                 | pmp (Frontend)        | PMPChecker        |
| ITLB    | requestor(1)                 | pmp (Frontend)        | PMPChecker_1      |
| ITLB    | requestor(2)                 | pmp (Frontend)        | PMPChecker_2      |
| ITLB    | requestor(3)                 | pmp (Frontend)        | PMPChecker_3      |
| DTLB_LD | requestor(0)                 | pmp (Memblock)        | PMPChecker        |
| DTLB_LD | requestor(1)                 | pmp (Memblock)        | PMPChecker_1      |
| DTLB_LD | requestor(2)                 | pmp (Memblock)        | PMPChecker_2      |
| DTLB_ST | requestor(0)                 | pmp (Memblock)        | PMPChecker_3      |
| DTLB_ST | requestor(1)                 | pmp (Memblock)        | PMPChecker_4      |
| DTLB_PF | requestor(0)                 | pmp (Memblock)        | PMPChecker_5      |
| L2 TLB  | Page Table Walker            | pmp (L2 TLB)          | PMPChecker        |
| L2 TLB  | Last Level Page Table Walker | pmp (L2 TLB)          | PMPChecker_1      |
| L2 TLB  | Hypervisor Page Table Walker | pmp (L2 TLB)          | PMPChecker_2      |

According to the RISC-V manual, Page Fault has higher priority than Access Fault. However, if the Page Table Walker or Last Level Page Table Walker triggers an Access Fault during its PMP or PMA check, the page table entry it fetched is illegal, so a Page Fault and an Access Fault arise at the same time. In this special case, XiangShan chooses to report the Access Fault. The manual does not explicitly describe this situation, so this behavior may not be consistent with it. In all other cases, the priority of Page Fault over Access Fault is maintained.

Dynamic Checking and Static Checking

As specified in the manual, PMP and PMA checks should be dynamic: the physical address translated by the TLB is used for the permission check. The PMPCheckers for the Frontend, L2 TLB, and Memblock (see the correspondence table above) all perform dynamic checks. For timing reasons, the PMP & PMA check results for the DTLB can instead be queried in advance and stored in the TLB entry at refill time; this is static checking. Specifically, when an L2 TLB page table entry is written back to the DTLB, the entry is simultaneously sent to PMP and PMA for permission checking, and the resulting attribute bits (R, W, X, C, Atomic; their meanings are described in Section 5.4) are stored in the DTLB as well, so that these results can be returned directly to the MemBlock without another check. To support static checking, the granularity of PMP and PMA must be increased to 4KB.

Note that PMP & PMA checking is currently not a timing bottleneck for Kunminghu, so static checking is not used and all checks are dynamic, i.e., performed after the physical address is obtained from the TLB lookup. The Kunminghu V1 code implements only dynamic checking, not static checking. For compatibility, however, the granularity of PMP and PMA remains 4KB.

The information obtained from dynamic and static checking is as follows:

  • Dynamic Checking: Returns whether an inst access fault, load access fault, or store access fault occurred; whether the checked physical address belongs to the MMIO address space.
  • Static Checking: Returns the attribute bits of the checked physical address, including R, W, X, C, Atomic. Please note that Kunminghu V1 will not use static checking by default.
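
The two kinds of result can be summarized as a small sketch (Python dataclasses with illustrative field names; the actual signal names in the RTL may differ):

```python
from dataclasses import dataclass

@dataclass
class DynamicCheckResult:
    """Returned per access, after the TLB lookup yields a physical address."""
    inst_access_fault: bool   # for ITLB requests
    load_access_fault: bool   # for LoadUnits / L2 TLB requests
    store_access_fault: bool  # for StoreUnits / AtomicsUnit requests
    mmio: bool                # whether the paddr falls in the MMIO space

@dataclass
class StaticCheckResult:
    """Filled into the DTLB entry at refill time (not used in Kunminghu V1)."""
    r: bool        # readable
    w: bool        # writable
    x: bool        # executable
    c: bool        # cacheable, from the PMA reserved bits
    atomic: bool   # atomic support, from the PMA reserved bits
```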

Distributed PMP and PMA

The specific implementation of PMP and PMA includes four parts: the CSR Unit, Frontend, Memblock, and L2 TLB. The CSR Unit is responsible for responding to CSR instructions like CSRRW for reading and writing these PMP and PMA registers. Since the CSR Unit is relatively far from the ITLB, DTLB, and L2 TLB, a backup copy of the PMP and PMA registers needs to be stored in the ITLB, DTLB, and L2 TLB for physical address checking and physical attribute checking. Therefore, we need to implement distributed PMP and PMA, storing backup copies of these registers near the ITLB, DTLB, and L2 TLB.

Backup copies of the PMP and PMA registers are included in the Frontend, Memblock, and L2 TLB, which are responsible for address checking. Consistency of the register contents is guaranteed by routing the CSR write signals to each copy. Since the L1 TLBs have a small area, their backup copies of the PMP and PMA registers are stored in the Frontend and Memblock, which provide the checks for the ITLB and DTLB, respectively. The L2 TLB has a larger area, so its backup copies are stored directly within the L2 TLB.

PMP and PMA Checking Flow

Physical address checking is required after obtaining physical addresses from ITLB and DTLB lookups, and before memory accesses by the L2 TLB's Page Table Walker, Last Level Page Table Walker, and Hypervisor Page Table Walker. The ITLB, DTLB, and L2 TLB need to provide the PMPChecker with: the contents of the PMP and PMA configuration and address registers; the number of consecutive ones counted from the least significant bit of the PMP and PMA address registers (at least 12, since the granularity of PMP and PMA is 4KB); the physical address to be checked; and the type of permission required, namely execute (ITLB), read/write (L2 TLB, LoadUnits, and StoreUnits), or atomic read/write (AtomicsUnit).

The relevant information that PMP and PMA check requests need to provide is shown in the table below:

Relevant Information Required for PMP and PMA Check Requests

| PMPChecker Module | Information to be provided | Source |
|---|---|---|
| Frontend | PMP and PMA configuration registers | Frontend pmp |
| Frontend | PMP and PMA address registers | Frontend pmp |
| Frontend | PMP and PMA mask (low i bits are 1, high bits are 0, where i = log2(size of the address space matched by the entry); at least the low 12 bits are 1) | Frontend pmp |
| Frontend | Queried paddr | Icache, IFU |
| Frontend | Queried cmd; fixed at 2 for the ITLB, indicating that execute permission is required | Icache, IFU |
| Memblock | PMP and PMA configuration registers | Memblock pmp |
| Memblock | PMP and PMA address registers | Memblock pmp |
| Memblock | PMP and PMA mask (same form as above) | Memblock pmp |
| Memblock | Queried paddr | LoadUnits, L1 Load Stream & Stride Prefetch, StoreUnits, AtomicsUnit, SMS prefetcher |
| Memblock | Queried cmd; 0, 1, 4, or 5 for the DTLB, indicating that read, write, atom_read, or atom_write permission is required, respectively | LoadUnits, L1 Load Stream & Stride Prefetch, StoreUnits, AtomicsUnit, SMS prefetcher |
| Memblock Static Check | PMP and PMA configuration registers | Memblock pmp |
| Memblock Static Check | PMP and PMA address registers | Memblock pmp |
| Memblock Static Check | PMP and PMA mask (same form as above) | Memblock pmp |
| Memblock Static Check | Queried paddr | Page table entry returned by the L2 TLB (PTW) |
| L2 TLB | PMP and PMA configuration registers | L2 TLB pmp |
| L2 TLB | PMP and PMA address registers | L2 TLB pmp |
| L2 TLB | PMP and PMA mask (same form as above) | L2 TLB pmp |
| L2 TLB | Queried paddr | Page Table Walker, Last Level Page Table Walker, Hypervisor Page Table Walker |
| L2 TLB | Queried cmd; fixed at 0 for the L2 TLB, indicating that read permission is required | Page Table Walker, Last Level Page Table Walker, Hypervisor Page Table Walker |

The PMPChecker needs to return whether an inst access fault (ITLB), load access fault (LoadUnits, L2 TLB), or store access fault (StoreUnits, AtomicsUnit) occurred to the ITLB, DTLB, and L2 TLB; and whether the address belongs to the MMIO address space (ITLB, DTLB, L2 TLB). At the same time, static checking needs to fill the address's attribute bits, including cacheable, atomic, x, w, r, into the DTLB.

For requests from the ITLB and L2 TLB, the PMP and PMA check results are returned in the same cycle; for requests from the DTLB, the results are returned in the next cycle. The interface timing of the ITLB and L2 TLB PMP module is shown in the Interface Timing section.

Relevant Information Returned by PMP and PMA Checks

| PMPChecker Module | Information to be returned | Destination |
|---|---|---|
| Frontend | Whether an inst access fault occurred | Icache, IFU |
| Frontend | Whether the address belongs to the MMIO space | Icache, IFU |
| Memblock Dynamic Check | Whether a load access fault occurred | LoadUnits |
| Memblock Dynamic Check | Whether a store access fault occurred | StoreUnits, AtomicsUnit |
| Memblock Dynamic Check | Whether the address belongs to the MMIO space | LoadUnits, StoreUnits, AtomicsUnit |
| Memblock Static Check | Whether the address is cacheable | DTLB |
| Memblock Static Check | Whether the address supports atomic operations | DTLB |
| Memblock Static Check | Whether the address is executable | DTLB |
| Memblock Static Check | Whether the address is writable | DTLB |
| Memblock Static Check | Whether the address is readable | DTLB |
| L2 TLB | Whether a load access fault occurred | Page Table Walker, Last Level Page Table Walker, Hypervisor Page Table Walker |
| L2 TLB | Whether the address belongs to the MMIO space | Page Table Walker, Last Level Page Table Walker, Hypervisor Page Table Walker |

The interface timing of the DTLB PMP module is shown in the Interface Timing section; the interface timing for static checking and dynamic checking is the same.

Exception Handling

Possible exceptions generated by PMP and PMA checks are: inst access fault (ITLB), load access fault (LoadUnits, L2 TLB), and store access fault (StoreUnits, AtomicsUnit). Exceptions generated by the ITLB and DTLB are delivered, based on the request source, to the module that issued the physical address query: the ITLB delivers them to the Icache or IFU; the DTLB delivers them to the LoadUnits, StoreUnits, or AtomicsUnit for processing.

Since the Page Table Walker, Last Level Page Table Walker, or Hypervisor Page Table Walker need to perform PMP and PMA checks on the physical address being accessed before accessing memory, the L2 TLB may generate an access fault. The L2 TLB does not directly handle the generated access fault but returns this information to the L1 TLB. Upon discovering an access fault during lookup, the L1 TLB will generate an inst access fault, load access fault, or store access fault based on the request's cmd; and deliver it to the respective modules for processing based on the request source.

The possible exceptions and the MMU module's processing flow are shown in the table below:

Possible Exceptions and Processing Flow for PMP and PMA Checks

| Module | Possible Exception | Processing Flow |
|---|---|---|
| ITLB | Generates inst access fault | Delivered to the Icache or IFU for processing, based on the request source |
| DTLB | Generates load access fault | Delivered to the LoadUnits for processing |
| DTLB | Generates store access fault | Delivered to the StoreUnits or AtomicsUnit, based on the request source |
| L2 TLB | Generates access fault | Returned to the L1 TLB, which delivers it for processing based on the request source |

Checking Rules

The PMP and PMA checking rules in the XiangShan Kunminghu architecture follow the PMP and PMA sections in the RV manual. Here, only the matching modes are introduced. The physical address range controlled by a PMP or PMA entry is jointly determined by the A bit of the PMP or PMA configuration register and the PMP or PMA address register. To support DTLB static checking (see Section 5.4.2.4), the granularity of PMP and PMA needs to be increased to 4KB, so the minimum physical address range controlled by a PMP or PMA entry is 4KB.

The matching modes corresponding to the A field of the configuration register are as follows: values 0, 1, 2, and 3 correspond to OFF, TOR, NA4, and NAPOT modes, respectively.

  • A = 0, OFF mode: This PMP or PMA entry is disabled and does not match addresses.
  • A = 1, TOR mode (Top of Range): matches addresses between the boundary given by the address register of the previous PMP or PMA entry and that of this entry; that is, an address matches when it is greater than or equal to the previous entry's address register << 2 and less than this entry's address register << 2.
  • A = 2, NA4 mode (Naturally Aligned Four-byte regions): The XiangShan Kunminghu architecture does not support NA4 mode.
  • A = 3, NAPOT mode (Naturally Aligned Power-of-Two regions): count the consecutive ones starting from the least significant bit of the PMP or PMA address register. If the address register is ADDR = yyy...y0111...1 with x trailing ones, the matched range is the \(2^{x+3}\)-byte region starting at physical address yyy...y0000...0 << 2 (the address register holds paddr >> 2). Since the XiangShan Kunminghu architecture stipulates a minimum granularity of 4KB for PMP and PMA checks, the minimum matched address range is 4KB.
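
The TOR and NAPOT matching rules above can be sketched as follows (a simplified Python model of the check, not the RTL; grain_bits = 12 reflects the 4KB minimum granularity, under which the low grain_bits - 3 bits of the address register read as ones in NAPOT mode):

```python
def tor_match(prev_pmpaddr: int, pmpaddr: int, paddr: int) -> bool:
    """TOR: matches prev_pmpaddr<<2 <= paddr < pmpaddr<<2.

    Address registers hold paddr >> 2, hence the left shifts.
    """
    return (prev_pmpaddr << 2) <= paddr < (pmpaddr << 2)

def napot_match(pmpaddr: int, paddr: int, grain_bits: int = 12) -> bool:
    """NAPOT: x trailing ones in pmpaddr encode a 2**(x+3)-byte region."""
    # With a 2**grain_bits granularity, the low grain_bits-3 bits read as ones.
    pmpaddr |= (1 << (grain_bits - 3)) - 1
    x = 0
    while (pmpaddr >> x) & 1:       # count trailing ones
        x += 1
    mask = (1 << (x + 3)) - 1       # region size is 2**(x+3) bytes
    # Compare the region base (trailing ones cleared, restored to a byte
    # address) against the queried paddr above the mask.
    return (paddr & ~mask) == ((pmpaddr << 2) & ~mask)
```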

To facilitate address matching, the distributed PMP and PMA send a mask signal to the PMPChecker. The mask has its low i bits set to 1 and its high bits set to 0, where i = log2(size of the address space matched by the PMP entry). When a PMP or PMA entry is updated, the mask is updated at the same time. Since the XiangShan Kunminghu architecture supports a minimum granularity of 4KB for PMP and PMA, the low 12 bits of the mask signal are always 1.

For example, if the pmpaddr of a certain pmp entry is 16'b1111_0000_0000_0000, since the XiangShan Kunminghu architecture supports a minimum granularity of 4KB for PMP and PMA, the address range matched by NAPOT mode is \(2^{12}\) B, which is 4 KB. The value of the mask signal is 18'hfff.

Another example, if the pmpaddr of a certain pmp entry is 16'b1011_1111_1111_1111, then the address range matched by NAPOT mode is \(2^{17}\) B, which is 128KB. The value of the mask signal is 18'h1ffff.
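
Both worked examples can be reproduced with a small sketch of the mask computation (a Python model for illustration, not the RTL):

```python
def pmp_mask(pmpaddr: int, grain_bits: int = 12) -> int:
    """Compute the byte-address mask of a NAPOT pmp entry.

    With a 4KB (2**12) granularity, the low grain_bits-3 bits of the
    address register read as ones in NAPOT mode; x trailing ones then
    encode a 2**(x+3)-byte region, and the mask has log2(region size)
    low bits set.
    """
    pmpaddr |= (1 << (grain_bits - 3)) - 1   # enforce minimum granularity
    x = 0
    while (pmpaddr >> x) & 1:                # count trailing ones
        x += 1
    return (1 << (x + 3)) - 1                # low log2(size) bits set

# First example: pmpaddr = 16'b1111_0000_0000_0000 -> 4KB region, mask 'hfff
assert pmp_mask(0b1111_0000_0000_0000) == 0xfff
# Second example: pmpaddr = 16'b1011_1111_1111_1111 -> 128KB region, mask 'h1ffff
assert pmp_mask(0b1011_1111_1111_1111) == 0x1ffff
```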

Overall Block Diagrams

The overall block diagrams for the PMP module and the PMA module are shown in the figures below. The CSR Unit responds to CSR instructions such as CSRRW that read and write the PMP and PMA registers; backup copies of these registers are included in the Frontend, Memblock, and L2 TLB, which are responsible for address checking. Consistency of the register contents is guaranteed by routing the CSR write signals to each copy.

Overall Block Diagram of the PMP Module

Overall Block Diagram of the PMA Module

Interface List

Refer to the interface list document.

Interface Timing

For the ITLB and L2 TLB, PMP and PMA checks return results in the same cycle; for the DTLB, PMP and PMA checks return results in the next cycle. The interface timing of the ITLB and L2 TLB PMP module is shown in the figure below.

Interface Timing Diagram of the ITLB and L2 TLB PMP Module

The interface timing of the DTLB PMP module is shown in the figure below; the interface timing for static checking and dynamic checking is the same.

Interface Timing Diagram of the DTLB PMP Module