Memory management
The ARMv8-A architecture employs a weakly ordered model of memory. This means that the order
of memory accesses is not necessarily required to be the same as the program order for load and
store operations.
To improve data throughput, the processor and other system elements can reorder memory read
operations with respect to each other. Writes can also be reordered. This means that the
required bandwidth between the processor and external memory can be reduced and the long
latencies that are associated with such external memory accesses are hidden.
For this reordering to be safe, the architecture defines memory types that explicitly permit
such optimizations.
Hardware can reorder reads and writes to Normal memory, subject only to address and data
dependencies and to explicit memory barrier instructions, including the one-way "half barriers"
(load-acquire and store-release). Certain situations require stronger ordering rules. You can
provide information to the core about this through the memory type attribute of the translation
table entry that describes that memory.
High-performance systems can support techniques such as speculative memory reads, multiple
issuing of instructions, or out-of-order execution and these, along with other techniques, offer
further possibilities for hardware reordering of memory accesses.
Page table entry
- Holds the virtual-to-physical address translation, as well as attributes for that address
- Some bits are there for the OS to track page state, such as dirty and accessed (PTE_YOUNG / PTE_OLD in Linux)
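As a rough illustration of what such an entry holds, the sketch below splits a 64-bit descriptor into fields. It assumes the stage 1 page descriptor layout for a 4KB granule with 48-bit output addresses. The function name and the example value are made up for this note.

```python
def decode_page_descriptor(desc: int) -> dict:
    """Split a 64-bit stage 1 page descriptor (4KB granule assumed) into fields."""
    return {
        "valid": bool(desc & 1),                         # bit 0
        "attr_index": (desc >> 2) & 0x7,                 # bits [4:2], index into MAIR_ELx
        "ap": (desc >> 6) & 0x3,                         # bits [7:6], access permissions
        "sh": (desc >> 8) & 0x3,                         # bits [9:8], shareability
        "accessed": bool((desc >> 10) & 1),              # bit 10, access flag (AF)
        "output_address": desc & 0x0000_FFFF_FFFF_F000,  # bits [47:12]
        "pxn": bool((desc >> 53) & 1),                   # privileged execute-never
        "uxn": bool((desc >> 54) & 1),                   # unprivileged execute-never
        "software": (desc >> 55) & 0xF,                  # bits [58:55], ignored by hardware
    }


# Hypothetical entry: page at 0x8000_0000, AF set, inner shareable, AttrIndx = 1.
example = 0x8000_0000 | (1 << 10) | (3 << 8) | (1 << 2) | 0b11
fields = decode_page_descriptor(example)
```

The software bits are where an OS can keep its own bookkeeping, since the hardware ignores them.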
Memory type = normal
Normal memory is used for all code and for most data regions in memory. Examples of Normal
memory include areas of RAM, Flash, or ROM in physical memory. This kind of memory provides
the highest processor performance as it is weakly ordered and the compiler can perform more
optimizations. The processor can reorder, repeat, and merge accesses to Normal memory.
- Reordering
- Merging
- Speculation
- Unaligned accesses
- Can be either cacheable or non-cacheable
Memory type = device
The Device memory type is used with memory-mapped peripherals and all memory regions where
an access might have a side effect. For example, a read of a timer is not repeatable, as it
returns different values for each read. A write to a control register can trigger an interrupt.
The Device memory type imposes more restrictions on the core.
Speculative data accesses cannot be performed to regions of memory that are marked as Device.
Trying to execute code from a region marked as Device is UNPREDICTABLE.
- Side effects
- Cannot do speculative access
- Cannot be executable
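- Attributes
- Gathering (G or nG): can several accesses be merged into one?
- Re-ordering (R or nR): can accesses to the same device be reordered?
- Early write acknowledgement (E or nE): can a write be acknowledged before it reaches the end device?
- Device types, from weakest (least restrictive) to strongest (most restrictive)
- GRE -> nGRE -> nGnRE -> nGnRnE
- An implementation can treat a type as a stronger (more restrictive) one, but never as a weaker one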
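A translation table entry does not store the memory type directly. It holds an index into the MAIR_ELx register, which has eight 8-bit attribute slots. The sketch below packs a few common encodings into a register value. The helper name and the slot order are arbitrary choices for illustration, not something the architecture or any OS mandates.

```python
# 8-bit MAIR attribute encodings for the four Device types.
DEVICE_ATTR = {"nGnRnE": 0x00, "nGnRE": 0x04, "nGRE": 0x08, "GRE": 0x0C}
# Two common Normal memory encodings.
NORMAL_NON_CACHEABLE = 0x44  # inner and outer non-cacheable
NORMAL_WRITE_BACK = 0xFF     # inner and outer write-back, read/write allocate


def pack_mair(attrs):
    """Pack up to eight 8-bit attributes into a 64-bit MAIR value, slot 0 first."""
    if len(attrs) > 8:
        raise ValueError("MAIR_ELx has only eight attribute slots")
    value = 0
    for index, attr in enumerate(attrs):
        value |= (attr & 0xFF) << (8 * index)
    return value


# Hypothetical slot assignment; a page table entry would then use AttrIndx 0..3.
mair = pack_mair([
    DEVICE_ATTR["nGnRnE"],
    DEVICE_ATTR["nGnRE"],
    NORMAL_NON_CACHEABLE,
    NORMAL_WRITE_BACK,
])
```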
Barriers
The ARM architecture includes barrier instructions to force access ordering and access completion
at a specific point.
Barriers are used to prevent unsafe optimizations from occurring and to enforce a specific memory
ordering. Because they work by restricting what the hardware may do, unnecessary barrier
instructions can reduce software performance. Consider carefully whether a barrier is necessary
in a specific situation, and if so, which is the correct barrier to use.
ISB -- Instruction synchronization barrier
DMB -- Data memory barrier
DSB -- Data synchronization barrier
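To see why a barrier such as DMB matters, here is a toy model of the classic message-passing pattern: one thread writes data and then sets a flag, the other reads the flag and then the data. The model is a deliberate simplification, not the real ARM memory model. It assumes that without a barrier each thread's two accesses may reach memory in either order, and that with a barrier between them program order is kept. All names are made up for this sketch.

```python
from itertools import permutations

WRITER = (("store", "data", 1), ("store", "flag", 1))
READER = (("load", "flag"), ("load", "data"))


def thread_orders(ops, barrier):
    """Orders in which one thread's accesses may reach memory."""
    return [tuple(ops)] if barrier else list(permutations(ops))


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    if not a:
        yield tuple(b)
    elif not b:
        yield tuple(a)
    else:
        for rest in interleavings(a[1:], b):
            yield (a[0],) + rest
        for rest in interleavings(a, b[1:]):
            yield (b[0],) + rest


def run(schedule):
    """Execute one global schedule and return what the reader saw as (flag, data)."""
    memory = {"data": 0, "flag": 0}
    seen = {}
    for op in schedule:
        if op[0] == "store":
            memory[op[1]] = op[2]
        else:
            seen[op[1]] = memory[op[1]]
    return seen["flag"], seen["data"]


def outcomes(barrier):
    """Collect every (flag, data) pair the reader can observe."""
    results = set()
    for writer_order in thread_orders(WRITER, barrier):
        for reader_order in thread_orders(READER, barrier):
            for schedule in interleavings(writer_order, reader_order):
                results.add(run(schedule))
    return results


relaxed = outcomes(barrier=False)  # no barriers in either thread
ordered = outcomes(barrier=True)   # a barrier between the two accesses of each thread
```

The interesting outcome is `(1, 0)`: the reader sees the flag set but still reads stale data. In this model it appears in `relaxed` and is absent from `ordered`.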
MMU (memory management unit)
- Software defines the translation tables; the MMU is in charge of reading them and providing the translation service to the core
- TLB (translation look-aside buffer) + PTW (page table walker)
- The TLBs of most modern ARM cores also cache intermediate steps of the translation to speed up the process
- The MMU sits before the cache, so the cache works with physical addresses and won't be affected by changes in address translation
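The split between TLB and table walker can be pictured with a small software model. This is only a sketch: it assumes 4KB pages and a single-level table stored as a dictionary, and `ToyMMU` is a made-up name. A real walker reads multi-level tables from memory.

```python
PAGE_SHIFT = 12  # 4KB pages, an assumption for this sketch
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class ToyMMU:
    """A TLB (dictionary cache) in front of a slow page table walk."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical page number
        self.tlb = {}
        self.walks = 0                # how many times the walker had to run

    def translate(self, va):
        vpn, offset = va >> PAGE_SHIFT, va & PAGE_MASK
        if vpn not in self.tlb:
            # TLB miss: walk the table; a KeyError here models a translation fault.
            self.walks += 1
            self.tlb[vpn] = self.page_table[vpn]
        return (self.tlb[vpn] << PAGE_SHIFT) | offset

    def invalidate(self):
        """Software must do this after changing the tables, or stale entries remain."""
        self.tlb.clear()


mmu = ToyMMU({0x400: 0x80000})
first = mmu.translate(0x400123)   # miss, triggers a walk
second = mmu.translate(0x400456)  # same page, served from the TLB
```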
Virtual address space
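AArch64 uses 48-bit virtual addresses, and there are 2 virtual address ranges: one for the kernel (not available in EL2 and EL3) and one for applications. So there are 2 sets of translation tables, both of which live in memory. Each TTBR (translation table base register) points to the base of one set: TTBR0_EL1 for the application range and TTBR1_EL1 for the kernel range.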
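With 48-bit addresses, the top 16 bits of a 64-bit virtual address decide which table base is used: all zeros selects the lower (application) range via TTBR0_EL1, all ones selects the upper (kernel) range via TTBR1_EL1, and anything else faults. A minimal sketch follows, ignoring features such as top-byte-ignore. The function name is made up.

```python
VA_BITS = 48  # assumed virtual address width


def select_ttbr(va: int) -> str:
    """Return which base register translates a 64-bit virtual address."""
    if not 0 <= va < (1 << 64):
        raise ValueError("not a 64-bit address")
    top = va >> VA_BITS                    # the bits above the translated range
    if top == 0:
        return "TTBR0_EL1"                 # lower range: applications
    if top == (1 << (64 - VA_BITS)) - 1:
        return "TTBR1_EL1"                 # upper range: kernel
    return "fault"                         # hole between the two ranges


user = select_ttbr(0x0000_7FFF_FFFF_F000)
kernel = select_ttbr(0xFFFF_0000_0800_0000)
hole = select_ttbr(0x0001_0000_0000_0000)
```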
Translation table (page table)
- Up to 4 levels of tables (levels 0 to 3), depending on granule size and address width
- 3 different translation granule (page) sizes
- 4KB, 16KB or 64KB
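How a virtual address is carved up depends on the granule. The sketch below assumes the 4KB granule with 48-bit addresses, where the architecture uses four lookup levels (0 to 3) of 9 index bits each, followed by a 12-bit offset into the page. The function name is made up for this note.

```python
def split_va_4k(va: int) -> dict:
    """Split a 48-bit virtual address into table indices (4KB granule assumed)."""
    return {
        "l0": (va >> 39) & 0x1FF,  # bits [47:39]
        "l1": (va >> 30) & 0x1FF,  # bits [38:30]
        "l2": (va >> 21) & 0x1FF,  # bits [29:21]
        "l3": (va >> 12) & 0x1FF,  # bits [20:12]
        "offset": va & 0xFFF,      # bits [11:0], byte within the 4KB page
    }


# Build an address with a distinct index at every level so the split is easy to check.
va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0xABC
parts = split_va_4k(va)
```

Each index selects one of 512 entries in a 4KB table, which is why every level consumes 9 bits.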
Translation regimes
- EL3 secure monitor table
- EL2 hypervisor table
- EL1/EL0 go through 2 stages of translation tables when virtualization is in use
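The two stages can be sketched as two chained lookups: the guest OS tables map a virtual address (VA) to an intermediate physical address (IPA), and the hypervisor tables map that IPA to the real physical address (PA). This is a simplified model with single-level dictionary tables and 4KB pages. It leaves out the fact that real hardware also translates the stage 1 table walk itself through stage 2. All names and values are made up.

```python
PAGE_SHIFT = 12  # 4KB pages, an assumption for this sketch
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def lookup(table, address):
    """Translate one address through a single-level table of page numbers."""
    page = table[address >> PAGE_SHIFT]  # KeyError models a translation fault
    return (page << PAGE_SHIFT) | (address & PAGE_MASK)


def translate_guest(va, stage1, stage2):
    """VA -> IPA using the guest's tables, then IPA -> PA using the hypervisor's."""
    ipa = lookup(stage1, va)
    return lookup(stage2, ipa)


stage1 = {0x10: 0x200}     # controlled by the guest OS at EL1
stage2 = {0x200: 0x9000}   # controlled by the hypervisor at EL2
pa = translate_guest(0x10ABC, stage1, stage2)
```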
Secure physical address spaces
- Secure vs non-secure
- Non-secure programs in EL1/EL0 can only access non-secure physical addresses
- Secure EL1/EL0 programs can access both
References:
https://developer.arm.com/documentation/100941/0100/Memory-types
armv8_a_memory_systems_100941_0100_en.pdf
https://phdbreak99.github.io/blog/arch/2019-03-18-armv8-architecture/
https://stackoverflow.com/questions/65684882/is-memory-reordering-equivalent-to-instruction-reordering