Wednesday, 18 August 2021

ARM SMMU and IOMMU Analysis




 



As shown in the figure above, the SMMU plays a role similar to that of the MMU.
The MMU walks the page tables on behalf of the CPU, converting a process's virtual address into a physical address that can be used to access memory.
In the same way, the SMMU translates the address a device issues for DMA into a physical address that can actually be used to access memory. When the SMMU is in bypass mode, the device uses physical addresses for DMA directly.

SMMU:

The data structures the SMMU uses for DMA address translation are stored in memory, and the SMMU registers hold the base addresses of these tables.
The first is the Stream Table, whose entries are called Stream Table Entries (STEs). Each STE contains the configuration for stage 1 translation as well as the translation structure for stage 2.
Stage 1 is responsible for the conversion from VA to IPA (which is simply the PA when stage 2 is not in use), and stage 2 is responsible for the conversion from IPA to PA.

Stream Table Entry:
We will focus on the structure of the STE and how it is organized in memory.

One SMMU can serve many devices.
To manage each device separately, the SMMU gives each device its own STE.
How does the SMMU locate the STE for a device? Each device managed by an SMMU is given a unique device ID, also called the stream ID. When there are only a few devices, the stream table can simply be a one-dimensional array indexed by stream ID, as shown below.


Note that the size of this linear table is not determined by the number of devices actually present; it is taken from the value configured in the SMMU's ID0 register. In practice, this linear structure is rarely used.
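
A minimal sketch of the linear lookup, assuming a 64-byte STE (the SMMUv3 entry size, eight 64-bit words). The helper name `linear_ste_addr` and the base address are hypothetical; the real base is whatever software programmed into the SMMU.

```c
#include <stdint.h>

/* Size of one Stream Table Entry in SMMUv3: 8 x 64-bit words. */
#define STE_BYTES 64u

/*
 * Hypothetical helper: address of the STE for a given stream id in a
 * linear (one-level) stream table that starts at strtab_base.
 */
static uint64_t linear_ste_addr(uint64_t strtab_base, uint32_t sid)
{
	return strtab_base + (uint64_t)sid * STE_BYTES;
}
```

For example, with a table at 0x80000000, stream id 3 lands 3 * 64 = 0xc0 bytes into the table.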

With a large number of devices, the SMMU can use a two-level stream table instead, as shown in the following figure.





This structure is very similar to the page tables of the MMU. In the Linux driver for ARM SMMUv3,
the stream ID is split at STRTAB_SPLIT (8) bits.
The low 8 bits of the stream ID index the actual STE inside a second-level table, and the remaining upper bits index the first-level descriptor array, which holds one descriptor per second-level table.
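
The split can be sketched as follows, assuming the convention of the Linux arm-smmu-v3 driver, where the low STRTAB_SPLIT (8) bits of the stream ID select the STE inside a second-level table and the upper bits select the first-level descriptor. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Number of low stream-id bits that index the second-level table. */
#define STRTAB_SPLIT 8u

struct sid_index {
	uint32_t l1; /* index into the first-level descriptor array */
	uint32_t l2; /* index of the STE inside the second-level table */
};

/* Split a stream id into its first-level and second-level indices. */
static struct sid_index split_sid(uint32_t sid)
{
	struct sid_index idx;

	idx.l1 = sid >> STRTAB_SPLIT;
	idx.l2 = sid & ((1u << STRTAB_SPLIT) - 1u);
	return idx;
}
```

With this split, stream ID 0x1234 uses first-level descriptor 0x12 and STE 0x34 of the table that descriptor points to.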

Having introduced the two layouts of the stream table,
let's look at the structure of an individual STE.


The red box shows a complete STE.
It shows that a single STE holds the data for both stage 1 and stage 2. Config holds the configuration fields of the STE; there is no need to understand or memorize them.
If in doubt, check the SMMUv3 manual. The VMID field is the virtual machine ID.
Here we focus on S1ContextPtr and S2TTB.

S1ContextPtr:

S1ContextPtr points to a table of Context Descriptors (CDs).
The picture draws only one for clarity. On ARM,
when no virtual machine is involved, address translation by either the CPU or the SMMU goes VA -> PA (or IOVA -> PA). We call this stage 1: no virtualization is involved, so there is only a single stage of translation.

Why does the SMMU need a CD table? One SMMU can manage many devices,
so the stream table is used to separate the data of each device, and each device has its own STE.
But what if several tasks run on one device and each task uses a different page table? The SMMU uses the CD (Context Descriptor) table to manage these page tables, one CD per page table.
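
The chain from device to page table can be sketched with a deliberately simplified model. Everything here is hypothetical: real STEs and CDs are packed 64-byte descriptors, and the per-task index (called the SubstreamID in the SMMUv3 architecture) is an assumption added for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model; real STE/CD layouts are packed bitfields. */
struct ctx_desc {
	uint64_t ttb;            /* stage 1 page-table base for one task */
};

struct stream_entry {
	struct ctx_desc *s1_ctx; /* models S1ContextPtr: base of the CD table */
	size_t num_ctx;          /* number of CDs in that table */
	uint64_t s2_ttb;         /* models S2TTB: stage 2 page-table base */
};

/*
 * Pick the stage 1 page-table base for (stream id, substream id).
 * Returns 0 when either index is out of range or no CD table is set.
 */
static uint64_t lookup_s1_ttb(const struct stream_entry *strtab, size_t num_ste,
			      uint32_t sid, uint32_t ssid)
{
	const struct stream_entry *ste;

	if (sid >= num_ste)
		return 0;
	ste = &strtab[sid];
	if (ste->s1_ctx == NULL || ssid >= ste->num_ctx)
		return 0;
	return ste->s1_ctx[ssid].ttb;
}
```

The stream ID selects the device's STE, and the per-task index then selects one CD, so two tasks on the same device can use two different page tables.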


Tuesday, 17 August 2021

Linux-5.4 arm64 early_fixmap_init analysis

The following analysis of early_fixmap_init is based on the configuration below.
With this configuration there is no pud level, so the key code can be extracted as follows.

CONFIG_ARM64_VA_BITS=39
CONFIG_ARM64_4K_PAGES=y
CONFIG_PGTABLE_LEVELS=3 

#define PAGE_SHIFT 12
#define STRUCT_PAGE_MAX_SHIFT 6
__end_of_permanent_fixed_addresses =0x405 

/*
* Size of the PCI I/O space. This must remain a power of two so that
* IO_SPACE_LIMIT acts as a mask for the low bits of I/O addresses.
*/

#define PCI_IO_SIZE SZ_16M
//(0xffffffc000000000 - 0xffffff8000000000) >> (12 - 6) = 0x100000000, so -VMEMMAP_SIZE = 0xFFFFFFFF00000000
#define VMEMMAP_SIZE ((_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET) \
  >> (PAGE_SHIFT - STRUCT_PAGE_MAX_SHIFT)) 

/*
* PAGE_OFFSET - the virtual address of the start of the linear map, at the
*               start of the TTBR1 address space.
* PAGE_END - the end of the linear map, where all other kernel mappings begin.
* KIMAGE_VADDR - the virtual address of the start of the kernel image.
* VA_BITS - the maximum number of bits for virtual addresses.
*/
#define VA_BITS (CONFIG_ARM64_VA_BITS) //39
#define VA_BITS_MIN (VA_BITS) // if VA_BITS > 48, it is 48 bit
#define _PAGE_OFFSET(va) (-(UL(1) << (va)))
#define PAGE_OFFSET (_PAGE_OFFSET(VA_BITS)) //0xffffff8000000000 
#define KIMAGE_VADDR (MODULES_END) //0xFFFFFFC010000000
#define BPF_JIT_REGION_START (KASAN_SHADOW_END) //(_PAGE_END(VA_BITS_MIN)) 0xffffffc000000000
#define BPF_JIT_REGION_SIZE (SZ_128M) //0x8000000
#define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE) //(0xffffffc000000000 + 0x8000000)=0xFFFFFFC008000000
#define MODULES_END (MODULES_VADDR + MODULES_VSIZE)//(0xFFFFFFC008000000+0x8000000) =0xFFFFFFC010000000
#define MODULES_VADDR (BPF_JIT_REGION_END) //0xFFFFFFC008000000
#define MODULES_VSIZE (SZ_128M) //0x8000000
#define VMEMMAP_START (-VMEMMAP_SIZE - SZ_2M) //(0xFFFFFFFF00000000 - 0x200000) = 0xFFFFFFFEFFE00000
#define PCI_IO_END (VMEMMAP_START - SZ_2M) //0xfffffffeffc00000 
#define PCI_IO_START (PCI_IO_END - PCI_IO_SIZE) //0xfffffffefec00000
#define FIXADDR_TOP (PCI_IO_START - SZ_2M) //0xfffffffefea00000
#define KASAN_SHADOW_END     (_PAGE_END(VA_BITS_MIN)) // 0xffffffc000000000
#define _PAGE_END(va)     (-(UL(1) << ((va) - 1))) 

#define FIXADDR_SIZE (__end_of_permanent_fixed_addresses << PAGE_SHIFT) //(0x405 << 12) = 0x405000
#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE) //0xFFFFFFFEFE5FB000
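
The arithmetic in the comments above can be checked with a small user-space sketch. It assumes the configuration listed at the top of this post (VA_BITS = 39, 4 KB pages, KASAN disabled so that KASAN_SHADOW_END equals _PAGE_END) and the quoted value 0x405 for __end_of_permanent_fixed_addresses. The function names are hypothetical stand-ins for the kernel macros.

```c
#include <stdint.h>

#define SZ_2M    0x200000ull
#define SZ_16M   0x1000000ull
#define SZ_128M  0x8000000ull

#define VA_BITS               39
#define PAGE_SHIFT            12
#define STRUCT_PAGE_MAX_SHIFT 6
#define END_OF_PERMANENT_FIXED_ADDRESSES 0x405ull /* value quoted above */

/* Unsigned negation wraps modulo 2^64, mirroring the kernel's -(UL(1) << n). */
static uint64_t page_offset(void) { return -(1ull << VA_BITS); }
static uint64_t page_end(void)    { return -(1ull << (VA_BITS - 1)); }

/* BPF JIT region (128M) plus modules region (128M) sit above _PAGE_END. */
static uint64_t kimage_vaddr(void) { return page_end() + SZ_128M + SZ_128M; }

static uint64_t vmemmap_size(void)
{
	return (page_end() - page_offset()) >> (PAGE_SHIFT - STRUCT_PAGE_MAX_SHIFT);
}

static uint64_t vmemmap_start(void) { return -vmemmap_size() - SZ_2M; }
static uint64_t pci_io_end(void)    { return vmemmap_start() - SZ_2M; }
static uint64_t pci_io_start(void)  { return pci_io_end() - SZ_16M; }
static uint64_t fixaddr_top(void)   { return pci_io_start() - SZ_2M; }

static uint64_t fixaddr_start(void)
{
	return fixaddr_top() - (END_OF_PERMANENT_FIXED_ADDRESSES << PAGE_SHIFT);
}
```

Each function reproduces one macro, so a mismatch between a comment and the computed value points directly at the line that needs checking.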

/*
 * The p*d_populate functions call virt_to_phys implicitly so they can't be used
 * directly on kernel symbols (bm_p*d). This function is called too early to use
 * lm_alias so __p*d_populate functions must be used to populate with the
 * physical address from __pa_symbol.
 */
void __init early_fixmap_init(void)
{
	pgd_t *pgdp;
	p4d_t *p4dp, p4d;
	pud_t *pudp;
	pmd_t *pmdp;
	unsigned long addr = FIXADDR_START;

	pgdp = pgd_offset_k(addr);
	p4dp = p4d_offset(pgdp, addr);
	p4d = READ_ONCE(*p4dp);
	if (CONFIG_PGTABLE_LEVELS > 3 &&
	    !(p4d_none(p4d) || p4d_page_paddr(p4d) == __pa_symbol(bm_pud))) {
		/*
		 * We only end up here if the kernel mapping and the fixmap
		 * share the top level pgd entry, which should only happen on
		 * 16k/4 levels configurations.
		 */
		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
		pudp = pud_offset_kimg(p4dp, addr);
	} else {
		if (p4d_none(p4d))
			__p4d_populate(p4dp, __pa_symbol(bm_pud), P4D_TYPE_TABLE);
		pudp = fixmap_pud(addr);
	}
	if (pud_none(READ_ONCE(*pudp)))
		__pud_populate(pudp, __pa_symbol(bm_pmd), PUD_TYPE_TABLE);
	pmdp = fixmap_pmd(addr);
	__pmd_populate(pmdp, __pa_symbol(bm_pte), PMD_TYPE_TABLE);

	/*
	 * The boot-ioremap range spans multiple pmds, for which
	 * we are not prepared:
	 */
	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));

	if ((pmdp != fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)))
	     || pmdp != fixmap_pmd(fix_to_virt(FIX_BTMAP_END))) {
		WARN_ON(1);
		pr_warn("pmdp %p != %p, %p\n",
			pmdp, fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)),
			fixmap_pmd(fix_to_virt(FIX_BTMAP_END)));
		pr_warn("fix_to_virt(FIX_BTMAP_BEGIN): %08lx\n",
			fix_to_virt(FIX_BTMAP_BEGIN));
		pr_warn("fix_to_virt(FIX_BTMAP_END):   %08lx\n",
			fix_to_virt(FIX_BTMAP_END));

		pr_warn("FIX_BTMAP_END:       %d\n", FIX_BTMAP_END);
		pr_warn("FIX_BTMAP_BEGIN:     %d\n", FIX_BTMAP_BEGIN);
	}
}

This function populates the intermediate page-table levels for the virtual address FIXADDR_START: the pgd/pud entry is pointed at bm_pmd and the pmd entry is pointed at bm_pte.
It does not map any physical address, so the mapping is only partial:
no value is assigned to any pte yet.

Friday, 13 August 2021

How to access a physical address from Linux kernel space?

 

  1. Set up a virtual address mapping for the registers in question using ioremap.
  2. Use readl/writel to access the mapped registers.


Read example:

void __iomem *regs = ioremap(0xdead0000, 4);

if (regs) {
	pr_info("0xdead0000: %#x\n", readl(regs));
	iounmap(regs);
}

Write example:

void __iomem *regs = ioremap(0xa90260, 4);

if (regs) {
	writel(0x4380f, regs);
	iounmap(regs); /* release the mapping once it is no longer needed */
}

Allocation of I/O memory is not the only required step before that memory may be accessed. You must also ensure that this I/O memory has been made accessible to the kernel. So a mapping must be set up first. This is the role of the ioremap function.

void *ioremap(unsigned long phys_addr, unsigned long size);
void *ioremap_nocache(unsigned long phys_addr, unsigned long size);
void iounmap(void * addr);

The function is designed specifically to assign virtual addresses to I/O memory regions. Note that ioremap_nocache was removed in Linux 5.6; on current kernels use ioremap.

The proper way of getting at I/O memory is via a set of functions (defined via <asm/io.h>) provided for that purpose.

To read from I/O memory, use one of the following:

unsigned int ioread8(void *addr);
unsigned int ioread16(void *addr);
unsigned int ioread32(void *addr);

Here, addr should be an address obtained from ioremap and the return value is what was read from the given I/O memory.


There is a similar set of functions for writing to I/O memory:

void iowrite8(u8 value, void *addr);
void iowrite16(u16 value, void *addr);
void iowrite32(u32 value, void *addr);

As an example:

void __iomem *io = ioremap(PHYSICAL_ADDRESS, SZ_4K);
iowrite32(value, io);


Alternatively, you can do it from user space this way:

static volatile uint32_t *gpio = NULL;
int   fd;

if ((fd = open("/dev/mem", O_RDWR | O_SYNC | O_CLOEXEC)) < 0) return -1;
gpio = (volatile uint32_t *)mmap(NULL, BLOCK_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, GPIO_BASE);
if (gpio == MAP_FAILED) return -1; /* mmap signals failure with MAP_FAILED; casting to int32_t truncates on 64-bit */

*(gpio + n) = value;
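
One detail the snippet relies on is that the offset passed to mmap must be a multiple of the page size. A small sketch of how a register address can be split into a page-aligned mapping base and an offset inside the mapping follows. The struct and function names are hypothetical, and the page size is assumed to be a power of two (for example the value returned by sysconf(_SC_PAGESIZE)).

```c
#include <stdint.h>

/* Hypothetical helper type: where to mmap and where to index afterwards. */
struct map_request {
	uint64_t map_base; /* page-aligned offset to pass to mmap() */
	uint64_t in_page;  /* byte offset of the register inside the mapping */
};

/* page_size must be a power of two. */
static struct map_request split_phys_addr(uint64_t phys, uint64_t page_size)
{
	struct map_request req;

	req.map_base = phys & ~(page_size - 1);
	req.in_page = phys & (page_size - 1);
	return req;
}
```

For the register at 0xa90260 used in the write example, with 4 KB pages the mapping would start at 0xa90000 and the register would sit 0x260 bytes into it.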

Thursday, 12 August 2021

Linux memory management: fixmap memory mapping on ARM64

 Fixmap base address of memory map

After the early assembly boot stage, the MMU has been enabled, which means that on the SoC we are analyzing, DRAM can from now on only be accessed through virtual addresses.
At this point, however, the range of addresses we can access is limited: only the idmap and swapper regions have translations to physical addresses, and everything else is still inaccessible because no translation exists for it.
Fixmap is a small mechanism that fills this gap before full paging is established: it sets up the mappings for the resources that are needed at this stage.

1. Fixmap

In fixmap, "fix" means fixed and "map" means to establish a mapping. The point is not that the mapping itself is fixed, but that the mapping is established at a fixed virtual address.
This virtual address can be mapped to any physical address, so once the memory management unit (MMU) is enabled we can freely access the content we need.

That is to say, the kernel fixes a range of virtual addresses at compile time, and modules use these addresses to access memory before the memory management system is fully initialized.
Examples include writing early debug logs to the console, reading the flattened device tree (FDT), accessing peripherals, and even building the page tables in paging_init itself.

2. Fixmap

The Linux fixmap mechanism provides the following regions:

  1. FDT is used to obtain device tree information
  2. console (earlycon) is used for early debugging, such as printing logs
  3. text is used to map read-only code so that it can be patched at run time
  4. BTMAP provides temporary mappings that each module can request
  5. the fix page slots (FIX_PTE, FIX_PMD, FIX_PUD, FIX_PGD) are used while creating page tables and are used in paging_init
https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/fixmap.h#L35

enum fixed_addresses {
	FIX_HOLE,

	/*
	 * Reserve a virtual window for the FDT that is 2 MB larger than the
	 * maximum supported size, and put it at the top of the fixmap region.
	 * The additional space ensures that any FDT that does not exceed
	 * MAX_FDT_SIZE can be mapped regardless of whether it crosses any
	 * 2 MB alignment boundaries.
	 *
	 * Keep this at the top so it remains 2 MB aligned.
	 */
#define FIX_FDT_SIZE		(MAX_FDT_SIZE + SZ_2M)
	FIX_FDT_END,
	FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,

	FIX_EARLYCON_MEM_BASE,
	FIX_TEXT_POKE0,

#ifdef CONFIG_ACPI_APEI_GHES
	/* Used for GHES mapping from assorted contexts */
	FIX_APEI_GHES_IRQ,
	FIX_APEI_GHES_SEA,
#ifdef CONFIG_ARM_SDE_INTERFACE
	FIX_APEI_GHES_SDEI_NORMAL,
	FIX_APEI_GHES_SDEI_CRITICAL,
#endif
#endif /* CONFIG_ACPI_APEI_GHES */

#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
	FIX_ENTRY_TRAMP_DATA,
	FIX_ENTRY_TRAMP_TEXT,
#define TRAMP_VALIAS		(__fix_to_virt(FIX_ENTRY_TRAMP_TEXT))
#endif /* CONFIG_UNMAP_KERNEL_AT_EL0 */
	__end_of_permanent_fixed_addresses,

	/*
	 * Temporary boot-time mappings, used by early_ioremap(),
	 * before ioremap() is functional.
	 */
#define NR_FIX_BTMAPS		(SZ_256K / PAGE_SIZE)
#define FIX_BTMAPS_SLOTS	7
#define TOTAL_FIX_BTMAPS	(NR_FIX_BTMAPS * FIX_BTMAPS_SLOTS)

	FIX_BTMAP_END = __end_of_permanent_fixed_addresses,
	FIX_BTMAP_BEGIN = FIX_BTMAP_END + TOTAL_FIX_BTMAPS - 1,

	/*
	 * Used for kernel page table creation, so unmapped memory may be used
	 * for tables.
	 */
	FIX_PTE,
	FIX_PMD,
	FIX_PUD,
	FIX_PGD,

	__end_of_fixed_addresses
};

#define FIXADDR_SIZE	(__end_of_permanent_fixed_addresses << PAGE_SHIFT)
#define FIXADDR_START	(FIXADDR_TOP - FIXADDR_SIZE)
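
How an enum index turns into a virtual address can be sketched as below. The two helpers mirror the generic kernel macros __fix_to_virt and __virt_to_fix; the FIXADDR_TOP value is only an assumption for illustration (it is the value derived in the early_fixmap_init post for VA_BITS = 39), and the lower-case function names are stand-ins.

```c
#include <stdint.h>

#define PAGE_SHIFT  12
/* Assumed for illustration only; the real value depends on the kernel config. */
#define FIXADDR_TOP 0xfffffffefea00000ull

/* Slot 0 sits at FIXADDR_TOP; each higher index lies one page lower. */
static uint64_t fix_to_virt(unsigned int idx)
{
	return FIXADDR_TOP - ((uint64_t)idx << PAGE_SHIFT);
}

/* Inverse mapping: which fixmap slot contains a given virtual address. */
static unsigned int virt_to_fix(uint64_t vaddr)
{
	uint64_t page = vaddr & ~((1ull << PAGE_SHIFT) - 1);

	return (unsigned int)((FIXADDR_TOP - page) >> PAGE_SHIFT);
}
```

Because the region grows downward from FIXADDR_TOP, the last permanent index corresponds to the lowest address, which is why FIXADDR_START is FIXADDR_TOP minus the size of the permanent slots.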

3. Fixmap

This initialization (needed mainly for the FDT) is performed during architecture-specific setup at kernel start, in early_fixmap_init. In essence,
it prepares the page-table levels that cover FIXADDR_START so that physical addresses can later be mapped there.
Here is the code:

void __init early_fixmap_init(void)
{
	pgd_t *pgdp;
	p4d_t *p4dp, p4d;
	pud_t *pudp;
	pmd_t *pmdp;
	unsigned long addr = FIXADDR_START;

	pgdp = pgd_offset_k(addr);
	p4dp = p4d_offset(pgdp, addr);
	p4d = READ_ONCE(*p4dp);
	if (CONFIG_PGTABLE_LEVELS > 3 &&
	    !(p4d_none(p4d) || p4d_page_paddr(p4d) == __pa_symbol(bm_pud))) {
		/*
		 * We only end up here if the kernel mapping and the fixmap
		 * share the top level pgd entry, which should only happen on
		 * 16k/4 levels configurations.
		 */
		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
		pudp = pud_offset_kimg(p4dp, addr);
	} else {
		if (p4d_none(p4d))
			__p4d_populate(p4dp, __pa_symbol(bm_pud), P4D_TYPE_TABLE);
		pudp = fixmap_pud(addr);
	}
	if (pud_none(READ_ONCE(*pudp)))
		__pud_populate(pudp, __pa_symbol(bm_pmd), PUD_TYPE_TABLE);
	pmdp = fixmap_pmd(addr);
	__pmd_populate(pmdp, __pa_symbol(bm_pte), PMD_TYPE_TABLE);

	/*
	 * The boot-ioremap range spans multiple pmds, for which
	 * we are not prepared:
	 */
	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));

	if ((pmdp != fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)))
	     || pmdp != fixmap_pmd(fix_to_virt(FIX_BTMAP_END))) {
		WARN_ON(1);
		pr_warn("pmdp %p != %p, %p\n",
			pmdp, fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)),
			fixmap_pmd(fix_to_virt(FIX_BTMAP_END)));
		pr_warn("fix_to_virt(FIX_BTMAP_BEGIN): %08lx\n",
			fix_to_virt(FIX_BTMAP_BEGIN));
		pr_warn("fix_to_virt(FIX_BTMAP_END):   %08lx\n",
			fix_to_virt(FIX_BTMAP_END));

		pr_warn("FIX_BTMAP_END:       %d\n", FIX_BTMAP_END);
		pr_warn("FIX_BTMAP_BEGIN:     %d\n", FIX_BTMAP_BEGIN);
	}
}

With 4 KB pages, 3 page-table levels and 39 VA bits, FIXADDR_START = 0xffffffbefe7fb000.
Splitting this address into page-table indices gives:

  1. The offset within the page is 0000 0000 0000
  2. L3 index is 1 1111 1011 [0x1FB]
  3. L2 index is 1 1111 0011 [0x1F3]
  4. L1 index is 0 1111 1011 [0xFB]

In other words, the function above writes the address of bm_pmd into
swapper_pg_dir[0xFB], and then writes the address of bm_pte into bm_pmd[0x1F3].
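
The split can be reproduced with a short sketch, assuming 4 KB pages and 9 index bits per level as in this configuration. The struct and function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT    12
#define BITS_PER_LVL  9   /* a 4 KB table holds 512 eight-byte entries */

struct va_indices {
	unsigned int l1;     /* pgd index, bits 38:30 */
	unsigned int l2;     /* pmd index, bits 29:21 */
	unsigned int l3;     /* pte index, bits 20:12 */
	unsigned int offset; /* byte offset inside the page, bits 11:0 */
};

/* Split a 39-bit kernel virtual address into its three table indices. */
static struct va_indices split_va(uint64_t va)
{
	const uint64_t mask = (1ull << BITS_PER_LVL) - 1;
	struct va_indices idx;

	idx.l1 = (unsigned int)((va >> (PAGE_SHIFT + 2 * BITS_PER_LVL)) & mask);
	idx.l2 = (unsigned int)((va >> (PAGE_SHIFT + BITS_PER_LVL)) & mask);
	idx.l3 = (unsigned int)((va >> PAGE_SHIFT) & mask);
	idx.offset = (unsigned int)(va & ((1ull << PAGE_SHIFT) - 1));
	return idx;
}
```

Calling split_va(0xffffffbefe7fb000) gives l1 = 0xFB, l2 = 0x1F3, l3 = 0x1FB and offset = 0, matching the list above.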

3.1 bm_pmd/bm_pte addresses:

static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss; //PTRS_PER_PTE = 1 << 9
static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused; //PTRS_PER_PMD = 1 << 9
static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused; //With 3 levels the pud level is folded, so this level is not used
System.map:

ffffffc012eca000 b bm_pud
ffffffc012ecb000 b bm_pmd
ffffffc012ecc000 b bm_pte
ffffffc012480000 R swapper_pg_dir
ffffffc012481000 R swapper_pg_end

Each of these tables is an array of u64 entries, so the address of an entry is base + index * sizeof(pmd_t), that is, base + index * 8 (each page-table entry occupies 8 bytes). The entry to check is therefore at 0xffffff80094a1000 + 0xFB * 8.

3.2 pgd offset calculation

The pgd entry address is the swapper base address plus the offset derived from FIXADDR_START.

//Definition of init_mm, the initial mm_struct (mm_rb is the root of its red-black tree); here we only care that pgd is swapper_pg_dir

struct mm_struct init_mm = {
	.mm_rb		= RB_ROOT,
	.pgd		= swapper_pg_dir,
	.mm_users	= ATOMIC_INIT(2),
	.mm_count	= ATOMIC_INIT(1),
	.write_protect_seq = SEQCNT_ZERO(init_mm.write_protect_seq),
	MMAP_LOCK_INITIALIZER(init_mm)
	.page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
	.arg_lock	=  __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
	.user_ns	= &init_user_ns,
	.cpu_bitmap	= CPU_BITS_NONE,
	INIT_MM_CONTEXT(init_mm)
};

#define pgd_offset_k(addr)	pgd_offset(&init_mm, addr) //Use init_mm
#define pgd_offset(mm, addr)	(pgd_offset_raw((mm)->pgd, (addr))) //Take the pgd member, i.e. swapper_pg_dir
#define pgd_offset_raw(pgd, addr)	((pgd) + pgd_index(addr)) //Add the index computed below
#define pgd_index(addr)		(((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)) //Index computed from FIXADDR_START = 0xffffffbefe7fb000
#define PGDIR_SHIFT		30 //Value for level = 3
#define PTRS_PER_PGD		(1 << (VA_BITS - PGDIR_SHIFT))

The calculation above gives pgd_index = 0xFB, the same value obtained when splitting the address earlier.
With swapper_pg_dir at 0xffffff80094a1000, the entry address is 0xffffff80094a1000 + 0xFB * 8 = 0xffffff80094a17d8.
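
The pointer arithmetic hidden in pgd_offset_raw can be made explicit with plain integers. This is a sketch under the assumptions of this section (PGDIR_SHIFT = 30, VA_BITS = 39, 8-byte entries); pgd_entry_addr is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

#define PGDIR_SHIFT   30
#define VA_BITS       39
#define PTRS_PER_PGD  (1u << (VA_BITS - PGDIR_SHIFT))
#define ENTRY_BYTES   8u /* sizeof(pgd_t): one 64-bit descriptor */

static uint64_t pgd_index(uint64_t addr)
{
	return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

/* Address of the pgd entry for addr, given the table base as a plain integer. */
static uint64_t pgd_entry_addr(uint64_t pgd_base, uint64_t addr)
{
	return pgd_base + pgd_index(addr) * ENTRY_BYTES;
}
```

In the kernel the multiplication by 8 is implicit, because adding an index to a pgd_t pointer scales by sizeof(pgd_t).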

3.3 pud offset calculation:


With 3 page-table levels the pud level is folded, so pud and pgd refer to the same entry.
static inline pud_t *fixmap_pud(unsigned long addr)
{
	pgd_t *pgdp = pgd_offset_k(addr); //First get the pgd virtual address
	p4d_t *p4dp = p4d_offset(pgdp, addr);
	p4d_t p4d = READ_ONCE(*p4dp);

	BUG_ON(p4d_none(p4d) || p4d_bad(p4d));

	return pud_offset_kimg(p4dp, addr); //Calculate pud position
}
#define pud_offset_kimg(dir,addr)	((pud_t *)dir