Thursday, 21 April 2022

Linux Secondary Core Bootup on AArch64

 The boot flow diagram 

  



CPU description in Device tree:

reg: On ARMv8 64-bit systems this property is required and must match the MPIDR_EL1 register affinity bits.

* If cpus node's #address-cells property is set to 2
The first reg cell bits [7:0] must be set to bits [39:32] of MPIDR_EL1.
The second reg cell bits [23:0] must be set to bits [23:0] of MPIDR_EL1.

* If cpus node's #address-cells property is set to 1
The reg cell bits [23:0] must be set to bits [23:0] of MPIDR_EL1.

All other bits in the reg cells must be set to 0.
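The reg encoding rules above can be sketched in C. This is only an illustrative model, not kernel or bootloader code: the helper name mpidr_to_reg_cells and the mask macros are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MPIDR_AFF_2_0_MASK 0x00ffffffULL /* Aff2:Aff1:Aff0, bits [23:0] */
#define MPIDR_AFF3_SHIFT   32            /* Aff3 occupies bits [39:32] */
#define MPIDR_AFF3_MASK    0xffULL

/*
 * Hypothetical helper: derive the reg cells of a cpu node from MPIDR_EL1.
 * Returns the number of cells written, or 0 if address_cells is not 1 or 2.
 */
int mpidr_to_reg_cells(uint64_t mpidr, int address_cells, uint32_t cells[2])
{
    uint32_t low = (uint32_t)(mpidr & MPIDR_AFF_2_0_MASK);
    uint32_t aff3 = (uint32_t)((mpidr >> MPIDR_AFF3_SHIFT) & MPIDR_AFF3_MASK);

    assert(cells != NULL);

    if (address_cells == 2) {
        cells[0] = aff3; /* first cell: bits [7:0] = MPIDR_EL1[39:32] */
        cells[1] = low;  /* second cell: bits [23:0] = MPIDR_EL1[23:0] */
        return 2;
    }
    if (address_cells == 1) {
        cells[0] = low;  /* single cell: bits [23:0] = MPIDR_EL1[23:0] */
        return 1;
    }
    return 0;
}
```

All bits outside the affinity fields are masked off, which matches the rule that all other bits in the reg cells must be 0.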
compatible:
    enum:
      - arm,cortex-a78
enable-method:
# On ARM v8 64-bit this property is required
      - enum:
          - psci
          - spin-table
cpu-release-addr:
    description:
     The DT specification defines this as 64-bit always, but some 32-bit Arm
     systems have used a 32-bit value which must be supported.
     Required for systems that have an "enable-method"
     property value of "spin-table".
cpu-idle-states:
    items:
      maxItems: 1
    description: |
      List of phandles to idle state nodes supported by this cpu.

capacity-dmips-mhz:
    description:
      u32 value representing CPU capacity in DMIPS/MHz, relative to highest capacity-dmips-mhz in the system.

cci-control-port: true

dynamic-power-coefficient:
    description:
      A u32 value that represents the running time dynamic power coefficient in units of uW/MHz/V^2. The coefficient can either be calculated from power measurements or derived by analysis.

      The dynamic power consumption of the CPU is proportional to the square of the voltage (V) and the clock frequency (f). The coefficient is used to calculate the dynamic power as follows:

      Pdyn = dynamic-power-coefficient * V^2 * f

      where voltage is in V, frequency is in MHz.
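
As a worked example of the formula above, here is a minimal C sketch. The function name dynamic_power_uw is an assumption for illustration only; it takes the voltage in V and the frequency in MHz, as the binding states, and returns the result in uW.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: Pdyn = dynamic-power-coefficient * V^2 * f.
 * coefficient is in uW/MHz/V^2, volts in V, mhz in MHz; result in uW.
 */
double dynamic_power_uw(uint32_t coefficient, double volts, double mhz)
{
    assert(volts >= 0.0 && mhz >= 0.0);
    return (double)coefficient * volts * volts * mhz;
}
```

With a coefficient of 100, a CPU at 1.0 V and 1000 MHz would dissipate 100000 uW (100 mW) of dynamic power; halving the voltage at the same frequency cuts that to a quarter.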

performance-domains:
    maxItems: 1
    description:
      List of phandles and performance domain specifiers, as defined by
      bindings of the performance domain provider. See also
      dvfs/performance-domain.yaml.

power-domains:
    description:
      List of phandles and PM domain specifiers, as defined by bindings of the
      PM domain provider (see also ../power_domain.txt).

power-domain-names:
    description:
      A list of power domain name strings sorted in the same order as the
      power-domains property.

      For PSCI based platforms, the name corresponding to the index of the PSCI
      PM domain provider, must be "psci".

qcom,saw:
    
    description: 
      Specifies the SAW* node associated with this CPU.

      Required for systems that have an "enable-method" property
      value of "qcom,kpss-acc-v1" or "qcom,kpss-acc-v2"

      * arm/msm/qcom,saw2.txt

qcom,acc:
    
    description: 
      Specifies the ACC* node associated with this CPU.

      Required for systems that have an "enable-method" property
      value of "qcom,kpss-acc-v1", "qcom,kpss-acc-v2", "qcom,msm8226-smp" or
      "qcom,msm8916-smp".
secondary-boot-reg:
   
    description: 
      Required for systems that have an "enable-method" property value of
      "brcm,bcm11351-cpu-method", "brcm,bcm23550" or "brcm,bcm-nsp-smp".
      This binding defines the enable method used for starting secondary
      CPUs in the following Broadcom SoCs:
      BCM11130, BCM11140, BCM11351, BCM28145, BCM28155, BCM21664, BCM23550
      BCM58522, BCM58525, BCM58535, BCM58622, BCM58623, BCM58625, BCM88312

      The secondary-boot-reg property is a u32 value that specifies the
      physical address of the register used to request the ROM holding pen
      code release a secondary CPU. The value written to the register is
      formed by encoding the target CPU id into the low bits of the
      physical start address it should jump to.


Example:
cpu@1 {
        device_type = "cpu";
        compatible = "arm,cortex-a78";
        reg = <0x0 0x1>;
        enable-method = "spin-table";
        cpu-release-addr = <0 0x20000000>;
};


1. The bootloader is in the upper part of the picture and starts first;

2. The kernel is in the lower part of the picture and is booted by the bootloader;

3. The execution flow of CPU0 is in the left half of the picture; the bootloader code identifies CPU0 and starts it first;

4. The secondary CPUs are in the right half of the picture and are woken up by CPU0.

The specific startup process is as follows:

1. When the bootloader/ABL starts, it checks whether the code is running on CPU0. If not, the CPU executes wfe and waits for CPU0 to issue a sev instruction to wake it up. If it is CPU0, it continues with the initialization work.

         mrs x4, mpidr_el1
         tst x4, #15        // test whether this is CPU0, i.e. the low 4 bits of mpidr_el1 are 0
         b.eq 2f            // CPU0 skips the holding loop

/*
 * Secondary CPUs
 */
1: wfe
ldr x4, mbox
cbz x4, 1b        // if x4 == 0 (i.e. the value at the mbox address is 0), keep waiting
br x4             // otherwise branch to the given address

2: ……             // UART initialisation (38400 8N1)

The address of mbox above is set in the Makefile to 0x20000000, and the memory at this address initially contains all zeros. The code above checks the content at the mbox address: if it is 0, the CPU keeps looping; if it is not 0, the CPU branches directly to the address stored there.

2. In the dts, set cpu-release-addr to 0x20000000. That is, once a value such as address A is written to that location and a sev instruction is issued, the secondary CPU wakes up and jumps to address A.

     cpu-release-addr = <0 0x20000000>;
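
The mailbox handshake from steps 1 and 2 can be modelled in plain C. This is a simplified single-threaded sketch under stated assumptions: secondary_wait and primary_release are hypothetical names, the poll count is bounded so the model terminates, and the wfe/sev instructions are only represented by comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Secondary side: poll the mailbox at most max_polls times.
 * Returns the entry address once it is non-zero, or 0 if still parked.
 */
uint64_t secondary_wait(const volatile uint64_t *mbox, unsigned int max_polls)
{
    assert(mbox != NULL);

    while (max_polls-- > 0) {
        uint64_t entry = *mbox;   /* ldr x4, mbox */

        if (entry != 0)
            return entry;         /* the real code does "br x4" here */
        /* the real loop executes wfe here and sleeps until an event */
    }
    return 0;
}

/* Primary side: publish the entry address; real code follows this with sev. */
void primary_release(volatile uint64_t *mbox, uint64_t entry)
{
    assert(mbox != NULL);
    *mbox = entry;
}
```

In this model, a secondary that polls before primary_release is called sees 0 and stays parked; after the release it reads back the entry address it should jump to.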

3. The smp_prepare_cpus function in the kernel writes to address 0x20000000; the value written is the address of the function secondary_holding_pen.

 When the secondary CPU executes secondary_holding_pen, it reads its own CPU ID and compares it with the secondary_holding_pen_release variable. If they are equal, it performs further initialization; otherwise it executes wfe and waits.

The secondary_holding_pen_release variable is modified by CPU0 from the smp_init() function. This function first creates an idle thread for the corresponding CPU, then sets secondary_holding_pen_release to the ID of the CPU that CPU0 wants to wake up, and finally issues the sev instruction to wake up that CPU, which goes on to run the idle thread.

ENTRY(secondary_holding_pen)
	bl	el2_setup			// Drop to EL1, w0=cpu_boot_mode
	bl	set_cpu_boot_mode_flag
	mrs	x0, mpidr_el1
	mov_q	x1, MPIDR_HWID_BITMASK
	and	x0, x0, x1
	adr_l	x3, secondary_holding_pen_release
pen:	ldr	x4, [x3]
	cmp	x4, x0
	b.eq	secondary_startup
	wfe
	b	pen
ENDPROC(secondary_holding_pen)
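
The comparison performed in the pen loop can be expressed as a small C predicate. This is a sketch, not kernel code: pen_should_boot is a hypothetical name, and the mask value 0xff00ffffff (affinity bits [39:32] and [23:0]) is what I understand MPIDR_HWID_BITMASK to be on arm64, so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: keeps Aff3 (bits [39:32]) and Aff2..Aff0 (bits [23:0]). */
#define MPIDR_HWID_BITMASK 0xff00ffffffULL

/*
 * Hypothetical model of one pass through the "pen" loop: returns 1 if this
 * CPU should branch to secondary_startup, 0 if it should wfe and retry.
 */
int pen_should_boot(uint64_t mpidr, const volatile uint64_t *pen_release)
{
    assert(pen_release != NULL);
    return (mpidr & MPIDR_HWID_BITMASK) == *pen_release;
}
```

Because the masked MPIDR is compared against a single shared variable, only the one CPU whose hardware ID was written by the primary core leaves the pen; all others go back to wfe.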
  
4. The primary core goes through head.S and most of the functions in start_kernel. In rest_init it spawns a kernel thread, which eventually runs kernel_init -> kernel_init_freeable -> smp_init.
  • smp_init calls idle_threads_init to fork a swapper (idle) task for each CPU. These tasks share the same PID but have different thread_info and task_struct, as shown below.
void __init smp_init(void)
{
	unsigned int cpu;

	idle_threads_init();
	cpuhp_threads_init();

	/* FIXME: This should be done in userspace --RR */
	for_each_present_cpu(cpu) {
		if (num_online_cpus() >= setup_max_cpus)
			break;
		if (!cpu_online(cpu))
			cpu_up(cpu);
	}

	/* Any cleanup work */
	smp_announce();
	smp_cpus_done(setup_max_cpus);
}


void __init idle_threads_init(void)
{
	unsigned int cpu, boot_cpu;

	boot_cpu = smp_processor_id();

	for_each_possible_cpu(cpu) {
		if (cpu != boot_cpu)
			idle_init(cpu);
	}
}


static inline void idle_init(unsigned int cpu)
{
	struct task_struct *tsk = per_cpu(idle_threads, cpu);

	if (!tsk) {
		tsk = fork_idle(cpu);
		if (IS_ERR(tsk))
			pr_err("SMP: fork_idle() failed for CPU %u\n", cpu);
		else
			per_cpu(idle_threads, cpu) = tsk;
	}
}
static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
{
	......
	target = min((int)target, CPUHP_BRINGUP_CPU);
	ret = cpuhp_up_callbacks(cpu, st, target);
out:
	cpu_hotplug_done();
	return ret;
}

static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
			      enum cpuhp_state target)
{
	enum cpuhp_state prev_state = st->state;
	int ret = 0;

	while (st->state < target) {
		st->state++;
		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL);
		if (ret) {
			st->target = prev_state;
			undo_cpu_up(cpu, st);
			break;
		}
	}
	return ret;
}


static struct cpuhp_step cpuhp_bp_states[] = {
	[CPUHP_BRINGUP_CPU] = {
		.name			= "cpu:bringup",
		.startup.single		= bringup_cpu,
		.teardown.single	= NULL,
		.cant_stop		= true,
	},
};

static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
				 bool bringup, struct hlist_node *node)
{
	......
	if (!step->multi_instance) {
		cb = bringup ? step->startup.single : step->teardown.single;
		ret = cb(cpu);
	}
	......
}

The call chain is _cpu_up -> cpuhp_up_callbacks -> cpuhp_invoke_callback, which calls the startup.single callback of the cpuhp_bp_states entry; for CPUHP_BRINGUP_CPU this points to bringup_cpu. From there, bringup_cpu -> __cpu_up -> boot_secondary -> cpu_ops[cpu]->cpu_boot, which for the spin-table method is smp_spin_table_cpu_boot -> write_pen_release.
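
The state walk in cpuhp_up_callbacks, including the rollback on failure, can be modelled in a few lines of standalone C. All names here (hp_step, hp_cpu_state, up_callbacks, demo_bringup and the demo callbacks) are hypothetical, and the model uses only four states; it is a sketch of the control flow, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

#define NSTATES 4 /* hypothetical: state 0 = offline, state 3 = brought up */

struct hp_step {
    int (*startup)(unsigned int cpu);
    int (*teardown)(unsigned int cpu);
};

struct hp_cpu_state {
    int state;
    int target;
};

/* Roll back: tear down every state between the failed one and the target. */
static void undo_up(unsigned int cpu, struct hp_cpu_state *st,
                    const struct hp_step *steps)
{
    for (st->state--; st->state > st->target; st->state--)
        if (steps[st->state].teardown)
            steps[st->state].teardown(cpu);
}

/* Step the CPU up one state at a time; on failure return to the old state. */
int up_callbacks(unsigned int cpu, struct hp_cpu_state *st, int target,
                 const struct hp_step *steps)
{
    int prev_state;
    int ret = 0;

    assert(st != NULL && steps != NULL);
    prev_state = st->state;

    while (st->state < target) {
        st->state++;
        if (steps[st->state].startup)
            ret = steps[st->state].startup(cpu);
        if (ret) {
            st->target = prev_state;
            undo_up(cpu, st, steps);
            break;
        }
    }
    return ret;
}

/* Demo callbacks used only by this example. */
static int torn_down;
static int step_ok(unsigned int cpu)   { (void)cpu; return 0; }
static int step_fail(unsigned int cpu) { (void)cpu; return -1; }
static int step_undo(unsigned int cpu) { (void)cpu; torn_down++; return 0; }

/* Bring a CPU from state 0 to state 3; step fail_at fails (0 = none fails). */
int demo_bringup(int fail_at)
{
    struct hp_step steps[NSTATES] = { { NULL, NULL } };
    struct hp_cpu_state st = { 0, 0 };
    int i;

    torn_down = 0;
    for (i = 1; i < NSTATES; i++) {
        steps[i].startup = (i == fail_at) ? step_fail : step_ok;
        steps[i].teardown = step_undo;
    }
    up_callbacks(1, &st, NSTATES - 1, steps);
    return st.state;
}

int demo_torn_down(void)
{
    return torn_down;
}
```

If the last step fails, the two steps that already succeeded are torn down in reverse order and the CPU ends up back in its original state, which mirrors what the kernel loop does with undo_cpu_up.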

static void write_pen_release(u64 val)
{
	void *start = (void *)&secondary_holding_pen_release;
	unsigned long size = sizeof(secondary_holding_pen_release);

	secondary_holding_pen_release = val;
	__flush_dcache_area(start, size);
}



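write_pen_release updates the variable secondary_holding_pen_release and flushes it from the data cache so that the secondary core, which is still running with its MMU and caches off, can see the new value. smp_spin_table_cpu_boot then issues sev, the secondary core breaks out of the loop in secondary_holding_pen, and the boot process can finally move on.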

Modern age: jump to C world

secondary_startup:
	/*
	 * Common entry point for secondary CPUs.
	 */
	bl	__cpu_setup			// initialise processor
	bl	__enable_mmu
	ldr	x8, =__secondary_switched
	br	x8
ENDPROC(secondary_startup)

__secondary_switched:
	adr_l	x5, vectors
	msr	vbar_el1, x5
	isb

	adr_l	x0, secondary_data
	ldr	x0, [x0, #CPU_BOOT_STACK]	// get secondary_data.stack
	mov	sp, x0
	and	x0, x0, #~(THREAD_SIZE - 1)
	msr	sp_el0, x0			// save thread_info
	mov	x29, #0
	b	secondary_start_kernel
ENDPROC(__secondary_switched)

// in arch/arm64/kernel/smp.c
asmlinkage void secondary_start_kernel(void) {}

secondary_holding_pen -> secondary_startup -> __secondary_switched -> secondary_start_kernel, which is in arch/arm64/kernel/smp.c. The secondary bootup is done. To summarize, a figure from wowotech explains the overall secondary core bootup steps.

[Figure: overall secondary core bootup steps]


References:
https://www.kernel.org/doc/Documentation/devicetree/bindings/arm/cpus.yaml
https://devicetree-specification.readthedocs.io/en/latest/chapter3-devicenodes.html
https://blog.actorsfit.com/a?ID=00450-670b3808-c2a6-411c-9ad1-dae93038ab9a
https://wenboshen.org/posts/2016-12-21-secondary-bootup.html


Saturday, 26 March 2022

Introduction to IRQ Domain

1. Overview

In the Linux kernel, we use the following two IDs to identify an interrupt from a peripheral:

1. IRQ number. The CPU needs to number each peripheral interrupt; we call this the IRQ number. It is a virtual interrupt ID that is independent of the hardware and is only used by the CPU to identify a peripheral interrupt.

2. HW interrupt ID. The interrupt controller collects the interrupt request lines of multiple peripherals and passes them upward, so it needs to number the peripheral interrupts. The interrupt controller uses the HW interrupt ID to identify peripheral interrupts. When interrupt controllers are cascaded, the HW interrupt ID alone cannot uniquely identify a peripheral interrupt; you also need to know which interrupt controller the HW interrupt ID belongs to (the same HW interrupt ID values are reused on different interrupt controllers).

So the CPU and the interrupt controller identify interrupts differently. Driver engineers share the CPU's perspective: we only want an IRQ number and do not care about the HW interrupt ID on any specific interrupt controller. An advantage of this is that driver software does not need to be modified when the interrupt-related hardware changes. Therefore, the interrupt subsystem in the Linux kernel needs to provide a mechanism to map HW interrupt IDs to IRQ numbers, which is the main subject of this article.

 

2. History

Mapping the HW interrupt ID to the IRQ number used to be very simple, back when a system had only one interrupt controller. The HW interrupt line number on the interrupt controller could be used directly as the IRQ number. For example, most interrupt controllers embedded in an SoC have an interrupt status register, which may have 64 bits (or more). Each bit corresponds to an IRQ number, so the mapping is direct. In such a system the GPIO interrupt has only one bit in the status register of the interrupt controller, so all GPIO interrupts share a single IRQ number, and the handler for the shared GPIO interrupt demultiplexes it, mapping each specific GPIO interrupt to its own IRQ number. If you have been an engineer for long enough, you will have gone through this stage.

As the Linux kernel developed, the idea of abstracting the interrupt controller as an irqchip became more and more popular, and even the GPIO controller can be treated as an interrupt controller chip. In that case there are at least two interrupt controllers in the system: one interrupt controller in the traditional sense and one of the GPIO controller type. As systems grow more complex and the number of peripheral interrupts increases, a system may in fact need multiple cascaded interrupt controllers. How should Linux kernel engineers respond to this trend? The answer is the concept of the irq domain.

We have heard of many kinds of domain: power domain, clock domain, and so on. A domain is a scope; a definition is meaningless outside the scope it belongs to. All the interrupt controllers in the system form a tree. Each interrupt controller can be connected to the interrupt requests of several peripherals (we call them interrupt sources), and it numbers the interrupt sources connected to it according to their physical characteristics within the controller (this number is the HW interrupt ID). But this number is only valid within the scope of that interrupt controller.

 

3. Interface

1. Register the irq domain with the system

How the mapping is done is the interrupt controller's own business. However, engineers with a software architecture mindset prefer to abstract the various interrupt controllers, and further abstract how the HW interrupt ID is mapped to the IRQ number. Therefore, the generic interrupt handling module contains an irq domain sub-module, which divides this mapping into three categories:

(1) Linear mapping. This is simply a lookup table: the HW interrupt ID is used as the index, and the corresponding IRQ number is obtained by looking it up in the table. For a linear map, the interrupt controller must meet certain conditions when numbering its HW interrupt IDs: the IDs cannot be too large, and they should preferably be densely packed. The interface API for linear mapping is as follows:

static inline struct irq_domain *irq_domain_add_linear(struct device_node *of_node,
                     unsigned int size,                /* how many IRQs the interrupt domain supports */
                     const struct irq_domain_ops *ops, /* callback functions */
                     void *host_data)                  /* driver private data */
{
    return __irq_domain_add(of_node, size, size, 0, ops, host_data);
}
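
To make the lookup-table idea concrete, here is a small userspace model of a linear reverse map. It is a sketch only: struct linear_domain and the linear_domain_* functions are hypothetical names, and the value 0 is assumed to mean "no mapping".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct linear_domain {
    unsigned int size;     /* number of HW interrupt IDs covered */
    unsigned int revmap[]; /* index: HW interrupt ID, value: IRQ number */
};

struct linear_domain *linear_domain_create(unsigned int size)
{
    /* calloc zeroes the table, so every entry starts out unmapped */
    struct linear_domain *d = calloc(1, sizeof(*d) + size * sizeof(d->revmap[0]));

    if (d)
        d->size = size;
    return d;
}

/* Record hwirq -> irq; fails if hwirq does not fit in the table. */
int linear_domain_map(struct linear_domain *d, unsigned long hwirq,
                      unsigned int irq)
{
    assert(d != NULL);
    if (hwirq >= d->size)
        return -1;
    d->revmap[hwirq] = irq;
    return 0;
}

/* Look up the IRQ number for hwirq; 0 means no mapping exists. */
unsigned int linear_domain_find(const struct linear_domain *d,
                                unsigned long hwirq)
{
    assert(d != NULL);
    return hwirq < d->size ? d->revmap[hwirq] : 0;
}
```

The table costs one entry per possible HW interrupt ID, which is why this scheme only suits controllers whose IDs are small and densely packed.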

(2) Radix Tree map. A radix tree maintains the mapping between HW interrupt ID and IRQ number. The HW interrupt ID is used as the lookup key, and the IRQ number is retrieved from the radix tree. If the conditions for linear mapping really cannot be met, the Radix Tree map can be considered. In fact, only PowerPC and MIPS hardware platforms use the Radix Tree map in the kernel. The interface API for the Radix Tree map is as follows:

static inline struct irq_domain *irq_domain_add_tree(struct device_node *of_node,
                     const struct irq_domain_ops *ops,
                     void *host_data)
{
    return __irq_domain_add(of_node, 0, ~0, 0, ops, host_data);
}

(3) No map. Some interrupt controllers are powerful enough that the HW interrupt ID can be configured through a register rather than being determined by the physical connection, for example the MPIC (Multi-Processor Interrupt Controller) used by PowerPC systems. In this case we can write the IRQ number directly into the HW interrupt ID configuration register, so the generated HW interrupt ID is the IRQ number and no mapping is required. The interface API for this type is as follows:

static inline struct irq_domain *irq_domain_add_nomap(struct device_node *of_node,
                     unsigned int max_irq,
                     const struct irq_domain_ops *ops,
                     void *host_data)
{
    return __irq_domain_add(of_node, 0, max_irq, max_irq, ops, host_data);
}

The logic of these interfaces is very simple: according to the mapping type, they initialize the members of struct irq_domain and call __irq_domain_add to add the irq domain to the global list irq_domain_list.

2. Create a mapping for the irq domain

The previous section only registers an irq domain with the system; the mapping between specific HW interrupt IDs and IRQ numbers is still empty. The database that each irq domain uses to manage its mapping therefore still has to be built. For example, for a linear-mapping irq domain we need to fill in the lookup table, and for the Radix Tree map we need to build the radix tree that relates IRQ numbers to HW interrupt IDs. There are four interface functions for creating a mapping:

(1) Call the irq_create_mapping function to establish the mapping relationship between HW interrupt ID and IRQ number. This interface function takes irq domain and HW interrupt ID as parameters, and returns the IRQ number (this IRQ number is dynamically allocated). The prototype of this function is defined as follows:

extern unsigned int irq_create_mapping(struct irq_domain *host,
                       irq_hw_number_t hwirq);

The driver must provide the HW interrupt ID when calling this function, which means that the driver knows the HW interrupt ID it is using. In general, the HW interrupt ID should not be visible to the specific driver, but some scenarios are special. For example, a GPIO type interrupt has a specific relationship between its HW interrupt ID and GPIO. The driver knows which GPIO it uses, that is, which HW interrupt ID to use.

(2) irq_create_strict_mappings. This interface function is used to map a set of HW interrupt IDs. The prototype of the specific function is defined as follows:

extern int irq_create_strict_mappings(struct irq_domain *domain,
                      unsigned int irq_base,
                      irq_hw_number_t hwirq_base, int count);

(3) irq_create_of_mapping. From the "of" (Open Firmware) in the function name, you can probably guess that this interface uses the device tree to establish the mapping. The prototype of the function is as follows:

extern unsigned int irq_create_of_mapping(struct of_phandle_args *irq_data);

Usually, the device tree node of a device already describes enough interrupt information. In this case, the device driver can call the interface function irq_of_parse_and_map during initialization to parse the interrupt-related content of the device node (the interrupts and interrupt-parent properties) and establish the mapping. The code is as follows:

unsigned int irq_of_parse_and_map(struct device_node *dev, int index)
{
    struct of_phandle_args oirq;

    /* Parse the interrupt-related properties in the device node */
    if (of_irq_parse_one(dev, index, &oirq))
        return 0;

    /* Create the mapping and return the corresponding IRQ number */
    return irq_create_of_mapping(&oirq);
}

For a normal driver using the Device Tree (which we recommend), initialization basically needs to call irq_of_parse_and_map to get the IRQ number, and then call request_threaded_irq to register the interrupt handler.

(4) irq_create_direct_mapping. This is used for the type of interrupt controller with no map, so I won't repeat it here.

 

4. Data structure description

1. Callback interface of irq domain

struct irq_domain_ops abstracts an irq domain callback function, defined as follows:

struct irq_domain_ops {
    int (*match)(struct irq_domain *d, struct device_node *node);
    int (*map)(struct irq_domain *d, unsigned int virq, irq_hw_number_t hw);
    void (*unmap)(struct irq_domain *d, unsigned int virq);
    int (*xlate)(struct irq_domain *d, struct device_node *node,
             const u32 *intspec, unsigned int intsize,
             unsigned long *out_hwirq, unsigned int *out_type);
};

Let's look at the xlate function first. The name stands for translate, so what exactly is translated? In the DTS file, each device node that uses interrupts provides interrupt information to the kernel through properties (such as interrupts and interrupt-parent) so that the kernel can initialize the driver correctly. The interrupt specifier given by the interrupts property can only be interpreted by the specific interrupt controller (that is, the irq domain). The xlate function translates the interrupt specifier cells (intspec parameter, intsize cells long) of the specified device (node parameter) into a HW interrupt ID (out_hwirq parameter) and a trigger type (out_type parameter).

match determines whether a specified interrupt controller (node parameter) matches an irq domain (d parameter), and returns 1 if so. This callback is rarely defined in the kernel. struct irq_domain has an of_node member that points to the device node of the corresponding interrupt controller, so if match is not provided, the default matching simply checks whether the of_node member of the irq domain equals the node parameter passed in.

map and unmap are inverse operations, so describing one of them is enough. The map function is called when the relationship between a HW interrupt ID (hw parameter) and an IRQ number (virq parameter) is created (or updated). For an interrupt to get from the moment it occurs to the driver's handler, a call to request_threaded_irq alone is not enough; the following must also be set up for the IRQ number:
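
As an illustration of what an xlate callback does, the following standalone sketch translates a two-cell interrupt specifier. It assumes a hypothetical controller whose first cell is the HW interrupt ID and whose second cell carries the trigger type in its low four bits; xlate_twocell and SENSE_MASK are names invented for this example, and the struct irq_domain and device_node arguments of the real callback are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SENSE_MASK 0xfu /* assumed: low four bits carry the trigger type */

/*
 * Hypothetical two-cell translation: intspec[0] is the HW interrupt ID,
 * intspec[1] holds the trigger type. Returns 0 on success, -1 on error.
 */
int xlate_twocell(const uint32_t *intspec, unsigned int intsize,
                  unsigned long *out_hwirq, unsigned int *out_type)
{
    assert(intspec != NULL && out_hwirq != NULL && out_type != NULL);

    if (intsize < 2)
        return -1; /* specifier too short for this controller */

    *out_hwirq = intspec[0];
    *out_type = intspec[1] & SENSE_MASK;
    return 0;
}
```

For a device node with interrupts = <42 4> under such a controller, the sketch would yield HW interrupt ID 42 and trigger type 4. Real controllers define their own cell layout, which is exactly why the translation is delegated to the irq domain.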

(1) Set the irq chip of the interrupt descriptor (struct irq_desc) corresponding to the IRQ number

(2) Set the highlevel irq-events handler of the interrupt descriptor corresponding to the IRQ number

(3) Set the irq chip data of the interrupt descriptor corresponding to the IRQ number

These settings should not be made by individual hardware drivers, so they are made by the interrupt controller driver, in the callback functions of the irq domain.

2. irq domain

In the kernel, the concept of irq domain is represented by struct irq_domain:

struct irq_domain {
    struct list_head link;
    const char *name;
    const struct irq_domain_ops *ops;     /* callback functions */
    void *host_data;

    /* Optional data */
    struct device_node *of_node;          /* device node of the interrupt controller corresponding to the interrupt domain */
    struct irq_domain_chip_generic *gc;   /* the generic irq chip concept is not described in this article */

    /* reverse map data. The linear map gets appended to the irq_domain */
    irq_hw_number_t hwirq_max;            /* the largest HW interrupt ID in the domain */
    unsigned int revmap_direct_max_irq;
    unsigned int revmap_size;             /* size of the linear map; 0 for Radix Tree map and no map */
    struct radix_tree_root revmap_tree;   /* radix tree root node used by the Radix Tree map */
    unsigned int linear_revmap[];         /* lookup table used by the linear map */
};

In the Linux kernel, all irq domains are linked to a global linked list, and the linked list header is defined as follows:

static LIST_HEAD(irq_domain_list);

The link member in struct irq_domain is the node that attaches the domain to this list. Starting from irq_domain_list, the mapping DB of HW interrupt IDs and IRQ numbers for the whole system can be reached. host_data holds the private data of the underlying interrupt controller, which depends on the specific interrupt controller (for the GIC, the pointer points to a struct gic_chip_data).

For linear mapping:

(1) linear_revmap saves a linear lookup table, the index is the HW interrupt ID, and the IRQ number value is saved in the table

(2) revmap_size is equal to the size of the linear lookup table.

(3) hwirq_max saves the largest HW interrupt ID

(4) revmap_direct_max_irq is not used and is set to 0. revmap_tree is not used.

For Radix Tree map:

(1) linear_revmap is not used, and revmap_size is equal to 0.

(2) hwirq_max is not used and is set to the maximum value.

(3) revmap_direct_max_irq is not used and is set to 0.

(4) revmap_tree points to the root node of the Radix tree.

 

5. Interrupt-related Device Tree knowledge review

To create the mapping, we first need to understand the topology of the interrupt controllers. The topology of the interrupt controllers in the system and the allocation of their interrupt request lines (which peripheral each line is assigned to) are described in the Device Tree Source file through the following properties. They were described in the three Device Tree articles; here is a brief summary:

For those peripherals that generate interrupts, we need to define the interrupt-parent and interrupts properties:

(1) interrupt-parent. Indicates which interrupt controller the peripheral's interrupt request line is physically connected to

(2) interrupts. This property describes the details of the interrupt generated by the peripheral (the so-called interrupt specifier), for example the HW interrupt ID (interpreted by the interrupt controller that interrupt-parent points to in the peripheral's device node), the interrupt trigger type, and so on.

For the Interrupt controller, we need to define the properties of interrupt-controller and #interrupt-cells:

(1) interrupt-controller. Indicates that the device node is an interrupt controller

(2) #interrupt-cells. How many cells are used by the interrupt controller (a cell is a 32-bit unit) to describe the interrupt request line of a peripheral. The specific meaning of each cell is defined by the interrupt controller itself.

(3) interrupts and interrupt-parent. For those interrupt controllers that are not root, they are also connected to other interrupt controllers as an interrupt-generating peripheral, so the properties of interrupts and interrupt-parent also need to be defined.

 

6. Establishment of Mapping DB

1. Overview

The mapping DB of HW interrupt ID and IRQ number in the system is established during the whole system initialization process. The process is as follows:

(1) The DTS file describes the topology of the interrupt controllers and peripheral IRQs in the system. When the Linux kernel starts, the bootloader passes it to the kernel (what is actually passed is the DTB).

(2) When the Device Tree is initialized, a tree of all device nodes in the system is built, which of course includes all data related to the interrupt topology (all interrupt controller nodes and all peripheral nodes that use interrupts).

(3) When the machine driver is initialized, the of_irq_init function is called. This function scans all interrupt controller nodes and calls the matching interrupt controller driver to initialize each one. Initialization must of course follow an order: first the root, then the first level, then the second level, and finally the leaf nodes. During initialization, the interface functions described above are generally called to add the irq domain to the system. Some interrupt controllers also create mappings while their drivers initialize.

(4) During the initialization of each driver, a mapping is created.

 

2. During the initialization of the interrupt controller, register the irq domain

Let's take the code of GIC as an example. The specific code is in gic_of_init->gic_init_bases, as follows:

void __init gic_init_bases(unsigned int gic_nr, int irq_start,
               void __iomem *dist_base, void __iomem *cpu_base,
               u32 percpu_offset, struct device_node *node)
{
    irq_hw_number_t hwirq_base;
    struct gic_chip_data *gic;
    int gic_irqs, irq_base, i;

    ...
    /*
     * For the root GIC: HW interrupt IDs 0 to 15 are used for IPIs, so
     * they are excluded and hwirq_base starts at 16.
     * gic_irqs = number of all interrupts supported by the system - 16.
     */
    hwirq_base = 16;

    /*
     * Allocate gic_irqs IRQ descriptors, searching from IRQ number 16.
     * Since this is the root GIC, the allocated IRQ numbers will
     * basically start from 16.
     */
    irq_base = irq_alloc_descs(irq_start, 16, gic_irqs, numa_node_id());

    /* Register the irq domain with the system and create the mapping */
    gic->domain = irq_domain_add_legacy(node, gic_irqs, irq_base,
                    hwirq_base, &gic_irq_domain_ops, gic);

    ...
}

Unfortunately, the GIC code does not call the standard interface function for registering the irq domain. To understand why, we need to go back in time. In older Linux kernels, the ARM architecture code was less than ideal. The arch/arm directory was filled with board-specific code that defined many static tables for specific devices. These tables specified the resources used by each device, including IRQ resources. In that case the IRQ of each peripheral is fixed (if you have been a driver programmer for long enough, you will remember the long lists of macro definitions for IRQ numbers), that is, the relationship between HW interrupt ID and IRQ number is fixed. Once the relationship is fixed, the mappings can be created in the interrupt controller code. The code is as follows:

struct irq_domain *irq_domain_add_legacy(struct device_node *of_node,
                     unsigned int size,
                     unsigned int first_irq,
                     irq_hw_number_t first_hwirq,
                     const struct irq_domain_ops *ops,
                     void *host_data)
{
    struct irq_domain *domain;

    /* Register the irq domain */
    domain = __irq_domain_add(of_node, first_hwirq + size,
                  first_hwirq + size, 0, ops, host_data);
    if (!domain)
        return NULL;

    /* Create the mapping */
    irq_domain_associate_many(domain, first_irq, first_hwirq, size);

    return domain;
}

At this time, for this version of the GIC driver, after initialization, the mapping relationship between HW interrupt ID and IRQ number has been established and stored in the linear lookup table. The size is equal to the number of interrupts supported by the GIC, as follows:

The IRQ corresponding to index 0~15 is invalid

IRQ No. 16 <----------------->HW interrupt ID No. 16

IRQ No. 17 <----------------->HW interrupt ID No. 17

...
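
The fixed one-to-one table shown above can be reproduced with a small model of the "associate many" step. This is a sketch under stated assumptions: legacy_associate_many is a hypothetical userspace function operating on a plain array, where index is the HW interrupt ID, the stored value is the IRQ number, and 0 means invalid.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: map count consecutive HW interrupt IDs starting at
 * first_hwirq to consecutive IRQ numbers starting at first_irq.
 * Returns 0 on success, -1 if the range does not fit in the table.
 */
int legacy_associate_many(unsigned int *revmap, size_t revmap_size,
                          unsigned int first_irq, size_t first_hwirq,
                          size_t count)
{
    size_t i;

    assert(revmap != NULL);

    if (first_hwirq > revmap_size || count > revmap_size - first_hwirq)
        return -1;

    for (i = 0; i < count; i++)
        revmap[first_hwirq + i] = first_irq + (unsigned int)i;
    return 0;
}
```

Calling it with first_irq = 16 and first_hwirq = 16 on a zeroed table leaves entries 0 to 15 invalid and maps HW interrupt ID 16 to IRQ 16, 17 to 17, and so on, as in the listing above.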

If you want to fully utilize the power of Device Tree, the GIC code in version 3.14 needs to be modified.

 

3. In the driver initialization process of each hardware peripheral, create a mapping relationship between HW interrupt ID and IRQ number

As mentioned above, a device driver can call the interface function irq_of_parse_and_map during initialization to parse the interrupt-related content (the interrupts and interrupt-parent properties) in its device node and establish the mapping. The specific code is as follows:

unsigned int irq_of_parse_and_map(struct device_node *dev, int index)
{
    struct of_phandle_args oirq;

    /* parse the interrupt-related properties in the device node */
    if (of_irq_parse_one(dev, index, &oirq))
        return 0;

    /* create the mapping */
    return irq_create_of_mapping(&oirq);
}

Let's take a look at how the irq_create_of_mapping function creates a mapping:

unsigned int irq_create_of_mapping(struct of_phandle_args *irq_data)
{
    struct irq_domain *domain;
    irq_hw_number_t hwirq;
    unsigned int type = IRQ_TYPE_NONE;
    unsigned int virq;

    domain = irq_data->np ? irq_find_host(irq_data->np) : irq_default_domain; /* A */
    if (!domain) {
        return 0;
    }

    if (domain->ops->xlate == NULL) /* B */
        hwirq = irq_data->args[0];
    else {
        if (domain->ops->xlate(domain, irq_data->np, irq_data->args, /* C */
                    irq_data->args_count, &hwirq, &type))
            return 0;
    }

    /* Create mapping */
    virq = irq_create_mapping(domain, hwirq); /* D */
    if (!virq)
        return virq;

    /* Set type if specified and different than the current one */
    if (type != IRQ_TYPE_NONE &&
        type != irq_get_trigger_type(virq))
        irq_set_irq_type(virq, type); /* E */
    return virq;
}

A: This code finds the irq domain. The search is based on the np member of the irq_data parameter, whose type is defined as follows:

struct of_phandle_args {
    struct device_node *np;          /* device node of the interrupt controller that the peripheral is connected to */
    int args_count;                  /* number of interrupt-related property cells defined by the peripheral */
    uint32_t args[MAX_PHANDLE_ARGS]; /* the interrupt-related property cell values themselves */
};

B: If the xlate function is not defined, the first cell of the interrupts property is taken as the HW interrupt ID.

C: Whoever tied the bell should be the one to untie it: the interrupts property is best interpreted by the interrupt controller (that is, the irq domain) that defines its format. If the xlate function parses the property successfully, it outputs hwirq and type, which are the HW interrupt ID and the interrupt type (trigger mode, etc.), respectively.

D: After parsing, call the irq_create_mapping function to create the mapping between HW interrupt ID and IRQ number.

E: If necessary, call the irq_set_irq_type function to set the trigger type.
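Steps B and C can be illustrated with a user-space sketch. The three-cell layout assumed here (cell 0: 0 for SPI, 1 for PPI; cell 1: interrupt number within that class; cell 2: trigger flags) follows the ARM GIC device tree binding. The function names and the `SKETCH_SENSE_MASK` constant are hypothetical stand-ins, not kernel APIs.

```c
#include <stdint.h>

#define SKETCH_SENSE_MASK 0xfu  /* assumed: the low four bits carry the trigger type */

/*
 * Case B: no xlate callback is available, so the first cell is used
 * directly as the HW interrupt ID and no trigger type is reported.
 */
int sketch_xlate_default(const uint32_t *cells, unsigned int count,
                         unsigned long *out_hwirq, unsigned int *out_type)
{
    if (count < 1)
        return -1;
    *out_hwirq = cells[0];
    *out_type = 0;
    return 0;
}

/*
 * Case C: a GIC-style xlate. PPIs start at HW interrupt ID 16 and SPIs
 * at HW interrupt ID 32, so the class in cell 0 selects the offset that
 * is added to the number in cell 1.
 */
int sketch_xlate_gic(const uint32_t *cells, unsigned int count,
                     unsigned long *out_hwirq, unsigned int *out_type)
{
    if (count < 3)
        return -1;
    *out_hwirq = cells[1] + (cells[0] == 0 ? 32 : 16);
    *out_type = cells[2] & SKETCH_SENSE_MASK;
    return 0;
}
```

Under these assumptions the specifier <0 5 4> translates to HW interrupt ID 37 with trigger type 4, while a controller without xlate would simply take the first cell as the HW interrupt ID.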

The irq_create_mapping function establishes the mapping relationship between HW interrupt ID and IRQ number. This interface function takes irq domain and HW interrupt ID as parameters, and returns IRQ number. The specific code is as follows:

unsigned int irq_create_mapping(struct irq_domain *domain,
                irq_hw_number_t hwirq)
{
    unsigned int hint;
    int virq;

    /* If the mapping already exists, nothing needs to be created; just return it */
    virq = irq_find_mapping(domain, hwirq);
    if (virq) {
        return virq;
    }


    /* allocate an IRQ descriptor and the corresponding IRQ number */
    hint = hwirq % nr_irqs;
    if (hint == 0)
        hint++;
    virq = irq_alloc_desc_from(hint, of_node_to_nid(domain->of_node));
    if (virq <= 0)
        virq = irq_alloc_desc_from(1, of_node_to_nid(domain->of_node));
    if (virq <= 0) {
        pr_debug("-> virq allocation failed\n");
        return 0;
    }

    /* create the mapping */
    if (irq_domain_associate(domain, virq, hwirq)) {
        irq_free_desc(virq);
        return 0;
    }

    return virq;
}
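The allocation strategy in irq_create_mapping (reuse an existing mapping, otherwise try an IRQ number near hwirq % nr_irqs, otherwise fall back to searching from 1) can be tried out in user space. The sketch below is only a model: `sketch_domain`, `sketch_find_mapping`, `sketch_alloc_from`, `sketch_create_mapping` and `SKETCH_NR_IRQS` are hypothetical names, and a plain array replaces the kernel's descriptor allocator.

```c
#include <stdbool.h>

#define SKETCH_NR_IRQS 16   /* assumed total number of IRQ numbers */

struct sketch_domain {
    bool used[SKETCH_NR_IRQS];              /* which IRQ numbers are allocated */
    unsigned long hwirq_of[SKETCH_NR_IRQS]; /* HW interrupt ID of each allocated IRQ number */
};

/* Linear search standing in for irq_find_mapping(); 0 means "no mapping". */
unsigned int sketch_find_mapping(const struct sketch_domain *d, unsigned long hwirq)
{
    unsigned int virq;

    for (virq = 1; virq < SKETCH_NR_IRQS; virq++)
        if (d->used[virq] && d->hwirq_of[virq] == hwirq)
            return virq;
    return 0;
}

/* Allocates the first free IRQ number >= from; returns 0 if none is free. */
static unsigned int sketch_alloc_from(struct sketch_domain *d, unsigned int from)
{
    unsigned int virq;

    for (virq = from; virq < SKETCH_NR_IRQS; virq++) {
        if (!d->used[virq]) {
            d->used[virq] = true;
            return virq;
        }
    }
    return 0;
}

/* Returns the IRQ number mapped to hwirq, creating the mapping if needed. */
unsigned int sketch_create_mapping(struct sketch_domain *d, unsigned long hwirq)
{
    unsigned int hint, virq;

    virq = sketch_find_mapping(d, hwirq);
    if (virq)
        return virq;                 /* mapping already exists */

    hint = hwirq % SKETCH_NR_IRQS;   /* prefer an IRQ number close to hwirq */
    if (hint == 0)
        hint++;                      /* IRQ number 0 is never handed out */
    virq = sketch_alloc_from(d, hint);
    if (!virq)
        virq = sketch_alloc_from(d, 1);
    if (!virq)
        return 0;                    /* no IRQ numbers left */

    d->hwirq_of[virq] = hwirq;
    return virq;
}
```

Calling sketch_create_mapping twice with the same hwirq returns the same IRQ number, which is the behavior the early return in the original function provides.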

The code for allocating interrupt descriptors will be described in detail in a later article, so we skip it here. What matters is that once this code has run, we have an IRQ number and its corresponding interrupt descriptor. The program comments use the term virtual interrupt number rather than IRQ number. The key to understanding it is the word "virtual": it means the number has nothing to do with how the hardware is physically wired; it is just a number. The actual mapping is done by the irq_domain_associate function. The code is as follows:

int irq_domain_associate(struct irq_domain *domain, unsigned int virq,
             irq_hw_number_t hwirq)
{
    struct irq_data *irq_data = irq_get_irq_data(virq);
    int ret;

    mutex_lock(&irq_domain_mutex);
    irq_data->hwirq = hwirq;
    irq_data->domain = domain;
    if (domain->ops->map) {
        /* call the map callback function of the irq domain */
        ret = domain->ops->map(domain, virq, hwirq);
    }

    if (hwirq < domain->revmap_size) {
        /* fill in the linear map lookup table */
        domain->linear_revmap[hwirq] = virq;
    } else {
        mutex_lock(&revmap_trees_mutex);
        /* insert a node into the radix tree */
        radix_tree_insert(&domain->revmap_tree, hwirq, irq_data);
        mutex_unlock(&revmap_trees_mutex);
    }
    mutex_unlock(&irq_domain_mutex);

    /* this IRQ can now be requested, so clear the relevant flag */
    irq_clear_status_flags(virq, IRQ_NOREQUEST);

    return 0;
}
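The two storage paths in irq_domain_associate (a directly indexed table for small HW interrupt IDs, a tree for the rest) can be modeled in user space. In this sketch a small key/value array stands in for the radix tree. All names (`sketch_revmap`, `sketch_associate`, `sketch_lookup`) and both size constants are assumptions made for illustration.

```c
#include <stddef.h>

#define SKETCH_REVMAP_SIZE 8   /* assumed size of the linear table */
#define SKETCH_TREE_SLOTS  8   /* assumed capacity of the radix-tree stand-in */

struct sketch_tree_entry {
    unsigned long hwirq;
    unsigned int virq;
};

struct sketch_revmap {
    unsigned int linear[SKETCH_REVMAP_SIZE];          /* small HW IDs: direct index, 0 = unmapped */
    struct sketch_tree_entry tree[SKETCH_TREE_SLOTS]; /* large HW IDs: key/value pairs */
    size_t tree_count;
};

/* Records the hwirq -> virq association; returns 0 on success, -1 if full. */
int sketch_associate(struct sketch_revmap *m, unsigned int virq, unsigned long hwirq)
{
    if (hwirq < SKETCH_REVMAP_SIZE) {
        m->linear[hwirq] = virq;        /* linear lookup table path */
        return 0;
    }
    if (m->tree_count == SKETCH_TREE_SLOTS)
        return -1;
    m->tree[m->tree_count].hwirq = hwirq; /* "tree" path for sparse, large IDs */
    m->tree[m->tree_count].virq = virq;
    m->tree_count++;
    return 0;
}

/* Reverse lookup: returns the IRQ number for hwirq, or 0 if unmapped. */
unsigned int sketch_lookup(const struct sketch_revmap *m, unsigned long hwirq)
{
    size_t i;

    if (hwirq < SKETCH_REVMAP_SIZE)
        return m->linear[hwirq];
    for (i = 0; i < m->tree_count; i++)
        if (m->tree[i].hwirq == hwirq)
            return m->tree[i].virq;
    return 0;
}
```

The split reflects a space/time trade-off: the linear table gives constant-time lookup for dense, small IDs, while the tree avoids allocating a huge array when HW interrupt IDs are large and sparse.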

 

7. Convert HW interrupt ID to IRQ number

We have built a large mapping database from HW interrupt ID to IRQ number, and eventually it has to be put to use. The specific scenario is this: in the CPU-architecture-specific interrupt handling function, the program reads the HW interrupt ID, converts it into an IRQ number, and calls the corresponding irq event handler. In this chapter, we describe the conversion process using a cascaded GIC system as an example.

1. GIC driver initialization

The initialization of the root GIC has been described above; now let's look at the initialization of the second GIC. The specific code is in gic_of_init->gic_init_bases, as follows:

void __init gic_init_bases(unsigned int gic_nr, int irq_start,
               void __iomem *dist_base, void __iomem *cpu_base,
               u32 percpu_offset, struct device_node *node)
{
    irq_hw_number_t hwirq_base;
    struct gic_chip_data *gic;
    int gic_irqs, irq_base, i;

...
    /* For the second GIC: */
        hwirq_base = 32;
        /*
         * gic_irqs = number of all interrupts supported by the system - 32.
         * HW interrupts 0 to 15 are used for IPIs and 16 to 31 for PPIs;
         * neither applies to the second GIC, so both ranges are excluded.
         * For the same reason hwirq_base starts at 32.
         */

    /*
     * Request gic_irqs IRQ numbers, searching from IRQ number 16 upward.
     * Because this is the second GIC, the allocated range will generally
     * start at the last IRQ number allocated by the root GIC + 1.
     */
    irq_base = irq_alloc_descs(irq_start, 16, gic_irqs, numa_node_id());

    /* register the irq domain with the system and create the mappings */
    gic->domain = irq_domain_add_legacy(node, gic_irqs, irq_base,
                    hwirq_base, &gic_irq_domain_ops, gic);

...
}

After the second GIC is initialized, the mapping between HW interrupt ID and IRQ number for its irq domain has been established and stored in the linear lookup table. The table size equals the number of interrupts supported by the GIC, as follows:

The IRQ corresponding to index 0~31 is invalid

The last IRQ number applied by the root GIC +1 <-----------------> HW interrupt ID number 32

The last IRQ number applied by the root GIC +2 <----------------->HW interrupt ID number 33

...

OK, let's go back to the initialization function of gic. For the second GIC, there are other parts of the initialization content:

int __init gic_of_init(struct device_node *node, struct device_node *parent)
{

...

    if (parent) {
        /* parse the interrupts property of the second GIC, create the mapping, and return the IRQ number */
        irq = irq_of_parse_and_map(node, 0);
        /* set the handler */
        gic_cascade_irq(gic_cnt, irq);
    }

}

The initialization function above has had the code unrelated to cascading removed. For the root GIC, the parent passed in is NULL, so the cascading code is not executed. The second GIC, from the point of view of its parent (the root GIC), is just an ordinary interrupt source, so a handler for its IRQ has to be registered as well. The initialization of a non-root GIC therefore has two parts: on one hand it acts as an interrupt controller and runs the same initialization code as the root GIC; on the other hand it acts as an ordinary interrupt-generating device and must register its interrupt handler just like an ordinary device driver.

The irq_of_parse_and_map function was covered above and will not be described again. The gic_cascade_irq function is as follows:

void __init gic_cascade_irq(unsigned int gic_nr, unsigned int irq)
{
    /* set the handler data */
    if (irq_set_handler_data(irq, &gic_data[gic_nr]) != 0)
        BUG();
    /* set the handler */
    irq_set_chained_handler(irq, gic_handle_cascade_irq);
}

2. How to convert HW interrupt ID into IRQ number during interrupt processing

During system startup, through the efforts of each interrupt controller driver and each peripheral driver, the database of the whole interrupt system has been built (that is, the data used to convert HW interrupt IDs into IRQ numbers; "database" here does not mean general-purpose database software such as SQLite or Oracle). Once a hardware interrupt occurs, the irq handler is called from the CPU-architecture-specific interrupt code. The general flow of this function is as follows:

(1) First find the irq domain corresponding to the root interrupt controller.

(2) Obtain HW interrupt ID according to HW register information and irq domain information

(3) Call irq_find_mapping to find the irq number corresponding to the HW interrupt ID

(4) Call handle_IRQ (for ARM platform) to handle the irq number

In the cascaded case the process is similar, with one difference: in step 4 the handler that runs is not a device handler that processes the IRQ directly, because the interrupt has to be resolved at each interrupt controller level. Take a simple two-level cascade: suppose the system has two interrupt controllers, A and B, where A is the root interrupt controller and B is connected to A's HW interrupt ID No. 13. When B is initialized, besides the part that sets it up as an interrupt controller, there is also the part that sets it up as an ordinary peripheral of root interrupt controller A. The most important step there is calling irq_set_chained_handler to set the handler. As a result, in step 4 above, the handler for HW interrupt ID No. 13 (that is, B's handler) is called, and inside that handler steps (1) to (4) are repeated for B.
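The two-level flow can be modeled in user space as follows. This is a sketch under stated assumptions, not the GIC driver: every name prefixed with `casc_` is hypothetical, the `pending_hwirq` field simulates reading the controller's acknowledge register, and the IRQ numbers used in the example (20 and 40) are arbitrary.

```c
#define CASC_NR_IRQS  64   /* assumed number of IRQ numbers */
#define CASC_NR_HWIRQ 32   /* assumed HW interrupt IDs per controller */

struct casc_domain {
    unsigned int revmap[CASC_NR_HWIRQ]; /* HW interrupt ID -> IRQ number, 0 = unmapped */
    unsigned int pending_hwirq;         /* simulated "interrupt acknowledge" register */
};

typedef void (*casc_handler_t)(unsigned int irq, void *data);

/* One handler and one data pointer per IRQ number, like an interrupt descriptor. */
static struct {
    casc_handler_t handler;
    void *data;
} casc_desc[CASC_NR_IRQS];

int casc_last_handled_irq;  /* records which leaf IRQ was finally handled */

void casc_set_handler(unsigned int irq, casc_handler_t handler, void *data)
{
    if (irq < CASC_NR_IRQS) {
        casc_desc[irq].handler = handler;
        casc_desc[irq].data = data;
    }
}

/* Steps (2) to (4) for one interrupt controller level. */
void casc_dispatch(struct casc_domain *d)
{
    unsigned int hwirq = d->pending_hwirq;                            /* step 2 */
    unsigned int irq = hwirq < CASC_NR_HWIRQ ? d->revmap[hwirq] : 0;  /* step 3 */

    if (irq && irq < CASC_NR_IRQS && casc_desc[irq].handler)
        casc_desc[irq].handler(irq, casc_desc[irq].data);             /* step 4 */
}

/* Chained handler installed on the parent's IRQ: repeat the steps for the child. */
void casc_chained_handler(unsigned int irq, void *data)
{
    (void)irq;
    casc_dispatch((struct casc_domain *)data);
}

/* Ordinary device handler at the end of the chain. */
void casc_device_handler(unsigned int irq, void *data)
{
    (void)data;
    casc_last_handled_irq = (int)irq;
}
```

To model the example, map A's HW interrupt ID 13 to some IRQ number (say 20), register casc_chained_handler on IRQ 20 with B's domain as handler data, and register casc_device_handler on the IRQ number that B maps for its own device interrupt. Dispatching on A then ends up in the device handler through B's chained handler.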

 

Original article, please indicate the source when forwarding. Snail Technology. http://www.wowotech.net/linux_kenrel/irq-domain.html