Saturday 26 March 2022

Introduction to IRQ Domain

1. Overview

In the Linux kernel, we use the following two IDs to identify an interrupt from a peripheral:

1. IRQ number. The CPU needs to number each peripheral interrupt; we call this the IRQ number. It is a virtual interrupt ID that has nothing to do with the hardware and is only used by the CPU to identify a peripheral interrupt.

2. HW interrupt ID. The interrupt controller collects the interrupt request lines of multiple peripherals and passes them upward, so it needs to encode the peripheral interrupts. The interrupt controller uses the HW interrupt ID to identify a peripheral interrupt. When interrupt controllers are cascaded, the HW interrupt ID alone cannot uniquely identify a peripheral interrupt; you also need to know which interrupt controller the HW interrupt ID belongs to (the same HW interrupt ID values are reused on different interrupt controllers).

So the CPU and the interrupt controller identify interrupts in somewhat different ways. Driver engineers, however, share the CPU's perspective: we only want an IRQ number, and do not care which HW interrupt ID on which interrupt controller it corresponds to. An advantage of this is that driver software does not need to be modified when the interrupt-related hardware changes. Therefore, the interrupt subsystem in the Linux kernel needs to provide a mechanism that maps HW interrupt IDs to IRQ numbers, which is the main subject of this article.

 

2. History

Mapping the HW interrupt ID to the IRQ number was very simple in the past, when a system had only one interrupt controller: the HW interrupt line number on the interrupt controller could be used directly as the IRQ number. For example, consider the familiar interrupt controller embedded in an SoC. Most such controllers have an interrupt status register, which may have 64 bits (or more). Each bit corresponds to one IRQ number, so the mapping is direct. In that setup, GPIO interrupts share a single bit in the interrupt controller's status register, so all GPIO interrupts share one IRQ number, and the handler for that shared GPIO interrupt demultiplexes it, mapping each specific GPIO interrupt to its own IRQ number. If you have been an engineer for long enough, you will have been through this stage.

As the Linux kernel developed, the idea of abstracting the interrupt controller as an irqchip became more and more popular, and even a GPIO controller can be seen as an interrupt controller chip. That means there are at least two interrupt controllers in the system: one interrupt controller in the traditional sense, and one of the GPIO controller type. As systems become more complex and the number of peripheral interrupts grows, a system may in fact need several cascaded interrupt controllers. Faced with this trend, how should Linux kernel engineers respond? The answer is the concept of the irq domain.

We have heard of many kinds of domain: power domain, clock domain, and so on. A domain is a scope, which is to say that a definition has no meaning outside that scope. All the interrupt controllers in the system form a tree structure. Each interrupt controller can be connected to the interrupt requests of several peripherals (we call these interrupt sources), and the interrupt controller numbers the interrupt sources connected to it according to their physical characteristics within the controller (this number is the HW interrupt ID). But this number is only valid within the scope of that interrupt controller.

 

3. Interface

1. Register the irq domain with the system

How to do the mapping is the interrupt controller's own business. However, engineers with software architecture ideas are more willing to abstract all kinds of interrupt controllers, and further abstract how to map HW interrupt ID to IRQ number. Therefore, there is a sub-module of irq domain in the general interrupt processing module, which divides this mapping relationship into three categories:

(1) Linear mapping. This is simply a lookup table: the HW interrupt ID is used as the index, and the corresponding IRQ number is obtained by looking it up in the table. To use a linear map, the interrupt controller must meet certain conditions when encoding its HW interrupt IDs: the IDs cannot be too large, and they should preferably be densely packed. The interface for linear mapping is as follows:

static inline struct irq_domain *irq_domain_add_linear(struct device_node *of_node,
                     unsigned int size,                 /* how many IRQs the interrupt domain supports */
                     const struct irq_domain_ops *ops,  /* callback functions */
                     void *host_data)                   /* driver private data */
{
    return __irq_domain_add(of_node, size, size, 0, ops, host_data);
}
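To make the lookup-table idea concrete, here is a minimal user-space sketch in C. It is not kernel code: the names toy_linear_domain, toy_associate and toy_find_mapping and the table size are hypothetical, and 0 is assumed to mean "no mapping".

```c
#include <assert.h>
#include <stddef.h>

#define TOY_DOMAIN_SIZE 8   /* assumed number of HW interrupt IDs in this toy domain */

/* Hypothetical stand-in for a linear-mapped irq domain. */
struct toy_linear_domain {
    unsigned int size;                     /* number of usable table entries */
    unsigned int revmap[TOY_DOMAIN_SIZE];  /* index: HW interrupt ID, value: IRQ number (0 = unmapped) */
};

/* Record that hwirq maps to virq; returns 0 on success, -1 if hwirq is out of range. */
static int toy_associate(struct toy_linear_domain *d, unsigned long hwirq, unsigned int virq)
{
    assert(d != NULL);
    if (hwirq >= d->size)
        return -1;
    d->revmap[hwirq] = virq;
    return 0;
}

/* Look up the IRQ number for hwirq; returns 0 when there is no mapping. */
static unsigned int toy_find_mapping(const struct toy_linear_domain *d, unsigned long hwirq)
{
    assert(d != NULL);
    return hwirq < d->size ? d->revmap[hwirq] : 0;
}
```

The sketch also shows why the HW interrupt IDs should be small and densely packed: the table needs one entry for every possible ID, whether or not it is used.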

(2) Radix Tree map. A Radix Tree maintains the mapping from HW interrupt ID to IRQ number: the HW interrupt ID is the lookup key, and the IRQ number is retrieved from the Radix Tree. If the conditions for linear mapping really cannot be met, the Radix Tree map can be considered. In the kernel version discussed here, only the PowerPC and MIPS hardware platforms use the Radix Tree map. The interface for the Radix Tree map is as follows:

static inline struct irq_domain *irq_domain_add_tree(struct device_node *of_node,
                     const struct irq_domain_ops *ops,
                     void *host_data)
{
    return __irq_domain_add(of_node, 0, ~0, 0, ops, host_data);
}

(3) no map. Some interrupt controllers are so powerful that the HW interrupt ID can be configured through a register rather than determined by the physical connection. For example, the MPIC (Multi-Processor Interrupt Controller) used by PowerPC systems. In this case, no mapping is required. We can directly write the IRQ number to the HW interrupt ID configuration register. At this time, the generated HW interrupt ID is the IRQ number, and no mapping is required. For this type of mapping, its interface API is as follows:

static inline struct irq_domain *irq_domain_add_nomap(struct device_node *of_node,
                     unsigned int max_irq,
                     const struct irq_domain_ops *ops,
                     void *host_data)
{
    return __irq_domain_add(of_node, 0, max_irq, max_irq, ops, host_data);
}

The logic of these interfaces is very simple: each one passes the parameters that match its mapping type to __irq_domain_add, which initializes the members of struct irq_domain and adds the irq domain to the global list irq_domain_list.

2. Create a mapping for the irq domain

The content of the previous section is mainly to register an irq domain with the system. The mapping relationship between the specific HW interrupt ID and IRQ number is empty. Therefore, the database required for how each irq domain manages the mapping still needs to be established. For example: for the irq domain of linear mapping, we need to establish the lookup table of linear mapping, and for the Radix Tree map, we need to establish the Radix tree that reflects the IRQ number and HW interrupt ID. There are four interface functions for creating a map:

(1) Call the irq_create_mapping function to establish the mapping relationship between HW interrupt ID and IRQ number. This interface function takes irq domain and HW interrupt ID as parameters, and returns the IRQ number (this IRQ number is dynamically allocated). The prototype of this function is defined as follows:

extern unsigned int irq_create_mapping(struct irq_domain *host,
                       irq_hw_number_t hwirq);

The driver must provide the HW interrupt ID when calling this function, which means that the driver knows the HW interrupt ID it is using. In general, the HW interrupt ID should not be visible to the specific driver, but some scenarios are special. For example, a GPIO type interrupt has a specific relationship between its HW interrupt ID and GPIO. The driver knows which GPIO it uses, that is, which HW interrupt ID to use.

(2) irq_create_strict_mappings. This interface function is used to map a set of HW interrupt IDs. The prototype of the specific function is defined as follows:

extern int irq_create_strict_mappings(struct irq_domain *domain,
                      unsigned int irq_base,
                      irq_hw_number_t hwirq_base, int count);

(3) irq_create_of_mapping. Seeing "of" (Open Firmware) in the function name, you can probably guess what it does: this interface uses the device tree to establish the mapping. The prototype is as follows:

extern unsigned int irq_create_of_mapping(struct of_phandle_args *irq_data);

Usually, the device tree node of an ordinary device already describes enough interrupt information. In this case, the device driver can call the interface function irq_of_parse_and_map during initialization to parse the interrupt-related content in the device node (the interrupts and interrupt-parent properties) and establish the mapping. The code is as follows:

unsigned int irq_of_parse_and_map(struct device_node *dev, int index)
{
    struct of_phandle_args oirq;

    /* Parse the interrupt-related properties in the device node */
    if (of_irq_parse_one(dev, index, &oirq))
        return 0;

    /* Create the mapping and return the corresponding IRQ number */
    return irq_create_of_mapping(&oirq);
}

For a normal driver that uses the device tree (which we recommend), initialization basically calls irq_of_parse_and_map to get the IRQ number, and then calls request_threaded_irq to register the interrupt handler.

(4) irq_create_direct_mapping. This is used for the type of interrupt controller with no map, so I won't repeat it here.

 

4. Data structure description

1. Callback interface of irq domain

struct irq_domain_ops abstracts an irq domain callback function, defined as follows:

struct irq_domain_ops {
    int (*match)(struct irq_domain *d, struct device_node *node);
    int (*map)(struct irq_domain *d, unsigned int virq, irq_hw_number_t hw);
    void (*unmap)(struct irq_domain *d, unsigned int virq);
    int (*xlate)(struct irq_domain *d, struct device_node *node,
             const u32 *intspec, unsigned int intsize,
             unsigned long *out_hwirq, unsigned int *out_type);
};

Let's look at the xlate function first. The name means translate, so what exactly is translated? In the DTS file, each device node that uses interrupts provides interrupt information to the kernel through properties (such as interrupts and interrupt-parent) so that the kernel can initialize the driver correctly. The interrupt specifier held in the interrupts property can only be interpreted by the specific interrupt controller (that is, the irq domain). The xlate function translates the interrupt specifier (the intspec parameter, intsize cells long) for the specified node (the node parameter) into a HW interrupt ID (the out_hwirq parameter) and a trigger type (the out_type parameter).
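As an illustration, here is a small user-space sketch of what such a translation might look like for a controller whose interrupt specifier has two cells. The layout (first cell is the HW interrupt ID, second cell carries the trigger type), the function name toy_xlate_twocell and the mask value are assumptions made for this example, not the code of any particular driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TYPE_SENSE_MASK 0x0fu  /* assumed mask for the trigger-type bits */

/*
 * Hypothetical two-cell translator: intspec[0] is the HW interrupt ID and
 * intspec[1] carries the trigger type. Returns 0 on success, or -1 if the
 * specifier has fewer than two cells.
 */
static int toy_xlate_twocell(const uint32_t *intspec, unsigned int intsize,
                             unsigned long *out_hwirq, unsigned int *out_type)
{
    assert(intspec != NULL && out_hwirq != NULL && out_type != NULL);
    if (intsize < 2)
        return -1;
    *out_hwirq = intspec[0];
    *out_type = intspec[1] & TOY_TYPE_SENSE_MASK;
    return 0;
}
```

With this layout, a device node containing interrupts = <7 4> would translate to HW interrupt ID 7 and trigger type 4.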

match determines whether a specified interrupt controller (the node parameter) matches an irq domain (the d parameter), and returns 1 if it does. In practice this callback is rarely defined in the kernel. struct irq_domain has an of_node member that points to the device node of the corresponding interrupt controller, so if this function is not provided, the default matching simply checks whether the of_node member of the irq domain is equal to the node parameter passed in.

map and unmap are inverse operations, so describing one of them is enough. The map function is called when the relationship between a HW interrupt ID (the hw parameter) and an IRQ number (the virq parameter) is created (or updated). Calling request_threaded_irq alone is not enough to get from the occurrence of an interrupt to the invocation of its handler; the following also need to be set up for the IRQ number:

(1) Set the irq chip of the interrupt descriptor (struct irq_desc) corresponding to the IRQ number

(2) Set the high-level irq-events handler of the interrupt descriptor corresponding to the IRQ number

(3) Set the irq chip data of the interrupt descriptor corresponding to the IRQ number

These settings are not appropriate for individual peripheral drivers to make, so they are made by the interrupt controller driver, in the map callback of its irq domain.

2. irq domain

In the kernel, the concept of irq domain is represented by struct irq_domain:

struct irq_domain {
    struct list_head link;
    const char *name;
    const struct irq_domain_ops *ops;    /* callback functions */
    void *host_data;

    /* Optional data */
    struct device_node *of_node;         /* device node of the interrupt controller for this domain */
    struct irq_domain_chip_generic *gc;  /* generic irq chip, not covered in this article */

    /* reverse map data. The linear map gets appended to the irq_domain */
    irq_hw_number_t hwirq_max;           /* the largest HW interrupt ID in the domain */
    unsigned int revmap_direct_max_irq;
    unsigned int revmap_size;            /* size of the linear map; 0 for Radix Tree map and no map */
    struct radix_tree_root revmap_tree;  /* root of the Radix Tree used by the Radix Tree map */
    unsigned int linear_revmap[];        /* lookup table used by the linear map */
};

In the Linux kernel, all irq domains are linked to a global linked list, and the linked list header is defined as follows:

static LIST_HEAD(irq_domain_list);

The link member in struct irq_domain is the node that attaches the domain to this list. Starting from irq_domain_list, the mapping DB of HW interrupt IDs and IRQ numbers for the whole system can be reached. host_data holds private data used by the underlying interrupt controller driver, and depends on the specific interrupt controller (for the GIC, it points to a struct gic_chip_data).

For linear mapping:

(1) linear_revmap holds a linear lookup table: the index is the HW interrupt ID, and the value stored in the table is the IRQ number.

(2) revmap_size is equal to the size of the linear lookup table.

(3) hwirq_max holds the largest HW interrupt ID.

(4) revmap_direct_max_irq is unused and set to 0. revmap_tree is unused.

For Radix Tree map:

(1) linear_revmap is unused, and revmap_size is equal to 0.

(2) hwirq_max is unused and is set to the maximum value.

(3) revmap_direct_max_irq is unused and set to 0.

(4) revmap_tree is the root node of the Radix Tree.

 

5. Interrupt-related Device Tree knowledge review

To understand the mapping, first understand the topology of the interrupt controllers. The topology of the interrupt controllers in the system and the allocation of their interrupt request lines (which peripheral each line is assigned to) are described in the Device Tree Source file through the following properties. These are described in the three Device Tree documents; here is a brief summary:

For those peripherals that generate interrupts, we need to define the interrupt-parent and interrupts properties:

(1) interrupt-parent. Indicates which interrupt controller the peripheral's interrupt request line is physically connected to

(2) interrupts. This property describes the details of the interrupt generated by the peripheral (the so-called interrupt specifier), for example the HW interrupt ID (interpreted by the interrupt controller that interrupt-parent points to in the peripheral's device node), the interrupt trigger type, and so on.

For the Interrupt controller, we need to define the properties of interrupt-controller and #interrupt-cells:

(1) interrupt-controller. Indicates that the device node is an interrupt controller

(2) #interrupt-cells. How many cells are used by the interrupt controller (a cell is a 32-bit unit) to describe the interrupt request line of a peripheral. The specific meaning of each cell is defined by the interrupt controller itself.

(3) interrupts and interrupt-parent. For those interrupt controllers that are not root, they are also connected to other interrupt controllers as an interrupt-generating peripheral, so the properties of interrupts and interrupt-parent also need to be defined.

 

6. Establishment of Mapping DB

1. Overview

The mapping DB of HW interrupt ID and IRQ number in the system is established during the whole system initialization process. The process is as follows:

(1) The DTS file describes the topological structure of the interrupt controller and peripheral IRQ in the system. When the Linux kernel starts, it is passed to the kernel by the bootloader (the DTB is actually passed).

(2) When the Device Tree is initialized, a tree structure of all device nodes in the system is formed, of course, including all data structures related to the interrupt topology (all interrupt controller nodes and peripheral nodes that use interrupts)

(3) When the machine driver is initialized, the of_irq_init function is called. This function scans all interrupt controller nodes and calls the appropriate interrupt controller driver to initialize each one. Initialization order matters: the root is initialized first, then the first level, then the second level, and finally the leaf nodes. During initialization, the interface functions described in section 3 are generally called to add the irq domain to the system. Some interrupt controllers also create mappings during the initialization of their drivers.

(4) During the initialization of each driver, create a mapping

 

2. During the initialization of the interrupt controller, register the irq domain

Let's take the code of GIC as an example. The specific code is in gic_of_init->gic_init_bases, as follows:

void __init gic_init_bases(unsigned int gic_nr, int irq_start,
               void __iomem *dist_base, void __iomem *cpu_base,
               u32 percpu_offset, struct device_node *node)
{
    irq_hw_number_t hwirq_base;
    struct gic_chip_data *gic;
    int gic_irqs, irq_base, i;

...
    /*
     * For the root GIC: HW interrupt IDs 0 to 15 are used for IPIs and are
     * excluded, which is why hwirq_base starts from 16.
     * gic_irqs = number of all interrupts supported by the system - 16
     */
    hwirq_base = 16;

    /*
     * Allocate gic_irqs IRQ descriptors, searching for free IRQ numbers from
     * 16 upward. Since this is the root GIC, the allocated IRQs will
     * basically start from 16.
     */
    irq_base = irq_alloc_descs(irq_start, 16, gic_irqs, numa_node_id());

    /* Register the irq domain with the system and create the mappings */
    gic->domain = irq_domain_add_legacy(node, gic_irqs, irq_base,
                    hwirq_base, &gic_irq_domain_ops, gic);

...
}

Unfortunately, the GIC code does not call one of the standard interface functions for registering the irq domain. To understand why, we need to go back in time. In older Linux kernels, the ARM architecture code was less than ideal. The arch/arm directory was filled with board-specific code that defined many static tables for specific devices. These tables specified the resources used by each device, including IRQ resources. In that world the IRQ of each peripheral was fixed (if you have been a driver programmer for long enough, you will remember long lists of macro definitions for IRQ numbers); that is, the relationship between HW interrupt ID and IRQ number was fixed. Once the relationship is fixed, the mappings can be created in the interrupt controller code itself. The code is as follows:

struct irq_domain *irq_domain_add_legacy(struct device_node *of_node,
                     unsigned int size,
                     unsigned int first_irq,
                     irq_hw_number_t first_hwirq,
                     const struct irq_domain_ops *ops,
                     void *host_data)
{
    struct irq_domain *domain;

    /* Register the irq domain */
    domain = __irq_domain_add(of_node, first_hwirq + size,
                  first_hwirq + size, 0, ops, host_data);
    if (!domain)
        return NULL;

    /* Create the mappings */
    irq_domain_associate_many(domain, first_irq, first_hwirq, size);

    return domain;
}

At this time, for this version of the GIC driver, after initialization, the mapping relationship between HW interrupt ID and IRQ number has been established and stored in the linear lookup table. The size is equal to the number of interrupts supported by the GIC, as follows:

The IRQ corresponding to index 0~15 is invalid

IRQ No. 16 <----------------->HW interrupt ID No. 16

IRQ No. 17 <----------------->HW interrupt ID No. 17

...

If you want to fully utilize the power of Device Tree, the GIC code in version 3.14 needs to be modified.
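The fixed layout that irq_domain_add_legacy produces can be modeled in user space as follows. This is only a sketch: toy_associate_many is a hypothetical stand-in for the kernel's irq_domain_associate_many, the lookup table is a plain array supplied by the caller, and 0 is assumed to mean "no mapping".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Map `count` consecutive HW interrupt IDs, starting at hwirq_base, onto
 * consecutive IRQ numbers starting at irq_base. revmap[hwirq] holds the IRQ
 * number for hwirq, and 0 means "no mapping".
 */
static void toy_associate_many(unsigned int *revmap, size_t revmap_size,
                               unsigned int irq_base, size_t hwirq_base,
                               size_t count)
{
    assert(revmap != NULL);
    assert(hwirq_base + count <= revmap_size);
    for (size_t i = 0; i < count; i++)
        revmap[hwirq_base + i] = irq_base + (unsigned int)i;
}
```

For example, assuming a root GIC that supports 64 interrupts, calling toy_associate_many(table, 64, 16, 16, 48) leaves entries 0 to 15 unmapped and maps HW interrupt IDs 16 to 63 onto IRQ numbers 16 to 63, which matches the table shown above.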

 

3. In the driver initialization process of each hardware peripheral, create a mapping relationship between HW interrupt ID and IRQ number

In the above description process, it has been mentioned that the device driver can call the interface function irq_of_parse_and_map during initialization to analyze the interrupt-related content (interrupts and interrupt-parent attributes) in the device node, and establish a mapping relationship. The specific code is as follows:

unsigned int irq_of_parse_and_map(struct device_node *dev, int index)
{
    struct of_phandle_args oirq;

    /* Parse the interrupt-related properties in the device node */
    if (of_irq_parse_one(dev, index, &oirq))
        return 0;

    /* Create the mapping */
    return irq_create_of_mapping(&oirq);
}

Let's take a look at how the irq_create_of_mapping function creates a mapping:

unsigned int irq_create_of_mapping(struct of_phandle_args *irq_data)
{
    struct irq_domain *domain;
    irq_hw_number_t hwirq;
    unsigned int type = IRQ_TYPE_NONE;
    unsigned int virq;

    /* A */
    domain = irq_data->np ? irq_find_host(irq_data->np) : irq_default_domain;
    if (!domain) {
        return 0;
    }

    if (domain->ops->xlate == NULL)    /* B */
        hwirq = irq_data->args[0];
    else {
        /* C */
        if (domain->ops->xlate(domain, irq_data->np, irq_data->args,
                    irq_data->args_count, &hwirq, &type))
            return 0;
    }

    /* Create mapping */
    virq = irq_create_mapping(domain, hwirq);    /* D */
    if (!virq)
        return virq;

    /* Set type if specified and different than the current one */
    if (type != IRQ_TYPE_NONE &&
        type != irq_get_trigger_type(virq))
        irq_set_irq_type(virq, type);    /* E */
    return virq;
}

A: This code finds the irq domain. The search is based on the np member of the irq_data parameter, whose type is defined as follows:

struct of_phandle_args {
    struct device_node *np;           /* device node of the interrupt controller the peripheral is connected to */
    int args_count;                   /* number of cells in the peripheral's interrupt specifier */
    uint32_t args[MAX_PHANDLE_ARGS];  /* the cells of the interrupt specifier */
};

B: If the xlate function is not defined, then take the first cell of the interrupts attribute as the HW interrupt ID.

C: As the saying goes, whoever tied the bell should be the one to untie it: the interrupts property is best interpreted by the interrupt controller (that is, the irq domain) that defines its format. If the xlate function parses the property successfully, it outputs hwirq and type, which are the HW interrupt ID and the interrupt type (trigger mode, etc.), respectively.

D: After parsing, finally call the irq_create_mapping function to create the mapping relationship between HW interrupt ID and IRQ number.

E: If necessary, call the irq_set_irq_type function to set the trigger type

The irq_create_mapping function establishes the mapping relationship between HW interrupt ID and IRQ number. This interface function takes irq domain and HW interrupt ID as parameters, and returns IRQ number. The specific code is as follows:

unsigned int irq_create_mapping(struct irq_domain *domain,
                irq_hw_number_t hwirq)
{
    unsigned int hint;
    int virq;

    /* If the mapping already exists, there is nothing to create; just return it */
    virq = irq_find_mapping(domain, hwirq);
    if (virq) {
        return virq;
    }

    /* Allocate an IRQ descriptor and the corresponding IRQ number */
    hint = hwirq % nr_irqs;
    if (hint == 0)
        hint++;
    virq = irq_alloc_desc_from(hint, of_node_to_nid(domain->of_node));
    if (virq <= 0)
        virq = irq_alloc_desc_from(1, of_node_to_nid(domain->of_node));
    if (virq <= 0) {
        pr_debug("-> virq allocation failed\n");
        return 0;
    }

    /* Create the mapping */
    if (irq_domain_associate(domain, virq, hwirq)) {
        irq_free_desc(virq);
        return 0;
    }

    return virq;
}
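The allocation strategy in irq_create_mapping (reuse an existing mapping, otherwise prefer an IRQ number derived from the HW interrupt ID, otherwise take any free one) can be tried out with a small user-space model. Everything here is a simplification made for the sketch: struct toy_domain, the toy_ function names and the table sizes are hypothetical, and the allocator state lives inside the domain, whereas in the kernel IRQ descriptors are allocated from a shared pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_IRQS   32  /* assumed size of the IRQ number space */
#define TOY_NR_HWIRQS 64  /* assumed size of the HW interrupt ID space */

struct toy_domain {
    bool irq_used[TOY_NR_IRQS];          /* which IRQ numbers are allocated */
    unsigned int revmap[TOY_NR_HWIRQS];  /* hwirq -> IRQ number, 0 = unmapped */
};

/* Return the first free IRQ number >= from and mark it used, or 0 if none is free. */
static unsigned int toy_alloc_from(struct toy_domain *d, unsigned int from)
{
    for (unsigned int irq = from; irq < TOY_NR_IRQS; irq++) {
        if (!d->irq_used[irq]) {
            d->irq_used[irq] = true;
            return irq;
        }
    }
    return 0;
}

/* Return the IRQ number mapped to hwirq, creating the mapping if needed; 0 on failure. */
static unsigned int toy_create_mapping(struct toy_domain *d, unsigned long hwirq)
{
    unsigned int hint, virq;

    assert(d != NULL);
    assert(hwirq < TOY_NR_HWIRQS);

    /* Reuse an existing mapping if there is one. */
    if (d->revmap[hwirq])
        return d->revmap[hwirq];

    /* Prefer an IRQ number derived from hwirq, but never 0. */
    hint = (unsigned int)(hwirq % TOY_NR_IRQS);
    if (hint == 0)
        hint++;

    virq = toy_alloc_from(d, hint);
    if (!virq)
        virq = toy_alloc_from(d, 1);  /* fall back to searching from 1 */
    if (!virq)
        return 0;                     /* allocation failed */

    d->revmap[hwirq] = virq;
    return virq;
}
```

Calling toy_create_mapping twice with the same HW interrupt ID returns the same IRQ number, and a HW interrupt ID of 0 never yields IRQ number 0, because 0 is reserved to mean failure.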

The code for allocating interrupt descriptors will be described in detail in a later article, so we skip it here. In any case, after this code has run we have an IRQ number and its corresponding interrupt descriptor. The comments in the code do not say IRQ number but use the term virtual interrupt number. The key word is "virtual": it means that the number has nothing to do with the actual hardware connection; it is just a number. The function that actually performs the mapping is irq_domain_associate. The code is as follows:

int irq_domain_associate(struct irq_domain *domain, unsigned int virq,
             irq_hw_number_t hwirq)
{
    struct irq_data *irq_data = irq_get_irq_data(virq);
    int ret;

    mutex_lock(&irq_domain_mutex);
    irq_data->hwirq = hwirq;
    irq_data->domain = domain;
    if (domain->ops->map) {
        ret = domain->ops->map(domain, virq, hwirq);    /* call the map callback of the irq domain */
    }

    if (hwirq < domain->revmap_size) {
        domain->linear_revmap[hwirq] = virq;    /* fill in the linear map lookup table */
    } else {
        mutex_lock(&revmap_trees_mutex);
        radix_tree_insert(&domain->revmap_tree, hwirq, irq_data);    /* insert a node into the radix tree */
        mutex_unlock(&revmap_trees_mutex);
    }
    mutex_unlock(&irq_domain_mutex);

    /* This IRQ can now be requested, so clear the corresponding flag */
    irq_clear_status_flags(virq, IRQ_NOREQUEST);

    return 0;
}

 

7. Convert HW interrupt ID to IRQ number

Having built this large mapping DB from HW interrupt ID to IRQ number, we eventually need to use it. The usage scenario is this: in the CPU-specific interrupt handling code, the program reads the HW interrupt ID, converts it into an IRQ number, and calls the corresponding irq event handler. In this chapter, we describe the conversion process using a cascaded GIC system as an example.

1. GIC driver initialization

The initialization of the root GIC has been described above, let's look at the initialization of the second GIC. The specific code is in gic_of_init->gic_init_bases, as follows:

void __init gic_init_bases(unsigned int gic_nr, int irq_start,
               void __iomem *dist_base, void __iomem *cpu_base,
               u32 percpu_offset, struct device_node *node)
{
    irq_hw_number_t hwirq_base;
    struct gic_chip_data *gic;
    int gic_irqs, irq_base, i;

...
    /*
     * For the second GIC: HW interrupt IDs 0 to 15 are used for IPIs and
     * 16 to 31 for PPIs, so both ranges are excluded, which is why
     * hwirq_base starts from 32.
     * gic_irqs = number of all interrupts supported by the system - 32
     */
    hwirq_base = 32;

    /*
     * Allocate gic_irqs IRQ descriptors, searching for free IRQ numbers from
     * 16 upward. Since this is the second GIC, the allocated IRQs will
     * basically start from the last IRQ number allocated to the root GIC + 1.
     */
    irq_base = irq_alloc_descs(irq_start, 16, gic_irqs, numa_node_id());

    /* Register the irq domain with the system and create the mappings */
    gic->domain = irq_domain_add_legacy(node, gic_irqs, irq_base,
                    hwirq_base, &gic_irq_domain_ops, gic);

...
}

After the second GIC is initialized, the mapping relationship between the HW interrupt ID and IRQ number of the irq domain has been established and stored in the linear lookup table. The size is equal to the number of interrupts supported by the GIC, as follows:

The IRQ corresponding to index 0~31 is invalid

The last IRQ number applied by the root GIC +1 <-----------------> HW interrupt ID number 32

The last IRQ number applied by the root GIC +2 <----------------->HW interrupt ID number 33

...

OK, let's go back to the initialization function of gic. For the second GIC, there are other parts of the initialization content:

int __init gic_of_init(struct device_node *node, struct device_node *parent)
{

...

    if (parent) {
        /* Parse the interrupts property of the second GIC, create the mapping, and return the IRQ number */
        irq = irq_of_parse_and_map(node, 0);
        /* Set the handler */
        gic_cascade_irq(gic_cnt, irq);
    }

}

The initialization function above has been stripped of code unrelated to cascading. For the root GIC, the parent passed in is NULL, so the cascading code is not executed. The second GIC, seen from its parent (the root GIC), is an ordinary interrupt source, so a handler for its IRQ needs to be registered as well. The initialization of a non-root GIC therefore has two parts: in one, it acts as an interrupt controller and runs the same initialization code as the root GIC; in the other, it acts as an ordinary interrupt-generating device and needs to register its interrupt handler just as an ordinary device driver would.

The irq_of_parse_and_map function is believed to be familiar to everyone and will not be described here. The gic_cascade_irq function is as follows:

void __init gic_cascade_irq(unsigned int gic_nr, unsigned int irq)
{
    /* Set the handler data */
    if (irq_set_handler_data(irq, &gic_data[gic_nr]) != 0)
        BUG();
    /* Set the handler */
    irq_set_chained_handler(irq, gic_handle_cascade_irq);
}

2. How to convert HW interrupt ID into IRQ number during interrupt processing

During system startup, through the work of each interrupt controller driver and each peripheral driver, the database of the whole interrupt system (the database that converts HW interrupt IDs into IRQ numbers; this does not mean database software such as SQLite or Oracle) has been established. Once a hardware interrupt occurs, the irq handler is called after the CPU architecture-specific interrupt code. The general flow of this function is as follows:

(1) First find the irq domain corresponding to the root interrupt controller.

(2) Obtain HW interrupt ID according to HW register information and irq domain information

(3) Call irq_find_mapping to find the irq number corresponding to the HW interrupt ID

(4) Call handle_IRQ (for ARM platform) to handle the irq number

For the cascading case, the flow is similar, but note that in step 4 the handler that gets called does not process the interrupt directly, because the interrupt has to be resolved at each interrupt controller level. Take a simple two-level cascade: suppose the system has two interrupt controllers, A and B, where A is the root interrupt controller and B is connected to HW interrupt ID 13 of A. When B is initialized, in addition to being set up as an interrupt controller, it is also set up as an ordinary peripheral of the root interrupt controller A. The most important part of that is calling irq_set_chained_handler to set the handler. Then, in step 4 above, the handler for A's HW interrupt ID 13 (that is, B's handler) is called, and inside that handler steps (1) to (4) are repeated for B.
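The two-level flow can be sketched as a small user-space model. This is an illustration only, not the GIC code: struct toy_ctrl, struct toy_desc and the toy_ functions are hypothetical, the pending_hwirq field stands in for reading a hardware status register, and each controller's mapping is assumed to be a simple array.

```c
#include <assert.h>

#define TOY_NR_IRQS   64
#define TOY_NR_HWIRQS 32

struct toy_ctrl {
    unsigned long pending_hwirq;         /* what the controller's status register would report */
    unsigned int revmap[TOY_NR_HWIRQS];  /* hwirq -> IRQ number, 0 = unmapped */
};

struct toy_desc {
    void (*handler)(unsigned int irq);   /* handler installed for this IRQ number */
    struct toy_ctrl *handler_data;       /* for a cascade IRQ: the child controller */
};

static struct toy_desc toy_descs[TOY_NR_IRQS];
static unsigned int toy_last_handled;    /* IRQ number seen by the last leaf handler */

/* Steps (2) to (4): read the HW interrupt ID, translate it, call the handler. */
static void toy_dispatch(struct toy_ctrl *ctrl)
{
    unsigned int irq;

    assert(ctrl->pending_hwirq < TOY_NR_HWIRQS);
    irq = ctrl->revmap[ctrl->pending_hwirq];
    if (irq && toy_descs[irq].handler)
        toy_descs[irq].handler(irq);
}

/* Leaf handler: an ordinary peripheral interrupt. */
static void toy_leaf_handler(unsigned int irq)
{
    toy_last_handled = irq;
}

/* Chained handler: the IRQ belongs to a child controller, so repeat the dispatch one level down. */
static void toy_cascade_handler(unsigned int irq)
{
    toy_dispatch(toy_descs[irq].handler_data);
}
```

To model the example in the text, map A's HW interrupt ID 13 to some IRQ number, install toy_cascade_handler for that IRQ with B as its handler data, and call toy_dispatch on A. The leaf handler that finally runs is the one registered for the IRQ number mapped in B's own table.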

 

Original article, please indicate the source when forwarding. Snail Technology. http://www.wowotech.net/linux_kenrel/irq-domain.html



Saturday 19 March 2022

Is it possible to connect two I2C devices with the same address?

According to the I2C standard, a master can be connected to many slaves, but each slave must have a different address. Slaves with the same address cause a conflict in the master's state machine, resulting in transmission errors. The conflict shows up at the 9th clock, when both slaves try to pull SDA low.

Practically speaking, the master is not bothered as long as it gets an ACK on every 9th clock cycle. If the slaves are identical, for example two I2C memories from Microchip, it is possible to access both slaves at the same time.

Every vendor tries to comply with the I2C standard, though a few things are implementation specific, which means a vendor can design them however it wishes. So if the slaves are different parts, there is a chance that they pull SDA low differently, which leads to failures in the master's state machine.

Wednesday 16 March 2022

Why should a function not return the address of a local variable?

A return statement should not return a pointer that holds the address of a local variable. As soon as the function exits, the lifetime of its local (automatic) variables ends, and the pointer refers to memory that you no longer own. Dereferencing such a pointer is undefined behavior.
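Two common ways around the problem are sketched below: let the caller own the storage, or allocate it on the heap. The function names greet_into and greet_alloc are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * What NOT to do (shown only as a comment):
 *
 *     char *greet_bad(void) { char buf[6] = "hello"; return buf; }
 *
 * buf stops existing when greet_bad() returns.
 */

/* Alternative 1: the caller owns the buffer, so it outlives the call. */
static char *greet_into(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    strncpy(buf, "hello", len - 1);
    buf[len - 1] = '\0';          /* always terminate, even when truncated */
    return buf;
}

/* Alternative 2: heap storage; the caller must free() the result. */
static char *greet_alloc(void)
{
    char *p = malloc(sizeof "hello");

    if (p != NULL)
        memcpy(p, "hello", sizeof "hello");
    return p;
}
```

A third option, not shown, is a static buffer, at the cost of the function no longer being reentrant.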


Reference:

https://www.educative.io/edpresso/resolving-the-function-returns-address-of-local-variable-error

Monday 14 March 2022

How to pass strings in C

Mutable strings: 

If you pass a function a pointer to a string literal, the function is not allowed to modify it (string literals cannot be modified), and trying to do so can cause a segfault. Instead of using a pointer to the non-modifiable string literal, copy it into your own modifiable buffer, like this:

char mybaz[] = "hello:world";

Immutable strings: 

Here mybaz points directly at a string literal, which you are not allowed to modify. Writing through this pointer can cause a segfault. Declaring the pointer as const char * lets the compiler catch such writes.

char *mybaz = "hello:world";
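A small sketch of both cases follows. The helpers colons_to_spaces and count_colons are hypothetical; the first needs a modifiable buffer, the second only reads and therefore takes a const char *, so it is safe to call with a literal.

```c
#include <assert.h>
#include <stddef.h>

/* Modifies the string in place, so the caller must pass a writable buffer. */
static size_t colons_to_spaces(char *s)
{
    size_t replaced = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s == ':') {
            *s = ' ';
            replaced++;
        }
    }
    return replaced;
}

/* Only reads the string, so a pointer to a string literal is fine here. */
static size_t count_colons(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s == ':')
            n++;
    }
    return n;
}
```

Calling colons_to_spaces on char mybaz[] = "hello:world"; is fine, while calling it on a pointer to the literal itself would be undefined behavior.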







Reference:

https://stackoverflow.com/questions/1863094/pass-strings-by-reference-in-c

Monday 7 March 2022

Android vendor hook for common kernel

Vendor hooks are required to bring vendor changes into the upstream Android common kernel.

The common kernel adds support for vendor hooks: an include/trace/hooks directory for trace definition headers where hooks can be defined, and drivers/android/vendor_hooks.c for instantiating and exporting them for vendor modules.


There are two variants of vendor hooks, both based on tracepoints:

Normal: this uses the DECLARE_HOOK macro to create a tracepoint function with the name trace_<name>, where <name> is the unique identifier for the trace.

Restricted: restricted hooks are needed for cases like scheduler hooks where the attached function must be called even if the CPU is offline or requires a non-atomic context. Restricted vendor hooks cannot be detached, so modules that attach to a restricted hook can never unload. Also, only one attachment is allowed (any other attempt to attach fails with -EBUSY).

For either case, modules attach to the hook by using register_trace_<name>(func_ptr, NULL).

New hooks should be defined in headers in the include/trace/hooks/ directory using the DECLARE_HOOK() or DECLARE_RESTRICTED_HOOK() macros.

New files added to include/trace/hooks should be #include'd from drivers/android/vendor_hooks.c. The EXPORT_TRACEPOINT_SYMBOL_GPL() should also be added to drivers/android/vendor_hooks.c.


For example, if a new hook, 'android_vh_foo(int *retp)', is added in do_exit() in exit.c, these changes are needed:

1. Create a new header file include/trace/hooks/foo.h which contains:

#include <trace/hooks/vendor_hooks.h>
...
DECLARE_HOOK(android_vh_foo,
     TP_PROTO(int *retp),
     TP_ARGS(retp));

2. In exit.c, add:

#include <trace/hooks/foo.h>
...
  int ret = 0;
...
  trace_android_vh_foo(&ret);
  if (ret)
    return ret;
...

3. In drivers/android/vendor_hooks.c, add:

#include <trace/hooks/foo.h>
...
EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_foo);


The hook can then be attached by adding the registration code to the module:

#include <trace/hooks/foo.h>
...
/* Tracepoint probes receive the registration data pointer as their first argument. */
static void my_foo(void *data, int *retp)
{
	*retp = 0;
}
...
rc = register_trace_android_vh_foo(my_foo, NULL);


Reference:

https://android.googlesource.com/kernel/common/+/7f62740112ef7260d399a340e210f3a49bc4177e

Friday 4 March 2022

Linux system call flow in ARM64

ARMv8 has four exception levels:

EL0 -- user applications

EL1 -- OS kernel

EL2 -- hypervisor for virtualization

EL3 -- secure monitor firmware

Moving from one exception level to a more privileged one is achieved by raising exceptions: an exception is raised at one level and handled at a higher level.

A synchronous exception from user space (EL0) to the kernel (EL1) is raised with the svc (supervisor call) instruction. An application running on Linux therefore issues svc with the registers set to appropriate values. To learn what those values are, let's see how the kernel handles svc.



Kernel :

Note : https://elixir.bootlin.com/linux/v5.16.10/source/arch/arm64/kernel/entry.S

Vector table :

Applications [EL0] can raise several kinds of exceptions, which are taken by the kernel [EL1]. The handlers for these exceptions are stored in a vector table. In ARMv8, the register that holds the base address of that vector table is VBAR_EL1 [Vector Base Address Register for EL1].

When an exception occurs, the processor must execute handler code which corresponds to the exception. The location in memory where the handler is stored is called the exception vector. In the ARM architecture, exception vectors are stored in a table, called the exception vector table. Each Exception level has its own vector table, that is, there is one for each of EL3, EL2 and EL1. The table contains instructions to be executed, rather than a set of addresses. Vectors for individual exceptions are located at fixed offsets from the beginning of the table.
The virtual address of each table base is set by the Vector Base Address Registers VBAR_EL3, VBAR_EL2 and VBAR_EL1.


ARM infocenter.

The exception handlers reside in contiguous memory and each vector spans up to 32 instructions. Based on the type of the exception, execution starts at a particular offset from the base address VBAR_EL1. Below is the ARM64 vector table. For example, when a synchronous exception is raised from EL0 in AArch64 state, the handler at VBAR_EL1 + 0x400 executes to handle the exception.


Offset from VBAR_EL1   Exception type    Exception set level
+0x000                 Synchronous       Current EL with SP0
+0x080                 IRQ/vIRQ
+0x100                 FIQ/vFIQ
+0x180                 SError/vSError
+0x200                 Synchronous       Current EL with SPx
+0x280                 IRQ/vIRQ
+0x300                 FIQ/vFIQ
+0x380                 SError/vSError
+0x400                 Synchronous       Lower EL using AArch64
+0x480                 IRQ/vIRQ
+0x500                 FIQ/vFIQ
+0x580                 SError/vSError
+0x600                 Synchronous       Lower EL using AArch32
+0x680                 IRQ/vIRQ
+0x700                 FIQ/vFIQ
+0x780                 SError/vSError
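The offsets in the table follow a simple pattern: four groups of four entries, each entry 32 instructions of 4 bytes (0x80). A small sketch of that arithmetic, with made-up enum and function names:

```c
#include <assert.h>
#include <stdint.h>

/* Where the exception came from (the four groups of the table). */
enum vec_source {
    CUR_EL_SP0   = 0,
    CUR_EL_SPX   = 1,
    LOWER_EL_A64 = 2,
    LOWER_EL_A32 = 3
};

/* Kind of exception (the four rows inside each group). */
enum vec_type {
    VEC_SYNC   = 0,
    VEC_IRQ    = 1,
    VEC_FIQ    = 2,
    VEC_SERROR = 3
};

#define VECTOR_ENTRY_SIZE (32u * 4u)  /* 32 instructions x 4 bytes = 0x80 */

/* Offset of a vector from the table base held in VBAR_EL1. */
static uint32_t vector_offset(enum vec_source src, enum vec_type type)
{
    assert(src <= LOWER_EL_A32 && type <= VEC_SERROR);
    return ((uint32_t)src * 4u + (uint32_t)type) * VECTOR_ENTRY_SIZE;
}
```

For a synchronous exception from a lower EL in AArch64 state this gives (2 * 4 + 0) * 0x80 = 0x400, which is the entry that svc ends up in.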


Linux defines the vector table at arch/arm64/kernel/entry.S + 493. Each kernel_ventry occupies a slot of 32 instructions. As an instruction in ARMv8 is 4 bytes long, the next kernel_ventry starts at +0x80 from the current kernel_ventry.

ENTRY(vectors)
	kernel_ventry	1, t, 64, sync		// Synchronous EL1t
	kernel_ventry	1, t, 64, irq		// IRQ EL1t
	kernel_ventry	1, t, 64, fiq		// FIQ EL1t
	kernel_ventry	1, t, 64, error		// Error EL1t

	kernel_ventry	1, h, 64, sync		// Synchronous EL1h
	kernel_ventry	1, h, 64, irq		// IRQ EL1h
	kernel_ventry	1, h, 64, fiq		// FIQ EL1h
	kernel_ventry	1, h, 64, error		// Error EL1h

	kernel_ventry	0, t, 64, sync		// Synchronous 64-bit EL0
	kernel_ventry	0, t, 64, irq		// IRQ 64-bit EL0
	kernel_ventry	0, t, 64, fiq		// FIQ 64-bit EL0
	kernel_ventry	0, t, 64, error		// Error 64-bit EL0

	kernel_ventry	0, t, 32, sync		// Synchronous 32-bit EL0
	kernel_ventry	0, t, 32, irq		// IRQ 32-bit EL0
	kernel_ventry	0, t, 32, fiq		// FIQ 32-bit EL0
	kernel_ventry	0, t, 32, error		// Error 32-bit EL0
END(vectors)

The kernel loads the vector table address into VBAR_EL1 at arch/arm64/kernel/head.S +429:


adr_l   x8, vectors    // load VBAR_EL1 with virtual
msr     vbar_el1, x8   // vector table address
isb                    // instruction synchronization barrier

VBAR_EL1 is a system register, so it cannot be accessed with ordinary load/store or data-processing instructions. The special system instructions msr and mrs must be used to manipulate system registers.

Instruction            Description
adr_l x8, vectors      loads the address of the vector table into general purpose register X8
msr vbar_el1, x8       moves the value in X8 to system register VBAR_EL1
isb                    instruction synchronization barrier

System call flow in Kernel

Let's see what happens when an application issues the svc instruction. From the table, we can see that for an AArch64 synchronous exception from a lower level, the offset is +0x400. In the Linux vector definition, the entry at VBAR_EL1 + 0x400 is el0t_64_sync. It calls el0t_64_sync_handler, defined at arch/arm64/kernel/entry-common.c + 615:

asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
{
	unsigned long esr = read_sysreg(esr_el1);	// read the syndrome register

	switch (ESR_ELx_EC(esr)) {
	case ESR_ELx_EC_SVC64:
		el0_svc(regs);			// SVC in 64-bit state
		break;
	case ESR_ELx_EC_DABT_LOW:
		el0_da(regs, esr);		// data abort in EL0
		break;
	case ESR_ELx_EC_IABT_LOW:
		el0_ia(regs, esr);		// instruction abort in EL0
		break;
	case ESR_ELx_EC_FP_ASIMD:
		el0_fpsimd_acc(regs, esr);	// FP/ASIMD access
		break;
	case ESR_ELx_EC_SVE:
		el0_sve_acc(regs, esr);		// SVE access in EL0
		break;
	case ESR_ELx_EC_FP_EXC64:
		el0_fpsimd_exc(regs, esr);	// FP exception
		break;
	case ESR_ELx_EC_SYS64:
	case ESR_ELx_EC_WFx:
		el0_sys(regs, esr);		// configurable trap
		break;
	case ESR_ELx_EC_SP_ALIGN:
		el0_sp(regs, esr);		// stack alignment exception
		break;
	case ESR_ELx_EC_PC_ALIGN:
		el0_pc(regs, esr);		// PC alignment exception
		break;
	case ESR_ELx_EC_UNKNOWN:
		el0_undef(regs);		// unknown reason
		break;
	case ESR_ELx_EC_BTI:
		el0_bti(regs);			// Branch Target Identification exception
		break;
	case ESR_ELx_EC_BREAKPT_LOW:
	case ESR_ELx_EC_SOFTSTP_LOW:
	case ESR_ELx_EC_WATCHPT_LOW:
	case ESR_ELx_EC_BRK64:
		el0_dbg(regs, esr);		// debug exception
		break;
	case ESR_ELx_EC_FPAC:
		el0_fpac(regs, esr);
		break;
	default:
		el0_inv(regs, esr);
	}
}

A synchronous exception can have multiple causes; the cause is recorded in the syndrome register esr_el1. The handler compares the exception class (EC) field of the syndrome register with predefined macros and branches to the corresponding subroutine.
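To make the decoding concrete, here is a user-space sketch of pulling the exception class out of a syndrome value. The TOY_ names and helper functions are made up; the field position (bits [31:26]) and the three EC values follow the ARMv8 architecture, and only a handful of classes are covered.

```c
#include <assert.h>
#include <stdint.h>

/* In ESR_ELx the exception class (EC) lives in bits [31:26]. */
#define TOY_ESR_EC_SHIFT  26
#define TOY_ESR_EC_MASK   0x3FULL

/* A few EC values defined by the ARMv8 architecture. */
#define TOY_EC_SVC64      0x15ULL  /* SVC executed in AArch64 state */
#define TOY_EC_IABT_LOW   0x20ULL  /* instruction abort from a lower EL */
#define TOY_EC_DABT_LOW   0x24ULL  /* data abort from a lower EL */

/* Extract the exception class, like the kernel's ESR_ELx_EC() macro. */
static uint64_t toy_esr_ec(uint64_t esr)
{
    return (esr >> TOY_ESR_EC_SHIFT) & TOY_ESR_EC_MASK;
}

/* Hypothetical helper: build a synthetic syndrome value for experiments. */
static uint64_t toy_esr_make(uint64_t ec, uint64_t iss)
{
    assert(ec <= TOY_ESR_EC_MASK);
    assert(iss < (1ULL << 25));   /* ISS occupies bits [24:0] */
    return (ec << TOY_ESR_EC_SHIFT) | iss;
}

/* Same shape as the switch in the handler, reduced to three classes. */
static const char *toy_classify(uint64_t esr)
{
    switch (toy_esr_ec(esr)) {
    case TOY_EC_SVC64:
        return "system call";
    case TOY_EC_IABT_LOW:
        return "instruction abort";
    case TOY_EC_DABT_LOW:
        return "data abort";
    default:
        return "other";
    }
}
```

For an svc #0 issued from AArch64 user space the syndrome value is 0x56000000: EC 0x15 in the top bits, the instruction-length bit set, and an immediate of 0.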

In the system call case, control branches to el0_svc, which calls do_el0_svc. They are defined at arch/arm64/kernel/entry-common.c +599 and arch/arm64/kernel/syscall.c +178 as follows:


/*
 * SVC handler.
 */
static void noinstr el0_svc(struct pt_regs *regs)
{
	enter_from_user_mode(regs);
	cortex_a76_erratum_1463225_svc_handler();
	do_el0_svc(regs);
	exit_to_user_mode(regs);
}

void do_el0_svc(struct pt_regs *regs)
{
	sve_user_discard();
	/* regs->regs[8] is X8, which holds the system call number */
	el0_svc_common(regs, regs->regs[8], __NR_syscalls, sys_call_table);
}

/* Abridged: only the call that dispatches the system call is shown. */
static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
			   const syscall_fn_t syscall_table[])
{
	invoke_syscall(regs, scno, sc_nr, syscall_table);	// the system call is invoked here
}

sys_call_table

It is nothing but an array of function pointers indexed by the system call number. It has to be placed in 4K-aligned memory. For ARM64, sys_call_table is defined at arch/arm64/kernel/sys.c +58.

#undef __SYSCALL
#define __SYSCALL(nr, sym) [nr] = sym,

/*
 * The sys_call_table array must be 4K aligned to be accessible from
 * kernel/entry.S.
 */
void * const sys_call_table[__NR_syscalls] __aligned(4096) = {
	[0 ... __NR_syscalls - 1] = sys_ni_syscall,
#include <asm/unistd.h>
};
  • __NR_syscalls defines the number of system calls. This varies from architecture to architecture.
  • Initially, all system call numbers are set to sys_ni_syscall, the "not implemented" system call. If a system call is removed, its system call number is not reused. Instead, it is assigned the sys_ni_syscall function.
  • The include chain goes like this: arch/arm64/include/asm/unistd.h -> arch/arm64/include/uapi/asm/unistd.h -> include/asm-generic/unistd.h -> include/uapi/asm-generic/unistd.h. The last file has the definitions of all system calls. For example, the write system call is defined there as

#define __NR_write 64
__SYSCALL(__NR_write, sys_write)

  • The sys_call_table is an array of function pointers. As a function pointer on ARM64 is 8 bytes long, the address of the table entry for a system call is calculated by shifting the system call number scno left by 3 and adding the result to the system call table address.
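The table-driven dispatch can be imitated in user space with a small sketch. Everything prefixed with toy_ or TOY_ is hypothetical, including the table size and the slot chosen for write; unfilled slots fall back to a "not implemented" function, mirroring the role of sys_ni_syscall.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NR_SYSCALLS 8   /* hypothetical table size */
#define TOY_NR_WRITE    3   /* hypothetical system call number */

typedef long (*toy_syscall_fn)(long a0, long a1, long a2);

/* Stand-in for sys_ni_syscall: the "not implemented" system call. */
static long toy_ni_syscall(long a0, long a1, long a2)
{
    (void)a0; (void)a1; (void)a2;
    return -ENOSYS;
}

/* Stand-in for sys_write: pretends every byte was written. */
static long toy_write(long fd, long buf, long count)
{
    (void)fd; (void)buf;
    return count;
}

/* Slots that are not listed stay NULL and are treated as not implemented. */
static const toy_syscall_fn toy_table[TOY_NR_SYSCALLS] = {
    [TOY_NR_WRITE] = toy_write,
};

/* Byte offset of a slot: scno * sizeof(pointer), i.e. scno << 3 on ARM64. */
static size_t toy_slot_offset(unsigned long scno)
{
    return (size_t)scno * sizeof(toy_syscall_fn);
}

/* Bounds-check the number, pick the slot, and call through the pointer. */
static long toy_invoke(unsigned long scno, long a0, long a1, long a2)
{
    toy_syscall_fn fn = toy_ni_syscall;

    if (scno < TOY_NR_SYSCALLS && toy_table[scno] != NULL)
        fn = toy_table[scno];
    assert(fn != NULL);
    return fn(a0, a1, a2);
}
```

Out-of-range and unassigned numbers both end up in toy_ni_syscall and return -ENOSYS, which is also what user space sees for an unimplemented system call.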

System call definition

Each system call is defined with a SYSCALL_DEFINEn macro, where n is the number of arguments the system call accepts. For example, write is implemented at fs/read_write.c +652:

SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
		size_t, count)
{
	return ksys_write(fd, buf, count);
}

This macro expands into the sys_write function definition and other alias functions, as mentioned in this LWN article. The expanded function has the compiler directive asmlinkage set. On architectures such as 32-bit x86, asmlinkage instructs the compiler to look for arguments on the CPU stack instead of in registers, which helps keep the system call implementations architecture independent. That's why the kernel_entry macro in el0_sync pushes all general purpose registers onto the stack. In the ARM64 case, user space places up to six arguments in registers X0 to X5 and the system call number in X8.
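From the application side, the same path can be exercised without the usual libc wrappers by going through the generic syscall() function. This is a sketch for Linux with glibc or a compatible libc; raw_write and raw_getpid are made-up names. On ARM64 the library is expected to place the system call number in X8 and the arguments in X0 onwards before executing svc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* makes sure syscall() is declared */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>      /* SYS_write, SYS_getpid */
#include <unistd.h>           /* syscall() */

/* Issue the write system call directly, bypassing the write() wrapper. */
static long raw_write(int fd, const void *buf, size_t count)
{
    assert(buf != NULL || count == 0);
    return syscall(SYS_write, fd, buf, count);
}

/* A system call without arguments, for comparison. */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

A call such as raw_write(1, "hello\n", 6) should behave like write(1, "hello\n", 6), returning the number of bytes written or -1 with errno set.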

Application Flow






Reference: