Thursday, 21 April 2022

Linux Secondary Core Bootup on AArch64

 The boot flow diagram 

  



CPU description in Device tree:

On ARM v8 64-bit systems the reg property of each cpu node is required and must match the MPIDR_EL1 register affinity bits.

* If cpus node's #address-cells property is set to 2
The first reg cell bits [7:0] must be set to bits [39:32] of MPIDR_EL1.
The second reg cell bits [23:0] must be set to bits [23:0] of MPIDR_EL1.

* If cpus node's #address-cells property is set to 1
The reg cell bits [23:0] must be set to bits [23:0] of MPIDR_EL1.

All other bits in the reg cells must be set to 0.
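
The packing rules above can be sketched in C. This is only an illustration: the helper name mpidr_to_reg_cells and its macros are hypothetical and are not part of the kernel or of dtc.

```c
#include <assert.h>
#include <stdint.h>

/* Affinity fields of MPIDR_EL1 that the binding maps into reg:
 * Aff3 is bits [39:32], Aff2..Aff0 are bits [23:0]. */
#define MPIDR_AFF3_SHIFT  32
#define MPIDR_AFF3_MASK   UINT64_C(0xff)
#define MPIDR_AFF210_MASK UINT64_C(0xffffff)

/*
 * Fill cells[] with the reg value for a cpu node and return the number
 * of cells written. address_cells must be 1 or 2.
 * Hypothetical helper, for illustration only.
 */
int mpidr_to_reg_cells(uint64_t mpidr, int address_cells, uint32_t cells[2])
{
    assert(address_cells == 1 || address_cells == 2);

    uint32_t low = (uint32_t)(mpidr & MPIDR_AFF210_MASK);

    if (address_cells == 2) {
        /* First cell: bits [7:0] carry MPIDR_EL1[39:32], the rest are 0. */
        cells[0] = (uint32_t)((mpidr >> MPIDR_AFF3_SHIFT) & MPIDR_AFF3_MASK);
        /* Second cell: bits [23:0] carry MPIDR_EL1[23:0]. */
        cells[1] = low;
        return 2;
    }

    /* Single cell: bits [23:0] carry MPIDR_EL1[23:0]. */
    cells[0] = low;
    return 1;
}
```

For a core whose MPIDR_EL1 affinity is 1 and a cpus node with #address-cells = 2, this yields the two cells 0x0 and 0x1.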

compatible:
    enum:
      - arm,cortex-a78

enable-method:
    # On ARM v8 64-bit this property is required
    - enum:
        - psci
        - spin-table

cpu-release-addr:
    description:
      The DT specification defines this as 64-bit always, but some 32-bit Arm
      systems have used a 32-bit value which must be supported.
      Required for systems that have an "enable-method"
      property value of "spin-table".

cpu-idle-states:
    items:
      maxItems: 1
    description: |
      List of phandles to idle state nodes supported by this cpu.

capacity-dmips-mhz:
    description:
      u32 value representing CPU capacity in DMIPS/MHz, relative to highest capacity-dmips-mhz in the system.

cci-control-port: true

dynamic-power-coefficient:
    description:
      A u32 value that represents the running time dynamic power coefficient in units of uW/MHz/V^2. The coefficient can either be calculated from power measurements or derived by analysis.

      The dynamic power consumption of the CPU is proportional to the square of the voltage (V) and the clock frequency (f). The coefficient is used to calculate the dynamic power as follows:

      Pdyn = dynamic-power-coefficient * V^2 * f

      where voltage is in V and frequency is in MHz.
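
As a worked example of the formula, here is a small C sketch. The function name dynamic_power_uw is hypothetical, and the result is in uW because the coefficient is given in uW/MHz/V^2.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pdyn [uW] = coefficient [uW/MHz/V^2] * V^2 [V^2] * f [MHz]
 * Hypothetical helper, for illustration only.
 */
double dynamic_power_uw(uint32_t coeff, double volts, double freq_mhz)
{
    /* Negative voltage or frequency makes no physical sense here. */
    assert(volts >= 0.0 && freq_mhz >= 0.0);

    return (double)coeff * volts * volts * freq_mhz;
}
```

With a coefficient of 100, a voltage of 1.0 V and a frequency of 1000 MHz, the estimate is 100 * 1.0 * 1.0 * 1000 = 100000 uW, i.e. 100 mW.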

performance-domains:
    maxItems: 1
    description:
      List of phandles and performance domain specifiers, as defined by
      bindings of the performance domain provider. See also
      dvfs/performance-domain.yaml.

power-domains:
    description:
      List of phandles and PM domain specifiers, as defined by bindings of the
      PM domain provider (see also ../power_domain.txt).

power-domain-names:
    description:
      A list of power domain name strings sorted in the same order as the
      power-domains property.

      For PSCI based platforms, the name corresponding to the index of the PSCI
      PM domain provider, must be "psci".

qcom,saw:
    
    description: 
      Specifies the SAW* node associated with this CPU.

      Required for systems that have an "enable-method" property
      value of "qcom,kpss-acc-v1" or "qcom,kpss-acc-v2"

      * arm/msm/qcom,saw2.txt

qcom,acc:
    
    description: 
      Specifies the ACC* node associated with this CPU.

      Required for systems that have an "enable-method" property
      value of "qcom,kpss-acc-v1", "qcom,kpss-acc-v2", "qcom,msm8226-smp" or
      "qcom,msm8916-smp".

secondary-boot-reg:
    description:
      Required for systems that have an "enable-method" property value of
      "brcm,bcm11351-cpu-method", "brcm,bcm23550" or "brcm,bcm-nsp-smp".
      These enable methods are used for starting secondary CPUs in the
      following Broadcom SoCs:
      BCM11130, BCM11140, BCM11351, BCM28145, BCM28155, BCM21664, BCM23550
      BCM58522, BCM58525, BCM58535, BCM58622, BCM58623, BCM58625, BCM88312

      The secondary-boot-reg property is a u32 value that specifies the
      physical address of the register used to request that the ROM holding
      pen code release a secondary CPU. The value written to the register is
      formed by encoding the target CPU id into the low bits of the
      physical start address it should jump to.
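
The encoding described above can be sketched as follows. The function name and the width of the CPU id field (two bits here) are assumptions made for illustration; the binding text only says that the id goes into the low bits of the start address.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the two lowest bits of the written value carry the CPU id. */
#define BOOT_ADDR_CPUID_MASK UINT32_C(0x3)

/*
 * Build the value to write to the secondary boot register.
 * Hypothetical helper, for illustration only.
 */
uint32_t encode_secondary_boot(uint32_t start_addr, uint32_t cpu_id)
{
    /* The start address must be aligned so that the id bits are free. */
    assert((start_addr & BOOT_ADDR_CPUID_MASK) == 0);
    assert(cpu_id <= BOOT_ADDR_CPUID_MASK);

    return start_addr | cpu_id;
}
```

For a start address of 0x80008000 and CPU id 1, the value written would be 0x80008001.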


Example:
cpu@1 {
        device_type = "cpu";
        compatible = "arm,cortex-a78";
        reg = <0x0 0x1>;
        enable-method = "spin-table";
        cpu-release-addr = <0 0x20000000>;
};
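
The cpu-release-addr value in the example is a 64-bit address split across two 32-bit cells, most significant cell first. A minimal sketch of how the two cells combine (the helper name cells_to_u64 is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine two 32-bit devicetree cells into one 64-bit value.
 * Hypothetical helper, for illustration only.
 */
uint64_t cells_to_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}
```

For the cells <0 0x20000000> this gives the release address 0x20000000.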


1. The bootloader runs first and is shown in the upper part of the picture;

2. The kernel is in the lower part of the picture and is booted by the bootloader;

3. The execution flow of CPU0 is in the left half of the picture; the bootloader code identifies CPU0 and starts it first;

4. The secondary CPUs are in the right half of the picture and are woken up by CPU0.

The specific startup process is as follows:

1. When the bootloader/ABL starts, it checks whether the code is executing on CPU0. If it is not, the core executes wfe and waits for CPU0 to issue a sev instruction to wake it up. If it is CPU0, it continues with the initialization work.

         mrs x4, mpidr_el1
         tst x4, #15            // test whether the current CPU is CPU0, i.e. (mpidr_el1 & 15) == 0
         b.eq 2f

         /*
          * Secondary CPUs
          */
1:       wfe
         ldr x4, mbox
         cbz x4, 1b             // if x4 == 0 (i.e. the value at mbox is 0), keep waiting
         br x4                  // otherwise branch to the given address

2:       ……                     // UART initialisation (38400 8N1)

The address of mbox above is set to 0x20000000 in the Makefile, and the memory at this address initially contains all zeros. The code above keeps looping for as long as the value at the mbox address is 0; once the value is non-zero, the core jumps directly to the address stored there.
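
The wait loop can be modelled in C as a rough sketch. The function name spin_on_mailbox and the max_wakeups bound are hypothetical: the real loop is assembly, has no bound, and blocks in wfe between reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of the secondary CPU loop: on every wake-up, read the mailbox
 * and leave the loop only when it holds a non-zero address.
 * Returns the branch target, or 0 if every wake-up found it empty.
 * Hypothetical helper, for illustration only.
 */
uint64_t spin_on_mailbox(const volatile uint64_t *mbox, unsigned int max_wakeups)
{
    assert(mbox != NULL);

    for (unsigned int i = 0; i < max_wakeups; i++) {
        /* wfe would block here until an event (sev) arrives. */
        uint64_t target = *mbox;    /* ldr x4, mbox */

        if (target != 0)            /* cbz x4, 1b   */
            return target;          /* br  x4       */
    }
    return 0;
}
```

While the mailbox still holds its initial value of 0 the function returns 0; once an entry address has been stored, that address is returned as the branch target.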

2. In the dts, set cpu-release-addr to 0x20000000. Once an entry address, say address A, is written to that location and a sev instruction is issued, the secondary CPU wakes up and jumps to address A to continue execution.

     cpu-release-addr = <0 0x20000000>;

3. The smp_prepare_cpus function in the kernel writes to address 0x20000000; the value written is the physical address of the function secondary_holding_pen.

4. When a secondary CPU executes secondary_holding_pen(), it compares its own CPU ID with the secondary_holding_pen_release variable. If they are equal, it performs further initialization; otherwise it executes wfe and waits.

The secondary_holding_pen_release variable is modified when CPU0 calls smp_init(). This function first binds an idle thread to the corresponding CPU, then sets secondary_holding_pen_release to the ID of the CPU that CPU0 wants to wake up, and finally issues the sev instruction to wake up that CPU so that it runs the idle thread.

ENTRY(secondary_holding_pen)
	bl	el2_setup			// Drop to EL1, w0=cpu_boot_mode
	bl	set_cpu_boot_mode_flag
	mrs	x0, mpidr_el1
	mov_q	x1, MPIDR_HWID_BITMASK
	and	x0, x0, x1
	adr_l	x3, secondary_holding_pen_release
pen:	ldr	x4, [x3]
	cmp	x4, x0
	b.eq	secondary_startup
	wfe
	b	pen
ENDPROC(secondary_holding_pen)
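
The comparison made in the pen can be modelled in C. The mask value mirrors the MPIDR_HWID_BITMASK used by the assembly above (affinity bits [39:32] and [23:0]). The function name pen_released is hypothetical, and the all-ones INVALID_HWID starting value is an assumption used here to represent "no CPU released yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Affinity bits of MPIDR_EL1: Aff3 in [39:32], Aff2..Aff0 in [23:0]. */
#define MPIDR_HWID_BITMASK UINT64_C(0xff00ffffff)

/* Assumption: an all-ones value means that no CPU has been released. */
#define INVALID_HWID UINT64_MAX

/*
 * Return true when the CPU identified by mpidr_el1 may leave the pen.
 * Hypothetical helper, for illustration only.
 */
bool pen_released(uint64_t mpidr_el1, uint64_t pen_release)
{
    /* The pen value is either the sentinel or a masked hardware id. */
    assert(pen_release == INVALID_HWID ||
           (pen_release & ~MPIDR_HWID_BITMASK) == 0);

    uint64_t hwid = mpidr_el1 & MPIDR_HWID_BITMASK;  /* and x0, x0, x1 */

    return pen_release == hwid;                      /* cmp x4, x0     */
}
```

A core with MPIDR_EL1 = 0x80000001 stays in the pen while the variable holds INVALID_HWID or the id of another core, and leaves it once the variable is set to 0x1.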
  
5. The primary core goes through head.S and most of the functions in start_kernel. In rest_init it spawns a kernel thread, which eventually runs kernel_init -> kernel_init_freeable -> smp_init.
  • smp_init calls idle_threads_init to fork an idle (swapper) task for each secondary CPU; these tasks share the same PID but each has its own thread_info and task_struct, as shown below.
void __init smp_init(void)
{
	unsigned int cpu;

	idle_threads_init();
	cpuhp_threads_init();

	/* FIXME: This should be done in userspace --RR */
	for_each_present_cpu(cpu) {
		if (num_online_cpus() >= setup_max_cpus)
			break;
		if (!cpu_online(cpu))
			cpu_up(cpu);
	}

	/* Any cleanup work */
	smp_announce();
	smp_cpus_done(setup_max_cpus);
}


void __init idle_threads_init(void)
{
	unsigned int cpu, boot_cpu;

	boot_cpu = smp_processor_id();

	for_each_possible_cpu(cpu) {
		if (cpu != boot_cpu)
			idle_init(cpu);
	}
}


static inline void idle_init(unsigned int cpu)
{
	struct task_struct *tsk = per_cpu(idle_threads, cpu);

	if (!tsk) {
		tsk = fork_idle(cpu);
		if (IS_ERR(tsk))
			pr_err("SMP: fork_idle() failed for CPU %u\n", cpu);
		else
			per_cpu(idle_threads, cpu) = tsk;
	}
}

static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
{
	......
	target = min((int)target, CPUHP_BRINGUP_CPU);
	ret = cpuhp_up_callbacks(cpu, st, target);
out:
	cpu_hotplug_done();
	return ret;
}

static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
			      enum cpuhp_state target)
{
	enum cpuhp_state prev_state = st->state;
	int ret = 0;

	while (st->state < target) {
		st->state++;
		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL);
		if (ret) {
			st->target = prev_state;
			undo_cpu_up(cpu, st);
			break;
		}
	}
	return ret;
}


static struct cpuhp_step cpuhp_bp_states[] = {
	[CPUHP_BRINGUP_CPU] = {
		.name			= "cpu:bringup",
		.startup.single		= bringup_cpu,
		.teardown.single	= NULL,
		.cant_stop		= true,
	},
};

static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
				 bool bringup, struct hlist_node *node)
{
	......
	if (!step->multi_instance) {
		cb = bringup ? step->startup.single : step->teardown.single;
		ret = cb(cpu);
	}
	......
}
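
The state walk performed by cpuhp_up_callbacks can be modelled outside the kernel as a rough sketch. Everything here is hypothetical: the three states, the up_callbacks function and the two sample callbacks are invented for illustration, and the rollback is reduced to restoring the previous state, whereas the kernel's undo_cpu_up does more work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much shorter state list than the kernel's enum cpuhp_state. */
enum { ST_OFFLINE, ST_PREPARE, ST_BRINGUP, ST_COUNT };

typedef int (*step_fn)(unsigned int cpu);

struct cpu_state {
    int state;   /* state reached so far */
    int target;  /* state to aim for */
};

/* Sample callbacks: one that succeeds and one that fails. */
int step_ok(unsigned int cpu)   { (void)cpu; return 0; }
int step_fail(unsigned int cpu) { (void)cpu; return -1; }

/*
 * Advance one state at a time up to target, calling the startup
 * callback of each state. On failure, fall back to the state we
 * started from and report the error.
 */
int up_callbacks(unsigned int cpu, struct cpu_state *st, int target,
                 const step_fn startup[ST_COUNT])
{
    int prev_state = st->state;
    int ret = 0;

    assert(target >= 0 && target < ST_COUNT);

    while (st->state < target) {
        st->state++;
        if (startup[st->state] != NULL)
            ret = startup[st->state](cpu);
        if (ret) {
            st->target = prev_state;
            st->state = prev_state;  /* simplified stand-in for undo_cpu_up() */
            break;
        }
    }
    return ret;
}
```

With only succeeding callbacks the CPU ends in ST_BRINGUP; if any callback fails, the function returns the error and the CPU is left in the state it started from.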

_cpu_up -> cpuhp_up_callbacks -> cpuhp_invoke_callback, which calls cpuhp_bp_states[CPUHP_BRINGUP_CPU].startup.single, which points to bringup_cpu. From there the chain is bringup_cpu -> __cpu_up -> boot_secondary -> cpu_ops[cpu]->cpu_boot, which is smp_spin_table_cpu_boot -> write_pen_release.

static void write_pen_release(u64 val)
{
	void *start = (void *)&secondary_holding_pen_release;
	unsigned long size = sizeof(secondary_holding_pen_release);

	secondary_holding_pen_release = val;
	__flush_dcache_area(start, size);
}



write_pen_release updates the variable secondary_holding_pen_release, which lets the secondary core break out of the loop in secondary_holding_pen so that the boot process can finally move on.

Modern age: jump to C world

secondary_startup:
	/*
	 * Common entry point for secondary CPUs.
	 */
	bl	__cpu_setup			// initialise processor
	bl	__enable_mmu
	ldr	x8, =__secondary_switched
	br	x8
ENDPROC(secondary_startup)

__secondary_switched:
	adr_l	x5, vectors
	msr	vbar_el1, x5
	isb

	adr_l	x0, secondary_data
	ldr	x0, [x0, #CPU_BOOT_STACK]	// get secondary_data.stack
	mov	sp, x0
	and	x0, x0, #~(THREAD_SIZE - 1)
	msr	sp_el0, x0			// save thread_info
	mov	x29, #0
	b	secondary_start_kernel
ENDPROC(__secondary_switched)

// in arch/arm64/kernel/smp.c
asmlinkage void secondary_start_kernel(void) {}

secondary_holding_pen -> secondary_startup -> __secondary_switched -> secondary_start_kernel, which is in arch/arm64/kernel/smp.c. The secondary bootup is done. To summarize, a figure from wowotech explains the overall secondary core bootup steps.

[Figure: overall secondary core bootup steps, from wowotech]


References:
https://www.kernel.org/doc/Documentation/devicetree/bindings/arm/cpus.yaml
https://devicetree-specification.readthedocs.io/en/latest/chapter3-devicenodes.html
https://blog.actorsfit.com/a?ID=00450-670b3808-c2a6-411c-9ad1-dae93038ab9a
https://wenboshen.org/posts/2016-12-21-secondary-bootup.html