Tuesday, 23 August 2016

Process Segments and VMA

Process Segments:

1. Introduction

Traditionally, a Unix process is divided into segments. The standard segments are code segment, data segment, BSS (block started by symbol), and stack segment.
The code segment contains the binary code of the program which is running as the process (a "process" is a program in execution).
The data segment contains the initialized global variables and data structures.
The BSS segment contains the uninitialized global data structures, and finally, the stack segment contains the local variables, return addresses, etc. for the particular process.
Under Linux, a process can execute in two modes - user mode and kernel mode. A process usually executes in user mode, but can switch to kernel mode by making system calls. When a process makes a system call, the kernel takes control and does the requested service on behalf of the process. The process is said to be running in kernel mode during this time. When a process is running in user mode, it is said to be "in userland" and when it is running in kernel mode it is said to be "in kernel space". We will first have a look at how the process segments are dealt with in userland and then take a look at the bookkeeping on process segments done in kernel space.

2. Userland's view of the segments

The code segment consists of the code - the actual executable program. The code of all the functions we write in the program resides in this segment. The addresses of the functions will give us an idea where the code segment is. If we have a function foo() and let x be the address of foo (x = &foo;), we know that x will point within the code segment.
The data segment consists of the initialized global variables of a program. The operating system needs to know what values are used to initialize the global variables. The initialized variables are kept in the data segment. To get an address inside the data segment we declare an initialized global variable and then print out its address. This address must be inside the data segment.
The BSS consists of the uninitialized global variables of a process. To get an address which occurs inside the BSS, we declare an uninitialized global variable, then print its address.
The automatic variables (or local variables) will be allocated on the stack, so printing out the addresses of local variables will provide us with the addresses within the stack segment.

3. A C program

Let's have a look at the following C program:
 1 #include <stdio.h>
 2 #include <stdlib.h>
 3 #include <sys/types.h>
 4 #include <unistd.h>
 5
 6 int our_init_data = 30;
 7 int our_noinit_data;
 8
 9 void our_prints(void)
10 {
11         int our_local_data = 1;
12         printf("\nPid of the process is = %d", getpid());
13         printf("\nAddresses which fall into:");
14         printf("\n 1) Data  segment = %p",
15                 &our_init_data);
16         printf("\n 2) BSS   segment = %p",
17                 &our_noinit_data);
18         printf("\n 3) Code  segment = %p",
19                 &our_prints);
20         printf("\n 4) Stack segment = %p\n",
21                 &our_local_data);
22
23         while(1);
24 }
25
26 int main()
27 {
28         our_prints();
29         return 0;
30 }
 
We can see that lines 6 and 7 declare two global variables. One is
initialized and one is uninitialized. Per the previous discussion,
the initialized variable will fall into the data segment and the
uninitialized variable will fall into the BSS segment. Lines 14-17
print the addresses of the variables. 
 
We also know that the address of the function our_prints will fall into the code segment, so that if we print the address of this function, we will get a value which falls into the code segment. This is done in lines 18-19.
Finally we print the address of a local variable. This automatic variable's address will be within the stack segment.

venkat@pari-ubuntu:test$ ./a.out

Pid of the process is = 3441
Addresses which fall into:
 1) Data  segment = 0x601028
 2) BSS   segment = 0x601040
 3) Code  segment = 0x400564
 4) Stack segment = 0x7fff0193afec

4. Execution of a userland program

When we execute a userland program, similar to the one given above, what happens is that the shell will fork() and exec() the new program. The exec() code inside the kernel will figure out what format the binary is in (ELF, a.out, etc.) and will call the corresponding handler for that format. For example when an ELF format file is loaded, the function load_elf_binary() from fs/binfmt_elf.c takes care of initializing the kernel data structures for the particular process. Details of this portion of loading will not be dealt with here, as that in itself is a topic for another article :-) The point here is that the code which loads the executable into the kernel fills in the kernel data structures.


5. Memory-related data structures in the kernel

In the Linux kernel, every process has an associated struct task_struct. The definition of this struct is in the header file include/linux/sched.h. The following snippet is from the 2.6.10 Linux kernel source code (only the needed fields and a few nearby fields are shown):

struct task_struct {
        volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
        atomic_t usage;
        ...
        ...
        ...
        struct mm_struct *mm, *active_mm;
        ...
        ...
        ...
        pid_t pid;
        ...
        ...
        ...
        char comm[16];
        ...
        ...
};

Three members of the data structure are relevant to us:
  1. pid contains the Process ID of the process.
  2. comm holds the name of the process.
  3. The mm_struct within the task_struct is the key to all memory management activities related to the process.
The mm_struct is defined in include/linux/sched.h as: 


struct mm_struct {
        struct vm_area_struct * mmap;           /* list of VMAs */
        struct rb_root mm_rb;
        struct vm_area_struct * mmap_cache;     /* last find_vma result */
        ...
        ...
        ...
        unsigned long start_code, end_code, start_data, end_data;
        unsigned long start_brk, brk, start_stack;
        ...
        ...
        ...
};
 
 
 


Linux Boot process on ARM CPU

Bootloader preparations

Before jumping to the kernel entry point, the boot loader should do at least the following:
1. Set up and initialise the RAM.
2. Initialise one serial port.
3. Detect the machine type.
4. Set up the kernel tagged list.
5. Call the kernel image with the following CPU register settings:
     r0 = 0
     r1 = machine type number discovered in (3) above
     r2 = physical address of the tagged list in system RAM, or
          physical address of the device tree block (dtb) in system RAM

Low level kernel init

The kernel entry point is arch/arm/kernel/head.S:stext.
At this point the kernel saves the values passed by the boot loader, checks that the CPU is in the correct state, enables low-level debug output if it is configured, then prepares the MMU and enables it. The call trace is below.
1) ./arch/arm/kernel/head.S:stext()
// For an early printk example, see the trace below. For the stages that run without the MMU, just set omap_uart_phys to the address defined in the U-Boot config file.
   ./arch/arm/kernel/head-common.S:__error_p() ->  printascii() -> addruart_current() ->
        arch/arm/mach-omap2/include/mach/debug-macro.S:addruart()
2) ./arch/arm/kernel/head.S:__enable_mmu()
3) ./arch/arm/kernel/head.S:__turn_mmu_on()
4) ./arch/arm/kernel/head-common.S:__mmap_switched()
5) init/main.c:start_kernel()

Low level debug

To get console output as early as possible, CONFIG_EARLY_PRINTK should be enabled.
The correct physical and virtual UART addresses must then be set up (for an example see [2]). At this point the kernel just writes output characters directly to those addresses instead of relying on the kernel log daemon.
If the UART address values are set up properly, you can use assembler routines such as ./arch/arm/kernel/head-common.S:printascii() from assembler code. From kernel C code you can use the early_printk() routine, which uses the same assembler code to push output characters through the console.

Single thread kernel initialization

The code in init/main.c:start_kernel() does the following things:
1) Obtain CPU id
2) Initialize runtime locking correctness validator
3) Initialize object tracker (initialize the hash buckets and link the static object pool objects into the pool list)
4) Set up the initial canary (GCC stack protector support. The stack protector works by putting a predefined pattern at the start of the stack frame and verifying that it hasn't been overwritten when returning from the function. The pattern is called the stack canary, and gcc expects it to be defined by a global variable called "__stack_chk_guard" on ARM. This unfortunately means that on SMP we cannot have a different canary value per task.)
5) Initialize cgroups at system boot, and initialize any subsystems that request early init (cgroups (control groups) is a Linux kernel feature to limit, account and isolate the resource usage (CPU, memory, disk I/O, etc.) of process groups)
==== Disable IRQ ====
6) initialize the tick control (Register the notifier with the clockevents framework)
7) Activate the first processor.
8) Initialize page address pool
9) Setup architecture
   a) Setup CPU configuration and CPU initialization
   b) Setup machine device tree (tags)
   c) Parse early parameters
   d) Initialize mem blocks
   e) sets up the page tables, initialises the zone memory maps, and sets up the zero page, bad page and bad page tables.
   f)  Unflatten device tree
   g) Store callbacks from machine description
   h) Init other CPUs if necessary
   i) reserves memory area given in "crashkernel=" kernel command line parameter. The memory reserved is used by a dump capture kernel when primary kernel is crashing.
   j) Initialize TCM memory (Tightly-coupled Memory, memory which resides directly on the processor of a computer)
   k) Early trap initialization
   l) Call machine early_init routine (if exists)
10) Setup init mm owner and cpumask
11) Store command line (We need to store the untouched command line for future reference. We also need to store the touched command line since the parameter  parsing is performed in place, and we should allow a component to store reference of name/value for future reference.)
12) Save number of CPU IDs
13) SMP percpu area setup
14) Run arch-specific boot CPU hooks (??? SMP stuff)
15) Build zone lists
16) Initialize page allocation
17) Parse early parameters (earlycon, console)
18) Parse other parameters
19) Initialize jump_labels
20) Setup log buffer
21) Initialize pid hash table
22) early initialization of vfs caches
23) Sort the kernel's built-in exception table
24) Trap initialization (not implemented for ARM)
25) Set up kernel memory allocators
26) Set up the scheduler prior to starting any interrupts (such as the timer interrupt). Full topology setup happens at smp_init() time - but meanwhile we still have a functioning scheduler.
==== Disable preemption ====
27) initialize idr cache (Small id to pointer translation service.)
28) Initialize performance events core
29) Initialize RCU (Read-Copy Update mechanism for mutual exclusion)
30) Initialize radix tree
31) Early IRQ init (init some links before init_ISA_irqs())
32) IRQ init
33) Initialize priority search tree (A clever mix of heap and radix trees forms a radix priority search tree which is useful for storing intervals.)
34) Init timers
35) Init HR timers
36) Init  soft IRQ
37) Initializes the clocksource and common timekeeping values
38) Set machine timer as a system one and initialize it
39) Initialize simple kernel profiler
40) Register CPUs going up/down notifiers
====Enable IRQ====
41) Late initialize of kmem cache
42) Initialize console
43) Fall with panic here if needed
44) Run lock dependency validator
45) Run locking API test suite
46) Check initrd was not overwritten (if needed)
47) Initialize page cgroup
48) Enable debug page allocation
49) Initialize debug memory objects (Called after the kmem_caches are functional to setup a dedicated cache pool, which has the SLAB_DEBUG_OBJECTS flag set. This flag prevents that the debug code is called on kmem_cache_free() for the debug tracker objects to avoid recursive calls.)
50) Initialize kmemleaks (Kmemleak provides a way of detecting possible kernel memory leaks in a way similar to a tracing garbage collector with the difference that the orphan objects are not freed but only reported via /sys/kernel/debug/kmemleak. A similar method is used by the Valgrind tool (memcheck --leak-check) to detect the memory leaks in user-space applications.)
51) Allocate per cpu pagesets and initialize them.
52) Numa policy initialization (Non Uniform Memory Access policy)
53) Run late time init if provided (Machine specific ?)
54) Initialize schedule clock
55) Calibrating delay loop
56) Initialize pid map
57) anon_vma_init (?)
58) initialise the credentials stuff
59) Initialize fork (Allocate space for task structures )
60) Prepare proc caches (allocate memory for fork)
61) Allocate kernel buffer
62) Initialize the key management state.
63) Initialize security framework
64) Late gdb initialization
65) Initialize VFS caches
66) Initialize signals
67) Initialize page write-back
68) Initialize proc FS (if enabled)
69) Initialize cgroups (Register cgroup filesystem and /proc file, and initialize any subsystems that didn't request early init.)
70) Initialize top_cpuset and the cpuset internal file system
71) early initialization of taskstat (Export per-task statistics to userland)
72) Initialize per-task delay accounting
73) Check write buffer bugs
74) Early initialization of ACPI
75) Late initialization of SFI (Simple Firmware Interface)
76) Ftrace initialization
77) Do the rest non-__init'ed, we're now alive (Create bunch of threads and call schedule to get things moving) Code in init/main.c:rest_init() routine.
   a) Create kernel init thread.
   b) Create kthreadd thread
   c) Prepare scheduler
==== Enable preemption ====
   d) call schedule()

Late kernel initialization 

kthreadd() thread

1) Setup a clean context for our children to inherit
2) If kthread_create_list is empty, just reschedule
3) Create kernel thread for every task in kthread_create_list 
4) Go back to step 2  

kernel_init() thread

1) Wait for kernel thread daemon initialization completion
2) Setup init permissions:
    a) init can allocate pages on any node
    b) init can run on any cpu
3) Prepare CPUs for smp
4) Do pre SMP init calls
5) Init lockup detector
6) Enable SMP
7) Initialize SMP support in scheduler
8) Initialize devices in init/main.c:do_basic_setup() (Ok, the machine is now initialized. None of the devices have been touched yet, but the CPU subsystem is up and running, and memory and process management works. Now we can finally start doing some real work..)
    a) Finish top cpuset after cpu, node maps are initialized
    b) Initialize user mode helper
    c) Initialize shmem
    d) Initialize drivers
    e) Initialize /proc/irq handling code
    f) Call all constructor functions linked into the kernel
    g) usermodehelper_enable - allow new helpers to be started again
    h) Do init calls
9) Open the /dev/console on the rootfs
10) check if there is an early userspace init.  If yes, let it do all the work
11) Run late init (Ok, we have completed the initial bootup, and we're essentially up and running. Get rid of the initmem segments and start the user-mode stuff..)
    a) finish all async __init code before freeing the memory
    b) Free init memory
    c) Mark readonly data as RO
    d) Set system state to  Running
    e) Try to execute userspace init command

Monday, 22 August 2016

Exploration of ARM TrustZone Technology

ARM TrustZone technology has been around for almost a decade. It was introduced at a time when the controversial discussion about trusted platform modules (TPM) on x86 platforms was in full swing (TCPA, Palladium). Similar to how TPM chips were meant to magically make PCs "trustworthy", TrustZone aimed at establishing trust in ARM-based platforms. In contrast to TPMs, which were designed as fixed-function devices with a predefined feature set, TrustZone represented a much more flexible approach by leveraging the CPU as a freely programmable trusted platform module. To do that, ARM introduced a special CPU mode called "secure mode" in addition to the regular normal mode, thereby establishing the notions of a "secure world" and a "normal world". The distinction between both worlds is completely orthogonal to the normal ring protection between user-level and kernel-level code and hidden from the operating system running in the normal world. Furthermore, it is not limited to the CPU but propagated over the system bus to peripheral devices and memory controllers. This way, such an ARM-based platform effectively becomes a kind of split personality. When secure mode is active, the software running on the CPU has a different view on the whole system than software running in non-secure mode. This way, system functions, in particular security functions and cryptographic credentials, can be hidden from the normal world. It goes without saying that this concept is vastly more flexible than TPM chips because the functionality of the secure world is defined by system software instead of being hard-wired.

https://genode.org/documentation/articles/trustzone

Swap two variables without using third variable.

Explanation:
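#include <stdio.h>

int main(void)
{
    int a = 5, b = 10;

    // process one: addition and subtraction
    // (a + b can overflow for large values, which is undefined for int)
    a = b + a;
    b = a - b;
    a = a - b;
    printf("a= %d  b=  %d", a, b);

    // process two: the one-liner a = a + b - (b = a);
    // It reads and modifies b without a sequence point in between, which
    // is undefined behaviour in C, so it is shown only as a comment.

    // process three: XOR
    a = 5;
    b = 10;
    a = a ^ b;
    b = a ^ b;
    a = b ^ a;
    printf("\na= %d  b=  %d", a, b);

    // process four: same as process one, with -x written as ~x + 1
    a = 5;
    b = 10;
    a = b - ~a - 1;
    b = a + ~b + 1;
    a = a + ~b + 1;
    printf("\na= %d  b=  %d", a, b);

    // process five: process one written as a single comma expression
    a = 5, b = 10;
    a = b + a, b = a - b, a = a - b;
    printf("\na= %d  b=  %d\n", a, b);
    return 0;
}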

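printf("%d%d%d%d%d",i++,i--,++i,--i,i);

#include <stdio.h>
int main(void)
{
int i=3;
printf("%d%d%d%d%d",i++,i--,++i,--i,i);
return 0;
}
Answer:
         23323 (on compilers that evaluate the arguments from right to left)

Explanation: C does not specify the order in which function arguments are evaluated, and modifying i several times in one call without a sequence point in between is undefined behaviour, so this output is not guaranteed. Many compilers evaluate the arguments from right to left: i gives 3, --i gives 2, ++i gives 3, i-- gives 3 (i becomes 2) and i++ gives 2 (i becomes 3). Printed in argument order, this gives 23323.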

Wednesday, 10 August 2016

How to enable and analyse a core dump or crash dump in Linux?

Core dumps are not enabled by default on embedded systems, mainly because of memory limitations. I found this out the hard way. I was setting the core size limit to unlimited (ulimit -c unlimited) and setting the core pattern to a directory with read-write permissions (echo /var/tmp/core > /proc/sys/kernel/core_pattern), but the core was still not dumped when I sent a segmentation fault signal (kill -11 <pid>) to the process. Here are some of the reasons why the core dump was not being generated:

Most embedded systems use busybox, so you need to make sure busybox is configured properly.
You need an empty file named .init_enable_core in the root directory to enable core dumps. The root directory on most embedded systems is read-only, so you need to modify your Makefile or post-build script to add the empty .init_enable_core file to the target rootfs.
You need to enable CONFIG_FEATURE_INIT_COREDUMPS for busybox in your .config file.
Problem with custom signal handlers:
If you have a custom signal handler installed for a signal, no core will be dumped for that signal. So you need to make sure that you don’t have a custom signal handler installed for the signal for which you are trying to get a core dump.
There is a lot of information on the web about enabling core dumps, but most of it is for desktop Linux distributions and not for embedded Linux.


Let’s trigger a crash, and use the dump we obtain to understand the Crash utility. Trigger a crash by trying the following command:
echo c > /proc/sysrq-trigger

This will trigger a panic, and the system boots into the crash kernel, and takes a dump of system memory into the directory /var/crash/<date-time>/. This is named vmcore. Once done, it boots back to the normal kernel.

With the help of the vmcore, vmlinux and system-map files, we will invoke the Crash tool, and view the sample output from it:
[root@DELL-RnD-India linux-2.6]# crash -S System.map vmlinux /var/crash/2011-01-10-12\:23/vmcore

crash 5.1.1
---snip---
crash: overriding /boot/System.map with System.map
GNU gdb (GDB) 7.0
This GDB was configured as "x86_64-unknown-linux-gnu"...
---snip------
  SYSTEM MAP: System.map                
DEBUG KERNEL: vmlinux (2.6.36-rc6-ftrace+)
  DUMPFILE: /var/crash/2011-01-10-12:23/vmcore
        CPUS: 4
        DATE: Mon Jan 10 12:21:33 2011
      UPTIME: 00:06:56
LOAD AVERAGE: 0.80, 0.65, 0.31
       TASKS: 278
    NODENAME: DELL-RnD-India
     RELEASE: 2.6.36-rc6-ftrace+
     VERSION: #2 SMP Wed Sep 29 16:43:59 IST 2010
     MACHINE: x86_64  (2666 Mhz)
      MEMORY: 2 GB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)
         PID: 7203
     COMMAND: "bash"
        TASK: ffff88007b0d0000  [THREAD_INFO: ffff88007a6ba000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash>

The above output shows you details about the kernel, the number of processors on the target machine, the command which caused the panic, etc.
Note: Crash can also be invoked on a live system with /dev/mem instead of the vmcore file. For this to work, you need to disable the CONFIG_STRICT_DEVMEM option while compiling the kernel. Stock kernels come with this option enabled, and will not let you use it.
The help command

The most useful command would be the help command, which gives you all the available commands from within the crash tool:
 t             gdb            p              sig            waitq         
btop           help           ps             struct         whatis        
dev            irq            pte            swap           wr            
dis            kmem           ptob           sym            q             
eval           list           ptov           sys           
exit           log            rd             task          
extend         mach           repeat         timer         

crash version: 5.1.1 gdb version: 7.0

To obtain help on any command, run help followed by the command name — for example, help vm.
The bt command

The bt (backtrace) command gives you the stack trace in the current context. And bt -a gives you a stack trace of active tasks on all CPUs. Once the crash tool loads the first context, it sets up information of the panicked process. Here we take a look at the sample output of the command:
crash> bt
PID: 7203 TASK: ffff88007b0d0000  CPU: 0 COMMAND: "bash"
#0 [ffff88007a6bbb00] machine_kexec at ffffffff81027ac7
#1 [ffff88007a6bbb80] crash_kexec at ffffffff810888c9
#2 [ffff88007a6bbc50] oops_end at ffffffff814570c4
#3 [ffff88007a6bbc80] no_context at ffffffff81032ee7
<snipped>
The ps command

This command obtains the status of all the processes, or a selected one. It has an amazing number of options to provide lots of information during dump analysis. Refer to the help section for more details. Here is a sample output:
crash> ps -a 5390
PID: 5390 TASK: ffff8800799ac650  CPU: 2 COMMAND: "httpd"
ARG: /usr/sbin/httpd
ENV: TERM=linux
     PATH=/sbin:/usr/sbin:/bin:/usr/bin
     runlevel=5 \<snipped....>
The set command

You can change the current context using the set command, which takes the PID of the process (which can be obtained from the ps command) or its task address. It takes various other arguments as well, which can be learnt by running help set. If set is used without arguments, it shows information about the current context. For example:
crash> set ffff88007d7c0000
    PID: 1
COMMAND: "init"
   TASK: ffff88007d7c0000  [THREAD_INFO: ffff88007d7ba000]
    CPU: 0
  STATE: TASK_INTERRUPTIBLE

Here, the address is the task pointer of the init process.
The files command

This can be used to get all the open files in the current context; it is a context-sensitive command:
crash> set 1
    PID: 1
COMMAND: "init"
   TASK: ffff88007d7c0000  [THREAD_INFO: ffff88007d7ba000]
    CPU: 0
  STATE: TASK_INTERRUPTIBLE
crash> files
PID: 1      TASK: ffff88007d7c0000  CPU: 0   COMMAND: "init"
ROOT: /    CWD: /
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 ffff880037a58f00 ffff88007cd5be40 ffff88007d090c90 CHR  /dev/null
  1 ffff880037a58f00 ffff88007cd5be40 ffff88007d090c90 CHR  /dev/null
  2 ffff880037a58f00 ffff88007cd5be40 ffff88007d090c90 CHR  /dev/null
  3 ffff880037a58a80 ffff88003747b000 ffff88003750d540 FIFO
  4 ffff880037a586c0 ffff88003747b000 ffff88003750d540 FIFO
  5 ffff880037a58c00 ffff880037493240 ffff88007cdc2ca0 UNKN anon_inode:/inotify
  6 ffff880037a58180 ffff8800374936c0 ffff88007cdc2ca0 UNKN anon_inode:/inotify
  7 ffff880076087a80 ffff8800376d8540 ffff88007ceb87b0 SOCK
  8 ffff880079a25d80 ffff88007a205e40 ffff880079eabc30 SOCK
  9 ffff88007688b6c0 ffff88007a8f0480 ffff88003752e830 SOCK

We have looked into some regularly used commands. For other commands, kindly refer to the help section.

How does the device tree work in Linux or Android?


The Linux kernel requires a complete description of the hardware: which board it is booting on (machine type), which devices it is using, their addresses (device/bus addresses), their interrupt numbers (irq), the MFP pin configuration (pin muxing/GPIOs), and also some board-level information such as memory size, kernel command line, etc.

Before device tree, all this information used to be set in a huge cluster of board files. Information like the command line, memory size, etc. used to be passed by bootloaders as part of ATAGS through register R2 (ARM). The machine type used to be set separately in register R1 (ARM).
At that time each kernel build used to be for only one specific chip and one specific board.

So there was a long pending wish to compile the kernel for all ARM processors, and let the kernel somehow detect its hardware and apply the right drivers as needed just like your PC.
But how? On a PC, the initial registers are hardcoded, and the rest of the information is supplied by the BIOS. But ARM processors don’t have a BIOS.

The solution chosen was device tree, also referred to as Open Firmware (abbreviated OF) or Flattened Device Tree (FDT). This is essentially a data structure in byte code format which contains information that is helpful to the kernel when booting up.

The bootloader now loads two binaries: the kernel image and the DTB.
DTB is the device tree blob. The bootloader passes the DTB address through R2 instead of ATAGS and R1 register is not required now.

For a one line bookish definition “A device tree is a tree data structure with nodes that describe the physical devices in a system”

Currently device tree is supported by ARM, x86, Microblaze, PowerPC, and Sparc architectures.


I. Device Tree Compilation

The device tree compiler and its source code are located at scripts/dtc/.
On ARM, all device tree sources are located at arch/arm/boot/dts/.
The Device Tree Blob (.dtb) is produced by the compiler, and it is the binary that gets loaded by the bootloader and parsed by the kernel at boot time.

    $ scripts/dtc/dtc -I dts -O dtb -o /path/my_tree.dtb arch/arm/boot/dts/my_tree.dts

This produces my_tree.dtb.

To recreate the dts from a dtb:

    $ scripts/dtc/dtc -I dtb -O dts -o /path/my_tree.dts /path/my_tree.dtb

This produces my_tree.dts.


 II. Device Tree Basics



Each module in the device tree is defined by a node, and all its properties are defined under that node. Depending on the driver, it can have child nodes or a parent node.
For example, a device connected via an i2c bus will have the i2c node as its parent, and that device will be one of the child nodes of the i2c node. The i2c node may have the apb bus as its parent, and so on. All of this leads up to the root node, which is the parent of all. (Don’t worry, the example after this section will make it clearer.)
Under the root of the device tree, one typically finds the following most common top-level nodes:

    cpus: its sub-nodes each describe one CPU in the system.
    memory: defines the location and size of the RAM.
    chosen: defines parameters chosen or defined by the system firmware at boot time. In practice, one of its uses is to pass the kernel command line.
    aliases: shortcuts to certain nodes.
    One or more nodes defining the buses in the SoC
    One or more nodes defining on-board devices


III. Device Tree Structure example

Here we take a dummy dts file as an example for explanation:

    #include "pxa910.dtsi"
    / {
        compatible = "mrvl,pxa910-dkb", "mrvl,pxa910";
        chosen {
            bootargs = "<boot args here>";
        };
        memory {
            reg = <0x00000000 0x10000000>;
        };
        soc {
            apb@d4000000 {
                uart1: uart@d4017000 {
                    status = "okay";
                };
                twsi1: i2c@d4011000 {
                    #address-cells = <1>;
                    #size-cells = <0>;
                    status = "okay";
                    pmic: 88pm860x@34 {
                        compatible = "marvell,88pm860x";
                        reg = <0x34>;
                        interrupts = <4>;
                        interrupt-parent = <&intc>;
                        interrupt-controller;
                        #interrupt-cells = <1>;
                    };
                };
            };
        };
    };

Figure 1

Each module is defined within one pair of curly brackets under one node; any sub-modules can be defined further inside.

Explaining the above tree, starting from the first line:

#include: includes a header file, just like in a C file
.dtsi: a device tree include file; a single dts can include any number of dtsi files, but by convention does not include another dts file
/: the root node; the device tree structure starts here

IV. Properties



Data is defined in the dts in the form of properties, which are read by the kernel code. Let’s look at some of the major properties.
compatible

The top-level compatible property typically defines a compatible string for the board. The strings are listed in priority order, from most specific first to least specific last. It is used to match against the dt_compat field of the machine description declared with DT_MACHINE_START.
Inside a driver or bus node, it is the most crucial property, as it is the link between the hardware and its driver. Each node carries a compatible string, and the kernel uses only that string to match a device driver with its data in the device tree node.
The connection between a kernel driver and the “compatible” entries it should be attached to is made by a code segment like the following in the driver’s source code:

    static const struct of_device_id dummy_of_match[] = {
        { .compatible = "marvell,88pm860x", },
        {}
    };
    MODULE_DEVICE_TABLE(of, dummy_of_match);

The above code in the driver matches it to the pmic node shown in the device tree structure in Figure 1.
reg

Defines the address for that node/device.
#address-cells

Indicates how many cells (i.e. 32-bit values) are needed to form the base address part of the reg property of child nodes.
#size-cells

Indicates how many cells are needed to form the size part of the reg property of child nodes.
interrupt-controller

A boolean property that indicates that the current node is an interrupt controller.
#interrupt-cells

Indicates the number of cells in the interrupts property for the interrupts managed by the selected interrupt controller.
interrupt-parent

A phandle that points to the interrupt controller for the current node. There is generally a top-level interrupt-parent definition for the main interrupt controller.
The label and node name

First, the label (“pmic”) and the entry’s node name (“88pm860x@34”). The label could have been omitted altogether, and the node name should stick to this format (some-name@address). This tells the kernel that the device is named 88pm860x and is connected to its parent bus (i2c in this case) at address 0x34 (the i2c slave address here). pmic is the label, which can be used as a phandle to refer to this node inside the dts.


 V. Getting the resources from DTS



Below are a few of the major APIs in the current kernel (4.3) for reading the various properties from the DTS.

of_address_to_resource: Reads the memory address of the device defined by the reg property.

irq_of_parse_and_map: Parses the interrupts and interrupt-parent properties and maps the interrupt to a Linux IRQ number, which the driver can then use to attach its interrupt handler.

of_find_property(np, propname, NULL): Finds out whether the property named in argument 2 is present or not.

of_property_read_bool: Reads a bool property named in argument 2. As it is a bool property, this is just like checking whether the property is present or not. Returns true or false.

of_get_property: Reads any property named in argument 2.

of_property_read_u32: Reads a 32-bit property and stores it in the 3rd argument. Doesn’t set the 3rd argument in case of error.

of_property_read_string: Reads a string property.

of_match_device: Sanity check that the device matches the node. Highly optional; I don’t see much use for it.