Friday 25 November 2016

Building and Running Clang(LLVM)

Release Clang Versions

Clang is released as part of regular LLVM releases. You can download the release versions from http://llvm.org/releases/.
Clang is also provided in all major BSD or GNU/Linux distributions as part of their respective packaging systems. From Xcode 4.2, Clang is the default compiler for Mac OS X.

Building Clang and Working with the Code

On Unix-like Systems

Note: as an experimental setup, you can use a single checkout with all the projects, and an easy CMake invocation, see the LLVM Doc "For developers to work with a git monorepo"
If you would like to check out and build Clang, the current procedure is as follows:
  1. Get the required tools.
  2. Check out LLVM:
    • Change directory to where you want the llvm directory placed.
    • svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
  3. Check out Clang:
    • cd llvm/tools
    • svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
    • cd ../..
  4. Check out extra Clang tools: (optional)
    • cd llvm/tools/clang/tools
    • svn co http://llvm.org/svn/llvm-project/clang-tools-extra/trunk extra
    • cd ../../../..
  5. Check out Compiler-RT (optional):
    • cd llvm/projects
    • svn co http://llvm.org/svn/llvm-project/compiler-rt/trunk compiler-rt
    • cd ../..
  6. Check out libcxx: (only required to build and run Compiler-RT tests on OS X, optional otherwise)
    • cd llvm/projects
    • svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx
    • cd ../..
  7. Build LLVM and Clang:
    • mkdir build (in-tree build is not supported)
    • cd build
    • cmake -G "Unix Makefiles" ../llvm
    • make
    • This builds both LLVM and Clang for debug mode.
    • Note: For subsequent Clang development, you can just run make clang.
    • CMake allows you to generate project files for several IDEs: Xcode, Eclipse CDT4, CodeBlocks, Qt-Creator (use the CodeBlocks generator), KDevelop3. For more details see Building LLVM with CMake page.
  8. If you intend to use Clang's C++ support, you may need to tell it how to find your C++ standard library headers. In general, Clang will detect the best version of libstdc++ headers available and use them - it will look both for system installations of libstdc++ as well as installations adjacent to Clang itself. If your configuration fits neither of these scenarios, you can use the -DGCC_INSTALL_PREFIX cmake option to tell Clang where the gcc containing the desired libstdc++ is installed.
  9. Try it out (assuming you add llvm/build/bin to your path):
    • clang --help
    • clang file.c -fsyntax-only (check for correctness)
    • clang file.c -S -emit-llvm -o - (print out unoptimized llvm code)
    • clang file.c -S -emit-llvm -o - -O3
    • clang file.c -S -O3 -o - (output native machine code)
  10. Run the testsuite:
    • make check-clang
If you encounter problems while building Clang, make sure that your LLVM checkout is at the same revision as your Clang checkout. LLVM's interfaces change over time, and mismatched revisions are not expected to work together.

Simultaneously Building Clang and LLVM:

Once you have checked out Clang into the llvm source tree it will build along with the rest of llvm. To build all of LLVM and Clang together all at once simply run make from the root LLVM directory.
Note: Observe that Clang is technically part of a separate Subversion repository. As mentioned above, the latest Clang sources are tied to the latest sources in the LLVM tree. You can update your toplevel LLVM project and all (possibly unrelated) projects inside it with make update. This will run svn update on all subdirectories related to subversion.

Using Visual Studio

The following details setting up for and building Clang on Windows using Visual Studio:
  1. Get the required tools:
  2. Check out LLVM:
    • svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
  3. Check out Clang:
    • cd llvm\tools
    • svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
    Note: Some Clang tests are sensitive to the line endings. Ensure that checking out the files does not convert LF line endings to CR+LF. If you use git-svn, make sure your core.autocrlf setting is false.
  4. Run CMake to generate the Visual Studio solution and project files:
    • cd ..\.. (back to where you started)
    • mkdir build (for building without polluting the source dir)
    • cd build
    • If you are using Visual Studio 2013: cmake -G "Visual Studio 12" ..\llvm
    • See the LLVM CMake guide for more information on other configuration options for CMake.
    • The above, if successful, will have created an LLVM.sln file in the build directory.
  5. Build Clang:
    • Open LLVM.sln in Visual Studio.
    • Build the "clang" project for just the compiler driver and front end, or the "ALL_BUILD" project to build everything, including tools.
  6. Try it out (assuming you added llvm/debug/bin to your path). (See the running examples from above.)
  7. See Hacking on clang - Testing using Visual Studio on Windows for information on running regression tests on Windows.
Note that once you have checked out both llvm and clang, to synchronize to the latest code base, use the svn update command in both the llvm and llvm\tools\clang directories, as they are separate repositories.

Clang Compiler Driver (Drop-in Substitute for GCC)

The clang tool is the compiler driver and front-end, which is designed to be a drop-in replacement for the gcc command. Here are some examples of how to use the high-level driver:
$ cat t.c
#include <stdio.h>
int main(int argc, char **argv) { printf("hello world\n"); }
$ clang t.c
$ ./a.out
hello world
The 'clang' driver is designed to work as closely to GCC as possible to maximize portability. The only major difference between the two is that Clang defaults to gnu99 mode while GCC defaults to gnu89 mode. If you see weird link-time errors relating to inline functions, try passing -std=gnu89 to clang.

Examples of using Clang

$ cat ~/t.c
typedef float V __attribute__((vector_size(16)));
V foo(V a, V b) { return a+b*a; }

Preprocessing:

$ clang ~/t.c -E
# 1 "/Users/sabre/t.c" 1

typedef float V __attribute__((vector_size(16)));

V foo(V a, V b) { return a+b*a; }

Type checking:

$ clang -fsyntax-only ~/t.c

GCC options:

$ clang -fsyntax-only ~/t.c -pedantic
/Users/sabre/t.c:2:17: warning: extension used
typedef float V __attribute__((vector_size(16)));
                ^
1 diagnostic generated.

Pretty printing from the AST:

Note, the -cc1 argument indicates the compiler front-end, and not the driver, should be run. The compiler front-end has several additional Clang specific features which are not exposed through the GCC compatible driver interface.
$ clang -cc1 ~/t.c -ast-print
typedef float V __attribute__(( vector_size(16) ));
V foo(V a, V b) {
   return a + b * a;
}

Code generation with LLVM:

$ clang ~/t.c -S -emit-llvm -o -
define <4 x float> @foo(<4 x float> %a, <4 x float> %b) {
entry:
         %mul = mul <4 x float> %b, %a
         %add = add <4 x float> %mul, %a
         ret <4 x float> %add
}
$ clang -fomit-frame-pointer -O3 -S -o - t.c # On x86_64
...
_foo:
Leh_func_begin1:
 mulps %xmm0, %xmm1
 addps %xmm1, %xmm0
 ret
Leh_func_end1:

Low Level Virtual Machine (LLVM)

The LLVM compiler infrastructure project (formerly Low Level Virtual Machine) is a "collection of modular and reusable compiler and toolchain technologies used to develop compiler front ends and back ends.

LLVM is written in C++ and is designed for compile-time, link-time, run-time, and "idle-time" optimization of programs written in arbitrary programming languages. Originally implemented for C and C++, the language-agnostic design of LLVM has since spawned a wide variety of front ends: languages with compilers that use LLVM include ActionScript, Ada, C#,[4][5][6] Common Lisp, Crystal, D, Delphi, Fortran, OpenGL Shading Language, Halide, Haskell, Java bytecode, Julia, Lua, Objective-C, Pony,[7] Python, R, Ruby, Rust, CUDA, Scala,[8] and Swift.

The LLVM project started in 2000 at the University of Illinois at Urbana–Champaign, under the direction of Vikram Adve and Chris Lattner. LLVM was originally developed as a research infrastructure to investigate dynamic compilation techniques for static and dynamic programming languages. LLVM was released under the University of Illinois/NCSA Open Source License,[2] a permissive free software licence. In 2005, Apple Inc. hired Lattner and formed a team to work on the LLVM system for various uses within Apple's development systems.[9] LLVM is an integral part of Apple's latest development tools for OS X and iOS.[10] Since 2013, Sony has been using LLVM's primary front end Clang compiler in the software development kit (SDK) of its PS4 console.[11]
The name LLVM was originally an initialism for Low Level Virtual Machine, but this became increasingly less apt as LLVM became an "umbrella project" that included a variety of other compiler and low-level tool technologies, so the project abandoned the initialism.[12] Now, LLVM is a brand that applies to the LLVM umbrella project, the LLVM intermediate representation (IR), the LLVM debugger, the LLVM C++ Standard Library (with full support of C++11 and C++14[13]), etc. LLVM is administered by the LLVM Foundation. Its president is compiler engineer Tanya Lattne


A Quick Introduction to Classical Compiler Design

The most popular design for a traditional static compiler (like most C compilers) is the three phase design whose major components are the front end, the optimizer and the back end  The front end parses source code, checking it for errors, and builds a language-specific Abstract Syntax Tree (AST) to represent the input code. The AST is optionally converted to a new representation for optimization, and the optimizer and back end are run on the code.
[Three Major Components of a Three-Phase Compiler]
 Three Major Components of a Three-Phase Compiler
The optimizer is responsible for doing a broad variety of transformations to try to improve the code's running time, such as eliminating redundant computations, and is usually more or less independent of language and target. The back end (also known as the code generator) then maps the code onto the target instruction set. In addition to making correct code, it is responsible for generating good code that takes advantage of unusual features of the supported architecture. Common parts of a compiler back end include instruction selection, register allocation, and instruction scheduling.
This model applies equally well to interpreters and JIT compilers. The Java Virtual Machine (JVM) is also an implementation of this model, which uses Java bytecode as the interface between the front end and optimizer.

Implications of this Design

The most important win of this classical design comes when a compiler decides to support multiple source languages or target architectures. If the compiler uses a common code representation in its optimizer, then a front end can be written for any language that can compile to it, and a back end can be written for any target that can compile from it.
[Retargetablity]  Retargetablity
With this design, porting the compiler to support a new source language (e.g., Algol or BASIC) requires implementing a new front end, but the existing optimizer and back end can be reused. If these parts weren't separated, implementing a new source language would require starting over from scratch, so supporting N targets and M source languages would need N*M compilers.
Another advantage of the three-phase design (which follows directly from retargetability) is that the compiler serves a broader set of programmers than it would if it only supported one source language and one target. For an open source project, this means that there is a larger community of potential contributors to draw from, which naturally leads to more enhancements and improvements to the compiler. This is the reason why open source compilers that serve many communities (like GCC) tend to generate better optimized machine code than narrower compilers like FreePASCAL. This isn't the case for proprietary compilers, whose quality is directly related to the project's budget. For example, the Intel ICC Compiler is widely known for the quality of code it generates, even though it serves a narrow audience.
A final major win of the three-phase design is that the skills required to implement a front end are different than those required for the optimizer and back end. Separating these makes it easier for a "front-end person" to enhance and maintain their part of the compiler. While this is a social issue, not a technical one, it matters a lot in practice, particularly for open source projects that want to reduce the barrier to contributing as much as possible.

When and Where Each Phase Runs

As mentioned earlier, LLVM IR can be efficiently (de)serialized to/from a binary format known as LLVM bitcode. Since LLVM IR is self-contained, and serialization is a lossless process, we can do part of compilation, save our progress to disk, then continue work at some point in the future. This feature provides a number of interesting capabilities including support for link-time and install-time optimization, both of which delay code generation from "compile time".
Link-Time Optimization (LTO) addresses the problem where the compiler traditionally only sees one translation unit (e.g., a .c file with all its headers) at a time and therefore cannot do optimizations (like inlining) across file boundaries. LLVM compilers like Clang support this with the -flto or -O4 command line option. This option instructs the compiler to emit LLVM bitcode to the .ofile instead of writing out a native object file, and delays code generation to link time
[Link-Time Optimization]  Link-Time Optimization
Details differ depending on which operating system you're on, but the important bit is that the linker detects that it has LLVM bitcode in the .o files instead of native object files. When it sees this, it reads all the bitcode files into memory, links them together, then runs the LLVM optimizer over the aggregate. Since the optimizer can now see across a much larger portion of the code, it can inline, propagate constants, do more aggressive dead code elimination, and more across file boundaries. While many modern compilers support LTO, most of them (e.g., GCC, Open64, the Intel compiler, etc.) do so by having an expensive and slow serialization process. In LLVM, LTO falls out naturally from the design of the system, and works across different source languages (unlike many other compilers) because the IR is truly source language neutral.
Install-time optimization is the idea of delaying code generation even later than link time, all the way to install time, Install time is a very interesting time (in cases when software is shipped in a box, downloaded, uploaded to a mobile device, etc.), because this is when you find out the specifics of the device you're targeting. In the x86 family for example, there are broad variety of chips and characteristics. By delaying instruction choice, scheduling, and other aspects of code generation, you can pick the best answers for the specific hardware an application ends up running on.
[Install-Time Optimization]