LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes. The name LLVM originally stood for Low Level Virtual Machine. However, the project has since expanded, and the name is no longer an acronym but an orphan initialism.
LLVM is written in C++ and is designed for compile-time, link-time, and runtime optimization. Originally implemented for C and C++, the language-agnostic design of LLVM has since spawned a wide variety of frontends: languages with compilers that use LLVM (or which do not directly use LLVM but can generate compiled programs as LLVM IR) include ActionScript, Ada, C# for .NET, Common Lisp, Crystal, CUDA, D, Delphi, Dylan, Forth, Fortran, FreeBASIC, Free Pascal, Halide, Haskell, Idris, Jai (only for optimized release builds), Java bytecode, Julia, Kotlin, LabVIEW's G language, Objective-C, OpenCL, Odin, PicoLisp, PostgreSQL's SQL and PL/pgSQL, Ruby, Rust, Scala, Standard ML, Swift, Wolfram Language, Xojo, and Zig.
History
The LLVM project started in 2000 at the University of Illinois Urbana–Champaign, under the direction of Vikram Adve and Chris Lattner. LLVM was originally developed as a research infrastructure to investigate dynamic compilation techniques for static and dynamic programming languages. LLVM was released under the University of Illinois/NCSA Open Source License, LLVM has been an integral part of Apple's Xcode development tools for macOS and iOS since Xcode 4 in 2011.
In 2006, Lattner started working on a new project named Clang. The combination of the Clang frontend and LLVM backend is named Clang/LLVM or simply Clang.
The name LLVM was originally an initialism for Low Level Virtual Machine. However, the LLVM project evolved into an umbrella project that has little relationship to what most current developers think of as a virtual machine. This made the initialism "confusing" and "inappropriate", and since 2011 LLVM is "officially no longer an acronym", but a brand that applies to the LLVM umbrella project. The project encompasses the LLVM intermediate representation (IR), the LLVM debugger, the LLVM implementation of the C++ Standard Library (with full support of C++11 and C++14), etc. LLVM is administered by the LLVM Foundation. Compiler engineer Tanya Lattner became its president in 2014 and was still president and Executive Director .
"For designing and implementing LLVM", the Association for Computing Machinery presented Vikram Adve, Chris Lattner, and Evan Cheng with the 2012 ACM Software System Award.
The project was originally available under the UIUC license. After v9.0.0 released in 2019, LLVM relicensed to the Apache License 2.0 with LLVM Exceptions.
Features
LLVM can provide the middle layers of a complete compiler system, taking intermediate representation (IR) code from a compiler and emitting an optimized IR. This new IR can then be converted and linked into machine-dependent assembly language code for a target platform. LLVM can accept the IR from the GNU Compiler Collection (GCC) toolchain, allowing it to be used with a wide array of extant compiler front-ends written for that project. LLVM can also be built with gcc after version 7.5.
LLVM can also generate relocatable machine code at compile-time or link-time or even binary machine code at runtime.
LLVM supports a language-independent instruction set and type system.
Graphics code within the OpenGL stack can be left in intermediate representation and then compiled when run on the target machine. On systems with high-end graphics processing units (GPUs), the resulting code remains quite thin, passing the instructions on to the GPU with minimal changes. On systems with low-end GPUs, LLVM will compile optional procedures that run on the local central processing unit (CPU) that emulate instructions that the GPU cannot run internally. LLVM improved performance on low-end machines using Intel GMA chipsets. A similar system was developed under the Gallium3D LLVMpipe, and incorporated into the GNOME shell to allow it to run without a proper 3D hardware driver loaded.
In 2011, programs compiled by GCC outperformed those from LLVM by 10%, on average.
In 2013, phoronix reported that LLVM had caught up with GCC, compiling binaries of approximately equal performance.
Components
LLVM has become an umbrella project containing multiple components.
Frontends
LLVM was originally written to be a replacement for the extant code generator in the GCC stack, and many of the GCC frontends were modified to work with it, resulting in the now-defunct LLVM-GCC suite. The modifications generally involved a GIMPLE-to-LLVM IR step so that LLVM optimizers and codegen could be used instead of GCC's GIMPLE system. Apple was a significant user of LLVM-GCC through Xcode 4.x (2013). This use of the GCC frontend was considered a temporary measure which became mostly obsolete with the advent of LLVM/Clang's more modern, modular codebase and compilation speed.
LLVM's support for programming languages is done by frontends, that parse and analyse the source code, to output intermediate representation (LLVM-IR). Widespread interest in LLVM has led to several efforts to develop new frontends for many languages. One such frontend is Clang, a newer compiler supporting C, C++, and Objective-C; primarily supported by Apple, it is aimed at replacing the C/Objective-C compiler in the GCC system with a system that is more easily integrated with integrated development environments (IDEs) and has wider support for multithreading. Support for OpenMP directives has been included in Clang since release 3.8.
Frontends exist for compiling different languages, including Ada, C, C++, D, Delphi, Fortran, Haskell, Julia, Objective-C, Rust, Swift, and others.
The Utrecht Haskell compiler can generate code for LLVM. While the generator was in early stages of development, in many cases it was more efficient than the C code generator. The Glasgow Haskell Compiler (GHC) backend uses LLVM and achieves a 30% speed-up of compiled code relative to native code compiling via GHC or C code generation followed by compiling, missing only one of the many optimizing techniques implemented by the GHC.
Many other components are in various stages of development, including, but not limited to, the Rust compiler, a Java bytecode frontend, a Common Intermediate Language (CIL) frontend, the MacRuby implementation of Ruby 1.9, various frontends for Standard ML, and a new graph coloring register allocator.
Intermediate representation
thumb|LLVM IR is used e.g., by radeonsi and by llvmpipe. Both are part of [[Mesa 3D.]]
The core of LLVM is the intermediate representation (IR), a low-level programming language similar to assembly. IR is a strongly typed reduced instruction set computer (RISC) instruction set which abstracts away most details of the target. For example, the calling convention is abstracted through <code>call</code> and <code>ret</code> instructions with explicit arguments. Also, instead of a fixed set of registers, IR uses an infinite set of temporaries of the form %0, %1, etc. LLVM supports three equivalent forms of IR: a human-readable assembly format, an in-memory format suitable for frontends, and a dense bitcode format for serializing. A simple "Hello, world!" program in the human-readable IR format:
<syntaxhighlight lang="llvm">
@.str = internal constant [14 x i8] c"Hello, world\0A\00"
declare i32 @printf(ptr, ...)
define i32 @main(i32 %argc, ptr %argv) nounwind {
entry:
%tmp1 = getelementptr [14 x i8], ptr @.str, i32 0, i32 0
%tmp2 = call i32 (ptr, ...) @printf( ptr %tmp1 ) nounwind
ret i32 0
}
</syntaxhighlight>
The many different conventions used and features provided by different targets mean that LLVM cannot truly produce a target-independent IR and retarget it without breaking some established rules. Examples of target dependence beyond what is explicitly mentioned in the documentation can be found in a 2011 proposal for "wordcode", a fully target-independent variant of LLVM IR intended for online distribution. A more practical example is PNaCl.
The LLVM project also introduces another type of intermediate representation named MLIR which helps build reusable and extensible compiler infrastructure by employing a plugin architecture named Dialect. It enables the use of higher-level information on the program structure in the process of optimization including polyhedral compilation.
Backends
At version 16, LLVM supports many instruction sets, including IA-32, x86-64, ARM, Qualcomm Hexagon, LoongArch, M68K, MIPS, NVIDIA Parallel Thread Execution (PTX, also named NVPTX in LLVM documentation), PowerPC, AMD TeraScale, most recent AMD GPUs (also named AMDGPU in LLVM documentation), SPARC, z/Architecture (also named SystemZ in LLVM documentation), and XCore.
Some features are not available on some platforms. Most features are present for IA-32, x86-64, z/Architecture, ARM, and PowerPC. RISC-V is supported as of version 7.
In the past, LLVM also supported other backends, fully or partially, including C backend, Cell SPU, mblaze (MicroBlaze), AMD R600, DEC/Compaq Alpha (Alpha AXP) and Nios2, but that hardware is mostly obsolete, and LLVM developers decided the support and maintenance costs were no longer justified.
LLVM also supports WebAssembly as a target, enabling compiled programs to execute in WebAssembly-enabled environments such as Google Chrome / Chromium, Firefox, Microsoft Edge, Apple Safari or WAVM. LLVM-compliant WebAssembly compilers typically support mostly unmodified source code written in C, C++, D, Rust, Nim, Kotlin and several other languages.
The LLVM machine code (MC) subproject is LLVM's framework for translating machine instructions between textual forms and machine code. Formerly, LLVM relied on the system assembler, or one provided by a toolchain, to translate assembly into machine code. LLVM MC's integrated assembler supports most LLVM targets, including IA-32, x86-64, ARM, and ARM64. For some targets, including the various MIPS instruction sets, integrated assembly support is usable but still in the beta stage.
Linker
The lld subproject is an attempt to develop a built-in, platform-independent linker for LLVM. lld aims to remove dependence on a third-party linker. , lld supports ELF, PE/COFF, Mach-O, and WebAssembly in descending order of completeness. lld is faster than both flavors of GNU ld.
Unlike the GNU linkers, lld has built-in support for link-time optimization (LTO). This allows for faster code generation as it bypasses the use of a linker plugin, but on the other hand prohibits interoperability with other flavors of LTO.
C++ Standard Library
The LLVM project includes an implementation of the C++ Standard Library named libc++, dual-licensed under the MIT License and the UIUC license.
Since v9.0.0, it was relicensed to the Apache License 2.0 with LLVM Exceptions.
Debugger
C Standard Library
llvm-libc is an incomplete, upcoming, ABI independent C standard library designed by and for the LLVM project.
Derivatives
Due to its permissive license, many vendors release their own tuned forks of LLVM. This is officially recognized by LLVM's documentation, which suggests against using version numbers in feature checks for this reason. Some of the vendors include:
- AMD's AMD Optimizing C/C++ Compiler is based on LLVM, Clang, and Flang.
- Apple maintains an open-source fork for Xcode.
- Arm provides a number of LLVM based toolchains, including Arm Compiler for Embedded targeting bare-metal development and Arm Compiler for Linux targeting the High Performance Computing market
- Flang, Fortran project in development
- IBM is adopting LLVM in its C/C++ and Fortran compilers.
- Intel has adopted LLVM for their next generation Intel C++ Compiler.
- The Los Alamos National Laboratory has a parallel-computing fork of LLVM 8 named "Kitsune".
- Nvidia uses LLVM in the implementation of its NVVM CUDA Compiler. The NVVM compiler is distinct from the "NVPTX" backend mentioned in the Backends section, although both generate PTX code for Nvidia GPUs.
- Since 2013, Sony has been using LLVM's primary front-end Clang compiler in the software development kit (SDK) of its PlayStation 4 console.
See also
- HHVM
- C--
- Amsterdam Compiler Kit (ACK)
- GNU lightning
- Pure
- ROCm
- Emscripten
- TenDRA Distribution Format
- Architecture Neutral Distribution Format (ANDF)
- Comparison of application virtualization software
- SPIR-V
References
Further reading
- Chris Lattner - The Architecture of Open Source Applications - Chapter 11 LLVM, , released 2012 under CC BY 3.0 (Open Access).
- LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation, a published paper by Chris Lattner, Vikram Adve
