In this tutorial we will write the traditional hello world program in C, generate LLVM IR for it, and build a native executable from that IR.
Let's start by coding the hello world program in C. Save the following file as helloworld.c
#include <stdio.h>

int main()
{
    puts("hello world!");
    return 0;
}
Generating the LLVM IR
To generate the LLVM IR, type the following.
llvm-gcc -emit-llvm -S helloworld.c
A file called helloworld.s is created. If you open it in a text editor you will see something similar to this.
; ModuleID = 'helloworld.c'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"
target triple = "i486-linux-gnu"
@.str = internal constant [13 x i8] c"hello world!\00" ; <[13 x i8]*> [#uses=1]
define i32 @main() {
entry:
%retval = alloca i32 ; [#uses=2]
%tmp = alloca i32 ; [#uses=2]
%"alloca point" = bitcast i32 0 to i32 ; [#uses=0]
%tmp1 = getelementptr [13 x i8]* @.str, i32 0, i32 0 ; [#uses=1]
%tmp2 = call i32 @puts( i8* %tmp1 ) nounwind ; [#uses=0]
store i32 0, i32* %tmp, align 4
%tmp3 = load i32* %tmp, align 4 ; [#uses=1]
store i32 %tmp3, i32* %retval, align 4
br label %return
return: ; preds = %entry
%retval4 = load i32* %retval ; [#uses=1]
ret i32 %retval4
}
declare i32 @puts(i8*)
Generating the LLVM BitCode
LLVM bitcode can be generated with llvm-as, which writes its output to helloworld.s.bc.
llvm-as -f helloworld.s
Executing the LLVM BitCode
To execute the LLVM bitcode that was just created, type the following.
lli helloworld.s.bc
Optimization
We will now optimize our code using the mem2reg pass.
llvm-as < helloworld.s | opt -mem2reg > helloworld.bc
A file called helloworld.bc is created. It is the optimized version, while helloworld.s.bc is the unoptimized version of the same code. You can also see the decrease in file size.
Disassemble LLVM bitcode
To make sure that our LLVM bitcode is optimized, let us disassemble it.
llvm-dis -f helloworld.bc -o opthelloworld.ll
The optimized LLVM IR is written to a file called opthelloworld.ll. Open it in a text editor and compare it with the earlier helloworld.s. You will see that the alloca, store, and load instructions that passed the return value through the stack are gone.
opthelloworld.ll looks somewhat like this...
; ModuleID = 'helloworld.bc'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"
target triple = "i486-linux-gnu"
@.str = internal constant [13 x i8] c"hello world!\00" ; <[13 x i8]*> [#uses=1]
define i32 @main() {
entry:
%"alloca point" = bitcast i32 0 to i32 ; [#uses=0]
%tmp1 = getelementptr [13 x i8]* @.str, i32 0, i32 0 ; [#uses=1]
%tmp2 = call i32 @puts(i8* %tmp1) nounwind ; [#uses=0]
br label %return
return: ; preds = %entry
ret i32 0
}
declare i32 @puts(i8*)
Generating native assembly code
llc takes bitcode as its input, so before we generate the native assembly code we first assemble opthelloworld.ll back into bitcode.
llvm-as -f opthelloworld.ll
llc -f opthelloworld.bc
A file called opthelloworld.s, containing the native assembly code, is created. It looks similar to the following...
.file "opthelloworld.bc"
.text
.align 16
.globl main
.type main,@function
main:
.Leh_func_begin1:
.Llabel1:
subl $4, %esp
movl $.str, (%esp)
call puts
.LBB1_1: # return
xorl %eax, %eax
addl $4, %esp
ret
.size main, .-main
.Leh_func_end1:
.type .str,@object
.section .rodata.str1.1,"aMS",@progbits,1
.str: # .str
.size .str, 13
.asciz "hello world!"
.section .eh_frame,"aw",@progbits
.LEH_frame0:
.Lsection_eh_frame:
.Leh_frame_common:
.long .Leh_frame_common_end-.Leh_frame_common_begin
.Leh_frame_common_begin:
.long 0x0
.byte 0x1
.asciz "zR"
.uleb128 1
.sleb128 -4
.byte 0x8
.uleb128 1
.byte 0x1B
.byte 0xC
.uleb128 4
.uleb128 4
.byte 0x88
.uleb128 1
.align 4
.Leh_frame_common_end:
.Lmain.eh:
.long .Leh_frame_end1-.Leh_frame_begin1
.Leh_frame_begin1:
.long .Leh_frame_begin1-.Leh_frame_common
.long .Leh_func_begin1-.
.long .Leh_func_end1-.Leh_func_begin1
.uleb128 0
.byte 0xE
.uleb128 8
.byte 0x4
.long .Llabel1-.Leh_func_begin1
.byte 0xD
.uleb128 4
.align 4
.Leh_frame_end1:
.section .note.GNU-stack,"",@progbits
The syntax generated above is the default GAS (GNU Assembler) format, which is also referred to as AT&T syntax. If you prefer Intel syntax (the style used by assemblers such as NASM and MASM), it can be generated using
llc -f -x86-asm-syntax=intel opthelloworld.bc -o iopthelloworld.s
A file called iopthelloworld.s is created, which uses Intel-style assembler syntax.
.686
.model flat
extern _puts:near
extern _abort:near
_text segment 'CODE'
public _main
align 16
_main proc near
$label1:
sub ESP, 4
mov DWORD PTR [ESP], OFFSET __2E_str
call _puts
$BB1_1: ; return
xor EAX, EAX
add ESP, 4
ret
_main endp
_text ends
_data segment 'DATA'
__2E_str: ; .str
db 'hello world!',0
_data ends
end
Generating executable file
After the native assembly code has been created, you can assemble and link it using gcc in the following way
gcc -c opthelloworld.s
gcc opthelloworld.o
You will then get the executable file, a.exe or a.out, depending on your OS and the default output name used by gcc.
You can download the source code of this tutorial from the attachments at the top-right of the article.
The Makefile for this program is as follows (recipe lines must begin with a tab character):
all: helloworld.bc opthelloworld.o

helloworld.bc: helloworld.s
	llvm-as -f helloworld.s
	@echo Optimizing LLVM IR
	llvm-as < helloworld.s | opt -mem2reg > helloworld.bc

opthelloworld.o: opthelloworld.s
	gcc -c opthelloworld.s
	gcc opthelloworld.o

opthelloworld.s: opthelloworld.bc
	llc -f opthelloworld.bc
	llc -f -x86-asm-syntax=intel opthelloworld.bc -o iopthelloworld.s

opthelloworld.bc: opthelloworld.ll
	llvm-as -f opthelloworld.ll

opthelloworld.ll: helloworld.bc
	llvm-dis -f helloworld.bc -o opthelloworld.ll

helloworld.s: helloworld.c
	llvm-gcc -emit-llvm -S helloworld.c

clean:
	rm *.s *.bc *.ll *.o