Assembly Language

Posted by: Rea Maor In: Programming - Saturday, April 28th, 2007

Aaaah! Scary stuff, and we’re going to start with this? Yes, we are! Mainly because assembly code is about the most elemental, raw form of programming language there is, short of typing in the binary by hand. You see the guts of the system and what it does, and then with later languages you can appreciate how we ascend the ladder of language levels to express what we want the computer to do in a more concise manner. For the record, nobody would recommend assembly as a first language. Assembly is mainly used to talk directly to the hardware (building a device driver) or to write compilers (programs that translate a higher-level language into assembly and then into binary machine code, which is all the computer can understand).

Assembly is hardware and platform specific. You are not expected to understand how to use it, but just to get some idea of how it works, which we can apply when we look at later languages. So, let’s pick a Linux system running on a 32-bit x86 processor (one of the 386, 486, 586, 686 family) and dive right in:


; 'Hello World' in assembly!
section .data
hello:    db 'Hello world!', 10
helloLen: equ $-hello

section .text
global _start

_start:
    mov eax,4
    mov ebx,1
    mov ecx,hello
    mov edx,helloLen
    int 80h
    mov eax,1
    mov ebx,0
    int 80h

Sadistic, aren’t I? Some things that distinguish assembly are that there are no high-level functions; everything is one step, one byte, one processor register at a time. You would save this file in a text editor as something like “test.asm”, then (from the command line, of course!) call the assembler, which here is NASM:

nasm -f elf test.asm

which will digest it into an object file, then call the linker:

ld -s -o test test.o

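to create the executable file. (On a 64-bit Linux system, add “-m elf_i386” to the ld command so the linker produces a 32-bit program.) Run the resulting program as “./test”, because plain “test” would run the shell’s built-in command of the same name instead. It prints: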

Hello world!

Oh, the joy! Now, how did that happen? Line by line:

; 'Hello World' in assembly!

^This is a comment. The semi-colon at the front is how you signal the assembler: “Don’t read this, dummy!” Comments are for us programmers, to clarify what the source code does. You can comment every line if you want to! You can write a novel in the comments, the assembler won’t care. Practically all programming languages have a way to comment.

section .data

^Here, we tell the assembler that this part of the code sets up the variables, the pieces of data, that the program will use. Again, just about every language has variables.

hello: db 'Hello world!', 10

^’hello’ is the name of our “string variable”, which will contain letters, a space, an exclamation point, and a “line feed” character (10) so the program knows to drop to the next line after printing the string.

helloLen: equ $-hello

^What we’re doing here is telling the assembler to compute the number of bytes in the string. “$” stands for the current position, which at this point is just past the end of the string, so subtracting the address of ‘hello’ gives 13 (including the line feed). Most higher-level languages keep track of string lengths automatically; here we have to do it ourselves.
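To make the “$-hello” arithmetic concrete, here is a minimal sketch in Python (a higher-level language, used here only for illustration). It is a toy model, not how NASM works internally, and the names data_section, hello_offset, here and hello_len are made up for this example. It lays the string out in a flat run of bytes and subtracts the starting position from the position just past the end:

```python
# Toy model of the .data section: one flat run of bytes.
# All names here are illustrative; none of them come from NASM.
data_section = bytearray()

# hello: db 'Hello world!', 10
hello_offset = len(data_section)                # where the label 'hello' points
data_section += b"Hello world!" + bytes([10])   # 10 is the line feed

# helloLen: equ $-hello
here = len(data_section)                        # '$': the current position
hello_len = here - hello_offset

print(hello_len)  # 13: twelve visible characters plus the line feed
```

The point of the subtraction is that nobody had to count characters by hand: change the text, and the length follows along.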

The next “section” line tells the assembler that section .data is finished; the blank line before it is only there for us humans. What does matter is line breaks: the assembler expects one instruction per line, so if we crammed the whole code onto one line, it wouldn’t work. Every language has “delimiters” like this to mark where one thing ends and the next begins, and they all use them in different ways.

section .text

^This tells the assembler that the actual executable instructions go in this section, starting below.

global _start

^Heads up, linker! This marks the label “_start” as visible outside this file, and “_start” is the name the linker looks for by default to find where the program begins. Get ready!

_start:

^Told you so! This label marks the first instruction to run. Other languages have other ways to designate the starting point of execution.

mov eax,4

^“mov” is an instruction which moves a value into a register. “eax” is the name of the register, and “4” is the value we’re sticking into it. On 32-bit Linux, “4” is the system-call code for “write”.

mov ebx,1

^Write where? “1” is the file descriptor for standard output (stdout), which normally means the screen, as opposed to a file, or sending smoke signals.

mov ecx,hello

^Remember, ‘hello’ is just the label on the spot where our string is stuffed. What actually goes into ecx is the address where the string starts in memory. With our computer ordered to write to the screen, the next thing we tell it is where to find the first byte to print.

mov edx,helloLen

^Now, what if we left this out? The computer wouldn’t know when to stop printing, because nothing in memory marks where 'Hello world!', 10 ends. This tells it to print exactly 13 bytes to the screen, then stop before it starts spewing the rest of its memory.
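Here is a small Python sketch of why the byte count matters. It is only a toy model: the memory contents after the string and the helper name bytes_to_print are invented for the illustration. Memory is treated as one long run of bytes with no end marker on the string, so the count is the only thing that decides where printing stops:

```python
# Toy model: memory is just a long run of bytes, and the string carries no
# built-in end marker. The neighbouring bytes are invented for this example.
memory = b"Hello world!\n" + b"<whatever happens to come next>"

def bytes_to_print(start, count):
    """Return exactly `count` bytes beginning at offset `start`."""
    return memory[start:start + count]

print(bytes_to_print(0, 13))   # the string and nothing else
print(bytes_to_print(0, 24))   # too long: runs on into the neighbouring bytes
print(bytes_to_print(0, 5))    # too short: cuts the string off early
```

Get the number wrong in either direction and the output is wrong too, which is why the listing lets the assembler compute it.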

int 80h

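^This is where the system call actually happens. “int 80h” triggers software interrupt 80 hex, which hands control over to the operating system kernel. The kernel looks in eax to see which call we want, picks up the arguments from ebx, ecx and edx, and does the work. “Make it so!”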

mov eax,1

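^“1” is the system-call code for “exit”. You have to end the program explicitly, and this is the signal for it. Without this call, the processor would just keep going past the last instruction into whatever bytes come next in memory, and the program would most likely crash.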

mov ebx,0

^All programs return an “exit code” to whoever started them. By convention, “0” means everything’s fine, and anything else signals a problem; which numbers mean what (“1”, “2”, etc.) is all up to the programmer. On Unix-like systems the code has to fit in the range 0 to 255. A shell, a script or another program that launched this one can check the code to find out whether it succeeded.
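One detail worth seeing in action: on Unix-like systems only the low 8 bits of the exit code reach the parent, so a value such as 399 does not arrive as 399. The Python sketch below is an illustration under that assumption (on other systems the result may differ); the helper names status_seen_by_parent and run_child are made up for this example.

```python
import subprocess
import sys

def status_seen_by_parent(code):
    """On Unix-like systems only the low 8 bits of the exit code survive."""
    return code & 0xFF

def run_child(code):
    """Start a child Python process that exits with `code`; return what we observe."""
    child = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"])
    return child.returncode

if __name__ == "__main__":
    for code in (0, 1, 399):
        # Columns: requested code, predicted status, status actually observed.
        print(code, status_seen_by_parent(code), run_child(code))
```

Since 399 is 256 + 143, the predicted status for 399 is 143. Sticking to small numbers avoids the surprise.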

int 80h

^Again, make it so. Exit with the return value of “0”.
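To tie the walk-through together, here is a toy model of the whole program in Python. It is only a sketch: real system calls are carried out by the kernel, not by a Python function, and every name below (memory, int_80h, run, the registers dictionary) is invented for the illustration. What it does show is the pattern: load the registers, then trigger the interrupt and let the handler read them.

```python
import io

# Toy model of the register convention used by the two system calls above.
# Illustration only: names and structure are invented for this sketch.
memory = b"Hello world!\n"
HELLO = 0                 # "address" of the string inside our toy memory
HELLO_LEN = len(memory)   # 13

def int_80h(registers, streams):
    """Look at eax to decide which call is wanted, then read its arguments."""
    if registers["eax"] == 4:                  # write
        out = streams[registers["ebx"]]        # ebx: file descriptor
        start = registers["ecx"]               # ecx: where the bytes start
        count = registers["edx"]               # edx: how many bytes
        out.write(memory[start:start + count])
        return None
    if registers["eax"] == 1:                  # exit
        return registers["ebx"]                # ebx: exit code
    raise ValueError("call number not modelled in this sketch")

def run():
    stdout = io.BytesIO()                      # stands in for the screen
    streams = {1: stdout}
    regs = {"eax": 0, "ebx": 0, "ecx": 0, "edx": 0}

    regs["eax"] = 4                            # mov eax,4
    regs["ebx"] = 1                            # mov ebx,1
    regs["ecx"] = HELLO                        # mov ecx,hello
    regs["edx"] = HELLO_LEN                    # mov edx,helloLen
    int_80h(regs, streams)                     # int 80h

    regs["eax"] = 1                            # mov eax,1
    regs["ebx"] = 0                            # mov ebx,0
    code = int_80h(regs, streams)              # int 80h
    return stdout.getvalue(), code

output, exit_code = run()
print(output, exit_code)
```

Notice that ebx means “file descriptor” for the first call and “exit code” for the second. The registers are just numbered slots; it is the value in eax that decides how the rest are interpreted.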

Whew! OK, I admit it, I dove into assembler just to scare the daylights out of you! Coding in assembler as opposed to other languages is the difference between telling somebody to drive to the store, and telling them “Open the car door, sit inside, close the door, get your keys, put them in the ignition, turn the key” etc. This is what’s going on underneath every computer program ever written! But higher-level languages take all of the grunt work out, by giving us condensed commands which are then translated into assembler, and eventually into executable binary.
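For a taste of the “drive to the store” version, here is roughly the same program in Python, offered as one possible higher-level equivalent and nothing more. The function name main is just a common convention, not something the language requires:

```python
def main():
    # One call does the work of the four register loads and the first int 80h.
    print("Hello world!")
    # Returning 0 stands in for the exit code we loaded into ebx.
    return 0

exit_code = main()
```

No registers, no byte counting, no system-call numbers. All of that still happens, but somebody else wrote it once so we don’t have to.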

It gets much, much easier from here, I promise!
