This series has been dragging on for a long time, mainly because LZ had only just finished the second chapter when writing the earlier posts. LZ then read quickly through the third chapter and took a little while to digest it, which caused the delay.
This article is the first in the 3.X series and our first step into the world of assembly. LZ has been thinking about how to make this series a little more interesting, because the second chapter really is dull. Even LZ found it extremely dull, although LZ did work through a lot of the exercises. The assembly part will be much better: it is still not a programming language we are familiar with, but it is a language after all, not the 0s and 1s that we hardly ever deal with.
Why learn assembly language
Most ape friends usually write in a high-level programming language. We enjoy that luxury only because many masters of the computing field have wrapped the machine in layer after layer of abstraction, which spares us a lot of low-level trouble in day-to-day development. Imagine that every time you wrote a method you also had to worry about which variables go in registers, which go in main memory, and which memory region they belong in, and had to remember all the register names and what each one does. Wouldn't those low-level details drive you crazy?
So it is easy to see that high-level languages bring us a great deal of convenience. Nothing is perfect, though, and this convenience has a cost: the code we write may have changed beyond recognition by the time it actually executes, which often leads to puzzling problems.
Here is a small example. LZ once asked a similar question in the group, and this time LZ has written a small program. Ape friends who have learned Java, come and predict what this program prints.
public class Main
{
    public static void main(String[] args)
    {
        Integer a = 127;
        Integer b = 127;
        Integer c = 128;
        Integer d = 128;
        System.out.println(a == b);
        System.out.println(c == d);
    }
}
Many people probably cannot spot the catch in this program and expect it to print true twice. In fact it prints true and then false; try it yourself if you do not believe it. As for the reason: == on two Integer objects compares references, and autoboxing goes through Integer.valueOf, which by default caches the objects for values from -128 to 127. So a and b are the same cached object, while c and d are two different objects. If you are interested, read up on Java's autoboxing and the cache range of Integer.valueOf.
The root cause of this kind of problem is that the compiler casts a layer of fog over developers, so that some of them know what happens but not why. They do not know how the program they wrote actually works. Such a fog is bound to hold developers back, so to improve ourselves we need to lift it. For C/C++ developers, lifting this fog is really the process of understanding assembly language.
Assembly language is to C/C++ programmers what class files are to Java programmers, because both are what the compiler produces. LZ has said all this to make one point: understanding assembly language brings real benefits to everyday development, and for developers working in C/C++ the benefits are endless.
Some ape friends may think that, since LZ makes a living from Java, learning assembly language is a bit superfluous. After all, Java is quite far from assembly: Java is compiled into class files, which are handed to the virtual machine's execution engine; that execution engine is implemented in C/C++; and C/C++ in turn has to be preprocessed and compiled by a compiler such as GCC into assembly language. At first glance, Java really is a long way from assembly language.
But what LZ wants to say is this: whatever your position, as long as your job is to direct the computer to do things for you, you need to understand how the computer actually does them. Otherwise you can only give orders without knowing how they are carried out, and if you do not know how something is done, you cannot know how to do it better. In practice that means you do not know how to write better programs. This is not hard to see: if you do not know how your program actually runs, how can you know how to improve it?
A first taste of assembly
Compiling a C program actually involves several steps: preprocessing, compilation, assembly, and linking. The assembly language we want to understand is the output of the compilation step. We can therefore pass GCC an option that makes it stop after generating assembly, without assembling or linking.
Let's look at the following simple C code, saved as sum.c.
int simple(int *xp, int y) {
    int t = *xp + y;
    *xp = t;
    return t;
}
We compile this code with GCC using the -S option (gcc -S sum.c), which produces a file named sum.s, and then view that file with cat.
    .file "sum.c"
    .text
    .globl simple
    .type simple, @function
simple:
    pushl %ebp
    movl %esp, %ebp
    subl $16, %esp
    movl 8(%ebp), %eax      # load the argument xp from main memory
    movl (%eax), %eax       # get the value of *xp
    addl 12(%ebp), %eax     # compute *xp+y and keep it in %eax
    movl %eax, -4(%ebp)     # assign *xp+y to the variable t
    movl 8(%ebp), %eax      # load xp again
    movl -4(%ebp), %edx     # load t
    movl %edx, (%eax)       # store t into *xp
    movl -4(%ebp), %eax     # put t into %eax, ready to return
    leave
    ret
    .size simple, .-simple
    .ident "GCC: (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3"
    .section .note.GNU-stack,"",@progbits
Here we mainly want to see how assembly language describes our computation, so LZ has added a few comments that roughly trace what the program above does. Note that operands beginning with % are registers, while operands written with parentheses refer to locations in main memory.
Ape friends who are familiar with GCC will know that we can control the compiler's optimization level. So let's compile sum.c another way, adding -O1 on top of -S (gcc -O1 -S sum.c), and then view sum.s with cat again.
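To make the commented listing easier to follow, here is a rough C sketch in which each statement mirrors one instruction of the unoptimized assembly. The function name simple_steps and the local variable names (eax_p, eax, edx) are made up for illustration; they only stand in for the registers, and t stands in for the stack slot at -4(%ebp). This is a reading aid, not what the compiler literally does.

```c
/* Hypothetical helper, not part of sum.c: a C rendering of the
   unoptimized listing, one statement per instruction. */
int simple_steps(int *xp, int y)
{
    int t;          /* stands in for the stack slot at -4(%ebp) */
    int *eax_p;     /* %eax while it holds an address */
    int eax;        /* %eax while it holds an integer */
    int edx;        /* %edx */

    eax_p = xp;     /* movl 8(%ebp), %eax   : load xp            */
    eax = *eax_p;   /* movl (%eax), %eax    : load *xp           */
    eax = eax + y;  /* addl 12(%ebp), %eax  : compute *xp + y    */
    t = eax;        /* movl %eax, -4(%ebp)  : store result in t  */
    eax_p = xp;     /* movl 8(%ebp), %eax   : load xp again      */
    edx = t;        /* movl -4(%ebp), %edx  : load t             */
    *eax_p = edx;   /* movl %edx, (%eax)    : *xp = t            */
    eax = t;        /* movl -4(%ebp), %eax  : return value       */
    return eax;
}
```

Calling simple_steps(&x, 5) with x equal to 10 returns 15 and leaves x at 15, exactly as simple would.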
    .file "sum.c"
    .text
    .globl simple
    .type simple, @function
simple:
    pushl %ebp
    movl %esp, %ebp
    movl 8(%ebp), %edx      # load xp
    movl 12(%ebp), %eax     # load y
    addl (%edx), %eax       # compute *xp+y and keep it in %eax, ready to return
    movl %eax, (%edx)       # store *xp+y into *xp
    popl %ebp
    ret
    .size simple, .-simple
    .ident "GCC: (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3"
    .section .note.GNU-stack,"",@progbits
The number of assembly instructions has clearly dropped, and LZ has again added brief comments. As they show, the main optimization is that the variable t no longer exists as a separate stack slot: the sum simply stays in %eax, which removes the extra loads and stores.
If any ape friend really cannot follow these two pieces of assembly code, feel free to skip them for now. LZ only wants you to see what assembly language looks like and to get some first-hand contact with it; understanding what it means is not the goal here. After the explanations in the 3.X series, you should be able to come back to these two listings and read them easily.
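As a rough illustration of what the -O1 listing amounts to, the sketch below writes the same four data-movement instructions as C statements. The name simple_o1 and the locals edx and eax are hypothetical stand-ins for the registers; note that there is no variable t at all, because the sum never leaves %eax.

```c
/* Hypothetical helper, not part of sum.c: a C rendering of the
   -O1 listing, one statement per instruction. */
int simple_o1(int *xp, int y)
{
    int *edx = xp;  /* movl 8(%ebp), %edx   : load xp            */
    int eax = y;    /* movl 12(%ebp), %eax  : load y             */
    eax += *edx;    /* addl (%edx), %eax    : compute *xp + y    */
    *edx = eax;     /* movl %eax, (%edx)    : store it into *xp  */
    return eax;     /* the result is already in %eax             */
}
```

The behavior is the same as that of simple: with x equal to 10, simple_o1(&x, 5) returns 15 and leaves x at 15. Only the number of trips to main memory differs.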
Article summary
This chapter took a long time, mainly because LZ, as a Java developer, finds assembly language somewhat hard to learn; after all, LZ is not good at C/C++. The other reason is that LZ wants to understand the ins and outs as thoroughly as possible, so as not to mislead any ape friends.
Even so, LZ cannot guarantee a solid grasp of everything in 3.X. If anything here differs from your own understanding, LZ hopes you ape friends will point it out. That not only keeps other readers of this blog from being misled, it also helps LZ correct any misunderstanding.
The main purpose of this article is to bring you ape friends a little closer to the world of assembly, so it is only a brief introduction. Next, we will discuss registers, data formats, and some assembly instructions in depth.