1

I would like to know what happens to object code when we use the linker to produce an executable from it.

I presume that the linker's job is not the same on Linux and on Windows; I am on Linux.

1
  • @Jonas: this question arguably is about [compiler-construction] in terms of the whole process from source to linked executable. The linker isn't part of the compiler proper, but it is part of the toolchain and does get invoked as part of gcc foo.c for example. Someone writing their own compiler that makes executables needs to understand how object files work. I don't think this question is doing harm by having that tag, so I re-added it. Commented Nov 8, 2024 at 9:21

3 Answers

3

Object code is lacking information about the big picture. It contains executable code for functions, but all references to other, external functions, as well as to global data, cannot be part of the actual instructions, since their addresses are not known. So instead all those references are left blank (e.g. just filled with zero bytes in the object code) and annotated with a symbol name.

It's the linker's job to look at all the missing symbol names and match them up against all the exported names (i.e. functions and global data provided by the object files), then find a permanent location for each datum, and finally rewrite all the code to replace the zero bytes with the actual addresses at which the data (functions and global variables) are ultimately stored.
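The matching step can be pictured as a table lookup. The following is only a minimal sketch, not how a real linker is implemented: the struct, the function name resolve_symbol, and the addresses in example_defs are all made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* One exported symbol: its name and the address the linker assigned to it.
 * (Hypothetical structure, for illustration only.) */
struct definition {
    const char *name;
    unsigned long address;
};

/* Made-up table of definitions collected from all object files. */
const struct definition example_defs[] = {
    { "foo", 0x401000UL },
    { "bar", 0x401100UL },
    { "a",   0x404000UL },
};

/* Look up `name` among the exported symbols.
 * Returns 1 and stores the address in *address_out if found,
 * or 0 if the symbol stays undefined (a real linker would report an error). */
int resolve_symbol(const struct definition *defs, size_t count,
                   const char *name, unsigned long *address_out)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(defs[i].name, name) == 0) {
            *address_out = defs[i].address;
            return 1;
        }
    }
    return 0;
}
```

A reference to a name that appears in no object file makes the lookup fail, which corresponds to the familiar "undefined reference" error at link time.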


For example, consider this piece of C code:

extern int a;
extern int bar(int);     // "extern" is redundant here
static int zip(int);

int foo(int x, int y)
{
    return 2 * x + 3 * y + zip(x - y) + a * bar(x + y);
}

int zip(int n)
{
    return 2 * (n + 1) - (n - 1) / 2;
}

This code exports one symbol, foo, which it provides to anyone who links in this translation unit. It also has two missing symbols, a and bar. In the code implementing foo, the references to a and bar are left blank and can only be filled in by the linker when the linker knows where those actual data reside.
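For the link to succeed, some other object file has to define a and bar. As a sketch, here is a hypothetical second translation unit that would satisfy both references; the file name other.c and the values are assumptions, not part of the original example.

```c
/* other.c: hypothetical companion translation unit (names and values assumed). */

/* Defines the global that the first file declares with "extern int a;". */
int a = 7;

/* Defines the function that the first file declares with "extern int bar(int);". */
int bar(int n)
{
    return n * n;
}
```

Compiling each file with gcc -c and then linking the object files (together with one that provides main) lets the linker match the undefined a and bar against these definitions. Running nm on the first object file should list foo as defined (T) and a and bar as undefined (U).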

Here's the machine code generated for x86-64 by GCC with -O3:

0000000000000000 <foo>:
   0:   89 f9                   mov    ecx,edi
   2:   8d 04 76                lea    eax,[rsi+rsi*2]
   5:   53                      push   rbx
   6:   29 f1                   sub    ecx,esi
   8:   8d 51 ff                lea    edx,[rcx-0x1]
   b:   8d 1c 78                lea    ebx,[rax+rdi*2]
   e:   01 f7                   add    edi,esi
  10:   89 d0                   mov    eax,edx
  12:   c1 e8 1f                shr    eax,0x1f
  15:   01 c2                   add    edx,eax
  17:   d1 fa                   sar    edx,1
  19:   f7 da                   neg    edx
  1b:   8d 44 4a 02             lea    eax,[rdx+rcx*2+0x2]
  1f:   01 c3                   add    ebx,eax
  21:   e8 00 00 00 00          call   26 <foo+0x26>
  22:                                  R_X86_64_PC32       bar-0x4
  26:   0f af 05 00 00 00 00    imul   eax,DWORD PTR [rip+0x0]        # 2d <foo+0x2d>
  29:                                  R_X86_64_PC32       a-0x4
  2d:   01 d8                   add    eax,ebx
  2f:   5b                      pop    rbx
  30:   c3                      ret    

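Note the relocation entries at offsets 0x22 and 0x29: the four operand bytes at each of those offsets are left at zero, and each entry carries an annotation (R_X86_64_PC32) telling the linker the name of the symbol whose address is to be filled in.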
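For an R_X86_64_PC32 relocation the value stored is S + A - P, where S is the symbol's final address, A is the addend (the -0x4 in the listing), and P is the address of the four bytes being patched. The addend of -4 is there because the CPU measures the displacement from the end of the field, i.e. the start of the next instruction. Here is a minimal sketch of that arithmetic; the function names and the addresses in the usage note are assumptions for illustration, and a real linker may also route such a call through the PLT.

```c
#include <stdint.h>

/* Compute S + A - P, truncated to the 32 bits that fit in the operand field. */
uint32_t pc32_value(uint64_t symbol_addr, int64_t addend, uint64_t place_addr)
{
    return (uint32_t)(symbol_addr + (uint64_t)addend - place_addr);
}

/* Write a 32-bit value into the code bytes in little-endian order,
 * which is how x86-64 stores instruction operands. */
void patch_le32(uint8_t *place, uint32_t value)
{
    place[0] = (uint8_t)(value & 0xff);
    place[1] = (uint8_t)((value >> 8) & 0xff);
    place[2] = (uint8_t)((value >> 16) & 0xff);
    place[3] = (uint8_t)((value >> 24) & 0xff);
}
```

Suppose foo ends up at 0x401000 and bar at 0x401100. The field at offset 0x22 then lives at 0x401022, so the stored value is 0x401100 - 4 - 0x401022 = 0xda, and the call instruction becomes e8 da 00 00 00. At run time the CPU adds 0xda to the address of the next instruction (0x401026) and arrives at 0x401100.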

1

In addition to Kerrek's answer: the job of the linker is, to some extent, operating-system dependent. For instance, the way external references (to .so or .dll files) are treated depends on the operating system, and the way the different segments (data, code, etc.) are laid out within the file may depend on it as well.

The header of the executable file, which is also generated by the linker, is operating-system specific and defines the type of file and where to find the different segments. An executable file on Linux starts with an ELF header, whose first bytes are 0x7F followed by the characters "ELF"; on Windows it starts with an "MZ" header. These identification characters can be found at the very beginning of the file.
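Checking those identification bytes is straightforward. The sketch below classifies a buffer holding the first bytes of a file; the enum and the function name classify_magic are made up for illustration, and it looks only at the magic bytes, not at the rest of the header.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
enum exe_format { EXE_UNKNOWN, EXE_ELF, EXE_MZ };

/* Classify a file by the identification bytes at its very beginning.
 * `bytes` points at the start of the file contents, `len` is how many
 * bytes are available. */
enum exe_format classify_magic(const unsigned char *bytes, size_t len)
{
    /* ELF files begin with 0x7F 'E' 'L' 'F'. */
    static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len >= 4 && memcmp(bytes, elf_magic, 4) == 0)
        return EXE_ELF;

    /* Windows executables begin with the two characters 'M' 'Z'. */
    if (len >= 2 && bytes[0] == 'M' && bytes[1] == 'Z')
        return EXE_MZ;

    return EXE_UNKNOWN;
}
```

To use it on a real file, read the first four bytes with fread and pass them in. On Linux, the file command and readelf -h show the same information in more detail.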

0

I presume that the linker job is not the same for Linux nor Windows

Just some addition to Kerrek SB's answer:

A linker works the same way on all operating systems. Only the file format of object and binary files differs.

2 Comments

Yeah, by and large... PMF makes a good point about shared objects and load-time linking (as opposed to link-time linking) of libraries, for which different platforms provide different mechanics. For example, on Linux the linker sets up the procedure linkage table (PLT) and trampoline functions for shared libraries... many details :-)
@Kerrek, where can I find documentation about all of this?
