Just for fun, I have been implementing a simple operating system for the x86 architecture from scratch. I wrote the bootloader in assembly; it loads the kernel from disk and enters 32-bit mode. The kernel that gets loaded is written in C, so in order to execute it I need to generate a raw binary from the C code.

First, I used these commands:

$ gcc -ffreestanding -c kernel.c -o kernel.o -m32

$ ld -o kernel.bin -Ttext 0x1000 kernel.o --oformat binary -m elf_i386

However, this did not produce a binary and failed with these errors:

kernel.o: In function 'main':
kernel.c:(.text+0xc): undefined reference to '_GLOBAL_OFFSET_TABLE_'

For clarity's sake, this is kernel.c:

/* kernel.c */

void main ()
{
   char *video_memory = (char *) 0xb8000 ;
   *video_memory = 'X';
}

Then I followed this tutorial, http://wiki.osdev.org/GCC_Cross-Compiler, to build my own cross-compiler for my target. That worked for my purpose; however, disassembling the result with ndisasm gave this code:

00000000  55                push ebp
00000001  89E5              mov ebp,esp
00000003  83EC10            sub esp,byte +0x10
00000006  C745FC00800B00    mov dword [ebp-0x4],0xb8000
0000000D  8B45FC            mov eax,[ebp-0x4]
00000010  C60058            mov byte [eax],0x58
00000013  90                nop
00000014  C9                leave
00000015  C3                ret
00000016  0000              add [eax],al
00000018  1400              adc al,0x0
0000001A  0000              add [eax],al
0000001C  0000              add [eax],al
0000001E  0000              add [eax],al
00000020  017A52            add [edx+0x52],edi
00000023  0001              add [ecx],al
00000025  7C08              jl 0x2f
00000027  011B              add [ebx],ebx
00000029  0C04              or al,0x4
0000002B  0488              add al,0x88
0000002D  0100              add [eax],eax
0000002F  001C00            add [eax+eax],bl
00000032  0000              add [eax],al
00000034  1C00              sbb al,0x0
00000036  0000              add [eax],al
00000038  C8FFFFFF          enter 0xffff,0xff
0000003C  16                push ss
0000003D  0000              add [eax],al
0000003F  0000              add [eax],al
00000041  41                inc ecx
00000042  0E                push cs
00000043  088502420D05      or [ebp+0x50d4202],al
00000049  52                push edx
0000004A  C50C04            lds ecx,[esp+eax]
0000004D  0400              add al,0x0
0000004F  00                db 0x00

As you can see, the first 9 rows are the assembly translation of my main function (apart from the NOP, which I don't know why the compiler inserted). From row 10 to the end, there is a lot of code whose purpose I don't understand.

In the end, I have two questions:

1) Why is that code produced?

2) Is there a way to produce the raw machine code from C without that useless stuff?

  • You are looking at inefficient code generated when you don't enable optimizations. When compiling C you could try to pass -O3. The first part of the code generated is a typical stack frame prologue, and then it allocates space on the stack for local variables. Commented Feb 28, 2017 at 12:12
  • With the optimization option it of course no longer generates the stack frame prologue; however, it still generates the code after the RET, which does not correspond to anything in the main function. Commented Feb 28, 2017 at 12:19
  • The stuff after the function is likely exception handling information. I didn't look at it closely. It really isn't code but data. You could try building with GCC using -fno-exceptions and see what happens. Commented Feb 28, 2017 at 12:20
  • @Olaf, although C doesn't, GCC will often still create an .eh_frame section in the object. I usually use a linker script to discard the .eh_frame section and the comment section (as well as the build notes). Commented Feb 28, 2017 at 12:41
  • Which GCC are you using? I tried your code and it works fine: the binary is 21 bytes. Also rename main() to _start() to eliminate the warning. Commented Feb 28, 2017 at 12:44

1 Answer

A few hints first:

  • avoid naming your starting routine main. It is confusing, both for the reader and perhaps for the compiler (when you don't pass -ffreestanding to gcc, it handles main specially). Use something else like start or begin_of_my_kernel ...

  • compile with gcc -v to understand what your particular compiler is doing.

  • you probably should ask your compiler for some optimizations and all warnings, so pass -O -Wall at least to gcc

  • you may want to look into the produced assembler code, so use gcc -S -O -Wall -fverbose-asm kernel.c to get the kernel.s assembler file and glance into it

  • as commented by Michael Petch you might want to pass -fno-exceptions

  • you probably need some linker script and/or some hand-written assembler for crt0

  • you should read something about linkers & loaders
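
A note on the bytes after ret in the question's dump: as the comments under the question suggest, they look like an .eh_frame record rather than instructions. The sketch below is a rough check of that reading, not a full unwinder. It copies bytes 0x18 to 0x4F from the dump and decodes the first fields of what would be a CIE followed by an FDE. The helper names (read_le32, parse_cie_header, fde_pc_range, looks_like_eh_frame) are made up for illustration, and the field layout is assumed from the usual .eh_frame format with 4-byte PC fields (pointer encoding 0x1B).

```c
#include <stdint.h>
#include <string.h>

/* Bytes 0x18..0x4F of the ndisasm dump in the question, i.e. everything
 * after the function and its two padding bytes. */
static const uint8_t trailing_bytes[] = {
    /* 0x18 */ 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    /* 0x20 */ 0x01, 0x7A, 0x52, 0x00, 0x01, 0x7C, 0x08, 0x01,
    /* 0x28 */ 0x1B, 0x0C, 0x04, 0x04, 0x88, 0x01, 0x00, 0x00,
    /* 0x30 */ 0x1C, 0x00, 0x00, 0x00, 0x1C, 0x00, 0x00, 0x00,
    /* 0x38 */ 0xC8, 0xFF, 0xFF, 0xFF, 0x16, 0x00, 0x00, 0x00,
    /* 0x40 */ 0x00, 0x41, 0x0E, 0x08, 0x85, 0x02, 0x42, 0x0D,
    /* 0x48 */ 0x05, 0x52, 0xC5, 0x0C, 0x04, 0x04, 0x00, 0x00,
};

/* Read a little-endian 32-bit value (x86 byte order). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* First fields of a CIE record (assumed layout, see the text above). */
struct cie_header {
    uint32_t length;          /* number of bytes after the length field */
    uint32_t cie_id;          /* 0 marks a CIE inside .eh_frame */
    uint8_t version;
    const char *augmentation; /* NUL-terminated string inside the record */
};

static struct cie_header parse_cie_header(const uint8_t *record)
{
    struct cie_header cie;
    cie.length = read_le32(record);
    cie.cie_id = read_le32(record + 4);
    cie.version = record[8];
    cie.augmentation = (const char *)(record + 9);
    return cie;
}

/* Assumed FDE layout: length (4), CIE pointer (4), PC begin (4),
 * PC range (4). Returns the PC range, i.e. the size of the covered code. */
static uint32_t fde_pc_range(const uint8_t *fde)
{
    return read_le32(fde + 12);
}

/* Heuristic: does the data start with a version 1 CIE using "zR"? */
static int looks_like_eh_frame(const uint8_t *bytes)
{
    struct cie_header cie = parse_cie_header(bytes);
    return cie.cie_id == 0 && cie.version == 1 &&
           strcmp(cie.augmentation, "zR") == 0;
}
```

Under these assumptions the first record has length 0x14, CIE id 0, version 1 and augmentation "zR", and fde_pc_range(trailing_bytes + 4 + 0x14) gives 0x16, which matches the 22 bytes of main (offsets 0x00 to 0x15). That would explain why ndisasm shows nonsense there: it is unwind data, not code. GCC's -fno-asynchronous-unwind-tables option (not mentioned in this thread, so treat it as a suggestion) asks the compiler not to emit these tables.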


 kernel.c:(.text+0xc): undefined reference to '_GLOBAL_OFFSET_TABLE_'

This smells like something related to position-independent code. My guess: try compiling with an explicit -fno-pic or -fno-pie.

(on some Linux distributions, the system gcc might be configured with -fpic or -fpie enabled by default)

PS. Don't forget to add -m32 to gcc if you want 32-bit x86 binaries.
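
Putting the hints together, kernel.c could look roughly like the sketch below. kernel_start is a hypothetical entry-point name, put_char is a helper introduced only for illustration, and volatile is my addition so the compiler keeps the store when optimizing. The build steps in the header comment combine flags from this thread; the ld -e option and the objcopy step are assumptions about one way to do it, not the exact commands from the comments.

```c
/* kernel.c: sketch of the question's kernel with the hints applied.
 *
 * Possible build steps (an assumption, adapt to your toolchain):
 *   gcc -m32 -ffreestanding -fno-pic -O -Wall -c kernel.c -o kernel.o
 *   ld -m elf_i386 -Ttext 0x1000 -e kernel_start -o kernel.elf kernel.o
 *   objcopy -O binary -j .text kernel.elf kernel.bin
 */

/* Address of the VGA text buffer used in the question. */
#define VIDEO_MEMORY ((volatile char *) 0xb8000)

/* Store one character into a text-mode cell. The pointer is volatile
 * because the target is memory-mapped hardware, not ordinary RAM. */
static void put_char(volatile char *cell, char c)
{
    *cell = c;
}

/* Entry point, deliberately not called main (hypothetical name). */
void kernel_start(void)
{
    put_char(VIDEO_MEMORY, 'X');
}
```

Copying only .text with objcopy leaves out sections such as .eh_frame, so the resulting binary should contain just the function.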

4 Comments

I'm your upvote. I also provided an example in my last comment using OBJCOPY instead of a linker script. Linker script is my preference of course but there is always more than one way to skin the cat.
Thanks for the advice. With the -fno-pic option I can compile directly with my system gcc, without using the cross-compiler gcc I built. However, even when passing -fno-exceptions, disassembling the binary still showed the same useless code after the RET. With the procedure proposed by @Michael Petch it worked fine! Thanks to you as well.
@gyro91 : If you are going to work on a toy OS over the long term I highly recommend you stick with a cross compiler. It will save you hassles and grief in the long run. The useless code is actually data being interpreted as instructions by NDISASM because in a binary file it can't properly distinguish between code and data that has been lumped together.
@Michael Petch: Actually I am working with a cross compiler. I used OBJCOPY as you suggested and it worked fine!
