1

I am creating an emulator for an instruction set architecture, and I needed to implement a stack structure. I decided that my %eip, %ebp and %esp would be int pointers. However, there are situations where I need to store memory addresses on the stack, in which case this memory would be encoded as an integer value. But when I return this value, I need to put it back into my instruction pointer, which is implemented as an int pointer. C will not let me assign my integer to my int pointer, so I have no way of recovering these memory addresses from the "stack". Any suggestions?

4
  • "C will not let me assign my integer to my int pointer" - what exactly happens? Did you try to cast? int *iptr = (int *)your_int; Commented Aug 12, 2018 at 19:40
  • 5
    Need your code. Can't help without code. Commented Aug 12, 2018 at 20:07
  • 1
    Maybe you can cast through uintptr_t, a type defined in <stdint.h> (and also made available by <inttypes.h>) when it exists on your machine (and it usually does, though it is theoretically an optional type). Or you may simply need to add an explicit cast. If you get warnings about different sizes of integer and pointer, then you need to worry a lot more. Commented Aug 12, 2018 at 20:07
  • Picking up on Jonathan Leffler's comment: using unsigned non-pointer types for these registers makes a little more sense, because you can do unsigned arithmetic on them without casting. You will need this for almost any instruction you implement. After all, you'll probably have to do some address translation before accessing the actual data. It also makes it less likely that you mix up the address spaces of your emulator and of the code it runs. Also make sure that your unsigned type is wide enough to store the addresses of the emulated architecture. Commented Aug 12, 2018 at 20:29

3 Answers

2

To assign an int value to an int * object, use an explicit cast, as in:

destination = (int *) source;

Your question says “C will not let me assign my integer to my int pointer” but fails to state exactly what the problem is. Presumably you are getting some diagnostic message from the compiler. This would be because assigning an int value to an int * object violates the C standard’s constraints for assignments. The code above shows how to work around that.

That solves the immediate problem of the compiler diagnostic. However, there can be various issues with using int values as containers for pointers, including the possibility of trap representations and discrepancies between the sizes of pointers and integers. Provided that int and int * are the same size, using an int to hold an int * is likely to work, but you should be sure of the properties of your C implementation.
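As a rough illustration of the size concern, here is a minimal sketch that stores the pointer in a uintptr_t instead of an int, so the round trip does not depend on int and int * having the same width. The helper names (pointer_to_slot, slot_to_pointer, round_trip_ok) and the toy "stack" are made up for this example and are not from the question; uintptr_t is an optional type, so this assumes your implementation provides it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode a host pointer as an integer wide enough
   to hold it. Going through void * is the conversion the standard
   describes for uintptr_t. */
static uintptr_t pointer_to_slot(int *p)
{
    return (uintptr_t)(void *)p;
}

/* Hypothetical helper: recover the pointer from the integer slot. */
static int *slot_to_pointer(uintptr_t slot)
{
    return (int *)(void *)slot;
}

/* Save an "instruction pointer" on an integer stack, clobber it, then
   restore it. Returns 1 when the restored pointer matches the original. */
static int round_trip_ok(void)
{
    int program[4] = {10, 20, 30, 40};   /* stand-in for instruction words */
    uintptr_t stack[8];                  /* stand-in for the emulated stack */
    int top = 0;
    int *eip = &program[2];

    stack[top++] = pointer_to_slot(eip);   /* push the return address */
    eip = program;                         /* "jump" somewhere else */
    eip = slot_to_pointer(stack[--top]);   /* pop it back into eip */

    assert(top == 0);
    return eip == &program[2] && *eip == 30;
}
```

Calling round_trip_ok() should return 1 on any implementation that provides uintptr_t. If the stack slots really must be int, the explicit cast shown earlier is the only option, and it is only safe when int is at least as wide as the pointer.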


2 Comments

"Provided that int and int * are the same size" -- this is an unsafe assumption. It is false on most 64-bit systems. (It is only true on ILP64 systems, which are… um, some Cray systems and the SPARC64 port of Solaris? Nothing you're likely to run into.)
@duskwuff: That is not an assumption. That is a statement of a requirement. As the last sentence says, the OP should be sure of the properties of their C implementation. The entire reason implementation-dependent behavior exists is so that people can write implementation-dependent code. As long as one understands the requirements and documents their software, it is perfectly fine engineering to write code for specific implementations.
0

I decided that my %eip, %ebp and %esp would be int pointers.

This is not a sound architectural decision. You need to reconsider it.

  1. The size of a pointer is architecture-dependent -- in particular, an int * will be 64 bits wide on a 64-bit system. By contrast, all of these registers are 32 bits wide by definition. Using a 64-bit pointer to store their values will result in unexpected behavior.

  2. These registers are not required to be aligned to an integer. In particular, EIP is (at best) aligned to an instruction, and will be incremented by one byte when running 1-byte instructions. Dereferencing an int * which is not properly aligned will cause an unaligned access fault on many systems.

  3. There is no hard architectural distinction between any of the integer registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI). All of them can be referenced in a ModRM encoding, and can be treated as either a numeric value or an address, depending on the context. Singling ESP and EBP out will unnecessarily complicate your emulator, and is likely to create a lot of obnoxious special cases in your code.
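One possible way to act on the three points above, assuming the emulated machine really is 32-bit x86 (the question does not confirm this): keep every register as a plain uint32_t in one array, and read memory byte by byte so alignment never matters. The type and function names here (cpu_state, read_u32_le, demo_unaligned_fetch) are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers in the order x86 uses in ModRM encodings. */
enum {
    REG_EAX, REG_ECX, REG_EDX, REG_EBX,
    REG_ESP, REG_EBP, REG_ESI, REG_EDI,
    REG_COUNT
};

/* Hypothetical CPU state: every register is a fixed-width 32-bit value,
   independent of the host's pointer size. */
typedef struct {
    uint32_t regs[REG_COUNT];
    uint32_t eip;
} cpu_state;

/* Assemble a little-endian 32-bit value from four bytes. Works at any
   byte offset, so no host alignment requirement applies. The caller is
   responsible for bounds checking. */
static uint32_t read_u32_le(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr]
         | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16
         | (uint32_t)mem[addr + 3] << 24;
}

/* Step over a 1-byte instruction, then load the next four bytes into
   EAX from an odd address. */
static uint32_t demo_unaligned_fetch(void)
{
    const uint8_t mem[5] = {0x90, 0x78, 0x56, 0x34, 0x12};
    cpu_state cpu = {{0}, 0};

    cpu.eip += 1;                          /* 1-byte instruction consumed */
    assert(cpu.eip + 4 <= sizeof mem);     /* stay inside the toy memory */
    cpu.regs[REG_EAX] = read_u32_le(mem, cpu.eip);
    return cpu.regs[REG_EAX];
}
```

With this layout ESP and EBP are just regs[REG_ESP] and regs[REG_EBP], so an instruction decoder can index the array with the register number from the encoding and needs no special cases.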

Note that, as you are emulating a 32-bit system on what might not be a 32-bit platform, you will need some way of translating addresses within the emulated system to "real" addresses in the host process. There are a number of different ways of doing this; which one is most appropriate for you will depend on your specific goals.
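The simplest form of that translation is probably a flat byte array for guest memory, with a bounds-checked function that turns a 32-bit guest address into a host pointer. The sketch below assumes that layout; the names (machine, guest_to_host, push32, pop32) and the 4096-byte memory size are placeholders, not anything from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUEST_MEM_SIZE 4096u   /* assumed size of emulated memory */

/* Hypothetical machine: guest memory plus two 32-bit registers. */
typedef struct {
    uint8_t  mem[GUEST_MEM_SIZE];
    uint32_t esp;
    uint32_t eip;
} machine;

/* Translate a guest address range to a host pointer, or NULL when the
   range does not fit inside guest memory. */
static uint8_t *guest_to_host(machine *m, uint32_t addr, uint32_t len)
{
    if (len > GUEST_MEM_SIZE || addr > GUEST_MEM_SIZE - len)
        return NULL;
    return &m->mem[addr];
}

/* Push a 32-bit value. memcpy stores it in host byte order, which is
   enough for this sketch; a real x86 emulator would write the bytes in
   little-endian order explicitly. Returns 0 on success, -1 on fault. */
static int push32(machine *m, uint32_t value)
{
    uint32_t new_esp = m->esp - 4;   /* unsigned wraparound is well defined */
    uint8_t *p = guest_to_host(m, new_esp, 4);
    if (p == NULL)
        return -1;
    memcpy(p, &value, sizeof value);
    m->esp = new_esp;
    return 0;
}

/* Pop a 32-bit value. Returns 0 on success, -1 on fault. */
static int pop32(machine *m, uint32_t *out)
{
    uint8_t *p = guest_to_host(m, m->esp, 4);
    if (p == NULL)
        return -1;
    memcpy(out, p, sizeof *out);
    m->esp += 4;
    return 0;
}

/* Emulate a call followed by a return: the return address travels
   through guest memory as a plain integer and lands back in eip. */
static int call_and_return_ok(void)
{
    static machine m;                 /* static storage: zero-initialized */
    uint32_t ret = 0;

    m.esp = GUEST_MEM_SIZE;           /* empty stack at top of memory */
    m.eip = 0x100;
    if (push32(&m, m.eip + 5) != 0)   /* address after a 5-byte call */
        return 0;
    m.eip = 0x200;                    /* jump to the callee */
    if (pop32(&m, &ret) != 0)
        return 0;
    m.eip = ret;                      /* return */
    return m.eip == 0x105 && m.esp == GUEST_MEM_SIZE;
}
```

Because eip and esp are integers here, popping a return address back into the instruction pointer is an ordinary assignment and no pointer cast is involved. Host pointers only appear inside guest_to_host.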

5 Comments

This is a comment about the OP’s code and design, and it is not an answer to their question. It could be a comment but should not be entered as an answer.
Additionally, regarding item 2, the OP’s intent is to store pointers in int objects, not arbitrary pointers in int * objects. They mention int * because they have int * for their %eip, %ebp, and %esp, but the pointers they want to store will be in the int pointed to by one of those.
@EricPostpischil Re-read the OP's description of their problems with popping data from the stack to these registers. It's pretty clear to me that they're trying to use the pointer itself -- not the value it points to -- as the register value.
About popping data, the question says “… I need to put it back into my instruction pointer, which is implemented as an int pointer.” Thus, they are not using int * as a storage container for an arbitrary value. They are restoring to an int * some value intended for the int * that is %eip. The fact that their “%eip” is an int * suggests the ISA they are emulating has a fixed instruction size so that all valid values for it are also valid int * values. In any case, they are not using an int * as a vehicle for arbitrary values, just for values intended for it.
Similarly, your comments about ModRM and such indicate you are assuming an x86 architecture, which OP did not assert. Quite possibly they are using the names “%eip”, “%ebp”, and “%esp” because those are what they are familiar with as names for instruction pointers, frame pointers, and stack pointers, not because they are emulating x86. It is likely also not relevant that the size of a pointer is implementation-dependent (not architecture-dependent as you wrote) because OP is likely writing for a specific implementation, not intending to write portable code.
0

The result of converting between pointers and integers is implementation-defined, but if the integer type is at least as wide as the pointer, you can use it this way.

Some people say that using ptrdiff_t with a null pointer as the reference is more portable and safer.

ptrdiff_t myptrdiff = myptr - (type_of_myptr *)NULL;

myptr = myptrdiff + (type_of_myptr *)NULL; 

2 Comments

myptr - (type_of_myptr *)NULL; is attempting pointer subtraction: UB
I am not sure what you are trying to achieve, but in any case this (type_of_myptr *) ought to be (type_of_myptr).
