5

I've been looking into C++ and structs for a project I'm working on; at the moment I'm using 'chained' template structures to add data fields as pseudo-traits.

Whilst it works, I think I'd prefer something like multiple inheritance as in the example below:

struct a {
    int a_data;
}; // 'Trait' A

struct b {
    int b_data;
}; // 'Trait' B

struct c : public a, public b {
    int c_data;
}; // A composite structure with 'traits' A and B.

struct d : public b {
    int d_data;
}; // A composite structure with 'trait' B.

My experimental code examples show that this works fine, but I'm a bit perplexed as to how it's actually working when things get complex.

For example:

b * basePtr = new c;
cout << basePtr->b_data << endl;
basePtr = new d; // reuse the pointer; declaring basePtr a second time would not compile
cout << basePtr->b_data << endl;

This works fine every time, even through function calls with the pointer as a parameter.

My question is how does the code know where b_data is stored in one of the derived structs? As far as I can tell, the structs still use a compacted structure with no extra data (i.e. 3 int structs only take up 12 bytes, 2 ints 8 bytes, etc). Surely it needs some sort of extra data field to say where a_data and b_data are stored in a given structure?

It's more of a curiosity question as it all seems to work regardless, and if there are multiple implementations in use, I'll happily accept a single example. I do have one concern, though: I want to transfer the bytes behind these structs through an inter-process message queue and want to know if they'll be decoded OK on the other end (all the programs using the queue will be compiled by the same compiler and run on a single platform).

2
  • Righto - I admit to being a bit ignorant of the details of Stack Overflow's ratings; will try and sort that now. Commented Jan 16, 2013 at 21:02
  • what you NEVER want to do is CAST a derived pointer to a base, especially using C-style cast, b *basePtr = (b *)cPtr; especially when the .cpp module does not have all the details of what cPtr is pointing to. Let the COMPILER RESOLVE THE CAST - don't force it when using multiple inheritance (or even simple inheritance). C++ static_cast<> works because it would generate an error if it did not know how to cast an actual c pointer to a b pointer. C-style casts are EVIL. Commented Jan 16, 2013 at 21:07

7 Answers

3

In both cases, basePtr truly is a pointer to an object of type b, so there is no problem. The fact that this object is not a complete object, but rather a subobject of a more-derived object (this is actually the technical term), is not material.

The (static, implicit) conversion from d * to b *, as well as from c * to b *, takes care of adjusting the pointer value so that it really points to the b subobject. All the information is known statically, so the compiler makes all those computations automatically.
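To make the adjustment visible, here is a minimal sketch that measures how far the implicit conversion moves the pointer. The helper name offset_of_b is made up for illustration, and the actual numbers depend on the compiler and ABI.

```cpp
#include <cassert>
#include <cstddef>

struct a { int a_data; };
struct b { int b_data; };
struct c : public a, public b { int c_data; };
struct d : public b { int d_data; };

// Hypothetical helper: byte distance from the start of a Derived object to
// its b subobject, found by letting the compiler do the Derived* -> b* conversion.
template <typename Derived>
std::ptrdiff_t offset_of_b(Derived& obj) {
    b* base = &obj;  // implicit conversion; the address is adjusted if needed
    assert(base->b_data == obj.b_data);  // same member, whichever pointer is used
    return reinterpret_cast<const char*>(base) -
           reinterpret_cast<const char*>(&obj);
}
```

On common implementations, offset_of_b returns sizeof(a) for a c object (the b part follows the a part) and 0 for a d object, but the standard does not promise those exact values.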


2 Comments

Ahh, so if I understand you correctly, the implicit conversions effectively move the pointer from the base address (i.e. the c * address) to the sub-object address (i.e. &(ptr->b_data))?
@Doug: Yes, that's right. For single inheritance this conversion is often trivial, since compilers would typically put the base subobject at the front of the whole object. But if you inherit from two non-empty classes, then at least one of the base pointers should be genuinely different.
2

You should read the Wikipedia article on C++ classes, in particular the sections on memory management and class inheritance.

Basically, the compiler creates the class structure, so at compile time it knows the offset to each part of the class.

When you access a variable, the compiler knows its type and therefore its structure, and if you cast it to a base class, it just needs to jump to the right offset.
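As a rough illustration of "the compiler knows the structure", the sketch below passes both derived types to a function that only knows about b. The names read_b, size_of_c and size_of_d are invented for this example.

```cpp
#include <cassert>
#include <cstddef>

struct a { int a_data; };
struct b { int b_data; };
struct c : public a, public b { int c_data; };
struct d : public b { int d_data; };

// Sizes are compile-time constants: the layout is fixed before the program runs.
constexpr std::size_t size_of_c = sizeof(c);
constexpr std::size_t size_of_d = sizeof(d);

// Only knows about b. The caller's c* or d* is converted (and adjusted if
// necessary) at the call site, so a fixed offset within b is enough here.
int read_b(const b* p) {
    assert(p != nullptr);
    return p->b_data;
}
```

With typical compilers size_of_c comes out as three ints and size_of_d as two, matching the sizes reported in the question, so no per-object offset field is being stored.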


2

On most implementations, a pointer conversion, say from c* to b*, will automatically adjust the address if necessary. In the statement

b * basePtr = new c;

the new expression allocates a c object, which contains an a base class subobject, a b base class subobject, and a c_data member subobject. In raw memory, this will probably look like just three ints. The new expression returns the address of the created complete c object, which is (on most implementations) the same as the address of the a base class subobject and the address of the a_data member subobject.

But then the expression new c, with type c*, is used to initialize a b* pointer, which causes an implicit conversion. The compiler sets basePtr to the address of the b base class subobject within the complete c object. Not hard, since the compiler knows the offset from a c object to its unique b subobject.

Afterward, an expression like basePtr->b_data doesn't need to know what the complete object type was. It just knows that b_data is at the very beginning of b, so it can simply dereference the b* pointer.
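The address relationships described above can be checked with a small sketch like this one. The helpers same_address and b_is_shifted are hypothetical names, and only the "first member shares the address of b" part is guaranteed by the language; the rest is what most implementations do.

```cpp
#include <cassert>

struct a { int a_data; };
struct b { int b_data; };
struct c : public a, public b { int c_data; };

// True when both pointers hold the same address.
bool same_address(const void* x, const void* y) { return x == y; }

// Reports whether the b subobject of obj lives at a different address
// than the complete c object.
bool b_is_shifted(const c& obj) {
    const b* pb = &obj;  // implicit c* -> b* conversion
    // b is a standard-layout struct, so its first member shares its address.
    assert(same_address(pb, &pb->b_data));
    return !same_address(pb, &obj);
}
```

On most implementations b_is_shifted returns true for any c object, while same_address(&obj, &obj.a_data) also holds because the a part comes first.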


1

The details of this are up to the C++ implementation, but in a case like this, with non-virtual inheritance, you can think of it like this:

c has two sub-objects, one with type a and one with type b.

When you cast a pointer to c to a pointer to b, the compiler is smart enough so that the result of the cast is a pointer to the b sub-object of the c object referenced by the original pointer. This may involve changing the numerical value of the returned pointer.

Generally, with single inheritance, the sub-object pointer will have the same numerical value as the original pointer. With multi-inheritance, it might not.
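The same adjustment runs in reverse when you go back down the hierarchy. A minimal sketch, assuming you already know the b really is part of a c (otherwise the static_cast is undefined behaviour); to_complete is an invented name.

```cpp
#include <cassert>

struct a { int a_data; };
struct b { int b_data; };
struct c : public a, public b { int c_data; };
struct d : public b { int d_data; };

// Converts a b* that is known to point into a c object back to the
// complete object; the compiler subtracts the offset it added going up.
c* to_complete(b* p) {
    assert(p != nullptr);
    return static_cast<c*>(p);
}
```

So for a c object obj, to_complete(static_cast<b*>(&obj)) gives back &obj even though the intermediate b* usually holds a different numerical address. For d, with single inheritance, the b* and d* typically hold the same address.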


1

Yes, there is extra information that defines the offset each sub-component has into the aggregate. But it is not stored in the aggregate itself: it lives in the compiler's description of the type (the ultimate choice about how to do that is left to the compiler designers), and for plain non-virtual inheritance like this it typically ends up as constant offsets baked into the generated code.

Your objects are not polymorphic (and you used them wrongly, but I'll come to this later), but just compounds like:

c[a[a_data],b[b_data],c_data];
            ^
            b* points here 

d[b[b_data],d_data]
  ^
  b* points here

(Note that the real layout may depend on the particular compiler and even optimization flags used)

The offset of the beginning of b with respect to the beginning of c or d does not depend on the particular object instance, so it is not a value that needs to be stored in the object; it belongs to the general descriptions of c and d, which are known to the compiler but not necessarily available to you.

The compiler knows, given a c or a d, where the b component begins. But given a b, it cannot know whether that b is inside a d or a c.

The reason why you used the objects wrongly is that you did not care about their destruction. You allocated them with new, but never deleted them afterwards.

And you cannot just call delete basePtr, since there is nothing in the b subcomponent that tells which aggregate it is actually (at runtime) part of.

There are two programming styles to get around it:

  • Classic OOP: assume the actual type is known at runtime, and require all your classes to have a virtual destructor. That gives every struct an extra "ghost" field (the v-table pointer, which points to a table in the "auxiliary descriptor" containing all the virtual functions' addresses), so that the destructor call originated by delete is actually dispatched to the most derived one (hence delete pbase will actually call c::~c or d::~d depending on the actual object).

  • Generic programming: assume you know in some other way (most likely from a template parameter) the actual derived type, so you will not delete pbase, but delete static_cast<actual_derived_class*>(pbase).
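Both styles can be sketched roughly as follows. The poly_b / poly_c types, the destroyed counter and the delete_as helper are all made up for this illustration; they are not part of the question's code.

```cpp
#include <cassert>

// Style 1 (classic OOP): a virtual destructor lets delete on a base
// pointer reach the most derived destructor.
struct poly_b {
    int b_data = 0;
    virtual ~poly_b() = default;
};
struct poly_c : poly_b {
    static int destroyed;  // counts destructor runs, for demonstration only
    int c_data = 0;
    ~poly_c() override { ++destroyed; }
};
int poly_c::destroyed = 0;

// Style 2 (generic): plain structs; the caller supplies the real type.
struct a { int a_data; };
struct b { int b_data; };
struct c : public a, public b { int c_data; };

template <typename Derived>
void delete_as(b* p) {
    assert(p != nullptr);  // this sketch expects a live object
    // Only valid if p really points into a Derived object.
    delete static_cast<Derived*>(p);
}
```

With style 1, `poly_b* p = new poly_c; delete p;` runs poly_c's destructor, at the cost of a v-table pointer in every object. With style 2, `b* q = new c; delete_as<c>(q);` keeps the structs as small as before, but the caller has to get the type right.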


0

Inheritance is an abstraction that lets a class reuse functions from another class beneath it in the hierarchy. A method can be called from a class if it is defined in the class below it. A struct lets you group variables into a data structure, much like a class, which can hold variables as well as functions.

class trait
{
public:
    // variable declaration
    int value = 0;

    // member function operating on its argument
    int function_name(int variable_name)
    {
        return variable_name + 1; // operation on the variable in the function call
    }

    // nested struct that only holds data
    struct struct_name
    {
        int value_1;
        int value_2;
    };

    void use()
    {
        value = function_name(value);
        struct_name s = {1, 2};  // aggregate initialization
        s.value_1 += value;      // operation on s.value_1
    }
};


0

There is a distinction between compile time knowledge and runtime knowledge. Part of the job of the compiler is to make as much use of compile time information as possible to avoid having to do things at run time.

In this case, all the details of exactly where each piece of data is in a given type are known at compile time. So the compiler doesn't need to know it at runtime. Whenever you access a particular member, it just uses its compile time knowledge to compute the appropriate offset for the data you need.

The same thing goes for pointer conversions. The compiler will adjust pointer values when they're converted to make sure they point at the appropriate sub-part.

Part of the reason this works is that the data values from an individual class or struct are never interleaved with any others that aren't mentioned in the class definition, even when that struct is a sub-component of another struct, either through composition or inheritance. So the relative layout of any individual struct is always the same no matter where in memory it's found.
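Because the layout is fixed at compile time, the byte transfer mentioned in the question can be sketched as a plain memcpy, under the question's own assumption that both ends are built by the same compiler for the same platform. The pack and unpack helpers are hypothetical stand-ins for the message queue's send and receive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

struct a { int a_data; };
struct b { int b_data; };
struct c : public a, public b { int c_data; };

// Raw byte copies are only meaningful for trivially copyable types
// (no virtual functions, no user-defined copy operations).
static_assert(std::is_trivially_copyable<c>::value,
              "c must be trivially copyable to be sent as raw bytes");

// Sender side: copy the object's bytes into a message buffer.
void pack(const c& src, unsigned char* buf, std::size_t len) {
    assert(len >= sizeof(c));
    std::memcpy(buf, &src, sizeof(c));
}

// Receiver side: rebuild an object from the received bytes.
c unpack(const unsigned char* buf, std::size_t len) {
    assert(len >= sizeof(c));
    c out;
    std::memcpy(&out, buf, sizeof(c));
    return out;
}
```

Note that what gets sent is the whole c object, not a b* into it; a pointer value itself is meaningless in another process.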
