1

Recently I started reading the source code of an operating system, and it uses a coding technique that puzzles me a lot.

First, the source code declares a very basic struct, such as:

struct cmd {
   int type;
};

Then it declares several other structs that begin with the same member as the basic struct:

struct execcmd {
  int type; //Here.
  char *argv[MAXARGS];
  char *eargv[MAXARGS];
};

struct redircmd {
  int type; //Here.
  struct cmd *cmd;
  char *file;
  char *efile;
  int mode;
  int fd;
};

Because the first few bytes of these structs are laid out identically, we can access the shared int type member even when we do not know which struct we actually have. We can then use int type to cast the struct pointer to the correct type:

void runcmd(struct cmd *cmd)
{
  struct execcmd *ecmd;
  struct redircmd *rcmd;
  struct listcmd *lcmd;
  struct pipecmd *pcmd;
  struct backcmd *bcmd;

  switch(cmd->type){
  case EXEC:
    ecmd = (struct execcmd*)cmd;
    break;

  case REDIR:
    rcmd = (struct redircmd*)cmd;
    break;

  case LIST:
    lcmd = (struct listcmd*)cmd;
    break;

  case PIPE:
    pcmd = (struct pipecmd*)cmd;
    break;

  case BACK:
    bcmd = (struct backcmd*)cmd;
    break;
  }
}

So my question is: what is this technique called, what is its benefit, and what is its normal use case?

  • Maybe we should ask what the alternative in C is for handling variable struct data, and how it would be done in other languages. In Java or C#, method overloading would allow calling the method according to the type of the class used as a parameter. Commented Dec 23, 2020 at 17:15
  • This method is often called a smart union. Commented Dec 23, 2020 at 17:29
  • @Tarik That's exactly what I mean! I thought this code was trying to mimic something offered in other high-level languages, which turns out to be 'a variable struct data'. Commented Dec 24, 2020 at 4:47

2 Answers

3

This is known as a "common initial sequence". Per 6.5.2.3 Structure and union members, paragraph 6 of the C11 standard:

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

Strictly speaking, the code you have posted is incorrect as the struct members are not used via a common union.
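
To illustrate what the quoted paragraph permits, here is a minimal sketch that places the variants in one union, so the shared int type member is inspected through the union. The names union anycmd, make_exec, make_redir and cmd_fd, the enum values, and the reduced payload fields are assumptions made for illustration; they are not from the code in the question.

```c
#include <stddef.h>

enum cmdtype { EXEC = 1, REDIR = 2 };  /* values assumed for illustration */

struct cmd      { int type; };
struct execcmd  { int type; const char *path; };  /* hypothetical payload */
struct redircmd { int type; int fd; };            /* hypothetical payload */

/* All variants are members of one union, so they share a common
   initial sequence (int type) in the sense of C11 6.5.2.3p6. */
union anycmd {
  struct cmd      head;
  struct execcmd  exec;
  struct redircmd redir;
};

union anycmd make_exec(const char *path) {
  union anycmd c = { .exec = { .type = EXEC, .path = path } };
  return c;
}

union anycmd make_redir(int fd) {
  union anycmd c = { .redir = { .type = REDIR, .fd = fd } };
  return c;
}

/* Read the tag through the common initial part, then use the member
   that the tag says is currently stored. Returns -1 for other types. */
int cmd_fd(const union anycmd *c) {
  switch (c->head.type) {
  case REDIR:
    return c->redir.fd;
  default:
    return -1;
  }
}
```

The trade-off of this layout is that every union anycmd object is as large as its largest member, whereas the pointer-cast version in the question allocates only what each command needs.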

2

This technique is used any time you're storing some data that may have an arbitrary type from some set of types. This is what could be called a variant type.

Suppose you were writing a parser for mathematical expressions, building an AST (abstract syntax tree). You'd want generic code to be able to handle every node in the tree, for example to serialize and deserialize the tree. The generic code could use the type tag to call the type-specific serialization/deserialization method (also called a virtual observer method). The observers would then cast the node to a concrete "derived" type, and use that to operate on it.

enum NodeType { NodeA, NodeB };

typedef struct Node {
  enum NodeType type;
} Node;

/* Pointer to a function that visits a node with caller-supplied context. */
typedef void (*Observer)(Node *node, void *context);

void serializeNodeA(Node *node, void *context);
void deserializeNodeA(Node *node, void *context);
void serializeNodeB(Node *node, void *context);
void deserializeNodeB(Node *node, void *context);

typedef struct VirtualMethods {
  Observer serialize;
  Observer deserialize;
} VirtualMethods;

/* Indexed by NodeType, so the order must match the enum. */
const VirtualMethods vtables[] = {
  {serializeNodeA, deserializeNodeA},
  {serializeNodeB, deserializeNodeB}};

void serializeNode(Node *node, void *context) {
  int type = node->type;
  Observer serialize = vtables[type].serialize;
  serialize(node, context);
}

void deserializeNode(Node *node, void *context) {
  int type = node->type;
  Observer deserialize = vtables[type].deserialize;
  deserialize(node, context);
}
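
The step where an observer casts the node to its concrete "derived" type could look roughly like the sketch below. IntNode, NameNode, their fields, BUF_SIZE, the serializers table, and the convention that context points to a char buffer are all assumptions made for illustration. The base Node is placed as the first member of each derived struct, because C guarantees that a pointer to a struct can be converted to a pointer to its first member and back.

```c
#include <stdio.h>

enum NodeType { NodeA, NodeB };

typedef struct Node {
  enum NodeType type;
} Node;

/* Hypothetical derived nodes; the base Node comes first in each. */
typedef struct { Node base; int value; } IntNode;         /* tagged NodeA */
typedef struct { Node base; const char *name; } NameNode; /* tagged NodeB */

typedef void (*Observer)(Node *node, void *context);

/* Assumption: context points to a char buffer of BUF_SIZE bytes. */
enum { BUF_SIZE = 64 };

void serializeNodeA(Node *node, void *context) {
  IntNode *n = (IntNode *)node;  /* safe only because type == NodeA */
  snprintf(context, BUF_SIZE, "int:%d", n->value);
}

void serializeNodeB(Node *node, void *context) {
  NameNode *n = (NameNode *)node;  /* safe only because type == NodeB */
  snprintf(context, BUF_SIZE, "name:%s", n->name);
}

/* Indexed by the type tag. */
static const Observer serializers[] = {
  [NodeA] = serializeNodeA,
  [NodeB] = serializeNodeB,
};

/* Generic entry point: dispatches on the tag, knows no concrete type. */
void serializeNode(Node *node, void *context) {
  serializers[node->type](node, context);
}
```

A caller would build, say, an IntNode with base.type set to NodeA and pass &node.base to serializeNode together with a buffer. The casts are only valid as long as the tag stored in each node really matches its concrete type.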

There are other applications as well, of course.

Using an integer type tag to select a virtual function table can save space compared with storing a virtual function table pointer in every node, since the tag can be smaller than a pointer. It is also more flexible: to compare types you compare tags rather than pointers to tables.
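
As a rough illustration of the space argument, the sketch below contrasts a node that carries a one-byte tag with a node that carries a table pointer. TaggedNode, PointerNode, the three-byte payload, and sameType are hypothetical names chosen for this example, and the exact sizes depend on the platform and its padding rules.

```c
#include <stdint.h>

/* Hypothetical table type; only pointers to it are needed here. */
typedef struct VirtualMethods VirtualMethods;

/* Variant A: a one-byte tag indexes a shared array of tables. */
typedef struct {
  uint8_t type;
  uint8_t payload[3];
} TaggedNode;

/* Variant B: every node carries a full pointer to its table. */
typedef struct {
  const VirtualMethods *vtable;
  uint8_t payload[3];
} PointerNode;

/* With a tag, "same type?" is a plain integer comparison. */
int sameType(const TaggedNode *a, const TaggedNode *b) {
  return a->type == b->type;
}
```

On a typical 64-bit platform the pointer alone takes 8 bytes, so PointerNode ends up noticeably larger than TaggedNode once alignment padding is included. The cost of the tag is one extra array lookup on each dispatch.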
