
I need to parse a large C source file and extract all structure definitions. The typical format is:

typedef struct structure1 {
field1;
field2;
.....
structure2 new_strut;
};

struct structure2 {
field3;
}my_struct;

How can I extract these structures?

  • ...why would you assume that grep or sed are the right tool for the job? Commented Oct 24, 2016 at 23:10
  • (grep is quite certainly the wrong tool; sed could be used, but I'd certainly far rather use awk -- or just native bash, which is adequate to the task without any external tools whatsoever). Commented Oct 24, 2016 at 23:11
  • edited the question to include 'awk' Commented Oct 24, 2016 at 23:13
  • Again, why are you listing specific tools and limiting your answer to them? That's a bigger problem than just awk being missing. If you want to know how to do X, ask how to do X, not how to do X in a way according to your preconceptions of which tools you might use for the job. Commented Oct 24, 2016 at 23:13
  • (Is the real constraint "using only standard UNIX tools"? Then ask it with that precise constraint; there might be another standard UNIX tool useful for the job you don't already know about). Commented Oct 24, 2016 at 23:16

1 Answer

awk is a fairly good fit for the job:

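awk '
  BEGIN { in_struct=0 }
  # A line starting with "struct " or "typedef struct " opens a definition.
  /^(typedef )?struct / { in_struct=1 }
  # A line starting with "}" closes it: print that line, then stop printing.
  /^}/ && in_struct { print; in_struct=0 }
  # Print every other line while inside a definition.
  in_struct == 1 { print }
'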

However, you could also do it in native bash with no external tools whatsoever:

#!/bin/bash
#      ^^^^- bash, not /bin/sh

struct_start_re='^(typedef )?struct '
struct_end_re='^}'

filter_for_structs() {
  in_struct=0
  while IFS= read -r line; do
    [[ $line =~ $struct_start_re ]] && in_struct=1
    if (( in_struct )); then
      printf '%s\n' "$line"
      [[ $line =~ $struct_end_re ]] && in_struct=0
    fi
  done
}

...used akin to the following:

cat *.[ch] | filter_for_structs
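
To see what the filter keeps and what it drops, here is a small self-contained run. The sample input is made up for illustration (point, next_id, and label are hypothetical names), and it assumes, as in the question, that each definition starts in column one and ends with a } in column one:

```shell
#!/bin/bash
# Same filter as above, repeated so this snippet runs on its own.
filter_for_structs() {
  local line in_struct=0
  local start_re='^(typedef )?struct ' end_re='^}'
  while IFS= read -r line; do
    [[ $line =~ $start_re ]] && in_struct=1
    if (( in_struct )); then
      printf '%s\n' "$line"
      [[ $line =~ $end_re ]] && in_struct=0
    fi
  done
}

# Hypothetical sample input: one typedef'd struct, one plain struct,
# and unrelated code around them that should be dropped.
sample_input() {
  cat <<'EOF'
#include <stdio.h>

typedef struct point {
    int x;
    int y;
} point_t;

static int next_id = 0;

struct label {
    char text[32];
} my_label;

int main(void) { return 0; }
EOF
}

sample_input | filter_for_structs
```

This prints only the two definitions, seven lines in all; the #include, next_id, and main lines are dropped. Note that a function definition or forward declaration that starts with "struct " in column one would also switch printing on, so input outside the typical format needs the kind of normalization described in the comment below.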

1 Comment

Right, and to get it to work with files other than the "typical format" (e.g. any that have comments or slightly different white space in them!), you'd first need to strip comments using gcc or similar and normalize the code layout using indent or similar, e.g.

sed 's/a/aA/g; s/__/aB/g; s/#/aC/g' file.c | gcc -P -E - | sed 's/aC/#/g; s/aB/__/g; s/aA/a/g' | indent - | awk 'above script'

See stackoverflow.com/a/35708616/1745001 for what the seds are doing.
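
The two sed calls in that pipeline appear to form a reversible escape around the preprocessor step: the first hides # and __ so that gcc -E leaves directives and double-underscore names alone, and the second restores them afterwards. A minimal sketch of just the round trip, with gcc and indent left out; the function names and the input line are made up for illustration:

```shell
#!/bin/bash
# Escape: every "a" becomes "aA" first, so "aB" and "aC" cannot already
# occur in the text; then "__" becomes "aB" and "#" becomes "aC".
hide_markers() { sed 's/a/aA/g; s/__/aB/g; s/#/aC/g'; }

# Unescape by undoing the three substitutions in reverse order.
restore_markers() { sed 's/aC/#/g; s/aB/__/g; s/aA/a/g'; }

# Hypothetical input line, chosen to contain "#", "__", and a literal "aB".
line='#define __packed aB /* a */'

printf '%s\n' "$line" | hide_markers                    # escaped form
printf '%s\n' "$line" | hide_markers | restore_markers  # original line again
```

Escaping "a" first is what makes the scheme safe: a literal "aB" in the source becomes "aAB", so it is not mistaken for an escaped "__" on the way back.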
