Bash - extract some strings from a file using regex

Question

I have the file below:

some lines = \
   some params

SUBDIRS = \
    text1 \
    text2 \
#  commented text
   text3 \
       text4 \

OTHERS = \
     other text here \

I want to extract a list containing text1, text2, text3, text4. How can I proceed?

The question is a bit incomplete. You've not specified how you want to identify the file names to be extracted. Pulling them by name is not very hard: grep -ow 'text[0-9]' could do the job for the example (assuming GNU grep). If you want the file list associated with SUBDIRS, then you have to work harder; this is what the answers so far have assumed. What should the output format be? One word per line without backslashes? Does the code need to handle all the niceties of make syntax? How will you invoke the command? What should happen if the name specified is not found in the file?
– Jonathan Leffler, Commented May 10, 2016 at 0:20

John1024 · Accepted Answer · 2016-05-09 23:00:19Z

This extracts all the lines after SUBDIRS but before OTHERS excluding lines that start with # or are empty:

$ awk '/OTHERS/{f=0} f && /./ && !/^#/{print} /SUBDIRS/{f=1}' file
    text1 \
    text2 \
   text3 \
       text4 \

How it works

This program uses one variable, f: f is 1 when we are in the desired range of lines and 0 elsewhere.

/OTHERS/{f=0}

If we have reached OTHERS, then set f back to zero.

f && /./ && !/^#/{print}

If f is nonzero (f) and the line is not empty (/./) and the line does not start with # (!/^#/), then print this line.

/SUBDIRS/{f=1}

If we have reached the line containing SUBDIRS, then set f to 1. Because this rule comes last, the SUBDIRS line itself is not printed.
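
For readers who find the one-liner dense, here is the same program laid out one rule per line, as a sketch. The here-document recreates the sample input from the question under the placeholder name file; the variable name result is only an illustration and not part of the original answer.

```shell
# Recreate the sample input from the question ("file" is the placeholder name used above).
cat > file <<'EOF'
some lines = \
   some params

SUBDIRS = \
    text1 \
    text2 \
#  commented text
   text3 \
       text4 \

OTHERS = \
     other text here \
EOF

# Same three rules as the one-liner, in the same order.
result=$(awk '
    /OTHERS/           { f = 0 }   # leaving the range: clear the flag
    f && /./ && !/^#/  { print }   # in range, non-empty, not starting with #
    /SUBDIRS/          { f = 1 }   # entering the range: flag applies from the next line
' file)

printf '%s\n' "$result"
```

The order of the rules matters: clearing the flag first keeps the OTHERS line out of the output, and setting it last keeps the SUBDIRS line out.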

Slightly briefer form

Because print is the default action when no action is specified, we can omit it from the script:

$ awk '/OTHERS/{f=0} f && /./ && !/^#/; /SUBDIRS/{f=1}' file
    text1 \
    text2 \
   text3 \
       text4 \

Alternative output format

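This removes the trailing \ and joins all the output into one line. Assigning to $NF makes awk rebuild the line from its fields, which also collapses the leading indentation: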

$ awk '/OTHERS/{f=0} $NF=="\\"{$NF=""} f && /./ && !/^#/{a=a" "$0} /SUBDIRS/{f=1} END{print a}' file
 text1  text2  text3  text4
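
If one name per line is preferred, a possible variant of the same flag logic strips the indentation and the trailing backslash with sub(). This is only a sketch: the sub() calls and the variable name names are assumptions and not part of the original answer, and it does not try to handle the full make syntax.

```shell
# Recreate the sample input from the question ("file" is the placeholder name used above).
cat > file <<'EOF'
some lines = \
   some params

SUBDIRS = \
    text1 \
    text2 \
#  commented text
   text3 \
       text4 \

OTHERS = \
     other text here \
EOF

names=$(awk '
    /OTHERS/  { f = 0 }
    f && /./ && !/^#/ {
        sub(/[ \t]*\\$/, "")   # drop the trailing backslash and any blanks before it
        sub(/^[ \t]+/, "")     # drop the leading indentation
        print
    }
    /SUBDIRS/ { f = 1 }
' file)

printf '%s\n' "$names"
```

In bash, the result can then be read into an array with mapfile -t list <<< "$names" if a real list is needed.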

aghast · Answer · 2016-05-09 23:28:23Z

Try this. SPACE_TAB is, I hope obviously, a space plus a tab.

$ SPACE_TAB="   "; sed -ne '/^SUBDIRS/,/^['"$SPACE_TAB"']*$/p' test.in
SUBDIRS = \
    text1 \
    text2 \
#  commented text
   text3 \
       text4 \


3 Comments

John1024 Over a year ago

Since this question is tagged bash, you may want to use SPACE_TAB=$'\t ' in order to be sure that the correct characters are assigned to the variable.

aghast Over a year ago

I don't even want there to be a variable! I just put that in there to call it out.

John1024 Over a year ago

In that case, you can use the POSIX form for space-tab: sed -ne '/^SUBDIRS/,/^[[:blank:]]*$/p'
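
The sed range prints the SUBDIRS header and the comment line as well. As a sketch of one way to reduce that output to the bare names, a second sed can filter it. The second stage and the variable name subdirs are assumptions added for illustration; test.in is recreated here from the question's sample.

```shell
# Recreate the sample input from the question under the name used in this answer.
cat > test.in <<'EOF'
some lines = \
   some params

SUBDIRS = \
    text1 \
    text2 \
#  commented text
   text3 \
       text4 \

OTHERS = \
     other text here \
EOF

# Stage 1: print from the SUBDIRS line through the first blank line.
# Stage 2: drop the header, comment lines and blank lines, then trim
#          the indentation and the trailing backslash.
subdirs=$(sed -n '/^SUBDIRS/,/^[[:blank:]]*$/p' test.in |
    sed -e '1d' \
        -e '/^#/d' \
        -e '/^[[:blank:]]*$/d' \
        -e 's/^[[:blank:]]*//' \
        -e 's/[[:blank:]]*\\$//')

printf '%s\n' "$subdirs"
```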
