Bash extract string between two patterns

Question

I have a file containing three kinds of lines:

[       ]   APPLE
[ORANGE ] * ORANGE      on XXXXXXXXXXXXXXX
[YELLOW ] + BANANA      on XXXXXXXXXXXXXXX

What I want to do is extract the fruit names, like this:

APPLE
ORANGE
BANANA

I tried to extract it with echo ${line:start:end} before I realized that the line lengths might vary, so I guess I have to do it with pattern matching.

I'm new to bash. How should I extract the fruit name: with sed/awk, or some other way?

Thanks!
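For reference, bash's substring expansion takes an offset and a length (${line:offset:length}), so it only helps if the fruit name always starts at the same column. A minimal pure-bash sketch of the pattern-matching approach uses [[ ... =~ ... ]] and the BASH_REMATCH array. It assumes each line starts with a [...] block, followed by optional spaces, "*" or "+", and then a one-word name; extract_fruit_names is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the fruit name from each line read on stdin.
# Assumption: every line starts with "[...]", optionally followed by
# spaces and the symbols "*" or "+", and then a one-word fruit name.
extract_fruit_names() {
  # Keep the regex in a variable so bash does not need it quoted inline.
  local re='^\[[^]]*][ *+]*([A-Za-z0-9]+)'
  local line
  while IFS= read -r line; do
    # On a match, BASH_REMATCH[1] holds the text of the first (...) group.
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
    fi
  done
}

extract_fruit_names <<'EOF'
[       ]   APPLE
[ORANGE ] * ORANGE      on XXXXXXXXXXXXXXX
[YELLOW ] + BANANA      on XXXXXXXXXXXXXXX
EOF
```

Unlike the offset-based expansion, this keeps working if the text inside the brackets changes width. Lines that do not match the pattern are skipped.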

Can there be more than 9 SERVICE entries, i.e. SERVICE11 or SERVICE23 or SERVICE999999 (just a few possible examples)? Please update your question with this information, rather than replying in a comment thread. Good luck.
– shellter, commented Apr 27, 2015 at 3:18
@shellter Sorry for the confusion. What I meant was that the length of the service name might vary, if that's your concern. I'm new to bash and to this site as well, so thanks for telling me! Post updated.
– haust, commented Apr 27, 2015 at 3:29
You need to explain the rules in more detail. I assume each input line starts with something inside square brackets ([ ... ]), followed by zero or more symbols (+ *), followed by a fruit name (can a fruit name contain two words, like "star fruit"?), followed by an optional "on xxxxx". If your input can end with garbage other than "on xxxxx", and you have to deal with two-word fruit names like "star fruit", how would I know whether the second word is part of the fruit name or garbage, since the garbage could be something other than "on XXX"?
– Robin Hsu, commented Apr 27, 2015 at 4:04

Robin Hsu · Accepted Answer · 2015-04-27 04:26:57Z

1

This handles two-word fruit names like "star fruit", but it assumes that the trailing garbage (if any) starts with the word "on" (i.e. the "on XXXXXX" part). It also assumes that the fruit name starts after the first right square bracket ("]"):

sed -e 's/^[^]]*][^A-Za-z]*//' -e 's/\bon\b.*$//'  -e 's/\s*$//' your_file

Explanations:

-e 's/^[^]]*][^A-Za-z]*//': Removes everything from the start of the line up to and including the first "]", plus any non-alphabetic characters that follow it.

-e 's/\bon\b.*$//': Removes the whole word "on" and everything after it to the end of the line, if present.

-e 's/\s*$//': Removes any trailing whitespace left over by the previous steps.
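The \b (word boundary) and \s (whitespace) escapes in this command are GNU sed extensions, so a non-GNU sed such as BSD/macOS sed may not interpret them the same way. A minimal sketch of a POSIX-only variant replaces them with [[:space:]], under the assumption that the word "on" is always surrounded by whitespace; extract_fruit_names is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: POSIX-only variant of the GNU sed command.
# Assumptions of this sketch:
#   - [[:space:]] stands in for \s, which POSIX sed does not define;
#   - "on" must have whitespace on both sides, because \b is also GNU-only.
extract_fruit_names() {
  sed -e 's/^[^]]*][^A-Za-z]*//' \
      -e 's/[[:space:]]on[[:space:]].*$//' \
      -e 's/[[:space:]]*$//'
}

extract_fruit_names <<'EOF'
[       ]   APPLE
[ORANGE ] * ORANGE      on XXXXXXXXXXXXXXX
[YELLOW ] + BANANA      on XXXXXXXXXXXXXXX
EOF
```

Two-word names such as "STAR FRUIT on XXX" still come out as "STAR FRUIT", because only the text from " on " onward is deleted.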

edited Apr 27, 2015 at 4:26

answered Apr 27, 2015 at 4:20

Robin Hsu



1 Comment

haust

Sorry, I should have made it clear that the fruit name contains one word only. Anyway, this works, thanks!

josifoski · Accepted Answer · 2015-04-27 03:18:39Z

1

Try this sed command:

sed 's/^\[[^]]*] . \([A-Za-z0-9]*\).*/\1/' file
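A variant of this command adds sed's -n option and the p flag, so that lines which do not match the pattern are suppressed instead of being echoed unchanged (sed's default). This sketch matches the bracket contents with [^]]* so they can be any width, and assumes exactly one space, one character ("*", "+" or a space) and one more space between "]" and the name. extract_fruit_names is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print only the captured fruit name, and only
# for lines that match; -n turns off automatic printing and the p flag
# prints the pattern space after a successful substitution.
extract_fruit_names() {
  sed -n 's/^\[[^]]*] . \([A-Za-z0-9]*\).*/\1/p'
}

extract_fruit_names <<'EOF'
[       ]   APPLE
[ORANGE ] * ORANGE      on XXXXXXXXXXXXXXX
some line that does not start with a bracket
[YELLOW ] + BANANA      on XXXXXXXXXXXXXXX
EOF
```

Without -n and p, the third line would be passed through to the output unmodified.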

answered Apr 27, 2015 at 3:18

josifoski



Reuben L. · Accepted Answer · 2015-04-27 03:26:40Z

1

Use grep with extended regular expressions (-E) and the -o flag to print only the matching parts:

grep -o -E 'SERVICE[_0-9A-Za-z]+' file

The + ensures that names with multi-digit numbers (e.g. SERVICE11 or SERVICE999999) are still matched in full.

Edited to match the changes in the question.
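The original SERVICE-style input is no longer shown in the question, so the sample lines below are invented purely for illustration. The sketch shows that -o prints each match on its own line and that the + lets the pattern consume any number of digits. Note that -o is supported by GNU and BSD grep but is not part of POSIX grep; extract_services is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper around the grep command from this answer.
# -E: extended regular expressions, so "+" needs no backslash.
# -o: print only the matched text, one match per line.
extract_services() {
  grep -o -E 'SERVICE[_0-9A-Za-z]+'
}

# Invented sample input, not taken from the original question.
extract_services <<'EOF'
[  ]   SERVICE1
[OK] * SERVICE23 on host-a
[OK] + SERVICE999999 on host-b
EOF
```

Because the pattern only describes the SERVICE token itself, the surrounding brackets and the "on host" suffix never need to be matched.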

edited Apr 27, 2015 at 3:26

answered Apr 27, 2015 at 3:19

Reuben L.


3 Comments

haust

Sorry, I didn't explain the problem clearly, please check for the updates. Thanks!

Reuben L.

Since this just looks for strings that look like SERVICEXXXXX (including underscores, digits and letters), it does not need to care about what comes before or after that string.

haust

Actually, what I meant there was $SERVICE_NAME: the service name can be any combination of letters and digits. I've updated the post again with examples, which hopefully describe the problem better.

anubhava · Accepted Answer · 2015-04-27 04:19:44Z

0

You can use this awk command with a custom field separator to get your values:

awk -F '\\[[^]]+\\][ *+]+| *on *' '{print $2}' file
APPLE
ORANGE
BANANA
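An alternative awk sketch, assuming one-word fruit names (as the asker confirmed in a comment on another answer): rather than splitting on " *on *", which would also split a name containing a lowercase "on" such as "lemon" (printing "lem"), it strips the leading [...] block with sub() and prints the first field. extract_fruit_names is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: delete the leading "[...]" block plus any spaces,
# "*" or "+" after it; changing $0 with sub() makes awk re-split the
# record into fields, so $1 is then the (one-word) fruit name.
extract_fruit_names() {
  awk '{ sub(/^\[[^]]*][ *+]*/, ""); print $1 }'
}

extract_fruit_names <<'EOF'
[       ]   APPLE
[ORANGE ] * ORANGE      on XXXXXXXXXXXXXXX
[YELLOW ] + BANANA      on XXXXXXXXXXXXXXX
EOF
```

The trade-off is that multi-word names would be truncated to their first word, which the field-separator version avoids.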

answered Apr 27, 2015 at 4:19

anubhava

