
I have a file containing three kinds of lines:

[       ]   APPLE
[ORANGE ] * ORANGE      on XXXXXXXXXXXXXXX
[YELLOW ] + BANANA      on XXXXXXXXXXXXXXX

What I want to do now is to extract the fruit name like below:

APPLE
ORANGE
BANANA

I tried to extract it with echo ${line:start:end} before I realized that the length of each line might vary, so I guess I have to do it with pattern matching.

I'm new to bash. How should I extract the fruit name: with sed, awk, or some other way?

Thanks!

5
  • What is the expected output for the 3 lines above? Commented Apr 27, 2015 at 3:03
  • @josifoski, thanks for the reply, the post is updated. Commented Apr 27, 2015 at 3:06
  • Can there be more than 9 SERVICE entries, e.g. SERVICE11, SERVICE23, or SERVICE999999 (just a few possible examples)? Please update your question with this information rather than replying in a comment thread. Good luck. Commented Apr 27, 2015 at 3:18
  • @shellter Sorry for causing confusion. What I meant was that the length of the service name might vary, if that's your concern. I'm new to bash and to this site as well, so thanks for telling me that! Post updated. Commented Apr 27, 2015 at 3:29
  • You need to explain the rules in more detail. I expect each input line to start with something inside square brackets ([ ... ]), followed by zero or more symbols (+ *), followed by a fruit name (NOTE: can a fruit name contain two words, like "star fruit"?), followed by zero or more "on xxxxx". If your input can end with garbage other than "on xxxxx", and you have to deal with two-word fruit names like "star fruit", how do I know whether the second word is part of the fruit name or a garbage word, since a garbage word can be something other than "on XXX"? Commented Apr 27, 2015 at 4:04

4 Answers 4

1

This handles two-word fruit names like "star fruit", but it has to assume that the trailing garbage (if any) starts with the word "on" (i.e. those "on XXXXXX" parts). It also assumes that the fruit name starts after the first right square bracket ("]"):

sed -e 's/^[^]]*][^A-Za-z]*//' -e 's/\bon\b.*$//'  -e 's/\s*$//' your_file

Explanations:

-e 's/^[^]]*][^A-Za-z]*//': Removes everything from the start of the line up to and including the first "]", plus any non-alphabetic characters that follow it.

-e 's/\bon\b.*$//': Removes the whole word "on" and everything after it up to the end of the line, if it exists.

-e 's/\s*$//': Removes any trailing whitespace left over after the above processing.
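To see the three expressions working together, here is a minimal sketch that feeds the sample lines from the question through the same command. The function name extract_fruit and the here-document are only illustrative and not part of the answer above. Since \b and \s are GNU extensions rather than POSIX, the sketch assumes GNU sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name chosen for illustration): applies the three
# sed expressions from the answer to whatever arrives on stdin.
# Assumes GNU sed, because \b and \s are GNU extensions.
extract_fruit() {
    # 1) drop everything up to the first "]" plus following non-letters
    # 2) drop the whole word "on" and the rest of the line, if present
    # 3) drop trailing whitespace left behind by step 2
    sed -e 's/^[^]]*][^A-Za-z]*//' \
        -e 's/\bon\b.*$//' \
        -e 's/\s*$//'
}

# The three sample lines from the question.
extract_fruit <<'EOF'
[       ]   APPLE
[ORANGE ] * ORANGE      on XXXXXXXXXXXXXXX
[YELLOW ] + BANANA      on XXXXXXXXXXXXXXX
EOF
```

With the sample input this should print APPLE, ORANGE and BANANA, one per line. A two-word name such as "STAR FRUIT" followed by "on ..." would come out as "STAR FRUIT", because only the word "on" ends the name.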


1 Comment

Sorry, I should have made it clear that the fruit name contains one word only. Anyway, this works, thanks!
1

Try this sed command:

sed 's/^\[[^]]*\] . \([A-Za-z0-9]*\).*/\1/' file


1

Use grep with the extended-regex flag -E and the -o flag, which prints only the matching part of each line:

grep -o -E 'SERVICE[_0-9A-Za-z]+' file

The + matches one or more characters from the class, so names with multi-digit suffixes (numbers greater than 9) are still returned in full.

edited to match the changes in question
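As a rough illustration of how -o and the + quantifier behave, the sketch below runs the same grep over a few made-up lines. The function name extract_service and the sample lines are assumptions for the demo; the thread does not show the real layout of the SERVICE entries, only names such as SERVICE11 and SERVICE999999 mentioned in the comments.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name chosen for illustration): wraps the grep
# command from the answer and reads from stdin.
extract_service() {
    # -E: extended regex, -o: print only the matched text, one match per line
    grep -o -E 'SERVICE[_0-9A-Za-z]+'
}

# Made-up sample lines; the real file format is not shown in the thread.
extract_service <<'EOF'
[ OK ] SERVICE1 started
[ OK ] SERVICE23 started
[FAIL] SERVICE999999 stopped
EOF
```

Each match stops at the first character outside the class (here the space), so the three names are printed on their own lines. Because + requires at least one character after SERVICE, a bare "SERVICE" with nothing attached would not be matched.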

3 Comments

Sorry, I didn't explain the problem clearly, please check for the updates. Thanks!
Since this search just looks for strings that look like SERVICEXXXXX (including underscores, digits and letters), it does not really need to care about what comes before or after the string.
Actually what I meant there was $SERVICE_NAME, the service name can be anything with strings and numbers. I've updated the post again using examples, hopefully a better description of the problem.
0

You can use awk with a custom field separator to get your values:

awk -F '\\[[^]]+\\][ *+]+| *on *' '{print $2}' file
APPLE
ORANGE
BANANA
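The separator here is a regular expression with two alternatives: the bracketed prefix followed by spaces or marker characters, or the word "on" with optional spaces around it. A minimal sketch of how the fields end up, assuming the sample lines from the question (the function names extract_fruit_awk and show_fields are made up for this demo):

```shell
#!/usr/bin/env bash
# Same field separator as in the answer, kept in one place.
FS_RE='\\[[^]]+\\][ *+]+| *on *'

# Hypothetical helper: prints field 2, as the answer's command does.
extract_fruit_awk() {
    awk -F "$FS_RE" '{print $2}'
}

# Hypothetical helper: shows the first three fields so the split is visible.
show_fields() {
    awk -F "$FS_RE" '{printf "1=<%s> 2=<%s> 3=<%s>\n", $1, $2, $3}'
}

show_fields <<'EOF'
[       ]   APPLE
[ORANGE ] * ORANGE      on XXXXXXXXXXXXXXX
EOF
```

Because each line begins with a separator, field 1 is empty and the fruit name lands in field 2. One caveat: the second alternative matches the letters "on" anywhere, not only as a separate word, so a lowercase name containing "on" would be cut short. With uppercase names like the ones in the question this does not come up.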

