3

I need to extract some information from a header file, and I need to get a site name from a string like this:

0008 0080 LO Institution Name                 Site Name Here

The problem is that the site name contains spaces too. The only thing I have come up with that works is saving the line in a variable and then taking the site name as the substring after a certain number of characters, like this:

echo ${line:50}

but I'd like something more elegant.

I just noticed that it also removed multiple spaces between Institution Name and Site Name.

6
  • Please post a more realistic input string (with a real site name). Commented Dec 12, 2017 at 11:07
  • The spaces are lost because you forgot to quote the value. You want echo "${string:50}". See stackoverflow.com/questions/10067266/… Commented Dec 12, 2017 at 11:20
  • With just a single example and no explanation of which part of the string you want, this is unclear. Can you specify which part of the string you want and in what circumstances this is failing? Also, "elegant" isn't really well-defined -- I find it hard to imagine that you would find anything simpler than what you already have. Commented Dec 12, 2017 at 11:21
  • @tripleee: Thanks for the edit. My first time here, not familiar yet with formatting, etc. I guess by elegant I meant doing it in one line without saving it into a variable first, e.g. pipe it to sed. Commented Dec 12, 2017 at 11:40
  • And you are looking for extracting the part after the long run of spaces? Can you verify that breaking on any occurrence of two spaces is what you really want? Commented Dec 12, 2017 at 11:47
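The quoting point raised in the comments can be checked with a short sketch. The variable names `line`, `unquoted`, and `quoted` are only illustrative, and the sample value is the one from the question, not a real header line.

```shell
line='0008 0080 LO Institution Name                 Site Name Here'

# Unquoted: the shell splits the value into words and echo rejoins them
# with single spaces, so the long run of spaces is lost.
unquoted=$(echo $line)

# Quoted: the value reaches echo as one argument, spacing intact.
quoted=$(echo "$line")

printf '[%s]\n' "$unquoted" "$quoted"
```

The same applies to the substring form: `echo "${line:50}"` keeps the spacing, `echo ${line:50}` does not.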

3 Answers

5

If the question title is representative of your actual problem, and you want to extract the text after multiple adjacent spaces,

echo "${string##*  }"

with two spaces after the asterisk will extract a substring with the longest prefix ending with two spaces removed from the variable's value.
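As a minimal sketch of that expansion, assuming the sample line from the question is stored in a variable (the names `string`, `site`, and `shortest` are just for illustration):

```shell
string='0008 0080 LO Institution Name                 Site Name Here'

# ## removes the longest prefix matching "*  " (anything, then two spaces),
# which ends at the last pair of adjacent spaces in the value.
site="${string##*  }"
printf '[%s]\n' "$site"

# For contrast, a single # removes the shortest matching prefix, so the
# rest of the run of spaces is still attached to the result.
shortest="${string#*  }"
printf '[%s]\n' "$shortest"
```

Note that if the value contains no two adjacent spaces at all, the pattern does not match and the whole string is returned unchanged.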

If you need to do this in a pipe, it's easy with sed:

something which produces the output string |
sed 's/.*  //'

6 Comments

That will work. The only thing that stays the same between different inputs is the multiple spaces. The rest, like the length, where the site name begins, and the number of words before it, can change. I didn't know I could use multiple characters in this construction. Thanks!
Hm, I tried sed almost like that, only used ^ for start of the string. Didn't work obviously...
^ matches the beginning of a line, but the regex .* matches everything from the start of the line anyway. sed 's/^.*  //' (with two spaces before the slashes) should work just as well.
I forgot the dot. I thought: from the start ^, everything * up to the double space '  ' will be replaced with nothing //.
No, in regex * means "the previous expression zero or more times"; and . means "any single character (except newline)".
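To make the difference concrete, here is a small sketch comparing the three patterns discussed in these comments on the sample line from the question. The variable names are illustrative only.

```shell
line='0008 0080 LO Institution Name                 Site Name Here'

# Without the dot: right after a leading ^, POSIX basic regex treats * as a
# literal asterisk, so nothing matches and the line comes back unchanged.
no_dot=$(printf '%s\n' "$line" | sed 's/^*  //')

# With the dot: .* greedily matches up to the last pair of spaces.
with_dot=$(printf '%s\n' "$line" | sed 's/.*  //')

# Anchoring with ^ is harmless but redundant here.
anchored=$(printf '%s\n' "$line" | sed 's/^.*  //')

printf '[%s]\n' "$no_dot" "$with_dot" "$anchored"
```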
0

I think awk would be an optimal choice. It can extract columns easily.

echo '0008 0080 LO Institution Name                 Site Name Here'|awk '{ print $7" "$8 }'

You are able to print whatever columns you want. (And do many other things.)

3 Comments

Unfortunately, the column numbers may change.
I need the last words, but it can be one, two or more. They are separated by multiple spaces from the rest of the string.
Ohh, so the sed solution seems to be the best.
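For what it's worth, awk can still cope with a varying number of words if the field separator is a run of two or more spaces rather than the default whitespace. This is only a sketch of that variation, not what the answer above proposed, and it assumes the line has no trailing run of spaces (otherwise the last field would be empty).

```shell
line='0008 0080 LO Institution Name                 Site Name Here'

# -F '  +' makes the field separator the regex "two or more spaces", so
# single spaces inside the site name no longer split it. $NF is the last field.
site=$(printf '%s\n' "$line" | awk -F '  +' '{ print $NF }')

printf '[%s]\n' "$site"
```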
-1

If the format of the given string does not change between inputs, the following will do the trick.

A="0008 0080 LO Institution Name Site Name Here"
echo "$A" | cut -d " " -f 6

1 Comment

Please see the added comment. There are multiple spaces between "Institution Name" and "Site Name". Even if I use a "treat multiple delimiters as one" option, this would still only get me the first word of the site name. It may have two or more words separated by spaces.
