20

I'm trying to write a script that is fairly complicated for me. I have a string coming in that looks like this:

2012 2013 "multiple words"

My goal is to split this string on spaces into an array, except that anything surrounded by double quotes should be treated as a single element. My idea was to do this in two steps: first match the quoted multi-word parts and remove them from the string, then split what remains on whitespace.
Unfortunately, I can't find out how to echo only the match. So far I have this:

array=$(echo $tags | sed -nE 's/"(.+)"/\1/p')

But on OS X this prints the whole line with only the quotes removed:

2012 2013 multiple words

Expected result:

array[1]="2012"
array[2]="2013"
array[3]="multiple words"

How would I go about this sort of problem?

Thanks.

4
  • Can you give an example of the expected result? I read the part about the double-quoted words several times but I don't get it. Do you want the quoted words to be a single entry in the array, or do you want to split them too? Commented Jun 27, 2013 at 9:05
  • Putting values into an array is not a goal, it's an implementation of something you think is the best approach to help you achieve a goal. If you tell us what you're really trying to do, with sample input and expected output, maybe we can suggest an alternative. Commented Jun 27, 2013 at 12:04
  • @zekus I've added an expected result. Will look into posted solutions now. Thanks everyone! Commented Jun 27, 2013 at 13:37
  • @doubleDown Thanks for editing! Much clearer now. Commented Jul 8, 2013 at 14:48

5 Answers

20

eval is evil, but this may be one of those cases where it comes in handy:

str='2012 2013 "multiple words"'
eval x=($str)
echo ${x[2]}
multiple words

Or with more recent versions of bash (tested on 4.3)

s='2012 2013 "multiple words"'
declare -a 'a=('"$s"')'
printf "%s\n" "${a[@]}"
2012
2013
multiple words
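
Because eval (and the declare form above) re-parses the string as shell code, any command substitution such as $(...) inside the input would be executed. For untrusted input, a tokenizer that never evaluates the string may be preferable. The following is only a minimal sketch, not part of the original answer: the function name split_quoted and the global array words are hypothetical, and it assumes quotes are balanced and never escaped.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): split a string on
# whitespace while keeping "double quoted" runs together, without eval.
# Assumption: quotes are balanced and there are no escaped quotes.
split_quoted() {
    local input=$1
    # Group 2: contents of a quoted token; group 3: a bare token; group 4: the rest.
    local re='^[[:space:]]*("([^"]*)"|([^"[:space:]]+))(.*)$'
    words=()   # global result array (name is an assumption)
    while [[ $input =~ $re ]]; do
        if [[ ${BASH_REMATCH[1]} == \"* ]]; then
            words+=("${BASH_REMATCH[2]}")   # quoted token, quotes stripped
        else
            words+=("${BASH_REMATCH[3]}")   # bare token
        fi
        input=${BASH_REMATCH[4]}
    done
    # Anything left in $input here (e.g. after an unbalanced quote) is dropped.
}

split_quoted '2012 2013 "multiple words"'
printf '%s\n' "${words[@]}"
# 2012
# 2013
# multiple words
```

Since the input is only ever matched against a regular expression, a string like '$(date)' ends up as a literal array element instead of being run.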

7 Comments

+1, I think sometimes eval is a very handy tool. This is one of those cases. This should be the accepted solution.
Thanks so much! I must add that @anubhava's solution is the same, except it doesn't include the eval part.
Is this completely safe? Any danger of executing commands within str?
@pimlottc, eval is not ever completely safe.
@1_CR I wasn't sure if perhaps the parens acted somehow as a guard in this case. If not, this doesn't seem like a recommendable solution.
4
$ grep -Eo '"[^"]*"|[^" ]*' <<< '2012 2013 "multiple words"'
2012
2013
"multiple words"

That is, print only the strings matching either

  1. a quote, followed by any number (even zero) of non-quote characters, followed by a quote, or
  2. a series of characters containing neither a quote nor a space.

Of course, this does not handle complicated cases like quotes spanning multiple lines or escaped quotes (using either double quotes like SQL or backslash like the shell).
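
To get from the grep output to the array the question asks for, the matches can be read line by line and the surrounding quotes stripped. This is a rough sketch under a few assumptions: the variable names str, token and array are only illustrative, the second alternative uses + instead of * so that it can never match an empty string, and no token contains a newline.

```shell
#!/usr/bin/env bash
# Sketch: load grep -o matches into a bash array and strip the outer quotes.
# Assumption: tokens never contain newlines or escaped quotes.
str='2012 2013 "multiple words"'
array=()
while IFS= read -r token; do
    token=${token#\"}   # drop a leading double quote, if any
    token=${token%\"}   # drop a trailing double quote, if any
    array+=("$token")
done < <(grep -Eo '"[^"]*"|[^" ]+' <<< "$str")

printf '%s\n' "${array[@]}"
```

The while/read loop is used instead of mapfile so that the sketch also runs on older bash versions such as the 3.2 that ships with OS X.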

1 Comment

What does grep -V return?
1

You can directly do:

arr=(2012 2013 "multiple words")

echo ${#arr[@]} # gives 3
echo ${arr[2]} # gives "multiple words"

EDIT: Not sure if it helps the OP, but the following will also work:

str='2012 2013 "multiple\ words"'
read -a arr <<< $str
echo ${#arr[@]} # gives 3
echo ${arr[2]} # gives "multiple words"

9 Comments

It's not that simple if the string is stored in a variable. Can you show an example using a variable?
@dogbane: OK, I will try to find that, but yes, I fully agree that it's not that simple if the string is stored in a variable.
You've escaped the space in your string and it no longer matches the OP's input. Ideally, you should come up with a solution that splits str='2012 2013 "multiple words"' into an array of three elements.
@dogbane: That's why I said it's a workaround :P. It only works if the OP can get the spaces inside the double quotes escaped. (That can easily be done with perl/sed as well.)
@anubhava I've tried your solutions and ${#arr[@]} gives 4 instead of 3. Different version of Bash?
1

The following will produce the result you want:

tags='2012 2013 "multiple words"'
IFS=$'\n'; array=($(echo $tags | egrep -o '"[^"]*"|\S+'))

result in ZSH:

echo ${array[1]} # 2012
echo ${array[2]} # 2013
echo ${array[3]} # "multiple words"

result in BASH:

echo ${array[0]} # 2012
echo ${array[1]} # 2013
echo ${array[2]} # "multiple words"

Works on OS X.
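
One side effect of the one-liner above is that IFS stays set to a newline for the rest of the shell session, and the unquoted command substitution is also subject to filename globbing. A possible way to contain both, sketched here for bash only, is to wrap the split in a function. The function name split_tags is hypothetical, and the pattern uses a POSIX bracket expression in place of \S, which not every grep supports.

```shell
#!/usr/bin/env bash
# Sketch: same grep-based split, but with IFS changed only inside a function.
# split_tags is a hypothetical name; the result lands in the global "array".
split_tags() {
    local IFS=$'\n'   # split the grep output on newlines only, local to this function
    set -f            # disable globbing so tokens like * are not expanded
    array=($(grep -Eo '"[^"]*"|[^"[:space:]]+' <<< "$1"))
    set +f            # re-enable globbing (assumes it was enabled before the call)
}

tags='2012 2013 "multiple words"'
split_tags "$tags"
printf '%s\n' "${array[@]}"
```

As in the answer above, the quoted element keeps its double quotes; they can be removed afterwards with a parameter expansion if needed.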


0

Here is a small Python script that parses space-delimited CSV while respecting quoted fields:

$ python -c '
import csv, fileinput
for line in csv.reader(fileinput.input(), delimiter=" "):
   for word in line:
      print(word)
' test.csv
2012
2013
multiple words

Since this uses the fileinput module, it also works in a pipeline (or with a string in a variable):

$ str='2012 2013 "multiple words"'
$ echo "$str" | python -c '
import csv, fileinput
for line in csv.reader(fileinput.input(), delimiter=" "):
   for word in line:
      print(word)
' 
2012
2013
multiple words

