I need to split a string on a multi-character delimiter and store the resulting values in a Bash array for further processing.

IFS can only split on single-character delimiters.

a="2;AAAAA;BBBBB;1111_MultiCharDel_2;CCCC;DDDDDD;22222_MultiCharDel_2;EEEE;FFFFFFF;22222" 
awk '{split($0,ArrayDeltaMulDep,"_MultiCharDel_")}' <<< "$a"

The input string can have several substrings separated by the MultiCharDel delimiter.

How can I access this array, ArrayDeltaMulDep, for further processing in Bash?

  • What about awk '{split(...); print ArrayDeltaMulDep[3]}' <<< "$a", for example? Else please clarify your Q with required example output. That is always a good thing to include ;-) Good luck. Commented Oct 12, 2016 at 22:43
  • Will it not just return the 4th element of the ArrayDeltaMulDep array? I need to get the full ArrayDeltaMulDep array for further processing in my code. Commented Oct 12, 2016 at 22:48
  • as I said, "Else please clarify your Q with required example output. ". Good luck. Commented Oct 12, 2016 at 22:59

1 Answer

Your example string, a, does not contain newlines. If that is true in general, then:

a="2;AAAAA;BBBBB;1111_MultiCharDel_2;CCCC;DDDDDD;22222" 
readarray -t b <<< "${a//MultiCharDel/$'\n'}"

We can verify that this split the string properly using declare -p to show the value of b:

$ declare -p b
declare -a b=([0]="2;AAAAA;BBBBB;1111_" [1]="_2;CCCC;DDDDDD;22222")

How it works:

  1. readarray -t b

    This reads lines from stdin and puts them in a bash array b. The -t option strips the trailing newline from each line.

  2. <<< "${a//MultiCharDel/$'\n'}"

    ${a//MultiCharDel/$'\n'} uses pattern substitution to replace MultiCharDel with a newline character. <<< provides the result as stdin to the command readarray.

Hat tip: Chepner
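
As a rough sketch of how the resulting array might then be used, the following applies the same technique to the three-part sample string from the question. It assumes the full delimiter is `_MultiCharDel_` (underscores included) and that the string contains no newlines; the names `delim` and `parts` are illustrative, not from the question:

```shell
#!/usr/bin/env bash
# Sample input from the question; assumed to contain no newlines.
a="2;AAAAA;BBBBB;1111_MultiCharDel_2;CCCC;DDDDDD;22222_MultiCharDel_2;EEEE;FFFFFFF;22222"
delim="_MultiCharDel_"   # illustrative variable name

# Replace every occurrence of the delimiter with a newline,
# then read one array element per line.
readarray -t parts <<< "${a//"$delim"/$'\n'}"

# The array is now an ordinary bash array: count it and loop over it.
echo "count: ${#parts[@]}"
for i in "${!parts[@]}"; do
    printf '%d: %s\n' "$i" "${parts[i]}"
done
```

Quoting `"$delim"` inside the expansion makes bash treat it as literal text rather than a glob pattern, which matters if a delimiter ever contains characters such as `*` or `?`.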

More general solution

A bash string can never contain a NUL character (hex 00), so NUL can serve as an unambiguous separator. Using GNU sed:

b=()
while IFS= read -d '' -r line
do
   b+=("$line")
done < <(sed 's/MultiCharDel/\x00/g; s/$/\x00/' <<<"$a")

This again creates an array with the desired splitting:

$ declare -p b
declare -a b=([0]="2;AAAAA;BBBBB;1111_" [1]="_2;CCCC;DDDDDD;22222")
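
If GNU sed is not available, or the string may itself contain newlines, the same split can probably be done with parameter expansion alone. The following is only a sketch, not part of the answer above: `split_on` and `parts` are hypothetical names, and the delimiter is assumed to be non-empty.

```shell
#!/usr/bin/env bash
# split_on: hypothetical helper. Splits string $1 on the literal
# delimiter $2 and stores the pieces in the global array "parts".
split_on() {
    local rest=$1 delim=$2
    [[ -n $delim ]] || return 1   # an empty delimiter would loop forever
    parts=()
    # While the delimiter still occurs, peel off the text before
    # its first occurrence and keep the remainder for the next pass.
    while [[ $rest == *"$delim"* ]]; do
        parts+=("${rest%%"$delim"*}")
        rest=${rest#*"$delim"}
    done
    parts+=("$rest")   # text after the last delimiter (possibly empty)
}

# Example input with an embedded newline in the first piece.
split_on $'x\ny_MultiCharDel_z' "_MultiCharDel_"
declare -p parts
```

Because no external process or line-based reader is involved, newlines inside the pieces are kept as ordinary characters.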

Comments

Does this solution work even if there are several sub-strings separated by the MultiCharDel ??Ex the string a can also be a="2;AAAAA;BBBBB;1111_MultiCharDel_2;CCCC;DDDDDD;22222_MultiCharDel_2;EEEE;FFFFFFF;22222"
@NishantShrivastava In that string, there is only one matchable occurrence of MultiCharDel. I copied and pasted the string from your comment and, after the second occurrence of Multi (but only the second), there are two zero-width Unicode characters. They prevent the second occurrence from being matched.
readarray -t b <<< "${a//MultiCharDel/$'\n'}" would be far better than an unquoted expansion inside parentheses.
@chepner That is a vastly superior approach! Answer updated.
@NishantShrivastava You can use while IFS= read -r line; do b+=("$line"); done <<< "${a//MultiCharDel/$'\n'}" in its place.