0

I am parsing XML with regex. The format of the file is well known, so there is no need to worry about escaping or proper XML parsing.

grep is returning multiple lines and I want to store each match to a file.

However, I either get every line between my tags as a separate array element with array=( $list ), or the whole output as a single element with array=( "$list" ).
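The two behaviours can be reproduced with a small sketch. The string below is a made-up stand-in for the grep output, not the real data from result.xml, and the names unquoted and quoted are only for illustration:

```shell
#!/bin/bash
# Hypothetical stand-in for the grep output: two multi-line matches.
list=$'<tagname>\nfirst\n</tagname>\n<tagname>\nsecond\n</tagname>'

# Unquoted: the shell splits on IFS (space, tab, newline),
# so every line becomes its own array element.
unquoted=( $list )

# Quoted: no splitting happens, so the whole string is one element.
quoted=( "$list" )

echo "unquoted: ${#unquoted[@]} elements"
echo "quoted:   ${#quoted[@]} element(s)"
```

Neither form knows where one match ends and the next begins, because the matches themselves contain newlines.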

How can I loop over each match from grep?

My script currently looks like this:

#!/bin/bash

list=$(cat result.xml|grep -ozP '(?s)<tagname.*?tagname>')
array=( "$list" )
arraySize=${#array[@]}
for ((i = 0; i <= $arraySize; i += 1)); do
  match="${array[$i]}"
  echo "$match" > "$i".xml
done
1
  • Can you show sample data from result.xml? Commented Apr 6, 2016 at 19:16

3 Answers

1

According to this answer, the upcoming version of grep will change the meaning of the -z flag so that both input and output are NUL-terminated. So that will automatically do what you want, but it's only available today by downloading and building grep from the git repository.
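As a rough sketch of what that would look like, assuming a grep build in which -z already NUL-terminates each -o match on output (and which supports -P), the matches can be read one at a time with read -d ''. The file sample.xml and its contents are invented here purely for illustration:

```shell
#!/bin/bash
# Hypothetical sample input, only for illustration.
printf '%s\n' '<root>' \
  '<tagname a="1">' 'first' '</tagname>' \
  '<tagname a="2">' 'second' '</tagname>' \
  '</root>' > sample.xml

i=0
# Assumption: this grep terminates every -o match with a NUL when -z is given.
# read -d '' then consumes exactly one (possibly multi-line) match per iteration.
while IFS= read -r -d '' chunk; do
  i=$((i + 1))
  printf '%s\n' "$chunk" > "$i.xml"
done < <(grep -ozP '(?s)<tagname.*?tagname>' sample.xml)

echo "wrote $i files"
```

Process substitution is used instead of a pipe so that the counter i is still set after the loop ends.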

Meanwhile, a rather hackish alternative is to use the -Z flag which terminates the file name with a NUL character. That means you need to print a "filename", which you can do by using -H --label=. That will print an empty filename followed by a NUL before each match, which is not quite ideal since you really want the NUL after each match. However, the following should work:

grep -ozZPH --label= '(?s)<tagname.*?tagname>' < result.xml | {
  i=0
  while IFS= read -rd '' chunk || [[ $chunk ]]; do
    if ((i)); then
      echo "$chunk" > "$i.xml"
    fi
    ((++i))
  done
}

3 Comments

you meant ((++i)) ? ;-) . Wow, -ozZPH ... grep has grown up since Sun 4 days ;-) Good luck to all.
@shellter- wow, lots of typos. Hopefully fixed, thanks. Maybe -PHozZ would be cooler :)
-PHozZ LOL, another advantage to having more options on grep ;-) .
0

Pipe your lines directly into a while loop:

my_splitting_command | grep something | while IFS= read -r line
do
    # Append with >> so earlier lines are not overwritten on each iteration
    echo "$line" >> myoutputfile.txt
done

0

You could use grep to grab all the matches first, and then use awk to save each matched pattern into separate files (e.g. file1.xml, file2.xml, etc):

grep -Pzo '(?s)(.)<tagname.*?tagname>(.)' result.xml | awk '{ print $0 > ("file" NR ".xml") }' RS='\n\n'
