I've mixed up some XML files and now have a single file that looks something like this:

<Schema>
stuff
</Schema><Schema>
stuff
</Schema><Schema>
..

I need to split it so that each output file contains exactly one block, from <Schema> to </Schema>.

1 Answer

One way using awk. It splits the input into records at each end tag and, if a record contains any non-blank characters, prints the record followed by the end tag to its own file:

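awk '
    # Treat each closing tag as the record separator (a multi-character RS needs GNU awk).
    BEGIN { RS = "</Schema>" }
    # Skip records that are only whitespace, such as the newline after the last tag.
    $0 ~ /[^[:blank:]\n]/ {
        # Re-append the closing tag and write each record to its own numbered file.
        printf "%s\n", $0 RS >> (FILENAME "_" ++i ".xml")
    }
' infile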

Assuming infile with content:

<Schema>
stuff
</Schema><Schema>
more stuff
</Schema><Schema>
and more stuff
</Schema>

It yields:

==> infile_1.xml <==
<Schema>
stuff
</Schema>

==> infile_2.xml <==
<Schema>
more stuff
</Schema>

==> infile_3.xml <==
<Schema>
and more stuff
</Schema>
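
To run the same split over many files from a script, something along these lines might work. It is only a sketch under a few assumptions: the function names `split_schemas` and `split_all` are made up for illustration, `awk` is assumed to accept a multi-character RS (GNU awk, or a recent mawk), and file names are assumed not to contain newlines. Unlike the command above it writes with `>` and closes each file, so re-running it overwrites earlier results instead of appending to them.

```shell
#!/bin/sh
# Hypothetical helper: split one file into <file>_1.xml, <file>_2.xml, ...
# with one <Schema>...</Schema> block per output file.
split_schemas() {
    awk '
        BEGIN { RS = "</Schema>" }
        # Ignore records that contain only spaces, tabs, or newlines.
        /[^ \t\n]/ {
            out = FILENAME "_" ++i ".xml"
            printf "%s%s\n", $0, RS > out
            close(out)   # avoid running out of open file descriptors
        }
    ' "$1"
}

# Hypothetical helper: run split_schemas on every .xml file below a directory
# (default: the current one), skipping files produced by an earlier split.
split_all() {
    find "${1:-.}" -type f -name '*.xml' ! -name '*.xml_*.xml' |
        while IFS= read -r file; do
            split_schemas "$file"
        done
}
```

Calling `split_all .` from the directory that holds the mixed-up files would then leave the originals untouched and create the numbered files next to them. Quoting `"$file"` and letting `find` match names with `-name` keeps paths containing spaces intact.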

6 Comments

amazing. one cool thing would be to have the output like this "infile_1.xml" "infile_2.xml" and so on
@mfirry: Simply add the FILENAME variable to the output redirection of the printf. I've updated the answer with it.
awesome! one more thing before declaring you my saviour is i'm gonna have to put it in a sh file and inside a 'for file in .' kinda thing. it'll hopefully work, right?
@mfirry: Why shouldn't it work? I can't see much of a problem, and that part is not an awk issue anyway.
#!/bin/bash
for file in $(find . "*.xml")
do
    awk -c '
        BEGIN { RS = "</Schema>" }
        $0 ~ /[^[:blank:]\n]/ { printf "%s\n", $0 RS >> FILENAME "_" ++i }
    ' $file
done

with that i get some gawk error: Usage: gawk [POSIX or GNU style options] -f progfile [--] file ...