2

I have this file (devel1.temp):

 <?xml version="1.0" encoding="UTF-8"?>
<krpano version="1.0.8.15" showerrors="false">

          <include url="include/sa/index.xml" /> <include url="content/sa.xml" />
          <include url="include/global/index.xml" />
          <include url="include/orientation/index.xml" />
          <include url="include/movecamera/index.xml" /> <include url="content/movecamera.xml" />
          <include url="include/fullscreen/index.xml" />
          <include url="include/instructions/index.xml" />
          <include url="include/coordfinder/index.xml" />
          <include url="include/editor_and_options/index.xml" />
</krpano>

The goal is to extract the value of every url attribute and write the values to a temp file (devel.temp). The expected output is:

include/sa/index.xml
content/sa.xml
include/global/index.xml
include/orientation/index.xml
include/movecamera/index.xml
content/movecamera.xml
include/fullscreen/index.xml
include/instructions/index.xml
include/coordfinder/index.xml
include/editor_and_options/index.xml

To do the trick I have the following script:

# Make a temp file with all the url attributes
grep -o 'url=['"'"'"][^"'"'"']*['"'"'"]' "$temp_folder/devel1.temp" > "$temp_folder/devel2.temp"
# Strip off everything to leave just the URLs
sed -e 's/^url=["'"'"']//' -e 's/["'"'"']$//' "$temp_folder/devel2.temp" > "$temp_folder/devel.temp"

Yesterday it worked perfectly. Today, both devel2.temp and devel.temp contain this:

[01;31m[Kurl="include/sa/index.xml"[m[K
[01;31m[Kurl="content/sa.xml"[m[K
[01;31m[Kurl="include/global/index.xml"[m[K
[01;31m[Kurl="include/orientation/index.xml"[m[K
[01;31m[Kurl="include/movecamera/index.xml"[m[K
[01;31m[Kurl="content/movecamera.xml"[m[K
[01;31m[Kurl="include/fullscreen/index.xml"[m[K
[01;31m[Kurl="include/instructions/index.xml"[m[K
[01;31m[Kurl="include/coordfinder/index.xml"[m[K
[01;31m[Kurl="include/editor_and_options/index.xml"[m[K

Any ideas about what's going on?

4 Answers

3

Consider using XML-aware tools, for example xpath. I'd suggest this:

xpath -e "/krpano/include/@url" -q yourFile.xml | cut -f 2 -d "=" | sed 's/"//g'

If you're sure that the XML will have a krpano root and that only the include elements have a url attribute, you can also use the shorthand below, but the version above will run faster.

xpath -e "//@url" -q yourFile.xml | cut -f 2 -d "=" | sed 's/"//g'


2

It seems grep is emitting ANSI escape sequences to colour its output even though the output is not a terminal. Change its --color setting from always to auto.
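As a rough sketch of that fix, you can pass --color=never explicitly in the script, so the result no longer depends on whatever alias or GREP_OPTIONS value is active in the shell (that something like this changed overnight is only a guess on my part). The sample file below is a made-up stand-in for devel1.temp:

```shell
# Hypothetical sample input standing in for devel1.temp
tmp=$(mktemp)
printf '%s\n' '<include url="include/sa/index.xml" /> <include url="content/sa.xml" />' > "$tmp"

# --color=never forces plain output regardless of aliases or GREP_OPTIONS;
# sed then strips the leading url=" and the trailing quote
urls=$(grep --color=never -o "url=[\"'][^\"']*[\"']" "$tmp" \
    | sed -e "s/^url=[\"']//" -e "s/[\"']\$//")

printf '%s\n' "$urls"
rm -f "$tmp"
```

With --color=auto, grep would also skip the escape sequences here, because its output goes to a pipe rather than a terminal.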

Rather than using grep to process XML, you should use an XML-aware tool. For example, in xsh, you can write

open file.xml ;
perl { use Term::ANSIColor } ;
for /krpano/include
    echo :s { color('bright_yellow') }
            @url
            { color('reset') } ;

2 Comments

This should not be the accepted answer, though. You should not use line-oriented regex tools to manipulate structured formats like XML.
@tripleee: Better now?
2

In addition to choroba's comment regarding your ANSI sequences, I would avoid parsing XML via sed etc. where possible, and look to use an XML-aware scripting tool. I use the XMLStarlet toolkit. It'll mean your scripts are aware of character encodings and entities, and more robust in the face of changing XML.
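A minimal sketch of what that could look like, assuming the binary is installed under the name xmlstarlet (some distributions ship it as xml) and using a shortened, made-up copy of the file from the question:

```shell
# Hypothetical sample standing in for the krpano file from the question
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<krpano version="1.0.8.15" showerrors="false">
  <include url="include/sa/index.xml" /> <include url="content/sa.xml" />
  <include url="include/global/index.xml" />
</krpano>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
    # -t starts a template, -m matches each include element,
    # -v prints its url attribute, -n appends a newline
    urls=$(xmlstarlet sel -t -m '/krpano/include' -v '@url' -n "$tmp")
    printf '%s\n' "$urls"
else
    echo "xmlstarlet is not installed" >&2
fi
rm -f "$tmp"
```

Because the XPath matches elements rather than text, it does not matter whether two include elements share a line or which quote style the attributes use.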


1

A third XML-aware scripting tool is my Xidel:

xidel /tmp/your.xml -e //@url

(Unlike most such tools, it supports XPath 2.0, although that is overkill for this problem.)
