Bash, perl regexp help

Question

I have a text file (utf8):

Please help me with a regexp. I want to replace

<p>
TERRANO...
</p>

with: empty space. :)

And:

<td width="20%" align="left" class="thead">Rám:</td>

With:

<td width="20%" align="left" class="thead">Something else:</td>

Replacing just the word "Rám" is also OK.

I found this line, but I don't know how to use it:

find . -type f -exec perl -p -i -e "s/SEARCH_REGEX/REPLACEMENT/g" {} \;

Is this a one-time thing? Just fire up your favorite text editor and do a "search and replace". You could use regex, but why bother?
– eykanal, Commented Apr 14, 2011 at 19:56
I think that this would be best handled with XSLT, not perl. (As much as I personally love the swiss army chainsaw)
– Albert Perrien, Commented Apr 14, 2011 at 19:57
@Albert Perrien - There are some XML parsers for Perl that are quite nice, and will probably simplify this process quite a bit. No need to jump ship. :)
– Chris Lutz, Commented Apr 14, 2011 at 20:10

DigitalRoss · Accepted Answer · 2011-04-14 20:02:53Z

3

Assuming you want to replace text in HTML files:

cd /path/to/my/project
find . -iname '*.html' -exec perl -p -i -e "s/Rám:/Something else:/g" {} \;
find . -iname '*.html' -exec perl -p -i -e "s/TERRANO.../Something else:/g" {} \;

edited Apr 14, 2011 at 20:02

DigitalRoss


answered Apr 14, 2011 at 19:58

FoneyOp


5 Comments

FoneyOp Over a year ago

"s/[^<]+</replacement string here/g" This will replace EVERYTHING in EVERY with "replacement string here". To be more selective, you can add further conditions in place of [^<]. Search for a "regex tester" for interactive help with building your regex.

Chris Lutz Over a year ago

@hullamospapagaj - Use an XML parser. Please?

Chris Lutz Over a year ago

@FoneyOp - Nope. "stuff...<a href="http://somethi.ng/">link</a>" will probably not yield the result the OP desires.

FoneyOp Over a year ago

Depends on what he needs to do. If it's only a single paragraph and there is unique text around the value to be replaced, the perl regex will be much faster to implement than a DOM or SAX parser. Though for more complex datasets an XML parser will likely be more helpful.

hullamospapagaj Over a year ago

@FoneyOp It does not work. Maybe that is because I have a line break after , like in the title.

carlo · Accepted Answer · 2011-04-15 19:01:16Z

3

If you do not mind converting your regular .txt files into .(x)html files, and you have HTML Tidy and xmlstarlet available, you can do without regex!

tidy -v                   # HTML Tidy for Mac OS X released on 25 March 2009
xmlstarlet --version      # 1.0.6

curl -L -o utf8file 'http://d.pr/1d6T+'

# convert HTML to XHTML with tidy
tidy -h
tidy -i -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes utf8file > utf8file.xhtml

xmlstarlet el -a utf8file.xhtml
xmlstarlet el -v utf8file.xhtml
xmlstarlet edit --help

# edit file in-place
xmlstarlet edit -L -u "//*[local-name()='p']" -v 'EMPTY SPACE IS HERE' utf8file.xhtml 

# remove <p> ... </p> completely
xmlstarlet edit -L -d "//*[local-name()='p']" utf8file.xhtml  

xmlstarlet edit -L -u "//*[local-name()='td'][@width='20%' and @align='left' and @class='thead' and .='Rám:']" -v 'SOMETHING ELSE:' utf8file.xhtml

open -a Safari utf8file.xhtml

# convert XHTML to HTML with tidy
tidy -i -q -c -wrap 0 -numeric -ashtml -utf8 --merge-divs yes --merge-spans yes utf8file.xhtml > utf8file.html
open -a Safari utf8file.html

answered Apr 15, 2011 at 19:01

carlo


1 Comment

Albert Perrien Over a year ago

This is a good answer. Personally, I'm going to see if I can write something workable in Perl and post one too.

carlo · Accepted Answer · 2011-04-17 13:46:29Z

0

To extract just the table from utf8file.xhtml after the in-place editing steps you may use the "print copy of XPATH expression" feature of xmlstarlet:

xmlstarlet sel --help

# test
xmlstarlet sel -I -t -c "//*[local-name()='table'][@id='model-table-specifikacia']" utf8file.xhtml

xmlstarlet sel -I -t -c "//*[local-name()='table'][@id='model-table-specifikacia']" utf8file.xhtml > utf8file

answered Apr 17, 2011 at 13:46

carlo


Rohaq · Accepted Answer · 2011-11-03 16:44:01Z

0

Old topic, but useful: for mass search-and-replace jobs, I tend to use a Perl peewee (name based on the arguments used) rather than relying on find and then executing Perl code.

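That is, I use the following (the -0777 switch makes Perl read each file in one piece, which this pattern needs in order to match across line breaks):

perl -0777 -pi -w -e 's/<p>\nTERRANO.+?\n<\/p>/<p>\n\n<\/p>/g;' ./*.html

and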

perl -pi -w -e 's/<td (.+?) class=\"thead\">Rám:<\/td>/<td $1 class="thead">Something else:<\/td>/g;' ./*.html

Hope that helps somebody!

answered Nov 3, 2011 at 16:44

Rohaq
