I have a text file (UTF-8):

http://d.pr/1d6T+

Please help me with a regex. I want to replace

<p>
TERRANO...
</p>

with: empty space. :)

And:

<td width="20%" align="left" class="thead">Rám:</td>

With:

<td width="20%" align="left" class="thead">Something else:</td>

Replacing just the word "Rám" would also be fine.

I found this line, but I don't know how to use it:

find . -type f -exec perl -p -i -e "s/SEARCH_REGEX/REPLACEMENT/g" {} \;
  • Is this a one-time thing? Just fire up your favorite text editor and do a "search and replace". You could use regex, but why bother? Commented Apr 14, 2011 at 19:56
  • No, I have 200 files like this. :) Commented Apr 14, 2011 at 19:57
  • I think that this would be best handled with XSLT, not Perl (as much as I personally love the Swiss Army chainsaw). Commented Apr 14, 2011 at 19:57
  • Albert, the text is in regular .txt files, not in HTML/PHP. Commented Apr 14, 2011 at 19:58
  • @Albert Perrien - There are some XML parsers for Perl that are quite nice, and will probably simplify this process quite a bit. No need to jump ship. :) Commented Apr 14, 2011 at 20:10

4 Answers

Assuming you want to replace text in HTML files:

cd /path/to/my/project
find . -iname '*.html' -exec perl -p -i -e 's/Rám:/Something else:/g' {} \;
# "TERRANO..." is the placeholder from the question; the replacement is left empty
find . -iname '*.html' -exec perl -p -i -e 's/TERRANO...//g' {} \;

5 Comments

"s/<p>[^<]+</replacement string here/g" This will replace EVERYTHING in EVERY <p></p> with "replacement string here". To be more selective, you can add further conditions in place of [^<]. Search for a regex tester for interactive help with building your regex.
@hullamospapagaj - Use an XML parser. Please?
@FoneyOp - Nope. "<p>stuff...<a href="http://somethi.ng/">link</a></p>" will probably not yield the result the OP desires.
Depends on what he needs to do. If it's only a single paragraph and there is unique text around the value to be replaced, the perl regex will be much faster to implement than a DOM or SAX parser. Though for more complex datasets an XML parser will likely be more helpful.
@FoneyOp It doesn't work. Maybe that is because I have a line break after <p>, like in the title.

If you do not mind converting your regular .txt files into .(x)html files and have HTML Tidy and xmlstarlet available, you can do without regex!

tidy -v                   # HTML Tidy for Mac OS X released on 25 March 2009
xmlstarlet --version      # 1.0.6

curl -L -o utf8file 'http://d.pr/1d6T+'

# convert HTML to XHTML with tidy
tidy -h
tidy -i -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes utf8file > utf8file.xhtml

xmlstarlet el -a utf8file.xhtml
xmlstarlet el -v utf8file.xhtml
xmlstarlet edit --help

# edit file in-place
xmlstarlet edit -L -u "//*[local-name()='p']" -v 'EMPTY SPACE IS HERE' utf8file.xhtml 

# remove <p> ... </p> completely
xmlstarlet edit -L -d "//*[local-name()='p']" utf8file.xhtml  

xmlstarlet edit -L -u "//*[local-name()='td'][@width='20%' and @align='left' and @class='thead' and .='Rám:']" -v 'SOMETHING ELSE:' utf8file.xhtml

open -a Safari utf8file.xhtml

# convert XHTML to HTML with tidy
tidy -i -q -c -wrap 0 -numeric -ashtml -utf8 --merge-divs yes --merge-spans yes utf8file.xhtml > utf8file.html
open -a Safari utf8file.html

1 Comment

This is a good answer. Personally, I'm going to see if I can write something workable in Perl and post one too.

To extract just the table from utf8file.xhtml after the in-place editing steps, you can use xmlstarlet's "print copy of XPATH expression" feature:

xmlstarlet sel --help

# test
xmlstarlet sel -I -t -c "//*[local-name()='table'][@id='model-table-specifikacia']" utf8file.xhtml

xmlstarlet sel -I -t -c "//*[local-name()='table'][@id='model-table-specifikacia']" utf8file.xhtml > utf8file

Old topic, but useful: for mass search-and-replace, I tend to use a Perl "peewee" (named after the arguments used) rather than relying on find and then executing Perl code.

That is, I use the following:

perl -0777 -pi -w -e 's/<p>\nTERRANO.+?\n<\/p>/<p>\n\n<\/p>/g;' ./*.html

and

perl -pi -w -e 's/<td (.+?) class=\"thead\">Rám:<\/td>/<td $1 class="thead">Something else:<\/td>/g;' ./*.html

Hope that helps somebody!
