4

I am trying to replace certain parts of the string below.

'''<td align="center"> 5 </td> <td align="center"> 0.0001 </td>'''

I need to remove a <td> element if its content is a decimal starting with '0.', i.e. the output should be

'''<td align="center"> 5 </td>'''

I have tried this

data = ' '.join(data.split())
l = data.replace('<td align="center"> 0.r"\d" </td>', "")

but it didn't succeed. Could anyone please help me with this?

Thanks in advance


3 Answers

11

While both of the regular expression examples work, I would advise against using regular expressions here.

Especially if the data is a full HTML document, you should go for an HTML-aware parser such as lxml.html, e.g.:

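from lxml import html

t = html.fromstring(text)
# Adjust the XPath to wherever the cells live in your document.
tds = t.xpath("table/tbody/tr[2]/td")
for td in tds:
    # td.text is None for an empty cell and may carry surrounding whitespace.
    if td.text and td.text.strip().startswith("0."):
        td.getparent().remove(td)
# encoding="unicode" returns a str rather than bytes.
text = html.tostring(t, encoding="unicode")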
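
If lxml is not available and the fragment happens to be well-formed XML, the same "parse, then drop the cell" idea can be sketched with the standard library's xml.etree.ElementTree. Note that this swaps in a different parser than the one recommended above and will raise on real-world, non-well-formed HTML. The function name drop_decimal_cells and the wrapper element name row are made up for illustration.

```python
import xml.etree.ElementTree as ET


def drop_decimal_cells(fragment):
    """Remove <td> cells whose text starts with "0." from a well-formed fragment."""
    # Wrap the fragment in a hypothetical <row> element so it has a single root.
    root = ET.fromstring("<row>" + fragment + "</row>")
    # Iterate over a copy of the list, since cells are removed from root as we go.
    for td in list(root.findall("td")):
        if (td.text or "").strip().startswith("0."):
            root.remove(td)
    # Serialize the remaining children without the artificial wrapper.
    return "".join(ET.tostring(child, encoding="unicode") for child in root).strip()


sample = '<td align="center"> 5 </td> <td align="center"> 0.0001 </td>'
print(drop_decimal_cells(sample))
```

Only direct <td> children of the wrapper are considered here; for nested tables or a whole page, an HTML-aware parser remains the safer choice.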

2

I would do it with regular expression:

import re
s = "<td align='center'> 5 </td><td align='center'>0.00001</td>"
# Raw string, with the dot escaped so it matches a literal "."
result = re.sub(r"<td align='center'>0\.\d+</td>", "", s)


2

You could use a regular expression to match the <td>, then use re.sub() to replace each match with whatever you want.

import re
# Raw string; the dot is escaped so it only matches a literal "."
pattern = r'<td align="center"> 0\.[0-9]+ </td>'
p = re.compile(pattern)
result = p.sub('', my_string)

Here my_string holds the string you want to operate on. Hope this helps.
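
A fixed pattern like this only matches when the spacing and attributes are exactly as written. As a rough sketch, the pattern can be loosened to accept any attributes on the <td> and optional whitespace around the number. The names DECIMAL_TD and strip_decimal_cells are invented for this example, and it assumes the cell contains nothing but the number.

```python
import re

# Matches a whole <td ...> cell whose text is a decimal starting with "0.",
# allowing any attributes and optional whitespace around the number.
# The leading \s* also consumes whitespace separating it from the previous cell.
DECIMAL_TD = re.compile(r"\s*<td\b[^>]*>\s*0\.\d+\s*</td>")


def strip_decimal_cells(markup):
    """Return markup with every matching decimal cell removed."""
    return DECIMAL_TD.sub("", markup)


sample = '<td align="center"> 5 </td> <td align="center"> 0.0001 </td>'
print(strip_decimal_cells(sample))  # <td align="center"> 5 </td>
```

Because the pattern requires "0." right after the opening tag (apart from whitespace), a value such as 10.5 is left alone.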

