0

I'm working on creating a word cloud program in Python and I'm getting stuck on a word replace function. I am trying to replace a set of numbers in an html file (so I'm working with a string) with words from an ordered list. So 000 would be replaced with the first word in the list, 001 with the second, etc.

So below I have it selecting the word to replace w properly but I can't get it to properly replace the it with the words from the string. Any help is appreciated. Thanks!

def replace_all():  
  text = '000 001 002 003 '
  word = ['foo', 'bar', 'that', 'these']
  for a in word:    
    y = -1
    for w in text:     
      y = y + 1
      x = "00"+str(y)
      w = {x:a}      
      for i, j in w.iteritems():
        text = text.replace(i, j)
  print text      
12
  • 1
    You're shadowing the list builtin by declaring a local variable of the same name. In general, it's a good idea not to do that, in case you find yourself needing the builtin later and you're getting strange errors. Commented Dec 9, 2012 at 2:16
  • Thanks that's a good point. I will fix that. Commented Dec 9, 2012 at 2:34
  • @NightMarcher Did you look at my solution ? Commented Dec 9, 2012 at 15:45
  • @eyquem I did and I think I should have been clearer when I made my original post....these methods work with simpler strings that I posted but when I tried to apply these to my HTML file it doesn't work as I expected. In the string listed below instead of changing all of the numbers 000, 001, 002 etc. with similar items from a list it only replaces them with the first item. This is an example of text from my HTML file: Commented Dec 9, 2012 at 16:51
  • @eyquem EXAMPLE STING FROM HTML FILE: >>>text = '<p><span class="newStyle0" style="left: 291px; top: 258px">000</span></p> <p><span class="newStyle1" style="left: 85px; top: 200px">001</span></p> <p><span class="newStyle2" style="left: 580px; top: 400px; width: 167px; height: 97px">002</span></p> <p><span class="newStyle3" style="left: 375px; top: 165px">003</span></p>' Commented Dec 9, 2012 at 16:54

2 Answers 2

4

This is actually a really simple list comprehension:

>>> text = '000 001 002 003 '
>>> words = ['foo', 'bar', 'that', 'these']
>>> [words[int(item)] for item in text.split()]
['foo', 'bar', 'that', 'these']

Edit: If you need other values to be left alone, this can be catered for:

def get(seq, item):
    try:
        return seq[int(item)]
    except ValueError:
        return item

Then simply use something like [get(words, item) for item in text.split()] - naturally, more testing might be required in get() if there will be other numbers in the string that could get accidentally replaced. (End of edit)

What we do is split the text into the individual numbers, then convert them to integers and use them to index the list you have given to find words.

As to why your code doesn't work, the main issue is you are looping over the string, which will give you characters, not words. However, it's not a great way of solving the task.

It's also worth a quick note that when you are looping over values and want indices to go with them, you should use the enumerate() builtin rather than using a counting variable.

E.g: Instead of:

y = -1
for w in text:
    y = y + 1
    ...

Use:

for y, w in enumerate(text):
    ...

This is much more readable and Pythonic.

Another thing with your existing code is this:

w = {x:a}      
for i, j in w.iteritems():
    text = text.replace(i, j)

Which, if you think about it, simplifies down to:

text = text.replace(x, a)

You are setting w to be a dictionary of one item, then looping over it, but you know it will only ever contain one item.

A solution that follows your method more closely would be something like this:

words_dict = {"{0:03d}".format(index): value for index, value in enumerate(words)}
for key, value in words_dict.items():
    text = test.replace(key, value)

We create a dictionary from the zero padded number string (using str.format()) to the value, then replace for each item. Note as you are using 2.x, you'll want dict.iteritems(), and if you are pre-2.7, use the dict() builtin on a generator of tuples as dict comprehensions don't exist.

Sign up to request clarification or add additional context in comments.

6 Comments

You're shadowing the list builtin by declaring a local variable of the same name.
Wow that is much easier than what I was doing. I guess I should have posted my string like this because there will be more text in the html file. >>>text = '000 this is 001 some 002 text 003 '
@NightMarcher See my example at the end for a replace solution, although I will update the first solution to work with other text too.
@Lattyware: instead of your get function, you could use words_dict.get(x,x) in the listcomp. replace is also dangerous because of the possibility of accidental replacements, but that's trickier to fix (and possibly outside the scope of the OP's assign-- er, problem.)
This method worked perfectly when I was using it to go through a string that I hard coded into the function. When I implemented this on an html file it replaces all of the numbers with the first item from the list. Any ideas why it would do that?
|
0

When working on texts , it is evident that one must think to regexes.

import re

text = text = ('<p><span class="newStyle0" '
               'style="left: 291px; '
               'top: 258px">000</span></p> <p>'
               '<span class="newStyle1" '
               'style="left: 85px; '
               'top: 200px">001</span></p> <p>'
               '<span class="newStyle2" '
               'style="left: 580px; '
               'top: 400px; width: 167px; '
               'height: 97px">002</span></p> <p>'
               '<span class="newStyle3" '
               'style="left: 375px; top: 165px">'
               '003</span></p>')

words = ['XXX-%04d-YYY' % a for a in xrange(1000)]

regx = re.compile('(?<=>)\d+(?=</span>)')

def gv(m,words = words):
    return words[int(m.group())]

print regx.sub(gv,text)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.