1

I'm still getting my head around Python. Could this function be improved in either performance or readability?

import re

def multi_replace_words(sentences, words, replace_str):
    """Replace all words in the sentences list with replace_str
    ex. multi_replace_words(['bad a list', 'og bad', 'in bady there bad2', 'another one', 'and bad. two'], ['bad','bad2'], 'EX')
    >> ['EX a list', 'og EX', 'in bady there EX','another one','and EX two']
    """
    docs = []
    for doc in sentences:
        for replace_me in words:
            if replace_me in doc.encode('ascii', 'ignore'):
                doc = re.sub(r'((\A|[^A-Za-z0-9_])' + replace_me + r'(\Z|[^A-Za-z0-9_]))', ' ' + replace_str + ' ', doc)
        docs.append(doc)
    return docs

Thanks :)

4
  • I would start by renaming ds and cls to be slightly more descriptive parameter names. Commented Jan 18, 2013 at 22:50
  • You're right. I just changed the variable names from ds, cls to sentences, words to better indicate the function's purpose. They were just short names for dataset and classes (as in features in NLP) in my app. Commented Jan 18, 2013 at 22:58
  • Don't you keep the punctuation? Commented Jan 18, 2013 at 23:02
  • No. It's one step in a preprocessing pipeline to identify swear words, including punctuation and variations like bad-, |bad| and others. Commented Jan 19, 2013 at 13:56

2 Answers

1

Something like this:

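In [86]: def func(lis, a, b):
   ....:     # One pattern: each word between \b boundaries, plus an optional trailing punctuation mark
   ....:     strs = "|".join("({0}{1}{2})".format(r'\b', x, r'\b[;",.]?') for x in a)
   ....:     for x in lis:
   ....:         yield re.sub(strs, b, x)
   ....: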

In [87]: lis
Out[87]: ['bad a list', 'og bad', 'in bady there bad2', 'another one', 'and bad. two']

In [88]: rep=['bad','bad2']

In [89]: st="EX"

In [90]: list(func(lis,rep,st))
Out[90]: ['EX a list', 'og EX', 'in bady there EX', 'another one', 'and EX two']

In [91]: rep=['in','two','a']

In [92]: list(func(lis,rep,st))
Out[92]: ['bad EX list', 'og bad', 'EX bady there bad2', 'another one', 'and bad. EX']
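
For Python 3, a variant of the same idea might look like the sketch below. It assumes the words should be matched as literal text (hence `re.escape`), compiles the pattern once, and tries longer words first so overlapping alternatives cannot shadow each other. The name `replace_words` is made up for illustration.

```python
import re

def replace_words(sentences, words, replace_str):
    """Replace whole-word occurrences of any entry in words (hypothetical helper)."""
    # Longest first, so e.g. 'bad2' is tried before 'bad'.
    ordered = sorted(words, key=len, reverse=True)
    alternatives = "|".join(re.escape(w) for w in ordered)
    # \b anchors on word boundaries; the optional class swallows one
    # trailing punctuation mark, as in the answer above.
    pattern = re.compile(r'\b(?:' + alternatives + r')\b[;",.]?')
    # The lambda keeps replace_str literal even if it contains backslashes.
    return [pattern.sub(lambda m: replace_str, s) for s in sentences]

sentences = ['bad a list', 'og bad', 'in bady there bad2', 'another one', 'and bad. two']
print(replace_words(sentences, ['bad', 'bad2'], 'EX'))
```

Note that `\b` only acts as a boundary next to word characters, so entries such as `bad-` would need a different boundary test.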

0

You could try using replace(). It acts on a string and replaces every occurrence of one substring with another. An example from here shows how replace() behaves.

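#!/usr/bin/python

# Named text rather than str, to avoid shadowing the built-in type
text = "this is string example....wow!!! this is really string"
print(text.replace("is", "was"))     # replace every occurrence
print(text.replace("is", "was", 3))  # replace only the first 3 occurrences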

2 Comments

That would also replace partial text inside words, like the "is" in "this".
If you want to replace only whole words and avoid mix-ups like the one above, you could add spaces before and after the word inside the quotes, like this: " is ".
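
To make the difference concrete, here is a small sketch comparing plain `replace()`, the space-padded variant, and a `\b` regex. The sentence is adapted from the answer's example, and the variable names are illustrative only.

```python
import re

text = "this is string example....wow!!! is"

# Plain replace also rewrites the "is" inside "this".
plain = text.replace("is", "was")

# Space-padded replace leaves "this" alone but misses the final "is",
# because no space follows it.
padded = text.replace(" is ", " was ")

# A word-boundary regex handles both cases.
bounded = re.sub(r"\bis\b", "was", text)

print(plain)
print(padded)
print(bounded)
```

The padded version would miss a word at the start of the string or next to punctuation in the same way, which is why the regex approach may be the safer fit for this question.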
