Refactor python: replace list of words in list of strings

Question

I'm still getting my head around Python. Could this function be improved, in either performance or readability?

import re

def multi_replace_words(sentences, words, replace_str):
    """Replace all words in the sentences list with replace_str
    ex. multi_replace_words(['bad a list', 'og bad', 'in bady there bad2', 'another one', 'and bad. two'], ['bad', 'bad2'], 'EX')
    >> ['EX a list', 'og EX', 'in bady there EX', 'another one', 'and EX two']
    """
    docs = []
    for doc in sentences:
        for replace_me in words:
            # Cheap substring check before running the regex
            if replace_me in doc:
                doc = re.sub(r'((\A|[^A-Za-z0-9_])' + replace_me + r'(\Z|[^A-Za-z0-9_]))', ' ' + replace_str + ' ', doc)
        docs.append(doc)
    return docs

Thanks :)

I would start by renaming ds and cls to slightly more descriptive parameter names.
– Brenden Brown, Commented Jan 18, 2013 at 22:50
You're right. I just changed the variable names from ds, cls to sentences, words to better indicate the function's purpose. They were just short names for dataset and classes (as in features in NLP) in my app.
– scc, Commented Jan 18, 2013 at 22:58
No. It's one step in a preprocessing pipeline to identify swear words, including punctuation and variations like bad-, |bad| and others.
– scc, Commented Jan 19, 2013 at 13:56

Ashwini Chaudhary · Accepted Answer · 2013-01-18 23:08:05Z

1

Something like this:

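In [85]: import re

In [86]: def func(lis, a, b):
    # Join the words into one alternation; \b keeps each match to a whole word,
    # and the optional class also consumes one trailing punctuation mark.
    strs = "|".join(r'(\b{0}\b[;",.]?)'.format(re.escape(x)) for x in a)
    for x in lis:
        yield re.sub(strs, b, x)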

In [87]: lis
Out[87]: ['bad a list', 'og bad', 'in bady there bad2', 'another one', 'and bad. two']

In [88]: rep=['bad','bad2']

In [89]: st="EX"

In [90]: list(func(lis,rep,st))
Out[90]: ['EX a list', 'og EX', 'in bady there EX', 'another one', 'and EX two']

In [91]: rep=['in','two','a']

In [92]: list(func(lis,rep,st))
Out[92]: ['bad EX list', 'og bad', 'EX bady there bad2', 'another one', 'and bad. EX']
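
For use outside an interactive session, the same idea can be sketched as a standalone function. This is only a minimal sketch: the name replace_words and its parameter names are hypothetical, and it assumes every entry in words starts and ends with a word character (letter, digit, or underscore), since \b behaves differently next to punctuation.

```python
import re

def replace_words(sentences, words, replacement):
    """Replace whole-word occurrences of any entry in `words` (hypothetical helper)."""
    if not words:
        # An empty alternation would match at every word boundary, so bail out early.
        return list(sentences)
    # re.escape keeps characters such as "." or "|" from being read as regex syntax.
    alternatives = "|".join(re.escape(w) for w in words)
    # \b anchors the match at word boundaries; the optional class swallows one
    # trailing punctuation mark, mirroring the pattern used in the answer.
    pattern = re.compile(r'\b(?:{0})\b[;",.]?'.format(alternatives))
    # A function replacement inserts `replacement` literally, without backslash processing.
    return [pattern.sub(lambda m: replacement, s) for s in sentences]

sentences = ['bad a list', 'og bad', 'in bady there bad2', 'another one', 'and bad. two']
print(replace_words(sentences, ['bad', 'bad2'], 'EX'))
# ['EX a list', 'og EX', 'in bady there EX', 'another one', 'and EX two']
```

Compiling the pattern once means each sentence is scanned a single time, however many words are in the list.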

edited Jan 18, 2013 at 23:08

answered Jan 18, 2013 at 23:02

Ashwini Chaudhary


JonathanV · Accepted Answer · 2013-01-18 23:05:48Z

0

You could try replace(). It is a string method that replaces every occurrence of a substring with another string, and an optional third argument limits the number of replacements. The example below shows how replace() behaves.

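#!/usr/bin/python

# Named text rather than str to avoid shadowing the built-in type
text = "this is string example....wow!!! this is really string"
print(text.replace("is", "was"))     # replaces every "is", including the one inside "this"
print(text.replace("is", "was", 3))  # replaces only the first three occurrences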

edited Jan 18, 2013 at 23:05

answered Jan 18, 2013 at 23:04

JonathanV


2 Comments

scc Over a year ago

That would also replace partial text inside words, like the "is" in "this".

JonathanV Over a year ago

If you want to replace only full words, without mix-ups like the one in the example above, you could include the spaces before and after the word inside the quotes, like this: " is ".
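
As a rough comparison of the two approaches discussed in these comments, the sketch below runs a space-padded replace() next to a regex with word boundaries. The sample sentence is made up for illustration.

```python
import re

text = "this is it, or so it is"

# Padding with spaces only matches occurrences that have a space on both sides,
# so the final "is" (at the end of the string) is left untouched.
padded = text.replace(" is ", " was ")

# \b matches at any word boundary, including the start and end of the string,
# while still skipping the "is" inside "this".
bounded = re.sub(r"\bis\b", "was", text)

print(padded)   # this was it, or so it is
print(bounded)  # this was it, or so it was
```

The padded version also misses words that sit next to punctuation, which matters for the preprocessing use case described in the question.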
