6

I want to delete certain words from a paragraph, such as "and", "as", and "like". Is there an easier way to delete words from a string than chaining replace calls like this?

new_str = str.replace(' and ', '').replace(' as ', '').replace(' like ', '')

For example, is there a method similar to the following?

str.remove([' and ', ' like ', ' as '])

4 Answers

10

Yes, you could use the sub function from the re module:

>>> import re
>>> s = 'I like this as much as that'
>>> re.sub('and|as|like', '', s)
'I  this  much  that'
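
One caveat with the pattern above: it matches the listed strings anywhere, including inside longer words (the "as" in "has", for instance). Below is a minimal sketch, assuming whole-word removal is what is wanted. The helper name `remove_words` is made up for illustration and is not part of the `re` module.

```python
import re

def remove_words(text, words):
    """Remove whole-word occurrences of each entry in `words` from `text`."""
    # re.escape guards against regex metacharacters in the words;
    # \b anchors each match to word boundaries, so "as" is not matched inside "has".
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
    return re.sub(pattern, '', text)

result = remove_words('He has a plan as good as mine', ['and', 'as', 'like'])
print(repr(result))
```

As in the example above, the spaces on either side of a removed word are left in place, so runs of two spaces can appear in the result.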

2 Comments

... but if you care even the slightest bit about performance, you won't do it with a regular expression if it's this simple a rule. (Not that you need to worry about performance in general - but this is an obvious sort of case where the statements about premature optimisation don't apply; str.replace is known to be oodles faster than re.sub.)
@ChrisMorgan: very good observation! I thought about that too, but the OP asked for something other than replace, so I was forced to look for another (worse-performing) solution.
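
For anyone who wants to check the performance claim on their own machine, here is a rough sketch using `timeit`. The function names `with_replace` and `with_regex` are hypothetical, and the timings will vary with the Python version and the input, so no particular result is implied.

```python
import re
import timeit

s = 'I like this as much as that'

def with_replace(text):
    # The chained str.replace calls from the question.
    return text.replace(' and ', '').replace(' as ', '').replace(' like ', '')

# Precompile so pattern compilation is not counted in every call.
pattern = re.compile(' and | as | like ')

def with_regex(text):
    return pattern.sub('', text)

replace_time = timeit.timeit(lambda: with_replace(s), number=10000)
regex_time = timeit.timeit(lambda: with_regex(s), number=10000)
print(f'str.replace: {replace_time:.4f}s  re.sub: {regex_time:.4f}s')
```

Both functions return the same string for this input, but they are not equivalent in general: the chained calls run one after another, so a later replace can match text that an earlier one created.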
1

You could use regular expressions:

    >>> import re
    >>> test = "I like many words but replace some occasionally"
    >>> to_substitute = "many|words|occasionally"
    >>> re.sub(to_substitute, '', test)
    'I like   but replace some '


1

You may also do it without regex. See the following example:

def StringRemove(st,lst):
    return ' '.join(x for x in st.split(' ') if x not in lst)

>>> StringRemove("Python string Java is immutable, unlike C or C++ that would give you a performance benefit. So you can't change them in-place",['like', 'as', 'and'])
"Python string Java is immutable, unlike C or C++ that would give you a performance benefit. So you can't change them in-place"

>>> st="Python string Java is immutable,     unlike C or C++ that would  give you a performance benefit. So you can't change them in-place"
>>> StringRemove(st,['like', 'as', 'and'])==st
True
>>> 

7 Comments

Note that this will destroy multiple spaces in a row and will turn \r, \n and \t into space as well. If you care about spaces, use st.split(' ') instead of st.split(). Also, the square brackets around the join() body aren't neat. I'd scrap them and make it a generator expression (which for larger inputs will use less memory, also) instead of a list comprehension.
Thanks for pointing that out. I tweaked it a little so now it will work with multiple spaces and other separators. I have also changed the list to a generator.
Your change has made it so that tabs and newlines no longer act as word separators, so words won't be eliminated if, e.g., they occur after a tab.
I think I will delete my answer, as I can't find any other feasible way to do it without regex, since you can't specify multiple separators in split. Unless you have anything in mind?
I wouldn't delete this answer! For some problems, there might not be tabs or newlines, so this is perfectly fine.
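
For inputs that do contain tabs or newlines, one possible middle ground is sketched below. It does fall back on `re`, but only to split the text: `re.split` with a capturing group keeps the whitespace runs in the result, so they can be joined back unchanged. The function name `remove_words_keep_whitespace` is made up for this example.

```python
import re

def remove_words_keep_whitespace(st, lst):
    # Because the pattern is a capturing group, re.split returns the
    # whitespace runs as list items too, alternating with the words.
    parts = re.split(r'(\s+)', st)
    # Drop only the items that are in the removal list; separators pass through.
    return ''.join(p for p in parts if p not in lst)

result = remove_words_keep_whitespace("cats\tand dogs\nlike  toys", ['like', 'as', 'and'])
print(repr(result))
```

Words are matched exactly, so "unlike" is untouched, and a word followed by punctuation (such as "like,") is not removed either. The whitespace around a removed word stays where it was.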
1

Note that if all you care about is readability and not necessarily performance, you could do something like this:

# `str` is the string variable from the question (note that the name shadows the built-in type)
new_str = str
for word_to_remove in [' and ', ' as ', ' like ']:
    new_str = new_str.replace(word_to_remove, '')

1 Comment

Why do you think it would impact performance badly? It seems to me like it would be close to equivalent to the substitute method and way more efficient than the regular expressions method.
