I'm new to Python and I want to replace each ({ / / }) group with a space, as in the sample below.

The original sentence:

NULL ({ / / }) Regina ({ 4 p1 p2 / / }) Shueller ({ 5 p1 p2 / / }) works ({ / / }) for ({ / / }) Italy ({ 14 / / }) 's ({ 15 / / }) La ({ 16 / / }) Repubblica ({ 17 / / }) newspaper ({ 18 / / }) . ({ 38 / / })

Transform to this:

Regina Shueller works for Italy 's La Repubblica newspaper.

I tried this code, but the result was not what I expected:

Sentence = re.sub(r'[({ / / })]',' ', sentence)
  • The best I came up with is r'\s*(?:\(\{[^/]*/\s*/\s*}\)|NULL)\s*' (to be replaced with a space). However, this cannot remove the space between the last word and the ., and the result must be stripped of leading and trailing spaces. Commented Jan 21, 2016 at 16:24
  • Your transformed string does not match what you say you want Commented Jan 21, 2016 at 16:34
  • Try it like this with the Python regex module (the pattern uses the subroutine call (?1)). Or with re, use this pattern: \({[^}]*}\)|NULL|\s+(?!\w) and strip the leading space. Commented Jan 21, 2016 at 17:08
  • Thank you so much @WiktorStribiżew for your answer, that regex works well. Commented Jan 22, 2016 at 9:21
  • Thanks @bobblebubble Commented Jan 22, 2016 at 9:21

4 Answers

The pattern you tried: r'[({ / / })]' means:

Match any single character that is one of (, {, a space, /, }, or )

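The key is understanding regular expression syntax. Square brackets define a character class, so the characters inside are matched one at a time rather than as a sequence. Outside a class, ( and ) delimit groups, so a literal parenthesis has to be escaped.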

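A pattern such as r' \({ [^/]*/ / }\) ' would match each section in your example that has a space on both sides. The final group ends the string, so it has no trailing space and would only match if that trailing space were made optional.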
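The difference between the two patterns can be checked directly. This is a minimal sketch on a shortened version of the sample sentence; the variable names are illustrative only.

```python
import re

sentence = "NULL ({ / / }) Regina ({ 4 p1 p2 / / }) Shueller"

# Character class: every (, {, space, /, } or ) is replaced one at a time,
# so the string keeps its length and the text inside the groups survives.
per_char = re.sub(r'[({ / / })]', ' ', sentence)
print(len(per_char) == len(sentence))  # True
print('4 p1 p2' in per_char)           # True

# Sequence pattern: the escaped parentheses are literal and the pieces must
# appear in order, so a whole " ({ ... / / }) " group is replaced at once.
whole_group = re.sub(r' \({ [^/]*/ / }\) ', ' ', sentence)
print(whole_group)  # NULL Regina Shueller
```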

1 Comment

That's right! I should learn regular expressions in more depth. Thanks for your response!

You can use

r'\s*(?:\(\{[^/]*/\s*/\s*}\)|NULL)\s*'

See regex demo

Regex explanation:

  • \s* - zero or more whitespaces
  • (?:\(\{[^/]*/\s*/\s*}\)|NULL) - two alternatives, NULL or \(\{[^/]*/\s*/\s*}\) matching...
    • \( - opening round bracket
    • \{ - opening brace
    • [^/]* - zero or more characters other than /
    • / - a literal /
    • \s* - zero or more whitespaces
    • /\s* - ibid.
    • } - a closing brace
    • \) - a closing round bracket
  • \s* - zero or more whitespaces
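The breakdown above can also be written into the pattern itself. This sketch assumes re.VERBOSE is acceptable for your use; it is the same pattern, only laid out with one commented piece per line, and the short sample string is taken from the question.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each piece carries a comment.
pattern = re.compile(r"""
    \s*            # zero or more whitespace characters
    (?:            # non-capturing group: one of two alternatives
        \(\{       #   opening round bracket and opening brace
        [^/]*      #   zero or more characters other than /
        /\s*       #   a literal /, then optional whitespace
        /\s*       #   the second /, then optional whitespace
        }\)        #   closing brace and closing round bracket
      |
        NULL       #   or the literal text NULL
    )
    \s*            # zero or more whitespace characters
""", re.VERBOSE)

sample = "Italy ({ 14 / / }) 's ({ 15 / / }) La"
verbose_result = pattern.sub(" ", sample)
print(verbose_result)  # Italy 's La
```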

Note that the space left between a word and the punctuation that follows it has to be handled separately.

Python demo:

import re
p = r'\s*(?:\(\{[^/]*/\s*/\s*}\)|NULL)\s*'
test_str = "NULL ({ / / }) Regina ({ 4 p1 p2 / / }) Shueller ({ 5 p1 p2 / / }) works ({ / / }) for ({ / / }) Italy ({ 14 / / }) 's ({ 15 / / }) La ({ 16 / / }) Repubblica ({ 17 / / }) newspaper ({ 18 / / }) . ({ 38 / / })"
result = re.sub(p, " ", test_str)
print(result.strip())
# => Regina Shueller works for Italy 's La Repubblica newspaper .
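The leftover space before the final period can be removed in a second pass. This is a sketch that assumes only the closing punctuation marks . , ; : ! ? should be attached to the preceding word; that character set is an assumption, not part of the answer.

```python
import re

cleaned = "Regina Shueller works for Italy 's La Repubblica newspaper ."

# Drop whitespace that directly precedes one of the assumed punctuation marks.
# The apostrophe is not in the set, so "Italy 's" keeps its space.
tidy = re.sub(r"\s+([.,;:!?])", r"\1", cleaned)
print(tidy)  # Regina Shueller works for Italy 's La Repubblica newspaper.
```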

2 Comments

As a bonus :), try removing the space before non-opening punctuation and symbols with re.sub(r"\s+([~`!@#$%^&*)_+=}\]\\|;:.>,-])", r"\1", result.strip())
I can't see a -1 here, so.. +1

You can go with this:

r'(\([^(]*\))'

With live demo
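A quick check of this pattern follows. The second input is made up: the "(Rome)" aside is hypothetical and does not come from the question. It shows one way the pattern can remove more than intended, since it matches any parenthesized text.

```python
import re

pattern = r'(\([^(]*\))'

# On input shaped like the question's sample, the groups are removed.
sample = "La ({ 16 / / }) Repubblica ({ 17 / / })"
result_sample = " ".join(re.sub(pattern, "", sample).split())
print(result_sample)  # La Repubblica

# Hypothetical input: ordinary parenthesized text is removed as well.
with_aside = "La ({ 16 / / }) Repubblica (Rome) ({ 17 / / })"
result_aside = " ".join(re.sub(pattern, "", with_aside).split())
print(result_aside)  # La Repubblica
```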

2 Comments

I think this regex is rather unsafe for this task.
@WiktorStribiżew well... It fits the need, given the provided input. I've simplified it as far as I could, which might be bad if the input provided doesn't reflect the reality.

If the format is always the same, you could keep only the tokens that are alphabetic after stripping punctuation:

from string import punctuation
# s holds the input sentence; keep tokens that are alphabetic once punctuation is stripped
print(" ".join(w for w in s.split() if w.strip(punctuation).isalpha()))

Or using a regex:

import re
print(re.sub(r'\({.*?}\)',"",s))

Judging by your expected output, you want to remove every ({ ... }) group regardless of what is inside it.
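Both variants can be compared on a shortened version of the sample; the variable s is assumed to hold the sentence. As written, the token filter keeps NULL and drops the final period, while the regex leaves NULL in place and needs the leftover whitespace collapsed afterwards.

```python
import re
from string import punctuation

s = "NULL ({ / / }) Regina ({ 4 p1 p2 / / }) works ({ / / }) . ({ 38 / / })"

# Token filter: keep tokens that are purely alphabetic after stripping punctuation.
filtered = " ".join(w for w in s.split() if w.strip(punctuation).isalpha())
print(filtered)  # NULL Regina works

# Regex: delete every ({ ... }) group, then collapse the leftover whitespace.
stripped = " ".join(re.sub(r'\({.*?}\)', "", s).split())
print(stripped)  # NULL Regina works .
```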

2 Comments

The lazy dot matching regex may play a bad joke on you. Do not use lazy dot matching where you do not have to.
@WiktorStribiżew, I do need it. I meant to remove the / / from the pattern, as it is not what the OP is looking to match based on their expected output. What is inside is irrelevant.
