
I have this sentence: transportumum min kalo dari kota|tua | mau ke galeri nasional naik transjakarta jurusan apa ya?

As you can see, there are two pipe characters in that sentence. I'd like to add a space before and after the pipe only when it is in the middle of a word without spaces, e.g. kota|tua to kota | tua.

This is my code so far:

import re

def puncNorm(text):
    pat = re.compile(r"\D([|:])\D")
    text = pat.sub(" \\1 ", text)
    return text

text = "transportumum min kalo dari kota|tua | mau ke galeri nasional naik transjakarta jurusan apa ya?"

text = puncNorm(text)
print(text)

The result adds spaces around every pipe character, so there are double spaces in tua | mau:

transportumum min kalo dari kota | tua  |  mau ke galeri nasional naik transjakarta jurusan apa ya?

My expected result is:

transportumum min kalo dari kota | tua | mau ke galeri nasional naik transjakarta jurusan apa ya?

What is the best way to solve this?

2 Answers


The \D pattern matches any character other than a digit, so it also matches the spaces around the second pipe in tua | mau. You can use word boundaries instead, so that the symbols match only when they are inside a word:

r'\b([|:])\b'
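To see why the two patterns behave differently, re.findall can list what each one matches in the question's sentence. This is a small illustrative check, not part of the original answer. Note that \D also consumes the character on each side of the symbol, while \b is zero-width.

```python
import re

s = "transportumum min kalo dari kota|tua | mau ke galeri nasional naik transjakarta jurusan apa ya?"

# \D means "any non-digit", so spaces qualify: both pipes match,
# and each match also includes one character on either side.
print(re.findall(r'\D[|:]\D', s))   # ['a|t', ' | ']

# \b needs a word/non-word transition on each side of the symbol,
# so only the pipe between "kota" and "tua" matches.
print(re.findall(r'\b[|:]\b', s))   # ['|']
```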


Note that you can also drop the capturing group (...), since the replacement reuses the whole match. In Python, a backreference to the whole match is \g<0>.
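A quick sketch confirming that the two spellings are interchangeable here: with the group, \1 is the captured symbol; without it, \g<0> is the whole match, which is only the symbol because \b matches no characters. The sample string is a shortened version of the question's text.

```python
import re

s = "dari kota|tua | mau"

# With a capturing group, \1 refers to the captured symbol.
with_group = re.sub(r'\b([|:])\b', r' \1 ', s)

# Without the group, \g<0> refers to the whole match,
# which here is just the symbol itself (\b is zero-width).
whole_match = re.sub(r'\b[|:]\b', r' \g<0> ', s)

print(with_group == whole_match)  # True
print(whole_match)                # dari kota | tua | mau
```

Raw strings are used for the replacements so that \1 and \g<0> reach the regex engine unchanged.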

See a Python demo:

import re
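rx = r'\b[|:]\b'  # | or : with a word character on both sides
s = "transportumum min kalo dari kota|tua | mau ke galeri nasional naik transjakarta jurusan apa ya?"
print(re.sub(rx, r' \g<0> ', s))  # raw string: ' \g<0> ' is an invalid escape in a normal string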
# => transportumum min kalo dari kota | tua | mau ke galeri nasional naik transjakarta jurusan apa ya?
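The same pattern can be dropped into the question's puncNorm function. The sketch below is a hypothetical rewrite: punc_norm, WORD_BOUNDARY_RX and LETTER_ONLY_RX are illustrative names, not from the question. One caveat worth checking is that \b treats digits as word characters, so a time such as 10:30 becomes 10 : 30. If the \D in the original attempt was meant to leave digits alone (an assumption; the question does not say), a lookaround that requires a non-digit word character on both sides is one possible alternative.

```python
import re

# Pattern from the answer above: | or : with a word character on both sides.
WORD_BOUNDARY_RX = re.compile(r'\b[|:]\b')

# Hypothetical variant (assumption): require a word character that is not
# a digit on both sides, so times like 10:30 are left untouched.
# [^\W\d] means "word character but not a digit" (letters and underscore).
LETTER_ONLY_RX = re.compile(r'(?<=[^\W\d])[|:](?=[^\W\d])')

def punc_norm(text, rx=WORD_BOUNDARY_RX):
    """Pad | and : with single spaces when rx says they sit inside a word."""
    return rx.sub(r' \g<0> ', text)

s = "dari kota|tua | mau jam 10:30"
print(punc_norm(s))                  # dari kota | tua | mau jam 10 : 30
print(punc_norm(s, LETTER_ONLY_RX))  # dari kota | tua | mau jam 10:30
```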

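You can simply use a quantifier here, such as \s*.

The * quantifier means zero or more of the preceding expression, so \s*\|\s* matches a pipe together with any whitespace around it, and the whole match is replaced by a single " | ".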

>>> import re
>>> text = "transportumum min kalo dari kota|tua | mau ke galeri nasional naik transjakarta jurusan apa ya?"
>>> re.sub(r'(\s*\|\s*)',' | ',text)
'transportumum min kalo dari kota | tua | mau ke galeri nasional naik transjakarta jurusan apa ya?'

4 Comments

This r'(\s*\|\s*)(?is)' pattern will also match every | in a string like ||||||. The (?is) modifiers make no sense here: there is no . and there are no letters in the pattern.
Should I delete my answer?
It would be OK if you explained why you think it is helpful to the OP. There is a requirement in the question: the pipe must be in the middle of a word, without spaces. That is why I added a note.
This pattern will catch things like a| b or a |b, which may be a good thing, though it's not clear from the OP's limited problem description.
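The points raised in these comments can be checked side by side. This is an illustrative sketch; the pipe-only word-boundary pattern is a narrowed form of the first answer's r'\b[|:]\b', and the sample strings are made up for the comparison.

```python
import re

SPACED = r'\s*\|\s*'   # pattern from the second answer (without (?is))
BOUNDARY = r'\b\|\b'   # pipe-only form of the first answer's pattern

for s in ["a| b", "a |b", "a||b"]:
    spaced = re.sub(SPACED, ' | ', s)
    bounded = re.sub(BOUNDARY, r' \g<0> ', s)
    print(repr(s), repr(spaced), repr(bounded))

# 'a| b' 'a | b' 'a| b'
# 'a |b' 'a | b' 'a |b'
# 'a||b' 'a |  | b' 'a||b'
```

The \s* version normalizes one-sided spacing but pads every pipe in a run; the \b version only touches a pipe with a word character on both sides.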
