Python string split using regex

Question

I tried to manage this myself, but I couldn't...

I have text:

{Łatwe|Proste} szukanie mieszkania {Sprawdź|Wypróbuj juz dziś}, znalezienie {wcale|w ogóle}

I want to get a list containing either single words from the sentence or whole expressions in {}. The list should look like this:

{Łatwe|Proste}
szukanie
mieszkania
{Sprawdź|Wypróbuj juz dziś}
znalezienie ...

I use the split() method, but it produces, for example:

{Sprawdź|Wypróbuj
juz
dziś}

But that should be a single item. I don't want to break up expressions in {}.

Any help?:)

And what do you split with?

– fge, Commented Jan 12, 2013 at 10:45

nhahtdh · Accepted Answer · 2013-01-12 13:36:41Z

4

Python 2.x solution:

>>> import re
>>> re.findall(r'{[^}]*}|\b\w+\b', u'{Łatwe|Proste} szukanie mieszkania {Sprawdź|Wypróbuj juz dziś}, znalezienie {wcale|w ogóle}', re.U)
[u'{\u0141atwe|Proste}', u'szukanie', u'mieszkania', u'{Sprawd\u017a|Wypr\xf3buj juz dzi\u015b}', u'znalezienie', u'{wcale|w og\xf3le}']

The re.U flag is necessary because, by default, \b, \w, and a few others (\d, \s, and their negated counterparts) match only ASCII characters.

Python 3.x solution:

import re
re.findall(r'{[^}]*}|\b\w+\b', '{Łatwe|Proste} szukanie mieszkania {Sprawdź|Wypróbuj juz dziś}, znalezienie {wcale|w ogóle}')

In Python 3.x, \b, \w, \d, \s and their negated counterparts match Unicode characters by default for str patterns. The re.U flag still exists for backward compatibility, but specifying it is redundant.
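
For reuse, the Python 3 call can be wrapped in a small helper. This is only a sketch: the names TOKEN_RE and tokenize are made up for illustration, and the pattern is the one from the answer, unchanged.

```python
import re

# Pattern from the answer: either a whole {...} group, or a single word.
# TOKEN_RE and tokenize are illustrative names, not part of any library.
TOKEN_RE = re.compile(r'{[^}]*}|\b\w+\b')

def tokenize(text):
    """Return {...} groups and standalone words, in order of appearance."""
    return TOKEN_RE.findall(text)

text = ('{Łatwe|Proste} szukanie mieszkania '
        '{Sprawdź|Wypróbuj juz dziś}, znalezienie {wcale|w ogóle}')
tokens = tokenize(text)
print(tokens)
# ['{Łatwe|Proste}', 'szukanie', 'mieszkania',
#  '{Sprawdź|Wypróbuj juz dziś}', 'znalezienie', '{wcale|w ogóle}']
```

The {...} alternative is listed first so that a brace group is consumed whole before the word alternative gets a chance to match inside it. Punctuation such as the comma is simply skipped, because neither alternative matches it.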

edited Jan 12, 2013 at 13:36 by nhahtdh

answered Jan 12, 2013 at 10:47 by Ignacio Vazquez-Abrams


1 Comment

nhahtdh Over a year ago

Note that it will fail when the text outside {} contains diacritics. You need to specify the re.U flag.
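
The failure this comment describes can be reproduced in Python 3 by passing re.ASCII, which restricts \w and \b to ASCII characters, similar to matching without re.U in Python 2. This is a sketch under that assumption; the sample string is made up from words in the question.

```python
import re

pattern = r'{[^}]*}|\b\w+\b'
# Sample text (an assumption for illustration): 'dziś' sits outside the braces.
sample = 'dziś szukanie {wcale|w ogóle}'

# ASCII-only matching: 'ś' is not a word character, so 'dziś' is cut short.
ascii_tokens = re.findall(pattern, sample, re.ASCII)
print(ascii_tokens)    # ['dzi', 'szukanie', '{wcale|w ogóle}']

# Unicode matching (the Python 3 default for str patterns) keeps the word whole.
unicode_tokens = re.findall(pattern, sample)
print(unicode_tokens)  # ['dziś', 'szukanie', '{wcale|w ogóle}']
```

The {...} group survives in both cases because [^}]* does not depend on what counts as a word character; only the words outside the braces are affected.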
