Regular expressions python

Question

Sorry if the title is a little vague; I can't think of a better one right now.

I'm struggling to find the correct regular expression for a little test of mine:

Input and Output:

"Hello" --------------> ("Hello", "")
"How are you doing?" -> ("How", "are you doing?")
"" -------------------> ("", "")
"!h0w are you?" ------> ("!h0w", "are you?")
"#" ------------------> ("#", "")
":::::::" ------------> (":::::::", "")

The closest regular expression I have so far is "(\.?)(.*?)((\s+?)(.*?)$|$)", but it returns a lot of unwanted groups, like this:

import re
regex = lambda text: re.search(r"(\.?)(.*?)((\s+?)(.*?)$|$)", text).groups()

# Input and Output
regex("Hello") --------------> ('', 'Hello', '', None, None)
regex("How are you doing?") -> ('', 'How', ' are you doing?', ' ', 'are you doing?')
regex("") -------------------> ('', '', '', None, None)
regex("!h0w are you?") ------> ('', '!h0w', ' are you?', ' ', 'are you?')
regex("#") ------------------> ('', '#', '', None, None)
regex(":::::::") ------------> ('', ':::::::', '', None, None)

What I would prefer is:

x, y = re.search(pattern, string).groups()

If that is not possible, can someone improve upon the existing regular expression? I've been trying to improve it for a bit but I can't seem to make it any better.

I cannot use str.split for this; I'm trying to figure out how to do it with regular expressions.

Ply has a lexer function that may help you

– Joran Beasley, Commented Mar 16, 2014 at 2:13

Pi Marillion · Accepted Answer · 2014-03-16 05:09:47Z

It looks like you're just splitting into the parts before and after an optional space:

import re
regex = lambda text: re.match(r'(\S*)(?:\s*)(.*)', text).groups()
x, y = regex('this that')

Which gives these results:

regex("Hello")
('Hello', '')
regex("How are you doing?")
('How', 'are you doing?')
regex("")
('', '')
regex("!h0w are you?")
('!h0w', 'are you?')
regex("#")
('#', '')
regex(":::::::")
(':::::::', '')

Basically:

r'string here' is a literal string where you can use \ without double-escaping it.
(\S*) captures every non-whitespace character up to the first whitespace. If there are no characters before the first whitespace, the group is "" (rather than None).
(?:\s*) matches the first stretch of whitespace, but the ?: at the beginning makes it a non-capturing group, so it isn't part of the output from groups().
(.*) at the end captures any remaining characters after the first whitespace. If there are no characters after the whitespace, or there was no whitespace, the group is "" (rather than None).
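
The breakdown above can be checked against the inputs from the question. This is only a sketch: it uses the same pattern, written with re.VERBOSE so each part can carry a comment, and the names PATTERN and split_first_word are made up for illustration.

```python
import re

# Same pattern as in the answer, spread out with re.VERBOSE.
# PATTERN and split_first_word are illustrative names only.
PATTERN = re.compile(r"""
    (\S*)     # group 1: leading run of non-whitespace (may be empty)
    (?:\s*)   # non-capturing: the whitespace that follows, if any
    (.*)      # group 2: everything after that whitespace (may be empty)
""", re.VERBOSE)


def split_first_word(text):
    """Return (first_word, rest) as a tuple of two strings."""
    # Every part of the pattern can match the empty string,
    # so match() never returns None here.
    return PATTERN.match(text).groups()


for sample in ["Hello", "How are you doing?", "", "!h0w are you?", "#", ":::::::"]:
    print(repr(sample), "->", split_first_word(sample))
```

Because only two groups capture, the result unpacks directly with x, y = split_first_word(text). Note that input starting with whitespace, such as " hello", gives ('', 'hello'), since (\S*) matches the empty string at the start.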

user2357112 · Accepted Answer · 2014-03-16 02:12:28Z

The regex way to do this is still basically str.split, but with a regex split:

parts = re.split(r'\s+', text, maxsplit=1)
part1 = parts[0]
part2 = '' if len(parts) == 1 else parts[1]

\s+ matches any run of whitespace. maxsplit=1 says to only split on the first occurrence of the pattern. Note that this may not handle leading or trailing whitespace the way you want.
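
To make the caveat about leading whitespace concrete, here is a small sketch that wraps the split in a function. The name split_once is hypothetical and chosen only for this example.

```python
import re


def split_once(text):
    """Split text at the first run of whitespace; always return two strings."""
    parts = re.split(r'\s+', text, maxsplit=1)
    first = parts[0]
    # re.split returns a single-element list when there is no whitespace.
    rest = parts[1] if len(parts) > 1 else ''
    return first, rest


print(split_once("How are you doing?"))  # ('How', 'are you doing?')
print(split_once("Hello"))               # ('Hello', '')
print(split_once("  padded input"))      # ('', 'padded input')
```

The last call shows the edge case: the leading spaces are themselves the first match of \s+, so the first part comes back empty. If that is not what you want, one option is to strip the leading whitespace from the input before splitting.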

