python regex split string while keeping delimiter with value

Question

I'm trying to parse a text file of name:value elements into a list of "name:value" strings. The twist: the values will sometimes be multiple words or even multiple lines, and the names are not a fixed set of words. Here's an example of what I'm trying to work with...

post = "price:44.55 name:John Doe title:Super Widget description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"

What I want to return is...

["price:44.55", "name:John Doe", "title:Super Widget", "description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]

Here's what I've tried so far...

details = re.findall(r'[\w]+:.*', post, re.DOTALL)
["price:44.55 name:John Doe title:Super Widget description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]

Not what I want. Or...

details = re.findall(r'[\w]+:.*?', post, re.DOTALL)
["price:", "name:", "title:", "description:"]

Not what I want. Or...

details = re.split(r'([\w]+:)', post)
["", "price:", "44.55 ", "name:", "John Doe ", "title:", "Super Widget ", "description:", "This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]

Which is closer, but still no dice. I can deal with the empty list item. So, basically, my question is: how do you keep the delimiter with its value in re.split(), or how do you keep re.findall() from being either too greedy or too stingy?

Thanks ahead of time for reading!

Pavel Anossov · Accepted Answer · 2013-02-05 19:21:12Z

7

Use a look-ahead assertion:

>>> re.split(r'\s(?=\w+:)', post)
['price:44.55',
 'name:John Doe',
 'title:Super Widget',
 'description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!']

Of course, it would still fail if a value contains whitespace followed by a word and a colon, because that spot looks exactly like the start of a new name:value pair.
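For readers who, like the asker, are unsure how the look-ahead behaves, here is a minimal sketch using the sample string from the question. The `findall` variant and the `tricky` string are illustrations added here, not part of the original answer; they only show that the same look-ahead idea carries over and where it breaks.

```python
import re

post = ("price:44.55 name:John Doe title:Super Widget "
        "description:This widget slices, dices, and drives your kids "
        "to soccer practice\r\nIt even comes with Super Widget Mini!")

# Split on a single whitespace character, but only where a "word:" follows.
# The look-ahead (?=\w+:) checks for the next name without consuming it,
# so the name stays attached to its value.
fields = re.split(r'\s(?=\w+:)', post)

# Assumed alternative (not from the answer): the same look-ahead lets a lazy
# findall stop right before the next " name:" or at the end of the string.
fields_findall = re.findall(r'\w+:.*?(?=\s\w+:|\Z)', post, re.DOTALL)

# The caveat: a word followed by a colon inside a value is treated as a new name.
tricky = "title:Super Widget note:see also: the Mini"  # hypothetical input
broken = re.split(r'\s(?=\w+:)', tricky)
```

Here `fields` and `fields_findall` both hold the four "name:value" strings the question asks for, while `broken` ends up with three pieces because "also:" is mistaken for a field name.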


1 Comment

Captain Cornfield Keyboard Over a year ago

WORKED! THANKS! I've never understood the look-ahead or look-behind stuff very well. I really appreciate the help!

Danica · Answer · 2013-02-05 19:22:37Z

2

@Pavel's answer is nicer, but you could also just merge together the results of your last attempt:

# drop the leading empty string that re.split() produces
if not details[0]:
    details.pop(0)

# glue each "name:" back onto the value that follows it
details = [a + b for a, b in zip(details[::2], details[1::2])]
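Put together with the `re.split()` call from the question, the merge approach might look like the sketch below. The shortened `post` string and the `.strip()` call are assumptions added for illustration: splitting on a captured `\w+:` leaves the separating whitespace at the end of each value, so stripping is one way to tidy that up.

```python
import re

# Shortened sample input (assumption: same shape as the question's data).
post = "price:44.55 name:John Doe title:Super Widget"

# The capturing group makes re.split() keep the "name:" delimiters.
details = re.split(r'(\w+:)', post)

# Drop the leading empty string that comes before the first delimiter.
if not details[0]:
    details.pop(0)

# Even positions are names, odd positions are values; pair them up.
# .strip() is an addition here to remove the whitespace left on each value.
merged = [(name + value).strip()
          for name, value in zip(details[::2], details[1::2])]
```

Like the look-ahead version, this treats any word followed by a colon as a name, so it has the same limitation.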


1 Comment

Captain Cornfield Keyboard Over a year ago

I had thought about doing it this way, but it just seemed too cumbersome, but I appreciate the answer!
