I'm trying to parse a text file containing name:value elements into a list of "name:value" strings. Here's the twist: the values will sometimes be multiple words or even multiple lines, and the names (which act as the delimiters) are not a fixed set of words. Here's an example of what I'm trying to work with...

post = "price:44.55 name:John Doe title:Super Widget description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"

What I want to return is...

["price:44.55", "name:John Doe", "title:Super Widget", "description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]

Here's what I've tried so far...

details = re.findall(r'[\w]+:.*', post, re.DOTALL)
["price:44.55 name:John Doe title:Super Widget description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]

Not what I want. Or...

details = re.findall(r'[\w]+:.*?', post, re.DOTALL)
["price:", "name:", "title:", "description:"]

Not what I want. Or...

details = re.split(r'([\w]+:)', post)
["", "price:", "44.55 ", "name:", "John Doe ", "title:", "Super Widget ", "description:", "This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]

Which is closer, but still no dice (I can deal with the empty list item). So, basically, my question is: how do you keep the delimiter with the values in a re.split(), or how do you keep re.findall() from being either too greedy or too stingy?

Thanks ahead of time for reading!

2 Answers

7

Use a look-ahead assertion:

>>> re.split(r'\s(?=\w+:)', post)
['price:44.55',
 'name:John Doe',
 'title:Super Widget',
 'description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!']
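For anyone who wants to run this end to end, here is a self-contained sketch. `post` is rebuilt from the example in the question, and the `findall` variant is an addition that is not part of the original answer: it suggests that a lazy match bounded by the same look-ahead should give the same list, which speaks to the "too greedy or too stingy" half of the question.

```python
import re

# Sample text rebuilt from the question (the \r\n is a real CR LF pair).
post = ("price:44.55 name:John Doe title:Super Widget "
        "description:This widget slices, dices, and drives your kids "
        "to soccer practice\r\nIt even comes with Super Widget Mini!")

# Split on a whitespace character only when a "word:" follows it.
# The look-ahead (?=...) checks for the name without consuming it,
# so the name stays attached to the item that follows.
by_split = re.split(r'\s(?=\w+:)', post)

# Same idea with findall: match lazily, but stop right before the
# next whitespace + "word:" or at the end of the string.
by_findall = re.findall(r'\w+:.*?(?=\s\w+:|$)', post, re.DOTALL)

print(by_split)
print(by_split == by_findall)
```

The look-ahead is zero-width, so only the single whitespace character before each name is consumed as the separator.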

Of course, the look-ahead split will still fail if one of your values contains a word immediately followed by a colon, since that looks exactly like the start of a new name.
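To make that caveat concrete, here is a small sketch with a made-up input where the description contains "Note:". The second pattern is only one possible workaround and rests on an assumption the question does not state, namely that real names are always lowercase.

```python
import re

# Hypothetical input: the description value itself contains "Note:".
post = "name:John Doe description:Great widget. Note: batteries not included"

# The value is cut at " Note:" because it looks like a new name.
parts = re.split(r'\s(?=\w+:)', post)
print(parts)

# Possible workaround, ASSUMING names are always lowercase letters
# (an assumption made for this sketch only).
parts_lower = re.split(r'\s(?=[a-z]+:)', post)
print(parts_lower)
```

If the names really can be arbitrary words, no pattern can tell "Note:" inside a value apart from a genuine name, so some restriction like this would be needed.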

1 Comment

WORKED! THANKS! I've never understood the look-ahead or look-behind stuff very well. I really appreciate the help!
2

@Pavel's answer is nicer, but you could also just merge together the results of your last attempt:

# kill the first empty bit
if not details[0]:
    details.pop(0)

return [a + b for a, b in zip(details[::2], details[1::2])]
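As a runnable version of the snippet above, here is a minimal sketch wrapped in a function. The name `split_pairs` is made up for illustration, and the `.strip()` is an addition: `re.split` leaves the separating space on the end of each value, so the merged items would otherwise carry trailing whitespace.

```python
import re

def split_pairs(post):
    """Hypothetical helper: rebuild "name:value" items from re.split output."""
    # The capturing group makes re.split keep each "name:" delimiter.
    details = re.split(r'(\w+:)', post)
    # Drop the leading empty string (the text before the first name).
    if not details[0]:
        details.pop(0)
    # Pair each name with the value after it, then strip the whitespace
    # that separated the value from the next name.
    return [(name + value).strip()
            for name, value in zip(details[::2], details[1::2])]

print(split_pairs("price:44.55 name:John Doe title:Super Widget"))
```

This assumes the text starts with a name; if anything precedes the first "name:", the pairing would be off by one.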

1 Comment

I had thought about doing it this way, but it seemed too cumbersome. I appreciate the answer, though!
