How to parse multi-line key/value pairs cleanly in Python?

Question

Here is an example of the kind of content I have to parse from a ticket:

Name:
snakeoil
Host:
foobar

{block}
  email: some data here
  url: http://foo
  date: 01/02/16
{block}

I can identify the 'key', which is typically a word ending in a colon.

I could use the re module with a pattern like ^(\w+): to extract the key, but I must handle both the case where the value is on the same line and the case where it is on the following line.

What I can't figure out is how to fetch the value from the next line cleanly and efficiently.
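One way to avoid a single clever regex is to walk the ticket line by line and remember a "pending" key whenever a line holds only `Key:`; the next data line then becomes its value. This is a minimal sketch, not a definitive parser: it assumes keys are runs of word characters (`\w+`), that blank lines and `{block}` markers carry no data, and that a value never spans more than one line. `parse_ticket` and `KEY_RE` are hypothetical names.

```python
import re

# Assumption: a key is a run of word characters followed by a colon,
# optionally followed by its value on the same line.
KEY_RE = re.compile(r"^(\w+):\s*(.*)$")


def parse_ticket(text):
    """Return a dict of key/value pairs from ticket text (hypothetical helper)."""
    result = {}
    pending_key = None  # key whose value is expected on the next data line

    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and {block}-style markers hold no data.
        if not line or (line.startswith("{") and line.endswith("}")):
            continue

        if pending_key is not None:
            # The previous data line was a bare "Key:", so this whole line is
            # its value, even if it looks like "http://..." or contains a colon.
            result[pending_key] = line
            pending_key = None
            continue

        match = KEY_RE.match(line)
        if match is None:
            continue  # not a key line; ignore it

        key, value = match.groups()
        if value:
            result[key] = value  # "key: value" on one line
        else:
            pending_key = key  # "Key:" alone; the value comes on a later line

    if pending_key is not None:
        result[pending_key] = ""  # trailing key with no value at all
    return result


if __name__ == "__main__":
    ticket = """Name:
snakeoil
Host:
foobar

{block}
  email: some data here
  url: http://foo
  date: 01/02/16
{block}
"""
    print(parse_ticket(ticket))
```

Because the "value on the next line" case is just one state variable, extending the rules later (for example, to multi-line values) means changing one branch rather than rewriting a pattern.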

Qiang Jin · Accepted Answer · 2015-11-26 12:29:49Z

2

You can still use a regex if the input is well formed:

>>> import re
>>> re.findall(r'(.*?):\n(.*)$', content, re.MULTILINE)
[('Name', 'snakeoil'), ('Host', 'foobar')]
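For reference, here is the same call as a self-contained script. It assumes `content` holds exactly the sample ticket from the question, and it shows the pattern's limit: because the colon must be followed directly by a newline, the indented same-line pairs (`email`, `url`, `date`) are not matched.

```python
import re

# Assumption: the ticket text is exactly the sample from the question.
content = """Name:
snakeoil
Host:
foobar

{block}
  email: some data here
  url: http://foo
  date: 01/02/16
{block}
"""

# Key on its own line ending in ":", value on the following line.
# A raw string hands the backslash escapes to the regex engine unchanged.
pairs = re.findall(r"(.*?):\n(.*)$", content, re.MULTILINE)
print(pairs)  # [('Name', 'snakeoil'), ('Host', 'foobar')]
```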


2 Comments

DainDwarf Over a year ago

To handle both the multi-line and the same-line cases, I guess the regex should be more like r'(\w*?):\n?(.*)$'?

CaseyJones Over a year ago

The regex from @DainDwarf does it. Thanks!
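A quick check of the pattern suggested in the first comment, run against the question's sample (assumed verbatim): it picks up all five keys, but for same-line pairs the value group starts at the space after the colon, so stripping each value is worthwhile.

```python
import re

# Assumption: the ticket text is exactly the sample from the question.
ticket = """Name:
snakeoil
Host:
foobar

{block}
  email: some data here
  url: http://foo
  date: 01/02/16
{block}
"""

# "\n?" makes the newline after the colon optional, so the value may sit
# on the same line as the key or on the next one.
raw_pairs = re.findall(r"(\w*?):\n?(.*)$", ticket, re.MULTILINE)
print(raw_pairs[2])  # ('email', ' some data here')  <- note the leading space

# Strip surrounding whitespace from each value and collect into a dict.
fields = {key: value.strip() for key, value in raw_pairs}
print(fields)
```

Note also that `\w*?` accepts an empty key, so a stray line starting with `:` would yield a `''` key; `(\w+)` avoids that.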

Tomasz Jakub Rup · Accepted Answer · 2015-11-26 12:39:52Z

1

If you need email, url, and date too:

>>> re.findall(r'\s*(.*?):[\n\s]?(.*)$', s, re.MULTILINE)
[('Name', 'snakeoil'), ('Host', 'foobar'), ('email', 'some data here'), ('url', 'http://foo'), ('date', '01/02/16')]

If not, @QiangJin's solution is good.
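The same pattern can be written as a raw string (in a plain string literal, `'\s'` is an invalid escape sequence that recent Python versions flag with a warning) and fed straight into `dict`. Since `\s` already includes `\n`, the class `[\n\s]` is equivalent to plain `\s`. This sketch assumes `s` holds the question's sample ticket.

```python
import re

# Assumption: s holds the sample ticket from the question.
s = """Name:
snakeoil
Host:
foobar

{block}
  email: some data here
  url: http://foo
  date: 01/02/16
{block}
"""

# \s* skips indentation and preceding newlines, (.*?) is the key,
# \s? swallows one space or newline after the colon (same as [\n\s]?),
# and (.*) is the rest of the line.
pattern = re.compile(r"\s*(.*?):\s?(.*)$", re.MULTILINE)
fields = dict(pattern.findall(s))
print(fields["url"])  # http://foo
```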
