Parse data multiple lines where pattern found

Question

I need to parse the file below, where each row starts with a date and any row can span multiple lines. In other words, the row delimiter should be a date rather than a newline.

2021-01-01 INFO Workflow successful
2021-02-02 ERROR Workflow Failed due to below error:
    Data Type mismatch
    at Line number 30
2021-03-03 INFO Workflow successful

Code:

import json
import re
result = []
with open(r"C:\DUMMY\log\a1.txt", "r") as f:
    lines = f.readlines()
    for line in lines:
        data = line.split(' ')
        x = re.search(r'^\d{4}-\d{2}-\d{2}', data[0])
        if x != None:
            result.append({'Date':data[0], 'Severity':data[1], 'Message':' '.join(data[2:])})
        
data = json.dumps(result)
jsondata = json.loads(data)
print(jsondata)

Actual Output:

Because the second row spans multiple lines, its continuation lines are not parsed. I need help capturing everything up to the next line that starts with a date.

[{'Date': '2021-01-01',
  'Severity': 'INFO',
  'Message': 'Workflow successful\n'},
 {'Date': '2021-02-02',
  'Severity': 'ERROR',
  'Message': 'Workflow Failed due to below error:\n'},
 {'Date': '2021-03-03',
  'Severity': 'INFO',
  'Message': 'Workflow successful\n'}]

Expected Output:

[{'Date': '2021-01-01',
  'Severity': 'INFO',
  'Message': 'Workflow successful'},
 {'Date': '2021-02-02',
  'Severity': 'ERROR',
  'Message': 'Workflow Failed due to below error: Data Type mismatch at Line number 30'},
 {'Date': '2021-03-03',
  'Severity': 'INFO',
  'Message': 'Workflow successful'}]

JSON is irrelevant to the problem, so please remove it from the question to avoid distractions.
– wjandrea, Commented Nov 11, 2021 at 18:44
I meant the code, mostly. print(json.loads(json.dumps(result))) is pointless; just do print(result).
– wjandrea, Commented Nov 11, 2021 at 18:52

joseville · Accepted Answer · 2021-11-11 19:15:04Z

You should add an else case to:

if x != None:
    result.append({'Date':data[0], 'Severity':data[1], 'Message':' '.join(data[2:])})

to handle lines that do not start with a date. That is:

if x is not None:
    # line starts with a date: begin a new row
    result.append({'Date':data[0], 'Severity':data[1], 'Message':' '.join(data[2:]).strip()})
else:
    # continuation line: append it to the previous row's message
    result[-1]['Message'] += ' ' + line.strip()

Note that this assumes each row is a line starting with a date, optionally followed by additional lines that describe the row or error in more detail. If that assumption is broken, the output may be incorrect; in particular, if the first line of the file does not start with a date, result is still empty and result[-1] raises an IndexError.
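For reference, here is a minimal, self-contained sketch of the same idea that also guards against the empty-result case. The names parse_log and DATE_PREFIX are made up for illustration, and the sketch assumes the "date severity message" layout shown in the question; it is one possible way to write it, not the only one.

```python
import re

# Assumption: a new row always begins with a YYYY-MM-DD date at column 0.
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}")


def parse_log(lines):
    """Group lines into rows; a line that starts with a date opens a new row."""
    result = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        if DATE_PREFIX.match(line):
            # Split into at most three fields and pad any that are missing.
            parts = text.split(maxsplit=2)
            parts += [""] * (3 - len(parts))
            date, severity, message = parts
            result.append({"Date": date, "Severity": severity, "Message": message})
        elif result:
            # Continuation line: append it to the previous row's message.
            result[-1]["Message"] += " " + text
        # A continuation line before any dated line is ignored here.
    return result


sample = """\
2021-01-01 INFO Workflow successful
2021-02-02 ERROR Workflow Failed due to below error:
    Data Type mismatch
    at Line number 30
2021-03-03 INFO Workflow successful
"""

records = parse_log(sample.splitlines())
print(records)
```

Because parse_log accepts any iterable of lines, an open file object can be passed to it directly instead of calling readlines() first.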

edited Nov 11, 2021 at 19:15

answered Nov 11, 2021 at 18:47

1 Comment

DAC Over a year ago

Yes, this works great. The assumption is that each row starts with a date. Thanks.
