I need to parse the file below, where each row starts with a date and any row can span multiple lines. In other words, the row delimiter should be a date at the start of a line rather than a newline.
2021-01-01 INFO Workflow successful
2021-02-02 ERROR Workflow Failed due to below error:
Data Type mismatch
at Line number 30
2021-03-03 INFO Workflow successful
Code:
import json
import re

result = []
with open(r"C:\DUMMY\log\a1.txt", "r") as f:
    lines = f.readlines()

for line in lines:
    data = line.split(' ')
    x = re.search(r'^\d{4}-\d{2}-\d{2}.*?', data[0])
    if x is not None:
        result.append({'Date': data[0], 'Severity': data[1], 'Message': ' '.join(data[2:])})

data = json.dumps(result)
jsondata = json.loads(data)
print(jsondata)
Actual Output:
Since the 2nd row spans multiple lines, its continuation lines are not parsed. I need help capturing the entire message up to the next line that starts with a date.
[{'Date': '2021-01-01',
'Severity': 'INFO',
'Message': 'Workflow successful\n'},
{'Date': '2021-02-02',
'Severity': 'ERROR',
'Message': 'Workflow Failed due to below error:\n'},
{'Date': '2021-03-03',
'Severity': 'INFO',
'Message': 'Workflow successful\n'}]
Expected Output:
[{'Date': '2021-01-01',
'Severity': 'INFO',
'Message': 'Workflow successful'},
{'Date': '2021-02-02',
'Severity': 'ERROR',
'Message': 'Workflow Failed due to below error: Data Type mismatch at Line number 30'},
{'Date': '2021-03-03',
'Severity': 'INFO',
'Message': 'Workflow successful'}]
print(json.loads(json.dumps(result))) is pointless; just do print(result).
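One possible way to get the expected output shown above is to treat any line that does not start with a date as a continuation of the previous record. The following is only a minimal sketch under a few assumptions: the function name parse_log_lines and the pattern RECORD_START are made up for illustration, the severity is assumed to be a single word right after the date, and continuation lines are joined to the message with a single space. The sample text is the input from the question, embedded as a string so the snippet runs without the file.

```python
import re

# Assumption: a new record starts with a YYYY-MM-DD date, then a one-word severity.
RECORD_START = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(\S+)\s*(.*)$")


def parse_log_lines(lines):
    """Group lines into records; a line without a leading date extends the previous record."""
    result = []
    for raw in lines:
        line = raw.strip()  # drops the trailing newline and surrounding spaces
        if not line:
            continue  # skip blank lines
        match = RECORD_START.match(line)
        if match:
            date, severity, message = match.groups()
            result.append({'Date': date, 'Severity': severity, 'Message': message})
        elif result:
            # Continuation line: append it to the message of the last record.
            result[-1]['Message'] += ' ' + line
    return result


# Sample input copied from the question.
sample = """2021-01-01 INFO Workflow successful
2021-02-02 ERROR Workflow Failed due to below error:
Data Type mismatch
at Line number 30
2021-03-03 INFO Workflow successful
"""

records = parse_log_lines(sample.splitlines())
print(records)
```

To run it against the real file, the open file object can be passed directly, for example parse_log_lines(f) inside the existing with block, since iterating a file yields its lines. Lines that appear before the first dated line are silently dropped in this sketch.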