I need to parse the file below, in which each row starts with a date and any row can span multiple lines. In other words, the row delimiter should be a date rather than a newline.

2021-01-01 INFO Workflow successful
2021-02-02 ERROR Workflow Failed due to below error:
    Data Type mismatch
    at Line number 30
2021-03-03 INFO Workflow successful 

Code:

import json
import re
result = []
with open(r"C:\DUMMY\log\a1.txt", "r") as f:
    lines = f.readlines()
    for line in lines:
        data = line.split(' ')
        x = re.search(r'^\d{4}-\d{2}-\d{2}', data[0])
        if x != None:
            result.append({'Date':data[0], 'Severity':data[1], 'Message':' '.join(data[2:])})
        
data = json.dumps(result)
jsondata = json.loads(data)
print(jsondata)

Actual Output:

Because the second row spans multiple lines, its continuation lines are not parsed. I need help capturing the entire row, up to the next line that starts with a date.

[{'Date': '2021-01-01',
  'Severity': 'INFO',
  'Message': 'Workflow successful\n'},
 {'Date': '2021-02-02',
  'Severity': 'ERROR',
  'Message': 'Workflow Failed due to below error:\n'},
 {'Date': '2021-03-03',
  'Severity': 'INFO',
  'Message': 'Workflow successful\n'}]

Expected Output:

[{'Date': '2021-01-01',
  'Severity': 'INFO',
  'Message': 'Workflow successful'},
 {'Date': '2021-02-02',
  'Severity': 'ERROR',
  'Message': 'Workflow Failed due to below error: Data Type mismatch at Line number 30'},
 {'Date': '2021-03-03',
  'Severity': 'INFO',
  'Message': 'Workflow successful'}]
  • JSON is irrelevant to the problem, so please remove it from the question to avoid distractions. Commented Nov 11, 2021 at 18:44
  • I meant the code, mostly. print(json.loads(json.dumps(result))) is pointless; just do print(result). Commented Nov 11, 2021 at 18:52
  • Thanks for your input. Will keep it in mind. Commented Nov 11, 2021 at 19:01

1 Answer

You should add an else case to:

if x != None:
    result.append({'Date':data[0], 'Severity':data[1], 'Message':' '.join(data[2:])})

to account for when a line does not start with a date. That is:

if x != None:
    # line contains a date
    result.append({'Date':data[0], 'Severity':data[1], 'Message':' '.join(data[2:]).strip()})
else:
    result[-1]['Message'] += ' ' + line.strip()

Note that I've made the following assumption: each row is a line that starts with a date, optionally followed by additional lines that describe the row or error in more detail. If this assumption does not hold, the output may be incorrect, and if the very first line does not start with a date, result[-1] raises an IndexError because result is still empty.
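
For reference, here is a minimal, self-contained sketch of the same idea that also guards against the IndexError case. The names parse_log and ROW_START are hypothetical (they are not in the question), the sample text is inlined instead of being read from the file, and skipping blank lines and dropping continuation text that appears before any dated line are assumptions on my part, not requirements from the question.

```python
import re

# A row starts with a date, then a severity word, then the message.
# ROW_START is a hypothetical name chosen for this sketch.
ROW_START = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(\S+)\s*(.*)$")


def parse_log(lines):
    """Group lines into rows; a new row begins at any line starting with a date."""
    result = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        match = ROW_START.match(line)
        if match:
            date, severity, message = match.groups()
            result.append({"Date": date, "Severity": severity, "Message": message})
        elif result:
            # Continuation line: append it to the most recent row.
            result[-1]["Message"] += " " + line
        # Otherwise the text precedes any dated row; it is dropped here (assumption).
    return result


sample = """\
2021-01-01 INFO Workflow successful
2021-02-02 ERROR Workflow Failed due to below error:
    Data Type mismatch
    at Line number 30
2021-03-03 INFO Workflow successful
"""

rows = parse_log(sample.splitlines())
print(rows)
```

To run it against the real file, pass the open file object instead of the sample, for example parse_log(f) inside the with block, since iterating a file yields its lines.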

1 Comment

Yes, this works great. The assumption is that each row functionally starts with a date. Thanks.
