The following string is a typical example of the JSON input strings that I need to convert to a pandas DataFrame. My attempted workflow is to:
- Split the string into a list (see the string below; note that it represents a single row)
- Convert each list to a dictionary
- Convert each dictionary to a pd.DataFrame
- Merge the DataFrames together
Input string (representing one row of data):
"PN_#":9999,"Item":"Pear, Large","Vendor":["Farm"],"Class":["Food","Fruit"],"Sales Group":"59","Vendor ID (from Vendor)":[78]
Desired output dictionary:
{'PN_#':9999,
'Item':"Pear, Large",
'Vendor':"Farm",
'Class':"Food,Fruit",
'Sales Group':59,
'Vendor ID (from Vendor)':78}
Attempt:
I have been using re.split to attempt this. In most cases this works, but items such as "Class":["Food","Fruit"] and "Item":"Pear, Large" are proving challenging to account for.
This regex handles the latter case, but it obviously does not work for the former:
re.split("(?=[\S]),(?=[\S])",data)
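To make the failure concrete, here is a minimal sketch of what that split returns for the sample row above. The variable names `data` and `parts` are illustrative only, and the pattern is written as a raw string to avoid escape warnings.

```python
import re

# Sample row from above, stored as a plain Python string.
data = ('"PN_#":9999,"Item":"Pear, Large","Vendor":["Farm"],'
        '"Class":["Food","Fruit"],"Sales Group":"59",'
        '"Vendor ID (from Vendor)":[78]')

# Split on every comma that is immediately followed by a non-whitespace character.
parts = re.split(r"(?=[\S]),(?=[\S])", data)

for part in parts:
    print(part)
```

The comma inside "Pear, Large" is followed by a space, so it is left alone, but the comma inside ["Food","Fruit"] is not, so the "Class" entry is cut into two pieces: `"Class":["Food"` and `"Fruit"]`.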
I have tried a multitude of expressions to fully satisfy my requirements. The following expression is representative of my unsuccessful attempts:
regex.split("(?!\[.+?\s),(?=[\S])(?!.+?\])", data)
Any suggestions or solutions for how to accomplish this, or advice if I am going about this the wrong way?
Note: the comma in "Item":"Pear, Large" sits inside a quoted string, unlike the commas in "Class":["Food","Fruit"], which separate list elements.
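For comparison, one possible alternative to regex splitting is sketched below. It rests on two assumptions that are not confirmed above: that every row becomes a valid JSON object once wrapped in braces, and that digit-only strings such as "59" should become integers. The helper name `parse_row` is hypothetical.

```python
import json

def parse_row(raw):
    """Parse one brace-less row into a flat dictionary (illustrative helper)."""
    # Assumption: the row is a valid JSON object body missing only its braces.
    record = json.loads("{" + raw + "}")
    flat = {}
    for key, value in record.items():
        if isinstance(value, list):
            # Single-element lists collapse to that element; other lists are
            # joined into one comma-separated string.
            value = value[0] if len(value) == 1 else ",".join(map(str, value))
        if isinstance(value, str) and value.isdigit():
            # Assumption: digit-only strings such as "59" should become ints.
            value = int(value)
        flat[key] = value
    return flat

data = ('"PN_#":9999,"Item":"Pear, Large","Vendor":["Farm"],'
        '"Class":["Food","Fruit"],"Sales Group":"59",'
        '"Vendor ID (from Vendor)":[78]')

row = parse_row(data)
print(row)
```

A list of such dictionaries, one per row, can be passed to pd.DataFrame in a single call, which would avoid building and merging one DataFrame per row.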