I have a CSV file which contains arbitrary JSON objects. Here's a simplified version of the file:
v1,2020-06-09T22:44:46.377Z,cb6deb64-d6a0-4151-ba9b-bfa54ae75180,{"payload":{"assetId":"a3c2a944-d554-44bb-90a4-b7beafbc6bff","permissionsToParty":[{"partyType":1,"partyId":"74457bd4-c2ab-4760-942b-d6c623a97f19","permissions":["CREATE","DELETE","DOWNLOAD","EDIT","VIEW"]}]}},lastcolumn
v2,2020-06-09T22:44:47.377Z,50769c0d-0a05-4028-9f0b-40ab570af31a,{"scheduleIds":[]},lastcolumn
v3,2020-06-09T22:44:48.377Z,12345678-0a05-4028-9f0b-40ab570af31a,{"jobId":"4dfeb16d-f9d6-4480-9b84-60c5af0bd3ce","result":"success","status":"completed"},lastcolumn
The commas (if any) inside the JSON wreak havoc with CSV parsing.
I'm looking for a way to either...
...capture and replace all the commas outside the JSON objects with pipes (|) so I can simply key on those:
v1|2020-06-09T22:44:46.377Z|cb6deb64-d6a0-4151-ba9b-bfa54ae75180|{"payload":{"assetId":"a3c2a944-d554-44bb-90a4-b7beafbc6bff","permissionsToParty":[{"partyType":1,"partyId":"74457bd4-c2ab-4760-942b-d6c623a97f19","permissions":["CREATE","DELETE","DOWNLOAD","EDIT","VIEW"]}]}}|lastcolumn
v2|2020-06-09T22:44:47.377Z|50769c0d-0a05-4028-9f0b-40ab570af31a|{"scheduleIds":[]}|lastcolumn
v3|2020-06-09T22:44:48.377Z|12345678-0a05-4028-9f0b-40ab570af31a|{"jobId":"4dfeb16d-f9d6-4480-9b84-60c5af0bd3ce","result":"success","status":"completed"}|lastcolumn
...or wrap each JSON object with single quotes:
v1,2020-06-09T22:44:46.377Z,cb6deb64-d6a0-4151-ba9b-bfa54ae75180,'{"payload":{"assetId":"a3c2a944-d554-44bb-90a4-b7beafbc6bff","permissionsToParty":[{"partyType":1,"partyId":"74457bd4-c2ab-4760-942b-d6c623a97f19","permissions":["CREATE","DELETE","DOWNLOAD","EDIT","VIEW"]}]}}',lastcolumn
v2,2020-06-09T22:44:47.377Z,50769c0d-0a05-4028-9f0b-40ab570af31a,'{"scheduleIds":[]}',lastcolumn
v3,2020-06-09T22:44:48.377Z,12345678-0a05-4028-9f0b-40ab570af31a,'{"jobId":"4dfeb16d-f9d6-4480-9b84-60c5af0bd3ce","result":"success","status":"completed"}',lastcolumn
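For comparison, here is a minimal non-regex sketch of the quoting variant. It assumes (and this is only an assumption based on the sample above) that the JSON is always the fourth of exactly five columns and that the last column never contains a comma. The function name `quote_json_column` is hypothetical.

```python
import json


def quote_json_column(line: str) -> str:
    """Wrap the JSON column in single quotes.

    Assumes a fixed layout: three plain columns, one JSON column,
    one plain trailing column (none of the plain ones contain commas).
    """
    # Peel off the first three columns from the left...
    version, timestamp, ident, rest = line.split(",", 3)
    # ...and the last column from the right; what remains is the JSON.
    payload, last = rest.rsplit(",", 1)
    # Sanity check: raises ValueError if the middle part is not valid JSON.
    json.loads(payload)
    return ",".join([version, timestamp, ident, f"'{payload}'", last])


if __name__ == "__main__":
    sample = (
        "v2,2020-06-09T22:44:47.377Z,"
        "50769c0d-0a05-4028-9f0b-40ab570af31a,"
        '{"scheduleIds":[]},lastcolumn'
    )
    print(quote_json_column(sample))
```

This only works while the column count stays fixed; it does not help if the JSON can appear in a different position.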
Alas, my regex kung-fu is too weak to create something flexible enough based on the arbitrary nature of the JSON objects that may show up.
The closest I've gotten is:
(?!\B{[^}]*),(?![^{]*}\B)
This still matches some commas inside the JSON (for example, the comma directly before "permissionsToParty") in an object like this:
{"payload":{"assetId":"710728f9-7c13-4bcb-8b5d-ef347afe0b58","permissionsToParty":[{"partyType":0,"partyId":"32435a92-c7b3-4fc0-b722-2e88e9e839e5","permissions":["CREATE","DOWNLOAD","VIEW"]}]}}
Can anyone simplify what I've done thus far and help me with an expression that ignores ALL commas within the outermost {} symbols of the JSON?
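To make the desired behavior concrete, here is a rough sketch of the pipe replacement done with a brace-depth counter instead of a regex. It is not a regex answer, just an illustration of "ignore every comma inside the outermost {}". The function name `commas_to_pipes` is hypothetical, and the sketch assumes the plain columns never contain `{`, `}`, or `|`.

```python
def commas_to_pipes(line: str) -> str:
    """Replace commas outside the outermost {...} with pipes."""
    out = []
    depth = 0          # current nesting level of { }
    in_string = False  # True while inside a JSON string literal
    escaped = False    # True right after a backslash inside a string

    for ch in line:
        if in_string:
            # Braces and commas inside a JSON string must not count.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"' and depth > 0:
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == "," and depth == 0:
            ch = "|"  # a column separator, not part of the JSON
        out.append(ch)

    return "".join(out)


if __name__ == "__main__":
    sample = (
        "v3,2020-06-09T22:44:48.377Z,"
        "12345678-0a05-4028-9f0b-40ab570af31a,"
        '{"jobId":"4dfeb16d-f9d6-4480-9b84-60c5af0bd3ce",'
        '"result":"success","status":"completed"},lastcolumn'
    )
    print(commas_to_pipes(sample))
```

Because it counts depth rather than pattern-matching, nested objects and arrays (like the "permissionsToParty" example) are handled the same way as flat ones.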
