I am attempting to extract valid, Python-parsable objects, such as dictionaries and lists, from strings. For example, from the string "[{'a' : 1, 'b' : 2}]", the script should extract [{'a' : 1, 'b' : 2}], since the {} and [] delimit complete Python objects.
However, when the string output is incomplete, such as "[{'a' : 1, 'b' : 2}, {'a' : 1'}]", I only attempt to extract {'a' : 1, 'b' : 2} and place it into a list [{'a' : 1, 'b' : 2}], as the second Python object is not yet complete and therefore must be left out.
I tried to write a regex pattern that matches complete {} or [] blocks. It works for simple output but fails on nested lists or dicts.
Code:
import re

def match_dict_list(string):
    pattern = r"\[?\{[^\}\]]*\}\]?|\[?\[[^\]\[]*\]\]?"
    matches = re.findall(pattern, string)
    return matches
But it fails on """[[1, 2, 3], [11, 12, 21]""": instead of the two complete inner lists, it returns '[[1, 2, 3]' (with the unclosed outer bracket attached) and '[11, 12, 21]'. The expected result is only [1, 2, 3] and [11, 12, 21], placed in a list: [[1, 2, 3], [11, 12, 21]].
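Running the original function on case 2 shows the problem concretely. The optional leading `\[?` lets the outer, never-closed `[` attach itself to the first inner list, while `[^\]\[]*` rules out nested brackets entirely. Python's `re` module has no way to count nesting depth, so a pattern of this shape cannot be extended to handle arbitrary nesting.

```python
import re


def match_dict_list(string):
    # Pattern copied unchanged from the question.
    pattern = r"\[?\{[^\}\]]*\}\]?|\[?\[[^\]\[]*\]\]?"
    return re.findall(pattern, string)


print(match_dict_list("[[1, 2, 3], [11, 12, 21]"))
# ['[[1, 2, 3]', '[11, 12, 21]']
```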
Some test cases:

Case 1:
"[{'a' : 1, 'b' : 2}, {'a' : 1'"
Expected output:
[{'a': 1, 'b': 2}]

Case 2:
'[[1, 2, 3], [11, 12, 21]'
Expected output:
[[1, 2, 3], [11, 12, 21]]

Case 3:
"""[{'a': [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}], 'b': [{'a':"""
Expected output:
[{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]
I am getting the output from APIs but can't do anything from their side; sometimes, the server output is complete, and sometimes, it's incomplete.
I also tried the updated pattern \[?\{[^\}\]]*\}\]?|\[[^\]\[]*\]|\[\[[^\]\[]*\]\], but it fails on the third case. What is the best way to solve this kind of problem?
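Running the updated pattern on case 3 shows the same limitation in a different form. Because `[^\}\]]*` still allows `[` and `{`, the first match swallows the outer `[{'a': [` prefix, and the second match picks up the stray `]` that closes the inner list. Neither fragment is a valid literal on its own.

```python
import re

# The "updated" pattern from the question, unchanged.
UPDATED = r"\[?\{[^\}\]]*\}\]?|\[[^\]\[]*\]|\[\[[^\]\[]*\]\]"

case3 = """[{'a': [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}], 'b': [{'a':"""
print(re.findall(UPDATED, case3))
# ["[{'a': [{'a': 1, 'b': 2}", "{'a': 1, 'b': 2}]"]
```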
I can't use ast.literal_eval on the whole string because, as mentioned above, the output can be incomplete, for example " [ { 'a' : 1 } , {'b' : ".
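One possible alternative to a regex is a small scanner that tracks open brackets on a stack and calls ast.literal_eval only on substrings whose closing bracket has actually arrived. This is a sketch under stated assumptions, not a definitive solution: `extract_complete_objects` is a hypothetical helper name; it assumes the payload uses only `[]` and `{}` containers, single- or double-quoted (not triple-quoted) strings, and no stray quote characters outside the literals. It keeps only the outermost complete literals and drops brackets that never close.

```python
import ast

# Map each opening bracket to the closing bracket that completes it.
PAIRS = {"[": "]", "{": "}"}


def extract_complete_objects(text):
    """Return the outermost complete [...] or {...} literals found in text.

    Brackets inside quoted strings are ignored. A span counts as complete
    only if its closing bracket is present and ast.literal_eval accepts it;
    brackets still open at the end of text are simply dropped.
    """
    spans = []    # (start, end, value) of complete literals, in closing order
    stack = []    # (start index, expected closer) for each open bracket
    quote = None  # the active string delimiter, or None outside a string
    i = 0
    while i < len(text):
        ch = text[i]
        if quote is not None:
            if ch == "\\":
                i += 2  # skip the escaped character inside a string
                continue
            if ch == quote:
                quote = None  # string closed
        elif ch in ("'", '"'):
            quote = ch  # string opened
        elif ch in PAIRS:
            stack.append((i, PAIRS[ch]))
        elif stack and ch == stack[-1][1]:
            start, _ = stack.pop()
            try:
                value = ast.literal_eval(text[start:i + 1])
            except (ValueError, SyntaxError, TypeError):
                pass  # balanced but not a valid literal; keep inner spans
            else:
                # Spans recorded after `start` lie inside this one; replace them.
                while spans and spans[-1][0] > start:
                    spans.pop()
                spans.append((start, i + 1, value))
        # Any other character, including an unmatched closer, is ignored.
        i += 1
    return [value for _, _, value in spans]


print(extract_complete_objects("[{'a' : 1, 'b' : 2}, {'a' : 1'"))
# [{'a': 1, 'b': 2}]
print(extract_complete_objects("[[1, 2, 3], [11, 12, 21]"))
# [[1, 2, 3], [11, 12, 21]]
```

On cases 1 and 2 this produces the expected output. On case 3 it returns [[{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]], because the complete inner list is itself the outermost complete object; if you want the flat form shown above, take its single element (result[0]). Note that literal_eval runs at every closing bracket, so the cost grows quadratically in the worst case, which is usually acceptable for API responses of modest size.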
Comment (quoting "I am getting the output from APIs but can't do anything from their side; sometimes, the server output is complete, and sometimes, it's incomplete."): APIs should serialize data using a format like JSON, not Python reprs (incomplete, to boot). If you are able to provide feedback to the owner of the API, you should have them fix their output.

Comment (quoting "[{'a' : 1, 'b' : 2}, {'a' : 1'}]", I only attempt to extract {'a' : 1, 'b' : 2} [...], as the second Python object is not yet complete and therefore must be left out): how could that ever become complete, with an unpaired quotation mark inside and the closing curly brace and bracket already in place? It's not incomplete; it's broken.