I need to parse a line similar to this one:

'''Object{identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name2', identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name3', value=value_without_quotes}'''

The line is much longer, but the pattern is the same.

Basically, I need a list (or dict) of key/value pairs, something like:

["'identifier', ''d6e461c5-fd55-42cb-b3e8-40072670fd0f''", "'name', ''some_name2''", "'identifier', ''d6e461c5-fd55-42cb-b3e8-40072670fd0f''", "'name', ''some_name3''", "'value', 'value_without_quotes'"]

I ended up with the following regular expression:

r'Object{(.+?)=(.+?)}'

It works only when the object contains a single key/value pair. I expected something like

((.+?)=(.+?),)+

to work, but it doesn't. For example,

re.match(r'Object{((.+?)=(.+?),?)+}', line3).groups()

Gives me:

("some_name3', value=value_without_quotes", "some_name3', value", 'value_without_quotes')

As you can see, 'value=value_without_quotes' appears in the result. r'Object{(([^=]+?)=(.+?),?)+}' doesn't work either.

So the question is: how do I repeat the pattern above in sequence? The catch is that I don't know whether the values contain quotes, symbols or digits.

Thank you

1
  • Did you try using findall instead of match? You don't need the 'Object{ at the beginning... Commented Nov 14, 2022 at 23:18

2 Answers

3

You can approach this problem in a simpler way, without a regex.

sentence = '''Object{identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name2', identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name3', value=value_without_quotes}'''
listing = [couple.split("=") for couple in sentence.split(",")]

Flatten the list:

listing = [y for x in listing for y in x]

And you will obtain something like:

['Object{identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", ' name', "'some_name2'", ' identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", ' name', "'some_name3'", ' value', 'value_without_quotes}']

Then you just have to strip() each item and remove "Object{" and "}":

result = [x.strip().replace("Object{", "").replace("}","") for x in listing]

Final result is:

['identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", 'name', "'some_name2'", 'identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", 'name', "'some_name3'", 'value', 'value_without_quotes']
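
One caveat with the split-based approach above: it splits on every "," and every "=", so a value that itself contains "=" would be cut into extra items. A minimal sketch of a variant that keeps each pair together is shown here. It assumes the line always starts with "Object{", ends with "}", and separates pairs with ", "; the shortened sample line and the value a=b are made up for illustration.

```python
# Hypothetical sample: same shape as the line in the question, but the last
# value contains "=" to show the difference from split("=").
sentence = "Object{identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name2', value=a=b}"

# Drop the wrapper once instead of calling replace() on every item.
body = sentence[len("Object{"):-1]

pairs = []
for couple in body.split(", "):
    # partition() splits on the first "=" only, so "=" inside a value survives.
    key, _, value = couple.partition("=")
    pairs.append((key.strip(), value.strip()))

print(pairs)
```

This keeps the result as a list of (key, value) tuples rather than a flat list. A value containing ", " would still be split incorrectly, so it is only suitable if that cannot happen in your data.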

2 Comments

Thank you for providing an easier solution. It works. I'm considering your idea. But still, I'd like to understand how to do so with regex.
@DanielSmialkowsky I posted a solution using regex, but I think this one is better for your case (at least it is more Pythonic). Note that in both solutions you can skip flattening the list and keep a nested structure of key/value pairs; but you can't use a plain dictionary, since some keys are repeated (e.g. 'identifier').
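
To illustrate the point about repeated keys: if a dict-like result is still wanted, one possible workaround is to map each key to a list of its values. This is only a sketch; the pairs list is written out by hand here and stands in for the unflattened output of either answer.

```python
from collections import defaultdict

# Key/value pairs as either answer would produce them before flattening.
pairs = [
    ("identifier", "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'"),
    ("name", "'some_name2'"),
    ("identifier", "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'"),
    ("name", "'some_name3'"),
    ("value", "value_without_quotes"),
]

# Each key collects every value seen for it, so repeats are not overwritten.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

print(dict(grouped))
```

Note that grouping this way loses the original ordering between different keys, so the list of pairs is the safer choice if the order matters.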
1
import re

line3 = '''Object{identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name2', identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name3', value=value_without_quotes}'''

# Each pair is preceded by "{" or whitespace and followed by "}", whitespace or ","
pattern = r'[{\s](.+?)=(.+?)[}\s,]'
match = re.findall(pattern, line3)
# Flatten the list of (key, value) tuples
[item for key_value_pair in match for item in key_value_pair]

Outputs

['identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", 'name', "'some_name2'", 'identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", 'name', "'some_name3'", 'value', 'value_without_quotes']
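
As for why the repeated-group pattern in the question does not work: in Python's re module, a capturing group inside a repetition keeps only the text from its last iteration, so (...)+ cannot return every pair from a single match. re.findall sidesteps this by matching one pair at a time. The sketch below shows both behaviours. The stricter pattern in it is an assumption on my part, not something from the question: it treats keys as word characters and values as either a single-quoted string or a run of characters without "," or "}".

```python
import re

line3 = "Object{identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name2', identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name3', value=value_without_quotes}"

# A repeated group: the whole line matches, but only the last iteration is kept.
repeated = re.match(r"Object\{(?:(\w+)=([^,}]+),?\s*)+\}", line3)
print(repeated.groups())

# One pair per match: the value is a quoted string or an unquoted run.
pair_pattern = r"(\w+)=('[^']*'|[^,}]+)"
pairs = re.findall(pair_pattern, line3)
print(pairs)

# Hypothetical line with a comma inside a quoted value.
tricky = re.findall(pair_pattern, "Object{name='a, b', value=1}")
print(tricky)
```

Because the quoted alternative is tried first, a comma or whitespace inside quotes stays part of the value, which the looser (.+?) pattern would cut short.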
