I have input data in the following format, which I did not choose:

key1: value1 key2: value2 key3: value3 key4 { key11: val11 key22: value22 } key5: value5 ............

The input string consists of keys, each followed by either a colon and a value or a brace-delimited block.

I want to tokenize it, and my idea is this: first, use a regular expression to read up to the next : or {, giving { priority over :.

Then split there, read up to the whitespace that ends the value, and continue recursively through the rest of the string.

Can I write a regex of the form (some_string)(special character)(rest of the string), where the special character is either : or {, with { taking precedence?

If it is a :, capture the value that follows (for example ' value1 ') and continue with the remaining string.

If it is a {, read up to the matching } and apply the : logic above to the contents.

For example:

a: 1 b: 2 c { d: 3 e: 4 } f: 5

This should give

a:1
b:2
c { d: 3 e: 4 }
f: 5
  • Is regex a requirement, or would you be OK with a function? Commented May 31, 2013 at 22:10
  • @SethMMorton A function is also OK. Commented May 31, 2013 at 22:12

1 Answer

You can use this pattern:

[^ ]+(?:: [^ ]+| \{[^}]+\})
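To unpack the pattern, here is the same expression restated with `re.VERBOSE` so that each piece can carry a comment. This is only an annotated sketch: the name `TOKEN` is chosen here for illustration and does not come from the answer. In verbose mode unescaped whitespace is ignored, so the literal spaces are written as `[ ]`.

```python
import re

# Annotated restatement of the pattern; TOKEN is an illustrative name.
TOKEN = re.compile(r"""
    [^ ]+              # the key: a run of non-space characters
    (?:
        :[ ][^ ]+      # either ": " followed by a value without spaces
      |
        [ ]\{[^}]+\}   # or " {" up to the first "}" (no nested braces)
    )
""", re.VERBOSE)

tokens = TOKEN.findall("a: 1 b: 2 c { d: 3 e: 4 } f: 5")
print(tokens)
```

Two limits follow directly from the character classes: a value containing a space is cut at that space, and a block containing another `{ ... }` ends at the first closing brace.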

Example:

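import re
test = "a: 1 b: 2 c { d: 3 e: 4 } f: 5"
# A key, then either ": value" or a " { ... }" block
pattern = re.compile(r"[^ ]+(?:: [^ ]+| \{[^}]+\})")
# findall returns every non-overlapping match, not just the first
for match in pattern.findall(test):
    print(match)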
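Since the asker said a function would also be acceptable, here is a minimal sketch of a non-regex alternative. It is not part of the answer: `tokenize` is a hypothetical helper, and it assumes every token (including `{` and `}`) is separated by whitespace, as in the question's sample. It returns (key, value) pairs, with a braced block becoming a nested list of pairs.

```python
def tokenize(text):
    """Parse 'key: value' and 'key { ... }' entries into (key, value) pairs.

    Hypothetical helper for illustration. Assumes whitespace-separated
    tokens; a braced block is returned as a nested list of pairs.
    """
    words = iter(text.split())

    def parse(inside_block):
        pairs = []
        for word in words:
            if word == "}":
                if not inside_block:
                    raise ValueError("unexpected '}'")
                return pairs
            if word.endswith(":"):
                # "key:" is followed by exactly one value word
                value = next(words, None)
                if value is None:
                    raise ValueError("missing value for " + word)
                pairs.append((word[:-1], value))
            else:
                # a bare key must be followed by "{" opening a block
                if next(words, None) != "{":
                    raise ValueError("expected '{' after " + word)
                pairs.append((word, parse(True)))
        if inside_block:
            raise ValueError("missing closing '}'")
        return pairs

    return parse(False)


result = tokenize("a: 1 b: 2 c { d: 3 e: 4 } f: 5")
print(result)
```

Because the block parser calls itself, this sketch also accepts blocks nested inside blocks, which the `[^}]+` part of the regex cannot match as a unit.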
3 Comments

This only gave me the first key, not the remaining keys.
@gizgok: Test the example; you must use findall.
I downvoted initially because there was no explanation, just the pattern. With the addition of the example, I have reversed my downvote, and in fact upvoted the answer.
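The commenter's own code is not shown, so the following is only a guess at the mix-up: `search` (like `match`) stops at the first hit, whereas `findall` and `finditer` scan the whole string. A small sketch of the difference, reusing the pattern and test string from the answer:

```python
import re

pattern = re.compile(r"[^ ]+(?:: [^ ]+| \{[^}]+\})")
test = "a: 1 b: 2 c { d: 3 e: 4 } f: 5"

# search returns a single match object for the first hit only
first_only = pattern.search(test).group(0)

# findall returns the text of every non-overlapping match
all_tokens = pattern.findall(test)

# finditer yields match objects, useful when positions are needed too
with_positions = [(m.start(), m.group(0)) for m in pattern.finditer(test)]

print(first_only)
print(all_tokens)
print(with_positions)
```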
