0

I am trying to parse a file that contains multiple key/value lines, as shown below:

"key1" = "value1";
"key2" = "value2";
"key3" = "value3_line1
value3_line2
value3_line3";
"key4" = "value4";

I am using the code below to parse this file:

import re

def parseFile(f):
    regex = re.compile(r'^"(.*)"\s+=\s+"(.*)";', re.MULTILINE)
    with open(f) as string_file:
        alllines = string_file.read()
        matches = [m.groups() for m in regex.finditer(alllines)]
        for m in matches:
            print(m[0], '=>', m[1])

This code matches the lines with key1, key2, and key4, but doesn't match key3. How do I fix this to get all key/value pairs, including those that have multiline values?

1
  • regex = re.compile(r'^"(.*)"\s+=\s+"(.*)"?;?',re.MULTILINE) ? Commented Oct 3, 2018 at 10:30

2 Answers

1

You can use the re.DOTALL flag, which allows . to match newline characters. You should also use the non-greedy quantifier *? so that each group stops at the nearest closing double quote instead of the last one:

Change:

regex = re.compile(r'^"(.*)"\s+=\s+"(.*)";',re.MULTILINE)

to:

regex = re.compile(r'^"(.*?)"\s+=\s+"(.*?)";',re.MULTILINE | re.DOTALL)

Alternatively, you can use a negated character class that excludes ". Because [^"] also matches newline characters, re.DOTALL is not needed:

regex = re.compile(r'^"([^"]*)"\s+=\s+"([^"]*)";',re.MULTILINE)
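
As a quick check, here is a minimal sketch that runs both patterns against the sample data from the question. The names SAMPLE, LAZY, NEGATED, and parse_pairs are made up for this illustration and are not part of the original code; it also assumes the values never contain an escaped double quote.

```python
import re

# Sample input copied from the question.
SAMPLE = '''"key1" = "value1";
"key2" = "value2";
"key3" = "value3_line1
value3_line2
value3_line3";
"key4" = "value4";'''

# Non-greedy groups; re.DOTALL lets "." cross newlines.
LAZY = re.compile(r'^"(.*?)"\s+=\s+"(.*?)";', re.MULTILINE | re.DOTALL)

# Negated character class; [^"] already matches newlines.
NEGATED = re.compile(r'^"([^"]*)"\s+=\s+"([^"]*)";', re.MULTILINE)


def parse_pairs(text, pattern=NEGATED):
    """Return a list of (key, value) tuples found in text."""
    return [m.groups() for m in pattern.finditer(text)]


pairs = parse_pairs(SAMPLE)
for key, value in pairs:
    # repr() makes the embedded newlines in the key3 value visible.
    print(key, '=>', repr(value))
```

Both patterns should yield the same four pairs on this input, with the key3 value spanning three lines. The negated character class is arguably the safer choice, since it cannot run past a closing quote even if the rest of the pattern has to backtrack.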

2 Comments

That did a greedy match: it took all lines up to "key4" as the key and "value4" as the value. How do I make it match the first ';' instead of the last?
My bad. I didn't see that you had added '?' to the regex. That did the trick. Thanks for the quick response and the perfect answer.
0

It is not matching "key3" because that line is missing the closing quote and the semicolon.

Try the pattern re.compile(r'^"(.*)"\s+=\s+"(.*)"?;?',re.MULTILINE) or re.compile(r'^"(.*)"\s+=\s+"(.*)$',re.MULTILINE)

Example:

import re

s = '''"key1" = "value1";
"key2" = "value2";
"key3" = "value3_line1
value3_line2
value3_line3";
"key4" = "value4";'''

regex = re.compile(r'^"(.*)"\s+=\s+"(.*)"?;?',re.MULTILINE) 
matches = [m.groups() for m in regex.finditer(s)]
for m in matches:
    print(m[0], '=>', m[1])

Output (under Python 2, where this print statement prints a tuple):

('key1', '=>', 'value1";')
('key2', '=>', 'value2";')
('key3', '=>', 'value3_line1')
('key4', '=>', 'value4";')
