Python regular expression, wrong results

Question

I am new to Python and regular expressions. I am trying to write a program that reads the contents of the file below and extracts specific parameters, such as max_speed, from particular sections. Under each [SECTION:#] header, the parameters are all indented (TAB) until the next [SECTION:#] header.

[SECTION:3]
      paramter = 3
      state = AZ
      car = toyota
      max_speed = 90.000
      any_pets = yes
[SECTION:13]
      paramter = 10
      state = NY
      car = honda
      max_speed = 120.000
      any_pets = yes
[SECTION:85]
      paramter = 31
      state = TX
      car = kia
      max_speed = 30.000
      any_pets = no

This is my code:

import re
file = open('file.txt').readlines()
file_str = str(file)

for lines in file_str:
     myreg = re.compile(r'(?<=SECTION:13).+(max_speed\s\=\s\w+)')
     myreg1 = myreg.search(lines)
     print myreg1.group(1)

The problem is that the results are always wrong; it's as if the regular expression always matches the results of the last section.

Please let me know what I am doing wrong and what the best way of doing this would be. Thank you!

You might be interested in the Python ConfigParser

– Levon, Commented Jul 2, 2012 at 2:03

Ned Batchelder · Accepted Answer · 2012-07-02 02:04:20Z

3

You have a number of problems. First, read lines in a file like this:

with open('file.txt') as f:
    for line in f:
        # process each line.

The way you are reading lines, you create a list with readlines, then make it a string with str, which will give you data like "['line1\n', 'line2\n']". Then iterating over that string will give you each character in turn.
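
A quick illustration of that effect, written as a small Python 3 sketch. It uses an in-memory list in place of a real file, and the two sample lines are made up for the demo:

```python
# Stand-in for what readlines() returns: a list of lines.
lines = ['[SECTION:3]\n', '      state = AZ\n']

# str() on a list yields its repr, with brackets and quotes included.
as_text = str(lines)
print(as_text)

# Iterating over a string yields one character at a time, not one line.
first_chars = [ch for ch in as_text][:5]
print(first_chars)
```

A regular expression applied to each of those single characters can never match a multi-character pattern, so search() returns None every time.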

But you probably don't need to read the file yourself at all. The standard library module ConfigParser will parse these files for you directly; give it a look.


3 Comments

Squid Over a year ago

Hi Ned, thank you very much for your answer. I have looked at a few examples of the ConfigParser and I have a feeling I will be using that for my solution. However, I wanted to share the following with you:

with open('file.txt') as f:
    for line in f:
        myreg = re.compile(r'(?<=SECTION:13).+(max_speed\s\=\s\w+)')
        myreg1 = myreg.search(lines)
        print myreg1.group(1)

Why would myreg1 not match max_speed under SECTION:13? Any ideas?

Ned Batchelder Over a year ago

Squid: you are searching the lines one-by-one, so you can't expect to match against [SECTION:13] and max_speed=.., because they are on different lines. You won't ever have a single string with both of them.
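
If the file is processed one line at a time, one way around this is to remember which section header was seen last and only inspect options while inside the wanted section. The following is a minimal Python 3 sketch under that assumption; the helper name max_speed_for and the sample lines are hypothetical, not taken from the thread:

```python
def max_speed_for(lines, section):
    """Return the max_speed value (as text) found under `section`, or None."""
    current = None
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('[') and stripped.endswith(']'):
            # Section header: remember which section we are in.
            current = stripped[1:-1]
        elif current == section and '=' in stripped:
            key, _, value = stripped.partition('=')
            if key.strip() == 'max_speed':
                return value.strip()
    return None


sample = [
    '[SECTION:3]\n',
    '\tmax_speed = 90.000\n',
    '[SECTION:13]\n',
    '\tmax_speed = 120.000\n',
]
print(max_speed_for(sample, 'SECTION:13'))
```

The same function accepts an open file object in place of the list, since both yield one line per iteration.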

Squid Over a year ago

Hi Ned, I looked at the ConfigParser module and it does exactly what I need. However, there is a problem. As long as the options under a section are NOT indented, the code works. In my case the options under the sections are indented with two tabs. I tried using .strip() on the string that holds the file, but no luck... Any ideas?

pinkdawn · Accepted Answer · 2012-07-02 02:55:52Z

0

You should try something like this (I have not run or tested the code, so you will need to get it running yourself):

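import re

# Read the whole file as one string so the pattern can span lines.
text = open('file.txt').read()
# re.DOTALL lets '.' match newlines; the non-greedy '.+?' stops at the
# first max_speed after SECTION:13 instead of the last one in the file.
pattern = r'(?<=SECTION:13).+?(max_speed\s=\s[\w.]+)'
matches = re.findall(pattern, text, re.DOTALL)
print matches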
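
Two details matter when a pattern like this is run against the whole file: '.' does not match newlines unless re.DOTALL is set, and a greedy '.+' grabs as much text as it can. The greedy behaviour is one plausible cause of the "always the last section" symptom described in the question. A small Python 3 sketch with made-up sample text shows the difference:

```python
import re

text = (
    "[SECTION:13]\n"
    "      max_speed = 120.000\n"
    "[SECTION:85]\n"
    "      max_speed = 30.000\n"
)

# Greedy '.+' with DOTALL runs to the end of the text and backtracks,
# so it captures the LAST max_speed it can find.
greedy = re.search(r'(?<=SECTION:13).+(max_speed\s=\s[\w.]+)', text, re.DOTALL)

# Lazy '.+?' grows one character at a time and stops at the FIRST max_speed.
lazy = re.search(r'(?<=SECTION:13).+?(max_speed\s=\s[\w.]+)', text, re.DOTALL)

print(greedy.group(1))
print(lazy.group(1))
```

The character class [\w.]+ is used here instead of \w+ so that the decimal part of the value is captured as well.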


3 Comments

pinkdawn Over a year ago

Using ConfigParser is always a better choice!

Ned Batchelder Over a year ago

'\n'.join(open(..).readlines()) ? Have you tried: open(..).read() ?

Squid Over a year ago

Hi pinkdawn...any ideas of how to deal with indentation under the sections while using the ConfigParser module?

pinkdawn · Accepted Answer · 2012-07-02 05:04:44Z

0

To deal with indentation under the sections while using the ConfigParser module, use the following code:

from ConfigParser import ConfigParser

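class fp():
    """Minimal file-like wrapper that strips leading indentation from each line."""
    def __init__(self, filename):
        self.fileobj = open(filename)

    def readline(self):
        # Strip only spaces and tabs: a bare lstrip() would turn a blank
        # line ('\n') into '', which readfp() treats as end of file.
        return self.fileobj.readline().lstrip(' \t')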

f = fp('e:/file.txt')
config = ConfigParser()
config.readfp(f)
print config.get('SECTION:3', 'state')
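
In Python 3 the module is named configparser, and readfp() has been removed in recent versions in favor of read_file() and read_string(). A rough Python 3 equivalent of the idea above, assuming the file contents are already in a string (the sample text here is made up and uses two tabs of indentation):

```python
import configparser

raw = (
    "[SECTION:3]\n"
    "\t\tstate = AZ\n"
    "\t\tmax_speed = 90.000\n"
    "[SECTION:13]\n"
    "\t\tstate = NY\n"
    "\t\tmax_speed = 120.000\n"
)

# Strip leading whitespace from every line before parsing, mirroring
# the lstrip() wrapper above.
cleaned = "\n".join(line.lstrip() for line in raw.splitlines())

config = configparser.ConfigParser()
config.read_string(cleaned)

print(config.get('SECTION:13', 'state'))
print(config.getfloat('SECTION:13', 'max_speed'))
```

For a real file, the string could come from open('file.txt').read() inside a with block so the file is closed afterwards.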
