
I am new to Python and regular expressions. I am currently trying to write a program that reads the contents of the file below and extracts specific parameters, including the max_speed value, from within each section. Under each [SECTION:#] header, the parameters are all indented (with TABs) until the next [SECTION:#].

[SECTION:3]
      paramter = 3
      state = AZ
      car = toyota
      max_speed = 90.000
      any_pets = yes
[SECTION:13]
      paramter = 10
      state = NY
      car = honda
      max_speed = 120.000
      any_pets = yes
[SECTION:85]
      paramter = 31
      state = TX
      car = kia
      max_speed = 30.000
      any_pets = no

This is my code:

import re
file = open('file.txt').readlines()
file_str = str(file)

for lines in file_str:
     myreg = re.compile(r'(?<=SECTION:13).+(max_speed\s\=\s\w+)')
     myreg1 = myreg.search(lines)
     print myreg1.group(1)

The problem is that the results are always wrong; it's as if the regular expression always matches the values from the last section.

Please let me know what I am doing wrong and what the best way to do this would be. Thank you!

    You might be interested in Python's ConfigParser module. Commented Jul 2, 2012 at 2:03

3 Answers


You have a number of problems. First, read lines in a file like this:

with open('file.txt') as f:
    for line in f:
        print(line.rstrip())  # process each line here

The way you are reading lines, you create a list with readlines(), then turn it into a string with str(), which gives you data like "['line1\n', 'line2\n']". Iterating over that string then gives you each character in turn, so your regex is searched against one character at a time.
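To see this concretely, here is a short Python 3 sketch. The sample lines are made up to stand in for open('file.txt').readlines(). It shows that str() on a list yields one long string with no real newlines, and that iterating over it yields single characters. It also suggests one plausible reason for the "last section" symptom: if the question's pattern is run against that whole string, the greedy .+ has no newline to stop at, so it backtracks from the end and captures the last max_speed.

```python
import re

# Made-up lines standing in for open('file.txt').readlines().
lines = ['[SECTION:13]\n', '\tmax_speed = 120.000\n',
         '[SECTION:85]\n', '\tmax_speed = 30.000\n']
file_str = str(lines)

# str() of a list gives its repr: each real newline becomes a backslash
# followed by the letter n, so file_str is one long line.
print('\n' in file_str)    # False
print('\\n' in file_str)   # True

# Iterating over a string yields single characters, not lines.
print(list(file_str)[:3])  # ['[', "'", '[']

# Against the whole string, the question's greedy '.+' has no newline to stop
# at, so it backtracks from the end and captures the LAST max_speed.
pattern = re.compile(r'(?<=SECTION:13).+(max_speed\s\=\s\w+)')
greedy = pattern.search(file_str)
print(greedy.group(1))     # max_speed = 30
```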

But you probably don't need to read the file yourself at all. The built-in ConfigParser module (named configparser in Python 3) will parse these files for you directly; give it a look.
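A minimal sketch of the ConfigParser route, assuming Python 3, where the module is named configparser. To stay self-contained it parses an inline copy of part of the sample file with read_string(); with the real file you would call config.read('file.txt') instead. Python 3's parser accepts options that are all indented by the same amount, so the tabs in the sample need no special handling here.

```python
import configparser

# Inline copy of part of the sample file (tab-indented options).
SAMPLE = """\
[SECTION:3]
\tparamter = 3
\tstate = AZ
\tmax_speed = 90.000
[SECTION:13]
\tparamter = 10
\tstate = NY
\tmax_speed = 120.000
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

print(config.sections())                           # ['SECTION:3', 'SECTION:13']
print(config.get('SECTION:13', 'max_speed'))       # 120.000
print(config.getfloat('SECTION:13', 'max_speed'))  # 120.0

# Every section's max_speed at once, as floats.
speeds = {s: config.getfloat(s, 'max_speed') for s in config.sections()}
print(speeds)  # {'SECTION:3': 90.0, 'SECTION:13': 120.0}
```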


3 Comments

Hi Ned, thank you very much for your answer. I have looked at a few ConfigParser examples and I have a feeling I will be using it for my solution. However, I wanted to share the following with you: with open('file.txt') as f: for line in f: myreg = re.compile(r'(?<=SECTION:13).+(max_speed\s\=\s\w+)') myreg1 = myreg.search(lines) print myreg1.group(1) Why would myreg1 not match max_speed under SECTION:13? Any ideas?
Squid: you are searching the lines one by one, so you can't expect a single match to cover both [SECTION:13] and max_speed = ..., because they are on different lines. No string you search will ever contain both of them.
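A common way around this when reading line by line is to remember which section you are currently in and record max_speed only when you see it inside that section. A Python 3 sketch; the function name max_speeds and the two regexes are illustrative, not from the original post, and the sample list stands in for an open file.

```python
import re

SECTION_RE = re.compile(r'^\[SECTION:(\d+)\]')
MAX_SPEED_RE = re.compile(r'^\s*max_speed\s*=\s*(\S+)')

def max_speeds(lines):
    """Map each section number to its max_speed, reading one line at a time.

    `lines` can be any iterable of strings, e.g. an open file object.
    """
    speeds = {}
    current = None  # number of the section we are currently inside
    for line in lines:
        header = SECTION_RE.match(line)
        if header:
            current = header.group(1)
            continue
        speed = MAX_SPEED_RE.match(line)
        if speed and current is not None:
            speeds[current] = float(speed.group(1))
    return speeds

# Sample lines standing in for an open file.txt.
sample = ['[SECTION:3]\n', '\tmax_speed = 90.000\n',
          '[SECTION:13]\n', '\tstate = NY\n', '\tmax_speed = 120.000\n']
print(max_speeds(sample)['13'])  # 120.0
```

With the real file you would pass the open file object itself, for example from inside with open('file.txt') as f:.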
Hi Ned, I looked at the ConfigParser module and it does exactly what I need. However, there is a problem: as long as the options under a section are NOT indented, the code works, but in my case the options under the sections are indented with two TABs. I tried calling .strip() on the string that holds the file, but no luck. Any ideas?

You could try something like this (I haven't run or tested the code, so you may need to adjust it):

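import re
# re.DOTALL lets '.' cross newlines; the non-greedy '.+?' stops at the
# first max_speed after [SECTION:13] instead of the last one in the file.
pattern = r'(?<=SECTION:13\]).+?(max_speed\s*=\s*\S+)'
matches = re.findall(pattern, '\n'.join(open('file.txt').readlines()), re.DOTALL)
print(matches)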
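If you do want a pure-regex solution over the whole text, one option (an illustrative Python 3 sketch, not the answerer's code) is to capture every section number together with its max_speed in a single findall. The inline SAMPLE string is an assumption standing in for the contents of file.txt. Using [^\[]*? instead of .+ lets the match cross newlines but never a '[', so each match stays inside one section rather than running on to the last one.

```python
import re

# Inline stand-in for the contents of file.txt.
SAMPLE = """\
[SECTION:3]
\tstate = AZ
\tmax_speed = 90.000
[SECTION:13]
\tstate = NY
\tmax_speed = 120.000
[SECTION:85]
\tstate = TX
\tmax_speed = 30.000
"""

# [^\[]*? may cross newlines but never a '[', so a match cannot leave the
# section whose header it started at.
pairs = re.findall(r'\[SECTION:(\d+)\][^\[]*?max_speed\s*=\s*(\S+)', SAMPLE)
print(pairs)              # [('3', '90.000'), ('13', '120.000'), ('85', '30.000')]
print(dict(pairs)['13'])  # 120.000
```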

3 Comments

Using ConfigParser is always a better choice!
'\n'.join(open(..).readlines())? Have you tried open(..).read()?
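The difference, illustrated in Python 3 with io.StringIO standing in for the file (the text is a made-up sample): readlines() keeps each line's trailing newline, so joining with '\n' doubles every line break, while read() returns the text exactly as stored.

```python
import io

text = "[SECTION:3]\n\tstate = AZ\n"

# readlines() keeps each trailing '\n', so joining with '\n' doubles them.
joined = '\n'.join(io.StringIO(text).readlines())
print(repr(joined))  # '[SECTION:3]\n\n\tstate = AZ\n'

# read() returns the text unchanged.
print(io.StringIO(text).read() == text)  # True
```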
Hi pinkdawn, any ideas on how to deal with the indentation under the sections when using the ConfigParser module?

To deal with the indentation under the sections when using the ConfigParser module, wrap the file in an object whose readline() strips each line's leading whitespace before ConfigParser sees it:

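from ConfigParser import ConfigParser

class fp():
    """Wrap a file so ConfigParser reads each line without its indentation."""
    def __init__(self, filename):
        self.fileobj = open(filename)

    def readline(self):
        # Strip only spaces/tabs: a bare lstrip() would also eat the newline,
        # turning a blank line into '', which readfp() treats as end of file.
        return self.fileobj.readline().lstrip(' \t')

f = fp('e:/file.txt')
config = ConfigParser()
config.readfp(f)
print config.get('SECTION:3', 'state')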
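On Python 3 the same idea needs small changes: the module is configparser, and readfp() was deprecated and removed in Python 3.12 in favor of read_file(), which accepts any iterable of lines, so a generator can replace the wrapper class. Python 3's parser already handles options that are all indented by the same amount; stripping mainly matters when indentation is inconsistent, because a line indented deeper than the option above it is read as a continuation of that option's value. A minimal sketch, with a made-up inline SAMPLE and a hypothetical helper named dedented:

```python
import configparser
import io

def dedented(lines):
    """Yield each line with leading spaces/tabs removed (newline kept)."""
    for line in lines:
        yield line.lstrip(' \t')

# Inconsistent indentation: without stripping, the deeper max_speed line
# would be treated as a continuation of the 'state' value.
SAMPLE = "[SECTION:3]\n\tstate = AZ\n\t\tmax_speed = 90.000\n"

config = configparser.ConfigParser()
config.read_file(dedented(io.StringIO(SAMPLE)))
print(config.get('SECTION:3', 'state'))      # AZ
print(config.get('SECTION:3', 'max_speed'))  # 90.000
```

With the real file you would pass dedented(f) for a file object f opened on file.txt.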
