Parsing big text files with a specific syntax in Python

Question

I'm trying to parse big text files with Python.

These files have a syntax like this:

<option1> {
<variable1>=<value1>; //<comment> 
<variable2>=<value2>;
..
<variableN>=<valueN>; //<comment> 
}

<option2> {
<variable1>=<value1>; //<comment> 
<variable2>=<value2>;
..
<variableN>=<valueN>; //<comment> 
}

...
...

<optionN> {
<variable1>=<value1>; //<comment> 
<variable2>=<value2>;
..
<variableN>=<valueN>; //<comment> 
}

I want to get, for instance, the value of <optionK>[<variableT>].

Is there an optimal way to do this using a file parser?

@sshashank124: The OP stated the file is huge; a regex would require reading the whole file into memory, which is perhaps not the most practical advice?
– Martijn Pieters, Commented Mar 20, 2014 at 9:57
@MartijnPieters: mmap allows you to apply a regex to a huge file. See "How to read tokens without reading whole line or file".
– jfs, Commented Mar 20, 2014 at 10:07
You could try something like lepl (discontinued) to parse the file; here's a code example.
– jfs, Commented Mar 20, 2014 at 10:14
@JFSebastian: I can't look it up right now, but the other day Jon Clements found that you couldn't if the file was larger than available memory. I have no first-hand experience there, though, and I'll happily defer to you. Myself, I'd read the file line by line, detecting sections.
– Martijn Pieters, Commented Mar 20, 2014 at 10:34
@MartijnPieters: My answer explicitly says "It works even if the file doesn't fit in memory." I wouldn't have said that if I hadn't tried it. I also would not use a single regex to parse the file; I only mentioned it to show that it is possible.
– jfs, Commented Mar 26, 2014 at 21:30
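
To make the mmap idea from the comments concrete, here is a minimal sketch, not jfs's actual code. It assumes the angle brackets in the question are placeholders, so that a real file looks like `option1 {` followed by `variable1=10; //comment` lines and a closing `}`. The function name `find_value` and the exact pattern are hypothetical, and the pattern assumes that `}` never appears inside a value or comment.

```python
import mmap
import re


def find_value(path, option, variable):
    """Return the value of `variable` inside the `option { ... }` block, or None.

    Hypothetical helper: the file layout is an assumption, see the text above.
    """
    opt = re.escape(option.encode())
    var = re.escape(variable.encode())
    pattern = re.compile(
        rb"^[ \t]*" + opt + rb"[ \t]*\{"   # block header at the start of a line
        + rb"[^}]*?"                       # lazily skip ahead, never past "}"
        + rb"^[ \t]*" + var + rb"[ \t]*=[ \t]*([^;\r\n]*);",
        re.MULTILINE,
    )
    # The file is mapped read-only, so the OS pages it in on demand
    # instead of Python reading it into one big bytes object.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            match = pattern.search(mm)
            return match.group(1).decode().strip() if match else None
```

A call such as `find_value("settings.txt", "option2", "variable1")` (the file name is made up) returns the value as a string. Note that `mmap.mmap` raises `ValueError` for an empty file, and the pattern has to be a bytes pattern because a mapped file is a bytes-like object.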

Ali SAID OMAR · Accepted Answer · 2014-03-20 11:13:16Z

Taking your example above literally (this is an ugly solution), you can use HTMLParser (http://docs.python.org/2/library/htmlparser.html) as follows:

test = """
<option1> {
<variable1>=<value1>; //<comment>
<variable2>=<value2>;
..
<variableN>=<valueN>; //<comment>
}

<option2> {
<variable1>=<value1>; //<comment>
<variable2>=<value2>;
..
<variableN>=<valueN>; //<comment>
}

...
...

<optionN> {
<variable1>=<value1>; //<comment>
<variable2>=<value2>;
..
<variableN>=<valueN>; //<comment>
}

"""

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


# Create a subclass and override the handler methods.
# Note: HTMLParser lowercases tag names, so "optionN" is stored as "optionn".
class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.option = ""
        self.key = ""
        self.r = {}            # per-instance result: {option: {variable: value}}
        self.currentData = ""  # text seen since the previous tag

    def handle_starttag(self, tag, attrs):
        data = self.currentData
        if "option" in tag:
            # <optionK> opens a new section.
            self.option = tag
            self.r[self.option] = {}
        elif "{" in data or ("=" not in data and "//" not in data):
            # First tag after "{" or after a line break: a variable name.
            self.key = tag
            self.r[self.option][self.key] = ""
        elif "=" in data:
            # Tag right after "=": the value of the current variable.
            self.r[self.option][self.key] = tag
        # Anything else (a tag after "//") is a comment and is ignored.

    def handle_endtag(self, tag):
        # Reset to an empty string (not None) so the "in" tests above keep working.
        self.currentData = ""

    def handle_data(self, data):
        self.currentData = data
        # A "}" in data marks the end of a section; results could be yielded here.


# Instantiate the parser and feed it the text.
parser = MyHTMLParser()
parser.feed(test)
print(parser.r)
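
The HTMLParser approach above works on a string that is already in memory. For big files, the line-by-line reading suggested in the comments may be a better fit. The following is only a sketch under assumptions: the angle brackets in the question are taken to be placeholders, so real lines look like `option1 {`, `variable1=10; //comment` and `}`, and values are assumed not to contain `//`. The names `iter_settings` and `lookup` are hypothetical.

```python
def iter_settings(lines):
    """Yield (option, variable, value) triples from an iterable of text lines."""
    option = None
    for raw in lines:
        # Drop a trailing "//comment" (assumes "//" never occurs inside a value).
        line = raw.split("//", 1)[0].strip()
        if not line:
            continue
        if line.endswith("{"):
            option = line[:-1].strip()      # "option1 {" opens a section
        elif line == "}":
            option = None                   # "}" closes the current section
        elif option is not None and "=" in line:
            name, _, value = line.partition("=")
            yield option, name.strip(), value.strip().rstrip(";").strip()


def lookup(path, option, variable):
    """Return the value of option[variable], or None if it is not found."""
    with open(path) as f:
        # A file object yields one line at a time, so memory use stays small,
        # and returning early stops reading as soon as the value is found.
        for opt, name, value in iter_settings(f):
            if opt == option and name == variable:
                return value
    return None
```

Because `iter_settings` is a generator, it can also be collected into a nested dictionary when several values are needed, at the cost of holding all settings in memory.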
