Python: Need a hint on reading blocks of data from a text file

Question

I have a file with data like this:

# 0 867.691994 855.172889 279.230411 -78.951239 55.994189 -164.824148
# 0 872.477810 854.828159 279.690170 -78.950558 55.994391 -164.823700
...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
# 0 877.347791 854.481104 280.214892 -78.949869 55.994596 -164.823240
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...
# 0 882.216053 854.135168 280.745489 -78.948443 55.996206 -164.821887

and I would like to read only the lines between the comment lines, one block at a time: read all the lines between two neighbouring comments into an array (without saving them to a file), work with that array, then read the next block into an array, and so on.

I managed to get it to read one block:

def main():
    sourceFile = 'test.asc'
    print 'Extracting points ...'
    extF = open(sourceFile, 'r')
    block, cursPos = readBlock(extF)
    extF.close()
    print 'Finished extraction'

def readBlock(extF):
    countPnts = 0
    extBlock = []
    line = extF.readline()
    while not line.startswith('#'):
        extPnt = Point(*[float(j) for j in line.split()])
        countPnts += 1
        extBlock.append(extPnt)
        line = extF.readline()

    cursPos = extF.tell()
    print 'Points:', countPnts
    print 'Cursor position:', cursPos
    return extBlock, cursPos

It works perfectly, but only for one block of data. I cannot make it iterate over the commented lines from one block to the next. I was thinking about using the cursor position but could not work out how. Please give me some tips. Thank you.

Update: I implemented MattH's idea as follows:

def blocks(seq):
    buff = []
    for line in seq:
        if line.startswith('#'):
            if buff:
                #yield "".join(buff)
                buff = []
        else:
            # I need to make those numbers float
            line_spl = line.split()
            pnt = [float(line_spl[k]) for k in range(len(line_spl))]
            #print pnt
            buff.append(Point(*pnt))
    if buff:
        yield "".join(buff)

Then, if I run it:

for block in blocks(extF.readlines()):
    print 'p'

I get just an empty window, although print 'p' is inside the for-loop. So I have a couple of questions:

What does the

if buff:
    yield "".join(buff)

do? When I comment it out, nothing changes...

Why do the commands inside the for-loop not run?

This function is a generator, so I do not have access to the lines that were processed before, do I?
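
A possible explanation for the questions above, with a sketch of a repaired generator. In the update, the `yield` inside the loop is commented out, so the only remaining `yield` is the one after the loop. If the file ends with a comment line, `buff` is empty at that point, nothing is ever yielded, and the body of the `for` loop never runs. That would also be why commenting out the final `if buff:` appears to change nothing. In addition, `"".join(buff)` expects strings and would fail on `Point` objects. The sketch below is written for Python 3 and yields each block as a list, so every point of the current block stays accessible. Because the `Point` class is not shown in the question, plain tuples of floats are used as a stand-in (an assumption), and `io.StringIO` stands in for the open file.

```python
import io

def blocks(lines):
    """Yield each run of data lines between '#' comment lines as a list of points."""
    buff = []
    for line in lines:
        if line.startswith('#'):
            if buff:
                yield buff      # hand the finished block to the caller
                buff = []       # start a fresh list for the next block
        elif line.strip():
            # Tuples of floats stand in for the question's Point class (assumption).
            buff.append(tuple(float(tok) for tok in line.split()))
    if buff:
        yield buff              # flush a final block that is not followed by a comment

# io.StringIO stands in for an open file such as open('test.asc').
demo = io.StringIO(
    "# header\n"
    "1.0 2.0 3.0\n"
    "4.0 5.0 6.0\n"
    "# header\n"
    "7.0 8.0 9.0\n"
    "# header\n"
)
result = list(blocks(demo))
for block in result:
    print(len(block), block)
```

Passing the file object itself (rather than `extF.readlines()`) lets the generator read one line at a time instead of loading the whole file first.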

Solution

I managed to do it myself using ideas from MattH and Ashwini Chaudhary. In the end, I got this:

def readBlock(extF):
    countPnts = 0
    extBlock = []
    line = extF.readline()
    if line.startswith('#'):
        line = extF.readline()
    else:
        while not line.startswith('#'):
            extPnt = Point(*[float(j) for j in line.split()])
            countPnts += 1
            extBlock.append(extPnt)
            line = extF.readline()

    return extBlock, countPnts

And run it with:

while extF.readline():
    block, pntNum = readBlock(extF)

It works exactly as I need.
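
One caveat about the driver loop above: every call to `extF.readline()` in the `while` condition consumes a line that is never passed to `readBlock`, and `readBlock` itself discards the line that follows a comment, so some data lines can go missing without any error. A hedged alternative, assuming the same file layout, keeps all the reading inside one function and returns `None` at end of file. It is written for Python 3; tuples of floats stand in for `Point`, which the question does not show, and `io.StringIO` stands in for the real file.

```python
import io

def read_block(ext_f):
    """Return the next non-empty block of points, or None at end of file."""
    block = []
    while True:
        line = ext_f.readline()
        if not line:                    # '' means end of file
            return block or None        # a trailing unclosed block is still returned
        if line.startswith('#'):
            if block:                   # a comment closes a non-empty block
                return block
            continue                    # skip leading or consecutive comments
        if line.strip():
            # Tuple of floats as a stand-in for Point (assumption).
            block.append(tuple(float(tok) for tok in line.split()))

# io.StringIO stands in for open('test.asc', 'r').
ext_f = io.StringIO("# a\n# b\n1 2 3\n4 5 6\n# c\n7 8 9\n# d\n")
sizes = []
block = read_block(ext_f)
while block is not None:
    sizes.append(len(block))            # work with the block here
    block = read_block(ext_f)
print(sizes)
```

The file position is carried by the file object between calls, so there is no need to track it with `tell()`.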

Thanks everybody.

Probably, the generator will not work for me, since I need access to all lines of a block.
– user1329187, Commented Aug 23, 2012 at 12:38
Well, if you're happy… I'd still suggest looking at mmap, which lets you find the positions of your '#'; then it's just a matter of reading the part of the array you need...
– Pierre GM, Commented Aug 23, 2012 at 14:01
@PierreGM thank you for the comment. It looks quite complicated to me. I prefer an understandable solution even if it is not the wisest. Maybe later, when I learn Python better...
– user1329187, Commented Aug 23, 2012 at 14:20

MattH · Accepted Answer · 2012-08-22 14:19:10Z

2

Here are two simple generators, one that yields all non-comment blocks and the other only the non-comment blocks between comments. Updated for the two different possibilities and updated to have line splitting and joining in the same function for consistency.

sample = """Don't yield this
# 0 867.691994 855.172889 279.230411 -78.951239 55.994189 -164.824148
# 0 872.477810 854.828159 279.690170 -78.950558 55.994391 -164.823700
...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
# 0 877.347791 854.481104 280.214892 -78.949869 55.994596 -164.823240
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...
# 0 882.216053 854.135168 280.745489 -78.948443 55.996206 -164.821887
Don't yield this either"""

def blocks1(text):
  """All non-comment blocks"""
  buff = []
  for line in text.split('\n'):
    if line.startswith('#'):
      if buff:
        yield "\n".join(buff)
        buff = []
    else:
      buff.append(line)
  if buff:
    yield "\n".join(buff)

def blocks2(text):
  """Only non-comment blocks *between* comments"""
  buff = None
  for line in text.split('\n'):
    if line.startswith('#'):
      if buff is None:
        buff = []
      if buff:
        yield "\n".join(buff)
        buff = []
    else:
      if buff is not None:
        buff.append(line)

for block in blocks2(sample):
  print "Block:\n%s" % (block,)

Produces:

Block:
...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
Block:
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...

edited Aug 22, 2012 at 14:19

answered Aug 22, 2012 at 13:39

MattH


6 Comments

Ashwini Chaudhary Over a year ago

What if there's a line above the first comment and/or below the last comment? Then this program will treat those lines as blocks too, and the OP wants only the lines between two neighbouring comments.

MattH Over a year ago

Interesting point, I've added another generator that doesn't yield blocks lacking a leading or trailing comment.

user1329187 Over a year ago

@MattH I do not really understand what this piece does: if buff: yield "\n".join(buff).

MattH Over a year ago

If the boolean evaluation of buff is True (in this code this is roughly equivalent to buff is not None and len(buff) > 0), then yield the contents of buff concatenated together with \n. In this code, buff will either be None or a list of zero or more strings.

user1329187 Over a year ago

@MattH Why do you need it? When I remove it nothing changes.
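
A small sketch that may help with the question above. In `blocks1`, the `if buff:` inside the loop avoids yielding empty blocks when two comment lines are adjacent, and the `if buff:` after the loop emits data that follows the last comment. When the input ends with a comment line, the final check has nothing to emit, so removing it makes no visible difference. The `flush_tail` parameter below is hypothetical and exists only to switch that final check on and off for comparison.

```python
def blocks(text, flush_tail=True):
    """blocks1 from the answer, with a hypothetical switch for the final flush."""
    buff = []
    for line in text.split('\n'):
        if line.startswith('#'):
            if buff:                 # skip empty runs between adjacent comments
                yield "\n".join(buff)
                buff = []
        else:
            buff.append(line)
    if flush_tail and buff:          # emit data that follows the last comment
        yield "\n".join(buff)

ends_with_comment = "# a\n1 2 3\n# b"
ends_with_data = "# a\n1 2 3\n# b\n4 5 6"

# Input ending in a comment: the final flush has nothing to do.
print(list(blocks(ends_with_comment, flush_tail=False)))
# Input ending in data: without the flush the last block is lost.
print(list(blocks(ends_with_data, flush_tail=False)))
print(list(blocks(ends_with_data, flush_tail=True)))
```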


Ashwini Chaudhary · Accepted Answer · 2012-08-22 13:43:47Z

0

data.txt:

123456
1234
# 0 867.691994 855.172889 279.230411 -78.951239 55.994189 -164.824148
# 0 872.477810 854.828159 279.690170 -78.950558 55.994391 -164.823700
...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
# 0 877.347791 854.481104 280.214892 -78.949869 55.994596 -164.823240
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...
# 0 882.216053 854.135168 280.745489 -78.948443 55.996206 -164.821887
1234
12345

program:

with open('data.txt') as f:
    lines=[x.strip() for x in f if x.strip()]
    for i,x in enumerate(lines):  #loop to find the first comment line
        if x.startswith('#'):
            ind=i
            break
    for i,x in enumerate(lines[::-1]): #loop to find the first comment line from the end
        if x.startswith('#'):
            ind1=i
            break
    for x in lines[ind+1:-ind1-1]:
        if not x.startswith('#'):
            print x

output:

...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...

answered Aug 22, 2012 at 13:43

Ashwini Chaudhary


1 Comment

MattH Over a year ago

The OP asked to work on the non-comment blocks a block at a time, your solution does not provide them separately.
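
One way this approach could be extended to return the blocks separately, offered as a sketch and not as the answerer's method: after trimming to the region between the first and last comment lines, `itertools.groupby` can cluster consecutive lines by whether they are comments, and only the non-comment clusters are kept. The function name `blocks_between_comments` is made up for this example, and the code targets Python 3.

```python
from itertools import groupby

def blocks_between_comments(lines):
    """Return a list of blocks (lists of lines) lying between comment lines."""
    lines = [x.strip() for x in lines if x.strip()]
    comment_idx = [i for i, x in enumerate(lines) if x.startswith('#')]
    if len(comment_idx) < 2:
        return []                       # need two comments to enclose anything
    inner = lines[comment_idx[0] + 1:comment_idx[-1]]
    # groupby clusters consecutive lines by whether they are comments.
    return [list(group)
            for is_comment, group in groupby(inner, key=lambda x: x.startswith('#'))
            if not is_comment]

sample = ["123456", "# a", "# b", "1 2 3", "4 5 6", "# c", "7 8 9", "# d", "1234"]
result = blocks_between_comments(sample)
print(result)
```

Like the original program, this reads all lines into memory first, so it suits files that fit comfortably in RAM.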
