
I have a file with data like this:

# 0 867.691994 855.172889 279.230411 -78.951239 55.994189 -164.824148
# 0 872.477810 854.828159 279.690170 -78.950558 55.994391 -164.823700
...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
# 0 877.347791 854.481104 280.214892 -78.949869 55.994596 -164.823240
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...
# 0 882.216053 854.135168 280.745489 -78.948443 55.996206 -164.821887

and I would like to read only the lines between the comment lines, as follows: read all the lines between two neighbouring comments into an array (without saving them to a file), work with that array, then read the next block into the array, and so on.

I managed to read one block:

def main():
    sourceFile = 'test.asc'
    print 'Extracting points ...'
    extF = open(sourceFile, 'r')
    block, cursPos = readBlock(extF)
    extF.close()
    print 'Finished extraction'

def readBlock(extF):
    countPnts = 0
    extBlock = []
    line = extF.readline()
    while not line.startswith('#'):
        extPnt = Point(*[float(j) for j in line.split()])
        countPnts += 1
        extBlock.append(extPnt)
        line = extF.readline()

    cursPos = extF.tell()
    print 'Points:', countPnts
    print 'Cursor position:', cursPos
    return extBlock, cursPos

It works perfectly, but only for one block of data. I cannot make it iterate from one block to the next across the comment lines. I thought about using the cursor position but could not make that work. Please give me some tips. Thank you.

Update: I implemented MattH's idea as follows:

def blocks(seq):
    buff = []
    for line in seq:
        if line.startswith('#'):
            if buff:
                #yield "".join(buff)
                buff = []
        else:
            # I need to make those numbers float
            line_spl = line.split()
            pnt = [float(line_spl[k]) for k in range(len(line_spl))]
            #print pnt
            buff.append(Point(*pnt))
    if buff:
        yield "".join(buff)

Then, if I run it:

for block in blocks(extF.readlines()):
    print 'p'

I get just an empty window, although print 'p' is inside the for loop. So I have a few questions:

What does the

if buff:
    yield "".join(buff)

do? When I comment it out, nothing changes...

Why do the commands inside the for loop not run?

This function is a generator, so I do not have access to the lines that were processed earlier, do I?

Solution

I managed to do it myself using ideas of MattH and Ashwini Chaudhari. Finally, I got this:

def readBlock(extF):
    countPnts = 0
    extBlock = []
    line = extF.readline()
    if line.startswith('#'):
        line = extF.readline()
    else:
        while not line.startswith('#'):
            extPnt = Point(*[float(j) for j in line.split()])
            countPnts += 1
            extBlock.append(extPnt)
            line = extF.readline()

    return extBlock, countPnts

And run it with:

while extF.readline():
    block, pntNum = readBlock(extF)

It works exactly as I need.
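
For comparison, here is a minimal sketch of a generator-based variant that hands back one complete block at a time as a list, so every point of the block stays accessible. It rests on a few assumptions: Point is replaced by a hypothetical namedtuple with x, y, z fields (the real Point class is not shown in this post), the literal "..." placeholder lines are skipped, and a block is only yielded once a following comment line closes it. One thing to be aware of in the loop above is that `while extF.readline():` reads one line from the file on every check and that line never reaches readBlock, so iterating over the file object directly may be safer.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for the Point class, which is not shown in the post.
Point = namedtuple('Point', ['x', 'y', 'z'])

def read_blocks(lines):
    """Yield each run of data lines enclosed by comment lines as a list of Points."""
    block = None                      # stays None until the first comment is seen
    for line in lines:
        line = line.strip()
        if line.startswith('#'):
            if block:                 # a non-empty block is closed by this comment
                yield block
            block = []                # start collecting again after every comment
        elif block is not None and line and line != '...':
            block.append(Point(*[float(v) for v in line.split()]))

# A shortened copy of the data from the question, used instead of a real file.
sample = io.StringIO(u"""# 0 867.691994 855.172889 279.230411 -78.951239 55.994189 -164.824148
# 0 872.477810 854.828159 279.690170 -78.950558 55.994391 -164.823700
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
# 0 877.347791 854.481104 280.214892 -78.949869 55.994596 -164.823240
893.243290 1091.395104 184.726720
# 0 882.216053 854.135168 280.745489 -78.948443 55.996206 -164.821887
""")

blocks = list(read_blocks(sample))
for block in blocks:
    print(len(block))
```

With a real file, the same generator could be driven by `for block in read_blocks(extF):`, since a file object yields its lines one by one.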

Thanks everybody.

  • Probably, the generator will not work for me, since I need access to all lines of a block. Commented Aug 23, 2012 at 12:38
  • Well, if you're happy… I'd still suggest looking at mmap, which lets you find the positions of your '#'; then it's just a matter of reading the part of the array you need... Commented Aug 23, 2012 at 14:01
  • @PierreGM thank you for the comment. It looks quite complicated to me. I prefer a solution I can understand, even if it is not the wisest one. Maybe later, when I know Python better... Commented Aug 23, 2012 at 14:20
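
The mmap idea from the comment above could look roughly like the following. This is only a sketch under assumptions: the helper name comment_offsets is made up for illustration, the file is assumed to use "\n" line endings, and the demo data is an invented five-line file rather than the real test.asc. It only locates the byte offsets of the comment lines; slicing out the data between two offsets is left to the caller.

```python
import mmap
import os
import tempfile

def comment_offsets(path):
    """Return the byte offset of every line that starts with '#'."""
    offsets = []
    with open(path, 'rb') as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            if mm[:1] == b'#':            # comment on the very first line
                offsets.append(0)
            pos = mm.find(b'\n#')         # later comments follow a newline
            while pos != -1:
                offsets.append(pos + 1)   # +1 skips the newline itself
                pos = mm.find(b'\n#', pos + 1)
        finally:
            mm.close()
    return offsets

# Invented demo data written to a temporary file.
data = b"# a\n1 2 3\n# b\n4 5 6\n# c\n"
fd, path = tempfile.mkstemp()
os.write(fd, data)
os.close(fd)
offsets = comment_offsets(path)
os.remove(path)
print(offsets)
```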

2 Answers


Here are two simple generators, one that yields all non-comment blocks and the other only the non-comment blocks between comments. Updated for the two different possibilities and updated to have line splitting and joining in the same function for consistency.

sample = """Don't yield this
# 0 867.691994 855.172889 279.230411 -78.951239 55.994189 -164.824148
# 0 872.477810 854.828159 279.690170 -78.950558 55.994391 -164.823700
...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
# 0 877.347791 854.481104 280.214892 -78.949869 55.994596 -164.823240
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...
# 0 882.216053 854.135168 280.745489 -78.948443 55.996206 -164.821887
Don't yield this either"""

def blocks1(text):
  """All non-comment blocks"""
  buff = []
  for line in text.split('\n'):
    if line.startswith('#'):
      if buff:
        yield "\n".join(buff)
        buff = []
    else:
      buff.append(line)
  if buff:
    yield "\n".join(buff)

def blocks2(text):
  """Only non-comment blocks *between* comments"""
  buff = None
  for line in text.split('\n'):
    if line.startswith('#'):
      if buff is None:
        buff = []
      if buff:
        yield "\n".join(buff)
        buff = []
    else:
      if buff is not None:
        buff.append(line)

for block in blocks2(sample):
  print "Block:\n%s" % (block,)

Produces:

Block:
...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
Block:
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...

6 Comments

What if there is a line above the first comment and/or below the last comment? This program will treat those lines as blocks too, but the OP wants only the lines between two neighbouring comments.
Interesting point. I've added another generator that only yields blocks that have both a leading and a trailing comment.
@MattH I do not really understand what this piece does: if buff: yield "\n".join(buff).
If the boolean evaluation of buff is True (in this code this is roughly equivalent to buff is not None and len(buff) > 0), then yield the contents of buff concatenated together with \n. In this code, buff will either be None or a list of zero or more strings.
@MattH Why do you need it? When I remove it nothing changes.
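
To illustrate what the `if buff:` check is for, here is a small sketch. The `guard` parameter is hypothetical and exists only to switch the check on and off, and the input lines are made up. With the check, consecutive comment lines produce nothing; without it, every comment line yields a block, including empty ones. On input that never has two comments in a row and ends with data, removing the check makes no visible difference, which may be why nothing seemed to change.

```python
def blocks(lines, guard=True):
    """Yield runs of non-comment lines; `guard` toggles the `if buff:` check."""
    buff = []
    for line in lines:
        if line.startswith('#'):
            if buff or not guard:     # with the guard, empty buffers are skipped
                yield buff
            buff = []
        else:
            buff.append(line)
    if buff or not guard:             # flush whatever follows the last comment
        yield buff

# Made-up input with two comment lines in a row.
lines = ['# a', '# b', '1 2 3', '# c']
with_guard = list(blocks(lines))
without_guard = list(blocks(lines, guard=False))
print(with_guard)
print(without_guard)
```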

data.txt:

123456
1234
# 0 867.691994 855.172889 279.230411 -78.951239 55.994189 -164.824148
# 0 872.477810 854.828159 279.690170 -78.950558 55.994391 -164.823700
...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
# 0 877.347791 854.481104 280.214892 -78.949869 55.994596 -164.823240
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...
# 0 882.216053 854.135168 280.745489 -78.948443 55.996206 -164.821887
1234
12345

program:

with open('data.txt') as f:
    lines=[x.strip() for x in f if x.strip()]
    for i,x in enumerate(lines):  #loop to find the first comment line
        if x.startswith('#'):
            ind=i
            break
    for i,x in enumerate(lines[::-1]): #loop to find the first comment line from the end
        if x.startswith('#'):
            ind1=i
            break
    for x in lines[ind+1:-ind1-1]:
        if not x.startswith('#'):
            print x

output:

...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...

1 Comment

The OP asked to work on the non-comment blocks one block at a time; your solution does not provide them separately.
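
One way the slicing approach above might be extended to keep the blocks apart is itertools.groupby. This is a sketch, not the answerer's code: the function name split_blocks is made up, and the input is a shortened copy of data.txt with abbreviated comment lines.

```python
from itertools import groupby

def split_blocks(lines):
    """Return the groups of non-comment lines that sit between comment lines."""
    lines = [x.strip() for x in lines if x.strip()]
    comment_idx = [i for i, x in enumerate(lines) if x.startswith('#')]
    if len(comment_idx) < 2:          # need an opening and a closing comment
        return []
    # Keep only what lies between the first and the last comment line.
    inner = lines[comment_idx[0] + 1:comment_idx[-1]]
    return [list(group)
            for is_comment, group in groupby(inner, key=lambda x: x.startswith('#'))
            if not is_comment]

# Shortened copy of data.txt; the comment lines are abbreviated.
lines = ['123456', '1234', '# a', '# b', '...',
         '893.270609 1092.179289 184.692319', '...', '# c', '...',
         '893.243290 1091.395104 184.726720', '...', '# d', '1234', '12345']
result = split_blocks(lines)
for block in result:
    print(block)
```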
