How to read a specific number of floats from a file in Python?

Question

I am reading a text file from the web. The file starts with some header lines containing the number of data points, followed by the actual vertices (3 coordinates each). The file looks like:

# comment
HEADER TEXT
POINTS 6 float
1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9
1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9
POLYGONS

The line starting with the word POINTS contains the number of vertices (in this case there are 3 vertices per line, but that could change).

This is how I am reading it right now:

ur=urlopen("http://.../file.dat")

j=0
contents = []
while 1:
    line = ur.readline()
    if not line:
        break
    else:
        line=line.lower()       

    if 'points' in line :
        myline=line.strip()
        word=myline.split()
        node_number=int(word[1])
        node_type=word[2]

        while 'polygons'  not in line :
            line = ur.readline()
            line=line.lower() 
            myline=line.split()

            i=0
            while(i<len(myline)):                    
                contents[j]=float(myline[i])
                i=i+1
                j=j+1

How can I read a specified number of floats instead of reading line by line as strings and converting to floating numbers?

Instead of ur.readline() I want to read the specified number of elements in the file

Any suggestion is welcome..

Could you explain why you think you need to read only a specific number of floats instead of reading by lines? The answer to that will help us help you... (for example, would it suffice to read the lines, split on the spaces, and return the required number of elements, converted to floats on the fly?)
– Peter Hansen, Commented Apr 20, 2010 at 23:13
The problem is that the file is big and the actual number of elements is close to 100000, and doing it this way is taking too much time.
– sahel, Commented Apr 20, 2010 at 23:21
@sahel, Have you profiled (docs.python.org/library/profile.html) your code and determined where the bottlenecks are? Can you post your results and the relevant pieces of your code? (If it's some of these things, I can think of some ideas that may help a little.) Can you explain more about the format you are parsing; perhaps there is a better way of handling the file?
– Mike Graham, Commented Apr 20, 2010 at 23:48
@sahel: your code as published won't work; contents = []; j = 0; contents[j] = something ==> IndexError. @Mike Graham: ummm the granularity of profile is the function; I see no functions here.
– John Machin, Commented Apr 20, 2010 at 23:57

Mike Graham · Accepted Answer · 2010-04-20 23:40:06Z

I'm not entirely sure what your goal is from your explanation.

For the record, here is code that does basically the same thing yours seems to be attempting, using techniques I would choose over the ones you have. Relying on while loops and manual indices is usually a sign that you're doing something wrong, and indeed your code does not work: contents[j] = ... raises an IndexError because contents starts out as an empty list.

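# Normalize each line lazily; your_web_page is any iterable of text lines.
lines = (line.strip().lower() for line in your_web_page)

# Advance to the header line, e.g. "points 6 float".
points_line = next(line for line in lines if 'points' in line)
_, node_number, node_type = points_line.split()
node_number = int(node_number)

def get_contents(lines):
    # Yield floats until the "polygons" line ends the section.
    for line in lines:
        if 'polygons' in line:
            break

        for number in line.split():
            yield float(number)

# Continues from where the header search stopped, since lines is one generator.
contents = list(get_contents(lines))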

If you are more explicit about the new thing it is you want to do, maybe someone can provide a better answer for your ultimate goal.
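
If the goal is literally "read exactly N floats", one possible variation on the generator approach above is to cap the stream with itertools.islice rather than watching for the POLYGONS line. The following is only a sketch under stated assumptions: the sample text is copied from the question, the names iter_floats, read_points and COORDS_PER_VERTEX are made up for illustration, and 3 coordinates per vertex is taken from the question's description.

```python
import io
from itertools import islice

# Sample data copied from the question; a real run would iterate over the
# decoded text lines of the downloaded file instead.
SAMPLE = """\
# comment
HEADER TEXT
POINTS 6 float
1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9
1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9
POLYGONS
"""

COORDS_PER_VERTEX = 3  # assumption taken from the question's description


def iter_floats(lines):
    """Yield every whitespace-separated token as a float, line by line."""
    for line in lines:
        for token in line.split():
            yield float(token)


def read_points(lines):
    """Return (node_number, node_type, coords) for the POINTS section."""
    lines = iter(lines)
    # Skip ahead to the header line, e.g. "POINTS 6 float".
    header = next(line for line in lines if line.lower().startswith("points"))
    _, node_number, node_type = header.split()
    node_number = int(node_number)
    wanted = node_number * COORDS_PER_VERTEX
    # islice stops after `wanted` floats, so the POLYGONS line is never parsed.
    coords = list(islice(iter_floats(lines), wanted))
    if len(coords) != wanted:
        raise ValueError("expected %d floats, got %d" % (wanted, len(coords)))
    return node_number, node_type, coords


node_number, node_type, coords = read_points(io.StringIO(SAMPLE))
```

This still reads the file as text lines underneath; it only moves the "how many" bookkeeping out of the loop body. Whether it is any faster than the version above would need to be measured.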

John Machin · Accepted Answer · 2010-04-21 00:21:18Z

Here is a no-fuss cleanup of your code that should make the looping over the contents much faster.

ur=urlopen("http://.../file.dat")
contents = []
node_number = 0
node_type = None
while 1:
    line = ur.readline()
    if not line:
        break
    line = line.lower()       
    if 'points' in line :
        word = line.split()
        node_number = int(word[1])
        node_type = word[2]
        while 1:
            pieces = ur.readline().split()
            if not pieces: continue # or break or issue error message
            if pieces[0].lower() == 'polygons': break
            contents.extend(map(float, pieces))
assert len(contents) == node_number * 3

If you wrap the code in a function and call that, it will run even faster (because you will be accessing local variables instead of global ones).
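
As a rough illustration of that suggestion, the loop could be wrapped along these lines. This is a sketch, not a drop-in replacement: the function name parse_points is made up, the stream is assumed to yield text (under Python 3 the bytes returned by urlopen would have to be decoded first), and an io.StringIO with the question's sample data stands in for the download. It also stops at end of file instead of looping if the POLYGONS line is missing.

```python
import io


def parse_points(stream):
    """Parse the POINTS section from a text stream that has readline()."""
    contents = []
    node_number = 0
    node_type = None
    while True:
        line = stream.readline()
        if not line:
            break  # end of file, no POINTS header found
        if 'points' in line.lower():
            word = line.split()
            node_number = int(word[1])
            node_type = word[2]
            while True:
                line = stream.readline()
                if not line:
                    break  # end of file before POLYGONS
                pieces = line.split()
                if not pieces:
                    continue  # skip blank lines
                if pieces[0].lower() == 'polygons':
                    break
                contents.extend(map(float, pieces))
            break  # the POINTS section has been consumed
    return node_number, node_type, contents


# Stand-in for the downloaded file, using the sample from the question.
sample_stream = io.StringIO(
    "# comment\nHEADER TEXT\nPOINTS 6 float\n"
    "1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9\n"
    "1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9\n"
    "POLYGONS\n"
)
n_nodes, n_type, floats = parse_points(sample_stream)
assert len(floats) == n_nodes * 3
```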

Note that the most significant changes are near/at the end of the script.

HOWEVER: stand back and think about this for a few seconds: how much of the time is taken up by the ur.readline() and how much by unpacking the lines?
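
One way to get a first answer to that question is to time the download and the parsing separately. The sketch below makes several assumptions: the helper names timed, fetch and parse_floats are invented for illustration, the file is assumed to be ASCII text, and because the URL in the question is elided, only the parsing half is actually run here, on synthetic data of roughly the size mentioned in the comments (100000 vertices).

```python
import time
from urllib.request import urlopen  # Python 3 location of urlopen


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def fetch(url):
    """Download the whole file in one call; ASCII content is an assumption."""
    with urlopen(url) as response:
        return response.read().decode("ascii")


def parse_floats(text):
    """Convert every whitespace-separated token in text to a float."""
    return [float(token) for token in text.split()]


# Synthetic stand-in for the coordinate block: 100000 vertices with
# 3 coordinates each, one vertex per line.
body = "1.1 2.2 3.3\n" * 100000

values, parse_seconds = timed(parse_floats, body)
print("parsed %d floats in %.3f s" % (len(values), parse_seconds))

# With a real URL, the download can be timed the same way and compared:
#     text, fetch_seconds = timed(fetch, url)
```

If the parse time turns out to be small next to the fetch time, restructuring the parsing loop is unlikely to help much.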

edited Apr 21, 2010 at 0:21

answered Apr 21, 2010 at 0:04

Mike Graham Over a year ago

@John Machin, Good call with standing back and thinking about it, but it's quite possible we're not standing far enough back yet.
