1

I am reading a text file from the web. The file starts with some header lines containing the number of data points, followed by the actual vertices (3 coordinates each). The file looks like:

# comment
HEADER TEXT
POINTS 6 float
1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9
1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9
POLYGONS

The line starting with the word POINTS contains the number of vertices (in this case there are 3 vertices per line, but that could change).

This is how I am reading it right now:

ur=urlopen("http://.../file.dat")

j=0
contents = []
while 1:
    line = ur.readline()
    if not line:
        break
    else:
        line=line.lower()       

    if 'points' in line :
        myline=line.strip()
        word=myline.split()
        node_number=int(word[1])
        node_type=word[2]

        while 'polygons'  not in line :
            line = ur.readline()
            line=line.lower() 
            myline=line.split()

            i=0
            while(i<len(myline)):                    
                contents[j]=float(myline[i])
                i=i+1
                j=j+1

How can I read a specified number of floats instead of reading line by line as strings and converting to floating numbers?

Instead of calling ur.readline(), I want to read the specified number of elements from the file.

Any suggestions are welcome.

  • Could you explain why you think you need to read only a specific number of floats instead of reading by lines? The answer to that will help us help you... (for example, would it suffice to read the lines, split on the spaces, and return the required number of elements, converted to floats on the fly?) Commented Apr 20, 2010 at 23:13
  • The problem is that the file is big and the actual number of elements is close to 100000, and doing it this way is taking too much time. Commented Apr 20, 2010 at 23:21
  • @sahel, Have you profiled (docs.python.org/library/profile.html) your code and determined where the bottlenecks are? Can you post your results and the relevant pieces of your code? (If it's some of these things, I can think of some ideas that may help a little.) Can you explain more about the format you are parsing; perhaps there is a better way of handling the file? Commented Apr 20, 2010 at 23:48
  • I am trying to read the VTK format. Commented Apr 20, 2010 at 23:55
  • @sahel: your code as published won't work; contents = []; j = 0; contents[j] = something ==> IndexError. @Mike Graham: ummm the granularity of profile is the function; I see no functions here. Commented Apr 20, 2010 at 23:57

2 Answers

3

I'm not entirely sure what your goal is from your explanation.

For the record, here is code that does basically the same thing yours seems to be attempting, using techniques I would choose over the ones you picked. Relying on while loops and manual indices is usually a sign that you're doing something wrong, and indeed your code does not work: contents[j] = ... raises an IndexError because contents starts out as an empty list.

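# Lazily normalise each line; your_web_page is any iterable of text lines.
lines = (line.strip().lower() for line in your_web_page)

# Skip ahead to the header line, e.g. "points 6 float".
points_line = next(line for line in lines if 'points' in line)
_, node_number, node_type = points_line.split()
node_number = int(node_number)

def get_contents(lines):
    # Yield every number up to (but not including) the 'polygons' line.
    for line in lines:
        if 'polygons' in line:
            break

        for number in line.split():
            yield float(number)

contents = list(get_contents(lines))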

If you are more explicit about the new thing it is you want to do, maybe someone can provide a better answer for your ultimate goal.
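If the goal is literally "read exactly N floats", one possible sketch is to treat the input as a stream of whitespace-separated tokens and take a fixed number of them with itertools.islice. The helper names iter_tokens and read_floats and the in-memory sample are made up for illustration, and the sketch assumes the lines are already text strings (bytes from a network response would need decoding first).

```python
from itertools import islice

def iter_tokens(lines):
    """Yield whitespace-separated tokens from an iterable of text lines."""
    for line in lines:
        for token in line.split():
            yield token

def read_floats(lines, count):
    """Consume exactly `count` floats from `lines`; raise if the data runs short."""
    values = [float(tok) for tok in islice(iter_tokens(lines), count)]
    if len(values) != count:
        raise ValueError("expected %d floats, got %d" % (count, len(values)))
    return values

# Hypothetical in-memory stand-in for the downloaded file.
sample = iter([
    "# comment\n",
    "HEADER TEXT\n",
    "POINTS 6 float\n",
    "1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9\n",
    "1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9\n",
    "POLYGONS\n",
])

# Skip ahead to the POINTS header, then read 3 coordinates per point.
header = next(line for line in sample if line.upper().startswith("POINTS"))
node_number = int(header.split()[1])
contents = read_floats(sample, node_number * 3)
```

This still reads line by line underneath, so it is a convenience rather than a speed-up. Note also that if the requested count ends in the middle of a line, the remaining tokens on that line are discarded.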


0

Here is a no-fuss cleanup of your code that should make the looping over the contents much faster.

ur=urlopen("http://.../file.dat")
contents = []
node_number = 0
node_type = None
while 1:
    line = ur.readline()
    if not line:
        break
    line = line.lower()       
    if 'points' in line :
        word = line.split()
        node_number = int(word[1])
        node_type = word[2]
        while 1:
            line = ur.readline()
            if not line: break # EOF reached before POLYGONS
            pieces = line.split()
            if not pieces: continue # skip blank lines
            if pieces[0].lower() == 'polygons': break
            contents.extend(map(float, pieces))
assert len(contents) == node_number * 3

If you wrap the code in a function and call that, it will run even faster (because you will be accessing local variables instead of global ones).

Note that the most significant changes are near/at the end of the script.
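As a rough sketch of the "wrap it in a function" suggestion, the same logic could look like the following. The function name parse_points is made up, and the sketch assumes it is handed an iterable of text lines rather than the urlopen object itself, which also makes it easy to try on in-memory data.

```python
def parse_points(lines):
    """Return (node_number, node_type, contents) from an iterable of text lines."""
    lines = iter(lines)
    node_number = 0
    node_type = None
    contents = []
    extend = contents.extend  # local alias avoids repeated attribute lookups
    for line in lines:
        # Match only lines that begin with POINTS, so comments mentioning
        # the word are not mistaken for the header.
        if line.lower().startswith("points"):
            word = line.split()
            node_number = int(word[1])
            node_type = word[2].lower()
            for line in lines:
                pieces = line.split()
                if not pieces:
                    continue  # skip blank lines
                if pieces[0].lower() == "polygons":
                    break
                extend(map(float, pieces))
            break
    return node_number, node_type, contents

# Hypothetical sample mirroring the file layout from the question.
sample_lines = [
    "# comment\n",
    "HEADER TEXT\n",
    "POINTS 6 float\n",
    "1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9\n",
    "1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9\n",
    "POLYGONS\n",
]
node_number, node_type, contents = parse_points(sample_lines)
```

Everything inside parse_points is a local name, which is where the speed benefit mentioned above would come from.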

HOWEVER: stand back and think about this for a few seconds: how much of the time is taken up by the ur.readline() and how much by unpacking the lines?
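One way to answer that question is to time the two steps separately. The sketch below is only an illustration: fetch_lines is a hypothetical stand-in that builds lines in memory, and in practice it would be replaced by whatever reads every line from the URL.

```python
import time

def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

def fetch_lines():
    """Hypothetical stand-in for the download step; replace with the real read."""
    return ["1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9\n"] * 10000

def parse(lines):
    """The unpacking step: split each line and convert every token to float."""
    values = []
    for line in lines:
        values.extend(map(float, line.split()))
    return values

lines, fetch_seconds = timed(fetch_lines)
values, parse_seconds = timed(parse, lines)
print("fetch: %.4fs  parse: %.4fs  floats: %d"
      % (fetch_seconds, parse_seconds, len(values)))
```

If the fetch step dominates, no amount of tuning the float conversion will help much, and the effort is better spent on how the data is transferred.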

1 Comment

@John Machin, Good call with standing back and thinking about it, but it's quite possible we're not standing far enough back yet.
