Python file parsing -> IndexError

Question

I am parsing through an ISI file with a few hundred records that all begin with a 'PT J' tag and end with an 'ER' tag. I am trying to pull the tagged info from each record within a nested loop but keep getting an IndexError. I know why I am getting it, but does anyone have a better way of identifying the start of new records than checking the first few characters?

    while file:
        while line[1] + line[2] + line[3] + line[4] != 'PT J':
            ...                
            Search through and record data from tags
            ...

I am using this same method and therefore occasionally getting the same problem with identifying tags, so if you have any suggestions for that as well I would greatly appreciate it!

Sample data, which you'll notice does not always include every tag for each record, is:

    PT J
    AF Bob Smith
    TI Python For Dummies
    DT July 4, 2012
    ER

    PT J
    TI Django for Dummies
    DT 4/14/2012
    ER

    PT J
    AF Jim Brown
    TI StackOverflow
    ER

I would like to point out that I am converting this to a .txt as well before reading it.
– MTP, Commented Jul 6, 2012 at 2:47

Ashwini Chaudhary · Accepted Answer · 2012-07-06 03:08:23Z

3

    with open('data1.txt') as f:
        for line in f:
            if line.strip() == 'PT J':  # start of a record
                for line in f:  # keeps reading from the same file iterator
                    stripped = line.strip()
                    if stripped == 'ER':
                        # this record ends here; move on to the next record
                        break
                    elif stripped:
                        # do something with the data
                        print(stripped)
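As a hedged illustration of the nested-loop pattern above, the sketch below wraps it in a generator that hands back one record at a time. The name `iter_records` and the choice to represent a record as a list of `(tag, value)` pairs are assumptions made for this example, not part of the original answer.

```python
import io


def iter_records(lines):
    """Yield each 'PT J' ... 'ER' record as a list of (tag, value) pairs.

    `lines` may be any iterable of strings, for example an open file.
    """
    lines = iter(lines)              # both loops must share one iterator
    for line in lines:
        if line.strip() != 'PT J':
            continue                 # skip anything outside a record
        record = []
        for line in lines:           # resumes where the outer loop stopped
            stripped = line.strip()
            if stripped == 'ER':
                break                # end of this record
            if stripped:
                tag, _, value = stripped.partition(' ')
                record.append((tag, value))
        yield record


# Usage with two records taken from the sample data in the question.
sample = io.StringIO(
    "PT J\nAF Bob Smith\nTI Python For Dummies\nDT July 4, 2012\nER\n"
    "\n"
    "PT J\nTI Django for Dummies\nDT 4/14/2012\nER\n"
)
for record in iter_records(sample):
    print(record)
```

Because the inner loop consumes lines from the same iterator, the outer loop picks up again right after each `ER` line. A record that is missing its `ER` line at the end of the file is still yielded with whatever was read.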

edited Jul 6, 2012 at 3:08

answered Jul 6, 2012 at 3:00


1 Comment

MTP Over a year ago

I think I see what's going on here. However, how would I access different lines to manipulate or test them? Since line is acting as an iterator, we can't say something like line=file.readline() within the nested 'if' statement. What would replace line=file.readline() to let me get to specific lines? I ask because in some instances there are multiple entities per tag.
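On the question raised in this comment: the file object is the iterator (`line` is only the current string), so `next(f, None)` can pull a further line at any point inside the loops. The sketch below is one hedged way to collect several entries under one tag. It assumes that a continuation line starts with whitespace, which the sample data in the question does not confirm, and both the helper name `parse_record` and the two-author input are made up for illustration.

```python
import io


def parse_record(lines):
    """Read lines up to 'ER' and return a dict of {tag: [values]}.

    Assumption: a line that starts with whitespace continues the
    previous tag (for example a second author under 'AF').
    `lines` must be an iterator, such as an open file.
    """
    fields = {}
    current = None
    while True:
        line = next(lines, None)     # explicit stand-in for file.readline()
        if line is None or line.strip() == 'ER':
            return fields            # end of file or end of record
        if not line.strip():
            continue                 # ignore blank lines
        if line[0].isspace() and current is not None:
            fields[current].append(line.strip())
        else:
            current, _, value = line.strip().partition(' ')
            fields.setdefault(current, []).append(value)


# Hypothetical input with two authors under a single 'AF' tag.
sample = io.StringIO(
    "PT J\nAF Smith, Bob\n   Brown, Jim\nTI Python For Dummies\nER\n"
)
for line in sample:
    if line.strip() == 'PT J':
        print(parse_record(sample))
```

Storing every value in a list keeps single-valued and multi-valued tags uniform, at the cost of a `[0]` lookup for tags that only ever appear once.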

Marius · Accepted Answer · 2012-07-06 02:51:46Z

2

Do the 'ER' lines only contain 'ER'? That would be why you're getting IndexErrors, because line[4] doesn't exist.

The first thing to try would be:

    while not line.startswith('PT J'):

instead of your existing while loop.

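Also, slices do the same job as concatenating individual characters:

    line[1] + line[2] + line[3] + line[4] == line[1:5]

(The ends of slices are noninclusive. Note also that Python indexes from 0, so the first four characters of a line are line[0:4], and that a slice reaching past the end of a short line returns a shorter string instead of raising an IndexError.)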
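To make the difference concrete, here is a small sketch comparing character indexing, slicing, and `startswith` on a short 'ER' line. The literal strings stand in for lines read from the file and are assumptions for illustration only.

```python
line = 'ER\n'   # an end-of-record line as it would come from the file

# Indexing past the end of the string raises IndexError.
try:
    line[1] + line[2] + line[3] + line[4]
except IndexError as exc:
    print('indexing failed:', exc)

# Slicing past the end simply returns whatever characters exist.
print(repr(line[1:5]))

# startswith never raises on a short string; it just returns False.
print(line.startswith('PT J'))

# For a real record start, both forms agree once the slice begins at 0.
start = 'PT J\n'
print(start.startswith('PT J'), start[0:4] == 'PT J')
```

Either `startswith` or a slice removes the IndexError; `startswith` has the small advantage that the length of the tag does not need to be counted by hand.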

answered Jul 6, 2012 at 2:51


2 Comments

Klaus-Dieter Warzecha Over a year ago

Yes, 'ER' (End of Record) lines typically do not contain anything else, not even trailing spaces.

MTP Over a year ago

I like your suggestion...I will have to play more with it.

Levon · Accepted Answer · 2012-07-06 03:36:52Z

You could try an approach like this to read through your file.

    with open('data.txt') as f:
        for line in f:
            line = line.split()  # split the line into a list of character sequences
                                 # separated by whitespace (blanks, tabs)
            llen = len(line)
            if llen == 2 and line[0] == 'PT' and line[1] == 'J':  # found start of record
                # process
                # examine line[0] for 'tags', such as "AF", "TI", "DT", and proceed
                # as dictated by your needs, e.g.:
                pass

            if llen > 1 and line[0] == 'AF':  # grab first/last name in line[1] and line[2]
                # The data will be on the same line and
                # accessible via the correct index values.
                pass

            if llen == 1 and line[0] == 'ER':  # found end of record
                pass

This definitely needs more "programming logic" (most likely nested loops or, better yet, calls to functions) to put everything in the right order, but the basic constructs are there, and I hope they get you started and give you some ideas.
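One possible way to add that logic with function calls is sketched below: each line is split on whitespace as above, and a small lookup table decides which function handles which tag. The names `parse_file`, `parse_author`, `parse_text`, and `HANDLERS` are hypothetical, and returning each record as a dict is an assumption rather than anything the answer prescribes.

```python
import io


def parse_author(words):
    """Return the name parts after 'AF' as a tuple, e.g. ('Bob', 'Smith')."""
    return tuple(words)


def parse_text(words):
    """Default handler: rejoin the rest of the line as plain text."""
    return ' '.join(words)


HANDLERS = {'AF': parse_author}      # hypothetical tag -> function table


def parse_file(f):
    """Return a list of dicts, one per 'PT J' ... 'ER' record."""
    records, current = [], None
    for line in f:
        words = line.split()
        if not words:
            continue                         # blank line between records
        tag, rest = words[0], words[1:]
        if tag == 'PT' and rest == ['J']:
            current = {}                     # start of a new record
        elif tag == 'ER' and not rest:
            if current is not None:
                records.append(current)      # end of the current record
            current = None
        elif current is not None:
            handler = HANDLERS.get(tag, parse_text)
            current[tag] = handler(rest)
    return records


# Usage with two records taken from the sample data in the question.
sample = io.StringIO(
    "PT J\nAF Bob Smith\nTI Python For Dummies\nDT July 4, 2012\nER\n"
    "\n"
    "PT J\nTI Django for Dummies\nDT 4/14/2012\nER\n"
)
for record in parse_file(sample):
    print(record)
```

Tags without an entry in the table fall back to `parse_text`, so records that omit a tag or carry an unexpected one are still parsed. Note that splitting and rejoining collapses runs of whitespace inside a value.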
