Parsing data from file and storing in an array

Question

I am trying to parse data from a file that contains two sets of data. The file has header information in its first 40 lines, followed by 1000 lines of two-column data. A second file with the same format has been appended to it: lines 1041 through 1080 hold the second file's header information, followed by another 1000 lines of two-column data. The first column is the same in both sections. Therefore, I want to parse the file, skip the header sections, and save the data to a 3-column, 1000-row array.

The file is organized as:

Line 1: //Header information
Line 2: //Header information
...
Line 40: 1.000e3 -4.000e-3
Line 41: 1.001e3 -4.324e-3
...
Line 1000: 10.000e3 -78.678e-3
Line 1001: //Header Information
Line 1002: //Header Information
...
Line 1041: 1.000e3 -16.000e-3
Line 1042: 1.001e3 -14.324e-3
...
Line 2000: 10.000e3 -22.178e-3
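Going by the counts stated in the prose (exactly 40 header lines and 1000 data lines per section), the block boundaries can be computed instead of counted by hand; the example line numbers in the layout do not quite match those counts. This is a small sketch using a hypothetical helper, `block_ranges`, with those counts as defaults:

```python
def block_ranges(n_sections=2, header_len=40, data_len=1000):
    """Return 1-based inclusive (first, last) line ranges for each block.

    Assumes every section is a fixed-size header followed by fixed-size data.
    """
    ranges = []
    start = 1
    for _ in range(n_sections):
        header = (start, start + header_len - 1)
        data = (header[1] + 1, header[1] + data_len)
        ranges.append({"header": header, "data": data})
        start = data[1] + 1  # next section begins right after this data block
    return ranges

for i, r in enumerate(block_ranges(), 1):
    print("section", i, "header lines", r["header"], "data lines", r["data"])
```

Under those assumptions the second header occupies lines 1041 through 1080 and its data runs from line 1081 to line 2080.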

I want to parse only the columned data and output it to an array with the format:

[1.000e3, -4.000e-3, -16.000e-3]
[1.001e3, -4.324e-3, -14.324e-3]
...
[10.000e3, -78.678e-3, -22.178e-3]
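The merge step on its own can be sketched as follows, assuming each section has already been parsed into a list of `(x, y)` float pairs; `merge_sections` is a hypothetical name. It checks that the shared first column really matches before combining rows:

```python
def merge_sections(first, second):
    """Combine two lists of (x, y) pairs into rows of [x, y1, y2].

    Raises ValueError if the sections differ in length or in their x values.
    """
    if len(first) != len(second):
        raise ValueError("sections have different lengths")
    rows = []
    for (x1, y1), (x2, y2) in zip(first, second):
        if x1 != x2:
            raise ValueError("first columns differ: %r != %r" % (x1, x2))
        rows.append([x1, y1, y2])
    return rows

# Values taken from the example rows in the question.
a = [(1.000e3, -4.000e-3), (1.001e3, -4.324e-3)]
b = [(1.000e3, -16.000e-3), (1.001e3, -14.324e-3)]
print(merge_sections(a, b))
```

Exact comparison of the x values is reasonable here because both columns are parsed from identical text; if they came from computation, a tolerance would be safer.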

I have tried the following:

DATA = [[0 for x in xrange(3)] for x in xrange(10000)]

for i in sort(os.listdir('.')):
    for lines in range(0, 39):
        dataFile.readline()

    for lines in range(0, 10000):
        readData = dataFile.readline()
        dataLine = readData.split()
        DATA[0].append(dataLine[0])
        DATA[1].append(dataLine[1])

    for lines in range(0, 39):
        dataFile.readline()

    for lines in range(0, 10000):
        readData = dataFile.readline()
        dataLine = readData.split()
        DATA[2].append(dataLine[1])

    dataFile.close()

Thanks for your help in advance.

Your first problem is that range(0, 39) only has 39 values, not 40. And range(0, 10000) obviously has 10000 values, not 1000. – abarnert, May 4, 2015 at 23:54
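The off-by-one is easy to confirm: `range(0, stop)` yields `stop` values starting at 0, so skipping 40 header lines needs `range(0, 40)` (or simply `range(40)`). A quick check, using an `io.StringIO` object as a hypothetical stand-in for the real data file:

```python
import io

print(len(range(0, 39)))  # 39 values: 0 through 38
print(len(range(0, 40)))  # 40 values: 0 through 39

# Hypothetical stand-in file: 40 header lines followed by one data line.
fake_file = io.StringIO(
    "".join("//header %d\n" % n for n in range(40)) + "1.000e3 -4.000e-3\n"
)
for _ in range(40):
    fake_file.readline()  # discard exactly 40 header lines
first_data = fake_file.readline().split()
print(first_data)  # ['1.000e3', '-4.000e-3']
```

With `range(0, 39)` instead, the first "data" line read would be the last header line.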
Also, don't pre-fill the lists with 0's and then append onto the end; that's going to give you 1000 0's plus 1000 values, instead of just the 1000 values. – abarnert, May 4, 2015 at 23:55
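The same effect in miniature: appending to a pre-filled list keeps the placeholder zeros in front. In the question's code, `DATA` is built as 10000 rows of three zeros, so `DATA[0]` is the first row, `[0, 0, 0]`, and every append lands after those zeros rather than in a column. Starting from empty lists avoids this; the values below are placeholders:

```python
# Pre-filled list: the zeros are already there, so appends go after them.
prefilled = [0, 0, 0]
for value in ["a", "b", "c"]:
    prefilled.append(value)
print(prefilled)  # [0, 0, 0, 'a', 'b', 'c']

# Start empty instead, with one list per output column.
columns = [[], [], []]
for value in ["a", "b", "c"]:
    columns[0].append(value)
print(columns)  # [['a', 'b', 'c'], [], []]
```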
Also, if there are only 2 concatenated files, how are you going to read 3 sets of rows and headers? – abarnert, May 4, 2015 at 23:57
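Taking those comments together, a corrected version of the question's readline-based approach might look like the sketch below. It assumes exactly two sections of 40 header lines plus 1000 data lines each, as described in the question; `read_section` and `parse_file` are hypothetical names, and values stay as strings, as in the original attempt:

```python
import io

HEADER_LINES = 40   # header lines per section (from the question)
DATA_LINES = 1000   # data lines per section (from the question)

def read_section(data_file, header_lines=HEADER_LINES, data_lines=DATA_LINES):
    """Skip one header block, then return the split data lines of that section."""
    for _ in range(header_lines):
        data_file.readline()
    return [data_file.readline().split() for _ in range(data_lines)]

def parse_file(data_file, header_lines=HEADER_LINES, data_lines=DATA_LINES):
    """Return rows of [x, y1, y2] from a file holding two appended sections."""
    first = read_section(data_file, header_lines, data_lines)
    second = read_section(data_file, header_lines, data_lines)
    # The x column is shared, so keep it once and take y from each section.
    return [[a[0], a[1], b[1]] for a, b in zip(first, second)]

# Small stand-in file: 2 header lines and 2 data lines per section,
# using the example values from the question.
sample = io.StringIO(
    "//header\n//header\n1.000e3 -4.000e-3\n1.001e3 -4.324e-3\n"
    "//header\n//header\n1.000e3 -16.000e-3\n1.001e3 -14.324e-3\n"
)
rows = parse_file(sample, header_lines=2, data_lines=2)
for row in rows:
    print(row)
```

For the real file, something like `with open("input.txt") as data_file: rows = parse_file(data_file)` would use the default counts; the filename is a placeholder.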

Accepted Answer (score: 2)

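from itertools import islice

def get_headers_and_columns(fhandle):
    # Consume the next 40 lines as the header block, then the next 1000 lines
    # as data; transpose the split data rows into a list of column tuples.
    headers = list(islice(fhandle, 0, 40))
    columns = list(zip(*map(str.split, islice(fhandle, 0, 1000))))
    return headers, columns

with open("input.txt") as f_in, open("output.txt", "w") as f_out:
    headers, columns = get_headers_and_columns(f_in)
    headers2, columns2 = get_headers_and_columns(f_in)
    # The first column is shared, so only the second file's value column is kept.
    columns.append(columns2[-1])
    # Transpose back into rows and write one space-separated row per line.
    f_out.write("\n".join(map(" ".join, zip(*columns))))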

That is one way you could accomplish this ... at least I think it will work.

answered May 4, 2015 at 23:50 by Joran Beasley; edited May 4, 2015 at 23:57 by abarnert
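To check the answer's approach without real files, the same logic can be run against in-memory streams. This sketch shrinks the sections to 2 header and 2 data lines through hypothetical parameters `header_len` and `data_len` (the answer hard-codes 40 and 1000), and uses `io.StringIO` in place of `input.txt` and `output.txt`:

```python
import io
from itertools import islice

def get_headers_and_columns(fhandle, header_len=40, data_len=1000):
    # Same logic as the answer, with the block sizes made configurable.
    headers = list(islice(fhandle, header_len))
    columns = list(zip(*map(str.split, islice(fhandle, data_len))))
    return headers, columns

f_in = io.StringIO(
    "//header\n//header\n1.000e3 -4.000e-3\n1.001e3 -4.324e-3\n"
    "//header\n//header\n1.000e3 -16.000e-3\n1.001e3 -14.324e-3\n"
)
f_out = io.StringIO()
headers, columns = get_headers_and_columns(f_in, 2, 2)
headers2, columns2 = get_headers_and_columns(f_in, 2, 2)
columns.append(columns2[-1])  # add the second section's value column
f_out.write("\n".join(map(" ".join, zip(*columns))))
print(f_out.getvalue())
```

Because `islice` stops as soon as it has produced the requested number of lines, the second call picks up exactly where the first one left off.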


Comments

You meant fhandle, not f_in, in the second islice, right? (It should still work, because the only argument you'll ever get is the f_in global anyway, but…) – abarnert

yeah ... copy/paste fail :P – Joran Beasley

You also changed one of the files to binary mode and left the other as text; I just edited out the b. – abarnert

I'm upvoting because I believe you that this will solve the problem, but I can't understand what the OQA is asking, so... – Adam Smith

yeh ... i figured it was just text anyway ... binary mode shouldn't have hurt it, and I am just always in the habit of opening for write in bin mode :P @abarnert – Joran Beasley
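On the binary-mode point: that holds under Python 2, which the question's `xrange` suggests, but in Python 3 a file opened with `"wb"` only accepts bytes, so writing the joined `str` would raise `TypeError`. A small illustration, using `io.BytesIO` as a stand-in for a file opened in binary mode:

```python
import io

text = "1.000e3 -4.000e-3 -16.000e-3"

binary_out = io.BytesIO()  # behaves like a file opened with "wb"
raised = False
try:
    binary_out.write(text)  # str is rejected by a binary stream
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

binary_out.write(text.encode("ascii"))  # encoding to bytes first works
print(binary_out.getvalue())
```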
