
I am trying to parse a file that contains two sets of data. The first 40 lines of the file are header information, followed by 1000 lines of two-column data. A second file with the same format has been appended to it. That is, lines 1041 through 1081 have the second file's header information, followed by 1000 lines of two-column data. The first column is the same for both sections of data. Therefore, I want to parse the file, remove the header sections, and save the data to a 3x1000 array.

The file is organized as:

Line 1: //Header information

Line 2: //Header information

...

Line 40: 1.000e3 -4.000e-3

Line 41: 1.001e3 -4.324e-3

...

Line 1000: 10.000e3 -78.678e-3

Line 1001: //Header Information

Line 1002: //Header Information

Line 1041: 1.000e3 -16.000e-3

Line 1042: 1.001e3 -14.324e-3

...

Line 2000: 10.000e3 -22.178e-3

I want to parse only the column data and output it to an array with the format:

[1.000e3, -4.000e-3, -16.000e-3]

[1.001e3, -4.324e-3, -14.324e-3]

...

[10.000e3, -78.678e-3, -22.178e-3]

I have tried the following:

DATA = [[0 for x in xrange(3)] for x in xrange(10000)]

for i in sort(os.listdir('.')):

for lines in range(0, 39):
        dataFile.readline()

for lines in range(0, 10000):
        readData = dataFile.readline()
        dataLine = readData.split()
        DATA[0].append(dataLine[0])
        DATA[1].append(dataLine[1])

for lines in range(0, 39):
        dataFile.readline()

for lines in range(0, 10000):
        readData = dataFile.readline()
        dataLine = readData.split()
        DATA[2].append(dataLine[1])

dataFile.close()

Thanks for your help in advance.

  • Your first problem is that range(0, 39) only has 39 values, not 40. And range(0, 10000) obviously has 10000 lines, not 1000. Commented May 4, 2015 at 23:54
  • Also, don't pre-fill the lists with 0's and then append onto the end; that's going to give you 1000 0's plus 1000 values, instead of just the 1000 values. Commented May 4, 2015 at 23:55
  • Also, if there are only 2 concatenated files, how are you going to read 3 sets of rows and headers? Commented May 4, 2015 at 23:57
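
To make the points in these comments concrete, here is a minimal sketch of the loop-based approach with the counts corrected. It assumes 40 header lines and 1000 data lines per section, as the question describes. The function name read_two_sections and its parameters n_header and n_data are hypothetical, introduced only for illustration. It starts from an empty list rather than one pre-filled with zeros, and converts the values to floats.

```python
def read_two_sections(path, n_header=40, n_data=1000):
    """Return rows of [x, y1, y2] from a file holding two concatenated sections.

    Assumption: each section is n_header header lines followed by
    n_data lines of two whitespace-separated columns.
    """
    rows = []  # start empty instead of pre-filling with zeros
    with open(path) as data_file:
        # First section: skip the header, then keep both columns.
        for _ in range(n_header):
            data_file.readline()
        for _ in range(n_data):
            x, y1 = data_file.readline().split()
            rows.append([float(x), float(y1)])
        # Second section: skip its header, then keep only the second column.
        for _ in range(n_header):
            data_file.readline()
        for row in rows:
            _x, y2 = data_file.readline().split()
            row.append(float(y2))
    return rows
```

Calling read_two_sections with the path of the combined file should give a list of 1000 three-element rows, provided the file really has the layout described.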

1 Answer

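from itertools import islice

def get_headers_and_columns(fhandle):
    # Consume 40 header lines, then transpose the next 1000 data lines into columns.
    headers = list(islice(fhandle, 40))
    # list() is needed on Python 3, where zip returns an iterator.
    columns = list(zip(*map(str.split, islice(fhandle, 1000))))
    return headers, columns

with open("input.txt") as f_in, open("output.txt", "w") as f_out:
    headers, columns = get_headers_and_columns(f_in)
    headers2, columns2 = get_headers_and_columns(f_in)
    # Keep the shared first column once and add the second section's value column.
    columns.append(columns2[-1])
    # Transpose back to rows and write one space-separated row per line.
    f_out.write("\n".join(map(" ".join, zip(*columns))))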

is one way you could accomplish this ... at least I think that will work
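
Since the question asks for an array rather than an output file, here is a hedged variation on the same islice idea that keeps the result in memory. The helper names read_section and merge_sections and the parameters n_header and n_data are hypothetical. The sketch assumes each section is exactly 40 header lines followed by 1000 two-column lines, as described in the question.

```python
from itertools import islice

def read_section(fhandle, n_header=40, n_data=1000):
    """Skip one header block and return the data block as lists of float columns."""
    # Discard the header lines of this section.
    for _ in islice(fhandle, n_header):
        pass
    # Parse each data line into floats, then transpose rows into columns.
    rows = (map(float, line.split()) for line in islice(fhandle, n_data))
    return [list(col) for col in zip(*rows)]

def merge_sections(path, n_header=40, n_data=1000):
    """Return rows of [x, y1, y2] built from two concatenated sections."""
    with open(path) as fhandle:
        x, y1 = read_section(fhandle, n_header, n_data)
        # The first column repeats in the second section, so it is dropped.
        _x2, y2 = read_section(fhandle, n_header, n_data)
    return [list(row) for row in zip(x, y1, y2)]
```

Something like data = merge_sections("input.txt") would then hold one [x, y1, y2] row per data line, and it can be passed to an array library if one is needed.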


7 Comments

You meant fhandle, not f_in, in the second islice, right? (It should still work, because the only argument you'll ever get is the f_in global anyway, but…)
yeah ... copy/paste fail :P
You also changed one of the files to binary mode and left the other as text; I just edited out the b.
I'm upvoting because I believe you that this will solve the problem, but I can't understand what the OQA is asking, so...
yeh ... i figured it was just text anyway ... binary mode shouldnt have hurt it and I am just always in the habit of opening for write in bin mode :P @abarnert