I have a big (75MB) datafile (EMP.txt) which looks like

01ABCD      FIT        PROGRAMMER30000EFGH            
02IJK     LMMACCOUNTS  MANAGER   50000OPQRST   UV

and so on. I have a structure file (EMPSTRU.txt) of the datafile which looks like

001 EMPID LENGTH 2
002 EMPNAME LENGTH 10
003 SEX LENGTH 1
004 DEPARTMENT LENGTH 10
005 DESIGNATION LENGTH 10
006 SALARY LENGTH 5
007 SUPERNAME LENGTH 10

How do I convert the datafile to CSV format? I am currently using slicing to extract each field from the datafile, but there are at least 150 field names. Is there a better way in Python to get the column names? Currently I am typing them manually, like

EMPID = Dataline[0:2]

Please help. Thanks.

1 Answer

You can parse the EMPSTRU.txt file directly into a format string usable by struct.unpack. For example,

import struct
print(struct.unpack("2s3s2s", b"abcdefg"))

outputs

(b'ab', b'cde', b'fg')

In Python 3, struct.unpack takes a bytes object and returns a tuple of bytes fields.

In your case it should require something like this (untested):

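import struct

rdef = ""        # struct format string, built up as "2s10s1s..."
colnames = []    # field names, in file order
with open("EMPSTRU.txt") as f:
    for L in f:
        L = L.strip()
        if not L:
            continue  # skip blank lines
        lpos = L.rindex(" LENGTH ")
        rdef += L[lpos+8:] + "s"      # field length followed by "s" (bytes field)
        colnames.append(L[4:lpos])    # skip the 3-digit field number and the space after it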
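
As a rough check of what that loop should produce, here is a small self-contained sketch that runs the same idea over the seven structure lines quoted in the question. The helper name build_layout and the STRUCTURE_LINES list are made up for illustration, and it assumes every line has the form "NNN NAME LENGTH n".

```python
import struct

# The seven structure lines quoted in the question; a real run would read EMPSTRU.txt.
STRUCTURE_LINES = [
    "001 EMPID LENGTH 2",
    "002 EMPNAME LENGTH 10",
    "003 SEX LENGTH 1",
    "004 DEPARTMENT LENGTH 10",
    "005 DESIGNATION LENGTH 10",
    "006 SALARY LENGTH 5",
    "007 SUPERNAME LENGTH 10",
]


def build_layout(lines):
    """Return (struct format string, list of column names). Hypothetical helper."""
    fmt_parts = []
    names = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        # Split on the last " LENGTH " so the field length is everything after it.
        head, _, length = line.rpartition(" LENGTH ")
        names.append(head.split(None, 1)[1])  # drop the leading field number
        fmt_parts.append(f"{int(length)}s")   # e.g. "10s" = a 10-byte field
    return "".join(fmt_parts), names


rdef, colnames = build_layout(STRUCTURE_LINES)
print(rdef)
print(colnames)
print(struct.calcsize(rdef))  # total record length in bytes
```

struct.calcsize(rdef) gives the record length the format expects, which is a handy sanity check against the actual line length in EMP.txt before converting the whole file.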

Then you can extract a data record (a bytes object exactly struct.calcsize(rdef) bytes long, without the trailing newline) with:

content = struct.unpack(rdef, record)

and write it to the destination file as a tab-separated line (with out opened in binary mode, since the fields are bytes) with

out.write(b"\t".join(content) + b"\n")
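
Since the question asks for CSV rather than tab-separated output, one possible variation is to hand the unpacked fields to the standard csv module, which takes care of quoting. The sketch below is only an illustration: the function name fixed_width_to_csv, the latin-1 encoding, the stripping of padding spaces, and the three-field demo layout are all assumptions, not something taken from the question.

```python
import csv
import io
import struct


def fixed_width_to_csv(rdef, colnames, data_lines, out, encoding="latin-1"):
    """Write a header row plus one CSV row per fixed-width record.

    data_lines yields bytes (e.g. a file opened with "rb");
    out is a text file object (open real files with newline="").
    """
    record = struct.Struct(rdef)  # compile the format once for all records
    writer = csv.writer(out)
    writer.writerow(colnames)
    for raw in data_lines:
        raw = raw.rstrip(b"\r\n")
        if not raw:
            continue  # skip empty lines
        # Pad short lines in case trailing spaces were trimmed;
        # a line longer than the layout still raises struct.error.
        raw = raw.ljust(record.size)
        fields = record.unpack(raw)
        # Assumption: padding spaces around each value are not wanted in the CSV.
        writer.writerow(f.decode(encoding).strip() for f in fields)


# Tiny in-memory demonstration with a made-up three-field layout.
demo_out = io.StringIO()
fixed_width_to_csv(
    "2s5s3s",
    ["ID", "NAME", "AMT"],
    [b"01ANN  100\n", b"02BOB, 20\n"],
    demo_out,
)
print(demo_out.getvalue())
```

For the real files, the same function could be called with open("EMP.txt", "rb") as the data source and open("EMP.csv", "w", newline="") as the destination, using the rdef and colnames built from EMPSTRU.txt. Iterating over the file line by line keeps memory use small even for a 75MB input.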