Python Parse Structured Datafile to CSV

Question

I have a large (75 MB) data file (EMP.txt) that looks like this:

01ABCD      FIT        PROGRAMMER30000EFGH            
02IJK     LMMACCOUNTS  MANAGER   50000OPQRST   UV

and so on. I also have a structure file (EMPSTRU.txt) describing the data file, which looks like this:

001 EMPID LENGTH 2
002 EMPNAME LENGTH 10
003 SEX LENGTH 1
004 DEPARTMENT LENGTH 10
005 DESIGNATION LENGTH 10
006 SALARY LENGTH 5
007 SUPERNAME LENGTH 10

How do I parse the data file into CSV format? I am using slicing to extract the fields from each line, but there are at least 150 fields. Is there a better way in Python to get the column names? Currently I am typing them manually, like this:

EMPID = Dataline[0:2]

Please help. Thanks.

6502 · Accepted Answer · 2018-05-28 03:05:39Z

You can parse EMPSTRU.txt directly into a format string usable by struct.unpack. For example,

import struct
print(struct.unpack("2s3s2s", b"abcdefg"))  # Python 3: the buffer must be bytes

outputs

(b'ab', b'cde', b'fg')
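A minimal sketch of two details that matter when applying this to text data in Python 3: struct.unpack works on bytes rather than str, and the buffer length must match the format exactly. The ASCII encoding here is an assumption about the data, not something stated in the question.

```python
import struct

fmt = "2s3s2s"
raw = "abcdefg".encode("ascii")      # struct needs bytes, not str, in Python 3
fields = [f.decode("ascii") for f in struct.unpack(fmt, raw)]
print(fields)                        # ['ab', 'cde', 'fg']

# The buffer must be exactly struct.calcsize(fmt) bytes long.
expected_size = struct.calcsize(fmt)
print(expected_size)                 # 7

try:
    struct.unpack(fmt, b"abcdef")    # one byte short
except struct.error as exc:
    print("length mismatch:", exc)
```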

In your case it should require something like this (untested):

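import struct

rdef = ""       # struct format string, e.g. "2s10s1s..."
colnames = []   # column names, in file order
with open("EMPSTRU.txt") as f:
    for L in f:
        L = L.strip()
        if not L:
            continue  # ignore blank lines
        lpos = L.rindex(" LENGTH ")
        rdef += L[lpos+8:] + "s"      # e.g. "10" -> "10s"
        colnames.append(L[4:lpos])    # skip the 3-digit field number and the space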
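The same idea can be sketched as a small function that can be checked without touching the disk. The function name build_layout and the in-memory sample lines are illustrative assumptions; the parsing assumes every line has the form "NNN NAME LENGTH n" shown in the question.

```python
import struct

# Hypothetical in-memory copy of the first few EMPSTRU.txt lines.
STRUCTURE_LINES = [
    "001 EMPID LENGTH 2",
    "002 EMPNAME LENGTH 10",
    "003 SEX LENGTH 1",
]

def build_layout(lines):
    """Return (struct format string, column names) for 'NNN NAME LENGTH n' lines."""
    fmt_parts = []
    names = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # tolerate blank lines
        head, _, length = line.rpartition(" LENGTH ")
        _number, _, name = head.partition(" ")
        fmt_parts.append(f"{int(length)}s")
        names.append(name.strip())
    return "".join(fmt_parts), names

rdef, colnames = build_layout(STRUCTURE_LINES)
print(rdef)                    # 2s10s1s
print(colnames)                # ['EMPID', 'EMPNAME', 'SEX']
print(struct.calcsize(rdef))   # 13
```

Comparing struct.calcsize(rdef) with the length of a data line is a quick sanity check that the structure file and the data file agree.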

Then you can extract a data record, which must be a bytes object of exactly struct.calcsize(rdef) bytes, with:

content = struct.unpack(rdef, record)

and write it to the destination file (opened in binary mode, since the fields are bytes) with

out.write(b"\t".join(content) + b"\n")
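Since the question asks for CSV, one possible way to finish the job is to hand the unpacked fields to the standard csv module, which takes care of quoting. This is a sketch under assumptions, not the answerer's code: the function name records_to_csv, the latin-1 encoding, the stripping of padding spaces, and the padding of short lines are all choices made here for illustration.

```python
import csv
import io
import struct

def records_to_csv(rdef, colnames, data_lines, out, encoding="latin-1"):
    """Write fixed-width text lines to `out` as CSV, one row per record."""
    record = struct.Struct(rdef)
    writer = csv.writer(out)
    writer.writerow(colnames)                  # header row
    for line in data_lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue                           # skip blank lines
        raw = line.encode(encoding)
        # Assumption: short lines lost their trailing spaces, so pad them
        # back; anything beyond the record length is dropped.
        raw = raw.ljust(record.size)[:record.size]
        fields = record.unpack(raw)
        writer.writerow(f.decode(encoding).strip() for f in fields)

# Tiny demo using the first three fields of the layout in the question.
rdef = "2s10s1s"
colnames = ["EMPID", "EMPNAME", "SEX"]
sample = [
    "01" + "ABCD".ljust(10) + "F\n",
    "02" + "IJK".ljust(8) + "LM" + "M\n",
]
out = io.StringIO()
records_to_csv(rdef, colnames, sample, out)
print(out.getvalue())
```

For the real files, data_lines could be the open EMP.txt file object itself, so the 75 MB file is read one line at a time, and out could be a file opened with newline="" as the csv module documentation recommends.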
