I have a rather large text file with multiple columns that I must convert to a 15-column .csv file to be read in Excel. The logic for parsing the fields I need is written out below, but I am having trouble writing the result to .csv.
columns = ['TRANSACTN_NBR', 'RECORD_NBR',
           'SEQUENCE_OR_PIC_NBR', 'CR_DB', 'RT_NBR', 'ACCOUNT_NBR',
           'RSN_COD', 'ITEM_AMOUNT', 'ITEM_SERIAL', 'CHN_IND',
           'REASON_DESCR', 'SEQ2', 'ARCHIVE_DATE', 'ARCHIVE_TIME', 'ON_US_IND']

# a, b, c, rtnbr, lines and in_file are defined elsewhere (not shown)
for line in in_file:
    values = line.split()
    if 'PRINT DATE:' in line:
        # keep the text between delimiter a and delimiter b
        dtevalue = line.split(a, 1)[-1].split(b)[0]
        lines.append(dtevalue)
    elif 'PRINT TIME:' in line:
        # keep the text between delimiter c and delimiter b
        timevalue = line.split(c, 1)[-1].split(b)[0]
        lines.append(timevalue)
    elif (len(values) >= 4 and values[3] == 'C'
            and len(values[2]) >= 2 and values[2][:2] == '41'):
        # credit row whose sequence number starts with 41
        print(values)
    elif (len(values) >= 5 and values[3] == 'D'
            and values[4] in rtnbr):
        # values[4] needs at least 5 fields, so check len(values) >= 5
        on_us = '1'
    else:
        on_us = '0'
print(lines[0])
print(lines[1])
I originally tried the csv module, but the parsed rows are written in 12 columns, and I could not find a way to write the date and time (parsed separately) into the columns after each row. I also looked at the pandas package, but I have only seen ways to extract patterns, which wouldn't work with the parsing criteria established above.
Is there a way to write to csv using the above criteria? Or do I have to scrap it and rewrite the code within a specific package? Any help is appreciated.
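For illustration, here is a minimal sketch of one way this might be done with the csv module: buffer the matching rows during the scan, then append the date and time to every row once both are known. The names `value_after`, `parse_report` and `write_rows` are made up for this sketch, and it assumes the date and time are the first whitespace-delimited token after their labels, which may not match the real file.

```python
import csv


def value_after(line, label):
    """Return the first whitespace-delimited token after label, or ''."""
    rest = line.split(label, 1)[1].split()
    return rest[0] if rest else ''


def parse_report(in_file):
    """Collect credit rows whose sequence starts with '41', plus date/time."""
    print_date = ''
    print_time = ''
    rows = []
    for line in in_file:
        values = line.split()
        if 'PRINT DATE:' in line:
            # Assumption: the date directly follows the label.
            print_date = value_after(line, 'PRINT DATE:')
        elif 'PRINT TIME:' in line:
            # Assumption: the time directly follows the label.
            print_time = value_after(line, 'PRINT TIME:')
        elif (len(values) >= 4 and values[3] == 'C'
                and values[2].startswith('41')):
            rows.append(values)
    # The date and time may appear anywhere in the file, so they are
    # appended to each buffered row only after the scan has finished.
    return [row + [print_date, print_time] for row in rows]


def write_rows(rows, out_file):
    """Write the rows as CSV; fields containing commas are quoted."""
    csv.writer(out_file).writerows(rows)
```

When writing to a real file, it would be opened with `open(path, 'w', newline='')`, as the csv documentation recommends, and a header row could be written first with `writer.writerow(columns)` once the fields line up with the 15 column names.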
EDIT: Text file sample:
* START ******************************************************************************************************************** START *
* START ******************************************************************************************************************** START *
* START ******************************************************************************************************************** START *
1--------------------
1ANTECR09 CHEK DPCK_R_009
TRANSIT EXTRACT SUB-SYSTEM
CURRENT DATE = 08/03/2017 JOURNAL REPORT PAGE 1
PROCESS DATE =
ID = 022000046-MNT
FILE HEADER = H080320171115
+____________________________________________________________________________________________________________________________________
R T SEQUENCE CR BT A RSN ITEM ITEM CHN USER REASO
NBR NBR OR PIC NBR DB NBR NBR COD AMOUNT SERIAL IND .......FIELD.. DESCR
5,556 01 7450282689 C 538196640 9835177743 15 $9,064.81 00 CREDIT
5,557 01 7450282690 D 031301422 362313705 38 $592.35 43431 DR CR
5,558 01 7450282691 D 021309379 601298839 38 $1,491.04 44896 DR CR
5,559 01 7450282692 D 071108834 176885 38 $6,688.00 1454 DR CR
5,560 01 7450282693 D 031309123 1390001566241 38 $293.42 6878 DR CR
--------------------
34,615 207 4100223726 C 538196620 9866597322 10 $645.49 00 CREDIT
34,616 207 4100223727 D 022000046 8891636675 31 $645.49 111583 DR ON-
--------------------
34,617 208 4100223728 C 538196620 11701364 10 $756.19 00 CREDIT
34,618 208 4100223729 D 071923828 00 54 $305.31 11384597 BAD AC
34,619 208 4100223730 D 071923828 35110011 30 $450.88 10913052 6 DR SEL
--------------------
Desired output, looking only at lines whose sequence number starts with 41 and whose CR/DB flag is C:
1293 83834 4100225908 C 538196620 9860890913 10 161.5 0 CREDIT 41 3-Aug-17 11:15:51
1294 83838 4100225911 C 538196620 25715845 10 138 0 CREDIT 41 3-Aug-17 11:15:51
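The desired rows show the numeric fields without the `$` sign or thousands separators that appear in the text file (for example `5,556` and `$9,064.81` in the sample). A small sketch of that cleanup, assuming those are the only two characters that need to be removed; `clean_number` is a hypothetical helper name:

```python
def clean_number(text):
    """Strip '$' and ',' so Excel reads the field as a plain number."""
    return text.replace('$', '').replace(',', '')
```

It would be applied only to the transaction number and item amount fields before the row is written.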