writing .txt to .csv excel columns in Python

Question

I have a rather large text file with multiple columns that I must convert to a 15-column .csv file to be read in Excel. The logic for parsing the fields I need is written out below, but I am having trouble writing the result to .csv.

    columns = ['TRANSACTN_NBR', 'RECORD_NBR',
               'SEQUENCE_OR_PIC_NBR', 'CR_DB', 'RT_NBR', 'ACCOUNT_NBR',
               'RSN_COD', 'ITEM_AMOUNT', 'ITEM_SERIAL', 'CHN_IND',
               'REASON_DESCR', 'SEQ2', 'ARCHIVE_DATE', 'ARCHIVE_TIME', 'ON_US_IND']

    # in_file, lines, rtnbr and the delimiters a, b, c are assumed to be
    # defined earlier in the script (not shown here)
    for line in in_file:
        values = line.split()
        if 'PRINT DATE:' in line:
            # keep the text between delimiters a and b
            dtevalue = line.split(a, 1)[-1].split(b)[0]
            lines.append(dtevalue)

        elif 'PRINT TIME:' in line:
            # keep the text between delimiters c and b
            timevalue = line.split(c, 1)[-1].split(b)[0]
            lines.append(timevalue)

        # credit rows whose sequence number starts with 41
        elif (len(values) >= 4 and values[3] == 'C'
              and len(values[2]) >= 2 and values[2][:2] == '41'):
            print(values)

        # debit rows; values[4] is read, so at least 5 fields are required
        elif (len(values) >= 5 and values[3] == 'D'
              and values[4] in rtnbr):
            on_us = '1'
        else:
            on_us = '0'

    print(lines[0])
    print(lines[1])

I originally tried the csv module, but the parsed rows are written in 12 columns and I could not find a way to write the date and time (parsed separately) in the columns after each row. I also looked at the pandas package, but have only seen ways to extract patterns, which wouldn't work with the established parsing criteria.

Is there a way to write to .csv using the above criteria? Or do I have to scrap it and rewrite the code within a specific package? Any help is appreciated.

EDIT: Text file sample:

    * START ******************************************************************************************************************** START *
 * START ******************************************************************************************************************** START *
 * START ******************************************************************************************************************** START *
1--------------------
1ANTECR09                                                 CHEK                                                 DPCK_R_009
                                                     TRANSIT EXTRACT SUB-SYSTEM
    CURRENT DATE = 08/03/2017                             JOURNAL     REPORT                                              PAGE    1
    PROCESS DATE =
 ID = 022000046-MNT                                                                      
    FILE HEADER = H080320171115                                      
+____________________________________________________________________________________________________________________________________
     R               T      SEQUENCE    CR      BT                A RSN               ITEM           ITEM CHN          USER    REASO
        NBR       NBR       OR PIC NBR  DB      NBR              NBR COD             AMOUNT         SERIAL IND  .......FIELD..  DESCR
      5,556        01        7450282689 C 538196640        9835177743 15          $9,064.81              00                    CREDIT
      5,557        01        7450282690 D 031301422         362313705 38            $592.35           43431                    DR CR
      5,558        01        7450282691 D 021309379         601298839 38          $1,491.04           44896                    DR CR
      5,559        01        7450282692 D 071108834            176885 38          $6,688.00            1454                    DR CR
      5,560        01        7450282693 D 031309123     1390001566241 38            $293.42            6878                    DR CR

 --------------------
     34,615       207        4100223726 C 538196620        9866597322 10            $645.49              00                    CREDIT
     34,616       207        4100223727 D 022000046        8891636675 31            $645.49          111583                    DR ON-
 --------------------
     34,617       208        4100223728 C 538196620          11701364 10            $756.19              00                    CREDIT
     34,618       208        4100223729 D 071923828                00 54            $305.31        11384597                    BAD AC
     34,619       208        4100223730 D 071923828          35110011 30            $450.88        10913052 6                  DR SEL
 --------------------

Desired output: only the lines whose sequence number starts with 41 and whose CR/DB flag is C

1293    83834   4100225908  C   538196620   9860890913  10  161.5   0       CREDIT  41  3-Aug-17    11:15:51
1294    83838   4100225911  C   538196620   25715845    10  138 0       CREDIT  41  3-Aug-17    11:15:51

Yeah, include an example line of input and an example line of desired output.
– maxymoo, commented Aug 17, 2017 at 23:36

kautenja · Accepted Answer · 2017-08-17 23:47:44Z


Look at the `pandas` package, more specifically the `DataFrame` class. With a little cleverness you ought to be able to read your table using `pandas.read_table()`, which returns a DataFrame that you can write to CSV with `to_csv()`: effectively a two-line solution. You'll need to look at the docs to find the parameters required to read your table format properly, but it should be a little easier than doing it manually.
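
A rough sketch of that idea, not a drop-in solution. It assumes the rows of interest are first filtered with the same test the question uses, so that every remaining row has exactly ten whitespace-separated fields. The names `SAMPLE`, `CREDIT_COLUMNS`, `is_credit_41` and `credits_to_frame` are made up for illustration, and the date and time are copied from the desired output rather than parsed from the `PRINT DATE:` / `PRINT TIME:` lines.

```python
import io

import pandas as pd

# A few rows copied from the sample in the question.
SAMPLE = """\
     34,615       207        4100223726 C 538196620        9866597322 10            $645.49              00                    CREDIT
     34,616       207        4100223727 D 022000046        8891636675 31            $645.49          111583                    DR ON-
 --------------------
     34,617       208        4100223728 C 538196620          11701364 10            $756.19              00                    CREDIT
     34,618       208        4100223729 D 071923828                00 54            $305.31        11384597                    BAD AC
"""

# Names for the ten fields of a credit row. Assumption: credit rows leave
# CHN_IND blank and carry a one-word reason, as in the sample.
CREDIT_COLUMNS = ['TRANSACTN_NBR', 'RECORD_NBR', 'SEQUENCE_OR_PIC_NBR', 'CR_DB',
                  'RT_NBR', 'ACCOUNT_NBR', 'RSN_COD', 'ITEM_AMOUNT',
                  'ITEM_SERIAL', 'REASON_DESCR']


def is_credit_41(line):
    """Same test as the question: CR_DB is 'C' and the sequence starts with 41."""
    values = line.split()
    return len(values) >= 4 and values[3] == 'C' and values[2].startswith('41')


def credits_to_frame(text, archive_date, archive_time):
    """Build a DataFrame of the matching rows plus two constant columns."""
    kept = [line.strip() for line in text.splitlines() if is_credit_41(line)]
    # dtype=str keeps every field as text, so values such as '00' survive.
    df = pd.read_table(io.StringIO('\n'.join(kept)), sep=r'\s+',
                       header=None, names=CREDIT_COLUMNS, dtype=str)
    # Drop '$' and thousands separators so Excel sees plain numbers.
    df['ITEM_AMOUNT'] = (df['ITEM_AMOUNT']
                         .str.replace('$', '', regex=False)
                         .str.replace(',', '', regex=False))
    df['TRANSACTN_NBR'] = df['TRANSACTN_NBR'].str.replace(',', '', regex=False)
    # Assigning a scalar to a column repeats it on every row.
    df['ARCHIVE_DATE'] = archive_date
    df['ARCHIVE_TIME'] = archive_time
    return df


# Placeholder date and time, taken from the desired output in the question.
frame = credits_to_frame(SAMPLE, '3-Aug-17', '11:15:51')
csv_text = frame.to_csv(index=False)
print(csv_text)
```

From there, `frame.to_csv('out.csv', index=False)` would write a file that Excel can open. Debit rows have a varying number of fields (an optional CHN_IND value, two-word reasons), so a whitespace split is unlikely to be enough for them; `pandas.read_fwf()` with explicit column positions may be a better fit there.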

