
I have a rather large text file with multiple columns that I must convert to a 15-column .csv file to be read in Excel. The logic for parsing the fields I need is written out below, but I am having trouble writing the results to a .csv file.

    # Header for the 15-column CSV
    columns = ['TRANSACTN_NBR', 'RECORD_NBR',
               'SEQUENCE_OR_PIC_NBR', 'CR_DB', 'RT_NBR', 'ACCOUNT_NBR',
               'RSN_COD', 'ITEM_AMOUNT', 'ITEM_SERIAL', 'CHN_IND',
               'REASON_DESCR', 'SEQ2', 'ARCHIVE_DATE', 'ARCHIVE_TIME', 'ON_US_IND']

    # in_file, lines, a, b, c and rtnbr are defined elsewhere (not shown in this excerpt)
    for line in in_file:
        values = line.split()
        if 'PRINT DATE:' in line:
            # Text after the first a, up to the first b
            dtevalue = line.split(a, 1)[-1].split(b)[0]
            lines.append(dtevalue)

        elif 'PRINT TIME:' in line:
            timevalue = line.split(c, 1)[-1].split(b)[0]
            lines.append(timevalue)

        # Credit rows whose sequence number starts with '41'
        elif (len(values) >= 4 and values[3] == 'C'
              and values[2].startswith('41')):
            print(values)

        # Debit rows: on_us is '1' when RT_NBR (values[4]) is in rtnbr;
        # at least 5 fields are needed before values[4] can be read
        elif (len(values) >= 5 and values[3] == 'D'
              and values[4] in rtnbr):
            on_us = '1'
        else:
            on_us = '0'

    print(lines[0])
    print(lines[1])

I originally tried the csv module, but the parsed rows are written as 12 columns, and I could not find a way to write the date and time (parsed separately) into the columns after each row. I also looked at the pandas package, but I have only seen ways to extract patterns, which wouldn't work with the parsing criteria I have already established.
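One way to get the date and time into every row with the csv module, sketched under assumptions: buffer the qualifying lines (with their on-us flag) during the loop, then write them after the loop, once the date and time are known, by extending each parsed row before calling `writerow()`. The helper names `split_credit_line` and `write_rows` are hypothetical, the split assumes a one-word REASON_DESCR so that a 10-token line is only missing the optional CHN_IND, and the sample line, date and time are taken from the question's sample and desired output.

```python
import csv
import io

COLUMNS = ['TRANSACTN_NBR', 'RECORD_NBR', 'SEQUENCE_OR_PIC_NBR', 'CR_DB',
           'RT_NBR', 'ACCOUNT_NBR', 'RSN_COD', 'ITEM_AMOUNT', 'ITEM_SERIAL',
           'CHN_IND', 'REASON_DESCR', 'SEQ2', 'ARCHIVE_DATE', 'ARCHIVE_TIME',
           'ON_US_IND']


def split_credit_line(line):
    """Split a CREDIT detail line into the 11 report fields (hypothetical helper)."""
    tokens = line.split()
    # Assumes a one-word REASON_DESCR, so 10 tokens means CHN_IND was blank
    if len(tokens) == 10:
        tokens.insert(9, '')
    return tokens


def write_rows(buffered, print_date, print_time, out):
    """Write buffered (line, on_us) pairs once the date and time are known."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for line, on_us in buffered:
        fields = split_credit_line(line)
        seq2 = fields[2][:2]  # first two digits of SEQUENCE_OR_PIC_NBR
        # 11 parsed fields + 4 appended values = 15 columns
        writer.writerow(fields + [seq2, print_date, print_time, on_us])


# Example: write to an in-memory buffer instead of a file
SAMPLE_LINE = ('     34,615       207        4100223726 C 538196620        '
               '9866597322 10            $645.49              00          '
               '          CREDIT')
buf = io.StringIO()
write_rows([(SAMPLE_LINE, '0')], '08/03/2017', '11:15:51', buf)
```

With a real file, open it with `newline=''` as the csv documentation recommends, e.g. `with open('out.csv', 'w', newline='') as f: write_rows(buffered, lines[0], lines[1], f)`; the file name and `buffered` list are placeholders.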

Is there a way to write to CSV using the above criteria, or do I have to scrap it and rewrite the code around a specific package? Any help is appreciated.

EDIT: Text file sample:

    * START ******************************************************************************************************************** START *
 * START ******************************************************************************************************************** START *
 * START ******************************************************************************************************************** START *
1--------------------
1ANTECR09                                                 CHEK                                                 DPCK_R_009
                                                     TRANSIT EXTRACT SUB-SYSTEM
    CURRENT DATE = 08/03/2017                             JOURNAL     REPORT                                              PAGE    1
    PROCESS DATE =
 ID = 022000046-MNT                                                                      
    FILE HEADER = H080320171115                                      
+____________________________________________________________________________________________________________________________________
     R               T      SEQUENCE    CR      BT                A RSN               ITEM           ITEM CHN          USER    REASO
        NBR       NBR       OR PIC NBR  DB      NBR              NBR COD             AMOUNT         SERIAL IND  .......FIELD..  DESCR
      5,556        01        7450282689 C 538196640        9835177743 15          $9,064.81              00                    CREDIT
      5,557        01        7450282690 D 031301422         362313705 38            $592.35           43431                    DR CR
      5,558        01        7450282691 D 021309379         601298839 38          $1,491.04           44896                    DR CR
      5,559        01        7450282692 D 071108834            176885 38          $6,688.00            1454                    DR CR
      5,560        01        7450282693 D 031309123     1390001566241 38            $293.42            6878                    DR CR

 --------------------
     34,615       207        4100223726 C 538196620        9866597322 10            $645.49              00                    CREDIT
     34,616       207        4100223727 D 022000046        8891636675 31            $645.49          111583                    DR ON-
 --------------------
     34,617       208        4100223728 C 538196620          11701364 10            $756.19              00                    CREDIT
     34,618       208        4100223729 D 071923828                00 54            $305.31        11384597                    BAD AC
     34,619       208        4100223730 D 071923828          35110011 30            $450.88        10913052 6                  DR SEL
 --------------------

Desired output (only lines whose sequence number starts with 41 and whose CR/DB field is C):

1293    83834   4100225908  C   538196620   9860890913  10  161.5   0       CREDIT  41  3-Aug-17    11:15:51
1294    83838   4100225911  C   538196620   25715845    10  138 0       CREDIT  41  3-Aug-17    11:15:51
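The desired output implies some per-field cleanup beyond splitting (thousands separators and `$` removed, serial `00` shown as `0`, date shown as `3-Aug-17`). A sketch of that cleanup, assuming the date comes from the `CURRENT DATE = 08/03/2017` header line in the sample and that ITEM_SERIAL is always numeric; `parse_report_date` and `clean_credit_fields` are hypothetical helpers:

```python
import re
from datetime import datetime


def parse_report_date(header_line):
    """Return the 'CURRENT DATE = MM/DD/YYYY' value formatted like 3-Aug-17, or None."""
    match = re.search(r'CURRENT DATE = (\d{2}/\d{2}/\d{4})', header_line)
    if match is None:
        return None
    d = datetime.strptime(match.group(1), '%m/%d/%Y')
    return f'{d.day}-{d:%b-%y}'  # day without zero padding, e.g. 3-Aug-17


def clean_credit_fields(tokens):
    """Return a copy of the split row with report formatting removed."""
    cleaned = list(tokens)
    cleaned[0] = cleaned[0].replace(',', '')                   # 34,617 -> 34617
    cleaned[7] = cleaned[7].replace('$', '').replace(',', '')  # $9,064.81 -> 9064.81
    cleaned[8] = str(int(cleaned[8]))                          # 00 -> 0 (assumes digits only)
    return cleaned
```

For example, `clean_credit_fields('34,617 208 4100223728 C 538196620 11701364 10 $756.19 00 CREDIT'.split())` gives `['34617', '208', '4100223728', 'C', '538196620', '11701364', '10', '756.19', '0', 'CREDIT']`. Note that `%b` follows the process locale, so the month abbreviation is English under the default C locale.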
  • Can you show an example line of the text file? Commented Aug 17, 2017 at 23:28
  • Yeah, include an example line of input and an example line of desired output Commented Aug 17, 2017 at 23:36
  • my bad, added both Commented Aug 17, 2017 at 23:40

1 Answer

Look at the `pandas` package, more specifically the `DataFrame` class. With a little cleverness, you ought to be able to read your table using `pandas.read_table()`, which returns a DataFrame that you can write to CSV with `to_csv()`: effectively a two-line solution. You'll need to check the docs for the parameters that read your table format properly, but it should be a little easier than doing it manually.
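A rough sketch of the DataFrame route. Because the report mixes page headers with detail rows, it swaps `read_table()` for the plain `pandas.DataFrame` constructor, fed with the lines that pass the question's filter, and then calls `to_csv()`. The function name `credit_rows_to_csv`, the token-padding rule for a blank CHN_IND, and the empty ON_US_IND column are assumptions for illustration.

```python
import pandas as pd

FIELDS = ['TRANSACTN_NBR', 'RECORD_NBR', 'SEQUENCE_OR_PIC_NBR', 'CR_DB',
          'RT_NBR', 'ACCOUNT_NBR', 'RSN_COD', 'ITEM_AMOUNT', 'ITEM_SERIAL',
          'CHN_IND', 'REASON_DESCR']


def credit_rows_to_csv(report_text, print_date, print_time):
    """Filter the report, build a DataFrame and return it as CSV text."""
    rows = []
    for line in report_text.splitlines():
        tokens = line.split()
        # Keep credit rows whose sequence number starts with '41'
        if len(tokens) >= 10 and tokens[3] == 'C' and tokens[2].startswith('41'):
            if len(tokens) == 10:
                tokens.insert(9, '')  # assume the optional CHN_IND was blank
            rows.append(tokens[:11])
    df = pd.DataFrame(rows, columns=FIELDS)
    df['SEQ2'] = df['SEQUENCE_OR_PIC_NBR'].str[:2]
    df['ARCHIVE_DATE'] = print_date
    df['ARCHIVE_TIME'] = print_time
    df['ON_US_IND'] = ''  # the question's on-us rule is not reproduced here
    # With no path argument, to_csv returns the CSV as a string
    return df.to_csv(index=False)
```

Calling `to_csv()` with a path instead, e.g. `df.to_csv('out.csv', index=False)`, writes the file rather than returning a string.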
