Python output fixed width format text file with special lines as SAS do

Question

I have the sample data as below:

# df
VAR1   SEQ    VAR2    VAR3       DATE    VAR4     VAR5    VAR6    VAR7
AAA      1     YYY      01   20000630      AL    11111    ABCD      PA
BBB      1     YYY      01   20100701      GA    12345    EDED      NY
BBB      2     YYY      01   20150815      GA    12345              NY
BBB      3     YYY      01   19950105      GA    12345    YTRU      NY
BBB      4     YYY      01   20000701      GA    12345    IIII      NY
BBB      5     YYY      01   20210701      GA    12345              NY
CCC      1     NNN      01   20210630      CA    33333    SSSS      NJ
CCC      2     NNN      01   20210629      CA    33333              NJ

In SAS, we can export fixed width format file as below:

BLANK_VAR1 = " "

%MACRO FRIST;
    PUT @  1  "00FIRST"
        @  8  VAR1       $CHAR5.
        @ 13  BLANK_VAR1 $CHAR2.
        @ 15  VAR2       $CHAR3.
    ;
%MEND FRIST;

%MACRO SECOND;
    PUT @  1  "00SECOND"
        @  9  VAR3       $CHAR2.
        @ 11  BLANK_VAR1 $CHAR2.
        @ 13  VAR4       $CHAR2.
        @ 15  VAR5       $CHAR5.
    ;
%MEND SECOND;

%MACRO THIRD(sequence);
    num = &sequence.;
    PUT @  1  num        Z2.0
        @  3  "THIRD"    $CHAR5.
        @  8  DATE       $CHAR8.
    ;
%MEND THIRD;

%MACRO FOURTH(sequence);
    num = &sequence.;
    PUT @  1  num        Z2.0
        @  3  "FOURTH"   $CHAR5.
        @  9  VAR6       $CHAR25.
        @ 34  BLANK_VAR1 $CHAR2.
        @ 36  VAR7       $CHAR2.
    ;
%MEND FOURTH;

filename outtmp "/home/folder/outfile_tmp";  

DATA _NULL_;                                                                    
   SET df;    
 
   BY VAR1 SEQ;                                                           
   FILE outtmp;  
                                                                
   IF FIRST.VAR1 THEN DO;
      %FRIST;
      %SECOND;
      REC_CNT = 0;                                                             
   END;                                                                       
   REC_CNT + 1;
   IF REC_CNT LE 3 THEN DO;
      %THIRD(REC_CNT);
      IF VAR6 NE ' ' THEN DO;
         %FOURTH(REC_CNT);
      END;
   END;                                                         
RUN;


filename output "/home/folder/output"; 


%MACRO INREC;                                                                     
   PUT @ 001 RECIN $CHAR150.;
%MEND INREC;


%MACRO FILE_FIRST;
    DATE = TODAY();
    PUT @  1  "###FIRSTLINE###"
        @ 16  DATE       JULIAN5.
        @ 21  BLANK_VAR1 $CHAR2.
        @ 23  "###FIRSTLINEEND###"
    ;
%MEND FILE_FIRST;


%MACRO FILE_LAST;
    DATE = TODAY();
    PUT @  1  "###LASTLINE###"
        @ 15  DATE       JULIAN5.
        @ 20  BLANK_VAR1 $CHAR2.
        @ 22  "###LASTLINEEND###"
    ;
%MEND FILE_LAST;


DATA output;                                                                  
   INFILE outtmp truncover;                                                               
   INPUT                                                                       
      @ 001 RECIN $CHAR150.;                                                                         
RUN; 

DATA _NULL_;
   SET output  end=last;  
   file output  lrecl=256 ;
   IF _N_ = 1 THEN DO;
      %FILE_FIRST;
   END;

   %INREC;  
                                                                  
   IF last THEN DO;
      %FILE_LAST; 
   END;
RUN;

This is the output:

###FIRSTLINE###21182  ###FIRSTLINEEND###
00FIRSTAAA  YYY
00SECOND01  AL11111
01THIRD20000630
01FOURTHABCD                       PA
00FIRSTBBB  YYY
00SECOND01  GA12345
01THIRD20100701
01FOURTHEDED                       NY
02THIRD20150815
03THIRD19950105
03FOURTHYTRU                       NY
00FIRSTCCC  NNN
00SECOND01  CA33333
01THIRD20210630
01FOURTHSSSS                       NJ
###LASTLINE###21182  ###LASTLINEEND###

The logic of the above program is:

There are four parts that need to be output.
If several rows share the same VAR1, output FIRST and SECOND only once for that VAR1.
Output the THIRD part only for rows with SEQ up to 3. If SEQ is larger than 3, ignore the row.
Output the FOURTH part under the same SEQ rule as THIRD, and only if VAR6 is not missing.
Note: In the THIRD and FOURTH parts, the first two characters should run from 01 to 03 depending on the record.
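The rules above can be sketched in plain Python before worrying about column widths. This is a minimal sketch, not the asker's or the answerer's code: the `rows` list is a hypothetical stand-in for the VAR1 and VAR6 columns of the sample data, `plan_records` is an illustrative name, and the per-group counter plays the role of REC_CNT in the SAS step.

```python
from itertools import groupby

# Hypothetical stand-in for the sample data: (VAR1, VAR6) pairs in file order.
# An empty string means VAR6 is missing.
rows = [
    ("AAA", "ABCD"),
    ("BBB", "EDED"), ("BBB", ""), ("BBB", "YTRU"), ("BBB", "IIII"), ("BBB", ""),
    ("CCC", "SSSS"), ("CCC", ""),
]


def plan_records(rows, limit=3):
    """Return the record prefixes to emit, in output order."""
    plan = []
    # groupby needs rows with the same VAR1 to be adjacent, like BY VAR1 in SAS.
    for var1, group in groupby(rows, key=lambda r: r[0]):
        # FIRST and SECOND are emitted once per VAR1 group.
        plan.append("00FIRST")
        plan.append("00SECOND")
        for rec_cnt, (_, var6) in enumerate(group, start=1):
            if rec_cnt > limit:
                break  # rows past the limit are ignored
            plan.append(f"{rec_cnt:02d}THIRD")
            if var6.strip():
                # FOURTH only when VAR6 is not blank
                plan.append(f"{rec_cnt:02d}FOURTH")
    return plan


for prefix in plan_records(rows):
    print(prefix)
```

Separating "which records to emit" from "how each record is laid out" keeps the row order of the original data and makes the counting rule easy to check on its own.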

How can I replicate this format in Python?
I found that np.savetxt() with the fmt argument might be one way; however, the file should keep the same row order as the original dataframe.

pandas has the read_fwf() function to read fixed-width files; however, it has no to_fwf() function to export them.
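Since there is no ready-made exporter, one option is a small helper that mimics the `@column` pointer of a SAS PUT statement. The sketch below is an assumption-laden illustration: `put` is a hypothetical helper (not a pandas or NumPy function), it takes `(column, value, width)` triples with 1-based columns, and it assumes the fields are listed in ascending column order.

```python
def put(*fields):
    """Build one output line from (column, value, width) triples.

    Columns are 1-based, like the @n pointer in a SAS PUT statement.
    Assumes the fields are listed in ascending column order.
    """
    line = ""
    for column, value, width in fields:
        line = line.ljust(column - 1)            # pad with blanks up to the column
        line += str(value).ljust(width)[:width]  # pad or truncate to the field width
    return line.rstrip()                         # drop trailing blanks


# Layouts taken from the THIRD and FOURTH macros in the question.
third = put((1, "01", 2), (3, "THIRD", 5), (8, "20000630", 8))
fourth = put((1, "01", 2), (3, "FOURTH", 6), (9, "ABCD", 25), (34, "", 2), (36, "PA", 2))
print(third)
print(fourth)
```

Each call maps one-to-one onto a PUT statement, so the column positions can be copied across from the SAS macros without recomputing offsets.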

I have been stuck for several days, so any ideas would be helpful!

I imagine you'd just make the full line one string, then write that?
– Joe, Commented Jul 1, 2021 at 23:08
@Joe The first and last lines could be done that way, but the middle part is difficult to do like that if you have a large dataset.
– Peter Chen, Commented Jul 1, 2021 at 23:43
To be clear, do you just want your data exported as fixed width, or something else? I can't tell from the output you posted.
– JonSG, Commented Jul 1, 2021 at 23:50
@PeterChen I mean, it's difficult in SAS too, but ... it's basically what you are doing even in SAS - you are converting the records to strings (in a text file) and then reading them back in and writing them out again. (It's probably much easier to do this in SAS than you're doing it, honest, but leaving that aside...) The bigger problem here honestly is just stating the rules you're following - it's not really clear what the rules are, beyond the program instructions of course. You write First Second once, Third three times, and Fourth if it has a value in the first three rows?
– Joe, Commented Jul 2, 2021 at 0:21
@JonSG Fixed width format is what he's doing in SAS, at least. But it's more like it's being split across a bunch of rows, something that's sort of like the old "card column" format but not exactly that either... I presume it must be something for a bank system, as nothing else would be so odd.
– Joe, Commented Jul 2, 2021 at 0:22

Joe · Accepted Answer · 2021-07-02 03:31:00Z

This isn't exactly a good way to do it, but maybe it gives you an idea how to do the logic. I'm just writing to a list, you can then write the list out - but probably you should do it the way JonSG did in his (deleted) answer where you use a file writer instead. There's probably a better approach using a data class, but that's not my expertise.

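import pandas as pd

# Read every column as text so values such as VAR3 ("01") keep their leading
# zeros and blank VAR6 cells come back as empty strings instead of NaN.
df = pd.read_csv(r"h:\temp\df_text.csv", dtype=str, keep_default_na=False)
df['SEQ'] = df['SEQ'].astype(int)

outlist = []

for index, row in df.iterrows():
    if row['SEQ'] == 1:
        # FIRST and SECOND are written once per VAR1 group.
        tempstr = '00FIRST' + row.VAR1 + '  ' + row.VAR2
        outlist.append(tempstr)
        tempstr = '00SECOND' + row.VAR3 + '  ' + row.VAR4 + row.VAR5
        outlist.append(tempstr)
    if row['SEQ'] <= 3:
        # Zero-pad the sequence number to two digits, like Z2. in SAS.
        seqval = str(row.SEQ).zfill(2)
        tempstr = seqval + 'THIRD' + row.DATE
        outlist.append(tempstr)
        # FOURTH is written only when VAR6 is not blank.
        if row.VAR6.strip() != '':
            tempstr = seqval + 'FOURTH' + row.VAR6 + '  ' + row.VAR7
            outlist.append(tempstr)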
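To finish the job with a file writer, the list can be wrapped in the header and trailer lines. The following is a minimal sketch under stated assumptions: `write_report` is a hypothetical helper, the header and trailer text follows the FILE_FIRST and FILE_LAST macros in the question, and `strftime("%y%j")` (two-digit year plus day of year) is used as a stand-in for the SAS JULIAN5. format. An in-memory buffer replaces the real output path so the example runs anywhere.

```python
import datetime
import io


def write_report(body_lines, out, today=None):
    """Write a header line, the body lines, and a trailer line to `out`."""
    today = today or datetime.date.today()
    julian = today.strftime("%y%j")  # e.g. 2021-07-01 gives "21182"
    out.write(f"###FIRSTLINE###{julian}  ###FIRSTLINEEND###\n")
    for line in body_lines:
        out.write(line + "\n")
    out.write(f"###LASTLINE###{julian}  ###LASTLINEEND###\n")


# Two sample body lines; in practice this would be the full outlist.
buffer = io.StringIO()
write_report(["00SECOND01  AL11111", "01THIRD20000630"], buffer,
             today=datetime.date(2021, 7, 1))
report = buffer.getvalue()
print(report, end="")
```

To write to disk instead, pass an open file object, for example `with open(path, "w", newline="\n") as out: write_report(outlist, out)`, where `path` is whatever output location applies.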


4 Comments

Peter Chen Over a year ago

Your answer fulfills the logic; the only remaining issue is the width/length of each output field (it seems that you add the spaces directly, cool!). I'm also trying to find his deleted answer, but it seems there is no record of it for now.

Peter Chen Over a year ago

Is it possible to set up tuples in a list and use them for the FIRST, SECOND, THIRD, and FOURTH output?

Joe Over a year ago

The way I did it, no, that wouldn't help - not that I can see anyway? I think the "best" way to do this is with a dataclass that has the output built into it as a method, honestly, but the above either with a list or with a file writer should accomplish what you want - maybe with a little more care about formatting, as Jon shows, using the {: <4} type formatters.

Peter Chen Over a year ago

It seems that we cannot check the deleted answer that @JonSG posted...
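For reference, the dataclass idea mentioned in the comments could look roughly like the sketch below. It is only an illustration, not the deleted answer: the `Detail` class and its method names are hypothetical, only the THIRD and FOURTH layouts are shown, and the widths come from the PUT statements in the question. A format spec such as `{:<25.25}` left-aligns a string, pads it to 25 characters, and truncates anything longer.

```python
from dataclasses import dataclass


@dataclass
class Detail:
    """One data row, reduced to the fields the THIRD and FOURTH records need."""
    seq: int
    date: str
    var6: str
    var7: str

    def third(self) -> str:
        # Two-digit sequence, literal text, then DATE in 8 columns.
        return f"{self.seq:02d}THIRD{self.date:<8.8}"

    def fourth(self) -> str:
        # VAR6 in 25 columns, two blanks, then VAR7 in 2 columns.
        return f"{self.seq:02d}FOURTH{self.var6:<25.25}{'':2}{self.var7:<2.2}"

    def lines(self) -> list:
        """Return THIRD, plus FOURTH when VAR6 is not blank."""
        out = [self.third()]
        if self.var6.strip():
            out.append(self.fourth())
        return out


with_var6 = Detail(seq=1, date="20000630", var6="ABCD", var7="PA").lines()
without_var6 = Detail(seq=2, date="20150815", var6="", var7="NY").lines()
print(with_var6)
print(without_var6)
```

FIRST and SECOND would follow the same pattern, with one method per record type and the widths taken from the corresponding macro.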
