I have the sample data as below:
# df
VAR1 SEQ VAR2 VAR3 DATE VAR4 VAR5 VAR6 VAR7
AAA 1 YYY 01 20000630 AL 11111 ABCD PA
BBB 1 YYY 01 20100701 GA 12345 EDED NY
BBB 2 YYY 01 20150815 GA 12345 NY
BBB 3 YYY 01 19950105 GA 12345 YTRU NY
BBB 4 YYY 01 20000701 GA 12345 IIII NY
BBB 5 YYY 01 20210701 GA 12345 NY
CCC 1 NNN 01 20210630 CA 33333 SSSS NJ
CCC 2 NNN 01 20210629 CA 33333 NJ
In SAS, we can export fixed width format file as below:
BLANK_VAR1 = " "
%MACRO FRIST;
PUT @ 1 "00FIRST"
@ 8 VAR1 $CHAR5.
@ 13 BLANK_VAR1 $CHAR2.
@ 15 VAR2 $CHAR3.
;
%MEND FRIST;
%MACRO SECOND;
PUT @ 1 "00SECOND"
@ 9 VAR3 $CHAR2.
@ 11 BLANK_VAR1 $CHAR2.
@ 13 VAR4 $CHAR2.
@ 15 VAR5 $CHAR5.
;
%MEND SECOND;
%MACRO THIRD(sequence);
num = &sequence.;
PUT @ 1 num Z2.0
@ 3 "THIRD" $CHAR5.
@ 8 DATE $CHAR8.
;
%MEND THIRD;
%MACRO FOURTH(sequence);
num = &sequence.;
PUT @ 1 num Z2.0
@ 3 "FOURTH" $CHAR5.
@ 9 VAR6 $CHAR25.
@ 34 BLANK_VAR1 $CHAR2.
@ 36 VAR7 $CHAR2.
;
%MEND FOURTH;
filename outtmp "/home/folder/outfile_tmp";
DATA _NULL_;
SET df;
BY VAR1 SEQ;
FILE outtmp;
IF FIRST.VAR1 THEN DO;
%FRIST;
%SECOND;
REC_CNT = 0;
END;
REC_CNT + 1;
IF REC_CNT LE 3 THEN DO;
%THIRD(REC_CNT);
IF VAR6 NE ' ' THEN DO;
%FOURTH(COUNTN);
END;
END;
RUN;
filename output "/home/folder/output";
%MACRO INREC;
PUT 001 RECIN $CHAR150.;
%MEND INREC;
%MACRO FILE_FIRST;
DATE = TODAY();
PUT @ 1 "###FIRSTLINE###"
@ 16 DATE JULIAN5.
@ 21 BLANK_VAR1 $CHAR2.
@ 23 "###FIRSTLINEEND###"
;
%MEND FILE_FIRST;
%MACRO FILE_LAST;
DATE = TODAY();
PUT @ 1 "###LASTLINE###"
@ 15 DATE JULIAN5.
@ 20 BLANK_VAR1 $CHAR2.
@ 22 "###LASTLINEEND###"
;
%MEND FILE_LAST;
DATA output;
INFILE outtmp truncover;
INPUT
@ 001 RECIN $CHAR150.;
RUN;
DATA _NULL_;
SET output end=last;
file output lrecl=256 ;
IF _N_ = 1 THEN DO;
%FILE_FIRST;
END;
%INREC;
IF last THEN DO;
%FILE_LAST;
END;
RUN;
###FIRSTLINE###21182 ###LASTLINEEND###
00FIRSTAAA YYY
00SECOND01 AL11111
01THIRD20000630
01FOURTHABCD PA
00FIRSTBBB YYY
00SECOND01 GA12345
01THIRD20100701
01FOURTHEDED NY
02THIRD20150815
03THIRD19950105
03FOURTHYTRU NY
00FIRSTCCC NNN
00SECOND01 CA33333
01THIRD20210630
01FOURTHSSSS NJ
###LASTLINE###21182 ###LASTLINEEND###
The logic for above program is:
- There are four parts that needed to be output.
- If there are multiple same
VAR1, then only outputFIRSTandSECONDonce. - Output
THIRDpart forSEQis less than 3. If theSEQis larger than 3, do not output. Ignore. - Output the
FOURTHpart following the third logic and also ifVAR6is not missing. - Note: In
THIRDandFOURTHpart, the first two string should change from01to03depends on the records.
How can I replicate this format in Python?
I found that np.savetxt() with fmt argument might be a way link; however, the file should be the same order as original dataframe.
pandas has function read_fwf() to read fixed width format file; however, no to_fwf() function to export.
I have been stuck for several days, so any idea should be helpful!
