Add column name to a DataFrame in for loop in pandas

Question

My dataset has no header, so the columns have no names. The data starts on the very first line. I'd like to add column names.

Edit: added the dataset:

30/10/2016 17:18:51 [13] 10-Full: L 1490; A 31; F 31; S 31; DL 0; SL 0; DT 5678
30/10/2016 17:18:51 [13] 00-Always: Returning 31 matches
30/10/2016 17:18:51 [13] 30-Normal: Query complete
30/10/2016 17:18:51 [13] 30-Normal: Request completed in 120 ms.
30/10/2016 17:19:12 [15] 00-Always: Request from 120.0.0.1
30/10/2016 17:19:12 [15] 00-Always: action=Query&Text=(("XXXXXX":*/DOCUMENT/DRECONTENT/ObjectInfo/type+OR+"XXXXXX":*/DOCUMENT/.....
30/10/2016 17:19:12 [15] 10-Full: L 2; A 1; F 1; S 0; DL 0; SL 0; DT 5373
30/10/2016 17:19:12 [15] 00-Always: Returning 0 matches
30/10/2016 17:19:12 [15] 30-Normal: Query complete
30/10/2016 17:19:12 [15] 30-Normal: Request completed in 93 ms.
30/10/2016 17:19:20 [17] 00-Always: Request from 120.0.0.1
30/10/2016 17:19:20 [17] 00-Always: action=Query&Text=((PDF:*/DOCUMENT/DRECONTENT/XXXXX/type+AND+XXXXXX.......
30/10/2016 17:19:51 [19] 10-Full: L 255; A 0; F 0; S 0; DL 0; SL 0; DT 5021
30/10/2016 17:19:51 [19] 00-Always: Returning 0 matches
30/10/2016 17:19:51 [19] 30-Normal: Query complete
30/10/2016 17:19:51 [19] 30-Normal: Request completed in 29 ms.
30/10/2016 17:20:44 [27] 00-Always: Request from 120.0.0.1
30/10/2016 17:20:44 [27] 00-Always: action=Query&Tex(Image:*/DOCUMENT/DRECONTENT/ObjectInfo/type+AND+(
30/10/2016 17:20:44 [27] 10-Full: L 13; A 0; F 0; S 0; DL 0; SL 0; DT 5235
30/10/2016 17:20:44 [27] 00-Always: Returning 0 matches
30/10/2016 17:20:44 [27] 30-Normal: Query complete
30/10/2016 17:20:44 [27] 30-Normal: Request completed in 27 ms.
30/10/2016 17:21:09 [25] 00-Always: Request from 120.0.0.1
30/10/2016 17:21:09 [25] 00-Always: action=Query&Text=XXXXXX:*/DOCUMENT/DRECONTENT/ObjectIn

My Code:

for df in pd.read_csv('data.csv', sep=r'\s', header=None, chunksize=6):
    df.reset_index(drop=True, inplace=True)
    df.fillna('', inplace=True)
    d = pd.DataFrame([df.loc[3,0], df.loc[3,1], ' '.join(df.loc[3,4:8]), ' '.join(df.loc[4,4:6]), ' '.join(df.loc[5,4:])])
    d.T.to_csv('out.log', index=False, header=False, mode='a', sep=';')

Output from "My Code":

30/10/2016;17:19:12;Request completed in 93 ms.;Request from 120.0.0.1;action=Query&Text=((PDF:*/DOCUMENT/DRECONTENT/XXXXX....
30/10/2016;17:18:51;Request completed in 120 ms.;Request from 120.0.0.1;action=Query&Text=(("EOM.CompoundStory":*/DOCUMENT/DRECONTE....
30/10/2016;17:19:51;Request completed in 29 ms.;Request from 120.0.0.1;action=Query&Text=(Image:*/DOCUMENT/DRECONTENT/ObjectInfo/type+AND+((.....
30/10/2016;17:20:44;Request completed in 27 ms.;Request from 120.0.0.1;action=Query&Text=XXXXX:*/DOCUMENT/DRECONT....

Now I want to add a header such as 1;2;3;4;5 as the first row.

My approach:

d.T.to_csv('out2.csv', index=False, header=['1', '2', '3', '4', '5'], mode='a', sep=';')

My Output:

1;2;3;4;5
07.11.2016;13:40:45;Request completed in 44 ms.;Request from 1.1.106 action=Query&Text=
1;2;3;4;5
07.11.2016;13:41:00;Request;completed in 37 ms.;Request from 1.1.106 ;action=Query&Text=   
1;2;3;4;5
07.11.2016;13:41:00;Request;completed in 32 ms.;Request from 1.1.106 ;action=Query&Text=

My expected output:

1;2;3;4;5
07.11.2016;13:40:45;Request completed in 44 ms.;Request from 1.1.106 action=Query&Text=
07.11.2016;13:41:00;Request;completed in 37 ms.;Request from 1.1.106 ;action=Query&Text=   
07.11.2016;13:41:00;Request;completed in 32 ms.;Request from 1.1.106 ;action=Query&Text=

When I type the dot after d, autocompletion unfortunately does not suggest d.columns.
– madik_atma, Commented Dec 9, 2016 at 10:16

jezrael · Accepted Answer · 2016-12-09 09:59:33Z


You can create an empty DataFrame that has only the header, write it to out.log, and then append the data without a header:

import pandas as pd

cols = ['1', '2', '3', '4', '5']
# Write only the header row first (an empty DataFrame that has just column names).
pd.DataFrame(columns=cols).to_csv('out.log', index=False, sep=';')

# Each chunk of 6 lines yields one output row, appended without a header.
for df in pd.read_csv('data.csv', sep=r'\s+', header=None, chunksize=6):
    df.reset_index(drop=True, inplace=True)
    df.fillna('', inplace=True)
    d = pd.DataFrame([df.loc[3,0],
                      df.loc[3,1],
                      ' '.join(df.loc[3,4:8]),
                      ' '.join(df.loc[4,4:6]),
                      ' '.join(df.loc[5,4:])])
    d.T.to_csv('out.log', index=False, header=False, mode='a', sep=';')
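
The header is repeated in the question's output because to_csv is called with a header on every pass through the loop. As a possible alternative to pre-writing an empty DataFrame, a flag can limit the header to the first write. This is only a minimal sketch: the rows list is hypothetical sample data standing in for the values extracted from each chunk, and the temporary output path is an assumption made so the example runs on its own.

```python
import os
import tempfile

import pandas as pd

cols = ['1', '2', '3', '4', '5']

# Hypothetical rows standing in for the five values built from each chunk.
rows = [
    ['07.11.2016', '13:40:45', 'Request completed in 44 ms.',
     'Request from 1.1.106', 'action=Query&Text='],
    ['07.11.2016', '13:41:00', 'Request completed in 37 ms.',
     'Request from 1.1.106', 'action=Query&Text='],
]

# Assumed output location; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), 'out.log')

first = True
for row in rows:
    d = pd.DataFrame(row)  # one column of five values, as in the loop above
    d.T.to_csv(out_path,
               index=False,
               header=cols if first else False,  # header only on the first write
               mode='w' if first else 'a',       # start a fresh file, then append
               sep=';')
    first = False

with open(out_path) as fh:
    lines = fh.read().splitlines()
print(lines[0])
```

Opening the file with mode='w' on the first write means a rerun replaces the old file instead of appending to it; if appending across runs is wanted, keep mode='a' throughout and write the header only when the file does not exist yet.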

edited Dec 9, 2016 at 9:59

answered Dec 9, 2016 at 9:40


4 Comments

madik_atma Over a year ago

The second line now contains 0.0;0.0;0.0;0.0;0.0, and the data follows after it.

madik_atma Over a year ago

The first version of your answer worked, apart from the 0.0 row. Unfortunately, I now get the following error:
File "pandas\parser.pyx", line 846, in pandas.parser.TextReader.read (pandas\parser.c:10364)
File "pandas\parser.pyx", line 880, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10845)
pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11257)
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 7 fields in line 759, saw 18

jezrael Over a year ago

That is really interesting. Can you send your file data.csv to the email address in my profile, if the data is not confidential? What is the separator in read_csv: \s or \s+?

madik_atma Over a year ago

It's working now; I had a typing error. It creates a new .csv without the 0.0 row! Thanks for your offer :)
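
A note on the tokenizing error quoted in the comments: a message like "Expected 7 fields in line 759, saw 18" typically appears when a whitespace-separated line has more fields than the parser inferred from earlier lines, which is plausible here because the log messages vary in length. One way to sidestep it, sketched below under the assumption that every line starts with date, time, thread and level, is to split each line at most four times so the whole message stays in one field. The column names and the in-memory sample are illustrative, not part of the original post.

```python
import io

import pandas as pd

# A few lines copied from the dataset in the question, held in memory
# instead of data.csv so the sketch runs on its own.
sample = io.StringIO(
    "30/10/2016 17:18:51 [13] 00-Always: Returning 31 matches\n"
    "30/10/2016 17:18:51 [13] 30-Normal: Request completed in 120 ms.\n"
    "30/10/2016 17:19:12 [15] 00-Always: Request from 120.0.0.1\n"
)

# Split on whitespace at most 4 times: the 5th piece is the full message,
# however many words it contains.
records = [line.strip().split(maxsplit=4) for line in sample if line.strip()]

# Hypothetical column names chosen for illustration.
log = pd.DataFrame(records, columns=["date", "time", "thread", "level", "message"])
print(log)
```

Because every record then has exactly five fields, the number of words in a message no longer affects the column count.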
