My dataset has no header row, so the columns have no names; the data starts on the first line. I'd like to add column names.

Edit: added the dataset:

30/10/2016 17:18:51 [13] 10-Full: L 1490; A 31; F 31; S 31; DL 0; SL 0; DT 5678
30/10/2016 17:18:51 [13] 00-Always: Returning 31 matches
30/10/2016 17:18:51 [13] 30-Normal: Query complete
30/10/2016 17:18:51 [13] 30-Normal: Request completed in 120 ms.
30/10/2016 17:19:12 [15] 00-Always: Request from 120.0.0.1
30/10/2016 17:19:12 [15] 00-Always: action=Query&Text=(("XXXXXX":*/DOCUMENT/DRECONTENT/ObjectInfo/type+OR+"XXXXXX":*/DOCUMENT/.....
30/10/2016 17:19:12 [15] 10-Full: L 2; A 1; F 1; S 0; DL 0; SL 0; DT 5373
30/10/2016 17:19:12 [15] 00-Always: Returning 0 matches
30/10/2016 17:19:12 [15] 30-Normal: Query complete
30/10/2016 17:19:12 [15] 30-Normal: Request completed in 93 ms.
30/10/2016 17:19:20 [17] 00-Always: Request from 120.0.0.1
30/10/2016 17:19:20 [17] 00-Always: action=Query&Text=((PDF:*/DOCUMENT/DRECONTENT/XXXXX/type+AND+XXXXXX.......
30/10/2016 17:19:51 [19] 10-Full: L 255; A 0; F 0; S 0; DL 0; SL 0; DT 5021
30/10/2016 17:19:51 [19] 00-Always: Returning 0 matches
30/10/2016 17:19:51 [19] 30-Normal: Query complete
30/10/2016 17:19:51 [19] 30-Normal: Request completed in 29 ms.
30/10/2016 17:20:44 [27] 00-Always: Request from 120.0.0.1
30/10/2016 17:20:44 [27] 00-Always: action=Query&Tex(Image:*/DOCUMENT/DRECONTENT/ObjectInfo/type+AND+(
30/10/2016 17:20:44 [27] 10-Full: L 13; A 0; F 0; S 0; DL 0; SL 0; DT 5235
30/10/2016 17:20:44 [27] 00-Always: Returning 0 matches
30/10/2016 17:20:44 [27] 30-Normal: Query complete
30/10/2016 17:20:44 [27] 30-Normal: Request completed in 27 ms.
30/10/2016 17:21:09 [25] 00-Always: Request from 120.0.0.1
30/10/2016 17:21:09 [25] 00-Always: action=Query&Text=XXXXXX:*/DOCUMENT/DRECONTENT/ObjectIn

My Code:

for df in pd.read_csv('data.csv', sep='\s', header=None, chunksize=6):
    df.reset_index(drop=True, inplace=True)
    df.fillna('', inplace=True)
    d = pd.DataFrame([df.loc[3,0], df.loc[3,1], ' '.join(df.loc[3,4:8]), ' '.join(df.loc[4,4:6]), ' '.join(df.loc[5,4:])])
    d.T.to_csv('out.log', index=False, header=False, mode='a', sep=';')

Output from "My Code":

30/10/2016;17:19:12;Request completed in 93 ms.;Request from 120.0.0.1;action=Query&Text=((PDF:*/DOCUMENT/DRECONTENT/XXXXX....
30/10/2016;17:18:51;Request completed in 120 ms.;Request from 120.0.0.1;action=Query&Text=(("EOM.CompoundStory":*/DOCUMENT/DRECONTE....
30/10/2016;17:19:51;Request completed in 29 ms.;Request from 120.0.0.1;action=Query&Text=(Image:*/DOCUMENT/DRECONTENT/ObjectInfo/type+AND+((.....
30/10/2016;17:20:44;Request completed in 27 ms.;Request from 120.0.0.1;action=Query&Text=XXXXX:*/DOCUMENT/DRECONT....

Now I want to add a header such as 1;2;3;4;5 as the first row.

My approach:

d.T.to_csv('out2.csv', index=False, header=['1', '2', '3', '4', '5'], mode='a', sep=';')

My Output:

1;2;3;4;5
07.11.2016;13:40:45;Request completed in 44 ms.;Request from 1.1.106 action=Query&Text=
1;2;3;4;5
07.11.2016;13:41:00;Request;completed in 37 ms.;Request from 1.1.106 ;action=Query&Text=   
1;2;3;4;5
07.11.2016;13:41:00;Request;completed in 32 ms.;Request from 1.1.106 ;action=Query&Text=   

My expected output:

1;2;3;4;5
07.11.2016;13:40:45;Request completed in 44 ms.;Request from 1.1.106 action=Query&Text=
07.11.2016;13:41:00;Request;completed in 37 ms.;Request from 1.1.106 ;action=Query&Text=   
07.11.2016;13:41:00;Request;completed in 32 ms.;Request from 1.1.106 ;action=Query&Text=   
  • Can you post a sample of the file data.csv? Commented Dec 9, 2016 at 9:49
  • What does the dataframe d look like? Commented Dec 9, 2016 at 9:56
  • The dataset has been added. Commented Dec 9, 2016 at 9:58
  • It has now been added :) Commented Dec 9, 2016 at 10:05
  • When I type the dot after d, unfortunately no d.columns suggestion is displayed. Commented Dec 9, 2016 at 10:16

1 Answer

You can create an empty DataFrame that contains only the header, write it to out.log, and then append the data without a header:

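import pandas as pd

cols = ['1', '2', '3', '4', '5']
# write the header row once, before any data is appended
pd.DataFrame(columns=cols).to_csv('out.log', index=False, sep=';')

# raw string so that \s is passed to the regex engine unchanged
for df in pd.read_csv('data.csv', sep=r'\s+', header=None, chunksize=6):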
    df.reset_index(drop=True, inplace=True)
    df.fillna('', inplace=True)
    d = pd.DataFrame([df.loc[3,0], 
                      df.loc[3,1], 
                      ' '.join(df.loc[3,4:8]), 
                      ' '.join(df.loc[4,4:6]), 
                      ' '.join(df.loc[5,4:])])
    d.T.to_csv('out.log', index=False, header=False, mode='a', sep=';')
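
A variation on the same idea, as a minimal sketch rather than a tested drop-in: write the header together with the first chunk and switch to append mode without a header afterwards. The `rows` list and the temporary output path are hypothetical stand-ins for the per-chunk frames built in the loop above.

```python
import os
import tempfile

import pandas as pd

cols = ['1', '2', '3', '4', '5']

# Hypothetical stand-in rows; in the real loop each one would come from a chunk.
rows = [
    ['30/10/2016', '17:18:51', 'Request completed in 120 ms.',
     'Request from 120.0.0.1', 'action=Query'],
    ['30/10/2016', '17:19:12', 'Request completed in 93 ms.',
     'Request from 120.0.0.1', 'action=Query'],
]

# Temporary location so the sketch does not overwrite a real out.log.
out_path = os.path.join(tempfile.mkdtemp(), 'out.log')

for i, row in enumerate(rows):
    d = pd.DataFrame([row], columns=cols)
    first = (i == 0)
    # First row: create the file and include the header.
    # Later rows: append to the file and skip the header.
    d.to_csv(out_path, index=False, sep=';',
             header=first, mode='w' if first else 'a')

with open(out_path) as f:
    lines = f.read().splitlines()

print(lines[0])
```

Because the first write uses `mode='w'`, rerunning the script starts from a fresh file instead of stacking a second header under old data.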

4 Comments

The second line is now 0.0;0.0;0.0;0.0;0.0, and then the data follows.
Your first version worked, apart from these 0.0 values. Unfortunately, I now get the following error: File "pandas\parser.pyx", line 846, in pandas.parser.TextReader.read (pandas\parser.c:10364) File "pandas\parser.pyx", line 880, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10845) pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11257) pandas.io.common.CParserError: Error tokenizing data. C error: Expected 7 fields in line 759, saw 18
That is really interesting. Can you send your file data.csv to the email address in my profile, if the data is not confidential? What is the separator in read_csv: \s or \s+?
It's working now; I had made a typing error. It creates a new .csv without the 0.0 line! Thanks for your offer :)
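
For readers who hit the same "Error tokenizing data" message: it typically means a later line has more whitespace-separated fields than the parser expected from earlier lines. One possible workaround, sketched here under the assumption that an upper bound on the field count is known, is to pass explicit `names` so that shorter lines are padded with NaN. The sample text and the bound of 6 are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical ragged input: 3, 5 and 2 whitespace-separated fields.
raw = "a b c\nd e f g h\ni j\n"

# Assumed upper bound on the number of fields in any line.
max_fields = 6

df = pd.read_csv(io.StringIO(raw), sep=r'\s+', header=None,
                 names=list(range(max_fields)))

# Shorter lines are filled with NaN up to max_fields columns.
print(df.shape)
```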