I am trying to load log files into a dataframe using pandas. I have two files that I am trying to merge into one. The resulting dataframe turns out empty, which is strange because the same code works with other log files of the same type.
Here is the output I get:
```
rows of df1 146299.000000
columns of df1 6.000000
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame
```
It reports the right number of rows and columns, but does not show any of the data inside. What is happening? Here is the code and a data sample.
Code:
```python
import os

import pandas as pd

trace_path = '/Users/ramapriyasridharan/Documents/new_exp/new_trace/m3xlarge/01'
client_path = os.path.join(trace_path, 'client')
middleware_path = os.path.join(trace_path, 'middleware')
df = pd.DataFrame(columns=['timestamp', 'type', 'wait_at_db_queue', 'db_response_time', 'wait_server_queue', 'server_response_time'])
#df = None
for root, _, files in os.walk(middleware_path):
    for f in files:
        if 'server' not in f:
            continue
        print('current file name %s:' % f)
        #df.columns = ['timestamp','type','wait_at_db_queue','db_response_time','wait_server_queue','server_response_time']
        # Join with root (the directory os.walk is currently in) so files in subdirectories resolve too.
        f1 = os.path.join(root, f)
        df1 = pd.read_csv(f1, header=None, sep=',')
        df1.columns = ['timestamp', 'type', 'wait_at_db_queue', 'db_response_time', 'wait_server_queue', 'server_response_time']
        #df1 = refine(df1)
        # These counts are printed BEFORE refine() is applied.
        print(' rows of df1 %f' % df1.shape[0])
        print('columns of df1 %f' % df1.shape[1])
        print('len of df1 %f' % len(df1))
        df1 = refine(df1)  # refine() is defined elsewhere in the script (not shown)
        print(df1)
        if df.shape[0] == 0:
            df = df1
            print(df)
        else:
            df = pd.concat([df, df1], axis=0)
            print(df)
print(df)
print(' rows of df %f' % df.shape[0])
print('columns of df %f' % df.shape[1])
```
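In the loop above, the shape lines are printed before `refine(df1)` runs, while `df1` itself is printed after it, so the output is consistent with `refine()` removing every row. A minimal sketch for confirming which step empties a frame, assuming a hypothetical helper `log_rows` and a made-up stand-in for `refine()` (the real function is not shown, so the filter below is only an illustration of how a mismatched condition can silently drop everything):

```python
import pandas as pd


def log_rows(step_name, func, frame):
    """Apply `func` to `frame` and print the row count before and after.

    `log_rows` is a hypothetical debugging helper, not part of pandas.
    """
    before = len(frame)
    result = func(frame)
    print('%s: %d rows -> %d rows' % (step_name, before, len(result)))
    return result


def refine(frame):
    # Hypothetical stand-in for the unshown refine(): a filter whose
    # threshold does not match epoch-millisecond timestamps, so it
    # removes every row without raising an error.
    return frame[frame['timestamp'] < 1000]


df1 = pd.DataFrame({
    'timestamp': [1448992805978, 1448992805979],
    'type': ['GET_QUEUE', 'SEND_MSG'],
})
df1 = log_rows('refine', refine, df1)  # prints "refine: 2 rows -> 0 rows"
```

Wrapping each processing step this way makes the first step that drops to 0 rows stand out in the log.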
Full output:
```
python find_service_time.py
current file name rsridhar-serverworker-1448992797827.log:
rows of df1 146299.000000
columns of df1 6.000000
len of df1 146299.000000
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
current file name rsridhar-serverworker-1448992805710.log:
rows of df1 194827.000000
columns of df1 6.000000
len of df1 194827.000000
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
rows of df 0.000000
columns of df 6.000000
len of refined df 0.000000
min timestamp : nan
done
Traceback (most recent call last):
  File "find_service_time.py", line 170, in <module>
    main()
  File "find_service_time.py", line 94, in main
    t_per_sec = map(lambda x: len(df[df['timestamp']==x]), range(1,int(np.max(df['timestamp']))))
ValueError: cannot convert float NaN to integer
```
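The traceback follows from the empty frame: `np.max` over an empty column returns NaN rather than raising, and `int(NaN)` then fails with exactly this `ValueError`. A small sketch that reproduces this and fails earlier with a clearer message; `checked_max_timestamp` is a hypothetical helper, not part of the original script:

```python
import math

import numpy as np
import pandas as pd


def checked_max_timestamp(df):
    """Return the largest timestamp as an int, failing loudly on an empty frame.

    `checked_max_timestamp` is a hypothetical helper for illustration.
    """
    if df.empty:
        # np.max over an empty column returns NaN, and int(NaN) raises
        # "ValueError: cannot convert float NaN to integer".
        raise ValueError('DataFrame is empty; check what refine() removed')
    return int(np.max(df['timestamp']))


empty = pd.DataFrame({'timestamp': pd.Series([], dtype='float64')})
print(math.isnan(np.max(empty['timestamp'])))  # True

try:
    int(np.max(empty['timestamp']))
except ValueError as exc:
    print(exc)  # cannot convert float NaN to integer

full = pd.DataFrame({'timestamp': [3, 1, 2]})
print(checked_max_timestamp(full))  # 3
```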
Sample data:
```
1448992805978,GET_QUEUE,1,2,0,2
1448992805978,SEND_MSG,18,147,1,157
1448992805978,SEND_MSG,26,153,0,159
1448992805979,SEND_MSG,20,149,1,163
1448992805979,GET_QUEUE,1,3,1,4
1448992805980,GET_QUEUE,1,3,0,3
1448992805981,GET_QUEUE,2,3,1,4
1448992805981,GET_QUEUE,1,3,1,4
1448992805982,SEND_MSG,5,129,0,133
1448992805983,GET_QUEUE,1,8,0,8
1448992805983,GET_QUEUE,3,5,1,6
1448992805983,GET_QUEUE,0,1,5,6
1448992805984,GET_QUEUE,3,5,2,7
1448992805984,GET_QUEUE,2,5,1,7
1448992805985,GET_QUEUE,0,5,3,8
1448992805985,GET_QUEUE,5,10,0,10
1448992805986,GET_QUEUE,4,9,1,10
1448992805986,GET_QUEUE,9,10,0,10
1448992805987,GET_QUEUE,0,7,3,10
1448992805987,GET_QUEUE,4,5,5,10
1448992805988,GET_QUEUE,5,6,5,11
1448992805989,GET_QUEUE,2,6,6,12
1448992805990,GET_QUEUE,1,4,7,11
1448992805990,GET_QUEUE,0,2,8,10
1448992805991,GET_QUEUE,5,10,4,14
1448992805991,GET_QUEUE,2,4,8,12
1448992805991,GET_QUEUE,0,6,7,13
1448992805992,GET_QUEUE,11,16,0,16
1448992805992,GET_QUEUE,0,4,9,13
1448992805993,GET_QUEUE,4,6,8,14
1448992805992,GET_QUEUE,8,15,0,15
1448992805993,GET_QUEUE,1,7,8,15
1448992805993,GET_QUEUE,1,7,8,15
1448992805993,GET_QUEUE,0,10,6,16
1448992805993,GET_QUEUE,6,9,7,16
1448992805994,GET_QUEUE,1,6,8,14
1448992805994,GET_LATEST_MSG_DELETE,1,8,7,15
1448992805995,GET_QUEUE,2,7,9,16
1448992805995,GET_QUEUE,4,6,6,12
1448992805996,GET_QUEUE,10,20,0,20
1448992805996,GET_QUEUE,12,13,6,19
```
Any suggestions are welcome; that is just an excerpt of the code.
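For reference, lines in this format parse as expected with `read_csv`. A quick check on three of the sample lines, passing the column labels through the `names=` parameter instead of assigning `df1.columns` after loading (an equivalent alternative, not what the original code does):

```python
import io

import pandas as pd

COLUMNS = ['timestamp', 'type', 'wait_at_db_queue', 'db_response_time',
           'wait_server_queue', 'server_response_time']

# Three lines copied from the sample data above.
SAMPLE = """1448992805978,GET_QUEUE,1,2,0,2
1448992805978,SEND_MSG,18,147,1,157
1448992805979,GET_QUEUE,1,3,1,4
"""

# header=None: the log has no header row; names= sets the column labels
# while parsing, so df1.columns does not need to be assigned afterwards.
df1 = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS)
print(df1.shape)  # (3, 6)
```

If this check returns rows but the frame is empty after `refine()`, the data loading is fine and the filtering step is where to look.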
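What is the `refine(df1)` function doing? You are using `pd.concat` after initialising `df` as a dataframe with no rows and then trying to concatenate onto a dataframe without rows. Rather, try `df = df.append(df1)` (note that `DataFrame.append` was removed in pandas 2.0, where `pd.concat` is the replacement). Or, if that is not what you want and you wish to join on the index, initialise `df` as the first log file.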
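Following the point about starting from an empty `df`, a common pandas pattern is to collect each file's frame in a list and call `pd.concat` once at the end, which avoids both the special case for the first file and `DataFrame.append`. A sketch under the assumption that all server logs sit directly in the middleware directory (it uses `os.listdir`, not `os.walk`); `load_server_logs` and its `refine` parameter are hypothetical names, and the log contents in the usage part are made up:

```python
import os
import tempfile

import pandas as pd

COLUMNS = ['timestamp', 'type', 'wait_at_db_queue', 'db_response_time',
           'wait_server_queue', 'server_response_time']


def load_server_logs(middleware_path, refine=None):
    """Read every top-level file whose name contains 'server' and stack them.

    `load_server_logs` and its `refine` hook are hypothetical; pass the
    script's own refine() function if it should be applied per file.
    """
    frames = []
    for name in sorted(os.listdir(middleware_path)):
        if 'server' not in name:
            continue
        frame = pd.read_csv(os.path.join(middleware_path, name),
                            header=None, names=COLUMNS)
        if refine is not None:
            frame = refine(frame)
        frames.append(frame)
    if not frames:
        # No matching files: return an empty frame with the expected columns.
        return pd.DataFrame(columns=COLUMNS)
    # One concat at the end instead of growing df inside the loop.
    return pd.concat(frames, ignore_index=True)


# Usage with two made-up server logs and one non-server file.
with tempfile.TemporaryDirectory() as tmp:
    files = {
        'rsridhar-serverworker-1.log': '1,GET_QUEUE,1,2,0,2\n2,SEND_MSG,18,147,1,157\n',
        'rsridhar-serverworker-2.log': '3,GET_QUEUE,1,3,1,4\n',
        'client-1.log': 'ignored\n',
    }
    for name, text in files.items():
        with open(os.path.join(tmp, name), 'w') as fh:
            fh.write(text)
    df = load_server_logs(tmp)

print(df.shape)  # (3, 6)
```

This only changes how the frames are combined; if `refine()` empties every frame, the combined result is still empty.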