I am exporting hdfs query output into a csv file using INSERT OVERWRITE LOCAL DIRECTORY command. Since this export the data without header. I got another dataframe from Oracle output with file header which I need to compare against hdfs output.
df1 = pd.read_csv('/home/User/hdfs_result.csv', header = None)
print(df1)
0 1 2
0 XPRN A 2019-12-16 00:00:00
1 XPRW I 2019-12-16 00:00:00
2 XPS2 I 2003-09-30 00:00:00
df = pd.read_sql(sqlquery, sqlconn)
UNIT STATUS Date
0 XPRN A 2019-12-16 00:00:00
1 XPRW A 2019-12-16 00:00:00
2 XPS2 I 2003-09-30 00:00:00
Since df1 is having no header i cant use Merge or Join to compare data. Though I can do df-df1.
Please suggest how can i compare and print the difference?