I need help with the following problem. I have a CSV file with 310 records containing information about bugs. In another CSV file I have 800 thousand records containing statistics about the bugs (events that possibly led to the bugs).
With the script below, I am trying to:
- Loop through the bugs and select one.
- Loop through the statistics records and check some conditions.
- If there is a match, add a column from the bug records to the statistics records.
- Save the new file.
My question is whether I could achieve this more efficiently using numpy or anything else. The current method takes forever to run because of the size of the statistics file.
Any help or tips in the right direction will be appreciated. Thanks in advance.
import pandas as pd

dataset = pd.read_csv('310_records.csv')
dataset1 = pd.read_csv('800K_records.csv')
cols_error = dataset.iloc[:, [0, 1, 2, 3, 4, 5, 6]]
# .copy() so the new columns are added to an independent frame, not a slice
cols_stats = dataset1.iloc[:, [1, 2, 3, 4, 5, 6, 7, 8, 9]].copy()
cols_stats['Fault'] = ''
cols_stats['Created'] = ''
for i, error in cols_error.iterrows():
    # Fields of the current bug record, read by position
    fault_created = error.iloc[0]
    fault_ucs = error.iloc[1]
    fault_dn = error.iloc[2]
    fault_epoch_end = error.iloc[3]
    fault_epoch_begin = error.iloc[4]
    fault_code = error.iloc[6]
    for index, stats in cols_stats.iterrows():
        # Fields of the current statistics record, read by position
        stats_epoch = stats.iloc[0]
        stats_ucs = stats.iloc[5]
        stats_dn = stats.iloc[7]
        print("error:", i, " Stats:", index)
        if stats_epoch >= fault_epoch_begin and stats_epoch <= fault_epoch_end:
            if stats_dn == fault_dn:
                if stats_ucs == fault_ucs:
                    cols_stats.iloc[index, 9] = fault_code
                    cols_stats.iloc[index, 10] = fault_created
                else:
                    cols_stats.iloc[index, 9] = 0
                    cols_stats.iloc[index, 10] = fault_created
cols_stats.to_csv('datasets/dim_stats_error.csv', sep=',', encoding='utf-8')
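
For comparison, here is a minimal sketch of how the same matching could be expressed without nested loops, by joining on the two equality conditions and then filtering on the time window. The column names (`created`, `ucs`, `dn`, `epoch_begin`, `epoch_end`, `code`, `epoch`) and the helper `tag_stats` are hypothetical stand-ins for the positional columns used in the script, and the tiny frames are made-up sample data. It also assumes that rows without any match should get `Fault = 0` and an empty `Created`, which may not be exactly what the loop version produces.

```python
import pandas as pd


def tag_stats(bugs: pd.DataFrame, stats: pd.DataFrame) -> pd.DataFrame:
    """Attach Fault/Created to each stats row that falls inside a bug's time window.

    Column names are assumptions made for this sketch, not the real CSV headers.
    """
    out = stats.reset_index(drop=True).copy()

    # Remember each stats row's position so matches can be written back later.
    left = out[["epoch", "ucs", "dn"]].reset_index().rename(columns={"index": "row"})

    # The equality checks (ucs, dn) become a join: each stats row is paired
    # only with bugs that share its ucs and dn.
    pairs = left.merge(bugs, on=["ucs", "dn"], how="inner")

    # The time-window check becomes one boolean mask over all pairs.
    in_window = (pairs["epoch"] >= pairs["epoch_begin"]) & (
        pairs["epoch"] <= pairs["epoch_end"]
    )

    # If several bugs match one stats row, keep a single match per row.
    hits = pairs[in_window].drop_duplicates("row", keep="last")

    # Assumed defaults for rows without a match.
    out["Fault"] = 0
    out["Created"] = ""
    out.loc[hits["row"], "Fault"] = hits["code"].to_numpy()
    out.loc[hits["row"], "Created"] = hits["created"].to_numpy()
    return out


# Made-up sample data standing in for the two CSV files.
bugs = pd.DataFrame({
    "created": ["2020-01-01", "2020-01-02"],
    "ucs": ["u1", "u2"],
    "dn": ["a", "b"],
    "epoch_end": [200, 400],
    "epoch_begin": [100, 300],
    "code": [11, 22],
})
stats = pd.DataFrame({
    "epoch": [150, 250, 350, 350],
    "ucs": ["u1", "u1", "u2", "u1"],
    "dn": ["a", "a", "b", "a"],
})

result = tag_stats(bugs, stats)
print(result)
```

The join only pairs rows that already agree on `ucs` and `dn`, so the number of comparisons is usually far below 310 × 800,000, and the per-row `print` and `iloc` assignments disappear entirely.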