This is a follow-up to the question I asked yesterday. I have a DataFrame created from a CSV file, and I am trying to compare each value in a column with the next one. If they are the same, I do one thing; otherwise, I do another. I am running into an out-of-range error and was hoping to find a workaround.
CSV:
date fruit quantity
4/5/2014 13:34 Apples 73
4/5/2014 3:41 Cherries 85
4/6/2014 12:46 Pears 14
4/8/2014 8:59 Oranges 52
4/10/2014 2:07 Apples 152
4/10/2014 18:10 Bananas 23
4/10/2014 2:40 Strawberries 98
Expected output CSV (backup CSV):
date fruit quantity fruitid
4/5/2014 13:34 Apples 73 fruit0
4/5/2014 3:41 Cherries 85 fruit1
4/6/2014 12:46 Pears 14 fruit2
4/8/2014 8:59 Oranges 52 fruit3
4/10/2014 2:07 Apples 152 fruit0
4/10/2014 18:10 Bananas 23 fruit4
4/10/2014 2:40 Strawberries 98 fruit5
Final CSV:
date fruitid quantity
4/5/2014 13:34 fruit0 73
4/5/2014 3:41 fruit1 85
4/6/2014 12:46 fruit2 14
4/8/2014 8:59 fruit3 52
4/10/2014 2:07 fruit0 152
4/10/2014 18:10 fruit4 23
4/10/2014 2:40 fruit5 98
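For reference, the fruitid labels in the tables above appear to follow the order in which each fruit first shows up in the data. Here is a minimal sketch of that mapping, assuming pandas and using `pd.factorize`, which is just one possible way to get there and not part of my code below. The inline DataFrame mirrors the sample rows so the snippet runs without the CSV file.

```python
import pandas as pd

# Sample rows copied from the CSV above (dates kept as plain strings).
df = pd.DataFrame({
    "date": ["4/5/2014 13:34", "4/5/2014 3:41", "4/6/2014 12:46", "4/8/2014 8:59",
             "4/10/2014 2:07", "4/10/2014 18:10", "4/10/2014 2:40"],
    "fruit": ["Apples", "Cherries", "Pears", "Oranges",
              "Apples", "Bananas", "Strawberries"],
    "quantity": [73, 85, 14, 52, 152, 23, 98],
})

# factorize numbers each distinct value by order of first appearance: 0, 1, 2, ...
codes, uniques = pd.factorize(df["fruit"])
df["fruitid"] = ["fruit" + str(code) for code in codes]

# The backup table keeps the fruit name; the final table drops it.
backup = df[["date", "fruit", "quantity", "fruitid"]]
final = df[["date", "fruitid", "quantity"]]
print(final)
```

Because the ids come from first appearance, this does not need the column to be sorted first, so the original row order is preserved.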
Code:
import pandas as pd
import numpy
df = pd.read_csv('example2.csv', header=0, dtype='unicode')
df_count = df['fruit'].value_counts()
df.sort_values(['fruit'], ascending=True, inplace=True) #sorting the fruit column
df.reset_index(drop=True, inplace=True)
#print(df)
x = 0 #starting my counter values or position in the column
#old_fruit = df.fruit[x]
#new_fruit = df.fruit[x+1]
df.loc[:,'NewCol'] = 0 # to create the new column
print(df)
for x in range(0, len(df)):
    old_fruit = df.fruit[x] #Starting fruit
    new_fruit = old_fruit[x+1] #next fruit to compare with
    if old_fruit == new_fruit:
        #print(x)
        #print(old_fruit, new_fruit)
        df.NewCol[x] = 'fruit' + str(x) #if they are the same, put fruit[x] or fruit0 in the current row
    else:
        print("Not the Same")
        #print(x)
        #print(old_fruit, new_fruit)
        df.NewCol[x+1] = 'fruit' + str(x+1) #if they are not the same, put fruit[x+1] or fruit1 in the next row
print(df)
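As far as I can tell, the out-of-range error comes from reading position x+1 on the last pass of the loop, where there is no next row. Note also that `old_fruit[x+1]` indexes into the fruit string rather than into the column. Below is a hedged sketch of two possible workarounds: stopping the loop one row early, or comparing the column with a shifted copy via `Series.shift(-1)`. The helper name `same_as_next_loop` is made up for illustration, and the inline Series stands in for the sorted `fruit` column.

```python
import pandas as pd

# Stand-in for the sorted fruit column from the sample data.
fruit = pd.Series(["Apples", "Apples", "Bananas", "Cherries",
                   "Oranges", "Pears", "Strawberries"])

def same_as_next_loop(values):
    """Hypothetical helper: True where a value equals the one after it."""
    flags = [False] * len(values)
    # Stop at len(values) - 1 so values[x + 1] always exists.
    for x in range(len(values) - 1):
        flags[x] = values[x] == values[x + 1]
    return flags

# Vectorised alternative: shift(-1) moves each next value up one row.
# The last row is compared with a missing value, which is never equal.
same_as_next = fruit.eq(fruit.shift(-1))

print(same_as_next_loop(fruit.tolist()))
print(same_as_next.tolist())
```

Both versions leave the last row as False, since it has nothing to be compared against. Whether that is the right default depends on what should happen to the final row.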