I have a dataframe (df) with distance traveled and I have assigned a label based on certain conditions.
import pandas as pd

distance = [0, 0.0001, 0.20, 1.23, 4.0]
df = pd.DataFrame(distance, columns=["distance"])
df['label'] = 0
for i in range(0, len(df['distance'])):
    if (df['distance'].values[i]) <= 0.10:
        df['label'][i] = 1
    elif (df['distance'].values[i]) <= 0.50:
        df['label'][i] = 2
    elif (df['distance'].values[i]) > 0.50:
        df['label'][i] = 3
This works fine. However, I have more than 1 million records with distance, and the for loop takes longer than expected. Can this code be optimized to reduce the execution time?
elif... became just an else: 0.10 < df['distance'].values[i]) <= 0.50? I'd probably create a new dataframe column for each condition and then merge them; slicing then broadcasting should be quicker than looping.
df['label'][i] = 1 not create an error, if you set df['label'] to 0? And: I don't know if you use Python 2 or Python 3, but for Python 2 replace range with xrange.
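
The "conditions on the whole column instead of a loop" idea from the comments could be sketched roughly as below. This is only one possible approach, not necessarily what the commenter had in mind: it assumes NumPy is installed alongside pandas and uses np.select, which evaluates each condition on the entire column and assigns the label of the first condition that matches. The names conditions and labels are just illustrative.

```python
import numpy as np
import pandas as pd

distance = [0, 0.0001, 0.20, 1.23, 4.0]
df = pd.DataFrame(distance, columns=["distance"])

# Each condition is a boolean mask computed for all rows at once.
# np.select checks them in order, so the second mask only applies
# to rows that did not already match the first one.
conditions = [
    df["distance"] <= 0.10,
    df["distance"] <= 0.50,
]
labels = [1, 2]

# Rows matching neither condition (distance > 0.50) get the default label 3.
df["label"] = np.select(conditions, labels, default=3)

print(df["label"].tolist())  # [1, 1, 2, 3, 3]
```

One difference from the loop worth keeping in mind: a missing distance (NaN) matches neither condition here and so would receive the default label 3, whereas the loop would leave it at 0.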