I have a Python program that takes a long time to run, most likely because I am using loops, and I am hoping to get some help using pandas or NumPy to speed up one section of it. It seems like the first for loop in particular could be optimized with pandas or NumPy, but I am not familiar enough with either library to reproduce what the loop does. Any help is appreciated, and please let me know if there are any questions. Thank you!
import pandas

# df holds the data shown below
df2 = pandas.DataFrame()
# Expand each row into V copies, so every row of df2 stands for one unit of V
for i in df.index:
    if df.V[i]>1:
        for f in range(0,df.V[i]):
            df2 = df2.append(df.loc[i],ignore_index=True)
    elif df.V[i]==1:
        df2 = df2.append(df.loc[i],ignore_index=True)
df2.V = 1
# Label each consecutive block of bv rows with the position of its first row
df2['Grouper']=""
bv=10
y=bv
x=len(df2)
for d in range(0,x,y):
    z = d+y
    df2['Grouper'][d:z]=d
# Collapse each block of bv rows into one summary row
df3 = df2.groupby('Grouper').agg({'Date_Time':'first','L1':'last','H':'max','L2':'min','O':'first'})
df3 = df3.reset_index(drop=True)
df3 = df3[['Date_Time','O','H','L1','L2']]
This is a sample of the data I am using with this program (df):
             Date_Time      O      H     L1     L2   V
0  2016-10-13 17:00:00  50.39  50.39  50.39  50.39   1
1  2016-10-13 17:00:02  50.39  50.39  50.39  50.39  27
2  2016-10-13 17:00:04  50.38  50.38  50.38  50.38   1
3  2016-10-13 17:00:09  50.38  50.38  50.38  50.38   1
4  2016-10-13 17:00:10  50.38  50.38  50.38  50.38   6
5  2016-10-13 17:00:14  50.38  50.38  50.38  50.38  19
6  2016-10-13 17:00:15  50.38  50.38  50.38  50.38   3
7  2016-10-13 17:00:20  50.37  50.38  50.37  50.38   5
8  2016-10-13 17:00:21  50.38  50.38  50.38  50.38   2
9  2016-10-13 17:00:22  50.38  50.38  50.37  50.37   3
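
For reference, here is a minimal sketch of what the loops appear to compute, written without explicit Python loops. It is not a confirmed drop-in replacement. It assumes that V holds non-negative integers and that the index of df is unique, and it swaps in two techniques that are not in the code above: Index.repeat to expand each row V times, and integer division of the row position by bv to label the blocks. It also avoids DataFrame.append, which was removed in pandas 2.0. The sample rows are copied from the table above so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "Date_Time": pd.to_datetime([
        "2016-10-13 17:00:00", "2016-10-13 17:00:02", "2016-10-13 17:00:04",
        "2016-10-13 17:00:09", "2016-10-13 17:00:10", "2016-10-13 17:00:14",
        "2016-10-13 17:00:15", "2016-10-13 17:00:20", "2016-10-13 17:00:21",
        "2016-10-13 17:00:22",
    ]),
    "O":  [50.39, 50.39, 50.38, 50.38, 50.38, 50.38, 50.38, 50.37, 50.38, 50.38],
    "H":  [50.39, 50.39, 50.38, 50.38, 50.38, 50.38, 50.38, 50.38, 50.38, 50.38],
    "L1": [50.39, 50.39, 50.38, 50.38, 50.38, 50.38, 50.38, 50.37, 50.38, 50.37],
    "L2": [50.39, 50.39, 50.38, 50.38, 50.38, 50.38, 50.38, 50.38, 50.38, 50.37],
    "V":  [1, 27, 1, 1, 6, 19, 3, 5, 2, 3],
})

bv = 10  # number of expanded rows per output row, as in the question

# Repeat each index label V times, then select those rows in one step.
# Assumption: V is a non-negative integer, so rows with V == 0 simply vanish,
# which matches the loop (it appends nothing when V < 1).
df2 = df.loc[df.index.repeat(df["V"])].reset_index(drop=True)
df2["V"] = 1

# Rows 0..bv-1 get label 0, rows bv..2*bv-1 get label 1, and so on.
# The loop used 0, 10, 20, ... as labels; the grouping is the same.
df2["Grouper"] = np.arange(len(df2)) // bv

# Same aggregation as in the question, one output row per block.
df3 = (
    df2.groupby("Grouper")
       .agg({"Date_Time": "first", "L1": "last", "H": "max",
             "L2": "min", "O": "first"})
       .reset_index(drop=True)
)
df3 = df3[["Date_Time", "O", "H", "L1", "L2"]]
print(df3)
```

On the ten sample rows, V sums to 68, so df2 has 68 rows and df3 has 7 rows, the last one built from only 8 expanded rows.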