I'm working with a pandas DataFrame that has 5 columns of data. I need to filter on each column in turn to perform certain calculations.
for mfilter in raw_df['Column1'].unique():
    m_filter = raw_df[raw_df['Column1'] == mfilter]
    for rfilter in m_filter['Column2'].unique():
        r_filter = m_filter[m_filter['Column2'] == rfilter]
        for cfilter in r_filter['Column3'].unique():
            c_filter = r_filter[r_filter['Column3'] == cfilter]
            for cafilter in c_filter['Column4'].unique():
                ca_filter = c_filter[c_filter['Column4'] == cafilter]
                for part in ca_filter['part_no'].unique():
                    part_df = ca_filter[ca_filter['part_no'] == part]
I have another column, 'Values', on which I perform some calculations inside the 'part' for loop.
Because the data is very large, a complete run takes around 7-8 hours (about 1 second per part). Is there a better way to do this that reduces the run time?
Here's some sample data:
Column1 Column2 Column3 part_no Values
A J X 1 1
A K Y 2 2
B K X 3 3
C L Y 4 4
C L X 5 5
D J X 6 6
D J X 6 7
D J X 6 8
C L Y 4 9
C L Y 4 10
C L Y 4 11
As the sample shows, the Values column holds several values for each part (within each category). Once I have the rows for a part, I need to perform calculations using that part's values. I pass each part_df to another function, where the rest of the work takes place.
groupby(['Column1','Column2','Column3', 'Column4', 'part_no']).
(1) Having names like mfilter and m_filter, which mean different things, is an absolute nightmare, and (2) you seem to overwrite part_df in every iteration, so why not just skip to the last iteration and generate the final part_df right away? I'm sure that's not what you want, but that's what your posted code seems to do.
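To illustrate the groupby suggestion, here is a minimal sketch run against the sample data from the question. It is not the asker's actual pipeline: process_part is a hypothetical stand-in for the real per-part function (here it just sums Values), and Column4 is left out of the keys only because the sample data does not include it. A single groupby call yields each part_df once, in place of the five nested loops that re-filter the frame at every level.

```python
import pandas as pd

# Sample data copied from the question (it has no Column4).
raw_df = pd.DataFrame({
    "Column1": list("AABCCDDDCCC"),
    "Column2": list("JKKLLJJJLLL"),
    "Column3": list("XYXYXXXXYYY"),
    "part_no": [1, 2, 3, 4, 5, 6, 6, 6, 4, 4, 4],
    "Values": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
})

# With the real data, 'Column4' would be added to this list.
keys = ["Column1", "Column2", "Column3", "part_no"]


def process_part(part_df):
    """Hypothetical placeholder for the real per-part calculation."""
    return part_df["Values"].sum()


# groupby splits the frame once; each iteration yields the key tuple
# and the sub-frame that the nested loops were building as part_df.
results = {}
for key, part_df in raw_df.groupby(keys, sort=False):
    results[key] = process_part(part_df)
```

If the per-part calculation can be expressed as a built-in aggregation, something like raw_df.groupby(keys)["Values"].agg(...) would likely be faster still, since it avoids calling a Python function once per group.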