
I'm dealing with a pandas DataFrame with 5 columns of data. I need to filter on each column to perform certain calculations.

for mfilter in raw_df['Column1'].unique():
    m_filter = raw_df[raw_df['Column1'] == mfilter]
    for rfilter in m_filter['Column2'].unique():
        r_filter = m_filter[m_filter['Column2'] == rfilter]
        for cfilter in r_filter['Column3'].unique():
            c_filter = r_filter[r_filter['Column3'] == cfilter]
            for cafilter in c_filter['Column4'].unique():
                ca_filter = c_filter[c_filter['Column4'] == cafilter]
                for part in ca_filter['part_no'].unique():
                    part_df = ca_filter[ca_filter['part_no'] == part]

I have another column, 'Values', on which I perform some calculations inside the innermost ('part') loop.

Because the data is very large, the complete run takes around 7-8 hours (around 1 second per part). Is there a better way to reduce the time taken and improve efficiency?

Here's some sample data:

Column1 Column2 Column3 part_no Values
A   J   X   1   1
A   K   Y   2   2
B   K   X   3   3
C   L   Y   4   4
C   L   X   5   5
D   J   X   6   6
D   J   X   6   7
D   J   X   6   8
C   L   Y   4   9
C   L   Y   4   10
C   L   Y   4   11

As the dataset shows, the Values column has several values for each part (within each category). For each part's data I have to perform certain calculations using that part's values. I pass this part_df to another function where the rest of the work happens.
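For reference, the sample table above can be rebuilt as an actual DataFrame like this (Column4 is omitted because it does not appear in the sample rows; the column names match the question):

```python
import pandas as pd

# Sample data from the table above, rebuilt row by row.
raw_df = pd.DataFrame({
    'Column1': ['A', 'A', 'B', 'C', 'C', 'D', 'D', 'D', 'C', 'C', 'C'],
    'Column2': ['J', 'K', 'K', 'L', 'L', 'J', 'J', 'J', 'L', 'L', 'L'],
    'Column3': ['X', 'Y', 'X', 'Y', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y'],
    'part_no': [1, 2, 3, 4, 5, 6, 6, 6, 4, 4, 4],
    'Values':  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
})
```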

  • Can you provide a small amount of sample data (e.g. as CSV or actual Pandas code that builds an example DataFrame), describe in English what you're trying to accomplish as your end goal, and show the end results for the sample data? Commented Jun 6, 2020 at 6:06
  • Looks like a groupby(['Column1','Column2','Column3', 'Column4', 'part_no']). Commented Jun 6, 2020 at 6:09
  • Two notes: (1) having variables called mfilter and m_filter which mean different things is an absolute nightmare, and (2) you seem to overwrite part_df in every iteration, so why not just skip to the last iteration and generate the final part_df right away? I'm sure that's not what you want, but that's what your posted code seems to do. Commented Jun 6, 2020 at 6:10
  • @JohnZwinck I have updated the post with more insights. Please have a look at it. Commented Jun 6, 2020 at 6:18

1 Answer

You can use something like this (I didn't use Column4 because that's not present in your sample data):

df.groupby(['Column1', 'Column2', 'Column3', 'part_no']).apply(print)

It calls the function specified (print in this case) on each group having the same values for the specified columns. The output is:

  Column1 Column2 Column3  part_no  Values
0       A       J       X        1       1
  Column1 Column2 Column3  part_no  Values
1       A       K       Y        2       2
  Column1 Column2 Column3  part_no  Values
2       B       K       X        3       3
  Column1 Column2 Column3  part_no  Values
4       C       L       X        5       5
   Column1 Column2 Column3  part_no  Values
3        C       L       Y        4       4
8        C       L       Y        4       9
9        C       L       Y        4      10
10       C       L       Y        4      11
  Column1 Column2 Column3  part_no  Values
5       D       J       X        6       6
6       D       J       X        6       7
7       D       J       X        6       8

Now all you need to do is define a function containing whatever you had in your inner loop, for example:

def Pothuri(part_df):
    # whatever other code you didn't show us, using part_df['Values'] etc.
    ...

Then:

df.groupby(['Column1', 'Column2', 'Column3', 'part_no']).apply(Pothuri)
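Putting it together, here is a minimal end-to-end sketch. The real per-part calculation isn't shown in the question, so summing the 'Values' column stands in for it (a placeholder, not the asker's actual logic):

```python
import pandas as pd

# Sample data from the question.
raw_df = pd.DataFrame({
    'Column1': ['A', 'A', 'B', 'C', 'C', 'D', 'D', 'D', 'C', 'C', 'C'],
    'Column2': ['J', 'K', 'K', 'L', 'L', 'J', 'J', 'J', 'L', 'L', 'L'],
    'Column3': ['X', 'Y', 'X', 'Y', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y'],
    'part_no': [1, 2, 3, 4, 5, 6, 6, 6, 4, 4, 4],
    'Values':  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
})

def pothuri(part_df):
    # Placeholder for the real per-part calculation:
    # here we just sum this part's values.
    return part_df['Values'].sum()

# One call per (Column1, Column2, Column3, part_no) group,
# replacing the five nested loops from the question.
result = raw_df.groupby(['Column1', 'Column2', 'Column3', 'part_no']).apply(pothuri)
print(result)
```

`result` is a Series indexed by the group keys, e.g. the (C, L, Y, 4) group yields 4 + 9 + 10 + 11 = 34.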

4 Comments

You're welcome. If you don't mind, I'd appreciate a reply here when you figure out how much time it takes to run now. It will depend a lot on the data and how long your function actually takes to run.
Yea sure. Is it possible to pass another dataframe which I have into the same function along with the grouped data?
Yes, you can do .apply(Pothuri, arg1, arg2) and it will pass arg1 and arg2 as additional arguments to your function every time. Docs here: pandas.pydata.org/pandas-docs/stable/reference/api/…
Initially the code took 15hrs to run; following this method saved 4hrs (currently 11hrs).
