Python: Nested for loops fail on the second loop

Question

I am trying to divide a large data set into smaller parts for an analysis. I have been using a for loop to divide the data set before implementing the decision trees. Please see a small version of the data set below:

ANZSCO4_CODE          Skill_name              Cluster         date
  1110                  computer                 S              1
  1110                  communication            C              1
  1110                  SAS                      S              2
  1312                  IT support               S              1
  1312                  SAS                      C              2
  1312                  IT support               S              1
  1312                  SAS                      C              1

As a first step, I create an empty dictionary:

d = {}

and the lists:

 list = [1110, 1322, 2111]
 s_type = ['S','C']

Then run the following loop:

for i in list:
    d[i]=pd.DataFrame(df1[df1['ANZSCO4_CODE'].isin([i])] )

The result is a dictionary with 2 data sets inside.

As a next step I would like to subdivide the data sets into S and C. I run the following code:

for i in list:
    d[i]=pd.DataFrame(df1[df1['ANZSCO4_CODE'].isin([i])] )

    for b in s_type:
         d[i]=  d[i][d[i]['SKILL_CLUSTER_TYPE']==b]

As a final result I would expect to have 4 separate data sets: 1110 x S, 1110 x C, 1312 x S and 1312 x C.

However, when I run the second snippet, I get only 2 data sets inside the dictionary and they are empty.

Can you please show me what is in the list variable?

– user2906838, Commented Jul 25, 2018 at 4:55
@user2906838, sorry, I missed that. It is edited now.

– Ian_De_Oliveira, Commented Jul 25, 2018 at 5:06

Ashish Acharya · Accepted Answer · 2018-07-25 05:06:28Z

2

Maybe something like this works:

from collections import defaultdict

d = defaultdict(pd.DataFrame)

# don't name your list "list"
anzco_list = [1110, 1312]
s_type = ['S','C']

for i in anzco_list:
    for b in s_type:
        d[i][b] = df1[(df1['ANZSCO4_CODE'] == i) & (df1['SKILL_CLUSTER_TYPE'] == b)]

Then you can access your DataFrames like this:

d[1110]['S']
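A minimal sketch of the same nested layout using plain dictionaries, offered as an illustration rather than as the answer's own code. It assumes the column is named Cluster as in the sample table (the question's code calls it SKILL_CLUSTER_TYPE), rebuilds df1 from that table, and uses the hypothetical names codes, by_code and nested. It also shows what goes wrong in the question's loop: there d[i] is overwritten on every pass, so the 'C' filter runs on a frame that already contains only 'S' rows and comes back empty.

```python
import pandas as pd

# Sample data copied from the question; the real df1 is assumed to have this shape.
df1 = pd.DataFrame({
    "ANZSCO4_CODE": [1110, 1110, 1110, 1312, 1312, 1312, 1312],
    "Skill_name": ["computer", "communication", "SAS",
                   "IT support", "SAS", "IT support", "SAS"],
    "Cluster": ["S", "C", "S", "S", "C", "S", "C"],
    "date": [1, 1, 2, 1, 2, 1, 1],
})

codes = [1110, 1312]   # hypothetical name; avoids shadowing the built-in list
s_type = ["S", "C"]

nested = {}
for code in codes:
    by_code = df1[df1["ANZSCO4_CODE"] == code]   # filter once per code
    nested[code] = {}
    for cluster in s_type:
        # Always filter the per-code frame, never the previous cluster's result.
        nested[code][cluster] = by_code[by_code["Cluster"] == cluster]

print(nested[1110]["S"])
```

Each subset then lives under its own key, for example nested[1312]["C"], instead of replacing the previous one.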

answered Jul 25, 2018 at 5:06


2 Comments

Ian_De_Oliveira Over a year ago

Thanks for your support. I'm getting the following error: ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

Ashish Acharya Over a year ago

You can use jezrael's answer below. That seems like a better way to do it.

jezrael · Accepted Answer · 2018-07-25 05:45:37Z

1

I think the DataFrames were empty because the data contained no values from the list called L (don't use list as a variable name, because it shadows the Python built-in).

from itertools import product

L = [1110, 1312, 2111]
s_type = ['S','C']

Then create all combinations of the two lists:

comb = list(product(L, s_type))
print (comb)
[(1110, 'S'), (1110, 'C'), (1312, 'S'), (1312, 'C'), (2111, 'S'), (2111, 'C')]

Finally, create a dictionary of DataFrames:

d = {}
for i, j in comb:
    d['{}x{}'.format(i, j)] = df1[(df1['ANZSCO4_CODE'] == i) & (df1['Cluster'] == j)]

Or use a dictionary comprehension:

d = {'{}x{}'.format(i, j): df1[(df1['ANZSCO4_CODE'] == i) & (df1['Cluster'] == j)] 
      for i, j in comb}

print (d['1110xS'])
   ANZSCO4_CODE Skill_name Cluster
0          1110   computer       S
2          1110        SAS       S
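The steps above can be put together into one self-contained sketch. This is only an illustration under assumptions: df1 is rebuilt from the sample table in the question, tuple keys replace the formatted strings, and the names codes, frames and non_empty are hypothetical. It also makes visible that a code missing from the data (2111 here) yields empty frames.

```python
from itertools import product

import pandas as pd

# Sample data copied from the question; the real df1 is assumed to have this shape.
df1 = pd.DataFrame({
    "ANZSCO4_CODE": [1110, 1110, 1110, 1312, 1312, 1312, 1312],
    "Skill_name": ["computer", "communication", "SAS",
                   "IT support", "SAS", "IT support", "SAS"],
    "Cluster": ["S", "C", "S", "S", "C", "S", "C"],
    "date": [1, 1, 2, 1, 2, 1, 1],
})

codes = [1110, 1312, 2111]   # hypothetical name for the list called L above
s_type = ["S", "C"]

# One DataFrame per (code, cluster) pair, keyed by a tuple instead of a string.
frames = {
    (code, cluster): df1[(df1["ANZSCO4_CODE"] == code) & (df1["Cluster"] == cluster)]
    for code, cluster in product(codes, s_type)
}

# Codes that never occur in the data produce empty frames; drop them if unwanted.
non_empty = {key: frame for key, frame in frames.items() if not frame.empty}
print(sorted(non_empty))
# [(1110, 'C'), (1110, 'S'), (1312, 'C'), (1312, 'S')]
```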

EDIT:

If you need every combination of values that actually occurs in the columns, use groupby:

d = {'{}x{}x{}'.format(i,j,k): df2 
      for (i,j, k), df2 in df1.groupby(['ANZSCO4_CODE','Cluster','date'])}
print (d)
{'1110xCx1':    ANZSCO4_CODE     Skill_name Cluster  date
1          1110  communication       C     1, '1110xSx1':    ANZSCO4_CODE Skill_name Cluster  date
0          1110   computer       S     1, '1110xSx2':    ANZSCO4_CODE Skill_name Cluster  date
2          1110        SAS       S     2, '1312xCx1':    ANZSCO4_CODE Skill_name Cluster  date
6          1312        SAS       C     1, '1312xCx2':    ANZSCO4_CODE Skill_name Cluster  date
4          1312        SAS       C     2, '1312xSx1':    ANZSCO4_CODE  Skill_name Cluster  date
3          1312  IT support       S     1
5          1312  IT support       S     1}

print (d.keys())
dict_keys(['1110xCx1', '1110xSx1', '1110xSx2', '1312xCx1', '1312xCx2', '1312xSx1'])
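If string keys such as '1110xSx1' are awkward to take apart later, one possible variation is to keep the group key as a tuple. The sketch below is an assumption-laden illustration, not part of the original answer: df1 is rebuilt from the sample table and groups is a hypothetical name.

```python
import pandas as pd

# Sample data copied from the question; the real df1 is assumed to have this shape.
df1 = pd.DataFrame({
    "ANZSCO4_CODE": [1110, 1110, 1110, 1312, 1312, 1312, 1312],
    "Skill_name": ["computer", "communication", "SAS",
                   "IT support", "SAS", "IT support", "SAS"],
    "Cluster": ["S", "C", "S", "S", "C", "S", "C"],
    "date": [1, 1, 2, 1, 2, 1, 1],
})

# Iterating over a groupby on several columns yields (key_tuple, sub_frame) pairs.
groups = {
    key: frame
    for key, frame in df1.groupby(["ANZSCO4_CODE", "Cluster", "date"])
}

# Look up one subset by its (code, cluster, date) tuple.
print(groups[(1312, "S", 1)])
```

Only combinations present in the data appear as keys, so no empty frames are created.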

Another approach, if you need to process each group, is to use GroupBy.apply:

def func(x):
    print (x)
    # some code to process each group
    return x

df2 = df1.groupby(['ANZSCO4_CODE','Cluster','date']).apply(func)

   ANZSCO4_CODE     Skill_name Cluster  date
1          1110  communication       C     1
   ANZSCO4_CODE     Skill_name Cluster  date
1          1110  communication       C     1
   ANZSCO4_CODE Skill_name Cluster  date
0          1110   computer       S     1
   ANZSCO4_CODE Skill_name Cluster  date
2          1110        SAS       S     2
   ANZSCO4_CODE Skill_name Cluster  date
6          1312        SAS       C     1
   ANZSCO4_CODE Skill_name Cluster  date
4          1312        SAS       C     2
   ANZSCO4_CODE  Skill_name Cluster  date
3          1312  IT support       S     1
5          1312  IT support       S     1

print (df2)

edited Jul 25, 2018 at 5:45

answered Jul 25, 2018 at 5:15


6 Comments

Ian_De_Oliveira Over a year ago

Hi, I'm getting the following error: TypeError: 'list' object is not callable when calling comb = list(product(L, s_type))

Ashish Acharya Over a year ago

Try naming your list something other than list.

jezrael Over a year ago

@Ian_De_Oliveira - The problem is that a variable named list was defined earlier; the solution is to restart your IDE or use list = builtins.list (after import builtins)

jezrael Over a year ago

@Ian_De_Oliveira - And this is exactly why you should not use list as a variable name ;)

Ian_De_Oliveira Over a year ago

@jezrael and @Ashish Acharya, thanks for both responses, and thanks for advising me not to use list. @jezrael, in a hypothetical scenario, if I add a date variable, would I be able to do combinations with 3 constraints?
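The 'list' object is not callable error discussed in these comments can be reproduced in a few lines. The following is a small sketch, with error_message and comb as illustrative names, showing the shadowing and the builtins.list recovery suggested above.

```python
import builtins
from itertools import product

list = [1110, 1312, 2111]   # deliberately shadows the built-in, as in the question

try:
    # list now refers to the data above, so calling it fails.
    comb = list(product([1110], ["S", "C"]))
except TypeError as exc:
    error_message = str(exc)   # 'list' object is not callable

list = builtins.list           # point the name back at the built-in type

comb = list(product([1110], ["S", "C"]))
print(comb)   # [(1110, 'S'), (1110, 'C')]
```

Choosing a different variable name in the first place avoids the need for any recovery step.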
