I am trying to divide a large data set into smaller parts for an analysis. I have been using a for-loop to split the data set before fitting decision trees. Please see a small version of the data set below:
ANZSCO4_CODE  Skill_name     Cluster  date
1110          computer       S        1
1110          communication  C        1
1110          SAS            S        2
1312          IT support     S        1
1312          SAS            C        2
1312          IT support     S        1
1312          SAS            C        1
As a first step, I create an empty dictionary:
d = {}
and the lists:
list = [1110, 1322, 2111]
s_type = ['S','C']
Then run the following loop:
for i in list:
    d[i] = pd.DataFrame(df1[df1['ANZSCO4_CODE'].isin([i])])
The result is a dictionary with 2 data sets inside.
As a next step I would like to subdivide the data sets into S and C. I run the following code:
for i in list:
    d[i] = pd.DataFrame(df1[df1['ANZSCO4_CODE'].isin([i])])
    for b in s_type:
        d[i] = d[i][d[i]['SKILL_CLUSTER_TYPE'] == b]
As a final result I would expect to have 4 separate data sets: 1110 x S, 1110 x C, 1312 x S and 1312 x C.
However, when I run the second version of the code, I still get only 2 data sets inside the dictionary, and both are empty.
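For reference, here is a minimal sketch of what the expected result could look like. The likely cause of the empty output is that the inner loop keeps overwriting d[i]: the subset is first filtered to 'S', and that already-filtered subset is then filtered to 'C', which leaves nothing. Keying the dictionary on the (code, type) pair avoids this. The sample frame is rebuilt from the table above, the cluster column is assumed to be called SKILL_CLUSTER_TYPE as in the loop, and `codes` is a hypothetical name used in place of `list` so the built-in is not shadowed.

```python
import pandas as pd

# Sample data rebuilt from the small table in the question.
# Assumption: the cluster column is named SKILL_CLUSTER_TYPE, as in the loop.
df1 = pd.DataFrame({
    "ANZSCO4_CODE": [1110, 1110, 1110, 1312, 1312, 1312, 1312],
    "Skill_name": ["computer", "communication", "SAS",
                   "IT support", "SAS", "IT support", "SAS"],
    "SKILL_CLUSTER_TYPE": ["S", "C", "S", "S", "C", "S", "C"],
    "date": [1, 1, 2, 1, 2, 1, 1],
})

codes = [1110, 1312]  # hypothetical name; avoids shadowing the built-in `list`
s_type = ["S", "C"]

d = {}
for code in codes:
    # Filter by code once, without storing the intermediate result in d
    by_code = df1[df1["ANZSCO4_CODE"] == code]
    for b in s_type:
        # One dictionary slot per (code, type) pair, so nothing is overwritten
        d[(code, b)] = by_code[by_code["SKILL_CLUSTER_TYPE"] == b]

# Alternative sketch: let groupby produce the same split in one pass.
# Grouping by two columns yields (code, type) tuples as group keys.
d_grouped = {
    key: group
    for key, group in df1.groupby(["ANZSCO4_CODE", "SKILL_CLUSTER_TYPE"])
}
```

With this layout, d[(1110, 'S')] would hold the 1110 x S rows, d[(1312, 'C')] the 1312 x C rows, and so on. Note that groupby only creates entries for combinations that actually occur in the data, whereas the explicit loop also creates empty frames for missing combinations.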