This one is weird --
let's say I have a df like this:
user_id  city     state  network
123      austin   tx     att
113      houston  tx     tmobile
343      miami    fl     att
356      seattle  wa     verizon
and I have another df1 like this (these 2 dfs won't be the same shape):
col1
'network': 'att'
'city': 'austin'
'state': 'tx'
'city': 'seattle'
I'm trying to build a final_df like this:
user_id  is_network_att  is_city_austin  is_state_tx  is_city_seattle
123      1               1               1            0
113      0               0               1            0
343      1               0               0            0
356      0               0               0            1
It's easier to just show it, but in one sentence:
I'm trying to create true/false indicator columns in a new final_df, one per entry in df1.col1, each computed from the matching column of df.
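To make the target concrete, here is a minimal sketch of how the indicator columns could be built once the entries of df1.col1 are available as (column, value) pairs. The `pairs` list is an assumption for illustration; how it gets extracted from df1 depends on what col1 actually holds.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [123, 113, 343, 356],
    "city": ["austin", "houston", "miami", "seattle"],
    "state": ["tx", "tx", "fl", "wa"],
    "network": ["att", "tmobile", "att", "verizon"],
})

# Assumption: the (column, value) pairs already pulled out of df1.col1.
pairs = [("network", "att"), ("city", "austin"), ("state", "tx"), ("city", "seattle")]

# One vectorized comparison per pair; no explicit loop over rows is needed.
flags = {f"is_{col}_{val}": df[col].eq(val).astype(int) for col, val in pairs}

# Keep user_id and append the new 0/1 columns in the order of the pairs.
final_df = pd.concat([df[["user_id"]], pd.DataFrame(flags)], axis=1)
print(final_df)
```

Each pair becomes its own column, so a column of df that appears twice in df1 (city here) simply yields two indicator columns.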
Strategies I'm trying:
- Throw the df1 entries into a list or dictionary, loop through each element, and then somehow loop through each row of df with an if statement for each row.
- Maybe make a makeshift column in df1 holding the exact code that would create the column in final_df, and somehow use the text in this column as code.
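For the second strategy, running the text as code is probably not necessary. If df1.col1 holds plain strings of the form `'network': 'att'` (an assumption; it could also hold dictionaries), each string can be parsed safely with `ast.literal_eval`. The helper name `parse_pair` is made up for this sketch.

```python
from ast import literal_eval

import pandas as pd

# Assumption: col1 contains strings that look like one dictionary entry each.
df1 = pd.DataFrame({
    "col1": ["'network': 'att'", "'city': 'austin'", "'state': 'tx'", "'city': 'seattle'"]
})


def parse_pair(text):
    """Turn "'network': 'att'" into ("network", "att") without executing it."""
    # Wrapping the text in braces makes it a one-entry dict literal.
    (key, value), = literal_eval("{" + text + "}").items()
    return key, value


pairs = [parse_pair(s) for s in df1["col1"]]
print(pairs)
```

Keeping the result as a list of pairs, rather than one big dictionary, avoids losing entries when the same key (such as city) shows up more than once.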
Here's a handful of the rows that I'm trying to put in a dictionary:
912 'organization': 'atlantic metro communications'
913 'isp_name': 'Atlantic Metro Communications'
915 'location_name': 'martinez ca'
917 'location_name': 'martinez ca'
918 'location_name': 'martinez ca'
919 'location_name': 'martinez ca'
920 'isp_name': 'Hurricane Electric'
922 'organization': 'hurricane electric'
923 'organization': 'hurricane electric'
924 'isp_name': 'Hurricane Electric'
925 'count_users_per_ip': 28.0
926 'organization': 'atlantic metro communications'
927 'isp_name': 'Atlantic Metro Communications'
928 'isp_name': 'Hurricane Electric'
929 'organization': 'hurricane electric'
930 'isp_name': 'Hurricane Electric'
931 'organization': 'hurricane electric'
932 'location_name': 'hermosillo son'
933 'organization': 'atlantic metro communications'
934 'isp_name': 'Atlantic Metro Communications'
935 'location_state': ' son'
966 'count_users_per_ip': 28.0
1057 'count_users_per_device': 4.0
1218 'count_ips_per_user': 3.0
1408 'moderated_action': 'SOFT_BLOCK'
1418 'moderated_action': 'SOFT_BLOCK'
1430 'moderated_action': 'SOFT_BLOCK'
1438 'moderated_action': 'SOFT_BLOCK'
1517 'app_build': '405000004'
1605 'app_build': '405000004'
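Because keys such as isp_name and location_name repeat with different values, a plain dictionary would keep only the last value per key. A rough sketch of collecting the distinct values per key instead, using a few of the rows above already parsed into (key, value) pairs (the name `values_by_column` is just illustrative):

```python
from collections import defaultdict

# A few of the rows above, assumed to be already parsed into (key, value) pairs.
pairs = [
    ("organization", "atlantic metro communications"),
    ("isp_name", "Atlantic Metro Communications"),
    ("location_name", "martinez ca"),
    ("location_name", "martinez ca"),
    ("isp_name", "Hurricane Electric"),
    ("count_users_per_ip", 28.0),
    ("location_name", "hermosillo son"),
]

# Map each key to the list of its distinct values, in first-seen order.
values_by_column = defaultdict(list)
for col, val in pairs:
    if val not in values_by_column[col]:
        values_by_column[col].append(val)
values_by_column = dict(values_by_column)

print(values_by_column)
```

Note that the values are compared exactly, so 'Hurricane Electric' and 'hurricane electric' would count as different values if they appeared under the same key.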
Update: here's as far as I've got:
def transpose_features(df1, col1, main_df, attr1, attr2):
    from ast import literal_eval
    # dic = literal_eval(f"{{{', '.join(df1[col1])}}}")
    # Map each attribute name to the list of all its values.
    dic = {}
    for i in df1[attr1].tolist():
        dic[i] = df1[df1[attr1] == i][attr2].tolist()
    # Replace the original columns with 0/1 indicator columns.
    df_final = (main_df.drop(columns=list(dic))
                .join(main_df[list(dic)].eq(dic).astype(int)
                      .rename(columns=lambda x: f'is_{x}_{dic[x]}')
                      )
                )
    print(df_final.shape)
    return df_final

df_final = transpose_features(
    df1 = df_features
    ,col1 = 'attr'
    ,main_df = df
    ,attr1 = 'attr1'
    ,attr2 = 'attr2'
)
df_final.head()
This code pulls all the values for each key into a list and attaches that list to the key in the dictionary. The issue now is that I basically need an "or" condition in the method @mozway provided, one that says "does the user have ANY of the values in the list for each dict key".
Hard to even type that.
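One possible way to express that "any of the values" check, sketched on the small example from the top: `DataFrame.isin` accepts a dictionary that maps column names to lists of allowed values, so it could stand in for `.eq(dic)`. The dictionary contents and the `is_<column>_any` naming are assumptions for illustration, not part of the original method.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [123, 113, 343, 356],
    "city": ["austin", "houston", "miami", "seattle"],
    "state": ["tx", "tx", "fl", "wa"],
    "network": ["att", "tmobile", "att", "verizon"],
})

# Assumption: each column maps to the list of values of interest.
dic = {"network": ["att"], "city": ["austin", "seattle"], "state": ["tx"]}

# With a dict, isin checks each column against its own list ("any of").
matches = df[list(dic)].isin(dic).astype(int)
matches.columns = [f"is_{col}_any" for col in matches.columns]

# Swap the original columns for the 0/1 indicator columns.
df_final = df.drop(columns=list(dic)).join(matches)
print(df_final)
```

This gives one column per key rather than one per value; if a separate column per value is wanted, comparing each (key, value) pair on its own is the alternative.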
What does df1 contain? Do you have strings? Dictionaries?