I have a DataFrame like below:
df
A B C D E key
0 test Z 10.0 a a 10111
1 test A 10.0 a a 10111
2 test x 2.0 a b 11010
3 test 5 12.0 b b 10100
4 test x 5.0 c b 11000
5 test 2 14.0 g c 10111
What I need is to concatenate the values in each row according to the key column:
key at position [0] is for col A, key at position [1] is for col B, and so on. Each
1 means take the column, each 0 means skip it.
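To make the rule concrete, here is a minimal sketch in plain Python for a single row. The helper name select_by_key is only for illustration (it is not a pandas function), and it assumes the key always has one digit per column, so a key with a leading 0 would have to be stored as a string rather than an integer.

```python
def select_by_key(values, key):
    """Join str(value) for every position where the key digit is '1'."""
    flags = str(key)  # e.g. 11010 -> "11010"
    return "".join(str(v) for v, flag in zip(values, flags) if flag == "1")


# Row 2 of the sample data: columns A..E with key 11010
row = ["test", "x", 2.0, "a", "b"]
print(select_by_key(row, 11010))  # testxa
```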
Result should look like:
A B C D E key key_val
0 test Z 10.0 a a 10111 test10.0aa
1 test A 10.0 a a 10111 test10.0aa
2 test x 2.0 a b 11010 testxa
3 test 5 12.0 b b 10100 test12.0
4 test x 5.0 c b 11000 testx
5 test 2 14.0 g c 10111 test14.0gc
So far I've created a key_list column with:
df['key_list'] = df['key'].apply(lambda x: list(str(x)))
df
A B C D E key key_list
0 test Z 10.0 a a 10111 [1, 0, 1, 1, 1]
1 test A 10.0 a a 10111 [1, 0, 1, 1, 1]
2 test x 2.0 a b 11010 [1, 1, 0, 1, 0]
3 test 5 12.0 b b 10100 [1, 0, 1, 0, 0]
4 test x 5.0 c b 11000 [1, 1, 0, 0, 0]
5 test 2 14.0 g c 10111 [1, 0, 1, 1, 1]
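One detail about key_list that may matter for the multiplication idea: list(str(x)) produces one-character strings, not integers, even though the display above shows them without quotes. A small sketch of the difference in plain Python (the variable names are only for illustration):

```python
key = 10111

as_chars = list(str(key))               # what key_list holds: one-character strings
as_ints = [int(ch) for ch in str(key)]  # integer flags, usable as multipliers

print(as_chars)  # ['1', '0', '1', '1', '1']
print(as_ints)   # [1, 0, 1, 1, 1]

# Multiplying a string by an integer flag keeps it (1) or drops it (0)
kept = "test" * as_ints[0]
dropped = "Z" * as_ints[1]
print(repr(kept), repr(dropped))  # 'test' ''
```

Multiplying a string by one of the string flags (for example "test" * "1") raises a TypeError, so the flags would need the int conversion first.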
As the next step I tried this (I wanted to multiply by 1 or 0 to include or exclude each string):
df.apply((df['A'].astype(str) * df['key_list'][0]) +
(df['B'].astype(str) * df['key_list'][1]) +
(df['C'].astype(str) * df['key_list'][2]) +
(df['D'].astype(str) * df['key_list'][3]) +
(df['E'].astype(str) * df['key_list'][4]), axis=1)
but that seems to be the wrong idea, since it raises ValueError: operands could not be broadcast together with shapes (6,) (5,). I was following the common practice for string concatenation, just with an extra step:
df['A'].astype(str) + df['B'].astype(str) + df['C'].astype(str) + df['D'].astype(str) + df['E'].astype(str)
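As far as I can tell, the shapes in the error come from df['key_list'][0] selecting the whole list stored in row 0 (5 elements) rather than the first flag of every row, so a 6-row column ends up combined with a 5-element list. Below is a sketch of a column-wise variant of the same include/exclude idea. It rebuilds the sample data from the table above (treating the B values as strings is an assumption), assumes every key has exactly five digits, and uses Series.where with an empty string in place of multiplying by 0.

```python
import pandas as pd

# Sample data rebuilt from the table above (B stored as strings: an assumption)
df = pd.DataFrame({
    "A": ["test"] * 6,
    "B": ["Z", "A", "x", "5", "x", "2"],
    "C": [10.0, 10.0, 2.0, 12.0, 5.0, 14.0],
    "D": ["a", "a", "a", "b", "c", "g"],
    "E": ["a", "a", "b", "b", "b", "c"],
    "key": [10111, 10111, 11010, 10100, 11000, 10111],
})

cols = ["A", "B", "C", "D", "E"]
key_str = df["key"].astype(str)

parts = []
for i, col in enumerate(cols):
    take = key_str.str[i] == "1"  # per-row flag for this column
    # Keep the string where the flag is set, otherwise use ""
    parts.append(df[col].astype(str).where(take, ""))

# Same "+" concatenation as above, applied to the masked columns
key_val = parts[0]
for part in parts[1:]:
    key_val = key_val + part

df["key_val"] = key_val
print(df["key_val"].tolist())
```

The key point is that key_str.str[i] gives one flag per row for column i, which lines up with the 6-row columns, whereas indexing key_list with [0] gives one row's flags for all columns.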