
I have a DataFrame like below:

df
      A    B     C    D    E    key
0  test    Z  10.0    a    a  10111
1  test    A  10.0    a    a  10111
2  test    x   2.0    a    b  11010
3  test    5  12.0    b    b  10100
4  test    x   5.0    c    b  11000
5  test    2  14.0    g    c  10111

What I need is to concatenate the values in each row according to the key column:

  • the digit at position [0] of key applies to column A, the digit at position [1] to column B, and so on...
  • each 1 means take that column, each 0 means skip it
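The rule can be sketched for a single row in plain Python. This is a minimal sketch, assuming the key is read as a string of '1'/'0' flags aligned left to right with columns A to E; apply_key is a hypothetical helper name, not part of pandas:

```python
# Hypothetical helper: apply a bit-string key to one row's values.
def apply_key(values, key):
    """Keep str(values[i]) wherever the i-th digit of key is '1'."""
    return ''.join(str(v) for v, flag in zip(values, str(key)) if flag == '1')

# Row 2 of the example: key 11010 keeps A, B and D.
print(apply_key(['test', 'x', 2.0, 'a', 'b'], 11010))  # testxa
```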

Result should look like:

      A    B     C    D    E    key     key_val
0  test    Z  10.0    a    a  10111  test10.0aa
1  test    A  10.0    a    a  10111  test10.0aa
2  test    x   2.0    a    b  11010      testxa
3  test    5  12.0    b    b  10100    test12.0
4  test    x   5.0    c    b  11000       testx
5  test    2  14.0    g    c  10111  test14.0gc

What I have done so far: I created a key_list column with:

df['key_list'] = df['key'].apply(lambda x: list(str(x)))

df
      A  B     C  D  E    key         key_list
0  test  Z  10.0  a  a  10111  [1, 0, 1, 1, 1]
1  test  A  10.0  a  a  10111  [1, 0, 1, 1, 1]
2  test  x   2.0  a  b  11010  [1, 1, 0, 1, 0]
3  test  5  12.0  b  b  10100  [1, 0, 1, 0, 0]
4  test  x   5.0  c  b  11000  [1, 1, 0, 0, 0]
5  test  2  14.0  g  c  10111  [1, 0, 1, 1, 1]

As a next step I tried this (the idea was to multiply each string by 1 or 0 to include or exclude it):

df.apply((df['A'].astype(str) * df['key_list'][0]) +
         (df['B'].astype(str) * df['key_list'][1]) +
         (df['C'].astype(str) * df['key_list'][2]) +
         (df['D'].astype(str) * df['key_list'][3]) +
         (df['E'].astype(str) * df['key_list'][4]), axis=1)

but that seems to be the wrong idea, since it raises ValueError: operands could not be broadcast together with shapes (6,) (5,). I was following the common pattern for string concatenation, just with an extra step:

df['A'].astype(str) + df['B'].astype(str) + df['C'].astype(str) + df['D'].astype(str) + df['E'].astype(str)
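The broadcast error most likely comes from df['key_list'][0]: it selects the first row's list of five flags, not the first flag of every row, so a 6-row Series is combined with a 5-element list. The flags are also the strings '1'/'0', and Python cannot multiply a string by a string. A minimal sketch that keeps the multiply-by-0-or-1 idea but works row by row with DataFrame.apply(axis=1), converting each flag with int(); the DataFrame is rebuilt from the printed example (B is assumed to hold strings) and concat_row is a hypothetical helper name:

```python
import pandas as pd

# Rebuild of the example DataFrame from the question.
df = pd.DataFrame({
    'A': ['test'] * 6,
    'B': ['Z', 'A', 'x', '5', 'x', '2'],
    'C': [10.0, 10.0, 2.0, 12.0, 5.0, 14.0],
    'D': ['a', 'a', 'a', 'b', 'c', 'g'],
    'E': ['a', 'a', 'b', 'b', 'b', 'c'],
    'key': [10111, 10111, 11010, 10100, 11000, 10111],
})
cols = ['A', 'B', 'C', 'D', 'E']

def concat_row(row):
    """Join the row's values, repeating each one 1 or 0 times per its key digit."""
    flags = str(row['key'])
    # str * 1 keeps the string, str * 0 yields ''
    return ''.join(str(row[c]) * int(f) for c, f in zip(cols, flags))

df['key_val'] = df.apply(concat_row, axis=1)
print(df['key_val'].tolist())
```

This is a row-wise Python loop, so it is simpler to read but slower than a vectorized mask on large frames.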

Answer

The idea is to convert the key column into a boolean mask, replace the values that are not selected with empty strings using DataFrame.where, and then sum the strings along each row to join them:

import pandas as pd

c = ['A','B','C','D','E']

L = [list(str(x)) for x in df['key']]
m = pd.DataFrame(L, columns=c, index=df.index).fillna(0).astype(int).astype(bool)
print (m)
      A      B      C      D      E
0  True  False   True   True   True
1  True  False   True   True   True
2  True   True  False   True  False
3  True  False   True  False  False
4  True   True  False  False  False
5  True  False   True   True   True

df['key_val'] = df[c].where(m, '').astype(str).sum(axis=1)
print (df)
      A  B     C  D  E    key     key_val
0  test  Z  10.0  a  a  10111  test10.0aa
1  test  A  10.0  a  a  10111  test10.0aa
2  test  x   2.0  a  b  11010      testxa
3  test  5  12.0  b  b  10100    test12.0
4  test  x   5.0  c  b  11000       testx
5  test  2  14.0  g  c  10111  test14.0gc
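One caveat, assuming key is stored as an integer: a key that starts with 0 (e.g. 01011) would be stored as 1011, so list(str(x)) yields only four flags and fillna(0) pads on the right, shifting every flag one column to the left. A minimal sketch of the same mask approach with str.zfill restoring the leading zeros (the two-row DataFrame is made up for illustration):

```python
import pandas as pd

c = ['A', 'B', 'C', 'D', 'E']
# Illustrative data: the integer 1011 stands for the bit string 01011.
df = pd.DataFrame({
    'A': ['test', 'test'],
    'B': ['Z', 'x'],
    'C': [10.0, 2.0],
    'D': ['a', 'a'],
    'E': ['a', 'b'],
    'key': [10111, 1011],
})

# zfill pads on the left, so each flag stays aligned with its column
L = [list(str(x).zfill(len(c))) for x in df['key']]
m = pd.DataFrame(L, columns=c, index=df.index).astype(int).astype(bool)

df['key_val'] = df[c].where(m, '').astype(str).sum(axis=1)
print(df['key_val'].tolist())
```

If the keys can be longer than the number of columns, they would need to be validated separately; zfill only pads and never truncates.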