
I have three sets of codes (code 1, code 2 and code 3) containing alphanumeric values. Within each set the codes are separated by a delimiter (;), and the sets are related by position: for example, A123 in code 1 corresponds to A in code 2 and A445 in code 3, and so on. Code 3 contains some repeated values.

My desired output is a concatenated "code 4", where code 1 and code 2 are joined position by position according to these two conditions:

a) if the corresponding value in code 3 is not repeated, that position is used as is;

b) if the corresponding value in code 3 is repeated, only the position of its last occurrence in code 3 is used for joining code 1 and code 2 (for example B678 R4: A445 appears twice in code 3, so its second occurrence, at the 4th position, is the one used for joining code 1 and code 2).

Let me know if any logic can be used to get the output. Thanks in advance!

The Python script for the dataframe df11 is:

import pandas as pd

df11 = pd.DataFrame({"code1": ["A123; A321; B478; B678; C567", "A321; A821; B448; B698; C577"], "code2": ["A; B5; N5; R4; H5", "A3; B; N; R7; H2"], "code3": ["A445; A323; A323; A445; A659", "A328; A328; A621; A442; A621"]}, index=[0, 1])

The desired output, shown alongside the input codes, was given as an image in the original post.

2 Answers

STEPS:

  1. Use applymap to split each value into a list.
  2. Explode the dataframe so that each item gets its own row.
  3. Strip off any extra whitespace.
  4. Drop the duplicates based on the code3 column, keeping the last occurrence.
  5. Drop the code3 column and join code1 and code2 with a space.
  6. Finally, aggregate the rows back using groupby on the original index to get the desired output.
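df2 = (
    df11.assign(
        # On pandas >= 2.1, DataFrame.map is the newer name for applymap.
        desired_output=df11.applymap(lambda x: x.split(';'))
        .apply(pd.Series.explode)
        .applymap(str.strip)
        .drop_duplicates(subset='code3', keep='last')
        # drop('code3', 1) with a positional axis is rejected by pandas >= 2.0.
        .drop(columns='code3')
        .apply(' '.join, axis=1)
        .groupby(level=0)
        .agg('; '.join)
    )
)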

UPDATED ANSWER:

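df2 = (
    df11.assign(
        desired_output=
        # Split, explode and strip each column in one vectorized step.
        df11.apply(lambda s: s.str.split('; ').explode().str.strip())
        .drop_duplicates(subset='code3', keep='last')
        # drop('code3', 1) with a positional axis is rejected by pandas >= 2.0.
        .drop(columns='code3')
        .apply(' '.join, axis=1)
        .groupby(level=0)
        .agg('; '.join)
    )
)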

OUTPUT:

                          code1              code2  \
0  A123; A321; B478; B678; C567  A; B5; N5; R4; H5   
1  A321; A821; B448; B698; C577   A3; B; N; R7; H2   

                          code3             desired_output  
0  A445; A323; A323; A445; A659  B478 N5; B678 R4; C567 H5  
1  A328; A328; A621; A442; A621   A821 B; B698 R7; C577 H2  
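
One caveat worth flagging: drop_duplicates above runs on the exploded frame as a whole, so if the same code3 value appeared in two different original rows, only the very last one would survive. The sample data has no such overlap, so the output above is unaffected. Below is a minimal sketch of a per-row variant, assuming repeats should only be compared within a row (the names long, kept, pairs and the "row" column are mine, not from the question):

```python
import pandas as pd

df11 = pd.DataFrame(
    {
        "code1": ["A123; A321; B478; B678; C567", "A321; A821; B448; B698; C577"],
        "code2": ["A; B5; N5; R4; H5", "A3; B; N; R7; H2"],
        "code3": ["A445; A323; A323; A445; A659", "A328; A328; A621; A442; A621"],
    },
    index=[0, 1],
)

# One row per item; the original row label is repeated in the index.
long = df11.apply(lambda s: s.str.split(";").explode().str.strip())

# Move the original row label into a column so it can be part of the duplicate key.
long = long.rename_axis("row").reset_index()

# Keep the last occurrence of each code3 value *within* each original row.
kept = long.drop_duplicates(subset=["row", "code3"], keep="last")

# Join code1 and code2 per item, then glue the items back together per original row.
pairs = kept["code1"] + " " + kept["code2"]
df11["desired_output"] = pairs.groupby(kept["row"]).agg("; ".join)
```

On the sample data this should give the same desired_output column as above; it only behaves differently when a code3 value is shared between rows.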
4 Comments

wow! I would need many more years of practice to achieve such expert manipulation!
@Nk03 Good answer, although you can substantially reduce the number of apply/applymap steps; e.g., you could split the strings in a vectorized manner using df.apply(lambda s: s.str.split('; ').explode())
@ShubhamSharma I was not aware of the fact that we can use explode inside apply. Thanks !! :)
@Nk03 I ran into an issue with the drop_duplicates command while solving a similar problem; I have created a new thread and linked it (see the Linked section).

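I have done a few manipulations:

(1) Use a regular expression to extract the items into a list, and reverse the list order. Reversing means that list.index, which returns the first match, lands on what was the last occurrence in the original order.

(2) Find the indices of the unique items in 'Code 3'.

(3) Concatenate the corresponding values in 'Code 1' and 'Code 2' based on those indices.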

import re

import pandas as pd

df = pd.DataFrame({"code1": ["A123; A321; B478; B678; C567", "A321; A821; B448; B698; C577"], "code2": ["A; B5; N5; R4; H5", "A3; B; N; R7; H2"], "code3": ["A445; A323; A323; A445; A659", "A328; A328; A621; A442; A621"]}, index=[0, 1])
# Extract the items of each cell into a list and reverse it.
for col in df.columns:
    df[col] = df[col].apply(lambda x: re.findall(r'\w+', x)[::-1])

# Position (in the reversed list) of each distinct code3 value.
# Note: set iteration order is arbitrary, so the order of idx and code4 may vary.
df['idx'] = df['code3'].apply(lambda x: [x.index(e) for e in set(x)])
# Join code1 and code2 at those positions.
df['code4'] = df.apply(lambda row: [row['code1'][i] + ' ' + row['code2'][i] for i in row['idx']], axis=1)

Output df

    code1                           code2               code3                           idx         code4
0   [C567, B678, B478, A321, A123]  [H5, R4, N5, B5, A] [A659, A445, A323, A323, A445]  [0, 2, 1]   [C567 H5, B478 N5, B678 R4]
1   [C577, B698, B448, A821, A321]  [H2, R7, N, B, A3]  [A621, A442, A621, A328, A328]  [0, 3, 1]   [C577 H2, A821 B, B698 R7]
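
If the original item order matters, a variant that avoids both the reversal and the set may be easier to reason about. This is only a sketch, assuming items are always separated by ';'; last_occurrence_pairs is a made-up helper name, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "code1": ["A123; A321; B478; B678; C567", "A321; A821; B448; B698; C577"],
        "code2": ["A; B5; N5; R4; H5", "A3; B; N; R7; H2"],
        "code3": ["A445; A323; A323; A445; A659", "A328; A328; A621; A442; A621"],
    },
    index=[0, 1],
)


def last_occurrence_pairs(row):
    """Hypothetical helper: join code1/code2 at the last position of each code3 value."""
    c1, c2, c3 = (
        [item.strip() for item in row[col].split(";")]
        for col in ("code1", "code2", "code3")
    )
    # Later positions overwrite earlier ones, so each code3 value maps to its last index.
    last_pos = {code: i for i, code in enumerate(c3)}
    # Sorting the kept positions restores the original left-to-right order.
    return "; ".join(f"{c1[i]} {c2[i]}" for i in sorted(last_pos.values()))


df["code4"] = df.apply(last_occurrence_pairs, axis=1)
```

Here the input columns stay as the original strings and code4 comes out as a single ';'-separated string per row, in the same order as the inputs.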
