Pandas/Python: Replacing column values from another column values using .replace()

Question

Data:



import pandas as pd

# named `data` rather than `dict` to avoid shadowing the built-in
data = {'REF': ['A', 'B', 'C', 'D'],
        'ALT': [['E', 'F'], ['G'], ['H', 'I', 'J'], ['K,L']],
        'sample1': ['0', '0', '1', '2'],
        'sample2': ['1', '0', '3', '0']
        }
df = pd.DataFrame(data)

Problem: I need to replace the values in columns 'sample1' and 'sample2'. If the value is 0, it should be replaced with the 'REF' column value. If it is 1, it should be replaced with the first element of the list in column 'ALT'; if 2, with the second element of the 'ALT' list, and so on.
My Solution:

sample_list = ['sample1', 'sample2']
for sample in sample_list:

    # replace 0s
    df[sample] = df.apply(lambda x: x[sample].replace('0', x['REF']), axis=1)
    # replace other numbers
    for i in range(1, 4):
        try:
            df[sample] = df.apply(lambda x: x[sample].replace(f'{i}', x['ALT'][i-1]), axis=1)
        except:
            pass

However, because the list length differs from row to row in the 'ALT' column, it seems that an IndexError is raised, and values above 1 are not replaced. You can see this in the output:


'{"REF":{"0":"A","1":"B","2":"C","3":"D"},"ALT":{"0":["E","F"],"1":["G"],"2":["H","I","J"],"3":["K"]},"sample1":{"0":"A","1":"B","2":"H","3":"2"},"sample2":{"0":"E","1":"B","2":"3","3":"D"}}'

How can I solve it?

UPDATE: If there is a NaN value in sample1 or sample2, I can't convert the values to int, and I don't know how to skip these values.

NaN values should not be converted; they should stay NaN.

Expected output: [image not included]

I think you have a typo in your ALT column, K and L should be separated.
– Ismael EL ATIFI, Commented Dec 22, 2020 at 9:39
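
As a side note on why the loop in the question stops replacing after 1: the argument x['ALT'][i-1] is evaluated for every row before .replace() runs, so one short list is enough to raise IndexError, and the bare except then discards the update for the whole column. The following is only a minimal sketch of that behaviour. It assumes the corrected ALT value ['K', 'L'], and replace_index is a hypothetical helper that mirrors the question's lambda.

```python
import pandas as pd

df = pd.DataFrame({
    'REF': ['A', 'B', 'C', 'D'],
    'ALT': [['E', 'F'], ['G'], ['H', 'I', 'J'], ['K', 'L']],  # assumed corrected input
    'sample1': ['0', '0', '1', '2'],
})

def replace_index(frame, col, i):
    """Hypothetical helper: replace the digit i in `col` by ALT[i - 1], row by row."""
    return frame.apply(lambda x: x[col].replace(str(i), x['ALT'][i - 1]), axis=1)

# i = 1 works because every ALT list has at least one element.
after_one = replace_index(df, 'sample1', 1)

# i = 2 fails as a whole: row 1 has ALT == ['G'], so ALT[1] raises IndexError
# even though that row's value ('0') would not have been changed anyway.
try:
    replace_index(df, 'sample1', 2)
    failed = False
except IndexError:
    failed = True

print(list(after_one), failed)
```

Because the exception aborts the entire apply call, the row that actually holds '2' is never updated, which matches the output shown in the question.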

Dani Mesejo · Accepted Answer · 2020-12-22 09:11:28Z

1

You could do:

import numpy as np

# Cast to int before comparing: .eq(0) never matches string values such as '0'.
s1 = df['sample1'].astype(int)
df['sample1'] = np.where(s1.eq(0), df['REF'],
                         [v[max(i - 1, 0)] for v, i in zip(df['ALT'], s1)])

s2 = df['sample2'].astype(int)
df['sample2'] = np.where(s2.eq(0), df['REF'],
                         [v[max(i - 1, 0)] for v, i in zip(df['ALT'], s2)])

print(df)

Output

  REF        ALT sample1 sample2
0   A     [E, F]       E       E
1   B        [G]       G       G
2   C  [H, I, J]       H       J
3   D        [K]       K       K

Note that I used a different input, since the one in your example is not valid.

answered Dec 22, 2020 at 9:11

Dani Mesejo


1 Comment

Daniyar Karabayev Over a year ago

Thanks! But what can I do if there is a NaN value in some rows of a sample column? Then df['sample2'].astype(int) will not work. How can I skip these rows?
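
One possible way to skip NaN rows, offered only as a sketch and not as part of the answer above: test each value with pd.isna before converting it to int, and build the new column with a list comprehension. The helper name pick_allele and the NaN positions in the sample data are assumptions made for illustration. The ALT column uses the corrected ['K', 'L'].

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'REF': ['A', 'B', 'C', 'D'],
    'ALT': [['E', 'F'], ['G'], ['H', 'I', 'J'], ['K', 'L']],
    'sample1': ['0', np.nan, '1', '2'],   # NaN placement is illustrative
    'sample2': ['1', '0', '3', np.nan],
})

def pick_allele(ref, alt, code):
    """Hypothetical helper: map a code to REF (0) or ALT[code - 1]; keep NaN as NaN."""
    if pd.isna(code):
        return np.nan          # leave missing values untouched
    code = int(code)
    return ref if code == 0 else alt[code - 1]

for col in ['sample1', 'sample2']:
    df[col] = [pick_allele(r, a, c)
               for r, a, c in zip(df['REF'], df['ALT'], df[col])]

print(df)
```

A code that points past the end of an ALT list would still raise IndexError here. Whether such rows should raise or become NaN depends on the data, so the sketch leaves that choice open.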

Ismael EL ATIFI · Accepted Answer · 2020-12-23 10:42:54Z

0

Using a simple concatenation of the REF and ALT columns and apply:

import numpy as np
import pandas as pd

d = {'REF': ['A', 'B', 'C', 'D'],
     'ALT': [['E', 'F'], ['G'], ['H', 'I', 'J'], ['K', 'L']],
     'sample1': ['0', '0', '1', '2'],
     'sample2': ['1', '0', '3', '0']
     }
df = pd.DataFrame(d)


df["REF_ALT"] = df["REF"].map(list) + df["ALT"]  # concatenate REF and ALT
# pd.isna accepts strings as well as floats (np.isnan raises TypeError on a string),
# so NaN entries stay NaN and everything else is used as an index into REF_ALT
df["sample1"] = df.apply(lambda row: np.nan if pd.isna(row["sample1"]) else row["REF_ALT"][int(row["sample1"])], axis=1)
df["sample2"] = df.apply(lambda row: np.nan if pd.isna(row["sample2"]) else row["REF_ALT"][int(row["sample2"])], axis=1)
df.pop("REF_ALT")
df

edited Dec 23, 2020 at 10:42

answered Dec 22, 2020 at 9:36

Ismael EL ATIFI


6 Comments

Daniyar Karabayev Over a year ago

Thanks for the simple answer! But what can I do if there is a NaN value in some of the sample columns? Then int(row["sample"]) will not work.

Ismael EL ATIFI Over a year ago

In that case you need to replace the NaN values beforehand with .fillna()

Daniyar Karabayev Over a year ago

But I need to keep these NaN values and not replace them, so I can't use .fillna() or convert to integer.

Ismael EL ATIFI Over a year ago

OK, so please clarify what the expected output is in the case of NaN.

Daniyar Karabayev Over a year ago

The expected output is just to keep the NaN values in the sample columns (not replace them) and replace only the numbers.


anon01 · Accepted Answer · 2020-12-22 09:41:34Z

0

A simple solution:

import pandas as pd

df = pd.DataFrame.from_dict({
 'REF': {0: 'A', 1: 'B', 2: 'C', 3: 'D'},
 'ALT': {0: ['E', 'F'], 1: ['G'], 2: ['H', 'I', 'J'], 3: ['K', 'L']},
 'sample1': {0: 0, 1: 0, 2: 1, 3: 2},
 'sample2': {0: 1, 1: 0, 2: 3, 3: 0},
})

# create a temp col s that includes a single string with letters:
df["s"] = df.REF + df.ALT.str.join("")    
df["sample1"] = df.apply(lambda x: x["s"][x.sample1], axis=1)
df["sample2"] = df.apply(lambda x: x["s"][x.sample2], axis=1)
df = df.drop(columns="s")

output:

  REF        ALT sample1 sample2
0   A     [E, F]       A       E
1   B        [G]       B       B
2   C  [H, I, J]       H       J
3   D     [K, L]       L       D
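
A caveat worth checking against your own data: joining REF and ALT into one string and indexing it by position relies on every allele being a single character. The sketch below uses made-up data with a two-letter allele ('GT') to show the difference between indexing the joined string and indexing a concatenated list. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'REF': ['A', 'C'],
    'ALT': [['GT', 'C'], ['T']],   # 'GT' is a made-up two-letter allele
    'sample1': [1, 0],
})

# String join: 'A' + 'GTC' -> 'AGTC', so index 1 yields 'G', not 'GT'.
joined = df['REF'] + df['ALT'].str.join('')
by_string = [s[i] for s, i in zip(joined, df['sample1'])]

# List concatenation keeps each allele as one element of the list.
by_list = [([r] + a)[i] for r, a, i in zip(df['REF'], df['ALT'], df['sample1'])]

print(by_string)
print(by_list)
```

If all alleles in the data are single letters, as in the question, both variants give the same result.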

edited Dec 22, 2020 at 9:41

answered Dec 22, 2020 at 9:32

anon01

