
I have a multi-column DataFrame with about 2000 rows. It looks like this:

 site     le_cell  le_id    ca    ca_id
1  101       1011      1    NAN    NAN
2  101       1012      2    NAN    NAN
3  101       1013      3    NAN    NAN
4  110       1101      1     2      11
5  110       1102      2     2      12
6  110       1103      3     2      13
7  110       1104      11    2       1
8  110       1105      12    2       2
9  110       1106      13    2       3

Here's the problem. I need to create a new column called 'part_id'. Group by 'site': if a row has no 'ca' (ca = NaN), then 'part_id' equals 'le_id' (part_id = le_id). If it has 'ca', then read 'ca_id' and map it to 1, 2, or 3: both 1 and 11 map to 1, 2 and 12 map to 2, and 3 and 13 map to 3. Desired output:

  site     le_cell  le_id    ca    ca_id  part_id
1  101       1011      1    NAN    NAN      1
2  101       1012      2    NAN    NAN      2
3  101       1013      3    NAN    NAN      3
4  110       1101      1     2      11      1
5  110       1102      2     2      12      2
6  110       1103      3     2      13      3
7  110       1104      11    2       1      1
8  110       1105      12    2       2      2
9  110       1106      13    2       3      3
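For reproducibility, here is a sketch of the sample frame (assuming the NAN cells are actual NaN values):

```python
import numpy as np
import pandas as pd

# Rebuild the example data; the NAN cells become np.nan
df = pd.DataFrame({
    'site':    [101, 101, 101, 110, 110, 110, 110, 110, 110],
    'le_cell': [1011, 1012, 1013, 1101, 1102, 1103, 1104, 1105, 1106],
    'le_id':   [1, 2, 3, 1, 2, 3, 11, 12, 13],
    'ca':      [np.nan] * 3 + [2] * 6,
    'ca_id':   [np.nan] * 3 + [11, 12, 13, 1, 2, 3],
})
```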

Just to mention, I can't simply transform all le_id values from 11, 12, 13 to 1, 2, 3. I need to go through 'ca' and match each 'ca_id' against the 'le_cell' whose 'le_id' equals that 'ca_id'.

I've tried converting to a dict, but it didn't go well; I really have no idea how to start. At least give me a hint.

2 Answers


You can define a mapper and use apply with a lambda that assigns a value based on your conditions:

mapper = {1: 1,
          11: 1,
          2: 2,
          12: 2,
          3: 3,
          13: 3}

import numpy as np

df['part_id'] = df.apply(lambda row: row.le_id if np.isnan(row.ca) else mapper[row.ca_id], axis=1)

    ca  ca_id  le_cell  le_id  site  part_id
0  NaN    NaN     1011      1   101      1.0
1  NaN    NaN     1012      2   101      2.0
2  NaN    NaN     1013      3   101      3.0
3  2.0   11.0     1101      1   110      1.0
4  2.0   12.0     1102      2   110      2.0
5  2.0   13.0     1103      3   110      3.0
6  2.0    1.0     1104     11   110      1.0
7  2.0    2.0     1105     12   110      2.0
8  2.0    3.0     1106     13   110      3.0

Hope you don't mind the float; if you do, here is the conversion:

df['part_id'] = df['part_id'].astype(int)
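Putting the whole answer together on the sample data (a sketch; it assumes the NAN cells are real np.nan values). The column comes out as float because any NaN in a row forces apply's row Series to float, hence the final cast:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'site':    [101, 101, 101, 110, 110, 110, 110, 110, 110],
    'le_cell': [1011, 1012, 1013, 1101, 1102, 1103, 1104, 1105, 1106],
    'le_id':   [1, 2, 3, 1, 2, 3, 11, 12, 13],
    'ca':      [np.nan] * 3 + [2] * 6,
    'ca_id':   [np.nan] * 3 + [11, 12, 13, 1, 2, 3],
})

# Map ca_id -> part_id; 1 and 11 collapse to 1, etc.
mapper = {1: 1, 11: 1, 2: 2, 12: 2, 3: 3, 13: 3}

# No 'ca' -> take le_id; otherwise look ca_id up in the mapper
df['part_id'] = df.apply(
    lambda row: row.le_id if np.isnan(row.ca) else mapper[row.ca_id],
    axis=1,
).astype(int)
```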

1 Comment

@jezrael This solution is based on the example data and the given rules :) If the OP's example covers all NaN cases, this should work.

I think you can create a boolean mask and then fill the column with numpy.where:

import numpy as np

# if you need to check whether all values per group are NaN
a = df['ca'].isnull().groupby(df['site']).all()
m = df['site'].isin(a.index[a])

# if a per-row check of column ca is enough
# m = df['ca'].isnull()

d = {11: 1, 12: 2, 13: 3}
df['part_id'] = np.where(m, df['le_id'], df['ca_id'].replace(d))
print (df)
   site  le_cell  le_id   ca  ca_id  part_id
1   101     1011      1  NaN      0        1
2   101     1012      2  NaN      0        2
3   101     1013      3  NaN      0        3
4   110     1101      1  2.0     11        1
5   110     1102      2  2.0     12        2
6   110     1103      3  2.0     13        3
7   110     1104     11  2.0      1        1
8   110     1105     12  2.0      2        2
9   110     1106     13  2.0      3        3
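End to end on the sample data (a sketch; the frame in the printed output above appears to have had its NaNs filled with 0 beforehand, which is why ca_id shows 0 there — on raw NaN data np.where yields a float column, so an explicit cast is added):

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'site':    [101, 101, 101, 110, 110, 110, 110, 110, 110],
    'le_cell': [1011, 1012, 1013, 1101, 1102, 1103, 1104, 1105, 1106],
    'le_id':   [1, 2, 3, 1, 2, 3, 11, 12, 13],
    'ca':      [np.nan] * 3 + [2] * 6,
    'ca_id':   [np.nan] * 3 + [11, 12, 13, 1, 2, 3],
})

# True for rows whose entire site group has no 'ca'
a = df['ca'].isnull().groupby(df['site']).all()
m = df['site'].isin(a.index[a])

# Collapse 11/12/13 to 1/2/3; 1/2/3 pass through unchanged
d = {11: 1, 12: 2, 13: 3}
df['part_id'] = np.where(m, df['le_id'], df['ca_id'].replace(d)).astype(int)
```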
