
I have a multi-column DataFrame with about 2000 rows. It looks like this:

 site     le_cell  le_id    ca    ca_id
1  101       1011      1    NAN    NAN
2  101       1012      2    NAN    NAN
3  101       1013      3    NAN    NAN
4  110       1101      1     2      11
5  110       1102      2     2      12
6  110       1103      3     2      13
7  110       1104      11    2       1
8  110       1105      12    2       2
9  110       1106      13    2       3

Here's the problem. I need to create a new column called 'part_id'. Group by 'site': if a row has no 'ca' (ca = NaN), then 'part_id' equals 'le_id' (part_id = le_id). If it has 'ca', then read 'ca_id' and map it to 1, 2, or 3: both 1 and 11 map to 1, 2 and 12 map to 2, and 3 and 13 map to 3. Desired output:

  site     le_cell  le_id    ca    ca_id  part_id
1  101       1011      1    NAN    NAN      1
2  101       1012      2    NAN    NAN      2
3  101       1013      3    NAN    NAN      3
4  110       1101      1     2      11      1
5  110       1102      2     2      12      2
6  110       1103      3     2      13      3
7  110       1104      11    2       1      1
8  110       1105      12    2       2      2
9  110       1106      13    2       3      3
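For reproducibility, here is a sketch of the sample frame (assuming the NAN cells are actual NaN values):

```python
import numpy as np
import pandas as pd

# Rebuild the example data; the NAN cells become np.nan
df = pd.DataFrame({
    'site':    [101, 101, 101, 110, 110, 110, 110, 110, 110],
    'le_cell': [1011, 1012, 1013, 1101, 1102, 1103, 1104, 1105, 1106],
    'le_id':   [1, 2, 3, 1, 2, 3, 11, 12, 13],
    'ca':      [np.nan] * 3 + [2] * 6,
    'ca_id':   [np.nan] * 3 + [11, 12, 13, 1, 2, 3],
})
```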

Just to mention, I can't simply transform all le_id values from 11, 12, 13 to 1, 2, 3. I need to go through 'ca' and match each 'ca_id' against the 'le_cell' whose 'le_id' equals that 'ca_id'.

I've tried converting to a dict, but it didn't go well; I really have no idea how to start. At least give me a hint.

2 Answers


You can define a mapper and use apply with a lambda that assigns a value based on your conditions:

mapper = {1: 1,
          11: 1,
          2: 2,
          12: 2,
          3: 3,
          13: 3}

import numpy as np

df['part_id'] = df.apply(lambda row: row.le_id if np.isnan(row.ca) else mapper[row.ca_id], axis=1)

    ca  ca_id  le_cell  le_id  site  part_id
0  NaN    NaN     1011      1   101      1.0
1  NaN    NaN     1012      2   101      2.0
2  NaN    NaN     1013      3   101      3.0
3  2.0   11.0     1101      1   110      1.0
4  2.0   12.0     1102      2   110      2.0
5  2.0   13.0     1103      3   110      3.0
6  2.0    1.0     1104     11   110      1.0
7  2.0    2.0     1105     12   110      2.0
8  2.0    3.0     1106     13   110      3.0

Hope you don't mind the float; if you do, here is the conversion:

df['part_id'] = df['part_id'].astype(int)
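Putting the whole answer together on the sample data (a sketch; it assumes the NAN cells are real np.nan values). The column comes out as float because any NaN in a row forces apply's row Series to float, hence the final cast:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'site':    [101, 101, 101, 110, 110, 110, 110, 110, 110],
    'le_cell': [1011, 1012, 1013, 1101, 1102, 1103, 1104, 1105, 1106],
    'le_id':   [1, 2, 3, 1, 2, 3, 11, 12, 13],
    'ca':      [np.nan] * 3 + [2] * 6,
    'ca_id':   [np.nan] * 3 + [11, 12, 13, 1, 2, 3],
})

# Map ca_id -> part_id; 1 and 11 collapse to 1, etc.
mapper = {1: 1, 11: 1, 2: 2, 12: 2, 3: 3, 13: 3}

# No 'ca' -> take le_id; otherwise look ca_id up in the mapper
df['part_id'] = df.apply(
    lambda row: row.le_id if np.isnan(row.ca) else mapper[row.ca_id],
    axis=1,
).astype(int)
```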

1 Comment

@jezrael This solution is based on the example data and the given rules :) If the OP's example covers all NaN cases, this should work.

I think you can create a boolean mask and then fill the column with numpy.where:

import numpy as np

# if you need to check whether all values per group are NaN
a = df['ca'].isnull().groupby(df['site']).all()
m = df['site'].isin(a.index[a])

# if a per-row check of column ca is enough
# m = df['ca'].isnull()

d = {11: 1, 12: 2, 13: 3}
df['part_id'] = np.where(m, df['le_id'], df['ca_id'].replace(d))
print (df)
   site  le_cell  le_id   ca  ca_id  part_id
1   101     1011      1  NaN      0        1
2   101     1012      2  NaN      0        2
3   101     1013      3  NaN      0        3
4   110     1101      1  2.0     11        1
5   110     1102      2  2.0     12        2
6   110     1103      3  2.0     13        3
7   110     1104     11  2.0      1        1
8   110     1105     12  2.0      2        2
9   110     1106     13  2.0      3        3
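End to end on the sample data (a sketch; the frame in the printed output above appears to have had its NaNs filled with 0 beforehand, which is why ca_id shows 0 there — on raw NaN data np.where yields a float column, so an explicit cast is added):

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'site':    [101, 101, 101, 110, 110, 110, 110, 110, 110],
    'le_cell': [1011, 1012, 1013, 1101, 1102, 1103, 1104, 1105, 1106],
    'le_id':   [1, 2, 3, 1, 2, 3, 11, 12, 13],
    'ca':      [np.nan] * 3 + [2] * 6,
    'ca_id':   [np.nan] * 3 + [11, 12, 13, 1, 2, 3],
})

# True for rows whose entire site group has no 'ca'
a = df['ca'].isnull().groupby(df['site']).all()
m = df['site'].isin(a.index[a])

# Collapse 11/12/13 to 1/2/3; 1/2/3 pass through unchanged
d = {11: 1, 12: 2, 13: 3}
df['part_id'] = np.where(m, df['le_id'], df['ca_id'].replace(d)).astype(int)
```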
