Good morning everyone. I am working with Python and Pandas.

I have two DataFrames, of the following type:

df_C = pd.DataFrame(data=[[-3,-1,-1], [5,3,3], [3,3,1], [-1,-1,-3], [-3,-1,-1], [2,3,1], [1,1,1]], columns=['C1','C2','C3'])

   C1  C2  C3
0  -3  -1  -1
1   5   3   3
2   3   3   1
3  -1  -1  -3
4  -3  -1  -1
5   2   3   1
6   1   1   1


df_F = pd.DataFrame(data=[[-1,1,-1,-1,-1],[1,1,1,1,1],[1,1,1,-1,1],[1,-1,-1,-1,1],[-1,0,0,-1,-1],[1,1,1,-1,0],[1,1,-1,1,-1]], columns=['F1','F2','F3','F4','F5'])

   F1  F2  F3  F4  F5
0  -1   1  -1  -1  -1
1   1   1   1   1   1
2   1   1   1  -1   1
3   1  -1  -1  -1   1
4  -1   0   0  -1  -1
5   1   1   1  -1   0
6   1   1  -1   1  -1

I would like to "cross" these two DataFrames to generate a single 3D structure, as follows:

[Figure: 3D matrix]

Each generated value must compare a value of df_F with the value of df_C in the same row, according to the following rules:

  • If both values are positive, generate 1
  • If both values are negative, generate 1
  • If one value is positive and the other negative, generate 0
  • If either value is zero, generate None (NaN)

Truth table

Comparison of the data df_C vs df_F

df_C    df_F    3D
  +       +     1
  +       -     0
  +       0     None
  -       +     0
  -       -     1
  -       0     None
  0       +     None
  0       -     None
  0       0     None
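
The truth table above can be read as a sign test on the product of the two values: a positive product means the signs agree, a negative product means they differ, and a zero product means at least one value is zero. A minimal sketch of that rule for a single pair of values, assuming a hypothetical helper name `compare_signs` that is not part of the original post:

```python
def compare_signs(c, f):
    """Hypothetical helper: apply the truth table to one (df_C, df_F) pair."""
    product = c * f
    if product > 0:      # both positive or both negative
        return 1
    if product < 0:      # one positive, one negative
        return 0
    return None          # at least one value is zero


# One example per kind of row in the truth table
print(compare_signs(5, 1))    # both positive
print(compare_signs(-3, -1))  # both negative
print(compare_signs(3, -1))   # opposite signs
print(compare_signs(0, 1))    # a zero is involved
```

This is only meant to make the rule concrete for one pair; it says nothing yet about how to apply it to whole DataFrames.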

Could the programming experts here please guide me on how to generate this matrix by comparing the values? I would like to do it with Pandas. I have already done it with for loops and if conditions, but the result is visually unpleasant, and I think a Pandas solution would be more efficient and elegant.

Thank you.

1 Answer

NumPy broadcasting and np.select

Broadcast and multiply the values in df_C with the values in df_F so that the resulting product array has shape (3, 7, 5): one 7 x 5 slice per column of df_C. Then test whether each product is positive, negative, or zero, and assign 1, 0, or NaN respectively. A positive product means the two signs agree, a negative product means they differ, and a zero product means at least one of the values is zero.

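import numpy as np
# df_C.values.T[:, :, None] has shape (3, 7, 1); df_F.values has shape (7, 5)
a = df_C.values.T[:, :, None] * df_F.values
a = np.select([a > 0, a < 0], [1, 0], np.nan)  # everything else (product == 0) becomes NaN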

array([[[ 1.,  0.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  0.,  1.],
        [ 0.,  1.,  1.,  1.,  0.],
        [ 1., nan, nan,  1.,  1.],
        [ 1.,  1.,  1.,  0., nan],
        [ 1.,  1.,  0.,  1.,  0.]],

       [[ 1.,  0.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  0.,  1.],
        [ 0.,  1.,  1.,  1.,  0.],
        [ 1., nan, nan,  1.,  1.],
        [ 1.,  1.,  1.,  0., nan],
        [ 1.,  1.,  0.,  1.,  0.]],

       [[ 1.,  0.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  0.,  1.],
        [ 0.,  1.,  1.,  1.,  0.],
        [ 1., nan, nan,  1.,  1.],
        [ 1.,  1.,  1.,  0., nan],
        [ 1.,  1.,  0.,  1.,  0.]]])
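
All three slices come out identical here only because, in this sample data, C1, C2 and C3 have the same sign in every row. As a sanity check, the broadcast result can be compared against an explicit loop version of the truth table. The following is a self-contained sketch; the names `product`, `result` and `expected` are illustrative and not from the original answer:

```python
import numpy as np
import pandas as pd

df_C = pd.DataFrame(
    data=[[-3, -1, -1], [5, 3, 3], [3, 3, 1], [-1, -1, -3],
          [-3, -1, -1], [2, 3, 1], [1, 1, 1]],
    columns=['C1', 'C2', 'C3'])
df_F = pd.DataFrame(
    data=[[-1, 1, -1, -1, -1], [1, 1, 1, 1, 1], [1, 1, 1, -1, 1],
          [1, -1, -1, -1, 1], [-1, 0, 0, -1, -1], [1, 1, 1, -1, 0],
          [1, 1, -1, 1, -1]],
    columns=['F1', 'F2', 'F3', 'F4', 'F5'])

c = df_C.to_numpy()  # shape (7, 3)
f = df_F.to_numpy()  # shape (7, 5)

# (3, 7, 1) * (7, 5) broadcasts to (3, 7, 5)
product = c.T[:, :, None] * f
result = np.select([product > 0, product < 0], [1, 0], default=np.nan)

# Loop-based reference: start from NaN and fill in 1 or 0 where the rule applies
expected = np.full(result.shape, np.nan)
for i in range(c.shape[1]):          # columns of df_C
    for r in range(c.shape[0]):      # shared rows
        for j in range(f.shape[1]):  # columns of df_F
            p = c[r, i] * f[r, j]
            if p > 0:
                expected[i, r, j] = 1
            elif p < 0:
                expected[i, r, j] = 0

print(result.shape)
print(np.allclose(result, expected, equal_nan=True))
```

Indexing `result[i]` gives the 7 x 5 comparison of the i-th column of df_C against every column of df_F.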

1 Comment

Shubham, thank you very much. Your solution is flawless, with a perfect explanation. Very elegant!
