22

In Pandas DataFrame, I can use DataFrame.isin() function to match the column values against another column.

For example: suppose we have one DataFrame:

df_A = pd.DataFrame({'col1': ['A', 'B', 'C', 'B', 'C', 'D'], 
                     'col2': [1, 2, 3, 4, 5, 6]})
df_A

    col1  col2
0    A     1
1    B     2
2    C     3
3    B     4
4    C     5
5    D     6         

and another DataFrame:

df_B = pd.DataFrame({'col1': ['C', 'E', 'D', 'C', 'F', 'G', 'H'], 
                     'col2': [10, 20, 30, 40, 50, 60, 70]})
df_B

    col1  col2
0    C    10
1    E    20
2    D    30
3    C    40
4    F    50
5    G    60
6    H    70       

I can use .isin() function to match the column values of df_B against the column values of df_A

E.g.:

df_B[df_B['col1'].isin(df_A['col1'])]

yields:

    col1  col2
0    C    10
2    D    30
3    C    40

What's the equivalent operation in PySpark DataFrame?

df_A = pd.DataFrame({'col1': ['A', 'B', 'C', 'B', 'C', 'D'], 
                     'col2': [1, 2, 3, 4, 5, 6]})
df_A = sqlContext.createDataFrame(df_A)

df_B = pd.DataFrame({'col1': ['C', 'E', 'D', 'C', 'F', 'G', 'H'], 
                     'col2': [10, 20, 30, 40, 50, 60, 70]})
df_B = sqlContext.createDataFrame(df_B)


df_B[df_B['col1'].isin(df_A['col1'])]

The .isin() code above gives me an error messages:

u'resolved attribute(s) col1#9007 missing from 
col1#9012,col2#9013L in operator !Filter col1#9012 IN 
(col1#9007);;\n!Filter col1#9012 IN (col1#9007)\n+- 
LogicalRDD [col1#9012, col2#9013L]\n'

1 Answer 1

45

This kind of operation is called left semi join in spark:

df_B.join(df_A, ['col1'], 'leftsemi')
Sign up to request clarification or add additional context in comments.

2 Comments

what if they have different column names?
@BlueClouds In this case you just modify join condition: df_B.join(df_A, [df_B.col1 == db_A.col2], 'leftsemi')

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.