Filter on multiple columns in Spark Dataframe based API

Question

I have a dataframe like:

+--------+-------+--------------------+-------------------+
|     id1|    id2|                body|         created_at|
+--------+-------+--------------------+-------------------+
|1       |      4|....................|2017-10-01 00:00:05|
|2       |      3|....................|2017-10-01 00:00:05|
|3       |      2|....................|2017-10-01 00:00:05|
|4       |      1|....................|2017-10-01 00:00:05|
+--------+-------+--------------------+-------------------+

I would like to filter the table using both id1 and id2. For example, I want the rows where (id1 = 1 and id2 = 4) or (id1 = 2 and id2 = 3).

Currently, I'm using a loop to generate a giant query string for df.filter(), i.e. ((id1 = 1) and (id2 = 4)) or ((id1 = 2) and (id2 = 3)). Is there a more proper way to achieve this?
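For reference, the loop-based approach described in the question might look roughly like the sketch below. The helper name build_filter and its parameters are hypothetical (they do not appear in the question), and the sketch assumes the ids are integers, which is why each value is passed through int() before being embedded in the string.

```python
def build_filter(pairs, cols=("id1", "id2")):
    """Return a SQL-style predicate that matches any of the given pairs.

    `pairs` is an iterable of (id1, id2) tuples; `cols` names the columns.
    Both the function and its parameters are illustrative assumptions.
    """
    clauses = []
    for pair in pairs:
        # One AND-group per pair, e.g. "(id1 = 1) and (id2 = 4)".
        # int() assumes integer ids and keeps stray text out of the query.
        parts = " and ".join(f"({c} = {int(v)})" for c, v in zip(cols, pair))
        clauses.append(f"({parts})")
    # OR the groups together so a row matching any pair is kept.
    return " or ".join(clauses)


condition = build_filter([(1, 4), (2, 3)])
print(condition)
# ((id1 = 1) and (id2 = 4)) or ((id1 = 2) and (id2 = 3))
```

The resulting string would then be passed to df.filter(condition), as in the question. Note that an empty list of pairs yields an empty string, which would need to be handled separately, and that the string grows linearly with the number of pairs.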

MaxU - stand with Ukraine · Accepted Answer · 2017-10-12 22:41:21Z

You can generate a helper DF (table):

tmp:

+--------+-------+
|     id1|    id2|
+--------+-------+
|1       |      4|
|2       |      3|
+--------+-------+

and then join them:

SELECT a.*
FROM tab a
JOIN tmp b
  ON (a.id1 = b.id1 and a.id2 = b.id2)

where tab is your original DF, registered as a table, and tmp is the helper DF, registered the same way.
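To check the join logic without a Spark session, the same query can be run against an in-memory SQLite database using Python's standard library. This is only a stand-in for Spark SQL, not Spark itself; the table names tab and tmp follow the answer, and the body values are placeholders, since the original data is elided.

```python
import sqlite3

# In-memory database used purely as a stand-in for Spark SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (id1 INTEGER, id2 INTEGER, body TEXT, created_at TEXT)"
)
conn.execute("CREATE TABLE tmp (id1 INTEGER, id2 INTEGER)")

# Original data from the question (body text is a placeholder).
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        (1, 4, "...", "2017-10-01 00:00:05"),
        (2, 3, "...", "2017-10-01 00:00:05"),
        (3, 2, "...", "2017-10-01 00:00:05"),
        (4, 1, "...", "2017-10-01 00:00:05"),
    ],
)

# Helper table holding the (id1, id2) pairs to keep.
conn.executemany("INSERT INTO tmp VALUES (?, ?)", [(1, 4), (2, 3)])

# The join from the answer; ORDER BY is added only to make the output stable.
rows = conn.execute(
    """
    SELECT a.*
    FROM tab a
    JOIN tmp b
      ON (a.id1 = b.id1 and a.id2 = b.id2)
    ORDER BY a.id1
    """
).fetchall()

ids = [(r[0], r[1]) for r in rows]
print(ids)
# [(1, 4), (2, 3)]
```

In Spark 2.x the two DataFrames would typically be registered with createOrReplaceTempView before running the query through spark.sql. One caveat of the join approach: if tmp contains the same pair twice, the matching row from tab is returned twice, so the helper table should hold distinct pairs.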

2 Comments

Harrison Over a year ago

Thanks MaxU. How is the performance of this approach? I tried it to select 2 rows out of 437, and it took 8.11s, whereas my original approach took 0.03s.

MaxU - stand with Ukraine Over a year ago

I guess this approach will be slower, but it doesn't depend on the number of rows in the tmp table. Your approach may fail if the condition string gets too long...
