I have a DataFrame userdf, created as follows:

val userdf = sparkSession.read.json(sparkContext.parallelize(Array("""[{"id" : 1,"name" : "user1"},{"id" : 2,"name" : "user2"}]""")))

scala> userdf.show
+---+-----+
| id| name|
+---+-----+
|  1|user1|
|  2|user2|
+---+-----+

I want to retrieve the user with id 1, which I can do with code like this:

scala> userdf.filter($"id"===1).show
+---+-----+
| id| name|
+---+-----+
|  1|user1|
+---+-----+

What I want to achieve is something like this:

val filter1 = $"id"===1
userdf.filter(filter1).show

These filters are fetched from configuration files, and I am trying to build more complex queries from these building blocks, something like:

userdf.filter(filter1 OR filter2).filter(filter3).show 

where filter1, filter2, filter3 and the AND/OR operators that combine them are all read from configuration.

Thanks

1 Answer

The filter method also accepts a string containing a SQL expression.
This code should produce the same result:

userdf.filter("id = 1").show

So you can read that string directly from the configuration.
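
A minimal sketch of that idea, assuming the configuration yields plain SQL fragments: the fragments can be joined into one expression string before it is handed to filter. FilterConfig, anyOf and allOf are hypothetical helper names, not part of Spark, and the three strings stand in for values that would be read from the config file.

```scala
// Hypothetical helpers (not part of Spark): build one SQL expression
// string from fragments that would come from a configuration file.
object FilterConfig {
  // Wrap each fragment in parentheses so operator precedence cannot
  // change the meaning once the fragments are joined.
  private def join(op: String, parts: Seq[String]): String =
    parts.map(p => s"($p)").mkString(s" $op ")

  def anyOf(parts: String*): String = join("OR", parts)
  def allOf(parts: String*): String = join("AND", parts)
}

// Assumed config values, standing in for filter1, filter2 and filter3.
val filter1 = "id = 1"
val filter2 = "id = 2"
val filter3 = "name = 'user1'"

// Same intent as filter(filter1 OR filter2).filter(filter3)
val condition = FilterConfig.allOf(FilterConfig.anyOf(filter1, filter2), filter3)

println(condition)
// The string can then be passed to Spark: userdf.filter(condition).show
```

Parenthesizing every fragment is a deliberate choice here: it keeps a fragment such as "a = 1 or b = 2" from changing meaning when it is joined to others with AND.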

5 Comments

This solution does not work with multiple "and" and "or" conditions. For example, userdf.filter($"name"==="user1" || $"id" === 1) works fine, but userdf.filter("id=2" || "id=1") does not. stackoverflow.com/questions/35881152/…
As long as it is a valid SQL statement, it should work: userdf.filter("id=2 or id=1")
Thanks. The statement "as long as it is a valid SQL statement" is very helpful.
@user811602 any luck resolving this issue?
@JellfLL I created a valid SQL string, as lev suggested in the comment.
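
If the individual conditions and the operators are stored separately in the configuration, another option is to turn each string into a Column with org.apache.spark.sql.functions.expr and combine the Columns with || and &&. The sketch below is only an illustration: it assumes a local SparkSession, rebuilds userdf from a Seq rather than JSON, and uses hard-coded strings in place of the config values.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.expr

// Assumption: a local session created just for this example.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("config-filters")
  .getOrCreate()
import spark.implicits._

// Same two rows as the userdf in the question.
val userdf = Seq((1, "user1"), (2, "user2")).toDF("id", "name")

// Each string stands in for a value read from configuration;
// expr parses a SQL expression string into a Column.
val filter1: Column = expr("id = 1")
val filter2: Column = expr("id = 2")
val filter3: Column = expr("name = 'user1'")

// Columns compose with || and &&, so the query from the question
// can be written directly.
val result = userdf.filter(filter1 || filter2).filter(filter3)
result.show()
```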
