To give some background, I have loaded the JSON using

val df = sqlContext.read.json("s3n://...")
df.registerTempTable("posts")

I have the following schema for my table in Spark

scala> posts.printSchema
root
 |-- command: string (nullable = true)
 |-- externalId: string (nullable = true)
 |-- sourceMap: struct (nullable = true)
 |    |-- hashtags: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- url: string (nullable = true)
 |-- type: string (nullable = true)

I want to select all posts with the hashtag "nike":

sqlContext.sql("select sourceMap['hashtags'] as ht from posts where ht.contains('nike')");

I get the error "undefined function ht.contains".

I am not sure which function to use to search within the array.

Thanks!

1 Answer

I found the answer by referring to Hive SQL.

sqlContext.sql("select sourceMap['hashtags'] from posts where array_contains(sourceMap['hashtags'], 'nike')");

The key function is array_contains(), which returns true when the array holds the given value.
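
The same filter can probably also be written with the DataFrame API instead of a SQL string. The following is a minimal sketch, not the original poster's code. It assumes Spark 2.2 or later (a local SparkSession in place of sqlContext, and read.json on a Dataset[String]). The object name HashtagFilter, the helper withHashtag, and the two sample rows are all hypothetical and only mimic the schema shown in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_contains, col}

object HashtagFilter {
  // Hypothetical helper: keep only rows whose sourceMap.hashtags array contains `tag`.
  def withHashtag(posts: DataFrame, tag: String): DataFrame =
    posts.filter(array_contains(col("sourceMap.hashtags"), tag))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session stands in for the cluster setup in the question.
    val spark = SparkSession.builder()
      .appName("hashtag-filter")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows shaped like the schema printed in the question.
    val posts = spark.read.json(Seq(
      """{"command":"add","externalId":"1","sourceMap":{"hashtags":["nike","run"],"url":"http://example.com/1"},"type":"post"}""",
      """{"command":"add","externalId":"2","sourceMap":{"hashtags":["adidas"],"url":"http://example.com/2"},"type":"post"}"""
    ).toDS())

    // Only the row whose hashtags include "nike" should remain.
    withHashtag(posts, "nike")
      .select("externalId", "sourceMap.hashtags")
      .show()

    spark.stop()
  }
}
```

The original query most likely failed for two reasons: contains is a Scala method rather than a SQL function, and the alias ht defined in the SELECT list is not visible in the WHERE clause. Passing the array expression directly to array_contains avoids both problems.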
