
I have a MySQL table with the following schema:

id   - int
path - varchar
info - json, e.g. {"name":"pat", "address":"NY, USA", ...}

I used the JDBC driver to connect PySpark to MySQL. I can retrieve data from MySQL with

df = sqlContext.sql("select * from dbTable")

This query works fine. My question is: how can I query the "info" column? For example, the query below works in the MySQL shell and retrieves data, but it is not supported in PySpark (2+).

select id, info->"$.name" from dbTable where info->"$.name"='pat'

1 Answer

from pyspark.sql.functions import get_json_object

# extract a field from the JSON column
res = df.select(get_json_object(df['info'], "$.name").alias('name'))
# filter rows on a field inside the JSON column
res = df.filter(get_json_object(df['info'], "$.name") == 'pat')

Spark already provides a function named get_json_object for this.
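
The two calls can also be chained to mirror the original MySQL query; a minimal sketch, assuming df is the DataFrame already loaded from the table:

# select id and the extracted name, keeping only rows where name = 'pat'
res = df.filter(get_json_object(df['info'], '$.name') == 'pat') \
        .select(df['id'], get_json_object(df['info'], '$.name').alias('name'))
res.show()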


For your situation:

# read the MySQL table over JDBC into a DataFrame
df = spark.read.jdbc(url='jdbc:mysql://localhost:3306', table='test.test_json',
                     properties={'user': 'hive', 'password': '123456'})
df.createOrReplaceTempView('test_json')
# query the JSON column with get_json_object in Spark SQL
res = spark.sql("""
select col_json, get_json_object(col_json, '$.name') from test_json
""")
res.show()

Spark SQL is very similar to Hive SQL; for reference, see

https://cwiki.apache.org/confluence/display/Hive/Home


1 Comment

Thanks for your reply. This method only works once the data is loaded into a DataFrame. There are hundreds of thousands of records, so it might not be efficient to load the complete table and then filter it. Is there a way to retrieve only the matching rows (via a JSON search) with a query, rather than loading the complete table?
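
One possible approach (a sketch, not from the answer above): push the JSON filter down to MySQL by passing a subquery as the JDBC table, so only matching rows are transferred. Spark's JDBC source can push simple predicates down on its own, but a filter expressed with get_json_object runs in Spark, so embedding the question's own MySQL query in the read avoids pulling the whole table. The URL and credentials are reused from the answer and are assumptions:

# MySQL evaluates the JSON filter; only rows where name = 'pat' cross the wire.
# The "(subquery) alias" form is standard JDBC-source usage.
pushdown = """(select id, info->"$.name" as name
               from dbTable
               where info->"$.name" = 'pat') as filtered"""
df = spark.read.jdbc(url='jdbc:mysql://localhost:3306',
                     table=pushdown,
                     properties={'user': 'hive', 'password': '123456'})
df.show()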
