
I'm trying to parse the following sample list of JSON strings and count how often each fruit appears:

FruitJson = [
 ('{"num":100, "fruit" : ["apple", "peach", "grape", "melon"]}',), 
 ('{"num":101, "fruit" : ["melon", "apple", "mango", "banana"]}',),  
]

Ideal Output:

fruit count
apple 2
melon 2
peach 1
grape 1
mango 1
banana 1

I managed to get the first row of the list into a DataFrame, but I'm unable to progress any further from here:

dbutils.fs.put("/temp/test.json",'{"num":100, "fruit" : ["apple", "peach", "grape", "melon"]}'\
'{"num":101, "fruit" : ["melon", "apple", "mango", "banana"]}',True)
df = spark.read.option("multiline","true").json('/temp/test.json')
display(df)

Your advice is much appreciated.

  • What have you tried so far? Commented Aug 3, 2021 at 1:04
  • Updated the thread with what I've tried. Basically, I managed to write only the first row to a JSON file and then used spark.read.option("multiline","true").json('/temp/test.json') to load the data into a DataFrame. I've been stuck here for a while. Any help is appreciated. Commented Aug 3, 2021 at 1:55

1 Answer

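First, your multiline option should be False (the default), not True. multiline=False tells Spark that the file holds one complete JSON record per line, so a file with several records is read as several rows. See the docs for this option.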
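To make the "one record per line" layout concrete, here is a minimal plain-Python sketch (no Spark involved) that writes the two sample records as JSON Lines and parses them back line by line. The `records` variable and the temporary file path are made up for illustration only.

```python
import json
import os
import tempfile

records = [
    {"num": 100, "fruit": ["apple", "peach", "grape", "melon"]},
    {"num": 101, "fruit": ["melon", "apple", "mango", "banana"]},
]

# JSON Lines: one complete JSON object per line, separated by "\n".
json_lines = "\n".join(json.dumps(r) for r in records)

# Hypothetical temp path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "test.json")
with open(path, "w") as f:
    f.write(json_lines)

# Each line parses on its own; this is the layout multiline=False expects.
with open(path) as f:
    parsed = [json.loads(line) for line in f if line.strip()]

print(len(parsed))
```

Note that in the question's dbutils.fs.put call the two string literals are joined with no newline between them, so the file does not have this layout yet; putting a "\n" between the two objects should give one record per line.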

Second, what you're trying to achieve is a simple aggregation, but you will need to explode the list to multiple rows first.

from pyspark.sql import functions as F

(df
    .withColumn('fruit', F.explode('fruit'))
    .groupBy('fruit')
    .agg(
        F.count('*').alias('cnt')
    )
    .show()
)

# +------+---+
# | fruit|cnt|
# +------+---+
# | grape|  1|
# | apple|  2|
# | mango|  1|
# |banana|  1|
# | melon|  2|
# | peach|  1|
# +------+---+
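
For intuition about what explode followed by groupBy/count computes, here is a rough plain-Python equivalent that works directly on the FruitJson list from the question. It is only a sketch of the logic, not a replacement for the Spark code: flattening the lists plays the role of explode, and Counter plays the role of the grouped count. The `counts` and `raw` names are introduced here for illustration.

```python
import json
from collections import Counter

FruitJson = [
    ('{"num":100, "fruit" : ["apple", "peach", "grape", "melon"]}',),
    ('{"num":101, "fruit" : ["melon", "apple", "mango", "banana"]}',),
]

# Flatten every "fruit" list into one stream of names (the "explode" step),
# then count occurrences per name (the "groupBy + count" step).
counts = Counter(
    fruit
    for (raw,) in FruitJson  # each row is a one-element tuple holding a JSON string
    for fruit in json.loads(raw)["fruit"]
)

for fruit, cnt in counts.most_common():
    print(fruit, cnt)
```

This is fine for a small in-memory list; for data that already lives in a DataFrame, the Spark version above is the one to use.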
1 Comment

Thank you! Really appreciate all your help here. Explode was the missing link on my end.
