
I am working on Spark using the Python API. Below is my code. When I execute the line wordCount.first(), I receive ValueError: need more than 1 value to unpack. Any light on this error would be appreciated. Thanks...

#create an RDD with textFile method
text_data_file=sc.textFile('/resources/yelp_labelled.txt')

#import the required library for word count operation
from operator import add
#use filter to keep only lines with length greater than zero
wordCountFilter=text_data_file.filter(lambda x:len(x)>0)
#use flat map to split each line into words
wordFlatMap=wordCountFilter.flatMap(lambda x: x.split())
#map each word to a key-value pair with value 5 using the map function
wordMapper=wordFlatMap.flatMap(lambda x:(x,5))
#use reduceByKey to combine the above mapped pairs
#returns key-value pairs by adding the values for identical keys
wordCount=wordMapper.reduceByKey(add)
#view the first element
wordCount.first()

File "/home/notebook/spark-1.6.0-bin-`hadoop2.6/python/lib/pyspark.zip/pyspark/shuffle.py", line 236, in mergeValues for k, v in iterator: ValueError: need more than 1 value to unpack`

1 Answer


Your mistake is here:

wordMapper=wordFlatMap.flatMap(lambda x:(x,5))

it should be

wordMapper=wordFlatMap.map(lambda x:(x,5))

otherwise you just emit x and 5 as separate values. When reduceByKey later tries to unpack each element as a (key, value) pair, it will try to unpack the string x and fail unless its length is exactly 2, and it will try to unpack the integer 5 and fail as well.
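
To make the difference concrete, here is a minimal sketch, assuming the same SparkContext sc as in the question; the sample words are made up for illustration:

#assumes an existing SparkContext named sc; sample words are hypothetical
words = sc.parallelize(["spark", "python", "spark"])

#flatMap flattens each (x, 5) tuple into two separate elements,
#so the RDD holds bare strings and integers, not pairs
words.flatMap(lambda x: (x, 5)).collect()
#['spark', 5, 'python', 5, 'spark', 5]

#map keeps each tuple intact as a single (key, value) record,
#which is what reduceByKey expects to unpack
words.map(lambda x: (x, 5)).collect()
#[('spark', 5), ('python', 5), ('spark', 5)]

from operator import add
words.map(lambda x: (x, 5)).reduceByKey(add).collect()
#[('python', 5), ('spark', 10)] (order may vary)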
