
I am working on Spark using the Python API. Below is my code. When I execute the line wordCount.first(), I receive ValueError: need more than 1 value to unpack. Any light on this error would be appreciated. Thanks...

#create an RDD with textFile method
text_data_file=sc.textFile('/resources/yelp_labelled.txt')

#import the required library for word count operation
from operator import add
#use filter to keep only lines with length greater than zero
wordCountFilter=text_data_file.filter(lambda x:len(x)>0)
#use flat map to split each line into words
wordFlatMap=wordCountFilter.flatMap(lambda x: x.split())
#map each word to a key-value pair with value 5 using the map function
wordMapper=wordFlatMap.flatMap(lambda x:(x,5))
#use reduceByKey to combine the above mapped pairs
#returns key-value pairs by adding the values for identical keys
wordCount=wordMapper.reduceByKey(add)
#view the first element
wordCount.first()

File "/home/notebook/spark-1.6.0-bin-`hadoop2.6/python/lib/pyspark.zip/pyspark/shuffle.py", line 236, in mergeValues for k, v in iterator: ValueError: need more than 1 value to unpack`

1 Answer


Your mistake is here:

wordMapper=wordFlatMap.flatMap(lambda x:(x,5))

it should be

wordMapper=wordFlatMap.map(lambda x:(x,5))

otherwise you just emit x and 5 as separate values. When reduceByKey later tries to unpack each element as a (key, value) pair, it will try to unpack the string x and fail unless its length is exactly 2, and it will try to unpack the integer 5 and fail as well.
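
To make the difference concrete, here is a minimal sketch, assuming the same SparkContext sc as in the question; the sample words are made up for illustration:

#assumes an existing SparkContext named sc; sample words are hypothetical
words = sc.parallelize(["spark", "python", "spark"])

#flatMap flattens each (x, 5) tuple into two separate elements,
#so the RDD holds bare strings and integers, not pairs
words.flatMap(lambda x: (x, 5)).collect()
#['spark', 5, 'python', 5, 'spark', 5]

#map keeps each tuple intact as a single (key, value) record,
#which is what reduceByKey expects to unpack
words.map(lambda x: (x, 5)).collect()
#[('spark', 5), ('python', 5), ('spark', 5)]

from operator import add
words.map(lambda x: (x, 5)).reduceByKey(add).collect()
#[('python', 5), ('spark', 10)] (order may vary)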
