So I'm using Spark to do sentiment analysis, and I keep getting errors from the serializers it uses (I think) to pass Python objects around:
PySpark worker failed with exception:
Traceback (most recent call last):
  File "/Users/abdul/Desktop/RSI/spark-1.0.1-bin-hadoop1/python/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/Users/abdul/Desktop/RSI/spark-1.0.1-bin-hadoop1/python/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/Users/abdul/Desktop/RSI/spark-1.0.1-bin-hadoop1/python/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/Users/abdul/Desktop/RSI/spark-1.0.1-bin-hadoop1/python/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
TypeError: __init__() takes exactly 3 arguments (2 given)
The code for the serializers is available here, and my code is here.
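For what it's worth, the traceback shows the failure is raised while the serializer lazily iterates over my mapped output (`for item in iterator:`), so I suspect the `TypeError` actually comes from something my own code constructs during iteration, not from PySpark itself. A minimal sketch of that failure mode, using a hypothetical `Sentiment` class and `classify` function (these names are assumptions, not my actual code):

```python
# Hypothetical reconstruction of the failure mode; Sentiment and
# classify are made-up stand-ins for the real sentiment-analysis code.

class Sentiment(object):
    def __init__(self, text, score):   # expects 3 arguments: self, text, score
        self.text = text
        self.score = score

def classify(record):
    # Bug: only 2 arguments given (self + record) -- on Python 2 this is
    # reported as "__init__() takes exactly 3 arguments (2 given)".
    return Sentiment(record)

# PySpark's serializer consumes the mapped output lazily, so the error
# surfaces inside serializers.py rather than at the call site.  A plain
# generator reproduces the same deferred failure:
records = ("great movie", "terrible plot")
results = (classify(r) for r in records)

try:
    list(results)                      # forces iteration, raises TypeError
except TypeError as exc:
    print("TypeError:", exc)
```

Note that Python 2 (which Spark 1.0.1 runs on) phrases the message exactly as in the traceback; Python 3 reports a missing positional argument instead, but the cause is the same.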