
I'm working on some Spark code and I keep getting this error:

TypeError: 'float' object is not iterable

on the line with the reduceByKey() call. Can someone help me? This is the stack trace of the error:

d[k] = comb(d[k], v) if k in d else creator(v)
  File "/home/hw/SC/SC_spark.py", line 535, in <lambda>
TypeError: 'float' object is not iterable

Here is the code:

def field_valid(m):
    dis=m[1]
    TxP=m[2]
    ef=m[3]
    pl=m[4]
    if TxP != 'NaN' and dis != 'NaN' and ef != 'NaN' and pl != 'NaN':
        return True
    else:
        return False

def parse_input(d):
    #d=data.split(',')

    s_name='S'+d[6] # serving cell name

    if d[2] =='NaN' or d[2] == '':
        ef='NaN'
    else:
        ef=float(d[2].strip().rstrip())

    if d[7] =='NaN' or d[7] == '' or d[7] == '0':
        TxP='NaN'
    else:
        TxP=float(d[7].strip().rstrip())

    if d[9] =='NaN' or d[9] == '':
        dis='NaN'
    else:
        dis=float(d[9].strip().rstrip())

    if d[10] =='NaN' or d[10] == '':
        pl='NaN'
    else:
        pl=float(d[10].strip().rstrip())

    return s_name, dis, TxP, ef, pl


from pyspark import SparkContext

sc=SparkContext(appName="SC_spark")
lines=sc.textFile(ip_file)
lines=lines.map(lambda m: (m.split(",")))
lines=lines.filter(lambda m: (m[6] != 'cell_name'))
my_rdd=lines.map(parse_input).filter(lambda m: (field_valid(m)==True))
my_rdd=my_rdd.map(lambda x: (x[0],(x[1],x[2])))
my_rdd=my_rdd.reduceByKey(lambda x,y:(max(x[0],y[0]),sum(x[1],y[1])))  # this is the line that raises the error

Here is some sample data:


Class,PB,EF,RP,RQ,ID,cell_name,TxP,BW,DIS,PL,geom
NaN,10,5110,-78.0,-7.0,134381669,S417|134381669|5110,62.78151250383644,10,2578.5795095469166,113.0,NaN
NaN,10,5110,-71.0,-6.599999904632568,134381669,S417|134381669|5110,62.78151250383644,10,2689.630258510342,106.0,NaN
NaN,10,5110,-77.0,-7.300000190734863,134381669,S417|134381669|5110,62.78151250383644,10,2907.8184899249713,112.0,19.299999999999983
NaN,10,5110,-91.0,-11.0,134381669,S417|134381669|5110,62.78151250383644,10,2779.96762695867,126.0,5.799999999999997
NaN,10,5110,-90.0,-12.69999980926514,134381669,S417|134381669|5110,62.78151250383644,10,2749.8351648579583,125.0,9.599999999999994
NaN,10,5110,-95.0,-13.80000019073486,134381669,S417|134381669|5110,62.78151250383644,10,2942.7938902934643,130.0,-2.4000000000000057
NaN,10,5110,-70.0,-7.099999904632568,134381669,S417|134381669|5110,62.78151250383644,10,3151.930706017461,105.0,22.69999999999999
  • I am not familiar with pyspark, but in the line where the error occurs you call sum with two arguments. Unless the first one is an iterable and the second an int, your error is probably there. Try calling sum(1.0, 2) in a Python console; it gives me a very similar error. Commented Apr 22, 2018 at 6:07
  • Hi @bla, I just tested it out and made sure all fields are converted to float. As you can see, I filter out the lines with NaN in those fields, so the values are floats only. I also checked the syntax of the lambda function; I separate it into (k, v). I didn't find anything wrong. Did you? Commented Apr 22, 2018 at 6:14
  • What exactly is m.split(",") doing? You have no commas in the data Commented Apr 22, 2018 at 6:15
  • @HelenZ you cannot pass a float as the first argument of sum. It expects an iterable. Check it out: docs.python.org/3.5/library/functions.html#sum. I cannot confirm that this is the case, since I am not sure x[1] is a float. But the stack traces are very similar. Commented Apr 22, 2018 at 6:19
  • Hi @cricket_007. BTW I just changed it to x[1]+y[1], and it works!! I'm new to Spark and can't distinguish Spark 1 and Spark 2 yet. Can you tell me how to do this in Spark 2? The expected result is the sum and max of the value 'dis' by the same key, and the key is the column 'cell_name'. Commented Apr 22, 2018 at 6:32
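
For the Spark 2 part of that last comment, a DataFrame-based sketch of the same aggregation might look like the following (a sketch only: "input.csv" stands in for ip_file, the column names follow the sample header, and I read the stated goal as max and sum of DIS per cell_name):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("SC_spark_df").getOrCreate()

# read the CSV using its first row as column names; "input.csv" is a placeholder for ip_file
df = spark.read.csv("input.csv", header=True, inferSchema=True)

# drop rows with null/NaN in the fields of interest, then aggregate per cell_name
result = (df.dropna(subset=["DIS", "TxP", "EF", "PL"])
            .groupBy("cell_name")
            .agg(F.max("DIS").alias("max_dis"),
                 F.sum("DIS").alias("sum_dis")))
result.show(truncate=False)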

1 Answer


the expected result is sum and max of value

In that case, you are looking for x[1] + y[1], not the built-in sum() function: sum() expects an iterable as its first argument, so passing it a float raises the TypeError you are seeing.

my_rdd.reduceByKey( lambda x,y: ( max(x[0],y[0]), x[1] + y[1] ) )
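
If it helps to see it in isolation, here is a minimal sketch with toy (cell_name, (dis, TxP)) pairs standing in for the parsed RDD; the combiner receives two plain value tuples per key, so ordinary arithmetic is all that is needed:

# reuses the SparkContext `sc` from the question; the pairs below are made-up toy data
pairs = sc.parallelize([
    ("S417", (2578.5, 62.78)),
    ("S417", (2689.6, 62.78)),
    ("S999", (1000.0, 40.0)),
])

# max of the first element, arithmetic sum of the second, per key
reduced = pairs.reduceByKey(lambda x, y: (max(x[0], y[0]), x[1] + y[1]))
print(reduced.collect())
# e.g. [('S417', (2689.6, 125.56)), ('S999', (1000.0, 40.0))] (ordering may vary)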

2 Comments

Hi @cricket_007, can I ask another question? Now I want to save the result into a .txt file, but I want to add a header to it. How should I do that? I used this statement: my_rdd.repartition(1).saveAsTextFile("sc_result/result.txt")
You need to union your RDD with a header RDD. stackoverflow.com/questions/26157456/…
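
Following that suggestion, a sketch of the header-union approach might look like this (the header text and output path are placeholders, and note that saveAsTextFile writes a directory of part files rather than a single .txt file):

# one-partition RDD holding just the header line
header = sc.parallelize(["cell_name,max_dis,sum_dis"], 1)

# format each (key, (max, sum)) record as a CSV line
data_lines = my_rdd.map(lambda kv: "{},{},{}".format(kv[0], kv[1][0], kv[1][1]))

# union puts the header partition first, so it stays on top after coalesce(1)
header.union(data_lines).coalesce(1).saveAsTextFile("sc_result/result_with_header")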
