0

I used the code below to calculate the average of an attribute:

from pyspark.sql import functions as F
from pyspark.sql.functions import mean

result = df.select([mean("Age")])
result.show()

The output is 56.4567. I need to convert it to an integer.

1
  • Sorry, I just can't seem to get pyspark installed. Commented May 28, 2020 at 14:47

3 Answers

1

If you want the result as a Python int rather than a DataFrame, run:

result = round(df.select(mean("Age")).collect()[0][0])

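In Python 3, the built-in round() called with a single argument returns an int, so result will be of int type. This relies on round being Python's built-in; if you have imported round from pyspark.sql.functions, that name is shadowed and the call will not behave this way.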


0
result_as_integer = int(result)

or

result_as_float = float(result)

3 Comments

I am getting an error "TypeError: int() argument must be a string, a bytes-like object or a number, not 'DataFrame'"
You can't convert a DataFrame to an int directly.
While this code may resolve the OP's issue, it is best to include an explanation as to how your code addresses the OP's issue. In this way, future visitors can learn from your post, and apply it to their own code. SO is not a coding service, but a resource for knowledge. Also, high quality, complete answers are more likely to be upvoted. These features, along with the requirement that all posts are self-contained, are some of the strengths of SO as a platform, that differentiates it from forums. You can edit to add additional info &/or to supplement your explanations with source documentation.
-1

First, you need to extract the value from the PySpark DataFrame as a plain Python number:

result = result.take(1)[0].asDict()['avg(Age)']

or

result = result.collect()[0]['avg(Age)']

or

result = result.collect()[0][0]
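All three alternatives pull the single cell out of the collected rows. A minimal sketch of that step with a guard for a missing value is shown here; the helper name scalar_to_int is hypothetical, and plain tuples stand in for the Row objects that collect() returns (Rows can be indexed by position in the same way), so the snippet runs without Spark. The None check is there because avg() yields null when there are no non-null values to average.

```python
from typing import Any, Optional, Sequence


def scalar_to_int(rows: Sequence[Sequence[Any]]) -> Optional[int]:
    """Return the first cell of the first row as an int, or None.

    `rows` is assumed to be what `result.collect()` returns: a list of
    Row objects, which support positional indexing like tuples.
    (Hypothetical helper, not part of PySpark.)
    """
    if not rows:
        return None          # defensive: nothing was collected
    value = rows[0][0]
    if value is None:
        return None          # aggregate over no non-null values
    return int(value)        # int() truncates toward zero


# Plain tuples stand in for Spark Row objects here.
print(scalar_to_int([(56.4567,)]))  # 56
print(scalar_to_int([(None,)]))     # None
```

With a real DataFrame, the call would presumably look like scalar_to_int(result.collect()).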

If you need the floor of the number:

import math
math.floor(float(result))  # 56

If you need the ceiling of the number:

math.ceil(float(result))  # 57
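The conversions mentioned in this thread (int, round, floor, ceil) agree for some inputs and differ for others, so it may help to compare them side by side. The sketch below uses only the Python standard library; the helper name int_variants is hypothetical, and 56.4567 is the value from the question.

```python
import math


def int_variants(x: float) -> dict:
    """Return the integer produced by each conversion for the same input."""
    return {
        "int": int(x),            # truncates toward zero
        "round": round(x),        # nearest integer, ties go to the even one
        "floor": math.floor(x),   # largest integer <= x
        "ceil": math.ceil(x),     # smallest integer >= x
    }


print(int_variants(56.4567))
# {'int': 56, 'round': 56, 'floor': 56, 'ceil': 57}

print(int_variants(-56.4567))
# {'int': -56, 'round': -56, 'floor': -57, 'ceil': -56}

# Python 3 rounds exact halves to the nearest even integer.
print(round(56.5), round(57.5))  # 56 58
```

For a positive average such as an age, int() and math.floor() give the same answer; they only diverge for negative values.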
