6

I have a sample dataframe

df_that_I_have
+---------+---------+-------+
| country | members | some  |
+---------+---------+-------+
| India   | 50      | 1     |
+---------+---------+-------+
| Japan   | 20      | 3     |
+---------+---------+-------+
| India   | 20      | 1     |
+---------+---------+-------+
| Japan   | 10      | 3     |
+---------+---------+-------+

and I want a dataframe that looks like this

df_that_I_want
+---------+---------+-------+
| country | members | some  |
+---------+---------+-------+
| India   | 70      | 10    | // 5 * Sum of "some" for India, i.e. (1 + 1)
+---------+---------+-------+
| Japan   | 30      | 30    | // 5 * Sum of "some" for Japan, i.e. (3 + 3)
+---------+---------+-------+

The second dataframe has the sum of members and the sum of some multiplied 5.

This is what I'm doing to achieve this

val df_that_I_want = df_that_I_have
                        .select(df_that_I_have("country"),
                                df_that_I_have.groupBy("country").sum("members"),
                                5 * df_that_I_have.groupBy("country").sum("some")) //Problem here

But the compiler does not allow me to do this because apparently I can't multiply 5 with a column.

How can I multiply an Integer value with the sum of some for each country?

3 Answers 3

6

You can try lit function.

scala> val df_that_I_have = Seq(("India",50,1),("India",20,1),("Japan",20,3),("Japan",10,3)).toDF("Country","Members","Some")
df_that_I_have: org.apache.spark.sql.DataFrame = [Country: string, Members: int, Some: int]

scala> val df1 = df_that_I_have.groupBy("country").agg(sum("members"), sum("some") * lit(5))
df1: org.apache.spark.sql.DataFrame = [country: string, sum(members): bigint, ((sum(some),mode=Complete,isDistinct=false) * 5): bigint]

scala> val df_that_I_want= df1.select($"Country",$"sum(Members)".alias("Members"), $"((sum(Some),mode=Complete,isDistinct=false) * 5)".alias("Some"))
df_that_I_want: org.apache.spark.sql.DataFrame = [Country: string, Members: bigint, Some: bigint]

scala> df_that_I_want.show

+-------+-------+----+
|Country|Members|Some|
+-------+-------+----+
|  India|     70|  10|
|  Japan|     30|  30|
+-------+-------+----+
Sign up to request clarification or add additional context in comments.

Comments

2

Please try this

df_that_I_have.select("country").groupBy("country").agg(sum("members"), sum("some") * lit(5))

Comments

0

df_that_I_have.select("country").groupBy("country").agg(sum("members"), sum("some") * lit(5))

lit function is used for creating the column of literal value which is 5 here.

when you are not able to multiply 5 directly, it is creating a column containing 5 and multiplying with it.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.