Is there a function for adding the values of multiple integer columns together to create a new column?

For example: combining several count columns into a single total count column.

As far as I know, concat only works for string columns.

1 Answer

There are two easy ways of doing that. The first is to use + and type the column names out. The second is to combine operator.add with functools.reduce to sum many columns at once.

Below is an example showing both ways of summing all columns that have an x in their name (so column y1 is not included in our total).

Hope this helps!

import functools
import operator

import pandas as pd
import pyspark.sql.functions as F

# SAMPLE DATA -----------------------------------------------------------------------
# Assumes an active SparkSession is available as `spark`.
pdf = pd.DataFrame({'x1': [0, 0, 0, 1, 1],
                    'x2': [6, 5, 4, 3, 2],
                    'x3': [2, 2, 2, 2, 2],
                    'y1': [1, 1, 1, 1, 1]})
df = spark.createDataFrame(pdf)

# Sum by typing the column names explicitly
df = df.withColumn('total_1', F.col('x1') + F.col('x2') + F.col('x3'))

# Sum many columns without typing them out using reduce.
# Select every column with an 'x' in its name ('total_1' and 'y1' do not match).
cols_to_sum = [name for name in df.columns if 'x' in name]
# reduce chains the columns together with +, i.e. (x1 + x2) + x3
df = df.withColumn('total_2', functools.reduce(operator.add, [F.col(name) for name in cols_to_sum]))

df.show()

Output:

+---+---+---+---+-------+-------+
| x1| x2| x3| y1|total_1|total_2|
+---+---+---+---+-------+-------+
|  0|  6|  2|  1|      8|      8|
|  0|  5|  2|  1|      7|      7|
|  0|  4|  2|  1|      6|      6|
|  1|  3|  2|  1|      6|      6|
|  1|  2|  2|  1|      5|      5|
+---+---+---+---+-------+-------+
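
To see why the reduce approach gives the same result as typing the columns out, here is a minimal sketch in plain Python that runs without Spark. The Col class is a hypothetical stand-in for a Spark column, not part of PySpark: its + builds a textual expression instead of computing a value, so the printed string shows how reduce folds the list from left to right.

```python
import functools
import operator


class Col:
    """Hypothetical stand-in for a Spark column (not a PySpark class).

    Adding two Col objects records the expression as text instead of
    computing a number, so we can inspect what reduce builds.
    """

    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        return Col(f"({self.expr} + {other.expr})")


cols_to_sum = ["x1", "x2", "x3"]

# reduce applies operator.add pairwise from the left:
# first x1 + x2, then that result + x3.
total = functools.reduce(operator.add, [Col(name) for name in cols_to_sum])

print(total.expr)  # ((x1 + x2) + x3)
```

Note that functools.reduce raises a TypeError when given an empty list and no initial value, so it is worth checking that cols_to_sum is not empty before building the total.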