How can I add a new column (sum) to dataframe B that holds, for each (src, dst) pair, the sum of the matching values from dataframe A? Preferably with a UDF.
The output should look like this:
dataframe A:
|id|value|
|--|-----|
|1 | 10|
|2 | 0.3|
|3 | 100|
dataframe B (with added column sum):
|src|dst|sum |
|---|---|-----|
|1 |2 |10.3 |
|2 |3 |100.3|
|3 |1 |110 |
I've tried this:
dfB = dfB.withColumn('sum', sum(dfB.source,dfB.dst,dfA))
def sum(src,dst,dfA):
    return dfA.filter(dfA.id == src).collect()[0][1][0] + dfA.filter(dfA.id == dst).collect()[0][1][0]