
In pandas, I can successfully run the following:

def car(t):
    if t in df_a:
        return df_a[t] / df_b[t]
    else:
        return 0

But how can I do the exact same thing with a Spark DataFrame? Many thanks!

The data looks like this:

df_a
a 20
b 40
c 60

df_b
a 80
b 50
e 100

The result should be 0.25 when I call car("a").
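
For reference, a minimal runnable sketch of the pandas version above, assuming df_a and df_b are Series indexed by those keys (as value_counts() would produce):

import pandas as pd

df_a = pd.Series({"a": 20, "b": 40, "c": 60})
df_b = pd.Series({"a": 80, "b": 50, "e": 100})

def car(t):
    # Return the ratio if t is a key of df_a, else 0. Note this assumes
    # t is also present in df_b; car("c") would raise a KeyError.
    if t in df_a:
        return df_a[t] / df_b[t]
    return 0

print(car("a"))  # 0.25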

  • What are you trying to compute? Commented Oct 18, 2016 at 13:35
  • I am using Hadoop; I just want to convert the code from pandas to Spark. Commented Oct 18, 2016 at 13:43
  • Yes, but what does that function do? You should show the input and the output. Commented Oct 18, 2016 at 14:18
  • df_a contains the ids; I run df_a.value_counts() before I run the code above. Commented Oct 18, 2016 at 16:01
  • Are you using Scala or Pyspark? Commented Oct 18, 2016 at 16:11

1 Answer


First join both DataFrames on the key, then filter by the key you want and select the computed ratio.

df_a = sc.parallelize([("a", 20), ("b", 40), ("c", 60)]).toDF(["key", "value"])
df_b = sc.parallelize([("a", 80), ("b", 50), ("e", 100)]).toDF(["key", "value"])

def car(c):
    # Inner-join the two DataFrames on "key", keep only the requested key,
    # and divide df_a's value by df_b's value.
    return (df_a.join(df_b, on=["key"])
                .where(df_a["key"] == c)
                .select((df_a["value"] / df_b["value"]).alias("ratio"))
                .head())

car("a")

# Row(ratio=0.25)
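
Note that head() returns None when the key is missing from either DataFrame, unlike the pandas version, which returns 0. A hedged variant restoring that fallback (the helper name car_or_zero is hypothetical):

def car_or_zero(c):
    row = (df_a.join(df_b, on=["key"])
               .where(df_a["key"] == c)
               .select((df_a["value"] / df_b["value"]).alias("ratio"))
               .head())
    # head() yields None for an empty result, e.g. for key "c",
    # which exists in df_a but not in df_b.
    return row["ratio"] if row is not None else 0

car_or_zero("c")  # 0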

2 Comments

One more question: can the input be a DataFrame? I would like to pass in a DataFrame df_c which contains the keys, and then car() would loop through each key in df_c and output the ratio for each key.
You would have to show me an example first. However, avoid thinking in such an imperative way; Spark is lazy and most of the computation is done in parallel.
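
To make that last comment concrete, here is a sketch of the non-imperative approach, assuming a hypothetical df_c that holds the keys of interest: a single pair of joins computes the ratio for every key at once, with no Python-side loop.

from pyspark.sql import functions as F

# Hypothetical DataFrame of keys to look up.
df_c = sc.parallelize([("a",), ("b",)]).toDF(["key"])

# Rename the value columns to avoid ambiguity after the joins,
# then compute all ratios in one distributed operation.
ratios = (df_c
          .join(df_a.withColumnRenamed("value", "value_a"), on=["key"])
          .join(df_b.withColumnRenamed("value", "value_b"), on=["key"])
          .select("key", (F.col("value_a") / F.col("value_b")).alias("ratio")))

ratios.show()
# key "a" -> 0.25, key "b" -> 0.8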
