
I have to call a function func_test(spark, a, b) which accepts two string values and creates a DataFrame from them. spark is a SparkSession variable. The two string values come from two columns of another DataFrame and are different for different rows of that DataFrame.

I am unable to achieve this.

Things tried so far:
1.

ctry_df = func_test(spark, df.select("CTRY").first()["CTRY"], df.select("CITY").first()["CITY"])

Gives CTRY and CITY of only the first record of the df.

2.

ctry_df = func_test(spark, df['CTRY'], df['CITY'])

Gives Column objects (Column<b'CTRY'> and Column<b'CITY'>) instead of the actual string values.

Example: df is:

+------+------+-------+
| CTRY | CITY | XYZ   |
+------+------+-------+
| US   | LA   | HELLO |
| UK   | LN   | WORLD |
| SN   | SN   | SPARK |
+------+------+-------+

So, I want the first call to be func_test(spark, 'US', 'LA'); the second call to be func_test(spark, 'UK', 'LN'); the third call to be func_test(spark, 'SN', 'SN'); and so on.

Python - 3.7
Spark - 2.2

Edit 1:

Issue in detail:

func_test(spark, string1, string2) is a function which accepts two string values. Inside the function, a series of DataFrame operations is performed. For example, the first Spark SQL statement in func_test is a plain select, and the two variables string1 and string2 are used in its where clause. The DataFrame produced by that statement becomes a temp table for the next Spark SQL statement, and so on. Finally, the function returns the DataFrame produced by the last step.

Now, in the main program, I have to call func_test, and the two parameters string1 and string2 must be fetched from the records of a DataFrame, so that the first func_test call generates the query select * from dummy where CTRY='US' and CITY='LA' and the subsequent operations happen, resulting in a df. The second call to func_test becomes select * from dummy where CTRY='UK' and CITY='LN', the third becomes select * from dummy where CTRY='SN' and CITY='SN', and so on.
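To make this concrete, here is a minimal sketch of what such a func_test could look like. The table name dummy and the first query are taken from the description above; the follow-up query is a placeholder assumption standing in for the real chain of operations:

def func_test(spark, string1, string2):
    # first Spark SQL: plain select with the two strings in the where clause
    step1_df = spark.sql(
        "select * from dummy where CTRY='{0}' and CITY='{1}'".format(string1, string2)
    )
    # the result feeds the next Spark SQL statement as a temp table
    step1_df.createOrReplaceTempView("step1")

    # placeholder for the subsequent operations; the real function
    # chains further queries the same way and returns the final df
    result_df = spark.sql("select CTRY, CITY, XYZ from step1")
    return result_df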

  • I didn't get your question exactly, but I think you want to define a new column as a function of two other column values. Am I right? Commented Nov 18, 2019 at 10:37
  • No, I don't want to create a new column. I want to create a new dataframe ctry_df which has no relation to df, but ctry_df must have columns which are the result of the operations performed by func_test(a,b). Commented Nov 18, 2019 at 10:40
  • Do you want to create a new dataframe with the same schema? Commented Nov 18, 2019 at 10:57
  • The schema of the new dataframe would be different. If it is unclear, I can try to explain the question in more detail. Commented Nov 18, 2019 at 10:58
  • Yes, I think it is complicated. Commented Nov 18, 2019 at 15:07

1 Answer


Instead of first(), use collect() and iterate over the returned rows:

collect_vals = df.select('CTRY', 'CITY').distinct().collect()
for row_col in collect_vals:
    # each call receives this row's CTRY and CITY as plain strings
    func_test(spark, row_col['CTRY'], row_col['CITY'])

Hope this helps!


7 Comments

Why do we need distinct here? The combination of CTRY and CITY could also be repeated in some other row of the df.
Removed distinct, and now it works close to what is expected. Thanks.
I assumed you need each combination only once, but if you don't want to de-duplicate, then distinct is not required.
One more input needed in continuation to this: on the first call with a CTRY and CITY, a df (one row) is returned. On the next call, for the next CTRY and CITY, the next df (one more row) is returned. Is there a way to stack all these dfs one over the other and create a final df (or a temp table) with as many rows at the end?
You can simply join with the DataFrame from which you are returning the filtered values, on the same keys, country and city (see the sketch below).
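Alternatively, if every call to func_test returns a DataFrame with the same schema, the per-call results can be stacked with union(). A minimal sketch assuming identical schemas (unionByName requires Spark 2.3+, so plain union is used here for Spark 2.2):

from functools import reduce

# collect the driving (CTRY, CITY) pairs and call func_test once per pair
rows = df.select('CTRY', 'CITY').distinct().collect()
result_dfs = [func_test(spark, r['CTRY'], r['CITY']) for r in rows]

# stack all per-call DataFrames row-wise; requires identical schemas
final_df = reduce(lambda a, b: a.union(b), result_dfs)
final_df.createOrReplaceTempView("final_result")  # optional: expose as a temp table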
