
I am currently trying to filter my dataframe inside an if statement and get the value of a field back into a variable. Here is my code:

from pyspark.sql.functions import col

if df_table.filter(col(field).contains("val")):
    id_2 = df_table.select(another_field)
    print(id_2)
    # Recursive call with new variable
The problem is: the if filtering seems to work, but id_2 gives me the column name and type, whereas I want the value itself from that field. The output for this code is:

DataFrame[ID_1: bigint]
DataFrame[ID_2: bigint]
...

If I try collect like this: id_2 = df_table.select(another_field).collect(), I get this: [Row(ID_1=3013848), Row(ID_1=319481), Row(ID_1=391948)...], which just looks like a list of every id.

I also thought of doing id_2 = df_table.select(another_field).filter(col(field).contains("val")), but I still get the same result as in my first attempt.

I would like id_2, for each iteration of my loop, to take the value from the field I am filtering on, like:

3013848
319481
...

and not a list of every value from the matching fields of my dataframe.

Any idea how I could get that value into my variable?

Thank you for helping.

  • Try with .collect(). If you would like more detailed help, please provide a small reproducible example with your desired output. Commented Nov 30, 2022 at 16:24
  • @RicS Query edited Commented Dec 1, 2022 at 10:21
  • I don't understand what you are trying to do with if df_table.filter(col(field).contains("val")), but to get a list of only the ids (and not Row objects), try a list comprehension: result = [i[0] for i in id_2] Commented Dec 1, 2022 at 10:55
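
For reference, a minimal sketch of that last suggestion, using the column name ID_1 from the question's printed output: collect() brings the rows to the driver, and the comprehension unwraps each Row into its plain value:

rows = df_table.select("ID_1").collect()   # [Row(ID_1=3013848), Row(ID_1=319481), ...]
ids = [row[0] for row in rows]             # [3013848, 319481, ...]
first_id = ids[0] if ids else None         # a single plain value, e.g. 3013848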

1 Answer


In fact, dataFrame.select(colName) returns a column (a dataframe with only that one column), not the value of that column for a particular row. I see in your comment that you want to do a recursive lookup in a Spark dataframe. The thing is, firstly, as far as I know Spark doesn't support recursive operations. If you have a deep recursive operation to do, you'd better collect the dataframe and do the work on your driver without Spark. There you can use whatever library you want, but you lose the advantage of processing the data in a distributed way. Secondly, Spark isn't designed for operations that iterate over each record. You could try to achieve this with joins of dataframes, but that brings me back to my first point: if a later join depends, recursively, on the result of a previous one, just forget Spark.
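
For illustration, here is a minimal sketch of the "collect once, then recurse on the driver" idea. The names df_table and field come from the question, "ID_1" is taken from the question's output, and the lookup logic itself is only hypothetical:

# Collect the data to the driver once, then do the recursive lookup
# in plain Python without touching Spark again.
rows = df_table.select(field, "ID_1").collect()   # list of Row objects on the driver
pairs = [(r[0], r[1]) for r in rows]              # plain Python values

def recursive_lookup(value, seen=None):
    # Illustrative recursion: each matching ID_1 becomes the next value to search for.
    seen = set() if seen is None else seen
    for text, id_1 in pairs:
        if value in str(text) and id_1 not in seen:
            seen.add(id_1)                        # id_1 is a plain int, e.g. 3013848
            recursive_lookup(str(id_1), seen)
    return seen

matching_ids = recursive_lookup("val")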

