
I am currently trying to filter my dataframe inside an if statement and get the value of a field back into a variable. Here is my code:

from pyspark.sql.functions import col

if df_table.filter(col(field).contains("val")):
    id_2 = df_table.select(another_field)
    print(id_2)
    # Recursive call with new variable
The problem is: the if filtering seems to work, but id_2 gives me the column name and type, whereas I want the value itself from that field. The output for this code is:

DataFrame[ID_1: bigint]
DataFrame[ID_2: bigint]
...

If I try collect like this: id_2 = df_table.select(another_field).collect(), I get this: [Row(ID_1=3013848), Row(ID_1=319481), Row(ID_1=391948)...], which just looks like a list of every id.

I also thought of doing id_2 = df_table.select(another_field).filter(col(field).contains("val")), but I still get the same result as in my first attempt.

I would like id_2, for each iteration of my loop, to take the value from the field I am filtering on, like:

3013848
319481
...

and not a list of every value from the matching fields of my dataframe.

Any idea how I could get that value into my variable?

Thank you for helping.

  • Try with .collect(). If you would like more detailed help, please provide a small reproducible example with your desired output. Commented Nov 30, 2022 at 16:24
  • @RicS Query edited Commented Dec 1, 2022 at 10:21
  • I don't understand what you are trying to do with if df_table.filter(col(field).contains("val")), but to get a list of only the ids (and not Row objects), try a list comprehension: result = [i[0] for i in id_2] Commented Dec 1, 2022 at 10:55
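
For reference, a minimal sketch of that last suggestion, using the column name ID_1 from the question's printed output: collect() brings the rows to the driver, and the comprehension unwraps each Row into its plain value:

rows = df_table.select("ID_1").collect()   # [Row(ID_1=3013848), Row(ID_1=319481), ...]
ids = [row[0] for row in rows]             # [3013848, 319481, ...]
first_id = ids[0] if ids else None         # a single plain value, e.g. 3013848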

1 Answer


In fact, dataFrame.select(colName) returns a column (a dataframe with only that one column), not the value of that column for a particular row. I see in your comment that you want to do a recursive lookup in a Spark dataframe. The thing is, firstly, as far as I know Spark doesn't support recursive operations. If you have a deep recursive operation to do, you'd better collect the dataframe and do the work on your driver without Spark. There you can use whatever library you want, but you lose the advantage of processing the data in a distributed way. Secondly, Spark isn't designed for operations that iterate over each record. You could try to achieve this with joins of dataframes, but that brings me back to my first point: if a later join depends, recursively, on the result of a previous one, just forget Spark.
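
For illustration, here is a minimal sketch of the "collect once, then recurse on the driver" idea. The names df_table and field come from the question, "ID_1" is taken from the question's output, and the lookup logic itself is only hypothetical:

# Collect the data to the driver once, then do the recursive lookup
# in plain Python without touching Spark again.
rows = df_table.select(field, "ID_1").collect()   # list of Row objects on the driver
pairs = [(r[0], r[1]) for r in rows]              # plain Python values

def recursive_lookup(value, seen=None):
    # Illustrative recursion: each matching ID_1 becomes the next value to search for.
    seen = set() if seen is None else seen
    for text, id_1 in pairs:
        if value in str(text) and id_1 not in seen:
            seen.add(id_1)                        # id_1 is a plain int, e.g. 3013848
            recursive_lookup(str(id_1), seen)
    return seen

matching_ids = recursive_lookup("val")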

