I have a dataframe like the one below. I want to keep the rows where the Names column contains "mert" and the Status entry at the same index as that name is Active. I built the dataframe from JSON, but I can't work out how to filter it. How can I do it?

+---+-----------------------------------------+------------------+
|ID |Names                                    | Status           |
+---+-----------------------------------------+------------------+
|1  |[[[aaaa, mert], [cccc, Doe]]]            | [Active, Active] |
|2  |[[[aaa, Michael], [ggg, ]]]              | [Active, Active] |
|3  |[[[cccc, mert], [gg, Merk  ]]]           | [Suspend, Active]|
|3  |[[[dddd, Angela], [fggg, Merl]]]         | [Active, Suspend]|
+---+-----------------------------------------+------------------+
  • Can you add what output you expect? Commented Dec 4, 2022 at 23:37
  • What is your schema? Post output of df.printSchema(). Commented Dec 5, 2022 at 10:38

1 Answer

It is not clear whether your columns hold arrays or strings. From the context of the problem, they look like arrays.

If they are arrays, then:

  • remove the outer layers of the arrays with explode() (twice)
  • zip "Names" and "Status" using arrays_zip() (so both can be referred to by the same index)
  • keep a record only if the arrays contain the required values
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(data=[[1,[[["aaaa","mert"],["cccc","Doe"]]],["Active","Active"]],[2,[[["aaa","Michael"],["ggg",""]]],["Active","Active"]],[3,[[["cccc","mert"],["gg","Merk  "]]],["Suspend","Active"]],[4,[[["dddd","Angela"],["fggg","Merl"]]],["Active","Suspend"]]], schema=["ID","Names","Status"])

# Explode twice to unwrap the outer array layers, zip the result with Status,
# then keep rows whose name array contains "mert" and whose Status array contains "Active".
df = df.withColumn("Names2", F.explode("Names")) \
       .withColumn("Names2", F.explode("Names2")) \
       .withColumn("Names_Status", F.arrays_zip("Names2", "Status")) \
       .filter((F.array_contains(F.col("Names_Status").getField("Names2"), "mert")) \
               & (F.array_contains(F.col("Names_Status").getField("Status"), "Active"))) \
       .drop("Names2", "Names_Status")
df.show(truncate=False)

[Out]:
+---+------------------------------+-----------------+
|ID |Names                         |Status           |
+---+------------------------------+-----------------+
|1  |[[[aaaa, mert], [cccc, Doe]]] |[Active, Active] |
|3  |[[[cccc, mert], [gg, Merk  ]]]|[Suspend, Active]|
+---+------------------------------+-----------------+
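
Note that the two array_contains() checks in the filter are independent of each other, which is why ID 3 is kept even though the status at the position of its "mert" entry is Suspend. If the intent is that the Status at the same index as the matching name must be Active, a stricter check might look like the sketch below. It is only a sketch under assumptions: Names is array<array<array<string>>>, Status has one entry per inner name array, and Spark 2.4+ higher-order SQL functions are available. ALIGNED_PREDICATE and row_matches are hypothetical names introduced here; the plain-Python function only mirrors the logic so it can be checked without a cluster.

```python
# Spark SQL predicate for an index-aligned match (assumption: Spark 2.4+,
# Names is array<array<array<string>>>, Status has one entry per person).
ALIGNED_PREDICATE = (
    "exists("
    "transform(flatten(Names), (person, i) -> "
    "array_contains(person, 'mert') AND Status[i] = 'Active'), "
    "hit -> hit)"
)


def row_matches(names, status, wanted_name="mert", wanted_status="Active"):
    """Plain-Python reference for the same check on a single row."""
    # Equivalent of flatten(Names): drop the outermost array layer.
    people = [person for group in names for person in group]
    return any(
        wanted_name in person and i < len(status) and status[i] == wanted_status
        for i, person in enumerate(people)
    )


rows = [
    (1, [[["aaaa", "mert"], ["cccc", "Doe"]]], ["Active", "Active"]),
    (2, [[["aaa", "Michael"], ["ggg", ""]]], ["Active", "Active"]),
    (3, [[["cccc", "mert"], ["gg", "Merk  "]]], ["Suspend", "Active"]),
    (4, [[["dddd", "Angela"], ["fggg", "Merl"]]], ["Active", "Suspend"]),
]
matching_ids = [row_id for row_id, names, status in rows if row_matches(names, status)]
print(matching_ids)  # [1]
```

On the original, unexploded dataframe the predicate could be applied as a filter with F.expr(ALIGNED_PREDICATE). Under this reading only ID 1 would remain, and no explode() is needed, so rows are never duplicated.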