I have a StructType with over 1,000 fields, and every field is a string.

root
 |-- mac: string (nullable = true)
 |-- kv: struct (nullable = true)
 |    |-- FTP_SERVER_ANAUTHORIZED_FEAT_B64: string (nullable = true)
 |    |-- FTP_SERVER_ANAUTHORIZED_FEAT_CODE: string (nullable = true)
 |    |-- FTP_SERVER_ANAUTHORIZED_HELP_B64: string (nullable = true)
 |    |-- FTP_SERVER_ANAUTHORIZED_HELP_CODE: string (nullable = true)
 |    |-- FTP_SERVER_ANAUTHORIZED_SYST_B64: string (nullable = true)
 |    |-- FTP_SERVER_ANAUTHORIZED_SYST_CODE: string (nullable = true)
 |    |-- FTP_SERVER_HELLO_B64: string (nullable = true)
 |    |-- FTP_STATUS_HELLO_CODE: string (nullable = true)
 |    |-- HTML_LOGIN_FORM_ACTION_0: string (nullable = true)
 |    |-- HTML_LOGIN_FORM_DETECTION_0: string (nullable = true)
 |    |-- HTML_LOGIN_FORM_INPUT_PASSWORD_NAME_0: string (nullable = true)
 |    |-- HTML_LOGIN_FORM_INPUT_TEXT_NAME_0: string (nullable = true)
 |    |-- HTML_LOGIN_FORM_METHOD_0: string (nullable = true)
 |    |-- HTML_REDIRECT_TYPE_0: string (nullable = true)

I want to select only the fields that are non-null, along with some identifier of which fields those are. Is there any way to convert this struct to an array without explicitly referring to each element?

  • "select only the fields which are non null" across all the rows or row per row? What should be the result? How many fields should the result Dataset have? As many as non-null fields? One with another struct? Commented Sep 29, 2017 at 13:04
  • Ideally, it is a sparse representation of the complete data. So, per row all non null values should be present, something like (field_k:val1, field_l:val2, ..., field_n:valx) Commented Sep 29, 2017 at 13:11

1 Answer

I'd use a UDF:

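from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

# The struct reaches the UDF as a Row (a tuple of field values);
# keep only the values that are not null.
as_array = udf(
    lambda arr: [x for x in arr if x is not None],
    ArrayType(StringType()))

df.withColumn("arr", as_array(df["kv"]))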
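
The array above holds only the values, so it no longer says which field each value came from. If the field names are needed as well (the "identifier" the question asks about), one possible variation is to return a map instead of an array. This is only a sketch: the names `non_null_fields`, `add_sparse_column`, and `kv_sparse` are made up for illustration, and it assumes the struct reaches the Python UDF as a `Row`, so that `Row.asDict()` gives the name-to-value pairs.

```python
def non_null_fields(row):
    """Return {field_name: value} for the non-null fields of a struct value."""
    if row is None:
        # The struct column itself is nullable, so pass nulls through.
        return None
    # A struct is handed to a Python UDF as a Row; asDict() maps names to values.
    # Plain dicts are accepted too, which makes the helper easy to try locally.
    fields = row.asDict() if hasattr(row, "asDict") else dict(row)
    return {name: value for name, value in fields.items() if value is not None}


def add_sparse_column(df, struct_col="kv", out_col="kv_sparse"):
    """Add a map column holding only the non-null fields of struct_col."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import MapType, StringType

    to_sparse_map = udf(non_null_fields, MapType(StringType(), StringType()))
    return df.withColumn(out_col, to_sparse_map(df[struct_col]))
```

Calling `add_sparse_column(df)` would then give each row a map containing only the fields that are set, which is close to the sparse `(field_k: val1, ...)` form described in the comments. A map column suits this case because every field is a string.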