
I have a PySpark dataframe with a column named Filters of type `array<struct<Op:string,Type:string,Val:string>>`.

I want to save my dataframe to a CSV file, and for that I need to cast the array to string type.

I tried to cast it with DF.Filters.tostring() and DF.Filters.cast(StringType()), but both solutions produce this unreadable value for each row in the Filters column:

org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@56234c19

The code is as follows:

from pyspark.sql.types import StringType

DF.printSchema()

|-- ClientNum: string (nullable = true)
|-- Filters: array (nullable = true)
    |-- element: struct (containsNull = true)
          |-- Op: string (nullable = true)
          |-- Type: string (nullable = true)
          |-- Val: string (nullable = true)

DF_cast = DF.select('ClientNum', DF.Filters.cast(StringType()))

DF_cast.printSchema()

|-- ClientNum: string (nullable = true)
|-- Filters: string (nullable = true)

DF_cast.show()

| ClientNum | Filters                                                            |
| 32103     | org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d9e517ce |
| 218056    | org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3c744494 |

Sample JSON data:

{"ClientNum":"abc123","Filters":[{"Op":"foo","Type":"bar","Val":"baz"}]}

Thanks !!

  • Can you share the minimal code? Commented Apr 11, 2017 at 13:19
  • Can you print the schema and show the data before the transformation? Also print the schema after the transformation. Commented Apr 11, 2017 at 13:30
  • The schema seems to be correct. Commented Apr 11, 2017 at 13:43
  • I'm not able to recreate the issue. Can you show the data before the transformation? Commented Apr 11, 2017 at 13:50

3 Answers


I created a sample JSON dataset to match that schema:

{"ClientNum":"abc123","Filters":[{"Op":"foo","Type":"bar","Val":"baz"}]}

select(s.col("ClientNum"),s.col("Filters").cast(StringType)).show(false)

+---------+------------------------------------------------------------------+
|ClientNum|Filters                                                           |
+---------+------------------------------------------------------------------+
|abc123   |org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@60fca57e|
+---------+------------------------------------------------------------------+

Your problem is best solved using the explode() function which flattens an array, then the star expand notation:

s.selectExpr("explode(Filters) AS structCol").selectExpr("structCol.*").show()
+---+----+---+
| Op|Type|Val|
+---+----+---+
|foo| bar|baz|
+---+----+---+

To make it a single column string separated by commas:

s.selectExpr("explode(Filters) AS structCol").select(F.expr("concat_ws(',', structCol.*)").alias("single_col")).show()
+-----------+
| single_col|
+-----------+
|foo,bar,baz|
+-----------+

Explode Array reference: Flattening Rows in Spark

Star expand reference for "struct" type: How to flatten a struct in a spark dataframe?


3 Comments

This creates columns in the top structure rather than a single column with the contents of all the columns as a string.
@alfredox Updated to add a single-column version.
How do we join this back to the original ids? @GarrenS

For me, in PySpark, the function to_json() did the job.

As a plus compared to a simple cast to String, it keeps the struct keys as well (not only the struct values). For the reported example I get:

[{"Op":"foo","Type":"bar","Val":"baz"}]

This was much more useful to me since I had to write the results to a Postgres table. In this format I can easily use the supported JSON functions in Postgres.

1 Comment

This doesn't seem to work with nested struct type

You can try this:

DF = DF.withColumn('Filters', DF.Filters.cast("string"))

2 Comments

I tried; same result: org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3
I'd say you have to use a UDF, where you can apply some logic to convert the array to a string, and then select the new column.
