I have a table in Spark SQL in Databricks, and one of its columns is a string. I converted it into a new column with an array datatype, but the values are still one string, even though the datatype is array in the table schema.

Column as String

Data1 
[2461][2639][2639][7700][7700][3953]

Converted to Array

Data_New
["[2461][2639][2639][7700][7700][3953]"]

String to array conversion

from pyspark.sql.functions import array

df_new = df.withColumn("Data_New", array(df["Data1"]))

Then I write it as Parquet and use it as a Spark SQL table in Databricks.

When I search for a string using the array_contains function, I get false:

select *
from table_name
where array_contains(Data_New,"[2461]")

When I search for the entire string, the query returns true.

Please suggest how I can split this string into an array so that I can find any element using the array_contains function.

3 Answers

Just remove the leading and trailing brackets from the string, then split on ][ to get an array of strings:

from pyspark.sql.functions import expr, split

df = df.withColumn("Data_New", split(expr("rtrim(']', ltrim('[', Data1))"), "\\]\\["))
df.show(truncate=False)

+------------------------------------+------------------------------------+
|Data1                               |Data_New                            |
+------------------------------------+------------------------------------+
|[2461][2639][2639][7700][7700][3953]|[2461, 2639, 2639, 7700, 7700, 3953]|
+------------------------------------+------------------------------------+
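
The same trim-and-split logic also works as a single SQL expression via selectExpr, if you prefer to keep it in SQL (a sketch; the output is identical to the snippet above):

# Same trim-and-split logic written as one SQL expression
df = df.selectExpr(
    "Data1",
    r"split(rtrim(']', ltrim('[', Data1)), '\\]\\[') as Data_New",
)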

Now use array_contains like this:

df.createOrReplaceTempView("table_name")

sql_query = "select * from table_name where array_contains(Data_New,'2461')"
spark.sql(sql_query).show(truncate=False)
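
Since you then write the data to Parquet and query it as a table, the array type survives that round trip (a minimal sketch; the path and table name here are hypothetical placeholders):

# Hypothetical output path and table name, for illustration only
df.write.mode("overwrite").parquet("/mnt/output/data_new_parquet")
spark.sql("""
    create table if not exists table_name_fixed
    using parquet location '/mnt/output/data_new_parquet'
""")
spark.sql("select * from table_name_fixed where array_contains(Data_New, '2461')").show(truncate=False)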

Actually this is not an array; it is one full string, so you need a regex or something similar:

pattern = "\\[2461\\]"  # escape the brackets; [2461] alone is a character class matching any single 2, 4, 6 or 1
df_new.filter(df_new["Data_New"].rlike(pattern))
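
For example, with the sample value from the question (a self-contained sketch):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df_new = spark.createDataFrame(
    [("[2461][2639][2639][7700][7700][3953]",)], ["Data_New"]
)
# Escaping the brackets makes the pattern match the literal substring [2461]
df_new.filter(df_new["Data_New"].rlike("\\[2461\\]")).show(truncate=False)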

import

from pyspark.sql import functions as sf, types as st

create table

a = [["[2461][2639][2639][7700][7700][3953]"], [None]]
sdf = sc.parallelize(a).toDF(["col1"])
sdf.show()
+--------------------+
|                col1|
+--------------------+
|[2461][2639][2639...|
|                null|
+--------------------+

convert type

def splitter(x):
    # Strip the outer brackets and split on "][", returning None for missing values
    if x is not None:
        return x[1:-1].split("][")
    return None

split_udf = sf.udf(splitter, st.ArrayType(st.StringType()))
sdf.withColumn("array_col1", split_udf("col1")) \
   .withColumn("check", sf.array_contains("array_col1", "2461")) \
   .show()
+--------------------+--------------------+-----+
|                col1|          array_col1|check|
+--------------------+--------------------+-----+
|[2461][2639][2639...|[2461, 2639, 2639...| true|
|                null|                null| null|
+--------------------+--------------------+-----+
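
As noted in the comments below, a Python UDF adds serialization overhead on large data. The built-in split is null-safe (it returns null for a null input), so the same result can be had without a UDF (a sketch reusing sdf from above):

from pyspark.sql import functions as sf

# Built-in functions propagate nulls, so no explicit None handling is needed
sdf.withColumn(
    "array_col1",
    sf.split(sf.expr("rtrim(']', ltrim('[', col1))"), "\\]\\[")
).withColumn(
    "check", sf.array_contains("array_col1", "2461")
).show()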

4 Comments

Looks like this will work, but when I try to write the dataframe to a Parquet file I get an error: File "<command-1489663418629327>", line 2, in <lambda> TypeError: 'NoneType' object is not subscriptable. Please suggest.
Do you have missing data, like NaN, in your dataframe? If that is the case, I need to modify the UDF to filter out the missing values.
Yes, there are missing values in the dataframe.
I updated the answer. Please be cautious: if your df is very large, a Python UDF will hurt your performance, so blackbishop's answer is better in terms of performance.
