0

I have HIVE-table with 2 columns (string, array<struct<type=string, cnt=int>>) like this:
||id || params ||
|| id1 || [{type=A, cnt=4}, {type=B, cnt=2}]
|| id2 || [{type=A, cnt=3}, {type=C, cnt=1}, {type=D, cnt=0}]
|| id3 || [{type=E, cnt=1}]

I need to transform it to table with separated int columns, where columns name are 'types' and values are equal to cnt:

|| id || A || B || C || D || E ||
|| id1 || 4 || 2 || NULL || NULL || NULL ||
|| id2 || 3 || NULL || 1 || 0 || NULL ||
|| id3 || NULL || NULL || NULL || NULL || 1 ||
What is the best and effective way to transform table? Both in Spark SQL and PySpark style. Thank you.

1
  • 2
    explode array, extract values(each value in new colum, if exists - value, if not - 0) and then group by id and use aggregate sum function. Commented Oct 16, 2020 at 13:53

1 Answer 1

1

Try this - not sure if sum required or not, but seems safe to assume:

from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType

# Some variation in your data
df = spark.createDataFrame([(1, ["type=AA, cnt=4", "type=B, cnt=2222"]),
                            (2, ["type=AA, cnt=3", "type=C, cnt=1", "type=D, cnt=0"]),
                            (3, ["type=E, cnt=1"])],["id", "params"])
# Explode
df = df.select(df.id, F.explode(df.params))

# Make separate cols and trip leading strings & convert to Int
split_col = F.split(df['col'], ',')
df = df.withColumn('type', split_col.getItem(0)).withColumn('count', split_col.getItem(1)).drop('col')
df = df.withColumn('type',F.expr("substring(type, 6, length(type))")).withColumn('count',F.expr("substring(count, 6, length(count))").cast(IntegerType()))

# Pivot to your format
df.groupBy("id").pivot("type").agg(F.sum("count")).sort(F.col("id").asc()).show()

returns:

+---+----+----+----+----+----+
| id|  AA|   B|   C|   D|   E|
+---+----+----+----+----+----+
|  1|   4|2222|null|null|null|
|  2|   3|null|   1|   0|null|
|  3|null|null|null|null|   1|
+---+----+----+----+----+----+
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.