
I have two lists in Spark (Scala). They both contain the same number of values. The first list, a, contains only strings, and the second list, b, contains only Longs.

a: List[String] = List("a", "b", "c", "d")
b: List[Long] = List(17625182, 17625182, 1059731078, 100)

I also have a schema defined as follows:

val schema2 = StructType(
  Array(
    StructField("check_name", StringType, true),
    StructField("metric", DecimalType(38,0), true)
  )
)

What is the best way to convert my lists into a single DataFrame that has schema schema2, with the columns built from a and b respectively?

2 Answers


You can create an RDD[Row] and convert it to a Spark DataFrame with the given schema:

import org.apache.spark.sql.Row

val df = spark.createDataFrame(
  // pair the two lists and convert each Long to BigDecimal
  // so it matches schema2's DecimalType(38,0) column
  sc.parallelize(a.zip(b).map(x => Row(x._1, BigDecimal(x._2)))),
  schema2
)

df.show
+----------+----------+
|check_name|    metric|
+----------+----------+
|         a|  17625182|
|         b|  17625182|
|         c|1059731078|
|         d|       100|
+----------+----------+
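The zip-and-convert step is plain Scala, so you can sanity-check it without a Spark session. A minimal sketch using the sample lists from the question:

```scala
// Pair each check name with its metric and widen the Long to BigDecimal,
// mirroring the Row construction passed to createDataFrame above.
val a = List("a", "b", "c", "d")
val b = List(17625182L, 17625182L, 1059731078L, 100L)

val rows: List[(String, BigDecimal)] =
  a.zip(b).map { case (name, metric) => (name, BigDecimal(metric)) }
// rows.head is ("a", BigDecimal(17625182))
```

BigDecimal is used here because DecimalType(38,0) columns expect decimal values; passing the raw Long into the Row would fail schema validation at runtime.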



Using a Dataset:

import spark.implicits._
case class Schema2(a: String, b: Long)

val el = (a zip b).map { case (a, b) => Schema2(a, b) }
// The case class field names become the column names, so rename them
// to match schema2; note that metric stays LongType here unless you
// also cast it to DecimalType(38,0).
val df = spark.createDataset(el).toDF("check_name", "metric")
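As with the first answer, the list-to-case-class step is ordinary Scala and can be checked without Spark. A sketch (field names chosen here only for illustration; in a Dataset they would become the column names unless renamed):

```scala
// Standalone check of the zip-into-case-class step.
case class Schema2(a: String, b: Long)

val a = List("a", "b", "c", "d")
val b = List(17625182L, 17625182L, 1059731078L, 100L)

val el = (a zip b).map { case (x, y) => Schema2(x, y) }
// el.head is Schema2("a", 17625182L)
```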

