I have a DataFrame with data like this:

Key  Today  MTD  QTD  HTD  YTD 
K1   10     20   10   20   50
K2   20     30   20   10   60

I am looking for output like this:

Key  PRD     Amt
K1   Today   10
K1   MTD     20
K1   QTD     10
K1   HTD     20
K1   YTD     50

I tried using pivot, but it reshapes the data the other way. I am not sure whether I can use flatMap or map. Please advise.

  • Converting a column to a row does not make sense. In general, for any table-like structure, we call a vertical sequence a column and a horizontal sequence a row. But if we look at a column or a row without the context of a table, then both are just sequences. Commented Aug 19, 2016 at 16:59
  • Also I don't think anyone can understand your question by looking at its current state. I will try to reformat it from my understanding of your question. Commented Aug 19, 2016 at 17:00
  • @SarveshKumarSingh I beg to differ. There is a reason why the "melt.data.frame" function exists in R -- I use it all the time. Commented Oct 6, 2017 at 19:52

1 Answer

import org.apache.spark.sql._
import spark.implicits._

val list = List(("K1", 10, 20, 10, 20,50), ("K2", 20, 30, 20, 10, 60))
val yourDF = sc.parallelize(list).toDF("Key", "Today", "MTD", "QTD", "HTD", "YTD")

// yourDF.show()
// +---+-----+---+---+---+---+
// |Key|Today|MTD|QTD|HTD|YTD|
// +---+-----+---+---+---+---+
// | K1|   10| 20| 10| 20| 50|
// | K2|   20| 30| 20| 10| 60|
// +---+-----+---+---+---+---+

val newDataFrame = yourDF
  .rdd
  .flatMap(row => {
    val key = row.getString(0)
    val todayAmt = row.getInt(1)
    val mtdAmt = row.getInt(2)
    val qtdAmt = row.getInt(3)
    val htdAmt = row.getInt(4)
    val ytdAmt = row.getInt(5)

    List(
      (key, "Today", todayAmt),
      (key, "MTD", mtdAmt),
      (key, "QTD", qtdAmt),
      (key, "HTD", htdAmt),
      (key, "YTD", ytdAmt)
    )
  })
  .toDF("Key", "PRD", "Amt")

// newDataFrame.show()
// +---+-----+---+
// |Key|  PRD|Amt|
// +---+-----+---+
// | K1|Today| 10|
// | K1|  MTD| 20|
// | K1|  QTD| 10|
// | K1|  HTD| 20|
// | K1|  YTD| 50|
// | K2|Today| 20|
// | K2|  MTD| 30|
// | K2|  QTD| 20|
// | K2|  HTD| 10|
// | K2|  YTD| 60|
// +---+-----+---+
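
On the "flatMap or map" part of the question: each input row has to produce five output rows, so map (one output per input) is not enough, while flatMap flattens the per-row lists into one long sequence. The following is a minimal sketch of that idea on plain Scala collections, with no Spark involved. The names MeltSketch, melt, periods, wideRows and longRows are hypothetical and only used for illustration; the data is the sample from the question.

```scala
// Plain Scala collections only; this sketch does not need Spark on the classpath.
object MeltSketch {
  // Hypothetical helper: pairs each period name with its amount and
  // attaches the key, giving one (Key, PRD, Amt) triple per period column.
  def melt(periods: Seq[String], key: String, amounts: Seq[Int]): Seq[(String, String, Int)] =
    periods.zip(amounts).map { case (prd, amt) => (key, prd, amt) }

  // Period column names, in the same order as the amounts in each wide row.
  val periods: Seq[String] = Seq("Today", "MTD", "QTD", "HTD", "YTD")

  // Sample data from the question, one entry per key.
  val wideRows: Seq[(String, Seq[Int])] = Seq(
    ("K1", Seq(10, 20, 10, 20, 50)),
    ("K2", Seq(20, 30, 20, 10, 60))
  )

  // flatMap concatenates the five triples produced for each wide row,
  // so two wide rows become ten long rows.
  val longRows: Seq[(String, String, Int)] =
    wideRows.flatMap { case (key, amounts) => melt(periods, key, amounts) }

  def main(args: Array[String]): Unit = longRows.foreach(println)
}
```

Using map instead of flatMap here would give a sequence of two five-element sequences rather than ten rows. Driving the reshaping from a list of column names also avoids hard-coding one getInt call per column, assuming every period column has the same type.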

2 Comments

Thanks this helps :)
When I run this code on my streaming DataFrame, I get java.lang.ClassNotFoundException: scala.Any. Any idea why?