I have JSON input like this:

            {"uniqueTranId":"12345", "age":25, "name":"Maichael"}, 
            {"uniqueTranId":"67891", "age":30,"name":"Andy"},
            {"uniqueTranId":"54326", "age":19, "name":"Justin" }

From this JSON I have created the following DataFrame:

                    +----+--------+------------+
                    | age|    name|uniqueTranId|
                    +----+--------+------------+
                    |  25|Maichael|       12345|
                    |  30|    Andy|       67891|
                    |  19|  Justin|       54326|
                    +----+--------+------------+

I would like to convert this DataFrame into the following structure:

   List(
       ("12345", Map("SomeConstant" -> Array("uniqueTranId" -> "12345", "age" -> 25, "name" -> "Maichael"))),
       ("67891", Map("SomeConstant" -> Array("uniqueTranId" -> "67891", "age" -> 30, "name" -> "Andy"))),
       ("54326", Map("SomeConstant" -> Array("uniqueTranId" -> "54326", "age" -> 19, "name" -> "Justin")))
       )

This is the type I am looking for:

List[(uniqueTranId, Map["SomeConstant", Array[(json_key, json_value)]])]

Any immediate help is very much appreciated.

  • You do not need Spark if your expected end result is a Scala list collected from the DataFrame. Just perform the mapping on the collected Rows. Commented Aug 23, 2017 at 13:38
  • @Micheal Lemay, I tried the following:

        val tempArray = df.collect.map(r => Map(df.columns.zip(r.toSeq):_*))

    which gives

        Map(age -> 25, name -> Maichael, uniqueTranId -> 12345)
        Map(age -> 30, name -> Andy, uniqueTranId -> 67891)
        Map(age -> 19, name -> Justin, uniqueTranId -> 54326)

    and then

        val temp = List(tempArray.map(p => (p.getOrElse("uniqueTranId", null), p)):_*)

    which gives

        (12345,Map(age -> 25, name -> Maichael, uniqueTranId -> 12345))
        (67891,Map(age -> 30, name -> Andy, uniqueTranId -> 67891))
        (54326,Map(age -> 19, name -> Justin, uniqueTranId -> 54326))

    After that I had no luck. Commented Aug 23, 2017 at 13:49

1 Answer

This should do it:

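// Build the sample input: one JSON document per string.
val data = sc.parallelize(List(
  """{"uniqueTranId":"12345", "age":25, "name":"Maichael"}""", 
  """{"uniqueTranId":"67891", "age":30,"name":"Andy"}""",
  """{"uniqueTranId":"54326", "age":19, "name":"Justin" }"""))

val df = spark.read.json(data)
// Bring all rows to the driver; only sensible for small results.
val collected = df.collect

// For each row, emit (uniqueTranId, Map(constant -> Array of (column, value-as-string))).
// getValuesMap needs an explicit type parameter; without it the value type is
// inferred as Nothing and reading the values fails with a ClassCastException.
collected.map(row => {
  (row.getString(row.fieldIndex("uniqueTranId")),
   Map("someconstant" -> row.getValuesMap[Any](df.columns).map(x => (x._1, x._2.toString)).toArray))
}).toList  // the question asks for a List rather than an Array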
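
As the first comment points out, once the rows are collected the rest is ordinary Scala collection work. Here is a minimal sketch of just that mapping step, with no Spark dependency. It assumes each collected row has already been turned into a `Map[String, Any]` (which is what `row.getValuesMap[Any](df.columns)` would give you) and that no value is null. The helper name `toKeyed` is made up for illustration.

```scala
// Stand-in for the collected rows, each as a column-name -> value map.
// (Assumption: in a real job these would come from df.collect.)
val rows: Seq[Map[String, Any]] = Seq(
  Map("uniqueTranId" -> "12345", "age" -> 25L, "name" -> "Maichael"),
  Map("uniqueTranId" -> "67891", "age" -> 30L, "name" -> "Andy"),
  Map("uniqueTranId" -> "54326", "age" -> 19L, "name" -> "Justin")
)

// Hypothetical helper: key each row by its uniqueTranId and wrap the
// row's (column, value) pairs in a single-entry map under `constant`.
// Values are rendered as strings; a null value would throw here.
def toKeyed(
    rows: Seq[Map[String, Any]],
    constant: String
): List[(String, Map[String, Array[(String, String)]])] =
  rows.map { r =>
    val id = r("uniqueTranId").toString
    val pairs = r.map { case (k, v) => k -> v.toString }.toArray
    id -> Map(constant -> pairs)
  }.toList

val result = toKeyed(rows, "SomeConstant")
```

If the values should keep their original types instead of becoming strings, the inner array could be typed as `Array[(String, Any)]` and the `toString` call dropped.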