
I have the following data in a DataFrame:

 aaa | bbb  | ccc | ddd | eee
 ----|------|-----|-----|-----
 100 | xxxx | 123 | yyy | 2017
 100 | yyyy | 345 | zzz | 2017
 200 | rrrr | 500 | qqq | 2017
 300 | uuuu | 200 | ttt | 2017
 200 | iiii | 500 | ooo | 2017
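
For reference, the sample data above could be built like this (a minimal sketch; the column types and the val name `data` are just placeholders):

import spark.implicits._

// Hypothetical construction of the sample rows shown above; types are assumed.
val data = Seq(
  (100, "xxxx", 123, "yyy", 2017),
  (100, "yyyy", 345, "zzz", 2017),
  (200, "rrrr", 500, "qqq", 2017),
  (300, "uuuu", 200, "ttt", 2017),
  (200, "iiii", 500, "ooo", 2017)
).toDF("aaa", "bbb", "ccc", "ddd", "eee")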

I want to get the result as

 {100,[{xxxx:{123,yyy}},{yyyy:{345,zzz}}],2017}
 {200,[{rrrr:{500,qqq}},{iiii:{500,ooo}}],2017}
 {300,[{uuuu:{200,ttt}}],2017}

Kindly help

  • Your title and question don't match at all. Commented Sep 6, 2017 at 12:33
  • What should I mention? Commented Sep 6, 2017 at 12:41
  • The output that you're suggesting isn't JSON. Commented Sep 6, 2017 at 12:53

2 Answers


This works:

 val df = data
   .withColumn("cd", array('ccc, 'ddd))           // create arrays of ccc and ddd
   .withColumn("valuesMap", map('bbb, 'cd))       // create mapping bbb -> [ccc, ddd]
   .withColumn("values", collect_list('valuesMap) // collect mappings per aaa
                .over(Window.partitionBy('aaa)))
   .withColumn("eee", first('eee)                 // eee is constant per group, just take the first value in the Window
                .over(Window.partitionBy('aaa)))
   .select("aaa", "values", "eee")                // keep only the columns requested in the question
   .select(to_json(struct("aaa", "values", "eee")).as("value")) // create JSON

Make sure you have these imports:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
import spark.implicits._ // needed for the 'col and $"col" column syntax
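
To look at the generated JSON strings on the driver (as also discussed in the comments below), a minimal sketch, assuming df is the result of the snippet above and the data is small enough to collect:

val jsonStrings = df.collect().map(_.getString(0)) // one JSON string per row of the "value" column
jsonStrings.distinct.foreach(println)              // distinct, because the window keeps one row per input row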

4 Comments

Thanks Gaweda. But partitionBy is not working. And now the requirement has changed; I have a list as below
@gayathri What do you mean by "not working"? I've tested it on your data. If you want a plain String list, you can do collect()
Hi Gaweda, I am using IntelliJ (Scala 2.10.6). In .withColumn("valuesMap", map('bbb, 'cd)), map is not recognized, and over(Window.partitionBy('aaa)) is not recognized either. I have imported sql.functions. Please help
map is inside the functions object, so it should be visible
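
If a bare map still doesn't resolve (for example because another map is in scope), calling the functions fully qualified rules out shadowing. A minimal sketch, assuming the same data DataFrame as in the question:

import org.apache.spark.sql.{functions => F}
import org.apache.spark.sql.expressions.Window
import spark.implicits._ // for the 'col symbol syntax

// Fully qualified calls avoid clashes with any other `map` in scope.
val mapped = data
  .withColumn("cd", F.array('ccc, 'ddd))
  .withColumn("valuesMap", F.map('bbb, 'cd))
  .withColumn("values", F.collect_list('valuesMap).over(Window.partitionBy('aaa)))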

You can create a map defining the values as constants with lit() or taking them from other columns in the dataframe with $"col_name", like this:

val new_df = df.withColumn("map_feature", map(lit("key1"), lit("value1"), lit("key2"), $"col2"))
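
Applied to the columns from the question, the same idea might look like this (a sketch; df and the new column name bbb_to_ccc are placeholders):

import org.apache.spark.sql.functions.map
import spark.implicits._ // for the $"col" syntax

// Key taken from column bbb, value taken from column ccc.
val with_map = df.withColumn("bbb_to_ccc", map($"bbb", $"ccc"))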

