I am using Spark 2.2 and Java 1.8.

Sample XML format:

    <?xml version="1.0" encoding="UTF-8" ?>
    <!-- Generated by Oracle DVM Editor version 1.0 at [9/6/11 5:14 PM]. -->
    <dvm name="CIHSubscriptionTypeMapping" xmlns="http://xmlns.oracle.com/dvm">
      <description>
      </description>
      <columns>
        <column name="SSPMW"/>
        <column name="CIH"/>
      </columns>
      <rows>
        <row>
          <cell>ute.recordClass</cell>
          <cell>sku_type</cell>
        </row>
        <row>
          <cell>ute.name.en</cell>
          <cell>name_en</cell>
        </row>
      </rows>
    </dvm>

Reading the XML file in Spark with the Java API:

Dataset<Row> xmlDF = spark.read()
    .format("com.databricks.spark.xml")
    .option("rowTag", "row")
    .load("sample.xml");



xmlDF.printSchema();

root
 |-- cell: array (nullable = true)
 |    |-- element: string (containsNull = true)


xmlDF.show(false);


+---------------------------+
|cell                       |
+---------------------------+
|[ute.recordClass, sku_type]|
|[ute.name.en, name_en]     |
+---------------------------+

I want to convert the "cell" column above into a lookup Map<String, String>, which I will later broadcast.

Example entry: (ute.sku.price, list_price), ...

Can someone help with this? Thanks.

1 Answer

You can use the built-in map function as follows:

import org.apache.spark.sql.*;

xmlDF.select(functions.map(functions.col("cell").getItem(0), functions.col("cell").getItem(1)).as("cell")).show(false);

which should give you the following output and schema:

+-----------------------------+
|cell                         |
+-----------------------------+
|[ute.recordClass -> sku_type]|
|[ute.name.en -> name_en]     |
+-----------------------------+

root
 |-- cell: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

Update

You commented that you want a plain Java map. For that, you can collect the rows to the driver and fill the map yourself:

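    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Select the first array element as the key and the second as the value, then collect to the driver.
    List<Row> rows = xmlDF.select(functions.col("cell").getItem(0).as("key"), functions.col("cell").getItem(1).as("value")).collectAsList();
    Map<String, String> hashMap = new HashMap<>();
    for (Row row : rows) {
        hashMap.put(row.getString(0), row.getString(1));
    }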

hashMap is now a plain java.util.Map held on the driver.
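If some `<row>` elements might carry fewer than two `<cell>` values, the pairing step could be made a little more defensive. The following is only a sketch in plain Java, without any Spark dependency: `CellLookup` and `toLookup` are hypothetical names, and each inner list stands in for one collected "cell" array (on a real `Row` the array could presumably be read with `row.getList(0)`).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: builds a lookup map from [key, value] pairs.
final class CellLookup {

    // Each inner list stands in for one collected "cell" array, e.g. [ute.name.en, name_en].
    static Map<String, String> toLookup(List<List<String>> cells) {
        Map<String, String> lookup = new LinkedHashMap<>();
        for (List<String> cell : cells) {
            // Skip entries with fewer than two cells or without a key.
            if (cell == null || cell.size() < 2 || cell.get(0) == null) {
                continue;
            }
            // On a duplicate key the later row overwrites the earlier one.
            lookup.put(cell.get(0), cell.get(1));
        }
        return lookup;
    }

    public static void main(String[] args) {
        List<List<String>> cells = Arrays.asList(
            Arrays.asList("ute.recordClass", "sku_type"),
            Arrays.asList("ute.name.en", "name_en"),
            Arrays.asList("incomplete"));
        Map<String, String> lookup = toLookup(cells);
        System.out.println(lookup.get("ute.recordClass")); // prints sku_type
        System.out.println(lookup.size());                 // prints 2
    }
}
```

Since the question mentions broadcasting, a map built this way could then be passed to `JavaSparkContext.broadcast(...)`; that step is left out here so the sketch stays runnable without a Spark session.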

I hope the answer is helpful.

3 Comments

Thanks Ramesh. I want the above output in a plain Java map. Can you please help with this? Thanks.
@Sekhar I have updated the answer. If the answer is helpful, please consider accepting it.
Thanks Ramesh for your help.
