
I have a CSV file with the following data:

1,2,5  
2,4  
2,3 

I want to load them into a DataFrame whose schema is a single column of type array of strings.

The output should look like this:

[1, 2, 5]  
[2, 4]  
[2, 3] 

This has been answered for Scala here: Spark: Convert column of string to an array

I want to do the same in Java.
Please help.
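As a plain-Java illustration of the per-line transformation being asked for (no Spark involved), each line is split on commas and the pieces are trimmed. `java.util.List.toString()` happens to print in the same `[1, 2, 5]` form shown above. The class and method names are hypothetical, and the trimming assumes the trailing spaces in the sample lines are really present in the file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParseLines {
    // Hypothetical helper: split one CSV line into trimmed string values,
    // e.g. "1,2,5  " -> [1, 2, 5].
    static List<String> toArray(String line) {
        return Arrays.stream(line.split(","))
                .map(String::trim)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The sample lines from the question, including their trailing spaces.
        List<String> lines = Arrays.asList("1,2,5  ", "2,4  ", "2,3 ");
        List<List<String>> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(toArray(line));
        }
        // List.toString() prints each row as [a, b, c].
        rows.forEach(System.out::println);
    }
}
```

In Spark the same idea is expressed as a column function applied to every row, so the per-line logic never has to be written by hand.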

  • The question you linked uses the DataFrame DSL; it will be almost the same in Java. Did you try writing anything? If so, what error did you get? Commented Dec 7, 2017 at 6:26
  • I was trying to load it into an RDD and attach a schema to it, as below: JavaRDD<Row> rowRDD = sparkSession.read().textFile("D:\\sanjaya\\OAWorkspace\\spark-basics\\src\\main\\resources\\marketbasketdata.csv") .javaRDD().map((Function<String, Row>) record -> { String[] attributes = record.split(","); return RowFactory.create(Arrays.asList(attributes)); Commented Dec 7, 2017 at 6:42
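One detail worth checking in the RDD approach is how many fields each `Row` ends up with. `RowFactory.create` takes `Object... values`, so passing a single `List` produces one field holding the whole list, which lines up with a schema containing one array-of-strings column (for example one built with `DataTypes.createArrayType(DataTypes.StringType)`). Passing the `String[]` cast to `Object[]` instead spreads the values into one field per element. The sketch below uses a hypothetical stand-in method with the same parameter shape, so it runs without Spark.

```java
import java.util.Arrays;

public class VarargsDemo {
    // Hypothetical stand-in with the same parameter shape as
    // RowFactory.create(Object... values): it only reports how many
    // fields a Row built from these arguments would contain.
    static int fieldCount(Object... values) {
        return values.length;
    }

    public static void main(String[] args) {
        String[] attributes = "1,2,5".split(",");
        // A single List argument becomes ONE field holding the whole list.
        System.out.println(fieldCount(Arrays.asList(attributes))); // prints 1
        // Casting to Object[] spreads the elements: one field per value.
        System.out.println(fieldCount((Object[]) attributes)); // prints 3
    }
}
```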

2 Answers


Below is sample code in Java. Read your file with the spark.read().text(String path) method, which gives one string column named "value" with one row per line, and then call the split function on that column.

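import static org.apache.spark.sql.functions.split;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("SparkSample")
                .master("local[*]")
                .getOrCreate();
        // Read the file as plain text: one row per line, in a string column named "value"
        Dataset<Row> ds = spark.read().text("c://tmp//sample.csv").toDF("value");
        ds.show(false);
        // Split each line on commas into an array<string> column named "new_value"
        Dataset<Row> ds1 = ds.select(split(ds.col("value"), ",")).toDF("new_value");
        ds1.show(false);
        ds1.printSchema();
        spark.stop();
    }
}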
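Spark's `split` function treats its second argument as a Java regular expression, so whitespace around the commas (such as the trailing spaces in the question's sample lines, if they are really in the file) can be absorbed by the pattern. The plain-Java sketch below compares a bare `","` with the hypothetical pattern `\s*,\s*` applied after trimming; the input `"1, 2 ,5  "` is made up for illustration. In the answer's code this would correspond to something like `split(trim(ds.col("value")), "\\s*,\\s*")` using `functions.trim`, which has not been tested here.

```java
import java.util.Arrays;

public class SplitPatternDemo {
    // Hypothetical pattern: a comma together with any surrounding whitespace.
    static final String PATTERN = "\\s*,\\s*";

    // Splitting on a bare comma keeps any padding inside the values.
    static String[] splitPlain(String line) {
        return line.split(",");
    }

    // Trim first so spaces after the last value are dropped, then split on the regex.
    static String[] splitTrimmed(String line) {
        return line.trim().split(PATTERN);
    }

    public static void main(String[] args) {
        String line = "1, 2 ,5  ";
        System.out.println(Arrays.toString(splitPlain(line)));
        System.out.println(Arrays.toString(splitTrimmed(line))); // [1, 2, 5]
    }
}
```

If the file has no stray whitespace, the plain `","` used in the answer is enough.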


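You can use the VectorAssembler class to combine several columns into a single feature vector column, which is particularly useful with ML pipelines. Note that it produces a vector rather than an array of strings, and its input columns must be numeric, boolean, or vector types: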

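import org.apache.spark.ml.feature.VectorAssembler

// Combine the listed input columns into one vector column named "features"
val assembler = new VectorAssembler()
  .setInputCols(Array("city", "status", "vendor"))
  .setOutputCol("features")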

https://spark.apache.org/docs/2.2.0/ml-features.html#vectorassembler
