1

I'm writing a test program in Java and would like to parallelize a List object.

SparkSession spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("JavaWordCount")
      .getOrCreate();

System.out.println("hello");
List<String> l = new ArrayList<>(5);
l.add("view.txt");
spark.sparkContext().parallelize(l,1,"test");

The method parallelize(Seq, int, ClassTag) in the type SparkContext is not applicable for the arguments (List, int, String)

I'm not sure what the third parameter, the ClassTag, should be.

2 Answers

5

In general, when working with Java you should prefer the JavaSparkContext methods:

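import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Wrap the SparkContext owned by the existing SparkSession
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

// Distribute the Java list as an RDD with 1 partition
JavaRDD<String> rdd = jsc.parallelize(l, 1);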

SparkContext is intended mostly for Scala usage: its methods take Scala types such as Seq and ClassTag, which are awkward to supply from Java.


0

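Given that you want to parallelize a list of String, this should do for the third argument:

ClassTag.apply(String.class)

Depending on your Scala version, you may need to spell this as scala.reflect.ClassTag$.MODULE$.apply(String.class) from Java. Note also that the first argument must be a Scala Seq rather than a java.util.List, as the error message shows, so the list has to be converted (for example with scala.collection.JavaConverters) before the call compiles.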
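As for why the Scala API asks for a ClassTag at all: the JVM erases generic type parameters, so code that needs the element type at run time (for example, to allocate an array of T) must be handed the class explicitly. The sketch below is a plain-Java analogue of that idea, not Spark code; the class name ClassTokenDemo and the helper toArray are made up for illustration, and Class<T> here only plays the role that ClassTag plays on the Scala side.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

public class ClassTokenDemo {

    // Hypothetical helper: builds a T[] at run time. Because of erasure,
    // "new T[n]" does not compile, so the element class is passed explicitly,
    // much like a ClassTag is passed to the Scala parallelize method.
    @SuppressWarnings("unchecked")
    static <T> T[] toArray(List<T> items, Class<T> elementClass) {
        T[] out = (T[]) Array.newInstance(elementClass, items.size());
        return items.toArray(out);
    }

    public static void main(String[] args) {
        List<String> l = new ArrayList<>();
        l.add("view.txt");

        // Generic parameters are erased: both lists share one runtime class.
        System.out.println(
            new ArrayList<String>().getClass() == new ArrayList<Integer>().getClass());

        // With the class token, the array gets the right component type.
        String[] arr = toArray(l, String.class);
        System.out.println(arr.getClass().getComponentType().getName());
    }
}
```

With JavaSparkContext you never see this, because the Java wrapper supplies the tag for you; it only surfaces when you call the Scala SparkContext directly.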
