import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.commons.lang.time.StopWatch;
import java.util.ArrayList;
import java.util.List;

public class Prime {

    //Find all primes below n by trial division and return them as a list
    public List<Integer> countPrime(int n){
        List<Integer> primes = new ArrayList<>();
        for (int i = 2; i < n; i++){
            boolean isPrime = true;

            //check if the number is prime or not
            for (int j = 2; j < i; j++){
                if (i % j == 0){
                    isPrime = false;
                    break;  // exit the inner for loop
                }
            }

            //add the primes into the List
            if (isPrime){
                primes.add(i);
            }
        }
        return primes;
    }

    //Main method to run the program
    public static void main(String[] args) {
        StopWatch watch = new StopWatch();
        watch.start();

        //creating javaSparkContext object
        SparkConf conf = new SparkConf().setAppName("haha").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        //new prime object
        Prime prime = new Prime();
        //prime.countPrime(1000000);

        //parallelize the collection
        JavaRDD<Integer> rdd = sc.parallelize(prime.countPrime(1000000),12);
        long count = rdd.filter(e -> e == 2 || e % 2 != 0).count();


        //Stopping the execution time and printing the results
        watch.stop();
        System.out.println("Total time taken to run the process: " + watch);
        System.out.println("The number of primes between 0 and 1,000,000 is " + count);
        sc.stop();
    }
}

Hi there, I have the following code, which parallelizes an algorithm that counts the number of primes in a given range. But the code only parallelizes the list of primes that has already been computed, not the computation itself. How can I modify the code so that the process of finding the primes runs in parallel?


It's an order-of-operations issue: you're running prime.countPrime on the driver before you've created your Spark RDD, so all the work is already done by the time Spark gets involved. Spark parallelizes only the operations defined on the RDD itself (map, filter, reduce, etc.). You need to rethink your approach:

  1. Create an RDD of all integers from 1 to 1,000,000 (e.g. sc.range(1, 1000000, 1, 12) in the Scala/Python APIs, or parallelize a list of candidates in the Java API).

  2. Create an isPrime(int n) method to evaluate if a given integer is prime.

  3. filter your RDD on the condition of your isPrime method (this is the part that will execute in parallel).

  4. count the filtered RDD.
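The four steps above can be sketched roughly as follows. This is a minimal illustration, not a drop-in replacement: the class name PrimeParallel is made up, and since JavaSparkContext has no range() method, it builds the candidate list on the driver and calls parallelize instead.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.ArrayList;
import java.util.List;

public class PrimeParallel {

    // Trial-division primality test; this is what Spark will
    // run in parallel inside the filter, one element at a time
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int j = 2; (long) j * j <= n; j++) {
            if (n % j == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("primes").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Step 1: build the candidate range on the driver (cheap),
        // then distribute it across 12 partitions
        List<Integer> candidates = new ArrayList<>();
        for (int i = 1; i < 1_000_000; i++) candidates.add(i);
        JavaRDD<Integer> rdd = sc.parallelize(candidates, 12);

        // Steps 3 and 4: filter on the primality predicate (the
        // expensive part, executed in parallel) and count the result
        long count = rdd.filter(PrimeParallel::isPrime).count();

        System.out.println("Number of primes below 1,000,000: " + count);
        sc.stop();
    }
}
```

The key difference from your version: here the cheap work (listing candidates) happens on the driver and the expensive work (primality testing) happens inside filter on the executors, whereas your code does it the other way around.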
