I am trying to learn Spark GraphX on Windows 10 by replicating the code here. That code was written for an older version of Spark, and I can't find a way to create the vertices with the current version. The code is as follows:

import scala.util.MurmurHash
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val path = "F:/Soft/spark/2008.csv"
val df_1 = spark.read.option("header", true).csv(path)

val flightsFromTo = df_1.select($"Origin",$"Dest")
val airportCodes = df_1.select($"Origin", $"Dest").flatMap(x => Iterable(x(0).toString, x(1).toString))

// error caused by the following line
val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash.stringHash(x), x))

The following is the error message:

<console>:57: error: missing parameter type
       val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash.stringHash(x), x))
                                                                                  ^

I think the syntax is obsolete. I looked for the current syntax in the official documentation, but that did not help. The data set can be downloaded from here.

UPDATE:

Basically, I'm trying to create vertices and edges so that I can finally build a graph as shown in the tutorial. I'm also new to the MapReduce paradigm.

3 Answers

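The following lines of code worked for me. In Spark 2.x and later, spark.read returns a DataFrame, and flatMap on it produces a Dataset rather than an RDD, so the result has to be converted with .rdd before it can be assigned to an RDD[(VertexId, String)].

// Import the current hashing library; the old scala.util.MurmurHash also works but produces a warning
import scala.util.hashing.MurmurHash3

// Convert the Datasets to RDDs; leaving them as Datasets is what caused the error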
val flightsFromTo = df_1.select($"Origin",$"Dest").rdd
val airportCodes = df_1.select($"Origin", $"Dest").flatMap(x => Iterable(x(0).toString, x(1).toString)).rdd

val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash3.stringHash(x), x))
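
Since the goal in the question is to end up with a graph, here is a minimal sketch of how the vertices above might be combined with edges. It is not taken from the tutorial: the object name FlightGraphSketch, the helper buildGraph, the sample airport codes, the placeholder edge attribute 1, and the default vertex label "unknown" are all assumptions made for illustration. In spark-shell the `spark` session already exists, so only the body of buildGraph would be needed.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.util.hashing.MurmurHash3

object FlightGraphSketch {
  // Hypothetical helper: builds a graph from (origin, destination) pairs.
  def buildGraph(spark: SparkSession, flights: Seq[(String, String)]): Graph[String, Int] = {
    import spark.implicits._

    // Stand-in for the Origin/Dest columns read from the CSV file.
    val df = flights.toDF("Origin", "Dest")

    // Drop to the RDD API before calling map, as in the answer above.
    val flightsFromTo: RDD[(String, String)] =
      df.select($"Origin", $"Dest").rdd.map(r => (r.getString(0), r.getString(1)))

    // One vertex per distinct airport code, keyed by the hash of the code.
    val airportVertices: RDD[(VertexId, String)] =
      flightsFromTo
        .flatMap { case (origin, dest) => Seq(origin, dest) }
        .distinct()
        .map(code => (MurmurHash3.stringHash(code).toLong, code))

    // One edge per flight; the Int attribute 1 is only a placeholder.
    val flightEdges: RDD[Edge[Int]] = flightsFromTo.map { case (origin, dest) =>
      Edge(MurmurHash3.stringHash(origin).toLong, MurmurHash3.stringHash(dest).toLong, 1)
    }

    // "unknown" labels any vertex that appears in an edge but not in airportVertices.
    Graph(airportVertices, flightEdges, "unknown")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("flight-graph-sketch").getOrCreate()
    val graph = buildGraph(spark, Seq(("SFO", "JFK"), ("JFK", "ORD"), ("SFO", "ORD")))
    println(s"vertices=${graph.numVertices}, edges=${graph.numEdges}")
    spark.stop()
  }
}
```

The same hash function is used for vertices and edges so that each edge's source and destination IDs match the IDs of the corresponding airport vertices.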


You can try:

val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash.stringHash(x(0)), x(1)))

1 Comment

It still gives the same error:

<console>:37: error: missing parameter type
       val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash.stringHash(x(0)), x(1)))

// map() must be called on an RDD here, so convert the Dataset with .rdd first.
import scala.util.hashing.MurmurHash3

val airportVertices: RDD[(VertexId, String)] = airportCodes.rdd.distinct().map(x => (MurmurHash3.stringHash(x), x))
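
The vertex IDs in these answers come from MurmurHash3.stringHash, which returns an Int, while GraphX's VertexId is an alias for Long. The following Spark-free sketch shows that mapping on a plain Scala collection. The object name VertexIdSketch, the helper vertexIdOf, and the sample codes are hypothetical, and the local VertexId alias only stands in for the GraphX one.

```scala
import scala.util.hashing.MurmurHash3

object VertexIdSketch {
  // Stand-in for org.apache.spark.graphx.VertexId, which is an alias for Long.
  type VertexId = Long

  // stringHash returns an Int; .toLong makes the widening to VertexId explicit.
  def vertexIdOf(code: String): VertexId = MurmurHash3.stringHash(code).toLong

  def main(args: Array[String]): Unit = {
    // Hypothetical sample airport codes, including one duplicate.
    val codes = Seq("SFO", "JFK", "ORD", "SFO")

    // Same shape as airportVertices, but on a local Seq instead of an RDD.
    val vertices: Seq[(VertexId, String)] = codes.distinct.map(c => (vertexIdOf(c), c))

    vertices.foreach { case (id, code) => println(s"$code -> $id") }
  }
}
```

Because the hash is only 32 bits wide, two different strings could in principle map to the same ID, so it may be worth checking that the number of distinct IDs equals the number of distinct codes on a real data set.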
