5

In Spark SQL, when I try to use the map function on a DataFrame, I get the error below.

The method map(Function1, ClassTag) in the type DataFrame is not applicable for the arguments (new Function(){})

I am following the Spark 1.3 documentation: https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection Does anyone have a solution?

Here is my testing code.

// SQL can be run over RDDs that have been registered as tables.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");

List<String> teenagerNames = teenagers.map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();
2
  • Could you please provide your full code? By the way, which version of Spark are you using (>1.3.0)? Commented Apr 22, 2015 at 8:46
  • I am using Spark 1.3.1 with spark-sql 1.3.1. Commented Apr 23, 2015 at 9:52

6 Answers

12

Change this to:

Java 6 & 7

List<String> teenagerNames = teenagers.javaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();

Java 8

List<String> t2 = teenagers.javaRDD().map(
    row -> "Name: " + row.getString(0)
).collect();

Once you call javaRDD() it works just like any other RDD map function.

This works with Spark 1.3.0 and up.
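
To see why the original call is rejected, note that the error message names a two-argument method, map(Function1, ClassTag), while the question passes a single Java-style function object. The sketch below reproduces that shape in plain Java with hypothetical stand-in types (FakeDataFrame, FakeJavaRDD, ScalaFunction1, FakeClassTag and JavaFunction are illustrative names, not Spark classes), so it runs without Spark. It assumes only what the error message and the fix above show: the DataFrame-level map wants two arguments, and the RDD-level map wants one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapOverloadSketch {

    /** Stand-in for a Java-friendly single-method function type (hypothetical). */
    interface JavaFunction<T, R> {
        R call(T value);
    }

    /** Stand-in for the Scala function type named in the error (hypothetical). */
    interface ScalaFunction1<T, R> {
        R apply(T value);
    }

    /** Stand-in for the second parameter named in the error (hypothetical). */
    static final class FakeClassTag<R> { }

    /** Stand-in for a Java RDD wrapper: map takes exactly one function argument. */
    static final class FakeJavaRDD<T> {
        private final List<T> rows;

        FakeJavaRDD(List<T> rows) {
            this.rows = rows;
        }

        <R> FakeJavaRDD<R> map(JavaFunction<T, R> f) {
            List<R> out = new ArrayList<>();
            for (T row : rows) {
                out.add(f.call(row));
            }
            return new FakeJavaRDD<>(out);
        }

        List<T> collect() {
            return rows;
        }
    }

    /** Stand-in for a DataFrame whose own map needs two arguments. */
    static final class FakeDataFrame {
        private final List<String> rows;

        FakeDataFrame(List<String> rows) {
            this.rows = rows;
        }

        // Same shape as the method in the compiler error: (Function1, ClassTag).
        <R> List<R> map(ScalaFunction1<String, R> f, FakeClassTag<R> tag) {
            List<R> out = new ArrayList<>();
            for (String row : rows) {
                out.add(f.apply(row));
            }
            return out;
        }

        FakeJavaRDD<String> javaRDD() {
            return new FakeJavaRDD<>(rows);
        }
    }

    public static List<String> teenagerNames() {
        FakeDataFrame teenagers = new FakeDataFrame(Arrays.asList("Justin", "Alice"));

        // teenagers.map(new JavaFunction<String, String>() { ... }) would not compile:
        // wrong argument count and wrong function type, as in the question.
        return teenagers.javaRDD().map(new JavaFunction<String, String>() {
            @Override
            public String call(String name) {
                return "Name: " + name;
            }
        }).collect();
    }

    public static void main(String[] args) {
        System.out.println(teenagerNames());
    }
}
```

Going through javaRDD() first switches to the one-argument map, which is why the anonymous function from the question is accepted there.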


1 Comment

What happens if you transform it into an RDD? Is the transformation lazy? Is the memory moved into a new structure? Can the execution still be optimized?
2

There is no need to convert to an RDD, which delays the execution. It can be done as below:

public static void mapMethod() {
    // Read the data from a file on the classpath.
    Dataset<Row> df = sparkSession.read().json("file1.json");
    Encoder<String> encoder = Encoders.STRING();

    // Prior to Java 1.8
    List<String> rowsList = df.map(new MapFunction<Row, String>() {
        private static final long serialVersionUID = 1L;

        @Override
        public String call(Row row) throws Exception {
            return "string:>" + row.getString(0) + "<";
        }
    }, encoder).collectAsList();

    // From Java 1.8 onwards; the cast selects the MapFunction overload of map
    List<String> rowsList1 = df.map(
        (MapFunction<Row, String>) row -> "string >" + row.getString(0) + "<", encoder).collectAsList();

    System.out.println(">>> " + rowsList);
    System.out.println(">>> " + rowsList1);
}
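
A note on the lambda form: when a map method is overloaded for two different single-method function types, the Java compiler cannot choose between them for a bare lambda, and a cast to the intended function type resolves it. Some Spark builds reportedly expose Dataset.map this way, so the sketch below reproduces the situation in plain Java. FakeDataset, ScalaFn, JavaMapFn and FakeEncoder are hypothetical stand-ins, not Spark classes, and the example runs without Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LambdaOverloadSketch {

    /** Stand-in for a Scala-style function type (hypothetical). */
    interface ScalaFn<T, R> {
        R apply(T value);
    }

    /** Stand-in for a Java-style map function type (hypothetical). */
    interface JavaMapFn<T, R> {
        R call(T value);
    }

    /** Stand-in for an encoder argument (hypothetical, carries no behavior). */
    static final class FakeEncoder<R> { }

    /** Stand-in for a dataset with two map overloads of the same arity. */
    static final class FakeDataset<T> {
        private final List<T> rows;

        FakeDataset(List<T> rows) {
            this.rows = new ArrayList<>(rows);
        }

        <R> FakeDataset<R> map(ScalaFn<T, R> f, FakeEncoder<R> encoder) {
            List<R> out = new ArrayList<>();
            for (T row : rows) {
                out.add(f.apply(row));
            }
            return new FakeDataset<>(out);
        }

        <R> FakeDataset<R> map(JavaMapFn<T, R> f, FakeEncoder<R> encoder) {
            List<R> out = new ArrayList<>();
            for (T row : rows) {
                out.add(f.call(row));
            }
            return new FakeDataset<>(out);
        }

        List<T> collectAsList() {
            return rows;
        }
    }

    public static List<String> wrapAll() {
        FakeDataset<String> df = new FakeDataset<>(Arrays.asList("a", "b"));
        FakeEncoder<String> encoder = new FakeEncoder<>();

        // df.map(row -> "string >" + row + "<", encoder) does not compile here:
        // both overloads accept an implicitly typed lambda, so the call is ambiguous.
        // The cast names the intended function type and leaves one applicable overload.
        return df.map((JavaMapFn<String, String>) row -> "string >" + row + "<", encoder)
                 .collectAsList();
    }

    public static void main(String[] args) {
        System.out.println(wrapAll());
    }
}
```

The anonymous-class form does not need the cast, because the class already declares which function type it implements.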

1 Comment

Which Spark version are you using for this?
0

Do you have the correct dependency set in your pom.xml? Set this and try again:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>

2 Comments

I am using the dependencies below with Java 1.7:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>1.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>1.3.1</version>
    </dependency>

The documentation says you can run all the normal functions of JavaRDD against DataFrames, but that does not appear to be the case here. I was able to reproduce your problem. The map() method of the DataFrame class expects 2 arguments. Maybe explicitly convert the DataFrame to an RDD with teenagers.javaRDD() and then apply the map.
0

Try this:

// SQL can be run over RDDs that have been registered as tables.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");

List<String> teenagerNames = teenagers.toJavaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();

You have to convert your DataFrame to a JavaRDD.


0

Check that you are using the correct import for Row (import org.apache.spark.sql.Row) and remove any other imports related to Row. Otherwise your syntax is correct.


0

Please check your input file's data and your DataFrame SQL query. I faced the same thing, and when I looked back at the data, it did not match my query, so you are probably facing the same issue. Both toJavaRDD() and javaRDD() work.
