I ran a simple sample (Spark on Windows 7) and got an unexpected FileAlreadyExistsException. I cannot find the folder or file it refers to on my computer.

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/PluralsightData/ReadMeWordCountViaApp already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1191)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1168)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1168)

package main

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._

object WordCounter {
    def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("Word Counter")
        val sc = new SparkContext(conf)
        //val textFile = sc.textFile("file:///Spark/README.md")
        val textFile = sc.textFile("file:///README.md")
        // Split each line into words
        val tokenizedFileData = textFile.flatMap(line => line.split(" "))
        // Pair each word with an initial count of 1
        val countPrep = tokenizedFileData.map(word => (word, 1))
        // Sum the counts per word
        val counts = countPrep.reduceByKey((accumValue, newValue) => accumValue + newValue)
        // Sort by count, descending
        val sortedCounts = counts.sortBy(kvPair => kvPair._2, false)
        // Fails with FileAlreadyExistsException if this directory already exists
        sortedCounts.saveAsTextFile("file:///PluralsightData/ReadMeWordCountViaApp")
    }
}

The source for the sample is available at https://github.com/constructor-igor/TechSugar/tree/master/ScalaSamples/WordCounterSample

  • The message means exactly what it says: the output directory already exists, so saveAsTextFile will not write to it. Most big-data frameworks avoid any chance of overwriting existing data, so they refuse to write into an existing directory. Just pick another directory for your output. Commented Feb 6, 2017 at 13:50
  • How can I find the directory where saveAsTextFile stores the result, so I can open it? Commented Feb 6, 2017 at 16:13
  • What about using an absolute path such as "file:///C:/temp/WordCount"? Or see stackoverflow.com/questions/38669206/… for some possible glitches across Spark versions. Commented Feb 6, 2017 at 22:28
  • Yes, that solved my issue. Thank you. Commented Feb 7, 2017 at 9:51

1 Answer

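According to the comments:

  1. Spark refuses to overwrite existing data: saveAsTextFile fails with FileAlreadyExistsException when the output directory already exists.

  2. Using an absolute path for the output directory makes the result easy to find on the local disk: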

    sortedCounts.saveAsTextFile("file:///C:/temp/ReadMeWordCountViaApp")
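
If the job needs to be rerun against the same output path, one option is to delete the stale directory with Hadoop's FileSystem API before saving. The following is only a sketch, not part of the original sample: the names `OutputDirs` and `deleteIfExists` are hypothetical, and it assumes the Hadoop classes that ship with Spark are on the classpath. Note that it removes existing results without asking.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper (not part of Spark or of the original sample).
object OutputDirs {
  // Recursively deletes `target` if it exists.
  // Returns true only if something was actually deleted.
  def deleteIfExists(target: String, hadoopConf: Configuration): Boolean = {
    // Pick the file system from the URI scheme (file://, hdfs://, ...)
    val fs = FileSystem.get(new URI(target), hadoopConf)
    val path = new Path(target)
    fs.exists(path) && fs.delete(path, true)
  }
}

// Possible use inside WordCounter.main, just before saving:
//   val out = "file:///C:/temp/ReadMeWordCountViaApp"
//   OutputDirs.deleteIfExists(out, sc.hadoopConfiguration)
//   sortedCounts.saveAsTextFile(out)
```

Spark also has a `spark.hadoop.validateOutputSpecs` setting that turns the existence check off, but deleting the directory explicitly keeps the safety check in place for every other output path.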
