Can anyone please tell me what is wrong with my code? Below is my Spark code in Scala:
import java.text.SimpleDateFormat
import org.apache.spark.sql.SparkSession
import scala.xml.XML

object TopTenTags09 {
  def main(args:Array[String]){
    val format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS")
    val format2 = new SimpleDateFormat("yyyy-MM")
    val spark = SparkSession.builder().appName("Number of posts which are questions and contains specified words").master("local").getOrCreate()
    val data = spark.read.textFile("/home/harsh/Hunny/HadoopPractice/Spark/DF/StackOverFlow/Posts.xml").rdd
    val result = data.filter{line=>{line.trim().startsWith("<row")}}
      .filter{line=>{line.contains("PostTypeId=\"1\"")}}
      .map { line=>{
        val xml = XML.loadString(line)
        if(xml.attribute("Tags").mkString.toLowerCase().contains("hadoop") ||
            xml.attribute("Tags").mkString.toLowerCase().contains("spark")){
          (Integer.parseInt(xml.attribute("Score").toString()),Integer.parseInt(xml.attribute("Score").toString()))
        }
      }}/*.filter(line=>line._1>2)
      .sortByKey(false)*/
    result.foreach(println) //throwing error while printing
    spark.stop
  }
}
And below is the error I am getting while running it:
java.lang.NumberFormatException: For input string: "Some(12)"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
I am new to Spark and the error is driving me crazy because, although the error message mentions "Some", there is no "Some" in my code or in my data. Can anyone help me please? Sample data:
<row Id="5" PostTypeId="1" CreationDate="2014-05-13T23:58:30.457" Score="7" ViewCount="286" Body="<p>I've always been interested in machine learning, but I can't figure out one thing about starting out with a simple "Hello World" example - how can I avoid hard-coding behavior?</p>

<p>For example, if I wanted to "teach" a bot how to avoid randomly placed obstacles, I couldn't just use relative motion, because the obstacles move around, but I don't want to hard code, say, distance, because that ruins the whole point of machine learning.</p>

<p>Obviously, randomly generating code would be impractical, so how could I do this?</p>
" OwnerUserId="5" LastActivityDate="2014-05-14T00:36:31.077" Title="How can I do simple machine learning without hard-coding behavior?" Tags="<machine-learning>" AnswerCount="1" CommentCount="1" FavoriteCount="1" ClosedDate="2014-05-14T14:40:25.950" />
<row Id="7" PostTypeId="1" AcceptedAnswerId="10" CreationDate="2014-05-14T00:11:06.457" Score="2" ViewCount="266" Body="<p>As a researcher and instructor, I'm looking for open-source books (or similar materials) that provide a relatively thorough overview of data science from an applied perspective. To be clear, I'm especially interested in a thorough overview that provides material suitable for a college-level course, not particular pieces or papers.</p>
" OwnerUserId="36" LastEditorUserId="97" LastEditDate="2014-05-16T13:45:00.237" LastActivityDate="2014-05-16T13:45:00.237" Title="What open-source books (or other materials) provide a relatively thorough overview of data science?" Tags="<education><open-source>" AnswerCount="3" CommentCount="4" FavoriteCount="1" ClosedDate="2014-05-14T08:40:54.950" />
<row Id="9" PostTypeId="2" ParentId="5" CreationDate="2014-05-14T00:36:31.077" Score="4" Body="<p>Not sure if this fits the scope of this SE, but here's a stab at an answer anyway.</p>

<p>With all AI approaches you have to decide what it is you're modelling and what kind of uncertainty there is. Once you pick a framework that allows modelling of your situation, you then see which elements are "fixed" and which are flexible. For example, the model may allow you to define your own network structure (or even learn it) with certain constraints. You have to decide whether this flexibility is sufficient for your purposes. Then within a particular network structure, you can learn parameters given a specific training dataset.</p>

<p>You rarely hard-code behavior in AI/ML solutions. It's all about modelling the underlying situation and accommodating different situations by tweaking elements of the model.</p>

<p>In your example, perhaps you might have the robot learn how to detect obstacles (by analyzing elements in the environment), or you might have it keep track of where the obstacles were and which way they were moving.</p>
" OwnerUserId="51" LastActivityDate="2014-05-14T00:36:31.077" CommentCount="0" />
<row Id="10" PostTypeId="2" ParentId="7" CreationDate="2014-05-14T00:53:43.273" Score="9" Body="<p>One book that's freely available is "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman (published by Springer): <a href="http://statweb.stanford.edu/~tibs/ElemStatLearn/">see Tibshirani's website</a>.</p>

<p>Another fantastic source, although it isn't a book, is Andrew Ng's Machine Learning course on Coursera. This has a much more applied-focus than the above book, and Prof. Ng does a great job of explaining the thinking behind several different machine learning algorithms/situations.</p>
" OwnerUserId="22" LastActivityDate="2014-05-14T00:53:43.273" CommentCount="1" />
<row Id="14" PostTypeId="1" CreationDate="2014-05-14T01:25:59.677" Score="14" ViewCount="686" Body="<p>I am sure data science as will be discussed in this forum has several synonyms or at least related fields where large data is analyzed.</p>

<p>My particular question is in regards to Data Mining. I took a graduate class in Data Mining a few years back. What are the differences between Data Science and Data Mining and in particular what more would I need to look at to become proficient in Data Mining?</p>
" OwnerUserId="66" LastEditorUserId="322" LastEditDate="2014-06-17T16:17:20.473" LastActivityDate="2014-06-20T17:36:05.023" Title="Is Data Science the Same as Data Mining?" Tags="<data-mining><definitions>" AnswerCount="4" CommentCount="1" FavoriteCount="2" />
foreach is the first action... Usually you would have to look one or two lines above it. But I think it doesn't matter any more; in this case the error is probably in the Integer.parseInt line.
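To make the likely cause concrete: in scala.xml, xml.attribute("Score") returns an Option[Seq[Node]], so calling toString on it gives the text "Some(12)" rather than "12", and Integer.parseInt then fails on it. The following is a minimal sketch of one way around this, not necessarily the only fix. It assumes scala-xml is on the classpath (as in the question's code); ScoreParsing and parseScore are hypothetical names used only for illustration, and the sample row is a shortened, made-up line.

```scala
import scala.util.Try
import scala.xml.XML

object ScoreParsing {
  // Hypothetical helper: returns the Score attribute as an Int,
  // or None when the attribute is missing or not a number.
  def parseScore(line: String): Option[Int] = {
    val xml = XML.loadString(line)
    // (xml \ "@Score") selects the attribute as a NodeSeq; .text gives the raw
    // string value, or "" when the attribute is absent.
    val raw = (xml \ "@Score").text.trim
    Try(raw.toInt).toOption
  }

  def main(args: Array[String]): Unit = {
    // Shortened sample row, made up for this illustration.
    val line = """<row Id="5" PostTypeId="1" Score="12" />"""
    val xml = XML.loadString(line)

    // attribute(...) returns an Option, so toString includes the wrapper.
    println(xml.attribute("Score").toString) // prints Some(12)

    // Reading the text of the attribute gives the bare value.
    println((xml \ "@Score").text)           // prints 12

    println(parseScore(line))                // prints Some(12), holding an Int
  }
}
```

Inside the map in the question, the same idea would replace xml.attribute("Score").toString() with (xml \ "@Score").text. Returning an Option[Int] also gives rows without a usable score a defined result instead of an exception.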