I have "a.txt" which is in csv format and is separated by tabs:
16777216 16777471 -33.4940 143.2104
16777472 16778239 Fuzhou 26.0614 119.3061
Then I run:
sc.textFile("path/to/a.txt").map(line => line.split("\t")).toDF("startIP", "endIP", "City", "Longitude", "Latitude")
Then I got:
java.lang.IllegalArgumentException: requirement failed: The number of columns doesn't match.
Old column names (1): value
New column names (5): startIP, endIP, City, Longitude, Latitude
  at scala.Predef$.require(Predef.scala:224)
  at org.apache.spark.sql.Dataset.toDF(Dataset.scala:376)
  at org.apache.spark.sql.DatasetHolder.toDF(DatasetHolder.scala:40)
  ... 47 elided
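From the message it sounds like the Dataset has only one column, named value, before the rename. If it helps, I believe the schema at that point can be inspected like this (a sketch; the path is a placeholder):

sc.textFile("path/to/a.txt").map(line => line.split("\t")).toDF().printSchema()
// Based on the error message, I'd expect a single array column, something like:
// root
//  |-- value: array (nullable = true)
//  |    |-- element: string (containsNull = true)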
If I just run the split on its own (res is the RDD returned by sc.textFile above):
res.map(line => line.split("\t")).take(2)
I got:
rdd: Array[Array[String]] = Array(Array(16777216, 16777471, "", -33.4940, 143.2104), Array(16777472, 16778239, Fuzhou, 26.0614, 119.3061))
What is wrong here?
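For reference, here is the full sequence in spark-shell form (spark and sc are the shell defaults; the path is a placeholder):

import spark.implicits._  // already in scope in spark-shell; needed in a standalone app

val rdd = sc.textFile("path/to/a.txt").map(line => line.split("\t"))
// rdd is an RDD[Array[String]], i.e. one Array[String] per input line
rdd.toDF("startIP", "endIP", "City", "Longitude", "Latitude")  // throws the IllegalArgumentException above
rdd.take(2)  // returns the Array[Array[String]] shown above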