
I have a dataframe with two columns:

id (string), date (timestamp)

I would like to loop through the dataframe and add a new column containing a URL that includes the id. The algorithm should look something like this:

 add one new column with the following value:
 for each id
       "some url" + the value of the dataframe's id column

I tried to make this work in Scala, but I have problems getting the id at index a:

 val k = df2.count().asInstanceOf[Int]
 // for loop execution with a range
 for (a <- 1 to k) {
   // println("Value of a: " + a)
   val dfWithFileURL = dataframe.withColumn("fileUrl", "https://someURL/" + dataframe("id")[a])
 }

But this

dataframe("id")[a]

is not valid Scala. I have not found a solution yet, so any suggestions are welcome!

  • Do you even need a loop? df2.withColumn("fileUrl", "https://someURL/" + $"id") might work? Commented Feb 26, 2019 at 10:48
  • You want to add as many columns as there are rows? That won't scale at all, and if you don't need it to be scalable, there is no reason to use a Spark DataFrame... Commented Feb 26, 2019 at 10:55
  • The solution of @zacdav throws the following error: error: type mismatch; found: String, required: org.apache.spark.sql.Column. I'm working in a Databricks notebook; maybe it works differently there? I'll try to investigate this too... Commented Feb 26, 2019 at 16:30
  • By the way, @eliasah, no, I want to add only one column with a URL that has the id appended to the link; the id comes from the first column. Commented Feb 26, 2019 at 16:30
  • 1
    Then your pseudo-code is wrong because this is what it does. Commented Feb 26, 2019 at 16:35
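To expand on the type mismatch reported in the comments: "https://someURL/" + $"id" compiles to plain String concatenation (String.+ accepts anything and calls its toString), so withColumn receives a String where it needs a Column. A minimal sketch of the fix, assuming the df2 and id column from the question:

```scala
import org.apache.spark.sql.functions.{concat, lit}

// lit() turns the string literal into a Column, and concat() combines
// the two Column expressions, so withColumn gets a Column as required.
val dfWithFileURL = df2.withColumn("fileUrl", concat(lit("https://someURL/"), df2("id")))
```

This adds the single fileUrl column for every row at once; no loop over row indices is needed.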

2 Answers


You can simply use the withColumn function in Scala, something like this:

import org.apache.spark.sql.functions.{concat, lit}
import spark.implicits._  // assumes a SparkSession named spark is in scope

val df = Seq(
  ( 1, "1 Jan 2000" ),
  ( 2, "2 Feb 2014" ),
  ( 3, "3 Apr 2017" )
)
  .toDF("id", "date")

// Add the fileUrl column
val dfNew = df
  .withColumn("fileUrl", concat(lit("https://someURL/"), $"id"))

dfNew.show

My results:

+--+----------+-----------------+
|id|      date|          fileUrl|
+--+----------+-----------------+
| 1|1 Jan 2000|https://someURL/1|
| 2|2 Feb 2014|https://someURL/2|
| 3|3 Apr 2017|https://someURL/3|
+--+----------+-----------------+




Not sure if this is what you require, but you can use zipWithIndex for indexing.

data.show()

+---+------------------+
| Id|               Url|
+---+------------------+
|111|http://abc.go.org/|
|222|http://xyz.go.net/|
+---+------------------+

import org.apache.spark.sql._
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val df = sqlContext.createDataFrame(
  data.rdd.zipWithIndex
    .map { case (r, i) => Row.fromSeq(r.toSeq :+ s"${r.getString(1)}${i + 1}") },
  StructType(data.schema.fields :+ StructField("fileUrl", StringType, false))
)

Output:

df.show(false)

+---+------------------+-------------------+
|Id |Url               |fileUrl            |
+---+------------------+-------------------+
|111|http://abc.go.org/|http://abc.go.org/1|
|222|http://xyz.go.net/|http://xyz.go.net/2|
+---+------------------+-------------------+
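If you only need a unique per-row id rather than a strict consecutive 1..N index, Spark's built-in monotonically_increasing_id function avoids the RDD round-trip entirely. Note this is a different trade-off, not a drop-in replacement: its values are unique and increasing but may have gaps. A sketch, assuming the same data DataFrame:

```scala
import org.apache.spark.sql.functions.{concat, monotonically_increasing_id}

// Appends a per-row id to the Url column; ids are unique but may
// have gaps, unlike zipWithIndex's consecutive 0, 1, 2, ... sequence.
val withId = data.withColumn("fileUrl",
  concat(data("Url"), monotonically_increasing_id()))
```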

