13

I have a DataFrame with around 400 columns, and I want to drop 100 of them. I have created a Scala List of the 100 column names, and I want to iterate over it in a for loop, dropping one column in each iteration.

Below is the code.

final val dropList: List[String] = List("Col1", "Col2", ...., "Col100")

def drpColsfunc(inputDF: DataFrame): DataFrame = { 
    for (i <- 0 to dropList.length - 1) {
        val returnDF = inputDF.drop(dropList(i))
    }
    return returnDF
}

val test_df = drpColsfunc(input_dataframe) 

test_df.show(5)
9
  • I am getting a compile error: could not resolve "returnDF". Can anyone please help me fix this? Commented Sep 30, 2016 at 8:14
  • 2
    Please make the question self-contained. Why do you put part of your question in the comments? How do I ask a good question? Commented Sep 30, 2016 at 8:37
  • 2
    Please edit your question with the additional information you have added in the comments ! Commented Sep 30, 2016 at 8:38
  • @Martin and @Eliasah: I have made the changes in the question. Thanks. Commented Sep 30, 2016 at 13:10
  • The issue I am facing with the code above is a compile error: "could not resolve 'returnDF'". Can anyone please help me fix this? Commented Sep 30, 2016 at 14:25

5 Answers

32

If you want to do nothing more complex than drop several named columns, as opposed to selecting them by a particular condition, you can simply do the following:

df.drop("colA", "colB", "colC")

21

Answer:

import org.apache.spark.sql.Column

val colsToRemove = Seq("colA", "colB", "colC") // etc.

// Keep every column whose name is not in colsToRemove
val filteredDF = df.select(
  df.columns
    .filter(colName => !colsToRemove.contains(colName))
    .map(colName => new Column(colName)): _*)

3 Comments

Works fine. Could you please elaborate on the significance of "_*" here?
On : _* : if you know Python, it is similar to the unary unpacking operator * that you put in front of a list. The expression before : _* is a sequence of Column (more precisely an Array[Column]), but one form of select expects a varargs parameter of Column, i.e. a variable number of Column objects. See for example: alvinalexander.com/scala/…
df.drop(colsToRemove: _*) is a simpler, cleaner solution.
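
To illustrate the : _* ascription discussed in the comments above, here is a minimal plain-Scala sketch with no Spark dependency. The object VarargsDemo and the method dropNames are hypothetical stand-ins for a varargs API such as drop; they are not part of Spark.

```scala
object VarargsDemo {
  // Hypothetical varargs method: returns `all` without the given names.
  def dropNames(all: Seq[String], names: String*): Seq[String] =
    all.filterNot(n => names.contains(n))

  def main(args: Array[String]): Unit = {
    val all = Seq("colA", "colB", "colC", "colD")
    val colsToRemove = Seq("colA", "colC")

    // Passing the names one by one, as with df.drop("colA", "colC").
    println(dropNames(all, "colA", "colC"))

    // Expanding a Seq into the varargs parameter with `: _*`.
    println(dropNames(all, colsToRemove: _*))
  }
}
```

Both calls return the same sequence. Without : _*, the second call would not compile, because a Seq[String] is not itself a String argument.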
12

This should work fine:

// Given dropList: List[String] (the column names to drop)
// and df: DataFrame (the input DataFrame)
val test_df = df.drop(dropList: _*)


4

You can simply do:

def dropColumns(inputDF: DataFrame, dropList: List[String]): DataFrame = 
    dropList.foldLeft(inputDF)((df, col) => df.drop(col))

It returns the DataFrame without the columns passed in dropList.

As an example (of what's happening behind the scene), let me put it this way.

scala> val list = List(0, 1, 2, 3, 4, 5, 6, 7)
list: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7)

scala> val removeThese = List(0, 2, 3)
removeThese: List[Int] = List(0, 2, 3)

scala> removeThese.foldLeft(list)((l, r) => l.filterNot(_ == r))
res2: List[Int] = List(1, 4, 5, 6, 7)

The list returned at the end (in your case, the DataFrame) is the result of the last filtering step. After each step of the fold, the latest result is passed as the first argument to the next call of the function (_, _) => _.
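
For comparison with the for loop in the question, the same fold can be written as a loop over a mutable accumulator. The question's version does not compile because returnDF is declared inside the loop body and is no longer in scope at the return statement; it also calls drop on the original inputDF in every iteration, so no iteration builds on the previous one. The sketch below uses a plain List[String] of column names as a stand-in for a DataFrame (an assumption, so that it runs without Spark); FoldVersusLoop, dropOne, dropWithLoop and dropWithFold are hypothetical names.

```scala
object FoldVersusLoop {
  // Stand-in for DataFrame.drop(colName): removes one name from the list.
  def dropOne(columns: List[String], name: String): List[String] =
    columns.filterNot(_ == name)

  // Loop version: the accumulator is declared outside the loop,
  // so it is still in scope when the loop finishes.
  def dropWithLoop(columns: List[String], dropList: List[String]): List[String] = {
    var result = columns
    for (name <- dropList) {
      result = dropOne(result, name) // build on the previous result
    }
    result
  }

  // Fold version: the same accumulation without a mutable variable.
  def dropWithFold(columns: List[String], dropList: List[String]): List[String] =
    dropList.foldLeft(columns)((acc, name) => dropOne(acc, name))

  def main(args: Array[String]): Unit = {
    val columns = List("Col1", "Col2", "Col3", "Col4")
    val dropList = List("Col1", "Col3")
    println(dropWithLoop(columns, dropList))
    println(dropWithFold(columns, dropList))
  }
}
```

With a real DataFrame the same shape should apply, with inputDF in place of columns and df.drop(name) in place of dropOne.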


2

You can use the drop operation to drop multiple columns. If the names of the columns to drop are in a list, pass the list with : _* after the variable name, and drop removes every column named in it.

Scala:

import spark.implicits._ // needed for toDF; assumes a SparkSession named spark

val df = Seq(("One","Two","Three"),("One","Two","Three"),("One","Two","Three")).toDF("Name","Name1","Name2")
val columnstoDrop = List("Name","Name1")
val df1 = df.drop(columnstoDrop:_*)

Python: In Python you can use the * operator to do the same thing.

data = [("One", "Two","Three"), ("One", "Two","Three"), ("One", "Two","Three")]
columns = ["Name","Name1","Name2"]
df = spark.sparkContext.parallelize(data).toDF(columns)
columnstoDrop = ["Name","Name1"]
df1 = df.drop(*columnstoDrop)

df1 is now a DataFrame with only one column, Name2.

