
This is my DataFrame. I'm trying to drop the duplicate columns (columns that share the same name) using their index:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 2, 3, 4, 5)], ['c', 'b', 'a', 'a', 'b'])
df.show()

Output:

+---+---+---+---+---+
|  c|  b|  a|  a|  b|
+---+---+---+---+---+
|  1|  2|  3|  4|  5|
+---+---+---+---+---+

I got the index of each column:

col_dict = {x: col for x, col in enumerate(df.columns)}
col_dict

Output:

{0: 'c', 1: 'b', 2: 'a', 3: 'a', 4: 'b'}

Now I need to drop the columns whose names are duplicated, so that each name appears only once.

2 Answers


There is no method for dropping columns by index: DataFrame.drop accepts column names (or Column objects), not positions. One way to achieve this is to rename the duplicate columns and then drop them.

Here is an example you can adapt:

df_cols = df.columns
# get the index of every duplicate column except its last occurrence
# (works for names repeated any number of times, not just twice)
duplicate_col_index = [i for i, c in enumerate(df_cols) if c in df_cols[i + 1:]]

# rename by adding suffix '_duplicated'
for i in duplicate_col_index:
    df_cols[i] = df_cols[i] + '_duplicated'

# rename the columns in the DataFrame
df = df.toDF(*df_cols)

# remove flagged columns
cols_to_remove = [c for c in df_cols if '_duplicated' in c]
df.drop(*cols_to_remove).show()

+---+---+---+
|  c|  a|  b|
+---+---+---+
|  1|  4|  5|
+---+---+---+
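If you want to choose which copy of each name survives, or a name can repeat more than twice, the positions to drop can be worked out in plain Python before touching Spark. This is a minimal sketch; duplicate_indices and its keep parameter are hypothetical names for illustration, not part of PySpark:

```python
from typing import List


def duplicate_indices(columns: List[str], keep: str = "last") -> List[int]:
    """Return the positions of repeated column names, leaving one copy of each.

    keep="last" keeps the final occurrence of each name (the copies the
    example above ends up with); keep="first" keeps the earliest one.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # Walk forwards to keep first occurrences, backwards to keep last ones
    if keep == "first":
        order = range(len(columns))
    else:
        order = range(len(columns) - 1, -1, -1)
    seen = set()
    drop = []
    for i in order:
        if columns[i] in seen:
            drop.append(i)  # name already kept once, so this position goes
        else:
            seen.add(columns[i])
    return sorted(drop)


print(duplicate_indices(['c', 'b', 'a', 'a', 'b']))           # [1, 2]
print(duplicate_indices(['c', 'b', 'a', 'a', 'b'], "first"))  # [3, 4]
```

Passing df.columns in place of the literal list should give the positions to flag and drop with the rename-and-drop steps above.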

@blackbishop's answer is a good one, and I voted for it. However, there is one potential issue: the filter '_duplicated' in c matches any column whose name contains _duplicated, so a pre-existing unique column with a name like a_duplicated would be dropped as well. This is unlikely, but with large volumes of user-submitted data it is a real concern and should not be neglected.

df_cols = df.columns
# get the index of every duplicate column except its last occurrence
duplicate_col_index = [i for i, c in enumerate(df_cols) if c in df_cols[i + 1:]]

# rename by adding suffix '_duplicated'
for i in duplicate_col_index:
    df_cols[i] = df_cols[i] + '_duplicated'

# rename the columns in the DataFrame
df = df.toDF(*df_cols)

# remove only the columns renamed above, not every name containing '_duplicated'
cols_to_remove = [df_cols[i] for i in duplicate_col_index]  # <-- the change

df.drop(*cols_to_remove).show()
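An alternative that avoids the name-collision question entirely is to rename every column by position, select the positions to keep, and then restore the original names. Because every column gets a fresh positional name, no existing name can clash. The sketch below only plans the names in plain Python; positional_rename_plan and the _col0, _col1, ... naming are hypothetical choices for illustration:

```python
from typing import List, Tuple


def positional_rename_plan(
    columns: List[str], drop: List[int]
) -> Tuple[List[str], List[str], List[str]]:
    """Plan a drop-by-position using temporary names that are all unique.

    Returns (temp_names, keep_temp, final_names):
      temp_names  - one unique name per position, for df.toDF(*temp_names)
      keep_temp   - temp names of the positions to keep, in original order
      final_names - original names of the kept columns, for the final toDF
    """
    # Every column is renamed, so these only need to be unique among themselves
    temp_names = [f"_col{i}" for i in range(len(columns))]
    drop_set = set(drop)
    keep = [i for i in range(len(columns)) if i not in drop_set]
    keep_temp = [temp_names[i] for i in keep]
    final_names = [columns[i] for i in keep]
    return temp_names, keep_temp, final_names


temp, keep_temp, final = positional_rename_plan(['c', 'b', 'a', 'a', 'b'], [1, 2])
print(temp)       # ['_col0', '_col1', '_col2', '_col3', '_col4']
print(keep_temp)  # ['_col0', '_col3', '_col4']
print(final)      # ['c', 'a', 'b']
```

With the DataFrame from the question, applying the plan as df.toDF(*temp).select(*keep_temp).toDF(*final) should leave columns c, a, b holding 1, 4, 5, without relying on any suffix being absent from the original names.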
