
I have several columns with the same name in a df. I need to rename them, but the df.rename method renames them all the same way. How can I rename the blah(s) below to blah1, blah4, blah5?

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2*5).reshape(2,5))
df.columns = ['blah','blah2','blah3','blah','blah']
df

#     blah  blah2  blah3  blah  blah
# 0   0     1      2      3     4
# 1   5     6      7      8     9

Here is what happens when using the df.rename method:

df.rename(columns={'blah':'blah1'})

#     blah1  blah2  blah3  blah1  blah1
# 0   0      1      2      3      4
# 1   5      6      7      8      9



We can use the internal (undocumented) method:

In [38]: pd.io.parsers.base_parser.ParserBase({'names':df.columns, 'usecols':None})._maybe_dedup_names(df.columns)
Out[38]: ['blah', 'blah2', 'blah3', 'blah.1', 'blah.2']

This is the "magic" function:

    def _maybe_dedup_names(self, names: Sequence[Hashable]) -> Sequence[Hashable]:
        # see gh-7160 and gh-9424: this helps to provide
        # immediate alleviation of the duplicate names
        # issue and appears to be satisfactory to users,
        # but ultimately, not needing to butcher the names
        # would be nice!
        if self.mangle_dupe_cols:
            names = list(names)  # so we can index
            counts: DefaultDict[Hashable, int] = defaultdict(int)
            is_potential_mi = _is_potential_multi_index(names, self.index_col)

            for i, col in enumerate(names):
                cur_count = counts[col]

                while cur_count > 0:
                    counts[col] = cur_count + 1

                    if is_potential_mi:
                        # for mypy
                        assert isinstance(col, tuple)
                        col = col[:-1] + (f"{col[-1]}.{cur_count}",)
                    else:
                        col = f"{col}.{cur_count}"
                    cur_count = counts[col]

                names[i] = col
                counts[col] = cur_count + 1

        return names

Comments:

This is great; for others to use it, just do: df.columns = pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)
The comment by @miguelfg is a great solution if you are not reading from csv. No need to write any function on your own!
This also works fine with ParserBase({}) because the input to _maybe_dedup_names() is not taken from the class, but given directly as a function argument. You just need to make the constructor's validation checks happy and an empty dict does that. It's probably not any faster but it makes the code less confusing because it's clear where the data is being passed.
I'm using Pandas 1.5.0 and ParserBase({'usecols': None}) is required. The key, names, is not required.
In version 1.5.2, ParserBase needs to be imported from pandas.io.parsers.base_parser. (I don't know if it was moved to pandas.io.parsers.base_parser or is no longer exported from pandas.io.parsers.) {'usecols': None} still required.

I was looking to find a solution within Pandas more than a general Python solution.

columns.get_loc() returns a boolean mask when it finds duplicates, with True values pointing to the locations where duplicates occur. I then use the mask to assign new values into those locations. In my case I know ahead of time how many dups I am going to get and what I am going to assign to them, but df.columns.get_duplicates() returns a list of all dups, which you can use in conjunction with get_loc() if you need a more generic dup-weeding action.

Updated as of Sept 2020

cols = pd.Series(df.columns)
for dup in df.columns[df.columns.duplicated(keep=False)]: 
    cols[df.columns.get_loc(dup)] = [
        f'{dup}.{d_idx}'
        if d_idx != 0
        else dup 
        for d_idx in range(df.columns.get_loc(dup).sum())]
df.columns = cols
   blah  blah2  blah3  blah.1  blah.2
0     0      1      2       3       4
1     5      6      7       8       9

Newer, better method (updated 3 Dec 2019)

The code in SatishSK's answer using cols == dup is better than above code. It produces the same output.

Comments:

Why would I get AttributeError: 'slice' object has no attribute 'sum'? hmmm
Not useful for me; all duplicate columns get renamed at the same time.
It wasn't working for me because the first two occurrences of a duplicated name were being renamed the same, either as the original, or with .1, or with .2, depending on the different trials I did with str(d_idx). So I think it is a problem with a pointer. @MaxU's solution worked for me in one line.
Out of curiosity, wondered why we need the .index.values.tolist() vs. just cols[cols == dup]?
For the curious, answering my own question: cols[cols == dup] isn't mutable, so you can't assign new values to it.

You could use this:

def df_column_uniquify(df):
    df_columns = df.columns
    new_columns = []
    for item in df_columns:
        counter = 0
        newitem = item
        while newitem in new_columns:
            counter += 1
            newitem = "{}_{}".format(item, counter)
        new_columns.append(newitem)
    df.columns = new_columns
    return df

Then

import numpy as np
import pandas as pd

df=pd.DataFrame(np.arange(2*5).reshape(2,5))
df.columns=['blah','blah2','blah3','blah','blah']

so that df:

   blah  blah2  blah3   blah   blah
0     0      1      2      3      4
1     5      6      7      8      9

then

df = df_column_uniquify(df)

so that df:

   blah  blah2  blah3  blah_1  blah_2
0     0      1      2       3       4
1     5      6      7       8       9



I just wrote this code; it uses a list comprehension to update all duplicated names.

df.columns = [x[1] if x[1] not in df.columns[:x[0]] else f"{x[1]}_{list(df.columns[:x[0]]).count(x[1])}" for x in enumerate(df.columns)]
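Applied to the example frame from the question, an unpacked version of the same comprehension (equivalent logic, just with named loop variables) gives:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2 * 5).reshape(2, 5))
df.columns = ['blah', 'blah2', 'blah3', 'blah', 'blah']

# Keep a name if it has not appeared at an earlier position;
# otherwise suffix it with the count of earlier occurrences.
df.columns = [
    name if name not in df.columns[:i]
    else f"{name}_{list(df.columns[:i]).count(name)}"
    for i, name in enumerate(df.columns)
]
print(list(df.columns))  # ['blah', 'blah2', 'blah3', 'blah_1', 'blah_2']
```

The whole right-hand side is evaluated before the assignment, so df.columns inside the comprehension still refers to the original labels.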



In Pandas v2.1 you can use the pd.io.common.dedup_names function, like:

In [137]: pd.io.common.dedup_names(df.columns, is_potential_multiindex=False)
Out[137]: ['blah', 'blah2', 'blah3', 'blah.1', 'blah.2']

The earlier method _maybe_dedup_names has been removed, so it no longer works. For reference: pd.io.parsers.base_parser.ParserBase({'names':df.columns, 'usecols':None})._maybe_dedup_names(df.columns)
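To rename in place, the result (a plain list) can be assigned straight back; a minimal sketch, assuming pandas >= 2.1:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2 * 5).reshape(2, 5))
df.columns = ['blah', 'blah2', 'blah3', 'blah', 'blah']

# dedup_names returns a plain list of deduplicated labels.
df.columns = pd.io.common.dedup_names(df.columns, is_potential_multiindex=False)
print(list(df.columns))  # ['blah', 'blah2', 'blah3', 'blah.1', 'blah.2']
```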



You could assign directly to the columns:

In [12]:

df.columns = ['blah','blah2','blah3','blah4','blah5']
df
Out[12]:
   blah  blah2  blah3  blah4  blah5
0     0      1      2      3      4
1     5      6      7      8      9

[2 rows x 5 columns]

If you want to dynamically rename just the duplicate columns, then you could do something like the following (code taken from answer 2: Index of duplicates items in a python list):

In [25]:

import collections
dups = collections.defaultdict(list)
dup_indices = []
col_list = list(df.columns)
for i, e in enumerate(col_list):
    dups[e].append(i)
for k, v in sorted(dups.items()):
    if len(v) >= 2:
        dup_indices.extend(v)  # collect every group of duplicates, not just the last

for i in dup_indices:
    col_list[i] = col_list[i] + ' ' + str(i)
col_list
Out[25]:
['blah 0', 'blah2', 'blah3', 'blah 3', 'blah 4']

You could then use this to assign back, you could also have a function to generate a unique name that is not present in the columns prior to renaming.
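The closing suggestion, a helper that generates a name not already present in the columns, might be sketched like this (the name unique_name is made up for illustration):

```python
def unique_name(base, existing):
    """Return base, or base with the smallest numeric suffix
    that does not collide with any name in existing."""
    if base not in existing:
        return base
    i = 1
    while f"{base}_{i}" in existing:
        i += 1
    return f"{base}_{i}"

# Build a new column list, deduplicating as we go.
new_cols = []
for name in ['blah', 'blah2', 'blah3', 'blah', 'blah']:
    new_cols.append(unique_name(name, new_cols))
print(new_cols)  # ['blah', 'blah2', 'blah3', 'blah_1', 'blah_2']
```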

Comments:

Or something like df.columns = ['blah{}'.format(i) for i in range(1,len(df.columns)+1)], or "blah" + pd.Series(range(1,6)).astype(str), etc.
@DSM yes that would work, I was assuming that the OP example was not a real example
direct assignment of column names doesn't work for me - I don't really want to know where these duplicate columns are located in relation to other columns. I just truly need to rename them. To make my example clearer, say I read in 3 columns with the name 'price' and I know that the first price is open price, second is close price and the third is end of day price so i need to rename them along these lines. There might be a ton of other columns in there and I don't want to know what they are and where all these columns are sitting in relation to each other
In that case I would just get the columns as a list, find the duplicates, rename those and assign back; you can convert the columns to a list with list(df.columns). That shouldn't be too difficult.
Here is a related post: stackoverflow.com/questions/5419204/… so you could use that to get the indices where there are duplicates and then just enumerate a new suffix or whatever, modify the list and assign back

Thank you @Lamakaha for the solution. Your idea gave me a chance to modify it and make it work in all cases.

I am using Python 3.7.3.

I tried your piece of code on my data set, which had only one duplicated column, i.e. two columns with the same name. Unfortunately, the column names remained as-is without being renamed. On top of that I got a warning that "get_duplicates() is deprecated and will be removed in a future version". I used duplicated() coupled with unique() in place of get_duplicates(), which did not yield the expected result.

I have modified your piece of code a little bit, and it now works for my data set as well as in other general cases.

Here are the code runs, with and without the modification, on the example data set from the question, along with results:


df=pd.DataFrame(np.arange(2*5).reshape(2,5))

df.columns=['blah','blah2','blah3','blah','blah']
df

cols=pd.Series(df.columns)

for dup in df.columns.get_duplicates(): 
    cols[df.columns.get_loc(dup)]=[dup+'.'+str(d_idx) if d_idx!=0 else dup for d_idx in range(df.columns.get_loc(dup).sum())]
df.columns=cols

df

f:\Anaconda3\lib\site-packages\ipykernel_launcher.py:2: FutureWarning: 'get_duplicates' is deprecated and will be removed in a future release. You can use idx[idx.duplicated()].unique() instead

Output:

   blah  blah2  blah3  blah  blah.1
0     0      1      2     3       4
1     5      6      7     8       9

Two of the three "blah"(s) are not renamed properly.


Modified code

df=pd.DataFrame(np.arange(2*5).reshape(2,5))
df.columns=['blah','blah2','blah3','blah','blah']
df

cols=pd.Series(df.columns)

for dup in cols[cols.duplicated()].unique(): 
    cols[cols[cols == dup].index.values.tolist()] = [dup + '.' + str(i) if i != 0 else dup for i in range(sum(cols == dup))]
df.columns=cols

df

Output:

   blah  blah2  blah3  blah.1  blah.2
0     0      1      2       3       4
1     5      6      7       8       9

Here is a run of modified code on some another example:

cols = pd.Series(['X', 'Y', 'Z', 'A', 'B', 'C', 'A', 'A', 'L', 'M', 'A', 'Y', 'M'])

for dup in cols[cols.duplicated()].unique():
    cols[cols[cols == dup].index.values.tolist()] = [dup + '_' + str(i) if i != 0 else dup for i in range(sum(cols == dup))]

cols

Output:
0       X
1       Y
2       Z
3       A
4       B
5       C
6     A_1
7     A_2
8       L
9       M
10    A_3
11    Y_1
12    M_1
dtype: object

Hope this helps anybody seeking an answer to the aforementioned question.



duplicated_idx = dataset.columns.duplicated()

duplicated = dataset.columns[duplicated_idx].unique()



rename_cols = []

i = 1
for col in dataset.columns:
    if col in duplicated:
        rename_cols.extend([col + '_' + str(i)])
    else:
        rename_cols.extend([col])

dataset.columns = rename_cols

Comments:

While this code may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion
Please add i+=1 after the if/else in the for loop
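For reference, a sketch of the loop with the commenter's i += 1 fix applied; the suffix then becomes the 1-based column position, which happens to match the blah1/blah4/blah5 numbering the question asked for:

```python
import numpy as np
import pandas as pd

dataset = pd.DataFrame(np.arange(2 * 5).reshape(2, 5))
dataset.columns = ['blah', 'blah2', 'blah3', 'blah', 'blah']

# Labels that occur more than once.
duplicated = dataset.columns[dataset.columns.duplicated()].unique()

rename_cols = []
i = 1
for col in dataset.columns:
    if col in duplicated:
        rename_cols.append(col + '_' + str(i))
    else:
        rename_cols.append(col)
    i += 1  # the fix suggested in the comment above

dataset.columns = rename_cols
print(list(dataset.columns))  # ['blah_1', 'blah2', 'blah3', 'blah_4', 'blah_5']
```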

Since the accepted answer (by Lamakaha) is not working for recent versions of pandas, and because the other suggestions looked a bit clumsy, I worked out my own solution:

def dedupIndex(idx, fmt=None, ignoreFirst=True):
    # fmt:          A string format that receives two arguments: 
    #               name and a counter. By default: fmt='%s.%03d'
    # ignoreFirst:  Disable/enable postfixing of first element.
    idx = pd.Series(idx)
    duplicates = idx[idx.duplicated()].unique()
    fmt = '%s.%03d' if fmt is None else fmt
    for name in duplicates:
        dups = idx==name
        ret = [ fmt%(name,i) if (i!=0 or not ignoreFirst) else name
                      for i in range(dups.sum()) ]
        idx.loc[dups] = ret
    return pd.Index(idx)

Use the function as follows:

df.columns = dedupIndex(df.columns)
# Result: ['blah', 'blah2', 'blah3', 'blah.001', 'blah.002']
df.columns = dedupIndex(df.columns, fmt='%s #%d', ignoreFirst=False)
# Result: ['blah #0', 'blah2', 'blah3', 'blah #1', 'blah #2']



Here's a solution that also works for multi-indexes

# Take a df and rename duplicate columns by appending number suffixes
def rename_duplicates(df):
    import copy
    new_columns = df.columns.values
    suffix = {key: 2 for key in set(new_columns)}
    dup = pd.Series(new_columns).duplicated()

    if type(df.columns) == pd.core.indexes.multi.MultiIndex:
        # Need to be mutable, make it list instead of tuples
        for i in range(len(new_columns)):
            new_columns[i] = list(new_columns[i])
        for ix, item in enumerate(new_columns):
            item_orig = copy.copy(item)
            if dup[ix]:
                for level in range(len(new_columns[ix])):
                    new_columns[ix][level] = new_columns[ix][level] + f"_{suffix[tuple(item_orig)]}"
                suffix[tuple(item_orig)] += 1

        for i in range(len(new_columns)):
            new_columns[i] = tuple(new_columns[i])

        df.columns = pd.MultiIndex.from_tuples(new_columns)
    # Not a MultiIndex
    else:
        for ix, item in enumerate(new_columns):
            if dup[ix]:
                new_columns[ix] = item + f"_{suffix[item]}"
                suffix[item] += 1
        df.columns = new_columns
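A condensed, self-contained sketch of the MultiIndex branch (the same suffixing idea, rebuilt here so it runs on its own):

```python
import pandas as pd

# A frame whose MultiIndex columns contain a duplicated ('a', 'x') pair.
df = pd.DataFrame(
    [[1, 2, 3]],
    columns=pd.MultiIndex.from_tuples([('a', 'x'), ('a', 'x'), ('b', 'y')]),
)

dup = df.columns.duplicated()          # marks second and later occurrences
suffix = {key: 2 for key in set(df.columns)}
new_cols = []
for ix, item in enumerate(df.columns):
    if dup[ix]:
        # Suffix every level of the duplicated tuple, as the function above does.
        new_cols.append(tuple(f"{level}_{suffix[item]}" for level in item))
        suffix[item] += 1
    else:
        new_cols.append(item)
df.columns = pd.MultiIndex.from_tuples(new_cols)
print(df.columns.tolist())  # [('a', 'x'), ('a_2', 'x_2'), ('b', 'y')]
```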



Created a function with some tests, so it should be drop-in ready; this is a little different from Lamakaha's excellent solution since it renames the first appearance of a duplicate column as well:

from collections import defaultdict
from typing import Dict, List, Set

import pandas as pd

def rename_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename column headers to ensure no header names are duplicated.

    Args:
        df (pd.DataFrame): A dataframe with a single index of columns

    Returns:
        pd.DataFrame: The dataframe with headers renamed; inplace
    """
    if not df.columns.has_duplicates:
        return df
    duplicates: Set[str] = set(df.columns[df.columns.duplicated()].tolist())
    indexes: Dict[str, int] = defaultdict(lambda: 0)
    new_cols: List[str] = []
    for col in df.columns:
        if col in duplicates:
            indexes[col] += 1
            new_cols.append(f"{col}.{indexes[col]}")
        else:
            new_cols.append(col)
    df.columns = new_cols
    return df

def test_rename_duplicate_columns():
    df = pd.DataFrame(data=[[1, 2]], columns=["a", "b"])
    assert rename_duplicate_columns(df).columns.tolist() == ["a", "b"]

    df = pd.DataFrame(data=[[1, 2]], columns=["a", "a"])
    assert rename_duplicate_columns(df).columns.tolist() == ["a.1", "a.2"]

    df = pd.DataFrame(data=[[1, 2, 3]], columns=["a", "b", "a"])
    assert rename_duplicate_columns(df).columns.tolist() == ["a.1", "b", "a.2"]



We can just assign each column a different name.

Suppose the duplicated column names are ['a', 'b', 'c', 'd', 'd', 'c'].

Then just create a list of the names you want to assign:

cols = ['a', 'b', 'c', 'd', 'd1', 'c1']
df.columns = cols

This works for me.



This is my solution:

cols = []  # tracks the names we have already seen
new_cols = []

for col in df.columns:
    cols.append(col)
    count = cols.count(col)
    
    if count > 1:
        new_cols.append(f'{col}_{count}')
    else:
        new_cols.append(col)

df.columns = new_cols 

Comment:

You should add some explanation as to why/how this solution is different from (and/or better than) any of the numerous other answers already posted.

Here's an elegant solution:

Isolate a dataframe with only the repeated columns (df['blah'] returns a DataFrame rather than a Series when more than one column has that name):

df1 = df['blah']

For each "blah" column, give it a unique number

df1.columns = ['blah_' + str(x) for x in range(len(df1.columns))]

Isolate a dataframe with all but the repeated columns:

df2 = df[[x for x in df.columns if x != 'blah']]

Merge back together on indices:

df3 = pd.merge(df1, df2, left_index=True, right_index=True)

Et voila:

   blah_0  blah_1  blah_2  blah2  blah3
0       0       3       4      1      2
1       5       8       9      6      7
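Note that the merge moves the blah_* columns to the front. If the original left-to-right order matters, it can be restored with a small bookkeeping loop; a sketch building on the steps above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2 * 5).reshape(2, 5))
df.columns = ['blah', 'blah2', 'blah3', 'blah', 'blah']

# Steps from the answer above.
df1 = df['blah']
df1.columns = ['blah_' + str(x) for x in range(len(df1.columns))]
df2 = df[[x for x in df.columns if x != 'blah']]
df3 = pd.merge(df1, df2, left_index=True, right_index=True)

# Restore the original left-to-right order: walk the original names
# and substitute the renamed duplicates in sequence.
order, seen = [], 0
for name in df.columns:
    if name == 'blah':
        order.append(f'blah_{seen}')
        seen += 1
    else:
        order.append(name)
df3 = df3[order]
print(list(df3.columns))  # ['blah_0', 'blah2', 'blah3', 'blah_1', 'blah_2']
```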

