Rename Columns in Python using Regular Expressions

Question

I have a data set with columns for the number of units sold in a given month. The problem is that the monthly units columns are named in MM/yyyy format, so each record has 12 columns of units information.

So for instance, my data looks like:

ProductID  |  CustomerID  |  04/2018  |  03/2018  |  02/2018  |  FileDate  |
a1032      |  c1576       |     36    |     12    |     19    | 04/20/2018 |

What causes this to be problematic is that a new file comes in every month, with the same file name, but different column headers for the units information based on the last 12 months.

What I would like to do is rename the monthly units columns to Month1, Month2, Month3, ... based on a simple regex such as ([0-9]*)/([0-9]*), which would result in the output:

ProductID  |  CustomerID  |   Month1  |   Month2  |   Month3  |  FileDate  |
a1032      |  c1576       |     36    |     12    |     19    | 04/20/2018 |

I know that this should be possible using Python, but as I have never used Python before (I am an old .Net developer) I honestly have no idea how to achieve this.

I have done a bit of research on renaming columns in Python, but none of the examples I found used pattern matching to rename a column, e.g.:

 df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})

UPDATE: The data in my example is only a subset of the columns. In total, my data set has 120 columns, only 12 of which need to be renamed, which is why I thought regex might be the simplest way to go.

Tyler K · Accepted Answer · 2018-04-20 19:17:11Z

import re

# regex pattern
pattern = re.compile("([0-9]*)/([0-9]*)")

# get headers as list
headers = list(df)

# apply regex
months = 1
for index, header in enumerate(headers):
    if pattern.match(header):
        headers[index] = 'Month{}'.format(months)
        months += 1

# set new list as column headers
df.columns = headers
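
For reference, here is a minimal self-contained sketch of this loop applied to the sample row from the question. It assumes pandas is installed. The helper name rename_month_columns, the prefix parameter, and the stricter \d{2}/\d{4} pattern with fullmatch are illustrative choices, not part of the original answer.

```python
import re

import pandas as pd

# Assumption: month headers are exactly MM/yyyy, so a stricter pattern is used here.
MONTH_HEADER = re.compile(r"\d{2}/\d{4}")


def rename_month_columns(df, prefix="Month"):
    """Return a copy of df with MM/yyyy headers renamed in column order."""
    new_headers = []
    counter = 1
    for header in df.columns:
        # fullmatch requires the whole header to look like MM/yyyy
        if MONTH_HEADER.fullmatch(str(header)):
            new_headers.append(f"{prefix}{counter}")
            counter += 1
        else:
            new_headers.append(header)
    renamed = df.copy()
    renamed.columns = new_headers
    return renamed


sample = pd.DataFrame(
    [["a1032", "c1576", 36, 12, 19, "04/20/2018"]],
    columns=["ProductID", "CustomerID", "04/2018", "03/2018", "02/2018", "FileDate"],
)
result = rename_month_columns(sample)
print(list(result.columns))
# ['ProductID', 'CustomerID', 'Month1', 'Month2', 'Month3', 'FileDate']
```

Because the numbering follows column order, Month1 is whichever month column appears first in the file.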


Acccumulation · Accepted Answer · 2018-04-20 19:12:06Z

If you have a fixed set of names that you want to convert to, then rather than using rename, it might be easier to just assign a new list to the df.columns attribute

df.columns = ['ProductID','CustomerID']+['Month{}'.format(i) for i in range(1, 13)]+['FileDate']

If you want to use rename and can write a function find_new_name that does the conversion for a single name, you can rename every name in a list old_names with

df = df.rename(columns={old_name: find_new_name(old_name) for old_name in old_names})

Or if you have a function that takes a new name and figures out what old name corresponds to it, then it would be

df = df.rename(columns={find_old_name(new_name): new_name for new_name in new_names})

You can also do

for new_name in new_names:
    old_name = find_old_name(new_name)
    df[new_name] = df[old_name]

This will copy the data into new columns with the new names rather than renaming, so you can then subset to just the columns you want.
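
Returning to the rename approach, the old-to-new mapping can also be built from the header list itself. The sketch below works on a plain list of header names so it runs without a DataFrame. The name build_rename_map, the prefix parameter, and the MM/yyyy pattern are assumptions made for illustration.

```python
import re

# Assumption: month headers are exactly MM/yyyy
MONTH_HEADER = re.compile(r"\d{2}/\d{4}")


def build_rename_map(old_names, prefix="Month"):
    """Map each MM/yyyy header to prefix1, prefix2, ... in order of appearance."""
    month_names = [name for name in old_names if MONTH_HEADER.fullmatch(name)]
    return {name: f"{prefix}{i}" for i, name in enumerate(month_names, start=1)}


headers = ["ProductID", "CustomerID", "04/2018", "03/2018", "02/2018", "FileDate"]
rename_map = build_rename_map(headers)
print(rename_map)
# {'04/2018': 'Month1', '03/2018': 'Month2', '02/2018': 'Month3'}
```

The resulting dict can be passed to pandas as df = df.rename(columns=rename_map). Headers that do not match are absent from the dict, so rename leaves them unchanged.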


1 Comment

Jeff Beese Over a year ago

Thanks for the answer - one of the reasons that I was considering Regex is because my full data set contains 120 columns. I realized I did not include this information in my question and have updated it accordingly

ujhuyz0110 · Accepted Answer · 2018-04-20 19:32:31Z

Since rename can take a function as a mapper, we can define a custom function that returns a column name in the new format if the old name matches the regex and otherwise returns the old name unchanged. For example,

import re


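def mapper(old_name):
    # Use + rather than * so a header with no digits before the slash cannot reach int('')
    match = re.match(r'([0-9]+)/([0-9]+)', old_name)
    if match:
        # Named after the calendar month: '04/2018' becomes 'Month4'
        return 'Month{}'.format(int(match.group(1)))
    return old_name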

df = df.rename(columns=mapper)
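
Note that this mapper names each column after its calendar month, whereas the question asks for Month1, Month2, ... counted from the most recent month. A hedged sketch of a mapper factory that numbers by recency instead is shown below. The name make_recency_mapper is hypothetical, and it assumes every header is a string and every month header is a valid MM/yyyy date.

```python
import re
from datetime import datetime

# Assumption: month headers are exactly MM/yyyy
MONTH_HEADER = re.compile(r"\d{2}/\d{4}")


def make_recency_mapper(column_names, prefix="Month"):
    """Build a mapper that renames MM/yyyy headers so the newest month is prefix1."""
    month_headers = [name for name in column_names if MONTH_HEADER.fullmatch(name)]
    # Parse each header as a date and sort newest first
    newest_first = sorted(
        month_headers,
        key=lambda name: datetime.strptime(name, "%m/%Y"),
        reverse=True,
    )
    ranks = {name: f"{prefix}{i}" for i, name in enumerate(newest_first, start=1)}
    # Headers that are not months fall through unchanged
    return lambda old_name: ranks.get(old_name, old_name)


headers = ["ProductID", "02/2018", "04/2018", "03/2018", "FileDate"]
recency_mapper = make_recency_mapper(headers)
print([recency_mapper(h) for h in headers])
# ['ProductID', 'Month3', 'Month1', 'Month2', 'FileDate']
```

With a DataFrame this would be used as df = df.rename(columns=make_recency_mapper(df.columns)), which keeps the numbering stable even if the month columns arrive in a different order.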
