2

Trying to rename pandas dataframe column using regex, I know how to do it rename the list as per below but could not get success results with df.rename.

Input:

df.columns.values = ['Time', '101 <RoomTemperature> (C)', '102 <ChemberTemperature> (C)', '103 <U1100> (C)', '103 <U1200 (C)', '103 U1500> (C)']

Trials of the Renaming dataframe column as per below code using regex but it does not work. I could not think of how to put multiple instruction together in df.rename method.

df.rename(columns={c: c.strip() for c in df.columns.values.tolist()
                                if "<" and ">" in c: 
                  re.search(r"(?<=<).*(?=>)",c).group(0)}, inplace=True)

I want it to follow regex and rename it to as per below:

df.columns.values = ["Time", "RoomTemperature", "ChemberTemperature", "U1100", "103 <U1200 (C)", "103 U1500> (C)"]

3 Answers 3

3

You could extract the functionality into a function and do the following:

import re
import pandas as pd

# sample data 
df = pd.DataFrame(
    columns=['Time', '101 <RoomTemperature> (C)', '102 <ChemberTemperature> (C)', '103 <U1100> (C)', '103 <U1200 (C)',
             '103 U1500> (C)'])


# replacement function 
def repl(name):
    match = re.search(r"<(.*?)>", name)
    return match.group(1) if match else name


df.rename(columns={c: repl(c.strip()) for c in df.columns}, inplace=True)

print(df.columns)

Output

Index(['Time', 'RoomTemperature', 'ChemberTemperature', 'U1100',
       '103 <U1200 (C)', '103 U1500> (C)'],
      dtype='object')

That being said, you also need to fix your regular expression.

Sign up to request clarification or add additional context in comments.

Comments

2

You can use regular expressions to extract the matching group as per your requirements and then you can use DataFrame.rename to alter the column labels.

Try this:

import re

col_dict = {}
for col in df.columns:
    mobj = re.search(r"\<(.*?)\>", col)
    if mobj:
        col_dict[col] = mobj.group(1)

df.rename(columns=col_dict, inplace=True)

After renaming df.columns will be:

['Time', 'RoomTemperature', 'ChemberTemperature', 'U1100', '103 <U1200 (C)', '103 U1500> (C)']

4 Comments

You are applying rename for each column, this could be applied just once
@DaniMesejo I don't think thats going to make much difference.
you are using the rename method n times instead of just 1, obviously there is a performance issue here
Thanks, Shubham and Dani
1

Another solution, regex can have some unfriendly look about it, despite its power:

columns = ['Time', '101 <RoomTemperature> (C)', '102 <ChemberTemperature> (C)', '103 <U1100> (C)', '103 <U1200 (C)', '103 U1500> (C)']
df = pd.DataFrame([[1,2,3,4,5,6]],columns=columns)

   p = re.compile(r'((?<=<).*?(?=>))')

  #create a dict for the replacement

  replace_dict = {w:p.search(w).group() for w in df.columns if p.search(w)}

 #pass dictionary into rename method

 df.rename(columns=replace_dict)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.