1

How can I remove the substring "test_" from column headers? Imagine the DataFrame has many columns, so df.rename(columns={"test_Stock B":"Stock B"}) is not the solution I am looking for.


import pandas as pd

data = {'Stock A':[1, 1, 1, 1],
           'test_Stock B':[3, 3, 4, 4],
           'Stock C':[4, 4, 3, 2],
           'test_Stock D':[2, 2, 2, 3],
           }

df = pd.DataFrame(data)

# expected
data = {'Stock A':[1, 1, 1, 1],
           'Stock B':[3, 3, 4, 4],
           'Stock C':[4, 4, 3, 2],
           'Stock D':[2, 2, 2, 3],
           }

df_expected = pd.DataFrame(data)

I expect all column headers to be labeled "Stock x" instead of "test_Stock x". Thank you for the ideas!

5 Answers

3

You can redefine the columns via list comprehension with:

df.columns = [x.replace("test_","") for x in df]

This outputs:

   Stock A  Stock B  Stock C  Stock D
0        1        3        4        2
1        1        3        4        2
2        1        4        3        2
3        1        4        2        3
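
If you would rather not assign to df.columns in place, a minimal sketch of an alternative is to pass a callable to DataFrame.rename, which applies it to every column label and returns a new DataFrame. The variable name renamed is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Stock A": [1, 1, 1, 1],
    "test_Stock B": [3, 3, 4, 4],
    "Stock C": [4, 4, 3, 2],
    "test_Stock D": [2, 2, 2, 3],
})

# rename() accepts a function that is called once per column label.
# It returns a new DataFrame and leaves df itself unchanged.
renamed = df.rename(columns=lambda name: name.replace("test_", ""))

print(list(renamed.columns))
```

This scales to any number of columns, because no explicit old-to-new mapping has to be written out.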

1

You can clean your data before converting it to the dataframe using this code:

cleaned_data = {k.replace('test_', ''): v for k,v in data.items()}
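
As a rough end-to-end sketch of this approach, using the data dictionary from the question: the keys are cleaned first and the DataFrame is built afterwards. One caveat worth keeping in mind is that if two keys collapse to the same name after cleaning, the later one silently overwrites the earlier one in the dictionary.

```python
import pandas as pd

data = {
    "Stock A": [1, 1, 1, 1],
    "test_Stock B": [3, 3, 4, 4],
    "Stock C": [4, 4, 3, 2],
    "test_Stock D": [2, 2, 2, 3],
}

# Strip "test_" from every key before the DataFrame exists.
cleaned_data = {k.replace("test_", ""): v for k, v in data.items()}

df = pd.DataFrame(cleaned_data)
print(list(df.columns))
```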


0

If you need to extract the "Stock x" part of each header, use str.extract:

# if you need a single uppercase letter after "Stock" + whitespace
df.columns = df.columns.str.extract(r'(Stock\s+[A-Z])', expand=False)
# if you need anything after "Stock" + whitespace
#df.columns = df.columns.str.extract(r'(Stock\s+.*)', expand=False)
print (df)
   Stock A  Stock B  Stock C  Stock D
0        1        3        4        2
1        1        3        4        2
2        1        4        3        2
3        1        4        2        3
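
One thing to watch for with str.extract is that a header that does not match the pattern becomes a missing value. The sketch below illustrates this with a hypothetical "Volume" column that is not part of the question's data:

```python
import pandas as pd

# "Volume" is a made-up header that does not contain "Stock <letter>".
cols = pd.Index(["test_Stock B", "Volume"])

extracted = cols.str.extract(r"(Stock\s+[A-Z])", expand=False)

# The non-matching header is replaced by a missing value.
print(list(extracted))
```

So this approach is safest when every column is known to follow the "Stock x" naming scheme.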

Or, if you need to remove test_, use str.replace:

df.columns = df.columns.str.replace('test_', '')
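
Since the default of the regex parameter of str.replace changed from True to False in pandas 2.0, it can help to state the intent explicitly. A small sketch, assuming a plain literal match is what is wanted:

```python
import pandas as pd

df = pd.DataFrame({
    "Stock A": [1, 1, 1, 1],
    "test_Stock B": [3, 3, 4, 4],
})

# regex=False treats "test_" as a literal substring, which behaves the
# same way regardless of the pandas version's default.
df.columns = df.columns.str.replace("test_", "", regex=False)

print(list(df.columns))
```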


0
import pandas as pd

data = {'Stock A':[1, 1, 1, 1],
           'test_Stock B':[3, 3, 4, 4],
           'Stock C':[4, 4, 3, 2],
           'test_Stock D':[2, 2, 2, 3],
           }

df = pd.DataFrame(data)

df.columns = [x.replace('test_','') for x in df.columns]

Output:

print(df)
Out[9]: 
   Stock A  Stock B  Stock C  Stock D
0        1        3        4        2
1        1        3        4        2
2        1        4        3        2
3        1        4        2        3


0

You can use a regular expression (see the Python documentation) to replace or remove the prefix "test_". The column headers can be treated either as a Python list or as a pandas Index. Either way, you apply the substitution to each column header in turn.

Option A

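Pandas has a collection of string processing methods which you can access via the str attribute. df.columns is an Index, which exposes the same str accessor as a Series, so you can replace the desired pattern with:

# regex=True is required in pandas 2.0+, where str.replace matches literally by default
df.columns = df.columns.str.replace(r'^test_', '', regex=True)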

Option B

The re module can be used to replace the desired pattern by calling re.sub on each column header in a list comprehension.

import re
df.columns = [re.sub(r'^test_', '', col) for col in df.columns]
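
The ^ anchor matters when "test_" can also appear somewhere other than the start of a header. The following sketch compares the anchored regex with a plain str.replace on a few hypothetical header names that are not in the question's data:

```python
import re

# Hypothetical headers with "test_" in different positions.
columns = ["test_Stock B", "Stock test_C", "test_test_D"]

# Plain replace removes every occurrence, wherever it appears.
unanchored = [c.replace("test_", "") for c in columns]

# The anchored regex removes only one leading "test_".
anchored = [re.sub(r"^test_", "", c) for c in columns]

print(unanchored)  # ['Stock B', 'Stock C', 'D']
print(anchored)    # ['Stock B', 'Stock test_C', 'test_D']
```

For the data in the question both variants give the same result; the difference only shows up when the substring occurs mid-name or more than once.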
