Manipulate string values in pandas

Question

I have a pandas DataFrame in which one column holds values in several different formats:

Name	Values
First	5-9
Second	7
Third	-
Fourth	12-16

I need to go through the whole Values column. If a value is a range, like 5-9 in the first row or 12-16 in the fourth row, it should be replaced with the mean of the two numbers: 5-9 becomes 7 and 12-16 becomes 14. If the value is just -, as in the third row, it should be replaced with 0.

I have tried:

if df["Value"].str.len() > 1:
    df["Value"] = df["Value"].str.split('-')
    df["Value"] = (df["Value"][0] + df["Value"][1]) / 2
elif df["Value"].str.len() == 1:
    df["Value"] = df["Value"].str.replace('-', 0)

Expected output

Name	Values
First	7
Second	7
Third	0
Fourth	14

Shubham Sharma · Accepted Answer · 2022-07-10 15:57:50Z

3

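Split the column on '-' with expand=True, mask out the empty strings, cast the remaining values to float, and take the row-wise mean (axis=1). Rows that contain no numbers come out as NaN, which fillna(0) turns into 0: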

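# one column per piece; a bare '-' yields two empty strings
s = df['Values'].str.split('-', expand=True)
# empty strings become NaN, mean skips NaN, all-NaN rows are filled with 0
df['Values'] = s[s != ''].astype(float).mean(axis=1).fillna(0)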

     Name  Values
0   First     7.0
1  Second     7.0
2   Third     0.0
3  Fourth    14.0
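
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same split-and-average idea. The sample frame is rebuilt from the question. As a substitution that is not in the answer above, it uses pd.to_numeric with errors="coerce" in place of the s != '' mask; the names parts and nums are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Name": ["First", "Second", "Third", "Fourth"],
    "Values": ["5-9", "7", "-", "12-16"],
})

# Split on '-' into separate columns; a bare '-' gives two empty strings,
# and a single number leaves the second column missing
parts = df["Values"].str.split("-", expand=True)

# Assumption: coerce instead of masking, so '' and missing both become NaN
nums = parts.apply(pd.to_numeric, errors="coerce")

# Row-wise mean skips NaN; rows with no numbers at all are filled with 0
df["Values"] = nums.mean(axis=1).fillna(0)

print(df)
```

The result is a float column. Casting it with .astype(int) would match the expected output exactly, but it would also truncate a mean such as 7.5, so it is only safe when every mean is a whole number.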


Ynjxsjmh · Accepted Answer · 2022-07-10 16:09:02Z

1

You can use str.replace with a custom replacement function:

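# an empty group (as in a bare '-') counts as 0
mint = lambda s: int(s or 0)
# average the two captured groups and return the result as a string
repl = lambda m: str(sum(map(mint, map(m.group, [1,2])))/2)
# raw string so that \d is not treated as a string escape
df['Values'] = df['Values'].str.replace(r'(\d*)-(\d*)', repl, regex=True)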

print(df)

     Name Values
0   First    7.0
1  Second      7
2   Third    0.0
3  Fourth   14.0
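
As the output shows, the column still holds strings ('7.0' next to '7'). A minimal sketch of one way to finish the job, assuming a numeric column is what is wanted: the same regex replacement written as a named function, followed by pd.to_numeric. The function name mean_of_range and the final conversion are additions for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Name": ["First", "Second", "Third", "Fourth"],
    "Values": ["5-9", "7", "-", "12-16"],
})

def mean_of_range(match):
    """Return the mean of the two captured numbers as a string."""
    # An empty group (as in a bare '-') is treated as 0
    low = int(match.group(1) or 0)
    high = int(match.group(2) or 0)
    return str((low + high) / 2)

# Values without a '-' do not match the pattern and are left unchanged
replaced = df["Values"].str.replace(r"(\d*)-(\d*)", mean_of_range, regex=True)

# Assumption: convert the resulting strings to a numeric column
df["Values"] = pd.to_numeric(replaced)

print(df)
```

One difference from the split-based answer: a one-sided value such as '5-' comes out as 2.5 here, because the empty side counts as 0, whereas the split approach skips the missing side and gives 5.0.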
