modify strings in pandas dataframe

Question

I have the following dataframe called df

    country ticker   
01  ST      ENRO.ST
02  ST      ERICb.ST
03  ST      BTSb.ST
04  US      MSFT
05  HK      0070.HK
06  ST      SAABb.ST
07  ST      SaA.ST

I want to do the following:

If country == 'ST', take the string in the ticker column of that row.

Check whether it contains any lowercase characters.

If there is a lowercase character, add a hyphen before it and make the letter uppercase, like this:

    country ticker   
01  ST      ENRO.ST
02  ST      ERIC-B.ST
03  ST      BTS-B.ST
04  US      MSFT
05  HK      0070.HK
06  ST      SAAB-B.ST
07  ST      S-AA.ST

If it were just one string, I would do the following:

teststr = list("ERICb.ST")
mod = None
for i, v in enumerate(teststr):
    if v.islower():
        mod = i  # index of the (last) lowercase character

if mod is not None:
    teststr[mod] = teststr[mod].upper()
    teststr.insert(mod, '-')
teststr = ''.join(teststr)

But I don't know how to apply it to every row that meets the condition.

Is it possible that there are multiple lowercase letters which have to be replaced?
– Erfan, Commented May 5, 2020 at 23:30

Erfan · Accepted Answer · 2020-05-05 23:44:07Z

2

First we split each string on its lowercase letter; the capture group in the pattern keeps that letter in the result. Then we join the first two parts with - as the delimiter and uppercase them, and append the remaining parts unchanged. Finally we use Series.where to modify only the rows where country == 'ST':

s1 = df['ticker'].str.split('([a-z])')
s2 = s1.str[:2].str.join('-').str.upper() + s1.str[2:].str.join('')
df['ticker'] = s2.where(df['country'].eq('ST'), df['ticker'])

  country     ticker
0      ST    ENRO.ST
1      ST  ERIC-B.ST
2      ST   BTS-B.ST
3      US       MSFT
4      HK    0070.HK
5      ST  SAAB-B.ST
6      ST    S-AA.ST
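To make the split and join steps easier to follow, here is a minimal sketch that applies the same logic to single strings with the standard library. It assumes the regex split in pandas yields the same pieces as re.split for this pattern; hyphenate_first_lower is a hypothetical helper name, not part of the answer above.

```python
import re

def hyphenate_first_lower(ticker: str) -> str:
    """Hypothetical helper mirroring the split/join steps on one string."""
    # The capture group keeps the matched lowercase letter in the result,
    # e.g. 'ERICb.ST' -> ['ERIC', 'b', '.ST'].
    parts = re.split(r'([a-z])', ticker)
    # Join the text before the first lowercase letter and the letter itself
    # with a hyphen, then uppercase that piece.
    head = '-'.join(parts[:2]).upper()
    # Everything after the first lowercase letter is appended unchanged.
    tail = ''.join(parts[2:])
    return head + tail

for t in ["ENRO.ST", "ERICb.ST", "SaA.ST"]:
    print(t, re.split(r'([a-z])', t), hyphenate_first_lower(t))
```

With this approach only the first lowercase letter gets a hyphen and is uppercased; any later lowercase letters are left as they are.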

edited May 5, 2020 at 23:44

answered May 5, 2020 at 23:35

Erfan


1 Comment

anarchy Over a year ago

Hey, sorry, what about selecting only the rows with 'ST' in the country column? Rows where the country is not ST could also contain lowercase characters, and I don't want to touch those. I mentioned it in the first part of the question.

Andy L. · Accepted Answer · 2020-05-05 23:44:29Z

0

You can pass a replacement function to str.replace. Recent pandas versions require regex=True when the replacement is a callable:

repl = lambda x: '-' + x.group(0).upper()

df.loc[df.country.eq('ST'), 'ticker'] = (df.loc[df.country.eq('ST'), 'ticker']
                                           .str.replace('([a-z])', repl, regex=True))

Out[58]:
  country     ticker
1      ST    ENRO.ST
2      ST  ERIC-B.ST
3      ST   BTS-B.ST
4      US       MSFT
5      HK    0070.HK
6      ST  SAAB-B.ST
7      ST    S-AA.ST

Note: since you said there is only a single lowercase character in each string, I use the pattern [a-z].
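As a self-contained check of the masked replacement, the sketch below builds a small frame that includes a non-ST row containing a lowercase letter. The sample rows, in particular the made-up ticker 'GOOGl', are assumptions for illustration and not data from the question.

```python
import pandas as pd

# Small illustrative frame; 'GOOGl' is a made-up ticker used only to show
# that rows outside 'ST' are left alone.
df = pd.DataFrame({
    "country": ["ST", "ST", "US", "ST"],
    "ticker": ["ENRO.ST", "ERICb.ST", "GOOGl", "SaA.ST"],
})

# Boolean mask selecting only the rows to modify.
mask = df["country"].eq("ST")

# Replace each lowercase letter with '-' plus its uppercase form,
# but only in the masked rows.
df.loc[mask, "ticker"] = df.loc[mask, "ticker"].str.replace(
    r"([a-z])", lambda m: "-" + m.group(0).upper(), regex=True
)

print(df)
```

Because str.replace replaces every match by default, a string with several lowercase letters would get a hyphen before each of them.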

answered May 5, 2020 at 23:44

Andy L.

