Replace string in Pandas column by substring from list

Question

I have a DF:

DF
camp, value
asd_abcd_gr_yxz_aaaa, 5
efgh_kr_ijk, 10
hjssaasd_kr_adsad, 15
asdas_kr_asd, 2
asd_fr_asda_bb_bbbbbbb, 12
adklasdj_gr_asdsad, 3

The real DataFrame is much longer.

After matching each camp value against the elements of a list ['_gr_', '_kr_', '_fr_', ...], I want the values summed per matched element, so the result is:

DF
camp, value
gr, 8
kr, 27
fr, 12

Preferably as short as possible and without looping over the DataFrame. The real list is much longer than _gr_, _kr_, _fr_.

Thanks in advance!

Sorry, can you post raw input data, code to recreate your df and what the desired output df should look like? Are you saying you have a column with values like 'abcd_gr_yxz' and if it contains anything in gr then you want to replace with gr?
– EdChum, Commented Mar 23, 2016 at 14:19
So in this case would it be true that you want to always strip the leading and trailing words at the _ underscore characters?
– EdChum, Commented Mar 23, 2016 at 14:50
Sorry, my example wasn't general enough. There might exist several underscores :/
– ONils, Commented Mar 23, 2016 at 15:20
Your new example has the keys you want always being the next-to-last. Is that invariant, or are you just really bad at coming up with sufficiently general examples? ;-) You need to explain how the original camp column becomes the new camp column, whether by some rule or by comparison with an explicitly specified list.
– DSM, Commented Mar 23, 2016 at 15:24

jezrael · Accepted Answer · 2020-02-09 07:54:58Z

You can try str.contains with loc:

print(df)
                 camp  value
0         abcd_gr_yxz      5
1         efgh_kr_ijk     10
2   hjssaasd_kr_adsad     15
3        asdas_kr_asd      2
4         asd_fr_asda     12
5  adklasdj_gr_asdsad      3

ABR = ['_gr_', '_kr_', '_fr_']

for x in ABR:
    # regex=False treats each list element as a literal substring
    df.loc[df['camp'].str.contains(x, regex=False), 'camp'] = x
print(df)
   camp  value
0  _gr_      5
1  _kr_     10
2  _kr_     15
3  _kr_      2
4  _fr_     12
5  _gr_      3

print(df.groupby('camp')['value'].sum().reset_index())
   camp  value
0  _fr_     12
1  _gr_      8
2  _kr_     27

Or str.extract and str.strip:

ABR = ['_gr_', '_kr_', '_fr_']

s = '(' + '|'.join(ABR) + ')'
print(s)
(_gr_|_kr_|_fr_)

df['camp'] = df['camp'].str.extract(s, expand=False)

df = df.groupby('camp', as_index=False)['value'].sum()
df['camp'] = df['camp'].str.strip('_')
print(df)
  camp  value
0   fr     12
1   gr      8
2   kr     27
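
The question asks for something short that avoids an explicit loop over the DataFrame. As a minimal sketch of one way to do that, the str.extract idea above can be combined with a groupby in a single expression. The frame below is rebuilt from the sample rows in the question; the names codes, pattern, keys and result are illustrative only, and re.escape is added on the assumption that the real list may contain regex metacharacters.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "camp": [
        "asd_abcd_gr_yxz_aaaa",
        "efgh_kr_ijk",
        "hjssaasd_kr_adsad",
        "asdas_kr_asd",
        "asd_fr_asda_bb_bbbbbbb",
        "adklasdj_gr_asdsad",
    ],
    "value": [5, 10, 15, 2, 12, 3],
})

ABR = ["_gr_", "_kr_", "_fr_"]

# Build one pattern, '_(gr|kr|fr)_'. The capture group sits inside the
# underscores, so the extracted key needs no later str.strip('_').
codes = [re.escape(a.strip("_")) for a in ABR]
pattern = "_(" + "|".join(codes) + ")_"

# Extract the first matching code per row and group by it directly.
# Rows that match nothing get NaN and are dropped by groupby.
keys = df["camp"].str.extract(pattern, expand=False).rename("camp")
result = df.groupby(keys)["value"].sum().reset_index()
print(result)
```

Grouping by the extracted Series leaves the original camp column untouched, which may be preferable if the full strings are still needed afterwards. If a row could contain more than one code, only the first match in the string is used.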

edited Feb 9, 2020 at 7:54

answered Mar 23, 2016 at 14:25

jezrael

3 Comments

ONils Over a year ago

Yes, but then I would need two loops, one of them for the list. Isn't there an easier way to do this in one line, using pandas functions such as replace or extract?

jezrael Over a year ago

Can you accept my answer if it is helpful, so the question doesn't remain unanswered? There is an empty tick mark at the top left of my answer. Thanks.

ONils Over a year ago

But it's not the desired output
