I have a DF:

DF
camp, value
asd_abcd_gr_yxz_aaaa, 5
efgh_kr_ijk, 10
hjssaasd_kr_adsad, 15
asdas_kr_asd, 2
asd_fr_asda_bb_bbbbbbb, 12
adklasdj_gr_asdsad, 3

The real DF is much longer.

After matching each camp value against the elements of a list such as ['_gr_', '_kr_', '_fr_', ...], I want the result to be:

DF
camp, value
gr, 8
kr, 27
fr, 12

Preferably the solution should be as short as possible and not loop through the DF. The real list is much longer than _gr_, _kr_, _fr_.

Thanks in advance!

  • Sorry, can you post raw input data, code to recreate your df, and what the desired output df should look like? Are you saying you have a column with values like 'abcd_gr_yxz', and if a value contains gr then you want to replace it with gr? Commented Mar 23, 2016 at 14:19
  • Now it's edited @EdChum Commented Mar 23, 2016 at 14:49
  • So in this case would it be true that you want to always strip the leading and trailing words at the _ underscore characters? Commented Mar 23, 2016 at 14:50
  • Sorry, my example wasn't general enough. There might exist several underscores :/ Commented Mar 23, 2016 at 15:20
  • Your new example has the keys you want always being the next-to-last. Is that invariant, or are you just really bad at coming up with sufficiently general examples? ;-) You need to explain how the original camp column becomes the new camp column, whether by some rule or by comparison with an explicitly specified list. Commented Mar 23, 2016 at 15:24

1 Answer

You can try str.contains with loc:

print(df)
                 camp  value
0         abcd_gr_yxz      5
1         efgh_kr_ijk     10
2   hjssaasd_kr_adsad     15
3        asdas_kr_asd      2
4         asd_fr_asda     12
5  adklasdj_gr_asdsad      3

ABR = ['_gr_', '_kr_', '_fr_']

# Overwrite camp with the key wherever the key occurs in the string
for x in ABR:
    df.loc[df['camp'].str.contains(x), 'camp'] = x
print(df)
   camp  value
0  _gr_      5
1  _kr_     10
2  _kr_     15
3  _kr_      2
4  _fr_     12
5  _gr_      3

print(df.groupby('camp')['value'].sum().reset_index())
   camp  value
0  _fr_     12
1  _gr_      8
2  _kr_     27
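
For anyone who wants to run the loop approach end to end, here is a minimal self-contained sketch in Python 3. It rebuilds the sample frame from the edited question. The helper name `label_by_substring` is hypothetical, and `regex=False` is an assumption that the list entries are meant as literal substrings rather than patterns.

```python
import pandas as pd


def label_by_substring(df, keys):
    """Return a copy of df where camp is replaced by the first key it contains."""
    out = df.copy()
    for key in keys:
        # regex=False: treat the key as plain text (assumption, see lead-in)
        mask = out["camp"].str.contains(key, regex=False)
        out.loc[mask, "camp"] = key
    return out


# Sample data copied from the question
df = pd.DataFrame({
    "camp": [
        "asd_abcd_gr_yxz_aaaa",
        "efgh_kr_ijk",
        "hjssaasd_kr_adsad",
        "asdas_kr_asd",
        "asd_fr_asda_bb_bbbbbbb",
        "adklasdj_gr_asdsad",
    ],
    "value": [5, 10, 15, 2, 12, 3],
})
ABR = ["_gr_", "_kr_", "_fr_"]

labeled = label_by_substring(df, ABR)
totals = labeled.groupby("camp")["value"].sum().reset_index()
print(totals)
```

Rows that match none of the keys keep their original camp string, so they would show up as their own groups in the sum.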

Or use str.extract and str.strip:

ABR = ['_gr_', '_kr_', '_fr_']

s = '(' + '|'.join(ABR) + ')'
print(s)
(_gr_|_kr_|_fr_)

df['camp'] = df['camp'].str.extract(s, expand=False)

df = df.groupby('camp', as_index=False)['value'].sum()
df['camp'] = df['camp'].str.strip('_')
print(df)
  camp  value
0   fr     12
1   gr      8
2   kr     27
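
If a single expression without an explicit loop is wanted, one option is to group directly on the extracted key. The sketch below is one possible way to do that in Python 3, not necessarily what the answer's author had in mind. The names `keys` and `pattern` are illustrative, and `re.escape` is added on the assumption that the list entries should match literally.

```python
import re

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "camp": [
        "asd_abcd_gr_yxz_aaaa",
        "efgh_kr_ijk",
        "hjssaasd_kr_adsad",
        "asdas_kr_asd",
        "asd_fr_asda_bb_bbbbbbb",
        "adklasdj_gr_asdsad",
    ],
    "value": [5, 10, 15, 2, 12, 3],
})

# Keys without the surrounding underscores; the underscores go into the pattern
keys = ["gr", "kr", "fr"]
pattern = "_(" + "|".join(re.escape(k) for k in keys) + ")_"

# Extract the first matching key per row and use it as the grouping key
extracted = df["camp"].str.extract(pattern, expand=False).rename("camp")
result = df.groupby(extracted)["value"].sum().reset_index()
print(result)
```

Putting the underscores outside the capture group means no `str.strip` step is needed afterwards. Rows that match no key get NaN as their key, and `groupby` drops those by default, so they do not appear in the result.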

3 Comments

Yes, but then I would need two loops, one of them for the list. Isn't there an easier way to do this in one line, using pandas replace/extract etc. functions?
Can you accept my answer if it is helpful, so the question doesn't remain unanswered? There will be an empty tick mark at the top left of my answer. Thanks.
But it's not the desired output.
