2

I have a simple dataframe like this:

df = pd.DataFrame({'class':['a','b','c','d','e'],
                  'name':['Adi','leon','adi','leo','andy'],
                  'age':['9','8','9','9','8'],
                   'score':['40','90','35','95','85']})

then the result is like this

 class  name   age  score
    a   Adi     9   40
    b   leon    8   90
    a   adi     9   35
    d   leo     9   95
    e   andy    8   85

how can I combine the row named 'Adi' with 'adi' in the same column while he is only one person and the score 'Adi' is 75, not 40 and 35

1
  • 1
    You can convert them all to upper or lower and then group them summing the score Commented Jan 8, 2022 at 14:44

2 Answers 2

1

You could use pandas.DataFrame.groupby and pandas.DataFrame.aggregate after first making the name column lowercase:

import pandas as pd

df = pd.DataFrame({
    'class': ['a', 'b', 'c', 'd', 'e'],
    'name': ['Adi', 'leon', 'adi', 'leo', 'andy'],
    'age': ['9', '8', '9', '9', '8'],
    'score': ['40', '90', '35', '95', '85']
})
df['name'] = df['name'].str.lower()
df['score'] = df['score'].astype(int)
aggregate_funcs = {
    'class': lambda s: ', '.join(set(s)),
    'age': lambda s: ', '.join(set(s)),
    'score': sum
}
df = df.groupby(df['name']).aggregate(aggregate_funcs)
print(df)

Output:

     class age  score
name                 
adi   c, a   9     75
andy     e   8     85
leo      d   9     95
leon     b   8     90
Sign up to request clarification or add additional context in comments.

6 Comments

Regarding your earlier comment (question got deleted, can't reply there): That's not O(1) space but O(n) space. For example for nums = list(range(1, 10**6)), tracemalloc reports you use 28 MB. As expected for a million int objects each taking 28 bytes.
Yeah sorry, I meant O(1) in terms of extra additional space, since nums has to be passed as an argument.
Not sure how you mean that. It isn't O(1) in terms of "extra additional" space. Unless you're defining that in a way I don't see? :-) You are creating 28 MB of additional data there. I have two ideas for similar solutions that really are O(1) extra space, one of them able to restore the data in the end (which I think is the least one should do in the real world... if you tried yours at work, I kinda doubt you'd get an LGTM :-)
Everything that you said makes sense, it would be O(N) in terms of space. I have however seen a few questions relating to API design for image processing in the interview question back, in terms of the manipulation of a 2D matrix of 1, 0's representing black and white pixels, like dilating, etc. In these questions, the author mentions if the candidate talks about the possibility of creating two separate functions i.e., one that is destructive on the input image (and the reasons for wanting one), and another one that makes and returns a modified copy, it is a positive signal.
I agree with that, although I'd say that's rather different, as the output in those cases is the modified matrix, right? I.e., modifying it can be the desired effect. In our case, on the other hand, the output is a single int, and your code modifies the input list not as the desired effect but as a side effect. That's what I think makes it not so good.
|
1

drop_duplicates() is the best way if you are using pandas

df['name'] = df['name'].str.lower()
df['score'] = df['score'].astype(int)
df['score'] = df['score'].groupby(df['name']).transform(sum)
df.drop_duplicates(subset='name',keep='first',inplace=True)

output:

  class  name age  score
0     a   adi   9     75
1     b  leon   8     90
3     d   leo   9     95
4     e  andy   8     85

you will have this output if you set keep='last':

  class  name age  score
1     b  leon   8     90
2     c   adi   9     75
3     d   leo   9     95
4     e  andy   8     85

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.