Using a pandas dataframe, how to group by multiple columns and add a new column

Question

I am rather new to pandas dataframes and have a grouping problem: I want to group a 7-column dataframe so that all rows with the same values in the first 3 columns (A, B, C) belong together, and then add a new column (H) that holds, for every row of a group, the value of the last column (G) from the row where the 4th column (D) equals 0.

So, the original dataframe looks like this:

          A         B     C  D           E   F    G
 0    11018  20190102     0  0  1546387200  37   34
 1    11018  20190102     0  1  1546390800  33   36
 2    11018  20190102     0  2  1546394400  19   19
 3    11018  20190102     0  3  1546398000  17   26
 4    11018  20190102     0  4  1546401600  16   26
 5    11018  20190102     0  5  1546405200  13   23
 6    11018  20190102     0  6  1546408800  11   15
 7    11018  20190102  1200  0  1546430400  25   24
 8    11018  20190102  1200  1  1546434000  21    3
 9    11018  20190102  1200  2  1546437600  13    4
 10   11018  20190102  1200  3  1546441200   7    3
 11   11018  20190102  1200  4  1546444800   2    1
 12   11018  20190102  1200  5  1546448400  -3    6
 13   11018  20190102  1200  6  1546452000  -7    2
 14   11035  20190103     0  0  1546473600 -15 -14
 15   11035  20190103     0  1  1546477200 -17 -11
 16   11035  20190103     0  2  1546480800 -20 -12
 17   11035  20190103     0  3  1546484400 -23 -16
 18   11035  20190103     0  4  1546488000 -26 -11
 19   11035  20190103     0  5  1546491600 -28 -11
 20   11035  20190103     0  6  1546495200 -27 -12
 21   11031  20190103  1100  0  1546516800   0   1
 22   11031  20190103  1100  1  1546520400   4  -7
 23   11031  20190103  1100  2  1546524000   5  -6
 24   11031  20190103  1100  3  1546527600   2 -16
 25   11031  20190103  1100  4  1546531200  -3 -14
 26   11031  20190103  1100  5  1546534800  -8 -12
 27   11031  20190103  1100  6  1546538400 -12 -14
 .
 .
 .
 .

etc.

And the new dataframe should be:

          A         B     C  D           E   F    G    H
 0    11018  20190102     0  0  1546387200  37   34   34
 1    11018  20190102     0  1  1546390800  33   36   34
 2    11018  20190102     0  2  1546394400  19   19   34
 3    11018  20190102     0  3  1546398000  17   26   34
 4    11018  20190102     0  4  1546401600  16   26   34
 5    11018  20190102     0  5  1546405200  13   23   34
 6    11018  20190102     0  6  1546408800  11   15   34
 7    11018  20190102  1200  0  1546430400  25   24   24
 8    11018  20190102  1200  1  1546434000  21    3   24
 9    11018  20190102  1200  2  1546437600  13    4   24
 10   11018  20190102  1200  3  1546441200   7    3   24
 11   11018  20190102  1200  4  1546444800   2    1   24
 12   11018  20190102  1200  5  1546448400  -3    6   24
 13   11018  20190102  1200  6  1546452000  -7    2   24
 14   11035  20190103     0  0  1546473600 -15 -14   -14
 15   11035  20190103     0  1  1546477200 -17 -11   -14
 16   11035  20190103     0  2  1546480800 -20 -12   -14
 17   11035  20190103     0  3  1546484400 -23 -16   -14
 18   11035  20190103     0  4  1546488000 -26 -11   -14
 19   11035  20190103     0  5  1546491600 -28 -11   -14
 20   11035  20190103     0  6  1546495200 -27 -12   -14
 21   11031  20190103  1100  0  1546516800   0   1     1
 22   11031  20190103  1100  1  1546520400   4  -7     1
 23   11031  20190103  1100  2  1546524000   5  -6     1
 24   11031  20190103  1100  3  1546527600   2 -16     1
 25   11031  20190103  1100  4  1546531200  -3 -14     1
 26   11031  20190103  1100  5  1546534800  -8 -12     1
 27   11031  20190103  1100  6  1546538400 -12 -14     1
 .
 .
 .
 .

etc.

Is there an easy solution for this problem? Note that the rows in the original dataframe could be mixed up, too. Thanks for help!

Josh Friedlander · Accepted Answer · 2019-01-24 10:59:21Z


An alternative solution:

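def col_6(group):
    # Work on a copy so the caller's dataframe is not modified in place
    group = group.copy()
    # Broadcast G from the row where D == 0 to every row of the group
    group['H'] = group.loc[group['D'] == 0, 'G'].values[0]
    return group

# apply returns a new dataframe, so assign the result
df = df.groupby(['A', 'B', 'C'], group_keys=False).apply(col_6)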
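For comparison, the same lookup can probably also be done without apply, by pulling out the D == 0 rows and merging them back on the group keys. This is only a sketch: the small sample frame is made up in the shape of the question's data (columns E and F omitted, rows deliberately mixed up), and `lookup` and `result` are illustrative names. It assumes each (A, B, C) group has exactly one row with D == 0.

```python
import pandas as pd

# Hypothetical sample in the shape of the question's data, rows mixed up
df = pd.DataFrame({
    "A": [11018, 11018, 11018, 11035, 11018, 11035],
    "B": [20190102, 20190102, 20190102, 20190103, 20190102, 20190103],
    "C": [0, 1200, 0, 0, 1200, 0],
    "D": [1, 0, 0, 1, 1, 0],
    "G": [36, 24, 34, -11, 3, -14],
})

keys = ["A", "B", "C"]

# One row per group: the G value where D == 0, renamed to H
lookup = df.loc[df["D"] == 0, keys + ["G"]].rename(columns={"G": "H"})

# A left merge broadcasts H to every row of the matching group.
# validate="many_to_one" raises if a group has more than one D == 0 row.
result = df.merge(lookup, on=keys, how="left", validate="many_to_one")

print(result)
```

The left merge keeps the row order of df but returns a fresh 0..n-1 index, and a group without any D == 0 row would end up with NaN in H.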


9 Comments

David Over a year ago

I deleted my answer since it effectively works. I had to use apply instead of transform, but that would ruin the dimensionality of the dataframe. +1

akann Over a year ago

Thanks! I tried your solution; it gave no errors, but column H was not appended... Did it work properly with my data?

Josh Friedlander Over a year ago

Yes. Maybe you need to assign it, as in df = df.groupby(['A','B','C']).apply(col_6)?

akann Over a year ago

My dataframe comes from an SQL request:

dataset = pd.DataFrame.from_records(out)
dataset.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
dataset.groupby(['A','B','C']).apply(col_6)

Josh Friedlander Over a year ago

That shouldn't matter. Did you try my suggestion?
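To illustrate the point about assignment, here is a minimal sketch. It assumes a made-up stand-in for the SQL result (the real `out` is not shown in the thread) and a variant of col_6 that copies each group before adding H, so the original frame is never touched. How groupby.apply handles the grouping columns has changed between pandas versions, so the exact columns of the result may differ on your install.

```python
import pandas as pd

# Hypothetical stand-in for the frame built from the SQL request
dataset = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [10, 10, 20, 20],
    "C": [0, 0, 0, 0],
    "D": [0, 1, 1, 0],
    "G": [5, 6, 7, 8],
})

def col_6(group):
    # Copy first, so the change lives only in the returned frame
    group = group.copy()
    group["H"] = group.loc[group["D"] == 0, "G"].values[0]
    return group

# Without assignment the returned frame is simply discarded
dataset.groupby(["A", "B", "C"], group_keys=False).apply(col_6)
print("H" in dataset.columns)

# Assigning the return value keeps the new column
result = dataset.groupby(["A", "B", "C"], group_keys=False).apply(col_6).sort_index()
print("H" in result.columns)
```

The trailing sort_index() just puts the rows back in their original order in case the groups come back reordered.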
