Adding rows in dataframe based on changing column condition

Question

I have a dataframe that looks like this.

          name            Datetime            col_3          col_4        
8       'Name 1'     2017-01-02T00:00:00       160           1600          
9       'Name 1'     2017-01-02T00:00:00       160           1600          
10      'Name 1'     2017-01-03T00:00:00       160           1800          
..                   ...     ...          ...       ...
150     'Name 2'     2004-10-13T00:00:00       160           1600          
151     'Name 2'     2004-10-14T00:00:00       160           1600          
152     'Name 2'     2004-10-15T00:00:00       160           1800       
..                   ...     ...          ...       ...
435     'Name 3'     2009-01-02T00:00:00       160           1600          
436     'Name 3'     2009-01-02T00:00:00       170           1500          
437     'Name 3'     2009-01-03T00:00:00       160           1800
..                   ...     ...          ...       ...

Essentially, I want to delete the 'name' column and I want to add a row each time the 'Name-#' field changes, containing only that 'Name-#':

                 Datetime            col_3         col_4        
    7            'Name 1'
    8       2017-01-02T00:00:00       160           1600          
    9       2017-01-02T00:00:00       160           1600                   
    ..                ...     ...          ...       ...
    149          'Name 2'
    150     2004-10-13T00:00:00       160           1600          
    151     2004-10-14T00:00:00       160           1600              
    ..                   ...     ...          ...       ...
    435          'Name 3'          
    436     2009-01-02T00:00:00       170           1500          
    437     2009-01-03T00:00:00       160           1800
    ..                ...     ...          ...       ...

I know how to add rows once the name column changes, but I need to automate the process of adding the 'name-#' field in the Datetime column so that different data of the same style can be put through the code. Any help would be much appreciated. Thanks!

May I ask why you want to add rows this way? This is usually not advisable, and there might be a better way to approach it. For example, if you are trying to split the dataframe into tables for each name, or if you want to apply some operation to each name group, there are other ways to deal with that, such as doing df.groupby('name') and working with that object.
– teepee, Commented Nov 25, 2020 at 17:10
I am trying to load data from one piece of software into another. What I have is what I downloaded from the first, and what I need to produce is the format the second one reads (a .txt file).
– kn2298, Commented Nov 25, 2020 at 17:14

Paul Brennan · Accepted Answer · 2020-11-26 23:57:36Z


I think what you are after is groupby

df.groupby('name')

so you could do

for name, dfsub in df.groupby('name'):
    ...

This would allow you to work on each group individually

An example

import pandas as pd

df = pd.DataFrame( {
   'Name': ['a','a','a','b','b','b','b','c','c','d','d','d'],
   'B': [5,5,6,7,5,6,6,7,7,6,7,7],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1]
    } )

giving a dataframe

   Name B   C
0   a   5   1
1   a   5   1
2   a   6   1
3   b   7   1
4   b   5   1
5   b   6   1
6   b   6   1
7   c   7   1
8   c   7   1
9   d   6   1
10  d   7   1
11  d   7   1

Now we can look at the output of a groupby. Iterating over a groupby in a loop yields two things for each group: the first is the group name, and the second is the subset of the dataframe whose rows belong to that group.

for name, dfsub in df.groupby('Name'):
    print("Name is :"+name)
    dfsub1 = dfsub.drop('Name', axis=1)  # drop the grouping column
    print(dfsub1)
    print() # new line for clarity

and this gives

Name is :a
   B  C
0  5  1
1  5  1
2  6  1

Name is :b
   B  C
3  7  1
4  5  1
5  6  1
6  6  1

Name is :c
   B  C
7  7  1
8  7  1

Name is :d
    B  C
9   6  1
10  7  1
11  7  1

where you get the name you are dealing with, then the dataframe dfsub that contains just the data that you are looking at.
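
The loop above only prints each group. To end up with a single frame that has a name-only row above each group's data, as the question asks, the same loop could collect the pieces and concatenate them. What follows is a minimal sketch rather than part of the original answer: the helper name add_name_rows is hypothetical, and putting the name in the first remaining column with empty strings in the other cells is an assumption about the wanted layout. It passes sort=False so that groups keep their order of first appearance instead of being sorted by name.

```python
import pandas as pd


def add_name_rows(df, name_col):
    """Return a frame without name_col, with a name-only row before each group.

    Hypothetical helper for illustration; the row layout is an assumption.
    """
    data_cols = [c for c in df.columns if c != name_col]
    pieces = []
    # sort=False keeps the groups in order of first appearance
    for name, dfsub in df.groupby(name_col, sort=False):
        # One-row frame: group name in the first column, '' in the others
        header = pd.DataFrame([[name] + [''] * (len(data_cols) - 1)],
                              columns=data_cols)
        pieces.append(header)
        pieces.append(dfsub[data_cols])
    # ignore_index=True renumbers the rows of the combined frame from 0
    return pd.concat(pieces, ignore_index=True)


df = pd.DataFrame({
    'Name': ['a', 'a', 'a', 'b', 'b', 'c'],
    'B': [5, 5, 6, 7, 5, 7],
    'C': [1, 1, 1, 1, 1, 1],
})
combined = add_name_rows(df, 'Name')
print(combined)
```

Because each column now mixes strings and numbers, the columns become object dtype. That is usually fine for exporting, but less convenient if further numeric work is planned on the combined frame.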

edited Nov 26, 2020 at 23:57

answered Nov 25, 2020 at 17:11

Paul Brennan


7 Comments

kn2298 Over a year ago

could you perhaps expand on this? I am a beginner with python and not yet familiar with the groupby function.

Paul Brennan Over a year ago

Expanded big man. @knorr976 let me know if you need more

kn2298 Over a year ago

Hi Paul, not sure that gives me the result I am looking for.. I need the 'Name-#' to be written only once, with the data associated to that name presented below it (i.e. taking out the 'name' column)

Paul Brennan Over a year ago

Updated again! Hope this is more to your liking

kn2298 Over a year ago

Paul - that works much better, thank you! Is there a way to combine these now to make a new dataframe that I can export? Such that the df contains empty rows just with the name, and its corresponding data beneath it?
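
For the export asked about here, one option that avoids a mixed-type dataframe altogether is to build the text output group by group. This is only a sketch under assumptions: the function name to_grouped_text is hypothetical, and the tab separator and the single column-header line are guesses, since the thread does not state the exact format the target program expects.

```python
import pandas as pd


def to_grouped_text(df, name_col, sep='\t'):
    """Build text with one name-only line above each group's rows.

    Hypothetical helper; separator and header line are assumptions.
    """
    data_cols = [c for c in df.columns if c != name_col]
    lines = [sep.join(data_cols)]  # column header line
    # sort=False keeps the groups in order of first appearance
    for name, dfsub in df.groupby(name_col, sort=False):
        lines.append(str(name))  # the name on a line of its own
        for row in dfsub[data_cols].itertuples(index=False, name=None):
            lines.append(sep.join(str(value) for value in row))
    return '\n'.join(lines) + '\n'


df = pd.DataFrame({
    'Name': ['a', 'a', 'b'],
    'B': [5, 6, 7],
    'C': [1, 1, 1],
})
text = to_grouped_text(df, 'Name')
print(text)
```

The returned string can then be written to a .txt file with an ordinary file write.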
