Replace values of a Pandas dataframe's Column based on values of another column

Question

Current Pandas Dataframe:

   Chunk_Num |reading_id |imei
   ____________________________________
0    0          4       35475624
1    0          6       35475624
2    0          6       35475624
3    0          7       35475624
4    0          7       35475624
5    0          11      35475624

I need to group every 2 consecutive index values into one Chunk_Num.

That is:

1) assign rows at index 0,1 to Chunk_Num=0

2) assign rows at index 2,3 to Chunk_Num=1

3) assign rows at index 4,5 to Chunk_Num=2

Desired output:

   Chunk_Num |reading_id |imei
   ____________________________________
0    0          4       35475624
1    0          6       35475624
2    1          6       35475624
3    1          7       35475624
4    2          7       35475624
5    2          11      35475624

Right now, I have:

index_list= [0,1,2,3,4,5]
chunk_list_elements=[0,1,2]

for i , c in zip(index_list, chunk_list_elements): # 3rd el of chunk_list, is mapped to 3rd el of index_list.
    transition2_df.loc[i,'Chunk_Num']= c
    transition2_df.loc[i+1,'Chunk_Num']= c
    i= i+2
display(transition2_df)

And that gives me:

   Chunk_Num |reading_id |imei
   ____________________________________
0    0          4       35475624
1    1          6       35475624
2    2          6       35475624
3    2          7       35475624
4    0          7       35475624
5    0          11      35475624

I'm not sure what I'm missing here. I'm open to other approaches as well besides using zip().

Please help.

Or (df.index.notna().cumsum()-1)//2 if not the default range index.
– Scott Boston, Commented Aug 27, 2019 at 20:24

Scott Boston · Accepted Answer · 2019-08-27 20:26:23Z

Use:

df['Chunk_Num'] = df.index // 2

Or

df['Chunk_Num'] = (df.index.notna().cumsum()-1)//2

Output:

   Chunk_Num  reading_id      imei
0          0           4  35475624
1          0           6  35475624
2          1           6  35475624
3          1           7  35475624
4          2           7  35475624
5          2          11  35475624
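As a minimal, self-contained sketch of the two options above: the frame below is rebuilt from the question's sample data, and the string labels on df2 are hypothetical, chosen only to show the case where the index is not the default range index.

```python
import pandas as pd

# Sample data rebuilt from the question; Chunk_Num starts as all zeros.
df = pd.DataFrame(
    {
        "Chunk_Num": [0] * 6,
        "reading_id": [4, 6, 6, 7, 7, 11],
        "imei": [35475624] * 6,
    }
)

# Default RangeIndex: the labels are 0..5, so integer division by the
# chunk size (2) yields the chunk number for each row.
df["Chunk_Num"] = df.index // 2

# Hypothetical non-default labels: df.index // 2 would not make sense here,
# so count rows by position with notna().cumsum() - 1 and divide that.
df2 = df.copy()
df2.index = list("abcdef")
df2["Chunk_Num"] = (df2.index.notna().cumsum() - 1) // 2

print(df)
print(df2)
```

Both frames should end up with the same Chunk_Num column. The second form only depends on row order, not on the index labels themselves.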

4 Comments

sandra16 Over a year ago

Excellent, thank you so much. I need to clarify something, though: when I write for index, row in df.iterrows(): df['Chunk_Num'] = index//2, all the entries in the Chunk_Num column become 2. So what is the difference between df['Chunk_Num'] = index//2 and df['Chunk_Num'] = df.index//2?

Scott Boston Over a year ago

Inside your for loop you are assigning the entire column Chunk_Num on each iteration, so the value from the last iteration gets set on all rows. I think you were trying to do this: row['Chunk_Num'] = index // 2...
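The difference can be reproduced with a short sketch. The frame is a cut-down copy of the question's data, and the names looped and vectorized are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Chunk_Num": [0] * 6, "reading_id": [4, 6, 6, 7, 7, 11]})

# Scalar on the right-hand side: the single value is broadcast to every row
# on each pass, so only the last iteration's value (5 // 2 == 2) survives.
looped = df.copy()
for index, row in looped.iterrows():
    looped["Chunk_Num"] = index // 2

# Index on the right-hand side: one value per row, assigned in a single step.
vectorized = df.copy()
vectorized["Chunk_Num"] = vectorized.index // 2

print(looped["Chunk_Num"].tolist())
print(vectorized["Chunk_Num"].tolist())
```

Here index is one integer label per iteration, while df.index is the whole sequence of labels, which is why the second assignment gives each row its own value.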

sandra16 Over a year ago

I see... Could you please point out why one of the following statements would conceptually be preferred over the other to replace the value of df['Chunk_Num'] at the current index: row['Chunk_Num'] = index // 2 versus df.at[index,'Chunk_Num'] = index//2? The latter option is what I tried earlier, and it gave me wrong results.

Scott Boston Over a year ago

@sandra16 The df.at statement should also work, as long as you have the default range index.
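A rough sketch of the df.at variant, for illustration only. The integer labels on df2 are hypothetical and stand in for any index that is not the default range index; in that case the chunk is derived from the row position rather than the label. Note also that the pandas documentation warns that a row yielded by iterrows() may be a copy, so writing to row['Chunk_Num'] is not guaranteed to change the DataFrame, whereas df.at writes to the frame directly.

```python
import pandas as pd

df = pd.DataFrame({"Chunk_Num": [0] * 6, "reading_id": [4, 6, 6, 7, 7, 11]})

# Default RangeIndex: labels equal positions, so label // 2 is the chunk number.
for label in df.index:
    df.at[label, "Chunk_Num"] = label // 2

# Hypothetical non-default labels: label // 2 would give the wrong chunks,
# so use the position from enumerate() instead.
df2 = df.copy()
df2.index = [10, 20, 30, 40, 50, 60]
for position, label in enumerate(df2.index):
    df2.at[label, "Chunk_Num"] = position // 2

print(df["Chunk_Num"].tolist())
print(df2["Chunk_Num"].tolist())
```

For anything beyond a few rows, the vectorized assignment from the answer is usually the simpler choice; the loop is shown only to mirror the statement discussed in the comments.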
