Current Pandas Dataframe:

   Chunk_Num |reading_id |imei
   ____________________________________
0    0          4       35475624
1    0          6       35475624
2    0          6       35475624
3    0          7       35475624
4    0          7       35475624
5    0          11      35475624

I need to group every 2 consecutive index values into 1 Chunk_Num.

That is:

1) assign rows at index 0,1 to Chunk_Num=0

2) assign rows at index 2,3 to Chunk_Num=1

3) assign rows at index 4,5 to Chunk_Num=2

Desired output:

   Chunk_Num |reading_id |imei
   ____________________________________
0    0          4       35475624
1    0          6       35475624
2    1          6       35475624
3    1          7       35475624
4    2          7       35475624
5    2          11      35475624

Right now, I have:

index_list= [0,1,2,3,4,5]
chunk_list_elements=[0,1,2]

for i , c in zip(index_list, chunk_list_elements): # 3rd el of chunk_list, is mapped to 3rd el of index_list.
    transition2_df.loc[i,'Chunk_Num']= c
    transition2_df.loc[i+1,'Chunk_Num']= c
    i= i+2
display(transition2_df)

And that gives me:

   Chunk_Num |reading_id |imei
   ____________________________________
0    0          4       35475624
1    1          6       35475624
2    2          6       35475624
3    2          7       35475624
4    0          7       35475624
5    0          11      35475624

I'm not sure what I'm missing here. I'm also open to approaches that don't use zip().

Please help.

  • Use df.index//2 if you are using the default range index. Commented Aug 27, 2019 at 20:21
  • Or (df.index.notna().cumsum()-1)//2 if it is not the default range index. Commented Aug 27, 2019 at 20:24

1 Answer

Use:

df['Chunk_Num'] = df.index // 2

Or, if the index is not the default range index:

df['Chunk_Num'] = (df.index.notna().cumsum()-1)//2

Output:

   Chunk_Num  reading_id      imei
0          0           4  35475624
1          0           6  35475624
2          1           6  35475624
3          1           7  35475624
4          2           7  35475624
5          2          11  35475624
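
As for why the loop in the question misbehaves: zip() stops at the shorter list, so it only yields the pairs (0, 0), (1, 1), (2, 2), and reassigning i inside the loop body has no effect on the next iteration. Below is a minimal, self-contained sketch of the vectorized approach. The sample frame is rebuilt from the question's table, and df2 with its letter index is a hypothetical stand-in for a non-default index. For that case the sketch uses np.arange(len(df2)) // 2, which is a position-based alternative to the notna().cumsum() expression, not the expression from the answer itself.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; Chunk_Num starts as all zeros.
df = pd.DataFrame({
    "Chunk_Num": [0, 0, 0, 0, 0, 0],
    "reading_id": [4, 6, 6, 7, 7, 11],
    "imei": [35475624] * 6,
})

# zip() stops at the shorter input, so the original loop only sees 3 pairs.
pairs = list(zip([0, 1, 2, 3, 4, 5], [0, 1, 2]))

# Default RangeIndex: integer-divide each row label by the chunk size.
chunk_size = 2
df["Chunk_Num"] = df.index // chunk_size

# Hypothetical non-default index: derive chunks from row position instead.
df2 = df.copy()
df2.index = list("abcdef")
df2["Chunk_Num"] = np.arange(len(df2)) // chunk_size

print(df)
print(df2)
```

Both assignments set the whole column in one step, so no explicit loop or index bookkeeping is needed.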

4 Comments

Excellent, thank you so much. I need to clarify something, though: when I write for index, row in df.iterrows(): df['Chunk_Num'] = index//2, all the entries in the Chunk_Num column become 2. So what is the difference between df['Chunk_Num'] = index//2 and df['Chunk_Num'] = df.index//2?
Inside your for loop you are assigning the entire Chunk_Num column on each iteration. Hence the last iteration's value gets set on all rows. I think you were trying to do this: row['Chunk_Num'] = index // 2...
I see.... Could you please point out why one of the following statements would conceptually be preferred over the other for replacing the value of df['Chunk_Num'] at the current index: row['Chunk_Num'] = index // 2 versus df.at[index, 'Chunk_Num'] = index//2? The latter option is what I tried earlier, and it gave me wrong results.
@sandra16 The df.at[...] statement should also work, as long as you have the default range index.
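
To make the difference discussed in these comments concrete, here is a small sketch on a hypothetical six-row frame with the default range index. It contrasts assigning a scalar to the whole column inside the loop with writing one cell at a time via df.at. The row['Chunk_Num'] = ... form is deliberately left out: the pandas documentation warns that iterrows() may hand back a copy of each row, so writes to row may never reach the frame.

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex 0..5.
df = pd.DataFrame({
    "Chunk_Num": [0] * 6,
    "reading_id": [4, 6, 6, 7, 7, 11],
})

# A scalar assigned to df["Chunk_Num"] is broadcast to every row,
# so each iteration overwrites the whole column.
for index, row in df.iterrows():
    df["Chunk_Num"] = index // 2
broadcast_result = df["Chunk_Num"].tolist()  # all rows hold 5 // 2

# df.at writes exactly one cell, addressed by row label and column name.
for index, row in df.iterrows():
    df.at[index, "Chunk_Num"] = index // 2
cell_result = df["Chunk_Num"].tolist()

print(broadcast_result)
print(cell_result)
```

The per-cell loop produces the intended chunks, but the vectorized df.index // 2 from the answer reaches the same result without iterating.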
