Adding new column to dataframe using existing variable

Question

I am trying to create a new variable (column) in an existing dataframe.

Participant   Session   Trial_number    Accuracy    Block
 G01S01          1             3             1          1
 G01S02          1             4             1          2
 G02S01          1             5             1          5
 G01S01          1             6             1          8
 G01S01          1             7             1          10

Basically, I want to create a new variable "Epoch" based on the Block column: blocks 1-4 belong to Epoch 1, blocks 5-8 to Epoch 2, and so on. It would look something like this:

Participant   Session   Trial_number    Accuracy    Block    Epoch
 G01S01          1             3             1          1          1
 G01S02          1             4             1          2          1
 G02S01          1             5             1          5          2
 G01S01          1             6             1          8          2
 G01S01          1             7             1          10         3

Additionally, I want to create another variable based on the Participant ID: if the ID ends with 1, the participant belongs to group 1; if it ends with 2, the participant belongs to group 2.

I tried to solve the first problem, but my attempt did not work:

import pandas as pd

df = pd.read_csv('merge.csv')

Epoch = []

x = 0

while x < 179424:
    if df['Block'][x] < 5:
        Epoch == 1
    elif 4 < df['Block'][x] < 9:
        Epoch == 2
    elif 8 < df['Block'][x] < 13:
        Epoch == 3
    elif 12 < df['Block'][x] < 17:
        Epoch == 4
    else:
        Epoch == 5
    x += 1

(179424 is the number of rows in my spreadsheet)
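For reference, the loop above never stores anything. Epoch == 1 compares the list Epoch with 1 instead of assigning a value, and nothing is ever appended to the list. Below is a minimal sketch of what the loop was presumably meant to do. It assumes one epoch value should be appended per row and then attached as a column. It uses a small made-up frame in place of merge.csv and the half-open ranges suggested in the comments. Iterating over the column directly also removes the hard-coded row count.

```python
import pandas as pd

# Made-up sample standing in for merge.csv
df = pd.DataFrame({'Block': [1, 2, 5, 8, 10, 17]})

epochs = []
for block in df['Block']:
    # 1-4 -> 1, 5-8 -> 2, 9-12 -> 3, 13-16 -> 4, anything else -> 5
    if block < 5:
        epochs.append(1)
    elif 5 <= block < 9:
        epochs.append(2)
    elif 9 <= block < 13:
        epochs.append(3)
    elif 13 <= block < 17:
        epochs.append(4)
    else:
        epochs.append(5)

# One value per row, so the list can be assigned as a new column
df['Epoch'] = epochs
```

This works, but a Python-level loop is slow on 179424 rows. The vectorized answers below avoid it.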

You may want to rewrite your if-elif logic. It's not intuitive to see x < 5 followed by 4 < x < 9, even if it does work out for your integer values. It would be far clearer to write 5 <= x < 9.
– ALollz, Commented May 8, 2019 at 1:33
Yes, you are right. It is visually more appealing and more readable.
– CatM, Commented May 8, 2019 at 10:09

Erfan · Accepted Answer · 2019-05-07 23:29:14Z

2

You can use pandas.cut for this to make bins and assign labels based on those bins:

# Bins (1, 4], (4, 8], (8, 12]; include_lowest=True also puts Block == 1 in the first bin
df['Epoch'] = pd.cut(df['Block'],
                     [1, 4, 8, 12],
                     labels=[1, 2, 3],
                     include_lowest=True)

print(df)
  Participant  Session  Trial_number  Accuracy  Block Epoch
0      G01S01        1             3         1      1     1
1      G01S02        1             4         1      2     1
2      G02S01        1             5         1      5     2
3      G01S01        1             6         1      8     2
4      G01S01        1             7         1     10     3
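The bins above stop at 12, so any block above 12 would get NaN. The sketch below covers all five epochs. It assumes blocks run from 1 to 20, since the question's loop implies five epochs but does not state the maximum block. Blocks above 20 would still come out as NaN.

```python
import pandas as pd

# Made-up sample; assumption: Block ranges from 1 to 20
df = pd.DataFrame({'Block': [1, 2, 5, 8, 10, 17, 20]})

# Edges 0, 4, 8, 12, 16, 20 give right-closed bins (0, 4], (4, 8], ..., (16, 20]
bins = list(range(0, 21, 4))
df['Epoch'] = pd.cut(df['Block'], bins=bins, labels=list(range(1, 6)))
```

pd.cut returns a categorical column. Call .astype(int) on it if plain integers are needed.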

answered May 7, 2019 at 23:29

Erfan



5 Comments

Umar.H Over a year ago

Try df['Group'] = df['Participant'].str[-1] for the group column. Nice answer, Erfan bhai.

CatM Over a year ago

This worked for me! Thank you. I need to look more into pandas functions.

Erfan Over a year ago

Glad I could help :) @CatM Yes, you should check the pandas docs; for 90% of your needs there is a pandas function.

CatM Over a year ago

Can you recommend any books, websites, etc.?

Erfan Over a year ago

As a beginner-friendly introduction I would suggest Data School on YouTube. Besides that, the pandas documentation is very helpful and covers everything pandas offers. @CatM

Ingo · Accepted Answer · 2019-05-07 23:31:20Z

0

I think you want to use the apply method of the DataFrame. It takes a function as an argument and calls it on every column (axis=0, the default) or on every row (axis=1). From your code example, I suspect this would be a suitable function:

def derive_epoch(row):
    if row['Block'] < 5:
        return 1
    elif row['Block'] < 9:
        return 2
    elif row['Block'] < 13:
        return 3
    elif row['Block'] < 17:
        return 4
    else:
        return 5

Then, I just apply it like this:

df['Epoch'] = df.apply(derive_epoch, axis=1)
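The sketch below shows what axis changes, using a made-up two-row frame. With the default axis=0, the function receives whole columns. With axis=1, it receives one row at a time, which is why derive_epoch needs axis=1 to look up row['Block'].

```python
import pandas as pd

# Made-up two-row frame, only for illustration
df = pd.DataFrame({'Block': [1, 5], 'Accuracy': [1, 0]})

# axis=0 (default): the function is called once per column, each passed as a Series
col_max = df.apply(lambda col: col.max())

# axis=1: the function is called once per row, so row['Block'] is a single value
epochs = df.apply(lambda row: (row['Block'] - 1) // 4 + 1, axis=1)
```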

I hope that helps!

answered May 7, 2019 at 23:31

Ingo


1 Comment

ALollz Over a year ago

Apply is horribly slow and in general should not be used. The more performant solution would be to use np.select.
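Below is a sketch of the np.select approach this comment suggests. It uses the same cut-offs as derive_epoch above and a small made-up frame. The conditions are checked in order, so each row takes the choice of the first condition it satisfies. Rows matching none of them get the default.

```python
import numpy as np
import pandas as pd

# Made-up sample standing in for the real data
df = pd.DataFrame({'Block': [1, 2, 5, 8, 10, 17]})

# Same thresholds as derive_epoch; order matters because the first True wins
conditions = [
    df['Block'] < 5,
    df['Block'] < 9,
    df['Block'] < 13,
    df['Block'] < 17,
]
choices = [1, 2, 3, 4]

# Rows that satisfy no condition (Block >= 17) fall through to default=5
df['Epoch'] = np.select(conditions, choices, default=5)
```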

F Blanchet · Accepted Answer · 2019-05-07 23:34:45Z

0

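You can use floor division (//) to compute the epoch number from the 'Block' column. Subtract 1 first so that blocks 4, 8, 12, ... stay in the same epoch as the blocks before them. Pass axis=1 so the lambda receives one row at a time:

df['Epoch'] = df.apply(lambda x: (x['Block'] - 1) // 4 + 1, axis=1)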
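The boundary cases are easy to get wrong with floor division. The comparison below uses a hypothetical sample of boundary blocks, not the asker's data. It sets the plain Block // 4 + 1 formula against the version that subtracts 1 first.

```python
import pandas as pd

# Boundary blocks of the first three epochs (1-4, 5-8, 9-12)
blocks = pd.Series([1, 4, 5, 8, 9, 12])

# Plain formula: block 4 -> 2, block 8 -> 3, block 12 -> 4 (one epoch too high)
naive = blocks // 4 + 1

# Zero-based before dividing, so each group of four blocks stays together
shifted = (blocks - 1) // 4 + 1
```

Working directly on the column like this also avoids the row-by-row cost of apply.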

answered May 7, 2019 at 23:34

F Blanchet



Alexandre B. · Accepted Answer · 2019-05-07 23:41:30Z

0

Another very simple solution:

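# Import pandas
import pandas as pd

# Read csv file (sep=';' assumes a semicolon-delimited file; drop it for a comma-separated one)
df = pd.read_csv('merge.csv', sep=';')

# Add epoch column: subtract 1 first so blocks 1-4 -> 1, 5-8 -> 2, 9-12 -> 3, ...
df['Epoch'] = (df['Block'] - 1) // 4 + 1
# Add group column (last character of the participant ID, as a string)
df['Group'] = df['Participant'].str[-1]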

print(df)
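Note that .str[-1] returns the last character as a string ('1' or '2'). If a numeric group column is wanted, the sketch below converts it. It uses made-up IDs and assumes every ID ends in a digit.

```python
import pandas as pd

# Made-up participant IDs in the question's format
df = pd.DataFrame({'Participant': ['G01S01', 'G01S02', 'G02S01']})

# Take the last character ('1' or '2') and convert it to an integer group number
df['Group'] = df['Participant'].str[-1].astype(int)
```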

edited May 7, 2019 at 23:41

answered May 7, 2019 at 23:34

Alexandre B.

