
I am trying to create a new variable (column) in an existing dataframe.

Participant   Session   Trial_number    Accuracy    Block
 G01S01          1             3             1          1
 G01S02          1             4             1          2
 G02S01          1             5             1          5
 G01S01          1             6             1          8
 G01S01          1             7             1          10

Basically, I want to create a new variable "Epoch" based on the Block column: Blocks 1 to 4 belong to Epoch 1, Blocks 5 to 8 belong to Epoch 2, and so on. It would look something like this:

Participant   Session   Trial_number    Accuracy    Block    Epoch
 G01S01          1             3             1          1          1
 G01S02          1             4             1          2          1
 G02S01          1             5             1          5          2
 G01S01          1             6             1          8          2
 G01S01          1             7             1          10         3

I also want to create another variable based on the Participant ID: if the ID ends with 1, the participant belongs to group 1; if it ends with 2, the participant belongs to group 2.


I tried the first part, but it did not work:

import pandas as pd

df = pd.read_csv('merge.csv')

Epoch = []

x = 0

while x < 179424:
    if df['Block'][x] < 5:
        Epoch == 1
    elif 4 < df['Block'][x] < 9:
        Epoch == 2
    elif 8 < df['Block'][x] < 13:
        Epoch == 3
    elif 12 < df['Block'][x] < 17:
        Epoch == 4
    else:
        Epoch == 5
    x += 1

(179424 is the number of rows in my spreadsheet)
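The loop above never stores anything: Epoch == 1 compares the (empty) list to 1 and throws the result away, and nothing is ever appended to Epoch or written back to df. A minimal corrected sketch of the same loop, using the sample Block values above in place of merge.csv and iterating over the column instead of hardcoding the row count:

```python
import pandas as pd

# Sample Block values from the question, standing in for merge.csv
df = pd.DataFrame({"Block": [1, 2, 5, 8, 10]})

epochs = []  # one epoch number per row
for block in df["Block"]:
    # append the value instead of comparing with ==
    if block <= 4:
        epochs.append(1)
    elif block <= 8:
        epochs.append(2)
    elif block <= 12:
        epochs.append(3)
    elif block <= 16:
        epochs.append(4)
    else:
        epochs.append(5)

# write the collected values back as a new column
df["Epoch"] = epochs
print(df)
```

This works, but a Python-level loop over 179424 rows is slow compared with the vectorized answers below.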

  • You may consider re-writing your if-elif logic. It's not intuitive to see x < 5 followed by 4 < x < 9, even if it does work out for your integer values. It would be far clearer to write it as 5 <= x < 9. Commented May 8, 2019 at 1:33
  • Yes, you are right. It is visually more appealing and readable. Commented May 8, 2019 at 10:09

4 Answers


You can use pandas.cut for this to make bins and assign labels based on those bins:

df['Epoch'] = pd.cut(df['Block'], 
                     [1,4,8,12], 
                     labels=[1,2,3],
                     include_lowest=True)

print(df)
  Participant  Session  Trial_number  Accuracy  Block Epoch
0      G01S01        1             3         1      1     1
1      G01S02        1             4         1      2     1
2      G02S01        1             5         1      5     2
3      G01S01        1             6         1      8     2
4      G01S01        1             7         1     10     3
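The bins above stop at 12, so any Block above 12 would come out as NaN. A sketch that covers all five epochs from the question's loop, assuming Blocks are positive integers and that everything above 16 belongs to Epoch 5 (the loop's else branch), uses np.inf as the last bin edge:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Block": [1, 2, 5, 8, 10, 16, 17, 20]})

# Right-closed bins: (0, 4], (4, 8], (8, 12], (12, 16], (16, inf]
# np.inf as the last edge catches every Block above 16
df["Epoch"] = pd.cut(
    df["Block"],
    bins=[0, 4, 8, 12, 16, np.inf],
    labels=[1, 2, 3, 4, 5],
)
print(df)
```

pd.cut returns a categorical column; append .astype(int) if you need plain integers.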

5 Comments

Try df['Group'] = df['Participant'].str[-1]. Nice answer, Erfan bhai.
This worked for me! Thank you. I need to look more into pandas functions.
Glad I could help :) @CatM Yes you should check the pandas docs, for 90% of your needs, there is a pandas function
Do you advise any book, website, etc?
As an entry-level introduction I would suggest Data School on YouTube; besides that, the pandas documentation is very helpful as well and covers everything pandas offers. @CatM
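Building on the Group suggestion above: .str[-1] returns the last character of each ID as a string ('1' or '2'). A small sketch that also converts it to an integer, using the Participant IDs from the question:

```python
import pandas as pd

df = pd.DataFrame({"Participant": ["G01S01", "G01S02", "G02S01", "G01S01", "G01S01"]})

# .str[-1] takes the last character of each ID;
# astype(int) turns '1'/'2' into the numbers 1/2
df["Group"] = df["Participant"].str[-1].astype(int)
print(df)
```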

I think you want to use the apply method of the DataFrame. It takes a function as an argument and applies that function to every row of the DataFrame (or every column, depending on the value of axis). From your code example, I suspect that this would be a meaningful function:

def derive_epoch(row):
    if row['Block'] < 5:
        return 1
    elif row['Block'] < 9:
        return 2
    elif row['Block'] < 13:
        return 3
    elif row['Block'] < 17:
        return 4
    else:
        return 5

Then, I just apply it like this:

df['Epoch'] = df.apply(derive_epoch, axis=1)

I hope that helps!
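Since derive_epoch only reads Block, the same logic can be applied to that single column with Series.apply, which passes plain values instead of building a Series for every row and is typically faster than apply with axis=1. A sketch with a hypothetical helper, derive_epoch_from_block, that takes the Block value directly:

```python
import pandas as pd


def derive_epoch_from_block(block):
    """Hypothetical variant of derive_epoch that takes a Block value, not a row."""
    if block < 5:
        return 1
    elif block < 9:
        return 2
    elif block < 13:
        return 3
    elif block < 17:
        return 4
    else:
        return 5


df = pd.DataFrame({"Block": [1, 2, 5, 8, 10]})

# Series.apply calls the function once per value of the Block column
df["Epoch"] = df["Block"].apply(derive_epoch_from_block)
print(df)
```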

1 Comment

Apply is horribly slow and in general should not be used. The more performant solution would be to use np.select.
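A sketch of the np.select approach mentioned in this comment, assuming the same boundaries as the question's loop (everything above 16 is Epoch 5). np.select evaluates whole-column conditions in order and, for each row, takes the choice of the first condition that is True:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Block": [1, 2, 5, 8, 10, 16, 17]})

block = df["Block"]
# Conditions are checked in order; the first True one picks the choice
conditions = [block <= 4, block <= 8, block <= 12, block <= 16]
choices = [1, 2, 3, 4]

# default=5 plays the role of the final else branch (Block > 16)
df["Epoch"] = np.select(conditions, choices, default=5)
print(df)
```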

You can use floor division (//) on the 'Block' column directly to get the epoch number. Subtract 1 first so that Blocks 4, 8, 12, ... stay in the right epoch:

df['Epoch'] = (df['Block'] - 1) // 4 + 1


Another very simple solution:

# Import pandas
import pandas as pd

# Read csv file
df = pd.read_csv('merge.csv')

# Add epoch column: Blocks 1-4 -> 1, 5-8 -> 2, and so on
df['Epoch'] = (df['Block'] - 1) // 4 + 1
# Add group column from the last character of the participant ID
df['Group'] = df['Participant'].str[-1].astype(int)

print(df)
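One caveat with the plain Block // 4 + 1 formula: the last Block of each group of four lands in the next epoch (for example 8 // 4 + 1 is 3, not 2). Subtracting 1 before dividing keeps each run of four Blocks together. A quick comparison on Blocks 1 to 9:

```python
import pandas as pd

blocks = pd.Series(range(1, 10), name="Block")

naive = blocks // 4 + 1          # 4 -> 2 and 8 -> 3: boundary Blocks shift up
shifted = (blocks - 1) // 4 + 1  # 1-4 -> 1, 5-8 -> 2, 9 -> 3

print(pd.DataFrame({"Block": blocks, "naive": naive, "shifted": shifted}))
```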
