0

I have two dataframes with the same columns: 1) the response key pressed by a participant and 2) the response time in seconds, with millisecond precision (s.ms). The first holds the subject's responses. For example,

subjectData = 

  Key     RT
0   v   2.20
1   v   4.34
2   v   5.51
3   v  10.39
4   w  12.50
5   v  14.62
6   v  20.22

I also have a dataframe that is the 'correct' responses and times. For example,

correctData = 

  Key     RT
0   v   2.25
1   w   4.34
2   v   5.61
3   v  20.30

I want to flag each correct response that is matched in both response key AND response time (within ±1 second). So, first check that the response key matches and, if it does, compare the times at which the two responses occurred. If they are within 1 s of each other, the response is deemed correct. Note that the subject may have responded more times than there are correct responses, so I want to compare these columns regardless of row order. For example, the subjectData row at index 6 matches the correctData row at index 3 (same key, within one second). Because of this, the last entry in the output is TRUE, indicating that this correct answer was matched.

So the end result should look like this

TRUE
FALSE
TRUE
TRUE

Notice that the output is the same length as the correctData dataframe and indicates which correct answers are matched in subjectData. That is, the subject got an answer correct IF they pressed the correct button within one second of the 'correct' time listed in correctData. Please note that these dataframes will most likely NOT be the same length (the subject may respond more or fewer times than the 'correct' number of responses), so a 'join' may not work here.

Any ideas on how to do this most efficiently?

2
  • If the dataframes are different lengths, should the resulting boolean list be the same length as the subjectData or the correctData? Commented Sep 30, 2019 at 20:10
  • The resulting boolean list should be the length of correctData if possible. Thank you! Commented Sep 30, 2019 at 20:27

4 Answers

2
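import pandas as pd

subjectData = pd.DataFrame({'Key': ['v', 'v', 'v', 'v', 'w', 'v', 'v'],
                            'RT': [2.20, 4.34, 5.51, 10.39, 12.50, 14.62, 20.22]})

correctData = pd.DataFrame({'Key': ['v', 'w', 'v', 'v'],
                            'RT': [2.25, 4.34, 5.61, 20.30]})

# Pair every correct row with every subject row that has the same Key;
# reset_index() keeps the original correctData row labels in an 'index' column.
df = subjectData.merge(correctData.reset_index(), on='Key', how='right',
                       suffixes=['_subj', '_corr'])

# Keep only the pairs whose response times differ by at most 1 second.
df_timed = df[(df['RT_subj'] - df['RT_corr']).between(-1, 1)]

# A correct row is matched if at least one of its pairs survived the filter.
correctData.index.isin(df_timed['index'])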

Output:

array([ True, False,  True,  True])
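As a possible alternative to the key-wise merge above (which pairs every correct row with every subject row sharing its key), here is a minimal sketch using pd.merge_asof, which looks up only the nearest subject response per correct row. This is a swapped-in technique, not part of the answer as posted; the column name RT_subj and the variables left, right, merged and matched are made up for illustration, and both inputs are sorted by response time because merge_asof requires that.

```python
import pandas as pd

subjectData = pd.DataFrame({'Key': ['v', 'v', 'v', 'v', 'w', 'v', 'v'],
                            'RT': [2.20, 4.34, 5.51, 10.39, 12.50, 14.62, 20.22]})
correctData = pd.DataFrame({'Key': ['v', 'w', 'v', 'v'],
                            'RT': [2.25, 4.34, 5.61, 20.30]})

# merge_asof needs both frames sorted by the column being matched on.
# reset_index() keeps the original correctData labels in an 'index' column.
left = correctData.reset_index().sort_values('RT')
right = subjectData.rename(columns={'RT': 'RT_subj'}).sort_values('RT_subj')

# For each correct row, find the nearest subject response with the same Key,
# but only if it lies within 1 second; otherwise RT_subj is left as NaN.
merged = pd.merge_asof(left, right, left_on='RT', right_on='RT_subj',
                       by='Key', tolerance=1.0, direction='nearest')

# A correct row counts as matched when a subject response was found for it.
matched = (merged.set_index('index')['RT_subj'].notna()
           .reindex(correctData.index))
print(matched.to_numpy())
```

Because the nearest same-key response is within 1 second exactly when some same-key response is, this should flag the same rows as the merge-and-filter approach on the sample data.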

7 Comments

Hi and thank you for your answer! I get an error that says, "ValueError: Can only compare identically-labeled Series objects"
@Katie Oh sorry, earlier I missed the requirement for dataframes of different lengths. I've edited my answer to support that as well.
@EliadL I thought this worked, but upon closer inspection, it's not quite right. I am hoping to compare the columns regardless of row order. That is, it may be that the response at index 3 in the correctData dataframe matches the response at index 6 in the subjectData dataframe. Is there a way to ensure that, regardless of row, I am getting those that match? Thank you!
@Katie please edit your question to include and demonstrate this new criterion.
I have edited the question, hopefully this is more clear. Thank you!
1

1) Use Series.eq to compare the Key column of both dataframes:

cond1 = subjectData['Key'].eq(correctData['Key'])

2) Then check whether the response time is within ±1 s:

cond2 = (subjectData['RT'] < (correctData['RT'] + 1)) & (subjectData['RT'] > (correctData['RT'] - 1))

3) Finally, check which rows meet both conditions (cond1, cond2):

cond1 & cond2

0     True
1    False
2     True
3    False
dtype: bool
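The ValueError reported in the comments can be traced to how pandas compares Series of different lengths. The sketch below (variable names are illustrative, and the data is the RT column from the question) contrasts the operator form, which refuses to compare Series whose indexes differ, with the method form Series.lt, which aligns on the index first.

```python
import pandas as pd

subject_rt = pd.Series([2.20, 4.34, 5.51, 10.39, 12.50, 14.62, 20.22])
correct_rt = pd.Series([2.25, 4.34, 5.61, 20.30])

# Operator form: raises because the two indexes (0..6 and 0..3) differ.
try:
    subject_rt < correct_rt + 1
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Method form: aligns the indexes first; labels that exist on only one
# side are compared against NaN, which evaluates to False.
aligned = subject_rt.lt(correct_rt + 1)
print(aligned.tolist())
```

Even when the comparison runs, it is still row by row: row 6 of subjectData is never compared with row 3 of correctData, so this style of check cannot satisfy the "regardless of order" requirement on its own.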

1 Comment

Hello and thank you for your help! I got an error stating, "ValueError: Can only compare identically-labeled Series objects"
1

I'd use numpy.isclose

(subjectData.Key == correctData.Key) & np.isclose(subjectData.RT, correctData.RT, atol=1)

0     True
1    False
2     True
3    False
Name: Key, dtype: bool
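To make the np.isclose idea work for dataframes of different lengths and in any row order, one option is to broadcast every correct row against every subject row. This is only a sketch under the question's sample data; the intermediate names (corr_key, subj_rt, and so on) are made up for illustration. Note that np.isclose tests |a - b| <= atol + rtol * |b|, so rtol is set to 0 here to keep the window at exactly 1 second.

```python
import numpy as np
import pandas as pd

subjectData = pd.DataFrame({'Key': ['v', 'v', 'v', 'v', 'w', 'v', 'v'],
                            'RT': [2.20, 4.34, 5.51, 10.39, 12.50, 14.62, 20.22]})
correctData = pd.DataFrame({'Key': ['v', 'w', 'v', 'v'],
                            'RT': [2.25, 4.34, 5.61, 20.30]})

# Column vectors for the correct rows, row vectors for the subject rows.
corr_key = correctData['Key'].to_numpy()[:, None]   # shape (4, 1)
corr_rt = correctData['RT'].to_numpy()[:, None]
subj_key = subjectData['Key'].to_numpy()[None, :]   # shape (1, 7)
subj_rt = subjectData['RT'].to_numpy()[None, :]

# Broadcasting yields one (correct row, subject row) cell per pair.
same_key = corr_key == subj_key                           # shape (4, 7)
close_rt = np.isclose(subj_rt, corr_rt, rtol=0, atol=1)   # shape (4, 7)

# A correct row is matched if any subject row agrees on key and time.
matched = (same_key & close_rt).any(axis=1)
print(matched)
```

The intermediate arrays have one cell per pair of rows, so memory grows with the product of the two lengths; for short response lists like these that is unlikely to matter.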


0

See if this works.

cutoff_at_index = min(correctData.shape[0], subjectData.shape[0])
equal = subjectData.Key[:cutoff_at_index] == correctData.Key[:cutoff_at_index]
between = ((subjectData.RT[:cutoff_at_index] >= correctData.RT[:cutoff_at_index] - 1)
           & (subjectData.RT[:cutoff_at_index] <= correctData.RT[:cutoff_at_index] + 1))
equal & between

1 Comment

Hi and thank you for your answer! I get an error that says, "ValueError: Can only compare identically-labeled Series objects"
