1

I'm currently trying to iterate through a dataframe/csv and compare the dates of the rows with the same ID. If the dates are different or are a certain time-frame apart I want to create a '1' in another column (not shown) to mark that ID and row/s.

I'm looking to save the DATE values as variables and compare them against other DATE variables with the same ID. If the dates are set amount of time apart I'll create a 1 in another column on the same row.

ID DATE
1 11/11/2011
1 11/11/2011
2 5/05/2011
2 20/06/2011
3 2/04/2011
3 10/08/2011
4 8/12/2011
4 1/02/2012
4 12/03/2012

For this post, I'm mainly looking to save the multiple values as variables or a list. I'm hoping to figure out the rest once this roadblock has been removed.

Here's what I got currently, but I don't think it'll be much help. Currently it iterates through and converts the date strings to dates. Which is what I want to happen AFTER getting a list of all the dates with the same ID value.

import pandas as pd
from datetime import *
filename = 'TestData.csv'

df = pd.read_csv(filename)

print (df.iloc[0,1])

x = 0

for i in df.iloc:

FixDate = df.iloc[x, 1]

d1, m1, y1 = FixDate.split('/')

d1 = int(d1)
m1 = int(m1)
y1 = int(y1)

finaldate = date(y1, m1, d1)

print(finaldate)

x = x + 1

Any help is appreciated, thank you!

5
  • So finnaly need new column filled by 1/0 or something else? Commented Oct 5, 2022 at 4:56
  • Yeah I'd need a new column. But I'd imagine I'd add that at the start before going through the dates. Doesn't matter what it's filled with. It's a visual indicator of which rows need to be manually changed. Commented Oct 5, 2022 at 5:15
  • But I'd imagine I'd add that at the start before going through the dates. Not understand. Commented Oct 5, 2022 at 5:17
  • The dataframe doesn't originally contain the 'test' column. To prevent problems my current plan is to add a new column before going through the dates. I believe your solution fixes that issue anyways. Commented Oct 5, 2022 at 5:23
  • Ya, it depends what need. First I think need new column test, if need sometnigh else new column is not necessary. Commented Oct 5, 2022 at 5:24

1 Answer 1

1

In pandas for performance is best avoid loops, if need new column tested if same values in DATE per groups use GroupBy.transform with DataFrameGroupBy.nunique and then compare values by 1:

df = pd.read_csv(filename)

df['test'] = df.groupby('ID')['DATE'].transform('nunique').eq(1).astype(int)
print (df)
   ID        DATE  test
0   1  11/11/2011     1
1   1  11/11/2011     1
2   2   5/05/2011     0
3   2  20/06/2011     0
4   3   2/04/2011     0
5   3  10/08/2011     0
6   4   8/12/2011     0
7   4   1/02/2012     0
8   4  12/03/2012     0

If need filter matched rows:

mask = df.groupby('ID')['DATE'].transform('nunique').eq(1)

df1 = df[mask]
print (df1)
   ID        DATE
0   1  11/11/2011
1   1  11/11/2011

In last step convert values to lists:

IDlist  = df1['ID'].tolist()
Sign up to request clarification or add additional context in comments.

2 Comments

That's awesome, I'll read through the documentation you've linked so I can understand how to create something similar for the future. I'll see if what you've sent is the solution I need. I tried to keep post fairly small, but I realize the post may need context. What I actually need to do is create a mark for any DATE that are two months apart and have the same ID. Which is why I thought I could save the date values as variables?
@Ciecmah - Ya, in pandas the best avoid loops, it is obviously many lines of code, slow and mainly not necessary (if exist pandas function for it)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.