How to save multiple values on different rows as a variable or list in a CSV using Python Pandas

Question

I'm currently trying to iterate through a dataframe/csv and compare the dates of the rows with the same ID. If the dates are different or are a certain time-frame apart I want to create a '1' in another column (not shown) to mark that ID and row/s.

I'm looking to save the DATE values as variables and compare them against other DATE variables with the same ID. If the dates are set amount of time apart I'll create a 1 in another column on the same row.

ID	DATE
1	11/11/2011
1	11/11/2011
2	5/05/2011
2	20/06/2011
3	2/04/2011
3	10/08/2011
4	8/12/2011
4	1/02/2012
4	12/03/2012

For this post, I'm mainly looking to save the multiple values as variables or a list. I'm hoping to figure out the rest once this roadblock has been removed.

Here's what I got currently, but I don't think it'll be much help. Currently it iterates through and converts the date strings to dates. Which is what I want to happen AFTER getting a list of all the dates with the same ID value.

import pandas as pd
from datetime import *
filename = 'TestData.csv'

df = pd.read_csv(filename)

print (df.iloc[0,1])

x = 0

for i in df.iloc:

FixDate = df.iloc[x, 1]

d1, m1, y1 = FixDate.split('/')

d1 = int(d1)
m1 = int(m1)
y1 = int(y1)

finaldate = date(y1, m1, d1)

print(finaldate)

x = x + 1

Any help is appreciated, thank you!

So finnaly need new column filled by 1/0 or something else? — jezrael
– jezrael, Commented Oct 5, 2022 at 4:56
Yeah I'd need a new column. But I'd imagine I'd add that at the start before going through the dates. Doesn't matter what it's filled with. It's a visual indicator of which rows need to be manually changed. — Ciecmah
– Ciecmah, Commented Oct 5, 2022 at 5:15
But I'd imagine I'd add that at the start before going through the dates. Not understand. — jezrael
– jezrael, Commented Oct 5, 2022 at 5:17
The dataframe doesn't originally contain the 'test' column. To prevent problems my current plan is to add a new column before going through the dates. I believe your solution fixes that issue anyways. — Ciecmah
– Ciecmah, Commented Oct 5, 2022 at 5:23
Ya, it depends what need. First I think need new column test, if need sometnigh else new column is not necessary. — jezrael
– jezrael, Commented Oct 5, 2022 at 5:24

jezrael · Accepted Answer · 2022-10-05 04:55:44Z

1

In pandas for performance is best avoid loops, if need new column tested if same values in DATE per groups use GroupBy.transform with DataFrameGroupBy.nunique and then compare values by 1:

df = pd.read_csv(filename)

df['test'] = df.groupby('ID')['DATE'].transform('nunique').eq(1).astype(int)
print (df)
   ID        DATE  test
0   1  11/11/2011     1
1   1  11/11/2011     1
2   2   5/05/2011     0
3   2  20/06/2011     0
4   3   2/04/2011     0
5   3  10/08/2011     0
6   4   8/12/2011     0
7   4   1/02/2012     0
8   4  12/03/2012     0

If need filter matched rows:

mask = df.groupby('ID')['DATE'].transform('nunique').eq(1)

df1 = df[mask]
print (df1)
   ID        DATE
0   1  11/11/2011
1   1  11/11/2011

In last step convert values to lists:

IDlist  = df1['ID'].tolist()

edited Oct 5, 2022 at 4:55

answered Oct 5, 2022 at 4:40

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Ciecmah Over a year ago

That's awesome, I'll read through the documentation you've linked so I can understand how to create something similar for the future. I'll see if what you've sent is the solution I need. I tried to keep post fairly small, but I realize the post may need context. What I actually need to do is create a mark for any DATE that are two months apart and have the same ID. Which is why I thought I could save the date values as variables?

jezrael Over a year ago

@Ciecmah - Ya, in pandas the best avoid loops, it is obviously many lines of code, slow and mainly not necessary (if exist pandas function for it)

Collectives™ on Stack Overflow

How to save multiple values on different rows as a variable or list in a CSV using Python Pandas

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related