I have a pandas data frame of 30 000 rows that looks like this:

ID     year    month    var1-var300    test   

1111   2017    7        ...            1      
1111   2017    9        ...            0      
2222   2017    6        ...            1      
2222   2017    6        ...            0      
2222   2016    6        ...            0      
3333   2017    3        ...            1      
3333   2017    3        ...            0     
3333   2015    8        ...            0      
...

Here is what I want to do for each row: if test=1, I would like to extract the variables "ID year month", loop over the entire data frame and if this combination of variables is found in any other row, assign 1 to a new variable 'check'. The final data frame should look like this:

ID     year    month    var1-var300    test   check
1111   2017    7        ...            1      0
1111   2017    9        ...            0      0
2222   2017    6        ...            1      1
2222   2017    6        ...            0      0
2222   2016    6        ...            0      0
3333   2017    3        ...            1      1
3333   2017    3        ...            0      0
3333   2015    8        ...            0      0
...

Here is some kind of pseudo-code I have imagined:

for line in df:
    if line['test']=1:
        I=line['ID']
        Y=line['year']
        MO=line['month']
        for row in df:
            if row['ID']=I & row['year']=Y & row['month']=MO:
                line['check']=1
                break

Any idea how to write similar code that works in Python?

You should not use the assignment operator "=" there; use the comparison operator "==" instead. Commented Nov 25, 2020 at 10:05

4 Answers

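You should be able to invert your logic:

  1. Group by ID, year and month
  2. Do your check inside each group

df['check'] = 0  # default for rows that do not match

def func(group):
    # more than one row in the group means the combination occurs in another row
    if len(group) > 1:
        group.loc[group['test'] == 1, 'check'] = 1
    return group

# group_keys=False keeps the original index instead of prepending the group keys
df = df.groupby(['ID', 'year', 'month'], group_keys=False).apply(func)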

I think you can just use a transform to count the rows in each group and derive the result from that count. It takes only two lines.

This is my solution.

Create Test Data:

import pandas as pd
ID = [1111, 1111, 2222, 2222, 2222, 3333, 3333, 3333]
year = [2017, 2017, 2017, 2017, 2016, 2017, 2017, 2015]
month = [7, 9, 6, 6, 6, 3, 3, 8]
test = [1, 0, 1, 0, 0, 1, 0, 0]
df = pd.DataFrame({
    "ID": ID,
    "year": year,
    "month": month,
    "test": test
})

Get the result:

df.loc[:, "group_count"] = df.groupby(["ID", "year", "month"]).transform("count").values
df.loc[:, "check"] = ((df["test"]>0) & (df["group_count"] > 1)).astype(int)

3 Comments

Interesting approach... thank you! :-)
I tried the first line of your solution but I got an error. Here is a version that works: df["group_count"] = df.groupby(["ID", "year", "month"])["ID"].transform("count")
@Falco I did not get an error on my local machine. I guess it may be due to the version: I am on Python 3.7, pandas 0.25.1.
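
The variant from the comment above selects a single column before calling transform, so the result stays one-dimensional even when the frame has many other columns (such as var1-var300). A minimal, self-contained sketch of that variant on the sample data from this answer; the helper name `keys` is only introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1111, 1111, 2222, 2222, 2222, 3333, 3333, 3333],
    "year": [2017, 2017, 2017, 2017, 2016, 2017, 2017, 2015],
    "month": [7, 9, 6, 6, 6, 3, 3, 8],
    "test": [1, 0, 1, 0, 0, 1, 0, 0],
})

keys = ["ID", "year", "month"]  # illustrative name for the grouping columns

# Selecting one column first makes transform return a Series aligned with df's index.
df["group_count"] = df.groupby(keys)["ID"].transform("count")

# A row is flagged when test == 1 and its key combination occurs in more than one row.
df["check"] = ((df["test"] == 1) & (df["group_count"] > 1)).astype(int)

print(df)
```

On this sample the check column matches the expected output in the question.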

So you want a single column indicating whether each row's ID, year and month match the ID, year and month of a row with test == 1?

You can iterate with iterrows():

# first pass: collect the key combinations of all rows with test == 1
to_check = []
for index, row in df.iterrows():
    if row['test']==1: # in your pseudocode, you use single =; that's for assigning variables
        to_check.append([row['ID'], row['year'], row['month']])

# second pass: flag every row whose key combination was collected above
check = []
for index, row in df.iterrows():
    if [row['ID'], row['year'], row['month']] in to_check:
        check.append(1)
    else:
        check.append(0)
df["check"] = check
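
The two passes above can also be written with a set of key tuples, which avoids scanning a list for every row. This is only a sketch of the same interpretation, assuming the sample data from the question; the names `keys` and `flagged` are hypothetical. Note that under this interpretation every row sharing a key with a test == 1 row is marked, including the test == 1 row itself, so the result differs from the expected output shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1111, 1111, 2222, 2222, 2222, 3333, 3333, 3333],
    "year": [2017, 2017, 2017, 2017, 2016, 2017, 2017, 2015],
    "month": [7, 9, 6, 6, 6, 3, 3, 8],
    "test": [1, 0, 1, 0, 0, 1, 0, 0],
})

keys = ["ID", "year", "month"]  # illustrative name for the key columns

# Collect the key tuples of all rows with test == 1 into a set for fast membership tests.
flagged = set(df.loc[df["test"] == 1, keys].itertuples(index=False, name=None))

# Mark every row whose key tuple is in that set (the test == 1 rows match themselves).
df["check"] = [
    int(key in flagged)
    for key in df[keys].itertuples(index=False, name=None)
]

print(df)
```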

You can make some changes as below and try:

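df['check'] = 0  # default value for every row
# iterating over df directly yields column names, so use iterrows() to get rows
for i, line in df.iterrows():
    if line['test']==1:
        I=line['ID']
        Y=line['year']
        MO=line['month']
        for j, row in df.iterrows():
            # skip the row itself so that only another row counts as a match
            if j!=i and row['ID']==I and row['year']==Y and row['month']==MO:
                df.at[i, 'check']=1  # write to df; assigning to line would not change df
                break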
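
For comparison, the nested loop above can be expressed without explicit iteration. The following is a minimal sketch, assuming the sample data from the question, that swaps in a different technique: DataFrame.duplicated with keep=False, which flags every row whose ID/year/month combination occurs more than once. The name `keys` is only introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1111, 1111, 2222, 2222, 2222, 3333, 3333, 3333],
    "year": [2017, 2017, 2017, 2017, 2016, 2017, 2017, 2015],
    "month": [7, 9, 6, 6, 6, 3, 3, 8],
    "test": [1, 0, 1, 0, 0, 1, 0, 0],
})

keys = ["ID", "year", "month"]  # illustrative name for the key columns

# True for every row whose key combination appears in at least one other row.
has_twin = df.duplicated(subset=keys, keep=False)

# Flag only the test == 1 rows that have such a twin.
df["check"] = (df["test"].eq(1) & has_twin).astype(int)

print(df)
```

On 30 000 rows this avoids the quadratic cost of comparing every test == 1 row against the whole frame.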
