Multiple If Statements in Python Dataframe

Question

I have a dataframe called df1 and a list of dataframes called list.

In each of them exists columns date, like 2019-01-01 and another columns ID (not unique), and some other stuff.

Example:

df1

ID   date         Name
111  2019-01-01   John
222  2019-01-01   Smith
333  2019-01-01   Sam

list = [df_A, df_B, df_C]

# Example from a list:

df_A 

ID   date        Name
111  2019-01-02  Katrin
222  2019-01-02  Ivan
333  2019-01-02  Leo

df_B

ID   date         Name
111  2019-01-01   John
222  2019-01-01   Smith
333  2019-01-01   Sam

df_C

ID   date        Name
111  2019-01-09  Sam_1
222  2019-01-09  Leo_1
333  2019-01-09  Marcel

I want to append values to df1 based on ID and Date from this list of dataframes.

Conditions are:

If the max date for ID 111 in df1 are equal than the max date for ID 111 in one of the df from list, then do nothing.
If the max date for ID 222 in df1 are less than the max date for ID 222 in one of the df from list, then do some stuff.

How the putput should look like:

df1

ID   date         Name
111  2019-01-01   John
222  2019-01-01   Smith
333  2019-01-01   Sam
111  2019-01-02  Katrin
222  2019-01-02  Ivan
333  2019-01-02  Leo
111  2019-01-09  Sam_1
222  2019-01-09  Leo_1
333  2019-01-09  Marcel

The date from df_B is equal to df1, so we don't update df1, but for other 2 dfs we need to append their values to df1

it would be desirable to add a concrete example of a expected output — Yuca
– Yuca, Commented Aug 21, 2019 at 12:52
thanks, also note that this is not a minimal reproducible example since there's no df_C, please follow the guidelines so it's easier to help — Yuca
– Yuca, Commented Aug 21, 2019 at 13:18

Code Different · Accepted Answer · 2019-08-21 15:07:03Z

1

Don't use list as a variable name as it's a Python builtin data type. I'd also concatenate df_A, df_B, and df_C into a single dataframe for easier manipulation:

# Concatenate df_A, df_B, df_C into a single frame, called df2
df2 = pd.concat([df_A, df_B, df_C], ignore_index=True)

# Line up df1 and df2 by ID so we can compare their dates
compare = df1[['ID', 'date']].merge(df2, on='ID', suffixes=('1', '2'))

# For cases where date1 < date2, append them to df1
new_df = compare.query('date1 < date2').rename(columns={'date2': 'date'})[['ID', 'date', 'Name']]
df1 = df1.append(new_df, ignore_index=True)

answered Aug 21, 2019 at 15:07

Code Different

93.4k16 gold badges154 silver badges175 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

António Caeiro · Accepted Answer · 2019-08-21 14:40:41Z

i would use pandas.Dataframe.groupby and pandas.Dataframe.append(assuming your date column is in date format) and do something like:

# i don't think you should use list so i renamed it to my_list
my_list = [df_A, df_B, df_C]

for cdf in my_list:
    # in original dataframe group by ID get max date
    group_df1 =  df1.groupby(['ID']).max()['date']
    # in other dataframe group by ID get max date
    group_cdf = cdf.groupby(['ID']).max()['date']
    # get IDs to add 
    res = group_cdf > group_df1
    group_cdf = group_cdf.loc[res[res==True].index]    
    df1 = df1.append(cdf.loc[cdf['ID'].isin(group_cdf.index) & cdf['date'].isin(group_cdf)])
print(df1)

here is a full code with your example:

df1 = pd.DataFrame(          \
[[111,'2019-01-01','John'], \
[222,'2019-01-01','Smith'], \
[333,'2019-01-01','Sam']])
df1.columns = ['ID','date','Name']
df1['date'] = pd.to_datetime(df1['date'])


df_A = pd.DataFrame(          \
[[111,'2019-01-02','Katrin'], \
[222,'2019-01-02','Ivan'], \
[333,'2019-01-02','Leo']])
df_A.columns = ['ID','date','Name']
df_A['date'] = pd.to_datetime(df_A['date'])

df_B = pd.DataFrame(          \
[[111,'2019-01-01','John'], \
[222,'2019-01-01','Smith'], \
[333,'2019-01-01','Sam']])
df_B.columns = ['ID','date','Name']
df_B['date'] = pd.to_datetime(df_B['date'])


df_C = pd.DataFrame(          \
[[111,'2019-01-09','Sam_1'], \
[222,'2019-01-09','Leo_1'], \
[333,'2019-01-09','Marcel']])
df_C.columns = ['ID','date','Name']
df_C['date'] = pd.to_datetime(df_C['date'])

my_list = [df_A, df_B, df_C]

for cdf in my_list:
    group_df1 =  df1.groupby(['ID']).max()['date']
    group_cdf = cdf.groupby(['ID']).max()['date'] 
    res = group_cdf > group_df1
    group_cdf = group_cdf.loc[res[res==True].index]    
    df1 = df1.append(cdf.loc[cdf['ID'].isin(group_cdf.index) & cdf['date'].isin(group_cdf)])
print(df1)

i get the following result:

   ID  date       Name   
0  111 2019-01-01    John
1  222 2019-01-01   Smith
2  333 2019-01-01     Sam
0  111 2019-01-02  Katrin
1  222 2019-01-02    Ivan
2  333 2019-01-02     Leo
0  111 2019-01-09   Sam_1
1  222 2019-01-09   Leo_1
2  333 2019-01-09  Marcel

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html

Hello, it seems like what I need, but when I've tried with my data, it throws an error, ValueError: Can only compare identically-labeled Series objects, when it tries to compare group_cdf and group_df1. I checked date and ID are the same type, but the indexes are not the same and dropping them didn't help. Maybe you know, if there are any other possibilities to compare these 2 dfs?

Collectives™ on Stack Overflow

Multiple If Statements in Python Dataframe

2 Answers 2

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related