
I am having trouble creating a loop over this data:

            TCT
03/02/2020  105
03/03/2020  68
03/16/2020  55
03/08/2020  37
03/10/2020  36

obtained with high = df['Date'].value_counts().to_frame('TCT').head(5)

I would like to check, for each date, whether certain words appear in my dataframe. To search for the words I do the following:

word = ['mum', 'old man', 'children', 'family']
sub_df.apply(lambda x: x.str.contains('|'.join(word))).any(axis=1)

where sub_df is defined as follows:

ref = '03/02/2020'
sub_df = df[df['Date'] == ref]

Example

Date               Tt
03/02/2020         indent code by 4 spaces ...
03/02/2020         backtick escapes
...
03/03/2020         add language identifier to highlight code
03/03/2020         create code fences with backticks ` or tildes ~...
...
03/06/2020         to make links (use https whenever possible)

How could I include a loop on the above dates?

  • For starters, please provide a few rows of sub_df. Commented Jul 4, 2020 at 18:37
  • Why not just set the date column as an index column, df.set_index(....)? Commented Jul 4, 2020 at 18:38
  • @Balaji Ambresh, I updated the dataset Commented Jul 4, 2020 at 18:42

1 Answer

df.set_index('date_column', inplace=True)

df.loc[ref].query(f'column == "{value}"')

# or 

def is_substr(row, value):
    # Return the row's text if it contains `value`, otherwise None
    if value in row:
        return row
    return None

df.loc[ref]['column'].apply(is_substr, args=['sub_string'])

Then use df.isna().sum() to count the rows without a match, or df.dropna() to remove them.


df = pd.DataFrame({'date':['1/2/2020']*3, 'col':['blah_1', 'blah_2', 'n32']})

df.set_index('date', inplace=True)

df.loc['1/2/2020']['col'].apply(is_substr, args=['2'])

date
1/2/2020      None
1/2/2020    blah_2
1/2/2020       n32
Name: col, dtype: object
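Putting the pieces together, here is a minimal sketch of the loop the question asks for. The column names Date and Tt and the word list are taken from the question; the row data itself is made up for illustration:

```python
import pandas as pd

# Made-up sample data using the question's column names
df = pd.DataFrame({
    'Date': ['03/02/2020', '03/02/2020', '03/03/2020', '03/03/2020', '03/06/2020'],
    'Tt':   ['my mum said', 'no match here', 'the old man', 'whole family', 'children play'],
})

words = ['mum', 'old man', 'children', 'family']
pattern = '|'.join(words)

# Loop over the five most frequent dates and test each row's text
for ref in df['Date'].value_counts().head(5).index:
    sub_df = df[df['Date'] == ref]
    mask = sub_df['Tt'].str.contains(pattern)
    print(ref, mask.any(), mask.sum())
```

Each iteration builds the same sub_df as in the question, so any per-date logic (dropna, counting matches, etc.) can go inside the loop body.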

9 Comments

Thank you for your answer @m-zayan. I have a question: since my sub-dataset is produced by a counter, the date column has no name; what should I use in your code? The dates should be those already selected by the groupby, i.e. the ones with the highest frequency.
If you mean that you have already selected the sub_data date indices, then you can directly use sub_data.apply(...) to check whether a specific substring is contained in each row, and then sub_data.dropna() to remove the rows which don't contain the desired substring. In case you need to get the dates with the highest frequency: numpy.unique(column, return_counts=True).
Thank you @m-zayan. Since in the first table in the question the column name Date is missing, how can I refer to it in your code?
You can rename it, but it seems to me that the date column is already an index column; you can check sub_df.index, i.e. you can refer to it as df.index.
df.loc['1/2/2020'] is equivalent to sub_df = df[df['Date'] == ref] with ref = '1/2/2020' as the reference date; I am using the date column as an index column, which is more efficient. What you actually need is sub_df.apply(..); all the other functions are just complementary. So if you already have your own code to get the sub_df with the highest-frequency date, you can ignore any of those functions, e.g. .set_index(..), but note that you need to use .apply(..) on a specific column (a pd.Series).
