
I am having trouble creating a loop over this data:

            TCT
03/02/2020  105
03/03/2020  68
03/16/2020  55
03/08/2020  37
03/10/2020  36

obtained with high = df['Date'].value_counts().to_frame('TCT').head(5)

I would like to check, for each date, whether certain words appear in my dataframe. To search for the words I do the following:

word = ['mum', 'old man', 'children', 'family']
sub_df.apply(lambda x: x.str.contains('|'.join(word))).any(axis=1)

where sub_df is defined as follows:

ref = '03/02/2020'
sub_df = df[df['Date'] == ref]

Example

Date               Tt
03/02/2020         indent code by 4 spaces ...
03/02/2020         backtick escapes
...
03/03/2020         add language identifier to highlight code
03/03/2020         create code fences with backticks ` or tildes ~...
...
03/06/2020         to make links (use https whenever possible)

How could I include a loop on the above dates?

  • For starters, please provide a few rows of sub_df. Commented Jul 4, 2020 at 18:37
  • Why not just set the date column as an index column, df.set_index(....)? Commented Jul 4, 2020 at 18:38
  • @Balaji Ambresh, I updated the dataset Commented Jul 4, 2020 at 18:42

1 Answer

df.set_index('date_column', inplace=True)

df.loc[ref].query(f'column == "{value}"')

# or 

def is_substr(row, value):
    # Return the row's text if it contains `value`, otherwise None
    if value in row:
        return row
    return None

df.loc[ref]['column'].apply(is_substr, args=['sub_string'])

Then use df.isna().sum() to count the rows without a match, or df.dropna() to remove them.


df = pd.DataFrame({'date':['1/2/2020']*3, 'col':['blah_1', 'blah_2', 'n32']})

df.set_index('date', inplace=True)

df.loc['1/2/2020']['col'].apply(is_substr, args=['2'])

date
1/2/2020      None
1/2/2020    blah_2
1/2/2020       n32
Name: col, dtype: object
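Putting the pieces together, here is a minimal sketch of the loop the question asks for. The column names Date and Tt and the word list are taken from the question; the row data itself is made up for illustration:

```python
import pandas as pd

# Made-up sample data using the question's column names
df = pd.DataFrame({
    'Date': ['03/02/2020', '03/02/2020', '03/03/2020', '03/03/2020', '03/06/2020'],
    'Tt':   ['my mum said', 'no match here', 'the old man', 'whole family', 'children play'],
})

words = ['mum', 'old man', 'children', 'family']
pattern = '|'.join(words)

# Loop over the five most frequent dates and test each row's text
for ref in df['Date'].value_counts().head(5).index:
    sub_df = df[df['Date'] == ref]
    mask = sub_df['Tt'].str.contains(pattern)
    print(ref, mask.any(), mask.sum())
```

Each iteration builds the same sub_df as in the question, so any per-date logic (dropna, counting matches, etc.) can go inside the loop body.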

9 Comments

Thank you for your answer @m-zayan. I have a question: since my sub-dataset is produced by a counter, the date column has no name; what should I use in your code? The dates should be those already selected by the groupby, i.e. the ones with the highest frequency.
If you mean that you have already selected the sub_data date indices, then you can directly use sub_data.apply(...) to check whether a specific substring is contained in each row, and then sub_data.dropna() to remove the rows which don't contain the desired substring. In case you need to get the dates with the highest frequency: numpy.unique(column, return_counts=True).
Thank you @m-zayan. Since in the first table in the question the column name Date is missing, how can I refer to it in your code?
You can rename it, but it seems to me that the date column is already an index column; you can check sub_df.index, i.e. you can refer to it as df.index.
df.loc['1/2/2020'] is equivalent to sub_df = df[df['Date'] == ref] with ref = '1/2/2020' as the reference date; I am using the date column as an index column, which is more efficient. What you actually need is sub_df.apply(..); all the other functions are just complementary. So if you already have your own code to get the sub_df with the highest-frequency date, you can ignore any of those functions, e.g. .set_index(..), but note that you need to use .apply(..) on a specific column (a pd.Series).
