How to loop through a list and apply multiple filter conditions in Python

Question

This data describes the files in a specific folder. The folder is expected to grow over time, so there will be many files with similar, but not identical, name patterns. The code below finds the file names that match a given pattern and, if there are several matches, selects the latest one based on the Last_modified date. In this example the result is stored in filename1.

Sample data frame:

import pandas as pd
d = {'file_name': ['finding_finding_april_040119_1012', 'finding_finding_april_040119_1111', 'question_answer_april_040119_0915', 'question_answer_april_040119_0945', 'review_rational_040119_0805'],
     'No_of_records': [23, 32, 45, 42, 28],
     'size_in_MB': [10, 15, 8, 12, 10],
     'Last_modified': ['2019-04-01 05:00:15+00:00', '2019-04-01 05:00:20+00:00', '2019-04-01 07:00:15+00:00', '2019-04-01 07:15:15+00:00', '2019-04-01 05:00:15+00:00']}
df = pd.DataFrame(data=d)
df['Last_modified'] = pd.to_datetime(df['Last_modified'])

This is what the table looks like:

Code I am using:

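# keep only the rows whose file_name contains the pattern
mask1 = df['file_name'].str.contains("finding_finding_april")
df2 = df.loc[mask1]
# of those rows, keep the one(s) with the latest Last_modified
mask2 = (df2['Last_modified'] == df2['Last_modified'].max())
df3 = df2.loc[mask2]
# select file_name by label; iloc[0,2] depends on column order and may return size_in_MB
filename1 = df3['file_name'].iloc[0]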

The conditions mask1 and mask2 cannot be combined as mask1 & mask2, because mask2 is computed on the rows that remain after filtering with mask1. The code works as it is, but I think there should be a better way to write it.
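One possible way to avoid the intermediate frames df2 and df3 is to ask pandas for the index label of the latest Last_modified among the matching rows. The following is a minimal sketch, not the only way to do it: it uses a reduced copy of the sample data so that it runs on its own, and it assumes at least one row matches the pattern and that the index labels are unique.

```python
import pandas as pd

# Reduced copy of the sample data, so the snippet runs on its own
df = pd.DataFrame({
    'file_name': ['finding_finding_april_040119_1012',
                  'finding_finding_april_040119_1111',
                  'question_answer_april_040119_0915'],
    'Last_modified': pd.to_datetime(['2019-04-01 05:00:15+00:00',
                                     '2019-04-01 05:00:20+00:00',
                                     '2019-04-01 07:00:15+00:00']),
})

# Rows whose name contains the pattern (regex=False: treat it as plain text)
mask1 = df['file_name'].str.contains('finding_finding_april', regex=False)

# idxmax returns the index label of the latest Last_modified among those rows;
# it raises an error if no row matches the pattern
latest_idx = df.loc[mask1, 'Last_modified'].idxmax()
filename1 = df.loc[latest_idx, 'file_name']
print(filename1)
```

Selecting the column by name rather than by position also keeps the result independent of the column order of the frame.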

Is there a way to improve the code using a nested for loop or a list comprehension?
If I have a list of patterns like the following, how can I loop through the list to create filename1, filename2, and so on, without running the code separately for each pattern?

list = ['finding_finding_april', 'question_answer_april', 'review_rational_april' ... ...]

I know how to loop through a list and do something simple, but I am not sure what to do in this situation.

Can you provide a dataset example for case 2, and the expected output?
– anky, Commented Apr 2, 2019 at 16:28
@anky_91, as I mentioned, I can't apply mask1 & mask2 together; mask2 works on the result I get after filtering with mask1. Case 2 applies to this same example too. Instead of running df['file_name'].str.contains("finding_finding_april") separately each time I want to match a pattern, I want to execute the whole process over a list of patterns.
– singularity2047, Commented Apr 2, 2019 at 16:33

Jeffin Sam · Accepted Answer · 2019-04-02 16:17:39Z

You can iterate through the list, build a list of file names, and append each result to it, as in the following:

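patterns = ['finding_finding_april', 'question_answer_april', 'review_rational_april']
filename = []  # collects the latest matching file name for each pattern
for pattern in patterns:
    # keep only the rows whose file_name contains the pattern
    mask1 = df['file_name'].str.contains(pattern)
    df2 = df.loc[mask1]
    if df2.empty:
        continue  # no file matches this pattern, so there is nothing to append
    # same steps as in the question: keep the most recently modified row
    mask2 = (df2['Last_modified'] == df2['Last_modified'].max())
    df3 = df2.loc[mask2]
    filename.append(df3['file_name'].iloc[0])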
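The same loop can also be written with a small helper function and a dict comprehension, which keeps each result next to the pattern that produced it. This is only a sketch: latest_file is a hypothetical helper name, not something from pandas, and it assumes the column names of the sample frame in the question and a unique index.

```python
import pandas as pd

def latest_file(df, pattern):
    """Return the most recently modified file_name containing pattern, or None."""
    matches = df[df['file_name'].str.contains(pattern, regex=False)]
    if matches.empty:
        return None  # no file name contains this pattern
    # idxmax gives the index label of the latest Last_modified among the matches
    return matches.loc[matches['Last_modified'].idxmax(), 'file_name']

# Sample data from the question (only the columns the helper needs)
df = pd.DataFrame({
    'file_name': ['finding_finding_april_040119_1012',
                  'finding_finding_april_040119_1111',
                  'question_answer_april_040119_0915',
                  'question_answer_april_040119_0945',
                  'review_rational_040119_0805'],
    'Last_modified': ['2019-04-01 05:00:15+00:00', '2019-04-01 05:00:20+00:00',
                      '2019-04-01 07:00:15+00:00', '2019-04-01 07:15:15+00:00',
                      '2019-04-01 05:00:15+00:00'],
})
df['Last_modified'] = pd.to_datetime(df['Last_modified'])

patterns = ['finding_finding_april', 'question_answer_april', 'review_rational_april']

# One entry per pattern; patterns without a match map to None
latest = {pattern: latest_file(df, pattern) for pattern in patterns}
print(latest)
```

With the sample data, 'review_rational_april' maps to None, because the only review file is named 'review_rational_040119_0805' and does not contain that pattern.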

singularity2047 Over a year ago

Thanks a lot. I have to get each file name as filename1 = filename[0], filename2 = filename[1], etc. Can we also create filename1, filename2, ... within the code?
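Two common options for this, sketched here with an assumed stand-in for the filename list that the loop builds: tuple unpacking when the number of results is known in advance, or a dict keyed by generated names when it is not. The dict name named is hypothetical and chosen only for illustration.

```python
# Assumed stand-in for the list built by the loop in the answer
filename = ['finding_finding_april_040119_1111',
            'question_answer_april_040119_0945']

# Tuple unpacking works when the number of results is known in advance;
# it raises ValueError if the lengths do not match
filename1, filename2 = filename

# When the number of patterns varies, a dict keyed by generated names
# avoids creating variables dynamically
named = {f'filename{i}': name for i, name in enumerate(filename, start=1)}
print(named['filename2'])
```

The dict variant tends to scale better as the pattern list grows, since the code that reads the results does not need to know how many names exist.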
