
I am trying to read some .csv field data in Python for post-processing. I typically just use something like:

for flist in glob('*.csv'):
    df = pd.read_csv(flist, delimiter = ',')

However, I need to filter out the bad files: any file that contains "Run_Terminated" somewhere in it should be skipped entirely. I'm still new to Python and not familiar with all of its functionality, so any input would be appreciated. Thank you.

1 Answer

What you could do is first read the file fully into memory (using an io.StringIO file-like object) and look for the Run_Terminated string anywhere in the contents (dirty, but should be OK).

Then pass that handle to read_csv (since you can pass a handle OR a filename) so you don't have to read the file from disk a second time.

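import io
from glob import glob  # import the function; a bare "import glob" makes glob('*.csv') fail

import pandas as pd

for flist in glob('*.csv'):
    with open(flist) as f:
        # copy the whole file into an in-memory text buffer
        data = io.StringIO()
        data.write(f.read())
    # skip the file if the marker appears anywhere in it
    if "Run_Terminated" not in data.getvalue():
        data.seek(0)  # rewind or it won't read anything
        df = pd.read_csv(data, delimiter=',')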
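
If the files are large, holding each one fully in memory may not be ideal. As a rough alternative sketch (not part of the approach above), the check can be done line by line so that a bad file is rejected as soon as the marker shows up. The helper names has_marker and good_csv_files are made up for illustration, and this assumes the files open fine with the default text encoding. The trade-off is that good files end up being read twice: once for the scan, once by read_csv.

```python
from glob import glob


def has_marker(path, marker="Run_Terminated"):
    """Return True if any line of the file at `path` contains `marker`."""
    with open(path, newline="") as f:
        # any() stops at the first matching line, so bad files are rejected early
        return any(marker in line for line in f)


def good_csv_files(pattern="*.csv", marker="Run_Terminated"):
    """Yield paths matching `pattern` whose contents do not contain `marker`."""
    for path in sorted(glob(pattern)):
        if not has_marker(path, marker):
            yield path
```

Each path yielded by good_csv_files('*.csv') can then be passed straight to pd.read_csv(path, delimiter=',') inside the usual for loop.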

1 Comment

Thank you! Really appreciate the explanation.
