12

I read all the files in one folder one by one into a pandas.DataFrame and then I check them for some conditions. There are a few thousand files, and I would love to make pandas raise an exception when a file is empty, so that my reader function would skip this file.

I have something like:

class StructureReader(FileList):
    def __init__(self, dirname, filename):
        self.dirname=dirname
        self.filename=str(self.dirname+"/"+filename)
    def read(self):
        self.data = pd.read_csv(self.filename, header=None, sep = ",")
        if len(self.data)==0:
           raise ValueError
class Run(object):
    def __init__(self, dirname):
        self.dirname=dirname
        self.file__list=FileList(dirname)
        self.result=Result()
    def run(self):
        for k in self.file__list.file_list[:]:
            self.b=StructureReader(self.dirname, k)
            try:
                self.b.read()
                self.b.find_interesting_bonds(self.result)
                self.b.find_same_direction_chain(self.result)
            except ValueError:
                pass

Regular file that I'm searching for some condition looks like:

"A/C/24","A/G/14","WW_cis",,
"B/C/24","A/G/15","WW_cis",,
"C/C/24","A/F/11","WW_cis",,
"d/C/24","A/G/12","WW_cis",,

But somehow I don't ever get ValueError raised, and my functions are searching empty files, which gives me a lot of "Empty DataFrame ..." lines in my results file. How can I skip empty files?

2

4 Answers 4

14

I'd first check if the file is empty, and if it isn't empty I'll try to use it with pandas. Following this link https://stackoverflow.com/a/15924160/5088142 you can find a nice way to check if a file is empty:

import os
def is_non_zero_file(fpath):  
    return os.path.isfile(fpath) and os.path.getsize(fpath) > 0
Sign up to request clarification or add additional context in comments.

1 Comment

With pathlib check the size with fpath = Path(fpath) and fpath.stat().st_size == 0
6

You can get your work done with following code, just add your CSVs path to the path variable, and run. You should get an object raw_data which is a Pandas dataframe.

import os, pandas as pd, glob
import pandas.io.common

path = "/home/username/data_folder"
files_list = glob.glob(os.path.join(path, "*.csv"))

for i in range(0,len(files_list)):
   try:
       raw_data = pd.read_csv(files_list[i])
   except pandas.errors.EmptyDataError:
      print(files_list[i], " is empty and has been skipped.")

Comments

4

You should not use pandas, but directly the python libraries. The answer is there: python how to check file empty or not

Comments

1

How about this

files = glob.glob('*.csv')
files = list(filter(lambda file: os.stat(file).st_size > 0, files))
data = pd.read_csv(files)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.