I have some large CSV and XLSX files that I need to load into pandas DataFrames. I have code that locates these files within a directory (when printed, the paths look correct). The paths are then passed to a helper function that is meant to build the DataFrames, and the data will then be passed to other functions for some manipulation. Once that is done, I intend to write the data to a file (by loading a template, writing the data to it, and saving the result).
I currently have code like:
import pandas
from io import StringIO

# some set-up functions (which work; verified using print statements)

def createDataFrame(filename):
    if filename.endswith('.csv'):
        df = pandas.read_csv(StringIO(filename), skip_blank_lines=True, index_col=False,
                             encoding="utf-8", skipinitialspace=True)
When I try print(df), I get:
Empty DataFrame
Columns: [a.csv]
Index: []
and print(StringIO(filename)) gives me:
<_io.StringIO object at 0x004D1990>
However, when I leave out the StringIO() around filename in the function, I get this error:
OSError: File b'a.csv' does not exist
Everything I've been able to find on this either just says to import pandas and start using it, or talks about using read_csv() rather than from_csv() (from this question, which wasn't very helpful here). Even the current pandas docs basically say that it should be as easy as passing the file to pandas.read_csv().
1) I've checked that I have full permissions and that the file is valid and exists. Why am I getting the OSError?
2) When I use StringIO(), why am I still getting an empty DataFrame here? How can I fix this?
Thanks in advance.
StringIO? Won't it just work without this, i.e. pandas.read_csv(filename, .....)?
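For what it's worth, here is a minimal sketch of what is probably going on. StringIO wraps text content, not a file, so read_csv parses the string 'a.csv' itself as a one-line CSV: a single header and no rows, which matches the empty DataFrame above. The OSError is most likely because the bare name 'a.csv' is resolved against the current working directory rather than the directory where the file was found; that part is an assumption, since the set-up code isn't shown. The helper name create_data_frame and the sample data are made up for illustration.

```python
import os
import tempfile
from io import StringIO

import pandas


def create_data_frame(path):
    """Read a CSV from a filesystem path (hypothetical helper, not the asker's code)."""
    # Resolve relative names against the current working directory so the
    # path that pandas receives is unambiguous.
    full_path = os.path.abspath(path)
    if not os.path.isfile(full_path):
        raise FileNotFoundError(full_path)
    return pandas.read_csv(full_path, skip_blank_lines=True, index_col=False,
                           encoding="utf-8", skipinitialspace=True)


# StringIO wraps text *content*, so the file name itself becomes the CSV data:
# one header line ("a.csv") and no rows.
from_name = pandas.read_csv(StringIO("a.csv"))

# Write a small sample file (made-up data) and read it back by its full path.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "a.csv")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("x,y\n1,2\n3,4\n")
    from_path = create_data_frame(sample_path)
```

If the code that locates the files only keeps the base name, joining it back onto its directory with os.path.join before calling read_csv should be enough. StringIO is only needed when the CSV text is already in memory as a string.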