I am trying to create a Dask DataFrame from a CSV file stored in HDFS. The CSV file in HDFS actually consists of many part files.
When I call the read_csv API:
dd.read_csv("hdfs:<some path>/data.csv")
the following error occurs:
OSError: Could not open file: <some path>/data.csv, mode: rb Path is not a file: <some path>/data.csv
In fact, /data.csv is a directory containing many part files. I'm not sure whether there is a different API for reading such an HDFS CSV.
Have you tried "hdfs:/some/path/data.csv/*.csv" (note the '/' after the colon and the glob pattern)? Pointing read_csv at a glob over the part files, rather than at the directory itself, lets Dask open each part file individually.