I would like to write and later read a dataframe in Python.

df_final.to_csv(self.get_local_file_path(hash,dataset_name), sep='\t', encoding='utf8')
...
df_final = pd.read_table(self.get_local_file_path(hash,dataset_name), encoding='utf8',index_col=[0,1])

But then I get:

sys:1: DtypeWarning: Columns (7,17,28) have mixed types. Specify dtype option on import or set low_memory=False.

I found this question, whose bottom line is that I should specify the field types when I read the file, because "low_memory" is deprecated. I find that very inefficient.

Isn't there a simple way to write and later read a DataFrame? I don't care about the human-readability of the file.

2 Answers

1

You can pickle your dataframe:

df_final.to_pickle(self.get_local_file_path(hash,dataset_name))

Read it back later:

df_final = pd.read_pickle(self.get_local_file_path(hash,dataset_name))

If your dataframe is big and this gets too slow, you might have more luck using the HDF5 format. Note that to_hdf requires a key that names the object inside the file ('df' here is an arbitrary choice):

df_final.to_hdf(self.get_local_file_path(hash,dataset_name), key='df')

Read it back later:

df_final = pd.read_hdf(self.get_local_file_path(hash,dataset_name), key='df')

You might need to install PyTables first.

Both ways store the data along with their types. Therefore, this should solve your problem.
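To illustrate the point about types, here is a minimal sketch of a pickle round trip. The frame, its column names, and the temporary path are made up for the example; they stand in for df_final and get_local_file_path from the question. The "mixed" column deliberately holds values of different types, which is the kind of column that triggers the DtypeWarning when the data is parsed back from text.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame with a two-level index and one column of mixed types.
df_final = pd.DataFrame(
    {
        "group": ["a", "a", "b"],
        "item": [1, 2, 3],
        "mixed": [1, "x", 2.5],
        "count": [10, 20, 30],
    }
).set_index(["group", "item"])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder path; in the question this comes from get_local_file_path.
    path = os.path.join(tmp_dir, "df_final.pkl")
    df_final.to_pickle(path)
    restored = pd.read_pickle(path)

# The index levels and column dtypes come back without any dtype hints.
print(restored.index.names)
print(restored.dtypes)
```

Unlike the CSV round trip, there is no need to pass index_col or dtype when reading. One caveat: only unpickle files you wrote yourself or otherwise trust, since loading a pickle can execute arbitrary code.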

0

The warning appears because pandas has detected values of conflicting types within a column. If you wish, you can specify the data types through the dtype argument of the read call (read_table or read_csv):

dtype={'FIELD': int, 'FIELD2': str}

and so on for the other columns.
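As a rough sketch of how the dtype mapping fits into a read call, the snippet below parses a small in-memory tab-separated table. The data and the column names FIELD and FIELD2 are placeholders, and read_csv with sep="\t" is used here as the equivalent of read_table.

```python
import io

import pandas as pd

# Hypothetical tab-separated data; FIELD2 mixes digit-only and text values.
raw = "FIELD\tFIELD2\n1\t007\n2\tabc\n"

# Declaring the types up front means pandas does not have to guess them.
df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    dtype={"FIELD": int, "FIELD2": str},
)

print(df.dtypes)
print(df["FIELD2"].tolist())
```

Because FIELD2 is read as a string column, a value such as 007 keeps its leading zeros instead of being converted to a number.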
