I'm trying to do something fairly simple, but either odo is broken or I don't understand how datashapes work in the context of this package.

The CSV file:

email,dob
[email protected],1982-07-13
[email protected],1997-01-01
...

The code:

from odo import odo
import pandas as pd

df = pd.read_csv("...")
connection_str = "postgresql+psycopg2:// ... "

t = odo('path/to/data.csv', connection_str, dshape='var * {email: string, dob: datetime}')

The error:

AssertionError: datashape must be Record type, got 0 * {email: string, dob: datetime}

It's the same error if I try to go directly from a DataFrame -> Postgres as well:

t = odo(df, connection_str, dshape='var * {email: string, dob: datetime}')

A few other things that don't fix the problem: 1) removing the header line from the CSV file, 2) changing var to the actual number of rows in the DataFrame.

What am I doing wrong here?

  • Have you tried DataFrame.to_sql? It seems like you're just trying to save a CSV into a Postgres table. pandas.pydata.org/pandas-docs/stable/generated/… Commented Sep 18, 2017 at 21:00
  • Yes, it's just really slow. odo is supposed to use Postgres's COPY internals to do it much more quickly: odo.pydata.org/en/latest/perf.html Commented Sep 18, 2017 at 21:09
  • I'm not familiar with odo but you can do fast loading yourself stackoverflow.com/questions/41875817/… Commented Sep 18, 2017 at 22:35
  • No, ideally you want to copy from a file to Postgres directly. That way Postgres and the OS do all the real work, which is much faster. I'm loading hundreds of GB. I included the example above, going from an in-memory DataFrame to Postgres, only to demonstrate that the odo library wasn't working as intended. Commented Sep 18, 2017 at 22:38
  • do you need pandas in the first place - csv straight to postgres should be easy stackoverflow.com/questions/2987433/… Commented Sep 19, 2017 at 15:15

1 Answer

Does connection_str include a table name? Adding one fixed it for me when I ran into a similar issue with a SQLite database.

It should be something like:

connection_str = "postgresql+psycopg2://your_database_name::data"
t = odo(df, connection_str, dshape='var * {email: string, dob: datetime}')

where 'data', the part after '::' in connection_str, is the name of your new table.
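
To make the table suffix harder to forget, the URI can be assembled by a small helper. This is only a sketch: with_table is a hypothetical function, not part of odo, and the user, password, host, port and database name below are placeholders. It relies only on odo's convention of appending '::' and a table name to a database URI.

```python
def with_table(db_uri, table_name):
    """Return an odo-style SQL URI that targets a specific table.

    Hypothetical helper, not part of odo: it only appends '::<table>'
    to a database URI, which is how odo URIs name a table.
    """
    if not table_name:
        raise ValueError("table_name must be non-empty")
    if "::" in db_uri:
        # Simple guard; assumes the URI itself contains no '::' (e.g. no IPv6 host)
        raise ValueError("URI already names a table: %r" % db_uri)
    return "{}::{}".format(db_uri, table_name)


# Placeholder credentials and database name (assumptions for illustration)
target = with_table(
    "postgresql+psycopg2://user:password@localhost:5432/mydb", "data"
)
print(target)

# With odo installed and the database reachable, the call from the
# question would then be:
# t = odo('path/to/data.csv', target,
#         dshape='var * {email: string, dob: datetime}')
```

Passing a URI that has no '::table' part is what leaves odo with a whole database rather than a single table as the target, which appears to be what triggers the "must be Record type" assertion.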

See also:

python odo sql AssertionError: datashape must be Record type, got 0 * {...}

https://github.com/blaze/odo/issues/580
